SSE API

History

Connection Info

Item	Value
Base path	`https://vas-poc.vurbo.ai/api/v1/sse`
Protocol	HTTP + Server-Sent Events (SSE)
Data format	text/event-stream
Auth method	Header `X-API-Key: {KEY}`

Note: The browser's native EventSource API does not support custom headers. Use the fetch API with ReadableStream, or use an SSE client library that supports headers.

Endpoint Overview

Method	Endpoint	Description
GET	`/api/v1/sse/history/transcribe/{taskId}`	Retrieve historical conversation records

GET /api/v1/sse/history/transcribe/{taskId}

Description

Loads the complete conversation record for a specified task, including all sentences and the summary. The data is sent one item at a time over an SSE stream.

Difference from the Transcript Download API (GET /api/v1/tasks/{taskId}/transcript/export):
This endpoint: for progressive loading; pushes raw structured data (JSON fragments) sentence by sentence as an event stream, so the front end can render the UI progressively.
Transcript download: for offline download; returns the complete file (TXT / SRT / SBV / VTT / CSV) in one response, ready to open in subtitle software or a spreadsheet.

Use Cases

View the recording details page
Load historical transcripts

Authentication

Header: X-API-Key (see Authentication)

Request Parameters

Parameter	Location	Type	Required	Description
`taskId`	path	string	Yes	Recording ID (UUID)

Request Example

curl -N "https://vas-poc.vurbo.ai/api/v1/sse/history/transcribe/550e8400-e29b-41d4-a716-446655440000" \
  -H "X-API-Key: vas_aB3dE5fG7hI9jK1lM3nO5pQ7rS9tU1vW"

// Use the fetch API (because EventSource does not support headers)
async function connectSSE(taskId, apiKey) {
  const response = await fetch(
    `https://vas-poc.vurbo.ai/api/v1/sse/history/transcribe/${taskId}`,
    {
      headers: {
        'X-API-Key': apiKey
      }
    }
  );
  const reader = response.body.getReader();
  // ... handle SSE events
}

Event Sequence

1. connected        → connection confirmation
2. init_metadata    → send task metadata
3. init_sentence    → send sentences one by one (repeats N times)
4. init_summary     → send summary
5. init_done        → initialization complete
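A client can check the ordering above with a small state machine. The following is a minimal sketch, not part of the API: `EXPECTED_ORDER` and `createSequenceChecker` are hypothetical names, and it assumes that `init_sentence` is the only event that may repeat or be absent.

```javascript
// Documented event order for the history stream.
const EXPECTED_ORDER = ['connected', 'init_metadata', 'init_sentence', 'init_summary', 'init_done'];

// Returns an accept(type) function that reports whether each event arrives in a valid position.
function createSequenceChecker() {
  let stage = -1; // index in EXPECTED_ORDER of the last accepted event
  return function accept(type) {
    const next = EXPECTED_ORDER.indexOf(type);
    if (next === -1) return false; // unknown event type
    // init_sentence repeats N times, so seeing it again is valid.
    const repeat = type === 'init_sentence' && next === stage;
    // init_sentence may be sent 0 times, so init_metadata -> init_summary is also valid.
    const forward = next === stage + 1 || (type === 'init_summary' && stage === 1);
    if (!repeat && !forward) return false;
    stage = next;
    return true;
  };
}
```

Calling `accept(event.type)` for each parsed event and logging a warning on `false` is usually enough; whether to abort on an unexpected order is a client-side design choice.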

Event Formats

connected

{
  "message": "History service connected (recordingId: xxx)"
}

init_metadata

{
  "task_id": "550e8400-e29b-41d4-a716-446655440000",
  "title": "Meeting Notes",
  "created_at": "2026-02-23T10:00:00Z",
  "type": "transcribe",
  "has_speaker_diarization": true,
  "transcription_languages": ["zh-TW"],
  "translation_languages": ["en-US"],
  "summary_template": "general",
  "summary_language": "zh-TW",
  "speaker_aliases": {"speaker_1": "Manager Wang"}
}

Field	Type	Description
`task_id`	string	Task ID (UUID)
`title`	string	Task title
`created_at`	string	Creation time (ISO 8601)
`type`	string	Recording type
`has_speaker_diarization`	boolean	Whether speaker diarization is enabled
`transcription_languages`	array\|null	Array of transcription languages (BCP 47, e.g. `["zh-TW"]`), up to 2
`translation_languages`	array\|null	Array of translation languages (BCP 47, e.g. `["en-US", "ja-JP"]`), up to 8
`summary_template`	string\|null	Summary template slug (e.g. `general`, `meeting`); `null` if not specified
`summary_language`	string\|null	Summary output language (BCP 47, e.g. `zh-TW`, `en-US`); `null` if not specified
`speaker_aliases`	object	Mapping of "original speaker ID → display name"; `{}` (an empty object, not an array) when there are no aliases. The front end uses this for duplicate-name precheck before a rename (added in v1.3.12)
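The duplicate-name precheck mentioned for `speaker_aliases` could look like the sketch below. `findAliasConflict` is a hypothetical helper, and the exact-match comparison after trimming is an assumption; the server may apply a different rule (for example, case-insensitive matching).

```javascript
// Returns the speaker ID that already uses `newName`, or null if the name is free.
// speakerAliases is the object from init_metadata ({} when there are no aliases).
function findAliasConflict(speakerAliases, speakerId, newName) {
  const wanted = newName.trim();
  for (const [id, alias] of Object.entries(speakerAliases)) {
    // Renaming a speaker to its own current alias is not a conflict.
    if (id !== speakerId && alias === wanted) return id;
  }
  return null;
}
```

For example, with the metadata shown above, renaming `speaker_2` to "Manager Wang" would report a conflict with `speaker_1`.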

init_sentence

{
  "sid": 1,
  "origin": "Hello",
  "translations": {
    "en-US": "Hello"
  },
  "start_time": "00:05",
  "speaker_id": "speaker_1",
  "speaker_label": "Manager Wang"
}

If a sentence has a translation failure, it carries an additional translation_errors field (present only when there is a failure). This lets the front end distinguish "the language was never scheduled for translation" (translations missing the key) from "it was translated but failed" (translation_errors has the key):

{
  "sid": 5,
  "origin": "Sentence with sensitive words",
  "translations": {
    "en-US": "Sensitive sentence"
  },
  "translation_errors": {
    "ja": "llm_content_filtered"
  },
  "start_time": "00:25",
  "speaker_id": "speaker_1",
  "speaker_label": "Manager Wang"
}
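The distinction between "never scheduled" and "translated but failed" can be expressed as a three-way lookup. This is a sketch only; `translationStatus` and the state names it returns are hypothetical, not part of the API.

```javascript
// Own-property check, so inherited keys on plain objects are never mistaken for language codes.
function hasKey(obj, key) {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

// Classifies one language of an init_sentence payload.
function translationStatus(sentence, lang) {
  const errors = sentence.translation_errors || {}; // field is omitted when nothing failed
  if (hasKey(errors, lang)) return { state: 'failed', code: errors[lang] };
  const translations = sentence.translations || {}; // null when there is no translation
  if (hasKey(translations, lang)) return { state: 'translated', text: translations[lang] };
  return { state: 'not_scheduled' };
}
```

Checking `translation_errors` first is a deliberate choice here: a language listed there is treated as failed even if a stale entry were also present in `translations`.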

If the user has edited the original text of a sentence (via PATCH /api/v1/recordings/{id}/entries/{sid}), it carries additional original_text_raw and original_text_edited_at fields (present only after editing):

{
  "sid": 7,
  "origin": "Corrected text",
  "original_text_raw": "Original STT output",
  "original_text_edited_at": "2026-05-06T10:30:00.000000Z",
  "translations": { "en-US": "Corrected text" },
  "start_time": "00:35",
  "speaker_id": "speaker_1",
  "speaker_label": "Manager Wang"
}

Field	Type	Description
`sid`	number	Sentence ID
`origin`	string	Original text content (the user-corrected version if it has been edited)
`translations`	object\|null	Map of translated text (`{"language_code": "translated text"}`); `null` when there is no translation
`translation_errors`	object	Optional. Map of translation failure error codes (`{"language_code": "error_code"}`); this field is omitted when there are no failures
`original_text_raw`	string	Optional. The raw STT output text. Present only when the user has edited the sentence. The front end can use it to display an "edited" marker and offer a "restore original text" function
`original_text_edited_at`	string	Optional. The most recent edit time of the original text (ISO 8601). Appears together with `original_text_raw`
`start_time`	string	Start time (mm:ss)
`speaker_id`	string\|null	The original speaker ID (immutable and always stable, e.g. `speaker_1`). Provided as the source for `target_speaker_id` in `PATCH /speakers/reassign` (v1.5.3 reversal: previously the display name)
`speaker_label`	string\|null	The display label (the human-readable name after applying `speaker_aliases`, e.g. `Manager Wang`). Equal to `speaker_id` when there is no alias (added in v1.5.3 to replace the former display semantics of `speaker_id`)

Front-end detection: Determine whether a sentence has been edited by the presence of the field ('original_text_raw' in data or data.original_text_raw !== undefined). Do not compare origin === original_text_raw — the user may have edited the text and then changed it back to the same string; in that case the text is equal but the "edited" marker should still be shown.
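A presence-based check along those lines might look like this. `editState` is a hypothetical helper name; the field names come from the table above.

```javascript
// Derives the "edited" marker from field presence, never from text comparison.
function editState(sentence) {
  const edited = 'original_text_raw' in sentence;
  return {
    edited,
    // Text to show when the user chooses "restore original text".
    restorableText: edited ? sentence.original_text_raw : null,
    // Kept as the ISO 8601 string sent by the server.
    editedAt: edited ? (sentence.original_text_edited_at ?? null) : null,
  };
}
```

A sentence whose `origin` was edited and then changed back to the raw STT text still reports `edited: true`, because the field is present.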

v1.5.3 naming reversal: speaker_id now holds the original speaker ID instead of the display name, and the new speaker_label field holds the display label. Speaker edits (reassign / merge) always use speaker_id as the lookup key. See the V1.5.3 changelog.

init_summary

In addition to the summary text (text), this event includes mode-aware metadata (mode / template / plain_text / prompt_snapshot), which lets the client trace the mode, the effective slug, and, in custom mode, the customer prompt content that produced the summary.

v1.5.5 adds fallback_level / dropped_segments: these appear only when the summary actually went through the LLM service content-filter fallback chain (L2 neutral prompt or L3 segment trimming), for auditing and UI hints during history playback.

Example (L1 succeeds directly, no fallback):

{
  "text": "Summary content...",
  "mode": "custom",
  "template": "acme-meeting-v2",
  "plain_text": true,
  "prompt_snapshot": "Please emphasize KPIs"
}

Example (L3 triggered, generated after 2 transcript segments were trimmed):

{
  "text": "Summary content (2 segments omitted)...",
  "mode": "custom",
  "template": "acme-meeting-v2",
  "plain_text": true,
  "prompt_snapshot": "Please emphasize KPIs",
  "fallback_level": 3,
  "dropped_segments": [3, 7]
}

Field	Type	Description
`text`	string	Summary text
`mode`	string \| null	`"builtin"` / `"custom"` / `null` (null when no summary was generated)
`template`	string \| null	Effective slug: for builtin, the built-in template slug; for custom, the customer slug
`plain_text`	boolean	Whether the output is plain text
`prompt_snapshot`	string \| null	Has a value only in custom mode; the prompt content passed in verbatim by the customer (the basis for reconstruction)
`fallback_level`	int (omit)	Present only when a fallback was triggered (`2` or `3`). `2` = L2 neutral prompt; `3` = L3 segment trimming. Omitted when L1 succeeds directly
`dropped_segments`	int[] (omit)	Present only when fallback_level=3; the indices of the trimmed transcript segments (an integer array in original order)

fallback_level / dropped_segments and prompt_snapshot are complementary: the former records the actual execution path (whether a fallback was taken), and the latter records the customer intent (the original prompt content). Even if a fallback was triggered and the customer prompt was not actually used, prompt_snapshot still preserves the original text as an audit record. See V1.5.5 changelog – LLM service content-filter automatic fallback.
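For UI hints during history playback, the optional fields can be folded into a single message. The sketch below is illustrative: `summaryFallbackHint` and the hint wording are hypothetical, and a missing `fallback_level` is treated as level 1, as described above.

```javascript
// Returns a human-readable hint for an init_summary payload, or null when no fallback occurred.
function summaryFallbackHint(summary) {
  const level = summary.fallback_level ?? 1; // omitted when L1 succeeds directly
  if (level === 3) {
    const dropped = summary.dropped_segments ?? [];
    return `Summary generated after trimming ${dropped.length} transcript segment(s): ${dropped.join(', ')}`;
  }
  if (level === 2) return 'Summary generated with the neutral fallback prompt';
  return null;
}
```

The hint is independent of `prompt_snapshot`, which can still be displayed alongside it as the record of the customer's original prompt.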

init_done

{
  "totalSentences": 10
}

Field	Type	Description
`totalSentences`	number	Total number of sentences

Edge Case: No Speech Content (V1.3.7)

If the speech recognition engine recognizes no sentences (because the task is silent throughout, the volume is too low, there is too much noise, or the recognition language does not match the actual audio), this endpoint still completes with the normal event sequence instead of returning an sse_transcript_not_found error:

init_metadata is sent normally
init_sentence is sent 0 times (no sentences)
The text of init_summary is an empty string ""
The totalSentences of init_done is 0

This behavior applies to both sources, real-time recording (when the WebSocket recording ends) and file import (when offline processing completes), and matches the V1.3.5 import flow, which treats zero recognition results as a valid outcome. The client should check totalSentences === 0 to decide whether to show a "no speech content" empty state, rather than treating it as an error. See File Import Guide – Behavior When Audio Cannot Be Recognized.
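One way to apply the empty-state rule is to accumulate events in a reducer and derive the flag when `init_done` arrives. This is a sketch under assumptions: `createHistoryState` and `applyHistoryEvent` are hypothetical names, and each event is assumed to be already parsed into `{ type, data }`.

```javascript
// Initial view state for a history page.
function createHistoryState() {
  return { metadata: null, sentences: [], summary: null, done: false, empty: false };
}

// Pure reducer: returns the next state for one parsed SSE event.
function applyHistoryEvent(state, event) {
  switch (event.type) {
    case 'init_metadata':
      return { ...state, metadata: event.data };
    case 'init_sentence':
      return { ...state, sentences: [...state.sentences, event.data] };
    case 'init_summary':
      return { ...state, summary: event.data };
    case 'init_done':
      // totalSentences === 0 means "no speech content", not an error.
      return { ...state, done: true, empty: event.data.totalSentences === 0 };
    default:
      return state; // e.g. connected
  }
}
```

The UI can then render the empty state when `state.done && state.empty`, and keep the error path for HTTP failures only.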

Specific Error Codes

Error Code	HTTP Status	Description	Recommended Action
`recording_not_found`	404	Recording not found	Verify that taskId is correct
`sse_transcript_not_found`	404	Transcript blob not found	The transcript file for the specified `taskId` does not exist or could not be accessed (does not occur under the normal flow; after V1.3.7, silence during real-time recording does not trigger this error either)

Front-End Example

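// Split buffered text into complete SSE events.
// Returns the parsed events plus the unparsed remainder (a possibly incomplete event).
// Assumes each event's data field is a JSON document, as in the event formats above.
function parseSSE(buffer) {
  const blocks = buffer.replace(/\r\n/g, '\n').split('\n\n');
  const rest = blocks.pop(); // text after the last blank line is not a complete event yet
  const events = [];
  for (const block of blocks) {
    let type = 'message';
    const dataLines = [];
    for (const line of block.split('\n')) {
      if (line.startsWith('event:')) type = line.slice(6).trim();
      else if (line.startsWith('data:')) dataLines.push(line.slice(5).trim());
    }
    if (dataLines.length > 0) {
      events.push({ type, data: JSON.parse(dataLines.join('\n')) });
    }
  }
  return { events, rest };
}

async function loadHistory(taskId, apiKey) {
  const response = await fetch(
    `https://vas-poc.vurbo.ai/api/v1/sse/history/transcribe/${taskId}`,
    {
      headers: {
        'X-API-Key': apiKey
      }
    }
  );
  if (!response.ok) {
    throw new Error(`History request failed with HTTP ${response.status}`);
  }

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // stream: true keeps multi-byte characters intact across chunk boundaries
    buffer += decoder.decode(value, { stream: true });
    // Parse the SSE format: event: xxx\ndata: {...}\n\n
    // A network chunk may end mid-event, so keep the remainder for the next read.
    const { events, rest } = parseSSE(buffer);
    buffer = rest;

    for (const event of events) {
      if (event.type === 'init_metadata') {
        console.log('Task info:', event.data.title);
      } else if (event.type === 'init_sentence') {
        console.log(`[${event.data.start_time}] ${event.data.origin}`);
        if (event.data.translations) {
          console.log(`Translations:`, event.data.translations);
        }
      } else if (event.type === 'init_summary') {
        console.log('Summary:', event.data.text);
      } else if (event.type === 'init_done') {
        console.log(`Load complete, ${event.data.totalSentences} sentences total`);
      }
    }
  }
}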

Version: V1.5.7
Last Updated: 2026-05-20
