SSE API

History

Connection Info

Item           Value
Base path      https://vas-poc.vurbo.ai/api/v1/sse
Protocol       HTTP + Server-Sent Events (SSE)
Data format    text/event-stream
Auth method    Header X-API-Key: {KEY}

Note: The browser's native EventSource API does not support custom headers. Use the fetch API with ReadableStream, or use an SSE client library that supports headers.


Endpoint Overview

Method    Endpoint                                   Description
GET       /api/v1/sse/history/transcribe/{taskId}    Retrieve historical conversation records

GET /api/v1/sse/history/transcribe/{taskId}

Description

Loads the complete conversation record for a specified task, including all sentences and the summary. The data is sent one item at a time over an SSE stream.

Difference from the Transcript Download API (GET /api/v1/tasks/{taskId}/transcript/export):

  • This endpoint: for progressive loading; pushes raw structured data (JSON fragments) sentence by sentence as an event stream, so the front end can render the UI progressively.
  • Transcript download: for offline download; returns the complete file (TXT / SRT / SBV / VTT / CSV) in one response, ready to open in subtitle software or a spreadsheet.

Use Cases

  • View the recording details page
  • Load historical transcripts

Authentication

Header: X-API-Key (see Authentication)

Request Parameters

Parameter    Location    Type      Required    Description
taskId       path        string    Yes         Recording ID (UUID)

Request Example

curl -N "https://vas-poc.vurbo.ai/api/v1/sse/history/transcribe/550e8400-e29b-41d4-a716-446655440000" \
  -H "X-API-Key: vas_aB3dE5fG7hI9jK1lM3nO5pQ7rS9tU1vW"

// Use the fetch API (because EventSource does not support headers)
async function connectSSE(taskId, apiKey) {
  const response = await fetch(
    `https://vas-poc.vurbo.ai/api/v1/sse/history/transcribe/${taskId}`,
    {
      headers: {
        'X-API-Key': apiKey
      }
    }
  );
  // Fail early on HTTP errors such as 404 before reading the stream
  if (!response.ok) {
    throw new Error(`SSE request failed: HTTP ${response.status}`);
  }
  const reader = response.body.getReader();
  // ... handle SSE events
}

Event Sequence

1. connected        → connection confirmation
2. init_metadata    → send task metadata
3. init_sentence    → send sentences one by one (repeats N times)
4. init_summary     → send summary
5. init_done        → initialization complete
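
The sequence above can be checked on the client with a small tracker. This is a minimal sketch rather than part of the API: the names PHASES and createSequenceTracker are hypothetical, and it assumes events arrive in exactly the documented order, with init_sentence allowed to repeat or to be absent when the task has no sentences.

```javascript
// Hypothetical helper: checks that event names follow the documented order.
const PHASES = ['connected', 'init_metadata', 'init_sentence', 'init_summary', 'init_done'];

function createSequenceTracker() {
  let phase = -1; // index into PHASES of the most recently accepted event
  return function accept(eventName) {
    const next = PHASES.indexOf(eventName);
    if (next === -1) return false; // unknown event name
    // init_sentence may repeat N times...
    const repeat = next === phase && eventName === 'init_sentence';
    // ...or be skipped entirely (init_metadata followed directly by init_summary).
    const skipSentences = phase === 1 && next === 3;
    const ok = repeat || next === phase + 1 || skipSentences;
    if (ok) phase = next;
    return ok;
  };
}
```

A caller would create one tracker per connection and pass each event name to it; a false return value indicates an event that arrived out of the documented order.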

Event Formats


connected

{
  "message": "History service connected (recordingId: xxx)"
}

init_metadata

{
  "task_id": "550e8400-e29b-41d4-a716-446655440000",
  "title": "Meeting Notes",
  "created_at": "2026-02-23T10:00:00Z",
  "type": "transcribe",
  "has_speaker_diarization": true,
  "transcription_languages": ["zh-TW"],
  "translation_languages": ["en-US"],
  "summary_template": "general",
  "summary_language": "zh-TW",
  "speaker_aliases": {"speaker_1": "Manager Wang"}
}

Field                      Type           Description
task_id                    string         Task ID (UUID)
title                      string         Task title
created_at                 string         Creation time (ISO 8601)
type                       string         Recording type
has_speaker_diarization    boolean        Whether speaker diarization is enabled
transcription_languages    array|null     Array of transcription languages (BCP 47, e.g. ["zh-TW"]), up to 2
translation_languages      array|null     Array of translation languages (BCP 47, e.g. ["en-US", "ja-JP"]), up to 8
summary_template           string|null    Summary template slug (e.g. general, meeting); null if not specified
summary_language           string|null    Summary output language (BCP 47, e.g. zh-TW, en-US); null if not specified
speaker_aliases            object         Mapping of "original speaker ID → display name"; {} (an empty object, not an array) when there are no aliases. The front end uses this for duplicate-name precheck before a rename (added in v1.3.12)
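
The duplicate-name precheck mentioned for speaker_aliases could look like the sketch below. The function name isDuplicateAlias is hypothetical, and the matching rule (exact match after trimming whitespace) is an assumption; the server may apply different rules.

```javascript
// Hypothetical helper: returns true if newName is already the display name
// of a different speaker in the speaker_aliases object from init_metadata.
function isDuplicateAlias(speakerAliases, speakerId, newName) {
  const wanted = newName.trim(); // assumption: names are compared after trimming
  return Object.entries(speakerAliases).some(
    ([id, name]) => id !== speakerId && name === wanted
  );
}
```

With the metadata example above, renaming speaker_2 to "Manager Wang" would be flagged as a duplicate, while renaming speaker_1 to its current name would not.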

init_sentence

{
  "sid": 1,
  "origin": "Hello",
  "translations": {
    "en-US": "Hello"
  },
  "start_time": "00:05",
  "speaker_id": "speaker_1",
  "speaker_label": "Manager Wang"
}

If a sentence has a translation failure, it carries an additional translation_errors field (present only when there is a failure). This lets the front end distinguish "the language was never scheduled for translation" (translations missing the key) from "it was translated but failed" (translation_errors has the key):

{
  "sid": 5,
  "origin": "Sentence with sensitive words",
  "translations": {
    "en-US": "Sensitive sentence"
  },
  "translation_errors": {
    "ja": "llm_content_filtered"
  },
  "start_time": "00:25",
  "speaker_id": "speaker_1",
  "speaker_label": "Manager Wang"
}
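
The distinction between "never scheduled" and "translated but failed" can be expressed as a small classifier. This is a sketch only: the function name translationStatus and the three status strings are hypothetical, not values defined by the API.

```javascript
// Hypothetical helper: classifies one language of one init_sentence payload.
function translationStatus(sentence, lang) {
  const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);
  const errors = sentence.translation_errors || {}; // field is omitted when nothing failed
  const translations = sentence.translations || {}; // field is null when there is no translation
  if (has(errors, lang)) return 'failed';           // translated but failed
  if (has(translations, lang)) return 'translated';
  return 'not_scheduled';                           // key missing from both maps
}
```

For the sid 5 example above, "ja" is classified as failed, "en-US" as translated, and any other language code as not scheduled.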

If the user has edited the original text of a sentence (via PATCH /api/v1/recordings/{id}/entries/{sid}), it carries additional original_text_raw and original_text_edited_at fields (present only after editing):

{
  "sid": 7,
  "origin": "Corrected text",
  "original_text_raw": "Original STT output",
  "original_text_edited_at": "2026-05-06T10:30:00.000000Z",
  "translations": { "en-US": "Corrected text" },
  "start_time": "00:35",
  "speaker_id": "speaker_1",
  "speaker_label": "Manager Wang"
}

Field                      Type           Description
sid                        number         Sentence ID
origin                     string         Original text content (the user-corrected version if it has been edited)
translations               object|null    Map of translated text ({"language_code": "translated text"}); null when there is no translation
translation_errors         object         Optional. Map of translation failure error codes ({"language_code": "error_code"}); this field is omitted when there are no failures
original_text_raw          string         Optional. The raw STT output text. Present only when the user has edited the sentence. The front end can use it to display an "edited" marker and offer a "restore original text" function
original_text_edited_at    string         Optional. The most recent edit time of the original text (ISO 8601). Appears together with original_text_raw
start_time                 string         Start time (mm:ss)
speaker_id                 string|null    The original speaker ID (immutable and always stable, e.g. speaker_1). Provided as the source for target_speaker_id in PATCH /speakers/reassign (v1.5.3 reversal: previously the display name)
speaker_label              string|null    The display label (the human-readable name after applying speaker_aliases, e.g. Manager Wang). Equal to speaker_id when there is no alias (added in v1.5.3 to replace the former display semantics of speaker_id)

Front-end detection: Determine whether a sentence has been edited by the presence of the field ('original_text_raw' in data or data.original_text_raw !== undefined). Do not compare origin === original_text_raw — the user may have edited the text and then changed it back to the same string; in that case the text is equal but the "edited" marker should still be shown.
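
The detection rule above amounts to a presence check. In this sketch the function name isEdited is hypothetical; the second call shows the case where the user edited the text and then changed it back.

```javascript
// Hypothetical helper: a sentence counts as edited when original_text_raw is present,
// regardless of whether origin and original_text_raw hold the same string.
function isEdited(sentence) {
  return 'original_text_raw' in sentence;
}

const neverEdited = { sid: 1, origin: 'Hello' };
const editedThenReverted = { sid: 7, origin: 'Hello', original_text_raw: 'Hello' };

console.log(isEdited(neverEdited));        // false
console.log(isEdited(editedThenReverted)); // true, even though the two texts are equal
```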

v1.5.3 naming reversal: speaker_id now holds the original speaker ID instead of the display name, and a new speaker_label field holds the display label. Speaker edits (reassign / merge) always use speaker_id as the locating key. See the V1.5.3 changelog.
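
One way a client might keep the two fields apart is to key its speaker list by speaker_id and show speaker_label only as display text. This is a sketch under assumptions: the function name collectSpeakers is hypothetical, and skipping sentences whose speaker_id is null is a design choice, not documented behavior.

```javascript
// Hypothetical helper: builds a Map of speaker_id -> speaker_label from
// received init_sentence payloads, e.g. to populate a speaker picker.
function collectSpeakers(sentences) {
  const speakers = new Map();
  for (const s of sentences) {
    if (s.speaker_id == null) continue; // no speaker attached to this sentence
    // Fall back to the stable ID if the label is missing.
    speakers.set(s.speaker_id, s.speaker_label ?? s.speaker_id);
  }
  return speakers;
}
```

The Map keys are the stable IDs to use as the locating key in speaker edits; the values are only for rendering.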


init_summary

In addition to the summary text (the text field), this event includes mode-aware metadata (mode / template / plain_text / prompt_snapshot), which lets the client trace the mode, the effective slug, and, in custom mode, the customer prompt content that correspond to that summary.

v1.5.5 adds fallback_level / dropped_segments: these appear only when the summary actually went through the LLM service content-filter fallback chain (L2 neutral prompt or L3 segment trimming), for auditing and UI hints during history playback.

Example (L1 succeeds directly, no fallback):

{
  "text": "Summary content...",
  "mode": "custom",
  "template": "acme-meeting-v2",
  "plain_text": true,
  "prompt_snapshot": "Please emphasize KPIs"
}

Example (L3 triggered, generated after 2 transcript segments were trimmed):

{
  "text": "Summary content (2 segments omitted)...",
  "mode": "custom",
  "template": "acme-meeting-v2",
  "plain_text": true,
  "prompt_snapshot": "Please emphasize KPIs",
  "fallback_level": 3,
  "dropped_segments": [3, 7]
}

Field               Type             Description
text                string           Summary text
mode                string | null    "builtin" / "custom" / null (null when no summary was generated)
template            string | null    Effective slug: builtin → built-in template slug; custom → customer slug
plain_text          boolean          Whether the output is plain text
prompt_snapshot     string | null    Has a value only in custom mode; the prompt content passed in verbatim by the customer (the basis for reconstruction)
fallback_level      int (omit)       Present only when a fallback was triggered (2 or 3). 2 = L2 neutral prompt; 3 = L3 segment trimming. Omitted when L1 succeeds directly
dropped_segments    int[] (omit)     Present only when fallback_level=3; the indices of the trimmed transcript segments (an integer array in original order)

fallback_level / dropped_segments and prompt_snapshot are complementary: the former records the actual execution path (whether a fallback was taken), and the latter records the customer intent (the original prompt content). Even if a fallback was triggered and the customer prompt was not actually used, prompt_snapshot still preserves the original text as an audit record. See V1.5.5 changelog – LLM service content-filter automatic fallback.
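
For the UI hints mentioned above, a client could map the fallback fields to a short message. This is a sketch only: the function name describeFallback and the message wording are hypothetical.

```javascript
// Hypothetical helper: turns init_summary fallback metadata into a UI hint.
// Returns null when fallback_level is omitted (L1 succeeded directly).
function describeFallback(summary) {
  if (summary.fallback_level === undefined) return null;
  if (summary.fallback_level === 2) {
    return 'Summary generated with a neutral prompt (L2 fallback).';
  }
  if (summary.fallback_level === 3) {
    const dropped = summary.dropped_segments || [];
    return `Summary generated after trimming ${dropped.length} transcript segment(s) ` +
      `(indices ${dropped.join(', ')}) (L3 fallback).`;
  }
  return `Summary used an unrecognized fallback level: ${summary.fallback_level}.`;
}
```

Because prompt_snapshot is preserved even when a fallback was taken, a history view could show this hint next to the original prompt text.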


init_done

{
  "totalSentences": 10
}

Field             Type      Description
totalSentences    number    Total number of sentences

Edge Case: No Speech Content (V1.3.7)

If the speech recognition engine recognizes no sentences (because the task is silent throughout, the volume is too low, there is too much noise, or the recognition language does not match the actual audio), this endpoint still completes with the normal event sequence rather than returning an sse_transcript_not_found error:

  • init_metadata is sent normally
  • init_sentence is sent 0 times (no sentences)
  • The text of init_summary is an empty string ""
  • The totalSentences of init_done is 0

This behavior applies to both sources, real-time recording (when the WebSocket recording ends) and file import (when offline processing completes), and is consistent with the V1.3.5 import flow, which treats zero recognition results as a valid outcome. The client should use totalSentences === 0 to decide whether to show a "no speech content" empty state, rather than treating it as an error. See File Import Guide – Behavior When Audio Cannot Be Recognized.
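
The empty-state decision can be folded into the same code that accumulates the stream. The sketch below assumes events have already been parsed into { type, data } objects; the names createHistoryState and applyEvent and the status strings are hypothetical.

```javascript
// Hypothetical view-model reducer for the history stream.
function createHistoryState() {
  return { metadata: null, sentences: [], summary: null, status: 'loading' };
}

function applyEvent(state, event) {
  switch (event.type) {
    case 'init_metadata':
      return { ...state, metadata: event.data };
    case 'init_sentence':
      return { ...state, sentences: [...state.sentences, event.data] };
    case 'init_summary':
      return { ...state, summary: event.data };
    case 'init_done':
      // Zero sentences is a normal outcome: show an empty state, not an error.
      return { ...state, status: event.data.totalSentences === 0 ? 'empty' : 'ready' };
    default:
      return state; // e.g. connected
  }
}
```

A silent recording then ends in the 'empty' status with an empty summary text, while any recording with at least one sentence ends in 'ready'.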

Specific Error Codes

Error Code                  HTTP Status    Description                  Recommended Action
recording_not_found         404            Recording not found          Verify that taskId is correct
sse_transcript_not_found    404            Transcript blob not found    The transcript file for the specified taskId does not exist or could not be accessed (does not occur under the normal flow; after V1.3.7, silence during real-time recording does not trigger this error either)

Front-End Example

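// Parse one SSE block ("event: xxx\ndata: {...}") into { type, data }.
// Returns null for blocks without a data line (e.g. comments or keep-alives).
function parseSSEBlock(block) {
  let type = 'message';
  const dataLines = [];
  for (const line of block.split('\n')) {
    if (line.startsWith('event:')) type = line.slice(6).trim();
    else if (line.startsWith('data:')) dataLines.push(line.slice(5).trim());
  }
  return dataLines.length ? { type, data: JSON.parse(dataLines.join('\n')) } : null;
}

async function loadHistory(taskId, apiKey) {
  const response = await fetch(
    `https://vas-poc.vurbo.ai/api/v1/sse/history/transcribe/${taskId}`,
    {
      headers: {
        'X-API-Key': apiKey
      }
    }
  );
  if (!response.ok) {
    throw new Error(`History request failed: HTTP ${response.status}`);
  }

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // stream: true keeps multi-byte characters that span two chunks intact
    buffer = (buffer + decoder.decode(value, { stream: true })).replace(/\r\n/g, '\n');
    // Events are separated by a blank line; a chunk may end mid-event,
    // so keep the trailing partial block in the buffer for the next read
    const blocks = buffer.split('\n\n');
    buffer = blocks.pop();

    for (const block of blocks) {
      const event = parseSSEBlock(block);
      if (!event) continue;

      if (event.type === 'init_metadata') {
        console.log('Task info:', event.data.title);
      } else if (event.type === 'init_sentence') {
        console.log(`[${event.data.start_time}] ${event.data.origin}`);
        if (event.data.translations) {
          console.log('Translations:', event.data.translations);
        }
      } else if (event.type === 'init_summary') {
        console.log('Summary:', event.data.text);
      } else if (event.type === 'init_done') {
        console.log(`Load complete, ${event.data.totalSentences} sentences total`);
      }
    }
  }
}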

Version: V1.5.7
Last Updated: 2026-05-20

Copyright © 2026