API Docs

SSE API

Note: The URL used in this document (vas-poc.vurbo.ai) is the planned deployment URL. The official URL will be announced separately after launch.



Connection Information

| Item | Value |
| --- | --- |
| Base Path | https://vas-poc.vurbo.ai/api/v1/sse |
| Protocol | HTTP + Server-Sent Events (SSE) |
| Data Format | text/event-stream |
| Authentication | Header X-API-Key: {KEY} |

Authentication

SSE APIs that require authentication accept two delivery methods (the backend VerifyApiKeyQuery middleware supports both):

# Method A: HTTP Header (recommended, better security)
X-API-Key: vas_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

# Method B: Query string (native browser EventSource fallback)
?api_key=vas_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Note: The native browser EventSource API does not support custom headers. You can use the ?api_key= query string instead, or use the fetch API with a ReadableStream / an SSE client library that supports headers. In query string mode, the API key appears in the URL, so avoid writing the full URL into server logs or leaking it in screenshots.
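
As a rough illustration of the two delivery methods, the sketch below builds either a header-based request (suitable for fetch) or a query-string URL (suitable for the native EventSource), and masks the key before a URL is written to a log. The helper names buildSseRequest and redactApiKey are hypothetical and not part of the API; only the X-API-Key header and the api_key query parameter come from this document.

```javascript
// Hypothetical helpers; only the X-API-Key header and the api_key query
// parameter are taken from this document.
const BASE = 'https://vas-poc.vurbo.ai/api/v1/sse';

// Returns { url, headers } for either delivery method.
function buildSseRequest(path, apiKey, { useQuery = false } = {}) {
  const url = new URL(BASE + path);
  if (useQuery) {
    // Method B: the key travels in the URL (needed for native EventSource).
    url.searchParams.set('api_key', apiKey);
    return { url: url.toString(), headers: {} };
  }
  // Method A: the key travels in a header (works with fetch).
  return { url: url.toString(), headers: { 'X-API-Key': apiKey } };
}

// Masks the api_key value so that a URL can be logged safely.
function redactApiKey(rawUrl) {
  const url = new URL(rawUrl);
  if (url.searchParams.has('api_key')) {
    url.searchParams.set('api_key', 'REDACTED');
  }
  return url.toString();
}
```

With Method B, passing every URL through a redaction step like this before logging is one way to follow the advice above about not leaking the key.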


Broadcast SSE API

The Broadcast SSE API provides a live subtitle streaming feature, allowing viewers to watch real-time transcription and translation content through a share link.

Note: The base path for Broadcast SSE is https://vas-poc.vurbo.ai/broadcast, which differs from the other SSE APIs.

GET /broadcast/{token}/text (Viewer Live Subtitle Stream)

Description

Viewers connect using a share token to receive an SSE stream of real-time transcription and translation.

Use Cases

  • Viewers watching live subtitles
  • Multilingual translation subtitle display
  • TTS audio playback

Authentication

Token authentication (no API key required): verified through the {token} in the URL path.

Request Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| token | string | Yes | Broadcast share token (4-character short code, a-z0-9, path parameter) |
| lang | string | No | Filter for a specific translation language (e.g., en-US) |
| tts | boolean | No | Whether to enable TTS (true / false, default false) |
| viewer_access_token | string | Conditional | Viewer access token (required for password-protected broadcasts) |

Password Protection Note: When a broadcast is set to password-protected, viewers must first obtain a viewer_access_token through the password verification API, then include this token in the query parameters of the SSE connection.
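
The parameters above can be assembled into a connection URL roughly as follows. This is a minimal sketch: buildBroadcastUrl is a hypothetical helper, and the token check simply mirrors the "4-character short code, a-z0-9" description rather than any documented client-side requirement.

```javascript
// Hypothetical helper that assembles the viewer stream URL from the
// documented parameters (token, lang, tts, viewer_access_token).
function buildBroadcastUrl(token, { lang, tts = false, viewerAccessToken } = {}) {
  // The token is documented as a 4-character short code using a-z and 0-9.
  if (!/^[a-z0-9]{4}$/.test(token)) {
    throw new Error('token must be 4 characters from a-z0-9');
  }
  const url = new URL(`https://vas-poc.vurbo.ai/broadcast/${token}/text`);
  if (lang) url.searchParams.set('lang', lang);
  if (tts) url.searchParams.set('tts', 'true');
  // Only needed for password-protected broadcasts.
  if (viewerAccessToken) {
    url.searchParams.set('viewer_access_token', viewerAccessToken);
  }
  return url.toString();
}
```

The resulting string can be passed directly to new EventSource(...). Using URLSearchParams also takes care of escaping values that are not URL-safe.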

Request Example

// The three connections below are alternatives; pick the one you need.

// Receive all languages
const allLanguages = new EventSource(
  'https://vas-poc.vurbo.ai/broadcast/a3f9/text'
);

// Receive only English translation
const englishOnly = new EventSource(
  'https://vas-poc.vurbo.ai/broadcast/a3f9/text?lang=en-US'
);

// Receive English translation and enable TTS
const englishWithTts = new EventSource(
  'https://vas-poc.vurbo.ai/broadcast/a3f9/text?lang=en-US&tts=true'
);

Event Types

| Event | Description | Notes |
| --- | --- | --- |
| connected | Connection confirmation | - |
| queued | Added to the waiting queue | Queueing mechanism |
| admitted | Entered live from the queue | Queueing mechanism |
| origin | Original text (STT) | - |
| translation | Translation result | - |
| tts_ready | TTS audio ready | - |
| paused | Broadcast paused | Host paused or disconnected |
| resumed | Broadcast resumed | Host resumed |
| ended | Broadcast ended | - |
| kicked | Removed | Viewer management |
| error | Error | - |
| speaker_renamed | Speaker renamed | - |
| speaker_reassigned | Single-sentence speaker change | - |
| speakers_merged | Speakers merged | - |
| standby | Standby phase notification | - |
| phase_changed | Phase change notification | - |
| announcement | Host announcement | - |

Event Format

connected:

{
  "session_id": "abc123",
  "source_lang": "zh-TW",
  "subscribed_lang": "en-US",
  "available_langs": ["en-US", "ja-JP"],
  "tts_languages": ["en-US"],
  "phase": "standby",
  "recognition_mode": "single",
  "client_id": "client_xyz"
}

| Field | Type | Description |
| --- | --- | --- |
| session_id | string | Broadcast session ID |
| source_lang | string | Original language (set by the host) |
| subscribed_lang | string | The filter language the viewer subscribed to (null if not specified) |
| available_langs | array | List of available translation languages |
| tts_languages | array | List of languages with TTS enabled (an empty array means no TTS) |
| phase | string | Broadcast phase: standby (preparing) or live (active) |
| recognition_mode | string | Recognition mode: single (single speaker) or multi_speaker (multi-speaker diarization) |
| client_id | string | Client ID |

queued:

{
  "position": 3,
  "estimated_wait": "About 2 minutes"
}

| Field | Type | Description |
| --- | --- | --- |
| position | number | Position in the queue (1 = next up) |
| estimated_wait | string | Estimated wait time |

admitted:

{
  "message": "Entered live"
}

origin:

{
  "sid": 1,
  "text": "Hello everyone",
  "speaker_id": "Guest-1",
  "speaker_label": "Guest-1",
  "start_time": "00:05",
  "is_final": true
}

| Field | Type | Description |
| --- | --- | --- |
| sid | number | Sentence ID |
| text | string | Original text content |
| speaker_id | string | Original speaker ID (immutable; "0" in single-speaker mode, or "1"/"2" in conversation mode) |
| speaker_label | string | Display label (after applying speaker_aliases; equals speaker_id when no alias exists) |
| start_time | string | Start time (mm:ss); this field is not sent during the standby phase, and counts from 00:00 once live |
| is_final | boolean | Whether this is the final result |

translation:

{
  "sid": 1,
  "language": "en-US",
  "text": "Hello everyone",
  "speaker_id": "Guest-1",
  "speaker_label": "Royx",
  "is_final": true
}

| Field | Type | Description |
| --- | --- | --- |
| sid | number | The corresponding sentence ID |
| language | string | Translation language |
| text | string | Translated content |
| speaker_id | string | Original speaker ID (multi-speaker conversation mode; immutable) |
| speaker_label | string | Display label (after applying speaker_aliases) |
| is_final | boolean | Whether this is the final result |
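
Because origin and translation events share the same sid, a viewer typically keeps one record per sentence and updates it as events arrive. The sketch below shows one way to do that. It assumes that a later event for the same sid supersedes an earlier interim one (is_final: false), which this document does not state explicitly; createSubtitleStore and its method names are hypothetical.

```javascript
// Hypothetical client-side store: one entry per sentence ID (sid).
// Assumption: a later event for the same sid replaces an earlier interim one.
function createSubtitleStore() {
  const lines = new Map();

  function entry(sid) {
    if (!lines.has(sid)) {
      lines.set(sid, { sid, text: '', isFinal: false, translations: {} });
    }
    return lines.get(sid);
  }

  return {
    // Accepts the payload of an `origin` event.
    applyOrigin(data) {
      const line = entry(data.sid);
      line.text = data.text;
      line.isFinal = data.is_final;
      line.speakerId = data.speaker_id;
      line.speakerLabel = data.speaker_label;
      // start_time is omitted during the standby phase.
      if (data.start_time !== undefined) line.startTime = data.start_time;
    },
    // Accepts the payload of a `translation` event.
    applyTranslation(data) {
      entry(data.sid).translations[data.language] = {
        text: data.text,
        isFinal: data.is_final,
      };
    },
    get(sid) {
      return lines.get(sid);
    },
  };
}
```

Keying by sid also means a translation that arrives before its origin event still lands on the right sentence.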

tts_ready:

{
  "sid": 1,
  "language": "en-US",
  "transcript": "Hello, hi everyone",
  "text": "Hello everyone",
  "audio": "//uQxAAAAAANIAAAAAExBTUUzLjEwMFVVVV...",
  "format": "mp3",
  "duration_ms": 2340,
  "boundaries": [
    {"offset_ms": 0, "duration_ms": 320, "text": "Hello", "text_offset": 0, "word_length": 5},
    {"offset_ms": 320, "duration_ms": 280, "text": "everyone", "text_offset": 6, "word_length": 8}
  ]
}

| Field | Type | Description |
| --- | --- | --- |
| sid | number | The corresponding sentence ID |
| language | string | TTS language |
| transcript | string | Original transcript (source text) |
| text | string | Translated text |
| audio | string | Base64-encoded MP3 audio |
| format | string | Audio format, fixed as "mp3" |
| duration_ms | number | Audio duration (milliseconds) |
| boundaries | array | Word boundaries (optional, see table below) |

Word boundary fields (each object in the boundaries array):

| Field | Type | Description |
| --- | --- | --- |
| offset_ms | number | The word's start time in the audio (ms) |
| duration_ms | number | The word's pronunciation duration (ms) |
| text | string | The word text |
| text_offset | number | The word's starting position in the text |
| word_length | number | The word's character length |

Note:

  • The host must specify which languages enable TTS via the tts_config parameter in the start command
  • Only viewers who subscribed to that language and enabled TTS will receive this event
  • It is sent only during the live phase; no TTS is sent during the standby phase
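
The tts_ready payload can drive both playback and word highlighting. The following sketch decodes the audio field and looks up the word being spoken at a given playback position. It assumes that text_offset and word_length index into the text field in JavaScript string units, which matches the example above but is not spelled out in this document; all function names are hypothetical.

```javascript
// Decodes the Base64 `audio` field into raw bytes (atob is available in
// browsers and in recent Node.js versions).
function decodeAudio(base64) {
  return Uint8Array.from(atob(base64), (ch) => ch.charCodeAt(0));
}

// Returns the boundary whose time window contains positionMs, or null.
function activeBoundary(boundaries, positionMs) {
  if (!Array.isArray(boundaries)) return null; // boundaries is optional
  for (const b of boundaries) {
    if (positionMs >= b.offset_ms && positionMs < b.offset_ms + b.duration_ms) {
      return b;
    }
  }
  return null;
}

// Slices the spoken word out of `text` using text_offset and word_length.
// Assumption: both values are measured in JavaScript string units.
function highlightedWord(text, boundary) {
  return text.slice(boundary.text_offset, boundary.text_offset + boundary.word_length);
}
```

In a browser, the decoded bytes can be wrapped in a Blob of type audio/mpeg, and activeBoundary can be called from a timeupdate handler with audio.currentTime * 1000.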

paused:

{
  "reason": "host_paused",
  "message": "Live broadcast is paused",
  "paused_at": "2025-12-23T10:30:45.123Z"
}

| Field | Type | Description |
| --- | --- | --- |
| reason | string | Pause reason: host_paused / host_disconnected |
| message | string | Notification message |
| paused_at | string | Pause time (ISO 8601) |

resumed:

{
  "message": "Live broadcast resumed",
  "resumed_at": "2025-12-23T10:32:15.456Z"
}

ended:

{
  "reason": "session_stopped",
  "duration_ms": 3600000
}

| Field | Type | Description |
| --- | --- | --- |
| reason | string | End reason |
| duration_ms | number | Broadcast duration (milliseconds) |

End reasons:

| reason | Description |
| --- | --- |
| session_stopped | Host ended normally |
| token_revoked | Token was revoked |
| host_timeout | Host disconnection timeout |
| capacity_exceeded | Queue timeout |

kicked:

{
  "message": "Removed by host"
}

error:

{
  "error_code": "broadcast_session_ended",
  "severity": "error",
  "message": "Broadcast session ended",
  "context": "broadcast",
  "request_id": "req_abc123xyz789",
  "timestamp": "2025-12-05T10:30:45.123Z"
}

Sentence-level errors (such as a translation failure for a specific language) additionally carry sid and translation_language, making it easy for the frontend to flag which language failed for a given sentence:

{
  "error_code": "llm_content_filtered",
  "severity": "warning",
  "message": "Content filtered",
  "context": "translation",
  "sid": 5,
  "translation_language": "ja",
  "request_id": "req_abc123xyz789",
  "timestamp": "2026-04-26T10:30:45.123Z"
}

| Field | Type | Description |
| --- | --- | --- |
| error_code | string | Error code |
| severity | string | Severity: warning / error / fatal |
| message | string | Error message |
| context | string | The context in which the error occurred (e.g., broadcast, translation) |
| sid | int | Optional. The sentence number for a sentence-level error (e.g., when that sentence's translation fails) |
| translation_language | string | Optional. The target language that failed to translate (viewers can use this to determine whether a specific language failed for that sentence) |
| request_id | string | Request tracking ID |
| timestamp | string | Time the error occurred (ISO 8601) |
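
One way a viewer might branch on these fields is sketched below. The mapping from severity to a client action is an assumption made for illustration and is not documented server behavior; classifyError and the action names are hypothetical.

```javascript
// Hypothetical client policy for an `error` event payload.
// Assumption: sentence-level errors only affect one sentence, warnings are
// informational, and everything else ends the viewing session.
function classifyError(err) {
  if (err.sid !== undefined) {
    // Sentence-level error: flag one sentence (and language, if given).
    return {
      action: 'mark_sentence',
      sid: err.sid,
      language: err.translation_language ?? null,
    };
  }
  if (err.severity === 'warning') return { action: 'notify' };
  return { action: 'close' };
}
```

Applied to the two payloads above, the first yields a close action and the second marks sentence 5 as failed for ja.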

speaker_renamed:

Multi-speaker conversation mode only. Sent when the host performs a global speaker rename.

{
  "speaker_id": "Guest-1",
  "new_label": "Royx",
  "affected_sids": [1, 3, 5, 7]
}

| Field | Type | Description |
| --- | --- | --- |
| speaker_id | string | The resolved original speaker ID (even if the input is a display label, the event returns the original ID) |
| new_label | string | New display label (e.g., Royx) |
| affected_sids | array | List of affected sentence IDs |

speaker_reassigned:

Multi-speaker conversation mode only. Sent when the host changes the speaker of a single sentence.

{
  "sid": 3,
  "old_speaker_id": "Guest-1",
  "new_speaker_id": "Guest-2",
  "new_speaker_label": "Amy"
}

| Field | Type | Description |
| --- | --- | --- |
| sid | number | The sentence ID that was modified |
| old_speaker_id | string | Original speaker ID (e.g., Guest-1) |
| new_speaker_id | string | The new original speaker ID (e.g., Guest-2) |
| new_speaker_label | string | New speaker display label (after applying speaker_aliases; equals the original ID when no alias exists) |

speakers_merged:

Multi-speaker conversation mode only. Sent when the host merges speakers. After merging, all sentences belonging to the source speaker are reassigned to the target speaker.

{
  "source_speaker_id": "Guest-2",
  "target_speaker_id": "Guest-1",
  "target_speaker_label": "Manager Wang",
  "affected_sids": [3, 5, 7]
}

| Field | Type | Description |
| --- | --- | --- |
| source_speaker_id | string | The original speaker ID being merged (e.g., Guest-2) |
| target_speaker_id | string | The original speaker ID of the merge target (e.g., Guest-1) |
| target_speaker_label | string | Target speaker display label (after applying speaker_aliases; equals the original ID when no alias exists) |
| affected_sids | array | List of affected sentence IDs |
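
The three speaker events (speaker_renamed, speaker_reassigned, speakers_merged) all amount to patching the speaker fields of sentences already on screen. A minimal sketch follows, assuming the client keeps a Map from sid to an object with speakerId and speakerLabel properties; that local model and the applySpeakerEvent name are hypothetical.

```javascript
// Hypothetical local model: `sentences` is a Map of
// sid -> { speakerId, speakerLabel }.
function applySpeakerEvent(sentences, type, data) {
  const update = (sid, patch) => {
    const sentence = sentences.get(sid);
    if (sentence) Object.assign(sentence, patch); // ignore unknown sids
  };
  switch (type) {
    case 'speaker_renamed':
      // The original ID stays the same; only the display label changes.
      for (const sid of data.affected_sids) {
        update(sid, { speakerLabel: data.new_label });
      }
      break;
    case 'speaker_reassigned':
      update(data.sid, {
        speakerId: data.new_speaker_id,
        speakerLabel: data.new_speaker_label,
      });
      break;
    case 'speakers_merged':
      for (const sid of data.affected_sids) {
        update(sid, {
          speakerId: data.target_speaker_id,
          speakerLabel: data.target_speaker_label,
        });
      }
      break;
    default:
      break; // not a speaker event
  }
  return sentences;
}
```

Relying on affected_sids rather than scanning for a matching speaker ID keeps the client in step with whatever the server decided to change.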

standby:

When a viewer connects during the standby phase, this event is received immediately after the connected event, indicating that the broadcast has not yet officially started. The host can dynamically update the standby message via the WebSocket set_standby_message action; after the update, all viewers receive a new standby event.

{
  "message": "The presentation is about to begin, please wait...",
  "translations": {
    "en-US": "The presentation is about to begin, please wait...",
    "ja-JP": "プレゼンテーションがまもなく始まります。お待ちください..."
  }
}

| Field | Type | Description |
| --- | --- | --- |
| message | string | The message displayed during the standby phase (original text) |
| translations | object | Translation results (optional); the key is the language code and the value is the translated text |

phase_changed:

Sent when the broadcast switches from the standby phase to the active phase.

{
  "phase": "live",
  "message": "Broadcast has started"
}

| Field | Type | Description |
| --- | --- | --- |
| phase | string | The new phase: live (active phase) |
| message | string | Phase change message |

announcement:

An announcement message sent by the host; all viewers receive it.

{
  "message": "The meeting will end in 5 minutes",
  "translations": {
    "en-US": "The meeting will end in 5 minutes",
    "ja-JP": "会議は5分後に終了します"
  }
}

| Field | Type | Description |
| --- | --- | --- |
| message | string | The announcement content (original text) |
| translations | object | Translation results (optional); the key is the language code and the value is the translated text |

Heartbeat Mechanism

The SSE connection uses a heartbeat to keep the connection alive:

  • Interval: 15 seconds
  • Format: SSE comment (starting with :)
  • The frontend does not need to handle it; the browser's EventSource ignores comment lines automatically

: heartbeat

Error Responses

| Error Code | HTTP Status | Description | Recommended Handling |
| --- | --- | --- | --- |
| broadcast_session_not_found | 404 | Broadcast not found | Confirm the token is correct |
| broadcast_session_ended | 410 | Broadcast ended | Notify the user that the broadcast has ended |
| broadcast_capacity_exceeded | 503 | Viewer capacity reached | Join the waiting queue |

Note: If an SSE endpoint encounters an unexpected internal exception, it may return internal_error (the same per-message panic recovery mechanism as WebSocket); expected domain errors return the corresponding error code (e.g., sse_translation_failed).

Frontend Example

function connectBroadcast(token, lang = null) {
  let url = `https://vas-poc.vurbo.ai/broadcast/${token}/text`;
  if (lang) {
    url += `?lang=${lang}`;
  }

  const eventSource = new EventSource(url);

  eventSource.addEventListener('connected', (e) => {
    const data = JSON.parse(e.data);
    console.log(`Connected, original language: ${data.source_lang}`);
    console.log(`Available translations: ${data.available_langs.join(', ')}`);
  });

  eventSource.addEventListener('queued', (e) => {
    const data = JSON.parse(e.data);
    console.log(`In queue, position: ${data.position}, estimated wait: ${data.estimated_wait}`);
  });

  eventSource.addEventListener('admitted', (e) => {
    console.log('Entered live');
  });

  eventSource.addEventListener('origin', (e) => {
    const data = JSON.parse(e.data);
    console.log(`[${data.start_time}] ${data.text}`);
  });

  eventSource.addEventListener('translation', (e) => {
    const data = JSON.parse(e.data);
    console.log(`Translation (${data.language}): ${data.text}`);
  });

  eventSource.addEventListener('tts_ready', (e) => {
    const data = JSON.parse(e.data);
    // Decode the Base64 audio and play it
    const byteCharacters = atob(data.audio);
    const byteNumbers = new Array(byteCharacters.length);
    for (let i = 0; i < byteCharacters.length; i++) {
      byteNumbers[i] = byteCharacters.charCodeAt(i);
    }
    const blob = new Blob([new Uint8Array(byteNumbers)], { type: 'audio/mpeg' });
    const audio = new Audio(URL.createObjectURL(blob));
    audio.play();
  });

  eventSource.addEventListener('paused', (e) => {
    const data = JSON.parse(e.data);
    console.log(`Broadcast paused: ${data.message}`);
  });

  eventSource.addEventListener('resumed', (e) => {
    console.log('Broadcast resumed');
  });

  eventSource.addEventListener('ended', (e) => {
    const data = JSON.parse(e.data);
    console.log(`Broadcast ended, reason: ${data.reason}`);
    eventSource.close();
  });

  eventSource.addEventListener('kicked', (e) => {
    console.log('You have been removed');
    eventSource.close();
  });

  eventSource.addEventListener('speaker_renamed', (e) => {
    const data = JSON.parse(e.data);
    console.log(`Speaker renamed: ${data.speaker_id} → ${data.new_label}`);
    console.log(`Affected sentences: ${data.affected_sids.join(', ')}`);
    // Update the speaker display name for all affected sentences
  });

  eventSource.addEventListener('speaker_reassigned', (e) => {
    const data = JSON.parse(e.data);
    console.log(`Speaker of sentence ${data.sid} changed from ${data.old_speaker_id} to: ${data.new_speaker_label}`);
    // Update the speaker display name for that sentence
  });

  eventSource.addEventListener('standby', (e) => {
    const data = JSON.parse(e.data);
    // Display the translation matching the viewer's selected language
    const displayLang = 'en-US'; // The language the viewer selected
    const displayMessage = data.translations?.[displayLang] || data.message;
    console.log(`Standby phase: ${displayMessage}`);
    // Show the waiting screen
  });

  eventSource.addEventListener('phase_changed', (e) => {
    const data = JSON.parse(e.data);
    console.log(`Phase changed: ${data.phase} - ${data.message}`);
    // Remove the waiting screen and start displaying subtitles
  });

  eventSource.addEventListener('announcement', (e) => {
    const data = JSON.parse(e.data);
    // Display the translation matching the viewer's selected language
    const displayLang = 'en-US'; // The language the viewer selected
    const displayMessage = data.translations?.[displayLang] || data.message;
    console.log(`Announcement: ${displayMessage}`);
    // Show the announcement message
  });

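  eventSource.addEventListener('error', (e) => {
    if (e.data) {
      const error = JSON.parse(e.data);
      console.error(`Error [${error.error_code}]: ${error.message}`);
      // Keep the stream open for warnings such as a single failed translation
      if (error.severity === 'warning') return;
    }
    eventSource.close();
  });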

  return eventSource;
}

REST API also available: For the endpoint to query broadcast information, GET /broadcast/{token}/info, see REST API - Broadcasts API.


GET /api/v1/sse/history/transcribe/{taskId} (Retrieve Conversation History)

Description

Loads the complete conversation history for the specified task, including all sentences and the summary. The data is delivered one item at a time via an SSE stream.

Use Cases

  • Viewing the recording details page
  • Loading the historical transcript

Authentication

Header: X-API-Key: YOUR_API_KEY

Request Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| taskId | string | Yes | Recording ID (path parameter) |

Request Example

// Use the fetch API (because EventSource does not support headers)
async function connectSSE(taskId, apiKey) {
  const response = await fetch(
    `https://vas-poc.vurbo.ai/api/v1/sse/history/transcribe/${taskId}`,
    {
      headers: {
        'X-API-Key': apiKey
      }
    }
  );
  const reader = response.body.getReader();
  // ... handle SSE events
}

Event Sequence

1. connected        → Connection confirmation
2. init_metadata    → Send task metadata
3. init_sentence    → Send sentences one at a time (repeated N times)
4. init_summary     → Send the summary
5. init_done        → Initialization complete
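
The sequence above can be folded into a single history object on the client. The sketch below assumes events have already been parsed into { type, data } objects and that totalSentences in init_done equals the number of init_sentence events received; both the createHistoryLoader name and that completeness check are illustrative assumptions.

```javascript
// Hypothetical reducer: feed it { type, data } events in arrival order.
function createHistoryLoader() {
  const state = {
    metadata: null,
    sentences: [],
    summary: null,
    done: false,
    complete: false,
  };
  return {
    state,
    handle(event) {
      switch (event.type) {
        case 'init_metadata':
          state.metadata = event.data;
          break;
        case 'init_sentence':
          state.sentences.push(event.data);
          break;
        case 'init_summary':
          state.summary = event.data.text;
          break;
        case 'init_done':
          state.done = true;
          // Assumption: totalSentences counts the init_sentence events sent.
          state.complete = state.sentences.length === event.data.totalSentences;
          break;
        default:
          break; // `connected` and unknown events need no state change
      }
      return state;
    },
  };
}
```

A mismatch between the received count and totalSentences is a reasonable signal to reload the history.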

Event Format

connected:

{"message": "History service connected (recordingId: xxx)"}

init_metadata:

{
  "task_id": "550e8400-e29b-41d4-a716-446655440000",
  "title": "Meeting Notes",
  "created_at": "2025-12-17T10:00:00Z",
  "type": "transcribe",
  "has_speaker_diarization": true,
  "transcription_languages": ["zh-TW"],
  "translation_languages": ["en-US"],
  "summary_template": "general",
  "summary_language": "zh-TW",
  "speaker_aliases": {"speaker_1": "Manager Wang"}
}

speaker_aliases is a mapping of "original speaker ID → display name"; it is {} (an empty object, not an array) when there are no aliases. The frontend can use this mapping to run a duplicate-name pre-check before renaming a speaker (added in v1.3.12).

init_sentence:

{
  "sid": 1,
  "origin": "Hello, nice to meet you",
  "translations": {
    "en-US": "Hello, nice to meet you"
  },
  "start_time": "00:05",
  "speaker_id": "speaker_1",
  "speaker_label": "Manager Wang"
}

If a sentence has a translation failure, it additionally carries a translation_errors field (only present when there is a failure), so the frontend can distinguish between "that language was not scheduled for translation" (the key is missing from translations) and "translated but failed" (the key is present in translation_errors):

{
  "sid": 5,
  "origin": "Sentence with sensitive words",
  "translations": {
    "en-US": "Sensitive sentence"
  },
  "translation_errors": {
    "ja": "llm_content_filtered"
  },
  "start_time": "00:25",
  "speaker_id": "speaker_1",
  "speaker_label": "Manager Wang"
}

| Field | Type | Description |
| --- | --- | --- |
| sid | int | Sentence number |
| origin | string | Original text |
| translations | object | Translation results (optional); the key is the language code and the value is the translated text |
| translation_errors | object | Optional. Translation failure error codes; the key is the language code and the value is the error_code (e.g., llm_content_filtered) |
| start_time | string | Start time (mm:ss format) |
| speaker_id | string or null | Original speaker ID (immutable, e.g., speaker_1); the source for target_speaker_id in PATCH /speakers/reassign (flipped in v1.5.3: previously the display name) |
| speaker_label | string or null | Display label (the human-readable name after applying speaker_aliases, e.g., Manager Wang); equals speaker_id when no alias exists (added in v1.5.3 to replace the original speaker_id display semantics) |
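
The distinction between "not scheduled" and "translated but failed" can be expressed as a small lookup over an init_sentence payload. This is a sketch only; translationStatus and the status strings are hypothetical names.

```javascript
// Hypothetical helper: classifies one language for one init_sentence payload.
function translationStatus(sentence, lang) {
  // A key in translation_errors means translation was attempted and failed.
  if (sentence.translation_errors && lang in sentence.translation_errors) {
    return { status: 'failed', errorCode: sentence.translation_errors[lang] };
  }
  // A key in translations means a translated text is available.
  if (sentence.translations && lang in sentence.translations) {
    return { status: 'ok', text: sentence.translations[lang] };
  }
  // Missing from both: that language was not scheduled for translation.
  return { status: 'not_scheduled' };
}
```

For the failure example above, en-US resolves to ok, ja to failed with llm_content_filtered, and any other language to not_scheduled.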

init_summary:

{"text": "This is a summary of the meeting notes..."}

init_done:

{"totalSentences": 10}

Error Responses

| Error Code | HTTP Status | Description | Recommended Handling |
| --- | --- | --- | --- |
| recording_not_found | 404 | Recording not found | Confirm the taskId is correct |
| sse_transcript_not_found | 404 | Transcript not found | The recording may not have finished processing yet |

Frontend Example

// Use the fetch API to handle SSE (you must parse the event-stream yourself)
async function loadHistory(taskId, apiKey) {
  const response = await fetch(
    `https://vas-poc.vurbo.ai/api/v1/sse/history/transcribe/${taskId}`,
    {
      headers: {
        'X-API-Key': apiKey
      }
    }
  );

  const reader = response.body.getReader();
  const decoder = new TextDecoder();

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

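    // stream: true keeps multi-byte characters intact across chunk boundaries
    const text = decoder.decode(value, { stream: true });
    // Parse the SSE format: event: xxx\ndata: {...}\n\n
    // parseSSE is a helper you must supply; it also has to buffer events that are split across chunks
    const events = parseSSE(text);

    for (const event of events) {
      if (event.type === 'init_metadata') {
        console.log('Task info:', event.data.title);
      } else if (event.type === 'init_sentence') {
        console.log(`[${event.data.start_time}] ${event.data.origin}`);
        // translations is an object keyed by language code
        for (const [lang, translated] of Object.entries(event.data.translations ?? {})) {
          console.log(`Translation (${lang}): ${translated}`);
        }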
      } else if (event.type === 'init_done') {
        console.log('Loading complete');
      }
    }
  }
}
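
The fetch-based examples in this document rely on a parser for the event-stream format that they do not define. One possible shape for such a parser is sketched below. It is a simplified illustration rather than a complete implementation of the SSE specification: it handles event: and data: fields, comment lines such as the heartbeat, and events split across chunks, but ignores id: and retry: and does not handle bare \r line endings. The createSSEParser name is hypothetical.

```javascript
// Minimal incremental SSE parser (a sketch, not a full implementation of
// the SSE specification: `id:` and `retry:` fields are ignored).
function createSSEParser() {
  let buffer = '';
  return {
    // Accepts a decoded text chunk and returns the events completed so far.
    feed(chunk) {
      // Normalizing the whole buffer also covers a \r\n split across chunks.
      buffer = (buffer + chunk).replace(/\r\n/g, '\n');
      const events = [];
      let end;
      while ((end = buffer.indexOf('\n\n')) !== -1) {
        const block = buffer.slice(0, end);
        buffer = buffer.slice(end + 2);
        let type = 'message';
        const dataLines = [];
        for (const line of block.split('\n')) {
          if (line.startsWith(':')) continue; // comment, e.g. ": heartbeat"
          if (line.startsWith('event:')) {
            type = line.slice(6).trim();
          } else if (line.startsWith('data:')) {
            dataLines.push(line.slice(5).replace(/^ /, ''));
          }
        }
        if (dataLines.length === 0) continue; // comment-only block
        const raw = dataLines.join('\n');
        let data = raw;
        try {
          data = JSON.parse(raw);
        } catch {
          // Not JSON: keep the raw text.
        }
        events.push({ type, data });
      }
      return events;
    },
  };
}
```

Inside a read loop, each chunk would be passed through decoder.decode(value, { stream: true }) and then parser.feed(...); an incomplete trailing event simply stays in the buffer until the next chunk arrives.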

GET /api/v1/sse/retranslate/{taskId} (Retranslate Full Transcript)

Description

Retranslates all sentences of the specified task into the target language. Translation results are delivered one at a time via an SSE stream.

Use Cases

  • Switching the display language
  • Updating the translation content

Authentication

Header: X-API-Key: YOUR_API_KEY

Request Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| taskId | string | Yes | Recording ID (path parameter) |
| targetLang | string | Yes | Target language code |

Request Example

// Use the fetch API (because EventSource does not support headers)
async function retranslateSSE(taskId, targetLang, apiKey) {
  const response = await fetch(
    `https://vas-poc.vurbo.ai/api/v1/sse/retranslate/${taskId}?targetLang=${targetLang}`,
    {
      headers: {
        'X-API-Key': apiKey
      }
    }
  );
  const reader = response.body.getReader();
  // ... handle SSE events
}

Event Format

translation:

{"sid": 1, "text": "Hello, nice to meet you", "is_final": true}

done:

{"totalUpdated": 10}

error (per-sid sentence translation failure):

When a sentence fails to translate (e.g., LLM provider error, content filtering), instead of a translation event, an event: error is sent carrying sid + error_code, interleaved with the translation events. The frontend can handle this with the same translationError.ts interceptor (aligned with the WebSocket spec):

event: error
data: {"error_code": "sse_translation_failed", "severity": "error", "message": "SSE translation failed", "context": "sse", "sid": 5, "request_id": "req_abc123xyz789", "timestamp": "2026-04-27T10:30:45.123Z", "details": {"translation_language": "ja", "original_error": "..."}}

| Field | Type | Description |
| --- | --- | --- |
| error_code | string | Error code, currently fixed as sse_translation_failed |
| severity | string | error |
| message | string | Human-readable message |
| context | string | sse (automatically matched by the ErrorContextEnum prefix rule) |
| sid | int | The sentence number that failed |
| request_id | string | Request tracking ID |
| timestamp | string | Time the error occurred (ISO 8601) |
| details | object | Includes debug info such as translation_language and original_error |

Failed sentences are saved as translation error records (see the history-playback guide), and the failure markers are visible the next time the history is loaded. For the full specification, see reference/sse/retranslate.md.

Error Responses

| Error Code | HTTP Status | Description | Recommended Handling |
| --- | --- | --- | --- |
| sse_missing_target_lang | 422 | Missing target language parameter | Provide targetLang |
| sse_unsupported_language | 422 | Unsupported target language | Use a valid language code |
| sse_translation_failed | 500 | Translation failed (per-sid) | The failed sentence is still reported via event: error; the overall flow is not interrupted |

Frontend Example

// Use the fetch API to handle SSE
async function retranslate(taskId, targetLang, apiKey) {
  const response = await fetch(
    `https://vas-poc.vurbo.ai/api/v1/sse/retranslate/${taskId}?targetLang=${targetLang}`,
    {
      headers: {
        'X-API-Key': apiKey
      }
    }
  );

  const reader = response.body.getReader();
  const decoder = new TextDecoder();

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

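    // parseSSE is a helper you must supply; stream: true keeps multi-byte characters intact across chunks
    const events = parseSSE(decoder.decode(value, { stream: true }));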
    for (const event of events) {
      if (event.type === 'translation') {
        console.log(`Sentence ${event.data.sid}: ${event.data.text}`);
      } else if (event.type === 'error') {
        console.warn(`Sentence ${event.data.sid} failed to translate: ${event.data.error_code}`);
      } else if (event.type === 'done') {
        console.log(`Complete, ${event.data.totalUpdated} sentences updated`);
      }
    }
  }
}

GET /api/v1/sse/recordings/{taskId}/entries/{sid}/retranslate (Single-Sentence Retranslation, added in v1.4.0)

Description

Retranslates a single sentence. The most common scenario: after a user edits the original text via PATCH /api/v1/recordings/{id}/entries/{sid}, you call this endpoint to redo all translations for that sentence.

Differences from full-transcript retranslation (/retranslate/{taskId}):

  • Full-transcript retranslation: all sentences are translated into a single target language
  • Single-sentence retranslation: only one sentence is translated, and all existing target languages can be translated at once; supports optimistic locking

Authentication

Query: api_key (the browser EventSource does not support headers)

Request Parameters

| Parameter | Location | Type | Required | Description |
| --- | --- | --- | --- | --- |
| taskId | path | string | Yes | Recording ID (UUID) |
| sid | path | number | Yes | Sentence ID (1-based) |
| targetLang | query | string | No | Target language code. When omitted, all languages already present in translated_texts for that sentence are retranslated |
| expectedRevision | query | number | No | Optimistic lock: the current transcript revision; a mismatch returns transcript_revision_conflict |
| api_key | query | string | Yes | API key |

Event Format

Event sequence: connected → progress / translated / error ×N → done

// progress (when translation begins for each language)
{ "sid": 5, "lang": "en-US", "status": "translating" }

// translated (when each language completes successfully)
{ "sid": 5, "lang": "en-US", "text": "Hello world", "tokens_used": 25 }

// done (all complete; successfully translated languages are listed in languages_translated)
{
  "sid": 5,
  "revision": 6,
  "original_text_edited_at": "2026-05-06T10:30:00.000000Z",
  "languages_translated": ["en-US"],
  "languages_failed": ["ja-JP"]
}
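
As an illustration of how a client might call this endpoint and track the outcome per language, the sketch below builds the request URL from the documented parameters and folds progress, translated, and done events into a status map. The function names are hypothetical, the events are assumed to be already parsed into { type, data } objects, and the error event is left out because its payload for this endpoint is not shown here.

```javascript
// Hypothetical URL builder for the single-sentence endpoint.
function buildEntryRetranslateUrl(taskId, sid, apiKey, { targetLang, expectedRevision } = {}) {
  const url = new URL(
    `https://vas-poc.vurbo.ai/api/v1/sse/recordings/${encodeURIComponent(taskId)}/entries/${sid}/retranslate`
  );
  if (targetLang) url.searchParams.set('targetLang', targetLang);
  if (expectedRevision !== undefined) {
    url.searchParams.set('expectedRevision', String(expectedRevision));
  }
  url.searchParams.set('api_key', apiKey);
  return url.toString();
}

// Folds the documented events into a per-language status map.
function foldRetranslateEvents(events) {
  const result = { languages: {}, revision: null, failed: [] };
  for (const { type, data } of events) {
    if (type === 'progress') {
      result.languages[data.lang] = { status: data.status };
    } else if (type === 'translated') {
      result.languages[data.lang] = { status: 'translated', text: data.text };
    } else if (type === 'done') {
      result.revision = data.revision;
      result.failed = data.languages_failed;
    }
  }
  return result;
}
```

The revision from the done event is the value a client would pass as expectedRevision on its next edit or retranslation of the same transcript.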

Error Responses

| Error Code | HTTP | Description |
| --- | --- | --- |
| recording_not_found | 404 | Recording does not exist or does not belong to the user |
| recording_not_completed | 422 | The recording has not finished processing |
| entry_not_found | 404 | The specified sentence was not found |
| entry_text_empty | 422 | The original text of that sentence is empty |
| transcript_revision_conflict | 409 | Revision mismatch (already modified by another request) |
| storage_upload_failed | 500 | Failed to save the transcript |

For the full event format and a workflow example combining optimistic locking with PATCH, see reference/sse/retranslate.md.


init_sentence Edit Marker Fields (added in v1.4.0)

For sentences edited by a user, historyTranscribe adds two fields to the init_sentence event (only present after editing):

{
  "sid": 7,
  "origin": "Corrected text",
  "original_text_raw": "Original STT output",
  "original_text_edited_at": "2026-05-06T10:30:00.000000Z",
  "translations": { "en-US": "Corrected text" }
}

Frontend detection: determine this by the presence of the field ('original_text_raw' in data); do not compare origin === original_text_raw — a user may edit and then change it back to the same string, in which case the text is equal but the "edited" marker should still be shown. See reference/sse/history.md.
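
The presence check described above can be written as a one-line predicate. The isEditedSentence name is hypothetical.

```javascript
// Returns true when an init_sentence payload carries the edit marker.
function isEditedSentence(data) {
  // Presence check only; comparing origin with original_text_raw would miss
  // a sentence that was edited and then changed back to the same string.
  return 'original_text_raw' in data;
}
```

A sentence whose origin equals original_text_raw is therefore still reported as edited, which is the intended behavior.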


GET /api/v1/sse/retranslate/summary/{taskId} (Retranslate Summary)

Description

Retranslates the summary of the specified task into the target language. Translation results are delivered segment by segment via an SSE stream.

Use Cases

  • Switching the summary display language
  • Obtaining the summary in a different language

Authentication

Header: X-API-Key: YOUR_API_KEY

Request Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| taskId | string | Yes | Recording ID |
| targetLang | string | Yes | Target language code |

Request Example

// Use the fetch API (because EventSource does not support headers)
async function retranslateSummarySSE(taskId, targetLang, apiKey) {
  const response = await fetch(
    `https://vas-poc.vurbo.ai/api/v1/sse/retranslate/summary/${taskId}?targetLang=${targetLang}`,
    {
      headers: {
        'X-API-Key': apiKey
      }
    }
  );
  const reader = response.body.getReader();
  // ... handle SSE events
}

Event Format

summary_translation:

{"text": "Accumulated translation result...", "is_final": false}

done:

{"totalUpdated": 1}

Error Responses

| Error Code | HTTP Status | Description | Recommended Handling |
| --- | --- | --- | --- |
| sse_summary_not_found | 404 | Summary not found | This recording has no summary |
| sse_summary_translation_failed | 500 | Summary translation failed | Retry later |

Regenerate Summary (GET Preview / POST Save)

Summary regeneration is split into two endpoints and is mode-aware. This section is a quick summary; for the full schema, see reference/sse/regenerate-summary.md.

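| Method | Endpoint | Writes DB | Saves Transcript | Billed | Purpose |
| --- | --- | --- | --- | --- | --- |
| GET | /api/v1/sse/regenerate/summary/{taskId} | ❌ | ❌ | ✅ | Preview (dry run) |
| POST | /api/v1/sse/regenerate/summary/{taskId} | ✅ | ✅ + bump revision | ✅ | Save (persist officially) |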

Known limitation: GET is also billed — the LLM actually consumes tokens, so the GET endpoint cannot be used for free.

Shared Parameters (GET via query string, POST via JSON body)

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| taskId (path) | string | Yes | Recording UUID |
| mode | string | Yes | Summary mode enum: builtin / custom |
| template | string | Required for builtin / forbidden for custom | Built-in template slug |
| prompt | string | Required for custom / forbidden for builtin | The customer's full prompt (replaces the built-in layered prompt, ≤2000 characters) |
| promptSlug | string | Required for custom / forbidden for builtin | The customer's own identifier (≤64 Unicode characters, no control characters) |
| language | string | No | Output language (defaults to the first transcription language) |
| plainText | boolean | No | Whether to request plain-text output (default false) |

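Mutual exclusivity rule: template is accepted only with mode=builtin, and prompt / promptSlug only with mode=custom. A violation (a required field missing or a forbidden field provided) returns 422 summary_mode_field_mismatch.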
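
A client can catch most of these violations before sending a request. The sketch below mirrors the documented rules and returns the matching error code, or null when the combination looks valid. It is only a pre-check under stated assumptions: the server remains authoritative, the order in which checks run is a guess, and counting prompt length in Unicode code points is an assumption (the document specifies Unicode characters only for promptSlug). The validateSummaryParams name is hypothetical.

```javascript
// Hypothetical client-side pre-check mirroring the documented rules.
// Returns an error code string, or null when no violation is found.
function validateSummaryParams({ mode, template, prompt, promptSlug }) {
  const has = (value) => value !== undefined && value !== null;
  if (mode !== 'builtin' && mode !== 'custom') return 'summary_invalid_mode';
  if (mode === 'builtin') {
    // template is required; prompt and promptSlug are forbidden.
    if (!has(template) || has(prompt) || has(promptSlug)) {
      return 'summary_mode_field_mismatch';
    }
    return null;
  }
  // mode === 'custom': prompt and promptSlug are required; template is forbidden.
  if (has(template) || !has(prompt) || !has(promptSlug)) {
    return 'summary_mode_field_mismatch';
  }
  // Spreading a string iterates by Unicode code point.
  if ([...prompt].length > 2000) return 'summary_prompt_too_long';
  if ([...promptSlug].length > 64) return 'summary_prompt_slug_too_long';
  // \p{Cc} matches control characters such as \n, \r, \t and \0.
  if (/\p{Cc}/u.test(promptSlug)) return 'summary_prompt_slug_invalid';
  return null;
}
```

Running this before a GET preview is worthwhile because, as noted above, previews are billed as well.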

Request Example

# Preview builtin (does not write DB / blob)
curl -N "https://vas-poc.vurbo.ai/api/v1/sse/regenerate/summary/550e8400-...?mode=builtin&template=meeting&language=zh-TW&plainText=true" \
  -H "X-API-Key: YOUR_API_KEY"

# Save custom
curl -N -X POST "https://vas-poc.vurbo.ai/api/v1/sse/regenerate/summary/550e8400-..." \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"mode":"custom","prompt":"Please emphasize KPIs","promptSlug":"acme-v2","plainText":true}'

Event Sequence

1. connected              → Connection confirmation (includes mode=builtin|custom, endpoint=preview|persist)
2. summary_regeneration   → Stream summary segments (accumulating; is_final=true marks the last one)
3. done                   → Complete, includes final_content / mode / template(effective) / prompt_snapshot (only for custom)

done event

{
  "task_id": "550e8400-...",
  "tokens_used": 123,
  "final_content": "This meeting...",
  "mode": "custom",
  "template": "acme-v2",
  "plain_text": true,
  "persisted": true,
  "prompt_snapshot": "Please emphasize KPIs"
}
  • mode: business mode (builtin / custom)
  • template: effective slug — builtin → built-in template slug; custom → customer slug
  • persisted: whether this summary has been officially saved (false for GET, true for POST)
  • prompt_snapshot: only present in custom mode; the prompt content the customer passed in verbatim (a mandatory snapshot, the sole basis for reconstruction)
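Because the done event carries the effective slug and, in custom mode, the verbatim prompt, a stored done event is enough to rebuild most of the original request. The sketch below illustrates that mapping; requestFromDone is a hypothetical helper name, and the language parameter is left out because the done event shown here does not include it.

```javascript
// Rebuild a request body from a stored `done` event so the same summary can be re-run.
// The field mapping follows the bullets above; `language` is not part of the done event
// and would have to be stored separately (assumption).
function requestFromDone(done) {
  const body = { mode: done.mode, plainText: Boolean(done.plain_text) };
  if (done.mode === 'custom') {
    // custom: `template` holds the customer slug, `prompt_snapshot` the prompt as sent
    body.prompt = done.prompt_snapshot;
    body.promptSlug = done.template;
  } else {
    // builtin: `template` holds the built-in template slug
    body.template = done.template;
  }
  return body;
}
```

Applied to the sample done event above, this yields the same JSON body as the "Save custom" curl request.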

Error Codes

| Error Code | HTTP | Description |
| --- | --- | --- |
| recording_not_found | 404 | Recording not found |
| sse_template_not_found | 404 | Summary template not found |
| sse_transcript_not_found | 404 | Transcript not found |
| summary_text_empty | 400 | The transcript has no content |
| summary_text_too_long | 400 | The transcript exceeds the 100,000-character limit |
| sse_summary_regeneration_failed | 500 | Regeneration failed (raw error already sanitized) |
| summary_invalid_mode | 422 | mode is not builtin / custom |
| summary_mode_field_mismatch | 422 | The mode and field combination do not match (required field missing / forbidden field provided) |
| summary_prompt_too_long | 422 | prompt exceeds 2000 characters |
| summary_prompt_slug_too_long | 422 | promptSlug exceeds 64 characters |
| summary_prompt_slug_invalid | 422 | promptSlug contains control characters (\n / \r / \t / \0, etc.) |

Frontend Example

// Preview (GET, persist = false) or save (POST, persist = true) a regenerated summary.
// Resolves with the fetch Response; read response.body as the SSE stream.
async function regenerateSummary(taskId, body, apiKey, { persist = false } = {}) {
  const url = `https://vas-poc.vurbo.ai/api/v1/sse/regenerate/summary/${taskId}`;
  if (persist) {
    // POST: parameters travel in the JSON body
    return fetch(url, {
      method: 'POST',
      headers: { 'X-API-Key': apiKey, 'Content-Type': 'application/json' },
      body: JSON.stringify(body)
    });
  }
  // GET: the same parameters travel in the query string
  const params = new URLSearchParams(body);
  return fetch(`${url}?${params}`, { headers: { 'X-API-Key': apiKey } });
}

GET /api/v1/sse/tts/{taskId} (TTS Audio Stream)

Description

Converts the translated content of a historical recording into TTS audio, delivered sentence by sentence via an SSE stream. The frontend can control how many sentences are returned per request.

Use Cases

  • Audio playback of translations from historical recordings
  • Karaoke effect (combined with word boundaries)
  • Voice readout of translated content

Authentication

Header: X-API-Key: YOUR_API_KEY

Request Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| taskId | string | Yes | Recording ID (path parameter) |
| language | string | Yes | TTS output language (e.g., en-US) |
| voice | string | No | Specify a voice name (e.g., en-US-JennyNeural) |
| sid | int | No | Starting sentence ID (default 1, starting from the first sentence) |
| length | int | No | Number of sentences to return (default 1, maximum 20) |

Note: The maximum value of length is controlled by the backend environment variable TTS_SSE_MAX_LENGTH (default 20). A larger value is automatically truncated to the maximum.
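Since one request returns at most the configured maximum number of sentences, reading a longer range means issuing several requests. The sketch below plans those requests; planTtsRequests is a hypothetical helper, and its default cap of 20 mirrors the documented default, which a given deployment may override.

```javascript
// Split a range of sentence IDs into (sid, length) requests that respect the per-request cap.
// Assumption: maxLength matches the server's TTS_SSE_MAX_LENGTH; the deployed value may differ.
function planTtsRequests(firstSid, lastSid, maxLength = 20) {
  const requests = [];
  for (let sid = firstSid; sid <= lastSid; sid += maxLength) {
    // The last request may be shorter than the cap
    requests.push({ sid, length: Math.min(maxLength, lastSid - sid + 1) });
  }
  return requests;
}
```

For example, sentences 1 through 45 become three requests: sid=1 with length=20, sid=21 with length=20, and sid=41 with length=5.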

Request Example (Single Sentence Playback)

// Use the fetch API (because EventSource does not support headers)
async function playTTSSingle(taskId, language, sid, apiKey) {
  const response = await fetch(
    `https://vas-poc.vurbo.ai/api/v1/sse/tts/${taskId}?language=${language}&sid=${sid}`,
    {
      headers: {
        'X-API-Key': apiKey
      }
    }
  );
  const reader = response.body.getReader();
  // ... handle SSE events
}

Request Example (Multiple Sentence Playback)

// Example: sid = 5 and length = 3 plays sentences 5, 6, and 7 (3 sentences total)
async function playTTSMultiple(taskId, language, sid, length, apiKey) {
  const response = await fetch(
    `https://vas-poc.vurbo.ai/api/v1/sse/tts/${taskId}?language=${language}&sid=${sid}&length=${length}`,
    {
      headers: {
        'X-API-Key': apiKey
      }
    }
  );
  const reader = response.body.getReader();
  // ... handle SSE events
}

Event Sequence

1. connected    → Connection confirmation
2. tts_audio    → Send TTS audio sentence by sentence (repeated up to N times, N = length)
3. tts_done     → Playback complete

Event Format

connected:

{
  "task_id": "550e8400-e29b-41d4-a716-446655440000",
  "language": "en-US",
  "voice": "en-US-JennyNeural",
  "start_sid": 5,
  "length": 3
}

tts_audio:

{
  "sid": 5,
  "transcript": "Hello, nice to meet you",
  "text": "Hello, nice to meet you",
  "audio": "Base64EncodedMP3...",
  "duration_ms": 2500,
  "boundaries": [
    {"offset_ms": 0, "duration_ms": 350, "text_offset": 0, "word_length": 5, "text": "Hello"},
    {"offset_ms": 350, "duration_ms": 100, "text_offset": 5, "word_length": 1, "text": ","},
    {"offset_ms": 500, "duration_ms": 250, "text_offset": 7, "word_length": 4, "text": "nice"},
    {"offset_ms": 750, "duration_ms": 200, "text_offset": 12, "word_length": 2, "text": "to"},
    {"offset_ms": 950, "duration_ms": 350, "text_offset": 15, "word_length": 4, "text": "meet"},
    {"offset_ms": 1300, "duration_ms": 300, "text_offset": 20, "word_length": 3, "text": "you"}
  ]
}

| Field | Type | Description |
| --- | --- | --- |
| sid | int | Sentence ID |
| transcript | string | Original transcript (STT recognition result) |
| text | string | Translated text (the TTS synthesis source) |
| audio | string | Base64-encoded MP3 audio |
| duration_ms | int | Audio duration (milliseconds) |
| boundaries | array | Word boundary array |

Word Boundary Field Descriptions

| Field | Type | Description |
| --- | --- | --- |
| offset_ms | int | The word's start time in the audio (milliseconds) |
| duration_ms | int | The word's duration (milliseconds) |
| text_offset | int | Position in the original text string (character index) |
| word_length | int | Word length (number of characters) |
| text | string | Word content |
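To make the boundary fields concrete, the sketch below looks up the word being spoken at a given playback time and splits the sentence around it. Both helper names are hypothetical, and the code assumes text_offset and word_length count the same units as JavaScript string indices (UTF-16 code units), which this document does not state.

```javascript
// Find the boundary being spoken at a given playback time (milliseconds), or undefined
// when the time falls in a gap between words.
function boundaryAt(boundaries, timeMs) {
  return boundaries.find((b) => timeMs >= b.offset_ms && timeMs < b.offset_ms + b.duration_ms);
}

// Split `text` into the parts before, inside, and after one word boundary.
// Assumption: offsets are UTF-16 code unit indices, like String.prototype.slice.
function splitAtBoundary(text, boundary) {
  const start = boundary.text_offset;
  const end = start + boundary.word_length;
  return {
    before: text.slice(0, start),
    word: text.slice(start, end),
    after: text.slice(end)
  };
}
```

With the sample tts_audio event above, a playback time of 600 ms falls inside the "nice" boundary, so the split gives "Hello, " before the word and " to meet you" after it. A renderer could wrap the word part in a highlighted element.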

tts_done:

{
  "sentences_sent": 3,
  "total_duration_ms": 7500,
  "total_characters_used": 142
}

| Field | Type | Description |
| --- | --- | --- |
| sentences_sent | int | The number of sentences actually sent |
| total_duration_ms | int | The total audio duration of all sentences (milliseconds) |
| total_characters_used | int | The total number of characters synthesized in this TTS request (used for quota calculation) |

Error Responses

| Error Code | HTTP Status | Description | Recommended Handling |
| --- | --- | --- | --- |
| recording_not_found | 404 | Recording not found | Confirm the taskId is correct |
| sse_missing_target_lang | 422 | Missing language parameter | Provide the language parameter |
| sse_unsupported_language | 422 | Unsupported language | Use a valid language code |
| tts_translation_not_found | 400 | Translation for that language not found | Confirm the translation for that language exists |
| tts_synthesis_failed | 500 | TTS synthesis failed | Retry later |
| tts_quota_exceeded | 402 | TTS usage limit reached | Retry later |

Frontend Example

// Use the fetch API to handle TTS SSE
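async function playTTS(taskId, language, apiKey, startSid = 1, length = 1) {
  const url = new URL(`https://vas-poc.vurbo.ai/api/v1/sse/tts/${taskId}`);
  url.searchParams.set('language', language);
  url.searchParams.set('sid', startSid);
  url.searchParams.set('length', length);

  const response = await fetch(url, {
    headers: {
      'X-API-Key': apiKey
    }
  });
  if (!response.ok) {
    throw new Error(`TTS request failed with HTTP ${response.status}`);
  }

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  let playback = Promise.resolve(); // chains sentences so they play one after another

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // A chunk can end mid-event (Base64 audio is large), so parse only complete events
    buffer += decoder.decode(value, { stream: true });
    const cut = buffer.lastIndexOf('\n\n');
    if (cut === -1) continue;
    const events = parseSSE(buffer.slice(0, cut + 2));
    buffer = buffer.slice(cut + 2);

    for (const event of events) {
      if (event.type === 'connected') {
        console.log(`TTS connection successful, voice: ${event.data.voice}`);
      } else if (event.type === 'tts_audio') {
        console.log(`Sentence ${event.data.sid}: ${event.data.text}`);
        playback = playback.then(() => playSentence(event.data));
      } else if (event.type === 'tts_done') {
        console.log(`Stream complete, ${event.data.sentences_sent} sentences total`);
      }
    }
  }
  await playback;
}

// Play one tts_audio payload; resolves when the audio ends or fails
function playSentence(data) {
  return new Promise((resolve) => {
    const audioUrl = URL.createObjectURL(base64ToBlob(data.audio, 'audio/mpeg'));
    const audio = new Audio(audioUrl);
    const finish = () => {
      URL.revokeObjectURL(audioUrl);
      resolve();
    };
    audio.addEventListener('ended', finish);
    audio.addEventListener('error', finish);

    // Set up the karaoke effect
    setupKaraoke(audio, data.boundaries, data.text);

    audio.play().catch(finish);
  });
}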

// Base64 to Blob
function base64ToBlob(base64, mimeType) {
  const byteCharacters = atob(base64);
  const byteNumbers = new Array(byteCharacters.length);
  for (let i = 0; i < byteCharacters.length; i++) {
    byteNumbers[i] = byteCharacters.charCodeAt(i);
  }
  const byteArray = new Uint8Array(byteNumbers);
  return new Blob([byteArray], { type: mimeType });
}

// Karaoke effect
function setupKaraoke(audio, boundaries, text) {
  const updateHighlight = () => {
    const currentTimeMs = audio.currentTime * 1000;
    const currentWord = boundaries.find((b, i) => {
      const nextOffset = boundaries[i + 1]?.offset_ms ?? Infinity;
      return currentTimeMs >= b.offset_ms && currentTimeMs < nextOffset;
    });

    if (currentWord) {
      // Highlight the current word
      highlightWord(text, currentWord.text_offset, currentWord.word_length);
    }
  };

  const interval = setInterval(updateHighlight, 50);
  audio.addEventListener('ended', () => clearInterval(interval));
}
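The frontend example above calls parseSSE without defining it in this section. If no such helper is already available, one possible shape is sketched below. It is an assumption rather than the project's actual helper: it expects a string containing only complete events (callers should buffer partial chunks first), treats a blank line as the event separator, and assumes every data payload is JSON.

```javascript
// Parse a string holding zero or more complete SSE events into { type, data } objects.
// Assumptions: events are separated by a blank line and each data payload is JSON.
function parseSSE(text) {
  const events = [];
  for (const block of text.split(/\r?\n\r?\n/)) {
    let type = 'message'; // SSE default when no "event:" line is present
    const dataLines = [];
    for (const line of block.split(/\r?\n/)) {
      if (line.startsWith('event:')) {
        type = line.slice(6).trim();
      } else if (line.startsWith('data:')) {
        // Strip the single optional space after the colon
        dataLines.push(line.slice(5).replace(/^ /, ''));
      }
      // Comment lines (starting with ':') and id:/retry: fields are ignored here.
    }
    if (dataLines.length === 0) continue; // blank block or comment-only block
    events.push({ type, data: JSON.parse(dataLines.join('\n')) });
  }
  return events;
}
```

Each returned object exposes the event name as type and the decoded payload as data, which is the shape the playback loop reads.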

GET /api/v1/sse/imports/{importId}/progress (Import Progress Stream)

Description

Tracks the processing progress of an audio file import in real time. After connecting, progress updates are continuously pushed via an SSE stream until the import completes, fails, or the connection times out.

Use Cases

  • Showing a real-time processing progress bar after uploading an audio file
  • Tracking the progress of each stage: audio conversion, transcription, translation, summary, etc.

Authentication

Header: X-API-Key: YOUR_API_KEY

Request Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| importId | string | Yes | Import task ID (UUID, path parameter) |

Request Example

curl -N "https://vas-poc.vurbo.ai/api/v1/sse/imports/550e8400-e29b-41d4-a716-446655440000/progress" \
  -H "X-API-Key: vas_aB3dE5fG7hI9jK1lM3nO5pQ7rS9tU1vW"

Event Sequence

Scenario 1: Import still in progress
1. connected       → Connection confirmation
2. progress        → Send the current progress
3. progress ×N     → Continuously pushed when progress changes
   heartbeat ×N    → Sent every 15 seconds when there is no progress change
4. completed       → Import succeeded, connection ends
   or failed       → Import failed, connection ends
   or timeout      → Exceeded 15 minutes, connection ends

Scenario 2: Import already complete (terminal state)
1. connected       → Connection confirmation
2. progress        → Send the final progress
3. completed       → Send the completed event directly and end
   or failed       → Send the failed event directly and end

Event Format

connected:

{"message": "Import progress service connected (importId: xxx)"}

progress:

{
  "import_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "processing",
  "stage": "transcribing",
  "progress": 45,
  "message": "Transcribing..."
}

| Field | Type | Description |
| --- | --- | --- |
| import_id | string | Import task ID (UUID) |
| status | string | Import status: pending / processing / completed / failed |
| stage | string / null | The current processing stage |
| progress | integer | Progress percentage (0-100) |
| message | string | Human-readable progress message |

Stage values and their corresponding progress ranges:

| Value | Description | Progress Range |
| --- | --- | --- |
| converting | Audio format conversion | 0% - 10% |
| transcribing | Speech-to-text | 10% - 60% |
| translating | Text translation | 60% - 85% |
| summarizing | Generating the summary | 85% - 100% |
| null | Not started yet | |
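Because each stage owns a fixed slice of the overall percentage, a UI can show per-stage progress alongside the overall bar. The sketch below converts the overall progress value into a fraction of the current stage; STAGE_RANGES and stageFraction are hypothetical names, and the ranges are copied from the table above.

```javascript
// Documented overall-progress range for each stage, in percent.
const STAGE_RANGES = {
  converting: [0, 10],
  transcribing: [10, 60],
  translating: [60, 85],
  summarizing: [85, 100]
};

// Convert overall progress into a 0..1 fraction of the current stage.
// Returns null when the stage is null or unknown (for example, not started yet).
function stageFraction(stage, progress) {
  const range = STAGE_RANGES[stage];
  if (!range) return null;
  const [lo, hi] = range;
  // Clamp in case a progress value falls slightly outside the documented range
  const clamped = Math.min(hi, Math.max(lo, progress));
  return (clamped - lo) / (hi - lo);
}
```

For the sample progress event above (stage transcribing, progress 45), the stage itself is 70% done, since 45 sits 35 points into a 50-point range.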

completed:

{
  "import_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "task_id": "abc123-e29b-41d4-a716-446655440000",
  "message": "Processing complete"
}

| Field | Type | Description |
| --- | --- | --- |
| import_id | string | Import task ID |
| status | string | Fixed as completed |
| task_id | string | The generated recording ID (recording_id), which can be used for subsequent queries |
| message | string | Fixed as Processing complete |

failed:

{
  "import_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "failed",
  "error_code": "import_invalid_format",
  "error_message": "Unsupported audio format"
}

| Field | Type | Description |
| --- | --- | --- |
| import_id | string | Import task ID |
| status | string | Fixed as failed |
| error_code | string | Error code |
| error_message | string | Human-readable error message |

heartbeat:

Sent every 15 seconds when there is no progress change, used to keep the connection alive.

{"timestamp": 1708761600}

timeout:

Sent when the import has not completed after 15 minutes; the connection ends automatically.

{"message": "Connection timeout"}
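A timeout ends the connection, not necessarily the import itself, and Scenario 2 shows that a new connection to a finished import returns the terminal event right away. One way a client might use this is to reconnect after a timeout, as sketched below. This is an illustration only: followImport and the streamOnce callback are hypothetical, and whether reconnecting after a timeout is the intended usage is an assumption.

```javascript
// Keep following an import across timeout events by reconnecting.
// `streamOnce` is a hypothetical caller-supplied function: it opens one SSE connection
// and resolves with the terminal event, e.g. { type: 'completed', data: {...} }.
async function followImport(streamOnce, importId, maxConnections = 3) {
  for (let attempt = 1; attempt <= maxConnections; attempt++) {
    const terminal = await streamOnce(importId);
    // completed and failed are final; only timeout is worth another connection
    if (terminal.type !== 'timeout') return terminal;
  }
  return { type: 'timeout', data: { message: 'Gave up after repeated timeouts' } };
}
```

The cap on connections keeps a stuck import from being polled forever; the value 3 is arbitrary.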

Error Responses

| Error Code | HTTP Status | Description | Recommended Handling |
| --- | --- | --- | --- |
| import_not_found | 404 | The specified import task was not found | Confirm the importId is correct |

Frontend Example

async function trackImportProgress(importId, apiKey) {
  const response = await fetch(
    `https://vas-poc.vurbo.ai/api/v1/sse/imports/${importId}/progress`,
    {
      headers: {
        'X-API-Key': apiKey
      }
    }
  );

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    buffer += decoder.decode(value, { stream: true });
    const events = buffer.split('\n\n');
    buffer = events.pop();

    for (const eventStr of events) {
      if (!eventStr.trim()) continue;

      const lines = eventStr.split('\n');
      let eventType = '';
      let eventData = '';

      for (const line of lines) {
        if (line.startsWith('event: ')) eventType = line.slice(7);
        else if (line.startsWith('data: ')) eventData = line.slice(6);
      }

      if (!eventType || !eventData) continue;
      const data = JSON.parse(eventData);

      switch (eventType) {
        case 'connected':
          console.log('Connected:', data.message);
          break;
        case 'progress':
          console.log(`[${data.stage}] ${data.progress}% - ${data.message}`);
          updateProgressBar(data.progress, data.stage, data.message);
          break;
        case 'completed':
          console.log('Import complete! Recording ID:', data.task_id);
          navigateToRecording(data.task_id);
          break;
        case 'failed':
          console.error('Import failed:', data.error_code, data.error_message);
          showError(data.error_message);
          break;
        case 'timeout':
          console.warn('Connection timeout:', data.message);
          break;
      }
    }
  }
}

Version: V1.5.7 Last Updated: 2026-05-20

Copyright © 2026