Note: The URL used in this document (vas-poc.vurbo.ai) is the planned deployment URL. The official URL will be announced separately after launch.

SSE API

Connection Information

Item	Value
Base Path	`https://vas-poc.vurbo.ai/api/v1/sse`
Protocol	HTTP + Server-Sent Events (SSE)
Data Format	text/event-stream
Authentication	Header `X-API-Key: {KEY}`

Authentication

SSE APIs that require authentication accept the API key in either of two ways (the backend VerifyApiKeyQuery middleware supports both):

# Method A: HTTP Header (recommended, better security)
X-API-Key: vas_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

# Method B: Query string (native browser EventSource fallback)
?api_key=vas_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Note: The native browser EventSource API does not support custom headers. You can use the ?api_key= query string instead, or use the fetch API with a ReadableStream / an SSE client library that supports headers. In query string mode, the API key appears in the URL, so avoid writing the full URL into server logs or leaking it in screenshots.
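As a rough illustration of the two delivery methods, the helper below prepares the URL and headers for either one. The function name `buildSseRequest` and its `mode` argument are hypothetical and not part of the API; only the `X-API-Key` header and the `api_key` query parameter come from this document.

```javascript
// Hypothetical helper: returns { url, headers } for an authenticated SSE request.
// mode "header" uses Method A (X-API-Key header); mode "query" uses Method B (?api_key=).
function buildSseRequest(endpoint, apiKey, mode = 'header') {
  const url = new URL(endpoint);
  const headers = { Accept: 'text/event-stream' };
  if (mode === 'query') {
    // The key becomes part of the URL, so keep this URL out of logs and screenshots.
    url.searchParams.set('api_key', apiKey);
  } else {
    headers['X-API-Key'] = apiKey;
  }
  return { url: url.toString(), headers };
}
```

The result of header mode can be passed to `fetch(url, { headers })`; the result of query mode can be passed to `new EventSource(url)`.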

Broadcast SSE API

The Broadcast SSE API provides a live subtitle streaming feature, allowing viewers to watch real-time transcription and translation content through a share link.

Note: The base path for Broadcast SSE is https://vas-poc.vurbo.ai/broadcast, which differs from the other SSE APIs.

GET /broadcast/{token}/text (Viewer Live Subtitle Stream)

Description

Viewers connect using a share token to receive an SSE stream of real-time transcription and translation.

Use Cases

Viewers watching live subtitles
Multilingual translation subtitle display
TTS audio playback

Authentication

Token authentication (no API key required): verified through the {token} in the URL path.

Request Parameters

Parameter	Type	Required	Description
`token`	string	Yes	Broadcast share token (4-character short code, a-z0-9, path parameter)
`lang`	string	No	Filter for a specific translation language (e.g., `en-US`)
`tts`	boolean	No	Whether to enable TTS (`true` / `false`, default false)
`viewer_access_token`	string	Conditional	Viewer access token (required for password-protected broadcasts)

Password Protection Note: When a broadcast is set to password-protected, viewers must first obtain a viewer_access_token through the password verification API, then include this token in the query parameters of the SSE connection.
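A minimal sketch of how the query parameters above might be assembled, including `viewer_access_token` for a password-protected broadcast. The helper name `buildBroadcastUrl` and its option names are hypothetical; how the token is obtained from the password verification API is outside this sketch.

```javascript
// Hypothetical helper: builds the viewer stream URL from the documented parameters.
function buildBroadcastUrl(token, { lang, tts, viewerAccessToken } = {}) {
  const url = new URL(`https://vas-poc.vurbo.ai/broadcast/${encodeURIComponent(token)}/text`);
  if (lang) url.searchParams.set('lang', lang);
  if (tts) url.searchParams.set('tts', 'true');
  // Only needed when the broadcast is password-protected.
  if (viewerAccessToken) url.searchParams.set('viewer_access_token', viewerAccessToken);
  return url.toString();
}
```

Using `URL` and `searchParams` rather than string concatenation keeps the values correctly encoded whatever characters they contain.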

Request Example

// The three connections below are alternatives; each uses its own variable name
// so the snippet can be pasted as a whole.

// Receive all languages
const allLanguages = new EventSource(
  'https://vas-poc.vurbo.ai/broadcast/a3f9/text'
);

// Receive only English translation
const englishOnly = new EventSource(
  'https://vas-poc.vurbo.ai/broadcast/a3f9/text?lang=en-US'
);

// Receive English translation and enable TTS
const englishWithTts = new EventSource(
  'https://vas-poc.vurbo.ai/broadcast/a3f9/text?lang=en-US&tts=true'
);

Event Types

Event	Description	Notes
`connected`	Connection confirmation	-
`queued`	Added to the waiting queue	Queueing mechanism
`admitted`	Entered live from the queue	Queueing mechanism
`origin`	Original text (STT)	-
`translation`	Translation result	-
`tts_ready`	TTS audio ready	-
`paused`	Broadcast paused	Host paused or disconnected
`resumed`	Broadcast resumed	Host resumed
`ended`	Broadcast ended	-
`kicked`	Removed	Viewer management
`error`	Error	-
`speaker_renamed`	Speaker renamed	-
`speaker_reassigned`	Single-sentence speaker change	-
`speakers_merged`	Speakers merged	-
`standby`	Standby phase notification	-
`phase_changed`	Phase change notification	-
`announcement`	Host announcement	-

Event Format

connected:

{
  "session_id": "abc123",
  "source_lang": "zh-TW",
  "subscribed_lang": "en-US",
  "available_langs": ["en-US", "ja-JP"],
  "tts_languages": ["en-US"],
  "phase": "standby",
  "recognition_mode": "single",
  "client_id": "client_xyz"
}

Field	Type	Description
`session_id`	string	Broadcast session ID
`source_lang`	string	Original language (set by the host)
`subscribed_lang`	string	The filter language the viewer subscribed to (`null` if not specified)
`available_langs`	array	List of available translation languages
`tts_languages`	array	List of languages with TTS enabled (an empty array means no TTS)
`phase`	string	Broadcast phase: `standby` (preparing) or `live` (active)
`recognition_mode`	string	Recognition mode: `single` (single speaker) or `multi_speaker` (multi-speaker diarization)
`client_id`	string	Client ID

queued:

{
  "position": 3,
  "estimated_wait": "About 2 minutes"
}

Field	Type	Description
`position`	number	Position in the queue (1 = next up)
`estimated_wait`	string	Estimated wait time

admitted:

{
  "message": "Entered live"
}

origin:

{
  "sid": 1,
  "text": "Hello everyone",
  "speaker_id": "Guest-1",
  "speaker_label": "Guest-1",
  "start_time": "00:05",
  "is_final": true
}

Field	Type	Description
`sid`	number	Sentence ID
`text`	string	Original text content
`speaker_id`	string	Original speaker ID (immutable; `"0"` in single-speaker mode, or `"1"`/`"2"` in conversation mode)
`speaker_label`	string	Display label (after applying `speaker_aliases`; equals `speaker_id` when no alias exists)
`start_time`	string	Start time (mm:ss); this field is not sent during the standby phase, and counts from `00:00` once live
`is_final`	boolean	Whether this is the final result

translation:

{
  "sid": 1,
  "language": "en-US",
  "text": "Hello everyone",
  "speaker_id": "Guest-1",
  "speaker_label": "Royx",
  "is_final": true
}

Field	Type	Description
`sid`	number	The corresponding sentence ID
`language`	string	Translation language
`text`	string	Translated content
`speaker_id`	string	Original speaker ID (multi-speaker conversation mode; immutable)
`speaker_label`	string	Display label (after applying `speaker_aliases`)
`is_final`	boolean	Whether this is the final result
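One way a viewer might combine `origin` and `translation` events is to keep a single row per `sid`. The sketch below assumes that a later event for the same `sid` supersedes an earlier interim one (`is_final: false`); this document does not state that explicitly, and the store shape and function names are illustrative only.

```javascript
// Sketch: keeps one row per sentence ID and merges origin/translation events into it.
// Assumption: a later event for the same sid replaces earlier interim text.
function createSubtitleStore() {
  const rows = new Map(); // sid -> { sid, origin, translations, speakerLabel, final }

  function rowFor(sid) {
    if (!rows.has(sid)) {
      rows.set(sid, { sid, origin: '', translations: {}, speakerLabel: null, final: false });
    }
    return rows.get(sid);
  }

  return {
    applyOrigin(data) {
      const row = rowFor(data.sid);
      row.origin = data.text;
      row.speakerLabel = data.speaker_label ?? row.speakerLabel;
      row.final = Boolean(data.is_final);
    },
    applyTranslation(data) {
      const row = rowFor(data.sid);
      row.translations[data.language] = data.text;
      row.speakerLabel = data.speaker_label ?? row.speakerLabel;
    },
    // Rows in ascending sid order, ready for rendering.
    list() {
      return [...rows.values()].sort((a, b) => a.sid - b.sid);
    },
  };
}
```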

tts_ready:

{
  "sid": 1,
  "language": "en-US",
  "transcript": "Hello, hi everyone",
  "text": "Hello everyone",
  "audio": "//uQxAAAAAANIAAAAAExBTUUzLjEwMFVVVV...",
  "format": "mp3",
  "duration_ms": 2340,
  "boundaries": [
    {"offset_ms": 0, "duration_ms": 320, "text": "Hello", "text_offset": 0, "word_length": 5},
    {"offset_ms": 320, "duration_ms": 280, "text": "everyone", "text_offset": 6, "word_length": 8}
  ]
}

Field	Type	Description
`sid`	number	The corresponding sentence ID
`language`	string	TTS language
`transcript`	string	Original transcript (source text)
`text`	string	Translated text
`audio`	string	Base64-encoded MP3 audio
`format`	string	Audio format, fixed as `"mp3"`
`duration_ms`	number	Audio duration (milliseconds)
`boundaries`	array	Word boundaries (optional, see table below)

Word boundary fields (each object in the boundaries array):

Field	Type	Description
`offset_ms`	number	The word's start time in the audio (ms)
`duration_ms`	number	The word's pronunciation duration (ms)
`text`	string	The word text
`text_offset`	number	The word's starting position in the text
`word_length`	number	The word's character length

Note:

The host must specify which languages enable TTS via the tts_config parameter in the start command
Only viewers who subscribed to that language and enabled TTS will receive this event
It is sent only during the live phase; no TTS is sent during the standby phase
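The word boundaries lend themselves to highlighting the word currently being spoken. The sketch below is one possible approach; the function names are hypothetical, and it assumes `text_offset` and `word_length` index into the event's `text` field, which matches the sample payload above but is not stated outright.

```javascript
// Sketch: returns the boundary entry being spoken at positionMs, or null between words.
function activeBoundaryAt(boundaries, positionMs) {
  if (!Array.isArray(boundaries)) return null; // `boundaries` is optional
  for (const b of boundaries) {
    if (positionMs >= b.offset_ms && positionMs < b.offset_ms + b.duration_ms) {
      return b;
    }
  }
  return null;
}

// Sketch: extracts the substring of `text` that a boundary entry points at.
function boundaryText(text, boundary) {
  return text.slice(boundary.text_offset, boundary.text_offset + boundary.word_length);
}
```

In a browser, `positionMs` could come from `audio.currentTime * 1000` inside a `timeupdate` handler.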

paused:

{
  "reason": "host_paused",
  "message": "Live broadcast is paused",
  "paused_at": "2025-12-23T10:30:45.123Z"
}

Field	Type	Description
`reason`	string	Pause reason: `host_paused` / `host_disconnected`
`message`	string	Notification message
`paused_at`	string	Pause time (ISO 8601)

resumed:

{
  "message": "Live broadcast resumed",
  "resumed_at": "2025-12-23T10:32:15.456Z"
}

ended:

{
  "reason": "session_stopped",
  "duration_ms": 3600000
}

Field	Type	Description
`reason`	string	End reason
`duration_ms`	number	Broadcast duration (milliseconds)

End reasons:

reason	Description
`session_stopped`	Host ended normally
`token_revoked`	Token was revoked
`host_timeout`	Host disconnection timeout
`capacity_exceeded`	Queue timeout

kicked:

{
  "message": "Removed by host"
}

error:

{
  "error_code": "broadcast_session_ended",
  "severity": "error",
  "message": "Broadcast session ended",
  "context": "broadcast",
  "request_id": "req_abc123xyz789",
  "timestamp": "2025-12-05T10:30:45.123Z"
}

Sentence-level errors (such as a translation failure for a specific language) additionally carry sid and translation_language, making it easy for the frontend to flag which language failed for a given sentence:

{
  "error_code": "llm_content_filtered",
  "severity": "warning",
  "message": "Content filtered",
  "context": "translation",
  "sid": 5,
  "translation_language": "ja",
  "request_id": "req_abc123xyz789",
  "timestamp": "2026-04-26T10:30:45.123Z"
}

Field	Type	Description
`error_code`	string	Error code
`severity`	string	Severity: `warning` / `error` / `fatal`
`message`	string	Error message
`context`	string	The context in which the error occurred (e.g., `broadcast`, `translation`)
`sid`	int	Optional. The sentence number for a sentence-level error (e.g., when that sentence's translation fails)
`translation_language`	string	Optional. The target language that failed to translate (viewers can use this to determine whether a specific language failed for that sentence)
`request_id`	string	Request tracking ID
`timestamp`	string	Time the error occurred (ISO 8601)
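Because sentence-level errors carry a `sid`, a client can separate them from session-wide errors before deciding how to react. The sketch below shows one possible policy; the action names are illustrative, and treating `error` / `fatal` without a `sid` as a reason to close the stream is an assumption, not something this document specifies.

```javascript
// Sketch: decide how the UI might react to an `error` event payload.
// The returned action names are illustrative and not defined by the API.
function classifyBroadcastError(err) {
  if (typeof err.sid === 'number') {
    // Sentence-level error: flag one sentence (and the language, if given).
    return { action: 'flag_sentence', sid: err.sid, language: err.translation_language ?? null };
  }
  if (err.severity === 'warning') return { action: 'notify' };
  // Assumption: `error` and `fatal` without a sid are treated as session-wide.
  return { action: 'close_stream' };
}
```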

speaker_renamed:

Multi-speaker conversation mode only. Sent when the host performs a global speaker rename.

{
  "speaker_id": "Guest-1",
  "new_label": "Royx",
  "affected_sids": [1, 3, 5, 7]
}

Field	Type	Description
`speaker_id`	string	The resolved original speaker ID (even if the input is a display label, the event returns the original ID)
`new_label`	string	New display label (e.g., `Royx`)
`affected_sids`	array	List of affected sentence IDs

speaker_reassigned:

Multi-speaker conversation mode only. Sent when the host changes the speaker of a single sentence.

{
  "sid": 3,
  "old_speaker_id": "Guest-1",
  "new_speaker_id": "Guest-2",
  "new_speaker_label": "Amy"
}

Field	Type	Description
`sid`	number	The sentence ID that was modified
`old_speaker_id`	string	Original speaker ID (e.g., `Guest-1`)
`new_speaker_id`	string	The new original speaker ID (e.g., `Guest-2`)
`new_speaker_label`	string	New speaker display label (after applying `speaker_aliases`; equals the original ID when no alias exists)

speakers_merged:

Multi-speaker conversation mode only. Sent when the host merges speakers. After merging, all sentences belonging to that speaker are reassigned to the target speaker.

{
  "source_speaker_id": "Guest-2",
  "target_speaker_id": "Guest-1",
  "target_speaker_label": "Manager Wang",
  "affected_sids": [3, 5, 7]
}

Field	Type	Description
`source_speaker_id`	string	The original speaker ID being merged (e.g., `Guest-2`)
`target_speaker_id`	string	The original speaker ID of the merge target (e.g., `Guest-1`)
`target_speaker_label`	string	Target speaker display label (after applying `speaker_aliases`; equals the original ID when no alias exists)
`affected_sids`	array	List of affected sentence IDs
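The three speaker events can be applied to locally cached sentences in one place. The following is a minimal sketch under the assumption that the client keeps a `Map` of `sid` to `{ speakerId, speakerLabel }`; that cache shape and the function name are hypothetical.

```javascript
// Sketch: apply speaker events to locally cached sentences.
// `sentences` is a hypothetical Map of sid -> { speakerId, speakerLabel }.
function applySpeakerEvent(sentences, type, data) {
  const update = (sid, patch) => {
    const sentence = sentences.get(sid);
    if (sentence) Object.assign(sentence, patch); // ignore sids not cached locally
  };
  switch (type) {
    case 'speaker_renamed':
      for (const sid of data.affected_sids) update(sid, { speakerLabel: data.new_label });
      break;
    case 'speaker_reassigned':
      update(data.sid, { speakerId: data.new_speaker_id, speakerLabel: data.new_speaker_label });
      break;
    case 'speakers_merged':
      for (const sid of data.affected_sids) {
        update(sid, { speakerId: data.target_speaker_id, speakerLabel: data.target_speaker_label });
      }
      break;
    default:
      break; // not a speaker event
  }
  return sentences;
}
```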

standby:

When a viewer connects during the standby phase, this event is received immediately after the connected event, indicating that the broadcast has not yet officially started. The host can dynamically update the standby message via the WebSocket set_standby_message action; after the update, all viewers receive a new standby event.

{
  "message": "The presentation is about to begin, please wait...",
  "translations": {
    "en-US": "The presentation is about to begin, please wait...",
    "ja-JP": "プレゼンテーションがまもなく始まります。お待ちください..."
  }
}

Field	Type	Description
`message`	string	The message displayed during the standby phase (original text)
`translations`	object	Translation results (optional); the key is the language code and the value is the translated text

phase_changed:

Sent when the broadcast switches from the standby phase to the active phase.

{
  "phase": "live",
  "message": "Broadcast has started"
}

Field	Type	Description
`phase`	string	The new phase: `live` (active phase)
`message`	string	Phase change message

announcement:

An announcement message sent by the host; all viewers receive it.

{
  "message": "The meeting will end in 5 minutes",
  "translations": {
    "en-US": "The meeting will end in 5 minutes",
    "ja-JP": "会議は5分後に終了します"
  }
}

Field	Type	Description
`message`	string	The announcement content (original text)
`translations`	object	Translation results (optional); the key is the language code and the value is the translated text

Heartbeat Mechanism

The SSE connection uses a heartbeat to keep the connection alive:

Interval: 15 seconds
Format: SSE comment (starting with :)
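With the native EventSource, the frontend does not need to handle it; the browser automatically ignores it. A hand-written fetch-based parser should skip lines that start with :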

: heartbeat

Error Responses

Error Code	HTTP Status	Description	Recommended Handling
`broadcast_session_not_found`	404	Broadcast not found	Confirm the token is correct
`broadcast_session_ended`	410	Broadcast ended	Notify the user that the broadcast has ended
`broadcast_capacity_exceeded`	503	Viewer capacity reached	Join the waiting queue

Note: If an SSE endpoint encounters an unexpected internal exception, it may return internal_error (the same per-message panic recovery mechanism as WebSocket); expected domain errors return the corresponding error code (e.g., sse_translation_failed).

Frontend Example

function connectBroadcast(token, lang = null) {
  let url = `https://vas-poc.vurbo.ai/broadcast/${token}/text`;
  if (lang) {
    url += `?lang=${encodeURIComponent(lang)}`;
  }

  const eventSource = new EventSource(url);

  eventSource.addEventListener('connected', (e) => {
    const data = JSON.parse(e.data);
    console.log(`Connected, original language: ${data.source_lang}`);
    console.log(`Available translations: ${data.available_langs.join(', ')}`);
  });

  eventSource.addEventListener('queued', (e) => {
    const data = JSON.parse(e.data);
    console.log(`In queue, position: ${data.position}, estimated wait: ${data.estimated_wait}`);
  });

  eventSource.addEventListener('admitted', (e) => {
    console.log('Entered live');
  });

  eventSource.addEventListener('origin', (e) => {
    const data = JSON.parse(e.data);
    console.log(`[${data.start_time}] ${data.text}`);
  });

  eventSource.addEventListener('translation', (e) => {
    const data = JSON.parse(e.data);
    console.log(`Translation (${data.language}): ${data.text}`);
  });

  eventSource.addEventListener('tts_ready', (e) => {
    const data = JSON.parse(e.data);
    // Decode the Base64 audio and play it
    const byteCharacters = atob(data.audio);
    const byteNumbers = new Array(byteCharacters.length);
    for (let i = 0; i < byteCharacters.length; i++) {
      byteNumbers[i] = byteCharacters.charCodeAt(i);
    }
    const blob = new Blob([new Uint8Array(byteNumbers)], { type: 'audio/mpeg' });
    const audio = new Audio(URL.createObjectURL(blob));
    audio.play();
  });

  eventSource.addEventListener('paused', (e) => {
    const data = JSON.parse(e.data);
    console.log(`Broadcast paused: ${data.message}`);
  });

  eventSource.addEventListener('resumed', (e) => {
    console.log('Broadcast resumed');
  });

  eventSource.addEventListener('ended', (e) => {
    const data = JSON.parse(e.data);
    console.log(`Broadcast ended, reason: ${data.reason}`);
    eventSource.close();
  });

  eventSource.addEventListener('kicked', (e) => {
    console.log('You have been removed');
    eventSource.close();
  });

  eventSource.addEventListener('speaker_renamed', (e) => {
    const data = JSON.parse(e.data);
    console.log(`Speaker renamed: ${data.speaker_id} → ${data.new_label}`);
    console.log(`Affected sentences: ${data.affected_sids.join(', ')}`);
    // Update the speaker display name for all affected sentences
  });

  eventSource.addEventListener('speaker_reassigned', (e) => {
    const data = JSON.parse(e.data);
    console.log(`Speaker of sentence ${data.sid} changed from ${data.old_speaker_id} to: ${data.new_speaker_label}`);
    // Update the speaker display name for that sentence
  });

  eventSource.addEventListener('standby', (e) => {
    const data = JSON.parse(e.data);
    // Display the translation matching the viewer's selected language
    const displayLang = 'en-US'; // The language the viewer selected
    const displayMessage = data.translations?.[displayLang] || data.message;
    console.log(`Standby phase: ${displayMessage}`);
    // Show the waiting screen
  });

  eventSource.addEventListener('phase_changed', (e) => {
    const data = JSON.parse(e.data);
    console.log(`Phase changed: ${data.phase} - ${data.message}`);
    // Remove the waiting screen and start displaying subtitles
  });

  eventSource.addEventListener('announcement', (e) => {
    const data = JSON.parse(e.data);
    // Display the translation matching the viewer's selected language
    const displayLang = 'en-US'; // The language the viewer selected
    const displayMessage = data.translations?.[displayLang] || data.message;
    console.log(`Announcement: ${displayMessage}`);
    // Show the announcement message
  });

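  eventSource.addEventListener('error', (e) => {
    // Server-sent `error` events carry a JSON payload; native connection errors do not
    if (e.data) {
      const error = JSON.parse(e.data);
      console.error(`Error [${error.error_code}]: ${error.message}`);
      // Sentence-level errors (those carrying a sid) do not end the broadcast
      if (error.sid !== undefined) return;
    }
    // Closing here also stops EventSource's automatic reconnection
    eventSource.close();
  });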

  return eventSource;
}

A REST endpoint is also available: to query broadcast information, use GET /broadcast/{token}/info (see REST API - Broadcasts API).

GET /api/v1/sse/history/transcribe/{taskId} (Retrieve Conversation History)

Description

Loads the complete conversation history for the specified task, including all sentences and the summary. Items are delivered one at a time over an SSE stream.

Use Cases

Viewing the recording details page
Loading the historical transcript

Authentication

Header: X-API-Key: YOUR_API_KEY

Request Parameters

Parameter	Type	Required	Description
`taskId`	string	Yes	Recording ID (path parameter)

Request Example

// Use the fetch API (because EventSource does not support headers)
async function connectSSE(taskId, apiKey) {
  const response = await fetch(
    `https://vas-poc.vurbo.ai/api/v1/sse/history/transcribe/${taskId}`,
    {
      headers: {
        'X-API-Key': apiKey
      }
    }
  );
  const reader = response.body.getReader();
  // ... handle SSE events
}

Event Sequence

1. connected        → Connection confirmation
2. init_metadata    → Send task metadata
3. init_sentence    → Send sentences one at a time (repeated N times)
4. init_summary     → Send the summary
5. init_done        → Initialization complete

Event Format

connected:

{"message": "History service connected (recordingId: xxx)"}

init_metadata:

{
  "task_id": "550e8400-e29b-41d4-a716-446655440000",
  "title": "Meeting Notes",
  "created_at": "2025-12-17T10:00:00Z",
  "type": "transcribe",
  "has_speaker_diarization": true,
  "transcription_languages": ["zh-TW"],
  "translation_languages": ["en-US"],
  "summary_template": "general",
  "summary_language": "zh-TW",
  "speaker_aliases": {"speaker_1": "Manager Wang"}
}

speaker_aliases is a mapping of "original speaker ID → display name"; it is {} (an empty object, not an array) when there are no aliases. The frontend can use this mapping to run a duplicate-name pre-check before renaming a speaker (added in v1.3.12).

init_sentence:

{
  "sid": 1,
  "origin": "Hello, nice to meet you",
  "translations": {
    "en-US": "Hello, nice to meet you"
  },
  "start_time": "00:05",
  "speaker_id": "speaker_1",
  "speaker_label": "Manager Wang"
}

If a sentence has a translation failure, it additionally carries a translation_errors field (only present when there is a failure), so the frontend can distinguish between "that language was not scheduled for translation" (the key is missing from translations) and "translated but failed" (the key is present in translation_errors):

{
  "sid": 5,
  "origin": "Sentence with sensitive words",
  "translations": {
    "en-US": "Sensitive sentence"
  },
  "translation_errors": {
    "ja": "llm_content_filtered"
  },
  "start_time": "00:25",
  "speaker_id": "speaker_1",
  "speaker_label": "Manager Wang"
}

Field	Type	Description
`sid`	int	Sentence number
`origin`	string	Original text
`translations`	object	Translation results (optional); the key is the language code and the value is the translated text
`translation_errors`	object	Optional. Translation failure error codes; the key is the language code and the value is the error_code (e.g., `llm_content_filtered`)
`start_time`	string	Start time (mm:ss format)
`speaker_id`	string\|null	Original speaker ID (immutable, e.g., `speaker_1`); the source for `target_speaker_id` in `PATCH /speakers/reassign` (flipped in v1.5.3: previously the display name)
`speaker_label`	string\|null	Display label (the human-readable name after applying `speaker_aliases`, e.g., `Manager Wang`); equals `speaker_id` when no alias exists (added in v1.5.3 to replace the original `speaker_id` display semantics)
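The distinction between "not scheduled" and "translated but failed" can be captured in a small lookup. This is a sketch; the function name and the status strings are illustrative and not part of the API.

```javascript
// Sketch: classify one language for one init_sentence payload.
function translationStatus(sentence, lang) {
  if (sentence.translation_errors && lang in sentence.translation_errors) {
    // Key present in translation_errors: translation was attempted and failed.
    return { status: 'failed', errorCode: sentence.translation_errors[lang] };
  }
  if (sentence.translations && lang in sentence.translations) {
    return { status: 'translated', text: sentence.translations[lang] };
  }
  // Key missing from both objects: the language was not scheduled for translation.
  return { status: 'not_scheduled' };
}
```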

init_summary:

{"text": "This is a summary of the meeting notes..."}

init_done:

{"totalSentences": 10}
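The event sequence maps naturally onto a reducer that folds each event into one state object. The sketch below assumes events arrive as `{ type, data }` objects (a hypothetical shape produced by the client's own parser) and that `totalSentences` equals the number of `init_sentence` events sent, which this document implies but does not state.

```javascript
// Sketch: fold history events into a single state object.
const initialHistoryState = { metadata: null, sentences: [], summary: null, done: false, complete: false };

function reduceHistory(state, event) {
  switch (event.type) {
    case 'init_metadata':
      return { ...state, metadata: event.data };
    case 'init_sentence':
      return { ...state, sentences: [...state.sentences, event.data] };
    case 'init_summary':
      return { ...state, summary: event.data.text };
    case 'init_done':
      // Assumption: totalSentences lets the client check that nothing was dropped.
      return { ...state, done: true, complete: state.sentences.length === event.data.totalSentences };
    default:
      return state; // `connected` and unknown events leave the state unchanged
  }
}
```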

Error Responses

Error Code	HTTP Status	Description	Recommended Handling
`recording_not_found`	404	Recording not found	Confirm the taskId is correct
`sse_transcript_not_found`	404	Transcript not found	The recording may not have finished processing yet

Frontend Example

// Use the fetch API to handle SSE (you must parse the event-stream yourself)
async function loadHistory(taskId, apiKey) {
  const response = await fetch(
    `https://vas-poc.vurbo.ai/api/v1/sse/history/transcribe/${taskId}`,
    {
      headers: {
        'X-API-Key': apiKey
      }
    }
  );

  const reader = response.body.getReader();
  const decoder = new TextDecoder();

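  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // stream: true keeps multi-byte characters intact across chunk boundaries
    const text = decoder.decode(value, { stream: true });
    // Parse the SSE format: event: xxx\ndata: {...}\n\n
    // parseSSE is a helper you supply; a chunk may end mid-event, so it should buffer partial events
    const events = parseSSE(text);

    for (const event of events) {
      if (event.type === 'init_metadata') {
        console.log('Task info:', event.data.title);
      } else if (event.type === 'init_sentence') {
        console.log(`[${event.data.start_time}] ${event.data.origin}`);
        // translations is an object keyed by language code
        for (const [lang, translated] of Object.entries(event.data.translations ?? {})) {
          console.log(`Translation (${lang}): ${translated}`);
        }
      } else if (event.type === 'init_done') {
        console.log('Loading complete');
      }
    }
  }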
}
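The example above relies on a `parseSSE` helper that this document does not define. A minimal sketch of a buffering parser is shown below, written as a factory (`createSSEParser`, a hypothetical name) so that an event split across two network chunks is held back until it is complete. It handles `\n` and `\r\n` line endings, skips comment lines such as the heartbeat, and ignores the `id:` and `retry:` fields; it is not a full implementation of the SSE specification.

```javascript
// Sketch of a streaming SSE parser. Feed it decoded text chunks; it returns the
// events completed so far and keeps any partial event for the next call.
function createSSEParser() {
  let buffer = '';

  function parseBlock(block) {
    let type = 'message'; // SSE default when no `event:` line is present
    const dataLines = [];
    for (const line of block.split('\n')) {
      if (line === '' || line.startsWith(':')) continue; // comments such as ": heartbeat"
      const colon = line.indexOf(':');
      const field = colon === -1 ? line : line.slice(0, colon);
      let value = colon === -1 ? '' : line.slice(colon + 1);
      if (value.startsWith(' ')) value = value.slice(1); // strip one leading space
      if (field === 'event') type = value;
      else if (field === 'data') dataLines.push(value);
    }
    if (dataLines.length === 0) return null; // comment-only block
    const raw = dataLines.join('\n');
    let data;
    try {
      data = JSON.parse(raw);
    } catch {
      data = raw; // leave non-JSON payloads as plain text
    }
    return { type, data };
  }

  return function feed(chunk) {
    // Normalise the whole buffer so a \r\n split across two chunks is still handled.
    buffer = (buffer + chunk).replace(/\r\n/g, '\n');
    const blocks = buffer.split('\n\n');
    buffer = blocks.pop(); // the last piece may be an incomplete event
    return blocks.map(parseBlock).filter(Boolean);
  };
}
```

Inside the read loop, create one parser per connection with `const feed = createSSEParser();` and iterate over `feed(decoder.decode(value, { stream: true }))` in place of the per-chunk `parseSSE(text)` call.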

GET /api/v1/sse/retranslate/{taskId} (Retranslate Full Transcript)

Description

Retranslates all sentences of the specified task into the target language. Translation results are delivered one at a time via an SSE stream.

Use Cases

Switching the display language
Updating the translation content

Authentication

Header: X-API-Key: YOUR_API_KEY

Request Parameters

Parameter	Type	Required	Description
`taskId`	string	Yes	Recording ID (path parameter)
`targetLang`	string	Yes	Target language code

Request Example

// Use the fetch API (because EventSource does not support headers)
async function retranslateSSE(taskId, targetLang, apiKey) {
  const response = await fetch(
    `https://vas-poc.vurbo.ai/api/v1/sse/retranslate/${taskId}?targetLang=${targetLang}`,
    {
      headers: {
        'X-API-Key': apiKey
      }
    }
  );
  const reader = response.body.getReader();
  // ... handle SSE events
}

Event Format

translation:

{"sid": 1, "text": "Hello, nice to meet you", "is_final": true}

done:

{"totalUpdated": 10}

error (per-sid sentence translation failure):

When a sentence fails to translate (e.g., LLM provider error, content filtering), instead of a translation event, an event: error is sent carrying sid + error_code, interleaved with the translation events. The frontend can handle this with the same translationError.ts interceptor (aligned with the WebSocket spec):

event: error
data: {"error_code": "sse_translation_failed", "severity": "error", "message": "SSE translation failed", "context": "sse", "sid": 5, "request_id": "req_abc123xyz789", "timestamp": "2026-04-27T10:30:45.123Z", "details": {"translation_language": "ja", "original_error": "..."}}

Field	Type	Description
`error_code`	string	Error code, currently fixed as `sse_translation_failed`
`severity`	string	`error`
`message`	string	Human-readable message
`context`	string	`sse` (automatically matched by the `ErrorContextEnum` prefix rule)
`sid`	int	The sentence number that failed
`request_id`	string	Request tracking ID
`timestamp`	string	Time the error occurred (ISO 8601)
`details`	object	Includes debug info such as `translation_language` and `original_error`

Failed sentences are saved as translation error records (see the history-playback guide), and the failure markers are visible the next time the history is loaded. For the full specification, see reference/sse/retranslate.md.

Error Responses

Error Code	HTTP Status	Description	Recommended Handling
`sse_missing_target_lang`	422	Missing target language parameter	Provide targetLang
`sse_unsupported_language`	422	Unsupported target language	Use a valid language code
`sse_translation_failed`	500	Translation failed (per-sid)	The failed sentence is still reported via `event: error`; the overall flow is not interrupted

Frontend Example

// Use the fetch API to handle SSE
async function retranslate(taskId, targetLang, apiKey) {
  const response = await fetch(
    `https://vas-poc.vurbo.ai/api/v1/sse/retranslate/${taskId}?targetLang=${targetLang}`,
    {
      headers: {
        'X-API-Key': apiKey
      }
    }
  );

  const reader = response.body.getReader();
  const decoder = new TextDecoder();

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // stream: true keeps multi-byte characters intact across chunk boundaries
    const events = parseSSE(decoder.decode(value, { stream: true }));
    for (const event of events) {
      if (event.type === 'translation') {
        console.log(`Sentence ${event.data.sid}: ${event.data.text}`);
      } else if (event.type === 'error') {
        console.warn(`Sentence ${event.data.sid} failed to translate: ${event.data.error_code}`);
      } else if (event.type === 'done') {
        console.log(`Complete, ${event.data.totalUpdated} sentences updated`);
      }
    }
  }
}

GET /api/v1/sse/recordings/{taskId}/entries/{sid}/retranslate (Single-Sentence Retranslation, added in v1.4.0)

Description

Retranslates a single sentence. The most common scenario: after a user edits the original text via PATCH /api/v1/recordings/{id}/entries/{sid}, you call this endpoint to redo all translations for that sentence.

Differences from full-transcript retranslation (/retranslate/{taskId}):

Full-transcript retranslation: all sentences are translated into a single target language
Single-sentence retranslation: only one sentence is translated, and all existing target languages can be translated at once; supports optimistic locking

Authentication

Query: api_key (the browser EventSource does not support headers)

Request Parameters

Parameter	Location	Type	Required	Description
`taskId`	path	string	Yes	Recording ID (UUID)
`sid`	path	number	Yes	Sentence ID (1-based)
`targetLang`	query	string	No	Target language code. When omitted, all languages already present in `translated_texts` for that sentence are retranslated
`expectedRevision`	query	number	No	Optimistic lock: the current transcript revision; a mismatch returns `transcript_revision_conflict`
`api_key`	query	string	Yes	API key

Event Format

Event sequence: connected → progress / translated / error ×N → done

// progress (when translation begins for each language)
{ "sid": 5, "lang": "en-US", "status": "translating" }

// translated (when each language completes successfully)
{ "sid": 5, "lang": "en-US", "text": "Hello world", "tokens_used": 25 }

// done (all complete; successful languages are listed in languages_translated, failed ones in languages_failed)
{
  "sid": 5,
  "revision": 6,
  "original_text_edited_at": "2026-05-06T10:30:00.000000Z",
  "languages_translated": ["en-US"],
  "languages_failed": ["ja-JP"]
}

Error Responses

Error Code	HTTP	Description
`recording_not_found`	404	Recording does not exist or does not belong to the user
`recording_not_completed`	422	The recording has not finished processing
`entry_not_found`	404	The specified sentence was not found
`entry_text_empty`	422	The original text of that sentence is empty
`transcript_revision_conflict`	409	Revision mismatch (already modified by another request)
`storage_upload_failed`	500	Failed to save the transcript

For the full event format and a workflow example combining optimistic locking with PATCH, see reference/sse/retranslate.md.
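As a rough sketch of how the request parameters fit together, the helper below builds the URL for an EventSource connection, including the optional optimistic-lock revision. The helper name `buildEntryRetranslateUrl` and its argument names are hypothetical.

```javascript
// Hypothetical helper: builds the EventSource URL for single-sentence retranslation.
function buildEntryRetranslateUrl({ taskId, sid, apiKey, targetLang, expectedRevision }) {
  const url = new URL(
    `https://vas-poc.vurbo.ai/api/v1/sse/recordings/${encodeURIComponent(taskId)}/entries/${sid}/retranslate`
  );
  // Omitting targetLang asks the server to redo every language already present.
  if (targetLang) url.searchParams.set('targetLang', targetLang);
  // Optimistic lock: send the revision the client last saw, if any.
  if (expectedRevision !== undefined) url.searchParams.set('expectedRevision', String(expectedRevision));
  url.searchParams.set('api_key', apiKey);
  return url.toString();
}
```

The `revision` value in the `done` event is a natural candidate for the next `expectedRevision`, though the linked reference remains the authority on that workflow.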

`init_sentence` Edit Marker Fields (added in v1.4.0)

For sentences edited by a user, historyTranscribe adds two fields to the init_sentence event (only present after editing):

{
  "sid": 7,
  "origin": "Corrected text",
  "original_text_raw": "Original STT output",
  "original_text_edited_at": "2026-05-06T10:30:00.000000Z",
  "translations": { "en-US": "Corrected text" }
}

Frontend detection: check for the presence of the field ('original_text_raw' in data); do not compare origin === original_text_raw. A user may edit the text and then change it back to the same string, in which case the two values are equal but the "edited" marker should still be shown. See reference/sse/history.md.
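The presence check can be wrapped in a one-line helper; the function name `isEditedSentence` is illustrative.

```javascript
// Sketch: an init_sentence counts as edited when the marker field is present,
// regardless of whether the two strings happen to be equal.
function isEditedSentence(data) {
  return 'original_text_raw' in data;
}
```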

GET /api/v1/sse/retranslate/summary/{taskId} (Retranslate Summary)

Description

Retranslates the summary of the specified task into the target language. Translation results are delivered segment by segment via an SSE stream.

Use Cases

Switching the summary display language
Obtaining the summary in a different language

Authentication

Header: X-API-Key: YOUR_API_KEY

Request Parameters

Parameter	Type	Required	Description
`taskId`	string	Yes	Recording ID
`targetLang`	string	Yes	Target language code

Request Example

// Use the fetch API (because EventSource does not support headers)
async function retranslateSummarySSE(taskId, targetLang, apiKey) {
  const response = await fetch(
    `https://vas-poc.vurbo.ai/api/v1/sse/retranslate/summary/${taskId}?targetLang=${targetLang}`,
    {
      headers: {
        'X-API-Key': apiKey
      }
    }
  );
  const reader = response.body.getReader();
  // ... handle SSE events
}

Event Format

summary_translation:

{"text": "Accumulated translation result...", "is_final": false}

done:

{"totalUpdated": 1}
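The sample payload describes `text` as an accumulated result, which suggests that each `summary_translation` event carries the full text so far rather than a delta. Under that assumption (not confirmed by this document), a client would replace its displayed text on every event instead of appending. A minimal sketch with a hypothetical tracker:

```javascript
// Sketch: tracks the latest summary text.
// Assumption: each event carries the full accumulated text, so the newest value wins.
function createSummaryTracker() {
  let text = '';
  let final = false;
  return {
    apply(data) {
      text = data.text; // replace, do not append
      final = Boolean(data.is_final);
    },
    snapshot() {
      return { text, final };
    },
  };
}
```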

Error Responses

Error Code	HTTP Status	Description	Recommended Handling
`sse_summary_not_found`	404	Summary not found	This recording has no summary
`sse_summary_translation_failed`	500	Summary translation failed	Retry later

Regenerate Summary (GET Preview / POST Save)

Summary regeneration is split into two endpoints and is mode-aware. For the full schema, see reference/sse/regenerate-summary.md; what follows is a quick summary.

Method	Endpoint	Writes DB	Saves Transcript	Billed	Purpose
GET	`/api/v1/sse/regenerate/summary/{taskId}`	❌	❌	✅	Preview (dry run)
POST	`/api/v1/sse/regenerate/summary/{taskId}`	✅	✅ + bump `revision`	✅	Save (persist officially)

Known limitation: GET is also billed. The LLM consumes tokens during a preview, so the GET endpoint cannot be used for free.

Shared Parameters (GET via query string, POST via JSON body)

Parameter	Type	Required	Description
`taskId` (path)	string	Yes	Recording UUID
`mode`	string	Yes	Summary mode enum: `builtin` / `custom`
`template`	string	Required for builtin / forbidden for custom	Built-in template slug
`prompt`	string	Required for custom / forbidden for builtin	The customer's full prompt (replaces the built-in layered prompt, ≤2000 characters)
`promptSlug`	string	Required for custom / forbidden for builtin	The customer's own identifier (≤64 Unicode characters, no control characters)
`language`	string	No	Output language (defaults to the first transcription language)
`plainText`	boolean	No	Whether to request plain-text output (default false)

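Mutual exclusivity rule: template is accepted only when mode is builtin, and prompt / promptSlug only when mode is custom. A missing required field or a forbidden field returns 422 summary_mode_field_mismatch.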
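A client can run the same checks before sending the request to avoid a billed round trip that ends in a 422. The sketch below mirrors the documented rules and error codes; the function name is hypothetical, and counting length in Unicode code points is an assumption, since this document does not say how the server counts characters. It also omits the control-character check on `promptSlug`.

```javascript
// Sketch: client-side pre-check mirroring the documented 422 codes.
// Returns an error code string, or null when the combination looks valid.
function checkSummaryParams({ mode, template, prompt, promptSlug }) {
  if (mode !== 'builtin' && mode !== 'custom') return 'summary_invalid_mode';
  if (mode === 'builtin') {
    // template is required; prompt and promptSlug are forbidden.
    if (!template || prompt !== undefined || promptSlug !== undefined) {
      return 'summary_mode_field_mismatch';
    }
    return null;
  }
  // mode === 'custom': prompt and promptSlug are required; template is forbidden.
  if (!prompt || !promptSlug || template !== undefined) return 'summary_mode_field_mismatch';
  // Assumption: lengths are counted in Unicode code points.
  if ([...prompt].length > 2000) return 'summary_prompt_too_long';
  if ([...promptSlug].length > 64) return 'summary_prompt_slug_too_long';
  return null;
}
```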

Request Example

# Preview builtin (does not write DB / blob)
curl -N "https://vas-poc.vurbo.ai/api/v1/sse/regenerate/summary/550e8400-...?mode=builtin&template=meeting&language=zh-TW&plainText=true" \
  -H "X-API-Key: YOUR_API_KEY"

# Save custom
curl -N -X POST "https://vas-poc.vurbo.ai/api/v1/sse/regenerate/summary/550e8400-..." \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"mode":"custom","prompt":"Please emphasize KPIs","promptSlug":"acme-v2","plainText":true}'

Event Sequence

1. connected              → Connection confirmation (includes mode=builtin|custom, endpoint=preview|persist)
2. summary_regeneration   → Stream summary segments (accumulating; is_final=true marks the last one)
3. done                   → Complete; includes final_content / mode / template (effective slug) / prompt_snapshot (custom mode only)

done event

{
  "task_id": "550e8400-...",
  "tokens_used": 123,
  "final_content": "This meeting...",
  "mode": "custom",
  "template": "acme-v2",
  "plain_text": true,
  "persisted": true,
  "prompt_snapshot": "Please emphasize KPIs"
}

mode: business mode (builtin / custom)
template: effective slug — builtin → built-in template slug; custom → customer slug
persisted: whether this summary has been officially saved (false for GET, true for POST)
prompt_snapshot: only present in custom mode; the prompt content the customer passed in verbatim (a mandatory snapshot, the sole basis for reconstruction)

Error Codes

Error Code	HTTP	Description
`recording_not_found`	404	Recording not found
`sse_template_not_found`	404	Summary template not found
`sse_transcript_not_found`	404	Transcript not found
`summary_text_empty`	400	The transcript has no content
`summary_text_too_long`	400	The transcript exceeds the 100,000-character limit
`sse_summary_regeneration_failed`	500	Regeneration failed (raw error already sanitized)
`summary_invalid_mode`	422	`mode` is not `builtin` / `custom`
`summary_mode_field_mismatch`	422	The mode and field combination do not match (required field missing / forbidden field provided)
`summary_prompt_too_long`	422	`prompt` exceeds 2000 characters
`summary_prompt_slug_too_long`	422	`promptSlug` exceeds 64 characters
`summary_prompt_slug_invalid`	422	`promptSlug` contains control characters (`\n` / `\r` / `\t` / `\0`, etc.)

Frontend Example

// persist = false → GET preview (parameters go in the query string)
// persist = true  → POST save (parameters go in the JSON body)
async function regenerateSummary(taskId, body, apiKey, { persist = false } = {}) {
  const url = `https://vas-poc.vurbo.ai/api/v1/sse/regenerate/summary/${taskId}`;
  if (!persist) {
    const params = new URLSearchParams(body);
    return fetch(`${url}?${params}`, { headers: { 'X-API-Key': apiKey } });
  }
  return fetch(url, {
    method: 'POST',
    headers: { 'X-API-Key': apiKey, 'Content-Type': 'application/json' },
    body: JSON.stringify(body)
  });
}

GET /api/v1/sse/tts/{taskId} (TTS Audio Stream)

Description

Converts the translated content of a historical recording into TTS audio, delivered sentence by sentence via an SSE stream. The frontend can control how many sentences are returned per request.

Use Cases

Audio playback of translations from historical recordings
Karaoke effect (combined with word boundaries)
Voice readout of translated content

Authentication

Header: X-API-Key: YOUR_API_KEY

Request Parameters

Parameter	Type	Required	Description
`taskId`	string	Yes	Recording ID (path parameter)
`language`	string	Yes	TTS output language (e.g., `en-US`)
`voice`	string	No	Specify a voice name (e.g., `en-US-JennyNeural`)
`sid`	int	No	Starting sentence ID (default 1, starting from the first sentence)
`length`	int	No	Number of sentences to return (default 1, maximum 20)

Note: The maximum value of length is controlled by the backend environment variable TTS_SSE_MAX_LENGTH (default 20). A larger value is automatically truncated to the maximum.
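Since each request returns at most the configured maximum number of sentences, reading out a longer range takes several requests. The sketch below shows one way to plan them. `planTtsRequests` is a hypothetical helper; it assumes sentence IDs are consecutive integers and uses 20, the documented default maximum, unless another value is passed.

```javascript
// Split a range of sentences into { sid, length } requests,
// each covering at most maxLength sentences.
function planTtsRequests(startSid, count, maxLength = 20) {
  const requests = [];
  let sid = startSid;
  let remaining = count;
  while (remaining > 0) {
    const length = Math.min(remaining, maxLength);
    requests.push({ sid, length });
    sid += length;        // next request starts after the last sentence
    remaining -= length;
  }
  return requests;
}
```

Each entry maps directly onto the `sid` and `length` query parameters of one request.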

Request Example (Single Sentence Playback)

// Use the fetch API (because EventSource does not support headers)
async function playTTSSingle(taskId, language, sid, apiKey) {
  const response = await fetch(
    `https://vas-poc.vurbo.ai/api/v1/sse/tts/${taskId}?language=${language}&sid=${sid}`,
    {
      headers: {
        'X-API-Key': apiKey
      }
    }
  );
  const reader = response.body.getReader();
  // ... handle SSE events
}

Request Example (Multiple Sentence Playback)

// Example: sid = 5 and length = 3 plays sentences 5, 6, and 7 (3 sentences total)
async function playTTSMultiple(taskId, language, sid, length, apiKey) {
  const response = await fetch(
    `https://vas-poc.vurbo.ai/api/v1/sse/tts/${taskId}?language=${language}&sid=${sid}&length=${length}`,
    {
      headers: {
        'X-API-Key': apiKey
      }
    }
  );
  const reader = response.body.getReader();
  // ... handle SSE events
}

Event Sequence

1. connected    → Connection confirmation
2. tts_audio    → Send TTS audio sentence by sentence (repeated up to N times, N = length)
3. tts_done     → Playback complete

Event Format

connected:

{
  "task_id": "550e8400-e29b-41d4-a716-446655440000",
  "language": "en-US",
  "voice": "en-US-JennyNeural",
  "start_sid": 5,
  "length": 3
}

tts_audio:

{
  "sid": 5,
  "transcript": "Hello, nice to meet you",
  "text": "Hello, nice to meet you",
  "audio": "Base64EncodedMP3...",
  "duration_ms": 2500,
  "boundaries": [
    {"offset_ms": 0, "duration_ms": 350, "text_offset": 0, "word_length": 5, "text": "Hello"},
    {"offset_ms": 350, "duration_ms": 100, "text_offset": 5, "word_length": 1, "text": ","},
    {"offset_ms": 500, "duration_ms": 250, "text_offset": 7, "word_length": 4, "text": "nice"},
    {"offset_ms": 750, "duration_ms": 200, "text_offset": 12, "word_length": 2, "text": "to"},
    {"offset_ms": 950, "duration_ms": 350, "text_offset": 15, "word_length": 4, "text": "meet"},
    {"offset_ms": 1300, "duration_ms": 300, "text_offset": 20, "word_length": 3, "text": "you"}
  ]
}

Field	Type	Description
`sid`	int	Sentence ID
`transcript`	string	Original transcript (STT recognition result)
`text`	string	Translated text (the TTS synthesis source)
`audio`	string	Base64-encoded MP3 audio
`duration_ms`	int	Audio duration (milliseconds)
`boundaries`	array	Word boundary array

Word Boundary Field Descriptions

Field	Type	Description
`offset_ms`	int	The word's start time in the audio (milliseconds)
`duration_ms`	int	The word's duration (milliseconds)
`text_offset`	int	Start position of the word in the `text` string (character index)
`word_length`	int	Word length (number of characters)
`text`	string	Word content
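The boundary fields can be used to locate a word in `text` and to find which word is being spoken at a given playback time. The sketch below is illustrative: `splitAtBoundary` and `findActiveBoundary` are hypothetical helpers, and it assumes that `text_offset` and `word_length` line up with JavaScript string indices and that `boundaries` is sorted by `offset_ms`.

```javascript
// Split text into the parts before, inside, and after one word boundary.
function splitAtBoundary(text, boundary) {
  const start = boundary.text_offset;
  const end = start + boundary.word_length;
  return {
    before: text.slice(0, start),
    word: text.slice(start, end),
    after: text.slice(end),
  };
}

// Return the last boundary that has started at timeMs,
// or undefined when timeMs is before the first word.
function findActiveBoundary(boundaries, timeMs) {
  let active;
  for (const b of boundaries) {
    if (b.offset_ms > timeMs) break; // relies on ascending offset_ms
    active = b;
  }
  return active;
}
```

A renderer could wrap the `word` part in a highlighted element and leave `before` and `after` as plain text.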

tts_done:

{
  "sentences_sent": 3,
  "total_duration_ms": 7500,
  "total_characters_used": 142
}

Field	Type	Description
`sentences_sent`	int	The number of sentences actually sent
`total_duration_ms`	int	The total audio duration of all sentences (milliseconds)
`total_characters_used`	int	The total number of characters synthesized in this TTS request (used for quota calculation)

Error Responses

Error Code	HTTP Status	Description	Recommended Handling
`recording_not_found`	404	Recording not found	Confirm the taskId is correct
`sse_missing_target_lang`	422	Missing language parameter	Provide the language parameter
`sse_unsupported_language`	422	Unsupported language	Use a valid language code
`tts_translation_not_found`	400	Translation for that language not found	Confirm the translation for that language exists
`tts_synthesis_failed`	500	TTS synthesis failed	Retry later
`tts_quota_exceeded`	402	TTS usage limit reached	Retry later

Frontend Example

// Use the fetch API to handle TTS SSE
async function playTTS(taskId, language, apiKey, startSid = 1, length = 1) {
  const url = new URL(`https://vas-poc.vurbo.ai/api/v1/sse/tts/${taskId}`);
  url.searchParams.set('language', language);
  url.searchParams.set('sid', startSid);
  url.searchParams.set('length', length);

  const response = await fetch(url, {
    headers: {
      'X-API-Key': apiKey
    }
  });

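  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // A network chunk can end mid-event, so buffer until a blank line
    // closes the event. parseSSE is an application-provided helper that
    // returns [{ type, data }] for a string of complete events.
    buffer += decoder.decode(value, { stream: true });
    const end = buffer.lastIndexOf('\n\n');
    if (end === -1) continue;
    const events = parseSSE(buffer.slice(0, end + 2));
    buffer = buffer.slice(end + 2);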
    for (const event of events) {
      if (event.type === 'connected') {
        console.log(`TTS connection successful, voice: ${event.data.voice}`);
      } else if (event.type === 'tts_audio') {
        console.log(`Sentence ${event.data.sid}: ${event.data.text}`);

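        // Play the audio (audio/mpeg is the registered MIME type for MP3)
        const audioBlob = base64ToBlob(event.data.audio, 'audio/mpeg');
        const audioUrl = URL.createObjectURL(audioBlob);
        const audio = new Audio(audioUrl);
        // Release the object URL once playback finishes
        audio.addEventListener('ended', () => URL.revokeObjectURL(audioUrl));

        // Set up the karaoke effect
        setupKaraoke(audio, event.data.boundaries, event.data.text);

        // Note: playback starts immediately; when length > 1, queue the
        // sentences instead so that they do not overlap
        audio.play();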
      } else if (event.type === 'tts_done') {
        console.log(`Playback complete, ${event.data.sentences_sent} sentences total`);
      }
    }
  }
}

// Base64 to Blob
function base64ToBlob(base64, mimeType) {
  // atob yields a binary string; map each character to its byte value
  const bytes = Uint8Array.from(atob(base64), (c) => c.charCodeAt(0));
  return new Blob([bytes], { type: mimeType });
}

// Karaoke effect
function setupKaraoke(audio, boundaries, text) {
  const updateHighlight = () => {
    const currentTimeMs = audio.currentTime * 1000;
    const currentWord = boundaries.find((b, i) => {
      const nextOffset = boundaries[i + 1]?.offset_ms ?? Infinity;
      return currentTimeMs >= b.offset_ms && currentTimeMs < nextOffset;
    });

    if (currentWord) {
      // Highlight the current word (highlightWord is an application-provided UI helper)
      highlightWord(text, currentWord.text_offset, currentWord.word_length);
    }
  };

  const interval = setInterval(updateHighlight, 50);
  audio.addEventListener('ended', () => clearInterval(interval));
}
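
The example above calls a `parseSSE` helper that this document does not define. One possible shape is sketched below, under these assumptions: the input contains only complete events, every `data:` payload is JSON (as in the events documented here), and the helper returns `{ type, data }` objects. It is not the project's actual implementation.

```javascript
// Parse a string of complete SSE events into { type, data } objects.
function parseSSE(chunk) {
  const events = [];
  // Events are separated by a blank line
  for (const block of chunk.split(/\r?\n\r?\n/)) {
    let type = 'message'; // SSE default when no "event:" line is present
    const dataLines = [];
    for (const line of block.split(/\r?\n/)) {
      if (line.startsWith('event:')) {
        type = line.slice(6).trim();
      } else if (line.startsWith('data:')) {
        // Drop the single optional space after the colon
        dataLines.push(line.slice(5).replace(/^ /, ''));
      }
    }
    if (dataLines.length === 0) continue; // comment-only or empty block
    // Multiple data lines are joined with newlines before parsing
    events.push({ type, data: JSON.parse(dataLines.join('\n')) });
  }
  return events;
}
```

`JSON.parse` throws on a malformed payload, so production code may want to wrap the call in a try/catch and skip or report the offending event.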

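GET /api/v1/sse/imports/{importId}/progress (Import Progress Stream)

Description

Streams the processing progress of an audio import task over SSE until the import completes, fails, or times out.

Use Cases

Showing a real-time processing progress bar after uploading an audio file
Tracking the progress of each stage: audio conversion, transcription, translation, summary, etc.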

Authentication

Header: X-API-Key: YOUR_API_KEY

Request Parameters

Parameter	Type	Required	Description
`importId`	string	Yes	Import task ID (UUID, path parameter)

Request Example

curl -N "https://vas-poc.vurbo.ai/api/v1/sse/imports/550e8400-e29b-41d4-a716-446655440000/progress" \
  -H "X-API-Key: vas_aB3dE5fG7hI9jK1lM3nO5pQ7rS9tU1vW"

Event Sequence

Scenario 1: Import still in progress
1. connected       → Connection confirmation
2. progress        → Send the current progress
3. progress ×N     → Continuously pushed when progress changes
   heartbeat ×N    → Sent every 15 seconds when there is no progress change
4. completed       → Import succeeded, connection ends
   or failed       → Import failed, connection ends
   or timeout      → Exceeded 15 minutes, connection ends

Scenario 2: Import already complete (terminal state)
1. connected       → Connection confirmation
2. progress        → Send the final progress
3. completed       → Send the completed event directly and end
   or failed       → Send the failed event directly and end

Event Format

connected:

{"message": "Import progress service connected (importId: xxx)"}

progress:

{
  "import_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "processing",
  "stage": "transcribing",
  "progress": 45,
  "message": "Transcribing..."
}

Field	Type	Description
`import_id`	string	Import task ID (UUID)
`status`	string	Import status: `pending` / `processing` / `completed` / `failed`
`stage`	string / null	The current processing stage
`progress`	integer	Progress percentage (0-100)
`message`	string	Human-readable progress message

Stage values and their corresponding progress ranges:

Value	Description	Progress Range
`converting`	Audio format conversion	0% - 10%
`transcribing`	Speech-to-text	10% - 60%
`translating`	Text translation	60% - 85%
`summarizing`	Generating the summary	85% - 100%
`null`	Not started yet	—
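
The ranges above make it possible to show progress within the current stage in addition to the overall percentage. The sketch below is one way to do that arithmetic; `STAGE_RANGES` and `stageProgress` are hypothetical names, and the ranges are copied from the table.

```javascript
// Overall progress range [start, end] for each documented stage
const STAGE_RANGES = new Map([
  ['converting', [0, 10]],
  ['transcribing', [10, 60]],
  ['translating', [60, 85]],
  ['summarizing', [85, 100]],
]);

// Convert the overall percentage into a 0-100 percentage within the stage.
// Returns null when the stage is null (not started) or unknown.
function stageProgress(stage, progress) {
  const range = STAGE_RANGES.get(stage);
  if (!range) return null;
  const [start, end] = range;
  // Clamp in case the reported progress falls outside the stage's range
  const clamped = Math.min(Math.max(progress, start), end);
  return Math.round(((clamped - start) / (end - start)) * 100);
}
```

For the `progress` event shown earlier (`transcribing` at 45), this yields 70, that is, 35 of the 50 points allotted to transcription.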

completed:

{
  "import_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "task_id": "abc123-e29b-41d4-a716-446655440000",
  "message": "Processing complete"
}

Field	Type	Description
`import_id`	string	Import task ID
`status`	string	Fixed as `completed`
`task_id`	string	The generated recording ID (recording_id), which can be used for subsequent queries
`message`	string	Fixed as `Processing complete`

failed:

{
  "import_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "failed",
  "error_code": "import_invalid_format",
  "error_message": "Unsupported audio format"
}

Field	Type	Description
`import_id`	string	Import task ID
`status`	string	Fixed as `failed`
`error_code`	string	Error code
`error_message`	string	Human-readable error message

heartbeat:

Sent every 15 seconds when there is no progress change, used to keep the connection alive.

{"timestamp": 1708761600}
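
Because some event arrives at least every 15 seconds, a client can treat a long silence as a sign that the connection has dropped. The sketch below tracks the time of the last event; `createStaleDetector` is a hypothetical helper, and the 45-second default (three missed heartbeats) is an assumption, not a documented value. The clock is injectable so the logic can be tested without waiting.

```javascript
// Track when the last event arrived and report when the stream looks stale.
function createStaleDetector(thresholdMs = 45000, now = Date.now) {
  let lastEventAt = now();
  return {
    // Call on every received event, including heartbeat
    touch() {
      lastEventAt = now();
    },
    // True when no event has arrived for longer than the threshold
    isStale() {
      return now() - lastEventAt > thresholdMs;
    },
  };
}
```

A client could call `touch()` inside its event loop and poll `isStale()` on a timer, reconnecting when it returns true.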

timeout:

Sent when the import has not completed after 15 minutes; the connection ends automatically.

{"message": "Connection timeout"}

Error Responses

Error Code	HTTP Status	Description	Recommended Handling
`import_not_found`	404	The specified import task was not found	Confirm the importId is correct

Frontend Example

async function trackImportProgress(importId, apiKey) {
  const response = await fetch(
    `https://vas-poc.vurbo.ai/api/v1/sse/imports/${importId}/progress`,
    {
      headers: {
        'X-API-Key': apiKey
      }
    }
  );

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    buffer += decoder.decode(value, { stream: true });
    const events = buffer.split('\n\n');
    buffer = events.pop();

    for (const eventStr of events) {
      if (!eventStr.trim()) continue;

      const lines = eventStr.split('\n');
      let eventType = '';
      let eventData = '';

      for (const line of lines) {
        if (line.startsWith('event: ')) eventType = line.slice(7);
        else if (line.startsWith('data: ')) eventData = line.slice(6);
      }

      if (!eventType || !eventData) continue;
      const data = JSON.parse(eventData);

      switch (eventType) {
        case 'connected':
          console.log('Connected:', data.message);
          break;
        case 'progress':
          console.log(`[${data.stage}] ${data.progress}% - ${data.message}`);
          updateProgressBar(data.progress, data.stage, data.message);
          break;
        case 'completed':
          console.log('Import complete! Recording ID:', data.task_id);
          navigateToRecording(data.task_id);
          break;
        case 'failed':
          console.error('Import failed:', data.error_code, data.error_message);
          showError(data.error_message);
          break;
        case 'timeout':
          console.warn('Connection timeout:', data.message);
          break;
      }
    }
  }
}

Version: V1.5.7 Last Updated: 2026-05-20
