SSE API
Note: The URL used in this document (vas-poc.vurbo.ai) is the planned deployment URL. The official URL will be announced separately after launch.
Table of Contents
- SSE API
- Table of Contents
- Connection Information
- Broadcast SSE API
- GET /api/v1/sse/history/transcribe/{taskId} (Retrieve Conversation History)
- GET /api/v1/sse/retranslate/{taskId} (Retranslate Full Transcript)
- GET /api/v1/sse/recordings/{taskId}/entries/{sid}/retranslate (Single-Sentence Retranslation, added in v1.4.0)
- GET /api/v1/sse/retranslate/summary/{taskId} (Retranslate Summary)
- Regenerate Summary (GET Preview / POST Save)
- GET /api/v1/sse/audio/{taskId} (Audio Streaming Playback) — see the dedicated spec
- GET /api/v1/sse/tts/{taskId} (TTS Audio Stream)
- GET /api/v1/sse/imports/{importId}/progress (Import Progress Stream)
Connection Information
| Item | Value |
|---|---|
| Base Path | https://vas-poc.vurbo.ai/api/v1/sse |
| Protocol | HTTP + Server-Sent Events (SSE) |
| Data Format | text/event-stream |
| Authentication | Header X-API-Key: {KEY} |
Authentication
SSE APIs that require authentication accept two delivery methods (the backend VerifyApiKeyQuery middleware supports both):
# Method A: HTTP Header (recommended, better security)
X-API-Key: vas_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
# Method B: Query string (native browser EventSource fallback)
?api_key=vas_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Note: The native browser EventSource API does not support custom headers. You can use the ?api_key= query string instead, or use the fetch API with a ReadableStream, or an SSE client library that supports headers. In query string mode, the API key appears in the URL, so avoid writing the full URL into server logs or leaking it in screenshots.
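The query-string fallback and the logging caution above can be combined in a small pair of helpers. This is a minimal sketch: buildSseUrl and redactApiKey are hypothetical names, not part of the API; only the api_key parameter and the base path come from this document.

```javascript
// Hypothetical helpers; only the api_key parameter name and the base path are documented.
const BASE_URL = 'https://vas-poc.vurbo.ai/api/v1/sse';

// Build a URL for the native EventSource, which cannot send an X-API-Key header.
function buildSseUrl(path, apiKey, params = {}) {
  const url = new URL(BASE_URL + path);
  for (const [key, value] of Object.entries(params)) {
    if (value !== undefined && value !== null) {
      url.searchParams.set(key, String(value));
    }
  }
  url.searchParams.set('api_key', apiKey);
  return url.toString();
}

// Mask the key before the URL is written to a log or shown on screen.
function redactApiKey(urlString) {
  const url = new URL(urlString);
  if (url.searchParams.has('api_key')) {
    url.searchParams.set('api_key', 'REDACTED');
  }
  return url.toString();
}
```

Pass the result of buildSseUrl to new EventSource(...), and log only the output of redactApiKey.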
Broadcast SSE API
The Broadcast SSE API provides a live subtitle streaming feature, allowing viewers to watch real-time transcription and translation content through a share link.
Note: The base path for Broadcast SSE is https://vas-poc.vurbo.ai/broadcast, which differs from the other SSE APIs.
GET /broadcast/{token}/text (Viewer Live Subtitle Stream)
Description
Viewers connect using a share token to receive an SSE stream of real-time transcription and translation.
Use Cases
- Viewers watching live subtitles
- Multilingual translation subtitle display
- TTS audio playback
Authentication
Token authentication (no API key required): verified through the {token} in the URL path.
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
token | string | Yes | Broadcast share token (4-character short code, a-z0-9, path parameter) |
lang | string | No | Filter for a specific translation language (e.g., en-US) |
tts | boolean | No | Whether to enable TTS (true / false, default false) |
viewer_access_token | string | Conditional | Viewer access token (required for password-protected broadcasts) |
Password Protection Note: When a broadcast is set to password-protected, viewers must first obtain a viewer_access_token through the password verification API, then include this token in the query parameters of the SSE connection.
Request Example
// Receive all languages
const allLanguages = new EventSource(
'https://vas-poc.vurbo.ai/broadcast/a3f9/text'
);
// Receive only English translation
const englishOnly = new EventSource(
'https://vas-poc.vurbo.ai/broadcast/a3f9/text?lang=en-US'
);
// Receive English translation and enable TTS
const englishWithTts = new EventSource(
'https://vas-poc.vurbo.ai/broadcast/a3f9/text?lang=en-US&tts=true'
);
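When the query parameters vary per viewer, the URL can be assembled with URLSearchParams so that values are encoded consistently. The sketch below is illustrative: buildBroadcastUrl and its option names are hypothetical, while the parameter names (lang, tts, viewer_access_token) and the token format come from the table above.

```javascript
const BROADCAST_BASE = 'https://vas-poc.vurbo.ai/broadcast';

// buildBroadcastUrl is a hypothetical helper, not part of the API.
function buildBroadcastUrl(token, { lang, tts = false, viewerAccessToken } = {}) {
  // The token is documented as a 4-character short code using a-z0-9.
  if (!/^[a-z0-9]{4}$/.test(token)) {
    throw new Error(`Invalid broadcast token: ${token}`);
  }
  const url = new URL(`${BROADCAST_BASE}/${token}/text`);
  if (lang) url.searchParams.set('lang', lang);
  if (tts) url.searchParams.set('tts', 'true');
  // Only needed for password-protected broadcasts.
  if (viewerAccessToken) {
    url.searchParams.set('viewer_access_token', viewerAccessToken);
  }
  return url.toString();
}
```

For example, buildBroadcastUrl('a3f9', { lang: 'en-US', tts: true }) produces the same URL as the third request example.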
Event Types
| Event | Description | Notes |
|---|---|---|
connected | Connection confirmation | - |
queued | Added to the waiting queue | Queueing mechanism |
admitted | Entered live from the queue | Queueing mechanism |
origin | Original text (STT) | - |
translation | Translation result | - |
tts_ready | TTS audio ready | - |
paused | Broadcast paused | Host paused or disconnected |
resumed | Broadcast resumed | Host resumed |
ended | Broadcast ended | - |
kicked | Removed | Viewer management |
error | Error | - |
speaker_renamed | Speaker renamed | - |
speaker_reassigned | Single-sentence speaker change | - |
speakers_merged | Speakers merged | - |
standby | Standby phase notification | - |
phase_changed | Phase change notification | - |
announcement | Host announcement | - |
Event Format
connected:
{
"session_id": "abc123",
"source_lang": "zh-TW",
"subscribed_lang": "en-US",
"available_langs": ["en-US", "ja-JP"],
"tts_languages": ["en-US"],
"phase": "standby",
"recognition_mode": "single",
"client_id": "client_xyz"
}
| Field | Type | Description |
|---|---|---|
session_id | string | Broadcast session ID |
source_lang | string | Original language (set by the host) |
subscribed_lang | string | The filter language the viewer subscribed to (null if not specified) |
available_langs | array | List of available translation languages |
tts_languages | array | List of languages with TTS enabled (an empty array means no TTS) |
phase | string | Broadcast phase: standby (preparing) or live (active) |
recognition_mode | string | Recognition mode: single (single speaker) or multi_speaker (multi-speaker diarization) |
client_id | string | Client ID |
queued:
{
"position": 3,
"estimated_wait": "About 2 minutes"
}
| Field | Type | Description |
|---|---|---|
position | number | Position in the queue (1 = next up) |
estimated_wait | string | Estimated wait time |
admitted:
{
"message": "Entered live"
}
origin:
{
"sid": 1,
"text": "Hello everyone",
"speaker_id": "Guest-1",
"speaker_label": "Guest-1",
"start_time": "00:05",
"is_final": true
}
| Field | Type | Description |
|---|---|---|
sid | number | Sentence ID |
text | string | Original text content |
speaker_id | string | Original speaker ID (immutable; "0" in single-speaker mode, or "1"/"2" in conversation mode) |
speaker_label | string | Display label (after applying speaker_aliases; equals speaker_id when no alias exists) |
start_time | string | Start time (mm:ss); this field is not sent during the standby phase, and counts from 00:00 once live |
is_final | boolean | Whether this is the final result |
translation:
{
"sid": 1,
"language": "en-US",
"text": "Hello everyone",
"speaker_id": "Guest-1",
"speaker_label": "Royx",
"is_final": true
}
| Field | Type | Description |
|---|---|---|
sid | number | The corresponding sentence ID |
language | string | Translation language |
text | string | Translated content |
speaker_id | string | Original speaker ID (multi-speaker conversation mode; immutable) |
speaker_label | string | Display label (after applying speaker_aliases) |
is_final | boolean | Whether this is the final result |
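Because origin and translation events share a sid, a viewer can keep one record per sentence and update it as events arrive. The following is a minimal sketch of such a store. createSubtitleStore and its record shape are hypothetical, and it assumes that a later payload for the same sid (for example one with is_final: true) replaces the earlier text.

```javascript
// Hypothetical client-side store keyed by sentence ID (sid).
function createSubtitleStore() {
  const sentences = new Map();

  // Create the record on first use so events may arrive in any order.
  function entry(sid) {
    if (!sentences.has(sid)) {
      sentences.set(sid, { sid, origin: null, translations: {} });
    }
    return sentences.get(sid);
  }

  return {
    // Apply the payload of an origin event.
    applyOrigin(data) {
      const s = entry(data.sid);
      s.origin = { text: data.text, isFinal: data.is_final };
      s.speakerId = data.speaker_id;
      s.speakerLabel = data.speaker_label;
      // start_time is not sent during the standby phase.
      if (data.start_time !== undefined) s.startTime = data.start_time;
    },
    // Apply the payload of a translation event.
    applyTranslation(data) {
      entry(data.sid).translations[data.language] = {
        text: data.text,
        isFinal: data.is_final,
      };
    },
    get(sid) {
      return sentences.get(sid);
    },
  };
}
```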
tts_ready:
{
"sid": 1,
"language": "en-US",
"transcript": "Hello, hi everyone",
"text": "Hello everyone",
"audio": "//uQxAAAAAANIAAAAAExBTUUzLjEwMFVVVV...",
"format": "mp3",
"duration_ms": 2340,
"boundaries": [
{"offset_ms": 0, "duration_ms": 320, "text": "Hello", "text_offset": 0, "word_length": 5},
{"offset_ms": 320, "duration_ms": 280, "text": "everyone", "text_offset": 6, "word_length": 8}
]
}
| Field | Type | Description |
|---|---|---|
sid | number | The corresponding sentence ID |
language | string | TTS language |
transcript | string | Original transcript (source text) |
text | string | Translated text |
audio | string | Base64-encoded MP3 audio |
format | string | Audio format, fixed as "mp3" |
duration_ms | number | Audio duration (milliseconds) |
boundaries | array | Word boundaries (optional, see table below) |
Word boundary fields (each object in the boundaries array):
| Field | Type | Description |
|---|---|---|
offset_ms | number | The word's start time in the audio (ms) |
duration_ms | number | The word's pronunciation duration (ms) |
text | string | The word text |
text_offset | number | The word's starting position in the text |
word_length | number | The word's character length |
Note:
- The host must specify which languages enable TTS via the tts_config parameter in the start command
- Only viewers who subscribed to that language and enabled TTS will receive this event
- It is sent only during the live phase; no TTS is sent during the standby phase
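The boundaries array can drive word highlighting while the audio plays. The sketch below assumes that text_offset and word_length index into the text field, which matches the example payload but is not stated explicitly; activeBoundaryAt and highlightRange are hypothetical helper names.

```javascript
// Return the boundary being spoken at timeMs, or null if none matches.
function activeBoundaryAt(boundaries, timeMs) {
  if (!Array.isArray(boundaries)) return null; // boundaries is optional
  for (const b of boundaries) {
    if (timeMs >= b.offset_ms && timeMs < b.offset_ms + b.duration_ms) {
      return b;
    }
  }
  return null;
}

// Convert a boundary into a [start, end) character range within the text field.
function highlightRange(boundary) {
  return {
    start: boundary.text_offset,
    end: boundary.text_offset + boundary.word_length,
  };
}
```

In a browser, timeMs would typically be audio.currentTime * 1000, sampled from a timeupdate listener or an animation frame callback.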
paused:
{
"reason": "host_paused",
"message": "Live broadcast is paused",
"paused_at": "2025-12-23T10:30:45.123Z"
}
| Field | Type | Description |
|---|---|---|
reason | string | Pause reason: host_paused / host_disconnected |
message | string | Notification message |
paused_at | string | Pause time (ISO 8601) |
resumed:
{
"message": "Live broadcast resumed",
"resumed_at": "2025-12-23T10:32:15.456Z"
}
ended:
{
"reason": "session_stopped",
"duration_ms": 3600000
}
| Field | Type | Description |
|---|---|---|
reason | string | End reason |
duration_ms | number | Broadcast duration (milliseconds) |
End reasons:
| reason | Description |
|---|---|
session_stopped | Host ended normally |
token_revoked | Token was revoked |
host_timeout | Host disconnection timeout |
capacity_exceeded | Queue timeout |
kicked:
{
"message": "Removed by host"
}
error:
{
"error_code": "broadcast_session_ended",
"severity": "error",
"message": "Broadcast session ended",
"context": "broadcast",
"request_id": "req_abc123xyz789",
"timestamp": "2025-12-05T10:30:45.123Z"
}
Sentence-level errors (such as a translation failure for a specific language) additionally carry sid and translation_language, making it easy for the frontend to flag which language failed for a given sentence:
{
"error_code": "llm_content_filtered",
"severity": "warning",
"message": "Content filtered",
"context": "translation",
"sid": 5,
"translation_language": "ja",
"request_id": "req_abc123xyz789",
"timestamp": "2026-04-26T10:30:45.123Z"
}
| Field | Type | Description |
|---|---|---|
error_code | string | Error code |
severity | string | Severity: warning / error / fatal |
message | string | Error message |
context | string | The context in which the error occurred (e.g., broadcast, translation) |
sid | int | Optional. The sentence number for a sentence-level error (e.g., when that sentence's translation fails) |
translation_language | string | Optional. The target language that failed to translate (viewers can use this to determine whether a specific language failed for that sentence) |
request_id | string | Request tracking ID |
timestamp | string | Time the error occurred (ISO 8601) |
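A client usually needs to tell sentence-level errors apart from session-level ones before deciding what to show. The following sketch keys that decision on the presence of sid, as described above. describeErrorEvent and the returned shape are hypothetical.

```javascript
// Classify a parsed error payload as sentence-level or session-level.
function describeErrorEvent(err) {
  const sentenceLevel = typeof err.sid === 'number';
  return {
    scope: sentenceLevel ? 'sentence' : 'session',
    sid: sentenceLevel ? err.sid : null,
    language: err.translation_language ?? null,
    severity: err.severity,
    code: err.error_code,
  };
}
```

A sentence-scoped result can be used to mark one language of one subtitle as failed, while a session-scoped result typically calls for a banner or a reconnect decision.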
speaker_renamed:
Multi-speaker conversation mode only. Sent when the host performs a global speaker rename.
{
"speaker_id": "Guest-1",
"new_label": "Royx",
"affected_sids": [1, 3, 5, 7]
}
| Field | Type | Description |
|---|---|---|
speaker_id | string | The resolved original speaker ID (even if the input is a display label, the event returns the original ID) |
new_label | string | New display label (e.g., Royx) |
affected_sids | array | List of affected sentence IDs |
speaker_reassigned:
Multi-speaker conversation mode only. Sent when the host changes the speaker of a single sentence.
{
"sid": 3,
"old_speaker_id": "Guest-1",
"new_speaker_id": "Guest-2",
"new_speaker_label": "Amy"
}
| Field | Type | Description |
|---|---|---|
sid | number | The sentence ID that was modified |
old_speaker_id | string | Original speaker ID (e.g., Guest-1) |
new_speaker_id | string | The new original speaker ID (e.g., Guest-2) |
new_speaker_label | string | New speaker display label (after applying speaker_aliases; equals the original ID when no alias exists) |
speakers_merged:
Multi-speaker conversation mode only. Sent when the host merges speakers. After merging, all sentences belonging to the source speaker are reassigned to the target speaker.
{
"source_speaker_id": "Guest-2",
"target_speaker_id": "Guest-1",
"target_speaker_label": "Manager Wang",
"affected_sids": [3, 5, 7]
}
| Field | Type | Description |
|---|---|---|
source_speaker_id | string | The original speaker ID being merged (e.g., Guest-2) |
target_speaker_id | string | The original speaker ID of the merge target (e.g., Guest-1) |
target_speaker_label | string | Target speaker display label (after applying speaker_aliases; equals the original ID when no alias exists) |
affected_sids | array | List of affected sentence IDs |
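The three speaker events (speaker_renamed, speaker_reassigned, speakers_merged) can be applied to locally cached sentences with one function. This is a sketch under the assumption that the client keeps a Map from sid to a record with speakerId and speakerLabel; that shape and the name applySpeakerEvent are hypothetical.

```javascript
// sentences: Map<sid, { speakerId, speakerLabel }> kept by the client (hypothetical shape).
function applySpeakerEvent(sentences, type, data) {
  switch (type) {
    case 'speaker_renamed':
      // Only the display label changes; the original speaker ID is immutable.
      for (const sid of data.affected_sids) {
        const s = sentences.get(sid);
        if (s) s.speakerLabel = data.new_label;
      }
      break;
    case 'speaker_reassigned': {
      const s = sentences.get(data.sid);
      if (s) {
        s.speakerId = data.new_speaker_id;
        s.speakerLabel = data.new_speaker_label;
      }
      break;
    }
    case 'speakers_merged':
      for (const sid of data.affected_sids) {
        const s = sentences.get(sid);
        if (s) {
          s.speakerId = data.target_speaker_id;
          s.speakerLabel = data.target_speaker_label;
        }
      }
      break;
    default:
      break; // not a speaker event
  }
  return sentences;
}
```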
standby:
When a viewer connects during the standby phase, this event is received immediately after the connected event, indicating that the broadcast has not yet officially started. The host can dynamically update the standby message via the WebSocket set_standby_message action; after the update, all viewers receive a new standby event.
{
"message": "The presentation is about to begin, please wait...",
"translations": {
"en-US": "The presentation is about to begin, please wait...",
"ja-JP": "プレゼンテーションがまもなく始まります。お待ちください..."
}
}
| Field | Type | Description |
|---|---|---|
message | string | The message displayed during the standby phase (original text) |
translations | object | Translation results (optional); the key is the language code and the value is the translated text |
phase_changed:
Sent when the broadcast switches from the standby phase to the active phase.
{
"phase": "live",
"message": "Broadcast has started"
}
| Field | Type | Description |
|---|---|---|
phase | string | The new phase: live (active phase) |
message | string | Phase change message |
announcement:
An announcement message sent by the host; all viewers receive it.
{
"message": "The meeting will end in 5 minutes",
"translations": {
"en-US": "The meeting will end in 5 minutes",
"ja-JP": "会議は5分後に終了します"
}
}
| Field | Type | Description |
|---|---|---|
message | string | The announcement content (original text) |
translations | object | Translation results (optional); the key is the language code and the value is the translated text |
Heartbeat Mechanism
The SSE connection uses a heartbeat to keep the connection alive:
- Interval: 15 seconds
- Format: SSE comment (starting with :)
- The frontend does not need to handle it; the browser automatically ignores it
: heartbeat
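A client that reads the stream with fetch instead of EventSource sees heartbeat comments as ordinary bytes and can use them to detect a stalled connection. The sketch below is one possible check. The threshold of three missed heartbeats is an assumption, and isHeartbeatLine and isStreamStale are hypothetical helpers.

```javascript
const HEARTBEAT_INTERVAL_MS = 15000; // interval stated in this document

// An SSE comment line starts with a colon.
function isHeartbeatLine(line) {
  return line.startsWith(':');
}

// Assumption: treat the stream as stale after three missed heartbeats.
function isStreamStale(lastActivityMs, nowMs, missedBeats = 3) {
  return nowMs - lastActivityMs > HEARTBEAT_INTERVAL_MS * missedBeats;
}
```

Record Date.now() whenever a chunk arrives, then call isStreamStale on a timer to decide when to abort and reconnect.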
Error Responses
| Error Code | HTTP Status | Description | Recommended Handling |
|---|---|---|---|
broadcast_session_not_found | 404 | Broadcast not found | Confirm the token is correct |
broadcast_session_ended | 410 | Broadcast ended | Notify the user that the broadcast has ended |
broadcast_capacity_exceeded | 503 | Viewer capacity reached | Join the waiting queue |
Note: If an SSE endpoint encounters an unexpected internal exception, it may return internal_error (the same per-message panic recovery mechanism as WebSocket); expected domain errors return the corresponding error code (e.g., sse_translation_failed).
Frontend Example
function connectBroadcast(token, lang = null) {
let url = `https://vas-poc.vurbo.ai/broadcast/${token}/text`;
if (lang) {
url += `?lang=${encodeURIComponent(lang)}`;
}
const eventSource = new EventSource(url);
eventSource.addEventListener('connected', (e) => {
const data = JSON.parse(e.data);
console.log(`Connected, original language: ${data.source_lang}`);
console.log(`Available translations: ${data.available_langs.join(', ')}`);
});
eventSource.addEventListener('queued', (e) => {
const data = JSON.parse(e.data);
console.log(`In queue, position: ${data.position}, estimated wait: ${data.estimated_wait}`);
});
eventSource.addEventListener('admitted', (e) => {
console.log('Entered live');
});
eventSource.addEventListener('origin', (e) => {
const data = JSON.parse(e.data);
console.log(`[${data.start_time}] ${data.text}`);
});
eventSource.addEventListener('translation', (e) => {
const data = JSON.parse(e.data);
console.log(`Translation (${data.language}): ${data.text}`);
});
eventSource.addEventListener('tts_ready', (e) => {
const data = JSON.parse(e.data);
// Decode the Base64 audio and play it
const byteCharacters = atob(data.audio);
const byteNumbers = new Array(byteCharacters.length);
for (let i = 0; i < byteCharacters.length; i++) {
byteNumbers[i] = byteCharacters.charCodeAt(i);
}
const blob = new Blob([new Uint8Array(byteNumbers)], { type: 'audio/mpeg' });
const audio = new Audio(URL.createObjectURL(blob));
audio.play();
});
eventSource.addEventListener('paused', (e) => {
const data = JSON.parse(e.data);
console.log(`Broadcast paused: ${data.message}`);
});
eventSource.addEventListener('resumed', (e) => {
console.log('Broadcast resumed');
});
eventSource.addEventListener('ended', (e) => {
const data = JSON.parse(e.data);
console.log(`Broadcast ended, reason: ${data.reason}`);
eventSource.close();
});
eventSource.addEventListener('kicked', (e) => {
console.log('You have been removed');
eventSource.close();
});
eventSource.addEventListener('speaker_renamed', (e) => {
const data = JSON.parse(e.data);
console.log(`Speaker renamed: ${data.speaker_id} → ${data.new_label}`);
console.log(`Affected sentences: ${data.affected_sids.join(', ')}`);
// Update the speaker display name for all affected sentences
});
eventSource.addEventListener('speaker_reassigned', (e) => {
const data = JSON.parse(e.data);
console.log(`Speaker of sentence ${data.sid} changed from ${data.old_speaker_id} to: ${data.new_speaker_label}`);
// Update the speaker display name for that sentence
});
eventSource.addEventListener('standby', (e) => {
const data = JSON.parse(e.data);
// Display the translation matching the viewer's selected language
const displayLang = 'en-US'; // The language the viewer selected
const displayMessage = data.translations?.[displayLang] || data.message;
console.log(`Standby phase: ${displayMessage}`);
// Show the waiting screen
});
eventSource.addEventListener('phase_changed', (e) => {
const data = JSON.parse(e.data);
console.log(`Phase changed: ${data.phase} - ${data.message}`);
// Remove the waiting screen and start displaying subtitles
});
eventSource.addEventListener('announcement', (e) => {
const data = JSON.parse(e.data);
// Display the translation matching the viewer's selected language
const displayLang = 'en-US'; // The language the viewer selected
const displayMessage = data.translations?.[displayLang] || data.message;
console.log(`Announcement: ${displayMessage}`);
// Show the announcement message
});
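eventSource.addEventListener('error', (e) => {
if (e.data) {
const error = JSON.parse(e.data);
console.error(`Error [${error.error_code}]: ${error.message}`);
// A sentence-level warning (e.g., one language failed for one sentence) should not end the stream
if (error.severity === 'warning') return;
}
eventSource.close();
});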
return eventSource;
}
REST API also available: For the endpoint to query broadcast information, GET /broadcast/{token}/info, see REST API - Broadcasts API.
GET /api/v1/sse/history/transcribe/{taskId} (Retrieve Conversation History)
Description
Loads the complete conversation history for the specified task, including all sentences and the summary. Delivered one item at a time via an SSE stream.
Use Cases
- Viewing the recording details page
- Loading the historical transcript
Authentication
Header: X-API-Key: YOUR_API_KEY
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
taskId | string | Yes | Recording ID (path parameter) |
Request Example
// Use the fetch API (because EventSource does not support headers)
async function connectSSE(taskId, apiKey) {
const response = await fetch(
`https://vas-poc.vurbo.ai/api/v1/sse/history/transcribe/${taskId}`,
{
headers: {
'X-API-Key': apiKey
}
}
);
const reader = response.body.getReader();
// ... handle SSE events
}
Event Sequence
1. connected → Connection confirmation
2. init_metadata → Send task metadata
3. init_sentence → Send sentences one at a time (repeated N times)
4. init_summary → Send the summary
5. init_done → Initialization complete
Event Format
connected:
{"message": "History service connected (recordingId: xxx)"}
init_metadata:
{
"task_id": "550e8400-e29b-41d4-a716-446655440000",
"title": "Meeting Notes",
"created_at": "2025-12-17T10:00:00Z",
"type": "transcribe",
"has_speaker_diarization": true,
"transcription_languages": ["zh-TW"],
"translation_languages": ["en-US"],
"summary_template": "general",
"summary_language": "zh-TW",
"speaker_aliases": {"speaker_1": "Manager Wang"}
}
speaker_aliases is a mapping of "original speaker ID → display name"; it is {} (an empty object, not an array) when there are no aliases. The frontend can use this mapping to run a duplicate-name pre-check before renaming a speaker (added in v1.3.12).
init_sentence:
{
"sid": 1,
"origin": "Hello, nice to meet you",
"translations": {
"en-US": "Hello, nice to meet you"
},
"start_time": "00:05",
"speaker_id": "speaker_1",
"speaker_label": "Manager Wang"
}
If a sentence has a translation failure, it additionally carries a translation_errors field (only present when there is a failure), so the frontend can distinguish between "that language was not scheduled for translation" (the key is missing from translations) and "translated but failed" (the key is present in translation_errors):
{
"sid": 5,
"origin": "Sentence with sensitive words",
"translations": {
"en-US": "Sensitive sentence"
},
"translation_errors": {
"ja": "llm_content_filtered"
},
"start_time": "00:25",
"speaker_id": "speaker_1",
"speaker_label": "Manager Wang"
}
| Field | Type | Description |
|---|---|---|
sid | int | Sentence number |
origin | string | Original text |
translations | object | Translation results (optional); the key is the language code and the value is the translated text |
translation_errors | object | Optional. Translation failure error codes; the key is the language code and the value is the error_code (e.g., llm_content_filtered) |
start_time | string | Start time (mm:ss format) |
speaker_id | string or null | Original speaker ID (immutable, e.g., speaker_1); the source for target_speaker_id in PATCH /speakers/reassign (flipped in v1.5.3: previously the display name) |
speaker_label | string or null | Display label (the human-readable name after applying speaker_aliases, e.g., Manager Wang); equals speaker_id when no alias exists (added in v1.5.3 to replace the original speaker_id display semantics) |
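The distinction between "not scheduled" and "translated but failed" can be expressed as a small lookup over the two objects. This is a sketch; translationStatus and its return shape are hypothetical.

```javascript
// Resolve the state of one language for one init_sentence payload.
function translationStatus(sentence, lang) {
  const errors = sentence.translation_errors ?? {};
  const translations = sentence.translations ?? {};
  if (Object.prototype.hasOwnProperty.call(errors, lang)) {
    return { status: 'failed', errorCode: errors[lang] };
  }
  if (Object.prototype.hasOwnProperty.call(translations, lang)) {
    return { status: 'ok', text: translations[lang] };
  }
  // The key is absent from both objects: the language was never scheduled.
  return { status: 'not_scheduled' };
}
```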
init_summary:
{"text": "This is a summary of the meeting notes..."}
init_done:
{"totalSentences": 10}
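The event sequence above maps naturally onto a small state accumulator. The sketch below assumes that totalSentences in init_done equals the number of init_sentence events sent, which the document implies but does not state; createHistoryState and applyHistoryEvent are hypothetical names.

```javascript
function createHistoryState() {
  return { metadata: null, sentences: [], summary: null, done: false, complete: false };
}

// Fold one parsed event (type + JSON payload) into the state.
function applyHistoryEvent(state, type, data) {
  switch (type) {
    case 'init_metadata':
      state.metadata = data;
      break;
    case 'init_sentence':
      state.sentences.push(data);
      break;
    case 'init_summary':
      state.summary = data.text;
      break;
    case 'init_done':
      state.done = true;
      // Assumption: a mismatch means some sentences were lost in transit.
      state.complete = data.totalSentences === state.sentences.length;
      break;
    default:
      break; // connected and unknown events need no state change
  }
  return state;
}
```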
Error Responses
| Error Code | HTTP Status | Description | Recommended Handling |
|---|---|---|---|
recording_not_found | 404 | Recording not found | Confirm the taskId is correct |
sse_transcript_not_found | 404 | Transcript not found | The recording may not have finished processing yet |
Frontend Example
// Use the fetch API to handle SSE (you must parse the event-stream yourself)
async function loadHistory(taskId, apiKey) {
const response = await fetch(
`https://vas-poc.vurbo.ai/api/v1/sse/history/transcribe/${taskId}`,
{
headers: {
'X-API-Key': apiKey
}
}
);
const reader = response.body.getReader();
const decoder = new TextDecoder();
while (true) {
const { done, value } = await reader.read();
if (done) break;
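// stream: true keeps multi-byte characters intact when they are split across chunks
const text = decoder.decode(value, { stream: true });
// Parse the SSE format: event: xxx\ndata: {...}\n\n
// parseSSE is a helper you provide; it must buffer events that span chunk boundaries
const events = parseSSE(text);
for (const event of events) {
if (event.type === 'init_metadata') {
console.log('Task info:', event.data.title);
} else if (event.type === 'init_sentence') {
console.log(`[${event.data.start_time}] ${event.data.origin}`);
// translations is an object keyed by language code
for (const [lang, translated] of Object.entries(event.data.translations ?? {})) {
console.log(`Translation (${lang}): ${translated}`);
}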
} else if (event.type === 'init_done') {
console.log('Loading complete');
}
}
}
}
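The example above relies on a parseSSE helper that this document does not define. One way to write it is as a stateful parser that keeps incomplete text between chunks, so an event split across two reads is still delivered whole. The following is a minimal sketch, not a full implementation of the SSE specification: createSSEParser is a hypothetical name, payloads are assumed to be JSON as in this document, and the id and retry fields are ignored.

```javascript
// Hypothetical helper: buffers text between chunks and returns completed events.
function createSSEParser() {
  let buffer = '';

  function parseBlock(block) {
    let type = 'message'; // default SSE event type
    const dataLines = [];
    for (const line of block.split(/\r?\n/)) {
      if (line === '' || line.startsWith(':')) continue; // blank line or comment (heartbeat)
      const colon = line.indexOf(':');
      const field = colon === -1 ? line : line.slice(0, colon);
      let value = colon === -1 ? '' : line.slice(colon + 1);
      if (value.startsWith(' ')) value = value.slice(1);
      if (field === 'event') type = value;
      else if (field === 'data') dataLines.push(value);
    }
    if (dataLines.length === 0) return null; // comment-only block
    return { type, data: JSON.parse(dataLines.join('\n')) };
  }

  return {
    // Feed decoded text; returns the events completed by this chunk.
    feed(text) {
      buffer += text;
      const blocks = buffer.split(/\r\n\r\n|\n\n/);
      buffer = blocks.pop(); // the trailing incomplete block stays buffered
      return blocks.map(parseBlock).filter((event) => event !== null);
    },
  };
}
```

Create one parser per connection and call parser.feed(decoder.decode(value, { stream: true })) inside the read loop in place of parseSSE(text).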
GET /api/v1/sse/retranslate/{taskId} (Retranslate Full Transcript)
Description
Retranslates all sentences of the specified task into the target language. Translation results are delivered one at a time via an SSE stream.
Use Cases
- Switching the display language
- Updating the translation content
Authentication
Header: X-API-Key: YOUR_API_KEY
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
taskId | string | Yes | Recording ID (path parameter) |
targetLang | string | Yes | Target language code |
Request Example
// Use the fetch API (because EventSource does not support headers)
async function retranslateSSE(taskId, targetLang, apiKey) {
const response = await fetch(
`https://vas-poc.vurbo.ai/api/v1/sse/retranslate/${taskId}?targetLang=${targetLang}`,
{
headers: {
'X-API-Key': apiKey
}
}
);
const reader = response.body.getReader();
// ... handle SSE events
}
Event Format
translation:
{"sid": 1, "text": "Hello, nice to meet you", "is_final": true}
done:
{"totalUpdated": 10}
error (per-sid sentence translation failure):
When a sentence fails to translate (e.g., LLM provider error, content filtering), instead of a translation event, an event: error is sent carrying sid + error_code, interleaved with the translation events. The frontend can handle this with the same translationError.ts interceptor (aligned with the WebSocket spec):
event: error
data: {"error_code": "sse_translation_failed", "severity": "error", "message": "SSE translation failed", "context": "sse", "sid": 5, "request_id": "req_abc123xyz789", "timestamp": "2026-04-27T10:30:45.123Z", "details": {"translation_language": "ja", "original_error": "..."}}
| Field | Type | Description |
|---|---|---|
error_code | string | Error code, currently fixed as sse_translation_failed |
severity | string | error |
message | string | Human-readable message |
context | string | sse (automatically matched by the ErrorContextEnum prefix rule) |
sid | int | The sentence number that failed |
request_id | string | Request tracking ID |
timestamp | string | Time the error occurred (ISO 8601) |
details | object | Includes debug info such as translation_language and original_error |
Failed sentences are saved as translation error records (see the history-playback guide), and the failure markers are visible the next time the history is loaded. For the full specification, see reference/sse/retranslate.md.
Error Responses
| Error Code | HTTP Status | Description | Recommended Handling |
|---|---|---|---|
sse_missing_target_lang | 422 | Missing target language parameter | Provide targetLang |
sse_unsupported_language | 422 | Unsupported target language | Use a valid language code |
sse_translation_failed | 500 | Translation failed (per-sid) | The failed sentence is still reported via event: error; the overall flow is not interrupted |
Frontend Example
// Use the fetch API to handle SSE
async function retranslate(taskId, targetLang, apiKey) {
const response = await fetch(
`https://vas-poc.vurbo.ai/api/v1/sse/retranslate/${taskId}?targetLang=${targetLang}`,
{
headers: {
'X-API-Key': apiKey
}
}
);
const reader = response.body.getReader();
const decoder = new TextDecoder();
while (true) {
const { done, value } = await reader.read();
if (done) break;
const events = parseSSE(decoder.decode(value, { stream: true }));
for (const event of events) {
if (event.type === 'translation') {
console.log(`Sentence ${event.data.sid}: ${event.data.text}`);
} else if (event.type === 'error') {
console.warn(`Sentence ${event.data.sid} failed to translate: ${event.data.error_code}`);
} else if (event.type === 'done') {
console.log(`Complete, ${event.data.totalUpdated} sentences updated`);
}
}
}
}
GET /api/v1/sse/recordings/{taskId}/entries/{sid}/retranslate (Single-Sentence Retranslation, added in v1.4.0)
Description
Retranslates a single sentence. The most common scenario: after a user edits the original text via PATCH /api/v1/recordings/{id}/entries/{sid}, you call this endpoint to redo all translations for that sentence.
Differences from full-transcript retranslation (/retranslate/{taskId}):
- Full-transcript retranslation: all sentences are translated into a single target language
- Single-sentence retranslation: only one sentence is translated, and all existing target languages can be translated at once; supports optimistic locking
Authentication
Query: api_key (the browser EventSource does not support headers)
Request Parameters
| Parameter | Location | Type | Required | Description |
|---|---|---|---|---|
taskId | path | string | Yes | Recording ID (UUID) |
sid | path | number | Yes | Sentence ID (1-based) |
targetLang | query | string | No | Target language code. When omitted, all languages already present in translated_texts for that sentence are retranslated |
expectedRevision | query | number | No | Optimistic lock: the current transcript revision; a mismatch returns transcript_revision_conflict |
api_key | query | string | Yes | API key |
Event Format
Event sequence: connected → progress / translated / error ×N → done
// progress (when translation begins for each language)
{ "sid": 5, "lang": "en-US", "status": "translating" }
// translated (when each language completes successfully)
{ "sid": 5, "lang": "en-US", "text": "Hello world", "tokens_used": 25 }
// done (all complete; successfully translated languages are listed in languages_translated)
{
"sid": 5,
"revision": 6,
"original_text_edited_at": "2026-05-06T10:30:00.000000Z",
"languages_translated": ["en-US"],
"languages_failed": ["ja-JP"]
}
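A client for this endpoint needs to build the URL with the optional query parameters and then read the outcome from the done event. The sketch below shows both steps. buildEntryRetranslateUrl and summarizeDone are hypothetical helpers, and treating revision from done as the value to send as the next expectedRevision is an assumption, not something this document states.

```javascript
const SSE_BASE = 'https://vas-poc.vurbo.ai/api/v1/sse';

// Build the EventSource URL; api_key goes in the query string for this endpoint.
function buildEntryRetranslateUrl(taskId, sid, apiKey, { targetLang, expectedRevision } = {}) {
  const url = new URL(
    `${SSE_BASE}/recordings/${encodeURIComponent(taskId)}/entries/${sid}/retranslate`
  );
  if (targetLang) url.searchParams.set('targetLang', targetLang);
  if (expectedRevision !== undefined) {
    url.searchParams.set('expectedRevision', String(expectedRevision));
  }
  url.searchParams.set('api_key', apiKey);
  return url.toString();
}

// Summarize a done payload for the UI.
function summarizeDone(done) {
  const failed = done.languages_failed ?? [];
  return {
    revision: done.revision, // assumption: usable as the next expectedRevision
    retryLanguages: failed,
    allSucceeded: failed.length === 0,
  };
}
```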
Error Responses
| Error Code | HTTP | Description |
|---|---|---|
recording_not_found | 404 | Recording does not exist or does not belong to the user |
recording_not_completed | 422 | The recording has not finished processing |
entry_not_found | 404 | The specified sentence was not found |
entry_text_empty | 422 | The original text of that sentence is empty |
transcript_revision_conflict | 409 | Revision mismatch (already modified by another request) |
storage_upload_failed | 500 | Failed to save the transcript |
For the full event format and a workflow example combining optimistic locking with PATCH, see reference/sse/retranslate.md.
init_sentence Edit Marker Fields (added in v1.4.0)
For sentences edited by a user, historyTranscribe adds two fields to the init_sentence event (only present after editing):
{
"sid": 7,
"origin": "Corrected text",
"original_text_raw": "Original STT output",
"original_text_edited_at": "2026-05-06T10:30:00.000000Z",
"translations": { "en-US": "Corrected text" }
}
Frontend detection: determine this by the presence of the field ('original_text_raw' in data); do not compare origin === original_text_raw — a user may edit and then change it back to the same string, in which case the text is equal but the "edited" marker should still be shown. See reference/sse/history.md.
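The detection rule can be captured in a one-line predicate. This is a sketch; isEditedSentence is a hypothetical name.

```javascript
// A sentence counts as edited when the field exists, regardless of its value.
function isEditedSentence(data) {
  return Object.prototype.hasOwnProperty.call(data, 'original_text_raw');
}
```

A sentence that was edited and then changed back has origin equal to original_text_raw, yet isEditedSentence still returns true, which is the behavior described above.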
GET /api/v1/sse/retranslate/summary/{taskId} (Retranslate Summary)
Description
Retranslates the summary of the specified task into the target language. Translation results are delivered segment by segment via an SSE stream.
Use Cases
- Switching the summary display language
- Obtaining the summary in a different language
Authentication
Header: X-API-Key: YOUR_API_KEY
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
taskId | string | Yes | Recording ID |
targetLang | string | Yes | Target language code |
Request Example
// Use the fetch API (because EventSource does not support headers)
async function retranslateSummarySSE(taskId, targetLang, apiKey) {
const response = await fetch(
`https://vas-poc.vurbo.ai/api/v1/sse/retranslate/summary/${taskId}?targetLang=${targetLang}`,
{
headers: {
'X-API-Key': apiKey
}
}
);
const reader = response.body.getReader();
// ... handle SSE events
}
Event Format
summary_translation:
{"text": "Accumulated translation result...", "is_final": false}
done:
{"totalUpdated": 1}
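The sample payload suggests that each summary_translation event carries the accumulated text so far rather than a delta. Under that reading, which is an assumption here, the client should overwrite its displayed text on every event instead of appending. applySummaryTranslation is a hypothetical reducer illustrating this.

```javascript
// Fold one parsed event into the summary display state without mutating the input.
function applySummaryTranslation(state, type, data) {
  if (type === 'summary_translation') {
    // Overwrite: the payload is assumed to already contain the accumulated text.
    return { ...state, text: data.text, isFinal: data.is_final };
  }
  if (type === 'done') {
    return { ...state, done: true };
  }
  return state;
}
```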
Error Responses
| Error Code | HTTP Status | Description | Recommended Handling |
|---|---|---|---|
sse_summary_not_found | 404 | Summary not found | This recording has no summary |
sse_summary_translation_failed | 500 | Summary translation failed | Retry later |
Regenerate Summary (GET Preview / POST Save)
Summary regeneration is split into two endpoints (GET to preview, POST to save), and both are mode-aware. For the full schema, see reference/sse/regenerate-summary.md; this section is a quick summary.
| Method | Endpoint | Writes DB | Saves Transcript | Billed | Purpose |
|---|---|---|---|---|---|
| GET | /api/v1/sse/regenerate/summary/{taskId} | ❌ | ❌ | ✅ | Preview (dry run) |
| POST | /api/v1/sse/regenerate/summary/{taskId} | ✅ | ✅ + bump revision | ✅ | Save (persist officially) |
Known limitation: GET is also billed — the LLM actually consumes tokens, so the GET endpoint cannot be used for free.
Shared Parameters (GET via query string, POST via JSON body)
| Parameter | Type | Required | Description |
|---|---|---|---|
| taskId (path) | string | Yes | Recording UUID |
| mode | string | Yes | Summary mode enum: builtin / custom |
| template | string | Required for builtin / forbidden for custom | Built-in template slug |
| prompt | string | Required for custom / forbidden for builtin | The customer's full prompt (replaces the built-in layered prompt, ≤2000 characters) |
| promptSlug | string | Required for custom / forbidden for builtin | The customer's own identifier (≤64 Unicode characters, no control characters) |
| language | string | No | Output language (defaults to the first transcription language) |
| plainText | boolean | No | Whether to request plain-text output (default false) |
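Mutual exclusivity rule: template is only valid with mode=builtin, while prompt and promptSlug are only valid with mode=custom. Providing a forbidden field or omitting a required one returns 422 summary_mode_field_mismatch.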
Request Example
# Preview builtin (does not write DB / blob)
curl -N "https://vas-poc.vurbo.ai/api/v1/sse/regenerate/summary/550e8400-...?mode=builtin&template=meeting&language=zh-TW&plainText=true" \
-H "X-API-Key: YOUR_API_KEY"
# Save custom
curl -N -X POST "https://vas-poc.vurbo.ai/api/v1/sse/regenerate/summary/550e8400-..." \
-H "X-API-Key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{"mode":"custom","prompt":"Please emphasize KPIs","promptSlug":"acme-v2","plainText":true}'
Event Sequence
1. connected → Connection confirmation (includes mode=builtin|custom, endpoint=preview|persist)
2. summary_regeneration → Stream summary segments (accumulating; is_final=true marks the last one)
3. done → Complete, includes final_content / mode / template(effective) / prompt_snapshot (only for custom)
done event
{
"task_id": "550e8400-...",
"tokens_used": 123,
"final_content": "This meeting...",
"mode": "custom",
"template": "acme-v2",
"plain_text": true,
"persisted": true,
"prompt_snapshot": "Please emphasize KPIs"
}
- mode: business mode (builtin / custom)
- template: effective slug (builtin → built-in template slug; custom → customer slug)
- persisted: whether this summary has been officially saved (false for GET, true for POST)
- prompt_snapshot: only present in custom mode; the prompt content the customer passed in, verbatim (a mandatory snapshot and the sole basis for reconstruction)
Error Codes
| Error Code | HTTP | Description |
|---|---|---|
| recording_not_found | 404 | Recording not found |
| sse_template_not_found | 404 | Summary template not found |
| sse_transcript_not_found | 404 | Transcript not found |
| summary_text_empty | 400 | The transcript has no content |
| summary_text_too_long | 400 | The transcript exceeds the 100,000-character limit |
| sse_summary_regeneration_failed | 500 | Regeneration failed (raw error already sanitized) |
| summary_invalid_mode | 422 | mode is not builtin / custom |
| summary_mode_field_mismatch | 422 | The mode and field combination does not match (required field missing / forbidden field provided) |
| summary_prompt_too_long | 422 | prompt exceeds 2000 characters |
| summary_prompt_slug_too_long | 422 | promptSlug exceeds 64 characters |
| summary_prompt_slug_invalid | 422 | promptSlug contains control characters (\n / \r / \t / \0, etc.) |
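Because the GET preview is billed as well, it can be worth catching the 422 cases on the client before sending a request. The sketch below mirrors the documented rules and returns the matching error code, or `null` when the parameters look valid. `validateSummaryParams` is a hypothetical helper. Counting lengths in Unicode code points and the order in which the checks run are both assumptions; the server may count or prioritize differently, so treat this as a pre-check rather than a replacement for server validation.

```javascript
// Hypothetical client-side pre-check mirroring the documented 422 rules.
// Assumption: lengths are counted in Unicode code points.
function validateSummaryParams({ mode, template, prompt, promptSlug }) {
  if (mode !== 'builtin' && mode !== 'custom') return 'summary_invalid_mode';
  const has = (v) => v !== undefined && v !== null && v !== '';
  if (mode === 'builtin') {
    // builtin: template required, prompt and promptSlug forbidden
    if (!has(template) || has(prompt) || has(promptSlug)) return 'summary_mode_field_mismatch';
    return null;
  }
  // custom: prompt and promptSlug required, template forbidden
  if (has(template) || !has(prompt) || !has(promptSlug)) return 'summary_mode_field_mismatch';
  if ([...prompt].length > 2000) return 'summary_prompt_too_long';
  if ([...promptSlug].length > 64) return 'summary_prompt_slug_too_long';
  // \p{Cc} matches Unicode control characters such as \n, \r, \t and \0
  if (/\p{Cc}/u.test(promptSlug)) return 'summary_prompt_slug_invalid';
  return null;
}
```

For example, `validateSummaryParams({ mode: 'custom', prompt: 'Please emphasize KPIs', promptSlug: 'acme-v2' })` returns `null`, while adding a `template` field to the same object returns `'summary_mode_field_mismatch'`.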
Frontend Example
async function regenerateSummary(taskId, body, apiKey, { persist = false } = {}) {
  const url = `https://vas-poc.vurbo.ai/api/v1/sse/regenerate/summary/${taskId}`;
  // POST saves the result and takes the parameters as a JSON body
  if (persist) {
    return fetch(url, {
      method: 'POST',
      headers: { 'X-API-Key': apiKey, 'Content-Type': 'application/json' },
      body: JSON.stringify(body)
    });
  }
  // GET only previews and takes the same parameters as a query string
  const params = new URLSearchParams(body);
  return fetch(`${url}?${params}`, { method: 'GET', headers: { 'X-API-Key': apiKey } });
}
GET /api/v1/sse/tts/{taskId} (TTS Audio Stream)
Description
Converts the translated content of a historical recording into TTS audio, delivered sentence by sentence via an SSE stream. The frontend can control how many sentences are returned per request.
Use Cases
- Audio playback of translations from historical recordings
- Karaoke effect (combined with word boundaries)
- Voice readout of translated content
Authentication
Header: X-API-Key: YOUR_API_KEY
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| taskId | string | Yes | Recording ID (path parameter) |
| language | string | Yes | TTS output language (e.g., en-US) |
| voice | string | No | Specify a voice name (e.g., en-US-JennyNeural) |
| sid | int | No | Starting sentence ID (default 1, starting from the first sentence) |
| length | int | No | Number of sentences to return (default 1, maximum 20) |
Note: The maximum value of length is controlled by the backend environment variable TTS_SSE_MAX_LENGTH (default 20). A value that exceeds the maximum is automatically truncated to it.
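Since an oversized `length` is truncated rather than rejected, a client that wants more sentences than the maximum has to issue several requests. The sketch below plans those requests. `planTtsRequests` is a hypothetical helper; it assumes sentence IDs are consecutive integers and that the deployment uses the documented default maximum of 20, which an operator may have changed.

```javascript
// Assumption: the documented default of TTS_SSE_MAX_LENGTH; a deployment may differ.
const ASSUMED_MAX_LENGTH = 20;

// Hypothetical helper: split "play `count` sentences starting at `startSid`"
// into requests that each stay within the per-request maximum.
// Assumes sentence IDs are consecutive integers.
function planTtsRequests(startSid, count, maxLength = ASSUMED_MAX_LENGTH) {
  if (!Number.isInteger(maxLength) || maxLength < 1) {
    throw new RangeError('maxLength must be a positive integer');
  }
  const requests = [];
  let sid = startSid;
  let remaining = count;
  while (remaining > 0) {
    const length = Math.min(remaining, maxLength);
    requests.push({ sid, length }); // maps to ?sid=...&length=...
    sid += length;
    remaining -= length;
  }
  return requests;
}
```

Each planned entry maps directly to the `sid` and `length` query parameters of one request. Comparing `sentences_sent` in `tts_done` with the requested `length` is a reasonable way to detect that fewer sentences were returned than planned.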
Request Example (Single Sentence Playback)
// Use the fetch API (because EventSource does not support headers)
async function playTTSSingle(taskId, language, sid, apiKey) {
  // length is omitted, so the default of 1 sentence applies
  const response = await fetch(
    `https://vas-poc.vurbo.ai/api/v1/sse/tts/${taskId}?language=${encodeURIComponent(language)}&sid=${sid}`,
    {
      headers: {
        'X-API-Key': apiKey
      }
    }
  );
  if (!response.ok) throw new Error(`TTS request failed: HTTP ${response.status}`);
  const reader = response.body.getReader();
  // ... handle SSE events
}
Request Example (Multiple Sentence Playback)
// Example: sid = 5 and length = 3 plays sentences 5, 6, and 7 (3 sentences total)
async function playTTSMultiple(taskId, language, sid, length, apiKey) {
  const response = await fetch(
    `https://vas-poc.vurbo.ai/api/v1/sse/tts/${taskId}?language=${encodeURIComponent(language)}&sid=${sid}&length=${length}`,
    {
      headers: {
        'X-API-Key': apiKey
      }
    }
  );
  if (!response.ok) throw new Error(`TTS request failed: HTTP ${response.status}`);
  const reader = response.body.getReader();
  // ... handle SSE events
}
Event Sequence
1. connected → Connection confirmation
2. tts_audio → Send TTS audio sentence by sentence (repeated up to N times, N = length; the actual count is reported in tts_done)
3. tts_done → Playback complete
Event Format
connected:
{
"task_id": "550e8400-e29b-41d4-a716-446655440000",
"language": "en-US",
"voice": "en-US-JennyNeural",
"start_sid": 5,
"length": 3
}
tts_audio:
{
"sid": 5,
"transcript": "Hello, nice to meet you",
"text": "Hello, nice to meet you",
"audio": "Base64EncodedMP3...",
"duration_ms": 2500,
"boundaries": [
{"offset_ms": 0, "duration_ms": 350, "text_offset": 0, "word_length": 5, "text": "Hello"},
{"offset_ms": 350, "duration_ms": 100, "text_offset": 5, "word_length": 1, "text": ","},
{"offset_ms": 500, "duration_ms": 250, "text_offset": 7, "word_length": 4, "text": "nice"},
{"offset_ms": 750, "duration_ms": 200, "text_offset": 12, "word_length": 2, "text": "to"},
{"offset_ms": 950, "duration_ms": 350, "text_offset": 15, "word_length": 4, "text": "meet"},
{"offset_ms": 1300, "duration_ms": 300, "text_offset": 20, "word_length": 3, "text": "you"}
]
}
| Field | Type | Description |
|---|---|---|
| sid | int | Sentence ID |
| transcript | string | Original transcript (STT recognition result) |
| text | string | Translated text (the TTS synthesis source) |
| audio | string | Base64-encoded MP3 audio |
| duration_ms | int | Audio duration (milliseconds) |
| boundaries | array | Word boundary array |
Word Boundary Field Descriptions
| Field | Type | Description |
|---|---|---|
| offset_ms | int | The word's start time in the audio (milliseconds) |
| duration_ms | int | The word's duration (milliseconds) |
| text_offset | int | Position in the original text string (character index) |
| word_length | int | Word length (number of characters) |
| text | string | Word content |
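To show how `text_offset` and `word_length` relate to the sentence, the sketch below splits the sentence into the part before the word, the word itself, and the part after it, which is the shape a renderer typically needs for highlighting. `splitAtBoundary` is a hypothetical helper. It assumes the offsets index the `text` field of the `tts_audio` event and are measured in JavaScript string units (UTF-16 code units); text containing characters outside the Basic Multilingual Plane may need a different conversion.

```javascript
// Hypothetical helper: split the sentence text around one word boundary.
// Assumes text_offset / word_length index the tts_audio `text` field
// in JavaScript string units (UTF-16 code units).
function splitAtBoundary(text, boundary) {
  const start = boundary.text_offset;
  const end = start + boundary.word_length;
  return {
    before: text.slice(0, start),
    word: text.slice(start, end),
    after: text.slice(end),
  };
}

// Values taken from the sample tts_audio event
const parts = splitAtBoundary('Hello, nice to meet you', { text_offset: 7, word_length: 4 });
console.log(parts.word); // prints "nice"
```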
tts_done:
{
"sentences_sent": 3,
"total_duration_ms": 7500,
"total_characters_used": 142
}
| Field | Type | Description |
|---|---|---|
| sentences_sent | int | The number of sentences actually sent |
| total_duration_ms | int | The total audio duration of all sentences (milliseconds) |
| total_characters_used | int | The total number of characters synthesized in this TTS request (used for quota calculation) |
Error Responses
| Error Code | HTTP Status | Description | Recommended Handling |
|---|---|---|---|
| recording_not_found | 404 | Recording not found | Confirm the taskId is correct |
| sse_missing_target_lang | 422 | Missing language parameter | Provide the language parameter |
| sse_unsupported_language | 422 | Unsupported language | Use a valid language code |
| tts_translation_not_found | 400 | Translation for that language not found | Confirm the translation for that language exists |
| tts_synthesis_failed | 500 | TTS synthesis failed | Retry later |
| tts_quota_exceeded | 402 | TTS usage limit reached | Retry later |
Frontend Example
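// Use the fetch API to handle TTS SSE
async function playTTS(taskId, language, apiKey, startSid = 1, length = 1) {
  const url = new URL(`https://vas-poc.vurbo.ai/api/v1/sse/tts/${taskId}`);
  url.searchParams.set('language', language);
  url.searchParams.set('sid', String(startSid));
  url.searchParams.set('length', String(length));
  const response = await fetch(url, {
    headers: {
      'X-API-Key': apiKey
    }
  });
  if (!response.ok) throw new Error(`TTS request failed: HTTP ${response.status}`);
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // A chunk can end mid-event, so buffer until a blank line closes the event
    buffer += decoder.decode(value, { stream: true });
    const blocks = buffer.split('\n\n');
    buffer = blocks.pop();
    for (const block of blocks) {
      const event = parseSSEBlock(block);
      if (!event) continue;
      if (event.type === 'connected') {
        console.log(`TTS connection successful, voice: ${event.data.voice}`);
      } else if (event.type === 'tts_audio') {
        console.log(`Sentence ${event.data.sid}: ${event.data.text}`);
        // Play the audio (audio/mpeg is the registered MIME type for MP3)
        const audioBlob = base64ToBlob(event.data.audio, 'audio/mpeg');
        const audioUrl = URL.createObjectURL(audioBlob);
        const audio = new Audio(audioUrl);
        // Set up the karaoke effect
        setupKaraoke(audio, event.data.boundaries, event.data.text);
        // Wait for this sentence to end so that sentences do not overlap
        await new Promise((resolve, reject) => {
          audio.addEventListener('ended', resolve, { once: true });
          audio.addEventListener('error', reject, { once: true });
          audio.play().catch(reject);
        });
        URL.revokeObjectURL(audioUrl);
      } else if (event.type === 'tts_done') {
        console.log(`Playback complete, ${event.data.sentences_sent} sentences total`);
      }
    }
  }
}
// Parse one "event: ... / data: ..." block into { type, data }
function parseSSEBlock(block) {
  let type = '';
  let data = '';
  for (const line of block.split('\n')) {
    if (line.startsWith('event: ')) type = line.slice(7);
    else if (line.startsWith('data: ')) data = line.slice(6);
  }
  return type && data ? { type, data: JSON.parse(data) } : null;
}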
// Base64 to Blob
function base64ToBlob(base64, mimeType) {
  const byteCharacters = atob(base64);
  const byteNumbers = new Array(byteCharacters.length);
  for (let i = 0; i < byteCharacters.length; i++) {
    byteNumbers[i] = byteCharacters.charCodeAt(i);
  }
  const byteArray = new Uint8Array(byteNumbers);
  return new Blob([byteArray], { type: mimeType });
}
// Karaoke effect
function setupKaraoke(audio, boundaries, text) {
  const updateHighlight = () => {
    const currentTimeMs = audio.currentTime * 1000;
    // A word stays active until the next word starts (or until the end for the last word)
    const currentWord = boundaries.find((b, i) => {
      const nextOffset = boundaries[i + 1]?.offset_ms ?? Infinity;
      return currentTimeMs >= b.offset_ms && currentTimeMs < nextOffset;
    });
    if (currentWord) {
      // Highlight the current word (highlightWord is supplied by the application)
      highlightWord(text, currentWord.text_offset, currentWord.word_length);
    }
  };
  // Poll every 50 ms and stop once playback ends
  const interval = setInterval(updateHighlight, 50);
  audio.addEventListener('ended', () => clearInterval(interval));
}
GET /api/v1/sse/imports/{importId}/progress (Import Progress Stream)
Description
Tracks the processing progress of an audio file import in real time. After connecting, progress updates are continuously pushed via an SSE stream until the import completes, fails, or the connection times out.
Use Cases
- Showing a real-time processing progress bar after uploading an audio file
- Tracking the progress of each stage: audio conversion, transcription, translation, summary, etc.
Authentication
Header: X-API-Key: YOUR_API_KEY
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| importId | string | Yes | Import task ID (UUID, path parameter) |
Request Example
curl -N "https://vas-poc.vurbo.ai/api/v1/sse/imports/550e8400-e29b-41d4-a716-446655440000/progress" \
-H "X-API-Key: vas_aB3dE5fG7hI9jK1lM3nO5pQ7rS9tU1vW"
Event Sequence
Scenario 1: Import still in progress
1. connected → Connection confirmation
2. progress → Send the current progress
3. progress ×N → Continuously pushed when progress changes
heartbeat ×N → Sent every 15 seconds when there is no progress change
4. completed → Import succeeded, connection ends
or failed → Import failed, connection ends
or timeout → Exceeded 15 minutes, connection ends
Scenario 2: Import already complete (terminal state)
1. connected → Connection confirmation
2. progress → Send the final progress
3. completed → Send the completed event directly and end
or failed → Send the failed event directly and end
Event Format
connected:
{"message": "Import progress service connected (importId: xxx)"}
progress:
{
"import_id": "550e8400-e29b-41d4-a716-446655440000",
"status": "processing",
"stage": "transcribing",
"progress": 45,
"message": "Transcribing..."
}
| Field | Type | Description |
|---|---|---|
| import_id | string | Import task ID (UUID) |
| status | string | Import status: pending / processing / completed / failed |
| stage | string / null | The current processing stage |
| progress | integer | Progress percentage (0-100) |
| message | string | Human-readable progress message |
Stage values and their corresponding progress ranges:
| Value | Description | Progress Range |
|---|---|---|
| converting | Audio format conversion | 0% - 10% |
| transcribing | Speech-to-text | 10% - 60% |
| translating | Text translation | 60% - 85% |
| summarizing | Generating the summary | 85% - 100% |
| null | Not started yet | N/A |
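The ranges in this table make it possible to show a per-stage progress bar next to the overall one. The sketch below converts the overall `progress` value into a percentage within the current stage. `stageProgress` is a hypothetical helper, and the linear mapping inside each range is an assumption; the backend may not advance evenly within a stage.

```javascript
// Stage ranges copied from the table above (overall progress, in percent).
const STAGE_RANGES = new Map([
  ['converting', [0, 10]],
  ['transcribing', [10, 60]],
  ['translating', [60, 85]],
  ['summarizing', [85, 100]],
]);

// Hypothetical helper: overall progress -> progress within the current stage (0-100).
// Returns null when the stage is null or not listed in the table.
function stageProgress(stage, progress) {
  const range = STAGE_RANGES.get(stage);
  if (!range) return null;
  const [lo, hi] = range;
  const clamped = Math.min(Math.max(progress, lo), hi); // guard against out-of-range values
  return Math.round(((clamped - lo) / (hi - lo)) * 100);
}

// The sample progress event (stage "transcribing", progress 45)
console.log(stageProgress('transcribing', 45)); // prints 70
```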
completed:
{
"import_id": "550e8400-e29b-41d4-a716-446655440000",
"status": "completed",
"task_id": "abc123-e29b-41d4-a716-446655440000",
"message": "Processing complete"
}
| Field | Type | Description |
|---|---|---|
| import_id | string | Import task ID |
| status | string | Fixed as completed |
| task_id | string | The generated recording ID (recording_id), which can be used for subsequent queries |
| message | string | Fixed as Processing complete |
failed:
{
"import_id": "550e8400-e29b-41d4-a716-446655440000",
"status": "failed",
"error_code": "import_invalid_format",
"error_message": "Unsupported audio format"
}
| Field | Type | Description |
|---|---|---|
| import_id | string | Import task ID |
| status | string | Fixed as failed |
| error_code | string | Error code |
| error_message | string | Human-readable error message |
heartbeat:
Sent every 15 seconds when there is no progress change, used to keep the connection alive.
{"timestamp": 1708761600}
timeout:
Sent when the import has not completed after 15 minutes; the connection ends automatically.
{"message": "Connection timeout"}
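Because a heartbeat is sent every 15 seconds when nothing else changes, a client can treat a longer silence as a sign that the connection has dropped without a terminal event. The sketch below tracks the time of the last received event. `createStallDetector` is a hypothetical helper, and the tolerance of two missed heartbeats is an assumption chosen for illustration, not a documented value.

```javascript
// Hypothetical helper: detects a silent stream by tracking the last event time.
// heartbeatMs follows the documented 15-second cadence; missedBeats = 2 is an
// assumed tolerance. `now` is injectable so the logic can be tested.
function createStallDetector({ heartbeatMs = 15000, missedBeats = 2, now = Date.now } = {}) {
  let lastEventAt = now();
  return {
    // Call for every received event, including heartbeat
    touch() {
      lastEventAt = now();
    },
    // True once more than missedBeats heartbeat intervals passed without any event
    isStalled() {
      return now() - lastEventAt > heartbeatMs * missedBeats;
    },
  };
}
```

In a read loop, `touch()` would be called for every parsed event and `isStalled()` checked from a timer. What to do on a stall (reconnect, or show an error) is left to the application; reconnecting is reasonable because a new connection first reports the current progress.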
Error Responses
| Error Code | HTTP Status | Description | Recommended Handling |
|---|---|---|---|
| import_not_found | 404 | The specified import task was not found | Confirm the importId is correct |
Frontend Example
async function trackImportProgress(importId, apiKey) {
  const response = await fetch(
    `https://vas-poc.vurbo.ai/api/v1/sse/imports/${importId}/progress`,
    {
      headers: {
        'X-API-Key': apiKey
      }
    }
  );
  // Errors such as 404 import_not_found are reported via the HTTP status
  if (!response.ok) throw new Error(`Import progress request failed: HTTP ${response.status}`);
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // A chunk can end mid-event, so keep the incomplete tail in the buffer
    buffer += decoder.decode(value, { stream: true });
    const events = buffer.split('\n\n');
    buffer = events.pop();
    for (const eventStr of events) {
      if (!eventStr.trim()) continue;
      const lines = eventStr.split('\n');
      let eventType = '';
      let eventData = '';
      for (const line of lines) {
        if (line.startsWith('event: ')) eventType = line.slice(7);
        else if (line.startsWith('data: ')) eventData = line.slice(6);
      }
      if (!eventType || !eventData) continue;
      const data = JSON.parse(eventData);
      // updateProgressBar, navigateToRecording, and showError are supplied by the application
      switch (eventType) {
        case 'connected':
          console.log('Connected:', data.message);
          break;
        case 'progress':
          console.log(`[${data.stage}] ${data.progress}% - ${data.message}`);
          updateProgressBar(data.progress, data.stage, data.message);
          break;
        case 'completed':
          console.log('Import complete! Recording ID:', data.task_id);
          navigateToRecording(data.task_id);
          break;
        case 'failed':
          console.error('Import failed:', data.error_code, data.error_message);
          showError(data.error_message);
          break;
        case 'timeout':
          console.warn('Connection timeout:', data.message);
          break;
        // heartbeat events only keep the connection alive and need no handling
      }
    }
  }
}
Version: V1.5.7 | Last Updated: 2026-05-20