History
Connection Info
| Item | Value |
|---|---|
| Base path | https://vas-poc.vurbo.ai/api/v1/sse |
| Protocol | HTTP + Server-Sent Events (SSE) |
| Data format | text/event-stream |
| Auth method | Header X-API-Key: {KEY} |
Note: The browser's native EventSource API does not support custom headers. Use the fetch API with ReadableStream, or use an SSE client library that supports headers.
Endpoint Overview
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/v1/sse/history/transcribe/{taskId} | Retrieve historical conversation records |
GET /api/v1/sse/history/transcribe/{taskId}
Description
Loads the complete conversation record for a specified task, including all sentences and the summary. The data is sent one item at a time over an SSE stream.
Difference from the Transcript Download API (GET /api/v1/tasks/{taskId}/transcript/export):
- This endpoint: for progressive loading; pushes raw structured data (JSON fragments) sentence by sentence as an event stream, so the front end can render the UI progressively.
- Transcript download: for offline download; returns the complete file (TXT / SRT / SBV / VTT / CSV) in one response, ready to open in subtitle software or a spreadsheet.
Use Cases
- View the recording details page
- Load historical transcripts
Authentication
Header: X-API-Key (see Authentication)
Request Parameters
| Parameter | Location | Type | Required | Description |
|---|---|---|---|---|
| taskId | path | string | Yes | Recording ID (UUID) |
Request Example
curl -N "https://vas-poc.vurbo.ai/api/v1/sse/history/transcribe/550e8400-e29b-41d4-a716-446655440000" \
-H "X-API-Key: vas_aB3dE5fG7hI9jK1lM3nO5pQ7rS9tU1vW"
// Use the fetch API (because EventSource does not support headers)
async function connectSSE(taskId, apiKey) {
  const response = await fetch(
    `https://vas-poc.vurbo.ai/api/v1/sse/history/transcribe/${taskId}`,
    {
      headers: {
        'X-API-Key': apiKey
      }
    }
  );
  const reader = response.body.getReader();
  // ... handle SSE events
}
Event Sequence
1. connected → connection confirmation
2. init_metadata → send task metadata
3. init_sentence → send sentences one by one (repeats N times)
4. init_summary → send summary
5. init_done → initialization complete
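A client can check the order above before trusting a load as complete. The following is a minimal sketch, assuming every event in the list is sent exactly once except init_sentence, which may appear zero or more times; `followsDocumentedOrder` is a hypothetical helper name, not part of the API.

```javascript
// Non-repeating events in the order given by the Event Sequence list.
const FIXED_ORDER = ['connected', 'init_metadata', 'init_summary', 'init_done'];

// Hypothetical helper: true if the received event names match the documented order.
function followsDocumentedOrder(eventTypes) {
  const fixed = eventTypes.filter((type) => type !== 'init_sentence');
  const fixedOk =
    fixed.length === FIXED_ORDER.length &&
    fixed.every((type, i) => type === FIXED_ORDER[i]);
  if (!fixedOk) return false;

  // Every init_sentence must sit after init_metadata (index 1) and before init_summary.
  const summaryIndex = eventTypes.indexOf('init_summary');
  return eventTypes.every(
    (type, i) => type !== 'init_sentence' || (i > 1 && i < summaryIndex)
  );
}

const normalLoad = [
  'connected', 'init_metadata', 'init_sentence', 'init_sentence', 'init_summary', 'init_done',
];
console.log(followsDocumentedOrder(normalLoad)); // true
```

A stream with no init_sentence events still passes this check, so the helper does not distinguish an empty recording from a populated one.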
Event Formats
connected
{
  "message": "History service connected (recordingId: xxx)"
}
init_metadata
{
  "task_id": "550e8400-e29b-41d4-a716-446655440000",
  "title": "Meeting Notes",
  "created_at": "2026-02-23T10:00:00Z",
  "type": "transcribe",
  "has_speaker_diarization": true,
  "transcription_languages": ["zh-TW"],
  "translation_languages": ["en-US"],
  "summary_template": "general",
  "summary_language": "zh-TW",
  "speaker_aliases": {"speaker_1": "Manager Wang"}
}
| Field | Type | Description |
|---|---|---|
| task_id | string | Task ID (UUID) |
| title | string | Task title |
| created_at | string | Creation time (ISO 8601) |
| type | string | Recording type |
| has_speaker_diarization | boolean | Whether speaker diarization is enabled |
| transcription_languages | array \| null | Array of transcription languages (BCP 47, e.g. ["zh-TW"]), up to 2 |
| translation_languages | array \| null | Array of translation languages (BCP 47, e.g. ["en-US", "ja-JP"]), up to 8 |
| summary_template | string \| null | Summary template slug (e.g. general, meeting); null if not specified |
| summary_language | string \| null | Summary output language (BCP 47, e.g. zh-TW, en-US); null if not specified |
| speaker_aliases | object | Mapping of "original speaker ID → display name"; {} (an empty object, not an array) when there are no aliases. The front end uses it for a duplicate-name precheck before a rename (added in v1.3.12) |
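The duplicate-name precheck mentioned for speaker_aliases could look roughly like the sketch below. It assumes an exact, case-sensitive comparison against the existing aliases only; the real rename rules are not specified here, and `wouldDuplicateName` is a hypothetical helper name.

```javascript
// Hypothetical helper: would renaming `speakerId` to `newName` clash with another speaker's alias?
// `speakerAliases` is the speaker_aliases object from init_metadata ({} when there are no aliases).
function wouldDuplicateName(speakerAliases, speakerId, newName) {
  const wanted = newName.trim();
  // Assumption: names collide only on an exact, case-sensitive match.
  return Object.entries(speakerAliases).some(
    ([id, alias]) => id !== speakerId && alias === wanted
  );
}

const aliases = { speaker_1: 'Manager Wang' };
console.log(wouldDuplicateName(aliases, 'speaker_2', 'Manager Wang')); // true
console.log(wouldDuplicateName(aliases, 'speaker_1', 'Manager Wang')); // false
```

Because speaker_aliases is always an object, the helper needs no special case for tasks without aliases.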
init_sentence
{
  "sid": 1,
  "origin": "Hello",
  "translations": {
    "en-US": "Hello"
  },
  "start_time": "00:05",
  "speaker_id": "speaker_1",
  "speaker_label": "Manager Wang"
}
If a sentence has a translation failure, it carries an additional translation_errors field (present only when there is a failure). This lets the front end distinguish "the language was never scheduled for translation" (translations missing the key) from "it was translated but failed" (translation_errors has the key):
{
  "sid": 5,
  "origin": "Sentence with sensitive words",
  "translations": {
    "en-US": "Sensitive sentence"
  },
  "translation_errors": {
    "ja": "llm_content_filtered"
  },
  "start_time": "00:25",
  "speaker_id": "speaker_1",
  "speaker_label": "Manager Wang"
}
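The distinction between "never scheduled" and "translated but failed" can be expressed as a small lookup. This is a sketch, assuming a failure entry takes priority if a language ever appears in both maps (the source does not say whether that can happen); `translationState` is a hypothetical helper name.

```javascript
// Hypothetical helper: classify one language of one init_sentence payload.
function translationState(sentence, lang) {
  const errors = sentence.translation_errors; // omitted when nothing failed
  if (errors && lang in errors) {
    return { state: 'failed', code: errors[lang] };
  }
  const translations = sentence.translations; // may be null when there is no translation
  if (translations && lang in translations) {
    return { state: 'translated', text: translations[lang] };
  }
  return { state: 'not_scheduled' };
}

const sentence5 = {
  sid: 5,
  translations: { 'en-US': 'Sensitive sentence' },
  translation_errors: { ja: 'llm_content_filtered' },
};
console.log(translationState(sentence5, 'ja').state);    // failed
console.log(translationState(sentence5, 'fr-FR').state); // not_scheduled
```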
If the user has edited the original text of a sentence (via PATCH /api/v1/recordings/{id}/entries/{sid}), it carries additional original_text_raw and original_text_edited_at fields (present only after editing):
{
  "sid": 7,
  "origin": "Corrected text",
  "original_text_raw": "Original STT output",
  "original_text_edited_at": "2026-05-06T10:30:00.000000Z",
  "translations": { "en-US": "Corrected text" },
  "start_time": "00:35",
  "speaker_id": "speaker_1",
  "speaker_label": "Manager Wang"
}
| Field | Type | Description |
|---|---|---|
| sid | number | Sentence ID |
| origin | string | Original text content (the user-corrected version if it has been edited) |
| translations | object \| null | Map of translated text ({"language_code": "translated text"}); null when there is no translation |
| translation_errors | object | Optional. Map of translation failure error codes ({"language_code": "error_code"}); this field is omitted when there are no failures |
| original_text_raw | string | Optional. The raw STT output text. Present only when the user has edited the sentence. The front end can use it to display an "edited" marker and offer a "restore original text" function |
| original_text_edited_at | string | Optional. The most recent edit time of the original text (ISO 8601). Appears together with original_text_raw |
| start_time | string | Start time (mm:ss) |
| speaker_id | string \| null | The original speaker ID (immutable and always stable, e.g. speaker_1). Provided as the source for target_speaker_id in PATCH /speakers/reassign (v1.5.3 reversal: previously the display name) |
| speaker_label | string \| null | The display label (the human-readable name after applying speaker_aliases, e.g. Manager Wang). Equal to speaker_id when there is no alias (added in v1.5.3 to replace the former display semantics of speaker_id) |
Front-end detection: Determine whether a sentence has been edited by the presence of the field ('original_text_raw' in data or data.original_text_raw !== undefined). Do not compare origin === original_text_raw: the user may have edited the text and then changed it back to the same string, in which case the two values are equal but the "edited" marker should still be shown.
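The presence check can be wrapped in a one-line helper. The sketch below also shows the case where a text comparison would give the wrong answer; `isEdited` is a hypothetical name and the sample payloads are made up for illustration.

```javascript
// Hypothetical helper: a sentence counts as edited when the field exists at all.
function isEdited(sentence) {
  return 'original_text_raw' in sentence;
}

// Edited and then changed back: origin equals original_text_raw, yet the field is present.
const revertedSentence = { sid: 7, origin: 'Hello', original_text_raw: 'Hello' };
const untouchedSentence = { sid: 8, origin: 'Hello' };

console.log(isEdited(revertedSentence));  // true
console.log(isEdited(untouchedSentence)); // false
console.log(revertedSentence.origin !== revertedSentence.original_text_raw); // false (text comparison misses the edit)
```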
v1.5.3 naming reversal: speaker_id is reversed from the display name to the original ID; a new speaker_label field holds the display label. Speaker edits (reassign/merge) always use speaker_id as the locating key. See the V1.5.3 changelog.
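In client code this split usually means keying any per-speaker state by speaker_id and reading speaker_label only when rendering. A minimal sketch, assuming sentences without a speaker (speaker_id null) are simply left out of the grouping; `groupBySpeaker` is a hypothetical helper name.

```javascript
// Hypothetical helper: group sentence IDs under the stable speaker_id,
// carrying speaker_label along purely for display.
function groupBySpeaker(sentences) {
  const groups = new Map();
  for (const sentence of sentences) {
    if (sentence.speaker_id == null) continue; // no speaker attribution
    if (!groups.has(sentence.speaker_id)) {
      groups.set(sentence.speaker_id, {
        label: sentence.speaker_label ?? sentence.speaker_id,
        sids: [],
      });
    }
    groups.get(sentence.speaker_id).sids.push(sentence.sid);
  }
  return groups;
}

const grouped = groupBySpeaker([
  { sid: 1, speaker_id: 'speaker_1', speaker_label: 'Manager Wang' },
  { sid: 2, speaker_id: 'speaker_2', speaker_label: 'speaker_2' },
  { sid: 3, speaker_id: 'speaker_1', speaker_label: 'Manager Wang' },
  { sid: 4, speaker_id: null, speaker_label: null },
]);
console.log(grouped.get('speaker_1')); // { label: 'Manager Wang', sids: [ 1, 3 ] }
```

Keying by speaker_id keeps the grouping valid after a rename, since only the label changes.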
init_summary
In addition to the summary text (text), this event includes mode-aware metadata (mode / template / plain_text / prompt_snapshot), which lets the client trace the mode, the effective slug, and, in custom mode, the customer prompt content that correspond to that summary.
v1.5.5 adds fallback_level / dropped_segments: these appear only when the summary actually went through the LLM service content-filter fallback chain (L2 neutral prompt or L3 segment trimming), for auditing and UI hints during history playback.
Example (L1 succeeds directly, no fallback):
{
  "text": "Summary content...",
  "mode": "custom",
  "template": "acme-meeting-v2",
  "plain_text": true,
  "prompt_snapshot": "Please emphasize KPIs"
}
Example (L3 triggered, generated after 2 transcript segments were trimmed):
{
  "text": "Summary content (2 segments omitted)...",
  "mode": "custom",
  "template": "acme-meeting-v2",
  "plain_text": true,
  "prompt_snapshot": "Please emphasize KPIs",
  "fallback_level": 3,
  "dropped_segments": [3, 7]
}
| Field | Type | Description |
|---|---|---|
| text | string | Summary text |
| mode | string \| null | "builtin" / "custom" / null (null when no summary was generated) |
| template | string \| null | Effective slug: the built-in template slug in builtin mode, the customer slug in custom mode |
| plain_text | boolean | Whether the output is plain text |
| prompt_snapshot | string \| null | Has a value only in custom mode; the prompt content passed in verbatim by the customer (the basis for reconstruction) |
| fallback_level | int (omit) | Present only when a fallback was triggered (2 or 3). 2 = L2 neutral prompt; 3 = L3 segment trimming. Omitted when L1 succeeds directly |
| dropped_segments | int[] (omit) | Present only when fallback_level=3; the indices of the trimmed transcript segments (an integer array in original order) |
fallback_level / dropped_segments and prompt_snapshot are complementary: the former records the actual execution path (whether a fallback was taken), and the latter records the customer intent (the original prompt content). Even if a fallback was triggered and the customer prompt was not actually used, prompt_snapshot still preserves the original text as an audit record. See the V1.5.5 changelog, "LLM service content-filter automatic fallback".
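For the UI hint during history playback, the fallback fields can be turned into a short notice. The sketch below assumes the wording of the messages, which is illustrative only; `describeSummary` is a hypothetical helper name.

```javascript
// Hypothetical helper: derive a playback hint from an init_summary payload.
function describeSummary(summary) {
  if (summary.mode === null) return 'No summary was generated';
  if (summary.fallback_level === 3) {
    const dropped = summary.dropped_segments ?? [];
    return `Generated after trimming ${dropped.length} transcript segment(s): ${dropped.join(', ')}`;
  }
  if (summary.fallback_level === 2) return 'Generated with a neutral prompt (L2 fallback)';
  // fallback_level is omitted when L1 succeeds directly.
  return 'Generated directly (no fallback)';
}

const trimmedSummary = {
  text: 'Summary content (2 segments omitted)...',
  mode: 'custom',
  template: 'acme-meeting-v2',
  plain_text: true,
  prompt_snapshot: 'Please emphasize KPIs',
  fallback_level: 3,
  dropped_segments: [3, 7],
};
console.log(describeSummary(trimmedSummary));
// Generated after trimming 2 transcript segment(s): 3, 7
```

The helper never reads prompt_snapshot: that field records what the customer asked for, not which path produced the text.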
init_done
{
  "totalSentences": 10
}
| Field | Type | Description |
|---|---|---|
totalSentences | number | Total number of sentences |
Edge Case: No Speech Content (V1.3.7)
If the speech recognition engine recognizes no sentences (the task is silent throughout, the volume is too low, there is too much noise, or the recognition language does not match the actual audio), this endpoint still completes with the normal event sequence rather than returning an sse_transcript_not_found error:
- init_metadata is sent normally
- init_sentence is sent 0 times (no sentences)
- The text of init_summary is an empty string ""
- The totalSentences of init_done is 0
This behavior applies to both sources, real-time recording (when the WebSocket recording ends) and file import (when offline processing completes), and is consistent with the V1.3.5 import flow, which treats zero recognition results as a valid outcome. The client should use totalSentences === 0 to decide whether to show a "no speech content" empty state, rather than treating it as an error. See File Import Guide, "Behavior When Audio Cannot Be Recognized".
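The empty-state decision can be made in one place when init_done arrives. This is a sketch under one labeled assumption: that totalSentences equals the number of init_sentence events sent, so a mismatch points to a client-side loading problem. `historyViewState` and its return values are hypothetical.

```javascript
// Hypothetical helper: decide what the history view should show once init_done arrives.
// `receivedCount` is the number of init_sentence events the client has processed.
function historyViewState(receivedCount, initDone) {
  if (initDone.totalSentences === 0) return 'no_speech'; // empty state, not an error
  // Assumption: totalSentences matches the number of init_sentence events sent.
  if (initDone.totalSentences !== receivedCount) return 'count_mismatch';
  return 'loaded';
}

console.log(historyViewState(0, { totalSentences: 0 }));   // no_speech
console.log(historyViewState(10, { totalSentences: 10 })); // loaded
```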
Specific Error Codes
| Error Code | HTTP Status | Description | Recommended Action |
|---|---|---|---|
| recording_not_found | 404 | Recording not found | Verify that taskId is correct |
| sse_transcript_not_found | 404 | Transcript blob not found | The transcript file for the specified taskId does not exist or could not be accessed (does not occur under the normal flow; after V1.3.7, silence during real-time recording does not trigger this error either) |
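A client might keep the two codes in a lookup so that each one maps to a user-facing note. The sketch below takes the error code as an already-extracted string, because the shape of the error response body is not described in this section; `describeError` and the message wording are hypothetical.

```javascript
// Notes paraphrased from the error table above; the wording is illustrative.
const ERROR_NOTES = new Map([
  ['recording_not_found', 'Recording not found. Verify that taskId is correct.'],
  ['sse_transcript_not_found', 'The transcript file does not exist or could not be accessed.'],
]);

// Hypothetical helper: how the code is read from the response is left to the caller.
function describeError(errorCode) {
  return ERROR_NOTES.get(errorCode) ?? `Unrecognized error code: ${errorCode}`;
}

console.log(describeError('recording_not_found'));
// Recording not found. Verify that taskId is correct.
```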
Front-End Example
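async function loadHistory(taskId, apiKey) {
  const response = await fetch(
    `https://vas-poc.vurbo.ai/api/v1/sse/history/transcribe/${taskId}`,
    {
      headers: {
        'X-API-Key': apiKey
      }
    }
  );
  if (!response.ok) {
    // e.g. 404 recording_not_found or sse_transcript_not_found
    throw new Error(`History request failed with HTTP ${response.status}`);
  }
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // stream: true keeps multi-byte characters intact when they span two chunks
    buffer += decoder.decode(value, { stream: true });
    // An event ends with a blank line; keep any trailing partial event for the next read
    const blocks = buffer.split(/\r?\n\r?\n/);
    buffer = blocks.pop();
    // Parse the SSE format: event: xxx\ndata: {...}\n\n
    const events = blocks.map(parseSSE).filter(Boolean);
    for (const event of events) {
      if (event.type === 'init_metadata') {
        console.log('Task info:', event.data.title);
      } else if (event.type === 'init_sentence') {
        console.log(`[${event.data.start_time}] ${event.data.origin}`);
        if (event.data.translations) {
          console.log(`Translations:`, event.data.translations);
        }
      } else if (event.type === 'init_summary') {
        console.log('Summary:', event.data.text);
      } else if (event.type === 'init_done') {
        console.log(`Load complete, ${event.data.totalSentences} sentences total`);
      }
    }
  }
}

// Parses one complete SSE block into { type, data }; returns null for blocks
// without a data line. Assumes the data payload is JSON, as in Event Formats.
function parseSSE(block) {
  let type = 'message';
  const dataLines = [];
  for (const line of block.split(/\r?\n/)) {
    if (line.startsWith('event:')) type = line.slice(6).trim();
    else if (line.startsWith('data:')) dataLines.push(line.slice(5).trim());
  }
  return dataLines.length > 0 ? { type, data: JSON.parse(dataLines.join('\n')) } : null;
}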
Version: V1.5.7 Last Updated: 2026-05-20