History Playback
Table of Contents
- Overview
- Get Task List
- Load Historical Transcript
- Audio Playback
- Retranslation
- Summary Retranslation
- TTS Playback
- Complete Flow Diagram
- Related Documents
Overview
The VAS history feature lets you load past speech recognition results (transcripts, translations, and summaries), play back the original audio, and retranslate the content.
History data comes from two sources:
- Real-time voice translation: Tasks created after completing a recording over WebSocket
- Audio import: Tasks created after uploading and processing an audio file via the REST API
Both produce a task_id once completed, and all subsequent operations work exactly the same way.
APIs Involved
| API | Purpose |
|---|---|
| GET /api/v1/tasks | Get task list |
| GET /api/v1/sse/history/transcribe/{taskId} | Load historical transcript (SSE stream) |
| GET /api/v1/sse/audio/{taskId} | Audio streaming playback (supports Range Request) |
| GET /api/v1/sse/retranslate/{taskId} | Retranslate full transcript (SSE stream) |
| GET /api/v1/sse/retranslate/summary/{taskId} | Retranslate summary (SSE stream) |
| GET /api/v1/sse/tts/{taskId} | TTS audio streaming playback |
| GET /api/v1/tasks/{taskId}/audio/export | Download original audio file (save offline) |
| GET /api/v1/tasks/{taskId}/transcript/export | Download transcript (TXT / SRT / SBV / VTT / CSV) |
Authentication
All APIs are authenticated via the X-API-Key header. See Authentication for details.
Note: The browser's native EventSource API does not support custom headers, so the SSE APIs must be read using the fetch API together with ReadableStream.
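Because the stream has to be read manually, a small incremental parser can be handy. The following is a minimal sketch, not part of any VAS SDK: `createSSEParser` is a hypothetical helper name, and it assumes each event carries one `event:` line and one single-line JSON `data:` payload, as in the examples in this guide.

```javascript
// Hypothetical helper: incrementally parses SSE text chunks into events.
// Assumes one `event:` line and one single-line JSON `data:` line per event.
function createSSEParser() {
  let buffer = '';
  return {
    // Feed a decoded text chunk; returns the events completed by this chunk.
    feed(chunk) {
      // Normalize CRLF so that events always end with "\n\n".
      buffer = (buffer + chunk).replace(/\r\n/g, '\n');
      const blocks = buffer.split('\n\n');
      buffer = blocks.pop(); // the last block may be incomplete
      const events = [];
      for (const block of blocks) {
        let type = '';
        let data = '';
        for (const line of block.split('\n')) {
          if (line.startsWith('event:')) type = line.slice(6).trim();
          else if (line.startsWith('data:')) data = line.slice(5).trim();
        }
        if (type && data) events.push({ type, data: JSON.parse(data) });
      }
      return events;
    },
  };
}
```

Inside a `reader.read()` loop, each decoded chunk would be passed to `parser.feed(...)` and the returned events handled one by one. An event split across two network chunks is only emitted once its terminating blank line arrives.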
Get Task List
First, retrieve all of the user's tasks and find the task_id you want to play back.
Request
curl -X GET "https://vas-poc.vurbo.ai/api/v1/tasks" \
-H "X-API-Key: YOUR_API_KEY"
Response
{
  "tasks": [
    {
      "task_id": "550e8400-e29b-41d4-a716-446655440000",
      "title": "Product Planning Meeting",
      "type": "transcribe",
      "duration_ms": 3600000,
      "duration_formatted": "60:00",
      "source_lang": "zh-TW",
      "target_lang": "en-US",
      "created_at": "2026-02-20T10:00:00Z",
      "is_pinned": false,
      "is_unread": true
    }
  ]
}
Key Fields
| Field | Description |
|---|---|
| task_id | Task ID (UUID), the key for all subsequent operations |
| title | Task title |
| type | Recording type: transcribe, conversation, record, broadcast |
| duration_ms | Recording duration (milliseconds) |
| source_lang | Source language |
| target_lang | Target language |
| is_pinned | Whether the task is pinned |
| is_unread | Whether the task is unread |
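One way to turn the response into a pick list is to show pinned tasks first and then the newest ones. The sketch below uses only fields that appear in the response above; `sortTasksForDisplay` and `countUnread` are hypothetical names, and the ordering is an illustrative UI choice rather than something the API prescribes.

```javascript
// Hypothetical helper: pinned tasks first, then newest first by created_at.
// Returns a new array and leaves the input untouched.
function sortTasksForDisplay(tasks) {
  return [...tasks].sort((a, b) => {
    if (a.is_pinned !== b.is_pinned) return a.is_pinned ? -1 : 1;
    return Date.parse(b.created_at) - Date.parse(a.created_at);
  });
}

// Count unread tasks, e.g. for a badge in the UI.
function countUnread(tasks) {
  return tasks.filter((task) => task.is_unread).length;
}
```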
Related Operations
| Operation | API | Description |
|---|---|---|
| Delete task | DELETE /api/v1/tasks/{taskId} | Soft delete |
| Pin task | PUT /api/v1/tasks/{taskId}/pin | Mark as important |
| Mark as read | PUT /api/v1/tasks/{taskId}/read | Clear the unread flag |
| Update name | PATCH /api/v1/tasks/{taskId}/name | Customize the task title |
Load Historical Transcript
Use an SSE stream to load the complete transcript of a given task, including the original text, translations, and summary.
Request
const response = await fetch(
  `https://vas-poc.vurbo.ai/api/v1/sse/history/transcribe/${taskId}`,
  {
    headers: { 'X-API-Key': apiKey }
  }
);
Note: The transcribe endpoint applies to all recording types (transcribe, conversation, record), not just the transcribe type.
Event Sequence
The SSE stream pushes the following events in order:
connected → init_metadata → init_sentence × N → init_summary → init_done
| Order | Event | Description | Count |
|---|---|---|---|
| 1 | connected | Connection confirmation | 1 time |
| 2 | init_metadata | Task metadata | 1 time |
| 3 | init_sentence | Per-sentence push (original + translation) | N times |
| 4 | init_summary | Summary content | 0–1 times |
| 5 | init_done | Initialization complete | 1 time |
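A client that wants to detect a malformed stream can check incoming event types against the order in the table above. This is a minimal sketch under stated assumptions: `createHistoryOrderChecker` is a hypothetical name, the check is lenient about skipped events (for example a missing init_summary), and it rejects event types not listed here, which may be too strict if the server emits others.

```javascript
// Expected order of event types in the history stream, per the table above.
const HISTORY_ORDER = ['connected', 'init_metadata', 'init_sentence', 'init_summary', 'init_done'];

// Hypothetical helper: returns a function that accepts event types one by one
// and returns false as soon as an event arrives out of order.
function createHistoryOrderChecker() {
  let stage = -1; // index in HISTORY_ORDER of the latest event seen
  return function accept(eventType) {
    const index = HISTORY_ORDER.indexOf(eventType);
    if (index === -1) return false; // event type not listed in this guide
    const repeatable = eventType === 'init_sentence'; // the only event sent N times
    if (index < stage || (index === stage && !repeatable)) return false;
    stage = index;
    return true;
  };
}
```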
Event Formats
connected
event: connected
data: {"message": "History service connected (recordingId: xxx)"}
init_metadata
event: init_metadata
data: {"task_id": "550e8400...", "title": "Meeting Notes", "created_at": "2026-02-20T10:00:00Z", "type": "transcribe", "has_speaker_diarization": false, "transcription_languages": ["zh-TW"], "translation_languages": ["en-US"], "summary_template": "general", "summary_language": "zh-TW"}
| Field | Description |
|---|---|
| task_id | Task ID |
| title | Task title |
| type | Recording type |
| has_speaker_diarization | Whether speaker diarization (multi-speaker mode) is enabled |
| transcription_languages | Transcription language array (BCP 47, e.g. ["zh-TW"]), up to 2 |
| translation_languages | Translation language array (BCP 47, e.g. ["en-US"]), up to 8 |
| summary_template | Summary template slug; null when not specified |
| summary_language | Summary output language (BCP 47); null when not specified |
init_sentence
event: init_sentence
data: {"sid": 1, "origin": "你好,很高興認識你", "translations": {"en-US": "Hello, nice to meet you"}, "start_time": "00:05", "speaker_id": "0"}
If a sentence has a translation failure (content filtered, provider error, etc.), it carries an additional translation_errors field (only present on failure):
event: init_sentence
data: {"sid": 5, "origin": "敏感詞句子", "translations": {"en-US": "Sensitive sentence"}, "translation_errors": {"ja": "llm_content_filtered"}, "start_time": "00:25", "speaker_id": "0"}
| Field | Description |
|---|---|
| sid | Sentence number |
| origin | Original text (recognition result) |
| translations | Translation result map keyed by language code (may be null) |
| translation_errors | Optional. Map of translation failure error codes keyed by language code. The frontend can distinguish "translation not scheduled for that language" (key missing) from "translation attempted but failed" (key present) |
| start_time | Sentence start time (mm:ss) |
| speaker_id | Speaker ID |
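The distinction between a failed translation and one that was never scheduled can be expressed as a small lookup. The sketch below follows the field semantics in the table above; `translationState` and the status labels it returns are hypothetical names chosen for illustration.

```javascript
// Hypothetical helper: classify the translation state of one language
// for a single init_sentence payload.
function translationState(sentence, lang) {
  const translations = sentence.translations || {};  // may be null
  const errors = sentence.translation_errors || {};  // only present on failure
  const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);
  if (has(errors, lang)) {
    return { status: 'failed', code: errors[lang] };
  }
  if (has(translations, lang)) {
    return { status: 'ok', text: translations[lang] };
  }
  return { status: 'not_scheduled' };
}
```

With the failure example above, `translationState(data, 'ja')` would report a failure with code `llm_content_filtered`, while `translationState(data, 'en-US')` would return the translated text.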
init_summary
event: init_summary
data: {"text": "This is a summary of the meeting notes..."}
init_done
event: init_done
data: {"totalSentences": 42}
Frontend Example
async function loadHistory(taskId, apiKey) {
  const response = await fetch(
    `https://vas-poc.vurbo.ai/api/v1/sse/history/transcribe/${taskId}`,
    { headers: { 'X-API-Key': apiKey } }
  );
  if (!response.ok) throw new Error(`History request failed: HTTP ${response.status}`);
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    // Parse SSE format (events are separated by double newlines)
    const events = buffer.split('\n\n');
    buffer = events.pop(); // The last segment may be incomplete
    for (const eventStr of events) {
      const lines = eventStr.split('\n');
      let eventType = '';
      let eventData = '';
      for (const line of lines) {
        if (line.startsWith('event: ')) eventType = line.slice(7);
        if (line.startsWith('data: ')) eventData = line.slice(6);
      }
      if (!eventType || !eventData) continue;
      const data = JSON.parse(eventData);
      switch (eventType) {
        case 'init_metadata':
          console.log(`Task: ${data.title} (${data.type})`);
          break;
        case 'init_sentence':
          console.log(`[${data.start_time}] ${data.origin}`);
          // translations is a map keyed by language code and may be null
          for (const [lang, text] of Object.entries(data.translations ?? {})) {
            console.log(`  → [${lang}] ${text}`);
          }
          break;
        case 'init_summary':
          console.log(`Summary: ${data.text}`);
          break;
        case 'init_done':
          console.log(`Load complete, ${data.totalSentences} sentences total`);
          break;
      }
    }
  }
}
Audio Playback
Use the Audio API to play back a task's recording, with support for HTTP Range Request to enable seek playback.
Basic Playback
async function playAudio(taskId, apiKey) {
  const response = await fetch(
    `https://vas-poc.vurbo.ai/api/v1/sse/audio/${taskId}`,
    { headers: { 'X-API-Key': apiKey } }
  );
  const blob = await response.blob();
  const audioUrl = URL.createObjectURL(blob);
  const audio = new Audio(audioUrl);
  audio.play();
}
Response Format
| Scenario | HTTP Status Code | Description |
|---|---|---|
| Full file | 200 | Returns the complete audio |
| Partial file | 206 | Returns the requested range of audio (Range Request) |
Response headers:
Content-Type: audio/mp4 (all recording audio files are returned in an M4A container)
Content-Length: 1234567
Accept-Ranges: bytes
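The 206 case corresponds to a standard HTTP Range request. The sketch below builds the Range header and wraps the call; `buildRangeHeader` and `fetchAudioRange` are hypothetical names, the base URL is the one used elsewhere in this guide, and the error handling is an assumption, since the error response format is not described here.

```javascript
// Build a standard HTTP Range header value, e.g. "bytes=0-1023".
// Omitting `end` requests everything from `start` to the end of the file.
function buildRangeHeader(start, end) {
  if (!Number.isInteger(start) || start < 0) {
    throw new RangeError('start must be a non-negative integer');
  }
  if (end !== undefined && (!Number.isInteger(end) || end < start)) {
    throw new RangeError('end must be an integer >= start');
  }
  return end === undefined ? `bytes=${start}-` : `bytes=${start}-${end}`;
}

// Hypothetical wrapper: fetch a byte range of the recording.
async function fetchAudioRange(taskId, apiKey, start, end) {
  const response = await fetch(
    `https://vas-poc.vurbo.ai/api/v1/sse/audio/${taskId}`,
    { headers: { 'X-API-Key': apiKey, Range: buildRangeHeader(start, end) } }
  );
  // 206 means the server honored the range; 200 means it sent the full file.
  if (response.status !== 206 && response.status !== 200) {
    throw new Error(`Audio request failed with HTTP ${response.status}`);
  }
  return { partial: response.status === 206, bytes: await response.arrayBuffer() };
}
```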
Range Request (Seek Playback)
An HTML5 <audio> element handles Range Requests automatically when its src points at the audio URL, but it cannot send the X-API-Key header. The recommended approach is to fetch the audio with the header and play it from a Blob URL:
const audio = document.createElement('audio');
// Not recommended: pointing src at the API URL lets the browser issue Range
// Requests, but it cannot attach the X-API-Key header to them.
// audio.src = `https://vas-poc.vurbo.ai/api/v1/sse/audio/${taskId}`;
// Recommended: use the Blob URL approach
const response = await fetch(
  `https://vas-poc.vurbo.ai/api/v1/sse/audio/${taskId}`,
  { headers: { 'X-API-Key': apiKey } }
);
const blob = await response.blob();
audio.src = URL.createObjectURL(blob);
audio.controls = true;
document.body.appendChild(audio);
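Seeking is often driven by the transcript, for example jumping to a sentence when the user clicks it. The sketch below converts the `start_time` value of an init_sentence event into seconds and applies it to an audio element. `startTimeToSeconds` and `seekToSentence` are hypothetical names, and the parser assumes the mm:ss format documented for `start_time`, with minutes allowed to exceed 59 for long recordings; any other format returns null.

```javascript
// Convert an init_sentence start_time ("mm:ss") into seconds.
// Assumption: minutes may exceed 59; other formats return null.
function startTimeToSeconds(startTime) {
  const match = /^(\d+):([0-5]\d)$/.exec(startTime);
  if (!match) return null;
  return Number(match[1]) * 60 + Number(match[2]);
}

// Hypothetical helper: jump an <audio> element to the start of a sentence.
// Returns the target position in seconds, or null if start_time is unparseable.
function seekToSentence(audio, sentence) {
  const seconds = startTimeToSeconds(sentence.start_time);
  if (seconds !== null) audio.currentTime = seconds;
  return seconds;
}
```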
Common Errors
| Error Code | Description | How to Handle |
|---|---|---|
| recording_not_found | Recording not found | Verify the taskId is correct |
| recording_audio_not_ready | Recording audio not ready | Retry later |
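"Retry later" can be implemented with exponential backoff. The following is a sketch only: `backoffDelays` and `retryWithBackoff` are hypothetical helpers, the delay values are arbitrary, and because this guide does not specify how the error code is delivered (HTTP status or response body), detecting `recording_audio_not_ready` is left to a caller-supplied `isRetryable` function.

```javascript
// Delays (ms) for exponential backoff: base, 2*base, 4*base, ... capped at maxMs.
function backoffDelays(attempts, baseMs = 1000, maxMs = 15000) {
  return Array.from({ length: attempts }, (_, i) => Math.min(baseMs * 2 ** i, maxMs));
}

// Hypothetical helper: call `attempt()` until it succeeds or retries run out.
// `isRetryable(error)` decides whether an error (for example one representing
// recording_audio_not_ready) is worth retrying.
async function retryWithBackoff(attempt, isRetryable, delays = backoffDelays(4)) {
  for (const delay of delays) {
    try {
      return await attempt();
    } catch (error) {
      if (!isRetryable(error)) throw error;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  return attempt(); // final try; errors propagate to the caller
}
```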
Retranslation
Retranslate all sentences of a task into a specified target language. Useful for switching the display language or refreshing translations.
Request
GET /api/v1/sse/retranslate/{taskId}?targetLang=ja-JP
const response = await fetch(
  `https://vas-poc.vurbo.ai/api/v1/sse/retranslate/${taskId}?targetLang=ja-JP`,
  { headers: { 'X-API-Key': apiKey } }
);
| Parameter | Type | Required | Description |
|---|---|---|---|
| taskId | string | Yes | Task ID (path parameter) |
| targetLang | string | Yes | Target language code (e.g. ja-JP) |
Event Sequence
translation × N → done
translation event
event: translation
data: {"sid": 1, "text": "こんにちは、お会いできて嬉しいです", "is_final": true}
| Field | Description |
|---|---|
| sid | Sentence number (corresponds to the sid in the original transcript) |
| text | New translation result |
| is_final | Whether this is the final result |
done event
event: done
data: {"totalUpdated": 42}
Frontend Example
async function retranslate(taskId, targetLang, apiKey) {
  const response = await fetch(
    `https://vas-poc.vurbo.ai/api/v1/sse/retranslate/${taskId}?targetLang=${targetLang}`,
    { headers: { 'X-API-Key': apiKey } }
  );
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const events = buffer.split('\n\n');
    buffer = events.pop();
    for (const eventStr of events) {
      const lines = eventStr.split('\n');
      let eventType = '';
      let eventData = '';
      for (const line of lines) {
        if (line.startsWith('event: ')) eventType = line.slice(7);
        if (line.startsWith('data: ')) eventData = line.slice(6);
      }
      if (!eventType || !eventData) continue;
      const data = JSON.parse(eventData);
      if (eventType === 'translation') {
        // Update the translation for the matching sid in the UI
        updateTranslation(data.sid, data.text);
      } else if (eventType === 'done') {
        console.log(`Retranslation complete, ${data.totalUpdated} sentences updated`);
      }
    }
  }
}
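The `updateTranslation` function called in the example is application code and is not defined in this guide. One possible shape, assuming the app keeps sentences in an in-memory map keyed by sid, is sketched below. The store, `addSentence`, and the extra `targetLang` argument are all hypothetical; the example above would need to pass the language it requested as the third argument.

```javascript
// Hypothetical in-memory store: sid -> { origin, translations }.
const sentences = new Map();

// Record a sentence from an init_sentence event.
function addSentence(data) {
  sentences.set(data.sid, {
    origin: data.origin,
    translations: { ...(data.translations || {}) }, // translations may be null
  });
}

// Apply one `translation` event for the language being retranslated.
// Returns false when the sid is unknown so the caller can log or ignore it.
function updateTranslation(sid, text, targetLang) {
  const sentence = sentences.get(sid);
  if (!sentence) return false;
  sentence.translations[targetLang] = text;
  return true;
}
```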
Common Errors
| Error Code | Description |
|---|---|
| sse_missing_target_lang | Missing targetLang parameter |
| sse_unsupported_language | Unsupported target language |
| sse_translation_failed | Translation service failed, retry later |
Summary Retranslation
Retranslate a task's summary into a specified language.
Request
GET /api/v1/sse/retranslate/summary/{taskId}?targetLang=ja-JP
const response = await fetch(
  `https://vas-poc.vurbo.ai/api/v1/sse/retranslate/summary/${taskId}?targetLang=ja-JP`,
  { headers: { 'X-API-Key': apiKey } }
);
| Parameter | Type | Required | Description |
|---|---|---|---|
| taskId | string | Yes | Task ID (path parameter) |
| targetLang | string | Yes | Target language code |
Event Sequence
summary_translation × N → done
summary_translation event
event: summary_translation
data: {"text": "Accumulated translation result...", "is_final": false}
The summary translation is pushed as a stream. is_final: false means translation is still in progress, while is_final: true or receiving the done event indicates completion.
done event
event: done
data: {"totalUpdated": 1}
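Handling this stream comes down to keeping the latest text and a completion flag. The sketch below assumes, based on the "Accumulated translation result" wording in the example payload, that each summary_translation event carries the full text so far, so the newest text replaces the previous one rather than being appended. `reduceSummaryEvent` and `initialSummaryState` are hypothetical names.

```javascript
// Hypothetical starting state for a summary retranslation in progress.
const initialSummaryState = { text: '', complete: false };

// Hypothetical reducer for the summary retranslation stream.
// Assumes each summary_translation event carries the accumulated text.
function reduceSummaryEvent(state, eventType, data) {
  if (eventType === 'summary_translation') {
    return { text: data.text, complete: state.complete || data.is_final === true };
  }
  if (eventType === 'done') {
    return { ...state, complete: true };
  }
  return state; // ignore other event types
}
```

A UI can re-render the summary from `state.text` after every event and hide its loading indicator once `state.complete` is true.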
Common Errors
| Error Code | Description |
|---|---|
| sse_summary_not_found | The task has no summary |
| sse_summary_translation_failed | Summary translation failed, retry later |
TTS Playback
Convert the translated content of a historical recording into TTS audio for playback. Supports single-sentence or continuous multi-sentence playback.
Request
// Single-sentence playback
const singleResponse = await fetch(
  `https://vas-poc.vurbo.ai/api/v1/sse/tts/${taskId}?language=en-US&sid=1`,
  { headers: { 'X-API-Key': apiKey } }
);
// Multi-sentence playback (start from sentence 5, play 3 sentences)
const multiResponse = await fetch(
  `https://vas-poc.vurbo.ai/api/v1/sse/tts/${taskId}?language=en-US&sid=5&length=3`,
  { headers: { 'X-API-Key': apiKey } }
);
| Parameter | Type | Required | Description |
|---|---|---|---|
| taskId | string | Yes | Task ID (path parameter) |
| language | string | Yes | TTS output language (e.g. en-US) |
| voice | string | No | Specify the voice name (e.g. en-US-JennyNeural) |
| sid | int | No | Starting sentence ID (default 1) |
| length | int | No | Number of sentences to play (default 1, max 20) |
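Validating these parameters on the client avoids a round trip for obviously invalid requests. The sketch below applies the documented defaults and the maximum length of 20. `buildTtsQuery` is a hypothetical helper, and rejecting sid values below 1 is an assumption based on the default of 1.

```javascript
// Hypothetical helper: build the TTS query string from the documented parameters.
// Applies the documented defaults (sid = 1, length = 1) and the maximum length of 20.
function buildTtsQuery({ language, voice, sid = 1, length = 1 }) {
  if (!language) throw new Error('language is required');
  if (!Number.isInteger(sid) || sid < 1) {
    throw new RangeError('sid must be a positive integer'); // assumption: sids start at 1
  }
  if (!Number.isInteger(length) || length < 1 || length > 20) {
    throw new RangeError('length must be an integer between 1 and 20');
  }
  const params = new URLSearchParams({ language, sid: String(sid), length: String(length) });
  if (voice) params.set('voice', voice);
  return params.toString();
}
```

The result can be appended to the TTS endpoint URL after a `?`.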
Event Sequence
connected → tts_audio × N → tts_done
tts_audio event
event: tts_audio
data: {"sid": 5, "transcript": "你好", "text": "Hello", "audio": "Base64...", "duration_ms": 2500, "boundaries": [...]}
| Field | Description |
|---|---|
| sid | Sentence ID |
| transcript | Original transcript |
| text | Translated text (source for TTS synthesis) |
| audio | Base64-encoded MP3 audio |
| duration_ms | Audio duration (milliseconds) |
| boundaries | Word Boundary array (can be used for karaoke effects) |
tts_done event
event: tts_done
data: {"sentences_sent": 3, "total_duration_ms": 7500}
Frontend Playback Example
async function playTTS(taskId, language, sid, length, apiKey) {
  const url = new URL(`https://vas-poc.vurbo.ai/api/v1/sse/tts/${taskId}`);
  url.searchParams.set('language', language);
  url.searchParams.set('sid', String(sid));
  url.searchParams.set('length', String(length));
  const response = await fetch(url, {
    headers: { 'X-API-Key': apiKey }
  });
  // Play the audio after parsing the SSE events
  // ...(SSE parsing logic same as above)
  // When a tts_audio event is received:
  function handleTTSAudio(data) {
    // Decode the Base64 payload into raw bytes
    const binaryString = atob(data.audio);
    const bytes = new Uint8Array(binaryString.length);
    for (let i = 0; i < binaryString.length; i++) {
      bytes[i] = binaryString.charCodeAt(i);
    }
    // 'audio/mpeg' is the registered MIME type for MP3
    const blob = new Blob([bytes], { type: 'audio/mpeg' });
    const audio = new Audio(URL.createObjectURL(blob));
    audio.play();
  }
}
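When `length` is greater than 1, several tts_audio events arrive in one stream, and starting each clip as soon as it arrives would make them overlap. One way to avoid that is a small queue that starts the next clip only when the previous one ends. This is a sketch, not the documented client behavior: `base64ToBytes`, `createPlaybackQueue`, and `playInBrowser` are hypothetical names.

```javascript
// Decode a Base64 string (the `audio` field) into bytes.
function base64ToBytes(base64) {
  const binary = atob(base64);
  return Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
}

// Hypothetical queue that plays clips one after another.
// `playClip(clip, onEnded)` starts playback and calls onEnded when finished.
function createPlaybackQueue(playClip) {
  const pending = [];
  let playing = false;
  function next() {
    const clip = pending.shift();
    if (clip === undefined) { playing = false; return; }
    playing = true;
    playClip(clip, next);
  }
  return {
    enqueue(clip) {
      pending.push(clip);
      if (!playing) next();
    },
  };
}

// Browser wiring (assumes the payload is MP3, as documented above).
function playInBrowser(clip, onEnded) {
  const blob = new Blob([base64ToBytes(clip.audio)], { type: 'audio/mpeg' });
  const url = URL.createObjectURL(blob);
  const audio = new Audio(url);
  audio.onended = () => { URL.revokeObjectURL(url); onEnded(); };
  audio.play();
}
```

In a browser, `const queue = createPlaybackQueue(playInBrowser)` followed by `queue.enqueue(data)` for each tts_audio event would play the sentences in arrival order. Revoking the object URL after playback releases the memory held by each clip.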
Complete Flow Diagram
GET /api/v1/tasks            Get task list
│
Select task_id
│
├── Load transcript (SSE History)
│     SSE event sequence:
│     1. connected
│     2. init_metadata
│     3. init_sentence × N
│     4. init_summary
│     5. init_done
│
├── Audio playback (Audio API)
│     HTTP 200 audio stream (Range OK)
│
├── Retranslate (SSE Retranslate)
│     SSE event sequence: translation × N → done
│
├── Summary retranslation (SSE Retranslate/Summary)
│     summary_translation × N → done
│
└── TTS playback (SSE /tts/{id})
      connected → tts_audio × N → tts_done
Typical Usage Flow
1. Call GET /api/v1/tasks to get the task list
2. The user selects a task
3. Call these concurrently:
   a. SSE History API to load the transcript (render each init_sentence as it arrives)
   b. Audio API to prepare audio playback
4. The user can:
   - Play / seek the audio
   - Switch the translation language (call SSE Retranslate)
   - Switch the summary language (call SSE Retranslate Summary)
   - Switch the summary template to regenerate (call SSE Regenerate Summary)
   - Play the translated TTS audio (call SSE TTS)
Related Documents
| Document | Description |
|---|---|
| Authentication | Detailed API Key authentication explanation |
| Tasks API Reference | Complete task management API specification |
| History SSE Reference | Complete historical transcript SSE specification |
| Retranslate SSE Reference | Complete full-transcript/summary retranslation SSE specification |
| Regenerate Summary SSE Reference | Complete SSE specification for switching templates to regenerate the summary |
| Audio Streaming Reference | Complete audio playback API specification |
| TTS Streaming Reference | Complete TTS speech synthesis SSE specification |
| Real-Time Voice Translation | Real-time voice translation guide |
| Audio Import | Audio import guide |
Version: V1.5.7 Last Updated: 2026-05-20