History Playback

Overview

The VAS history feature lets you load past speech recognition results (transcripts, translations, and summaries), play back the original audio, and retranslate the transcript or summary into another language.

History data comes from two sources:

Real-time voice translation: Tasks created after completing a recording over WebSocket
Audio import: Tasks created after uploading and processing an audio file via the REST API

Both produce a task_id once completed, and all subsequent operations work exactly the same way.

APIs Involved

API	Purpose
`GET /api/v1/tasks`	Get task list
`GET /api/v1/sse/history/transcribe/{taskId}`	Load historical transcript (SSE stream)
`GET /api/v1/sse/audio/{taskId}`	Audio streaming playback (supports Range Request)
`GET /api/v1/sse/retranslate/{taskId}`	Retranslate full transcript (SSE stream)
`GET /api/v1/sse/retranslate/summary/{taskId}`	Retranslate summary (SSE stream)
`GET /api/v1/sse/tts/{taskId}`	TTS audio streaming playback
`GET /api/v1/tasks/{taskId}/audio/export`	Download original audio file (save offline)
`GET /api/v1/tasks/{taskId}/transcript/export`	Download transcript (TXT / SRT / SBV / VTT / CSV)

Authentication

All APIs are authenticated via the X-API-Key header. See Authentication for details.

Note: The browser's native EventSource API does not support custom headers, so the SSE APIs must be read using the fetch API together with ReadableStream.
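Because every SSE endpoint in this guide has to be read through fetch, it can help to keep the event parsing in one place. The following is a minimal sketch of an incremental parser; `createSSEParser` is a hypothetical helper name, not part of the VAS API, and it assumes events are separated by a blank line as in the samples in this guide.

```javascript
// Incremental SSE parser: feed it decoded text chunks, get back complete events.
// createSSEParser is a hypothetical helper, not part of the VAS API.
function createSSEParser() {
  let buffer = '';

  return function feed(chunk) {
    // Normalize CRLF so events always split on a blank line.
    buffer = (buffer + chunk).replace(/\r\n/g, '\n');
    const blocks = buffer.split('\n\n');
    buffer = blocks.pop(); // the last block may still be incomplete

    const events = [];
    for (const block of blocks) {
      let event = 'message'; // SSE default when no event field is present
      const dataLines = [];
      for (const line of block.split('\n')) {
        if (line.startsWith('event:')) {
          event = line.slice(6).trim();
        } else if (line.startsWith('data:')) {
          dataLines.push(line.slice(5).replace(/^ /, ''));
        }
      }
      if (dataLines.length > 0) {
        events.push({ event, data: dataLines.join('\n') });
      }
    }
    return events;
  };
}
```

Inside a `reader.read()` loop, pass `decoder.decode(value, { stream: true })` to `feed` and call `JSON.parse` on each returned `data` string.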

Get Task List

First, retrieve all of the user's tasks and find the task_id you want to play back.

Request

curl -X GET "https://vas-poc.vurbo.ai/api/v1/tasks" \
  -H "X-API-Key: YOUR_API_KEY"

Response

{
  "tasks": [
    {
      "task_id": "550e8400-e29b-41d4-a716-446655440000",
      "title": "Product Planning Meeting",
      "type": "transcribe",
      "duration_ms": 3600000,
      "duration_formatted": "60:00",
      "source_lang": "zh-TW",
      "target_lang": "en-US",
      "created_at": "2026-02-20T10:00:00Z",
      "is_pinned": false,
      "is_unread": true
    }
  ]
}

Key Fields

Field	Description
`task_id`	Task ID (UUID), the key for all subsequent operations
`title`	Task title
`type`	Recording type: `transcribe`, `conversation`, `record`, `broadcast`
`duration_ms`	Recording duration (milliseconds)
`source_lang`	Source language
`target_lang`	Target language
`is_pinned`	Whether the task is pinned
`is_unread`	Whether the task is unread
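As an illustration of how these fields might drive a history list, the sketch below orders tasks with pinned items first and newest items next, and counts unread tasks. Both helper names and the ordering policy are assumptions made for this example; the API does not prescribe a display order.

```javascript
// Hypothetical helper: order tasks for a history list UI.
// Pinned tasks come first, then newest first by created_at.
function sortTasksForDisplay(tasks) {
  return [...tasks].sort((a, b) => {
    if (a.is_pinned !== b.is_pinned) return a.is_pinned ? -1 : 1;
    return Date.parse(b.created_at) - Date.parse(a.created_at);
  });
}

// Hypothetical helper: count tasks still flagged as unread.
function countUnread(tasks) {
  return tasks.filter((task) => task.is_unread).length;
}
```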

Tasks in the list can also be managed with the following operations:

Operation	API	Description
Delete task	`DELETE /api/v1/tasks/{taskId}`	Soft delete
Pin task	`PUT /api/v1/tasks/{taskId}/pin`	Mark as important
Mark as read	`PUT /api/v1/tasks/{taskId}/read`	Clear the unread flag
Update name	`PATCH /api/v1/tasks/{taskId}/name`	Customize the task title
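The methods and paths in this table can be captured in a small lookup so that each operation is built the same way. This is a sketch only: `buildTaskRequest` is a hypothetical helper, and request bodies (for example the new title for the rename call) are left out because this guide does not specify them; see the Tasks API Reference for those details.

```javascript
const BASE_URL = 'https://vas-poc.vurbo.ai';

// Method and path for each operation, taken from the operations table.
const TASK_OPERATIONS = {
  delete: { method: 'DELETE', path: (id) => `/api/v1/tasks/${id}` },
  pin: { method: 'PUT', path: (id) => `/api/v1/tasks/${id}/pin` },
  read: { method: 'PUT', path: (id) => `/api/v1/tasks/${id}/read` },
  rename: { method: 'PATCH', path: (id) => `/api/v1/tasks/${id}/name` },
};

// Hypothetical helper: build the URL and fetch options for one operation.
// Request bodies are intentionally omitted (not specified in this guide).
function buildTaskRequest(operation, taskId, apiKey) {
  const op = TASK_OPERATIONS[operation];
  if (!op) throw new Error(`Unknown task operation: ${operation}`);
  return {
    url: BASE_URL + op.path(encodeURIComponent(taskId)),
    options: { method: op.method, headers: { 'X-API-Key': apiKey } },
  };
}
```

A caller would then run `fetch(request.url, request.options)` with the returned object.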

Load Historical Transcript

Use an SSE stream to load the complete transcript of a given task, including the original text, translations, and summary.

Request

const response = await fetch(
  `https://vas-poc.vurbo.ai/api/v1/sse/history/transcribe/${taskId}`,
  {
    headers: { 'X-API-Key': apiKey }
  }
);

Note: The transcribe endpoint applies to all recording types (transcribe, conversation, record), not just the transcribe type.

Event Sequence

The SSE stream pushes the following events in order:

connected → init_metadata → init_sentence × N → init_summary → init_done

Order	Event	Description	Count
1	`connected`	Connection confirmation	1 time
2	`init_metadata`	Task metadata	1 time
3	`init_sentence`	Per-sentence push (original + translation)	N times
4	`init_summary`	Summary content	0–1 times
5	`init_done`	Initialization complete	1 time
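One way to consume this sequence is to fold the events into a single state object and reject anything that arrives out of order. The sketch below assumes events have already been parsed into an event name and a JSON object; `createHistoryState` and `applyHistoryEvent` are hypothetical names introduced for illustration.

```javascript
// Expected event order for the history stream, as documented in this section.
const HISTORY_ORDER = [
  'connected',
  'init_metadata',
  'init_sentence',
  'init_summary',
  'init_done',
];

function createHistoryState() {
  return { stage: -1, metadata: null, sentences: [], summary: null, done: false };
}

// Apply one parsed event; throws if it arrives earlier than the documented order allows.
function applyHistoryEvent(state, type, data) {
  const stage = HISTORY_ORDER.indexOf(type);
  if (stage === -1) return state; // ignore event types this sketch does not know
  if (stage < state.stage) {
    throw new Error(`Unexpected ${type} after ${HISTORY_ORDER[state.stage]}`);
  }
  state.stage = stage;
  if (type === 'init_metadata') state.metadata = data;
  else if (type === 'init_sentence') state.sentences.push(data);
  else if (type === 'init_summary') state.summary = data.text;
  else if (type === 'init_done') state.done = true;
  return state;
}
```

Because `init_summary` is sent 0 or 1 times, the reducer accepts `init_done` directly after the last `init_sentence` and leaves `summary` as `null`.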

Event Formats

connected

event: connected
data: {"message": "History service connected (recordingId: xxx)"}

init_metadata

event: init_metadata
data: {"task_id": "550e8400...", "title": "Meeting Notes", "created_at": "2026-02-20T10:00:00Z", "type": "transcribe", "has_speaker_diarization": false, "transcription_languages": ["zh-TW"], "translation_languages": ["en-US"], "summary_template": "general", "summary_language": "zh-TW"}

Field	Description
`task_id`	Task ID
`title`	Task title
`type`	Recording type
`has_speaker_diarization`	Whether speaker diarization (multi-speaker mode) is enabled
`transcription_languages`	Transcription language array (BCP 47, e.g. `["zh-TW"]`), up to 2
`translation_languages`	Translation language array (BCP 47, e.g. `["en-US"]`), up to 8
`summary_template`	Summary template slug; `null` when not specified
`summary_language`	Summary output language (BCP 47); `null` when not specified

init_sentence

event: init_sentence
data: {"sid": 1, "origin": "你好，很高興認識你", "translations": {"en-US": "Hello, nice to meet you"}, "start_time": "00:05", "speaker_id": "0"}

If a sentence has a translation failure (content filtered, provider error, etc.), it carries an additional translation_errors field (only present on failure):

event: init_sentence
data: {"sid": 5, "origin": "敏感詞句子", "translations": {"en-US": "Sensitive sentence"}, "translation_errors": {"ja": "llm_content_filtered"}, "start_time": "00:25", "speaker_id": "0"}

Field	Description
`sid`	Sentence number
`origin`	Original text (recognition result)
`translations`	Translation result map (may be `null`)
`translation_errors`	Optional. Map of translation failure error codes. The frontend can distinguish "translation not scheduled for that language" (key missing) vs. "translated but failed" (key present)
`start_time`	Sentence start time (`mm:ss`)
`speaker_id`	Speaker ID
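The distinction between a missing key and a failed translation can be made explicit with a small lookup. This is a sketch under the field semantics described in the table; `translationStatus` and its return shape are hypothetical.

```javascript
// Classify the translation of one sentence for one language.
// Returns { status: 'ok' | 'failed' | 'not_scheduled', ... }.
function translationStatus(sentence, lang) {
  const errors = sentence.translation_errors ?? {}; // only present on failure
  const translations = sentence.translations ?? {}; // may be null
  if (lang in errors) return { status: 'failed', code: errors[lang] };
  if (lang in translations) return { status: 'ok', text: translations[lang] };
  return { status: 'not_scheduled' };
}
```

With the failure sample above, `ja` is reported as failed with code `llm_content_filtered`, `en-US` as translated, and any other language as not scheduled.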

init_summary

event: init_summary
data: {"text": "This is a summary of the meeting notes..."}

init_done

event: init_done
data: {"totalSentences": 42}

Frontend Example

async function loadHistory(taskId, apiKey) {
  const response = await fetch(
    `https://vas-poc.vurbo.ai/api/v1/sse/history/transcribe/${taskId}`,
    { headers: { 'X-API-Key': apiKey } }
  );

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    buffer += decoder.decode(value, { stream: true });

    // Parse SSE format (events are separated by double newlines)
    const events = buffer.split('\n\n');
    buffer = events.pop(); // The last segment may be incomplete

    for (const eventStr of events) {
      const lines = eventStr.split('\n');
      let eventType = '';
      let eventData = '';

      for (const line of lines) {
        if (line.startsWith('event: ')) eventType = line.slice(7);
        if (line.startsWith('data: ')) eventData = line.slice(6);
      }

      if (!eventType || !eventData) continue;
      const data = JSON.parse(eventData);

      switch (eventType) {
        case 'init_metadata':
          console.log(`Task: ${data.title} (${data.type})`);
          break;
        case 'init_sentence':
          console.log(`[${data.start_time}] ${data.origin}`);
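          // translations is a map of language code to text and may be null
          for (const [lang, text] of Object.entries(data.translations ?? {})) {
            console.log(`  → [${lang}] ${text}`);
          }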
          break;
        case 'init_summary':
          console.log(`Summary: ${data.text}`);
          break;
        case 'init_done':
          console.log(`Load complete, ${data.totalSentences} sentences total`);
          break;
      }
    }
  }
}

Audio Playback

Use the Audio API to play back a task's recording, with support for HTTP Range Request to enable seek playback.

Basic Playback

async function playAudio(taskId, apiKey) {
  const response = await fetch(
    `https://vas-poc.vurbo.ai/api/v1/sse/audio/${taskId}`,
    { headers: { 'X-API-Key': apiKey } }
  );
  const blob = await response.blob();
  const audioUrl = URL.createObjectURL(blob);
  const audio = new Audio(audioUrl);
  audio.play();
}

Response Format

Scenario	HTTP Status Code	Description
Full file	200	Returns the complete audio
Partial file	206	Returns the requested range of audio (Range Request)

Response headers:

Content-Type: audio/mp4      (all recording audio files are returned in an M4A container)
Content-Length: 1234567
Accept-Ranges: bytes

Range Request (Seek Playback)

An HTML5 <audio> element issues Range Requests automatically when its src points directly at the audio URL. However, the browser cannot attach the X-API-Key header to requests made by the element, so a direct src needs extra handling for authentication.

const audio = document.createElement('audio');

// Recommended: download the audio with fetch (which can send X-API-Key) and play it from a Blob URL
const response = await fetch(
  `https://vas-poc.vurbo.ai/api/v1/sse/audio/${taskId}`,
  { headers: { 'X-API-Key': apiKey } }
);
const blob = await response.blob();
audio.src = URL.createObjectURL(blob);
audio.controls = true;
document.body.appendChild(audio);
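Once the audio element is ready, the `start_time` of an `init_sentence` event can be used to jump to that sentence. The sketch below converts the `mm:ss` string to seconds; both function names are hypothetical, and accepting an `hh:mm:ss` form is an assumption added for robustness rather than something this guide documents.

```javascript
// Convert "mm:ss" to seconds. An "hh:mm:ss" form is also accepted (assumption).
function startTimeToSeconds(startTime) {
  const parts = String(startTime).split(':').map(Number);
  const valid =
    parts.length >= 2 &&
    parts.length <= 3 &&
    parts.every((n) => Number.isFinite(n) && n >= 0);
  if (!valid) throw new Error(`Invalid start_time: ${startTime}`);
  // Fold left in base 60: ((h * 60) + m) * 60 + s
  return parts.reduce((total, part) => total * 60 + part, 0);
}

// Hypothetical UI hook: move playback to the start of a sentence.
function seekToSentence(audio, sentence) {
  audio.currentTime = startTimeToSeconds(sentence.start_time);
}
```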

Common Errors

Error Code	Description	How to Handle
`recording_not_found`	Recording not found	Verify the taskId is correct
`recording_audio_not_ready`	Recording audio not ready	Retry later

Retranslation

Retranslate all sentences of a task into a specified target language. Useful for switching the display language or refreshing translations.

Request

GET /api/v1/sse/retranslate/{taskId}?targetLang=ja-JP

const response = await fetch(
  `https://vas-poc.vurbo.ai/api/v1/sse/retranslate/${taskId}?targetLang=ja-JP`,
  { headers: { 'X-API-Key': apiKey } }
);

Parameter	Type	Required	Description
`taskId`	string	Yes	Task ID (path parameter)
`targetLang`	string	Yes	Target language code (e.g. `ja-JP`)

Event Sequence

translation × N → done

translation event

event: translation
data: {"sid": 1, "text": "こんにちは、お会いできて嬉しいです", "is_final": true}

Field	Description
`sid`	Sentence number (corresponds to the sid in the original transcript)
`text`	New translation result
`is_final`	Whether this is the final result

done event

event: done
data: {"totalUpdated": 42}

Frontend Example

async function retranslate(taskId, targetLang, apiKey) {
  const response = await fetch(
    `https://vas-poc.vurbo.ai/api/v1/sse/retranslate/${taskId}?targetLang=${encodeURIComponent(targetLang)}`,
    { headers: { 'X-API-Key': apiKey } }
  );

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    buffer += decoder.decode(value, { stream: true });
    const events = buffer.split('\n\n');
    buffer = events.pop();

    for (const eventStr of events) {
      const lines = eventStr.split('\n');
      let eventType = '';
      let eventData = '';

      for (const line of lines) {
        if (line.startsWith('event: ')) eventType = line.slice(7);
        if (line.startsWith('data: ')) eventData = line.slice(6);
      }

      if (!eventType || !eventData) continue;
      const data = JSON.parse(eventData);

      if (eventType === 'translation') {
        // Update the translation for the matching sid in the UI
        updateTranslation(data.sid, data.text);
      } else if (eventType === 'done') {
        console.log(`Retranslation complete, ${data.totalUpdated} sentences updated`);
      }
    }
  }
}
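The example above calls `updateTranslation`, which the application has to provide. One possible shape is an in-memory store keyed by `sid` that is filled from `init_sentence` events and patched by `translation` events. `createSentenceStore` and its methods are hypothetical names for this sketch; a real application would update its own UI state instead.

```javascript
// Hypothetical sentence store keyed by sid, filled from init_sentence events.
function createSentenceStore() {
  const bySid = new Map();
  return {
    add(sentence) {
      // Copy the sentence and make sure translations is always an object.
      bySid.set(sentence.sid, {
        ...sentence,
        translations: { ...(sentence.translations ?? {}) },
      });
    },
    get(sid) {
      return bySid.get(sid);
    },
    // Returns an updateTranslation(sid, text) callback bound to one target language.
    updaterFor(targetLang) {
      return (sid, text) => {
        const sentence = bySid.get(sid);
        if (!sentence) return false; // unknown sid: nothing to update
        sentence.translations[targetLang] = text;
        return true;
      };
    },
  };
}
```

For a retranslation into `ja-JP`, `const updateTranslation = store.updaterFor('ja-JP')` yields a callback with the signature used in the example.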

Common Errors

Error Code	Description
`sse_missing_target_lang`	Missing `targetLang` parameter
`sse_unsupported_language`	Unsupported target language
`sse_translation_failed`	Translation service failed, retry later

Summary Retranslation

Retranslate a task's summary into a specified language.

Request

GET /api/v1/sse/retranslate/summary/{taskId}?targetLang=ja-JP

const response = await fetch(
  `https://vas-poc.vurbo.ai/api/v1/sse/retranslate/summary/${taskId}?targetLang=ja-JP`,
  { headers: { 'X-API-Key': apiKey } }
);

Parameter	Type	Required	Description
`taskId`	string	Yes	Task ID (path parameter)
`targetLang`	string	Yes	Target language code

Event Sequence

summary_translation × N → done

summary_translation event

event: summary_translation
data: {"text": "Accumulated translation result...", "is_final": false}

The summary translation is pushed as a stream. is_final: false means translation is still in progress, while is_final: true or receiving the done event indicates completion.

done event

event: done
data: {"totalUpdated": 1}
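Handling these two events can be reduced to a small state update. The sketch below assumes, based on the "Accumulated translation result" sample, that each `summary_translation` event carries the full text so far, so the text is replaced rather than appended. The function names are hypothetical.

```javascript
function createSummaryTranslationState() {
  return { text: '', complete: false };
}

// Fold one parsed event into the state and return the new state.
function applySummaryEvent(state, type, data) {
  if (type === 'summary_translation') {
    // Assumption: data.text is cumulative, so replace instead of appending.
    return {
      text: data.text,
      complete: state.complete || data.is_final === true,
    };
  }
  if (type === 'done') {
    return { ...state, complete: true };
  }
  return state; // ignore other event types
}
```

If the server turns out to send incremental fragments instead, the replacement on `data.text` would need to become a concatenation.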

Common Errors

Error Code	Description
`sse_summary_not_found`	The task has no summary
`sse_summary_translation_failed`	Summary translation failed, retry later

TTS Playback

Convert the translated content of a historical recording into TTS audio for playback. Supports single-sentence or continuous multi-sentence playback.

Request

// Single-sentence playback
const response = await fetch(
  `https://vas-poc.vurbo.ai/api/v1/sse/tts/${taskId}?language=en-US&sid=1`,
  { headers: { 'X-API-Key': apiKey } }
);

// Multi-sentence playback (start from sentence 5, play 3 sentences)
const multiResponse = await fetch(
  `https://vas-poc.vurbo.ai/api/v1/sse/tts/${taskId}?language=en-US&sid=5&length=3`,
  { headers: { 'X-API-Key': apiKey } }
);

Parameter	Type	Required	Description
`taskId`	string	Yes	Task ID (path parameter)
`language`	string	Yes	TTS output language (e.g. `en-US`)
`voice`	string	No	Specify the voice name (e.g. `en-US-JennyNeural`)
`sid`	int	No	Starting sentence ID (default 1)
`length`	int	No	Number of sentences to play (default 1, max 20)

Event Sequence

connected → tts_audio × N → tts_done

tts_audio event

event: tts_audio
data: {"sid": 5, "transcript": "你好", "text": "Hello", "audio": "Base64...", "duration_ms": 2500, "boundaries": [...]}

Field	Description
`sid`	Sentence ID
`transcript`	Original transcript
`text`	Translated text (source for TTS synthesis)
`audio`	Base64-encoded MP3 audio
`duration_ms`	Audio duration (milliseconds)
`boundaries`	Word Boundary array (can be used for karaoke effects)

tts_done event

event: tts_done
data: {"sentences_sent": 3, "total_duration_ms": 7500}

Frontend Playback Example

async function playTTS(taskId, language, sid, length, apiKey) {
  const url = new URL(`https://vas-poc.vurbo.ai/api/v1/sse/tts/${taskId}`);
  url.searchParams.set('language', language);
  url.searchParams.set('sid', sid);
  url.searchParams.set('length', length);

  const response = await fetch(url, {
    headers: { 'X-API-Key': apiKey }
  });

  // Play the audio after parsing the SSE events
  // ...(SSE parsing logic same as above)

  // When a tts_audio event is received:
  function handleTTSAudio(data) {
    const binaryString = atob(data.audio);
    const bytes = new Uint8Array(binaryString.length);
    for (let i = 0; i < binaryString.length; i++) {
      bytes[i] = binaryString.charCodeAt(i);
    }
    const blob = new Blob([bytes], { type: 'audio/mpeg' }); // registered MIME type for MP3
    const audio = new Audio(URL.createObjectURL(blob));
    audio.play();
  }
}
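With `length` greater than 1, several `tts_audio` events arrive in a row, and calling `audio.play()` for each one immediately would make the clips overlap. The sketch below queues clips so that each one starts after the previous one ends. `createTtsQueue` and `playInBrowser` are hypothetical helpers; the decoding step uses the same technique as the example above.

```javascript
// Decode a Base64 string into raw bytes.
function base64ToBytes(base64) {
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

// Hypothetical queue: plays clips one after another instead of overlapping them.
// playClip(bytes, event) must return a Promise that settles when playback ends.
function createTtsQueue(playClip) {
  let tail = Promise.resolve();
  return function enqueue(ttsAudioEvent) {
    const bytes = base64ToBytes(ttsAudioEvent.audio);
    const run = tail.then(() => playClip(bytes, ttsAudioEvent));
    tail = run.catch(() => {}); // a failed clip must not block later clips
    return run;
  };
}

// Browser-only player for use with createTtsQueue.
function playInBrowser(bytes) {
  return new Promise((resolve, reject) => {
    const url = URL.createObjectURL(new Blob([bytes], { type: 'audio/mpeg' }));
    const audio = new Audio(url);
    const cleanup = () => URL.revokeObjectURL(url);
    audio.onended = () => { cleanup(); resolve(); };
    audio.onerror = () => { cleanup(); reject(new Error('TTS playback failed')); };
    audio.play().catch((err) => { cleanup(); reject(err); });
  });
}
```

In the SSE loop, create the queue once with `const enqueue = createTtsQueue(playInBrowser)` and call `enqueue(data)` for every `tts_audio` event.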

Complete Flow Diagram

                ┌──────────────────┐
                │  GET /api/v1/tasks │  Get task list
                └────────┬─────────┘
                         │
                    Select task_id
                         │
        ┌────────────────┼────────────────┐
        │                │                │
  ┌─────▼──────┐   ┌────▼─────┐   ┌─────▼──────┐
  │ Load        │   │ Audio    │   │ Retranslate│
  │ transcript  │   │ playback │   │            │
  │ SSE History │   │ Audio API│   │SSE Retrans.│
  └─────┬──────┘   └────┬─────┘   └─────┬──────┘
        │                │                │
        │          ┌─────▼──────┐         │
        │          │  HTTP 200  │         │
        │          │  Audio     │         │
        │          │  stream    │         │
        │          │ (Range OK) │         │
        │          └────────────┘         │
        │                                 │
  ┌─────▼──────────────────┐   ┌─────────▼────────┐
  │ SSE event sequence:    │   │ SSE event seq.:   │
  │                        │   │                   │
  │ 1. connected           │   │ translation × N   │
  │ 2. init_metadata       │   │ done              │
  │ 3. init_sentence × N   │   └───────────────────┘
  │ 4. init_summary        │
  │ 5. init_done           │         ┌──────────────────┐
  └────────────────────────┘         │ Summary retrans. │
                                     │ SSE Retrans/Summary│
                                     └────────┬─────────┘
                                              │
                                     summary_translation × N
                                     done
                         │
                ┌────────▼─────────┐
                │  TTS playback     │
                │  SSE /tts/{id}   │
                └────────┬─────────┘
                         │
                connected → tts_audio × N → tts_done

Typical Usage Flow

1. Call GET /api/v1/tasks to get the task list
2. The user selects a task
3. Call these concurrently:
   a. SSE History API to load the transcript (render each init_sentence as it arrives)
   b. Audio API to prepare audio playback
4. The user can:
   - Play / seek the audio
   - Switch the translation language (call SSE Retranslate)
   - Switch the summary language (call SSE Retranslate Summary)
   - Switch the summary template to regenerate (call SSE Regenerate Summary)
   - Play the translated TTS audio (call SSE TTS)
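Step 3 of this flow can be sketched as two requests started together and awaited as a pair. In the sketch, `openTask` is a hypothetical helper, and `loadTranscript` and `loadAudio` stand in for application functions that wrap the SSE History API and the Audio API.

```javascript
// Start the transcript load and the audio load together, then wait for both.
// loadTranscript and loadAudio are caller-supplied functions that return Promises.
function openTask(taskId, { loadTranscript, loadAudio }) {
  // Both calls are made before anything is awaited, so the requests run concurrently.
  const transcriptPromise = loadTranscript(taskId);
  const audioPromise = loadAudio(taskId);
  return Promise.all([transcriptPromise, audioPromise]).then(
    ([transcript, audio]) => ({ transcript, audio })
  );
}
```

`Promise.all` rejects as soon as either load fails; `Promise.allSettled` is an alternative when the transcript should still be shown even though the audio is not ready.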

Related Documents

Document	Description
Authentication	Detailed API Key authentication explanation
Tasks API Reference	Complete task management API specification
History SSE Reference	Complete historical transcript SSE specification
Retranslate SSE Reference	Complete full-transcript/summary retranslation SSE specification
Regenerate Summary SSE Reference	Complete SSE specification for switching templates to regenerate the summary
Audio Streaming Reference	Complete audio playback API specification
TTS Streaming Reference	Complete TTS speech synthesis SSE specification
Real-Time Voice Translation	Real-time voice translation guide
Audio Import	Audio import guide

Version: V1.5.7 Last Updated: 2026-05-20
