History Playback

Table of Contents

  1. Overview
  2. Get Task List
  3. Load Historical Transcript
  4. Audio Playback
  5. Retranslation
  6. Summary Retranslation
  7. TTS Playback
  8. Complete Flow Diagram
  9. Related Documents

Overview

The VAS history feature lets you load past speech recognition results (transcripts, translations, and summaries), play back the original audio, retranslate content, and play translated content as TTS audio.

History data comes from two sources:

  • Real-time voice translation: Tasks created after completing a recording over WebSocket
  • Audio import: Tasks created after uploading and processing an audio file via the REST API

Both produce a task_id once completed, and all subsequent operations work exactly the same way.

APIs Involved

| API | Purpose |
|---|---|
| GET /api/v1/tasks | Get task list |
| GET /api/v1/sse/history/transcribe/{taskId} | Load historical transcript (SSE stream) |
| GET /api/v1/sse/audio/{taskId} | Audio streaming playback (supports Range Requests) |
| GET /api/v1/sse/retranslate/{taskId} | Retranslate full transcript (SSE stream) |
| GET /api/v1/sse/retranslate/summary/{taskId} | Retranslate summary (SSE stream) |
| GET /api/v1/sse/tts/{taskId} | TTS audio streaming playback |
| GET /api/v1/tasks/{taskId}/audio/export | Download original audio file (for offline use) |
| GET /api/v1/tasks/{taskId}/transcript/export | Download transcript (TXT / SRT / SBV / VTT / CSV) |

Authentication

All APIs are authenticated via the X-API-Key header. See Authentication for details.

Note: The browser's native EventSource API does not support custom headers, so the SSE APIs must be read using the fetch API together with ReadableStream.
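For this reason, every SSE example in this guide parses the stream by hand. You can move that parsing into a reusable helper. The sketch below shows one way to do it; it is not part of the VAS API, and createSSEParser and readSSE are hypothetical names. It follows the basic text/event-stream rules:

- event: and data: fields
- a blank line ends each event
- an optional single space after the colon
- lines starting with : are comments

It ignores id: and retry: fields.

```javascript
// Hypothetical helper: incremental parser for the text/event-stream format.
// Handles "event:" and "data:" fields, blank-line event separators, CRLF line
// endings, and ":" comment lines. It ignores "id:" and "retry:".
function createSSEParser(onEvent) {
  let buffer = '';
  return function push(chunk) {
    // Normalize CRLF so events can be split on "\n\n". A "\r" left at the end
    // of the buffer is joined with the next chunk's "\n" on the next call.
    buffer = (buffer + chunk).replace(/\r\n/g, '\n');
    const blocks = buffer.split('\n\n');
    buffer = blocks.pop(); // the last block may be incomplete
    for (const block of blocks) {
      let type = 'message'; // SSE default when no "event:" field is present
      const dataLines = [];
      for (const line of block.split('\n')) {
        if (line === '' || line.startsWith(':')) continue; // blank or comment
        const colon = line.indexOf(':');
        const field = colon === -1 ? line : line.slice(0, colon);
        let value = colon === -1 ? '' : line.slice(colon + 1);
        if (value.startsWith(' ')) value = value.slice(1); // one optional space
        if (field === 'event') type = value;
        else if (field === 'data') dataLines.push(value);
      }
      // Multiple data lines are joined with "\n", as the SSE format specifies
      if (dataLines.length > 0) onEvent(type, dataLines.join('\n'));
    }
  };
}

// Read a fetch() Response body and feed each decoded chunk to the parser.
async function readSSE(response, onEvent) {
  const push = createSSEParser(onEvent);
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    push(decoder.decode(value, { stream: true }));
  }
}
```

For example, readSSE(response, (type, data) => console.log(type, JSON.parse(data))) logs every event from one of the SSE endpoints. As the SSE format requires, the parser never dispatches an event that is cut off before its closing blank line.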


Get Task List

First, retrieve all of the user's tasks and find the task_id you want to play back.

Request

curl -X GET "https://vas-poc.vurbo.ai/api/v1/tasks" \
  -H "X-API-Key: YOUR_API_KEY"

Response

{
  "tasks": [
    {
      "task_id": "550e8400-e29b-41d4-a716-446655440000",
      "title": "Product Planning Meeting",
      "type": "transcribe",
      "duration_ms": 3600000,
      "duration_formatted": "60:00",
      "source_lang": "zh-TW",
      "target_lang": "en-US",
      "created_at": "2026-02-20T10:00:00Z",
      "is_pinned": false,
      "is_unread": true
    }
  ]
}
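Below is a sketch of a JavaScript equivalent of the request above. listTasks and sortTasksForDisplay are hypothetical helper names. The ordering (pinned tasks first, then newest created_at first) is an illustrative UI choice, not something the API specifies.

```javascript
// Fetch the task list and order it for display (hypothetical helper).
async function listTasks(apiKey) {
  const response = await fetch('https://vas-poc.vurbo.ai/api/v1/tasks', {
    headers: { 'X-API-Key': apiKey },
  });
  if (!response.ok) {
    throw new Error(`Task list request failed: HTTP ${response.status}`);
  }
  const { tasks } = await response.json();
  return sortTasksForDisplay(tasks);
}

// Pinned tasks first, then newest first by created_at (ISO 8601 timestamps).
// Sorts a copy so the caller's array is left unchanged.
function sortTasksForDisplay(tasks) {
  return [...tasks].sort(
    (a, b) =>
      Number(b.is_pinned) - Number(a.is_pinned) ||
      Date.parse(b.created_at) - Date.parse(a.created_at)
  );
}
```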

Key Fields

| Field | Description |
|---|---|
| task_id | Task ID (UUID); the key for all subsequent operations |
| title | Task title |
| type | Recording type: transcribe, conversation, record, broadcast |
| duration_ms | Recording duration (milliseconds) |
| source_lang | Source language |
| target_lang | Target language |
| is_pinned | Whether the task is pinned |
| is_unread | Whether the task is unread |

Task Operations

| Operation | API | Description |
|---|---|---|
| Delete task | DELETE /api/v1/tasks/{taskId} | Soft delete |
| Pin task | PUT /api/v1/tasks/{taskId}/pin | Mark as important |
| Mark as read | PUT /api/v1/tasks/{taskId}/read | Clear the unread flag |
| Update name | PATCH /api/v1/tasks/{taskId}/name | Set a custom task title |
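These operations differ only in HTTP method and path suffix, so a small lookup can build each request. This is a sketch: TASK_OPERATIONS and taskOperationRequest are hypothetical names. Request bodies, such as the new title for a rename, are left out on purpose because the Tasks API Reference specifies their format, not this guide.

```javascript
// Hypothetical lookup: operation name → HTTP method and path suffix.
const TASK_OPERATIONS = {
  delete: { method: 'DELETE', path: '' },
  pin: { method: 'PUT', path: '/pin' },
  markRead: { method: 'PUT', path: '/read' },
  rename: { method: 'PATCH', path: '/name' }, // body format: see Tasks API Reference
};

// Build the method and URL for one task operation.
function taskOperationRequest(operation, taskId) {
  const op = TASK_OPERATIONS[operation];
  if (!op) throw new Error(`Unknown task operation: ${operation}`);
  return {
    method: op.method,
    url: `https://vas-poc.vurbo.ai/api/v1/tasks/${encodeURIComponent(taskId)}${op.path}`,
  };
}
```

Usage: const { method, url } = taskOperationRequest('markRead', taskId); await fetch(url, { method, headers: { 'X-API-Key': apiKey } });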

Load Historical Transcript

Use an SSE stream to load the complete transcript of a given task, including the original text, translations, and summary.

Request

const response = await fetch(
  `https://vas-poc.vurbo.ai/api/v1/sse/history/transcribe/${taskId}`,
  {
    headers: { 'X-API-Key': apiKey }
  }
);

Note: The transcribe endpoint applies to all recording types (transcribe, conversation, record), not just the transcribe type.

Event Sequence

The SSE stream pushes the following events in order:

connected → init_metadata → init_sentence × N → init_summary → init_done
| Order | Event | Description | Count |
|---|---|---|---|
| 1 | connected | Connection confirmation | Once |
| 2 | init_metadata | Task metadata | Once |
| 3 | init_sentence | One event per sentence (original + translations) | N times |
| 4 | init_summary | Summary content | 0–1 times |
| 5 | init_done | Initialization complete | Once |

Event Formats

connected

event: connected
data: {"message": "History service connected (recordingId: xxx)"}

init_metadata

event: init_metadata
data: {"task_id": "550e8400...", "title": "Meeting Notes", "created_at": "2026-02-20T10:00:00Z", "type": "transcribe", "has_speaker_diarization": false, "transcription_languages": ["zh-TW"], "translation_languages": ["en-US"], "summary_template": "general", "summary_language": "zh-TW"}
| Field | Description |
|---|---|
| task_id | Task ID |
| title | Task title |
| type | Recording type |
| has_speaker_diarization | Whether speaker diarization (multi-speaker mode) is enabled |
| transcription_languages | Transcription languages (BCP 47 array, e.g. ["zh-TW"]); up to 2 |
| translation_languages | Translation languages (BCP 47 array, e.g. ["en-US"]); up to 8 |
| summary_template | Summary template slug; null when not specified |
| summary_language | Summary output language (BCP 47); null when not specified |

init_sentence

event: init_sentence
data: {"sid": 1, "origin": "你好,很高興認識你", "translations": {"en-US": "Hello, nice to meet you"}, "start_time": "00:05", "speaker_id": "0"}

If translating a sentence fails for any language (content filtered, provider error, etc.), the event carries an additional translation_errors field. This field is present only when a failure occurred:

event: init_sentence
data: {"sid": 5, "origin": "敏感詞句子", "translations": {"en-US": "Sensitive sentence"}, "translation_errors": {"ja": "llm_content_filtered"}, "start_time": "00:25", "speaker_id": "0"}
| Field | Description |
|---|---|
| sid | Sentence number |
| origin | Original text (recognition result) |
| translations | Map of language code → translated text (may be null) |
| translation_errors | Optional; map of language code → error code, present only on failure. Lets the frontend distinguish "translation not scheduled for that language" (key missing) from "translated but failed" (key present) |
| start_time | Sentence start time (mm:ss) |
| speaker_id | Speaker ID |
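A single helper can capture the distinction described for translation_errors. The sketch below (translationStatus is a hypothetical name) classifies each language as follows:

- a key in translation_errors counts as failed
- a value in translations counts as ok
- anything else counts as not scheduled

The lang argument must match the key the server actually uses. In the failure example above, the error is keyed "ja" while translations uses "en-US".

```javascript
// Classify one language's translation for an init_sentence payload
// (hypothetical helper).
function translationStatus(sentence, lang) {
  // Key present in translation_errors → translated but failed
  const errorCode = sentence.translation_errors?.[lang];
  if (errorCode !== undefined) return { status: 'failed', errorCode };
  // translations may be null; a non-null value means success
  const text = sentence.translations?.[lang];
  if (text != null) return { status: 'ok', text };
  // Key missing everywhere → translation was not scheduled for this language
  return { status: 'not_scheduled' };
}
```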

init_summary

event: init_summary
data: {"text": "This is a summary of the meeting notes..."}

init_done

event: init_done
data: {"totalSentences": 42}
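Sometimes you need the whole transcript as one object instead of logging events as they arrive. In that case, fold the events into state. Below is a minimal sketch that relies only on the fields documented above; createHistoryState and reduceHistoryEvent are hypothetical names. It returns a new object for each event, which suits UI frameworks that detect changes by reference.

```javascript
// Initial state for one history load (hypothetical shape).
function createHistoryState() {
  return { metadata: null, sentences: [], summary: null, done: false, totalSentences: 0 };
}

// Apply one history SSE event and return the next state.
function reduceHistoryEvent(state, type, data) {
  switch (type) {
    case 'init_metadata':
      return { ...state, metadata: data };
    case 'init_sentence':
      return { ...state, sentences: [...state.sentences, data] };
    case 'init_summary':
      return { ...state, summary: data.text };
    case 'init_done':
      return { ...state, done: true, totalSentences: data.totalSentences };
    default:
      return state; // e.g. "connected" carries no transcript data
  }
}
```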

Frontend Example

async function loadHistory(taskId, apiKey) {
  const response = await fetch(
    `https://vas-poc.vurbo.ai/api/v1/sse/history/transcribe/${taskId}`,
    { headers: { 'X-API-Key': apiKey } }
  );
  if (!response.ok) {
    throw new Error(`History request failed: HTTP ${response.status}`);
  }

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    buffer += decoder.decode(value, { stream: true });

    // Parse SSE format (events are separated by a blank line, i.e. "\n\n")
    const events = buffer.split('\n\n');
    buffer = events.pop(); // The last segment may be incomplete

    for (const eventStr of events) {
      const lines = eventStr.split('\n');
      let eventType = '';
      let eventData = '';

      for (const line of lines) {
        if (line.startsWith('event: ')) eventType = line.slice(7);
        if (line.startsWith('data: ')) eventData = line.slice(6);
      }

      if (!eventType || !eventData) continue;
      const data = JSON.parse(eventData);

      switch (eventType) {
        case 'init_metadata':
          console.log(`Task: ${data.title} (${data.type})`);
          break;
        case 'init_sentence':
          console.log(`[${data.start_time}] ${data.origin}`);
          // translations is a map keyed by language code and may be null
          for (const [lang, text] of Object.entries(data.translations ?? {})) {
            console.log(`  → [${lang}] ${text}`);
          }
          // translation_errors is only present when a translation failed
          for (const [lang, code] of Object.entries(data.translation_errors ?? {})) {
            console.warn(`  ✗ [${lang}] translation failed: ${code}`);
          }
          break;
        case 'init_summary':
          console.log(`Summary: ${data.text}`);
          break;
        case 'init_done':
          console.log(`Load complete, ${data.totalSentences} sentences total`);
          break;
      }
    }
  }
}

Audio Playback

Use the Audio API to play back a task's recording. The endpoint supports HTTP Range Requests, which enable seeking during playback.

Basic Playback

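async function playAudio(taskId, apiKey) {
  const response = await fetch(
    `https://vas-poc.vurbo.ai/api/v1/sse/audio/${taskId}`,
    { headers: { 'X-API-Key': apiKey } }
  );
  if (!response.ok) {
    throw new Error(`Audio request failed: HTTP ${response.status}`);
  }
  const blob = await response.blob();
  const audioUrl = URL.createObjectURL(blob);
  const audio = new Audio(audioUrl);
  // play() returns a Promise that rejects if the browser blocks autoplay
  await audio.play();
  return audio;
}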

Response Format

| Scenario | HTTP Status Code | Description |
|---|---|---|
| Full file | 200 | Returns the complete audio |
| Partial file | 206 | Returns the requested byte range (Range Request) |

Response headers:

Content-Type: audio/mp4      (all recording audio files are returned in an M4A container)
Content-Length: 1234567
Accept-Ranges: bytes

Range Request (Seek Playback)

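An HTML5 <audio> element issues Range Requests automatically when the user seeks. However, if you point audio.src at the API URL directly, the browser cannot attach the X-API-Key header, so the request fails authentication. The recommended approach is to fetch the file with the header and play it from a Blob URL. The file is downloaded once, and seeking then happens locally:

const audio = document.createElement('audio');

// Recommended: fetch with the API key, then play from a Blob URL
const response = await fetch(
  `https://vas-poc.vurbo.ai/api/v1/sse/audio/${taskId}`,
  { headers: { 'X-API-Key': apiKey } }
);
if (!response.ok) throw new Error(`Audio request failed: HTTP ${response.status}`);
const blob = await response.blob();
audio.src = URL.createObjectURL(blob);
audio.controls = true; // native controls include a seek bar
document.body.appendChild(audio);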
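You may want to request a specific byte range yourself, for example to fetch audio incrementally instead of downloading the whole file. In that case, send the Range header with fetch, which (unlike an <audio> element's src) can also carry X-API-Key. Below is a sketch. It assumes the server answers with 206 and a standard Content-Range header (bytes start-end/total), as HTTP range semantics require. fetchAudioRange and parseContentRange are hypothetical names.

```javascript
// Fetch one byte range of a task's audio (hypothetical helper).
async function fetchAudioRange(taskId, apiKey, start, end) {
  const response = await fetch(
    `https://vas-poc.vurbo.ai/api/v1/sse/audio/${taskId}`,
    { headers: { 'X-API-Key': apiKey, Range: `bytes=${start}-${end}` } }
  );
  // 206 = the requested range; 200 = server sent the whole file instead
  if (response.status !== 206 && response.status !== 200) {
    throw new Error(`Audio range request failed: HTTP ${response.status}`);
  }
  return {
    partial: response.status === 206,
    range: parseContentRange(response.headers.get('Content-Range')),
    data: new Uint8Array(await response.arrayBuffer()),
  };
}

// Parse "bytes <start>-<end>/<total>"; total may be "*" when unknown.
// Returns null for a missing or unrecognized header.
function parseContentRange(header) {
  const match = /^bytes (\d+)-(\d+)\/(\d+|\*)$/.exec(header ?? '');
  if (!match) return null;
  return {
    start: Number(match[1]),
    end: Number(match[2]),
    total: match[3] === '*' ? null : Number(match[3]),
  };
}
```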

Common Errors

| Error Code | Description | How to Handle |
|---|---|---|
| recording_not_found | Recording not found | Verify that the taskId is correct |
| recording_audio_not_ready | Recording audio is not ready yet | Retry later |
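For recording_audio_not_ready, you can implement "retry later" with exponential backoff. Below is a sketch; retryWithBackoff is a hypothetical helper. This guide does not specify how the error code reaches the client, so the caller decides which errors are retryable through shouldRetry. The sleep option exists only so the delay can be replaced, for example in tests.

```javascript
// Retry an async operation with exponential backoff (hypothetical helper).
// Delays are baseDelayMs * 2^attempt: 1000, 2000, 4000 ms with the defaults.
async function retryWithBackoff(operation, {
  retries = 3,
  baseDelayMs = 1000,
  shouldRetry = () => true,
  sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
} = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation(attempt);
    } catch (err) {
      // Give up once retries are exhausted or the error is not transient
      if (attempt >= retries || !shouldRetry(err)) throw err;
      await sleep(baseDelayMs * 2 ** attempt);
    }
  }
}
```

Example call: retryWithBackoff(() => playAudio(taskId, apiKey), { shouldRetry: (err) => err.code === 'recording_audio_not_ready' }). Here err.code is hypothetical. The playAudio example above throws a plain Error, so you would need to attach the parsed error code yourself.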

Retranslation

Retranslate all sentences of a task into a specified target language. Useful for switching the display language or refreshing translations.

Request

GET /api/v1/sse/retranslate/{taskId}?targetLang=ja-JP
const response = await fetch(
  `https://vas-poc.vurbo.ai/api/v1/sse/retranslate/${taskId}?targetLang=ja-JP`,
  { headers: { 'X-API-Key': apiKey } }
);
| Parameter | Type | Required | Description |
|---|---|---|---|
| taskId | string | Yes | Task ID (path parameter) |
| targetLang | string | Yes | Target language code (e.g. ja-JP) |

Event Sequence

translation × N → done

translation event

event: translation
data: {"sid": 1, "text": "こんにちは、お会いできて嬉しいです", "is_final": true}
| Field | Description |
|---|---|
| sid | Sentence number (matches the sid in the original transcript) |
| text | New translation result |
| is_final | Whether this is the final result |

done event

event: done
data: {"totalUpdated": 42}

Frontend Example

async function retranslate(taskId, targetLang, apiKey) {
  const response = await fetch(
    `https://vas-poc.vurbo.ai/api/v1/sse/retranslate/${taskId}?targetLang=${encodeURIComponent(targetLang)}`,
    { headers: { 'X-API-Key': apiKey } }
  );
  if (!response.ok) {
    throw new Error(`Retranslate request failed: HTTP ${response.status}`);
  }

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    buffer += decoder.decode(value, { stream: true });
    const events = buffer.split('\n\n');
    buffer = events.pop(); // The last segment may be incomplete

    for (const eventStr of events) {
      const lines = eventStr.split('\n');
      let eventType = '';
      let eventData = '';

      for (const line of lines) {
        if (line.startsWith('event: ')) eventType = line.slice(7);
        if (line.startsWith('data: ')) eventData = line.slice(6);
      }

      if (!eventType || !eventData) continue;
      const data = JSON.parse(eventData);

      if (eventType === 'translation') {
        // updateTranslation is your own UI function: replace the translation
        // shown for this sid (data.is_final says whether it is final)
        updateTranslation(data.sid, data.text);
      } else if (eventType === 'done') {
        console.log(`Retranslation complete, ${data.totalUpdated} sentences updated`);
      }
    }
  }
}
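The example above leaves updateTranslation to your application. If your UI keeps the init_sentence objects from the history stream, one way to apply a translation event is to write the new text into that sentence's translations map under the target language. Below is a sketch; applyRetranslation and the pending flag are hypothetical. Clearing an earlier translation_errors entry for that language is an assumption: it treats a successful retranslation as superseding the earlier failure.

```javascript
// Apply one retranslation "translation" event to a list of init_sentence
// objects (hypothetical helper). Returns a new array; untouched sentences
// keep their identity so UI frameworks can skip re-rendering them.
function applyRetranslation(sentences, targetLang, event) {
  return sentences.map((sentence) => {
    if (sentence.sid !== event.sid) return sentence;
    // Assumption: a new translation supersedes an earlier failure for this language
    const { [targetLang]: _cleared, ...remainingErrors } = sentence.translation_errors ?? {};
    return {
      ...sentence,
      translations: { ...(sentence.translations ?? {}), [targetLang]: event.text },
      translation_errors: remainingErrors,
      pending: !event.is_final, // hypothetical UI flag, not an API field
    };
  });
}
```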

Common Errors

| Error Code | Description |
|---|---|
| sse_missing_target_lang | Missing targetLang parameter |
| sse_unsupported_language | Unsupported target language |
| sse_translation_failed | Translation service failed; retry later |

Summary Retranslation

Retranslate a task's summary into a specified language.

Request

GET /api/v1/sse/retranslate/summary/{taskId}?targetLang=ja-JP
const response = await fetch(
  `https://vas-poc.vurbo.ai/api/v1/sse/retranslate/summary/${taskId}?targetLang=ja-JP`,
  { headers: { 'X-API-Key': apiKey } }
);
| Parameter | Type | Required | Description |
|---|---|---|---|
| taskId | string | Yes | Task ID (path parameter) |
| targetLang | string | Yes | Target language code |

Event Sequence

summary_translation × N → done

summary_translation event

event: summary_translation
data: {"text": "Accumulated translation result...", "is_final": false}

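The summary translation is streamed. Each event's text holds the accumulated translation so far, so replace the displayed summary on each event rather than appending to it. is_final: false means the translation is still in progress. Either is_final: true or the done event indicates completion.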

done event

event: done
data: {"totalUpdated": 1}
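Below is a sketch of a summary retranslation consumer that follows these rules. It replaces the shown text on every summary_translation event and marks completion on is_final: true or done. createSummaryTranslation is a hypothetical name; feed it the (eventType, data) pairs from your SSE parsing loop.

```javascript
// Track one streaming summary retranslation (hypothetical helper).
function createSummaryTranslation() {
  const state = { text: '', complete: false };
  return {
    state,
    handle(eventType, data) {
      if (eventType === 'summary_translation') {
        state.text = data.text; // accumulated result: replace, don't append
        if (data.is_final) state.complete = true;
      } else if (eventType === 'done') {
        state.complete = true;
      }
      return state;
    },
  };
}
```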

Common Errors

| Error Code | Description |
|---|---|
| sse_summary_not_found | The task has no summary |
| sse_summary_translation_failed | Summary translation failed; retry later |

TTS Playback

Convert the translated content of a historical recording into TTS audio for playback. Supports single-sentence or continuous multi-sentence playback.

Request

// Single-sentence playback
const singleResponse = await fetch(
  `https://vas-poc.vurbo.ai/api/v1/sse/tts/${taskId}?language=en-US&sid=1`,
  { headers: { 'X-API-Key': apiKey } }
);

// Multi-sentence playback (start from sentence 5, play 3 sentences)
const multiResponse = await fetch(
  `https://vas-poc.vurbo.ai/api/v1/sse/tts/${taskId}?language=en-US&sid=5&length=3`,
  { headers: { 'X-API-Key': apiKey } }
);
| Parameter | Type | Required | Description |
|---|---|---|---|
| taskId | string | Yes | Task ID (path parameter) |
| language | string | Yes | TTS output language (e.g. en-US) |
| voice | string | No | Voice name (e.g. en-US-JennyNeural) |
| sid | int | No | Starting sentence ID (default 1) |
| length | int | No | Number of sentences to play (default 1, max 20) |
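You can build the query string with URLSearchParams and check length before sending the request. Below is a sketch; buildTtsUrl is a hypothetical name. It rejects length values outside 1–20 (the documented maximum is 20; the minimum of 1 is an assumption). It omits optional parameters that are not given, so the server defaults (sid 1, length 1) apply.

```javascript
// Build a TTS request URL from the documented parameters (hypothetical helper).
function buildTtsUrl(taskId, { language, voice, sid, length } = {}) {
  if (!language) throw new Error('language is required');
  if (length !== undefined && (!Number.isInteger(length) || length < 1 || length > 20)) {
    throw new RangeError('length must be an integer from 1 to 20');
  }
  const url = new URL(`https://vas-poc.vurbo.ai/api/v1/sse/tts/${encodeURIComponent(taskId)}`);
  url.searchParams.set('language', language);
  if (voice) url.searchParams.set('voice', voice);
  if (sid !== undefined) url.searchParams.set('sid', String(sid));
  if (length !== undefined) url.searchParams.set('length', String(length));
  return url.toString();
}
```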

Event Sequence

connected → tts_audio × N → tts_done

tts_audio event

event: tts_audio
data: {"sid": 5, "transcript": "你好", "text": "Hello", "audio": "Base64...", "duration_ms": 2500, "boundaries": [...]}
| Field | Description |
|---|---|
| sid | Sentence ID |
| transcript | Original transcript |
| text | Translated text (the source for TTS synthesis) |
| audio | Base64-encoded MP3 audio |
| duration_ms | Audio duration (milliseconds) |
| boundaries | Word boundary array (can be used for karaoke-style highlighting) |

tts_done event

event: tts_done
data: {"sentences_sent": 3, "total_duration_ms": 7500}

Frontend Playback Example

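async function playTTS(taskId, language, sid, length, apiKey) {
  const url = new URL(`https://vas-poc.vurbo.ai/api/v1/sse/tts/${taskId}`);
  url.searchParams.set('language', language);
  url.searchParams.set('sid', String(sid));
  url.searchParams.set('length', String(length));

  const response = await fetch(url, { headers: { 'X-API-Key': apiKey } });
  if (!response.ok) throw new Error(`TTS request failed: HTTP ${response.status}`);

  // Clips are chained so multi-sentence playback is sequential, not overlapping
  let queue = Promise.resolve();

  // Called for each tts_audio event: decode the Base64 MP3 and enqueue it
  function handleTTSAudio(data) {
    const bytes = Uint8Array.from(atob(data.audio), (c) => c.charCodeAt(0));
    const audioUrl = URL.createObjectURL(new Blob([bytes], { type: 'audio/mpeg' }));
    queue = queue.then(() => new Promise((resolve) => {
      const audio = new Audio(audioUrl);
      const finish = () => { URL.revokeObjectURL(audioUrl); resolve(); };
      audio.onended = finish;
      audio.onerror = finish;
      audio.play().catch(finish); // e.g. blocked by autoplay policy
    }));
  }

  // SSE parsing, same approach as the Load Historical Transcript example
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const events = buffer.split('\n\n');
    buffer = events.pop(); // The last segment may be incomplete
    for (const eventStr of events) {
      let eventType = '';
      let eventData = '';
      for (const line of eventStr.split('\n')) {
        if (line.startsWith('event: ')) eventType = line.slice(7);
        if (line.startsWith('data: ')) eventData = line.slice(6);
      }
      if (eventType === 'tts_audio' && eventData) handleTTSAudio(JSON.parse(eventData));
    }
  }
  await queue; // resolves after the last clip finishes
}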

Complete Flow Diagram

                ┌───────────────────┐
                │ GET /api/v1/tasks │  Get task list
                └─────────┬─────────┘
                          │
                   Select task_id
                          │
        ┌─────────────────┼─────────────────┐
        │                 │                 │
 ┌──────▼──────┐    ┌─────▼─────┐    ┌──────▼──────┐
 │ Load        │    │ Audio     │    │ Retranslate │
 │ transcript  │    │ playback  │    │             │
 │ SSE History │    │ Audio API │    │ SSE Retrans.│
 └──────┬──────┘    └─────┬─────┘    └──────┬──────┘
        │                 │                 │
        │           ┌─────▼─────┐           │
        │           │ HTTP 200  │           │
        │           │ Audio     │           │
        │           │ stream    │           │
        │           │ (Range OK)│           │
        │           └───────────┘           │
        │                                   │
  ┌─────▼──────────────────┐    ┌───────────▼─────────┐
  │ SSE event sequence:    │    │ SSE event sequence: │
  │                        │    │                     │
  │ 1. connected           │    │ translation × N     │
  │ 2. init_metadata       │    │ done                │
  │ 3. init_sentence × N   │    └─────────────────────┘
  │ 4. init_summary        │
  │ 5. init_done           │
  └────────────────────────┘

  Also called with the selected task_id:

  ┌──────────────────────┐    ┌──────────────────────┐
  │ Summary retranslation│    │ TTS playback         │
  │ SSE Retrans./Summary │    │ SSE /tts/{id}        │
  └──────────┬───────────┘    └──────────┬───────────┘
             │                           │
  summary_translation × N     connected
  done                        tts_audio × N
                              tts_done

Typical Usage Flow

1. Call GET /api/v1/tasks to get the task list
2. The user selects a task
3. Call these concurrently:
   a. SSE History API to load the transcript (render each init_sentence as it arrives)
   b. Audio API to prepare audio playback
4. The user can:
   - Play / seek the audio
   - Switch the translation language (call SSE Retranslate)
   - Switch the summary language (call SSE Retranslate Summary)
   - Switch the summary template to regenerate (call SSE Regenerate Summary)
   - Play the translated TTS audio (call SSE TTS)
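Step 3 can use Promise.allSettled so that a failure on one side (for example, recording_audio_not_ready) does not stop the transcript from rendering. Below is a sketch. openTask is a hypothetical name. loadTranscript and loadAudio are caller-supplied functions, for example wrappers around the History SSE and Audio API requests in this guide.

```javascript
// Start transcript loading and audio preparation concurrently and report
// each outcome separately (hypothetical helper).
async function openTask(taskId, { loadTranscript, loadAudio }) {
  // Promise.resolve().then(...) turns a synchronous throw into a rejection,
  // so one loader failing never prevents the other from starting
  const [transcript, audio] = await Promise.allSettled([
    Promise.resolve().then(() => loadTranscript(taskId)),
    Promise.resolve().then(() => loadAudio(taskId)),
  ]);
  return {
    transcript: transcript.status === 'fulfilled' ? transcript.value : null,
    audio: audio.status === 'fulfilled' ? audio.value : null,
    errors: [transcript, audio]
      .filter((result) => result.status === 'rejected')
      .map((result) => result.reason),
  };
}
```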

Related Documents

| Document | Description |
|---|---|
| Authentication | Detailed explanation of API key authentication |
| Tasks API Reference | Complete task management API specification |
| History SSE Reference | Complete historical transcript SSE specification |
| Retranslate SSE Reference | Complete full-transcript/summary retranslation SSE specification |
| Regenerate Summary SSE Reference | Complete SSE specification for switching templates to regenerate the summary |
| Audio Streaming Reference | Complete audio playback API specification |
| TTS Streaming Reference | Complete TTS speech synthesis SSE specification |
| Real-Time Voice Translation | Real-time voice translation guide |
| Audio Import | Audio import guide |

Version: V1.5.7 Last Updated: 2026-05-20

Copyright © 2026