SSE API

TTS

Connection Information

| Item | Value |
| --- | --- |
| Base path | https://vas-poc.vurbo.ai/api/v1/sse |
| Protocol | HTTP + Server-Sent Events (SSE) |
| Data format | text/event-stream |
| Authentication | Header X-API-Key: {KEY} |

Note: The browser's native EventSource API does not support custom headers. Use the fetch API with a ReadableStream, or use an SSE client library that supports headers.
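Reading the stream with fetch means the client has to parse the text/event-stream format itself, and a network chunk can end in the middle of an event. Below is a minimal sketch of an incremental parser that buffers partial input and returns only complete events. It assumes, as the examples in this document suggest, that each event name is sent in the SSE event: field and each payload is one JSON object. createSSEParser is a hypothetical name; the sketch accepts LF and CRLF line endings (not bare CR) and ignores the id and retry fields.

```javascript
// Hypothetical helper: incremental SSE parser for text read from a fetch() body.
// push() accepts any chunk of decoded text and returns only the events completed so far.
function createSSEParser() {
  let buffer = '';    // text after the last newline, carried into the next push()
  let eventType = ''; // value of the "event:" field for the event being built
  let dataLines = []; // "data:" values for the event being built

  // Emit the current event (if it has data) and reset state for the next one
  function dispatch(events) {
    if (dataLines.length > 0) {
      events.push({
        type: eventType || 'message',          // SSE default event name
        data: JSON.parse(dataLines.join('\n')) // assumes JSON payloads
      });
    }
    eventType = '';
    dataLines = [];
  }

  return {
    push(chunk) {
      const events = [];
      buffer += chunk;
      const lines = buffer.split('\n');
      buffer = lines.pop(); // incomplete last line ('' if the chunk ended with \n)

      for (let line of lines) {
        if (line.endsWith('\r')) line = line.slice(0, -1); // accept CRLF line endings
        if (line === '') {
          dispatch(events); // a blank line terminates the event
        } else if (!line.startsWith(':')) { // lines starting with ':' are comments
          const colon = line.indexOf(':');
          const field = colon === -1 ? line : line.slice(0, colon);
          let value = colon === -1 ? '' : line.slice(colon + 1);
          if (value.startsWith(' ')) value = value.slice(1);
          if (field === 'event') eventType = value;
          else if (field === 'data') dataLines.push(value);
          // "id" and "retry" fields are ignored in this sketch
        }
      }
      return events;
    }
  };
}
```

In a fetch read loop, decode each chunk with a single TextDecoder using decoder.decode(value, { stream: true }) before calling push(), so multi-byte characters split across chunks are decoded correctly.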


Endpoint Overview

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /api/v1/sse/tts/{taskId} | TTS speech synthesis stream |

GET /api/v1/sse/tts/{taskId}

Description

Converts the translated text of an existing recording into TTS speech and streams it over SSE, one sentence per tts_audio event. The client chooses the starting sentence (sid) and how many sentences each request returns (length).

Use Cases

  • Playing back the translated speech of existing recordings
  • Karaoke-style word highlighting (using the Word Boundary data)
  • Reading translated content aloud

Authentication

Header: X-API-Key (see Authentication)

Request Parameters

| Parameter | Location | Type | Required | Description |
| --- | --- | --- | --- | --- |
| taskId | path | string | Yes | Recording ID (UUID) |
| language | query | string | Yes | TTS output language (e.g., en-US) |
| voice | query | string | No | Specific voice name (e.g., en-US-JennyNeural) |
| sid | query | int | No | Starting sentence ID (default 1, the first sentence) |
| length | query | int | No | Number of sentences to return (default 1, maximum 20; see note) |

Note: The maximum value of length is set by the backend environment variable TTS_SSE_MAX_LENGTH (default 20). Larger values are not rejected; they are automatically reduced to the maximum.
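Building the query with URLSearchParams avoids encoding mistakes, and checking parameters before sending catches problems such as a missing language (sse_missing_target_lang) without a round trip. Below is a sketch under stated assumptions: buildTtsUrl is a hypothetical name, MAX_LENGTH mirrors the documented default of TTS_SSE_MAX_LENGTH and may not match a differently configured server, and requiring sid >= 1 assumes sentence IDs start at 1 (the documented default).

```javascript
// Hypothetical helper: build the TTS stream URL and validate parameters client-side.
// MAX_LENGTH mirrors the documented default of TTS_SSE_MAX_LENGTH (20); a server
// configured differently may allow more or fewer, and it trims oversize values itself.
const BASE_URL = 'https://vas-poc.vurbo.ai/api/v1/sse';
const MAX_LENGTH = 20;

function buildTtsUrl(taskId, language, { voice, sid = 1, length = 1 } = {}) {
  if (!taskId) throw new Error('taskId is required');
  if (!language) throw new Error('language is required'); // server: sse_missing_target_lang
  if (!Number.isInteger(sid) || sid < 1) throw new Error('sid must be an integer >= 1');
  if (!Number.isInteger(length) || length < 1) throw new Error('length must be an integer >= 1');

  const url = new URL(`${BASE_URL}/tts/${encodeURIComponent(taskId)}`);
  url.searchParams.set('language', language);
  if (voice) url.searchParams.set('voice', voice); // optional: omit to use the default voice
  url.searchParams.set('sid', String(sid));
  // Clamp locally so the URL matches what the server will actually return
  url.searchParams.set('length', String(Math.min(length, MAX_LENGTH)));
  return url;
}
```

The returned URL object can be passed directly to fetch() together with the X-API-Key header.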

Request Examples

Single-sentence playback:

```bash
curl -N "https://vas-poc.vurbo.ai/api/v1/sse/tts/550e8400-e29b-41d4-a716-446655440000?language=en-US&sid=1" \
  -H "X-API-Key: vas_aB3dE5fG7hI9jK1lM3nO5pQ7rS9tU1vW"
```

```javascript
// Use the fetch API (EventSource cannot send custom headers)
async function playTTSSingle(taskId, language, sid, apiKey) {
  const response = await fetch(
    `https://vas-poc.vurbo.ai/api/v1/sse/tts/${taskId}?language=${encodeURIComponent(language)}&sid=${sid}`,
    {
      headers: {
        'X-API-Key': apiKey
      }
    }
  );
  if (!response.ok) {
    throw new Error(`TTS request failed: HTTP ${response.status}`);
  }
  const reader = response.body.getReader();
  // ... read and parse SSE events (see the Frontend Example)
}
```

Multi-sentence playback:

```javascript
// Example: sid=5 and length=3 play sentences 5, 6, and 7
async function playTTSMultiple(taskId, language, sid, length, apiKey) {
  const response = await fetch(
    `https://vas-poc.vurbo.ai/api/v1/sse/tts/${taskId}?language=${encodeURIComponent(language)}&sid=${sid}&length=${length}`,
    {
      headers: {
        'X-API-Key': apiKey
      }
    }
  );
  if (!response.ok) {
    throw new Error(`TTS request failed: HTTP ${response.status}`);
  }
  const reader = response.body.getReader();
  // ... read and parse SSE events (see the Frontend Example)
}
```
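Because length is capped (20 by default), playing a long range of sentences takes several requests. Below is a sketch that splits a range into sid/length pairs. It assumes sentence IDs are contiguous integers (as the "sentences 5, 6, and 7" example suggests) and that the caller already knows the last sentence ID, which this section does not document how to obtain. planRequests is a hypothetical name.

```javascript
// Hypothetical helper: split sentences [firstSid, lastSid] into { sid, length }
// requests of at most maxLength sentences each.
// Assumes sentence IDs are contiguous integers.
function planRequests(firstSid, lastSid, maxLength = 20) {
  const requests = [];
  for (let sid = firstSid; sid <= lastSid; sid += maxLength) {
    // The final request covers only the sentences that remain
    requests.push({ sid, length: Math.min(maxLength, lastSid - sid + 1) });
  }
  return requests;
}
```

Each pair can then be passed as startSid and length to playTTS from the Frontend Example, one request after another.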

Event Sequence

1. connected    → connection confirmed
2. tts_audio    → one event per sentence (at most length events)
3. tts_done     → all sentences sent; stream complete
*  tts_error    → sent when synthesis fails (replaces tts_done)
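The order above can be enforced on the client with a small state machine, which helps catch unexpected events or a connection that drops before tts_done or tts_error arrives (the state is still not "finished" when the read loop ends). Below is a minimal sketch; createSequenceChecker is a hypothetical name, and it assumes connected is always the first event, as the sequence above lists.

```javascript
// Hypothetical helper: checks that events arrive in the documented order:
// connected, then zero or more tts_audio, then exactly one tts_done or tts_error.
function createSequenceChecker() {
  let state = 'start'; // start -> streaming -> finished
  return function check(type) {
    if (state === 'start') {
      if (type !== 'connected') throw new Error(`expected connected, got ${type}`);
      state = 'streaming';
    } else if (state === 'streaming') {
      if (type === 'tts_done' || type === 'tts_error') state = 'finished';
      else if (type !== 'tts_audio') throw new Error(`unexpected event ${type}`);
    } else {
      throw new Error(`event ${type} after the stream finished`);
    }
    return state; // callers can inspect this after the stream closes
  };
}
```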

Event Formats


connected

```json
{
  "task_id": "550e8400-e29b-41d4-a716-446655440000",
  "language": "en-US",
  "voice": "en-US-JennyNeural",
  "start_sid": 5,
  "length": 3
}
```

| Field | Type | Description |
| --- | --- | --- |
| task_id | string | Task ID (UUID) |
| language | string | TTS output language |
| voice | string | Voice name in use |
| start_sid | number | Starting sentence ID |
| length | number | Number of sentences requested |

tts_audio

```json
{
  "sid": 5,
  "transcript": "Original text",
  "text": "Translation",
  "audio": "Base64EncodedMP3...",
  "duration_ms": 2500,
  "boundaries": [
    {
      "offset_ms": 0,
      "duration_ms": 350,
      "text_offset": 0,
      "word_length": 5,
      "text": "Hello"
    }
  ]
}
```

| Field | Type | Description |
| --- | --- | --- |
| sid | number | Sentence ID |
| transcript | string | Original transcript (STT recognition result) |
| text | string | Translated text (source for TTS synthesis) |
| audio | string | Base64-encoded MP3 audio |
| duration_ms | number | Audio duration (milliseconds) |
| boundaries | array | Word Boundary array |
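Because each tts_audio event carries its own duration_ms, a client playing a multi-sentence request can place the sentences on one timeline, for example to drive a single progress bar. Below is a sketch assuming the sentences are played back to back with no gap between them; buildTimeline and locate are hypothetical names.

```javascript
// Hypothetical helper: lay received sentences end to end on one timeline.
// Assumes back-to-back playback with no gap between sentences.
function buildTimeline(sentences) {
  let startMs = 0;
  const entries = sentences.map(({ sid, duration_ms }) => {
    const entry = { sid, startMs, endMs: startMs + duration_ms };
    startMs = entry.endMs; // next sentence starts where this one ends
    return entry;
  });
  return { entries, totalMs: startMs };
}

// Map a position on the combined timeline to { sid, offsetMs } inside that sentence,
// or null when the position is past the end.
function locate(timeline, positionMs) {
  const entry = timeline.entries.find((e) => positionMs >= e.startMs && positionMs < e.endMs);
  return entry ? { sid: entry.sid, offsetMs: positionMs - entry.startMs } : null;
}
```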

Word Boundary field descriptions (each object in the boundaries array):

| Field | Type | Description |
| --- | --- | --- |
| offset_ms | number | Start time of the word in the audio (milliseconds) |
| duration_ms | number | Duration of the word (milliseconds) |
| text_offset | number | Character index of the word within the synthesized text (the text field) |
| word_length | number | Word length (number of characters) |
| text | string | Word content |
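The text_offset and word_length fields locate each word inside the synthesized text, so a highlight can be rendered by slicing the string into three parts. Below is a sketch; splitAtBoundary and boundaryMatches are hypothetical names. It assumes the offsets count characters the same way JavaScript strings do (UTF-16 code units). This section does not say how the backend counts, and the two differ for characters such as emoji, so comparing the sliced word with the boundary's own text field is a cheap way to detect a mismatch.

```javascript
// Hypothetical helper: split the synthesized text around one Word Boundary so the
// middle part can be rendered highlighted. Assumes offsets are UTF-16 code units.
function splitAtBoundary(text, { text_offset, word_length }) {
  return {
    before: text.slice(0, text_offset),
    word: text.slice(text_offset, text_offset + word_length),
    after: text.slice(text_offset + word_length)
  };
}

// Sanity check: the sliced word should equal the boundary's own text field
function boundaryMatches(text, boundary) {
  return splitAtBoundary(text, boundary).word === boundary.text;
}
```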

tts_done

```json
{
  "sentences_sent": 3,
  "total_duration_ms": 7500,
  "total_characters_used": 120
}
```

| Field | Type | Description |
| --- | --- | --- |
| sentences_sent | number | Number of sentences actually sent |
| total_duration_ms | number | Total audio duration of all sentences (milliseconds) |
| total_characters_used | number | Total number of characters consumed by TTS synthesis (usage statistics) |
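A client can tally the tts_audio events it received and compare them with the tts_done summary, which helps detect dropped events. Below is a sketch assuming total_duration_ms is always the sum of the per-sentence duration_ms values; createTally is a hypothetical name.

```javascript
// Hypothetical helper: tally tts_audio events on the client and compare them with
// the tts_done summary. Assumes total_duration_ms = sum of per-sentence duration_ms.
function createTally() {
  let sentences = 0;
  let durationMs = 0;
  return {
    // Call once per tts_audio event payload
    addAudio(data) {
      sentences += 1;
      durationMs += data.duration_ms;
    },
    // Returns mismatch descriptions; an empty array means the summary agrees
    compare(done) {
      const problems = [];
      if (done.sentences_sent !== sentences) {
        problems.push(`sentences_sent=${done.sentences_sent}, received=${sentences}`);
      }
      if (done.total_duration_ms !== durationMs) {
        problems.push(`total_duration_ms=${done.total_duration_ms}, summed=${durationMs}`);
      }
      return problems;
    }
  };
}
```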

tts_error

Sent when an error occurs during TTS synthesis.

```json
{
  "error": "tts_synthesis_failed",
  "message": "TTS synthesis failed"
}
```

| Field | Type | Description |
| --- | --- | --- |
| error | string | Error code |
| message | string | Error message |

Specific Error Codes

| Error Code | HTTP Status | Description | Recommended Action |
| --- | --- | --- | --- |
| recording_not_found | 404 | Recording not found | Verify that taskId is correct |
| sse_missing_target_lang | 422 | Missing language parameter | Provide the language parameter |
| sse_unsupported_language | 422 | Unsupported language | Use a valid language code |
| tts_translation_not_found | 400 | No translation found for the language | Verify that a translation exists for that language |
| tts_synthesis_failed | 500 | TTS synthesis failed | Retry later |
| tts_quota_exceeded | 402 | TTS usage limit reached | Retry later |
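The table maps each code to an HTTP status, and the tts_error example shows that at least tts_synthesis_failed can also arrive as an event once the stream has started, so it helps to have one place that turns a code into an action. Below is a minimal sketch: classifyTtsError, retryDelayMs, and the category names are hypothetical, the backoff constants are arbitrary, and the caller is assumed to have already extracted the code string (this section does not show the body of a non-200 response).

```javascript
// Hypothetical helper: turn an error code from the table into a client action.
// "retry" follows the table's "Retry later" advice; the other codes need the
// request itself to change, so retrying it unchanged would fail again.
const RETRYABLE = new Set(['tts_synthesis_failed', 'tts_quota_exceeded']);

function classifyTtsError(code) {
  if (RETRYABLE.has(code)) return 'retry';
  if (code === 'recording_not_found' || code === 'tts_translation_not_found') return 'check_resource';
  if (code === 'sse_missing_target_lang' || code === 'sse_unsupported_language') return 'fix_request';
  return 'unknown';
}

// Exponential backoff for retryable errors: base * 2^attempt, capped (arbitrary values)
function retryDelayMs(attempt, baseMs = 1000, capMs = 30000) {
  return Math.min(capMs, baseMs * 2 ** attempt);
}
```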

Frontend Example

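```javascript
async function playTTS(taskId, language, apiKey, startSid = 1, length = 1) {
  const url = new URL(`https://vas-poc.vurbo.ai/api/v1/sse/tts/${taskId}`);
  url.searchParams.set('language', language);
  url.searchParams.set('sid', String(startSid));
  url.searchParams.set('length', String(length));

  const response = await fetch(url, {
    headers: {
      'X-API-Key': apiKey
    }
  });
  if (!response.ok) {
    throw new Error(`TTS request failed: HTTP ${response.status}`);
  }

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';                  // incomplete SSE text carried over between chunks
  let playback = Promise.resolve(); // chain so sentences play one after another

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // { stream: true } keeps multi-byte characters split across chunks intact
    buffer += decoder.decode(value, { stream: true });
    // Events end with a blank line; the last piece may still be incomplete
    const blocks = buffer.split(/\r?\n\r?\n/);
    buffer = blocks.pop();

    for (const event of blocks.map(parseSSEEvent)) {
      if (event.type === 'connected') {
        console.log(`TTS connected, voice: ${event.data.voice}`);
      } else if (event.type === 'tts_audio') {
        const sentence = event.data;
        console.log(`Sentence ${sentence.sid}: ${sentence.text}`);
        playback = playback.then(() => playSentence(sentence));
      } else if (event.type === 'tts_done') {
        console.log(`Stream complete, ${event.data.sentences_sent} sentences sent`);
      } else if (event.type === 'tts_error') {
        console.error(`TTS error ${event.data.error}: ${event.data.message}`);
      }
    }
  }

  await playback; // resolves after the last queued sentence finishes
}

// Parse one SSE event block ("event:" / "data:" lines) into { type, data }.
// Assumes the event name is in the "event:" field and the payload is JSON.
function parseSSEEvent(block) {
  let type = 'message';
  const dataLines = [];
  for (const line of block.split(/\r?\n/)) {
    if (line.startsWith('event:')) type = line.slice(6).trim();
    else if (line.startsWith('data:')) dataLines.push(line.slice(5).replace(/^ /, ''));
  }
  return { type, data: dataLines.length > 0 ? JSON.parse(dataLines.join('\n')) : null };
}

// Play one tts_audio sentence; resolves when playback ends or fails
function playSentence(sentence) {
  const audioUrl = URL.createObjectURL(base64ToBlob(sentence.audio, 'audio/mpeg'));
  const audio = new Audio(audioUrl);
  setupKaraoke(audio, sentence.boundaries, sentence.text);

  return new Promise((resolve) => {
    const finish = () => {
      URL.revokeObjectURL(audioUrl); // release the Blob URL
      resolve();
    };
    audio.addEventListener('ended', finish, { once: true });
    audio.addEventListener('error', finish, { once: true });
    // play() rejects, for example, when the browser blocks autoplay
    audio.play().catch((err) => {
      console.error('Audio playback failed:', err);
      finish();
    });
  });
}

// Base64 to Blob
function base64ToBlob(base64, mimeType) {
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return new Blob([bytes], { type: mimeType });
}

// Karaoke effect: highlight the word currently being spoken.
// highlightWord(text, start, length) is application UI code, not part of this API.
function setupKaraoke(audio, boundaries, text) {
  let lastWord = null;
  const updateHighlight = () => {
    const currentTimeMs = audio.currentTime * 1000;
    const currentWord = boundaries.find((b, i) => {
      const nextOffset = boundaries[i + 1]?.offset_ms ?? Infinity;
      return currentTimeMs >= b.offset_ms && currentTimeMs < nextOffset;
    });

    // Only update the highlight when the active word changes
    if (currentWord && currentWord !== lastWord) {
      lastWord = currentWord;
      highlightWord(text, currentWord.text_offset, currentWord.word_length);
    }
  };

  const interval = setInterval(updateHighlight, 50);
  const stop = () => clearInterval(interval);
  audio.addEventListener('ended', stop);
  audio.addEventListener('error', stop);
}
```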
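The setupKaraoke function above scans the whole boundaries array with find() on every tick and keeps a word highlighted until the next word starts, including across pauses. A binary search over offset_ms is a cheaper lookup for long sentences, and duration_ms can optionally clear the highlight between words. Below is a sketch assuming the boundaries array is sorted by offset_ms; findActiveBoundary and its useDuration flag are hypothetical. Returning an index makes it easy to call the highlight code only when the index changes.

```javascript
// Hypothetical alternative lookup: binary search for the last boundary whose
// offset_ms <= timeMs. Assumes boundaries are sorted by offset_ms.
// Returns the boundary index, or -1 before the first word, or -1 in a gap
// between words when useDuration is true.
function findActiveBoundary(boundaries, timeMs, useDuration = false) {
  let lo = 0;
  let hi = boundaries.length - 1;
  let found = -1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (boundaries[mid].offset_ms <= timeMs) {
      found = mid;   // candidate; keep looking for a later one
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  if (found !== -1 && useDuration) {
    const b = boundaries[found];
    if (timeMs >= b.offset_ms + b.duration_ms) return -1; // word already finished
  }
  return found;
}
```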

Version: V1.5.7 · Last Updated: 2026-05-20

Copyright © 2026