TTS
Connection Information
| Item | Value |
|---|---|
| Base path | https://vas-poc.vurbo.ai/api/v1/sse |
| Protocol | HTTP + Server-Sent Events (SSE) |
| Data format | text/event-stream |
| Authentication | Header X-API-Key: {KEY} |
Note: The browser's native EventSource API does not support custom headers. Use the fetch API with a ReadableStream, or use an SSE client library that supports headers.
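Reading the stream with fetch means the client has to split the response body into SSE frames itself. The following is a minimal sketch of the `parseSSE` helper that the frontend example calls, together with a small buffer that holds back incomplete frames. Both function bodies and the name `createFrameBuffer` are illustrative, and the sketch assumes every `data:` payload is JSON, as in the event formats documented here.

```javascript
// Sketch: parse a string of complete SSE frames into { type, data } objects.
// Assumption: every data payload is JSON, as in this document's event formats.
function parseSSE(text) {
  const events = [];
  // Frames are separated by a blank line; normalize CRLF line endings first.
  for (const frame of text.replace(/\r\n/g, '\n').split('\n\n')) {
    let type = 'message'; // SSE default when no "event:" field is present
    const dataLines = [];
    for (const line of frame.split('\n')) {
      if (line.startsWith(':')) continue; // comment or keep-alive line
      const colon = line.indexOf(':');
      if (colon === -1) continue; // simplified: ignore fields without a value
      const field = line.slice(0, colon);
      // One leading space after the colon is not part of the value.
      const value = line.slice(colon + 1).replace(/^ /, '');
      if (field === 'event') type = value;
      else if (field === 'data') dataLines.push(value);
    }
    if (dataLines.length > 0) {
      events.push({ type, data: JSON.parse(dataLines.join('\n')) });
    }
  }
  return events;
}

// Sketch: accumulate decoded chunks and release only complete frames,
// because a network chunk can end in the middle of an event.
function createFrameBuffer() {
  let pending = '';
  return function push(chunk) {
    pending = (pending + chunk).replace(/\r\n/g, '\n');
    const end = pending.lastIndexOf('\n\n');
    if (end === -1) return [];
    const complete = pending.slice(0, end + 2);
    pending = pending.slice(end + 2);
    return parseSSE(complete);
  };
}
```

Each chunk returned by the reader would be decoded with `TextDecoder` (using `{ stream: true }`) and passed to `push`, which returns whatever events became complete.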
Endpoint Overview
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/v1/sse/tts/{taskId} | TTS speech synthesis stream |
GET /api/v1/sse/tts/{taskId}
Description
Converts the translated content of a historical recording into TTS speech and streams it sentence by sentence over SSE. The frontend can control how many sentences each request returns.
Use Cases
- Playing back the translated speech of historical recordings
- Karaoke-style effects (in combination with Word Boundary data)
- Reading translated content aloud
Authentication
Header: X-API-Key (see Authentication)
Request Parameters
| Parameter | Location | Type | Required | Description |
|---|---|---|---|---|
| taskId | path | string | Yes | Recording ID (UUID) |
| language | query | string | Yes | TTS output language (e.g., en-US) |
| voice | query | string | No | Specific voice name (e.g., en-US-JennyNeural) |
| sid | query | int | No | Starting sentence ID (default 1, starts from the first sentence) |
| length | query | int | No | Number of sentences to return (default 1, maximum 20) |
Note: The maximum value of length is controlled by the backend environment variable TTS_SSE_MAX_LENGTH (default 20). Values above the maximum are automatically reduced to it.
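As a rough illustration of how the parameters above combine, the sketch below builds a request URL and applies the same upper bound on the client. The helper name `buildTtsUrl` and the constant `MAX_LENGTH` are hypothetical. The value 20 is only the documented default, so a deployment may use a different limit, and the lower bound of 1 for `sid` and `length` is an assumption.

```javascript
const BASE_URL = 'https://vas-poc.vurbo.ai/api/v1/sse/tts';
// Documented default of TTS_SSE_MAX_LENGTH; the deployed value may differ.
const MAX_LENGTH = 20;

// Hypothetical helper: build the request URL from the documented parameters.
function buildTtsUrl(taskId, { language, voice, sid = 1, length = 1 } = {}) {
  if (!language) throw new TypeError('language is required');
  const url = new URL(`${BASE_URL}/${encodeURIComponent(taskId)}`);
  url.searchParams.set('language', language);
  if (voice) url.searchParams.set('voice', voice); // optional parameter
  // Assumption: sentence IDs and lengths start at 1.
  url.searchParams.set('sid', String(Math.max(1, Math.trunc(sid))));
  const bounded = Math.min(MAX_LENGTH, Math.max(1, Math.trunc(length)));
  url.searchParams.set('length', String(bounded));
  return url.toString();
}
```

Bounding the value on the client is optional, since the backend does it anyway, but it keeps the requested and returned counts easier to compare.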
Request Examples
Single-sentence playback:
curl -N "https://vas-poc.vurbo.ai/api/v1/sse/tts/550e8400-e29b-41d4-a716-446655440000?language=en-US&sid=1" \
-H "X-API-Key: vas_aB3dE5fG7hI9jK1lM3nO5pQ7rS9tU1vW"
// Use the fetch API (because EventSource does not support headers)
async function playTTSSingle(taskId, language, sid, apiKey) {
  const response = await fetch(
    `https://vas-poc.vurbo.ai/api/v1/sse/tts/${taskId}?language=${language}&sid=${sid}`,
    {
      headers: {
        'X-API-Key': apiKey
      }
    }
  );
  const reader = response.body.getReader();
  // ... handle SSE events
}
Multi-sentence playback:
// With sid = 5 and length = 3, this plays sentences 5, 6, and 7 (3 sentences total)
async function playTTSMultiple(taskId, language, sid, length, apiKey) {
  const response = await fetch(
    `https://vas-poc.vurbo.ai/api/v1/sse/tts/${taskId}?language=${language}&sid=${sid}&length=${length}`,
    {
      headers: {
        'X-API-Key': apiKey
      }
    }
  );
  const reader = response.body.getReader();
  // ... handle SSE events
}
Event Sequence
1. connected → connection confirmed
2. tts_audio → TTS audio sent sentence by sentence (repeated up to N times, N = length)
3. tts_done → all sentences sent, stream complete
* tts_error → sent when synthesis fails (replaces tts_done)
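The sequence above can be modeled on the client as a small state machine, which helps catch out-of-order events early. This is a sketch only: the name `createSequenceTracker` and the state names are hypothetical, and it assumes that `tts_error` may arrive at any point before the stream ends and that unknown event types should be ignored.

```javascript
// Hypothetical tracker for the documented event order:
// connected, then tts_audio (repeated), then tts_done or tts_error.
function createSequenceTracker() {
  let state = 'waiting'; // waiting -> streaming -> done | failed
  let audioCount = 0;
  return {
    accept(type) {
      if (state === 'done' || state === 'failed') {
        throw new Error(`Unexpected ${type} after the stream ended`);
      }
      if (type === 'connected') {
        if (state !== 'waiting') throw new Error('Duplicate connected event');
        state = 'streaming';
      } else if (type === 'tts_audio') {
        if (state !== 'streaming') throw new Error('tts_audio before connected');
        audioCount += 1;
      } else if (type === 'tts_done') {
        if (state !== 'streaming') throw new Error('tts_done before connected');
        state = 'done';
      } else if (type === 'tts_error') {
        state = 'failed'; // assumption: allowed in any non-terminal state
      }
      // Unknown event types are ignored so that new events do not break clients.
      return state;
    },
    get state() { return state; },
    get audioCount() { return audioCount; },
  };
}
```

A client would call `accept(event.type)` for each parsed event and stop reading once the state becomes `done` or `failed`.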
Event Formats
connected
{
"task_id": "550e8400-e29b-41d4-a716-446655440000",
"language": "en-US",
"voice": "en-US-JennyNeural",
"start_sid": 5,
"length": 3
}
| Field | Type | Description |
|---|---|---|
| task_id | string | Task ID (UUID) |
| language | string | TTS output language |
| voice | string | Voice name in use |
| start_sid | number | Starting sentence ID |
| length | number | Number of sentences requested |
tts_audio
{
"sid": 5,
"transcript": "Original text",
"text": "Translation",
"audio": "Base64EncodedMP3...",
"duration_ms": 2500,
"boundaries": [
{
"offset_ms": 0,
"duration_ms": 350,
"text_offset": 0,
"word_length": 5,
"text": "Hello"
}
]
}
| Field | Type | Description |
|---|---|---|
| sid | number | Sentence ID |
| transcript | string | Original transcript (STT recognition result) |
| text | string | Translated text (source for TTS synthesis) |
| audio | string | Base64-encoded MP3 audio |
| duration_ms | number | Audio duration (milliseconds) |
| boundaries | array | Word Boundary array |
Word Boundary field descriptions (each object in the boundaries array):
| Field | Type | Description |
|---|---|---|
| offset_ms | number | Start time of the word in the audio (milliseconds) |
| duration_ms | number | Duration of the word (milliseconds) |
| text_offset | number | Character index of the word within the sentence's text field |
| word_length | number | Word length (number of characters) |
| text | string | Word content |
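To show how these fields might drive a highlight, the sketch below looks up the word that is active at a given playback time and splits the sentence text around it. The function names `findActiveBoundary` and `splitAtBoundary` are hypothetical. The sketch assumes that the boundaries are sorted by `offset_ms`, that a word stays active until the next one starts, and that `text_offset` and `word_length` count JavaScript string indices into the `text` field.

```javascript
// Binary search for the last boundary whose offset_ms is <= timeMs.
// Assumption: boundaries are sorted by offset_ms in ascending order.
function findActiveBoundary(boundaries, timeMs) {
  let lo = 0;
  let hi = boundaries.length - 1;
  let found = -1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (boundaries[mid].offset_ms <= timeMs) {
      found = mid;
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return found === -1 ? null : boundaries[found];
}

// Split the sentence into the parts before, inside, and after the word.
// Assumption: text_offset and word_length are indices into `text`.
function splitAtBoundary(text, boundary) {
  const start = boundary.text_offset;
  const end = start + boundary.word_length;
  return {
    before: text.slice(0, start),
    word: text.slice(start, end),
    after: text.slice(end),
  };
}
```

A renderer could wrap the `word` part in a highlighted element and leave `before` and `after` unstyled.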
tts_done
{
"sentences_sent": 3,
"total_duration_ms": 7500,
"total_characters_used": 120
}
| Field | Type | Description |
|---|---|---|
| sentences_sent | number | Number of sentences actually sent |
| total_duration_ms | number | Total audio duration of all sentences (milliseconds) |
| total_characters_used | number | Total number of characters consumed by TTS synthesis (usage statistics) |
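One possible use of `sentences_sent` is to decide where the next request should start when a recording is played in batches. The sketch below is hypothetical: `nextPage` is not part of the API, and it assumes that sentence IDs are consecutive and that receiving fewer sentences than requested means the end of the recording was reached.

```javascript
// Hypothetical helper: derive the next request from the connected and
// tts_done payloads of the request that just finished.
// Assumptions: sentence IDs are consecutive, and a short batch means
// there are no more sentences to fetch.
function nextPage(connected, done) {
  if (done.sentences_sent < connected.length) return null;
  return {
    sid: connected.start_sid + done.sentences_sent,
    length: connected.length,
  };
}
```

With the sample payloads shown in this section (`start_sid` 5, `length` 3, `sentences_sent` 3), the next request would start at sentence 8.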
tts_error
Sent when an error occurs during TTS synthesis.
{
"error": "tts_synthesis_failed",
"message": "TTS synthesis failed"
}
| Field | Type | Description |
|---|---|---|
| error | string | Error code |
| message | string | Error message |
Specific Error Codes
| Error Code | HTTP Status | Description | Recommended Action |
|---|---|---|---|
| recording_not_found | 404 | Recording not found | Verify that taskId is correct |
| sse_missing_target_lang | 422 | Missing language parameter | Provide the language parameter |
| sse_unsupported_language | 422 | Unsupported language | Use a valid language code |
| tts_translation_not_found | 400 | No translation found for the language | Verify that a translation exists for that language |
| tts_synthesis_failed | 500 | TTS synthesis failed | Retry later |
| tts_quota_exceeded | 402 | TTS usage limit reached | Retry later |
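A client might encode the table above as a lookup so that error handling stays in one place. The sketch below is illustrative: the names `TTS_ERRORS` and `describeTtsError` are hypothetical, and the `retryable` flag is an interpretation that marks only the rows whose recommended action is "Retry later".

```javascript
// The rows of the error table above as a lookup. The retryable flag is an
// interpretation: true only where the recommended action is "Retry later".
const TTS_ERRORS = new Map([
  ['recording_not_found', { status: 404, retryable: false, action: 'Verify that taskId is correct' }],
  ['sse_missing_target_lang', { status: 422, retryable: false, action: 'Provide the language parameter' }],
  ['sse_unsupported_language', { status: 422, retryable: false, action: 'Use a valid language code' }],
  ['tts_translation_not_found', { status: 400, retryable: false, action: 'Verify that a translation exists for that language' }],
  ['tts_synthesis_failed', { status: 500, retryable: true, action: 'Retry later' }],
  ['tts_quota_exceeded', { status: 402, retryable: true, action: 'Retry later' }],
]);

// Hypothetical helper: unknown codes are treated as non-retryable.
function describeTtsError(code) {
  return TTS_ERRORS.get(code) ?? { status: null, retryable: false, action: 'Unknown error code' };
}
```

For example, `describeTtsError(event.data.error).retryable` could decide whether a `tts_error` event schedules another attempt.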
Frontend Example
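async function playTTS(taskId, language, apiKey, startSid = 1, length = 1) {
  const url = new URL(`https://vas-poc.vurbo.ai/api/v1/sse/tts/${taskId}`);
  url.searchParams.set('language', language);
  url.searchParams.set('sid', startSid);
  url.searchParams.set('length', length);
  const response = await fetch(url, {
    headers: {
      'X-API-Key': apiKey
    }
  });
  if (!response.ok) {
    throw new Error(`TTS request failed with HTTP ${response.status}`);
  }
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';                  // Text of a frame that has not fully arrived yet
  let playback = Promise.resolve(); // Chains sentences so they play one after another
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // A chunk can end mid-event (Base64 audio is large), so only parse complete frames
    buffer = (buffer + decoder.decode(value, { stream: true })).replace(/\r\n/g, '\n');
    const end = buffer.lastIndexOf('\n\n');
    if (end === -1) continue;
    // parseSSE is an application-provided helper that returns { type, data } objects
    const events = parseSSE(buffer.slice(0, end + 2));
    buffer = buffer.slice(end + 2);
    for (const event of events) {
      if (event.type === 'connected') {
        console.log(`TTS connected, voice: ${event.data.voice}`);
      } else if (event.type === 'tts_audio') {
        console.log(`Sentence ${event.data.sid}: ${event.data.text}`);
        // Queue the sentence behind any sentence that is still playing
        const data = event.data;
        playback = playback.then(() => playSentence(data));
      } else if (event.type === 'tts_done') {
        console.log(`Stream complete, ${event.data.sentences_sent} sentences total`);
      } else if (event.type === 'tts_error') {
        console.error(`TTS failed: ${event.data.error} (${event.data.message})`);
      }
    }
  }
  await playback; // Wait for the last queued sentence to finish
}
// Play one sentence and resolve when it finishes (or fails to play)
function playSentence(data) {
  return new Promise((resolve) => {
    const audioBlob = base64ToBlob(data.audio, 'audio/mpeg');
    const audioUrl = URL.createObjectURL(audioBlob);
    const audio = new Audio(audioUrl);
    const finish = () => {
      URL.revokeObjectURL(audioUrl); // Release the Blob once playback is over
      resolve();
    };
    audio.addEventListener('ended', finish);
    audio.addEventListener('error', finish);
    // Set up the karaoke effect
    setupKaraoke(audio, data.boundaries, data.text);
    audio.play().catch(finish);
  });
}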
// Base64 to Blob
function base64ToBlob(base64, mimeType) {
  const byteCharacters = atob(base64);
  // Each character of the decoded string holds one byte value (0-255)
  const byteArray = Uint8Array.from(byteCharacters, (char) => char.charCodeAt(0));
  return new Blob([byteArray], { type: mimeType });
}
// Karaoke effect
// highlightWord is an application-provided function that marks the given range of text
function setupKaraoke(audio, boundaries, text) {
  const updateHighlight = () => {
    const currentTimeMs = audio.currentTime * 1000;
    // A word stays active until the next word starts
    const currentWord = boundaries.find((b, i) => {
      const nextOffset = boundaries[i + 1]?.offset_ms ?? Infinity;
      return currentTimeMs >= b.offset_ms && currentTimeMs < nextOffset;
    });
    if (currentWord) {
      // Highlight the current word
      highlightWord(text, currentWord.text_offset, currentWord.word_length);
    }
  };
  const interval = setInterval(updateHighlight, 50);
  // Stop polling when playback ends or fails
  const stop = () => clearInterval(interval);
  audio.addEventListener('ended', stop);
  audio.addEventListener('error', stop);
}
Version: V1.5.7 Last Updated: 2026-05-20