Events
Overview
A reference for all response event formats you may receive over WebSocket. For connection and authentication, see Connection and Authentication; for request operations, see Voice Translation Actions.
Table of Contents
- session_started - Session started successfully
- result - Recognition/translation result
- status - Generic status response
- task_complete - Task processing complete
- config_updated - Configuration update complete
- tts_ready - TTS audio ready
- tts_error - TTS synthesis failed
- viewer_count - Viewer count update
- broadcast_phase_changed - Broadcast phase changed
- speaker_renamed - Speaker renamed
- speaker_reassigned - Speaker identity reassigned
- speakers_merged - Speakers merged
- language_switch_start - Language switch started
- batch_retranslation - Batch retranslation result
- language_switch_done - Language switch complete
- tts_mode_changed - TTS mode changed
- language_switched - Conversation language switch complete
- tts_updated - Conversation TTS settings updated
- conversation_mode_changed - Conversation mode changed
- speaker_language_changed - Speaker language changed
- segment_uploaded - Audio segment upload complete
- stt_event - STT connection status event
- viewer_joined - Viewer joined event
- viewer_left - Viewer left event
- error - Error event
- upload_error - Upload error
- summary_done - Summary generation complete
- summary_error - Summary generation failed
session_started
Description
After a start action succeeds, the server returns an event containing the complete initial session information. The frontend can use recording_type to distinguish the recording type.
Standard recording (transcribe / conversation / record)
{
"type": "voice-translation",
"data": {
"action": "session_started",
"session_id": "550e8400-e29b-41d4-a716-446655440000",
"task_id": "7c9e6679-7425-40de-944b-e07fc1f90ae7",
"recording_id": "7c9e6679-7425-40de-944b-e07fc1f90ae7",
"recording_type": "transcribe",
"recognition_mode": "single",
"message": "Speech recognition started"
}
}
Broadcast mode (broadcast)
{
"type": "voice-translation",
"data": {
"action": "session_started",
"session_id": "550e8400-e29b-41d4-a716-446655440000",
"task_id": "7c9e6679-7425-40de-944b-e07fc1f90ae7",
"recording_id": "7c9e6679-7425-40de-944b-e07fc1f90ae7",
"recording_type": "broadcast",
"recognition_mode": "multi_speaker",
"phase": "standby",
"viewer_count": 0,
"queue_count": 0,
"peak_viewers": 0,
"total_viewers": 0,
"message": "Speech recognition started"
}
}
Field descriptions
| Field | Type | Description |
|---|---|---|
session_id | string | Session ID (WS connection scope; invalid once the connection ends) |
task_id | string | Task ID (the same identifier as REST /api/v1/tasks/{taskId} and Webhook data.task_id) |
recording_id | string | Deprecated (since V1.4.1): same value as task_id, will be removed in V2.0.0; use task_id instead |
recording_type | string | Recording type: transcribe, conversation, record, broadcast |
recognition_mode | string | Recognition mode: single, multi_speaker |
message | string | Status description message |
phase | string | Broadcast phase: standby or live (broadcast mode only) |
viewer_count | int | Current number of online viewers (broadcast mode only) |
queue_count | int | Number of viewers waiting in the queue (broadcast mode only) |
peak_viewers | int | Peak viewer count for this broadcast (broadcast mode only) |
total_viewers | int | Cumulative total of viewers that have ever connected (broadcast mode only) |
ID alignment tip: The WebSocket, REST, and Webhook interfaces all use task_id (a UUID) as the unified identifier for a task. session_id is a WS connection-scope identifier and is a different concept from a task. recording_id is the old name used before V1.4.1; its value is exactly the same as task_id, and it is retained only for backward compatibility.
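As a rough illustration of the ID guidance above, a client might read task_id and fall back to the deprecated recording_id only when task_id is missing. The names `SessionStartedData`, `resolveTaskId`, and `isBroadcastSession` are hypothetical, and treating task_id as optional is an assumption made for older servers; only the field names come from this reference.

```typescript
// Hypothetical client-side shape; only the field names come from this reference.
interface SessionStartedData {
  action: "session_started";
  session_id: string;
  task_id?: string;      // assumed optional here to tolerate pre-V1.4.1 servers
  recording_id?: string; // deprecated alias of task_id
  recording_type: "transcribe" | "conversation" | "record" | "broadcast";
  recognition_mode: "single" | "multi_speaker";
  message: string;
  // Broadcast-only fields
  phase?: "standby" | "live";
  viewer_count?: number;
  queue_count?: number;
  peak_viewers?: number;
  total_viewers?: number;
}

// Prefer task_id; fall back to the deprecated recording_id.
function resolveTaskId(data: SessionStartedData): string {
  const id = data.task_id ?? data.recording_id;
  if (id === undefined) {
    throw new Error("session_started carried neither task_id nor recording_id");
  }
  return id;
}

// recording_type is the documented way to tell broadcast sessions apart.
function isBroadcastSession(data: SessionStartedData): boolean {
  return data.recording_type === "broadcast";
}
```

Keeping session_id out of `resolveTaskId` is deliberate: it identifies the connection, not the task, so it should not be used for REST lookups.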
result
Description
Speech recognition and translation results. A single result event may contain origin (the recognition result) and/or translations (the translation results).
origin (speech recognition result)
{
"type": "voice-translation",
"data": {
"action": "result",
"origin": {
"sid": 1,
"language": "zh-TW",
"text": "Hello, nice to meet you",
"is_final": true,
"speaker_id": "0",
"detected_language": "zh-TW",
"start_time": "00:05"
}
}
}
origin field descriptions
| Field | Type | Description |
|---|---|---|
sid | int | Sentence number, starting from 1 |
language | string | Source language code. In conversation mode, this is the automatically detected language |
text | string | The recognized text |
is_final | boolean | Whether this is the final result |
speaker_id | string | Original speaker ID |
speaker_label | string | (Multi-speaker mode) Display label (after applying the alias; equals speaker_id when no alias exists) |
detected_language | string | The detected language. In conversation mode, determined automatically by the system |
start_time | string | Sentence start time (mm:ss); not sent during the broadcast standby phase, and counted from 00:00 once live begins |
translations (translation result)
{
"type": "voice-translation",
"data": {
"action": "result",
"translations": {
"en-US": {
"sid": 1,
"text": "Hello, nice to meet you",
"is_final": true
}
}
}
}
translations field descriptions
Translation results are keyed by language code; each language's translation object contains:
| Field | Type | Description |
|---|---|---|
sid | int | Sentence number |
text | string | The translated text |
is_final | boolean | Whether this is the final result |
is_retranslation | boolean | Whether this is a retranslation result (only for retranslate) |
speaker_id | string | (Multi-speaker mode) Original speaker ID (aligned with origin since v1.5.3) |
speaker_label | string | (Multi-speaker mode) Display label (after applying the alias; equals speaker_id when no alias exists) |
Important: The success response for the retranslate action uses a separate action: "translation" event (not result); the payload structure is the same as the table above. See the retranslate success response in voice-translation.md.
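Because origin and translations can arrive in the same result event or in separate ones, a client usually merges them by sid. The following is a minimal sketch of such a merge; the `Sentence` store and the function names are hypothetical, and overwriting earlier text with later text for the same sid is an assumption about how interim (is_final: false) results should be treated.

```typescript
// Hypothetical client-side shapes; field names follow the tables above.
interface OriginPart {
  sid: number;
  text: string;
  is_final: boolean;
  speaker_id?: string;
  start_time?: string; // "mm:ss"; not sent during the broadcast standby phase
}
interface TranslationPart {
  sid: number;
  text: string;
  is_final: boolean;
}
interface ResultData {
  action: "result";
  origin?: OriginPart;
  translations?: { [lang: string]: TranslationPart };
}
interface Sentence {
  sid: number;
  originText: string;
  originFinal: boolean;
  translations: { [lang: string]: string };
}
type SentenceStore = { [sid: number]: Sentence };

function getOrCreate(store: SentenceStore, sid: number): Sentence {
  if (!(sid in store)) {
    store[sid] = { sid: sid, originText: "", originFinal: false, translations: {} };
  }
  return store[sid];
}

// Merge one result event into the store; later text for a sid replaces earlier text.
function applyResult(store: SentenceStore, data: ResultData): void {
  if (data.origin !== undefined) {
    const s = getOrCreate(store, data.origin.sid);
    s.originText = data.origin.text;
    s.originFinal = data.origin.is_final;
  }
  const translations = data.translations;
  if (translations !== undefined) {
    for (const lang of Object.keys(translations)) {
      const t = translations[lang];
      getOrCreate(store, t.sid).translations[lang] = t.text;
    }
  }
}
```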
status
Description
A generic status response, used to confirm operations such as pause, resume, stop, and set_name.
{
"type": "voice-translation",
"data": {
"action": "status",
"message": "Speech recognition paused"
}
}
Field descriptions
| Field | Type | Description |
|---|---|---|
message | string | Status description |
task_complete
Description
Triggered after stop, once the audio file and transcript have finished uploading. The task_id can be used for subsequent REST API queries about the task details.
{
"type": "voice-translation",
"data": {
"action": "task_complete",
"task_id": "550e8400-e29b-41d4-a716-446655440000",
"message": "Task processing complete"
}
}
Field descriptions
| Field | Type | Description |
|---|---|---|
task_id | string | Recording UUID, usable for subsequent API queries |
message | string | Status description |
config_updated
Description
Configuration update complete event, triggered after a config action succeeds.
{
"type": "voice-translation",
"data": {
"action": "config_updated",
"updated": ["terminology", "fuzzy_correction", "translation_dict"],
"message": "Configuration updated"
}
}
Field descriptions
| Field | Type | Description |
|---|---|---|
updated | array | The configuration types that were updated, as an array of strings (terminology, fuzzy_correction, translation_dict) |
terminology_effective | string | (Optional) If the terminology is updated during recording, a value of "next_turn" indicates the new terminology takes effect from the next sentence; this field does not appear in the initial config |
message | string | Status message |
tts_ready
Description
TTS speech synthesis complete event. It contains the audio data and Word Boundary information (which can be used for a karaoke effect).
{
"type": "voice-translation",
"data": {
"action": "tts_ready",
"sid": 1,
"language": "en-US",
"transcript": "Hello, nice to meet you",
"text": "Hello, nice to meet you",
"audio": "Base64EncodedMP3...",
"format": "mp3",
"duration_ms": 2500,
"boundaries": [
{"offset_ms": 0, "duration_ms": 350, "text_offset": 0, "word_length": 5, "text": "Hello", "boundary_type": "WordBoundary"},
{"offset_ms": 350, "duration_ms": 100, "text_offset": 5, "word_length": 1, "text": ",", "boundary_type": "PunctuationBoundary"},
{"offset_ms": 500, "duration_ms": 250, "text_offset": 7, "word_length": 4, "text": "nice", "boundary_type": "WordBoundary"},
{"offset_ms": 750, "duration_ms": 200, "text_offset": 12, "word_length": 2, "text": "to", "boundary_type": "WordBoundary"},
{"offset_ms": 950, "duration_ms": 350, "text_offset": 15, "word_length": 4, "text": "meet", "boundary_type": "WordBoundary"},
{"offset_ms": 1300, "duration_ms": 300, "text_offset": 20, "word_length": 3, "text": "you", "boundary_type": "WordBoundary"}
]
}
}
Field descriptions
| Field | Type | Description |
|---|---|---|
sid | int | Sentence number |
language | string | TTS language |
transcript | string | Original transcript (STT recognition result) |
text | string | Translated text (the source for TTS synthesis) |
audio | string | Base64-encoded MP3 audio |
format | string | Audio format (always mp3) |
duration_ms | int | Total audio duration (milliseconds) |
boundaries | array | Word Boundary array |
Word Boundary field descriptions
| Field | Type | Description |
|---|---|---|
offset_ms | int | The word's start time within the audio (milliseconds) |
duration_ms | int | The word's duration (milliseconds) |
text_offset | int | The position within the original string (character index) |
word_length | int | Word length (number of characters) |
text | string | The word content |
boundary_type | string | Boundary type; common values: WordBoundary, PunctuationBoundary, SentenceBoundary, etc. |
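For the karaoke effect mentioned above, a player can look up which boundary covers the current playback position and split the text around it. This is a sketch only: `activeBoundaryIndex` and `highlightSpan` are hypothetical helpers, and it assumes text_offset and word_length count the same units as JavaScript string indices, which this reference does not state.

```typescript
interface WordBoundary {
  offset_ms: number;
  duration_ms: number;
  text_offset: number;
  word_length: number;
  text: string;
  boundary_type: string;
}

// Index of the boundary being spoken at playbackMs, or -1 when the position
// falls in a gap between boundaries (or past the last one).
function activeBoundaryIndex(boundaries: WordBoundary[], playbackMs: number): number {
  for (let i = 0; i < boundaries.length; i++) {
    const b = boundaries[i];
    if (playbackMs >= b.offset_ms && playbackMs < b.offset_ms + b.duration_ms) {
      return i;
    }
  }
  return -1;
}

// Split the synthesized text into [before, active, after] for highlighting.
// Assumes text_offset indexes into the tts_ready "text" field.
function highlightSpan(text: string, b: WordBoundary): [string, string, string] {
  const start = b.text_offset;
  const end = start + b.word_length;
  return [text.slice(0, start), text.slice(start, end), text.slice(end)];
}
```

With the example payload above, a playback position of 600 ms falls inside the "nice" boundary (500 to 750 ms), while 460 ms falls in the gap after the comma and yields -1.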
tts_error
Description
TTS synthesis failed event.
{
"type": "voice-translation",
"data": {
"action": "tts_error",
"sid": 1,
"language": "en-US",
"error": "translation_not_found",
"message": "No translation available for language: en-US"
}
}
Field descriptions
| Field | Type | Description |
|---|---|---|
sid | int | Sentence number |
language | string | TTS language |
error | string | Error code |
message | string | Error message |
transcript | string | (Optional) The corresponding original transcript, to help the frontend locate the point of failure |
TTS error codes
| Error code | Description |
|---|---|
translation_not_found | No translation found for this language |
tts_synthesis_failed | TTS synthesis failed |
tts_quota_exceeded | TTS usage has reached its limit |
viewer_count
Broadcast mode only
Description
While a broadcast is in progress, the system checks the viewer count every 3 seconds and pushes this event to the host whenever it changes.
{
"type": "voice-translation",
"data": {
"action": "viewer_count",
"viewer_count": 45,
"queue_count": 8,
"peak_viewers": 50,
"total_viewers": 123
}
}
Field descriptions
| Field | Type | Description |
|---|---|---|
viewer_count | int | Current number of online viewers |
queue_count | int | Number of viewers waiting in the queue |
peak_viewers | int | Peak viewer count for this broadcast |
total_viewers | int | Cumulative total of viewers that have ever connected |
Note: This event is pushed only when the viewer count or queue count changes, to avoid unnecessary message transmission.
broadcast_phase_changed
Description
Triggered when the broadcast phase switches from standby to live.
{
"type": "voice-translation",
"data": {
"action": "broadcast_phase_changed",
"phase": "live",
"message": "Broadcast started"
}
}
Field descriptions
| Field | Type | Description |
|---|---|---|
phase | string | The new phase: standby or live |
message | string | Status description message |
speaker_renamed
Description
Global speaker rename complete event.
{
"type": "voice-translation",
"data": {
"action": "speaker_renamed",
"speaker_id": "Guest-1",
"new_label": "Manager Wang",
"affected_sids": [1, 3, 5, 8]
}
}
Field descriptions
| Field | Type | Description |
|---|---|---|
speaker_id | string | The resolved original speaker ID (even if the input was a display label, the event returns the original ID) |
new_label | string | The new display label |
affected_sids | array | List of affected sentence numbers (integers) |
speaker_reassigned
Description
Single-sentence speaker reassignment complete event.
{
"type": "voice-translation",
"data": {
"action": "speaker_reassigned",
"sid": 5,
"old_speaker_id": "Guest-1",
"new_speaker_id": "Guest-2",
"new_speaker_label": "Lisa Lee"
}
}
Field descriptions
| Field | Type | Description |
|---|---|---|
sid | int | The sentence number that was changed |
old_speaker_id | string | The original speaker ID |
new_speaker_id | string | The new original speaker ID |
new_speaker_label | string | The new speaker display label (after applying speaker_aliases; equals new_speaker_id when no alias exists) |
speakers_merged
Description
Speaker merge complete event. After the merge, future recognition results produced by the source speaker are also automatically converted to the target speaker.
{
"type": "voice-translation",
"data": {
"action": "speakers_merged",
"source_speaker_id": "Guest-2",
"target_speaker_id": "Guest-1",
"affected_sids": [3, 5, 7]
}
}
Field descriptions
| Field | Type | Description |
|---|---|---|
source_speaker_id | string | The original ID of the merged-away speaker |
target_speaker_id | string | The original ID of the merge target speaker |
affected_sids | array | List of affected sentence numbers (integers) |
To obtain the target speaker's display label, query speaker_aliases or the next init_metadata event.
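The three speaker events (speaker_renamed, speaker_reassigned, speakers_merged) can be folded into one piece of local state. The sketch below is one possible way to do that; `SpeakerState`, `applySpeakerEvent`, and `displayLabel` are hypothetical names, and the local label cache is an assumption rather than something the server requires.

```typescript
interface SpeakerState {
  sentenceSpeaker: { [sid: number]: string }; // sid -> original speaker ID
  labels: { [speakerId: string]: string };    // original speaker ID -> display label
}

type SpeakerEvent =
  | { action: "speaker_renamed"; speaker_id: string; new_label: string; affected_sids: number[] }
  | { action: "speaker_reassigned"; sid: number; old_speaker_id: string; new_speaker_id: string; new_speaker_label: string }
  | { action: "speakers_merged"; source_speaker_id: string; target_speaker_id: string; affected_sids: number[] };

function applySpeakerEvent(state: SpeakerState, ev: SpeakerEvent): void {
  switch (ev.action) {
    case "speaker_renamed":
      // The label is global, so one entry covers every affected sentence.
      state.labels[ev.speaker_id] = ev.new_label;
      break;
    case "speaker_reassigned":
      state.sentenceSpeaker[ev.sid] = ev.new_speaker_id;
      state.labels[ev.new_speaker_id] = ev.new_speaker_label;
      break;
    case "speakers_merged":
      // Move existing sentences to the target; the server converts future results.
      for (const sid of ev.affected_sids) {
        state.sentenceSpeaker[sid] = ev.target_speaker_id;
      }
      break;
  }
}

// Display label for a sentence; falls back to the raw speaker ID when no label is cached.
function displayLabel(state: SpeakerState, sid: number): string | undefined {
  if (!(sid in state.sentenceSpeaker)) {
    return undefined;
  }
  const id = state.sentenceSpeaker[sid];
  return id in state.labels ? state.labels[id] : id;
}
```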
language_switch_start
Description
Language switch started event, sent after the switch_language action is triggered.
{
"type": "voice-translation",
"data": {
"action": "language_switch_start",
"translation_language": "ja-JP",
"total_segments": 15,
"message": "Starting language switch and retranslation"
}
}
Field descriptions
| Field | Type | Description |
|---|---|---|
translation_language | string | The new translation target language |
total_segments | int | The number of sentences to retranslate |
message | string | Status description |
batch_retranslation
Description
Batch retranslation result event, sent sentence by sentence during the language switch process.
{
"type": "voice-translation",
"data": {
"action": "batch_retranslation",
"sid": 3,
"translations": {
"ja-JP": {
"sid": 3,
"text": "今日はプロジェクトの進捗について話し合いましょう",
"is_final": true,
"is_retranslation": true
}
}
}
}
Field descriptions
| Field | Type | Description |
|---|---|---|
sid | int | Sentence number |
translations | object | Translation result (same format as the translations in result) |
language_switch_done
Description
Language switch complete event.
{
"type": "voice-translation",
"data": {
"action": "language_switch_done",
"translation_language": "ja-JP",
"success_count": 15,
"failed_count": 2,
"failed_sids": [3, 7],
"message": "Language switch complete"
}
}
Field descriptions
| Field | Type | Description |
|---|---|---|
translation_language | string | The translation target language |
success_count | int | The number of sentences successfully translated |
failed_count | int | The number of sentences that failed to translate |
failed_sids | array | List of sentence numbers (integers) that failed to translate (included only when failed_count > 0) |
message | string | Status description |
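The language_switch_start, batch_retranslation, and language_switch_done events form a small sequence that a client can track for a progress indicator. The reducer below is a sketch under assumptions: `SwitchProgress` and `reduceSwitch` are hypothetical, and it simply counts batch_retranslation events without assuming whether failed sentences also produce one.

```typescript
type SwitchEvent =
  | { action: "language_switch_start"; translation_language: string; total_segments: number }
  | { action: "batch_retranslation"; sid: number }
  | { action: "language_switch_done"; translation_language: string; success_count: number; failed_count: number; failed_sids?: number[] };

interface SwitchProgress {
  language: string;
  total: number;       // total_segments from language_switch_start
  received: number;    // batch_retranslation events seen so far
  done: boolean;
  failedSids: number[];
}

// Returns the next progress state; events that arrive before a start are ignored.
function reduceSwitch(p: SwitchProgress | null, ev: SwitchEvent): SwitchProgress | null {
  if (ev.action === "language_switch_start") {
    return { language: ev.translation_language, total: ev.total_segments, received: 0, done: false, failedSids: [] };
  }
  if (p === null) {
    return null;
  }
  if (ev.action === "batch_retranslation") {
    return { ...p, received: p.received + 1 };
  }
  // language_switch_done: failed_sids is present only when failed_count > 0.
  return { ...p, done: true, failedSids: ev.failed_sids ?? [] };
}
```

Returning a new object on every event keeps the reducer easy to plug into state containers that compare by reference.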
tts_mode_changed
Description
TTS playback mode changed event.
{
"type": "voice-translation",
"data": {
"action": "tts_mode_changed",
"tts_mode": "async"
}
}
Field descriptions
| Field | Type | Description |
|---|---|---|
tts_mode | string | The new mode: sync or async |
language_switched
Description
Conversation-mode (conversation) language switch complete event. Triggered when switch_language successfully switches the STT source language in conversation mode.
{
"type": "voice-translation",
"data": {
"action": "language_switched",
"language": "en-US",
"translation_language": "zh-TW",
"message": "Language switched"
}
}
Field descriptions
| Field | Type | Description |
|---|---|---|
language | string | The new active language (STT source) |
translation_language | string | The new translation target language |
message | string | Status message |
tts_updated
Description
Conversation-mode (conversation) TTS settings updated event. Triggered when set_tts successfully updates the TTS toggle or voice settings.
{
"type": "voice-translation",
"data": {
"action": "tts_updated",
"tts_enabled": true,
"tts_config": {
"zh-TW": { "voice": "zh-TW-HsiaoChenNeural", "speaking_rate": 1.0 },
"en-US": { "voice": "en-US-GuyNeural", "speaking_rate": 1.2 }
}
}
}
Field descriptions
| Field | Type | Description |
|---|---|---|
tts_enabled | boolean | Whether TTS is enabled |
tts_config | object | The TTS settings for each language (voice, speaking_rate) |
conversation_mode_changed
Description
Mode changed event for conversation mode (conversation). Triggered when switch_conversation_mode successfully switches between auto and manual mode.
{
"type": "voice-translation",
"data": {
"action": "conversation_mode_changed",
"conversation_mode": "manual"
}
}
Field descriptions
| Field | Type | Description |
|---|---|---|
conversation_mode | string | The new conversation mode: auto or manual |
speaker_language_changed
Description
Conversation-mode (conversation) speaker language changed event. Triggered when set_speaker_language successfully changes a speaker's language; it includes the complete language map after the change.
{
"type": "voice-translation",
"data": {
"action": "speaker_language_changed",
"speaker_language_map": {
"1": "ja-JP",
"2": "en-US"
}
}
}
Field descriptions
| Field | Type | Description |
|---|---|---|
speaker_language_map | object | The speaker language map after the change (the key is the speaker number as a string) |
segment_uploaded
Description
Audio segment upload complete event. Triggered whenever an audio segment is successfully uploaded to cloud storage; it can be used to display upload progress in the frontend.
{
"type": "voice-translation",
"data": {
"action": "segment_uploaded",
"segment_index": 0,
"duration_sec": 30.5
}
}
Field descriptions
| Field | Type | Description |
|---|---|---|
segment_index | number | Segment index (starting from 0) |
duration_sec | number | The duration of this segment (seconds) |
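To display upload progress, a frontend might accumulate segment_uploaded events as sketched below. `UploadProgress` and the helper functions are hypothetical; keying by segment_index is a defensive choice so that a repeated event would not be double-counted, not a statement that the server repeats events.

```typescript
interface SegmentUploadedData {
  action: "segment_uploaded";
  segment_index: number;
  duration_sec: number;
}

interface UploadProgress {
  durations: { [index: number]: number }; // segment_index -> duration in seconds
}

// Record one uploaded segment, keyed by its index.
function recordSegment(p: UploadProgress, ev: SegmentUploadedData): void {
  p.durations[ev.segment_index] = ev.duration_sec;
}

// Number of distinct segments uploaded so far.
function uploadedCount(p: UploadProgress): number {
  return Object.keys(p.durations).length;
}

// Total uploaded audio duration in seconds.
function uploadedSeconds(p: UploadProgress): number {
  let total = 0;
  for (const key of Object.keys(p.durations)) {
    total += p.durations[Number(key)];
  }
  return total;
}
```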
stt_event
Description
STT connection status event. Triggered when the connection status of the speech recognition service changes; it can be used to display the STT service status in the frontend.
{
"type": "voice-translation",
"data": {
"action": "stt_event",
"event": "connected",
"message": "STT service connected"
}
}
Field descriptions
| Field | Type | Description |
|---|---|---|
event | string | Event type: connected / disconnected / error |
message | string | Event description message |
viewer_joined
Description
Viewer joined event (broadcast mode only). When a viewer joins the broadcast, the host receives this event.
{
"type": "voice-translation",
"data": {
"action": "viewer_joined",
"viewer": {
"id": "viewer_abc123",
"ip": "192.168.1.100",
"language": "zh-TW"
},
"viewer_count": 5,
"queue_count": 2
}
}
Field descriptions
| Field | Type | Description |
|---|---|---|
viewer | object | Information about the viewer who joined |
viewer.id | string | Viewer ID |
viewer.ip | string | Viewer IP address |
viewer.language | string | The language the viewer selected |
viewer_count | number | Current viewer count |
queue_count | number | Number of viewers in the queue |
viewer_left
Description
Viewer left event (broadcast mode only). When a viewer leaves the broadcast, the host receives this event.
{
"type": "voice-translation",
"data": {
"action": "viewer_left",
"viewer_id": "viewer_abc123",
"viewer_count": 4,
"queue_count": 1
}
}
Field descriptions
| Field | Type | Description |
|---|---|---|
viewer_id | string | The ID of the viewer who left |
viewer_count | number | Current viewer count |
queue_count | number | Number of viewers in the queue |
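The viewer_joined and viewer_left events can drive a host-side viewer list. The sketch below keeps a roster keyed by viewer ID and copies the counters from each event; `Audience` and `applyViewerEvent` are hypothetical, and preferring the server's counters over the roster size is an assumption made in case the local roster is incomplete.

```typescript
interface Viewer {
  id: string;
  ip: string;
  language: string;
}

type ViewerEvent =
  | { action: "viewer_joined"; viewer: Viewer; viewer_count: number; queue_count: number }
  | { action: "viewer_left"; viewer_id: string; viewer_count: number; queue_count: number };

interface Audience {
  viewers: { [id: string]: Viewer };
  viewerCount: number;
  queueCount: number;
}

function applyViewerEvent(a: Audience, ev: ViewerEvent): void {
  if (ev.action === "viewer_joined") {
    a.viewers[ev.viewer.id] = ev.viewer;
  } else {
    delete a.viewers[ev.viewer_id];
  }
  // Use the counters carried by the event rather than counting the roster.
  a.viewerCount = ev.viewer_count;
  a.queueCount = ev.queue_count;
}
```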
error
Description
Error event. Triggered when an operation fails or a system error occurs.
{
"type": "error",
"data": {
"error_code": "session_not_started",
"severity": "error",
"message": "Session not started",
"context": "voice-translation",
"request_id": "req_abc123xyz789",
"timestamp": "2026-01-15T10:30:45.123Z"
}
}
A sentence-level error (such as a translation failure for one language of a sentence) additionally carries sid and details:
{
"type": "error",
"data": {
"error_code": "llm_content_filtered",
"severity": "warning",
"message": "Content filtered",
"context": "translation",
"sid": 5,
"request_id": "req_abc123xyz789",
"timestamp": "2026-04-26T10:30:45.123Z",
"details": {
"provider": "llm_service",
"source_lang": "zh-TW",
"translation_language": "ja"
}
}
}
Field descriptions
| Field | Type | Description |
|---|---|---|
error_code | string | Error code (for programmatic handling) |
severity | string | Severity: fatal / error / warning |
message | string | Human-readable error message |
context | string | Error source category |
sid | int | Optional. The sentence number for a sentence-level error (such as a translation failure for that sentence); not included for non-sentence-level errors |
request_id | string | Request tracing ID |
timestamp | string | The time the error occurred (ISO 8601) |
details | object | Optional. Error context; common keys: provider, translation_language, source_lang. internal_error (a panic recovered for a single message) additionally carries message_type (always present) and action (best-effort; absent when parsing fails). See websocket-api.md Per-Message Errors |
Severity descriptions
| severity | Description | Recommended handling |
|---|---|---|
fatal | Fatal error | Stop the service and require reconnection |
error | Operation failed | Show an error prompt and allow retry |
warning | Warning | Show a warning without blocking the operation |
For the complete list of error codes, see Error Code Reference.
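The recommended handling in the severity table can be expressed as a small mapping. This is an illustrative sketch: `ErrorPlan`, `planForError`, and `describeError` are hypothetical names, and what "reconnect" or "prompt_retry" actually do is left to the client.

```typescript
type Severity = "fatal" | "error" | "warning";

interface ErrorData {
  error_code: string;
  severity: Severity;
  message: string;
  context: string;
  sid?: number; // present only for sentence-level errors
  request_id: string;
  timestamp: string;
  details?: { [key: string]: unknown };
}

type ErrorPlan = "reconnect" | "prompt_retry" | "warn_only";

// Map severity to the handling recommended in the severity table.
function planForError(data: ErrorData): ErrorPlan {
  if (data.severity === "fatal") {
    return "reconnect";
  }
  if (data.severity === "error") {
    return "prompt_retry";
  }
  return "warn_only";
}

// Build a log line that keeps request_id so the error can be traced later.
function describeError(data: ErrorData): string {
  const where = data.sid !== undefined ? " (sentence " + data.sid + ")" : "";
  return "[" + data.severity + "] " + data.error_code + where + ": " + data.message + " [" + data.request_id + "]";
}
```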
upload_error
v1.5.6 documentation fix: Earlier documentation described a standalone event format of type: "voice-translation" + action: "upload_error", but in practice this format was never sent on the wire. Storage upload failures always use the unified error envelope, with one of the three error codes in the table below.
If your client listens for action === "upload_error", switch to listening for type === "error" and matching on error_code.
Storage-layer error codes (sent via the error event)
| Error code | Description |
|---|---|
storage_connection_failed | Storage service connection failed |
storage_upload_failed | File upload failed |
storage_queue_full | Upload queue full |
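A client migrating away from the never-sent upload_error action could detect storage failures as sketched here. `Envelope` and `isStorageUploadError` are hypothetical names; the three error codes are the ones listed in the table above.

```typescript
// The three storage-layer error codes from the table above.
const STORAGE_ERROR_CODES: string[] = [
  "storage_connection_failed",
  "storage_upload_failed",
  "storage_queue_full",
];

// Loose envelope shape: every message carries a type and a data object.
interface Envelope {
  type: string;
  data: { [key: string]: unknown };
}

// True when the message is a unified error event carrying a storage error code.
function isStorageUploadError(msg: Envelope): boolean {
  if (msg.type !== "error") {
    return false;
  }
  const code = msg.data["error_code"];
  return typeof code === "string" && STORAGE_ERROR_CODES.indexOf(code) !== -1;
}
```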
summary_done
Description
An event pushed after recording stops, once server-side non-streaming summary generation is complete. After receiving this event, the client can call GET /api/v1/sse/history/transcribe/{taskId} to retrieve the summary content (the payload does not include final_content, to avoid bloating the WebSocket message).
v1.5.5 adds two fallback audit fields, summary_fallback_level / summary_dropped_segments: when a custom prompt or transcript content triggers the LLM service content filter, the backend automatically downgrades (L1→L2→L3) and uses these two fields to notify the client of the path actually taken.
Examples
L1 succeeds directly (no fallback, no filtering triggered):
{
"type": "voice-translation",
"data": {
"action": "summary_done",
"task_id": "550e8400-e29b-41d4-a716-446655440000",
"summary_id": "sum_a1b2c3d4e5f6g7h8",
"summary_mode": "custom",
"summary_template": "skin-clinic-acme-v2",
"summary_plain_text": true,
"tokens_used": { "input": 1234, "output": 567 }
}
}
L3 triggered (summary produced after transcript segments were trimmed):
{
"type": "voice-translation",
"data": {
"action": "summary_done",
"task_id": "550e8400-e29b-41d4-a716-446655440000",
"summary_id": "sum_a1b2c3d4e5f6g7h8",
"summary_mode": "custom",
"summary_template": "skin-clinic-acme-v2",
"summary_plain_text": true,
"tokens_used": { "input": 3456, "output": 789 },
"summary_fallback_level": 3,
"summary_dropped_segments": [3, 7]
}
}
Field descriptions
| Field | Type | Description |
|---|---|---|
action | string | Always summary_done |
task_id | string | Recording UUID |
summary_id | string | The internal ID of this summary |
summary_mode | string | "builtin" or "custom" |
summary_template | string | The effective template slug: for builtin, the built-in template slug (such as meeting); for custom, the customer slug |
summary_plain_text | boolean | Whether the output is plain text |
tokens_used.input / .output | int | Token usage (a cumulative value across all calls when an L2/L3 fallback is triggered) |
summary_fallback_level | int (omit) | Present only when a fallback was triggered (2 or 3); omitted when L1 succeeds directly. 2 = L2 neutral prompt; 3 = L3 segment trimming |
summary_dropped_segments | array (omit) | Present only when summary_fallback_level = 3; the indices (integers) of the trimmed transcript segments, in original order |
Interpreting the fallback level (for frontend UI hints)
| summary_fallback_level | Meaning | Suggested UI hint |
|---|---|---|
(field omitted) | L1 succeeds directly, no fallback | Do not show a hint |
2 | The customer prompt triggered filtering, so a neutral fallback prompt was used instead | "Your custom instructions contained terms the content filter could not process; the summary was generated using neutral mode" |
3 | The transcript content triggered filtering; the offending segments were trimmed before producing the summary | "The transcript contained N segments that could not be processed; the summary was generated after omitting the relevant content" (N = summary_dropped_segments.length) |
If L3 fails, summary_done is not sent; instead, summary_error is sent with error_code=llm_content_filtered (see §summary_error below).
Note: The payload deliberately does not include final_content. The client must call GET /api/v1/sse/history/transcribe/{taskId} itself to retrieve the full summary text. summary_fallback_level and summary_dropped_segments are also provided as top-level fields of the init_summary event during history playback.
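The fallback-level table above can be turned into a UI hint, together with the path for fetching the summary text. This is a sketch: `fallbackHint` and `summaryHistoryPath` are hypothetical helpers, the hint wording is copied from the table, and only the path portion of the URL is built because the host is not specified here.

```typescript
interface SummaryDoneData {
  action: "summary_done";
  task_id: string;
  summary_fallback_level?: number;     // 2 or 3; omitted when L1 succeeds directly
  summary_dropped_segments?: number[]; // present only when the level is 3
}

// Returns the suggested hint text, or null when no hint should be shown.
function fallbackHint(data: SummaryDoneData): string | null {
  if (data.summary_fallback_level === 2) {
    return "Your custom instructions contained terms the content filter could not process; the summary was generated using neutral mode";
  }
  if (data.summary_fallback_level === 3) {
    const n = (data.summary_dropped_segments ?? []).length;
    return "The transcript contained " + n + " segments that could not be processed; the summary was generated after omitting the relevant content";
  }
  return null;
}

// Path of the REST endpoint that returns the full summary content.
function summaryHistoryPath(taskId: string): string {
  return "/api/v1/sse/history/transcribe/" + encodeURIComponent(taskId);
}
```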
summary_error
Description
An event pushed when summary generation fails, so the client does not need to keep polling to find out.
Example
{
"type": "voice-translation",
"data": {
"action": "summary_error",
"task_id": "550e8400-e29b-41d4-a716-446655440000",
"error_code": "summary_failed",
"message": "Summary generation failed"
}
}
Field descriptions
| Field | Type | Description |
|---|---|---|
action | string | Always summary_error |
task_id | string | Recording UUID |
error_code | string | Summary error code (for example summary_failed, summary_timeout, summary_mode_field_mismatch, or llm_content_filtered) |
message | string | Human-readable error message (already sanitized; does not include the LLM raw error) |
Version: V1.5.7 Last Updated: 2026-05-20