WebSocket API

Events

Overview

This page describes the format of every response event the server may send over the WebSocket connection. For connection and authentication, see Connection and Authentication; for request operations, see Voice Translation Actions.

session_started - Session started successfully
result - Recognition/translation result
status - Generic status response
task_complete - Task processing complete
config_updated - Configuration update complete
tts_ready - TTS audio ready
tts_error - TTS synthesis failed
viewer_count - Viewer count update
broadcast_phase_changed - Broadcast phase changed
speaker_renamed - Speaker renamed
speaker_reassigned - Speaker identity reassigned
speakers_merged - Speakers merged
language_switch_start - Language switch started
batch_retranslation - Batch retranslation result
language_switch_done - Language switch complete
tts_mode_changed - TTS mode changed
language_switched - Conversation language switch complete
tts_updated - Conversation TTS settings updated
conversation_mode_changed - Conversation mode changed
speaker_language_changed - Speaker language changed
error - Error event
segment_uploaded - Audio segment upload complete
stt_event - STT connection status event
viewer_joined - Viewer joined event
viewer_left - Viewer left event
upload_error - Upload error
summary_done - Summary generation complete
summary_error - Summary generation failed
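A client usually routes each incoming message by its event name. The sketch below assumes that every event uses the same envelope as the `session_started` examples on this page: a top-level `"type": "voice-translation"` with the event name in `data.action`. That is an assumption for events whose payloads are not shown here. `EventRouter`, `on`, and `dispatch` are hypothetical names used only for illustration.

```python
import json
from typing import Any, Callable, Dict, Optional

# Hypothetical handler type: receives the event's "data" object.
Handler = Callable[[Dict[str, Any]], None]


class EventRouter:
    """Routes WebSocket text frames to handlers keyed by data.action.

    Assumption: all events share the {"type": "voice-translation",
    "data": {"action": ...}} envelope seen in the session_started examples.
    """

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def on(self, action: str, handler: Handler) -> None:
        """Register a handler for one event name, e.g. "session_started"."""
        self._handlers[action] = handler

    def dispatch(self, raw: str) -> Optional[str]:
        """Parse one frame; return the action name if a handler ran, else None."""
        message = json.loads(raw)
        if message.get("type") != "voice-translation":
            return None  # not a voice-translation message; ignore it
        data = message.get("data") or {}
        handler = self._handlers.get(data.get("action"))
        if handler is None:
            return None  # unhandled event (e.g. one added in a later version)
        handler(data)
        return data["action"]
```

Returning `None` for unknown actions rather than raising keeps the client tolerant of events it does not handle yet.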

session_started

Description

After a start action succeeds, the server sends this event with the complete initial session state. The client can use `recording_type` to determine which type of recording the session uses.

Standard recording (transcribe / conversation / record)

{
  "type": "voice-translation",
  "data": {
    "action": "session_started",
    "session_id": "550e8400-e29b-41d4-a716-446655440000",
    "task_id": "7c9e6679-7425-40de-944b-e07fc1f90ae7",
    "recording_id": "7c9e6679-7425-40de-944b-e07fc1f90ae7",
    "recording_type": "transcribe",
    "recognition_mode": "single",
    "message": "Speech recognition started"
  }
}

Broadcast mode (broadcast)

{
  "type": "voice-translation",
  "data": {
    "action": "session_started",
    "session_id": "550e8400-e29b-41d4-a716-446655440000",
    "task_id": "7c9e6679-7425-40de-944b-e07fc1f90ae7",
    "recording_id": "7c9e6679-7425-40de-944b-e07fc1f90ae7",
    "recording_type": "broadcast",
    "recognition_mode": "multi_speaker",
    "phase": "standby",
    "viewer_count": 0,
    "queue_count": 0,
    "peak_viewers": 0,
    "total_viewers": 0,
    "message": "Speech recognition started"
  }
}

Field	Type	Description
`session_id`	string	Session ID, scoped to the WebSocket connection; no longer valid once the connection closes
`task_id`	string	Task ID (the same identifier as REST `/api/v1/tasks/{taskId}` and Webhook `data.task_id`)
`recording_id`	string	Deprecated (since V1.4.1): same value as `task_id`, will be removed in V2.0.0; use `task_id` instead
`recording_type`	string	Recording type: `transcribe`, `conversation`, `record`, `broadcast`
`recognition_mode`	string	Recognition mode: `single`, `multi_speaker`
`message`	string	Status description message
`phase`	string	Broadcast phase: `standby` or `live` (broadcast mode only)
`viewer_count`	int	Current number of online viewers (broadcast mode only)
`queue_count`	int	Number of viewers waiting in the queue (broadcast mode only)
`peak_viewers`	int	Peak viewer count for this broadcast (broadcast mode only)
`total_viewers`	int	Cumulative total of viewers that have ever connected (broadcast mode only)
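The fields above map naturally onto a typed session object, with the broadcast-only fields grouped so that they exist only when `recording_type` is `broadcast`. This is a minimal sketch: `SessionInfo`, `BroadcastState`, and `parse_session_started` are hypothetical names, and falling back to the deprecated `recording_id` when `task_id` is missing is a defensive assumption, not documented behavior.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class BroadcastState:
    phase: str          # "standby" or "live"
    viewer_count: int   # viewers currently online
    queue_count: int    # viewers waiting in the queue
    peak_viewers: int   # peak viewer count for this broadcast
    total_viewers: int  # cumulative viewers that have connected


@dataclass
class SessionInfo:
    session_id: str        # valid only for this WebSocket connection
    task_id: str           # same ID as REST /api/v1/tasks/{taskId} and webhooks
    recording_type: str    # transcribe | conversation | record | broadcast
    recognition_mode: str  # single | multi_speaker
    broadcast: Optional[BroadcastState] = None  # set only in broadcast mode


def parse_session_started(data: Dict[str, Any]) -> SessionInfo:
    """Build a SessionInfo from the "data" object of a session_started event."""
    # Prefer task_id; the recording_id fallback is an assumption for old servers.
    task_id = data.get("task_id") or data["recording_id"]
    broadcast = None
    if data["recording_type"] == "broadcast":
        broadcast = BroadcastState(
            phase=data["phase"],
            viewer_count=data["viewer_count"],
            queue_count=data["queue_count"],
            peak_viewers=data["peak_viewers"],
            total_viewers=data["total_viewers"],
        )
    return SessionInfo(
        session_id=data["session_id"],
        task_id=task_id,
        recording_type=data["recording_type"],
        recognition_mode=data["recognition_mode"],
        broadcast=broadcast,
    )
```

Persist `task_id` rather than `session_id` if you need to refer to the recording after the connection closes.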

result

Recognition fields

Field	Type	Description
`sid`	int	Sentence number, starting from 1
`language`	string	Source language code. In conversation mode, this is the automatically detected language
`text`	string	The recognized text
`is_final`	boolean	Whether this is the final result
`speaker_id`	string	Original speaker ID (before any alias is applied)
`speaker_label`	string	(Multi-speaker mode) Display label (after applying the alias; equals `speaker_id` when no alias exists)
`detected_language`	string	The detected language. In conversation mode, determined automatically by the system
`start_time`	string	Sentence start time in `mm:ss` format. Not sent during the broadcast `standby` phase; once the broadcast goes `live`, timing starts from `00:00`
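The fields in the table above are enough to maintain a running transcript on the client. This sketch assumes that interim and final results for the same sentence share one `sid`, so each update replaces the stored text for that `sid`. It also assumes that a late interim result should not overwrite a final one. `Transcript` and `parse_start_time` are hypothetical helpers. The `fields` argument is the set of recognition fields listed above, however your client extracts them from the event.

```python
from typing import Any, Dict, List, Optional


def parse_start_time(value: Optional[str]) -> Optional[int]:
    """Convert an mm:ss start_time to seconds; None when the field is absent."""
    if value is None:
        return None  # e.g. broadcast standby phase, where start_time is not sent
    minutes, seconds = value.split(":")
    return int(minutes) * 60 + int(seconds)


class Transcript:
    """Keeps the latest recognized text per sentence number (sid)."""

    def __init__(self) -> None:
        self.sentences: Dict[int, Dict[str, Any]] = {}

    def update(self, fields: Dict[str, Any]) -> None:
        sid = fields["sid"]
        current = self.sentences.get(sid)
        # Assumption: once a sentence is final, later interim results are ignored.
        if current is not None and current["is_final"] and not fields["is_final"]:
            return
        self.sentences[sid] = {
            "text": fields["text"],
            "is_final": fields["is_final"],
            # speaker_label already reflects aliases; fall back to speaker_id.
            "speaker": fields.get("speaker_label") or fields.get("speaker_id"),
            "start": parse_start_time(fields.get("start_time")),
        }

    def lines(self) -> List[str]:
        """Sentence texts ordered by sid."""
        return [self.sentences[sid]["text"] for sid in sorted(self.sentences)]
```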

Translation fields

Field	Type	Description
`sid`	int	Sentence number
`text`	string	The translated text
`is_final`	boolean	Whether this is the final result
`is_retranslation`	boolean	Whether this is a retranslation result (only for retranslate)
`speaker_id`	string	(Multi-speaker mode) Original speaker ID (aligned with origin since v1.5.3)
`speaker_label`	string	(Multi-speaker mode) Display label (after applying the alias; equals `speaker_id` when no alias exists)
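Because translation results carry a `sid`, a client can join each translation to its recognized sentence. This sketch assumes that a translation's `sid` refers to the same sentence as the recognition result with that `sid`, and that a later result for a `sid` (including one with `is_retranslation` set) replaces the earlier text. `TranslationStore` and its methods are hypothetical.

```python
from typing import Any, Dict, List, Optional, Set, Tuple


class TranslationStore:
    """Latest translated text per sentence number (sid)."""

    def __init__(self) -> None:
        self.text: Dict[int, str] = {}
        self.retranslated: Set[int] = set()

    def update(self, fields: Dict[str, Any]) -> None:
        sid = fields["sid"]
        # A later result for the same sid (interim -> final, or a
        # retranslation) replaces the earlier text.
        self.text[sid] = fields["text"]
        if fields.get("is_retranslation"):
            self.retranslated.add(sid)

    def rows(self, originals: Dict[int, str]) -> List[Tuple[int, str, Optional[str]]]:
        """Pair recognized text with its translation; None if none arrived yet."""
        return [(sid, originals[sid], self.text.get(sid)) for sid in sorted(originals)]
```

Tracking `retranslated` separately lets a UI mark sentences whose translation changed after a retranslate.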

Field	Type	Description
`task_id`	string	Recording UUID; use it in subsequent API queries, such as REST `/api/v1/tasks/{taskId}`
`message`	string	Status description
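The `task_id` above is the identifier used by the REST endpoint `/api/v1/tasks/{taskId}`. This sketch builds that URL. The base URL is deployment-specific and shown here only as a hypothetical placeholder. `task_url` is also a hypothetical helper.

```python
from urllib.parse import quote


def task_url(base_url: str, task_id: str) -> str:
    """Return the REST URL for a task, given a deployment-specific base URL."""
    # Percent-encode the ID defensively; UUIDs pass through unchanged.
    return f"{base_url.rstrip('/')}/api/v1/tasks/{quote(task_id, safe='')}"
```

For example, `task_url("https://api.example.com/", "7c9e6679-7425-40de-944b-e07fc1f90ae7")` yields `https://api.example.com/api/v1/tasks/7c9e6679-7425-40de-944b-e07fc1f90ae7`.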