Voice Translation
Overview
A complete list of all actions available under the voice-translation type. For connection and authentication, see Connection and Authentication; for response event formats, see Response Events.
Table of Contents
- start - Start Voice Translation
- config - Configure Terminology / Correction Rules
- audio - Send Audio
- pause - Pause Translation
- resume - Resume Translation
- stop - Stop Translation
- retranslate - Retranslate a Single Sentence
- switch_language - Switch Language
- set_name - Set Recording Name
- rename_speaker - Globally Rename a Speaker
- reassign_speaker - Change the Speaker of a Single Sentence
- merge_speakers - Merge Speakers
- tts_play - Play TTS
- tts_stop - Stop TTS
- tts_mode - Switch TTS Mode
- set_tts - Two-Way Translation TTS Settings
- start_speaking - Start Speaking (Manual Mode)
- stop_speaking - Stop Speaking (Manual Mode)
- switch_conversation_mode - Switch Conversation Mode
- set_speaker_language - Set Speaker Language
- broadcast_go_live - Switch to the Live Phase
- broadcast_announcement - Send an Announcement
- set_standby_message - Set the Standby Phase Message
start - Start Voice Translation
Description
Start a new voice translation session and begin processing audio according to the configured parameters.
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
action | string | Yes | Fixed value start |
transcription_languages | string[] | Yes | Speech recognition languages (array, up to 2) |
translation_languages | string[] | No | Translation target languages (array; empty = no translation) |
realtime_translation | boolean | No | Real-time translation mode (default false) |
recognition_mode | string | No | Recognition mode: single (single speaker, default), multi_speaker (multiple speakers); under multi_speaker, transcription_languages must contain exactly 1 language, otherwise a diarization_multilang_conflict error is returned and the session is refused |
type | string | Yes | Recording type: transcribe, conversation, record, broadcast |
audio_format | string | No | Audio format: pcm (default), webm |
summary_template | string | Conditional | Summary template (required for transcribe, optional for conversation/broadcast) |
options | object | No | Speech recognition options |
tts_enabled | boolean | No | Whether to enable TTS speech synthesis (default false) |
tts_language | string | No | TTS output language (must be in translation_languages) |
tts_voice | string | No | TTS voice name (e.g. en-US-JennyNeural) |
tts_mode | string | No | TTS playback mode: sync (synchronous, default), async (asynchronous) |
broadcast_token | string | Conditional | Broadcast token (required for broadcast type, obtained from the REST API) |
active_language | string | No | Initial active language in two-way translation mode (default transcription_languages[0]) |
tts_config | object | No | Multi-language TTS settings (broadcast / two-way translation mode) |
broadcast_phase | string | No | Initial broadcast phase: standby, live (default) |
standby_message | string | No | Message viewers see during the standby phase (default: "Preparing, please wait...") |
name | string | No | Initial default recording name (max 60 characters; the system may still override it; if not provided, one is generated automatically, e.g. Transcription #1) |
summary_language | string | No | Summary output language (defaults to the recognition language when not specified; in broadcast mode it is read automatically from the channel settings) |
summary_mode | string | No | Summary mode enum: builtin (apply the built-in template, default) / custom (the customer prompt fully replaces the default). When omitted, builtin is inferred automatically |
summary_prompt | string | No | Required in custom mode; treated as supplementary instructions in builtin mode. ≤2000 characters |
summary_prompt_slug | string | No | Required in custom mode; must not be provided in builtin mode. The customer's own identifier (≤64 characters, Unicode, no control characters; passed through and stored in the backend record for historical lookup) |
summary_plain_text | boolean | No | Request plain-text summary output (default false; when enabled, the backend performs Markdown post-processing) |
speakers | object[] | Conditional | Speaker language settings for two-way translation mode (array of exactly 2 entries; required for conversation type, see below) |
conversation_mode | string | No | Two-way conversation mode: auto (automatic detection, default), manual (manual PTT) |
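Several rules in the table above span more than one field. The following is a minimal client-side pre-check, assuming you want to catch obvious mistakes before sending start. The error-code strings come from this document, but the function name is hypothetical. Using the conversation_* codes (documented under switch_language) for start is an assumption. The server remains authoritative.

```python
def check_start(data: dict) -> list[str]:
    """Predict documented error codes for a start payload (hypothetical client-side pre-check)."""
    errors: list[str] = []
    rtype = data.get("type")
    langs = data.get("transcription_languages") or []

    if rtype not in ("transcribe", "conversation", "record", "broadcast"):
        errors.append("invalid_recording_type")
    if rtype == "broadcast":
        # Languages come from the broadcast channel settings; only the token is mandatory here.
        if not data.get("broadcast_token"):
            errors.append("broadcast_token_required")
        return errors

    if not langs:
        errors.append("missing_transcription_languages")
    elif len(langs) > 2:
        errors.append("too_many_languages")

    if rtype == "conversation":
        # Assumption: start reports the same codes documented under switch_language.
        if len(langs) != 2:
            errors.append("conversation_requires_two_languages")
        elif langs[0] == langs[1]:
            errors.append("conversation_languages_identical")
    else:
        # multi_speaker diarization supports exactly one recognition language.
        if data.get("recognition_mode") == "multi_speaker" and len(langs) != 1:
            errors.append("diarization_multilang_conflict")
        # Outside two-way mode, an explicit tts_language must be one of the translation targets.
        targets = data.get("translation_languages") or []
        tts_lang = data.get("tts_language")
        if data.get("tts_enabled") and tts_lang is not None and tts_lang not in targets:
            errors.append("tts_invalid_language")
    return errors
```

For example, a record session that requests multi_speaker with two languages is flagged before any audio is sent, which matches the documented refusal.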
Request Example (Basic)
{
"type": "voice-translation",
"data": {
"action": "start",
"transcription_languages": ["zh-TW"],
"translation_languages": ["en-US"],
"realtime_translation": false,
"type": "transcribe",
"audio_format": "pcm",
"summary_template": "meeting",
"options": {
"speaking_speed": "normal",
"segmentation_mode": "auto",
"profanity_handling": "mask"
}
}
}
Request Example (Initial Default Name)
{
"type": "voice-translation",
"data": {
"action": "start",
"transcription_languages": ["zh-TW"],
"translation_languages": ["en-US"],
"type": "transcribe",
"audio_format": "pcm",
"summary_template": "meeting",
"name": "Product Planning Meeting"
}
}
Recording Name Rules
| Scenario | Name | name_source | Overridden by system? |
|---|---|---|---|
start with a name parameter | Initial default name | default | Yes |
start without a name | Auto-generated (e.g. Transcription #1, Broadcast #3) | default | Yes |
Set via set_name | Name explicitly set by the user | user | No |
| Auto-generated by the system after the session ends | Summary name generated from the transcript content | llm | — |
Note: The name in start is an initial default name; the system may still override it when the session ends. If you need a fixed name, use set_name.
Default name formats (fixed English):
| Recording Type | Default Name Format |
|---|---|
transcribe | Transcription #N |
conversation | Conversation #N |
record | Recording #N |
broadcast | Broadcast #N |
N is the sequential number of recordings of the same type for that user. Name priority: user > llm > default. Once the user sets a name, the system will not override it when the session ends.
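The default-name format and the user > llm > default priority can be mirrored on the client, for example to predict which name a finished recording will show. This is a sketch with hypothetical helper names. Letting a tie go to the newer value, so that a second set_name replaces the first, is an assumption that the rules above do not state.

```python
from typing import Tuple

# Fixed-English prefixes from the default-name table.
DEFAULT_NAME_PREFIX = {
    "transcribe": "Transcription",
    "conversation": "Conversation",
    "record": "Recording",
    "broadcast": "Broadcast",
}

# Higher rank wins: user > llm > default.
SOURCE_RANK = {"default": 0, "llm": 1, "user": 2}


def default_name(recording_type: str, n: int) -> str:
    """Format the default name, e.g. "Transcription #1"."""
    return f"{DEFAULT_NAME_PREFIX[recording_type]} #{n}"


def pick_name(current: Tuple[str, str], candidate: Tuple[str, str]) -> Tuple[str, str]:
    """Keep the (name, name_source) pair whose source ranks higher.

    Ties go to the candidate (assumption), so a later set_name replaces an earlier one.
    """
    return candidate if SOURCE_RANK[candidate[1]] >= SOURCE_RANK[current[1]] else current
```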
Request Example (with TTS)
{
"type": "voice-translation",
"data": {
"action": "start",
"transcription_languages": ["zh-TW"],
"translation_languages": ["en-US"],
"realtime_translation": true,
"type": "transcribe",
"tts_enabled": true,
"tts_language": "en-US",
"tts_voice": "en-US-JennyNeural",
"tts_mode": "sync"
}
}
Request Example (Two-Way Translation Mode - Automatic Detection)
{
"type": "voice-translation",
"data": {
"action": "start",
"type": "conversation",
"transcription_languages": ["zh-TW", "en-US"],
"active_language": "zh-TW",
"audio_format": "pcm",
"realtime_translation": true,
"speakers": [
{ "id": 1, "language": "zh-TW" },
{ "id": 2, "language": "en-US" }
],
"tts_config": {
"zh-TW": { "voice": "zh-TW-HsiaoChenNeural", "speaking_rate": 1.0 },
"en-US": { "voice": "en-US-JennyNeural", "speaking_rate": 1.0 }
}
}
}
Request Example (Two-Way Translation Mode - Manual Mode)
{
"type": "voice-translation",
"data": {
"action": "start",
"type": "conversation",
"transcription_languages": ["zh-TW", "en-US"],
"conversation_mode": "manual",
"audio_format": "pcm",
"realtime_translation": true,
"speakers": [
{ "id": 1, "language": "zh-TW" },
{ "id": 2, "language": "en-US" }
],
"tts_config": {
"zh-TW": { "voice": "zh-TW-HsiaoChenNeural", "speaking_rate": 1.0 },
"en-US": { "voice": "en-US-JennyNeural", "speaking_rate": 1.0 }
}
}
}
Special rules for two-way translation mode:
| Item | Description |
|---|---|
transcription_languages | Must contain exactly 2 languages, and they must differ |
translation_languages | Not required (automatically derived as the non-active language) |
active_language | Optional, defaults to transcription_languages[0] |
recognition_mode | Forced to single (speaker_diarization is ignored) |
tts_enabled | Defaults to true; set to false to return text translation only |
tts_config | Optional; configures the TTS voice for each of the two languages; leave empty to use the default voices automatically |
summary_template | Optional; when provided, a summary is generated automatically after stopping |
speakers | Required in two-way translation mode; specifies each user's language (exactly 2 entries) |
conversation_mode | Optional, auto (automatic detection, default) or manual (manual PTT) |
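In two-way mode, the translation target is not sent. It is whichever of the two transcription languages is not active. A small sketch of that derivation follows. The function name is hypothetical, and the same toggle is what switch_language without parameters is described as doing.

```python
from typing import List, Optional, Tuple


def conversation_targets(langs: List[str], active: Optional[str] = None) -> Tuple[str, str]:
    """Return (source, target) for a two-way session.

    source defaults to langs[0] (the documented active_language default);
    target is automatically the other language.
    """
    if len(langs) != 2 or langs[0] == langs[1]:
        raise ValueError("two-way mode needs exactly two different languages")
    source = active if active is not None else langs[0]
    if source not in langs:
        raise ValueError("active_language must be one of transcription_languages")
    target = langs[1] if source == langs[0] else langs[0]
    return source, target
```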
speakers field description:
| Field | Type | Required | Description |
|---|---|---|---|
id | int | Yes | User number (1 or 2) |
language | string | Yes | That user's language code (must be in transcription_languages) |
conversation_mode description:
| Mode | Description |
|---|---|
auto (default) | The system automatically detects the spoken language and segments sentences automatically |
manual | The user controls the speaking interval via start_speaking / stop_speaking; audio during that interval is merged into a single sentence |
Successful Response
After a successful start, a session_started event is returned, containing the complete initial session information.
General recording (transcribe / conversation / record):
{
"type": "voice-translation",
"data": {
"action": "session_started",
"session_id": "550e8400-e29b-41d4-a716-446655440000",
"task_id": "7c9e6679-7425-40de-944b-e07fc1f90ae7",
"recording_id": "7c9e6679-7425-40de-944b-e07fc1f90ae7",
"recording_type": "transcribe",
"recognition_mode": "single",
"message": "Speech recognition started"
}
}
Broadcast mode (broadcast):
{
"type": "voice-translation",
"data": {
"action": "session_started",
"session_id": "550e8400-e29b-41d4-a716-446655440000",
"task_id": "7c9e6679-7425-40de-944b-e07fc1f90ae7",
"recording_id": "7c9e6679-7425-40de-944b-e07fc1f90ae7",
"recording_type": "broadcast",
"recognition_mode": "multi_speaker",
"phase": "standby",
"viewer_count": 0,
"queue_count": 0,
"peak_viewers": 0,
"total_viewers": 0,
"message": "Speech recognition started"
}
}
For response field descriptions, see the session_started event.
Recording Type Descriptions
| type | Description | Use Case |
|---|---|---|
transcribe | Speech-to-text | Meeting minutes, interview records |
conversation | Conversation log | Two-way communication, customer service dialogues |
record | Plain recording | Voice memos, quick notes |
broadcast | Broadcast / live stream | Lectures, speeches, live content |
Broadcast Mode Description (type: "broadcast")
In broadcast mode, the language settings are obtained automatically from the broadcast channel settings and do not need to be sent in the WebSocket message.
Required parameters:
| Parameter | Type | Description |
|---|---|---|
type | string | Must be "broadcast" |
broadcast_token | string | Broadcast token (obtained after creating a broadcast via the REST API) |
audio_format | string | Audio format (pcm or webm) |
Optional parameters (override broadcast channel settings):
| Parameter | Type | Description |
|---|---|---|
tts_config | object | Multi-language TTS settings (override the settings used at creation) |
summary_template | string | Summary template slug (overrides the settings used at creation; if not provided, the broadcast channel default is used) |
Automatically configured parameters (can be omitted):
- transcription_languages: read automatically from the broadcast settings
- translation_languages: read automatically from the broadcast settings
- realtime_translation: enabled by default in broadcast mode
- summary_template: read automatically from the broadcast settings (the value passed via WebSocket takes precedence)
- summary_language: read automatically from the broadcast settings (the value passed via WebSocket takes precedence)
Broadcast phase description:
| broadcast_phase | Description | Behavior |
|---|---|---|
live (default) | Live phase | STT/translation results are broadcast to viewers and written to the transcript |
standby | Standby phase | STT/translation results go only to the host; viewers see the standby_message |
Purpose of the standby phase: Lets the host run STT/translation warm-up tests before going live, confirming the equipment works before switching to the live phase.
Broadcast mode request example:
{
"type": "voice-translation",
"data": {
"action": "start",
"type": "broadcast",
"broadcast_token": "a3f9",
"audio_format": "pcm"
}
}
Broadcast mode request example (standby phase + override summary template):
{
"type": "voice-translation",
"data": {
"action": "start",
"type": "broadcast",
"broadcast_token": "a3f9",
"audio_format": "pcm",
"broadcast_phase": "standby",
"standby_message": "The talk is about to begin, please wait...",
"summary_template": "lecture"
}
}
Summary template priority: the value passed in the WebSocket start message > the default set when creating the broadcast channel. If neither is set, no summary is generated automatically.
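For broadcasts, summary_template and summary_language follow the same precedence rule. A sketch of that resolution, with a hypothetical helper name, is shown below. Treating only None as "unset", so that an empty string counts as an explicit value, is an assumption.

```python
from typing import Optional


def resolve_broadcast_setting(ws_value: Optional[str], channel_value: Optional[str]) -> Optional[str]:
    """The value sent in the WebSocket start message wins; otherwise use the channel setting.

    For summary_template, a result of None means no summary is generated.
    """
    return ws_value if ws_value is not None else channel_value
```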
Broadcast mode TTS settings (tts_config):
Use the tts_config parameter to specify which translation languages should produce TTS audio for viewers.
| tts_config field | Type | Description |
|---|---|---|
| voice | string | TTS voice name |
| speaking_rate | number | Speaking rate (0.5–2.0, default 1.0) |
{
"type": "voice-translation",
"data": {
"action": "start",
"type": "broadcast",
"broadcast_token": "a3f9",
"audio_format": "pcm",
"tts_config": {
"en-US": {
"voice": "en-US-JennyNeural",
"speaking_rate": 1.0
},
"ja-JP": {
"voice": "ja-JP-NanamiNeural",
"speaking_rate": 1.0
}
}
}
}
Note:
- The TTS language must be a valid language in translation_languages; invalid languages are ignored automatically
- The host (WebSocket) does not receive TTS audio; only SSE viewers receive the tts_ready event
- TTS is sent only during the live phase; it is not sent during the standby phase
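To see which tts_config entries will take effect, apply the rules above to your configuration. The following is a sketch, assuming you know the channel's translation_languages. Entries in other languages are dropped silently, as described, and an omitted speaking_rate is filled with the documented default of 1.0. Raising an error for an out-of-range rate is a client-side choice, not documented server behavior, and the function name is hypothetical.

```python
from typing import Dict, List


def effective_tts_config(tts_config: Dict[str, dict], translation_languages: List[str]) -> Dict[str, dict]:
    """Return the tts_config entries expected to produce audio for viewers."""
    result: Dict[str, dict] = {}
    for lang, cfg in tts_config.items():
        if lang not in translation_languages:
            continue  # invalid TTS languages are ignored, not rejected
        rate = cfg.get("speaking_rate", 1.0)  # documented default
        if not 0.5 <= rate <= 2.0:
            # Client-side assumption: fail early instead of guessing server behavior.
            raise ValueError(f"speaking_rate for {lang} must be within 0.5-2.0")
        result[lang] = {"voice": cfg["voice"], "speaking_rate": rate}
    return result
```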
TTS Playback Mode Description
| Mode | Description | Behavior |
|---|---|---|
sync | Synchronous mode (default) | Automatically plays the most recent is_final=true translated sentence; if the previous sentence is still playing, it enters the queue and waits |
async | Asynchronous mode (manual control) | The user can select any translated sentence for TTS, controlled with the tts_play command |
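Sync mode behaves like a first-in, first-out queue of final sentences. The sketch below models only that ordering rule on the client and tracks sentence IDs (sid) rather than audio. The class and method names are hypothetical. How your client learns that playback has finished is an assumption left to the caller.

```python
from collections import deque
from typing import Deque, Optional


class SyncTtsQueue:
    """Model of sync mode: speak final sentences in arrival order, one at a time."""

    def __init__(self) -> None:
        self.pending: Deque[int] = deque()
        self.playing: Optional[int] = None

    def on_translation(self, sid: int, is_final: bool) -> None:
        if not is_final:
            return  # only is_final=true sentences are spoken
        if self.playing is None:
            self.playing = sid
        else:
            self.pending.append(sid)  # previous sentence still playing: wait in the queue

    def on_playback_finished(self) -> None:
        # Start the next queued sentence, or go idle.
        self.playing = self.pending.popleft() if self.pending else None
```

In async mode, you would skip this queue and send tts_play for whichever sentence the user selects.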
Error Codes
| Error Code | HTTP Status | Description | Recommended Action |
|---|---|---|---|
missing_transcription_languages | 400 | No language parameter provided | Make sure the request includes transcription_languages |
invalid_transcription_language | 400 | Invalid language code | Make sure the language code format is correct (e.g. zh-TW) |
too_many_languages | 400 | Number of languages exceeds the limit | You can specify at most 2 languages |
invalid_recording_type | 400 | Invalid recording type | Use a valid type value |
invalid_summary_template | 400 | Invalid summary template | Make sure the template identifier is correct |
stt_init_failed | 503 | Service initialization failed | Retry later |
auth_budget_exceeded | 402 | Monthly budget exceeded | Wait for the next month's budget reset or adjust the budget |
tts_init_failed | 503 | TTS service initialization failed | Retry later |
tts_invalid_language | 400 | TTS language is not in the translation languages | Make sure tts_language is in translation_languages |
broadcast_token_required | 400 | Broadcast mode requires a token | A broadcast type must provide broadcast_token |
broadcast_token_invalid | 400 | Invalid broadcast token | Make sure the token is correct and has not expired |
broadcast_not_ready | 503 | Broadcast service not yet started | Retry later |
summary_invalid_mode | 400 | summary_mode is not builtin / custom | Change to a valid mode |
summary_mode_field_mismatch | 400 | The mode and field combination do not match (a required field is missing / a forbidden field was provided) | Adjust the fields according to the mode rules |
summary_prompt_too_long | 400 | summary_prompt exceeds 2000 characters | Shorten the custom prompt |
summary_prompt_slug_too_long | 400 | summary_prompt_slug exceeds 64 characters | Shorten the identifier |
summary_prompt_slug_invalid | 400 | summary_prompt_slug contains control characters (\n / \r / \t / \0, etc.) | Remove the control characters |
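The summary_mode rules in the start parameter table map directly onto the last five codes above. A minimal pre-check follows, with a hypothetical function name. It counts characters as Python code points and treats every Unicode "Cc" (control) character as forbidden in the slug. Both choices are assumptions about how the server measures "characters" and "control characters".

```python
import unicodedata


def check_summary_fields(data: dict) -> list[str]:
    """Predict summary_* error codes for a start payload."""
    mode = data.get("summary_mode", "builtin")  # omitted -> builtin is inferred
    prompt = data.get("summary_prompt")
    slug = data.get("summary_prompt_slug")
    if mode not in ("builtin", "custom"):
        return ["summary_invalid_mode"]

    errors: list[str] = []
    if mode == "custom" and (not prompt or not slug):
        errors.append("summary_mode_field_mismatch")  # custom needs both prompt and slug
    if mode == "builtin" and slug is not None:
        errors.append("summary_mode_field_mismatch")  # slug is forbidden in builtin mode
    if prompt is not None and len(prompt) > 2000:
        errors.append("summary_prompt_too_long")
    if slug is not None:
        if len(slug) > 64:
            errors.append("summary_prompt_slug_too_long")
        if any(unicodedata.category(ch) == "Cc" for ch in slug):
            errors.append("summary_prompt_slug_invalid")  # \n, \r, \t, \0, ...
    return errors
```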
config - Configure Terminology / Correction Rules
Description
Send terminology, fuzzy-word correction rules, and translation dictionary settings before or during a recording. These settings can improve STT accuracy, fix homophone errors, and ensure translation consistency.
Automatically generated correction rules: When terminology is provided, the system automatically generates fuzzy-word correction rules for each term (homophones, near-homophones, Traditional/Simplified variants). The frontend does not need to define fuzzy_correction manually, which greatly simplifies the configuration process.
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
action | string | Yes | Fixed value config |
terminology | object | No | Terminology settings |
fuzzy_correction | object | No | Fuzzy-word correction rules |
translation_dict | object | No | Translation dictionary |
Note: At least one setting item must be provided.
Terminology Format (terminology)
Keyed by language code, with an array of terms as the value:
{
"zh-TW": [
{ "term": "語者分離", "boost": 1.5 },
{ "term": "WebSocket", "boost": 2.0 }
],
"en-US": [
{ "term": "diarization", "boost": 1.5 }
]
}
| Field | Type | Required | Description |
|---|---|---|---|
term | string | Yes | The term (max 100 characters) |
boost | number | No | Weight (default 1.0, range 0.5–5.0) |
Limit: Up to 500 terms per language.
Fuzzy-Word Correction Format (fuzzy_correction)
Note: This field usually does not need to be set manually. The system generates correction rules automatically based on terminology. Use it only when you need to define special custom rules.
Keyed by language code, with an array of correction rules as the value:
{
"zh-TW": [
{ "correct": "語者分離", "incorrect": ["語這分離", "語者分力"] }
]
}
| Field | Type | Required | Description |
|---|---|---|---|
correct | string | Yes | The correct term |
incorrect | string[] | Yes | List of incorrect variants |
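Each rule states that any of its incorrect variants should be read as the correct term. The sketch below illustrates those semantics with plain substring replacement, for example to preview a rule set locally. It is not the server's matching algorithm, which this document does not describe. The function name is hypothetical. Replacing longer variants first keeps a short variant from rewriting part of a longer one.

```python
from typing import Dict, List


def apply_fuzzy_corrections(text: str, rules: List[Dict]) -> str:
    """Replace each incorrect variant with its correct term (plain substring replacement)."""
    pairs = [(bad, rule["correct"]) for rule in rules for bad in rule["incorrect"]]
    # Longest variants first, so overlapping shorter variants cannot clobber them.
    for bad, good in sorted(pairs, key=lambda p: len(p[0]), reverse=True):
        text = text.replace(bad, good)
    return text
```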
Translation Dictionary Format (translation_dict)
Uses an array of entries directly:
[
{
"source": "語者分離",
"translations": {
"en-US": "Speaker Diarization",
"ja-JP": "話者分離"
}
}
]
| Field | Type | Required | Description |
|---|---|---|---|
source | string | Yes | The source term (in the STT language) |
translations | object | Yes | Translation mapping { "language_code": "translation" } |
Limit: At most 50 entries. Requests above this return config_too_many_dict_entries, and larger dictionaries also degrade processing performance.
Request Example (Recommended: terminology only)
{
"type": "voice-translation",
"data": {
"action": "config",
"terminology": {
"zh-TW": [
{ "term": "語者分離", "boost": 1.5 },
{ "term": "CVD製程", "boost": 1.5 },
{ "term": "wafer良率", "boost": 1.5 }
]
}
}
}
Request Example (Full settings, with manual correction rules)
{
"type": "voice-translation",
"data": {
"action": "config",
"terminology": {
"zh-TW": [
{ "term": "語者分離", "boost": 1.5 },
{ "term": "即時轉錄", "boost": 1.5 }
]
},
"fuzzy_correction": {
"zh-TW": [
{ "correct": "語者分離", "incorrect": ["語這分離", "語者分力"] }
]
},
"translation_dict": [
{ "source": "語者分離", "translations": { "en-US": "Speaker Diarization" } }
]
}
}
Successful Response
{
"type": "voice-translation",
"data": {
"action": "config_updated",
"updated": ["terminology", "fuzzy_correction", "translation_dict"],
"message": "Settings updated"
}
}
For response field descriptions, see the config_updated event.
Error Codes
| Error Code | HTTP Status | Description | Recommended Action |
|---|---|---|---|
config_empty | 400 | No settings provided | Provide at least one setting item |
config_term_too_long | 400 | Term exceeds 100 characters | Shorten the term |
config_too_many_entries | 400 | Number of terms for a language exceeds 500 | Reduce the number of terms |
config_too_many_dict_entries | 400 | Translation dictionary exceeds 50 entries | Reduce the number of dictionary entries |
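The config limits can be checked before sending. The following is a sketch with a hypothetical function name, assuming the 500-term limit applies per language, as stated under the terminology format. The boost range (0.5 to 5.0) is not checked because no error code is documented for it.

```python
def check_config(data: dict) -> list[str]:
    """Predict config_* error codes for a config payload."""
    terminology = data.get("terminology") or {}
    fuzzy = data.get("fuzzy_correction") or {}
    dictionary = data.get("translation_dict") or []
    if not (terminology or fuzzy or dictionary):
        return ["config_empty"]  # at least one setting item is required

    errors: list[str] = []
    for terms in terminology.values():
        if len(terms) > 500:
            errors.append("config_too_many_entries")
        if any(len(t["term"]) > 100 for t in terms):
            errors.append("config_term_too_long")
    if len(dictionary) > 50:
        errors.append("config_too_many_dict_entries")
    return list(dict.fromkeys(errors))  # de-duplicate, keeping first-seen order
```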
audio - Send Audio
Description
Send audio data to the server for speech recognition. The audio must be Base64-encoded before sending.
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
action | string | Yes | Fixed value audio |
payload | string | Yes | Base64-encoded audio data |
Audio Format Requirements
PCM format (default):
| Item | Specification |
|---|---|
| Format | PCM (raw audio) |
| Sample rate | 16000 Hz |
| Bit depth | 16-bit |
| Channels | Mono |
| Byte order | Little-endian |
| Transfer encoding | Base64 |
WebM/Opus format:
| Item | Specification |
|---|---|
| Format | WebM container + Opus codec |
| Sample rate | Any (the server converts automatically) |
| Channels | Mono or Stereo (the server converts automatically) |
| Transfer encoding | Base64 |
Request Example
{
"type": "voice-translation",
"data": {
"action": "audio",
"payload": "Base64-encoded PCM audio data"
}
}
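With the default PCM format, 16000 Hz × 2 bytes × 1 channel = 32000 bytes per second, so 100 ms of audio is 3200 bytes. The following sketch splits raw PCM into audio messages of that size. The 100 ms chunk length is an assumption, since this document does not specify a chunk duration. The function name is hypothetical, and the WebSocket connection itself is not shown. Because the chunk size is even, no 16-bit sample is split across messages.

```python
import base64
import json

SAMPLE_RATE = 16_000   # Hz, required for pcm
BYTES_PER_SAMPLE = 2   # 16-bit little-endian, mono


def pcm_audio_messages(pcm: bytes, chunk_ms: int = 100):
    """Yield JSON audio messages, each carrying chunk_ms of PCM encoded as Base64."""
    chunk_bytes = SAMPLE_RATE * BYTES_PER_SAMPLE * chunk_ms // 1000  # 3200 bytes per 100 ms
    for offset in range(0, len(pcm), chunk_bytes):
        chunk = pcm[offset:offset + chunk_bytes]
        yield json.dumps({
            "type": "voice-translation",
            "data": {"action": "audio", "payload": base64.b64encode(chunk).decode("ascii")},
        })
```

Each yielded string can be sent as one text frame after session_started has been received.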
Error Codes
| Error Code | HTTP Status | Description | Recommended Action |
|---|---|---|---|
session_not_started | 400 | Speech recognition has not started | Call the start action first |
audio_invalid_format | 400 | Invalid audio data format | Make sure the Base64 encoding is correct |
audio_format_unsupported | 400 | Unsupported audio format | Use the pcm or webm format |
audio_decode_failed | 400 | Audio decoding failed | Make sure the audio format is correct |
audio_process_failed | 500 | STT/diarization writes keep failing, exceeding the tolerance threshold | We recommend reconnecting |
pause - Pause Translation
Description
Pause speech recognition processing. Audio received while paused is cached, and processing continues once you send resume.
Request Example
{
"type": "voice-translation",
"data": {
"action": "pause"
}
}
Successful Response
{
"type": "voice-translation",
"data": {
"action": "status",
"message": "Speech recognition paused"
}
}
Error Codes
| Error Code | HTTP Status | Description | Recommended Action |
|---|---|---|---|
session_not_started | 400 | Speech recognition has not started | Call start first |
session_already_paused | 400 | Already paused | You can ignore this error |
resume - Resume Translation
Description
Resume paused speech recognition processing.
Request Example
{
"type": "voice-translation",
"data": {
"action": "resume"
}
}
Successful Response
{
"type": "voice-translation",
"data": {
"action": "status",
"message": "Speech recognition resumed"
}
}
Error Codes
| Error Code | HTTP Status | Description | Recommended Action |
|---|---|---|---|
session_not_started | 400 | Speech recognition has not started | Call start first |
session_not_paused | 400 | Not paused | You can ignore this error |
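The start, pause, resume, and stop actions form a small state machine. Tracking that state locally avoids sending commands the server would reject. A sketch follows, with hypothetical names. Returning session_not_started for a stop sent before start is an assumption, since no error codes are documented for stop.

```python
from typing import Optional


class SessionState:
    """Tracks start/pause/resume/stop locally and predicts the documented errors."""

    def __init__(self) -> None:
        self.started = False
        self.paused = False

    def apply(self, action: str) -> Optional[str]:
        """Return the error code the server would likely send, or None if the action is valid."""
        if action == "start":
            self.started, self.paused = True, False
            return None
        if not self.started:
            return "session_not_started"
        if action == "pause":
            if self.paused:
                return "session_already_paused"
            self.paused = True
        elif action == "resume":
            if not self.paused:
                return "session_not_paused"
            self.paused = False
        elif action == "stop":
            self.started = self.paused = False
        return None
```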
stop - Stop Translation
Description
Stop speech recognition and end the session. The system automatically uploads the audio file and transcript and generates a summary (if configured).
Request Example
{
"type": "voice-translation",
"data": {
"action": "stop"
}
}
Successful Response
{
"type": "voice-translation",
"data": {
"action": "status",
"message": "Speech recognition stopped"
}
}
After stopping, once the audio file and transcript have finished uploading, you receive a task_complete event containing task_id (the Recording UUID).
retranslate - Retranslate a Single Sentence
Description
Retranslate a specified sentence. This is useful when the source text has been corrected and the translation needs to be updated.
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
action | string | Yes | Fixed value retranslate |
sid | int | Yes | The number of the sentence to retranslate |
translation_languages | string[] | Yes | Array of translation language codes |
text | string | Yes | The source text to translate (the user's corrected text) |
Request Example
{
"type": "voice-translation",
"data": {
"action": "retranslate",
"sid": 1,
"translation_languages": ["en-US"],
"text": "The user's corrected source text"
}
}
Successful Response
A translation event is returned (sharing the same schema as normal translation results), and the translation result includes is_retranslation: true:
{
"type": "voice-translation",
"data": {
"action": "translation",
"translations": {
"en-US": {
"sid": 1,
"text": "The new translation result",
"is_final": true,
"is_retranslation": true
}
}
}
}
v1.5.6 documentation correction: Earlier documentation described retranslate as returning action: "result", but on the wire it actually returns action: "translation". If your client's dispatcher originally handled the result action, add a handler branch for the translation action.
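Because retranslation results share the schema of normal translation results, one handler can serve both. It can also serve batch_retranslation, which switch_language emits with the same translations shape. A minimal dispatcher sketch follows. The function name and the sid → {language: text} transcript structure are hypothetical.

```python
import json


def handle_message(raw: str, transcript: dict) -> None:
    """Route incoming messages; retranslate results arrive as action "translation"."""
    data = json.loads(raw)["data"]
    action = data.get("action")
    if action in ("translation", "batch_retranslation"):
        for lang, item in data["translations"].items():
            # Same schema for live and retranslated results; is_retranslation marks the latter.
            # Later results for the same sid and language overwrite earlier ones.
            transcript.setdefault(item["sid"], {})[lang] = item["text"]
```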
Error Codes
| Error Code | HTTP Status | Description | Recommended Action |
|---|---|---|---|
retranslate_sid_not_found | 400 | The specified SID was not found | Make sure the SID exists |
retranslate_session_not_active | 400 | The session is not started or has ended | Check the session status |
retranslate_no_target_lang | 400 | No target language provided | Provide translation_languages |
retranslate_no_text | 400 | No text to translate provided | Provide the text parameter |
retranslate_llm_failed | 500 | Translation service failed | Retry later |
switch_language - Switch Language
Description
Switch the language during real-time translation. The behavior depends on the recording type:
- General mode (transcribe, etc.): switches the translation target language and automatically batch-retranslates all already-translated sentences
- Two-way translation mode (conversation): switches the STT source language (the spoken language); the translation target switches automatically to the other language
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
action | string | Yes | Fixed value switch_language |
translation_languages | string[] | Conditional | Array of translation language codes (required in general mode) |
transcription_languages | string[] | Conditional | Array containing the target language to switch to (two-way translation mode; if omitted, automatically toggles to the other language) |
Request Example (General Mode)
{
"type": "voice-translation",
"data": {
"action": "switch_language",
"translation_languages": ["ja-JP"]
}
}
Request Example (Two-Way Translation Mode)
Specify the switch target:
{
"type": "voice-translation",
"data": {
"action": "switch_language",
"transcription_languages": ["en-US"]
}
}
Automatic toggle (no parameters):
{
"type": "voice-translation",
"data": {
"action": "switch_language"
}
}
Special behavior in two-way translation mode:
- Two-way translation mode uses automatic language detection, so you usually don't need to switch the language manually
- switch_language only updates the internal preference state
- After a successful switch, a language_switched event is returned (not the language_switch_start/done sequence)
- Switching to the same language returns a conversation_same_language warning
Response Sequence (General Mode)
After switching the language, you receive the following events in order:
- language_switch_start: notifies that the switch has started
{
"type": "voice-translation",
"data": {
"action": "language_switch_start",
"translation_language": "ja-JP",
"total_segments": 15,
"message": "Started switching language and retranslating"
}
}
- batch_retranslation (multiple): returns retranslation results sentence by sentence
{
"type": "voice-translation",
"data": {
"action": "batch_retranslation",
"sid": 3,
"translations": {
"ja-JP": {
"sid": 3,
"text": "今日はプロジェクトの進捗について話し合いましょう",
"is_final": true,
"is_retranslation": true
}
}
}
}
- language_switch_done: notifies that the switch is complete
{
"type": "voice-translation",
"data": {
"action": "language_switch_done",
"translation_language": "ja-JP",
"success_count": 15,
"failed_count": 0,
"message": "Language switch complete"
}
}
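In general mode, a client can follow this three-event sequence to show progress and avoid sending another switch_language while one is still running, which would return switch_language_in_progress. The following sketch uses a hypothetical class and field names. It computes progress from total_segments and the number of batch_retranslation events received.

```python
class LanguageSwitchTracker:
    """Follows language_switch_start -> batch_retranslation* -> language_switch_done."""

    def __init__(self) -> None:
        self.in_progress = False
        self.total = 0
        self.received = 0
        self.translations: dict = {}  # (sid, language) -> retranslated text

    def on_event(self, data: dict) -> None:
        action = data.get("action")
        if action == "language_switch_start":
            self.in_progress, self.total, self.received = True, data["total_segments"], 0
        elif action == "batch_retranslation" and self.in_progress:
            for lang, item in data["translations"].items():
                self.translations[(data["sid"], lang)] = item["text"]
            self.received += 1
        elif action == "language_switch_done":
            self.in_progress = False

    def progress(self) -> float:
        return self.received / self.total if self.total else 1.0
```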
Error Codes
| Error Code | HTTP Status | Description | Recommended Action |
|---|---|---|---|
switch_language_no_target | 400 | No target language provided | Provide translation_languages |
switch_language_in_progress | 400 | The previous switch is not yet complete | Wait for the switch to complete |
switch_language_same_target | 400 | The target language is the same as the current one | You can ignore this error |
conversation_requires_two_languages | 400 | Two-way translation mode requires exactly two languages | Make sure transcription_languages has 2 entries |
conversation_languages_identical | 400 | The two languages in two-way translation cannot be the same | Provide two different languages |
conversation_invalid_language | 400 | Invalid two-way translation language | Make sure the language is in transcription_languages |
conversation_same_language | 400 | Already the current language | You can ignore this warning |
set_name - Set Recording Name
Description
Set the name during a recording. After it is set, name_source flips to user, and the system will not override it when the recording ends (even if the LLM generates a summary name, it yields to the user-set name). For the full semantics and priority of name_source, see § Recording Name Rules above.
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
action | string | Yes | Fixed value set_name |
name | string | Yes | Recording name (max 60 characters) |
Request Example
{
"type": "voice-translation",
"data": {
"action": "set_name",
"name": "Product Planning Meeting"
}
}
Successful Response
{
"type": "voice-translation",
"data": {
"action": "status",
"message": "Recording name set"
}
}
Error Codes
| Error Code | HTTP Status | Description | Recommended Action |
|---|---|---|---|
set_name_empty | 400 | Recording name is empty | Provide a non-empty name |
set_name_too_long | 400 | Recording name exceeds the limit | Shorten the name (≤60 characters) |
set_name_not_ready | 400 | The recording is not yet ready | Call after session_started |
session_not_started | 400 | Speech recognition has not started | Call start first |
rename_speaker - Globally Rename a Speaker
Description
In multi-speaker diarization mode (multi_speaker), globally rename a speaker. All sentences that use that speaker ID are updated in sync.
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
action | string | Yes | Fixed value rename_speaker |
speaker_id | string | Yes | The original speaker ID (e.g. Guest-1); also accepts the current display label for consecutive renames; max 100 characters |
new_label | string | Yes | The new display label; max 100 characters, must not contain control characters (\x00-\x1F, \x7F) or line breaks |
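The new_label constraints can be checked before sending. The following is a sketch with a hypothetical function name. Treating a whitespace-only label as empty is an assumption, and the check returns a boolean rather than an error code because only speaker_name_empty is documented for these failures.

```python
import re

# \x00-\x1F and \x7F, which also covers the line breaks \n and \r.
_CONTROL = re.compile(r"[\x00-\x1f\x7f]")


def is_valid_speaker_label(label: str) -> bool:
    """Non-empty, at most 100 characters, and free of control characters."""
    return bool(label.strip()) and len(label) <= 100 and not _CONTROL.search(label)
```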
Request Example
{
"type": "voice-translation",
"data": {
"action": "rename_speaker",
"speaker_id": "Guest-1",
"new_label": "Manager Wang"
}
}
Successful Response
Returns the speaker_renamed event:
{
"type": "voice-translation",
"data": {
"action": "speaker_renamed",
"speaker_id": "Guest-1",
"new_label": "Manager Wang",
"affected_sids": [1, 3, 5, 8]
}
}
Error Codes
| Error Code | HTTP Status | Description | Recommended Action |
|---|---|---|---|
speaker_not_found | 400 | The specified speaker was not found | Make sure the speaker ID or alias exists |
speaker_name_empty | 400 | The speaker name cannot be empty | Provide a valid name |
speaker_name_duplicate | 422 | The speaker name is already in use | Use another name, or first rename the conflicting speaker |
session_not_started | 400 | Speech recognition has not started | Call start first |
reassign_speaker - Change the Speaker of a Single Sentence
Description
Change the speaker identity (OriginalSpeakerID) of a specific sentence, assigning the sentence to an existing speaker.
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
action | string | Yes | Fixed value reassign_speaker |
sid | int | Yes | The number of the sentence to change |
target_speaker_id | string | Yes | The target speaker's original ID (taken from init_sentence.speaker_id; reassign does not accept display labels) |
Request Example
{
"type": "voice-translation",
"data": {
"action": "reassign_speaker",
"sid": 5,
"target_speaker_id": "Guest-2"
}
}
Successful Response
Returns the speaker_reassigned event:
{
"type": "voice-translation",
"data": {
"action": "speaker_reassigned",
"sid": 5,
"old_speaker_id": "Guest-1",
"new_speaker_id": "Guest-2",
"new_speaker_label": "Lisa Lee"
}
}
Error Codes
| Error Code | HTTP Status | Description | Recommended Action |
|---|---|---|---|
speaker_sid_not_found | 400 | The specified sentence was not found | Make sure the SID exists |
speaker_not_found | 400 | The target speaker does not exist | Use an existing speaker ID |
speaker_name_empty | 400 | The target speaker ID cannot be empty | Provide a valid speaker ID |
session_not_started | 400 | Speech recognition has not started | Call start first |
invalid_parameter | 400 | Creating a new speaker is not supported | Use an existing speaker ID |
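Because reassign_speaker accepts only original IDs, a client typically needs to remember the speaker_id values it has seen in init_sentence events. The sketch below assumes the client keeps them in a set (known_speaker_ids) and stores sentences locally as a sid -> {"speaker_id", "label"} map. Both helper names are hypothetical.

```python
import json


def build_reassign_speaker(sid: int, target_speaker_id: str, known_speaker_ids: set) -> str:
    """Build a reassign_speaker request (hypothetical helper).

    known_speaker_ids holds the original IDs the client has seen in
    init_sentence.speaker_id; display labels are not accepted here.
    """
    if isinstance(sid, bool) or not isinstance(sid, int):
        raise TypeError("sid must be an int")
    if not target_speaker_id:
        raise ValueError("target_speaker_id must not be empty")  # speaker_name_empty
    if target_speaker_id not in known_speaker_ids:
        # Mirrors speaker_not_found / invalid_parameter: new speakers cannot be created.
        raise ValueError(f"unknown original speaker ID: {target_speaker_id!r}")
    return json.dumps({
        "type": "voice-translation",
        "data": {"action": "reassign_speaker", "sid": sid, "target_speaker_id": target_speaker_id},
    })


def apply_speaker_reassigned(sentences: dict, event_data: dict) -> dict:
    """Return a copy of the local sid -> speaker map updated from a speaker_reassigned event."""
    updated = dict(sentences)
    updated[event_data["sid"]] = {
        "speaker_id": event_data["new_speaker_id"],
        "label": event_data["new_speaker_label"],
    }
    return updated
```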
merge_speakers - Merge Speakers
Description
Move all sentences of one speaker (the source) to another speaker (the target). After the merge, future recognition results attributed to the source speaker are also automatically assigned to the target.
Difference from reassign_speaker
| Feature | Scope | Future Effect |
|---|---|---|
reassign_speaker | A single sentence (1 SID) | None |
merge_speakers | All sentences of that speaker | Future occurrences of the source are also automatically converted to the target |
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
action | string | Yes | Fixed value merge_speakers |
source_speaker_id | string | Yes | The speaker ID to be merged (e.g. Guest-2) |
target_speaker_id | string | Yes | The target speaker ID to merge into (e.g. Guest-1) |
Request Example
{
"type": "voice-translation",
"data": {
"action": "merge_speakers",
"source_speaker_id": "Guest-2",
"target_speaker_id": "Guest-1"
}
}
Successful Response
Returns the speakers_merged event:
{
"type": "voice-translation",
"data": {
"action": "speakers_merged",
"source_speaker_id": "Guest-2",
"target_speaker_id": "Guest-1",
"target_speaker_label": "Manager Wang",
"affected_sids": [3, 5, 7]
}
}
Error Codes
| Error Code | HTTP Status | Description | Recommended Action |
|---|---|---|---|
speaker_not_found | 400 | The speaker does not exist | Make sure the speaker ID exists |
merge_speakers_same_id | 400 | The source and target speakers are the same | Use different speaker IDs |
speaker_name_empty | 400 | The speaker ID cannot be empty | Provide a valid speaker ID |
session_not_started | 400 | Speech recognition has not started | Call start first |
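A client mirroring the transcript locally can apply speakers_merged by moving every sentence in affected_sids to the target speaker. Because the server itself converts future results from the source speaker, this sketch keeps no redirect table of its own. build_merge_speakers and apply_speakers_merged are hypothetical names.

```python
import json


def build_merge_speakers(source_speaker_id: str, target_speaker_id: str) -> str:
    """Build a merge_speakers request (hypothetical helper)."""
    if not source_speaker_id or not target_speaker_id:
        raise ValueError("speaker IDs must not be empty")  # speaker_name_empty
    if source_speaker_id == target_speaker_id:
        raise ValueError("source and target must differ")  # merge_speakers_same_id
    return json.dumps({
        "type": "voice-translation",
        "data": {
            "action": "merge_speakers",
            "source_speaker_id": source_speaker_id,
            "target_speaker_id": target_speaker_id,
        },
    })


def apply_speakers_merged(sentences: dict, event_data: dict) -> dict:
    """Reassign every sid in affected_sids to the target speaker in a local sid -> speaker map."""
    target = {"speaker_id": event_data["target_speaker_id"], "label": event_data["target_speaker_label"]}
    updated = dict(sentences)
    for sid in event_data["affected_sids"]:
        updated[sid] = dict(target)
    return updated
```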
tts_play - Play TTS
Description
In asynchronous TTS mode (tts_mode: async), manually play the TTS audio of one or more sentences, starting from a specified sentence. Sending the same sid again replays it.
Two-way translation mode (conversation):
tts_play automatically synthesizes the translation in the appropriate language based on the voice settings in tts_config; you don't need to specify tts_language separately.
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
action | string | Yes | Fixed value tts_play |
sid | int | Yes | The starting sentence ID |
length | int | No | The number of sentences to play (default 1, max 20) |
Request Example (Single Sentence)
{
"type": "voice-translation",
"data": {
"action": "tts_play",
"sid": 5
}
}
Request Example (Multiple Sentences)
{
"type": "voice-translation",
"data": {
"action": "tts_play",
"sid": 5,
"length": 3
}
}
Error Codes
| Error Code | HTTP Status | Description | Recommended Action |
|---|---|---|---|
tts_not_enabled | 400 | TTS not enabled | Make sure TTS was enabled at start |
tts_segment_not_found | 400 | The specified sentence was not found | Make sure the SID exists |
tts_translation_not_found | 400 | The sentence has no translation in the specified language | Make sure the translation exists |
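The sketch below builds tts_play requests and enforces the documented length range before sending. The lower bound of 1 is an assumption (only the default of 1 and the maximum of 20 are documented), and build_tts_play is a hypothetical name.

```python
import json

TTS_PLAY_MAX_LENGTH = 20  # documented maximum for length


def build_tts_play(sid: int, length: int = 1) -> str:
    """Build a tts_play request (hypothetical helper)."""
    for name, value in (("sid", sid), ("length", length)):
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"{name} must be an int")
    # Assumption: length must be at least 1; only the default (1) and max (20) are documented.
    if not 1 <= length <= TTS_PLAY_MAX_LENGTH:
        raise ValueError(f"length must be between 1 and {TTS_PLAY_MAX_LENGTH}")
    data = {"action": "tts_play", "sid": sid}
    if length != 1:
        data["length"] = length  # omitted for the default, as in the single-sentence example
    return json.dumps({"type": "voice-translation", "data": data})
```

Calling build_tts_play(5) twice and sending both messages replays sentence 5, which the description above says is supported.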
tts_stop - Stop TTS
Description
Stop the currently playing TTS audio.
Request Example
{
"type": "voice-translation",
"data": {
"action": "tts_stop"
}
}
Successful Response
{
"type": "voice-translation",
"data": {
"action": "status",
"message": "TTS playback stopped"
}
}
tts_mode - Switch TTS Mode
Description
Switch the TTS playback mode (synchronous/asynchronous) during a recording.
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
action | string | Yes | Fixed value tts_mode |
tts_mode | string | Yes | Mode: sync (synchronous) or async (asynchronous) |
Request Example
{
"type": "voice-translation",
"data": {
"action": "tts_mode",
"tts_mode": "async"
}
}
Successful Response
Returns the tts_mode_changed event:
{
"type": "voice-translation",
"data": {
"action": "tts_mode_changed",
"tts_mode": "async"
}
}
Error Codes
| Error Code | HTTP Status | Description | Recommended Action |
|---|---|---|---|
tts_not_enabled | 400 | TTS not enabled | Make sure TTS was enabled at start |
invalid_data | 422 | Invalid mode | Use sync or async |
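One way to keep the client's view of the TTS mode in step with the server is to change it only when tts_mode_changed arrives, as in this sketch. TtsModeTracker is a hypothetical class, and the mode chosen at start is passed in because its default is not documented in this section.

```python
import json

VALID_TTS_MODES = ("sync", "async")


class TtsModeTracker:
    """Tracks the TTS mode last confirmed by the server (hypothetical client helper)."""

    def __init__(self, initial_mode: str):
        self.mode = initial_mode  # only changes once the server confirms a switch
        self.pending = None       # mode requested but not yet confirmed

    def request(self, mode: str) -> str:
        if mode not in VALID_TTS_MODES:
            raise ValueError("tts_mode must be 'sync' or 'async'")  # server: invalid_data
        self.pending = mode
        return json.dumps({"type": "voice-translation",
                           "data": {"action": "tts_mode", "tts_mode": mode}})

    def on_message(self, raw: str) -> None:
        data = json.loads(raw).get("data", {})
        if data.get("action") == "tts_mode_changed":
            self.mode = data["tts_mode"]
            self.pending = None
```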
set_tts - Two-Way Translation TTS Settings
Description
In two-way translation mode, toggle TTS on or off, or update the TTS voice settings mid-session. Available only under the conversation type.
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
action | string | Yes | Fixed value set_tts |
tts_enabled | boolean | No | Toggle TTS on/off |
tts_config | object | No | Per-language TTS settings to update (e.g. voice, speaking_rate), keyed by language code; only the session's two two-way translation languages are valid keys |
Request Example (Disable TTS)
{
"type": "voice-translation",
"data": {
"action": "set_tts",
"tts_enabled": false
}
}
Request Example (Update TTS Voice)
{
"type": "voice-translation",
"data": {
"action": "set_tts",
"tts_enabled": true,
"tts_config": {
"en-US": { "voice": "en-US-GuyNeural", "speaking_rate": 1.2 }
}
}
}
Successful Response
Returns the tts_updated event:
{
"type": "voice-translation",
"data": {
"action": "tts_updated",
"tts_enabled": true,
"tts_config": {
"zh-TW": { "voice": "zh-TW-HsiaoChenNeural", "speaking_rate": 1.0 },
"en-US": { "voice": "en-US-GuyNeural", "speaking_rate": 1.2 }
}
}
}
Error Codes
| Error Code | HTTP Status | Description | Recommended Action |
|---|---|---|---|
invalid_action | 400 | This operation is not supported outside two-way translation mode | Use only under the conversation type |
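A set_tts request may carry either field or both. This sketch assumes the client knows the session's two languages and rejects tts_config keys outside them before sending, rather than waiting for the server; build_set_tts is a hypothetical name.

```python
import json
from typing import Optional


def build_set_tts(session_languages: tuple, tts_enabled: Optional[bool] = None,
                  tts_config: Optional[dict] = None) -> str:
    """Build a set_tts request for a conversation session (hypothetical helper).

    session_languages: the two languages of the two-way translation session,
    e.g. ("zh-TW", "en-US"); only these are valid tts_config keys.
    """
    data = {"action": "set_tts"}
    if tts_enabled is not None:
        if not isinstance(tts_enabled, bool):
            raise TypeError("tts_enabled must be a bool")
        data["tts_enabled"] = tts_enabled
    if tts_config is not None:
        unknown = set(tts_config) - set(session_languages)
        if unknown:
            raise ValueError(f"tts_config keys not in this session: {sorted(unknown)}")
        data["tts_config"] = tts_config
    if len(data) == 1:
        # Both fields are optional; sending neither is treated as a caller mistake here.
        raise ValueError("pass tts_enabled and/or tts_config")
    return json.dumps({"type": "voice-translation", "data": data})
```

The example tts_updated response above lists zh-TW even though the request changed only en-US, which suggests the event carries the full configuration; if that holds, a client can simply replace its local copy with the returned tts_config.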
start_speaking - Start Speaking (Manual Mode)
Description
In two-way translation manual mode (conversation_mode: "manual"), notify the system that the user has started speaking. From this moment, audio is sent to STT for recognition, and all recognition results accumulate into the same sentence (no automatic segmentation).
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
action | string | Yes | Fixed value start_speaking |
speaker | int | Yes | User number (1 or 2) |
Request Example
{
"type": "voice-translation",
"data": {
"action": "start_speaking",
"speaker": 1
}
}
Successful Response
{
"type": "voice-translation",
"data": {
"action": "status",
"message": "Started speaking"
}
}
Error Codes
| Error Code | HTTP Status | Description | Recommended Action |
|---|---|---|---|
invalid_action | 400 | Not in two-way translation mode | Use only under the conversation type |
conversation_not_manual_mode | 400 | Not in manual mode | Use only in manual mode |
conversation_speaking | 400 | Already speaking | Call stop_speaking first |
conversation_invalid_speaker | 400 | Invalid user number | Use 1 or 2 |
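The start_speaking and stop_speaking actions together behave like a small state machine. The sketch below mirrors it on the client to catch conversation_invalid_speaker, conversation_speaking and conversation_not_speaking early. ManualSpeakingGuard is a hypothetical class, and it updates its state optimistically when a request is built, so a caller should reset active_speaker if the server rejects the request.

```python
import json


class ManualSpeakingGuard:
    """Client-side mirror of the manual-mode speaking state (hypothetical helper)."""

    def __init__(self):
        self.active_speaker = None  # 1 or 2 while a speaking turn is open

    def start(self, speaker: int) -> str:
        if isinstance(speaker, bool) or speaker not in (1, 2):
            raise ValueError("speaker must be 1 or 2")  # conversation_invalid_speaker
        if self.active_speaker is not None:
            raise RuntimeError("already speaking; send stop_speaking first")  # conversation_speaking
        self.active_speaker = speaker
        return json.dumps({"type": "voice-translation",
                           "data": {"action": "start_speaking", "speaker": speaker}})

    def stop(self) -> str:
        if self.active_speaker is None:
            raise RuntimeError("not speaking; send start_speaking first")  # conversation_not_speaking
        self.active_speaker = None
        return json.dumps({"type": "voice-translation", "data": {"action": "stop_speaking"}})
```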
stop_speaking - Stop Speaking (Manual Mode)
Description
In two-way translation manual mode, notify the system that the user has stopped speaking. The system merges the recognition results accumulated during this period into one complete sentence, then translates it and synthesizes TTS.
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
action | string | Yes | Fixed value stop_speaking |
Request Example
{
"type": "voice-translation",
"data": {
"action": "stop_speaking"
}
}
Successful Response
After speaking stops, the system sends a complete result event (containing origin and translations):
{
"type": "voice-translation",
"data": {
"action": "result",
"origin": {
"sid": 1,
"language": "zh-TW",
"text": "The complete sentence merged from all recognition during this period",
"is_final": true,
"speaker_id": "Speaker-1",
"start_time": "00:05"
},
"translations": {
"en-US": {
"sid": 1,
"text": "The complete merged sentence from this speaking period",
"is_final": true
}
}
}
}
Error Codes
| Error Code | HTTP Status | Description | Recommended Action |
|---|---|---|---|
invalid_action | 400 | Not in two-way translation mode | Use only under the conversation type |
conversation_not_speaking | 400 | Not in the speaking state | Call start_speaking first |
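A client usually needs only the final sentence from the result event that follows stop_speaking. The sketch below extracts it using the field names in the example above. FinalSentence and parse_final_result are hypothetical names, and skipping results whose is_final is false is this sketch's own choice.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class FinalSentence:
    sid: int
    language: str
    text: str
    speaker_id: Optional[str]
    translations: Dict[str, str] = field(default_factory=dict)  # language code -> text


def parse_final_result(raw: str) -> Optional[FinalSentence]:
    """Return the final sentence from a result event, or None for anything else (hypothetical helper)."""
    msg = json.loads(raw)
    data = msg.get("data", {})
    if msg.get("type") != "voice-translation" or data.get("action") != "result":
        return None
    origin = data.get("origin", {})
    if not origin.get("is_final"):
        return None  # non-final results are skipped in this sketch
    translations = {
        lang: item["text"]
        for lang, item in data.get("translations", {}).items()
        if item.get("is_final")
    }
    return FinalSentence(origin["sid"], origin["language"], origin["text"],
                         origin.get("speaker_id"), translations)
```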
switch_conversation_mode - Switch Conversation Mode
Description
In two-way translation mode, switch between automatic detection mode (auto) and manual mode (manual). If a user is speaking when the mode is switched, the speaking turn ends automatically.
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
action | string | Yes | Fixed value switch_conversation_mode |
conversation_mode | string | Yes | Target mode: auto or manual |
Request Example
{
"type": "voice-translation",
"data": {
"action": "switch_conversation_mode",
"conversation_mode": "manual"
}
}
Successful Response
Returns the conversation_mode_changed event:
{
"type": "voice-translation",
"data": {
"action": "conversation_mode_changed",
"conversation_mode": "manual"
}
}
Error Codes
| Error Code | HTTP Status | Description | Recommended Action |
|---|---|---|---|
invalid_action | 400 | Not in two-way translation mode | Use only under the conversation type |
conversation_invalid_mode | 400 | Invalid conversation mode | Use auto or manual |
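Since a mode switch also ends any open speaking turn, a client tracking manual-mode state can clear its speaking flag when the confirmation arrives. ConversationModeState is a hypothetical class in this sketch, and it assumes the client sets speaking itself when it opens a manual turn.

```python
import json

CONVERSATION_MODES = ("auto", "manual")


class ConversationModeState:
    """Tracks the confirmed conversation mode and a manual speaking flag (hypothetical helper)."""

    def __init__(self, mode: str):
        self.mode = mode
        self.speaking = False  # set by the caller when a manual speaking turn opens

    def request_switch(self, mode: str) -> str:
        if mode not in CONVERSATION_MODES:
            raise ValueError("conversation_mode must be 'auto' or 'manual'")  # conversation_invalid_mode
        return json.dumps({"type": "voice-translation",
                           "data": {"action": "switch_conversation_mode", "conversation_mode": mode}})

    def on_message(self, raw: str) -> None:
        data = json.loads(raw).get("data", {})
        if data.get("action") == "conversation_mode_changed":
            self.mode = data["conversation_mode"]
            # Per the description, an open speaking turn ends automatically on a switch.
            self.speaking = False
```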
set_speaker_language - Set Speaker Language
Description
In two-way translation mode, change a specified user's language mid-session. The system rebuilds the STT connection for the new language and updates the translation target automatically. Transcript content recorded before the change keeps its original language, and timestamps continue without resetting.
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
action | string | Yes | Fixed value set_speaker_language |
speaker | int | Yes | User number (1 or 2) |
language | string | Yes | The new language code (e.g. ja-JP) |
Request Example
{
"type": "voice-translation",
"data": {
"action": "set_speaker_language",
"speaker": 1,
"language": "ja-JP"
}
}
Successful Response
Returns the speaker_language_changed event:
{
"type": "voice-translation",
"data": {
"action": "speaker_language_changed",
"speaker_language_map": {
"1": "ja-JP",
"2": "en-US"
}
}
}
Error Codes
| Error Code | HTTP Status | Description | Recommended Action |
|---|---|---|---|
invalid_action | 400 | Not in two-way translation mode | Use only under the conversation type |
conversation_invalid_speaker | 400 | Invalid user number | Use 1 or 2 |
conversation_invalid_language | 400 | Invalid language code | Use a valid BCP 47 language code |
conversation_same_language | 400 | The new language is the same as the current one | Safe to ignore; no change is needed |
conversation_language_same_as_peer | 400 | The new language is the same as the other user's | The two users cannot have the same language |
conversation_speaking | 400 | Currently speaking, cannot change the language | End speaking first, then change |
conversation_language_change_failed | 500 | Language change failed (STT rebuild failed) | Retry later |
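Several of the error codes above can be avoided with client-side checks. The sketch below assumes the client tracks both users' current languages in a dict keyed by user number; its language-code check is only a rough shape test, not full BCP 47 validation, and the helper names are hypothetical. Since conversation_same_language can be ignored, the builder returns None instead of sending a no-op request.

```python
import json
import re
from typing import Dict, Optional

# Rough shape check only (e.g. "ja-JP", "zh-Hant-TW"); not a full BCP 47 validator.
_ROUGH_LANG_TAG = re.compile(r"[A-Za-z]{2,3}(-[A-Za-z0-9]{2,8})*")


def build_set_speaker_language(speaker: int, language: str,
                               current: Dict[int, str]) -> Optional[str]:
    """Build a set_speaker_language request, or None if nothing would change (hypothetical helper).

    current maps user number (1 or 2) to that user's current language code.
    """
    if isinstance(speaker, bool) or speaker not in (1, 2):
        raise ValueError("speaker must be 1 or 2")  # conversation_invalid_speaker
    if not _ROUGH_LANG_TAG.fullmatch(language):
        raise ValueError(f"not a plausible language code: {language!r}")  # conversation_invalid_language
    if language == current[speaker]:
        return None  # conversation_same_language can be ignored, so skip the request
    peer = 2 if speaker == 1 else 1
    if language == current[peer]:
        raise ValueError("both users cannot share a language")  # conversation_language_same_as_peer
    return json.dumps({"type": "voice-translation",
                       "data": {"action": "set_speaker_language", "speaker": speaker, "language": language}})


def parse_language_map(event_data: dict) -> Dict[int, str]:
    """Convert speaker_language_map (string keys "1"/"2" in the example) to int keys."""
    return {int(k): v for k, v in event_data["speaker_language_map"].items()}
```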
broadcast_go_live - Switch to the Live Phase
Description
Switch the broadcast from the standby phase (standby) to the live phase (live). After the switch, STT and translation results are broadcast to viewers and written to the transcript.
Request Example
{
"type": "voice-translation",
"data": {
"action": "broadcast_go_live"
}
}
Successful Response
Returns the broadcast_phase_changed event:
{
"type": "voice-translation",
"data": {
"action": "broadcast_phase_changed",
"phase": "live",
"message": "Broadcast started"
}
}
Error Codes
| Error Code | HTTP Status | Description | Recommended Action |
|---|---|---|---|
broadcast_not_enabled | 400 | Not in broadcast mode | Make sure the session was started with type: "broadcast" |
Note: If already in the live phase, a status message "Broadcast is already in progress" is returned; this is not treated as an error.
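A host client can follow the phase from server messages, as sketched below. BroadcastPhaseTracker is hypothetical, and the initial phase is passed in by the caller. Recognizing the already-live case by comparing the status message text is fragile; it is used here only because this section documents no dedicated event for that case.

```python
import json


class BroadcastPhaseTracker:
    """Follows the broadcast phase from server messages (hypothetical client helper)."""

    def __init__(self, phase: str):
        self.phase = phase  # "standby" or "live"

    def go_live_request(self) -> str:
        return json.dumps({"type": "voice-translation", "data": {"action": "broadcast_go_live"}})

    def on_message(self, raw: str) -> None:
        data = json.loads(raw).get("data", {})
        if data.get("action") == "broadcast_phase_changed":
            self.phase = data["phase"]
        elif data.get("action") == "status" and data.get("message") == "Broadcast is already in progress":
            # Not an error: the broadcast was already live.
            self.phase = "live"
```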
broadcast_announcement - Send an Announcement
Description
The host sends a custom announcement message to all viewers, who receive it as an announcement event via SSE. The message is automatically translated into all translation languages, and the SSE event includes the results in a translations field.
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
action | string | Yes | Fixed value broadcast_announcement |
message | string | Yes | The announcement message content |
Request Example
{
"type": "voice-translation",
"data": {
"action": "broadcast_announcement",
"message": "The meeting will end in 5 minutes"
}
}
Successful Response
{
"type": "voice-translation",
"data": {
"action": "status",
"message": "Announcement sent"
}
}
The SSE event received on the viewer side (with translations):
event: announcement
data: {"message":"The meeting will end in 5 minutes","translations":{"en-US":"The meeting will end in 5 minutes","ja-JP":"会議は5分後に終了します"}}
Error Codes
| Error Code | HTTP Status | Description | Recommended Action |
|---|---|---|---|
broadcast_not_enabled | 400 | Not in broadcast mode | Make sure the session was started with type: "broadcast" |
invalid_parameter | 400 | The message is empty | Provide a valid message parameter |
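On the viewer side, announcement (and standby) events arrive as Server-Sent Events. The sketch below parses text/event-stream text following the basic SSE line rules and picks the viewer's translation. parse_sse and localized_text are hypothetical names, falling back to the original message when a language is missing is an assumption, and a real client would more likely use an existing SSE library or the browser's EventSource.

```python
import json
import re
from typing import Iterator, Tuple


def parse_sse(stream_text: str) -> Iterator[Tuple[str, str]]:
    """Yield (event_type, data) pairs from text/event-stream text (hypothetical helper).

    Basic SSE rules: "field: value" lines, a blank line ends an event, several
    data lines are joined with newlines, lines starting with ":" are comments.
    id and retry fields are ignored, and a trailing event without a blank line is dropped.
    """
    event_type, data_lines = "", []
    for line in re.split(r"\r\n|\r|\n", stream_text):
        if line == "":
            if data_lines:
                yield (event_type or "message", "\n".join(data_lines))
            event_type, data_lines = "", []
        elif line.startswith(":"):
            continue
        else:
            name, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]
            if name == "event":
                event_type = value
            elif name == "data":
                data_lines.append(value)


def localized_text(payload: dict, language: str) -> str:
    """Pick the viewer's translation, falling back to the original message (fallback is an assumption)."""
    return payload.get("translations", {}).get(language, payload["message"])
```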
set_standby_message - Set the Standby Phase Message
Description
Dynamically set the message displayed to viewers during the broadcast standby phase (standby). This lets the host enter standby mode first and set the waiting message afterward, rather than having to provide it at start.
The message is automatically translated into all translation languages, and the SSE event viewers receive includes a translations field.
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
action | string | Yes | Fixed value set_standby_message |
message | string | Yes | The standby phase display text (translated for viewers in each language via the translation pipeline) |
Request Example
{
"type": "voice-translation",
"data": {
"action": "set_standby_message",
"message": "The talk is about to begin, please wait..."
}
}
Successful Response
{
"type": "voice-translation",
"data": {
"action": "status",
"message": "Standby phase text updated"
}
}
The SSE event received on the viewer side (with translations):
event: standby
data: {"message":"The talk is about to begin, please wait...","translations":{"en-US":"The presentation is about to begin, please wait...","ja-JP":"プレゼンテーションがまもなく始まります。お待ちください..."}}
Error Codes
| Error Code | HTTP Status | Description | Recommended Action |
|---|---|---|---|
broadcast_not_enabled | 400 | Not in broadcast mode | Make sure the session was started with type: "broadcast" |
broadcast_not_in_standby | 400 | Not in the standby phase | Can only be used during the standby phase |
Note: This action can only be used during the standby phase (standby). Once the broadcast has entered the live phase (live), the request fails with broadcast_not_in_standby.
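A host client can refuse to send set_standby_message once it knows the broadcast is live, avoiding broadcast_not_in_standby. In this sketch the phase is passed in by the caller; build_set_standby_message is a hypothetical name, and rejecting an empty message is an assumption, since this section lists no error for it.

```python
import json


def build_set_standby_message(message: str, phase: str) -> str:
    """Build a set_standby_message request for a broadcast host (hypothetical helper).

    phase is the client's view of the broadcast phase ("standby" or "live").
    """
    if phase != "standby":
        # The server would answer broadcast_not_in_standby.
        raise RuntimeError("set_standby_message is only valid during the standby phase")
    if not message.strip():
        # Assumption: no error is documented for an empty message; rejecting it is this sketch's choice.
        raise ValueError("message must not be empty")
    return json.dumps({"type": "voice-translation",
                       "data": {"action": "set_standby_message", "message": message}},
                      ensure_ascii=False)
```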
Version: V1.5.7 Last Updated: 2026-05-20