WebSocket API

Voice Translation

Overview

This page lists every action available under the voice-translation type. For connection and authentication, see Connection and Authentication; for response event formats, see Response Events.


Table of Contents

  1. start - Start Voice Translation
  2. config - Configure Terminology / Correction Rules
  3. audio - Send Audio
  4. pause - Pause Translation
  5. resume - Resume Translation
  6. stop - Stop Translation
  7. retranslate - Retranslate a Single Sentence
  8. switch_language - Switch Language
  9. set_name - Set Recording Name
  10. rename_speaker - Globally Rename a Speaker
  11. reassign_speaker - Change the Speaker of a Single Sentence
  12. merge_speakers - Merge Speakers
  13. tts_play - Play TTS
  14. tts_stop - Stop TTS
  15. tts_mode - Switch TTS Mode
  16. set_tts - Two-Way Translation TTS Settings
  17. start_speaking - Start Speaking (Manual Mode)
  18. stop_speaking - Stop Speaking (Manual Mode)
  19. switch_conversation_mode - Switch Conversation Mode
  20. set_speaker_language - Set Speaker Language
  21. broadcast_go_live - Switch to the Live Phase
  22. broadcast_announcement - Send an Announcement
  23. set_standby_message - Set the Standby Phase Message

start - Start Voice Translation

Description

Start a new voice translation session and begin processing audio according to the configured parameters.

Request Parameters

Parameter | Type | Required | Description
action | string | Yes | Fixed value start
transcription_languages | string[] | Yes | Speech recognition languages (up to 2)
translation_languages | string[] | No | Translation target languages (empty = no translation)
realtime_translation | boolean | No | Real-time translation mode (default false)
recognition_mode | string | No | Recognition mode: single (single speaker, default) or multi_speaker (multiple speakers). Under multi_speaker, transcription_languages must contain exactly 1 language; otherwise a diarization_multilang_conflict error is returned and the session is refused
type | string | Yes | Recording type: transcribe, conversation, record, broadcast
audio_format | string | No | Audio format: pcm (default), webm
summary_template | string | Conditional | Summary template (required for transcribe, optional for conversation/broadcast)
options | object | No | Speech recognition options
tts_enabled | boolean | No | Whether to enable TTS speech synthesis (default false)
tts_language | string | No | TTS output language (must be in translation_languages)
tts_voice | string | No | TTS voice name (e.g. en-US-JennyNeural)
tts_mode | string | No | TTS playback mode: sync (synchronous, default), async (asynchronous)
broadcast_token | string | Conditional | Broadcast token (required for broadcast type, obtained from the REST API)
active_language | string | No | Initial active language in two-way translation mode (default transcription_languages[0])
tts_config | object | No | Multi-language TTS settings (broadcast / two-way translation mode)
broadcast_phase | string | No | Initial broadcast phase: standby, live (default)
standby_message | string | No | Message viewers see during the standby phase (default: "Preparing, please wait...")
name | string | No | Initial default recording name (max 60 characters; the system may still override it; if not provided, one is generated automatically, e.g. Transcription #1)
summary_language | string | No | Summary output language (defaults to the recognition language when not specified; in broadcast mode it is read automatically from the channel settings)
summary_mode | string | No | Summary mode enum: builtin (apply the built-in template, default) / custom (the customer prompt fully replaces the default). When omitted, builtin is inferred automatically
summary_prompt | string | No | Required in custom mode; treated as supplementary instructions in builtin mode. ≤2000 characters
summary_prompt_slug | string | No | Required in custom mode; must not be provided in builtin mode. The customer's own identifier (≤64 characters, Unicode, no control characters; passed through and stored in the backend record for historical lookup)
summary_plain_text | boolean | No | Request plain-text summary output (default false; when enabled, the backend performs Markdown post-processing)
speakers | object[] | Conditional | Speaker language settings for two-way translation mode (required for conversation type, exactly 2 entries, see below)
conversation_mode | string | No | Two-way conversation mode: auto (automatic detection, default), manual (manual PTT)

Request Example (Basic)

{
  "type": "voice-translation",
  "data": {
    "action": "start",
    "transcription_languages": ["zh-TW"],
    "translation_languages": ["en-US"],
    "realtime_translation": false,
    "type": "transcribe",
    "audio_format": "pcm",
    "summary_template": "meeting",
    "options": {
      "speaking_speed": "normal",
      "segmentation_mode": "auto",
      "profanity_handling": "mask"
    }
  }
}
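
Several of the parameter rules above can be checked on the client before the message is sent. The following is a minimal Python sketch, not part of the API: build_start_message is a hypothetical helper name, and it enforces only constraints stated in the parameter table (valid recording type, language count, the multi_speaker single-language rule, the broadcast token, and tts_language membership). The server remains the authority on validation.

```python
VALID_TYPES = {"transcribe", "conversation", "record", "broadcast"}


def build_start_message(recording_type, transcription_languages=None, **params):
    """Hypothetical client-side helper: build a start message and apply
    the documented parameter rules before it is sent."""
    if recording_type not in VALID_TYPES:
        raise ValueError("invalid_recording_type")
    data = {"action": "start", "type": recording_type, **params}

    if recording_type == "broadcast":
        # Languages are read from the broadcast channel; the token is required.
        if not data.get("broadcast_token"):
            raise ValueError("broadcast_token_required")
    else:
        langs = list(transcription_languages or [])
        if not langs:
            raise ValueError("missing_transcription_languages")
        if len(langs) > 2:
            raise ValueError("too_many_languages")
        if data.get("recognition_mode") == "multi_speaker" and len(langs) != 1:
            raise ValueError("diarization_multilang_conflict")
        data["transcription_languages"] = langs

    tts_language = data.get("tts_language")
    if tts_language and tts_language not in data.get("translation_languages", []):
        raise ValueError("tts_invalid_language")
    return {"type": "voice-translation", "data": data}
```

Calling build_start_message("transcribe", ["zh-TW"], translation_languages=["en-US"], summary_template="meeting") produces a payload equivalent to the basic example without the optional fields. The error strings reuse the documented error codes purely as labels.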

Request Example (Initial Default Name)

{
  "type": "voice-translation",
  "data": {
    "action": "start",
    "transcription_languages": ["zh-TW"],
    "translation_languages": ["en-US"],
    "type": "transcribe",
    "audio_format": "pcm",
    "summary_template": "meeting",
    "name": "Product Planning Meeting"
  }
}

Recording Name Rules

Scenario | Name | name_source | Overridden by system?
start with a name parameter | Initial default name | default | Yes
start without a name | Auto-generated (e.g. Transcription #1, Broadcast #3) | default | Yes
Set via set_name | Name explicitly set by the user | user | No
Auto-generated by the system after the session ends | Summary name generated from the transcript content | llm | -

Note: The name in start is an initial default name; the system may still override it when the session ends. If you need a fixed name, use set_name.

Default name formats (fixed English):

Recording Type | Default Name Format
transcribe | Transcription #N
conversation | Conversation #N
record | Recording #N
broadcast | Broadcast #N

N is the sequential number of recordings of the same type for that user. Name priority: user > llm > default. Once the user sets a name, the system will not override it when the session ends.
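
The priority rule can be illustrated with a small Python sketch. How the server applies the rule internally is not described here; resolve_name and NAME_PRIORITY are hypothetical names, and the code only shows the ordering user > llm > default.

```python
# Higher number wins; mirrors the documented order user > llm > default.
NAME_PRIORITY = {"default": 0, "llm": 1, "user": 2}


def resolve_name(current, candidate):
    """Each argument is a (name, name_source) pair. The candidate replaces the
    current name unless the current one comes from a higher-priority source."""
    if NAME_PRIORITY[candidate[1]] >= NAME_PRIORITY[current[1]]:
        return candidate
    return current
```

With this ordering, an llm summary name replaces a default name such as Transcription #1, but never replaces a name the user set through set_name.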

Request Example (with TTS)

{
  "type": "voice-translation",
  "data": {
    "action": "start",
    "transcription_languages": ["zh-TW"],
    "translation_languages": ["en-US"],
    "realtime_translation": true,
    "type": "transcribe",
    "tts_enabled": true,
    "tts_language": "en-US",
    "tts_voice": "en-US-JennyNeural",
    "tts_mode": "sync"
  }
}

Request Example (Two-Way Translation Mode - Automatic Detection)

{
  "type": "voice-translation",
  "data": {
    "action": "start",
    "type": "conversation",
    "transcription_languages": ["zh-TW", "en-US"],
    "active_language": "zh-TW",
    "audio_format": "pcm",
    "realtime_translation": true,
    "speakers": [
      { "id": 1, "language": "zh-TW" },
      { "id": 2, "language": "en-US" }
    ],
    "tts_config": {
      "zh-TW": { "voice": "zh-TW-HsiaoChenNeural", "speaking_rate": 1.0 },
      "en-US": { "voice": "en-US-JennyNeural", "speaking_rate": 1.0 }
    }
  }
}

Request Example (Two-Way Translation Mode - Manual Mode)

{
  "type": "voice-translation",
  "data": {
    "action": "start",
    "type": "conversation",
    "transcription_languages": ["zh-TW", "en-US"],
    "conversation_mode": "manual",
    "audio_format": "pcm",
    "realtime_translation": true,
    "speakers": [
      { "id": 1, "language": "zh-TW" },
      { "id": 2, "language": "en-US" }
    ],
    "tts_config": {
      "zh-TW": { "voice": "zh-TW-HsiaoChenNeural", "speaking_rate": 1.0 },
      "en-US": { "voice": "en-US-JennyNeural", "speaking_rate": 1.0 }
    }
  }
}

Special rules for two-way translation mode:

Item | Description
transcription_languages | Must contain exactly 2 languages, and they must differ
translation_languages | Not required (automatically derived as the non-active language)
active_language | Optional, defaults to transcription_languages[0]
recognition_mode | Forced to single (speaker_diarization is ignored)
tts_enabled | Defaults to true; set to false to return text translation only
tts_config | Optional; configures the TTS voice for each of the two languages; leave empty to use the default voices automatically
summary_template | Optional; when provided, a summary is generated automatically after stopping
speakers | Required in two-way translation mode; specifies each user's language (exactly 2 entries)
conversation_mode | Optional, auto (automatic detection, default) or manual (manual PTT)

speakers field description:

Field | Type | Required | Description
id | int | Yes | User number (1 or 2)
language | string | Yes | That user's language code (must be in transcription_languages)

conversation_mode description:

Mode | Description
auto (default) | The system automatically detects the spoken language and segments sentences automatically
manual | The user controls the speaking interval via start_speaking / stop_speaking; audio during that interval is merged into a single sentence
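
The two-way translation rules above lend themselves to a pre-flight check. The sketch below is an illustration in Python under the assumption that the client validates its own payload before sending; check_conversation_start is a hypothetical helper, and it covers only the rules listed in the tables above.

```python
def check_conversation_start(data):
    """Hypothetical pre-flight check for a conversation start payload.
    Returns a list of human-readable problems (empty when none are found)."""
    problems = []
    langs = data.get("transcription_languages", [])
    if len(langs) != 2 or langs[0] == langs[1]:
        problems.append("transcription_languages must hold exactly 2 different languages")
    speakers = data.get("speakers", [])
    if len(speakers) != 2:
        problems.append("speakers must have exactly 2 entries")
    for speaker in speakers:
        if speaker.get("id") not in (1, 2):
            problems.append("speaker id must be 1 or 2")
        if speaker.get("language") not in langs:
            problems.append("speaker language must be in transcription_languages")
    if data.get("conversation_mode", "auto") not in ("auto", "manual"):
        problems.append("conversation_mode must be auto or manual")
    return problems
```

Passing the data object of either two-way request example returns an empty list.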

Successful Response

After a successful start, a session_started event is returned, containing the complete initial session information.

General recording (transcribe / conversation / record):

{
  "type": "voice-translation",
  "data": {
    "action": "session_started",
    "session_id": "550e8400-e29b-41d4-a716-446655440000",
    "task_id": "7c9e6679-7425-40de-944b-e07fc1f90ae7",
    "recording_id": "7c9e6679-7425-40de-944b-e07fc1f90ae7",
    "recording_type": "transcribe",
    "recognition_mode": "single",
    "message": "Speech recognition started"
  }
}

Broadcast mode (broadcast):

{
  "type": "voice-translation",
  "data": {
    "action": "session_started",
    "session_id": "550e8400-e29b-41d4-a716-446655440000",
    "task_id": "7c9e6679-7425-40de-944b-e07fc1f90ae7",
    "recording_id": "7c9e6679-7425-40de-944b-e07fc1f90ae7",
    "recording_type": "broadcast",
    "recognition_mode": "multi_speaker",
    "phase": "standby",
    "viewer_count": 0,
    "queue_count": 0,
    "peak_viewers": 0,
    "total_viewers": 0,
    "message": "Speech recognition started"
  }
}

For response field descriptions, see the session_started event.

Recording Type Descriptions

type | Description | Use Case
transcribe | Speech-to-text | Meeting minutes, interview records
conversation | Conversation log | Two-way communication, customer service dialogues
record | Plain recording | Voice memos, quick notes
broadcast | Broadcast / live stream | Lectures, speeches, live content

Broadcast Mode Description (type: "broadcast")

In broadcast mode, the language settings are obtained automatically from the broadcast channel settings and do not need to be sent in the WebSocket message.

Required parameters:

Parameter | Type | Description
type | string | Must be "broadcast"
broadcast_token | string | Broadcast token (obtained after creating a broadcast via the REST API)
audio_format | string | Audio format (pcm or webm)

Optional parameters (override broadcast channel settings):

Parameter | Type | Description
tts_config | object | Multi-language TTS settings (override the settings used at creation)
summary_template | string | Summary template slug (overrides the settings used at creation; if not provided, the broadcast channel default is used)

Automatically configured parameters (can be omitted):

  • transcription_languages: read automatically from the broadcast settings
  • translation_languages: read automatically from the broadcast settings
  • realtime_translation: enabled by default in broadcast mode
  • summary_template: read automatically from the broadcast settings (the value passed via WebSocket takes precedence)
  • summary_language: read automatically from the broadcast settings (the value passed via WebSocket takes precedence)

Broadcast phase description:

broadcast_phase | Description | Behavior
live (default) | Live phase | STT/translation results are broadcast to viewers and written to the transcript
standby | Standby phase | STT/translation results go only to the host; viewers see the standby_message

Purpose of the standby phase: Lets the host run STT/translation warm-up tests before going live, confirming the equipment works before switching to the live phase.

Broadcast mode request example:

{
  "type": "voice-translation",
  "data": {
    "action": "start",
    "type": "broadcast",
    "broadcast_token": "a3f9",
    "audio_format": "pcm"
  }
}

Broadcast mode request example (standby phase + override summary template):

{
  "type": "voice-translation",
  "data": {
    "action": "start",
    "type": "broadcast",
    "broadcast_token": "a3f9",
    "audio_format": "pcm",
    "broadcast_phase": "standby",
    "standby_message": "The talk is about to begin, please wait...",
    "summary_template": "lecture"
  }
}

Summary template priority: the value passed in the WebSocket start > the default set when creating the broadcast channel. If neither is set, no summary is generated automatically.

Broadcast mode TTS settings (tts_config):

Use the tts_config parameter to specify which translation languages should produce TTS audio for viewers.

tts_config field | Type | Description
voice | string | TTS voice name
speaking_rate | number | Speaking rate (0.5–2.0, default 1.0)

Broadcast mode request example (with tts_config):

{
  "type": "voice-translation",
  "data": {
    "action": "start",
    "type": "broadcast",
    "broadcast_token": "a3f9",
    "audio_format": "pcm",
    "tts_config": {
      "en-US": {
        "voice": "en-US-JennyNeural",
        "speaking_rate": 1.0
      },
      "ja-JP": {
        "voice": "ja-JP-NanamiNeural",
        "speaking_rate": 1.0
      }
    }
  }
}

Note:

  • The TTS language must be a valid language in translation_languages; invalid languages are ignored automatically
  • The host (WebSocket) does not receive TTS audio; only SSE viewers receive the tts_ready event
  • TTS is sent only during the live phase; it is not sent during the standby phase
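
A client can apply the same tts_config constraints locally before sending. This Python sketch is an assumption-laden illustration: clean_tts_config is a hypothetical helper, and it presumes the client already knows the channel's translation_languages (for broadcasts these come from the channel settings, not from the WebSocket message).

```python
def clean_tts_config(tts_config, translation_languages):
    """Hypothetical helper: keep only entries the server would accept."""
    cleaned = {}
    for lang, settings in tts_config.items():
        if lang not in translation_languages:
            continue  # the documentation says invalid languages are ignored
        rate = settings.get("speaking_rate", 1.0)  # documented default is 1.0
        if not 0.5 <= rate <= 2.0:
            raise ValueError(f"speaking_rate for {lang} must be between 0.5 and 2.0")
        cleaned[lang] = {"voice": settings["voice"], "speaking_rate": rate}
    return cleaned
```

Dropping unknown languages locally mirrors the documented server behavior; raising on an out-of-range speaking_rate is a design choice of this sketch, since the server's reaction to such values is not described here.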

TTS Playback Mode Description

Mode | Description | Behavior
sync | Synchronous mode (default) | Automatically plays the most recent is_final=true translated sentence; if the previous sentence is still playing, it enters the queue and waits
async | Asynchronous mode (manual control) | The user can select any translated sentence for TTS, controlled with the tts_play command

Error Codes

Error Code | HTTP Status | Description | Recommended Action
missing_transcription_languages | 400 | No language parameter provided | Make sure the request includes transcription_languages
invalid_transcription_language | 400 | Invalid language code | Make sure the language code format is correct (e.g. zh-TW)
too_many_languages | 400 | Number of languages exceeds the limit | You can specify at most 2 languages
invalid_recording_type | 400 | Invalid recording type | Use a valid type value
invalid_summary_template | 400 | Invalid summary template | Make sure the template identifier is correct
stt_init_failed | 503 | Service initialization failed | Retry later
auth_budget_exceeded | 402 | Monthly budget exceeded | Wait for the next month's budget reset or adjust the budget
tts_init_failed | 503 | TTS service initialization failed | Retry later
tts_invalid_language | 400 | TTS language is not in the translation languages | Make sure tts_language is in translation_languages
broadcast_token_required | 400 | Broadcast mode requires a token | A broadcast type must provide broadcast_token
broadcast_token_invalid | 400 | Invalid broadcast token | Make sure the token is correct and has not expired
broadcast_not_ready | 503 | Broadcast service not yet started | Retry later
summary_invalid_mode | 400 | summary_mode is not builtin / custom | Change to a valid mode
summary_mode_field_mismatch | 400 | The mode and field combination do not match (a required field is missing / a forbidden field was provided) | Adjust the fields according to the mode rules
summary_prompt_too_long | 400 | summary_prompt exceeds 2000 characters | Shorten the custom prompt
summary_prompt_slug_too_long | 400 | summary_prompt_slug exceeds 64 characters | Shorten the identifier
summary_prompt_slug_invalid | 400 | summary_prompt_slug contains control characters (\n / \r / \t / \0, etc.) | Remove the control characters
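
One way a client might act on the start error codes is to group them by the HTTP status listed in the table. The Python sketch below reproduces the code-to-status mapping from the table; the handling categories (retry_later, check_budget, fix_request) and the function name suggested_handling are hypothetical, and how the error code is extracted from an error event is left out because that format is described under Response Events.

```python
# Error code -> HTTP status, copied from the start error table.
START_ERROR_STATUS = {
    "missing_transcription_languages": 400,
    "invalid_transcription_language": 400,
    "too_many_languages": 400,
    "invalid_recording_type": 400,
    "invalid_summary_template": 400,
    "stt_init_failed": 503,
    "auth_budget_exceeded": 402,
    "tts_init_failed": 503,
    "tts_invalid_language": 400,
    "broadcast_token_required": 400,
    "broadcast_token_invalid": 400,
    "broadcast_not_ready": 503,
    "summary_invalid_mode": 400,
    "summary_mode_field_mismatch": 400,
    "summary_prompt_too_long": 400,
    "summary_prompt_slug_too_long": 400,
    "summary_prompt_slug_invalid": 400,
}


def suggested_handling(error_code):
    """Hypothetical policy: map a start error code to a coarse client action."""
    status = START_ERROR_STATUS.get(error_code)
    if status == 503:
        return "retry_later"   # table recommends retrying later
    if status == 402:
        return "check_budget"  # budget must reset or be adjusted
    if status == 400:
        return "fix_request"   # the request itself needs correcting
    return "unknown"
```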

config - Configure Terminology / Correction Rules

Description

Send terminology, fuzzy-word correction rules, and translation dictionary settings before or during a recording. These settings can improve STT accuracy, fix homophone errors, and ensure translation consistency.

Automatically generated correction rules: When terminology is provided, the system automatically generates fuzzy-word correction rules for each term (homophones, near-homophones, Traditional/Simplified variants). The frontend does not need to define fuzzy_correction manually, which greatly simplifies the configuration process.

Request Parameters

Parameter | Type | Required | Description
action | string | Yes | Fixed value config
terminology | object | No | Terminology settings
fuzzy_correction | object | No | Fuzzy-word correction rules
translation_dict | object[] | No | Translation dictionary

Note: At least one setting item must be provided.

Terminology Format (terminology)

Keyed by language code, with an array of terms as the value:

{
  "zh-TW": [
    { "term": "語者分離", "boost": 1.5 },
    { "term": "WebSocket", "boost": 2.0 }
  ],
  "en-US": [
    { "term": "diarization", "boost": 1.5 }
  ]
}

Field | Type | Required | Description
term | string | Yes | The term (max 100 characters)
boost | number | No | Weight (default 1.0, range 0.5–5.0)

Limit: Up to 500 terms per language.

Fuzzy-Word Correction Format (fuzzy_correction)

Note: This field usually does not need to be set manually. The system generates correction rules automatically based on terminology. Use it only when you need to define special custom rules.

Keyed by language code, with an array of correction rules as the value:

{
  "zh-TW": [
    { "correct": "語者分離", "incorrect": ["語這分離", "語者分力"] }
  ]
}

Field | Type | Required | Description
correct | string | Yes | The correct term
incorrect | string[] | Yes | List of incorrect variants

Translation Dictionary Format (translation_dict)

Uses an array of entries directly:

[
  {
    "source": "語者分離",
    "translations": {
      "en-US": "Speaker Diarization",
      "ja-JP": "話者分離"
    }
  }
]

Field | Type | Required | Description
source | string | Yes | The source term (in the STT language)
translations | object | Yes | Translation mapping { "language_code": "translation" }

Limit: Up to 50 entries (to avoid degraded processing performance); exceeding it returns config_too_many_dict_entries.

Request Example (Terminology Only)

{
  "type": "voice-translation",
  "data": {
    "action": "config",
    "terminology": {
      "zh-TW": [
        { "term": "語者分離", "boost": 1.5 },
        { "term": "CVD製程", "boost": 1.5 },
        { "term": "wafer良率", "boost": 1.5 }
      ]
    }
  }
}
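
The documented limits for config can be checked before sending. The following Python sketch is illustrative only: build_config_message is a hypothetical helper, character counts are taken as Python string lengths (an assumption about how the server counts characters), and the raised strings reuse the documented error codes as labels.

```python
def build_config_message(terminology=None, fuzzy_correction=None, translation_dict=None):
    """Hypothetical helper: assemble a config message and apply documented limits."""
    data = {"action": "config"}
    if terminology:
        for lang, terms in terminology.items():
            if len(terms) > 500:  # up to 500 terms per language
                raise ValueError("config_too_many_entries")
            for entry in terms:
                if len(entry["term"]) > 100:  # max 100 characters per term
                    raise ValueError("config_term_too_long")
                if not 0.5 <= entry.get("boost", 1.0) <= 5.0:
                    raise ValueError("boost must be between 0.5 and 5.0")
        data["terminology"] = terminology
    if fuzzy_correction:
        data["fuzzy_correction"] = fuzzy_correction
    if translation_dict:
        if len(translation_dict) > 50:  # dictionary limit
            raise ValueError("config_too_many_dict_entries")
        data["translation_dict"] = translation_dict
    if len(data) == 1:  # only "action" present: nothing to configure
        raise ValueError("config_empty")
    return {"type": "voice-translation", "data": data}
```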

Request Example (Full settings, with manual correction rules)

{
  "type": "voice-translation",
  "data": {
    "action": "config",
    "terminology": {
      "zh-TW": [
        { "term": "語者分離", "boost": 1.5 },
        { "term": "即時轉錄", "boost": 1.5 }
      ]
    },
    "fuzzy_correction": {
      "zh-TW": [
        { "correct": "語者分離", "incorrect": ["語這分離", "語者分力"] }
      ]
    },
    "translation_dict": [
      { "source": "語者分離", "translations": { "en-US": "Speaker Diarization" } }
    ]
  }
}

Successful Response

{
  "type": "voice-translation",
  "data": {
    "action": "config_updated",
    "updated": ["terminology", "fuzzy_correction", "translation_dict"],
    "message": "Settings updated"
  }
}

For response field descriptions, see the config_updated event.

Error Codes

Error Code | HTTP Status | Description | Recommended Action
config_empty | 400 | No settings provided | Provide at least one setting item
config_term_too_long | 400 | Term exceeds 100 characters | Shorten the term
config_too_many_entries | 400 | Number of terms exceeds 500 | Reduce the number of terms
config_too_many_dict_entries | 400 | Translation dictionary exceeds 50 entries | Reduce the number of dictionary entries

audio - Send Audio

Description

Send audio data to the server for speech recognition. The audio must be Base64-encoded before sending.

Request Parameters

Parameter | Type | Required | Description
action | string | Yes | Fixed value audio
payload | string | Yes | Base64-encoded audio data

Audio Format Requirements

PCM format (default):

Item | Specification
Format | PCM (raw audio)
Sample rate | 16000 Hz
Bit depth | 16-bit
Channels | Mono
Byte order | Little-endian
Transfer encoding | Base64

WebM/Opus format:

Item | Specification
Format | WebM container + Opus codec
Sample rate | Any (the server converts automatically)
Channels | Mono or Stereo (the server converts automatically)
Transfer encoding | Base64

Request Example

{
  "type": "voice-translation",
  "data": {
    "action": "audio",
    "payload": "Base64-encoded PCM audio data"
  }
}
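
To make the PCM requirements concrete, the Python sketch below generates a short test tone in the required layout (16000 Hz, 16-bit, mono, little-endian) and wraps it in an audio message. The helper names sine_pcm and audio_message are hypothetical, and the 100 ms chunk length used in the usage note is an arbitrary choice, since no chunk size is specified here.

```python
import base64
import math
import struct

SAMPLE_RATE = 16000  # Hz, as required for the pcm format


def sine_pcm(duration_s, freq_hz=440.0):
    """Generate a test tone as 16-bit signed, mono, little-endian PCM bytes."""
    count = int(SAMPLE_RATE * duration_s)
    samples = [
        int(0.3 * 32767 * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE))
        for i in range(count)
    ]
    # "<" selects little-endian, "h" is a signed 16-bit integer.
    return struct.pack(f"<{count}h", *samples)


def audio_message(pcm_bytes):
    """Wrap raw audio bytes in an audio action with a Base64 payload."""
    payload = base64.b64encode(pcm_bytes).decode("ascii")
    return {"type": "voice-translation", "data": {"action": "audio", "payload": payload}}
```

For example, audio_message(sine_pcm(0.1)) carries 1600 samples, which is 3200 bytes of audio before Base64 encoding.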

Error Codes

Error Code | HTTP Status | Description | Recommended Action
session_not_started | 400 | Speech recognition has not started | Call the start action first
audio_invalid_format | 400 | Invalid audio data format | Make sure the Base64 encoding is correct
audio_format_unsupported | 400 | Unsupported audio format | Use the pcm or webm format
audio_decode_failed | 400 | Audio decoding failed | Make sure the audio format is correct
audio_process_failed | 500 | STT/diarization writes keep failing, exceeding the tolerance threshold | We recommend reconnecting

pause - Pause Translation

Description

Pause speech recognition processing. Audio received while paused is cached and is processed after the session resumes.

Request Example

{
  "type": "voice-translation",
  "data": {
    "action": "pause"
  }
}

Successful Response

{
  "type": "voice-translation",
  "data": {
    "action": "status",
    "message": "Speech recognition paused"
  }
}

Error Codes

Error Code | HTTP Status | Description | Recommended Action
session_not_started | 400 | Speech recognition has not started | Call start first
session_already_paused | 400 | Already paused | You can ignore this error

resume - Resume Translation

Description

Resume paused speech recognition processing.

Request Example

{
  "type": "voice-translation",
  "data": {
    "action": "resume"
  }
}

Successful Response

{
  "type": "voice-translation",
  "data": {
    "action": "status",
    "message": "Speech recognition resumed"
  }
}

Error Codes

Error Code | HTTP Status | Description | Recommended Action
session_not_started | 400 | Speech recognition has not started | Call start first
session_not_paused | 400 | Not paused | You can ignore this error
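
The pause and resume errors are harmless, but a client can avoid them by tracking its own state. The Python sketch below is one possible approach, not a prescribed one: PauseGuard is a hypothetical class, and it updates its state optimistically when a request is built rather than waiting for the status response.

```python
class PauseGuard:
    """Hypothetical client-side tracker that skips redundant pause/resume requests."""

    def __init__(self):
        self.started = False
        self.paused = False

    def on_session_started(self):
        # Call this when the session_started event arrives.
        self.started = True
        self.paused = False

    def request(self, action):
        """Return the message to send, or None if the server would reject it."""
        if action not in ("pause", "resume"):
            raise ValueError("action must be pause or resume")
        if not self.started:
            return None  # would yield session_not_started
        if action == "pause" and self.paused:
            return None  # would yield session_already_paused
        if action == "resume" and not self.paused:
            return None  # would yield session_not_paused
        self.paused = action == "pause"
        return {"type": "voice-translation", "data": {"action": action}}
```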

stop - Stop Translation

Description

Stop speech recognition and end the session. The system automatically uploads the audio file and transcript and generates a summary (if configured).

Request Example

{
  "type": "voice-translation",
  "data": {
    "action": "stop"
  }
}

Successful Response

{
  "type": "voice-translation",
  "data": {
    "action": "status",
    "message": "Speech recognition stopped"
  }
}

After stopping, once the audio file and transcript have finished uploading, you receive a task_complete event containing task_id (the Recording UUID).


retranslate - Retranslate a Single Sentence

Description

Retranslate a specified sentence. This is useful when the source text has been corrected and the translation needs to be updated.

Request Parameters

Parameter | Type | Required | Description
action | string | Yes | Fixed value retranslate
sid | int | Yes | The number of the sentence to retranslate
translation_languages | string[] | Yes | Array of translation language codes
text | string | Yes | The source text to translate (the user's corrected text)

Request Example

{
  "type": "voice-translation",
  "data": {
    "action": "retranslate",
    "sid": 1,
    "translation_languages": ["en-US"],
    "text": "The user's corrected source text"
  }
}

Successful Response

A translation event is returned (sharing the same schema as normal translation results), and the translation result includes is_retranslation: true:

{
  "type": "voice-translation",
  "data": {
    "action": "translation",
    "translations": {
      "en-US": {
        "sid": 1,
        "text": "The new translation result",
        "is_final": true,
        "is_retranslation": true
      }
    }
  }
}

v1.5.6 documentation correction: Earlier documentation described retranslate returning action: "result", but on the wire it is actually action: "translation". If your client's dispatcher originally handled the result action, add a handler branch for the translation action.
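
A dispatcher branch for the translation action might look like the Python sketch below. It relies only on the fields shown in the response example; handle_translation and the transcript structure (a dict mapping sid to per-language text) are hypothetical.

```python
import json


def handle_translation(raw, transcript):
    """Apply a translation event to transcript (a dict: sid -> {language: text}).
    Returns the (sid, language) pairs that were retranslations."""
    message = json.loads(raw)
    data = message.get("data", {})
    if message.get("type") != "voice-translation" or data.get("action") != "translation":
        return []  # not a translation event; leave it to other handlers
    retranslated = []
    for language, item in data.get("translations", {}).items():
        # Normal and retranslated results share the same schema, so both
        # overwrite the stored text for that sentence and language.
        transcript.setdefault(item["sid"], {})[language] = item["text"]
        if item.get("is_retranslation"):
            retranslated.append((item["sid"], language))
    return retranslated
```

The returned list lets the caller distinguish a retranslate result from an ordinary translation, for example to highlight the updated sentence.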

Error Codes

Error Code | HTTP Status | Description | Recommended Action
retranslate_sid_not_found | 400 | The specified SID was not found | Make sure the SID exists
retranslate_session_not_active | 400 | The session is not started or has ended | Check the session status
retranslate_no_target_lang | 400 | No target language provided | Provide translation_languages
retranslate_no_text | 400 | No text to translate provided | Provide the text parameter
retranslate_llm_failed | 500 | Translation service failed | Retry later

switch_language - Switch Language

Description

Switch the language during real-time translation. The behavior depends on the recording type:

  • General mode (transcribe, etc.): switches the translation target language and automatically batch-retranslates all already-translated sentences
  • Two-way translation mode (conversation): switches the STT source language (the spoken language); the translation target switches automatically to the other language

Request Parameters

Parameter | Type | Required | Description
action | string | Yes | Fixed value switch_language
translation_languages | string[] | Conditional | Array of translation language codes (required in general mode)
transcription_languages | string[] | Conditional | The target language to switch to (two-way translation mode; if omitted, automatically toggles to the other language)

Request Example (General Mode)

{
  "type": "voice-translation",
  "data": {
    "action": "switch_language",
    "translation_languages": ["ja-JP"]
  }
}

Request Example (Two-Way Translation Mode)

Specify the switch target:

{
  "type": "voice-translation",
  "data": {
    "action": "switch_language",
    "transcription_languages": ["en-US"]
  }
}

Automatic toggle (no parameters):

{
  "type": "voice-translation",
  "data": {
    "action": "switch_language"
  }
}

Special behavior in two-way translation mode:

  • Two-way translation mode uses automatic language detection, so you usually don't need to switch the language manually
  • switch_language only updates the internal preference state
  • After a successful switch, a language_switched event is returned (not the language_switch_start/done sequence)
  • Switching to the same language returns a conversation_same_language warning

Response Sequence (General Mode)

After switching the language, you receive the following events in order:

  1. language_switch_start: notifies that the switch has started
{
  "type": "voice-translation",
  "data": {
    "action": "language_switch_start",
    "translation_language": "ja-JP",
    "total_segments": 15,
    "message": "Started switching language and retranslating"
  }
}
  2. batch_retranslation (multiple): returns retranslation results sentence by sentence
{
  "type": "voice-translation",
  "data": {
    "action": "batch_retranslation",
    "sid": 3,
    "translations": {
      "ja-JP": {
        "sid": 3,
        "text": "今日はプロジェクトの進捗について話し合いましょう",
        "is_final": true,
        "is_retranslation": true
      }
    }
  }
}
  3. language_switch_done: notifies that the switch is complete
{
  "type": "voice-translation",
  "data": {
    "action": "language_switch_done",
    "translation_language": "ja-JP",
    "success_count": 15,
    "failed_count": 0,
    "message": "Language switch complete"
  }
}
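
The general-mode event sequence can be followed with a small state tracker, for instance to drive a progress indicator. The Python sketch below is an illustration: LanguageSwitchTracker is a hypothetical class, it receives the data object of each event, and it assumes one batch_retranslation event per sentence, as the sequence description suggests.

```python
class LanguageSwitchTracker:
    """Hypothetical tracker for the language_switch_start / batch_retranslation /
    language_switch_done sequence."""

    def __init__(self):
        self.in_progress = False
        self.total = 0        # total_segments from language_switch_start
        self.received = 0     # batch_retranslation events seen so far
        self.summary = None   # (success_count, failed_count) once done

    def feed(self, data):
        """Consume one event's data object; return True while a switch is running."""
        action = data.get("action")
        if action == "language_switch_start":
            self.in_progress = True
            self.total = data.get("total_segments", 0)
            self.received = 0
            self.summary = None
        elif action == "batch_retranslation" and self.in_progress:
            self.received += 1
        elif action == "language_switch_done":
            self.in_progress = False
            self.summary = (data.get("success_count", 0), data.get("failed_count", 0))
        return self.in_progress
```

While feed returns True, a client could also hold back further switch_language requests, which would otherwise fail with switch_language_in_progress.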

Error Codes

Error Code | HTTP Status | Description | Recommended Action
switch_language_no_target | 400 | No target language provided | Provide translation_languages
switch_language_in_progress | 400 | The previous switch is not yet complete | Wait for the switch to complete
switch_language_same_target | 400 | The target language is the same as the current one | You can ignore this error
conversation_requires_two_languages | 400 | Two-way translation mode requires exactly two languages | Make sure transcription_languages has 2 entries
conversation_languages_identical | 400 | The two languages in two-way translation cannot be the same | Provide two different languages
conversation_invalid_language | 400 | Invalid two-way translation language | Make sure the language is in transcription_languages
conversation_same_language | 400 | Already the current language | You can ignore this warning

set_name - Set Recording Name

Description

Set the recording name during a recording. Once it is set, name_source changes to user, and the system will not override the name when the recording ends (even if the LLM generates a summary name, the user-set name takes precedence). For the full semantics and priority of name_source, see Recording Name Rules under start.

Request Parameters

Parameter | Type | Required | Description
action | string | Yes | Fixed value set_name
name | string | Yes | Recording name (max 60 characters)

Request Example

{
  "type": "voice-translation",
  "data": {
    "action": "set_name",
    "name": "Product Planning Meeting"
  }
}

Successful Response

{
  "type": "voice-translation",
  "data": {
    "action": "status",
    "message": "Recording name set"
  }
}

Error Codes

Error Code | HTTP Status | Description | Recommended Action
set_name_empty | 400 | Recording name is empty | Provide a non-empty name
set_name_too_long | 400 | Recording name exceeds the limit | Shorten the name (≤60 characters)
set_name_not_ready | 400 | The recording is not yet ready | Call after session_started
session_not_started | 400 | Speech recognition has not started | Call start first

rename_speaker - Globally Rename a Speaker

Description

In multi-speaker diarization mode (multi_speaker), globally rename a speaker. All sentences that use that speaker ID are updated in sync.

Request Parameters

Parameter | Type | Required | Description
action | string | Yes | Fixed value rename_speaker
speaker_id | string | Yes | The original speaker ID (e.g. Guest-1); also accepts the current display label for consecutive renames; max 100 characters
new_label | string | Yes | The new display label; max 100 characters, must not contain control characters (\x00-\x1F, \x7F) or line breaks

Request Example

{
  "type": "voice-translation",
  "data": {
    "action": "rename_speaker",
    "speaker_id": "Guest-1",
    "new_label": "Manager Wang"
  }
}

Successful Response

Returns the speaker_renamed event:

{
  "type": "voice-translation",
  "data": {
    "action": "speaker_renamed",
    "speaker_id": "Guest-1",
    "new_label": "Manager Wang",
    "affected_sids": [1, 3, 5, 8]
  }
}
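
The label constraints and the speaker_renamed event can be handled on the client roughly as follows. This Python sketch is an assumption-based illustration: is_valid_label and apply_speaker_renamed are hypothetical helpers, and labels_by_sid stands for whatever local view model maps sentence numbers to display labels.

```python
def is_valid_label(label):
    """Check new_label against the documented rules: non-empty, at most
    100 characters, no control characters (\\x00-\\x1F, \\x7F)."""
    if not label or len(label) > 100:
        return False
    # Line breaks (\n, \r) fall inside the \x00-\x1F range checked here.
    return not any(ord(ch) < 0x20 or ord(ch) == 0x7F for ch in label)


def apply_speaker_renamed(labels_by_sid, data):
    """Update the display label of every sentence listed in affected_sids."""
    for sid in data["affected_sids"]:
        labels_by_sid[sid] = data["new_label"]
    return labels_by_sid
```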

Error Codes

Error Code | HTTP Status | Description | Recommended Action
speaker_not_found | 400 | The specified speaker was not found | Make sure the speaker ID or alias exists
speaker_name_empty | 400 | The speaker name cannot be empty | Provide a valid name
speaker_name_duplicate | 422 | The speaker name is already in use | Use another name, or first rename the conflicting speaker
session_not_started | 400 | Speech recognition has not started | Call start first

reassign_speaker - Change the Speaker of a Single Sentence

Description

Change the speaker identity (OriginalSpeakerID) of a specific sentence, assigning the sentence to an existing speaker.

Request Parameters

Parameter | Type | Required | Description
action | string | Yes | Fixed value reassign_speaker
sid | int | Yes | The number of the sentence to change
target_speaker_id | string | Yes | The target speaker's original ID (taken from init_sentence.speaker_id; reassign does not accept display labels)

Request Example

{
  "type": "voice-translation",
  "data": {
    "action": "reassign_speaker",
    "sid": 5,
    "target_speaker_id": "Guest-2"
  }
}

Successful Response

Returns the speaker_reassigned event:

{
  "type": "voice-translation",
  "data": {
    "action": "speaker_reassigned",
    "sid": 5,
    "old_speaker_id": "Guest-1",
    "new_speaker_id": "Guest-2",
    "new_speaker_label": "Lisa Lee"
  }
}

Error Codes

Error Code | HTTP Status | Description | Recommended Action
speaker_sid_not_found | 400 | The specified sentence was not found | Make sure the SID exists
speaker_not_found | 400 | The target speaker does not exist | Use an existing speaker ID
speaker_name_empty | 400 | The target speaker ID cannot be empty | Provide a valid speaker ID
session_not_started | 400 | Speech recognition has not started | Call start first
invalid_parameter | 400 | Creating a new speaker is not supported | Use an existing speaker ID

merge_speakers - Merge Speakers

Description

Merge all sentences from one speaker into another. After merging, future recognition results from that speaker are also automatically converted to the target speaker.

Difference from reassign_speaker

Feature | Scope | Future Effect
reassign_speaker | A single sentence (1 SID) | None
merge_speakers | All sentences of that speaker | Future occurrences of the source are also automatically converted to the target

Request Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| action | string | Yes | Fixed value merge_speakers |
| source_speaker_id | string | Yes | The speaker ID to be merged (e.g. Guest-2) |
| target_speaker_id | string | Yes | The target speaker ID to merge into (e.g. Guest-1) |

Request Example

{
  "type": "voice-translation",
  "data": {
    "action": "merge_speakers",
    "source_speaker_id": "Guest-2",
    "target_speaker_id": "Guest-1"
  }
}

Successful Response

Returns the speakers_merged event:

{
  "type": "voice-translation",
  "data": {
    "action": "speakers_merged",
    "source_speaker_id": "Guest-2",
    "target_speaker_id": "Guest-1",
    "target_speaker_label": "Manager Wang",
    "affected_sids": [3, 5, 7]
  }
}

Error Codes

| Error Code | HTTP Status | Description | Recommended Action |
| --- | --- | --- | --- |
| speaker_not_found | 400 | The speaker does not exist | Make sure the speaker ID exists |
| merge_speakers_same_id | 400 | The source and target speakers are the same | Use different speaker IDs |
| speaker_name_empty | 400 | The speaker ID cannot be empty | Provide a valid speaker ID |
| session_not_started | 400 | Speech recognition has not started | Call start first |
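
A client that caches the transcript can use affected_sids from the speakers_merged event to update every merged sentence in one pass. The sketch below assumes a hypothetical local cache keyed by sid; the function name is illustrative only.

```python
def apply_speakers_merged(transcript: dict, event: dict) -> list:
    """Re-attribute every sentence listed in affected_sids to the target speaker.

    `transcript` is a hypothetical local cache: {sid: {"speaker_id": ..., ...}}.
    Returns the sids that were actually present and updated locally.
    """
    data = event["data"]
    updated = []
    for sid in data["affected_sids"]:
        sentence = transcript.get(sid)
        if sentence is None:
            continue  # sentence not cached locally; nothing to update
        sentence["speaker_id"] = data["target_speaker_id"]
        sentence["speaker_label"] = data["target_speaker_label"]
        updated.append(sid)
    return updated
```

Sentences recognized after the merge need no local rewriting, since the description above states that the server already attributes them to the target.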

tts_play - Play TTS

Description

In async mode, manually play the TTS audio of a specified sentence. Repeated requests for the same sid are supported (replay).

Two-way translation mode (conversation): tts_play automatically synthesizes the translation in the appropriate language based on the voice settings in tts_config; you don't need to specify tts_language separately.

Request Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| action | string | Yes | Fixed value tts_play |
| sid | int | Yes | The starting sentence ID |
| length | int | No | The number of sentences to play (default 1, max 20) |

Request Example (Single Sentence)

{
  "type": "voice-translation",
  "data": {
    "action": "tts_play",
    "sid": 5
  }
}

Request Example (Multiple Sentences)

{
  "type": "voice-translation",
  "data": {
    "action": "tts_play",
    "sid": 5,
    "length": 3
  }
}

Error Codes

| Error Code | HTTP Status | Description | Recommended Action |
| --- | --- | --- | --- |
| tts_not_enabled | 400 | TTS not enabled | Make sure TTS was enabled at start |
| tts_segment_not_found | 400 | The specified sentence was not found | Make sure the SID exists |
| tts_translation_not_found | 400 | The sentence has no translation in the specified language | Make sure the translation exists |
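
The request can be assembled with a small helper that enforces the documented bounds on length before sending. This is a sketch only; the helper name and the choice to omit length when it equals the default are assumptions.

```python
import json

MAX_TTS_PLAY_LENGTH = 20  # documented maximum for `length`


def build_tts_play(sid: int, length: int = 1) -> str:
    """Serialize a tts_play request, rejecting out-of-range lengths locally."""
    if not 1 <= length <= MAX_TTS_PLAY_LENGTH:
        raise ValueError(f"length must be between 1 and {MAX_TTS_PLAY_LENGTH}")
    data = {"action": "tts_play", "sid": sid}
    if length != 1:
        # `length` is optional and defaults to 1, so include it only when needed.
        data["length"] = length
    return json.dumps({"type": "voice-translation", "data": data})
```

Calling build_tts_play(5) twice yields the same message, which matches the replay behavior described above.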

tts_stop - Stop TTS

Description

Stop the currently playing TTS audio.

Request Example

{
  "type": "voice-translation",
  "data": {
    "action": "tts_stop"
  }
}

Successful Response

{
  "type": "voice-translation",
  "data": {
    "action": "status",
    "message": "TTS playback stopped"
  }
}

tts_mode - Switch TTS Mode

Description

Switch the TTS playback mode (synchronous/asynchronous) during a recording.

Request Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| action | string | Yes | Fixed value tts_mode |
| tts_mode | string | Yes | Mode: sync (synchronous) or async (asynchronous) |

Request Example

{
  "type": "voice-translation",
  "data": {
    "action": "tts_mode",
    "tts_mode": "async"
  }
}

Successful Response

Returns the tts_mode_changed event:

{
  "type": "voice-translation",
  "data": {
    "action": "tts_mode_changed",
    "tts_mode": "async"
  }
}

Error Codes

| Error Code | HTTP Status | Description | Recommended Action |
| --- | --- | --- | --- |
| tts_not_enabled | 400 | TTS not enabled | Make sure TTS was enabled at start |
| invalid_data | 422 | Invalid mode | Use sync or async |

set_tts - Two-Way Translation TTS Settings

Description

During two-way translation mode (conversation), toggle TTS on/off or update the TTS voice settings mid-session. Available only under the conversation type.

Request Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| action | string | Yes | Fixed value set_tts |
| tts_enabled | boolean | No | Toggle TTS on/off |
| tts_config | object | No | Update the TTS settings for a specific language (only the two languages of the two-way translation session are valid) |

Request Example (Disable TTS)

{
  "type": "voice-translation",
  "data": {
    "action": "set_tts",
    "tts_enabled": false
  }
}

Request Example (Update TTS Voice)

{
  "type": "voice-translation",
  "data": {
    "action": "set_tts",
    "tts_enabled": true,
    "tts_config": {
      "en-US": { "voice": "en-US-GuyNeural", "speaking_rate": 1.2 }
    }
  }
}

Successful Response

Returns the tts_updated event:

{
  "type": "voice-translation",
  "data": {
    "action": "tts_updated",
    "tts_enabled": true,
    "tts_config": {
      "zh-TW": { "voice": "zh-TW-HsiaoChenNeural", "speaking_rate": 1.0 },
      "en-US": { "voice": "en-US-GuyNeural", "speaking_rate": 1.2 }
    }
  }
}

Error Codes

| Error Code | HTTP Status | Description | Recommended Action |
| --- | --- | --- | --- |
| invalid_action | 400 | This operation is not supported outside two-way translation mode | Use only under the conversation type |
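
Because tts_config accepts only the two session languages, a client may want to check the keys before sending. The helper below is a sketch under that assumption; its name and the session_languages argument are hypothetical, and how the server reacts to other language keys is not specified here.

```python
def build_set_tts(session_languages, tts_enabled=None, tts_config=None) -> dict:
    """Build a set_tts message, including only the fields the caller provides."""
    data = {"action": "set_tts"}
    if tts_enabled is not None:
        data["tts_enabled"] = tts_enabled
    if tts_config:
        # Only the two languages of the two-way session are valid keys.
        unknown = sorted(set(tts_config) - set(session_languages))
        if unknown:
            raise ValueError(f"not a session language: {', '.join(unknown)}")
        data["tts_config"] = tts_config
    return {"type": "voice-translation", "data": data}
```

The returned dict can be serialized with json.dumps and sent over the existing connection.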

start_speaking - Start Speaking (Manual Mode)

Description

In two-way translation manual mode (conversation_mode: "manual"), notify the system that the user has started speaking. From this moment, audio is sent to STT for recognition, and all recognition results accumulate into the same sentence (no automatic segmentation).

Request Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| action | string | Yes | Fixed value start_speaking |
| speaker | int | Yes | User number (1 or 2) |

Request Example

{
  "type": "voice-translation",
  "data": {
    "action": "start_speaking",
    "speaker": 1
  }
}

Successful Response

{
  "type": "voice-translation",
  "data": {
    "action": "status",
    "message": "Started speaking"
  }
}

Error Codes

| Error Code | HTTP Status | Description | Recommended Action |
| --- | --- | --- | --- |
| invalid_action | 400 | Not in two-way translation mode | Use only under the conversation type |
| conversation_not_manual_mode | 400 | Not in manual mode | Use only in manual mode |
| conversation_speaking | 400 | Already speaking | Call stop_speaking first |
| conversation_invalid_speaker | 400 | Invalid user number | Use 1 or 2 |

stop_speaking - Stop Speaking (Manual Mode)

Description

In two-way translation manual mode, notify the system that the user has stopped speaking. The system merges the recognition results accumulated during this period into one complete sentence, then translates it and synthesizes TTS.

Request Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| action | string | Yes | Fixed value stop_speaking |

Request Example

{
  "type": "voice-translation",
  "data": {
    "action": "stop_speaking"
  }
}

Successful Response

After speaking stops, the system sends a complete result event (containing origin and translations):

{
  "type": "voice-translation",
  "data": {
    "action": "result",
    "origin": {
      "sid": 1,
      "language": "zh-TW",
      "text": "The complete sentence merged from all recognition during this period",
      "is_final": true,
      "speaker_id": "Speaker-1",
      "start_time": "00:05"
    },
    "translations": {
      "en-US": {
        "sid": 1,
        "text": "The complete merged sentence from this speaking period",
        "is_final": true
      }
    }
  }
}

Error Codes

| Error Code | HTTP Status | Description | Recommended Action |
| --- | --- | --- | --- |
| invalid_action | 400 | Not in two-way translation mode | Use only under the conversation type |
| conversation_not_speaking | 400 | Not in the speaking state | Call start_speaking first |
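
The start_speaking and stop_speaking pairing can be mirrored on the client so that obviously invalid calls are caught before they reach the server. The class below is a hypothetical sketch: it reuses the documented error codes as exception messages and updates its state optimistically, before the server confirms.

```python
class ManualTurnTracker:
    """Client-side mirror of the manual-mode speaking state (illustrative only)."""

    def __init__(self) -> None:
        self.current_speaker = None  # None means nobody is speaking

    def start_speaking(self, speaker: int) -> dict:
        if speaker not in (1, 2):
            raise ValueError("conversation_invalid_speaker")
        if self.current_speaker is not None:
            raise RuntimeError("conversation_speaking")
        self.current_speaker = speaker
        return {
            "type": "voice-translation",
            "data": {"action": "start_speaking", "speaker": speaker},
        }

    def stop_speaking(self) -> dict:
        if self.current_speaker is None:
            raise RuntimeError("conversation_not_speaking")
        self.current_speaker = None
        return {"type": "voice-translation", "data": {"action": "stop_speaking"}}
```

A push-to-talk button would call start_speaking on press and stop_speaking on release, sending each returned message over the connection.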

switch_conversation_mode - Switch Conversation Mode

Description

During two-way translation mode, switch between automatic detection mode (auto) and manual mode (manual). If the user is speaking during the switch, speaking ends automatically.

Request Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| action | string | Yes | Fixed value switch_conversation_mode |
| conversation_mode | string | Yes | Target mode: auto or manual |

Request Example

{
  "type": "voice-translation",
  "data": {
    "action": "switch_conversation_mode",
    "conversation_mode": "manual"
  }
}

Successful Response

Returns the conversation_mode_changed event:

{
  "type": "voice-translation",
  "data": {
    "action": "conversation_mode_changed",
    "conversation_mode": "manual"
  }
}

Error Codes

| Error Code | HTTP Status | Description | Recommended Action |
| --- | --- | --- | --- |
| invalid_action | 400 | Not in two-way translation mode | Use only under the conversation type |
| conversation_invalid_mode | 400 | Invalid conversation mode | Use auto or manual |
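
Since speaking ends automatically when the mode is switched, a client that tracks a speaking flag should clear it when conversation_mode_changed arrives. The sketch below assumes a hypothetical local state dict with conversation_mode and speaking keys.

```python
def on_conversation_mode_changed(state: dict, event: dict) -> dict:
    """Apply a conversation_mode_changed event to a hypothetical local state dict."""
    data = event["data"]
    if data.get("action") != "conversation_mode_changed":
        return state  # not the event this handler is responsible for
    state["conversation_mode"] = data["conversation_mode"]
    # Any in-progress speaking turn ends automatically on a mode switch.
    state["speaking"] = False
    return state
```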

set_speaker_language - Set Speaker Language

Description

During two-way translation mode, change a specified user's language in real time. The system rebuilds the STT connection to adapt to the new language, and the translation target is updated automatically. Transcript content before the change keeps its original language, and timestamps continue counting without resetting.

Request Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| action | string | Yes | Fixed value set_speaker_language |
| speaker | int | Yes | User number (1 or 2) |
| language | string | Yes | The new language code (e.g. ja-JP) |

Request Example

{
  "type": "voice-translation",
  "data": {
    "action": "set_speaker_language",
    "speaker": 1,
    "language": "ja-JP"
  }
}

Successful Response

Returns the speaker_language_changed event:

{
  "type": "voice-translation",
  "data": {
    "action": "speaker_language_changed",
    "speaker_language_map": {
      "1": "ja-JP",
      "2": "en-US"
    }
  }
}

Error Codes

| Error Code | HTTP Status | Description | Recommended Action |
| --- | --- | --- | --- |
| invalid_action | 400 | Not in two-way translation mode | Use only under the conversation type |
| conversation_invalid_speaker | 400 | Invalid user number | Use 1 or 2 |
| conversation_invalid_language | 400 | Invalid language code | Use a valid BCP 47 language code |
| conversation_same_language | 400 | Same as the current language | You can ignore this warning |
| conversation_language_same_as_peer | 400 | The new language is the same as the other user's | The two users cannot have the same language |
| conversation_speaking | 400 | Currently speaking, cannot change the language | End speaking first, then change |
| conversation_language_change_failed | 500 | Language change failed (STT rebuild failed) | Retry later |
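
Several of these errors can be predicted on the client from the last speaker_language_map it received. The function below is a sketch of such a pre-check; its name is hypothetical, the order of checks is an assumption (the server's order is not documented here), and language-code validity is left to the server.

```python
def precheck_set_speaker_language(language_map: dict, speaker: int,
                                  language: str, speaking: bool = False):
    """Return the error code a request would likely trigger, or None if it looks valid.

    `language_map` uses string keys, as in the speaker_language_changed event:
    {"1": "zh-TW", "2": "en-US"}.
    """
    if speaker not in (1, 2):
        return "conversation_invalid_speaker"
    if speaking:
        return "conversation_speaking"
    peer = 2 if speaker == 1 else 1
    if language_map.get(str(speaker)) == language:
        return "conversation_same_language"
    if language_map.get(str(peer)) == language:
        return "conversation_language_same_as_peer"
    return None
```

A return value of None means only that no local conflict was found; the server can still reject the request, for example with conversation_invalid_language.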

broadcast_go_live - Switch to the Live Phase

Description

Switch from the broadcast standby phase (standby) to the live phase (live). After the switch, STT/translation results begin broadcasting to viewers and start being written to the transcript.

Request Example

{
  "type": "voice-translation",
  "data": {
    "action": "broadcast_go_live"
  }
}

Successful Response

Returns the broadcast_phase_changed event:

{
  "type": "voice-translation",
  "data": {
    "action": "broadcast_phase_changed",
    "phase": "live",
    "message": "Broadcast started"
  }
}

Error Codes

| Error Code | HTTP Status | Description | Recommended Action |
| --- | --- | --- | --- |
| broadcast_not_enabled | 400 | Not in broadcast mode | Make sure the session was started with type: "broadcast" |

Note: If already in the live phase, a status message "Broadcast is already in progress" is returned; this is not treated as an error.


broadcast_announcement - Send an Announcement

Description

The host sends a custom announcement message to all viewers. Viewers receive an announcement event via SSE. The announcement message is automatically translated into all translation languages, and the SSE event viewers receive includes a translations field.

Request Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| action | string | Yes | Fixed value broadcast_announcement |
| message | string | Yes | The announcement message content |

Request Example

{
  "type": "voice-translation",
  "data": {
    "action": "broadcast_announcement",
    "message": "The meeting will end in 5 minutes"
  }
}

Successful Response

{
  "type": "voice-translation",
  "data": {
    "action": "status",
    "message": "Announcement sent"
  }
}

The SSE event received on the viewer side (with translations):

event: announcement
data: {"message":"The meeting will end in 5 minutes","translations":{"en-US":"The meeting will end in 5 minutes","ja-JP":"会議は5分後に終了します"}}

Error Codes

| Error Code | HTTP Status | Description | Recommended Action |
| --- | --- | --- | --- |
| broadcast_not_enabled | 400 | Not in broadcast mode | Make sure the session was started with type: "broadcast" |
| invalid_parameter | 400 | The message is empty | Provide a valid message parameter |
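
On the viewer side, the announcement frame can be parsed and localized with a few lines of code. The sketch below is a simplified SSE frame parser (it handles only the event and data fields of a single frame), and both function names are hypothetical. Falling back to the original message when no translation exists is an assumption about desirable client behavior.

```python
import json


def parse_sse_frame(frame: str):
    """Parse one SSE frame into (event_name, JSON payload). Simplified sketch."""
    event = "message"  # SSE default when no event field is present
    data_lines = []
    for line in frame.splitlines():
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].lstrip())
    # Multiple data lines are joined with newlines before decoding.
    return event, json.loads("\n".join(data_lines))


def localized_text(payload: dict, language: str) -> str:
    """Pick the viewer's translation, falling back to the original message."""
    return payload.get("translations", {}).get(language, payload["message"])
```

The same two helpers also apply to other viewer events that carry message and translations fields.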

set_standby_message - Set the Standby Phase Message

Description

Dynamically set the message displayed to viewers during the broadcast standby phase (standby). This lets the host enter standby mode first and set the waiting message afterward, rather than having to provide it at start.

The message is automatically translated into all translation languages, and the SSE event viewers receive includes a translations field.

Request Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| action | string | Yes | Fixed value set_standby_message |
| message | string | Yes | The standby phase display text (translated for viewers in each language via the translation pipeline) |

Request Example

{
  "type": "voice-translation",
  "data": {
    "action": "set_standby_message",
    "message": "The talk is about to begin, please wait..."
  }
}

Successful Response

{
  "type": "voice-translation",
  "data": {
    "action": "status",
    "message": "Standby phase text updated"
  }
}

The SSE event received on the viewer side (with translations):

event: standby
data: {"message":"The talk is about to begin, please wait...","translations":{"en-US":"The presentation is about to begin, please wait...","ja-JP":"プレゼンテーションがまもなく始まります。お待ちください..."}}

Error Codes

| Error Code | HTTP Status | Description | Recommended Action |
| --- | --- | --- | --- |
| broadcast_not_enabled | 400 | Not in broadcast mode | Make sure the session was started with type: "broadcast" |
| broadcast_not_in_standby | 400 | Not in the standby phase | Can only be used during the standby phase |

Note: This action can only be used during the standby phase (standby). If you have already entered the live phase (live), an error is returned.
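
A host client can track the broadcast phase locally to avoid sending set_standby_message after going live. The class below is a hypothetical sketch; it assumes the session begins in the standby phase unless told otherwise, and it reuses the documented error code as the exception message.

```python
class BroadcastPhaseTracker:
    """Tracks the broadcast phase from broadcast_phase_changed events (illustrative)."""

    def __init__(self, initial_phase: str = "standby") -> None:
        # Assumption: a broadcast session starts in the standby phase.
        self.phase = initial_phase

    def on_event(self, event: dict) -> None:
        data = event.get("data", {})
        if data.get("action") == "broadcast_phase_changed":
            self.phase = data["phase"]

    def build_set_standby_message(self, message: str) -> dict:
        if self.phase != "standby":
            raise RuntimeError("broadcast_not_in_standby")
        return {
            "type": "voice-translation",
            "data": {"action": "set_standby_message", "message": message},
        }
```

The tracker changes phase only when the server confirms it through broadcast_phase_changed, so a failed broadcast_go_live leaves the local state untouched.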


Version: V1.5.7 Last Updated: 2026-05-20

Copyright © 2026