Changelog
V1.5.7 (2026-05-20)
Documentation Update (No API Behavior Changes)
The public API behavior is completely unchanged. This release is a documentation supplement and wording revision.
New Usage Guide: Summary Prompt Customization
Added the Summary Prompt Customization Guide, consolidating the summary customization specs that were previously scattered across 6 reference documents into a single guide:
- The mutual-exclusion rules and use cases for the builtin and custom summary modes
- The corresponding fields for the three entry points: REST POST /api/v1/summary, the WebSocket start action, and SSE regenerate/summary
- Transcript record fields (including the summary_prompt_snapshot audit field and the summary_fallback_level / summary_dropped_segments fallback audit fields)
- A Profanity and Sensitive-Word Handling section, integrating the three paths (customer prompt -> neutral mode, transcript -> STT profanity_handling masking, transcript -> summary-layer segment omission) and explicitly stating that the API layer does not proactively reject requests containing sensitive words
- The built-in safety guard (content-neutralization guidance, prompt-injection protection) and character-length limits
- Complete examples for Node.js, Python, and WebSocket
The "Feature Guides" table in README.md now includes an entry for this guide.
Documentation Wording Revision
Refined public-facing wording: replaced internal/architecture-specific terms with neutral, customer-facing descriptions.
This revision is limited to the new guide guides/summary-customization.md. The remaining reference documents and historical changelog entries are kept as-is and will be aligned in subsequent releases.
Changelog Internal-Numbering Cleanup
- Removed an internal task number from the V1.5.4 section heading
- Removed an internal-policy reference from the V1.5.4 body text
Reference
V1.5.6 (2026-05-19)
Documentation Alignment Fixes (No API Behavior Changes)
This release is a documentation proofreading pass; the public API behavior is completely unchanged. If you previously implemented against the older documentation, please adjust to the current spec for the items below.
Token Formats
- broadcast_token: a 4-character short code (character set a-z0-9)
- viewer_access_token: a 64-character alphanumeric string (not a JWT, no payload structure; do not attempt to parse it)
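The two token shapes above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: the function names are hypothetical, and reading "alphanumeric" as `[A-Za-z0-9]` is an assumption.

```python
import re

# Patterns derived from the changelog wording. Treating "alphanumeric" as
# [A-Za-z0-9] is an assumption, not a documented guarantee.
BROADCAST_TOKEN_RE = re.compile(r"[a-z0-9]{4}")
VIEWER_ACCESS_TOKEN_RE = re.compile(r"[A-Za-z0-9]{64}")


def looks_like_broadcast_token(value: str) -> bool:
    """Return True if value is a 4-character code drawn from a-z0-9."""
    return BROADCAST_TOKEN_RE.fullmatch(value) is not None


def looks_like_viewer_access_token(value: str) -> bool:
    """Return True if value is a 64-character opaque string; it is never parsed."""
    return VIEWER_ACCESS_TOKEN_RE.fullmatch(value) is not None
```

A shape check like this only catches obvious typos early; the server remains the authority on whether a token is valid.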
HTTP Status Codes
- sse_missing_target_lang / sse_unsupported_language: 422
- broadcast_token_invalid (viewer verify endpoint): 401
Error Code Strings
- POST /api/v1/imports insufficient quota: stt_quota_exceeded
- Broadcast not found on viewer SSE: broadcast_session_not_found
- Broadcast at capacity on viewer SSE: broadcast_capacity_exceeded
- The context for the sse_translation_failed error event is sse
WebSocket Event Naming
- retranslate success event: action: "translation"
- Storage-layer upload failures: delivered via a type: "error" envelope (error_code is storage_upload_failed / storage_connection_failed / storage_queue_full); there is no separate upload_error action
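A client dispatcher that follows this naming might look like the sketch below. It assumes each inbound frame is a JSON object with top-level `type`, `error_code`, and `action` keys; the function name and the returned category labels are hypothetical.

```python
import json

# The three storage-layer error codes listed above.
STORAGE_ERROR_CODES = {
    "storage_upload_failed",
    "storage_connection_failed",
    "storage_queue_full",
}


def classify_ws_message(raw: str) -> str:
    """Classify an inbound frame as storage_error, other_error, translation, or other."""
    msg = json.loads(raw)
    if msg.get("type") == "error":
        # Storage failures arrive in the generic error envelope, not as an action.
        if msg.get("error_code") in STORAGE_ERROR_CODES:
            return "storage_error"
        return "other_error"
    if msg.get("action") == "translation":
        return "translation"
    return "other"
```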
Newly Documented Error Codes
| Endpoint / Action | Error Code | Description |
|---|---|---|
| WebSocket set_name | set_name_empty / set_name_too_long / set_name_not_ready | Replaces the older name_too_long |
| WebSocket audio | audio_process_failed | STT write fails repeatedly (HTTP 500; reconnecting is recommended) |
Reference
- reference/rest/viewer.md
- reference/websocket/events.md
- reference/websocket/voice-translation.md
- reference/sse/retranslate.md
V1.5.5 (2026-05-13)
Breaking Change: The Summary API Is Now Mode-Aware
The "template + custom_prompt combined" design introduced in V1.5.4 is now mutually exclusive: on each summary request you must choose either mode=builtin (apply the built-in template) or mode=custom (your prompt fully replaces the built-in template).
Clients must migrate: V1.5.4 clients that do not update their fields will receive a 422.
Unified New Fields Across the Three Entry Points: REST POST /api/v1/summary, SSE regenerate/summary, and the WebSocket start action
Old (V1.5.4) -> New (V1.5.5) mapping:
| Old Field | New Field | Notes |
|---|---|---|
| template / templateSlug / summary_template | Same name (builtin mode only) | Unchanged, but must not be sent in custom mode |
| custom_prompt / customPrompt / summary_custom_prompt | prompt / summary_prompt (custom mode only) | Renamed |
| custom_prompt_slug / customPromptSlug / summary_custom_prompt_slug | prompt_slug / summary_prompt_slug (custom mode only) | Renamed |
| persist_custom_prompt / persistCustomPrompt | (removed) | Custom mode always snapshots; no opt-in |
| custom_instructions | (removed) | Legacy field, no longer supported |
| (none) | mode / summary_mode (required) | New required field, enum builtin / custom |
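The mapping above can be applied mechanically to a REST payload. The sketch below uses the snake_case field names only, and the function name is hypothetical. Treating any payload that carried custom_prompt as custom mode is an assumption: a V1.5.4 request that combined a template with a custom prompt has no exact V1.5.5 equivalent, so you must decide which side to keep.

```python
REMOVED_OR_RENAMED = {
    "custom_prompt",
    "custom_prompt_slug",
    "persist_custom_prompt",
    "custom_instructions",
}


def migrate_summary_payload(old: dict) -> dict:
    """Convert a V1.5.4-style REST summary payload to V1.5.5 field names."""
    new = {k: v for k, v in old.items() if k not in REMOVED_OR_RENAMED}
    if old.get("custom_prompt"):
        # Assumed policy: a payload that carried a custom prompt becomes custom mode.
        new["mode"] = "custom"
        new["prompt"] = old["custom_prompt"]
        if "custom_prompt_slug" in old:
            new["prompt_slug"] = old["custom_prompt_slug"]
        new.pop("template", None)  # template must not be sent in custom mode
    else:
        # builtin mode still needs a template; this sketch does not invent one.
        new["mode"] = "builtin"
    return new
```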
Mutual-exclusion rules:
- mode=builtin: template is required; prompt / prompt_slug must not be sent
- mode=custom: prompt / prompt_slug is required; template must not be sent
- Violations -> 422 summary_mode_field_mismatch
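These rules are easy to pre-check on the client. The sketch below returns the error code the server would be expected to report, or None. The function name is hypothetical, and reading "prompt / prompt_slug is required" as "at least one of the two" is an assumption.

```python
from typing import Optional


def check_summary_mode(payload: dict) -> Optional[str]:
    """Return an expected error code, or None when the field combination looks valid."""
    mode = payload.get("mode")
    if mode not in ("builtin", "custom"):
        return "summary_invalid_mode"
    has_template = "template" in payload
    # Assumption: either prompt or prompt_slug satisfies the custom-mode requirement.
    has_prompt = "prompt" in payload or "prompt_slug" in payload
    if mode == "builtin" and (not has_template or has_prompt):
        return "summary_mode_field_mismatch"
    if mode == "custom" and (not has_prompt or has_template):
        return "summary_mode_field_mismatch"
    return None
```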
GET /api/v1/tasks/ Response Fields
Within data.tasks[]:
- Added summary_mode (builtin / custom / null)
- summary_template now returns the effective slug (in custom mode it returns your slug, identical to the prompt_slug you submitted)
- Removed summary_custom_prompt_slug (merged into summary_template)
Backward compatibility: recordings without a generated summary have summary_mode set to null; existing builtin-mode recordings keep their original summary_template value.
Transcript Record Structure Changes
New top-level fields (not nested under the summary object):
| Field | Description |
|---|---|
| summary_mode | builtin / custom |
| summary_template | Effective slug: builtin -> the built-in slug; custom -> your slug |
| summary_plain_text | bool |
| summary_prompt_snapshot | Present only in custom mode; the prompt content you passed in verbatim (not written in builtin mode) |
| summary_fallback_level | Present only when a fallback was triggered (value 2 or 3); indicates that this summary went through an automatic content-filter fallback path. Omitted when the summary succeeds directly |
| summary_dropped_segments | Present only when fallback_level=3; the indices of the transcript segments that were dropped (an array of integers in original order) |
In addition to the existing text, the init_summary event of GET /api/v1/sse/history/transcribe/{taskId} now adds mode / template / plain_text / prompt_snapshot (populated only in custom mode) for client traceability, plus fallback_level / dropped_segments (populated only when a fallback was triggered).
New Outbound WebSocket Events
- summary_done: summary generation completed (includes summary_mode / summary_template (effective) / summary_plain_text / tokens_used / summary_fallback_level / summary_dropped_segments; does not include final_content)
- summary_error: summary generation failed (includes error_code / message)
Clients no longer need to poll the transcript record to determine whether the summary is complete.
Automatic Content-Filter Fallback for Summaries
When a custom-mode prompt or transcript content triggers the LLM service's content filter (finish_reason=content_filter), the system handles it through an automatic multi-step fallback instead of failing outright. If some transcript segments still cannot be processed, they are omitted and reported via summary_dropped_segments. If even the fallback cannot produce a summary, a summary_error event is emitted with error_code=llm_content_filtered.
Client-side handling:
- Use summary_fallback_level to show a UI notice indicating the summary was produced through a content-filter fallback path
- Use summary_dropped_segments to inform the user which segments were actually omitted
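One way to turn the two fields into a user-facing notice is sketched below. It assumes the summary_done payload has already been parsed into a dict. The function name and the notice wording are hypothetical.

```python
from typing import Optional


def fallback_notice(event: dict) -> Optional[str]:
    """Build a UI notice from a summary_done payload, or None if no fallback ran."""
    level = event.get("summary_fallback_level")
    if level is None:
        # The field is omitted when the summary succeeded directly.
        return None
    notice = f"This summary was produced through a content-filter fallback (level {level})."
    dropped = event.get("summary_dropped_segments") or []
    if dropped:
        listed = ", ".join(str(i) for i in dropped)
        notice += f" Omitted transcript segments: {listed}."
    return notice
```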
Spec scope: In this release the fallback applies to two paths: WebSocket realtime summaries (auto-generated when a recording ends) and file-import summaries. Fallback integration for the SSE regenerate/summary endpoint is a follow-up; in the current version it still returns llm_content_filtered when blocked.
Custom-Mode Prompt Safety Rule (New in V1.5.5)
The built-in safety guard applied to custom-mode prompts now adds a rule instructing the LLM to "summarize the intent of any colloquial, emotional, or sensitive wording in the source in neutral, objective language, avoiding verbatim quotation or repetition." This rule is enforced by the backend and is not exposed for client configuration; its purpose is to reduce the chance of triggering the content filter on the first attempt.
The safety-guard content is never written to backend storage or logs. Your original prompt is still stored via the summary_prompt_snapshot field as an audit reference, complementing summary_fallback_level:
- summary_prompt_snapshot = your intent (the original prompt content)
- summary_fallback_level = the actual execution path taken by the automatic fallback
Custom-Mode Prompt-Injection Protection
In custom mode, the backend wraps the prompt you provide with prompt-injection protection to prevent instructions inside your prompt from overriding system rules. You should still avoid concatenating untrusted end-user input directly into prompt.
New Error Codes
| Error Code | HTTP | Trigger Condition |
|---|---|---|
| summary_invalid_mode | 422 (SSE) / 400 (others) | mode is not builtin / custom |
| summary_mode_field_mismatch | 422 / 400 | The mode and field combination is inconsistent (a required field is missing, or a forbidden field was sent) |
| summary_prompt_too_long | 422 / 400 | prompt exceeds 2000 characters |
| summary_prompt_slug_too_long | 422 / 400 | prompt_slug exceeds 64 characters |
| summary_prompt_slug_invalid | 422 / 400 | prompt_slug contains control characters (\n / \r / \t / \0, etc.) |
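The length and character limits in this table can be checked before submitting. The sketch below is a client-side approximation with a hypothetical function name. It counts Unicode code points with `len`, which may differ from how the server counts characters, and it treats Unicode category Cc as "control characters"; both are assumptions.

```python
import unicodedata
from typing import Optional

MAX_PROMPT_CHARS = 2000
MAX_SLUG_CHARS = 64


def check_prompt_fields(prompt: Optional[str], prompt_slug: Optional[str]) -> Optional[str]:
    """Return the expected error code for an out-of-limit field, or None."""
    if prompt is not None and len(prompt) > MAX_PROMPT_CHARS:
        return "summary_prompt_too_long"
    if prompt_slug is not None:
        if len(prompt_slug) > MAX_SLUG_CHARS:
            return "summary_prompt_slug_too_long"
        # Category "Cc" covers \n, \r, \t, \0 and the other control characters.
        if any(unicodedata.category(ch) == "Cc" for ch in prompt_slug):
            return "summary_prompt_slug_invalid"
    return None
```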
Client Recommendations
- Add the required mode field: change existing calls using templateSlug=meeting to mode=builtin&template=meeting
- Rename fields: customPrompt -> prompt, customPromptSlug -> promptSlug; these two fields are only used in mode=custom
- Remove persistCustomPrompt: custom mode preserves the prompt content automatically
- Change templateSlug to template, and only use it in mode=builtin
- Transcript records now use top-level fields, no longer nested under the summary object
- Clients can determine whether a summary was saved from the done event / summary_done event: check persisted: true/false; you no longer need to infer it from the HTTP method
Reference
- reference/rest/summary.md
- reference/sse/regenerate-summary.md
- reference/rest/summary-templates.md
- reference/websocket/voice-translation.md
- reference/websocket/events.md
V1.5.4 (2026-05-12)
New Feature: Customer Prompt Customization for Summaries
Enterprise customers can now add their own rules to the summary API without modifying the built-in template. This release adds three orthogonal client parameters and splits the summary regeneration endpoint into "preview" and "save" verbs, avoiding the design gap of an HTTP GET with side effects.
Fully backward compatible — not sending the new fields = behavior identical to the previous version.
New Fields for POST /api/v1/summary
| Field | Type | Limit | Description |
|---|---|---|---|
| custom_prompt | string | <=2000 characters | Customer custom instructions appended after the built-in template |
| custom_prompt_slug | string | <=64 characters, Unicode, no control characters | A client-defined template identifier (pass-through) |
| plain_text | bool | Default false | Request plain-text output (the backend performs Markdown post-processing) |
| persist_custom_prompt | bool | Default false | Opt-in: whether the done event echoes the custom_prompt content |
The SSE start / done events also add the corresponding fields (custom_prompt_slug, plain_text, final_content, custom_prompt_snapshot); see reference/rest/summary.md.
/api/v1/sse/regenerate/summary/{taskId} Split Into Two Endpoints
| Method | Purpose | Writes DB | Saves Transcript | Billed |
|---|---|---|---|---|
| GET | Preview (dry run, compare different prompt results) | No | No | Yes |
| POST | Save (official persistence) | Yes | Yes + bumps revision | Yes |
Client recommendation: If your integration previously relied on "the backend record updating automatically after a GET," switch to POST. GET is now a pure preview and no longer writes any backend state.
The done event adds a persisted: bool field, so clients can determine directly from the payload whether this call was saved, without inferring from the HTTP method.
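The preview/save split and the persisted flag could be wrapped as follows. This is a sketch with hypothetical helper names, and it assumes the done event payload has been parsed into a dict.

```python
from typing import Tuple


def regenerate_request(task_id: str, save: bool) -> Tuple[str, str]:
    """Return (HTTP method, path): GET for a preview, POST for a save."""
    method = "POST" if save else "GET"
    return method, f"/api/v1/sse/regenerate/summary/{task_id}"


def was_persisted(done_event: dict) -> bool:
    """Read the persisted flag from a done event; a missing flag counts as not saved."""
    return done_event.get("persisted") is True
```

Both calls are billed, so a preview loop that compares several prompts pays for each GET.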
Four New Fields for the WebSocket start Action
summary_custom_prompt / summary_custom_prompt_slug / summary_plain_text / summary_persist_custom_prompt, mapping one-to-one to the REST endpoint fields with the same limits.
New Endpoint: GET /api/v1/summary-templates/{slug}
Exposes the built-in template's full content so enterprise customers can reference the existing baseline when integrating and then decide what to add via custom_prompt.
GET /api/v1/summary-templates also adds a ?category=summary|medical|legal|all filter and a data[].category field in the response (default summary, backward compatible).
New Error Codes
| Error Code | HTTP | Trigger Condition |
|---|---|---|
| custom_prompt_too_long | 400 | custom_prompt exceeds 2000 characters |
| custom_prompt_slug_too_long | 400 | custom_prompt_slug exceeds 64 characters |
| custom_prompt_slug_invalid | 400 | custom_prompt_slug contains control characters |
| template_not_found | 404 | The template for the specified slug does not exist or is disabled |
| invalid_category | 400 | ?category= is not in the allowlist |
Behavior Changes
- summary_text_empty / summary_text_too_long HTTP status code fix: these previously fell through to 500 because they were not explicitly mapped; this release fixes them to a semantically correct 400.
- The POST /api/v1/summary error event details no longer includes the LLM raw error: the raw error goes only to the server log; the details returned to the client retains only the provider indicator.
- GET preview is still billed: the LLM actually consumes tokens, so the GET endpoint cannot be free. Repeated GET calls are billed repeatedly, but they do not change DB / Blob state.
Path and Field Naming Conventions
- customPromptSlug is a customer-defined pass-through identifier (semantically different from the existing templateSlug, which is validated for existence). In naming terms, the former is "for client traceability" and the latter is "for looking up the VAS built-in template."
- summary_custom_prompt_slug is recorded with each summary, so you can later query which customer template a summary corresponds to.
- custom_prompt_snapshot (opt-in) is stored in the transcript record only when the customer sets persist_custom_prompt=true; it is never written to the DB.
Security Controls
- All endpoints require API Key authentication
- The VAS server log does not log custom_prompt or the full transcript (it logs only the length and slug)
- LLM error messages are sanitized (the raw error is not exposed to the client)
- custom_prompt is fully isolated across tenants (session-scoped, no memory persistence)
Bug Fixes and Internal Improvements
- The WebSocket start action recording_id field deprecation target version is unified to V2.0.0 (events.md previously said V1.6.0, inconsistent with code comments)
- The SSE sse-api.md broken TOC anchor is fixed (it pointed to the audio section, but that content has been moved to the standalone reference/sse/audio.md)
- Improved text sanitization so CJK characters and emoji (including ZWJ sequences) are no longer mis-split or wrongly rejected
- The summary regeneration full text now has a 100,000-character upper limit
Reference
- reference/rest/summary.md (new)
- reference/rest/summary-templates.md (added ?category= and GET /{slug})
- reference/sse/regenerate-summary.md (rewritten as a two-endpoint GET / POST spec)
- reference/websocket/voice-translation.md (added the 4 summary_custom_prompt* fields + 3 error codes)
V1.5.3 (2026-05-07)
Breaking Change: speaker_id Naming Inversion
To support speaker editing, V1.3.12 added the original_speaker_id field to preserve the original ID, but it left a design gap where "the same name means different things at different stages": for WebSocket realtime recording, speaker_id is the original ID (e.g., Guest-1), but after an SSE historical audio load, speaker_id becomes the display name (e.g., Manager Wang, with the alias applied). Frontends often picked the wrong field and passed it to PATCH /speakers/reassign.
This release performs a one-time inversion that is not backward compatible:
| Old Name | New Name | Semantics |
|---|---|---|
| speaker_id (display name) | speaker_label | Display label (after alias is applied; mutable, human-readable) |
| original_speaker_id (original ID) | speaker_id | Original speaker ID (immutable, always stable) |
After the inversion, speaker_id consistently refers to the original ID in all contexts (WebSocket / SSE / REST / blob / log); the new speaker_label represents the display label after the alias is applied. Speaker editing (rename / reassign / merge) always uses speaker_id as the locating key.
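For client code that cached old-style SSE init_sentence payloads, the inversion can be applied with a small adapter. The sketch below uses a hypothetical function name and assumes the payload is a flat dict. It falls back to the old speaker_id when original_speaker_id is absent, mirroring the old-data fallback described for V1.3.12.

```python
def upgrade_sentence(old: dict) -> dict:
    """Rename pre-V1.5.3 speaker fields on an SSE init_sentence payload."""
    new = {k: v for k, v in old.items()
           if k not in ("speaker_id", "original_speaker_id")}
    # The old speaker_id held the display name, so it becomes the label.
    new["speaker_label"] = old.get("speaker_id")
    # The old original_speaker_id held the stable ID; fall back if it is missing.
    new["speaker_id"] = old.get("original_speaker_id") or old.get("speaker_id")
    return new
```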
REST API Field Changes
PATCH /api/v1/tasks/{taskId}/speakers/rename
| Location | Old Field | New Field |
|---|---|---|
| Request body | original_name | speaker_id (max 100 characters) |
| Request body | new_name | new_label (max 100 characters, no control characters \x00-\x1F / \x7F or newlines) |
| Response data | original_name | speaker_id |
| Response data | new_name | new_label |
speaker_id also accepts a display label, which allows chained renaming (e.g., first rename Guest-1 to "Manager Wang," then use "Manager Wang" to rename to "Director Wang"); the speaker_id in the response is always resolved to the original ID.
PATCH /api/v1/tasks/{taskId}/speakers/reassign
| Location | Old Field | New Field |
|---|---|---|
| Request body | target_speaker_id | Unchanged (semantics already aligned to the original ID) |
| Response data | new_speaker_name | new_speaker_label |
target_speaker_id must be the original ID (taken from init_sentence.speaker_id); reassign does not accept a display label.
PATCH /api/v1/tasks/{taskId}/speakers/merge
| Location | Old Field | New Field |
|---|---|---|
| Request body | source_speaker_id / target_speaker_id | Unchanged (still accepts the original ID or the current display label) |
| Response data | target_speaker_name | target_speaker_label |
WebSocket Event Changes
| Event | Old Field | New Field |
|---|---|---|
| rename_speaker action body | original_name / new_name | speaker_id / new_label |
| result event origin / translations[lang] | only speaker_id (mixed with display name) | speaker_id (original ID) + speaker_label (display label) |
| speaker_renamed event | original_name / new_name | speaker_id / new_label |
| speaker_reassigned event | new_speaker_name | new_speaker_label |
| speakers_merged event | (missing target label) | added target_speaker_label |
SSE Event Changes
| Event | Old Field | New Field |
|---|---|---|
| init_sentence | speaker_id (display name) + original_speaker_id (original ID) | speaker_id (original ID) + speaker_label (display label) |
| Broadcast viewer origin / translation | only speaker_id (mixed) | speaker_id + speaker_label |
| Broadcast viewer speaker_renamed / speaker_reassigned / speakers_merged | same as the corresponding WebSocket events | as above |
The behavior and fields of init_metadata.speaker_aliases (the "original ID -> display label" mapping) are unchanged.
Client Recommendations
- Customers using WebSocket realtime recording: before upgrading, sync the handling of result.origin.speaker_id and the new result.origin.speaker_label; change the rename body to { "speaker_id": "...", "new_label": "..." }
- Customers using SSE historical audio: init_sentence.speaker_id is now the original ID (previously the display name); switch to speaker_label for display
- Customers doing speaker editing (rename / reassign / merge):
  - rename -> use speaker_id (either the original ID or the current display label) + new_label
  - reassign -> target_speaker_id must be the original ID (taken from init_sentence.speaker_id; you cannot send a display label)
  - merge -> source_speaker_id / target_speaker_id can still be the original ID or the current display label
- Customers integrating TXT/SRT/CSV export: new_label now has control-character/newline validation; if you previously sent labels containing newlines, you will now receive a 422, so change to single-line content
- Customers who do not do speaker editing and only consume transcript text: the impact is minimal; the only behavior difference is that if old code rendered speaker_id directly as the display name, it must switch to speaker_label
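The rename body and the display rule from these recommendations can be sketched as two helpers. The names are hypothetical. The validation mirrors the documented new_label limits (max 100 characters, no \x00-\x1F / \x7F), counting characters with `len`, which is an assumption about how the server counts.

```python
import re

# Control characters \x00-\x1F and \x7F; this range already includes \n and \r.
_CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f]")
MAX_LABEL_CHARS = 100


def build_rename_body(speaker_id: str, new_label: str) -> dict:
    """Build the rename request body, rejecting labels the server would refuse."""
    if len(new_label) > MAX_LABEL_CHARS:
        raise ValueError("new_label must be at most 100 characters")
    if _CONTROL_CHARS.search(new_label):
        raise ValueError("new_label must be single-line with no control characters")
    return {"speaker_id": speaker_id, "new_label": new_label}


def display_name(sentence: dict) -> str:
    """Prefer speaker_label for rendering; fall back to the stable speaker_id."""
    return sentence.get("speaker_label") or sentence["speaker_id"]
```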
Data Compatibility
- Not backward compatible: old transcript blobs (V1.3.12 ~ V1.5.1, containing speaker + original_speaker_id) require migration before they can be read in the new version; there is no cross-version data retention commitment during the POC phase
- New recordings are unaffected: transcript blobs created after V1.5.3 use the new fields directly
Documentation Update
- reference/rest/speakers.md: the body / response of all three endpoints (rename / reassign / merge) are fully aligned
- rest-api.md L2280–2455: the speakers summary is aligned to the new fields
- reference/websocket/voice-translation.md L975–1127: the rename_speaker / reassign_speaker / merge_speakers actions
- reference/websocket/events.md L405–495: the speaker_renamed / reassigned / merged events
- websocket-api.md L1310–2552: both rename / reassign / merge sections (actions first, events second) are aligned
- reference/sse/history.md L135–195: the init_sentence schema + client recommendations
- reference/sse/broadcast-viewer.md L300–365, L605–630: viewer broadcast events + JS examples
- sse-api.md L240–475, L750–795: broadcast origin/translation + history init_sentence schema
- guides/speaker-management.md L140–390: examples + JS handler
- examples/curl.md, examples/python.md, examples/javascript.md: all rename / reassign / merge examples + TS interface
Reference
- REST - Speakers API
- WebSocket - Voice Translation
- WebSocket - Events
- SSE - Historical Audio Streaming
- Guide - Speaker Management
V1.5.1 (2026-05-07)
Bug Fix: POST /api/v1/imports Adds Length Validation for Terminology / Correction Fields
The length limits promised in several places in the documentation (e.g., a term's max of 100 characters) were previously not actually enforced on the file-import path, and overly long content was silently accepted. This release restores them, aligning behavior with the documentation's promises.
Behavior Changes (Aligning With Documented Promises)
POST /api/v1/imports adds 422 rejection conditions for the following fields (previously accepted):
| Field | Limit |
|---|---|
| terminology.<lang> | Array, max 500 terms (per language) |
| terminology.<lang>[].term | string, max 100 characters |
| terminology.<lang>[].boost | numeric, 0.5–5.0 (optional, default 1.0) |
| fuzzy_correction.<lang>[].correct | string, max 200 characters |
| fuzzy_correction.<lang>[].incorrect[] | string, max 200 characters |
These limits are consistent with the WebSocket config action; previously only the WebSocket path enforced them, and this release completes the file-import path.
Client Recommendations
If you previously sent overly long terms (>100 characters) via POST /api/v1/imports, you will now receive a 422. The frontend should check the length before submitting and prompt the user. The WebSocket path is unchanged.
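A pre-submit check along these lines could look like the sketch below. It assumes the request body shape implied by the field paths in the table (per-language arrays of objects) and counts characters with `len`. The function name and the message wording are hypothetical.

```python
from typing import List


def check_import_fields(terminology: dict, fuzzy_correction: dict) -> List[str]:
    """Return human-readable problems; an empty list means within documented limits."""
    problems = []
    for lang, terms in terminology.items():
        if len(terms) > 500:
            problems.append(f"terminology.{lang}: more than 500 terms")
        for i, entry in enumerate(terms):
            if len(entry["term"]) > 100:
                problems.append(f"terminology.{lang}[{i}].term: longer than 100 characters")
            boost = entry.get("boost", 1.0)  # optional, documented default 1.0
            if not 0.5 <= boost <= 5.0:
                problems.append(f"terminology.{lang}[{i}].boost: outside 0.5-5.0")
    for lang, rules in fuzzy_correction.items():
        for i, rule in enumerate(rules):
            if len(rule["correct"]) > 200:
                problems.append(f"fuzzy_correction.{lang}[{i}].correct: longer than 200 characters")
            if any(len(s) > 200 for s in rule.get("incorrect", [])):
                problems.append(f"fuzzy_correction.{lang}[{i}].incorrect: entry longer than 200 characters")
    return problems
```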
V1.5.0 (2026-05-07)
Internal Naming Unification (No Public API Changes)
Continuing the task_id naming unification started in V1.4.1, this release completes the transition of the internal protocol layer.
The public API (WebSocket, REST, Webhook, SSE) is completely unchanged, and customers need to take no action.
The old naming (recording_id) will be fully removed in V1.6.0; for the related client migration guidance, see V1.4.1 Client Recommendations.
V1.4.3 (2026-05-07)
Internal Observability-Layer Naming Unification (No Public API Changes)
Following V1.4.1, this release performs a naming-unification transition for the log and monitoring layers.
The public API is completely unchanged, and customers need to take no action.
V1.4.2 (2026-05-07)
Internal Code Naming Unification (No Public API Changes)
Following V1.4.1, this release advances the task_id naming to the backend code level.
The public API is completely unchanged, and customers need to take no action.
V1.4.1 (2026-05-06)
Naming Unification: task_id as the Cross-Interface Task Identifier
Previously, the same task had different field names across interfaces (WebSocket used recording_id, Webhook used task_id, and some REST path variables mixed {recordingId} / {taskId}), forcing integrators to reconcile the three naming schemes themselves. This release starts the naming-unification cycle; new integrations should use task_id consistently.
WebSocket Changes (Backward Compatible)
- The session_started event payload now carries both task_id and recording_id, and their values are exactly the same (the UUID of the same recording)
- The recording_id field is marked as Deprecated; it is still emitted normally and is scheduled for removal in V1.6.0
- Documentation enhancement: session_id is the WS connection-level identifier (invalidated when the connection ends), which is a different level from task_id (the task identifier)
REST API Changes (Backward Compatible)
Added /api/v1/tasks/{taskId}/... alias paths that behave exactly the same as the existing /api/v1/recordings/{recordingId}/...:
| Recommended (from V1.4.1) | Deprecated (removed in V1.6.0) |
|---|---|
| PATCH /api/v1/tasks/{taskId}/speakers/rename | PATCH /api/v1/recordings/{recordingId}/speakers/rename |
| PATCH /api/v1/tasks/{taskId}/speakers/reassign | PATCH /api/v1/recordings/{recordingId}/speakers/reassign |
| PATCH /api/v1/tasks/{taskId}/entries/{sid} | PATCH /api/v1/recordings/{recordingId}/entries/{sid} |
Client Recommendations
- New integrations: use the task_id field and the /api/v1/tasks/{taskId}/... paths consistently to avoid migrating again later
- Existing integrations: no immediate change required. recording_id and /api/v1/recordings/... remain available throughout the V1.x period; we recommend migrating on your schedule, at the latest before V1.6.0 ships
- ID alignment logic: if you depend on both WS and Webhook, you can align the WS task_id (or the old name recording_id) directly with the Webhook data.task_id; all three are the same UUID
- Do not use session_id for alignment: session_id is meaningful only within the WS connection lifecycle and does not appear in Webhook or REST
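The alignment rule can be expressed as a small lookup that prefers task_id and tolerates the deprecated name. This is a sketch with hypothetical function names, and it assumes both payloads are already parsed dicts.

```python
from typing import Optional


def task_id_from_session_started(payload: dict) -> Optional[str]:
    """Prefer task_id; fall back to the deprecated recording_id."""
    return payload.get("task_id") or payload.get("recording_id")


def same_task(ws_payload: dict, webhook_body: dict) -> bool:
    """Match a WS session_started payload to a Webhook body via data.task_id."""
    ws_id = task_id_from_session_started(ws_payload)
    # session_id is deliberately ignored: it only identifies the WS connection.
    return ws_id is not None and ws_id == webhook_body.get("data", {}).get("task_id")
```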
Removal Timeline Announcement (V1.6.0)
V1.6.0 will remove the recording_id field from the WS payload and remove the /api/v1/recordings/{recordingId}/... paths. The detailed timeline will be announced separately before V1.6.0 ships.
Unchanged Items
- Webhook payload: the existing data.task_id naming is unchanged
- Existing /api/v1/tasks/{taskId}/... endpoints: unchanged
V1.4.0 (2026-05-06)
New Feature: Source-Text Editing for Historical Recordings + Automatic Retranslation
Users can correct STT recognition errors and regenerate translations; for the workflow, see Entries API Typical Workflow.
- New endpoint PATCH /api/v1/recordings/{recordingId}/entries/{sid}: edit a single sentence's source text; on the first edit it automatically backs up the original STT output to original_text_raw, records original_text_edited_at, and clears the TTS cache for all languages of that sentence
- New endpoint GET /api/v1/sse/recordings/{taskId}/entries/{sid}/retranslate: retranslate a single sentence (you can specify languages or retranslate all existing languages), with optimistic locking (expectedRevision)
- Editing and retranslation are decoupled: PATCH only changes the source text and does not touch the translation; the frontend can decide when to trigger retranslation
Historical Record SSE Exposes Edit Markers
The historyTranscribe init_sentence event carries original_text_raw (the STT original) and original_text_edited_at on edited sentences, so the frontend can show an "edited" marker and a "restore original" function.
Security Fixes
- retranslate / retranslateSummary add a user filter: these two existing SSE endpoints previously had a horizontal privilege vulnerability (IDOR) that allowed reading other users' recordings. This release adds the permission check; other users' recordings now return recording_not_found.
- Retranslation / summary regeneration requires the recording to be completed: the four endpoints retranslate / retranslateSummary / retranslateEntry / regenerateSummary require processing_status === completed to avoid racing with the in-progress flow. When not completed, they return recording_not_completed.
New Error Codes
| Error Code | HTTP | Description |
|---|---|---|
| recording_not_completed | 422 | The recording has not finished processing; retranslation / editing / summary regeneration is not allowed |
| entry_not_found | 404 | The specified sentence was not found |
| entry_text_empty | 422 | The sentence's source text is empty |
| entry_text_too_long | 422 | The sentence's source text exceeds the 2000-character limit |
| transcript_revision_conflict | 409 | The transcript has been modified by another request (optimistic-lock conflict) |
See error-codes.md.
Client Recommendations
- After editing the STT source text: we recommend triggering single-sentence retranslation SSE immediately after the PATCH, passing the revision from the PATCH response as expectedRevision to avoid concurrent overwrites
- Showing the edit marker: determine whether a sentence has been edited by the presence of the original_text_raw field in the init_sentence event ('original_text_raw' in data); do not use text comparison (the user may edit and then change it back to the original value)
- Recording status: calling retranslation / editing / summary regeneration on a recording that is not completed returns recording_not_completed; the frontend should block these operations in the UI until processing_status === completed
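The edit-marker and status rules reduce to two predicates, sketched below with hypothetical names. The sketch assumes the init_sentence payload and the recording metadata are parsed dicts, and that the recording exposes its status under a processing_status key.

```python
def is_edited(init_sentence: dict) -> bool:
    """A sentence counts as edited whenever original_text_raw is present at all."""
    # Presence check only: text may equal the raw value if the user reverted an edit.
    return "original_text_raw" in init_sentence


def can_edit(recording: dict) -> bool:
    """Gate edit / retranslate / regenerate actions on processing_status."""
    return recording.get("processing_status") == "completed"
```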
V1.3.13 (2026-05-06)
Behavior Changes (Breaking Changes)
- WebSocket audio_format locked to pcm and webm: the previously accepted 5 formats (pcm / webm / mp3 / wav / m4a) are narrowed to accepting only pcm and webm, consistent with the existing spec in reference/websocket/voice-translation.md. Customers who send mp3 / wav / m4a will now receive audio_format_unsupported (previously these were silently decoded, which was undocumented implicit behavior). File imports still go through POST /api/v1/imports and are unaffected.
Documentation Update
- Audio download Content-Type is always audio/mp4: rest-api / SSE audio / tasks export / history playback / curl / javascript documentation in several places is unified to "all recording audio is returned in an M4A container (AAC encoding)," removing the previous circular "dynamically determined" description.
- Supported file-import formats narrowed to mp3 / wav / m4a: removed mentions of mp4 and webm from the documentation to align with the formats actually accepted (guides/file-import.md, reference/rest/imports.md).
Client Recommendations
- Customers using the WebSocket start action: be sure to explicitly specify audio_format as pcm or webm; if you previously relied on the undocumented implicit mp3 / wav / m4a support (very rare scenarios), switch to the File Import API.
- Customers downloading recording audio: all new recordings have Content-Type fixed to audio/mp4 with the .m4a extension. If older recordings still exist in storage, downloads may still return audio/webm; we recommend keeping a handling branch for the old extension to cover historical data.
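The "keep a branch for the old extension" advice can be implemented by choosing the file extension from the response Content-Type. The sketch below uses a hypothetical function name. Raising an error for any other media type is an assumption about how a client might prefer to fail.

```python
def audio_extension(content_type: str) -> str:
    """Map a download Content-Type to a file extension."""
    # Drop any parameters (e.g. "; charset=...") and normalize case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "audio/mp4":
        return ".m4a"   # all new recordings
    if media_type == "audio/webm":
        return ".webm"  # older recordings still in storage
    raise ValueError(f"unexpected audio Content-Type: {content_type!r}")
```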
Reference
- WebSocket - Voice Translation
- REST - Task Audio Export
- SSE - Historical Audio Streaming
- Guide - File Import
- Guide - History Playback
V1.3.12 (2026-05-04)
⚠️ Inverted in V1.5.3: the original_speaker_id field and the "speaker_id is the display name" design introduced in this version have been superseded by the naming inversion in V1.5.3. This section is kept as a historical record; new integrations should refer directly to the V1.5.3 spec and do not need to implement this version's client recommendations.
New Feature
- History SSE adds fields to align with the Transcribe speaker-editing UX: the historical record's init_metadata and init_sentence events each add a field, allowing the frontend to fully reuse the realtime recording page's speaker-editing menu (single-sentence reassignment + global rename).
  - init_metadata adds speaker_aliases (object): the "original speaker ID -> display name" mapping. When there are no aliases it is {} (an empty object, not an empty array). It lets the frontend perform a name-collision precheck before sending PATCH /speakers/rename, covering the implicit conflict of "an original ID that exists on the backend but does not appear on screen because it was renamed."
  - init_sentence adds original_speaker_id (string|null): the original speaker identifier without alias substitution, provided as the source for the target_speaker_id of PATCH /speakers/reassign.
  - Old-data fallback: if a pre-v2.24.0 old transcript record has no original_speaker_id, the SSE output automatically falls back to speaker_id, preventing the new field from being null and disabling the editing entry point for old recordings.
Behavior Changes
- No breaking change. Both fields are pure additions; existing SSE clients using Zod `z.object` (which strips unknown keys by default) will not fail to parse, so no version negotiation is needed.
Documentation Update
- sse-api.md L156-198: added the new field descriptions to the `init_metadata` / `init_sentence` examples and field tables
- reference/sse/history.md L103-180: added the detailed reference schema accordingly
Client Recommendations
- Customers doing speaker editing on the history detail page: get the original ID for reassign from `init_sentence.original_speaker_id` (do not use `speaker_id`, which is the display name with the alias applied); use `init_metadata.speaker_aliases` for the name-collision precheck before a rename.
- Customers who do not do speaker editing: you can ignore the new fields; existing parsing behavior is unaffected.
Reference
- SSE API - Historical Record Events
- Reference - Historical Record SSE
- Recording Speaker API
- Speaker Management Guide
V1.3.11 (2026-05-04)
Behavior Changes (Breaking Changes)
- STT rejects the bare `en` code (the V1.3.10 changelog claimed it was removed, but the removal was not actually in effect): customers who send `en` will receive a 422 `invalid_transcription_language`; use a full BCP 47 code such as `en-US` / `en-GB` instead.
- TTS removes 4 locales not supported by the speech provider: `it-CH`, `ar-IL`, `ar-PS`, `en-GH`. The speech provider's TTS never supported these 4 locales; previously, customers requesting their voices would fail at runtime on the provider side. STT still supports these 4 locales.
New Feature
- TTS completed to the speech provider's full set (154 languages, 325 voices): fully aligned with the speech provider's Monolingual Neural Voice list (including GA + Preview)
  - Chinese dialects (4 added): `zh-CN-henan`, `zh-CN-guangxi`, `zh-CN-liaoning`, `zh-CN-shaanxi`
  - South Asian languages (5 added): `bn-BD` Bengali (Bangladesh), `ta-LK` Tamil (Sri Lanka), `ta-MY` Tamil (Malaysia), `ta-SG` Tamil (Singapore), `ur-PK` Urdu (Pakistan)
  - Southeast Asian languages (1 added): `su-ID` Sundanese (Indonesia)
  - Eastern European languages (1 added): `sr-Latn-RS` Serbian (Latin script)
  - North American indigenous languages (2 added): `iu-Cans-CA` Inuktitut (Canadian syllabics), `iu-Latn-CA` Inuktitut (Canadian Latin script)
Documentation Update
- languages.md TTS section rewritten, explicitly noting:
  - Of the 145 STT locales, 141 are supported on both the STT and TTS sides; 4 (`it-CH`, `ar-IL`, `ar-PS`, `en-GH`) are STT-only
  - Of the 154 TTS locales, 13 are TTS-only (4 zh-CN dialects + 9 other languages)
- guides/tts.md numbers updated (142->154 languages, 304->325 voices)
- README.md TTS description updated
Verification Results
Compared against the speech provider's official STT and TTS voice list, VAS is fully aligned:
| Source | STT | TTS locale | TTS voice | Diarization |
|---|---|---|---|---|
| Provider official | 145 | 154 | 325 | 31 |
| VAS | 145 | 154 | 325 | 31 |
Client Recommendations
- Customers using the `en` short code: switch to `en-US` or another full BCP 47 code.
- Customers using `it-CH` / `ar-IL` / `ar-PS` / `en-GH` for TTS: these already failed on the provider side; switch to another locale in the same language family (e.g., `it-CH` -> `it-IT`, `ar-IL` -> `ar-SA`, `en-GH` -> `en-NG`). STT is unaffected.
- Customers who want to use the 13 new TTS-only locales: you can call `GET /api/v1/tts/voices?language=zh-CN-henan` etc. directly to get the voice list.
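The migration described above can be sketched as two small client-side helpers. The function names are hypothetical, `en-US` as the replacement for the bare `en` is just one of the options named above, and the fallback table contains only the three example mappings from this changelog (no fallback is suggested here for `ar-PS`, so the sketch refuses to guess one).

```python
# Locales removed from TTS in this release (STT still accepts them).
REMOVED_TTS_LOCALES = {"it-CH", "ar-IL", "ar-PS", "en-GH"}

# Example fallbacks taken from the recommendations above; extend as needed.
TTS_FALLBACKS = {"it-CH": "it-IT", "ar-IL": "ar-SA", "en-GH": "en-NG"}


def migrate_stt_language(code: str, default_english: str = "en-US") -> str:
    """Replace the bare 'en' short code with a full BCP 47 code."""
    return default_english if code == "en" else code


def migrate_tts_locale(locale: str) -> str:
    """Map a removed TTS locale to a fallback in the same language family."""
    if locale not in REMOVED_TTS_LOCALES:
        return locale
    if locale in TTS_FALLBACKS:
        return TTS_FALLBACKS[locale]
    # No fallback is listed in the changelog; the caller must choose one.
    raise ValueError(f"no TTS fallback configured for {locale!r}")
```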
Reference
V1.3.10 (2026-04-30)
Documentation Update
- languages.md number corrections
  - Total speech-recognition languages `119->145` (aligned with the main STT table)
  - Speech-translation support `117->143` (145 minus `jv-ID` Javanese and `wuu-CN` Wu Chinese)
This version has a residual issue; see V1.3.11: this version claimed "the bare `en` was removed and the language counts are fully consistent at 145," but the bare `"en"` was not actually removed (still 146), nor did it handle the TTS-side `it-CH` / `ar-IL` / `ar-PS` / `en-GH` (not supported by the provider's TTS) or the 13 missing TTS-only locales. The full alignment fix was completed in V1.3.11.
Client Recommendations
- This version is a documentation-only number correction and does not affect running integrations.
Reference
V1.3.9 (2026-04-29)
New Feature
- Webhook Secret Bootstrap flow: resolves the contradiction where a client cannot obtain the secret on first webhook integration. The Dashboard adds a "Generate Webhook Secret" button (lazy generation), letting users obtain the secret and configure it on the receiving end first, then go back and set the webhook URL. The probe sent when setting the URL is signed with a secret that both sides agree on, so it passes on the first try. This aligns with the mainstream industry pattern of Stripe / Shopify.
  - New endpoint: `POST /dashboard/api-keys/{id}/webhook/regenerate-secret` (Dashboard only, reuses the `webhook-update` rate limiter, 10/min/user)
  - Behavior: generates a 64-character random secret and writes it to the DB; does not send a probe and does not touch the webhook URL; returns the plaintext once via a flash session for the Dashboard to display
  - Regeneration impact: after execution, the old secret is invalidated immediately; existing receivers will get webhooks with mismatched signatures until they switch to the new secret
Behavior Changes
- Clearing the Webhook URL no longer clears the Secret: when `PATCH /dashboard/api-keys/{id}/webhook` sets `webhook_url` to null, `webhook_secret` is left unchanged. The Secret and URL now have independent lifecycles. A customer can generate the secret first and set the URL later; sending an empty URL in the meantime will not lose the secret.
- The Dashboard no longer returns webhook_secret in plaintext: `GET /dashboard/api-keys/{id}` now returns `webhook_secret_masked` (prefix mask + last 4 characters) and a `has_webhook_secret` boolean. The plaintext is shown only once via a flash session right after generation (aligned with Stripe).
Documentation Update
- guides/webhook.md: "Method 2: API Key-level webhook_url" rewritten as a two-step flow (generate secret -> set URL); added a Webhook Secret Lifecycle section; added a Bootstrap callout to the security-verification section.
Client Recommendations
- First integration: in the Dashboard, click "Generate Webhook Secret," copy it to the receiving end's `.env`, enable HMAC verification, and restart the service, then go back to the Dashboard and enter the webhook URL.
- Existing customers: fully compatible, no changes needed. Existing webhook_url and webhook_secret behavior is unchanged.
- Secret rotation: we recommend that the receiving end briefly accept both the old and new secrets; after the dashboard regeneration, remove the old secret once in-flight webhooks have finished processing.
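The rotation advice above can be sketched as a verifier that accepts a signature produced with any of the currently trusted secrets. This is only an illustration: the changelog does not state the signature scheme, so HMAC-SHA256 over the raw request body with a hex-encoded digest is an assumption, and `verify_webhook` is a hypothetical helper name. Check guides/webhook.md for the actual algorithm and header.

```python
import hashlib
import hmac
from typing import Iterable


def verify_webhook(body: bytes, signature_hex: str, secrets: Iterable[str]) -> bool:
    """Return True if the signature matches any currently trusted secret.

    Assumption: the signature is a hex HMAC-SHA256 digest of the raw body.
    """
    received = signature_hex.strip().lower().encode("utf-8")
    for secret in secrets:
        expected = hmac.new(secret.encode("utf-8"), body, hashlib.sha256)
        # compare_digest avoids leaking match position through timing.
        if hmac.compare_digest(expected.hexdigest().encode("utf-8"), received):
            return True
    return False
```

During rotation the receiver would pass both the new and the old secret, then drop the old one once in-flight webhooks have been processed.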
V1.3.8 (2026-04-27)
New Feature
- Translation-service-unavailable detection (session-level): added the error code `translation_service_unavailable`. When the LLM translation service fails consecutively up to a threshold, the backend emits a session-level error event once, so the frontend can show a global "translation temporarily unavailable" prompt instead of users seeing a page full of individual failed sentences in gray text.
  - Trigger conditions:
    - `llm_timeout` / `llm_provider_error` / `llm_rate_limit` / `llm_request_failed` escalate after 5 consecutive failures
    - `llm_auth_failed` / `llm_deployment_not_found` / `llm_quota_exceeded` escalate immediately after 1 occurrence (configuration/billing issues)
    - `llm_content_filtered` is not counted (a content issue, not a service issue)
  - Deduplication: each session is notified only once; any successful sentence translation resets the count, after which the event can be triggered again
  - payload: `type: "error"`, `severity: "error"` (not fatal; should not disconnect), does not carry `sid`, `details` contains `provider`, `last_error_code`, `fail_count`
  - Viewer notification: in broadcast mode, all viewers (regardless of language) also receive this event (via the SSE `event: error` channel)
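One way to model the trigger and deduplication rules above is a small per-session counter. This is a sketch for reasoning about the rules, not the backend's implementation; the class name is hypothetical, and two details are assumptions: immediate-escalation codes also increment the counter, and `llm_content_filtered` neither increments nor resets it.

```python
from dataclasses import dataclass

CONSECUTIVE_CODES = {
    "llm_timeout", "llm_provider_error", "llm_rate_limit", "llm_request_failed",
}
IMMEDIATE_CODES = {
    "llm_auth_failed", "llm_deployment_not_found", "llm_quota_exceeded",
}
THRESHOLD = 5  # consecutive failures before escalation


@dataclass
class TranslationHealth:
    fail_count: int = 0
    notified: bool = False

    def on_success(self) -> None:
        # Any successful sentence translation resets the streak and re-arms
        # the notification.
        self.fail_count = 0
        self.notified = False

    def on_failure(self, error_code: str) -> bool:
        """Return True when translation_service_unavailable should be emitted."""
        if error_code not in CONSECUTIVE_CODES | IMMEDIATE_CODES:
            return False  # e.g. llm_content_filtered is not counted
        self.fail_count += 1
        escalate = error_code in IMMEDIATE_CODES or self.fail_count >= THRESHOLD
        if escalate and not self.notified:
            self.notified = True
            return True
        return False
```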
Documentation Update (Spec Sync)
Following up on the spec gaps surfaced by frontend feedback since V1.3.7, this pass adds:
- error-codes.md (sentence-level error rule): added a sid-rule paragraph below the "Severity Levels" table, explicitly stating that "when an error carries `sid`, regardless of `severity`, it should be treated as a sentence-level error and should not disconnect." A `fatal` + `sid` combination only means that sentence failed severely; the session as a whole can still continue.
- error-codes.md (`translation_service_unavailable` error-code registration): added this error code and its full trigger-rule description to the "Translation Service Errors" section
- websocket-api.md: added a session-level translation error example (no sid, severity error) to the "Error Message Format" section
- sse-api.md (retranslate section adds the per-sid error rule): explicitly lists the spec and payload format for "a failed sentence is re-emitted as `event: error` with `sid` + `error_code`, interleaved with `translation`" (implemented in V1.3.7 but documented only in the reference subdirectory)
- reference/sse/broadcast-viewer.md: added a `translation_service_unavailable` example and a specific error-code entry
- reference/websocket/events.md: removed an obsolete translation_error action that the service never actually emitted; translation errors are delivered on the standard error channel (type: "error")
Client Recommendations
- Existing sentence-level error handling (`type: "error"` with `sid`) needs no changes.
- If you want to show a global "translation service unavailable" prompt, add a listener: when you receive `error_code === "translation_service_unavailable"` (without `sid`), show a banner / toast; clear it once any subsequent sentence translation succeeds (you receive a `translation` event again).
- Do not treat `translation_service_unavailable` as a disconnect signal: STT (the source text) continues to operate.
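The listener recommended above could look roughly like the following. It is a sketch under assumptions: messages are already decoded into dictionaries, the error code lives in an `error_code` key, and a successful sentence translation arrives with `type: "translation"`; the class name `TranslationBanner` is hypothetical.

```python
class TranslationBanner:
    """Tracks whether a global 'translation unavailable' banner is shown."""

    def __init__(self) -> None:
        self.visible = False

    def handle(self, message: dict) -> None:
        kind = message.get("type")
        if kind == "error":
            session_level = "sid" not in message
            if (session_level and
                    message.get("error_code") == "translation_service_unavailable"):
                # Show the banner but keep the connection open: STT continues.
                self.visible = True
            # Errors that carry a sid stay on the existing sentence-level path.
        elif kind == "translation":
            # Any successful sentence translation clears the banner.
            self.visible = False
```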
Reference:
- Error Code Reference
- WebSocket API – Error Message Format
- SSE API – retranslate Event Format
- reference/sse/retranslate.md
- reference/sse/broadcast-viewer.md – Specific Error Codes
V1.3.7 (2026-04-24)
Behavior Changes
- Realtime recording: silent tasks now follow the normal completion flow: when a realtime recording (WebSocket) is silent throughout, contains only noise, or yields no recognizable sentence, it now still produces an empty transcript (`entries: []`) and ends with a `task_complete` event. This behavior aligns with the V1.3.5 file-import flow; the realtime and import sources now share the semantics that zero recognition results is a legitimate `completed` outcome.
- SSE historical record: no longer returns `sse_transcript_not_found` in silent scenarios: `GET /api/v1/sse/history/transcribe/{taskId}` no longer returns the `sse_transcript_not_found` error for silent tasks; instead it sends the full event sequence (`init_metadata → init_summary(text='') → init_done(totalSentences=0)`). Clients should check `totalSentences === 0` to detect this and show a "no speech content" empty state.
Bug Fixes
- Fixed the History page getting stuck on "processing" for silent recordings: previously, if a realtime recording was silent throughout, the backend skipped the transcript upload due to the `segmentsCount == 0` condition, but `task_complete` still sent the `task_id`, causing the frontend to receive `sse_transcript_not_found` (semantically "not finished processing") when loading the historical record, leaving the UI stuck on loading forever. After the fix, the realtime path matches the import path and always uploads the transcript (even with empty entries).
Client Recommendations
- If you previously had "retry / polling" handling logic for `sse_transcript_not_found`, you may keep it as a defensive fallback (e.g., for blob upload delays), but you should no longer use it to determine "the task has no speech"; switch to `init_done.totalSentences === 0`.
- We recommend the UI prompt possible reasons when `totalSentences === 0` (volume too low, silent throughout, recognition language does not match the audio), consistent with the V1.3.5 import-scenario wording.
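The empty-state check recommended above can be sketched as follows. The sketch assumes the SSE stream has already been parsed into `(event_name, data)` pairs and that `totalSentences` sits at the top level of the `init_done` payload; the function name is hypothetical.

```python
from typing import Iterable, Optional, Tuple


def has_no_speech(events: Iterable[Tuple[str, dict]]) -> Optional[bool]:
    """Inspect history SSE events and report whether the task is silent.

    Returns True or False once init_done is seen, or None if the stream
    ended before init_done arrived (the transcript is not fully loaded).
    """
    for name, data in events:
        if name == "init_done":
            return data.get("totalSentences") == 0
    return None
```

A `True` result maps to the "no speech content" empty state; `None` is the case where retry or polling logic may still be useful as a defensive fallback.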
Documentation Update
- Historical Record SSE adds a "Boundary scenario: no speech content" section and corrects the handling-recommendation description for `sse_transcript_not_found`
Reference:
- Historical Record SSE
- Import Progress SSE
- File Import Guide – Behavior When Audio Cannot Be Recognized
V1.3.6 (2026-04-23)
New Feature
- Tasks API: added `POST /api/v1/tasks/{taskId}/force-fail`: force-marks as failed a task stuck in a non-terminal state (recording/importing/uploading/pending/processing)
  - The body can optionally include `reason` (max 500 characters)
  - Triggers the `recording.failed` webhook, with `payload.failure_source` set to `user_forced`
  - A task already in a terminal state returns `invalid_processing_status` (422)
- Tasks API: added `POST /api/v1/tasks/{taskId}/retry`: re-queues a task in the `failed` state for processing
  - Prerequisites: `processing_status = failed` and `audio_status = success` and `transcript_status = success`
  - Not meeting the prerequisites returns `invalid_processing_status` (422); the `details` field carries `audio_status` / `transcript_status` to help with diagnosis
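The `retry` prerequisites above can be checked on the client before calling the endpoint, which avoids a predictable 422. The sketch assumes the task is available as a dictionary with the three status fields; the helper name `retry_blockers` is hypothetical.

```python
RETRY_PREREQUISITES = {
    "processing_status": "failed",
    "audio_status": "success",
    "transcript_status": "success",
}


def retry_blockers(task: dict) -> dict:
    """Return the fields that prevent a retry, mapped to their current values.

    An empty result means the task meets the documented prerequisites.
    """
    return {
        field: task.get(field)
        for field, required in RETRY_PREREQUISITES.items()
        if task.get(field) != required
    }
```

Returning the offending fields rather than a boolean mirrors how the 422 response carries `audio_status` / `transcript_status` in `details` for diagnosis.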
Behavior Changes
- Error code `invalid_processing_status` (422) expanded scope: now also used as the common response for `force-fail` and `retry`; `details` carries `current_status`, and the `retry` scenario additionally carries `audio_status` and `transcript_status`
Documentation Update
- Tasks API adds documentation for the `force-fail` and `retry` endpoints
- Error Code Reference adds the `invalid_processing_status` entry and a "Processing Status Mismatch" subsection
Reference:
V1.3.5 (2026-04-22)
Behavior Optimization
- File import: empty recognition-result filtering: audio imports now filter out empty recognition results (caused by silence, very low volume, noise, or a language mismatch), so imports no longer produce empty 00:00 placeholder segments
- Zero recognition results is a legitimate `completed` status: in this scenario the import task still ends with `status: completed` (not `failed`), the `task_id` is produced normally, but the subsequently loaded transcript `entries` is an empty array and `segments_count` is `0`
- Budget deducted by actual duration: unrecognizable audio is still deducted from the monthly budget based on the audio duration (no refund)
Client Recommendations
- After loading the transcript (SSE `/api/v1/sse/history/transcribe/{taskId}`), if the cumulative sentence count is `0`, show a "no speech content was recognized in this audio" empty state
- Do not treat zero recognition results as an error branch; follow the completion branch and judge by the sentence count
- We recommend the UI also prompt possible reasons (volume too low, silent throughout, recognition language does not match the audio)
Documentation Update
- File Import Guide adds a "Behavior When Audio Cannot Be Recognized" section
- Imports API adds a `completed` boundary-scenario note under the `status` transition
- Import Progress SSE adds a behavior note for zero recognition results under the `completed` event
Reference:
V1.3.4 (2026-04-22)
New Feature
- Tasks API: added `GET /api/v1/tasks/{taskId}/transcript/export`: download a task's transcript, supporting five formats: `txt`, `srt`, `sbv`, `vtt`, `csv`
  - The output includes the source text and all translation languages
  - CSV starts with a UTF-8 BOM, with columns `index,start,end,speaker,text,<one column per translation language>` and times in `HH:MM:SS` (no milliseconds)
  - SRT times are `HH:MM:SS,mmm`; SBV times are `H:MM:SS.mmm`, with the source text and translations joined into a single line with `|`; VTT uses the `WEBVTT` header
  - The filename uses `{recording name}-transcript.{ext}` (RFC 5987 UTF-8 encoded)
  - Added the error code `recording_transcript_not_ready` (422)
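To make the three time formats above concrete, the following sketch renders one timestamp in each. It illustrates the formats only and is not the service's exporter; the function names are hypothetical, the input is assumed to be milliseconds, and truncating (rather than rounding) milliseconds for CSV is an assumption.

```python
from typing import Tuple


def _split(ms: int) -> Tuple[int, int, int, int]:
    """Split milliseconds into (hours, minutes, seconds, milliseconds)."""
    total_seconds, millis = divmod(ms, 1000)
    total_minutes, seconds = divmod(total_seconds, 60)
    hours, minutes = divmod(total_minutes, 60)
    return hours, minutes, seconds, millis


def srt_time(ms: int) -> str:
    h, m, s, milli = _split(ms)
    return f"{h:02d}:{m:02d}:{s:02d},{milli:03d}"  # HH:MM:SS,mmm


def sbv_time(ms: int) -> str:
    h, m, s, milli = _split(ms)
    return f"{h:d}:{m:02d}:{s:02d}.{milli:03d}"  # H:MM:SS.mmm


def csv_time(ms: int) -> str:
    h, m, s, _ = _split(ms)
    return f"{h:02d}:{m:02d}:{s:02d}"  # HH:MM:SS, milliseconds dropped
```

For 3,723,456 ms (1 h 2 min 3.456 s) these produce `01:02:03,456`, `1:02:03.456`, and `01:02:03` respectively.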
Behavior Changes (Breaking)
- Speaker diarization and multi-language mutual exclusion is now a hard rejection: when `recognition_mode: multi_speaker` is combined with multiple `transcription_languages`, it previously emitted a warning and automatically truncated to the first language; it now directly returns the `diarization_multilang_conflict` error and refuses to start
  - The error severity is changed from `warning` to `error`
  - The frontend must restrict "speaker diarization" and "multi-language" to one or the other before the user submits `start`, or handle this error and guide the user to adjust the settings
  - Affected endpoint: WebSocket `voice-translation / start`
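A frontend-side precheck for the rule above might look like this. The sketch assumes the `start` options are held as plain values named after the fields in this entry; the function name is hypothetical, and treating repeated entries of the same language as a single language is an assumption.

```python
from typing import Sequence


def has_diarization_conflict(recognition_mode: str,
                             transcription_languages: Sequence[str]) -> bool:
    """Return True if `start` would be rejected with
    diarization_multilang_conflict."""
    multiple_languages = len(set(transcription_languages)) > 1
    return recognition_mode == "multi_speaker" and multiple_languages
```

Running this before submitting `start` lets the UI ask the user to pick either speaker diarization or multiple languages instead of surfacing the server error.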
Documentation Update
- Tasks API adds the full `transcript/export` spec and output examples for all five formats
- Error Code Reference adds `recording_transcript_not_ready`
- curl, Python, JavaScript examples add a "Task Export" section
- README API reference table Tasks endpoint count updated from 8 to 9
Reference:
- Tasks API - transcript/export
- Error Code Reference
- WebSocket API Reference
- Speaker Diarization Guide
- Voice Translation Guide
V1.3.3 (2026-04-21)
New Documentation
- Tasks API: added the full documentation for the `GET /api/v1/tasks/{taskId}/audio/export` endpoint (the implementation existed but the documentation was missing), including parameters, dynamic Content-Type, error codes, and a frontend download example
- Explained the difference between this endpoint and SSE `/api/v1/sse/audio/{taskId}`: the former is for offline download (`Content-Disposition: attachment`), the latter is for playback (supports Range Requests)
Documentation Fixes
- Fixed the Voice Translation Actions translation-mode `speakers` field description table: the field name is corrected from `speaker` to `id`, consistent with the JSON example and the actual service behavior
- Fixed the README API reference table endpoint counts: Tasks from 7 to 8 (added audio/export), Broadcasts from 9 to 6 (the original count was wrong)
Reference:
V1.3.2 (2026-04-07)
Documentation Structure Adjustment
- Removed 3 deprecated old documents (`error-codes.md` V0.6, `languages.md` V0.1, `authentication.md` V0.1)
- Moved `appendix/error-codes.md` and `appendix/languages.md` to the root directory, replacing the deprecated versions
- Updated all cross-reference links
V1.3.1 (2026-03-26)
Batch Task Management
- Added `PUT /api/v1/tasks/batch/pin`: batch-update pin status, max 100 per call
- Added `DELETE /api/v1/tasks/batch`: batch-delete tasks, max 100 per call
- Both endpoints affect only tasks belonging to the current user; the response includes `affected_count`
Batch Broadcast Cancellation
- Added `DELETE /api/v1/broadcasts/batch`: batch-cancel broadcasts in the PENDING state, max 100 per call
- IDs not in the PENDING state are ignored; the response includes `affected_count`
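Because each batch endpoint above accepts at most 100 IDs per call, a client with a longer list needs to split it first. A minimal sketch, with a hypothetical helper name; sending each chunk and summing the returned `affected_count` values is left to the caller.

```python
from typing import Iterator, List, Sequence

MAX_BATCH_SIZE = 100  # per-call limit stated for the batch endpoints


def chunk_ids(ids: Sequence[str], size: int = MAX_BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of `ids`, each no longer than `size`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])
```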
Reference:
Version: V1.5.7 Last Updated: 2026-05-20