Appendix

Changelog

V1.5.7 (2026-05-20)

Documentation Update (No API Behavior Changes)

The public API behavior is completely unchanged. This release is a documentation supplement and wording revision.

New Usage Guide: Summary Prompt Customization

Added the Summary Prompt Customization Guide, consolidating the summary customization specs that were previously scattered across 6 reference documents into a single guide:

The mutual-exclusion rules and use cases for the builtin and custom summary modes
The corresponding fields for the three entry points: REST POST /api/v1/summary, the WebSocket start action, and SSE regenerate/summary
Transcript record fields (including the summary_prompt_snapshot audit field and the summary_fallback_level / summary_dropped_segments fallback audit fields)
A Profanity and Sensitive-Word Handling section, integrating the three paths (customer prompt -> neutral mode, transcript -> STT profanity_handling masking, transcript -> summary-layer segment omission) and explicitly stating that the API layer does not proactively reject requests containing sensitive words
The built-in safety guard (content-neutralization guidance, prompt-injection protection) and character-length limits
Complete examples for Node.js, Python, and WebSocket

The "Feature Guides" table in README.md now includes an entry for this guide.

Documentation Wording Revision

Refined public-facing wording: replaced internal/architecture-specific terms with neutral, customer-facing descriptions.

This revision is limited to the new guide guides/summary-customization.md. The remaining reference documents and historical changelog entries are kept as-is and will be aligned in subsequent releases.

Changelog Internal-Numbering Cleanup

Removed an internal task number from the V1.5.4 section heading
Removed an internal-policy reference from the V1.5.4 body text

This release is a documentation proofreading pass; the public API behavior is completely unchanged. If you previously implemented against the older documentation, please adjust to the current spec for the items below.

Token Formats

broadcast_token: a 4-character short code (character set a-z0-9)
viewer_access_token: a 64-character alphanumeric string (not a JWT, no payload structure; do not attempt to parse it)
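
As an illustration of the two formats above, a client could run a shape check before sending a token. This is a minimal sketch, not part of the API: the function names are hypothetical, and "alphanumeric" is assumed here to mean ASCII letters and digits.

```python
import re

# Patterns derived from the token formats listed above.
BROADCAST_TOKEN_RE = re.compile(r"[a-z0-9]{4}")
# Assumption: "alphanumeric" means ASCII A-Z, a-z, 0-9.
VIEWER_ACCESS_TOKEN_RE = re.compile(r"[A-Za-z0-9]{64}")


def is_valid_broadcast_token(token: str) -> bool:
    """Return True if token is a 4-character code drawn from a-z0-9."""
    return BROADCAST_TOKEN_RE.fullmatch(token) is not None


def is_valid_viewer_access_token(token: str) -> bool:
    """Return True if token is 64 alphanumeric characters.

    The token is treated as opaque: only its shape is checked, nothing is decoded.
    """
    return VIEWER_ACCESS_TOKEN_RE.fullmatch(token) is not None
```

A shape check of this kind only catches obviously malformed input; the server remains the authority on whether a token is valid.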

HTTP Status Codes

sse_missing_target_lang / sse_unsupported_language: 422
broadcast_token_invalid (viewer verify endpoint): 401

Error Code Strings

POST /api/v1/imports insufficient quota: stt_quota_exceeded
Broadcast not found on viewer SSE: broadcast_session_not_found
Broadcast at capacity on viewer SSE: broadcast_capacity_exceeded
The context for the sse_translation_failed error event is sse

WebSocket Event Naming

retranslate success event: action: "translation"
Storage-layer upload failures: delivered via a type: "error" envelope (error_code is storage_upload_failed / storage_connection_failed / storage_queue_full); there is no separate upload_error action
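
The naming above could be handled in a client dispatcher roughly as follows. This is a sketch under assumptions: it assumes each WebSocket message is a JSON object with top-level type, action, and error_code keys, and the function name and return labels are hypothetical.

```python
import json

# Storage-layer error codes named in the changelog entry above.
STORAGE_ERROR_CODES = {
    "storage_upload_failed",
    "storage_connection_failed",
    "storage_queue_full",
}


def classify_ws_message(raw: str) -> str:
    """Classify an inbound message into a coarse category for dispatching."""
    msg = json.loads(raw)
    if msg.get("type") == "error":
        # Storage failures arrive in the error envelope, not as upload_error.
        if msg.get("error_code") in STORAGE_ERROR_CODES:
            return "storage_error"
        return "error"
    if msg.get("action") == "translation":
        # A successful retranslate is reported with action "translation".
        return "translation"
    return "other"
```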

Newly Documented Error Codes

Endpoint / Action	Error Code	Description
WebSocket `set_name`	`set_name_empty` / `set_name_too_long` / `set_name_not_ready`	Replaces the older `name_too_long`
WebSocket `audio`	`audio_process_failed`	STT write fails repeatedly (HTTP 500; reconnecting is recommended)

V1.5.5 (2026-05-13)

Breaking Change: The Summary API Is Now Mode-Aware

The "template + custom_prompt combined" design introduced in V1.5.4 is now mutually exclusive: on each summary request you must choose either mode=builtin (apply the built-in template) or mode=custom (your prompt fully replaces the built-in template).

Clients must migrate: V1.5.4 clients that do not update their fields will receive a 422.

Unified New Fields Across the Three Entry Points: REST `POST /api/v1/summary`, SSE `regenerate/summary`, and the WebSocket `start` action

Old (V1.5.4) -> New (V1.5.5) mapping:

Old Field	New Field	Notes
`template` / `templateSlug` / `summary_template`	Same name (builtin mode only)	Unchanged, but must not be sent in custom mode
`custom_prompt` / `customPrompt` / `summary_custom_prompt`	`prompt` / `summary_prompt` (custom mode only)	Renamed
`custom_prompt_slug` / `customPromptSlug` / `summary_custom_prompt_slug`	`prompt_slug` / `summary_prompt_slug` (custom mode only)	Renamed
`persist_custom_prompt` / `persistCustomPrompt`	(removed)	Custom mode always snapshots; no opt-in
`custom_instructions`	(removed)	Legacy field, no longer supported
(none)	`mode` / `summary_mode` (required)	New required field, enum `builtin` / `custom`
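
The mapping in the table can be sketched as a small migration helper for a REST request body. This is illustrative only: the function name is hypothetical, and the rule "choose custom mode whenever custom_prompt is present" is an assumption, since a V1.5.4 request that combined template and custom_prompt has no exact V1.5.5 equivalent.

```python
# Fields removed in V1.5.5 and fields renamed for custom mode (REST naming).
REMOVED_FIELDS = {"persist_custom_prompt", "custom_instructions"}
RENAMED_FIELDS = {"custom_prompt": "prompt", "custom_prompt_slug": "prompt_slug"}


def migrate_summary_body(old: dict) -> dict:
    """Convert a V1.5.4-style request body into a V1.5.5-style body."""
    handled = REMOVED_FIELDS | set(RENAMED_FIELDS) | {"template"}
    # Carry over every field that the mode change does not affect.
    body = {k: v for k, v in old.items() if k not in handled}
    if old.get("custom_prompt"):
        # Assumption: a request carrying custom_prompt becomes mode=custom,
        # and the template is dropped because it must not be sent in custom mode.
        body["mode"] = "custom"
        for old_name, new_name in RENAMED_FIELDS.items():
            if old_name in old:
                body[new_name] = old[old_name]
    elif old.get("template"):
        body["mode"] = "builtin"
        body["template"] = old["template"]
    else:
        raise ValueError("cannot infer mode: neither template nor custom_prompt is set")
    return body
```

Because a custom-mode prompt fully replaces the built-in template, a prompt that was written as an addition to a template may need to be reworded before it is sent in custom mode.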

Mutual-exclusion rules:

mode=builtin: template is required; prompt / prompt_slug must not be sent
mode=custom: prompt / prompt_slug is required; template must not be sent
Violations -> 422 summary_mode_field_mismatch
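
A client can mirror these rules locally to catch mistakes before a request is sent. The sketch below is not the server's implementation; the function name is hypothetical, and "prompt / prompt_slug is required" is read here as "at least one of the two is present", which is an assumption.

```python
from typing import Optional


def check_summary_mode_fields(body: dict) -> Optional[str]:
    """Return the error code the rules above imply, or None if the body is consistent."""
    mode = body.get("mode")
    if mode not in ("builtin", "custom"):
        return "summary_invalid_mode"
    has_template = "template" in body
    # Assumption: either prompt or prompt_slug satisfies the custom-mode requirement.
    has_custom = "prompt" in body or "prompt_slug" in body
    if mode == "builtin" and (not has_template or has_custom):
        return "summary_mode_field_mismatch"
    if mode == "custom" and (not has_custom or has_template):
        return "summary_mode_field_mismatch"
    return None
```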

`GET /api/v1/tasks/` Response Fields

Within data.tasks[]:

Added summary_mode (builtin / custom / null)
summary_template now returns the effective slug (in custom mode it returns your slug, identical to the prompt_slug you submitted)
Removed summary_custom_prompt_slug (merged into summary_template)

Backward compatibility: recordings without a generated summary have summary_mode set to null; existing builtin-mode recordings keep their original summary_template value.

Transcript Record Structure Changes

New top-level fields (not nested under the summary object):

Field	Description
`summary_mode`	builtin / custom
`summary_template`	effective slug — builtin -> the built-in slug; custom -> your slug
`summary_plain_text`	bool
`summary_prompt_snapshot`	Present only in custom mode; the prompt content you passed in verbatim (not written in builtin mode)
`summary_fallback_level`	Present only when a fallback was triggered (value `2` or `3`); indicates that this summary went through an automatic content-filter fallback path. Omitted when the summary succeeds directly
`summary_dropped_segments`	Present only when fallback_level=3; the indices of the transcript segments that were dropped (an array of integers in original order)

In addition to the existing text, the init_summary event of GET /api/v1/sse/history/transcribe/{taskId} now adds mode / template / plain_text / prompt_snapshot (populated only in custom mode) for client traceability, plus fallback_level / dropped_segments (populated only when a fallback was triggered).

New Outbound WebSocket Events

summary_done: summary generation completed (includes summary_mode / summary_template (effective) / summary_plain_text / tokens_used / summary_fallback_level / summary_dropped_segments; does not include final_content)
summary_error: summary generation failed (includes error_code / message)

Clients no longer need to poll the transcript record to determine whether the summary is complete.

Automatic Content-Filter Fallback for Summaries

When a custom-mode prompt or transcript content triggers the LLM service's content filter (finish_reason=content_filter), the system handles it through an automatic multi-step fallback instead of failing outright. If some transcript segments still cannot be processed, they are omitted and reported via summary_dropped_segments. If even the fallback cannot produce a summary, a summary_error event is emitted with error_code=llm_content_filtered.

Client-side handling:

Use summary_fallback_level to show a UI notice indicating the summary was produced through a content-filter fallback path
Use summary_dropped_segments to inform the user which segments were actually omitted
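
One way to turn the two audit fields into a user-facing notice is sketched below. The function name and the notice wording are hypothetical; the input is assumed to be a transcript record or summary_done payload already parsed into a dict.

```python
from typing import Optional


def fallback_notice(record: dict) -> Optional[str]:
    """Build a UI notice from the fallback audit fields, or None if no fallback ran."""
    level = record.get("summary_fallback_level")
    if level is None:
        # The field is omitted when the summary succeeded directly.
        return None
    notice = f"This summary was produced through a content-filter fallback (level {level})."
    dropped = record.get("summary_dropped_segments") or []
    if dropped:
        listed = ", ".join(str(index) for index in dropped)
        notice += f" Omitted transcript segments: {listed}."
    return notice
```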

Spec scope: In this release the fallback applies to two paths: WebSocket realtime summaries (auto-generated when a recording ends) and file-import summaries. Fallback integration for the SSE regenerate/summary endpoint is a follow-up; in the current version it still returns llm_content_filtered when blocked.

Custom-Mode Prompt Safety Rule (New in V1.5.5)

The built-in safety guard applied to custom-mode prompts now adds a rule instructing the LLM to "summarize the intent of any colloquial, emotional, or sensitive wording in the source in neutral, objective language, avoiding verbatim quotation or repetition." This rule is enforced by the backend and is not exposed for client configuration; its purpose is to reduce the chance of triggering the content filter on the first attempt.

The safety-guard content is never written to backend storage or logs. Your original prompt is still stored via the summary_prompt_snapshot field as an audit reference, complementing summary_fallback_level:

summary_prompt_snapshot = your intent (the original prompt content)
summary_fallback_level = the actual execution path taken by the automatic fallback

Custom-Mode Prompt-Injection Protection

In custom mode, the backend wraps the prompt you provide with prompt-injection protection to prevent instructions inside your prompt from overriding system rules. You should still avoid concatenating untrusted end-user input directly into prompt.

New Error Codes

Error Code	HTTP	Trigger Condition
`summary_invalid_mode`	422 (SSE) / 400 (others)	`mode` is not `builtin` / `custom`
`summary_mode_field_mismatch`	422 / 400	The mode and field combination is inconsistent (a required field is missing, or a forbidden field was sent)
`summary_prompt_too_long`	422 / 400	`prompt` exceeds 2000 characters
`summary_prompt_slug_too_long`	422 / 400	`prompt_slug` exceeds 64 characters
`summary_prompt_slug_invalid`	422 / 400	`prompt_slug` contains control characters (`\n` / `\r` / `\t` / `\0`, etc.)
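
The length and character limits in the table can be pre-checked on the client, as in the sketch below. Two assumptions apply: characters are counted as Unicode code points via len(), which may differ from how the server counts, and "control characters" is approximated by the Unicode category Cc. The function name is hypothetical.

```python
import unicodedata
from typing import Optional

MAX_PROMPT_CHARS = 2000
MAX_SLUG_CHARS = 64


def precheck_custom_prompt(prompt: str, prompt_slug: str = "") -> Optional[str]:
    """Return the error code a request would likely trigger, or None if it passes."""
    if len(prompt) > MAX_PROMPT_CHARS:
        return "summary_prompt_too_long"
    if len(prompt_slug) > MAX_SLUG_CHARS:
        return "summary_prompt_slug_too_long"
    # Category "Cc" covers \n, \r, \t, \0 and the other C0/C1 control characters.
    if any(unicodedata.category(ch) == "Cc" for ch in prompt_slug):
        return "summary_prompt_slug_invalid"
    return None
```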

Client Recommendations

Add the required mode field — change existing calls using templateSlug=meeting to mode=builtin&template=meeting
Rename fields — customPrompt -> prompt, customPromptSlug -> promptSlug; these two fields are only used in mode=custom
Remove persistCustomPrompt — custom mode preserves the prompt content automatically
Change templateSlug to template — and only use it in mode=builtin
Transcript records now use top-level fields — no longer nested under the summary object
Clients can determine whether a summary was saved from the done event / summary_done event — check persisted: true/false; you no longer need to infer it from the HTTP method

V1.5.4 (2026-05-12)

New Feature: Customer Prompt Customization for Summaries

Enterprise customers can now add their own rules to the summary API without modifying the built-in template. This release adds three orthogonal client parameters and splits the summary regeneration endpoint into "preview" and "save" verbs, avoiding the design gap of an HTTP GET with side effects.

Fully backward compatible: if you do not send the new fields, behavior is identical to the previous version.

New Fields for POST /api/v1/summary

Field	Type	Limit	Description
`custom_prompt`	string	<=2000 characters	Customer custom instructions appended after the built-in template
`custom_prompt_slug`	string	<=64 characters, Unicode, no control characters	A client-defined template identifier (pass-through)
`plain_text`	bool	Default `false`	Request plain-text output (the backend performs Markdown post-processing)
`persist_custom_prompt`	bool	Default `false`	Opt-in: whether the done event echoes the `custom_prompt` content

The SSE start / done events also add the corresponding fields (custom_prompt_slug, plain_text, final_content, custom_prompt_snapshot); see reference/rest/summary.md.

`/api/v1/sse/regenerate/summary/{taskId}` Split Into Two Endpoints

Method	Purpose	Writes DB	Saves Transcript	Billed
GET	Preview (dry run, compare different prompt results)	No	No	Yes
POST	Save (official persistence)	Yes	Yes + bumps `revision`	Yes

Client recommendation: If your integration previously relied on "the backend record updating automatically after a GET," switch to POST. GET is now a pure preview and no longer writes any backend state.
The done event adds a persisted: bool field, so clients can determine directly from the payload whether this call was saved, without inferring from the HTTP method.
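
The preview/save split could be wrapped on the client as sketched below. The helper names are hypothetical, and the done event is assumed to have been parsed into a dict; only the path, the two HTTP methods, and the persisted field come from this changelog.

```python
from typing import Tuple


def regenerate_request(task_id: str, save: bool) -> Tuple[str, str]:
    """Return the (HTTP method, path) pair for a preview or a save."""
    # GET is a billed dry run; POST persists the result and bumps the revision.
    method = "POST" if save else "GET"
    return method, f"/api/v1/sse/regenerate/summary/{task_id}"


def was_saved(done_event: dict) -> bool:
    """Read the persisted flag instead of inferring it from the HTTP method."""
    return done_event.get("persisted") is True
```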

Four New Fields for the WebSocket `start` Action

summary_custom_prompt / summary_custom_prompt_slug / summary_plain_text / summary_persist_custom_prompt, mapping one-to-one to the REST endpoint fields with the same limits.

New Endpoint: GET /api/v1/summary-templates/{slug}

Exposes the built-in template's full content so enterprise customers can reference the existing baseline when integrating and then decide what to add via custom_prompt.

GET /api/v1/summary-templates also adds a ?category=summary|medical|legal|all filter and a data[].category field in the response (default summary, backward compatible).

New Error Codes

Error Code	HTTP	Trigger Condition
`custom_prompt_too_long`	400	`custom_prompt` exceeds 2000 characters
`custom_prompt_slug_too_long`	400	`custom_prompt_slug` exceeds 64 characters
`custom_prompt_slug_invalid`	400	`custom_prompt_slug` contains control characters
`template_not_found`	404	The template for the specified slug does not exist or is disabled
`invalid_category`	400	`?category=` is not in the allowlist

Behavior Changes

summary_text_empty / summary_text_too_long HTTP status code fix: these previously fell through to 500 because they were not explicitly mapped; this release fixes them to a semantically correct 400.
The details field of the POST /api/v1/summary error event no longer includes the raw LLM error: the raw error goes only to the server log, and the details returned to the client retain only the provider indicator.
GET preview is still billed: the LLM actually consumes tokens, so the GET endpoint cannot be free. Repeated GET calls are billed repeatedly, but they do not change DB / Blob state.

Path and Field Naming Conventions

customPromptSlug is a customer-defined pass-through identifier (semantically different from the existing templateSlug, which is validated for existence). In naming terms, the former is "for client traceability" and the latter is "for looking up the VAS built-in template."
summary_custom_prompt_slug is recorded with each summary, so you can later query which customer template a summary corresponds to.
custom_prompt_snapshot (opt-in) is stored in the transcript record only when the customer sets persist_custom_prompt=true; it is never written to the DB.

Security Controls

All endpoints require API Key authentication
The VAS server log does not record custom_prompt or the full transcript (only the length and slug are logged)
LLM error messages are sanitized (the raw error is not exposed to the client)
custom_prompt is fully isolated across tenants (session-scoped, no memory persistence)

Bug Fixes and Internal Improvements

The deprecation target version for the recording_id field of the WebSocket start action is unified to V2.0.0 (events.md previously said V1.6.0, which was inconsistent with the code comments)
Fixed a broken TOC anchor in sse-api.md (it pointed to the audio section, whose content has been moved to the standalone reference/sse/audio.md)
Improved text sanitization so CJK characters and emoji (including ZWJ sequences) are no longer mis-split or wrongly rejected
The summary regeneration full text now has a 100,000-character upper limit

Reference

reference/rest/summary.md (new)
reference/rest/summary-templates.md (added ?category= and GET /{slug})
reference/sse/regenerate-summary.md (rewritten as a two-endpoint GET / POST spec)
reference/websocket/voice-translation.md (added the 4 summary_custom_prompt* fields + 3 error codes)

V1.5.3 (2026-05-07)

Breaking Change: `speaker_id` Naming Inversion

To support speaker editing, V1.3.12 added the original_speaker_id field to preserve the original ID, but it left a design gap where "the same name means different things at different stages": for WebSocket realtime recording, speaker_id is the original ID (e.g., Guest-1), but after an SSE historical audio load, speaker_id becomes the display name (e.g., Manager Wang, with the alias applied). Frontends often picked the wrong field and passed it to PATCH /speakers/reassign.

This release performs a one-time inversion that is not backward compatible:

Old Name	New Name	Semantics
`speaker_id` (display name)	`speaker_label`	Display label (after alias is applied; mutable, human-readable)
`original_speaker_id` (original ID)	`speaker_id`	Original speaker ID (immutable, always stable)

After the inversion, speaker_id consistently refers to the original ID in all contexts (WebSocket / SSE / REST / blob / log); the new speaker_label represents the display label after the alias is applied. Speaker editing (rename / reassign / merge) always uses speaker_id as the locating key.
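
For client code that cached sentence records in the old shape, the inversion can be sketched as a field remap. This is illustrative only: the function name is hypothetical, the extra text key in the usage is a placeholder, and falling back to the old speaker_id when original_speaker_id is missing is an assumption modeled on the old-data fallback described for V1.3.12.

```python
def invert_speaker_fields(old: dict) -> dict:
    """Map an old-style record (speaker_id = display name) to the V1.5.3 naming."""
    new = {k: v for k, v in old.items() if k not in ("speaker_id", "original_speaker_id")}
    display_name = old.get("speaker_id")
    original_id = old.get("original_speaker_id")
    # New speaker_id is the immutable original ID; fall back to the old value if absent.
    new["speaker_id"] = original_id if original_id is not None else display_name
    # New speaker_label is the mutable, human-readable display label.
    new["speaker_label"] = display_name
    return new
```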

REST API Field Changes

`PATCH /api/v1/tasks/{taskId}/speakers/rename`

Location	Old Field	New Field
Request body	`original_name`	`speaker_id` (max 100 characters)
Request body	`new_name`	`new_label` (max 100 characters, no control characters `\x00-\x1F` / `\x7F` or newlines)
Response data	`original_name`	`speaker_id`
Response data	`new_name`	`new_label`

speaker_id still accepts a display label as well, which allows chained renaming (e.g., first rename Guest-1 to "Manager Wang," then use "Manager Wang" to rename to "Director Wang"); the speaker_id in the response is always the resolved original ID.
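
The request-body limits in the rename table can be checked before the PATCH is sent. The sketch below is a client-side convenience under assumptions: the function name is hypothetical, and length is measured with len() (code points), which may not match the server's counting.

```python
import re

# Control characters \x00-\x1F (which include \n and \r) and \x7F are rejected in new_label.
_FORBIDDEN_LABEL_CHARS = re.compile(r"[\x00-\x1f\x7f]")
MAX_SPEAKER_FIELD_CHARS = 100


def build_rename_body(speaker_id: str, new_label: str) -> dict:
    """Validate the two fields and return the rename request body."""
    if len(speaker_id) > MAX_SPEAKER_FIELD_CHARS:
        raise ValueError("speaker_id exceeds 100 characters")
    if len(new_label) > MAX_SPEAKER_FIELD_CHARS:
        raise ValueError("new_label exceeds 100 characters")
    if _FORBIDDEN_LABEL_CHARS.search(new_label):
        raise ValueError("new_label contains a control character or newline")
    return {"speaker_id": speaker_id, "new_label": new_label}
```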

`PATCH /api/v1/tasks/{taskId}/speakers/reassign`

Location	Old Field	New Field
Request body	`target_speaker_id`	Unchanged (semantics already aligned to the original ID)
Response data	`new_speaker_name`	`new_speaker_label`

target_speaker_id must be the original ID (taken from init_sentence.speaker_id); reassign does not accept a display label.

`PATCH /api/v1/tasks/{taskId}/speakers/merge`

Location	Old Field	New Field
Request body	`source_speaker_id` / `target_speaker_id`	Unchanged (still accepts the original ID or the current display label)
Response data	`target_speaker_name`	`target_speaker_label`

WebSocket Event Changes

Event	Old Field	New Field
`rename_speaker` action body	`original_name` / `new_name`	`speaker_id` / `new_label`
`result` event `origin` / `translations[lang]`	only `speaker_id` (mixed with display name)	`speaker_id` (original ID) + `speaker_label` (display label)
`speaker_renamed` event	`original_name` / `new_name`	`speaker_id` / `new_label`
`speaker_reassigned` event	`new_speaker_name`	`new_speaker_label`
`speakers_merged` event	(missing target label)	added `target_speaker_label`

SSE Event Changes

Event	Old Field	New Field
`init_sentence`	`speaker_id` (display name) + `original_speaker_id` (original ID)	`speaker_id` (original ID) + `speaker_label` (display label)
Broadcast viewer `origin` / `translation`	only `speaker_id` (mixed)	`speaker_id` + `speaker_label`
Broadcast viewer `speaker_renamed` / `speaker_reassigned` / `speakers_merged`	same as the corresponding WebSocket events	as above

The behavior and fields of init_metadata.speaker_aliases (the "original ID -> display label" mapping) are unchanged.

Client Recommendations

Customers using WebSocket realtime recording: before upgrading, update your handling so that result.origin.speaker_id is treated as the original ID and the new result.origin.speaker_label is used for display; change the rename body to { "speaker_id": "...", "new_label": "..." }
Customers using SSE historical audio: init_sentence.speaker_id is now the original ID (previously the display name); switch to speaker_label for display
Customers doing speaker editing (rename / reassign / merge):
- rename -> use speaker_id (either the original ID or the current display label) + new_label
- reassign -> target_speaker_id must be the original ID (taken from init_sentence.speaker_id; you cannot send a display label)
- merge -> source_speaker_id / target_speaker_id can still be the original ID or the current display label
Customers integrating TXT/SRT/CSV export: new_label now has control-character/newline validation; if you previously sent labels containing newlines, you will now receive a 422, so change to single-line content
Customers who do not do speaker editing and only consume transcript text: the impact is minimal; the only behavior difference is that if old code rendered speaker_id directly as the display name, it must switch to speaker_label

Data Compatibility

Not backward compatible: old transcript blobs (V1.3.12 ~ V1.5.1, containing speaker + original_speaker_id) require migration before they can be read in the new version; there is no cross-version data retention commitment during the POC phase
New recordings are unaffected: transcript blobs created after V1.5.3 use the new fields directly

Documentation Update

reference/rest/speakers.md: the body / response of all three endpoints (rename / reassign / merge) are fully aligned
rest-api.md L2280–2455: the speakers summary is aligned to the new fields
reference/websocket/voice-translation.md L975–1127: the rename_speaker / reassign_speaker / merge_speakers actions
reference/websocket/events.md L405–495: the speaker_renamed / reassigned / merged events
websocket-api.md L1310–2552: both rename / reassign / merge sections (actions first, events second) are aligned
reference/sse/history.md L135–195: the init_sentence schema + client recommendations
reference/sse/broadcast-viewer.md L300–365, L605–630: viewer broadcast events + JS examples
sse-api.md L240–475, L750–795: broadcast origin/translation + history init_sentence schema
guides/speaker-management.md L140–390: examples + JS handler
examples/curl.md, examples/python.md, examples/javascript.md: all rename / reassign / merge examples + TS interface

V1.5.1 (2026-05-07)

Bug Fix: `POST /api/v1/imports` Adds Length Validation for Terminology / Correction Fields

The length limits promised in several places in the documentation (e.g., a term's max of 100 characters) were previously not actually enforced on the file-import path, and overly long content was silently accepted. This release restores them, aligning behavior with the documentation's promises.

Behavior Changes (Aligning With Documented Promises)

POST /api/v1/imports adds 422 rejection conditions for the following fields (previously accepted):

Field	Limit
`terminology.<lang>`	Array, max 500 terms (per language)
`terminology.<lang>[].term`	string, max 100 characters
`terminology.<lang>[].boost`	numeric, 0.5–5.0 (optional, default 1.0)
`fuzzy_correction.<lang>[].correct`	string, max 200 characters
`fuzzy_correction.<lang>[].incorrect[]`	string, max 200 characters

These limits are consistent with the WebSocket config action; previously only the WebSocket path enforced them, and this release completes the file-import path.
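
A frontend pre-check for these limits might look like the sketch below. The payload shape is inferred from the field paths in the table, the function name and message texts are hypothetical, and the boost range is assumed to be inclusive at both ends.

```python
from typing import List


def check_import_fields(terminology: dict, fuzzy_correction: dict) -> List[str]:
    """Return human-readable violations of the documented limits (empty if none)."""
    errors = []
    for lang, terms in terminology.items():
        if len(terms) > 500:
            errors.append(f"terminology.{lang}: more than 500 terms")
        for i, entry in enumerate(terms):
            if len(entry["term"]) > 100:
                errors.append(f"terminology.{lang}[{i}].term: longer than 100 characters")
            boost = entry.get("boost", 1.0)  # optional, default 1.0
            if not 0.5 <= boost <= 5.0:
                errors.append(f"terminology.{lang}[{i}].boost: outside 0.5-5.0")
    for lang, rules in fuzzy_correction.items():
        for i, rule in enumerate(rules):
            if len(rule["correct"]) > 200:
                errors.append(f"fuzzy_correction.{lang}[{i}].correct: longer than 200 characters")
            for j, wrong in enumerate(rule["incorrect"]):
                if len(wrong) > 200:
                    errors.append(
                        f"fuzzy_correction.{lang}[{i}].incorrect[{j}]: longer than 200 characters"
                    )
    return errors
```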

Client Recommendations

If you previously sent overly long terms (>100 characters) via POST /api/v1/imports, you will now receive a 422. The frontend should check the length before submitting and prompt the user. The WebSocket path is unchanged.

V1.5.0 (2026-05-07)

Internal Naming Unification (No Public API Changes)

Continuing the task_id naming unification started in V1.4.1, this release completes the transition of the internal protocol layer.

The public API (WebSocket, REST, Webhook, SSE) is completely unchanged, and customers need to take no action.

The old naming (recording_id) will be fully removed in V1.6.0; for the related client migration guidance, see V1.4.1 Client Recommendations.

V1.4.3 (2026-05-07)

Internal Observability-Layer Naming Unification (No Public API Changes)

Following V1.4.1, this release performs a naming-unification transition for the log and monitoring layers.

The public API is completely unchanged, and customers need to take no action.

V1.4.2 (2026-05-07)

Internal Code Naming Unification (No Public API Changes)

Following V1.4.1, this release advances the task_id naming to the backend code level.

The public API is completely unchanged, and customers need to take no action.

V1.4.1 (2026-05-06)

Naming Unification: `task_id` as the Cross-Interface Task Identifier

Previously, the same task had different field names across interfaces (WebSocket used recording_id, Webhook used task_id, and some REST path variables mixed {recordingId} / {taskId}), forcing integrators to reconcile the three naming schemes themselves. This release starts the naming-unification cycle; new integrations should use task_id consistently.

WebSocket Changes (Backward Compatible)

The session_started event payload now carries both task_id and recording_id, and their values are exactly the same (the UUID of the same recording)
The recording_id field is marked as Deprecated; it is still emitted normally and is scheduled for removal in V1.6.0
Documentation enhancement: session_id is the WS connection-level identifier (invalidated when the connection ends), which is a different level from task_id (the task identifier)

REST API Changes (Backward Compatible)

Added /api/v1/tasks/{taskId}/... alias paths that behave exactly the same as the existing /api/v1/recordings/{recordingId}/...:

Recommended (from V1.4.1)	Deprecated (removed in V1.6.0)
`PATCH /api/v1/tasks/{taskId}/speakers/rename`	`PATCH /api/v1/recordings/{recordingId}/speakers/rename`
`PATCH /api/v1/tasks/{taskId}/speakers/reassign`	`PATCH /api/v1/recordings/{recordingId}/speakers/reassign`
`PATCH /api/v1/tasks/{taskId}/entries/{sid}`	`PATCH /api/v1/recordings/{recordingId}/entries/{sid}`

Client Recommendations

New integrations: use the task_id field and the /api/v1/tasks/{taskId}/... paths consistently to avoid migrating again later
Existing integrations: no immediate change required. recording_id and /api/v1/recordings/... remain available until their removal in V1.6.0; we recommend migrating on your own schedule, and no later than the V1.6.0 release
ID alignment logic: if you depend on both WS and Webhook, you can align the WS task_id (or the old name recording_id) directly with the Webhook data.task_id; all three are the same UUID
Do not use session_id for alignment: session_id is meaningful only within the WS connection lifecycle and does not appear in Webhook or REST
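
The alignment logic above can be sketched as two small helpers. The function names are hypothetical, and both payloads are assumed to be already parsed into dicts; the field names come from this changelog.

```python
def task_id_from_session_started(payload: dict) -> str:
    """Prefer task_id and fall back to the deprecated recording_id."""
    task_id = payload.get("task_id") or payload.get("recording_id")
    if not task_id:
        raise KeyError("session_started payload carries neither task_id nor recording_id")
    return task_id


def same_task(session_started: dict, webhook: dict) -> bool:
    """Align a WebSocket session with a Webhook delivery by task ID, never by session_id."""
    return task_id_from_session_started(session_started) == webhook["data"]["task_id"]
```

Reading task_id first means the same code keeps working after recording_id is removed from the WebSocket payload.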

Removal Timeline Announcement (V1.6.0)

V1.6.0 will remove the recording_id field from the WS payload and remove the /api/v1/recordings/{recordingId}/... paths. The detailed timeline will be announced separately before V1.6.0 ships.

Unchanged Items

Webhook payload: the existing data.task_id naming is unchanged
Existing /api/v1/tasks/{taskId}/... endpoints: unchanged

V1.4.0 (2026-05-06)

New Feature: Source-Text Editing for Historical Recordings + Automatic Retranslation

Users can correct STT recognition errors and regenerate translations; for the workflow, see Entries API Typical Workflow.

New endpoint PATCH /api/v1/recordings/{recordingId}/entries/{sid}: edit a single sentence's source text; on the first edit it automatically backs up the original STT output to original_text_raw, records original_text_edited_at, and clears the TTS cache for all languages of that sentence
New endpoint GET /api/v1/sse/recordings/{taskId}/entries/{sid}/retranslate: retranslate a single sentence (you can specify languages or retranslate all existing languages), with optimistic locking (expectedRevision)
Editing and retranslation are decoupled: PATCH only changes the source text and does not touch the translation; the frontend can decide when to trigger retranslation

Historical Record SSE Exposes Edit Markers

The historyTranscribe init_sentence event carries original_text_raw (the STT original) and original_text_edited_at on edited sentences, so the frontend can show an "edited" marker and a "restore original" function.

Security Fixes

retranslate / retranslateSummary add a user filter: these two existing SSE endpoints previously had a horizontal privilege-escalation vulnerability (IDOR) that allowed a user to read other users' recordings. This release adds the permission check; requests for other users' recordings now return recording_not_found.
Retranslation / summary regeneration requires the recording to be completed: the four endpoints retranslate / retranslateSummary / retranslateEntry / regenerateSummary require processing_status === completed to avoid racing with the in-progress flow. When not completed, they return recording_not_completed.

New Error Codes

Error Code	HTTP	Description
`recording_not_completed`	422	The recording has not finished processing; retranslation / editing / summary regeneration is not allowed
`entry_not_found`	404	The specified sentence was not found
`entry_text_empty`	422	The sentence's source text is empty
`entry_text_too_long`	422	The sentence's source text exceeds the 2000-character limit
`transcript_revision_conflict`	409	The transcript has been modified by another request (optimistic-lock conflict)

See error-codes.md.

Client Recommendations

After editing the STT source text: we recommend triggering single-sentence retranslation SSE immediately after the PATCH, passing the revision from the PATCH response as expectedRevision to avoid concurrent overwrites
Showing the edit marker: determine whether a sentence has been edited by the presence of the original_text_raw field in the init_sentence event ('original_text_raw' in data); do not use text comparison (the user may edit and then change it back to the original value)
Recording status: calling retranslation / editing / summary regeneration on a recording that is not completed returns recording_not_completed; the frontend should block these operations in the UI until processing_status === completed
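
Two of these recommendations translate directly into small client checks, sketched below. The function names are hypothetical, and the event and status values are assumed to be already parsed from the SSE payload.

```python
def is_edited(init_sentence: dict) -> bool:
    """A sentence counts as edited when original_text_raw is present at all.

    Comparing texts is unreliable: the user may have edited the sentence
    and then changed it back to the original value.
    """
    return "original_text_raw" in init_sentence


def editing_allowed(processing_status: str) -> bool:
    """Enable edit, retranslate, and regenerate actions only for completed recordings."""
    return processing_status == "completed"
```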

V1.3.13 (2026-05-06)

Behavior Changes (Breaking Changes)

WebSocket audio_format locked to pcm and webm: the previously accepted 5 formats (pcm / webm / mp3 / wav / m4a) are narrowed to accepting only pcm and webm, consistent with the existing spec in reference/websocket/voice-translation.md. Customers who send mp3 / wav / m4a will now receive audio_format_unsupported (previously these were silently decoded, which was undocumented implicit behavior). File imports still go through POST /api/v1/imports and are unaffected.

Documentation Update

Audio download Content-Type is always audio/mp4: the rest-api, SSE audio, tasks export, history playback, curl, and javascript documentation now consistently states that "all recording audio is returned in an M4A container (AAC encoding)," replacing the previous circular "dynamically determined" description.
Supported file-import formats narrowed to mp3 / wav / m4a: removed mentions of mp4 and webm from the documentation to align with the formats actually accepted (guides/file-import.md, reference/rest/imports.md).

Client Recommendations

Customers using the WebSocket start action: be sure to explicitly specify audio_format as pcm or webm; if you previously relied on the undocumented implicit mp3 / wav / m4a support (very rare scenarios), switch to the File Import API.
Customers downloading recording audio: all new recordings have Content-Type fixed to audio/mp4 with the .m4a extension. If older recordings still exist in storage, downloads may still return audio/webm; we recommend keeping a handling branch for the old extension to cover historical data.
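
Both recommendations can be sketched as client-side helpers. The function names are hypothetical, and returning None for an unrecognized Content-Type is an assumption about how a client might want to handle unexpected values.

```python
from typing import Optional

# Formats accepted by the WebSocket start action from V1.3.13 onward.
REALTIME_AUDIO_FORMATS = {"pcm", "webm"}

# audio/webm is kept as a branch for older recordings that may still be in storage.
EXTENSION_BY_CONTENT_TYPE = {"audio/mp4": ".m4a", "audio/webm": ".webm"}


def check_realtime_audio_format(audio_format: str) -> Optional[str]:
    """Return the expected error code for an unsupported format, else None."""
    if audio_format in REALTIME_AUDIO_FORMATS:
        return None
    return "audio_format_unsupported"


def extension_for_download(content_type: str) -> Optional[str]:
    """Pick a file extension from the Content-Type header of an audio download."""
    media_type = content_type.split(";")[0].strip().lower()
    return EXTENSION_BY_CONTENT_TYPE.get(media_type)
```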

V1.3.12 (2026-05-04)

⚠️ Inverted in V1.5.3: the original_speaker_id field and the "speaker_id is the display name" design introduced in this version have been superseded by the naming inversion in V1.5.3. This section is kept as a historical record; new integrations should refer directly to the V1.5.3 spec and do not need to implement this version's client recommendations.

New Feature

History SSE adds fields to align with the Transcribe speaker-editing UX: the historical record's init_metadata and init_sentence events each add a field, allowing the frontend to fully reuse the realtime recording page's speaker-editing menu (single-sentence reassignment + global rename).
- init_metadata adds speaker_aliases (object): the "original speaker ID -> display name" mapping. When there are no aliases it is {} (an empty object, not an empty array). It lets the frontend perform a name-collision precheck before sending PATCH /speakers/rename, covering the implicit conflict of "an original ID that exists on the backend but does not appear on screen because it was renamed."
- init_sentence adds original_speaker_id (string|null): the original speaker identifier without alias substitution, provided as the source for the target_speaker_id of PATCH /speakers/reassign.
- Old-data fallback: if a transcript record created before v2.24.0 has no original_speaker_id, the SSE output automatically falls back to speaker_id, so the new field is never null and the editing entry point stays enabled for old recordings.

Behavior Changes

No breaking change. Both fields are pure additions; existing SSE clients using Zod z.object (which strips extras by default) will not fail to parse, so no version negotiation is needed.

Documentation Update

sse-api.md L156-198: added the new field descriptions to the init_metadata / init_sentence examples and field tables
reference/sse/history.md L103-180: added the detailed reference schema accordingly

Client Recommendations

Customers doing speaker editing on the history detail page: get the original ID for reassign from init_sentence.original_speaker_id (do not use speaker_id, which is the display name with the alias applied); use init_metadata.speaker_aliases for the name-collision precheck before a rename.
Customers who do not do speaker editing: you can ignore the new fields; existing parsing behavior is unaffected.
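
The name-collision precheck described above might be sketched as follows. The function name, the visible_names parameter (display names of the other speakers currently on screen), and the exact collision rule are assumptions; the authoritative check remains the response of PATCH /speakers/rename.

```python
def rename_would_collide(
    speaker_aliases: dict[str, str],
    visible_names: set[str],
    original_id: str,
    new_name: str,
) -> bool:
    """Return True if new_name is already taken by another speaker."""
    taken = set(visible_names)
    for other_id, alias in speaker_aliases.items():
        if other_id == original_id:
            continue
        # The original ID is hidden on screen once renamed, but it still
        # exists on the backend, so it counts as taken too.
        taken.add(other_id)
        taken.add(alias)
    return new_name in taken
```

When speaker_aliases is {} and no other names are on screen, the function returns False for any new name.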

STT rejects the bare en code (the V1.3.10 changelog claimed it was removed, but it was not actually in effect): customers who send en will receive a 422 invalid_transcription_language; use a full BCP 47 code such as en-US / en-GB instead.
TTS removes 4 locales not supported by the speech provider: it-CH, ar-IL, ar-PS, en-GH. The speech provider's TTS never supported these 4 locales; previously, customers requesting their voices would fail at runtime on the provider side. STT still supports these 4 locales.

New Feature

TTS expanded to the speech provider's full set (154 languages, 325 voices): fully aligned with the speech provider's Monolingual Neural Voice list (including GA + Preview)
- Chinese dialects (4 added): zh-CN-henan, zh-CN-guangxi, zh-CN-liaoning, zh-CN-shaanxi
- South Asian languages (5 added): bn-BD Bengali (Bangladesh), ta-LK Tamil (Sri Lanka), ta-MY Tamil (Malaysia), ta-SG Tamil (Singapore), ur-PK Urdu (Pakistan)
- Southeast Asian languages (1 added): su-ID Sundanese (Indonesia)
- Eastern European languages (1 added): sr-Latn-RS Serbian (Latin script)
- North American indigenous languages (2 added): iu-Cans-CA Inuktitut (Canadian syllabics), iu-Latn-CA Inuktitut (Canadian Latin script)

Documentation Update

languages.md TTS section rewritten, explicitly noting:
- Of the 145 STT locales, 141 are supported on both the STT and TTS sides; 4 (it-CH, ar-IL, ar-PS, en-GH) are STT-only
- Of the 154 TTS locales, 13 are TTS-only (4 zh-CN dialects + 9 other languages)
guides/tts.md numbers updated (142->154 languages, 304->325 voices)
README.md TTS description updated

Verification Results

Compared against the speech provider's official STT and TTS voice list, VAS is fully aligned:

Source	STT	TTS locale	TTS voice	Diarization
Provider official	145	154	325	31
VAS	145	154	325	31

Client Recommendations

Customers using the en short code: switch to en-US or another full BCP 47 code.
Customers using it-CH/ar-IL/ar-PS/en-GH for TTS: these already failed on the provider side; switch to another locale in the same language family (e.g., it-CH -> it-IT, ar-IL -> ar-SA, en-GH -> en-NG). STT is unaffected.
Customers who want to use the 13 new TTS-only locales: you can call GET /api/v1/tts/voices?language=zh-CN-henan etc. directly to get the voice list.
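
A client-side migration for the first two recommendations could be sketched as below. The helper names and the default region are assumptions; the fallback table contains only the example mappings given above, so ar-PS is deliberately left for the integrator to choose.

```python
# Example fallbacks taken from the recommendation above. ar-PS has no
# example mapping in these notes, so it is not listed here.
TTS_FALLBACKS = {"it-CH": "it-IT", "ar-IL": "ar-SA", "en-GH": "en-NG"}


def normalize_stt_language(code: str, default_region: str = "US") -> str:
    """Replace the bare 'en' code, which STT rejects with 422."""
    if code == "en":
        return f"en-{default_region}"
    return code


def tts_locale(code: str) -> str:
    """Map an STT-only locale to a TTS locale in the same language family."""
    return TTS_FALLBACKS.get(code, code)
```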

languages.md number corrections
- Total speech-recognition languages 119 -> 145 (aligned with the main STT table)
- Speech-translation support 117 -> 143 (145 minus jv-ID Javanese and wuu-CN Wu Chinese)

This version has a residual issue that was fixed in V1.3.11: it claimed that "the bare en was removed and the language counts are fully consistent at 145," but the bare en was not actually removed (the count was still 146). It also did not handle the TTS-side locales it-CH/ar-IL/ar-PS/en-GH (not supported by the provider's TTS) or the 13 missing TTS-only locales. The full alignment fix was completed in V1.3.11.

Client Recommendations

This version is a documentation-only number correction and does not affect running integrations.

Reference

Language List languages.md

V1.3.9 (2026-04-29)

New Feature

Webhook Secret Bootstrap flow: resolves the contradiction where a client cannot obtain the secret on first webhook integration. The Dashboard adds a "Generate Webhook Secret" button (lazy generation), letting users obtain the secret and configure it on the receiving end first, then go back and set the webhook URL. The probe sent when setting the URL is signed with a secret that both sides agree on, so it passes on the first try. This aligns with the mainstream industry pattern of Stripe / Shopify.
- New endpoint: POST /dashboard/api-keys/{id}/webhook/regenerate-secret (Dashboard only, reuses the webhook-update rate limiter, 10/min/user)
- Behavior: generates a 64-character random secret and writes it to the DB; does not send a probe and does not touch the webhook URL; returns the plaintext once via a flash session for the Dashboard to display
- Regeneration impact: after execution, the old secret is invalidated immediately; existing receivers will get webhooks with mismatched signatures until they switch to the new secret

Behavior Changes

Clearing the Webhook URL no longer clears the Secret: when PATCH /dashboard/api-keys/{id}/webhook sets webhook_url to null, webhook_secret is left unchanged. The Secret and URL now have independent lifecycles. A customer can generate the secret first and set the URL later; sending an empty URL in the meantime will not lose the secret.
The Dashboard no longer returns webhook_secret in plaintext: GET /dashboard/api-keys/{id} now returns webhook_secret_masked (prefix mask + last 4 characters) and a has_webhook_secret boolean. The plaintext is shown only once via a flash session right after generation (aligned with Stripe).

Documentation Update

guides/webhook.md: "Method 2: API Key-level webhook_url" rewritten as a two-step flow (generate secret -> set URL); added a Webhook Secret Lifecycle section; added a Bootstrap callout to the security-verification section.

Client Recommendations

First integration: in the Dashboard, click "Generate Webhook Secret," copy it to the receiving end's .env, enable HMAC verification, and restart the service, then go back to the Dashboard and enter the webhook URL.
Existing customers: fully compatible, no changes needed. Existing webhook_url and webhook_secret behavior is unchanged.
Secret rotation: we recommend that the receiving end briefly accept both the old and new secrets; after the dashboard regeneration, remove the old secret once in-flight webhooks have finished processing.
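
During rotation, a receiver that briefly accepts both secrets could be sketched as follows. The signature scheme shown here (a hex-encoded HMAC-SHA256 over the raw request body) is an assumption for illustration; use the scheme documented in guides/webhook.md. The function name is hypothetical.

```python
import hashlib
import hmac


def verify_signature(body: bytes, signature_hex: str, secrets: list[str]) -> bool:
    """Accept the webhook if any currently valid secret matches."""
    for secret in secrets:
        # Assumed scheme: hex HMAC-SHA256 of the raw body.
        expected = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
        # Constant-time comparison avoids leaking how many characters matched.
        if hmac.compare_digest(expected.encode("utf-8"), signature_hex.encode("utf-8")):
            return True
    return False
```

Pass the new and old secrets together while in-flight webhooks drain, then pass only the new one.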

V1.3.8 (2026-04-27)

New Feature

Translation-service-unavailable detection (session-level): added the error code translation_service_unavailable. When the LLM translation service fails consecutively up to a threshold, the backend emits a session-level error event once, so the frontend can show a global "translation temporarily unavailable" prompt instead of users seeing a page full of individual failed sentences in gray text.
- Trigger conditions:
  - llm_timeout / llm_provider_error / llm_rate_limit / llm_request_failed escalate after 5 consecutive failures
  - llm_auth_failed / llm_deployment_not_found / llm_quota_exceeded escalate immediately after 1 occurrence (configuration/billing issues)
  - llm_content_filtered is not counted (a content issue, not a service issue)
- Deduplication: each session is notified only once; any successful sentence translation resets the count, after which the event can be triggered again
- Payload: type: "error", severity: "error" (not fatal, so the client should not disconnect); carries no sid; details contains provider, last_error_code, and fail_count
- Viewer notification: in broadcast mode, all viewers (regardless of language) also receive this event (via the SSE event: error channel)
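
The trigger and deduplication rules above can be modeled with a small counter. This is an illustrative sketch of the described behavior, not the service's implementation; the class name is hypothetical, and leaving the streak unchanged on llm_content_filtered is an assumption.

```python
COUNTED = {"llm_timeout", "llm_provider_error", "llm_rate_limit", "llm_request_failed"}
IMMEDIATE = {"llm_auth_failed", "llm_deployment_not_found", "llm_quota_exceeded"}
THRESHOLD = 5


class TranslationHealth:
    """Tracks consecutive translation failures for one session."""

    def __init__(self) -> None:
        self.fail_count = 0
        self.notified = False

    def on_success(self) -> None:
        # Any successful sentence resets the count and re-arms the event.
        self.fail_count = 0
        self.notified = False

    def on_failure(self, error_code: str) -> bool:
        """Return True when translation_service_unavailable should be emitted."""
        if error_code in IMMEDIATE:
            self.fail_count += 1
            escalate = True
        elif error_code in COUNTED:
            self.fail_count += 1
            escalate = self.fail_count >= THRESHOLD
        else:
            # For example llm_content_filtered: a content issue, not counted.
            return False
        if escalate and not self.notified:
            self.notified = True
            return True
        return False
```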

Documentation Update (Spec Sync)

This pass continues closing the spec gaps surfaced by frontend feedback since V1.3.7 and adds the following:

error-codes.md — sentence-level error rule: added a sid-rule paragraph below the "Severity Levels" table, explicitly stating that "when an error carries sid, regardless of severity, it should be treated as a sentence-level error and should not disconnect." A fatal + sid combination only means that sentence failed severely; the session as a whole can still continue.
error-codes.md — translation_service_unavailable error-code registration: added this error code and its full trigger-rule description to the "Translation Service Errors" section
websocket-api.md: added a session-level translation error example (no sid, severity error) to the "Error Message Format" section
sse-api.md — retranslate section adds the per-sid error rule: explicitly lists the spec and payload format for "a failed sentence is re-emitted as event: error with sid + error_code, interleaved with translation" (implemented in V1.3.7 but documented only in the reference subdirectory)
reference/sse/broadcast-viewer.md: added a translation_service_unavailable example and a specific error-code entry
reference/websocket/events.md: removed an obsolete translation_error action that the service never actually emitted; translation errors are delivered on the standard error channel (type: "error")

Client Recommendations

Existing sentence-level error handling (type: "error" with sid) needs no changes.
If you want to show a global "translation service unavailable" prompt, add a listener: when you receive error_code === "translation_service_unavailable" (without sid), show a banner / toast; clear it once any subsequent sentence translation succeeds (you receive a translation event again).
Do not treat translation_service_unavailable as a disconnect signal — STT (the source text) continues to operate.
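
A listener following these recommendations might look like the sketch below. The class name and the assumption that a successful translation arrives as a message with type "translation" are illustrative; the error fields follow the payload described in this section.

```python
class TranslationBanner:
    """Tracks whether the global 'translation unavailable' banner is shown."""

    def __init__(self) -> None:
        self.visible = False

    def handle(self, message: dict) -> None:
        if message.get("type") == "error":
            # Only the session-level error (no sid) drives the banner;
            # errors that carry sid stay in the sentence-level handler.
            if (
                message.get("error_code") == "translation_service_unavailable"
                and message.get("sid") is None
            ):
                self.visible = True
        elif message.get("type") == "translation":
            # Assumed event type: a successful translation clears the banner.
            self.visible = False
```

The handler never closes the connection, since STT continues to operate while translation is unavailable.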

Reference:

V1.3.7 (2026-04-24)

Behavior Changes

Realtime recording: silent tasks now follow the normal completion flow: when a realtime recording (WebSocket) is silent throughout, contains only noise, or yields no recognizable sentence, it still produces an empty transcript (entries: []) and ends with a task_complete event. This aligns with the V1.3.5 file-import flow; the realtime and import sources now share the semantics that zero recognition results is a legitimate completed state.
SSE historical record: no longer returns sse_transcript_not_found in silent scenarios: GET /api/v1/sse/history/transcribe/{taskId} no longer returns the sse_transcript_not_found error for silent tasks; instead it sends the full event sequence (init_metadata → init_summary(text='') → init_done(totalSentences=0)). Clients should use totalSentences === 0 to detect this and show a "no speech content" empty state.

Bug Fixes

Fixed the History page getting stuck on "processing" for silent recordings: previously, if a realtime recording was silent throughout, the backend skipped the transcript upload due to the segmentsCount == 0 condition, but task_complete still sent the task_id, causing the frontend to receive sse_transcript_not_found (semantically "not finished processing") when loading the historical record, leaving the UI stuck on loading forever. After the fix, the realtime path matches the import path and always uploads the transcript (even with empty entries).

Client Recommendations

If you previously had "retry / polling" handling logic for sse_transcript_not_found, you may keep it as a defensive fallback (e.g., for blob upload delays), but you should no longer use it to determine "the task has no speech" — switch to init_done.totalSentences === 0.
We recommend the UI prompt possible reasons when totalSentences === 0 (volume too low, silent throughout, recognition language does not match the audio), consistent with the V1.3.5 import-scenario wording.
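
The empty-state detection can be sketched as follows, assuming the SSE stream has already been parsed into (event name, data) pairs. The function name and the state labels are hypothetical; the event names and totalSentences come from the sequence described above.

```python
def summarize_history(events) -> dict:
    """Reduce parsed history SSE events to a simple view state."""
    sentences = []
    for name, data in events:
        if name == "init_sentence":
            sentences.append(data)
        elif name == "init_done":
            # Zero sentences means a completed task with no speech content.
            if data.get("totalSentences", 0) == 0:
                return {"state": "no_speech", "sentences": []}
            return {"state": "ready", "sentences": sentences}
    # init_done has not arrived yet.
    return {"state": "loading", "sentences": sentences}
```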

Documentation Update

Historical Record SSE adds a "Boundary scenario: no speech content" section and corrects the handling-recommendation description for sse_transcript_not_found

Reference:

V1.3.6 (2026-04-23)

New Feature

Tasks API: added POST /api/v1/tasks/{taskId}/force-fail: forcibly marks a task stuck in a non-terminal state (recording / importing / uploading / pending / processing) as failed
- The body can optionally include reason (max 500 characters)
- Triggers the recording.failed webhook, with payload.failure_source set to user_forced
- A task already in a terminal state returns invalid_processing_status (422)
Tasks API: added POST /api/v1/tasks/{taskId}/retry: re-queues a task in the failed state for processing
- Prerequisites: processing_status = failed and audio_status = success and transcript_status = success
- Not meeting the prerequisites returns invalid_processing_status (422); the details field carries audio_status / transcript_status to help with diagnosis
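
A client can mirror these prerequisites before calling either endpoint, to avoid a predictable 422. The sketch below assumes the task is available as a dict and that the non-terminal states are reported in processing_status; both the helper names and that field placement are assumptions.

```python
NON_TERMINAL = {"recording", "importing", "uploading", "pending", "processing"}


def can_force_fail(task: dict) -> bool:
    """force-fail only applies to tasks stuck in a non-terminal state."""
    return task.get("processing_status") in NON_TERMINAL


def can_retry(task: dict) -> bool:
    """retry requires a failed task whose audio and transcript both succeeded."""
    return (
        task.get("processing_status") == "failed"
        and task.get("audio_status") == "success"
        and task.get("transcript_status") == "success"
    )
```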

Behavior Changes

Error code invalid_processing_status (422) expanded scope: now also used as the common response for force-fail and retry; details carries current_status, and the retry scenario additionally carries audio_status and transcript_status

Documentation Update

Tasks API adds documentation for the force-fail and retry endpoints
Error Code Reference adds the invalid_processing_status entry and a "Processing Status Mismatch" subsection

Reference:

V1.3.5 (2026-04-22)

Behavior Optimization

File import: empty recognition-result filtering: audio imports now filter out empty recognition results (caused by silence, very low volume, noise, or a language mismatch), so imports no longer produce empty 00:00 placeholder segments
Zero recognition results is a legitimate completed status: in this scenario the import task still ends with status: completed (not failed), the task_id is produced normally, but the subsequently loaded transcript entries is an empty array and segments_count is 0
Budget deducted by actual duration: unrecognizable audio is still deducted from the monthly budget based on the audio duration (no refund)

Client Recommendations

After loading the transcript (SSE /api/v1/sse/history/transcribe/{taskId}), if the cumulative sentence count is 0, show a "no speech content was recognized in this audio" empty state
Do not treat zero recognition results as an error branch; follow the completion branch and judge by the sentence count
We recommend the UI also prompt possible reasons (volume too low, silent throughout, recognition language does not match the audio)

Documentation Update

File Import Guide adds a "Behavior When Audio Cannot Be Recognized" section
Imports API adds a completed boundary-scenario note under the status transition
Import Progress SSE adds a behavior note for zero recognition results under the completed event

Reference:

V1.3.4 (2026-04-22)

New Feature

Tasks API: added GET /api/v1/tasks/{taskId}/transcript/export: download a task's transcript, supporting five formats — txt, srt, sbv, vtt, csv
- The output includes the source text and all translation languages
- CSV starts with a UTF-8 BOM, with columns index,start,end,speaker,text,<one column per translation language> and times in HH:MM:SS (no milliseconds)
- SRT times are HH:MM:SS,mmm; SBV times are H:MM:SS.mmm, with the source text and translations joined into a single line with |; VTT uses the WEBVTT header
- The filename uses {recording name}-transcript.{ext} (RFC 5987 UTF-8 encoded)
- Added the error code recording_transcript_not_ready (422)
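
For clients that parse or verify the exported files, the three time formats above can be reproduced as follows. This sketch assumes timestamps are held as integer milliseconds and that the CSV format truncates rather than rounds; the function names are illustrative.

```python
def _split(ms: int) -> tuple[int, int, int, int]:
    """Split milliseconds into hours, minutes, seconds, milliseconds."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return hours, minutes, seconds, millis


def srt_time(ms: int) -> str:
    h, m, s, frac = _split(ms)
    return f"{h:02d}:{m:02d}:{s:02d},{frac:03d}"  # HH:MM:SS,mmm


def sbv_time(ms: int) -> str:
    h, m, s, frac = _split(ms)
    return f"{h}:{m:02d}:{s:02d}.{frac:03d}"  # H:MM:SS.mmm


def csv_time(ms: int) -> str:
    h, m, s, _ = _split(ms)
    return f"{h:02d}:{m:02d}:{s:02d}"  # HH:MM:SS, no milliseconds
```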

Behavior Changes (Breaking)

Speaker diarization and multi-language mutual exclusion is now a hard rejection: when recognition_mode: multi_speaker is combined with multiple transcription_languages, it previously emitted a warning and automatically truncated to the first language; it now directly returns the diarization_multilang_conflict error and refuses to start
- The error severity is changed from warning to error
- The frontend must restrict "speaker diarization" and "multi-language" to one or the other before the user submits start, or handle this error and guide the user to adjust the settings
- Affected endpoint: WebSocket voice-translation / start
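
A frontend precheck that enforces the either-or rule before sending start could be sketched as below. The function name is hypothetical; the field values and the error code are the ones named above.

```python
from typing import Optional


def check_start_config(
    recognition_mode: str, transcription_languages: list[str]
) -> Optional[str]:
    """Return the error code the service would reject with, or None if valid."""
    if recognition_mode == "multi_speaker" and len(transcription_languages) > 1:
        return "diarization_multilang_conflict"
    return None
```

Running this check on form submission lets the UI guide the user to adjust the settings instead of surfacing a rejected start.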

Documentation Update

Tasks API adds the full transcript/export spec and output examples for all five formats
Error Code Reference adds recording_transcript_not_ready
curl, Python, JavaScript examples add a "Task Export" section
README API reference table Tasks endpoint count updated from 8 to 9

Reference:

V1.3.3 (2026-04-21)

New Documentation

Tasks API: added the full documentation for the GET /api/v1/tasks/{taskId}/audio/export endpoint (the implementation existed but the documentation was missing), including parameters, dynamic Content-Type, error codes, and a frontend download example
Explained the difference between this endpoint and SSE /api/v1/sse/audio/{taskId}: the former is for offline download (Content-Disposition: attachment), the latter is for playback (supports Range Requests)

Documentation Fixes

Fixed the Voice Translation Actions translation-mode speakers field description table: the field name is corrected from speaker to id, consistent with the JSON example and the actual service behavior
Fixed the README API reference table endpoint counts: Tasks from 7 to 8 (added audio/export), Broadcasts from 9 to 6 (the original count was wrong)

Reference:

V1.3.2 (2026-04-07)

Documentation Structure Adjustment

Removed 3 deprecated old documents (error-codes.md V0.6, languages.md V0.1, authentication.md V0.1)
Moved appendix/error-codes.md and appendix/languages.md to the root directory, replacing the deprecated versions
Updated all cross-reference links

V1.3.1 (2026-03-26)

Batch Task Management

Added PUT /api/v1/tasks/batch/pin: batch-update pin status, max 100 per call
Added DELETE /api/v1/tasks/batch: batch-delete tasks, max 100 per call
Both endpoints affect only tasks belonging to the current user; the response includes affected_count

Batch Broadcast Cancellation

Added DELETE /api/v1/broadcasts/batch: batch-cancel broadcasts in the PENDING state, max 100 per call
IDs not in the PENDING state are ignored; the response includes affected_count
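
Because each batch call accepts at most 100 IDs, a client with a longer selection needs to split it first. A minimal sketch of that splitting (the helper name is illustrative, and sending each chunk is left to the caller):

```python
from typing import Iterator, Sequence

MAX_BATCH = 100  # per-call limit stated for the batch endpoints


def chunk_ids(ids: Sequence[str], size: int = MAX_BATCH) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])
```

Summing affected_count across the responses gives the total number of items actually changed.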

Reference:

Version: V1.5.7 Last Updated: 2026-05-20
