Appendix

Changelog

V1.5.7 (2026-05-20)

Documentation Update (No API Behavior Changes)

The public API behavior is completely unchanged. This release is a documentation supplement and wording revision.

New Usage Guide: Summary Prompt Customization

Added the Summary Prompt Customization Guide, consolidating the summary customization specs that were previously scattered across 6 reference documents into a single guide:

  • The mutual-exclusion rules and use cases for the builtin and custom summary modes
  • The corresponding fields for the three entry points: REST POST /api/v1/summary, the WebSocket start action, and SSE regenerate/summary
  • Transcript record fields (including the summary_prompt_snapshot audit field and the summary_fallback_level / summary_dropped_segments fallback audit fields)
  • A Profanity and Sensitive-Word Handling section, integrating the three paths (customer prompt -> neutral mode, transcript -> STT profanity_handling masking, transcript -> summary-layer segment omission) and explicitly stating that the API layer does not proactively reject requests containing sensitive words
  • The built-in safety guard (content-neutralization guidance, prompt-injection protection) and character-length limits
  • Complete examples for Node.js, Python, and WebSocket

The "Feature Guides" table in README.md now includes an entry for this guide.

Documentation Wording Revision

Refined public-facing wording: replaced internal/architecture-specific terms with neutral, customer-facing descriptions.

This revision is limited to the new guide guides/summary-customization.md. The remaining reference documents and historical changelog entries are kept as-is and will be aligned in subsequent releases.

Changelog Internal-Numbering Cleanup

  • Removed an internal task number from the V1.5.4 section heading
  • Removed an internal-policy reference from the V1.5.4 body text


V1.5.6 (2026-05-19)

Documentation Alignment Fixes (No API Behavior Changes)

This release is a documentation proofreading pass; the public API behavior is completely unchanged. If you previously implemented against the older documentation, please adjust to the current spec for the items below.

Token Formats

  • broadcast_token: a 4-character short code (character set a-z0-9)
  • viewer_access_token: a 64-character alphanumeric string (not a JWT, no payload structure; do not attempt to parse it)
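
The two token formats above can be checked on the client before a request is sent. The following is a minimal sketch; the function names are hypothetical, and it assumes "alphanumeric" means ASCII letters and digits in either case.

```python
import re

# Patterns derived from the formats listed above.
# Assumption: "alphanumeric" means ASCII A-Z, a-z, 0-9.
BROADCAST_TOKEN_RE = re.compile(r"[a-z0-9]{4}")
VIEWER_ACCESS_TOKEN_RE = re.compile(r"[A-Za-z0-9]{64}")


def is_broadcast_token(value: str) -> bool:
    """Return True if value looks like a 4-character broadcast short code."""
    return BROADCAST_TOKEN_RE.fullmatch(value) is not None


def is_viewer_access_token(value: str) -> bool:
    """Return True if value is a 64-character alphanumeric string.

    The token is opaque: this checks only its shape and never decodes it.
    """
    return VIEWER_ACCESS_TOKEN_RE.fullmatch(value) is not None
```

A shape check like this only catches obviously malformed input; whether a token is actually valid is still decided by the server.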

HTTP Status Codes

  • sse_missing_target_lang / sse_unsupported_language: 422
  • broadcast_token_invalid (viewer verify endpoint): 401

Error Code Strings

  • POST /api/v1/imports insufficient quota: stt_quota_exceeded
  • Broadcast not found on viewer SSE: broadcast_session_not_found
  • Broadcast at capacity on viewer SSE: broadcast_capacity_exceeded
  • The context for the sse_translation_failed error event is sse

WebSocket Event Naming

  • retranslate success event: action: "translation"
  • Storage-layer upload failures: delivered via a type: "error" envelope (error_code is storage_upload_failed / storage_connection_failed / storage_queue_full); there is no separate upload_error action
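
Because storage failures arrive in the generic error envelope rather than as a separate action, a client has to branch on error_code. A minimal sketch, assuming the message has already been parsed from JSON into a dict (the function name is hypothetical):

```python
# Storage-layer error codes listed above.
STORAGE_ERROR_CODES = {
    "storage_upload_failed",
    "storage_connection_failed",
    "storage_queue_full",
}


def is_storage_upload_error(message: dict) -> bool:
    """Return True for a type: "error" envelope carrying a storage error code."""
    return (
        message.get("type") == "error"
        and message.get("error_code") in STORAGE_ERROR_CODES
    )
```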

Newly Documented Error Codes

| Endpoint / Action | Error Code | Description |
| --- | --- | --- |
| WebSocket set_name | set_name_empty / set_name_too_long / set_name_not_ready | Replaces the older name_too_long |
| WebSocket audio | audio_process_failed | STT write fails repeatedly (HTTP 500; reconnecting is recommended) |


V1.5.5 (2026-05-13)

Breaking Change: The Summary API Is Now Mode-Aware

The "template + custom_prompt combined" design introduced in V1.5.4 is replaced by two mutually exclusive modes: on each summary request you must choose either mode=builtin (apply the built-in template) or mode=custom (your prompt fully replaces the built-in template).

Clients must migrate: V1.5.4 clients that do not update their fields will receive a 422.

Unified New Fields Across the Three Entry Points: REST POST /api/v1/summary, SSE regenerate/summary, and the WebSocket start action

Old (V1.5.4) -> New (V1.5.5) mapping:

| Old Field | New Field | Notes |
| --- | --- | --- |
| template / templateSlug / summary_template | Same name (builtin mode only) | Unchanged, but must not be sent in custom mode |
| custom_prompt / customPrompt / summary_custom_prompt | prompt / summary_prompt (custom mode only) | Renamed |
| custom_prompt_slug / customPromptSlug / summary_custom_prompt_slug | prompt_slug / summary_prompt_slug (custom mode only) | Renamed |
| persist_custom_prompt / persistCustomPrompt | (removed) | Custom mode always snapshots; no opt-in |
| custom_instructions | (removed) | Legacy field, no longer supported |
| (none) | mode / summary_mode (required) | New required field, enum builtin / custom |

Mutual-exclusion rules:

  • mode=builtin: template is required; prompt / prompt_slug must not be sent
  • mode=custom: prompt / prompt_slug is required; template must not be sent
  • Violations -> 422 summary_mode_field_mismatch
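
The mutual-exclusion rules above can be enforced client-side before a request is sent. The sketch below uses the REST field names; the function name is hypothetical, and it reads "prompt / prompt_slug is required" as "at least one of the two must be present", which is an assumption about the spec.

```python
from typing import Optional


def build_summary_fields(
    mode: str,
    template: Optional[str] = None,
    prompt: Optional[str] = None,
    prompt_slug: Optional[str] = None,
) -> dict:
    """Build the mode-aware summary fields, raising ValueError on a mismatch."""
    if mode == "builtin":
        if not template:
            raise ValueError("summary_mode_field_mismatch: builtin requires template")
        if prompt is not None or prompt_slug is not None:
            raise ValueError("summary_mode_field_mismatch: builtin forbids prompt fields")
        return {"mode": "builtin", "template": template}
    if mode == "custom":
        if template is not None:
            raise ValueError("summary_mode_field_mismatch: custom forbids template")
        # Assumption: at least one of prompt / prompt_slug must be present.
        if not prompt and not prompt_slug:
            raise ValueError("summary_mode_field_mismatch: custom requires a prompt field")
        fields = {"mode": "custom"}
        if prompt:
            fields["prompt"] = prompt
        if prompt_slug:
            fields["prompt_slug"] = prompt_slug
        return fields
    raise ValueError("summary_invalid_mode: mode must be builtin or custom")
```

For example, build_summary_fields("builtin", template="meeting") yields the fields for a built-in summary, while passing both template and prompt raises before any request is made.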

GET /api/v1/tasks/ Response Fields

Within data.tasks[]:

  • Added summary_mode (builtin / custom / null)
  • summary_template now returns the effective slug (in custom mode it returns your slug, identical to the prompt_slug you submitted)
  • Removed summary_custom_prompt_slug (merged into summary_template)

Backward compatibility: recordings without a generated summary have summary_mode set to null; existing builtin-mode recordings keep their original summary_template value.

Transcript Record Structure Changes

New top-level fields (not nested under the summary object):

| Field | Description |
| --- | --- |
| summary_mode | builtin / custom |
| summary_template | Effective slug: builtin -> the built-in slug; custom -> your slug |
| summary_plain_text | bool |
| summary_prompt_snapshot | Present only in custom mode; the prompt content you passed in verbatim (not written in builtin mode) |
| summary_fallback_level | Present only when a fallback was triggered (value 2 or 3); indicates that this summary went through an automatic content-filter fallback path. Omitted when the summary succeeds directly |
| summary_dropped_segments | Present only when fallback_level=3; the indices of the transcript segments that were dropped (an array of integers in original order) |

In addition to the existing text, the init_summary event of GET /api/v1/sse/history/transcribe/{taskId} now adds mode / template / plain_text / prompt_snapshot (populated only in custom mode) for client traceability, plus fallback_level / dropped_segments (populated only when a fallback was triggered).

New Outbound WebSocket Events

  • summary_done: summary generation completed (includes summary_mode / summary_template (effective) / summary_plain_text / tokens_used / summary_fallback_level / summary_dropped_segments; does not include final_content)
  • summary_error: summary generation failed (includes error_code / message)

Clients no longer need to poll the transcript record to determine whether the summary is complete.

Automatic Content-Filter Fallback for Summaries

When a custom-mode prompt or transcript content triggers the LLM service's content filter (finish_reason=content_filter), the system handles it through an automatic multi-step fallback instead of failing outright. If some transcript segments still cannot be processed, they are omitted and reported via summary_dropped_segments. If even the fallback cannot produce a summary, a summary_error event is emitted with error_code=llm_content_filtered.

Client-side handling:

  • Use summary_fallback_level to show a UI notice indicating the summary was produced through a content-filter fallback path
  • Use summary_dropped_segments to inform the user which segments were actually omitted
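
The client-side handling above could look roughly like the following. This is a sketch only: the function name and the notice wording are hypothetical, and it assumes the summary_done payload has already been parsed into a dict.

```python
from typing import Optional


def fallback_notice(event: dict) -> Optional[str]:
    """Return a UI notice for a summary_done payload, or None if no fallback ran."""
    level = event.get("summary_fallback_level")
    if level is None:
        # The field is omitted when the summary succeeds directly.
        return None
    dropped = event.get("summary_dropped_segments") or []
    if dropped:
        listing = ", ".join(str(index) for index in dropped)
        return (
            f"Summary produced via content-filter fallback (level {level}); "
            f"omitted segments: {listing}."
        )
    return f"Summary produced via content-filter fallback (level {level})."
```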

Spec scope: In this release the fallback applies to two paths: WebSocket realtime summaries (auto-generated when a recording ends) and file-import summaries. Fallback integration for the SSE regenerate/summary endpoint is a follow-up; in the current version it still returns llm_content_filtered when blocked.

Custom-Mode Prompt Safety Rule (New in V1.5.5)

The built-in safety guard applied to custom-mode prompts now adds a rule instructing the LLM to "summarize the intent of any colloquial, emotional, or sensitive wording in the source in neutral, objective language, avoiding verbatim quotation or repetition." This rule is enforced by the backend and is not exposed for client configuration; its purpose is to reduce the chance of triggering the content filter on the first attempt.

The safety-guard content is never written to backend storage or logs. Your original prompt is still stored via the summary_prompt_snapshot field as an audit reference, complementing summary_fallback_level:

  • summary_prompt_snapshot = your intent (the original prompt content)
  • summary_fallback_level = the actual execution path taken by the automatic fallback

Custom-Mode Prompt-Injection Protection

In custom mode, the backend wraps the prompt you provide with prompt-injection protection to prevent instructions inside your prompt from overriding system rules. You should still avoid concatenating untrusted end-user input directly into prompt.

New Error Codes

| Error Code | HTTP | Trigger Condition |
| --- | --- | --- |
| summary_invalid_mode | 422 (SSE) / 400 (others) | mode is not builtin / custom |
| summary_mode_field_mismatch | 422 / 400 | The mode and field combination is inconsistent (a required field is missing, or a forbidden field was sent) |
| summary_prompt_too_long | 422 / 400 | prompt exceeds 2000 characters |
| summary_prompt_slug_too_long | 422 / 400 | prompt_slug exceeds 64 characters |
| summary_prompt_slug_invalid | 422 / 400 | prompt_slug contains control characters (\n / \r / \t / \0, etc.) |
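
The length and character limits behind these error codes can be pre-checked on the client. The sketch below is an approximation: it assumes "characters" are counted as Unicode code points and that "control characters" means Unicode category Cc; the server's exact counting rules are not stated here, and the function name is hypothetical.

```python
import unicodedata
from typing import Optional

MAX_PROMPT_CHARS = 2000
MAX_SLUG_CHARS = 64


def precheck_custom_prompt(prompt: str, prompt_slug: str) -> Optional[str]:
    """Return the error code the request would likely trigger, or None if it looks fine."""
    if len(prompt) > MAX_PROMPT_CHARS:
        return "summary_prompt_too_long"
    if len(prompt_slug) > MAX_SLUG_CHARS:
        return "summary_prompt_slug_too_long"
    # Assumption: control characters are those in Unicode category Cc.
    if any(unicodedata.category(ch) == "Cc" for ch in prompt_slug):
        return "summary_prompt_slug_invalid"
    return None
```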

Client Recommendations

  1. Add the required mode field — change existing calls using templateSlug=meeting to mode=builtin&template=meeting
  2. Rename fields: customPrompt -> prompt, customPromptSlug -> promptSlug; these two fields are only used in mode=custom
  3. Remove persistCustomPrompt — custom mode preserves the prompt content automatically
  4. Change templateSlug to template — and only use it in mode=builtin
  5. Transcript records now use top-level fields — no longer nested under the summary object
  6. Clients can determine whether a summary was saved from the done event / summary_done event — check persisted: true/false; you no longer need to infer it from the HTTP method
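
As a rough illustration of recommendations 1 to 4, the sketch below converts a V1.5.4-style REST payload (snake_case field names) into the V1.5.5 shape. The function name is hypothetical. Note one behavioral consequence it makes explicit: a payload that combined template with custom_prompt becomes mode=custom, so the built-in template is no longer applied.

```python
REMOVED_FIELDS = {"persist_custom_prompt", "custom_instructions"}
RENAMED_FIELDS = {"custom_prompt": "prompt", "custom_prompt_slug": "prompt_slug"}


def migrate_summary_payload(old: dict) -> dict:
    """Convert V1.5.4 summary fields to the V1.5.5 mode-aware fields."""
    # Carry over unrelated fields (for example plain_text) untouched.
    new = {
        key: value
        for key, value in old.items()
        if key not in REMOVED_FIELDS
        and key not in RENAMED_FIELDS
        and key != "template"
    }
    if old.get("custom_prompt"):
        # The custom prompt now fully replaces the built-in template.
        new["mode"] = "custom"
        new["prompt"] = old["custom_prompt"]
        if old.get("custom_prompt_slug"):
            new["prompt_slug"] = old["custom_prompt_slug"]
    elif old.get("template"):
        new["mode"] = "builtin"
        new["template"] = old["template"]
    else:
        raise ValueError("choose a template (builtin) or a prompt (custom) explicitly")
    return new
```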


V1.5.4 (2026-05-12)

New Feature: Customer Prompt Customization for Summaries

Enterprise customers can now add their own rules to the summary API without modifying the built-in template. This release adds three orthogonal client parameters and splits the summary regeneration endpoint into "preview" and "save" verbs, avoiding the design gap of an HTTP GET with side effects.

Fully backward compatible: if you do not send the new fields, behavior is identical to the previous version.

New Fields for POST /api/v1/summary

| Field | Type | Limit | Description |
| --- | --- | --- | --- |
| custom_prompt | string | <=2000 characters | Customer custom instructions appended after the built-in template |
| custom_prompt_slug | string | <=64 characters, Unicode, no control characters | A client-defined template identifier (pass-through) |
| plain_text | bool | Default false | Request plain-text output (the backend performs Markdown post-processing) |
| persist_custom_prompt | bool | Default false | Opt-in: whether the done event echoes the custom_prompt content |

The SSE start / done events also add the corresponding fields (custom_prompt_slug, plain_text, final_content, custom_prompt_snapshot); see reference/rest/summary.md.

/api/v1/sse/regenerate/summary/{taskId} Split Into Two Endpoints

| Method | Purpose | Writes DB | Saves Transcript | Billed |
| --- | --- | --- | --- | --- |
| GET | Preview (dry run, compare different prompt results) | No | No | Yes |
| POST | Save (official persistence) | Yes | Yes + bumps revision | Yes |

Client recommendation: If your integration previously relied on "the backend record updating automatically after a GET," switch to POST. GET is now a pure preview and no longer writes any backend state.

The done event adds a persisted: bool field, so clients can determine directly from the payload whether this call was saved, without inferring from the HTTP method.
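
The preview/save split can be captured in a small helper that picks the HTTP method and then confirms the outcome from the done payload. This is a sketch with hypothetical function names; query parameters and authentication are omitted.

```python
from typing import Tuple
from urllib.parse import quote


def regenerate_summary_request(task_id: str, save: bool) -> Tuple[str, str]:
    """Return (method, path): GET for a billed preview, POST to persist the result."""
    method = "POST" if save else "GET"
    return method, f"/api/v1/sse/regenerate/summary/{quote(task_id, safe='')}"


def was_persisted(done_event: dict) -> bool:
    """Read the persisted flag from a done event instead of inferring it from the method."""
    return done_event.get("persisted") is True
```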

Four New Fields for the WebSocket start Action

summary_custom_prompt / summary_custom_prompt_slug / summary_plain_text / summary_persist_custom_prompt, mapping one-to-one to the REST endpoint fields with the same limits.

New Endpoint: GET /api/v1/summary-templates/{slug}

Exposes the built-in template's full content so enterprise customers can reference the existing baseline when integrating and then decide what to add via custom_prompt.

GET /api/v1/summary-templates also adds a ?category=summary|medical|legal|all filter and a data[].category field in the response (default summary, backward compatible).

New Error Codes

| Error Code | HTTP | Trigger Condition |
| --- | --- | --- |
| custom_prompt_too_long | 400 | custom_prompt exceeds 2000 characters |
| custom_prompt_slug_too_long | 400 | custom_prompt_slug exceeds 64 characters |
| custom_prompt_slug_invalid | 400 | custom_prompt_slug contains control characters |
| template_not_found | 404 | The template for the specified slug does not exist or is disabled |
| invalid_category | 400 | ?category= is not in the allowlist |

Behavior Changes

  • summary_text_empty / summary_text_too_long HTTP status code fix: these previously fell through to 500 because they were not explicitly mapped; this release fixes them to a semantically correct 400.
  • The POST /api/v1/summary error event details no longer includes the LLM raw error: the raw error goes only to the server log; the details returned to the client retains only the provider indicator.
  • GET preview is still billed: the LLM actually consumes tokens, so the GET endpoint cannot be free. Repeated GET calls are billed repeatedly, but they do not change DB / Blob state.

Path and Field Naming Conventions

  • customPromptSlug is a customer-defined pass-through identifier (semantically different from the existing templateSlug, which is validated for existence). In naming terms, the former is "for client traceability" and the latter is "for looking up the VAS built-in template."
  • summary_custom_prompt_slug is recorded with each summary, so you can later query which customer template a summary corresponds to.
  • custom_prompt_snapshot (opt-in) is stored in the transcript record only when the customer sets persist_custom_prompt=true; it is never written to the DB.

Security Controls

  • All endpoints require API Key authentication
  • The VAS server log does not log custom_prompt or the full transcript (it logs only the length and slug)
  • LLM error messages are sanitized (the raw error is not exposed to the client)
  • custom_prompt is fully isolated across tenants (session-scoped, no memory persistence)

Bug Fixes and Internal Improvements

  • The deprecation target version for the recording_id field of the WebSocket start action is unified to V2.0.0 (events.md previously said V1.6.0, which was inconsistent with the code comments)
  • Fixed a broken TOC anchor in sse-api.md (it pointed to the audio section, but that content has been moved to the standalone reference/sse/audio.md)
  • Improved text sanitization so CJK characters and emoji (including ZWJ sequences) are no longer mis-split or wrongly rejected
  • The summary regeneration full text now has a 100,000-character upper limit


V1.5.3 (2026-05-07)

Breaking Change: speaker_id Naming Inversion

To support speaker editing, V1.3.12 added the original_speaker_id field to preserve the original ID, but it left a design gap where "the same name means different things at different stages": for WebSocket realtime recording, speaker_id is the original ID (e.g., Guest-1), but after an SSE historical audio load, speaker_id becomes the display name (e.g., Manager Wang, with the alias applied). Frontends often picked the wrong field and passed it to PATCH /speakers/reassign.

This release performs a one-time inversion that is not backward compatible:

| Old Name | New Name | Semantics |
| --- | --- | --- |
| speaker_id (display name) | speaker_label | Display label (after alias is applied; mutable, human-readable) |
| original_speaker_id (original ID) | speaker_id | Original speaker ID (immutable, always stable) |

After the inversion, speaker_id consistently refers to the original ID in all contexts (WebSocket / SSE / REST / blob / log); the new speaker_label represents the display label after the alias is applied. Speaker editing (rename / reassign / merge) always uses speaker_id as the locating key.
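
To make the inversion concrete, the sketch below converts a pre-V1.5.3 SSE sentence payload (speaker_id as display name plus original_speaker_id) into the new field layout. The function name is hypothetical, and the fallback for a missing original_speaker_id is an assumption. Apply it only to old-format payloads, since running it on new-format data would mislabel the fields.

```python
def upgrade_sentence(old: dict) -> dict:
    """Map old speaker fields to the V1.5.3 layout without mutating the input."""
    new = dict(old)
    display = new.pop("speaker_id", None)            # old meaning: display name
    original = new.pop("original_speaker_id", None)  # old meaning: original ID
    # Assumption: if the original ID is missing, reuse the display value.
    new["speaker_id"] = original if original is not None else display
    new["speaker_label"] = display
    return new
```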

REST API Field Changes

PATCH /api/v1/tasks/{taskId}/speakers/rename

| Location | Old Field | New Field |
| --- | --- | --- |
| Request body | original_name | speaker_id (max 100 characters) |
| Request body | new_name | new_label (max 100 characters, no control characters \x00-\x1F / \x7F or newlines) |
| Response data | original_name | speaker_id |
| Response data | new_name | new_label |

speaker_id can still also accept a display label for chained renaming (e.g., first rename Guest-1 to "Manager Wang," then use "Manager Wang" to rename to "Director Wang"); the resolved response speaker_id is always the original ID.
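
The new_label limits listed for the rename request can be pre-checked before sending. A minimal sketch with a hypothetical function name; it assumes length is counted in Unicode code points.

```python
import re

# Control characters \x00-\x1F and \x7F; this range already covers \n and \r.
_CONTROL_CHARS_RE = re.compile(r"[\x00-\x1f\x7f]")


def is_valid_new_label(label: str) -> bool:
    """Return True if label fits the documented new_label limits."""
    return len(label) <= 100 and _CONTROL_CHARS_RE.search(label) is None
```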

PATCH /api/v1/tasks/{taskId}/speakers/reassign

| Location | Old Field | New Field |
| --- | --- | --- |
| Request body | target_speaker_id | Unchanged (semantics already aligned to the original ID) |
| Response data | new_speaker_name | new_speaker_label |

target_speaker_id must be the original ID (taken from init_sentence.speaker_id); reassign does not accept a display label.

PATCH /api/v1/tasks/{taskId}/speakers/merge

| Location | Old Field | New Field |
| --- | --- | --- |
| Request body | source_speaker_id / target_speaker_id | Unchanged (still accepts the original ID or the current display label) |
| Response data | target_speaker_name | target_speaker_label |

WebSocket Event Changes

| Event | Old Field | New Field |
| --- | --- | --- |
| rename_speaker action body | original_name / new_name | speaker_id / new_label |
| result event origin / translations[lang] | only speaker_id (mixed with display name) | speaker_id (original ID) + speaker_label (display label) |
| speaker_renamed event | original_name / new_name | speaker_id / new_label |
| speaker_reassigned event | new_speaker_name | new_speaker_label |
| speakers_merged event | (missing target label) | added target_speaker_label |

SSE Event Changes

| Event | Old Field | New Field |
| --- | --- | --- |
| init_sentence | speaker_id (display name) + original_speaker_id (original ID) | speaker_id (original ID) + speaker_label (display label) |
| Broadcast viewer origin / translation | only speaker_id (mixed) | speaker_id + speaker_label |
| Broadcast viewer speaker_renamed / speaker_reassigned / speakers_merged | same as the corresponding WebSocket events | as above |

The behavior and fields of init_metadata.speaker_aliases (the "original ID -> display label" mapping) are unchanged.

Client Recommendations

  • Customers using WebSocket realtime recording: before upgrading, update your handling of result.origin.speaker_id (now the original ID) and the new result.origin.speaker_label (the display label); change the rename body to { "speaker_id": "...", "new_label": "..." }
  • Customers using SSE historical audio: init_sentence.speaker_id is now the original ID (previously the display name); switch to speaker_label for display
  • Customers doing speaker editing (rename / reassign / merge):
    • rename -> use speaker_id (either the original ID or the current display label) + new_label
    • reassign -> target_speaker_id must be the original ID (taken from init_sentence.speaker_id; you cannot send a display label)
    • merge -> source_speaker_id / target_speaker_id can still be the original ID or the current display label
  • Customers integrating TXT/SRT/CSV export: new_label now has control-character/newline validation; if you previously sent labels containing newlines, you will now receive a 422, so change to single-line content
  • Customers who do not do speaker editing and only consume transcript text: the impact is minimal; the only behavior difference is that if old code rendered speaker_id directly as the display name, it must switch to speaker_label

Data Compatibility

  • Not backward compatible: old transcript blobs (V1.3.12 ~ V1.5.1, containing speaker + original_speaker_id) require migration before they can be read in the new version; there is no cross-version data retention commitment during the POC phase
  • New recordings are unaffected: transcript blobs created after V1.5.3 use the new fields directly


V1.5.1 (2026-05-07)

Bug Fix: POST /api/v1/imports Adds Length Validation for Terminology / Correction Fields

The length limits promised in several places in the documentation (e.g., a term's max of 100 characters) were previously not actually enforced on the file-import path, and overly long content was silently accepted. This release restores them, aligning behavior with the documentation's promises.

Behavior Changes (Aligning With Documented Promises)

POST /api/v1/imports adds 422 rejection conditions for the following fields (previously accepted):

| Field | Limit |
| --- | --- |
| terminology.<lang> | Array, max 500 terms (per language) |
| terminology.<lang>[].term | string, max 100 characters |
| terminology.<lang>[].boost | numeric, 0.5–5.0 (optional, default 1.0) |
| fuzzy_correction.<lang>[].correct | string, max 200 characters |
| fuzzy_correction.<lang>[].incorrect[] | string, max 200 characters |

These limits are consistent with the WebSocket config action; previously only the WebSocket path enforced them, and this release completes the file-import path.

Client Recommendations

If you previously sent overly long terms (>100 characters) via POST /api/v1/imports, you will now receive a 422. The frontend should check the length before submitting and prompt the user. The WebSocket path is unchanged.
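
A frontend pre-check along these lines could catch the 422 conditions before submitting. This is a sketch: the function name is hypothetical, the payload shape is inferred from the field paths in the limits listed above, and lengths are assumed to be counted in Unicode code points.

```python
from typing import Dict, List


def precheck_import_fields(
    terminology: Dict[str, list], fuzzy_correction: Dict[str, list]
) -> List[str]:
    """Return a list of human-readable problems; an empty list means no limit is exceeded."""
    problems: List[str] = []
    for lang, terms in terminology.items():
        if len(terms) > 500:
            problems.append(f"terminology.{lang}: more than 500 terms")
        for i, entry in enumerate(terms):
            if len(entry["term"]) > 100:
                problems.append(f"terminology.{lang}[{i}].term: longer than 100 characters")
            boost = entry.get("boost", 1.0)  # optional, defaults to 1.0
            if not 0.5 <= boost <= 5.0:
                problems.append(f"terminology.{lang}[{i}].boost: outside 0.5-5.0")
    for lang, rules in fuzzy_correction.items():
        for i, rule in enumerate(rules):
            if len(rule["correct"]) > 200:
                problems.append(f"fuzzy_correction.{lang}[{i}].correct: longer than 200 characters")
            for j, wrong in enumerate(rule.get("incorrect", [])):
                if len(wrong) > 200:
                    problems.append(
                        f"fuzzy_correction.{lang}[{i}].incorrect[{j}]: longer than 200 characters"
                    )
    return problems
```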


V1.5.0 (2026-05-07)

Internal Naming Unification (No Public API Changes)

Continuing the task_id naming unification started in V1.4.1, this release completes the transition of the internal protocol layer.

The public API (WebSocket, REST, Webhook, SSE) is completely unchanged, and customers need to take no action.

The old naming (recording_id) will be fully removed in V1.6.0; for the related client migration guidance, see V1.4.1 Client Recommendations.


V1.4.3 (2026-05-07)

Internal Observability-Layer Naming Unification (No Public API Changes)

Following V1.4.1, this release performs a naming-unification transition for the log and monitoring layers.

The public API is completely unchanged, and customers need to take no action.


V1.4.2 (2026-05-07)

Internal Code Naming Unification (No Public API Changes)

Following V1.4.1, this release advances the task_id naming to the backend code level.

The public API is completely unchanged, and customers need to take no action.


V1.4.1 (2026-05-06)

Naming Unification: task_id as the Cross-Interface Task Identifier

Previously, the same task had different field names across interfaces (WebSocket used recording_id, Webhook used task_id, and some REST path variables mixed {recordingId} / {taskId}), forcing integrators to reconcile the three naming schemes themselves. This release starts the naming-unification cycle; new integrations should use task_id consistently.

WebSocket Changes (Backward Compatible)

  • The session_started event payload now carries both task_id and recording_id, and their values are exactly the same (the UUID of the same recording)
  • The recording_id field is marked as Deprecated; it is still emitted normally and is scheduled for removal in V1.6.0
  • Documentation enhancement: session_id is the WS connection-level identifier (invalidated when the connection ends), which is a different level from task_id (the task identifier)

REST API Changes (Backward Compatible)

Added /api/v1/tasks/{taskId}/... alias paths that behave exactly the same as the existing /api/v1/recordings/{recordingId}/...:

| Recommended (from V1.4.1) | Deprecated (removed in V1.6.0) |
| --- | --- |
| PATCH /api/v1/tasks/{taskId}/speakers/rename | PATCH /api/v1/recordings/{recordingId}/speakers/rename |
| PATCH /api/v1/tasks/{taskId}/speakers/reassign | PATCH /api/v1/recordings/{recordingId}/speakers/reassign |
| PATCH /api/v1/tasks/{taskId}/entries/{sid} | PATCH /api/v1/recordings/{recordingId}/entries/{sid} |

Client Recommendations

  • New integrations: use the task_id field and the /api/v1/tasks/{taskId}/... paths consistently to avoid migrating again later
  • Existing integrations: no immediate change required. recording_id and /api/v1/recordings/... remain available throughout the V1.x period; we recommend migrating on your schedule, at the latest before V1.6.0 ships
  • ID alignment logic: if you depend on both WS and Webhook, you can align the WS task_id (or the old name recording_id) directly with the Webhook data.task_id; all three are the same UUID
  • Do not use session_id for alignment: session_id is meaningful only within the WS connection lifecycle and does not appear in Webhook or REST
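
The ID alignment described above might be implemented as follows. A sketch with hypothetical function names, assuming both payloads have already been parsed into dicts:

```python
def task_id_from_session_started(payload: dict) -> str:
    """Prefer task_id; fall back to the deprecated recording_id (same UUID)."""
    task_id = payload.get("task_id") or payload.get("recording_id")
    if not task_id:
        raise KeyError("session_started payload has neither task_id nor recording_id")
    return task_id


def same_task(session_started: dict, webhook_body: dict) -> bool:
    """Match a WebSocket session to a Webhook delivery; session_id is never used."""
    return task_id_from_session_started(session_started) == webhook_body["data"]["task_id"]
```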

Removal Timeline Announcement (V1.6.0)

V1.6.0 will remove the recording_id field from the WS payload and remove the /api/v1/recordings/{recordingId}/... paths. The detailed timeline will be announced separately before V1.6.0 ships.

Unchanged Items

  • Webhook payload: the existing data.task_id naming is unchanged
  • Existing /api/v1/tasks/{taskId}/... endpoints: unchanged

V1.4.0 (2026-05-06)

New Feature: Source-Text Editing for Historical Recordings + Automatic Retranslation

Users can correct STT recognition errors and regenerate translations; for the workflow, see Entries API Typical Workflow.

  • New endpoint PATCH /api/v1/recordings/{recordingId}/entries/{sid}: edit a single sentence's source text; on the first edit it automatically backs up the original STT output to original_text_raw, records original_text_edited_at, and clears the TTS cache for all languages of that sentence
  • New endpoint GET /api/v1/sse/recordings/{taskId}/entries/{sid}/retranslate: retranslate a single sentence (you can specify languages or retranslate all existing languages), with optimistic locking (expectedRevision)
  • Editing and retranslation are decoupled: PATCH only changes the source text and does not touch the translation; the frontend can decide when to trigger retranslation

Historical Record SSE Exposes Edit Markers

The historyTranscribe init_sentence event carries original_text_raw (the STT original) and original_text_edited_at on edited sentences, so the frontend can show an "edited" marker and a "restore original" function.

Security Fixes

  • retranslate / retranslateSummary add a user filter: these two existing SSE endpoints previously had a horizontal privilege vulnerability (IDOR) that allowed reading other users' recordings. This release adds the permission check; other users' recordings now return recording_not_found.
  • Retranslation / summary regeneration requires the recording to be completed: the four endpoints retranslate / retranslateSummary / retranslateEntry / regenerateSummary require processing_status === completed to avoid racing with the in-progress flow. When not completed, they return recording_not_completed.

New Error Codes

| Error Code | HTTP | Description |
| --- | --- | --- |
| recording_not_completed | 422 | The recording has not finished processing; retranslation / editing / summary regeneration is not allowed |
| entry_not_found | 404 | The specified sentence was not found |
| entry_text_empty | 422 | The sentence's source text is empty |
| entry_text_too_long | 422 | The sentence's source text exceeds the 2000-character limit |
| transcript_revision_conflict | 409 | The transcript has been modified by another request (optimistic-lock conflict) |

See error-codes.md.

Client Recommendations

  • After editing the STT source text: we recommend triggering single-sentence retranslation SSE immediately after the PATCH, passing the revision from the PATCH response as expectedRevision to avoid concurrent overwrites
  • Showing the edit marker: determine whether a sentence has been edited by the presence of the original_text_raw field in the init_sentence event ('original_text_raw' in data); do not use text comparison (the user may edit and then change it back to the original value)
  • Recording status: calling retranslation / editing / summary regeneration on a recording that is not completed returns recording_not_completed; the frontend should block these operations in the UI until processing_status === completed
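
The first two recommendations can be sketched as below. The function names are hypothetical, and passing expectedRevision as a query parameter is an assumption about how the SSE endpoint receives it.

```python
from urllib.parse import quote, urlencode


def is_edited(init_sentence: dict) -> bool:
    """Detect an edited sentence by field presence, not by comparing text."""
    return "original_text_raw" in init_sentence


def retranslate_entry_path(task_id: str, sid: int, expected_revision: int) -> str:
    """Build the single-sentence retranslate path using the revision from the PATCH response."""
    # Assumption: expectedRevision travels in the query string.
    query = urlencode({"expectedRevision": expected_revision})
    return (
        f"/api/v1/sse/recordings/{quote(task_id, safe='')}"
        f"/entries/{quote(str(sid), safe='')}/retranslate?{query}"
    )
```

Checking for the key rather than comparing strings keeps the "edited" marker correct even when a user edits a sentence back to its original value.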

V1.3.13 (2026-05-06)

Behavior Changes (Breaking Changes)

  • WebSocket audio_format locked to pcm and webm: the previously accepted 5 formats (pcm / webm / mp3 / wav / m4a) are narrowed to accepting only pcm and webm, consistent with the existing spec in reference/websocket/voice-translation.md. Customers who send mp3 / wav / m4a will now receive audio_format_unsupported (previously these were silently decoded, which was undocumented implicit behavior). File imports still go through POST /api/v1/imports and are unaffected.

Documentation Update

  • Audio download Content-Type is always audio/mp4: the rest-api, SSE audio, tasks export, history playback, curl, and javascript documentation now consistently states that "all recording audio is returned in an M4A container (AAC encoding)," replacing the previous circular "dynamically determined" description.
  • Supported file-import formats narrowed to mp3 / wav / m4a: removed mentions of mp4 and webm from the documentation to align with the formats actually accepted (guides/file-import.md, reference/rest/imports.md).

Client Recommendations

  • Customers using the WebSocket start action: be sure to explicitly specify audio_format as pcm or webm; if you previously relied on the undocumented implicit mp3 / wav / m4a support (very rare scenarios), switch to the File Import API.
  • Customers downloading recording audio: all new recordings have Content-Type fixed to audio/mp4 with the .m4a extension. If older recordings still exist in storage, downloads may still return audio/webm; we recommend keeping a handling branch for the old extension to cover historical data.
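
A download handler that keeps a branch for historical data might pick the file extension like this. A sketch only: the function name is hypothetical, and the .webm extension for older recordings is an assumption.

```python
def audio_extension(content_type: str) -> str:
    """Map the download Content-Type to a file extension."""
    # Ignore parameters such as "; codecs=..." and normalize case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "audio/mp4":
        return ".m4a"
    if media_type == "audio/webm":
        # Older recordings still in storage may use this type.
        return ".webm"
    raise ValueError(f"unexpected audio Content-Type: {content_type!r}")
```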


V1.3.12 (2026-05-04)

⚠️ Inverted in V1.5.3: the original_speaker_id field and the "speaker_id is the display name" design introduced in this version have been superseded by the naming inversion in V1.5.3. This section is kept as a historical record; new integrations should refer directly to the V1.5.3 spec and do not need to implement this version's client recommendations.

New Feature

  • History SSE adds fields to align with the Transcribe speaker-editing UX: the historical record's init_metadata and init_sentence events each add a field, allowing the frontend to fully reuse the realtime recording page's speaker-editing menu (single-sentence reassignment + global rename).
    • init_metadata adds speaker_aliases (object): the "original speaker ID -> display name" mapping. When there are no aliases it is {} (an empty object, not an empty array). It lets the frontend perform a name-collision precheck before sending PATCH /speakers/rename, covering the implicit conflict of "an original ID that exists on the backend but does not appear on screen because it was renamed."
    • init_sentence adds original_speaker_id (string|null): the original speaker identifier without alias substitution, provided as the source for the target_speaker_id of PATCH /speakers/reassign.
    • Old-data fallback: if a pre-v2.24.0 old transcript record has no original_speaker_id, the SSE output automatically falls back to speaker_id, so the new field is never null and the editing entry point stays enabled for old recordings.

Behavior Changes

  • No breaking change. Both fields are pure additions; existing SSE clients using Zod z.object (which strips extras by default) will not fail to parse, so no version negotiation is needed.

Documentation Update

  • sse-api.md L156-198: added the new field descriptions to the init_metadata / init_sentence examples and field tables
  • reference/sse/history.md L103-180: added the detailed reference schema accordingly

Client Recommendations

  • Customers doing speaker editing on the history detail page: get the original ID for reassign from init_sentence.original_speaker_id (do not use speaker_id, which is the display name with the alias applied); use init_metadata.speaker_aliases for the name-collision precheck before a rename.
  • Customers who do not do speaker editing: you can ignore the new fields; existing parsing behavior is unaffected.


V1.3.11 (2026-05-04)

Behavior Changes (Breaking Changes)

  • STT rejects the bare en code (the V1.3.10 changelog claimed it had been removed, but the removal had not actually taken effect): customers who send en will receive a 422 invalid_transcription_language; use a full BCP 47 code such as en-US / en-GB instead.
  • TTS removes 4 locales not supported by the speech provider: it-CH, ar-IL, ar-PS, en-GH. The speech provider's TTS never supported these 4 locales; previously, customers requesting their voices would fail at runtime on the provider side. STT still supports these 4 locales.

New Feature

  • TTS expanded to the speech provider's full set (154 languages, 325 voices): fully aligned with the speech provider's Monolingual Neural Voice list (including GA + Preview)
    • Chinese dialects (4 added): zh-CN-henan, zh-CN-guangxi, zh-CN-liaoning, zh-CN-shaanxi
    • South Asian languages (5 added): bn-BD Bengali (Bangladesh), ta-LK Tamil (Sri Lanka), ta-MY Tamil (Malaysia), ta-SG Tamil (Singapore), ur-PK Urdu (Pakistan)
    • Southeast Asian languages (1 added): su-ID Sundanese (Indonesia)
    • Eastern European languages (1 added): sr-Latn-RS Serbian (Latin script)
    • North American indigenous languages (2 added): iu-Cans-CA Inuktitut (Canadian syllabics), iu-Latn-CA Inuktitut (Canadian Latin script)

Documentation Update

  • languages.md TTS section rewritten, explicitly noting:
    • Of the 145 STT locales, 141 are supported on both the STT and TTS sides; 4 (it-CH, ar-IL, ar-PS, en-GH) are STT-only
    • Of the 154 TTS locales, 13 are TTS-only (4 zh-CN dialects + 9 other languages)
  • guides/tts.md numbers updated (142->154 languages, 304->325 voices)
  • README.md TTS description updated
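
The locale counts above can be cross-checked with simple set arithmetic. The sketch below assumes that the 13 TTS-only locales are exactly the 13 locales listed under New Feature in this release; the variable names are illustrative.

```python
STT_TOTAL = 145
TTS_TOTAL = 154

# STT-only locales named in this release.
STT_ONLY = {"it-CH", "ar-IL", "ar-PS", "en-GH"}

# TTS-only locales (assumed to be the 13 additions listed under New Feature).
TTS_ONLY = {
    "zh-CN-henan", "zh-CN-guangxi", "zh-CN-liaoning", "zh-CN-shaanxi",
    "bn-BD", "ta-LK", "ta-MY", "ta-SG", "ur-PK",
    "su-ID", "sr-Latn-RS", "iu-Cans-CA", "iu-Latn-CA",
}

shared_from_stt = STT_TOTAL - len(STT_ONLY)   # locales that also have TTS
shared_from_tts = TTS_TOTAL - len(TTS_ONLY)   # locales that also have STT

# Both directions must agree on the number of shared locales.
assert shared_from_stt == shared_from_tts == 141
```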

Verification Results

Compared against the speech provider's official STT and TTS voice list, VAS is fully aligned:

Source             STT   TTS locale   TTS voice   Diarization
Provider official  145   154          325         31
VAS                145   154          325         31

Client Recommendations

  • Customers using the en short code: switch to en-US or another full BCP 47 code.
  • Customers using it-CH/ar-IL/ar-PS/en-GH for TTS: these already failed on the provider side; switch to another locale in the same language family (e.g., it-CH -> it-IT, ar-IL -> ar-SA, en-GH -> en-NG). STT is unaffected.
  • Customers who want to use the 13 new TTS-only locales: you can call GET /api/v1/tts/voices?language=zh-CN-henan etc. directly to get the voice list.
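
A client-side migration shim for the two breaking changes might look like the sketch below. The function names and the en-US default are assumptions. The fallback map contains only the three example substitutions given above; no replacement is suggested in this changelog for ar-PS, so the sketch raises instead of guessing.

```python
# TTS locales removed in V1.3.11 (STT still accepts them).
REMOVED_TTS_LOCALES = {"it-CH", "ar-IL", "ar-PS", "en-GH"}

# Example substitutions taken from the recommendations above.
TTS_FALLBACKS = {"it-CH": "it-IT", "ar-IL": "ar-SA", "en-GH": "en-NG"}


def normalize_stt_language(code: str, default_english: str = "en-US") -> str:
    """Replace the bare 'en' code, which STT now rejects with a 422."""
    return default_english if code == "en" else code


def normalize_tts_locale(locale: str) -> str:
    """Map a removed TTS locale to a same-family fallback, if one is known."""
    if locale not in REMOVED_TTS_LOCALES:
        return locale
    if locale in TTS_FALLBACKS:
        return TTS_FALLBACKS[locale]
    # No documented fallback: let the caller choose explicitly.
    raise ValueError(f"TTS locale {locale!r} is no longer supported")
```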

Reference


V1.3.10 (2026-04-30)

Documentation Update

  • languages.md number corrections
    • Total speech-recognition languages 119 -> 145 (aligned with the main STT table)
    • Speech-translation support 117 -> 143 (145 minus jv-ID Javanese and wuu-CN Wu Chinese)

This version has a residual issue; see V1.3.11. It claimed that "the bare en was removed and the language counts are fully consistent at 145," but the bare en was not actually removed (the count was still 146). It also did not handle the four locales that the provider's TTS does not support (it-CH, ar-IL, ar-PS, en-GH) or the 13 missing TTS-only locales. The full alignment fix was completed in V1.3.11.

Client Recommendations

  • This version is a documentation-only number correction and does not affect running integrations.

Reference


V1.3.9 (2026-04-29)

New Feature

  • Webhook Secret Bootstrap flow: resolves the chicken-and-egg problem in which a client could not obtain the secret before its first webhook integration. The Dashboard adds a "Generate Webhook Secret" button (lazy generation), letting users obtain the secret and configure it on the receiving end first, then go back and set the webhook URL. The probe sent when setting the URL is signed with a secret that both sides already share, so it passes on the first try. This matches the pattern used by Stripe and Shopify.
    • New endpoint: POST /dashboard/api-keys/{id}/webhook/regenerate-secret (Dashboard only, reuses the webhook-update rate limiter, 10/min/user)
    • Behavior: generates a 64-character random secret and writes it to the DB; does not send a probe and does not touch the webhook URL; returns the plaintext once via a flash session for the Dashboard to display
    • Regeneration impact: after execution, the old secret is invalidated immediately; existing receivers will get webhooks with mismatched signatures until they switch to the new secret

Behavior Changes

  • Clearing the Webhook URL no longer clears the Secret: when PATCH /dashboard/api-keys/{id}/webhook sets webhook_url to null, webhook_secret is left unchanged. The Secret and URL now have independent lifecycles. A customer can generate the secret first and set the URL later; sending an empty URL in the meantime will not lose the secret.
  • The Dashboard no longer returns webhook_secret in plaintext: GET /dashboard/api-keys/{id} now returns webhook_secret_masked (prefix mask + last 4 characters) and a has_webhook_secret boolean. The plaintext is shown only once via a flash session right after generation (aligned with Stripe).

Documentation Update

  • guides/webhook.md: "Method 2: API Key-level webhook_url" rewritten as a two-step flow (generate secret -> set URL); added a Webhook Secret Lifecycle section; added a Bootstrap callout to the security-verification section.

Client Recommendations

  • First integration: in the Dashboard, click "Generate Webhook Secret," copy it to the receiving end's .env, enable HMAC verification, and restart the service, then go back to the Dashboard and enter the webhook URL.
  • Existing customers: fully compatible, no changes needed. Existing webhook_url and webhook_secret behavior is unchanged.
  • Secret rotation: we recommend that the receiving end briefly accept both the old and new secrets; after the dashboard regeneration, remove the old secret once in-flight webhooks have finished processing.
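
The rotation recommendation above can be sketched on the receiving end as follows. This is a sketch under stated assumptions: it assumes the signature is a hex-encoded HMAC-SHA256 of the raw request body, and the function names are hypothetical. The actual algorithm, encoding, and header name are defined in guides/webhook.md and should be taken from there.

```python
import hashlib
import hmac
from typing import Iterable


def sign(secret: str, body: bytes) -> str:
    """Compute a hex HMAC-SHA256 of the raw body (assumed signature scheme)."""
    return hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()


def verify_webhook(body: bytes, signature: str, secrets: Iterable[str]) -> bool:
    """Accept the webhook if the signature matches any currently valid secret.

    During rotation, pass both the old and the new secret; drop the old one
    once in-flight webhooks have been processed.
    """
    received = signature.encode("utf-8")
    return any(
        # compare_digest avoids leaking match position through timing.
        hmac.compare_digest(sign(secret, body).encode("utf-8"), received)
        for secret in secrets
    )
```

Keeping the secrets in a list makes the rotation window a configuration change rather than a code change.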

V1.3.8 (2026-04-27)

New Feature

  • Translation-service-unavailable detection (session-level): added the error code translation_service_unavailable. When the LLM translation service fails consecutively up to a threshold, the backend emits a session-level error event once, so the frontend can show a global "translation temporarily unavailable" prompt instead of users seeing a page full of individual failed sentences in gray text.
    • Trigger conditions:
      • llm_timeout / llm_provider_error / llm_rate_limit / llm_request_failed escalate after 5 consecutive failures
      • llm_auth_failed / llm_deployment_not_found / llm_quota_exceeded escalate immediately after 1 occurrence (configuration/billing issues)
      • llm_content_filtered is not counted (a content issue, not a service issue)
    • Deduplication: the event is emitted only once per failure streak within a session; any successful sentence translation resets the count, after which the event can be triggered again
    • payload: type: "error", severity: "error" (not fatal — should not disconnect), does not carry sid, details contains provider, last_error_code, fail_count
    • Viewer notification: in broadcast mode, all viewers (regardless of language) also receive this event (via the SSE event: error channel)
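
The escalation rules above can be modeled as a small state machine. This is an illustrative sketch, not the backend implementation: the class and method names are hypothetical, and it assumes that llm_content_filtered neither increments nor resets the consecutive-failure count.

```python
# Error codes that escalate after a run of consecutive failures.
THRESHOLD_CODES = {
    "llm_timeout", "llm_provider_error", "llm_rate_limit", "llm_request_failed",
}
# Configuration/billing errors that escalate on the first occurrence.
IMMEDIATE_CODES = {
    "llm_auth_failed", "llm_deployment_not_found", "llm_quota_exceeded",
}
FAILURE_THRESHOLD = 5


class TranslationFailureTracker:
    """Decides when to emit translation_service_unavailable for one session."""

    def __init__(self) -> None:
        self.fail_count = 0
        self.notified = False

    def on_success(self) -> None:
        # Any successful sentence translation resets the streak and re-arms
        # the notification.
        self.fail_count = 0
        self.notified = False

    def on_failure(self, error_code: str) -> bool:
        """Return True if the session-level error should be emitted now."""
        if error_code not in THRESHOLD_CODES and error_code not in IMMEDIATE_CODES:
            return False  # e.g. llm_content_filtered: a content issue, not counted
        self.fail_count += 1
        escalate = (
            error_code in IMMEDIATE_CODES or self.fail_count >= FAILURE_THRESHOLD
        )
        if escalate and not self.notified:
            self.notified = True  # deduplicate until the next success
            return True
        return False
```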

Documentation Update (Spec Sync)

Continuing to close the spec gaps surfaced by frontend feedback since V1.3.7, this pass completes the following:

  • error-codes.md — sentence-level error rule: added a sid-rule paragraph below the "Severity Levels" table, explicitly stating that "when an error carries sid, regardless of severity, it should be treated as a sentence-level error and should not disconnect." A fatal + sid combination only means that sentence failed severely; the session as a whole can still continue.
  • error-codes.md: registered the translation_service_unavailable error code, adding it and its full trigger-rule description to the "Translation Service Errors" section
  • websocket-api.md: added a session-level translation error example (no sid, severity error) to the "Error Message Format" section
  • sse-api.md — retranslate section adds the per-sid error rule: explicitly lists the spec and payload format for "a failed sentence is re-emitted as event: error with sid + error_code, interleaved with translation" (implemented in V1.3.7 but documented only in the reference subdirectory)
  • reference/sse/broadcast-viewer.md: added a translation_service_unavailable example and a specific error-code entry
  • reference/websocket/events.md: removed an obsolete translation_error action that the service never actually emitted; translation errors are delivered on the standard error channel (type: "error")

Client Recommendations

  • Existing sentence-level error handling (type: "error" with sid) needs no changes.
  • If you want to show a global "translation service unavailable" prompt, add a listener: when you receive error_code === "translation_service_unavailable" (without sid), show a banner / toast; clear it once any subsequent sentence translation succeeds (you receive a translation event again).
  • Do not treat translation_service_unavailable as a disconnect signal — STT (the source text) continues to operate.
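
The listener described above reduces to a small pure function over incoming messages. This sketch assumes each message is already parsed into a dict with type, error_code, and an optional sid, and that a successful translation arrives with type "translation"; the function name is hypothetical.

```python
def next_banner_state(banner_visible: bool, message: dict) -> bool:
    """Return whether the global 'translation unavailable' banner should show."""
    msg_type = message.get("type")
    if msg_type == "error":
        is_session_level = message.get("sid") is None
        if (
            is_session_level
            and message.get("error_code") == "translation_service_unavailable"
        ):
            return True  # show the banner, but keep the connection open
        # Sentence-level errors (with sid) never change the global banner.
        return banner_visible
    if msg_type == "translation":
        return False  # translation works again: clear the banner
    return banner_visible  # STT and other messages leave the banner as-is
```

Keeping the decision free of side effects makes it easy to reuse for both the WebSocket and the SSE viewer channel.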

Reference:


V1.3.7 (2026-04-24)

Behavior Changes

  • Realtime recording: silent tasks now follow the normal completion flow: when a realtime recording (WebSocket) is silent throughout, contains only noise, or yields no recognizable sentence, it now still produces an empty transcript (entries: []) and ends with a task_complete event. This aligns with the V1.3.5 file-import flow; the realtime and import sources now share the same semantics, in which zero recognition results is a legitimate completed status.
  • SSE historical record: no longer returns sse_transcript_not_found in silent scenarios: GET /api/v1/sse/history/transcribe/{taskId} no longer returns the sse_transcript_not_found error for silent tasks; instead it sends the full event sequence (init_metadata → init_summary(text='') → init_done(totalSentences=0)). Clients should use totalSentences === 0 to detect this and show a "no speech content" empty state.

Bug Fixes

  • Fixed the History page getting stuck on "processing" for silent recordings: previously, if a realtime recording was silent throughout, the backend skipped the transcript upload due to the segmentsCount == 0 condition, but task_complete still sent the task_id, causing the frontend to receive sse_transcript_not_found (semantically "not finished processing") when loading the historical record, leaving the UI stuck on loading forever. After the fix, the realtime path matches the import path and always uploads the transcript (even with empty entries).

Client Recommendations

  • If you previously had "retry / polling" handling logic for sse_transcript_not_found, you may keep it as a defensive fallback (e.g., for blob upload delays), but you should no longer use it to determine "the task has no speech" — switch to init_done.totalSentences === 0.
  • We recommend the UI prompt possible reasons when totalSentences === 0 (volume too low, silent throughout, recognition language does not match the audio), consistent with the V1.3.5 import-scenario wording.
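
The detection logic recommended above can be sketched as a reducer over the history SSE stream. It assumes events have already been parsed into (event name, payload) pairs and that the not-found case arrives as an error event carrying error_code; the function name and the returned state labels are hypothetical.

```python
from typing import Iterable, Tuple


def history_view_state(events: Iterable[Tuple[str, dict]]) -> str:
    """Map a history SSE event sequence to a UI state.

    Returns "empty" (no speech content), "ready", "retry", or "loading".
    """
    for name, payload in events:
        if name == "error" and payload.get("error_code") == "sse_transcript_not_found":
            # Kept only as a defensive fallback, e.g. for blob upload delays.
            return "retry"
        if name == "init_done":
            # totalSentences == 0 is the signal for "the task has no speech".
            return "empty" if payload.get("totalSentences") == 0 else "ready"
    return "loading"  # stream has not reached init_done yet
```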

Documentation Update

  • Historical Record SSE adds a "Boundary scenario: no speech content" section and corrects the handling-recommendation description for sse_transcript_not_found

Reference:


V1.3.6 (2026-04-23)

New Feature

  • Tasks API: added POST /api/v1/tasks/{taskId}/force-fail: forcibly marks a task stuck in a non-terminal state (recording / importing / uploading / pending / processing) as failed
    • The body can optionally include reason (max 500 characters)
    • Triggers the recording.failed webhook, with payload.failure_source set to user_forced
    • A task already in a terminal state returns invalid_processing_status (422)
  • Tasks API: added POST /api/v1/tasks/{taskId}/retry: re-queues a task in the failed state for processing
    • Prerequisites: processing_status = failed and audio_status = success and transcript_status = success
    • Not meeting the prerequisites returns invalid_processing_status (422); the details field carries audio_status / transcript_status to help with diagnosis
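
A client can pre-check both endpoints before calling them, to avoid a predictable 422. The sketch below assumes the task object exposes processing_status, audio_status, and transcript_status as plain strings; the function names are hypothetical, and the server remains the source of truth.

```python
# Non-terminal states from which force-fail is accepted.
NON_TERMINAL_STATES = {"recording", "importing", "uploading", "pending", "processing"}

# Field values a task must have before retry is accepted.
RETRY_PREREQUISITES = {
    "processing_status": "failed",
    "audio_status": "success",
    "transcript_status": "success",
}


def can_force_fail(task: dict) -> bool:
    """True if the task is in a non-terminal state."""
    return task.get("processing_status") in NON_TERMINAL_STATES


def retry_blockers(task: dict) -> dict:
    """Return the fields that block retry; an empty dict means eligible."""
    return {
        field: task.get(field)
        for field, required in RETRY_PREREQUISITES.items()
        if task.get(field) != required
    }
```

Returning the offending fields mirrors what the details field of invalid_processing_status carries, so the same UI message can serve both the local check and the server response.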

Behavior Changes

  • Error code invalid_processing_status (422) expanded scope: now also used as the common response for force-fail and retry; details carries current_status, and the retry scenario additionally carries audio_status and transcript_status

Documentation Update

  • Tasks API adds documentation for the force-fail and retry endpoints
  • Error Code Reference adds the invalid_processing_status entry and a "Processing Status Mismatch" subsection

Reference:


V1.3.5 (2026-04-22)

Behavior Optimization

  • File import: empty recognition-result filtering: audio imports now filter out empty recognition results (caused by silence, very low volume, noise, or a language mismatch), so imports no longer produce empty 00:00 placeholder segments
  • Zero recognition results is a legitimate completed status: in this scenario the import task still ends with status: completed (not failed) and the task_id is produced normally, but the transcript loaded afterward has an empty entries array and a segments_count of 0
  • Budget deducted by actual duration: unrecognizable audio is still deducted from the monthly budget based on the audio duration (no refund)

Client Recommendations

  • After loading the transcript (SSE /api/v1/sse/history/transcribe/{taskId}), if the cumulative sentence count is 0, show a "no speech content was recognized in this audio" empty state
  • Do not treat zero recognition results as an error branch; follow the completion branch and judge by the sentence count
  • We recommend the UI also prompt possible reasons (volume too low, silent throughout, recognition language does not match the audio)

Documentation Update

  • File Import Guide adds a "Behavior When Audio Cannot Be Recognized" section
  • Imports API adds a completed boundary-scenario note under the status transition
  • Import Progress SSE adds a behavior note for zero recognition results under the completed event

Reference:


V1.3.4 (2026-04-22)

New Feature

  • Tasks API: added GET /api/v1/tasks/{taskId}/transcript/export: download a task's transcript, supporting five formats: txt, srt, sbv, vtt, csv
    • The output includes the source text and all translation languages
    • CSV starts with a UTF-8 BOM, with columns index,start,end,speaker,text,<one column per translation language> and times in HH:MM:SS (no milliseconds)
    • SRT times are HH:MM:SS,mmm; SBV times are H:MM:SS.mmm, with the source text and translations joined into a single line with |; VTT uses the WEBVTT header
    • The filename uses {recording name}-transcript.{ext} (RFC 5987 UTF-8 encoded)
    • Added the error code recording_transcript_not_ready (422)
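
The three timestamp layouts listed above differ only in padding and separators. The sketch below illustrates them for readers who parse or re-generate exported files. It assumes offsets are given as integer milliseconds and that the CSV form truncates (rather than rounds) the millisecond part; the function names are hypothetical and this is not the service's own formatter.

```python
from typing import Tuple


def _split_ms(total_ms: int) -> Tuple[int, int, int, int]:
    """Split a millisecond offset into hours, minutes, seconds, milliseconds."""
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1_000)
    return hours, minutes, seconds, millis


def srt_time(total_ms: int) -> str:
    """SRT: HH:MM:SS,mmm (comma before milliseconds)."""
    h, m, s, ms = _split_ms(total_ms)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def sbv_time(total_ms: int) -> str:
    """SBV: H:MM:SS.mmm (hours not zero-padded, dot before milliseconds)."""
    h, m, s, ms = _split_ms(total_ms)
    return f"{h}:{m:02d}:{s:02d}.{ms:03d}"


def csv_time(total_ms: int) -> str:
    """CSV: HH:MM:SS with milliseconds dropped (truncation is an assumption)."""
    h, m, s, _ = _split_ms(total_ms)
    return f"{h:02d}:{m:02d}:{s:02d}"
```

For an offset of 3,723,456 ms (1 h 2 min 3.456 s), these produce 01:02:03,456 for SRT, 1:02:03.456 for SBV, and 01:02:03 for CSV.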

Behavior Changes (Breaking)

  • Speaker diarization and multi-language mutual exclusion is now a hard rejection: when recognition_mode: multi_speaker is combined with multiple transcription_languages, it previously emitted a warning and automatically truncated to the first language; it now directly returns the diarization_multilang_conflict error and refuses to start
    • The error severity is changed from warning to error
    • The frontend must restrict "speaker diarization" and "multi-language" to one or the other before the user submits start, or handle this error and guide the user to adjust the settings
    • Affected endpoint: WebSocket voice-translation / start
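
The frontend-side restriction can be enforced with a pre-submit check like the sketch below. It assumes the start payload is a dict with recognition_mode and a transcription_languages list; the function name is hypothetical, and the server-side rejection still applies regardless of this check.

```python
from typing import Optional


def start_conflict(config: dict) -> Optional[str]:
    """Return the error code the start action would be rejected with, if any."""
    languages = config.get("transcription_languages") or []
    if config.get("recognition_mode") == "multi_speaker" and len(languages) > 1:
        # Previously a warning with truncation to the first language;
        # now a hard rejection.
        return "diarization_multilang_conflict"
    return None
```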

Documentation Update

Reference:


V1.3.3 (2026-04-21)

New Documentation

  • Tasks API: added the full documentation for the GET /api/v1/tasks/{taskId}/audio/export endpoint (the implementation existed but the documentation was missing), including parameters, dynamic Content-Type, error codes, and a frontend download example
  • Explained the difference between this endpoint and SSE /api/v1/sse/audio/{taskId}: the former is for offline download (Content-Disposition: attachment), the latter is for playback (supports Range Requests)

Documentation Fixes

  • Fixed the Voice Translation Actions translation-mode speakers field description table: the field name is corrected from speaker to id, consistent with the JSON example and the actual service behavior
  • Fixed the README API reference table endpoint counts: Tasks from 7 to 8 (added audio/export), Broadcasts from 9 to 6 (the original count was wrong)

Reference:


V1.3.2 (2026-04-07)

Documentation Structure Adjustment

  • Removed 3 deprecated old documents (error-codes.md V0.6, languages.md V0.1, authentication.md V0.1)
  • Moved appendix/error-codes.md and appendix/languages.md to the root directory, replacing the deprecated versions
  • Updated all cross-reference links

V1.3.1 (2026-03-26)

Batch Task Management

  • Added PUT /api/v1/tasks/batch/pin: batch-update pin status, max 100 per call
  • Added DELETE /api/v1/tasks/batch: batch-delete tasks, max 100 per call
  • Both endpoints affect only tasks belonging to the current user; the response includes affected_count

Batch Broadcast Cancellation

  • Added DELETE /api/v1/broadcasts/batch: batch-cancel broadcasts in the PENDING state, max 100 per call
  • IDs not in the PENDING state are ignored; the response includes affected_count
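
Because each batch call accepts at most 100 IDs, a client with a larger selection needs to split it into several requests. A minimal chunking sketch follows; the helper name is hypothetical, and issuing the actual HTTP requests is left to the caller.

```python
from typing import Iterator, List, Sequence

MAX_BATCH_SIZE = 100  # per-call limit stated for the batch endpoints


def chunk_ids(ids: Sequence[str], size: int = MAX_BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of ids, each no longer than size."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])
```

Summing affected_count across the responses gives the total for the whole selection.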

Reference:


Version: V1.5.7 Last Updated: 2026-05-20

Copyright © 2026