Summary Customization

Overview
Prerequisites
Two Modes
Three Entry Points
Transcript Record Fields
Handling Sensitive Words and Profanity
Security and Limits
Complete Examples
Error Codes
Related Documents

Overview

VAS summary generation supports two mutually exclusive modes:

builtin — Applies a VAS built-in template (scenarios such as meetings, interviews, courses, and medical consultations). You only need to specify a template slug.
custom — You provide a complete prompt that replaces the built-in template rules. The system still automatically handles output language and plain-text post-processing.

Across the three entry points (REST / WebSocket / SSE), both modes share the same request field semantics; only the field naming differs (WebSocket adds a summary_ prefix, and SSE uses camelCase).

When to Use Each Mode

Mode	Use Case
`builtin`	General meetings, interviews, courses, medical consultations, and other scenarios that can directly use a built-in template
`custom`	You have your own prompt rules (specific output fields, custom formats, domain-specific structures, etc.) and need to fully customize the summary generation behavior

Mutual Exclusion Rules

Mode	`template`	`prompt` / `prompt_slug`
`builtin`	Required	Not allowed
`custom`	Not allowed	Required (both are required)

Violating the mutual exclusion rules returns summary_mode_field_mismatch (REST 400 / SSE 422).
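
The rules in the table above can be checked on the client before a request is sent. The following is a minimal sketch, not the server's implementation: the function name check_mode_fields is hypothetical, it uses the REST field names, and treating a missing mode as invalid is an assumption.

```python
from typing import Optional


def check_mode_fields(payload: dict) -> Optional[str]:
    """Return the error code implied by the mutual exclusion rules, or None if the payload passes."""
    mode = payload.get("mode")
    has_template = bool(payload.get("template"))
    has_prompt = bool(payload.get("prompt"))
    has_slug = bool(payload.get("prompt_slug"))

    # Assumption: a missing mode is treated like an unknown mode.
    if mode not in ("builtin", "custom"):
        return "summary_invalid_mode"
    if mode == "builtin":
        # builtin: template is required; prompt / prompt_slug are not allowed.
        if not has_template or has_prompt or has_slug:
            return "summary_mode_field_mismatch"
    else:
        # custom: prompt and prompt_slug are both required; template is not allowed.
        if has_template or not (has_prompt and has_slug):
            return "summary_mode_field_mismatch"
    return None
```

A custom request that omits prompt_slug, for example, yields summary_mode_field_mismatch locally, so the round trip can be skipped.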

Prerequisites

2. (builtin path) Browse Available Templates

Call GET /api/v1/summary-templates to retrieve the list of built-in templates (supports ?category=summary|medical|legal|all filtering).

When you need to reference the design of a specific template, call GET /api/v1/summary-templates/{slug} to retrieve the template content as a design reference.
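
As an illustration of the two template endpoints, the sketch below only builds the request URLs. The host is the one used by the examples in this guide; the helper names are hypothetical, and the response shape is not described here, so no parsing is shown.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://vas-poc.vurbo.ai"  # host taken from the examples in this guide
CATEGORIES = ("summary", "medical", "legal", "all")


def template_list_url(category: str = "all") -> str:
    """Build the URL for listing built-in templates, filtered by category."""
    if category not in CATEGORIES:
        raise ValueError(f"category must be one of {CATEGORIES}")
    return f"{BASE_URL}/api/v1/summary-templates?{urlencode({'category': category})}"


def template_detail_url(slug: str) -> str:
    """Build the URL for fetching one template by slug (slug is percent-encoded)."""
    return f"{BASE_URL}/api/v1/summary-templates/{quote(slug, safe='')}"
```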

3. (custom path) Design Your Own Prompt

When writing your prompt, you must include the following yourself:

Output language requirement (e.g., "Please output in Traditional Chinese")
Structure and field descriptions (which information to extract, section order)
Output format requirement (JSON / bullet points / paragraphs)

In custom mode, your prompt replaces the built-in template rules, so you must include these elements in the prompt yourself.
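
One way to make sure none of the three elements is forgotten is to assemble the prompt from explicit parts. This is a sketch under assumptions: build_custom_prompt and its parameters are hypothetical names, the sentence layout is just one possible wording, and the 2000-character check mirrors the limit listed under Character and Length Limits.

```python
from typing import Sequence


def build_custom_prompt(
    role: str,
    fields: Sequence[str],
    output_format: str,
    language_instruction: str,
) -> str:
    """Assemble a prompt that states the fields, the output format, and the output language."""
    numbered = " ".join(f"{i}) {name}" for i, name in enumerate(fields, start=1))
    prompt = (
        f"{role} Extract from the transcript: {numbered}. "
        f"Output format: {output_format}. {language_instruction}"
    )
    # The guide limits prompt to 2000 characters; len() counts code points here.
    if len(prompt) > 2000:
        raise ValueError("prompt exceeds the 2000-character limit")
    return prompt
```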

Two Modes

builtin mode

{
  "mode": "builtin",
  "template": "meeting",
  "language": "zh-TW",
  "plain_text": true
}

Effect:

Generates the summary using the built-in template for the specified slug
The template slug is written to the backend record
The backend record is written with summary_mode: "builtin" and summary_template: <slug> (retrievable via the init_summary event)

custom mode

{
  "mode": "custom",
  "prompt": "你是皮膚科專科助手。請從逐字稿萃取:1) 主訴 2) Fitzpatrick 分型 3) 過敏原史。輸出 JSON。",
  "prompt_slug": "skin-clinic-acme-v2",
  "language": "zh-TW",
  "plain_text": true
}

Effect:

Does not query a built-in template; your prompt replaces the template rules
The prompt_slug is written to the backend record (pass-through, for historical lookup)
The original prompt text is always snapshotted into the summary_prompt_snapshot field (the sole basis for reconstruction, retrievable via the init_summary event)

It is recommended that prompt_slug include a version (e.g., acme-v1 → acme-v2) so integrators can trace it.
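
If slugs follow the -vN suffix convention shown above, the next version can be derived mechanically. The helper below is a hypothetical sketch; the suffix convention is a recommendation in this guide, not something the API enforces.

```python
import re


def bump_slug_version(slug: str) -> str:
    """Return the slug with its trailing -vN version incremented, or with -v1 appended."""
    match = re.fullmatch(r"(.*-v)(\d+)", slug)
    if match is None:
        # No version suffix yet: start at v1.
        return f"{slug}-v1"
    prefix, version = match.groups()
    return f"{prefix}{int(version) + 1}"
```

The result still has to respect the 64-character limit on prompt_slug.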

Three Entry Points

The three entry points share the same semantics but name the fields differently:

Entry Point	Field Prefix	Purpose
REST `POST /api/v1/summary`	(no prefix) `mode` / `template` / `prompt` / `prompt_slug`	Generate a summary for any transcript
WebSocket `start` action	`summary_*`	Automatically generated after a live recording ends
SSE `regenerate/summary`	(no prefix; SSE uses camelCase) `mode` / `template` / `prompt` / `promptSlug`	Regenerate a summary for an existing recording
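
The naming differences in the table can be handled with one canonical settings object that is renamed per entry point. This is a minimal sketch: to_entry_point_fields is a hypothetical helper, the canonical form is assumed to use the REST names, and applying camelCase to every SSE field (including language) is an assumption extrapolated from promptSlug and plainText.

```python
def _to_camel(name: str) -> str:
    """Convert snake_case to camelCase, e.g. prompt_slug -> promptSlug."""
    head, *tail = name.split("_")
    return head + "".join(part.capitalize() for part in tail)


def to_entry_point_fields(settings: dict, entry_point: str) -> dict:
    """Rename REST-style summary settings for the given entry point."""
    if entry_point == "rest":
        return dict(settings)
    if entry_point == "websocket":
        # WebSocket `start` action: every field gets the summary_ prefix.
        return {f"summary_{key}": value for key, value in settings.items()}
    if entry_point == "sse":
        # SSE regenerate/summary: no prefix, camelCase names.
        return {_to_camel(key): value for key, value in settings.items()}
    raise ValueError(f"unknown entry point: {entry_point}")
```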

REST `POST /api/v1/summary`

Generates a summary for any transcript (including external sources); the response is sent segment by segment as an SSE stream.

Full specification: reference/rest/summary.md

curl -N -X POST "https://vas-poc.vurbo.ai/api/v1/summary" \
  -H "Authorization: Bearer vas_..." \
  -H "Content-Type: application/json" \
  -d '{
    "content": "...transcript...",
    "mode": "custom",
    "prompt": "你是會議分析助手...",
    "prompt_slug": "acme-meeting-v1",
    "language": "zh-TW",
    "plain_text": true
  }'

⚠️ This endpoint authenticates with Authorization: Bearer <api_key>, which differs from the X-API-Key used by other REST endpoints.

WebSocket `start` action

The start action of a live recording carries the summary settings; after the session ends, the summary is generated automatically and you are notified via the summary_done event.

Full specification: reference/websocket/voice-translation.md

{
  "type": "voice-translation",
  "data": {
    "action": "start",
    "transcription_languages": ["zh-TW"],
    "translation_languages": ["en-US"],
    "type": "transcribe",
    "audio_format": "pcm",
    "summary_mode": "custom",
    "summary_prompt": "你是會議分析助手...",
    "summary_prompt_slug": "acme-meeting-v1",
    "summary_language": "zh-TW",
    "summary_plain_text": true
  }
}

Summary completion event:

{
  "type": "summary_done",
  "data": {
    "summary_mode": "custom",
    "summary_template": "acme-meeting-v1",
    "summary_plain_text": true,
    "tokens_used": 890,
    "summary_fallback_level": null,
    "summary_dropped_segments": null
  }
}

summary_template is the effective slug — in custom mode it returns your slug (i.e., the summary_prompt_slug you submitted).

SSE `regenerate/summary`

Regenerates a summary for an existing recording, split into two endpoints:

Method	Purpose	Writes DB	Stores Transcript	Billed
`GET /api/v1/sse/regenerate/summary/{taskId}`	Preview (trial run, compare different prompt results)	❌	❌	✅
`POST /api/v1/sse/regenerate/summary/{taskId}`	Persist (officially saved)	✅	✅ + bump `revision`	✅

Full specification: reference/sse/regenerate-summary.md

# Preview
curl -N "https://vas-poc.vurbo.ai/api/v1/sse/regenerate/summary/550e8400-...?mode=custom&prompt=...&promptSlug=acme-v2&plainText=true" \
  -H "X-API-Key: vas_..."

# Persist
curl -N -X POST "https://vas-poc.vurbo.ai/api/v1/sse/regenerate/summary/550e8400-..." \
  -H "X-API-Key: vas_..." \
  -H "Content-Type: application/json" \
  -d '{
    "mode": "custom",
    "prompt": "...",
    "promptSlug": "acme-v2",
    "plainText": true
  }'

The done event carries persisted: bool — the client can determine directly from the payload whether this run was saved, without inferring it from the HTTP method.
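
Reading persisted from the stream might look like the sketch below. It assumes the stream uses the same "event: name" / "data: JSON" framing as the REST example in this guide; the helper names are hypothetical, and no payload field other than persisted is assumed.

```python
import json
from typing import Iterator, Tuple


def iter_sse_events(text: str) -> Iterator[Tuple[str, dict]]:
    """Yield (event_name, data) pairs from SSE text whose data lines carry JSON."""
    for block in text.split("\n\n"):
        event_name, data_lines = "message", []
        for line in block.splitlines():
            if line.startswith("event:"):
                event_name = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data_lines.append(line[len("data:"):].strip())
        if data_lines:
            yield event_name, json.loads("\n".join(data_lines))


def was_persisted(text: str) -> bool:
    """Return the persisted flag of the first done event, or False if there is none."""
    for name, data in iter_sse_events(text):
        if name == "done":
            return bool(data.get("persisted"))
    return False
```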

Transcript Record Fields

After the summary completes, the following fields are written to the top-level of the transcript record (not nested under the summary object):

Field	When Present	Description
`summary_mode`	Always	`"builtin"` / `"custom"`
`summary_template`	Always	effective slug — builtin → built-in slug; custom → your slug
`summary_plain_text`	Always	bool
`summary_prompt_snapshot`	custom mode only	The `prompt` content you passed in verbatim (forcibly snapshotted; the sole basis for reconstruction)
`summary_fallback_level`	Only when fallback is triggered	Value is `2` or `3`, indicating which content-filtering fallback path this summary actually took. Omitted when standard mode succeeds directly
`summary_dropped_segments`	Only when `summary_fallback_level=3`	The indices of the omitted transcript segments (integer array in original order)

The init_summary event of GET /api/v1/sse/history/transcribe/{taskId} also carries the above fields (prompt_snapshot only in custom mode; fallback_level / dropped_segments only when fallback is triggered).

Roles of the Two Fields

summary_prompt_snapshot = your intent (the original prompt content)
summary_fallback_level = the actual execution path (standard / neutral / segment omission)

The two fields are complementary: the snapshot can be used for auditing and reconstruction; fallback_level can be used to inform the user of what actually happened during that summary.

Handling Sensitive Words and Profanity

VAS handles sensitive words (profanity, emotional or sensitive content) via three different paths depending on the "source."

The API layer does not proactively reject requests containing sensitive words. Instead, they are handled at runtime by downgrading or masking. If you want to block them beforehand, perform your own keyword checks in the frontend UI (see Frontend Pre-Validation (Optional)).

Quick Reference

Source	Trigger Mechanism	Is a Summary Generated?	How the Client Is Informed
The customer `prompt` itself contains sensitive words	Automatically enters neutral mode	✅	`summary_done.summary_fallback_level === 2`
Transcript text (STT layer)	`options.profanity_handling` masks / removes	✅	The transcript text is already processed; no additional notification
Transcript text (summary layer)	Automatically omits the triggering segments	✅	`summary_done.summary_fallback_level === 3`
All fallback paths fail	`summary_error`	❌	`error_code: llm_content_filtered`

Source 1: Customer prompt contains sensitive words → neutral mode

When the system detects that the customer prompt content triggers content filtering:

Behavior
Automatically generates the summary using neutral instructions instead — does not reject, does not error
The customer's original `prompt` is still stored as a `summary_prompt_snapshot` for auditing
The `summary_done` event carries `summary_fallback_level: 2` to notify the client

Suggested UI message: "Your custom instructions contain terms that cannot be processed due to filtering; the summary was generated in neutral mode."

Source 2: Transcript text (STT-layer masking)

The options.profanity_handling field of the WebSocket start action controls how profanity in the speech recognition output is handled:

Value	Behavior
`mask` (default)	Profanity is masked with `***`
`removed`	Profanity is removed from the transcript
`raw`	The original text is kept unprocessed

See the options field in WebSocket - Voice Translation for details.

This option only affects STT output (the transcript text). It has nothing to do with sensitive words in the customer prompt.
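
A start message that sets this option could be built as follows. This is a sketch under a loud assumption: the guide names the field options.profanity_handling but does not show where options sits, so placing it inside data is a guess. Check the WebSocket - Voice Translation reference before relying on it. The helper name is hypothetical.

```python
import json

PROFANITY_MODES = ("mask", "removed", "raw")  # values from the table above


def build_start_message(profanity_handling: str = "mask") -> str:
    """Serialize a start action with the chosen profanity handling."""
    if profanity_handling not in PROFANITY_MODES:
        raise ValueError(f"profanity_handling must be one of {PROFANITY_MODES}")
    message = {
        "type": "voice-translation",
        "data": {
            "action": "start",
            "transcription_languages": ["zh-TW"],
            "translation_languages": [],
            "type": "transcribe",
            "audio_format": "pcm",
            # Assumption: `options` is nested inside `data`; not confirmed by this guide.
            "options": {"profanity_handling": profanity_handling},
        },
    }
    return json.dumps(message, ensure_ascii=False)
```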

Source 3: Transcript text (summary-layer segment omission) → segment omission mode

If the STT-layer profanity_handling is set to raw (or filtering is still triggered after masking), the system automatically omits the segments that trigger filtering during summary generation:

Behavior
The `summary_done` event carries `summary_fallback_level: 3`
`summary_dropped_segments` lists the indices of the omitted transcript segments (integer array in original order)
The client can use this to inform the user which segments were actually omitted

Suggested UI message: "The transcript contains N segments that cannot be processed; the summary was generated after omitting the related content." (N = summary_dropped_segments.length)
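
To show the user which segments were left out, the indices can be mapped back onto the client's copy of the transcript. The sketch below assumes the indices are zero-based positions into a segment list the client already holds; this guide does not state the index base, so verify it against real data. Both function names are hypothetical.

```python
from typing import List, Optional, Sequence


def omitted_segments(
    segments: Sequence[str], dropped_indices: Optional[Sequence[int]]
) -> List[str]:
    """Return the transcript segments named by summary_dropped_segments."""
    if not dropped_indices:
        return []
    # Assumption: indices are zero-based; out-of-range values are skipped.
    return [segments[i] for i in dropped_indices if 0 <= i < len(segments)]


def omission_notice(dropped_indices: Optional[Sequence[int]]) -> str:
    """Build the suggested UI message, or an empty string when nothing was omitted."""
    count = len(dropped_indices or [])
    if count == 0:
        return ""
    return (
        f"The transcript contains {count} segments that cannot be processed; "
        "the summary was generated after omitting the related content."
    )
```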

Total Failure → llm_content_filtered

When the customer prompt and the transcript both contain excessive sensitive content, and automatic downgrading still cannot produce a valid summary:

Behavior
Triggers the `summary_error` event with `error_code: llm_content_filtered`
The summary will not be saved

Suggested UI message: "A summary cannot be generated for this content (restricted by content-filtering rules)."

Billing Impact

When a fallback is triggered, the system may need to call the LLM service multiple times, and all calls are recorded in usage_logs for billing. We recommend treating "summaries that trigger a fallback" as normal but higher token-cost events.

Specification Coverage

Path	Fallback Integration Status
WebSocket realtime summary (auto-generated after recording ends)	✅ Since V1.5.5
File import summary	✅ Since V1.5.5
SSE `regenerate/summary` endpoint	⚠️ Not yet integrated — still returns `llm_content_filtered` directly when content filtering is triggered. If you need automatic downgrading, we recommend using the `POST /api/v1/summary` REST endpoint instead

Corresponding Client Implementation

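// Appends a notice to the summary text based on summary_fallback_level.
function renderSummary(summary, fallbackLevel, droppedSegments) {
  switch (fallbackLevel) {
    case 2:
      return summary + '\n\n(Your custom instructions contained filtered terms; the summary was generated in neutral mode)';
    case 3: {
      // summary_dropped_segments may be null or absent; treat that as zero segments.
      const count = (droppedSegments ?? []).length;
      return summary + `\n\n(The transcript contained ${count} segments that could not be processed; the summary was generated after omitting the related content)`;
    }
    default:
      // undefined / null / 1: standard mode succeeded, so no notice is needed.
      return summary;
  }
}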

Frontend Pre-Validation (Optional)

If you want to block prompts containing sensitive words before sending the request (saving the extra token billing from fallbacks), you can add keyword checks to the frontend UI.

VAS does not provide an API-layer prompt pre-check endpoint — whether to perform pre-validation and which keyword list to use is entirely up to the client.
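
A client-side check can be as simple as a case-insensitive substring scan. The sketch below is only one possible approach: find_blocked_terms and the blocklist passed to it are hypothetical, and a substring scan can both miss and over-match terms.

```python
from typing import Iterable, List


def find_blocked_terms(prompt: str, blocked_terms: Iterable[str]) -> List[str]:
    """Return the blocked terms that occur in the prompt, ignoring case."""
    lowered = prompt.casefold()
    return [term for term in blocked_terms if term.casefold() in lowered]
```

If the returned list is non-empty, the UI can ask the user to revise the prompt before any request is sent.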

Security and Limits

Built-in Safety Guards

In custom mode, the system automatically adds safety guards to the customer prompt, including:

Content neutralization guidance: Instructs the LLM to summarize the intent of any colloquial, emotional, or sensitive terms in the source text using neutral, objective language, avoiding verbatim quotation. The goal is to reduce the chance that standard mode triggers content filtering.
Instruction injection protection: Prevents instructions in the customer prompt from accidentally overriding system rules.

The safety guards are not exposed for client configuration; the customer's original prompt is still stored in the summary_prompt_snapshot field for auditing.

⚠️ The safety guards are not foolproof. You should not splice untrusted end-user input directly into prompt; prompt injection risk is your own responsibility.

Character and Length Limits

Field	Limit
`prompt` / `summary_prompt`	≤ 2000 characters
`prompt_slug` / `summary_prompt_slug`	≤ 64 characters, Unicode, no control characters (`\n` / `\r` / `\t` / `\0`, etc.)
`content` (the transcript input for REST `POST /api/v1/summary`)	≤ 100,000 characters
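
These limits can be checked locally before sending a request. The following is a sketch under assumptions: "characters" is taken to mean Unicode code points (what Python's len counts), control characters are taken to be the Unicode category Cc, and the function names are hypothetical. The returned strings are the error codes listed under Error Codes; no error code is documented for the content limit, so that check returns a boolean.

```python
import unicodedata
from typing import Optional

MAX_PROMPT_CHARS = 2000
MAX_SLUG_CHARS = 64
MAX_CONTENT_CHARS = 100_000


def check_prompt_limits(prompt: str, prompt_slug: str) -> Optional[str]:
    """Return the error code a request would likely trigger, or None if within limits."""
    if len(prompt) > MAX_PROMPT_CHARS:
        return "summary_prompt_too_long"
    if len(prompt_slug) > MAX_SLUG_CHARS:
        return "summary_prompt_slug_too_long"
    # Category Cc covers \n, \r, \t, \0 and the other control characters.
    if any(unicodedata.category(ch) == "Cc" for ch in prompt_slug):
        return "summary_prompt_slug_invalid"
    return None


def content_within_limit(content: str) -> bool:
    """Check the REST content field against the 100,000-character limit."""
    return len(content) <= MAX_CONTENT_CHARS
```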

Complete Examples

Node.js - REST Summary (custom mode)

const response = await fetch('https://vas-poc.vurbo.ai/api/v1/summary', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    content: transcriptText,
    mode: 'custom',
    prompt: `你是會議分析助手。請從逐字稿萃取:
1. 主要決議事項(每項一句話)
2. 待辦工作(含負責人、期限)
3. 風險與阻礙
輸出格式:JSON。請以繁體中文輸出。`,
    prompt_slug: 'acme-meeting-v2',
    language: 'zh-TW',
    plain_text: true,
  }),
});

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';

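while (true) {
  const { value, done } = await reader.read();
  if (done) break;
  // stream: true keeps multi-byte characters that are split across chunks intact
  buffer += decoder.decode(value, { stream: true });

  // Events are separated by a blank line; keep the trailing partial event
  // in the buffer so it is completed by the next read instead of parsed early
  const events = buffer.split('\n\n');
  buffer = events.pop();

  for (const event of events) {
    if (event.startsWith('event: chunk')) {
      const data = JSON.parse(event.split('\ndata: ')[1]);
      process.stdout.write(data.content);
    } else if (event.startsWith('event: done')) {
      const data = JSON.parse(event.split('\ndata: ')[1]);
      console.log('\n\n--- Done ---');
      console.log(`tokens: input=${data.tokens_used.input}, output=${data.tokens_used.output}`);
      console.log(`prompt_snapshot saved: ${data.prompt_snapshot ? 'yes' : 'no'}`);
    }
  }
}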

Python — SSE Regenerate Summary (custom mode: preview then persist)

import requests

BASE = "https://vas-poc.vurbo.ai/api/v1/sse/regenerate/summary"
TASK_ID = "550e8400-e29b-41d4-a716-446655440000"
HEADERS = {"X-API-Key": "vas_..."}

# 1. Preview first with GET
preview_params = {
    "mode": "custom",
    "prompt": "你是醫療摘要助手...",
    "promptSlug": "clinic-acme-v3",
    "plainText": "true",
}
with requests.get(f"{BASE}/{TASK_ID}", params=preview_params, headers=HEADERS, stream=True) as r:
    for line in r.iter_lines():
        if line.startswith(b"data: "):
            print(line[6:].decode())

# 2. Persist with POST after customer confirmation
persist_body = {
    "mode": "custom",
    "prompt": "你是醫療摘要助手...",
    "promptSlug": "clinic-acme-v3",
    "plainText": True,
}
with requests.post(f"{BASE}/{TASK_ID}", json=persist_body, headers=HEADERS, stream=True) as r:
    for line in r.iter_lines():
        if line.startswith(b"data: "):
            print(line[6:].decode())

WebSocket — Live Recording custom Summary

ws.send(JSON.stringify({
  type: 'voice-translation',
  data: {
    action: 'start',
    transcription_languages: ['zh-TW'],
    translation_languages: [],
    type: 'transcribe',
    audio_format: 'pcm',
    summary_mode: 'custom',
    summary_prompt: '你是法律會議助手。請輸出:1) 爭點 2) 立場 3) 結論。',
    summary_prompt_slug: 'legal-firm-acme-v1',
    summary_language: 'zh-TW',
    summary_plain_text: true,
  },
}));

ws.addEventListener('message', (evt) => {
  const msg = JSON.parse(evt.data);
  if (msg.type === 'summary_done') {
    console.log('Summary done', msg.data);
    if (msg.data.summary_fallback_level === 2) {
      showBanner('Your custom instructions contained filtered terms; the summary was generated in neutral mode');
    } else if (msg.data.summary_fallback_level === 3) {
      showBanner(`The transcript contained ${msg.data.summary_dropped_segments.length} segments that could not be processed and were omitted`);
    }
  } else if (msg.type === 'summary_error') {
    console.error('Summary failed', msg.data.error_code, msg.data.message);
  }
});

Error Codes

Error Code	HTTP	Trigger Condition
`summary_invalid_mode`	422 (SSE) / 400 (others)	`mode` is not `builtin` / `custom`
`summary_mode_field_mismatch`	422 / 400	The mode and field combination do not match (required field missing / disallowed field provided)
`summary_prompt_too_long`	422 / 400	`prompt` exceeds 2000 characters
`summary_prompt_slug_too_long`	422 / 400	`prompt_slug` exceeds 64 characters
`summary_prompt_slug_invalid`	422 / 400	`prompt_slug` contains control characters (`\n` / `\r` / `\t` / `\0`, etc.)
`template_not_found`	404	(builtin mode) The template for the specified slug does not exist or has been disabled
`llm_content_filtered`	400	Blocked by content filtering (the customer prompt or transcript contains sensitive words), and the fallback chain failed entirely or the endpoint does not integrate fallback
`summary_failed`	500	Summary generation failed
`summary_timeout`	504	Generation timed out
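
Clients often group these codes by what the caller should do next. The grouping below is a suggestion, not documented behavior: in particular, treating summary_failed and summary_timeout as retryable is an assumption, and classify_error is a hypothetical helper.

```python
FIX_REQUEST = {
    "summary_invalid_mode",
    "summary_mode_field_mismatch",
    "summary_prompt_too_long",
    "summary_prompt_slug_too_long",
    "summary_prompt_slug_invalid",
    "template_not_found",
}
CONTENT_BLOCKED = {"llm_content_filtered"}
# Assumption: server-side failures and timeouts may succeed on a later attempt.
RETRY_LATER = {"summary_failed", "summary_timeout"}


def classify_error(error_code: str) -> str:
    """Map an error code from the table above to a suggested client action."""
    if error_code in FIX_REQUEST:
        return "fix_request"
    if error_code in CONTENT_BLOCKED:
        return "content_blocked"
    if error_code in RETRY_LATER:
        return "retry_later"
    return "unknown"
```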

For a complete list of error codes, see Error Code Reference.

Related Documents

Document	Description
POST /api/v1/summary	Full specification for the REST summary endpoint
Summary Templates API	Built-in template list and template lookup
SSE - Regenerate Summary	Specification for the `GET` preview / `POST` persist endpoints
WebSocket - Voice Translation	The `summary_*` fields of the `start` action
SSE - History	The mode / template / prompt_snapshot fields of the `init_summary` event
Error Code Reference	Complete list of error codes
Changelog V1.5.5	The version that introduced the builtin/custom mutual-exclusion spec and automatic content-filtering downgrade

Version: V1.5.7 Last Updated: 2026-05-20
