Introduction

Overview

This page describes the scope and API composition of the Vurbo.ai API Service (VAS).

Vurbo.ai API Service (VAS) is a Task-centric speech processing API providing real-time speech recognition, translation, text-to-speech (TTS), summarization and broadcasting.

Service composition

| Channel | Purpose | Authentication |
| --- | --- | --- |
| REST API | Task management, audio import, webhook configuration | API Key (X-API-Key header) |
| WebSocket | Real-time recording, speech recognition and translation | Ticket (exchanged from an API Key, short-lived and single-use) |
| SSE | Broadcast viewers receiving real-time captions, task history playback | API Key; broadcast viewers use a share token |
| Webhook | Task completion and failure event callbacks | |

The API Key is a long-lived credential used for REST and SSE. WebSocket connections instead exchange the key for a short-lived, single-use Ticket, which keeps the API Key out of the connection URL. Broadcast viewers are identified by a separate share token and do not need an API Key.
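The credential rules above can be sketched as two small helpers. This is a minimal illustration, not the documented client: the host `vas.example.com`, the `/ws` path, and the `ticket` query parameter name are all assumptions, since this overview names only the `X-API-Key` header.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: this overview does not give the real host or path.
WS_BASE = "wss://vas.example.com/ws"


def rest_headers(api_key: str) -> dict[str, str]:
    """REST calls send the long-lived API Key in the X-API-Key header."""
    return {"X-API-Key": api_key}


def websocket_url(ticket: str) -> str:
    """WebSocket connections carry a single-use Ticket, never the API Key.

    The query parameter name "ticket" is an assumption for illustration.
    """
    return f"{WS_BASE}?{urlencode({'ticket': ticket})}"
```

Keeping the two builders separate makes it harder to put the API Key into a URL by accident, which is the exposure the Ticket exchange is meant to avoid.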

Recording types and use cases

| Type | Description | Participants / orientation | Scenario |
| --- | --- | --- | --- |
| transcribe | Speech to text | Single-device recording, one or multiple speakers | Meeting notes, interview records |
| conversation | Bilingual real-time interpretation | 1-to-1 (two people sharing one device) | Cross-language conversation, live interpretation |
| record | Plain recording | Single person | Voice memos, quick notes |
| broadcast | Broadcast / live stream | 1-to-many (one presenter → many viewers) | Lectures, talks, live streams |
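A client may want to validate the recording type before creating a Task. The sketch below models the four type values listed above as an enum. The class name `RecordingType` and the helper `parse_recording_type` are hypothetical and not part of the VAS API. Only the four string values come from this page.

```python
from enum import Enum


class RecordingType(str, Enum):
    """The four recording types listed on this page."""

    TRANSCRIBE = "transcribe"      # speech to text
    CONVERSATION = "conversation"  # bilingual real-time interpretation, 1-to-1
    RECORD = "record"              # plain recording, single person
    BROADCAST = "broadcast"        # one presenter to many viewers


def parse_recording_type(value: str) -> RecordingType:
    """Normalize user input and reject anything outside the four known types."""
    try:
        return RecordingType(value.strip().lower())
    except ValueError:
        allowed = ", ".join(t.value for t in RecordingType)
        raise ValueError(
            f"unknown recording type {value!r}; expected one of: {allowed}"
        ) from None
```

For example, `parse_recording_type(" Broadcast ")` returns `RecordingType.BROADCAST`, while an unlisted value such as `"video"` raises `ValueError`.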

Copyright © 2026