Get Result

This endpoint enables real-time audio transcription over a WebSocket connection. It accepts streamed audio input and returns partial and final transcription results during the session.

Request

URL:
GET /ws (WebSocket)

Headers:
Authorization: Bearer <TOKEN>

WebSocket Connection Initialization

Once the WebSocket connection is established, the first message must be a JSON string to initialize the transcription session.

Initialization Message Parameters

  • uid (required): A unique identifier for the client/session.
  • sample_rate (required): Audio sample rate (e.g. 16000).
  • language (optional, default: "en"): The language of the transcription.
  • task (optional, default: "transcribe"): Task type, "transcribe" or "translate".
  • duration_minutes (required if token is provided): The intended duration of the session.
  • model (optional, default: "large-v3-turbo"): Model version to use.
  • initial_prompt (optional): Initial text prompt to guide the transcription.
  • use_vad (optional, default: true): Whether to use voice activity detection.
  • spoken_languages (optional): List of expected spoken languages.
  • vad_parameters (optional): Custom VAD settings (e.g., silence threshold).
  • max_delay (optional, default: 7): Max delay before producing partials.

Example Initialization Message

{
  "uid": "client-1234",
  "sample_rate": 16000,
  "language": "en",
  "task": "transcribe",
  "duration_minutes": 5,
  "model": "large-v3-turbo",
  "initial_prompt": "Welcome to Whisperly.",
  "use_vad": true,
  "spoken_languages": ["en"],
  "vad_parameters": {
    "min_silence_duration_ms": 800,
    "max_speech_duration_s": 7
  },
  "max_delay": 7
}
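As a sketch, the initialization message can be assembled and serialized before it is sent as the first WebSocket message. The helper name below is illustrative, not part of the API; the field names mirror the parameters above.

```python
import json

def build_init_message(uid, sample_rate, duration_minutes, **options):
    """Build the JSON initialization message for the /ws endpoint.

    Required fields are positional; optional fields (language, task,
    model, use_vad, max_delay, etc.) are passed as keyword arguments.
    """
    message = {
        "uid": uid,
        "sample_rate": sample_rate,
        "duration_minutes": duration_minutes,
    }
    message.update(options)
    return json.dumps(message)

# Example matching the message above:
init = build_init_message(
    "client-1234", 16000, 5,
    language="en", task="transcribe", model="large-v3-turbo",
    use_vad=True, max_delay=7,
)
```

The resulting string would then be sent as the first frame after the connection opens (e.g. `await ws.send(init)` with a WebSocket client library).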

Streaming Audio

After initialization, the client must send raw audio chunks as float32 arrays. To end the stream, send the literal bytes b"END_OF_AUDIO".

Audio Frame Format

  • Must be raw float32 PCM data.
  • Send data continuously to avoid timeouts or disconnects.
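Audio captured as 16-bit integers must be converted to float32 before sending. A minimal stdlib-only sketch of that conversion (the helper name is illustrative):

```python
import struct

def int16_to_float32_bytes(samples):
    """Convert a sequence of int16 PCM samples to raw little-endian
    float32 bytes in the range [-1.0, 1.0]."""
    floats = [s / 32768.0 for s in samples]
    return struct.pack(f"<{len(floats)}f", *floats)

# Each float32 sample occupies 4 bytes on the wire.
chunk = int16_to_float32_bytes([0, 16384, -32768])
```

Each resulting chunk would be sent as a binary WebSocket frame.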

Server Messages

The server responds with transcription results in real time.

Response Types

  • Partial Transcription:

{
  "uid": "client-1234",
  "segments": [
    {
      "start": "0.000",
      "end": "3.500",
      "text": "Hello everyone",
      "type": "partial"
    }
  ]
}
  • Final Transcription:

Sent when a complete sentence or silence is detected.

{
  "uid": "client-1234",
  "segments": [
    {
      "start": "0.000",
      "end": "3.500",
      "text": "Hello everyone",
      "type": "final"
    }
  ]
}

  • Server Ready:

{
  "uid": "client-1234",
  "message": "SERVER_READY",
  "backend": "faster_whisper"
}

  • Error Message:

{
  "uid": "client-1234",
  "status": "ERROR",
  "message": "duration_minutes is required"
}
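A client can dispatch on the message shapes above. A hedged sketch of such a handler (the function name and return values are illustrative, not part of the API):

```python
import json

def handle_server_message(raw):
    """Classify a server message as 'ready', 'error', 'final', or
    'partial', returning (kind, payload)."""
    msg = json.loads(raw)
    if msg.get("message") == "SERVER_READY":
        return "ready", msg.get("backend")
    if msg.get("status") == "ERROR":
        return "error", msg.get("message")
    segments = msg.get("segments", [])
    # Partial and final results carry a per-segment "type" field.
    if segments and all(s.get("type") == "final" for s in segments):
        return "final", segments
    return "partial", segments
```

In a receive loop, the client would typically buffer partial segments for display and commit final segments to the transcript.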

Closing the Session

The connection is automatically closed:

  • If the specified duration is exceeded.
  • If the client sends the b"END_OF_AUDIO" signal.
  • If an error occurs.

You can also call websocket.close() from the client to terminate the session manually.
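One client-side pattern for a clean shutdown is to append the end-of-audio sentinel after the last audio chunk, so the server closes the session itself. A minimal sketch (the generator name is illustrative):

```python
def stream_frames(chunks):
    """Yield audio chunks, then the end-of-stream sentinel the server
    expects as the final frame."""
    for chunk in chunks:
        yield chunk
    yield b"END_OF_AUDIO"

# Each yielded frame would be sent as a binary WebSocket message.
frames = list(stream_frames([b"\x00\x00\x00\x00"]))
```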