Get Result
This endpoint provides real-time audio transcription over a WebSocket connection. It accepts streaming audio input and returns partial and final transcription results during the session.
Request
URL:
GET /ws (WebSocket)
Headers:
Authorization: Bearer <TOKEN>
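As a minimal sketch, the endpoint URL and the Authorization header can be prepared as follows. The host name is a placeholder, because this document does not give the server address, and `ws_endpoint` and `auth_headers` are hypothetical helper names. The resulting dictionary is what a WebSocket client library would send as extra handshake headers; the keyword argument for that differs between libraries and versions, so it is not shown here.

```python
# Hypothetical helpers; "api.example.com" is a placeholder host, not a real endpoint.

def ws_endpoint(host: str, secure: bool = True) -> str:
    """Return the WebSocket URL for the /ws endpoint on the given host."""
    scheme = "wss" if secure else "ws"
    return f"{scheme}://{host}/ws"


def auth_headers(token: str) -> dict[str, str]:
    """Return the Authorization header the endpoint expects (Bearer scheme)."""
    if not token:
        raise ValueError("token must be a non-empty string")
    return {"Authorization": f"Bearer {token}"}


url = ws_endpoint("api.example.com")
headers = auth_headers("<TOKEN>")
```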
WebSocket Connection Initialization
Once the WebSocket connection is established, the first message must be a JSON string to initialize the transcription session.
Initialization Message Parameters
- uid (required): A unique identifier for the client/session.
- sample_rate (required): Audio sample rate in Hz (e.g., 16000).
- language (optional, default: "en"): Language of the transcription.
- task (optional, default: "transcribe"): Task type, either "transcribe" or "translate".
- duration_minutes (required if a token is provided): Intended duration of the session, in minutes.
- model (optional, default: "large-v3-turbo"): Model version to use.
- initial_prompt (optional): Initial text prompt to guide the transcription.
- use_vad (optional, default: true): Whether to use voice activity detection (VAD).
- spoken_languages (optional): List of expected spoken languages.
- vad_parameters (optional): Custom VAD settings, such as the minimum silence duration.
- max_delay (optional, default: 7): Maximum delay before partial results are produced.
Example Initialization Message
{
  "uid": "client-1234",
  "sample_rate": 16000,
  "language": "en",
  "task": "transcribe",
  "duration_minutes": 5,
  "model": "large-v3-turbo",
  "initial_prompt": "Welcome to Whisperly.",
  "use_vad": true,
  "spoken_languages": ["en"],
  "vad_parameters": {
    "min_silence_duration_ms": 800,
    "max_speech_duration_s": 7
  },
  "max_delay": 7
}
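The parameter list and example message above can be turned into a small builder that checks the required fields before anything is sent. This is a sketch, not the service's client library: `build_init_message` and its `token_provided` flag are hypothetical, and rejecting unknown keys is a design choice to catch typos locally rather than rely on the server's error message. Defaults are sent explicitly, which should be harmless since they match the documented defaults.

```python
import json
from typing import Any, Optional

# Optional parameters with their documented defaults (None means "no default").
OPTIONAL_DEFAULTS: dict[str, Any] = {
    "language": "en",
    "task": "transcribe",
    "model": "large-v3-turbo",
    "initial_prompt": None,
    "use_vad": True,
    "spoken_languages": None,
    "vad_parameters": None,
    "max_delay": 7,
}


def build_init_message(
    uid: str,
    sample_rate: int,
    duration_minutes: Optional[float] = None,
    token_provided: bool = True,  # hypothetical flag: Authorization header is sent
    **optional: Any,
) -> str:
    """Validate the parameters and return the JSON string for the first message."""
    if not uid:
        raise ValueError("uid is required")
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    if token_provided and duration_minutes is None:
        raise ValueError("duration_minutes is required when a token is provided")
    unknown = set(optional) - set(OPTIONAL_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    if optional.get("task", "transcribe") not in ("transcribe", "translate"):
        raise ValueError("task must be 'transcribe' or 'translate'")

    message: dict[str, Any] = {"uid": uid, "sample_rate": sample_rate}
    if duration_minutes is not None:
        message["duration_minutes"] = duration_minutes
    # Caller-supplied values override defaults; keys without a default
    # are included only when the caller supplies them.
    for key, default in OPTIONAL_DEFAULTS.items():
        value = optional.get(key, default)
        if value is not None:
            message[key] = value
    return json.dumps(message)


init = build_init_message(
    "client-1234",
    16000,
    duration_minutes=5,
    initial_prompt="Welcome to Whisperly.",
    spoken_languages=["en"],
    vad_parameters={"min_silence_duration_ms": 800, "max_speech_duration_s": 7},
)
```

Here `init` carries the same fields and values as the example initialization message.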
Streaming Audio
After initialization, the client must send the audio as raw chunks of float32 samples. To end the stream, send the b"END_OF_AUDIO" signal.
Audio Frame Format
- Audio must be raw float32 PCM data.
- Send data continuously to avoid timeouts or disconnects.
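A minimal standard-library sketch of preparing frames. It assumes the source audio is signed 16-bit PCM already at the session's sample_rate and scales it to [-1.0, 1.0), a common convention for float32 audio that this document does not state. Samples are packed with `array` in native byte order (little-endian on most machines), since the document does not specify endianness. The 100 ms chunk size and the function names are illustrative choices.

```python
import array
from typing import Iterator, Sequence

END_OF_AUDIO = b"END_OF_AUDIO"


def int16_to_float32_bytes(samples: Sequence[int]) -> bytes:
    """Scale signed 16-bit samples to [-1.0, 1.0) and pack them as float32."""
    floats = array.array("f", (s / 32768.0 for s in samples))
    return floats.tobytes()


def audio_frames(pcm_float32: bytes, sample_rate: int, chunk_ms: int = 100) -> Iterator[bytes]:
    """Yield chunks of about chunk_ms of audio each, then the END_OF_AUDIO marker."""
    bytes_per_sample = array.array("f").itemsize  # 4 bytes for float32
    chunk_bytes = sample_rate * chunk_ms // 1000 * bytes_per_sample
    if chunk_bytes <= 0:
        raise ValueError("chunk_ms is too small for this sample_rate")
    for start in range(0, len(pcm_float32), chunk_bytes):
        yield pcm_float32[start:start + chunk_bytes]
    yield END_OF_AUDIO
```

One second of 16 kHz audio yields ten 6,400-byte chunks followed by the end marker.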
Server Messages
The server responds with transcription results in real time.
Response Types
- Partial Transcription:
{
  "uid": "client-1234",
  "segments": [
    {
      "start": "0.000",
      "end": "3.500",
      "text": "Hello everyone",
      "type": "partial"
    }
  ]
}
- Final Transcription:
Sent when a complete sentence or silence is detected.
{
  "uid": "client-1234",
  "segments": [
    {
      "start": "0.000",
      "end": "3.500",
      "text": "Hello everyone",
      "type": "final"
    }
  ]
}
- Server Ready:
{
  "uid": "client-1234",
  "message": "SERVER_READY",
  "backend": "faster_whisper"
}
- Error Message:
{
  "uid": "client-1234",
  "status": "ERROR",
  "message": "duration_minutes is required"
}
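The four response types above can be dispatched on their distinguishing fields ("status", "message", "segments"). In this sketch, `TranscriptState` and `handle_server_message` are hypothetical names. Final segments are keyed by their start time, so a final segment that the server re-sends overwrites the earlier copy instead of duplicating it; whether the server re-sends segments is an assumption, not something this document states. Note that start and end arrive as strings and are converted with `float` only for ordering.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class TranscriptState:
    """Accumulates what the server has sent during one session."""
    ready: bool = False
    backend: Optional[str] = None
    finals: dict[str, str] = field(default_factory=dict)  # segment start -> text
    partial: str = ""                                     # latest unconfirmed text
    error: Optional[str] = None

    def transcript(self) -> str:
        """Join final segments in start-time order."""
        ordered = sorted(self.finals.items(), key=lambda item: float(item[0]))
        return " ".join(text for _, text in ordered)


def handle_server_message(raw: str, state: TranscriptState) -> None:
    """Update state from one JSON text message sent by the server."""
    msg: dict[str, Any] = json.loads(raw)
    if msg.get("status") == "ERROR":
        state.error = msg.get("message", "unknown error")
    elif msg.get("message") == "SERVER_READY":
        state.ready = True
        state.backend = msg.get("backend")
    else:
        for seg in msg.get("segments", []):
            if seg.get("type") == "final":
                state.finals[seg["start"]] = seg["text"]  # overwrite on re-send
                state.partial = ""
            else:
                state.partial = seg["text"]
```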
Closing the Session
The connection is automatically closed:
- If the session exceeds the specified duration_minutes.
- If the client sends the b"END_OF_AUDIO" signal.
- If an error occurs.
You can also terminate the session manually by calling websocket.close() on the client.
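A minimal asyncio sketch of one full session: send the initialization message, start streaming once SERVER_READY arrives, collect responses, and close. It assumes `conn` is any object with awaitable `send`, `recv`, and `close` methods (a client connection from the third-party `websockets` package has these), and that the client should wait for SERVER_READY before streaming, which this document lists as a server message but does not explicitly require. `run_session` and `closed_errors` are hypothetical names; with `websockets`, you would pass `(websockets.exceptions.ConnectionClosed,)` so that a server-side close ends the loop cleanly.

```python
import asyncio
import json
from typing import Any, Iterable


async def run_session(
    conn: Any,
    init_message: str,
    frames: Iterable[bytes],
    closed_errors: tuple[type[BaseException], ...] = (ConnectionError,),
) -> list[dict[str, Any]]:
    """Run one transcription session and return every JSON message received.

    frames should end with b"END_OF_AUDIO" so the server knows the stream is done.
    """
    received: list[dict[str, Any]] = []

    async def send_audio() -> None:
        for frame in frames:
            await conn.send(frame)
            await asyncio.sleep(0)  # let the receive loop run between frames

    send_task = None
    await conn.send(init_message)  # first message must be the JSON init string
    try:
        while True:
            msg = json.loads(await conn.recv())
            received.append(msg)
            if msg.get("status") == "ERROR":
                break  # server reported a problem; stop and close
            if msg.get("message") == "SERVER_READY" and send_task is None:
                send_task = asyncio.create_task(send_audio())
    except closed_errors:
        pass  # server closed the session (END_OF_AUDIO, duration exceeded, or error)
    finally:
        if send_task is not None:
            send_task.cancel()  # no-op if streaming already finished
            await asyncio.gather(send_task, return_exceptions=True)
        await conn.close()
    return received
```

This sends frames as fast as the connection accepts them, which suits pre-recorded audio; for live capture, frames would naturally arrive at the microphone's pace.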