Audio workloads split into two shapes. Use the audio endpoints for file-like requests such as text-to-speech, transcription, and audio translation. Use the realtime WebSocket endpoint when the user experience needs low-latency, interactive audio or multimodal events.

Choose The Workflow

| Workflow | Endpoint | Use it when |
| --- | --- | --- |
| Text to speech | POST /v1/audio/speech | You need an audio file from text. |
| Transcription | POST /v1/audio/transcriptions | You need text from an audio file. |
| Audio translation | POST /v1/audio/translations | You need translated text from an audio file. |
| Realtime session | GET /v1/realtime | You need bidirectional streaming audio or realtime multimodal events. |

Discover Models

Query the model catalog before hard-coding a model. Use recommended shortlists for speech and transcription, and use model details to confirm realtime support before opening a socket.
curl "https://api.tokenlab.sh/v1/models?recommended_for=tts" \
  -H "Authorization: Bearer sk-your-api-key"

curl "https://api.tokenlab.sh/v1/models?recommended_for=stt" \
  -H "Authorization: Bearer sk-your-api-key"
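
As a rough sketch of how a client might use the catalog from code, the JavaScript below builds the `recommended_for` query shown above and picks a model ID from the response. The response shape (`{ data: [{ id }] }`) and the helper names `modelsUrl`, `pickModelId`, and `discoverModel` are assumptions made for illustration, not documented TokenLab behavior; check the List Models reference for the actual schema.

```javascript
const API_BASE = 'https://api.tokenlab.sh/v1';

// Build the catalog URL for a recommended shortlist ("tts" or "stt" in the curl examples).
function modelsUrl(purpose) {
  const url = new URL(`${API_BASE}/models`);
  url.searchParams.set('recommended_for', purpose);
  return url.toString();
}

// ASSUMPTION: the catalog responds with { data: [{ id: "..." }, ...] }.
// Returns the first listed model ID, or the fallback when the list is empty or malformed.
function pickModelId(catalog, fallback) {
  const first = Array.isArray(catalog?.data) ? catalog.data[0] : undefined;
  return typeof first?.id === 'string' ? first.id : fallback;
}

// Fetch the shortlist and choose a model. Requires Node 18+ for the global fetch.
async function discoverModel(apiKey, purpose, fallback) {
  const res = await fetch(modelsUrl(purpose), {
    headers: { Authorization: `Bearer ${apiKey}` },
  });
  if (!res.ok) throw new Error(`Model lookup failed: HTTP ${res.status}`);
  return pickModelId(await res.json(), fallback);
}
```

A caller might run `await discoverModel(apiKey, 'tts', 'tts-1-hd')` once at startup and reuse the result, rather than querying the catalog on every request.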

Synchronous Audio Requests

Speech, transcription, and translation requests are synchronous: the result comes back in the response to the same HTTP request. Large inputs can take longer than common client default timeouts, so set a generous timeout and record the request ID of each call so support can trace it.
curl -X POST "https://api.tokenlab.sh/v1/audio/speech" \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1-hd",
    "voice": "nova",
    "input": "Welcome to TokenLab."
  }' \
  --output speech.mp3
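
The timeout and request-ID advice above could look roughly like this in Node.js (18 or later, for the global `fetch` and `AbortSignal.timeout`). The 120-second timeout, the `x-request-id` header name, and the helper names are assumptions for illustration; confirm the real header in the Create Speech reference.

```javascript
const SPEECH_URL = 'https://api.tokenlab.sh/v1/audio/speech';

// Build the fetch options for a speech request with an explicit timeout.
// ASSUMPTION: 120 seconds is an illustrative value, not a documented limit.
function speechRequestInit(apiKey, payload, timeoutMs = 120_000) {
  return {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify(payload),
    // Aborts the request once timeoutMs has elapsed.
    signal: AbortSignal.timeout(timeoutMs),
  };
}

// ASSUMPTION: the request ID arrives in an "x-request-id" response header.
// Returns null when the header is absent.
function requestIdOf(headers) {
  return headers.get('x-request-id');
}

// Request speech, write the audio bytes to outputPath, and return the request ID.
async function createSpeech(apiKey, payload, outputPath) {
  const res = await fetch(SPEECH_URL, speechRequestInit(apiKey, payload));
  const requestId = requestIdOf(res.headers);
  if (!res.ok) {
    throw new Error(`Speech request failed: HTTP ${res.status} (request ID: ${requestId})`);
  }
  const { writeFile } = await import('node:fs/promises');
  await writeFile(outputPath, Buffer.from(await res.arrayBuffer()));
  return requestId;
}
```

Including the request ID in the thrown error means a failed call can be reported to support without re-running it.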

Realtime Sessions

Open a WebSocket connection with the model in the query string and the API key in the Authorization header. Follow the event format documented for the selected realtime model, and close the socket when the session is complete.
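import WebSocket from 'ws';

// The model goes in the query string; the API key goes in the Authorization header.
const socket = new WebSocket('wss://api.tokenlab.sh/v1/realtime?model=gpt-realtime', {
  headers: { Authorization: 'Bearer sk-your-api-key' }
});

// ws delivers each message as a Buffer; decode it before logging.
socket.on('message', (data) => console.log(data.toString()));
// Without an 'error' listener, a failed connection throws an unhandled error.
socket.on('error', (err) => console.error('Realtime socket error:', err.message));
socket.on('close', (code) => console.log(`Realtime socket closed (${code})`));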
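
One way to keep the session lifecycle explicit is a small wrapper that tracks connection state and exposes a single method for ending the session. The sketch below is an illustration under assumptions: `realtimeUrl`, `openRealtimeSession`, the state names, and the injected `createSocket` factory are hypothetical, and the socket is only expected to offer `on(event, handler)` and `close()` as the `ws` package does.

```javascript
// Build the realtime URL with the model in the query string.
function realtimeUrl(model) {
  const url = new URL('wss://api.tokenlab.sh/v1/realtime');
  url.searchParams.set('model', model);
  return url.toString();
}

// Open a session and return a handle that tracks whether the socket is usable.
// `createSocket` is injected so the sketch does not depend on a specific WebSocket library.
function openRealtimeSession({ model, apiKey, createSocket, onEvent }) {
  const session = { state: 'connecting' };
  const socket = createSocket(realtimeUrl(model), {
    headers: { Authorization: `Bearer ${apiKey}` },
  });

  socket.on('open', () => { session.state = 'open'; });
  // Messages are passed on as strings; parsing is left to the caller because
  // the event format depends on the selected realtime model.
  socket.on('message', (data) => onEvent(data.toString()));
  socket.on('error', () => { session.state = 'failed'; });
  // No automatic reconnect: a closed session stays closed.
  socket.on('close', () => {
    if (session.state !== 'failed') session.state = 'closed';
  });

  // Call end() when the conversation is complete.
  session.end = () => socket.close();
  return session;
}
```

With the `ws` package this could be called as `openRealtimeSession({ model: 'gpt-realtime', apiKey, createSocket: (url, opts) => new WebSocket(url, opts), onEvent: console.log })`.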

State Handling

  • Save generated audio files instead of replaying the same request on refresh.
  • For transcription and translation, show upload and processing states even when the API call is synchronous.
  • For realtime, handle close events and reconnect only after the user starts a new session.
  • Do not put API keys, private URLs, or account secrets in audio text input.
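
Two of the points above can be sketched in a few lines: reusing generated audio instead of repeating the request, and a pre-flight check on the text input. This is a minimal illustration under assumptions: the cache lives in memory (a real application would more likely persist files to disk or object storage), the helper names are hypothetical, and the key check only recognizes strings shaped like the `sk-...` placeholder used in this guide.

```javascript
// In-memory cache of generated audio, keyed by the fields that determine the output.
const speechCache = new Map();

function speechCacheKey({ model, voice, input }) {
  return JSON.stringify([model, voice, input]);
}

// Return cached audio when the same request was already generated; otherwise call
// `generate` (a caller-supplied async function that performs the API request) and store the result.
async function getOrCreateSpeech(request, generate) {
  const key = speechCacheKey(request);
  if (!speechCache.has(key)) {
    speechCache.set(key, await generate(request));
  }
  return speechCache.get(key);
}

// ASSUMPTION: API keys look like the "sk-..." placeholder used in this guide.
// A heuristic only; it will not catch private URLs or other account secrets.
function looksLikeApiKey(text) {
  return /\bsk-[A-Za-z0-9_-]{8,}/.test(text);
}
```

A caller could reject or warn on any input where `looksLikeApiKey(input)` is true before sending it to the speech endpoint.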

API Reference

| Topic | Reference |
| --- | --- |
| Create Speech | Create Speech |
| Create Transcription | Create Transcription |
| Create Translation | Create Translation |
| Realtime WebSocket | Realtime WebSocket |
| List Models | List Models |
| Billing & Pricing | Billing & Pricing |