Choose The Workflow
| Workflow | Endpoint | Use it when |
|---|---|---|
| Text to speech | POST /v1/audio/speech | You need an audio file from text. |
| Transcription | POST /v1/audio/transcriptions | You need text from an audio file. |
| Audio translation | POST /v1/audio/translations | You need translated text from an audio file. |
| Realtime session | GET /v1/realtime | You need bidirectional streaming audio or realtime multimodal events. |
Discover Models
Query the model catalog before hard-coding a model. Use the recommended shortlists to choose speech and transcription models, and check a model's details to confirm it supports realtime before opening a socket.
Synchronous Audio Requests
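Here is a minimal sketch of catalog-driven model selection that uses only the Python standard library. Several parts are hypothetical stand-ins: the host `api.example.com`, the `/v1/models` path, the Bearer scheme, the `data` list, and the per-model `capabilities` field. Check the List Models reference for the real path and response shape before you rely on them.

```python
import json
import urllib.request

# Hypothetical host and path; confirm both in the List Models reference.
BASE_URL = "https://api.example.com"
MODELS_PATH = "/v1/models"


def build_list_models_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for the model catalog."""
    return urllib.request.Request(
        BASE_URL + MODELS_PATH,
        headers={"Authorization": f"Bearer {api_key}"},  # Bearer scheme is an assumption
        method="GET",
    )


def fetch_catalog(api_key: str, timeout_s: float = 30.0) -> dict:
    """Send the catalog request and parse the JSON body."""
    req = build_list_models_request(api_key)
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.load(resp)


def models_with_capability(catalog: dict, capability: str) -> list[str]:
    """Return IDs of models whose hypothetical 'capabilities' list contains `capability`."""
    return [
        model["id"]
        for model in catalog.get("data", [])
        if capability in model.get("capabilities", [])
    ]


def pick_model(catalog: dict, capability: str) -> str:
    """Pick the first matching model instead of hard-coding one."""
    matches = models_with_capability(catalog, capability)
    if not matches:
        raise LookupError(f"no model in the catalog supports {capability!r}")
    return matches[0]
```

Call something like `pick_model(fetch_catalog(key), "realtime")` before opening a socket. The model choice then follows the catalog rather than a constant in your code.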
Speech, transcription, and translation requests return their results directly in the HTTP response. Large inputs can take longer to process than common client timeout defaults allow, so set a generous timeout and store the request ID of each call so you can give it to support.
Realtime Sessions
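This sketch shows a synchronous speech call with a long timeout that records the request ID on both success and failure. The following names are assumptions, not documented values: the host, the `model`/`input` body fields, the Bearer scheme, the 300-second timeout, and the `x-request-id` header name.

```python
import json
import urllib.error
import urllib.request
from typing import Optional

# Hypothetical host; the path comes from the workflow table.
SPEECH_URL = "https://api.example.com/v1/audio/speech"
# Generous timeout in seconds (an assumption; size it to your largest inputs).
SYNC_TIMEOUT_S = 300
# Hypothetical header name for the request ID; confirm it against real responses.
REQUEST_ID_HEADER = "x-request-id"


def build_speech_request(api_key: str, model: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a speech request; the body field names are assumptions."""
    body = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        SPEECH_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",  # Bearer scheme is an assumption
            "Content-Type": "application/json",
        },
        method="POST",
    )


def request_id_from(headers) -> Optional[str]:
    """Read the request ID from a case-insensitive header mapping such as resp.headers."""
    return headers.get(REQUEST_ID_HEADER)


def create_speech(api_key: str, model: str, text: str, out_path: str) -> Optional[str]:
    """Send the request with a long timeout, save the audio, and return the request ID."""
    req = build_speech_request(api_key, model, text)
    try:
        with urllib.request.urlopen(req, timeout=SYNC_TIMEOUT_S) as resp:
            request_id = request_id_from(resp.headers)
            with open(out_path, "wb") as f:
                f.write(resp.read())
            return request_id
    except urllib.error.HTTPError as err:
        # Failed calls also carry headers; keep the ID for a support ticket.
        print(f"speech request failed: HTTP {err.code}, request id {request_id_from(err.headers)}")
        raise
```

Transcription and translation requests can follow the same pattern of a long timeout and a captured request ID. Only the endpoint and the request body change.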
Open a WebSocket with the model in the query string and the API key in the Authorization header. Follow the event format documented for the selected realtime model, and close the socket when the session is complete.
State Handling
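The sketch below prepares the handshake values described above: the model goes in the query string and the API key goes in the Authorization header. The `wss://api.example.com` host and the Bearer scheme are assumptions. The Python standard library has no WebSocket client, so the function only builds the URL and headers. Pass them to the client library you use, and check its documentation for how it accepts handshake headers.

```python
from urllib.parse import urlencode

# Hypothetical realtime host; the path comes from the workflow table.
REALTIME_BASE = "wss://api.example.com/v1/realtime"


def realtime_connection_params(api_key: str, model: str) -> tuple[str, dict[str, str]]:
    """Return the WebSocket URL (model in the query string) and the handshake headers."""
    # urlencode percent-encodes the model ID so unusual characters stay safe in the URL.
    url = f"{REALTIME_BASE}?{urlencode({'model': model})}"
    headers = {"Authorization": f"Bearer {api_key}"}  # Bearer scheme is an assumption
    return url, headers
```

Because the key travels in a header rather than in the URL, it stays out of query-string logs on proxies and servers.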
- Save generated audio files instead of replaying the same request on refresh.
- For transcription and translation, show upload and processing states even when the API call is synchronous.
- For realtime, handle close events and reconnect only after the user starts a new session.
- Do not put API keys, private URLs, or account secrets in audio text input.
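The first point in the list, saving generated audio instead of replaying the request, can be sketched as a small on-disk cache. Each file is keyed by a hash of the request payload. The `generate` callback is a hypothetical wrapper around `POST /v1/audio/speech`, and the `.mp3` suffix is an assumption about the output format. The payload should hold only the request body, never the API key.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def cache_key(payload: dict) -> str:
    """Stable key for a speech request: SHA-256 of the payload as canonical JSON."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def get_or_create_audio(
    payload: dict,
    cache_dir: Path,
    generate: Callable[[dict], bytes],
    suffix: str = ".mp3",  # assumed output format
) -> Path:
    """Return the saved audio file for `payload`, calling `generate` only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / (cache_key(payload) + suffix)
    if not path.exists():
        audio = generate(payload)  # e.g. a hypothetical wrapper around POST /v1/audio/speech
        tmp = path.with_name(path.name + ".part")
        tmp.write_bytes(audio)
        # Rename into place so a refresh during the write never picks up a partial file.
        tmp.replace(path)
    return path
```

Sorting keys before hashing means two payloads with the same fields in a different order reuse one file. A page refresh then reads the saved audio rather than paying for a second generation.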
API Reference
| Topic | Reference |
|---|---|
| Create Speech | Create Speech |
| Create Transcription | Create Transcription |
| Create Translation | Create Translation |
| Realtime WebSocket | Realtime WebSocket |
| List Models | List Models |
| Billing & Pricing | Billing & Pricing |