Overview
TokenLab provides access to video generation models through a single unified API. Video generation is asynchronous: submit a request, receive a task ID and poll_url, then poll for the final result.
Availability and polling
The model inventory changes over time. For the latest public availability, use the Models API or visit the Models page. If a create response returns poll_url, call that exact URL. When it points to /v1/tasks/{id}, treat it as the canonical, fixed status endpoint.
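A minimal polling sketch follows. It assumes the task JSON carries a status field whose terminal values are completed and failed. The field name, those values, and Bearer-token auth in fetch_json are all assumptions, not documented contract. The point taken from the text is that the loop calls the returned poll_url verbatim instead of rebuilding it.

```python
import json
import time
import urllib.request
from typing import Callable

# Assumed terminal values of an assumed "status" field; confirm against the task schema.
TERMINAL_STATES = {"completed", "failed"}


def fetch_json(url: str, api_key: str) -> dict:
    """GET a URL and decode its JSON body (Bearer auth is an assumption)."""
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def poll_until_done(
    poll_url: str,
    fetch: Callable[[str], dict],
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll the exact poll_url from the create response until a terminal status."""
    waited = 0.0
    while True:
        task = fetch(poll_url)  # use the returned URL as-is; never rebuild it
        if task.get("status") in TERMINAL_STATES:
            return task
        if waited >= timeout:
            raise TimeoutError(f"{poll_url} not finished after {waited:.0f}s")
        sleep(interval)
        waited += interval
```

With a real key, a call would look like poll_until_done(create_response["poll_url"], lambda u: fetch_json(u, api_key)), where create_response is the hypothetical parsed create reply. Injecting fetch and sleep keeps the loop testable without network access.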
Model and media behavior
Audio behavior is model-dependent. In TokenLab, Veo 3 family requests default to audio-on when output_audio is omitted. Some public models are silent-only or do not expose a stable toggle.
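The audio defaults stated on this page fit in a small helper. This is a sketch. Treating every model whose name starts with veo3 as the Veo 3 family is an assumption. A return value of None means the page gives no default, so check that model's entry in the Models API.

```python
from typing import Optional


def resolved_output_audio(model: str, output_audio: Optional[bool] = None) -> Optional[bool]:
    """Return the output_audio setting a request effectively carries.

    An explicit value is passed through; silent-only models may still ignore True.
    When omitted, only the defaults documented on this page are applied.
    """
    if output_audio is not None:
        return output_audio
    if model.startswith("veo3"):  # assumption: this prefix identifies the Veo 3 family
        return True  # Veo 3 family defaults to audio-on
    if model == "kling-3.0-video":
        return False  # documented to default to silent output
    return None  # model-dependent; no default stated here
```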
For production integrations, prefer publicly reachable https URLs over inline base64 for images, videos, and audio. Inline data: URLs are still supported by compatible models, but URLs are easier to retry, inspect, and debug.
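One way to follow this recommendation is to send https URLs in image_url and fall back to an inline data: URL in image only for local files. This is a sketch. The helper name image_input is hypothetical, and MIME detection relies on the file extension.

```python
import base64
import mimetypes
from pathlib import Path
from typing import Union


def image_input(source: Union[str, Path]) -> dict:
    """Return the image part of a request body for a URL or a local file.

    https URLs go in image_url (preferred for production); anything else is read
    as a local file and sent inline as a data: URL in the image field.
    """
    if isinstance(source, str) and source.startswith("https://"):
        return {"image_url": source}
    path = Path(source)
    mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return {"image": f"data:{mime};base64,{encoded}"}
```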
Async Workflow
Public Operations
TokenLab’s current public video contract centers on these operations:
- text-to-video
- image-to-video
- reference-to-video
- start-end-to-video
- video-to-video
- motion-control

TokenLab also defines audio-to-video and video-extension for model-specific flows, but the generally enabled public model inventory in this docs build does not currently include a model that advertises either capability.
Capability Matrix
Legend: ✅ supported by at least one currently enabled public model in that provider family; ❌ not currently represented by an enabled public model.

| Series | T2V | I2V | Reference | Start-End | V2V | Motion |
|---|---|---|---|---|---|---|
| OpenAI | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
| Kuaishou | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Google | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| ByteDance | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
| MiniMax | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
| Alibaba | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ |
| Shengshu | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| xAI | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ |
| Other | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ |
Capability Definitions
- T2V (Text-to-Video): Generate video from a text prompt.
- I2V (Image-to-Video): Animate a starting image. For the broadest compatibility, provide image_url.
- Reference: Condition generation on one or more reference images via reference_images.
- Start-End: Control the first and last frames with start_image and end_image.
- V2V (Video-to-Video): Use an existing video as the primary source input.
- Motion: Combine a subject image with a motion reference video.
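These definitions translate into a quick pre-flight check on a request body. This is a hypothetical helper, not part of the TokenLab API. Using prompt as the text field, and image_url or image as the Motion subject image, are assumptions. Only the canonical reference_images field is checked, even though grok-imagine-video also accepts image_urls. Individual models may require or accept more fields.

```python
# Each entry lists groups of fields; at least one field per group must be set.
REQUIREMENTS = {
    "text-to-video": [("prompt",)],
    "image-to-video": [("image_url", "image")],
    "reference-to-video": [("reference_images",)],
    "start-end-to-video": [("start_image",), ("end_image",)],
    "video-to-video": [("video_url",)],
    "motion-control": [("image_url", "image"), ("video_url",)],
}


def missing_fields(operation: str, body: dict) -> list:
    """List unmet requirement groups for an operation (empty list means OK)."""
    if operation not in REQUIREMENTS:
        raise ValueError(f"unknown operation: {operation}")
    return [
        " or ".join(group)
        for group in REQUIREMENTS[operation]
        if not any(body.get(field) for field in group)
    ]
```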
Current Public Model Inventory
OpenAI
| Model | Public operations |
|---|---|
| sora-2 | Text-to-video, image-to-video |
| sora-2-pro | Text-to-video, image-to-video |
| sora-2-pro-storyboard | Image-to-video |
Kuaishou
| Model | Public operations |
|---|---|
| kling-3.0-motion-control | Motion control |
| kling-3.0-video | Text-to-video, image-to-video, start-end-to-video, element references |
| kling-v2.1-master | Text-to-video, image-to-video |
| kling-v2.1-pro | Image-to-video, start-end-to-video |
| kling-v2.1-standard | Image-to-video |
| kling-v2.5-turbo-pro | Text-to-video, image-to-video, start-end-to-video |
| kling-v2.5-turbo-std | Text-to-video, image-to-video |
| kling-v2.6-pro | Text-to-video, image-to-video, start-end-to-video |
| kling-v2.6-std | Text-to-video, image-to-video |
| kling-v3.0-pro | Text-to-video, image-to-video, start-end-to-video |
| kling-v3.0-std | Text-to-video, image-to-video, start-end-to-video |
| kling-video-o1-pro | Text-to-video, image-to-video, reference-to-video, start-end-to-video, video-to-video |
| kling-video-o1-std | Text-to-video, image-to-video, reference-to-video, start-end-to-video, video-to-video |
Google
| Model | Public operations |
|---|---|
| veo3 | Text-to-video, image-to-video |
| veo3-fast | Text-to-video, image-to-video |
| veo3-pro | Text-to-video, image-to-video |
| veo3.1 | Text-to-video, image-to-video, reference-to-video, start-end-to-video |
| veo3.1-fast | Text-to-video, image-to-video, reference-to-video, start-end-to-video |
| veo3.1-pro | Text-to-video, image-to-video, start-end-to-video |
ByteDance
| Model | Public operations |
|---|---|
| seedance-1.5-pro | Text-to-video, image-to-video |
MiniMax
| Model | Public operations |
|---|---|
| hailuo-2.3-fast | Image-to-video |
| hailuo-2.3-pro | Text-to-video, image-to-video |
| hailuo-2.3-standard | Text-to-video, image-to-video |
Alibaba
| Model | Public operations |
|---|---|
| wan-2.2-plus | Text-to-video, image-to-video |
| wan-2.5 | Text-to-video, image-to-video |
| wan-2.6 | Text-to-video, image-to-video, reference-to-video |
Shengshu
| Model | Public operations |
|---|---|
| viduq2 | Text-to-video, reference-to-video |
| viduq2-pro | Image-to-video, reference-to-video, start-end-to-video |
| viduq2-pro-fast | Image-to-video, start-end-to-video |
| viduq2-turbo | Image-to-video, start-end-to-video |
| viduq3-pro | Text-to-video, image-to-video, start-end-to-video |
| viduq3-turbo | Text-to-video, image-to-video, start-end-to-video |
xAI
| Model | Public operations |
|---|---|
| grok-imagine-video | Text-to-video, image-to-video, reference-to-video, video-to-video |
| grok-imagine-video-1.5-preview | Image-to-video |
| grok-imagine-image-to-video | Image-to-video |
| grok-imagine-text-to-video | Text-to-video |
| grok-imagine-upscale | Video-to-video |
Other
| Model | Public operations |
|---|---|
| topaz-video-upscale | Video-to-video |
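For client-side routing, the inventory above can be kept as a local lookup. This snapshot copies the tables on this page and will drift as models change; the Models API remains the source of truth. The kling-3.0-video element-reference feature is not one of the public operations, so it is omitted.

```python
T2V, I2V, REF, SE, V2V, MOTION = (
    "text-to-video", "image-to-video", "reference-to-video",
    "start-end-to-video", "video-to-video", "motion-control",
)

# Snapshot of the inventory tables on this page; refresh it from the Models API.
INVENTORY = {
    "sora-2": {T2V, I2V},
    "sora-2-pro": {T2V, I2V},
    "sora-2-pro-storyboard": {I2V},
    "kling-3.0-motion-control": {MOTION},
    "kling-3.0-video": {T2V, I2V, SE},
    "kling-v2.1-master": {T2V, I2V},
    "kling-v2.1-pro": {I2V, SE},
    "kling-v2.1-standard": {I2V},
    "kling-v2.5-turbo-pro": {T2V, I2V, SE},
    "kling-v2.5-turbo-std": {T2V, I2V},
    "kling-v2.6-pro": {T2V, I2V, SE},
    "kling-v2.6-std": {T2V, I2V},
    "kling-v3.0-pro": {T2V, I2V, SE},
    "kling-v3.0-std": {T2V, I2V, SE},
    "kling-video-o1-pro": {T2V, I2V, REF, SE, V2V},
    "kling-video-o1-std": {T2V, I2V, REF, SE, V2V},
    "veo3": {T2V, I2V},
    "veo3-fast": {T2V, I2V},
    "veo3-pro": {T2V, I2V},
    "veo3.1": {T2V, I2V, REF, SE},
    "veo3.1-fast": {T2V, I2V, REF, SE},
    "veo3.1-pro": {T2V, I2V, SE},
    "seedance-1.5-pro": {T2V, I2V},
    "hailuo-2.3-fast": {I2V},
    "hailuo-2.3-pro": {T2V, I2V},
    "hailuo-2.3-standard": {T2V, I2V},
    "wan-2.2-plus": {T2V, I2V},
    "wan-2.5": {T2V, I2V},
    "wan-2.6": {T2V, I2V, REF},
    "viduq2": {T2V, REF},
    "viduq2-pro": {I2V, REF, SE},
    "viduq2-pro-fast": {I2V, SE},
    "viduq2-turbo": {I2V, SE},
    "viduq3-pro": {T2V, I2V, SE},
    "viduq3-turbo": {T2V, I2V, SE},
    "grok-imagine-video": {T2V, I2V, REF, V2V},
    "grok-imagine-video-1.5-preview": {I2V},
    "grok-imagine-image-to-video": {I2V},
    "grok-imagine-text-to-video": {T2V},
    "grok-imagine-upscale": {V2V},
    "topaz-video-upscale": {V2V},
}


def models_for(operation: str) -> list:
    """Return the snapshot models that list the operation, sorted by name."""
    return sorted(model for model, ops in INVENTORY.items() if operation in ops)
```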
Usage Examples
Text-to-Video
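Here is a minimal text-to-video request body, sketched under several assumptions. The model and prompt field names are not specified on this page. The create endpoint is left out because its path is not documented here. Setting operation explicitly follows the production recommendation in the parameters reference.

```python
import json
from typing import Optional


def text_to_video_body(
    model: str,
    prompt: str,
    duration: Optional[int] = None,
    aspect_ratio: Optional[str] = None,
) -> dict:
    """Build a text-to-video body; optional fields are sent only when set."""
    body = {"model": model, "operation": "text-to-video", "prompt": prompt}
    if duration is not None:
        body["duration"] = duration  # output length; allowed values are model-specific
    if aspect_ratio is not None:
        body["aspect_ratio"] = aspect_ratio  # value format is model-specific
    return body


if __name__ == "__main__":
    print(json.dumps(text_to_video_body("veo3.1-fast", "A paper boat drifting down a rainy street")))
```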
Image-to-Video
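The following image-to-video body uses image_url, the input form this page recommends for broad compatibility. Treating model and prompt as field names is an assumption. The sketch sends output_audio only when the caller sets it, so the model's own default applies otherwise.

```python
from typing import Optional


def image_to_video_body(
    model: str,
    image_url: str,
    prompt: Optional[str] = None,
    output_audio: Optional[bool] = None,
) -> dict:
    """Build an image-to-video body that animates the image at image_url."""
    body = {"model": model, "operation": "image-to-video", "image_url": image_url}
    if prompt:
        body["prompt"] = prompt  # assumed optional motion/scene description
    if output_audio is not None:
        body["output_audio"] = output_audio  # omit to keep the model default
    return body
```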
Kling 3.0 Elements
Use kling_elements with kling-3.0-video when you need element references. Provide an image-conditioned request (image_url, image_urls, start_image, or end_image) and reference each element in the prompt with @name. Do not combine kling_elements with output_audio=true; omit output_audio or set it to false for element-reference requests.
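These rules can be checked before you submit. This page does not document the shape of a kling_elements entry, so the sketch assumes a dict with a name key; adapt it to the real schema.

```python
import re

IMAGE_CONDITIONING = ("image_url", "image_urls", "start_image", "end_image")


def check_kling_elements(body: dict) -> list:
    """Return rule violations for a kling_elements request (empty list means OK).

    Assumes each kling_elements entry is a dict with a "name" key (hypothetical shape).
    """
    elements = body.get("kling_elements") or []
    if not elements:
        return []
    problems = []
    if body.get("model") != "kling-3.0-video":
        problems.append("kling_elements requires kling-3.0-video")
    if not any(body.get(field) for field in IMAGE_CONDITIONING):
        problems.append("add image_url, image_urls, start_image, or end_image")
    if body.get("output_audio") is True:
        problems.append("omit output_audio or set it to false")
    prompt = body.get("prompt", "")
    for element in elements:
        name = element["name"]
        # Require an @name mention; \b stops "@dog" from matching inside "@doghouse".
        if not re.search(rf"@{re.escape(name)}\b", prompt):
            problems.append(f"prompt does not mention @{name}")
    return problems
```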
Reference-to-Video
For seedance-2.0 and seedance-2.0-fast, TokenLab currently supports up to 9 reference images, up to 3 reference videos, and up to 3 reference audio inputs. The duration parameter controls only the generated output length; it does not impose a separate duration limit on reference video inputs. For grok-imagine-video, reference-to-video accepts up to 7 image references (reference_images or image_urls), and duration is capped at 10 seconds. Do not combine reference images with image_url or image first-frame inputs. grok-imagine-video-1.5-preview supports image-to-video only.
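This sketch checks the reference limits listed above before you submit. Two assumptions need flagging. First, reference_videos and reference_audios are hypothetical field names: this page gives the seedance-2.0 limits but not the fields. Second, the no-mixing rule is applied only to grok-imagine-video, where the text places it.

```python
SEEDANCE_2 = {"seedance-2.0", "seedance-2.0-fast"}


def check_reference_request(body: dict) -> list:
    """Return limit violations for a reference-to-video body (empty list means OK)."""
    model = body.get("model")
    problems = []
    if model in SEEDANCE_2:
        limits = {
            "reference_images": 9,
            "reference_videos": 3,  # hypothetical field name
            "reference_audios": 3,  # hypothetical field name
        }
        for field, cap in limits.items():
            if len(body.get(field) or []) > cap:
                problems.append(f"{field}: at most {cap} for {model}")
    elif model == "grok-imagine-video":
        # Image references may arrive in either field; the cap applies to the total.
        refs = [*(body.get("reference_images") or []), *(body.get("image_urls") or [])]
        if len(refs) > 7:
            problems.append("at most 7 image references for grok-imagine-video")
        if (body.get("duration") or 0) > 10:
            problems.append("duration is capped at 10 seconds")
        if refs and (body.get("image_url") or body.get("image")):
            problems.append("do not combine reference images with image_url / image")
    elif model == "grok-imagine-video-1.5-preview" and body.get("reference_images"):
        problems.append("grok-imagine-video-1.5-preview is image-to-video only")
    return problems
```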
Start-End-to-Video
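This start-end-to-video body passes start_image and end_image as URL strings; this page names the fields but not their accepted formats. Treating model and prompt as field names is an assumption. Only some models accept this operation, for example veo3.1, kling-v3.0-pro, and viduq3-turbo per the inventory.

```python
from typing import Optional


def start_end_body(model: str, start_image: str, end_image: str,
                   prompt: Optional[str] = None) -> dict:
    """Build a body that pins the first frame to start_image and the last to end_image."""
    if not start_image or not end_image:
        raise ValueError("start-end-to-video needs both start_image and end_image")
    body = {
        "model": model,
        "operation": "start-end-to-video",
        "start_image": start_image,
        "end_image": end_image,
    }
    if prompt:
        body["prompt"] = prompt  # assumed optional description of the transition
    return body
```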
Video-to-Video
For grok-imagine-video video-to-video, send a public HTTPS .mp4 URL in video_url. TokenLab translates it to xAI’s REST video.url body; duration, aspect_ratio, and resolution are not accepted for this edit flow.
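This sketch builds a grok-imagine-video edit body that enforces two constraints: the URL must be HTTPS and end in .mp4, and duration, aspect_ratio, and resolution must be absent. Treating prompt as the edit instruction field is an assumption. The .mp4 check inspects the URL path only, not the file contents.

```python
from urllib.parse import urlparse

NOT_ACCEPTED = {"duration", "aspect_ratio", "resolution"}


def grok_v2v_body(video_url: str, prompt: str, **extra) -> dict:
    """Build a grok-imagine-video video-to-video body, rejecting unsupported fields."""
    parsed = urlparse(video_url)
    if parsed.scheme != "https" or not parsed.path.lower().endswith(".mp4"):
        raise ValueError("video_url must be a public HTTPS .mp4 URL")
    rejected = sorted(NOT_ACCEPTED.intersection(extra))
    if rejected:
        raise ValueError(f"not accepted for this edit flow: {', '.join(rejected)}")
    return {
        "model": "grok-imagine-video",
        "operation": "video-to-video",
        "video_url": video_url,
        "prompt": prompt,
        **extra,
    }
```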
Motion Control
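Motion control combines a subject image with a motion reference video. In this sketch the subject image goes in image_url and the reference video in video_url, which current public motion-control models require. Three details are assumptions: image_url as the subject-image field, the model and prompt field names, and the https-only check, which is a stricter reading of the production recommendation. kling-3.0-motion-control is the only motion-control model in the current inventory.

```python
from typing import Optional


def motion_control_body(image_url: str, video_url: str,
                        prompt: Optional[str] = None,
                        model: str = "kling-3.0-motion-control") -> dict:
    """Build a body that applies the motion in video_url to the subject in image_url."""
    for field, url in (("image_url", image_url), ("video_url", video_url)):
        if not url.startswith("https://"):
            raise ValueError(f"{field} should be a publicly reachable https URL")
    body = {
        "model": model,
        "operation": "motion-control",
        "image_url": image_url,  # assumed field for the subject image
        "video_url": video_url,  # motion reference video
    }
    if prompt:
        body["prompt"] = prompt
    return body
```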
Parameters Reference
| Parameter | Type | Notes |
|---|---|---|
| operation | string | Explicit operation is recommended in production. |
| image_url | string | Preferred image input form for broad cross-model compatibility. |
| image | string | Inline data URL; useful for debugging and small local integrations. |
| reference_images | string[] | Canonical public field for reference-image conditioning. |
| reference_image_type | string | Optional asset / style selector when supported. |
| video_url | string | Required for current public video-to-video and motion-control models. |
| audio_url | string | Used by model-specific audio-conditioned flows when available. |
| output_audio | boolean | Veo 3 family defaults to true when omitted. kling-3.0-video accepts this selector for upstream sound control and defaults to silent output when omitted. |
Model Selection Guide
Best Quality
veo3.1-pro, kling-video-o1-pro, and viduq3-pro are strong choices when fidelity matters more than speed.
Fastest Public Options
veo3.1-fast, hailuo-2.3-fast, and viduq3-turbo are good starting points for faster iteration.
Reference-Heavy Flows
Use veo3.1, veo3.1-fast, wan-2.6, or kling-video-o1-pro / std when you need dedicated reference-image conditioning.
Video-to-Video
topaz-video-upscale, grok-imagine-upscale, grok-imagine-video, and kling-video-o1-pro / std cover the current generally enabled public video-to-video paths.