Create Video - TokenLab

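Overview

Video generation is asynchronous. After you submit a request, you receive a task ID and a poll_url, then poll for the result.

Polling behavior

The create response returns the canonical async identifier id, and usually returns task_id as well. When the create response includes poll_url, poll that exact URL; this gives the most reliable polling behavior. If you need a fixed status endpoint, use GET /v1/tasks/{id}. If poll_url points to /v1/tasks/{id}, treat it as that canonical fixed status endpoint.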
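The polling flow above can be sketched as a small loop. This is a minimal illustration under stated assumptions, not TokenLab's client: the helper name poll_task and the injected fetch callable are hypothetical, and the rule that a task is finished once the payload carries video_url, video, videos, or error is an assumption (this page documents those response fields but does not list terminal status values).

```python
import time
from typing import Any, Callable, Dict

# Response fields this page documents as carrying a result or a failure.
# Treating their presence as "task finished" is an assumption of this sketch.
DONE_FIELDS = ("video_url", "video", "videos", "error")


def poll_task(
    fetch: Callable[[str], Dict[str, Any]],
    poll_url: str,
    interval_s: float = 5.0,
    max_attempts: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch(poll_url) until the payload has a result or an error."""
    for _ in range(max_attempts):
        payload = fetch(poll_url)
        if any(payload.get(field) for field in DONE_FIELDS):
            return payload
        sleep(interval_s)
    raise TimeoutError(f"task at {poll_url} did not finish after {max_attempts} polls")
```

In a real integration, fetch would issue an authenticated GET against the poll_url from the create response and return the decoded JSON body. Injecting it keeps the loop easy to test and leaves the choice of HTTP client open.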

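Model and media behavior

Whether audio output is enabled depends on the model. In TokenLab, Veo 3 and Seedance requests default to audio enabled when output_audio is omitted. When a model supports audio control, use output_audio to toggle it explicitly; the compatibility aliases outputAudio and generate_audio are also accepted, but when several of these fields appear together their values must match. For production integrations, prefer publicly accessible https URLs for image, video, and audio inputs. Compatible models still accept inline data: URLs, but large base64 payloads are usually harder to work with during retries, monitoring, and troubleshooting.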

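Request body

model

string

Default: "veo3.1"

Video model ID. Use a model ID such as veo3.1, wan-2.7, happyhorse-1.0, viduq3, pixverse-v6, or kling-3.0-video; select the capability (text-to-video, image-to-video, reference-to-video, and so on) with operation. For the currently available public video capabilities, see the video generation guide and the Models API.

PixVerse

Models: pixverse-c1, pixverse-v6, pixverse-v5.6
Operations: text-to-video, image-to-video, start-end-to-video, reference-to-video
Audio parameter: output_audio, defaults to false

On TokenLab, the PixVerse models above do not accept operation=video-extension.

HappyHorse

Models: happyhorse-1.0
Operations: text-to-video, image-to-video, reference-to-video, video-to-video
Audio parameter: do not pass output_audio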

prompt

string

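Required

Text description of the video to generate. Most public video models require this field.

operation

string

The video operation to perform. Supported values are text-to-video, image-to-video, reference-to-video, start-end-to-video, video-to-video, video-extension, audio-to-video, and motion-control. TokenLab can infer the operation from the inputs, but passing operation explicitly is still recommended in production.

image_url

string

URL of the starting image for image-to-video. For the broadest cross-model compatibility, prefer image_url.

image

string

An image supplied as an inline data URL (for example, data:image/jpeg;base64,...). Compatible models support this form, but image_url is more widely compatible.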

reference_images

array

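Reference image inputs for reference-to-video. The number allowed depends on the model. For seedance-2.0 and seedance-2.0-fast, TokenLab currently supports up to 9 reference images, plus up to 3 reference videos and 3 reference audio clips. For model selection, 4K limits, and notes on Mini, see the Seedance 2.0 video model guide. Prefer public https URLs; compatible models also accept inline data: URLs. For grok-imagine-video, reference-to-video accepts up to 7 image references, and duration is capped at 10 seconds. grok-imagine-video-1.5-preview supports image-to-video only and does not accept reference images.

material_asset_id

string

A TokenLab Seedance material ID returned by material creation or by the automatic image preparation flow. Once the material becomes ACTIVE, it can be used with Seedance models that can use the TokenLab material library.

material_asset_ids

array

Multiple TokenLab Seedance material IDs. They count toward the same Seedance image reference limit as reference_images; the selected model must be able to use the TokenLab material library.

When the selected Seedance model can use the TokenLab material library, TokenLab prepares image fields (image, image_url, image_urls, reference_images, start_image, end_image) as reusable materials before generation. If preparation has not finished within 60 seconds, the API returns 409 seedance_material_preparing together with auto_material_asset_ids; retry once those materials become ACTIVE. If the selected model cannot currently use the material library, ordinary image inputs still follow the regular image path, while explicit material IDs return a material availability error.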

reference_image_type

string

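Optional reference image role field, for models that distinguish between asset and style reference images.

kling_elements

array

Element reference definitions for kling-3.0-video, supported only on requests with image conditioning. You can define 1-3 elements; each element contains a name, an optional description, and element_input_urls with 2-4 image URLs. Reference an element in the prompt with @name. Do not combine kling_elements with output_audio=true; when using element references, omit output_audio or set it to false.

video_url

string

Public URL of the source video. Required for video-to-video flows that take a video URL and for motion-control; some derivative flows use task_id instead.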

video_urls

array

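Additional reference video inputs for models that support multimodal reference conditioning. The number allowed depends on the model. For seedance-2.0 and seedance-2.0-fast, TokenLab currently supports up to 3 reference videos.

audio_url

string

Public audio URL for audio-to-video models.

audio_urls

array

Additional reference audio inputs for models that support multimodal reference conditioning. The number allowed depends on the model. For seedance-2.0 and seedance-2.0-fast, TokenLab currently supports up to 3 reference audio clips.

task_id

string

Task identifier used by some continuation, extension, or derivative flows.

extend_at

integer

Model-side extension start point parameter used by some video-extension flows.

extend_times

string

Model-side extension count or multiplier parameter used by some video-extension flows.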

duration

integer

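Duration of the generated output video, in seconds. Seedance 1.5/2.0 models default to 5 seconds when omitted. Pass -1 to let the model choose a duration automatically within its supported range; in that case the cost is estimated conservatively until the task completes.

seconds

integer

Compatibility alias for duration. If both seconds and duration are passed, they must be identical. For Seedance, seconds=-1 means automatic duration, the same as duration=-1.

aspect_ratio

string

Canonical aspect ratio, for example adaptive, 16:9, 9:16, 1:1, 4:3, 3:4, or 21:9. Seedance defaults to adaptive when omitted.

resolution

string

Model-dependent output resolution. Seedance defaults to 720p when omitted; seedance-2.0 supports 480p, 720p, 1080p, and 4k, while seedance-2.0-fast and seedance-2.0-mini support only 480p and 720p.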

output_audio

boolean

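Canonical audio output switch. Veo 3 and Seedance default to true when omitted. kling-3.0-video accepts this field only on requests without element references, and produces silent output when it is omitted. Do not combine output_audio=true with kling_elements.

draft

boolean

Draft workflow switch for Seedance 1.5 Pro. Use draft=true only on Seedance models that support draft tasks; do not pass it together with draft_task_id.

draft_task_id

string

Draft promotion task ID for Seedance 1.5 Pro. Pass the ID of a previous draft task to create the final video; this is not a general-purpose video field.

ratio

string

Compatibility alias for aspect_ratio. If both ratio and aspect_ratio are passed, they must be identical.

generate_audio

boolean

Compatibility alias for output_audio. When generate_audio, output_audio, and outputAudio appear together, all values must match.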

execution_expires_after

integer

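Execution expiry for compatible video models, in seconds. Seedance defaults to 172800 seconds when omitted.

priority

integer

Task priority for compatible video models, from 0 to 9. Do not combine priority with service_tier=flex.

safety_identifier

string

End-user safety identifier for compatible video models. For Seedance, when this field is not passed, TokenLab uses the value of user.

service_tier

string

For Seedance 2.0, default is accepted as a compatibility no-op. Use flex only when the selected model explicitly supports it.

frames

integer

Optional frame count for compatible video models. Seedance 2.0 models and Seedance 1.5 Pro do not support this field.

camera_fixed

boolean

Fixed-camera switch for compatible video models. Seedance 2.0 models do not support this field.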

fps

integer

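Frames per second (1-120). Takes effect only when the model publicly supports FPS control.

negative_prompt

string

Content to avoid in the generated video.

seed

integer

Random seed for reproducible generation. When omitted for Seedance, -1 is used, which means a random seed.

cfg_scale

number

Prompt adherence strength (0-20). Takes effect only when the public model supports this control.

motion_strength

number

Motion strength (0-1). Takes effect only when the public model supports this field.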

start_image

string

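Start frame image URL, or a compatible image input, for start-end-to-video.

end_image

string

End frame image URL, or a compatible image input, for start-end-to-video.

size

string

Model-dependent size tier parameter used by compatible video models.

watermark

boolean

Watermark switch exposed by some models. Seedance defaults to false when omitted.

effect_type

string

Model-side effect selector used by some effects or editing flows.

user

string

Unique identifier for the end user. For Seedance, TokenLab uses this value when safety_identifier is not passed.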

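Compatibility notes

The canonical public fields continue to use snake_case: aspect_ratio, output_audio, reference_images, and reference_image_type.
For compatibility with existing callers, TokenLab also accepts ratio, generate_audio, outputAudio, seconds, referenceImages, and referenceImageType.
If a canonical field and its alias appear together, the values must match; conflicts are rejected before the task is created.
If operation is omitted, TokenLab infers it from the inputs; passing it explicitly is still recommended in production.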
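A client can apply the same alias rule before sending a request, so conflicts surface locally. The sketch below is illustrative only: normalize_aliases is a hypothetical helper, not part of any TokenLab SDK, and the alias table is taken from the field descriptions on this page.

```python
from typing import Any, Dict

# Alias -> canonical field, as described on this page.
ALIASES = {
    "ratio": "aspect_ratio",
    "generate_audio": "output_audio",
    "outputAudio": "output_audio",
    "seconds": "duration",
    "referenceImages": "reference_images",
    "referenceImageType": "reference_image_type",
}


def normalize_aliases(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of payload that uses canonical field names only.

    Raises ValueError when a canonical field and one of its aliases
    (or two aliases of the same field) carry different values.
    """
    result: Dict[str, Any] = {}
    for key, value in payload.items():
        canonical = ALIASES.get(key, key)
        if canonical in result and result[canonical] != value:
            raise ValueError(f"conflicting values for {canonical!r}")
        result[canonical] = value
    return result
```

Sending only canonical names also keeps request logs uniform, which helps when comparing retries.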

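Input best practices

For image_url, reference_images, video_url, and audio_url, prefer publicly accessible https URLs.
Avoid mixing inline base64 and remote URLs in the same request; sticking to one representation makes troubleshooting and retries easier.
Make sure remote media URLs stay valid long enough to cover the retry window and the asynchronous task creation process.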
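The first two recommendations can be checked on the client before a request is sent. The following is a rough sketch, assuming a hypothetical helper named media_input_warnings; it only inspects the four fields named above and returns advisory messages rather than rejecting anything.

```python
from typing import Any, Dict, List

# Media fields named in the best practices above; reference_images is a list.
MEDIA_FIELDS = ("image_url", "reference_images", "video_url", "audio_url")


def media_input_warnings(payload: Dict[str, Any]) -> List[str]:
    """Return advisory warnings for media inputs; an empty list means none."""
    values: List[str] = []
    for field in MEDIA_FIELDS:
        value = payload.get(field)
        if isinstance(value, str):
            values.append(value)
        elif isinstance(value, list):
            values.extend(v for v in value if isinstance(v, str))

    warnings: List[str] = []
    inline = [v for v in values if v.startswith("data:")]
    remote = [v for v in values if not v.startswith("data:")]
    if inline and remote:
        warnings.append("request mixes inline data: URLs with remote URLs")
    for url in remote:
        if not url.startswith("https://"):
            warnings.append(f"not an https URL: {url}")
    return warnings
```

URL lifetime cannot be verified this way; signed URLs still need an expiry that outlasts the retry window.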

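Seedance parameters

For Seedance 1.5/2.0 models, the unified interface is built around TokenLab fields and also accepts the compatibility aliases seconds, ratio, and generate_audio. Omitted Seedance parameters take these defaults: duration=5, resolution=720p, aspect_ratio=adaptive, output_audio=true, watermark=false, return_last_frame=false, execution_expires_after=172800, priority=0, seed=-1.

duration=-1 or seconds=-1 lets Seedance choose the output duration automatically within the model's supported range. TokenLab estimates the cost conservatively before the task completes, then settles at the actual result once usage for the completed task is available.

service_tier=default is accepted as a compatibility no-op for Seedance 2.0; service_tier=flex, frames, and camera_fixed are rejected when the selected model does not support them.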
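To see which values a Seedance request effectively runs with, the documented defaults can be merged under the caller's explicit fields. This is a client-side sketch, not TokenLab's server logic: effective_seedance_params is a hypothetical name, the resolution table comes from the resolution field description on this page, and the priority check mirrors the rule that priority should not be combined with service_tier=flex.

```python
from typing import Any, Dict

# Defaults documented on this page for Seedance 1.5/2.0 models.
SEEDANCE_DEFAULTS: Dict[str, Any] = {
    "duration": 5,
    "resolution": "720p",
    "aspect_ratio": "adaptive",
    "output_audio": True,
    "watermark": False,
    "return_last_frame": False,
    "execution_expires_after": 172800,
    "priority": 0,
    "seed": -1,
}

# Resolutions per model, from the resolution field description.
SEEDANCE_RESOLUTIONS = {
    "seedance-2.0": {"480p", "720p", "1080p", "4k"},
    "seedance-2.0-fast": {"480p", "720p"},
    "seedance-2.0-mini": {"480p", "720p"},
}


def effective_seedance_params(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Merge the documented defaults under the caller's explicit values."""
    merged = {**SEEDANCE_DEFAULTS, **payload}
    allowed = SEEDANCE_RESOLUTIONS.get(merged.get("model"))
    if allowed is not None and merged["resolution"] not in allowed:
        raise ValueError(f"{merged['model']} does not support {merged['resolution']}")
    # Only an explicitly passed priority counts as "combined" with flex.
    if merged.get("service_tier") == "flex" and "priority" in payload:
        raise ValueError("priority cannot be combined with service_tier=flex")
    return merged
```

The merged dictionary is meant for logging and validation; sending only the fields you actually set keeps the request closest to what the API documents.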

Seedance example

cURL

curl -X POST "https://api.tokenlab.sh/v1/videos/generations" \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "seedance-2.0",
    "prompt": "A sleek product reveal with cinematic camera movement",
    "operation": "text-to-video",
    "duration": -1,
    "aspect_ratio": "adaptive",
    "resolution": "720p",
    "output_audio": true
  }'

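Response

id

string

Canonical async task ID.

task_id

string

Unique task identifier used for polling.

poll_url

string

Recommended polling URL for this task. Use this exact path when checking status.

billing_transaction_id

string

TokenLab billing transaction ID, returned once settlement has completed. It is the transaction identifier used in the dashboard and for reconciliation, and is distinct from the async id / task_id.

status

string

Initial status: pending.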

created

integer

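Unix timestamp of when the task was created.

model

string

The model used.

video_url

string

Directly usable video URL, present once the result is ready.

video

object

A single video object, returned when available, containing url, duration, width, and height.

videos

array

An array of videos, which may appear when the task produces multiple outputs.

error

string

Error message or structured error object returned when the task fails.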

curl -X POST "https://api.tokenlab.sh/v1/videos/generations" \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "veo3.1",
    "prompt": "A cat walking through a garden, cinematic lighting",
    "operation": "text-to-video",
    "duration": 4,
    "aspect_ratio": "16:9"
  }'

{
  "id": "ldtask_aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
  "task_id": "ldtask_aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
  "poll_url": "/v1/tasks/ldtask_aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
  "status": "pending",
  "model": "veo3.1",
  "created": 1706000000
}
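The example response carries a relative poll_url. A client might turn it into an absolute URL as sketched below. This is an assumption-laden illustration: resolve_poll_url is a hypothetical helper, and resolving a relative poll_url against the API host used in the request examples is an assumption, not documented behavior. The fallback to /v1/tasks/{id} follows the fixed status endpoint described under polling behavior.

```python
from typing import Any, Dict
from urllib.parse import urljoin

API_BASE = "https://api.tokenlab.sh"  # host used by the examples on this page


def resolve_poll_url(create_response: Dict[str, Any], base: str = API_BASE) -> str:
    """Return an absolute URL to poll for the task in create_response."""
    poll_url = create_response.get("poll_url")
    if poll_url:
        # urljoin leaves absolute URLs untouched and resolves relative paths.
        return urljoin(base, poll_url)
    task_id = create_response.get("id") or create_response.get("task_id")
    if not task_id:
        raise ValueError("create response has neither poll_url nor id/task_id")
    return urljoin(base, f"/v1/tasks/{task_id}")
```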

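Image-to-video

import requests

# Submit an image-to-video task; the response carries the task ID and poll_url.
response = requests.post(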
    "https://api.tokenlab.sh/v1/videos/generations",
    headers={"Authorization": "Bearer sk-your-api-key"},
    json={
        "model": "hailuo-2.3-standard",
        "prompt": "The scene begins from the provided image and adds gentle natural motion.",
        "operation": "image-to-video",
        "image_url": "https://example.com/image.jpg",
        "duration": 6,
        "aspect_ratio": "16:9"
    }
)

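Kling 3.0 element references

When you need element references, pass kling_elements in a kling-3.0-video request. The request must include an image conditioning input (image_url, image_urls, start_image, or end_image), and the prompt must reference each element with @name.

response = requests.post(
    "https://api.tokenlab.sh/v1/videos/generations",
    headers={"Authorization": "Bearer sk-your-api-key"},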
    json={
        "model": "kling-3.0-video",
        "prompt": "Place @hero_bag on a studio turntable with soft product lighting.",
        "operation": "image-to-video",
        "image_url": "https://example.com/studio-start.png",
        "duration": 5,
        "resolution": "720p",
        "kling_elements": [
            {
                "name": "hero_bag",
                "description": "black leather handbag",
                "element_input_urls": [
                    "https://example.com/bag-front.png",
                    "https://example.com/bag-side.png"
                ]
            }
        ]
    }
)
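The kling_elements rules on this page (1-3 elements, 2-4 image URLs each, an image conditioning input, no output_audio=true, and an @name reference in the prompt) can be checked before submitting. The sketch below is a hypothetical client-side validator named kling_element_problems; the API's own validation may differ in wording and in edge cases.

```python
import re
from typing import Any, Dict, List

IMAGE_CONDITION_FIELDS = ("image_url", "image_urls", "start_image", "end_image")


def kling_element_problems(payload: Dict[str, Any]) -> List[str]:
    """Return rule violations for a kling_elements request; empty means OK."""
    problems: List[str] = []
    elements = payload.get("kling_elements") or []
    if not 1 <= len(elements) <= 3:
        problems.append("kling_elements must define 1-3 elements")
    if not any(payload.get(field) for field in IMAGE_CONDITION_FIELDS):
        problems.append("an image conditioning input is required")
    if payload.get("output_audio") is True:
        problems.append("output_audio=true cannot be combined with kling_elements")
    prompt = payload.get("prompt", "")
    for element in elements:
        name = element.get("name", "")
        urls = element.get("element_input_urls") or []
        if not name:
            problems.append("every element needs a name")
        elif not re.search(rf"@{re.escape(name)}(?!\w)", prompt):
            # (?!\w) stops @hero from matching inside @hero_bag.
            problems.append(f"prompt does not reference @{name}")
        if not 2 <= len(urls) <= 4:
            problems.append(f"element {name!r} needs 2-4 element_input_urls")
    return problems
```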

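Reference-to-video

When a model supports dedicated reference conditioning, use operation=reference-to-video. In TokenLab requests, image references go in reference_images, and multimodal reference videos and reference audio go in video_urls and audio_urls respectively. For seedance-2.0 and seedance-2.0-fast, TokenLab currently supports up to 9 reference images, plus up to 3 reference videos and 3 reference audio clips. For model selection, 4K limits, and notes on Mini, see the Seedance 2.0 video model guide. duration controls only the length of the generated output; it does not separately limit the length of reference video inputs. For grok-imagine-video, reference-to-video accepts up to 7 image references (reference_images or image_urls), and duration is capped at 10 seconds. Do not mix reference images with an image_url / image first-frame input. grok-imagine-video-1.5-preview supports image-to-video only.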

response = requests.post(
    "https://api.tokenlab.sh/v1/videos/generations",
    headers={"Authorization": "Bearer sk-your-api-key"},
    json={
        "model": "veo3.1",
        "prompt": "Keep the same subject identity and palette while adding subtle motion.",
        "operation": "reference-to-video",
        "reference_images": [
            "https://example.com/ref-a.jpg",
            "https://example.com/ref-b.jpg"
        ],
        "reference_image_type": "asset",
        "duration": 8,
        "resolution": "720p",
        "aspect_ratio": "9:16"
    }
)
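The per-model reference limits stated on this page can be kept in a small table and checked before submitting. This is a sketch only: reference_limit_problems is a hypothetical helper, the table covers just the models whose limits appear here, and counting material_asset_ids toward the image limit follows the material_asset_ids field description. Limits for other models should be confirmed through the Models API.

```python
from typing import Any, Dict, List

# Limits stated on this page; models not listed here are not checked.
REFERENCE_LIMITS: Dict[str, Dict[str, int]] = {
    "seedance-2.0": {"images": 9, "videos": 3, "audios": 3},
    "seedance-2.0-fast": {"images": 9, "videos": 3, "audios": 3},
    "grok-imagine-video": {"images": 7, "max_duration": 10},
}


def reference_limit_problems(payload: Dict[str, Any]) -> List[str]:
    """Return limit violations for a reference-to-video payload."""
    limits = REFERENCE_LIMITS.get(payload.get("model", ""))
    if limits is None:
        return []  # no limits documented on this page for this model
    counts = {
        # Material IDs share the image reference limit with reference_images.
        "images": len(payload.get("reference_images") or [])
        + len(payload.get("material_asset_ids") or []),
        "videos": len(payload.get("video_urls") or []),
        "audios": len(payload.get("audio_urls") or []),
    }
    problems: List[str] = []
    for kind, count in counts.items():
        if kind in limits and count > limits[kind]:
            problems.append(f"{count} reference {kind} exceeds limit of {limits[kind]}")
    max_duration = limits.get("max_duration")
    if max_duration is not None and payload.get("duration", 0) > max_duration:
        problems.append(f"duration exceeds {max_duration} seconds")
    return problems
```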

Start and end frame control

Use start_image and end_image to control the first and last frames:

response = requests.post(
    "https://api.tokenlab.sh/v1/videos/generations",
    headers={"Authorization": "Bearer sk-your-api-key"},
    json={
        "model": "viduq2-pro",
        "prompt": "Smooth transition from day to night",
        "operation": "start-end-to-video",
        "start_image": "https://example.com/day.jpg",
        "end_image": "https://example.com/night.jpg",
        "duration": 5,
        "resolution": "720p",
        "aspect_ratio": "16:9"
    }
)

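Video-to-video

When a model accepts an existing video as its primary input, use operation=video-to-video. For grok-imagine-video video-to-video, pass a public HTTPS .mp4 URL in video_url. TokenLab converts it into the video.url request body of the xAI REST API. You can set resolution to 480p or 720p; this editing flow does not accept duration or aspect_ratio.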

response = requests.post(
    "https://api.tokenlab.sh/v1/videos/generations",
    headers={"Authorization": "Bearer sk-your-api-key"},
    json={
        "model": "grok-imagine-video",
        "operation": "video-to-video",
        "video_url": "https://example.com/source.mp4",
        "prompt": "Enhance the clip while preserving the original motion.",
        "resolution": "720p"
    }
)
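The constraints for this flow can be enforced when the payload is built. The helper below is a hypothetical sketch (build_grok_video_edit is not a TokenLab function) that applies only the rules stated above: an HTTPS .mp4 source, resolution limited to 480p or 720p, and no duration or aspect_ratio.

```python
from typing import Any, Dict, Optional


def build_grok_video_edit(
    video_url: str,
    prompt: str,
    resolution: Optional[str] = None,
    **extra: Any,
) -> Dict[str, Any]:
    """Build a grok-imagine-video video-to-video payload or raise ValueError."""
    path = video_url.split("?", 1)[0].lower()  # ignore any query string
    if not (video_url.startswith("https://") and path.endswith(".mp4")):
        raise ValueError("video_url must be a public HTTPS .mp4 URL")
    if resolution is not None and resolution not in ("480p", "720p"):
        raise ValueError("resolution must be 480p or 720p")
    rejected = [key for key in ("duration", "aspect_ratio") if key in extra]
    if rejected:
        raise ValueError(f"not accepted by this flow: {', '.join(rejected)}")
    payload: Dict[str, Any] = {
        "model": "grok-imagine-video",
        "operation": "video-to-video",
        "video_url": video_url,
        "prompt": prompt,
        **extra,
    }
    if resolution is not None:
        payload["resolution"] = resolution
    return payload
```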

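Motion control

When a model needs both a subject image and a motion reference video, use operation=motion-control. TokenLab converts the public image_url + video_url request shape into that model's motion-control request format.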

response = requests.post(
    "https://api.tokenlab.sh/v1/videos/generations",
    headers={"Authorization": "Bearer sk-your-api-key"},
    json={
        "model": "kling-3.0-motion-control",
        "operation": "motion-control",
        "prompt": "Keep the subject stable while following the motion reference.",
        "image_url": "https://example.com/subject.png",
        "video_url": "https://example.com/motion.mp4",
        "resolution": "720p"
    }
)

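Model discovery

The public video model inventory and the operations each model supports change over time. Before integrating a model-specific flow, confirm current support with the Models API: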

curl "https://api.tokenlab.sh/v1/models?recommended_for=video" \
  -H "Authorization: Bearer sk-your-api-key"

curl "https://api.tokenlab.sh/v1/models/veo3.1" \
  -H "Authorization: Bearer sk-your-api-key"

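Before relying on a model-specific operation or field, read the single-model detail response. Operations such as audio-to-video and video-extension are model-specific capabilities; confirm live availability there rather than relying on the static examples on this page.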
