SeeDance 2.0 API: Specs, Pricing, and How to Use It
SeeDance 2.0 is ByteDance's flagship video model. Up to 15-second clips, multimodal omni-reference (9 images, 3 videos, 3 audio), native audio. Full API guide.
SeeDance 2.0 is ByteDance's February 2026 video model and the version that brings multimodal omni-reference to the SeeDance line. One generate call accepts up to nine reference images, three reference video clips, and three reference audio clips — addressable by name (@Image1, @Video1, @Audio1) anywhere in the prompt. The model produces synchronized audio in the same pass, multi-shot narrative output with character consistency, and clip lengths up to 15 seconds. This guide covers what SeeDance 2.0 actually does, how Pro and Fast differ on Unifically, what it costs, and how to call it.
TL;DR: SeeDance 2.0 ships in Pro and Fast tiers on Unifically at $0.08 per second. Output is 720p, 4–15 seconds, in 16:9 / 9:16 / 1:1 / 4:3. Up to 9 images + 3 video clips + 3 audio clips per call as omni-reference, addressable via `@Image1`/`@Video1`/`@Audio1` placeholders. Multi-shot narrative with character consistency in one generate. Native synchronized audio with millisecond lip-sync across multiple languages.
What is SeeDance 2.0?
SeeDance 2.0 is ByteDance's flagship video model, released in early February 2026 with the public API beta opening on April 14, 2026. It accepts a text prompt plus an optional multimodal reference set, and returns an MP4 of 4 to 15 seconds at 720p with synchronized audio.
Three things make it different from SeeDance 1.5 Pro:
- Multimodal omni-reference — up to nine reference images, three video clips, and three audio clips per call, addressable in the prompt with placeholders.
- Native audio-video generation — audio is generated jointly with the visuals (not as a post-process), with millisecond lip-sync precision across multiple languages.
- Multi-shot storytelling — multiple connected shots in a single generate call, with the same character recognisable across them.
ByteDance also reports a +31.7 point physics benchmark improvement over SeeDance 1.5 Pro, mostly on water, cloth, and gravity simulation.
Why SeeDance 2.0 matters in 2026
The interesting unlock isn't any single feature — it's that the prompt surface stops being a bottleneck. Pre-SeeDance 2.0, multi-asset creative briefs ("a video that combines the look of this still, the motion of this clip, and the audio mood of this track") had to be broken into multiple generations and stitched together. SeeDance 2.0 takes the whole brief in one call.
The other shift is multi-shot output with character consistency. Earlier video models would happily render a multi-shot prompt as four separate-looking generations stitched together. SeeDance 2.0 holds the character — face, body, clothing — across shots in the same generate, which closes the biggest gap between AI video and proper short-form narrative work.
How SeeDance 2.0 works
Two tiers on Unifically
- SeeDance 2.0 Pro (default, `?model=pro`) — higher quality, the production pick.
- SeeDance 2.0 Fast (`?model=fast`) — same controls, throughput-optimised for drafting and high-volume runs.
Both expose the same modes, the same omni-reference surface, the same duration range (4–15 seconds), and the same 720p output ceiling on Unifically.
Modes
- Text-to-video — a prompt is enough.
- First and last frame — supply start and end images; SeeDance 2.0 interpolates the motion.
- Omni-reference — up to nine images, three video clips, and three audio clips, addressable via `@Image1`, `@Video1`, `@Audio1` placeholders in the prompt.
Inputs
- `prompt` — required.
- `images` — up to 9 reference images.
- `videos` — up to 3 reference video clips.
- `audio` — up to 3 reference audio clips.
- `duration` — 4–15 seconds.
- `aspect_ratio` — 16:9, 9:16, 1:1, 4:3.
- `audio: false` — disable audio output if you want a silent MP4.
- `seed` — optional, for reproducibility.
Output
- 720p MP4 at 24 FPS.
- Native synchronized audio (unless disabled).
- Returned via task-based async polling.
SeeDance 2.0 Pro vs Fast
Both tiers expose the same parameters; pick by what the workload needs.
| Tier | Best for | Per-second price |
|---|---|---|
| SeeDance 2.0 Pro | Hero shots, paid placements, omni-reference work that needs the higher quality model | $0.08 |
| SeeDance 2.0 Fast | Draft loops, high-volume social content, throughput-bound runs | $0.08 |
Same per-second list price, different model behaviour: Pro is tuned for fidelity, Fast for throughput. Switch between them via the playground tab or the `?model=` URL parameter — the request body is identical.
SeeDance 2.0 pricing and how it compares
| Source | Variant | Per-second price | 5s clip | 10s clip | 15s clip |
|---|---|---|---|---|---|
| Unifically | SeeDance 2.0 Pro | $0.08 | $0.40 | $0.80 | $1.20 |
| Unifically | SeeDance 2.0 Fast | $0.08 | $0.40 | $0.80 | $1.20 |
| FAL.ai | SeeDance 2.0 Standard 720p | ~$0.30 | ~$1.50 | ~$3.00 | ~$4.50 |
| WaveSpeed | SeeDance 2.0 Standard 720p | ~$0.24 | ~$1.20 | ~$2.40 | ~$3.60 |
| WaveSpeed | SeeDance 2.0 Fast 720p | ~$0.20 | ~$1.00 | ~$2.00 | ~$3.00 |
| Replicate | SeeDance 2.0 | per-second rate | varies | varies | varies |
Source for third-party rates: published per-second pricing across FAL, WaveSpeed, Replicate, PiAPI, and EvoLink. Unifically's $0.08/s rate is materially cheaper than any of the third-party providers we surveyed for SeeDance 2.0.
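The per-clip figures in the table follow directly from the per-second rate. A minimal sketch of that arithmetic, with the duration bounds from this guide (the function name is ours, not part of the API):

```javascript
// Estimate the Unifically cost of a SeeDance 2.0 clip.
// Rate defaults to the $0.08/s list price from the table above.
function estimateCostUsd(durationSeconds, perSecondRate = 0.08) {
  if (durationSeconds < 4 || durationSeconds > 15) {
    throw new RangeError('SeeDance 2.0 durations are 4-15 seconds');
  }
  // Round to cents to avoid floating-point artifacts like 0.8000000000000000444.
  return +(durationSeconds * perSecondRate).toFixed(2);
}
```

So a 5-second draft costs $0.40 and the 15-second maximum costs $1.20, matching the table.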
How to call SeeDance 2.0 on Unifically
The API is async: POST a generation, then poll the task endpoint until the MP4 is ready.
Text-to-video, multi-shot, with omni-reference
```javascript
const API = 'https://api.unifically.com';
const headers = {
  Authorization: `Bearer ${process.env.UNIFICALLY_API_KEY}`,
  'Content-Type': 'application/json',
};

// Kick off the generation: multi-shot prompt, two image refs, one audio ref.
const start = await fetch(`${API}/v1/tasks`, {
  method: 'POST',
  headers,
  body: JSON.stringify({
    model: 'bytedance/seedance-2.0-pro',
    input: {
      prompt:
        'Shot 1: the chef in @Image1 plates the dish from @Image2 in a sunlit kitchen. Shot 2: she walks the plate to the dining room. Shot 3: she serves a guest, who smiles. Soundtrack matches the mood of @Audio1.',
      aspect_ratio: '16:9',
      duration: 15,
      images: ['https://example.com/chef.jpg', 'https://example.com/dish.jpg'],
      audio: ['https://example.com/jazz-mood.mp3'],
    },
  }),
}).then((r) => r.json());

// Poll until the MP4 is ready.
while (true) {
  await new Promise((r) => setTimeout(r, 3000));
  const task = await fetch(`${API}/v1/tasks/${start.data.task_id}`, { headers }).then((r) => r.json());
  if (task.data.status === 'completed') {
    console.log(task.data.output.video_url);
    break;
  }
  if (task.data.status === 'failed') throw new Error(task.data.error?.message);
}
```
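The inline polling loop above can be factored into a reusable helper. This is a sketch, not part of the Unifically SDK: the endpoint shape (`/v1/tasks/:id`, `data.status`, `data.output`, `data.error`) mirrors the example, and the timeout guard is our addition.

```javascript
// Poll a Unifically task until it completes, fails, or times out.
// `api` and `headers` are the same values used to create the task.
async function pollTask(api, headers, taskId, { intervalMs = 3000, timeoutMs = 600000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    await new Promise((r) => setTimeout(r, intervalMs));
    const task = await fetch(`${api}/v1/tasks/${taskId}`, { headers }).then((r) => r.json());
    if (task.data.status === 'completed') return task.data.output;
    if (task.data.status === 'failed') {
      throw new Error(task.data.error?.message ?? 'generation failed');
    }
  }
  throw new Error(`task ${taskId} timed out after ${timeoutMs}ms`);
}
```

A 15-second generation can take a few minutes, so a generous timeout with a 3-second interval keeps request volume low without adding noticeable latency.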
First-and-last-frame mode
```javascript
const start = await fetch(`${API}/v1/tasks`, {
  method: 'POST',
  headers,
  body: JSON.stringify({
    model: 'bytedance/seedance-2.0-pro',
    input: {
      prompt: 'A slow rotation of the product, soft rim lighting, marble plinth',
      aspect_ratio: '1:1',
      duration: 6,
      start_frame_url: 'https://example.com/product-front.jpg',
      end_frame_url: 'https://example.com/product-three-quarter.jpg',
    },
  }),
}).then((r) => r.json());
```
Switching to Fast
Same payload — swap the model id to `bytedance/seedance-2.0-fast`.
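Since only the model id changes, the tier can be a one-line transform over an existing request body. A small sketch (the helper name is ours; the payload shape follows the examples above):

```javascript
// Return a copy of a SeeDance 2.0 request pointed at a different tier
// ('pro' or 'fast'). The input block is untouched — both tiers accept it.
function withTier(request, tier) {
  return { ...request, model: `bytedance/seedance-2.0-${tier}` };
}

const proRequest = {
  model: 'bytedance/seedance-2.0-pro',
  input: { prompt: 'Draft pass of the kitchen scene', duration: 8, aspect_ratio: '16:9' },
};
const fastRequest = withTier(proRequest, 'fast');
```

This makes draft-then-finalize loops trivial: run the draft on Fast, then re-send the identical payload through `withTier(request, 'pro')` for the hero render.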
Working with omni-reference
The omni-reference surface is the differentiator, so it's worth calling out explicitly.
- Address references by placeholder. In the prompt, write `@Image1`, `@Image2`, `@Video1`, `@Audio1` to point a sentence at a specific reference asset. The model ties the clause to that file slot.
- Order matters. `images[0]` becomes `@Image1`, `images[1]` becomes `@Image2`, and so on; the same applies to `videos[]` and `audio[]`. Re-order the array if you want a different placeholder mapping.
- Don't max out the slots by default. Three to five well-chosen images plus one source clip almost always beats nine mixed images. References compete for influence; overloading dilutes the strongest one.
- Audio references shape mood, not literal output. Pass an audio clip and SeeDance 2.0 will try to match its mood, tempo, and vibe — not splice the clip into the output. Use it the way you'd use a "vibe reference" in a brief.
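Because the placeholder mapping is purely positional, it's easy to validate before sending a request. A sketch of that check, assuming the `images[0]` → `@Image1` convention described above (both helper names are ours):

```javascript
// Build the placeholder -> URL mapping implied by the reference arrays.
function placeholderMap({ images = [], videos = [], audio = [] }) {
  const map = {};
  images.forEach((url, i) => (map[`@Image${i + 1}`] = url));
  videos.forEach((url, i) => (map[`@Video${i + 1}`] = url));
  audio.forEach((url, i) => (map[`@Audio${i + 1}`] = url));
  return map;
}

// Return every placeholder used in the prompt that has no backing asset —
// a cheap pre-flight check that catches off-by-one slot mistakes.
function missingPlaceholders(prompt, refs) {
  const map = placeholderMap(refs);
  const used = prompt.match(/@(?:Image|Video|Audio)\d+/g) ?? [];
  return [...new Set(used)].filter((p) => !(p in map));
}
```

Running this before the POST catches the common failure mode of editing a prompt without re-ordering the arrays to match.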
Common mistakes and gotchas
- Asking for 4K. Not supported on Unifically. SeeDance 2.0 Pro and Fast both cap at 720p output. For 4K, look at Veo 3.1 Quality + Upscale 4K or Kling 3.0 Ultra.
- Treating Pro and Fast as different feature surfaces. They expose identical parameters. Pro is the higher-fidelity model; Fast is throughput-optimised. Pick by latency, not by feature.
- Stuffing all 15 omni-reference slots. Three to five well-chosen references usually beat fifteen mixed ones. Send the minimum that anchors the look.
- Forgetting `audio: false` for silent runs. Audio is generated by default. If your downstream pipeline replaces the audio anyway, pass `audio: false` to keep generation focused.
- Confusing duration with shot count. Duration is the total clip length (4–15 seconds). Multi-shot narrative is a prompt-structure decision (`Shot 1: ... Shot 2: ...`). You can have 15 seconds of single-shot content or 15 seconds split across three or four shots.
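Putting the silent-run gotcha concretely, here is what a draft payload with audio disabled might look like, following the request shape from the examples earlier in this guide (the prompt and variable name are illustrative):

```javascript
// A silent draft run on the Fast tier: audio: false skips the synchronized
// audio output, useful when a downstream pipeline supplies its own track.
const silentDraft = {
  model: 'bytedance/seedance-2.0-fast',
  input: {
    prompt: 'Handheld walkthrough of a concept store, natural light',
    duration: 8,
    aspect_ratio: '9:16',
    audio: false, // returns a silent MP4
  },
};
```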
Frequently asked questions
What is SeeDance 2.0?
SeeDance 2.0 is ByteDance's February 2026 flagship video model. It accepts a text prompt plus up to nine reference images, three reference video clips, and three reference audio clips, and returns a 4–15 second 720p MP4 with synchronized audio. Multi-shot narrative output with character consistency is supported in a single call.
How much does SeeDance 2.0 cost?
On Unifically, both SeeDance 2.0 Pro and Fast list at $0.08 per second of generated video. That is $0.40 for a 5-second clip, $0.80 for 10 seconds, and $1.20 for the 15-second maximum. Audio output is included.
What is omni-reference?
Omni-reference is SeeDance 2.0's multimodal input surface: up to nine images, three video clips, and three audio clips per generation. Each asset is addressable in the prompt via placeholders (@Image1, @Video1, @Audio1) so each clause maps to a specific file slot.
What is the difference between SeeDance 2.0 Pro and Fast?
Both expose the same modes, the same omni-reference surface, the same 4–15 second duration range, and the same 720p output. Pro is the higher-quality model; Fast is throughput-optimised for drafting and high-volume runs. Same per-second list price on Unifically.
Does SeeDance 2.0 generate audio?
Yes. Audio is generated jointly with the video in the same pass, with millisecond lip-sync precision across multiple languages. Pass `audio: false` to disable audio output and return a silent MP4.
Related reading
- SeeDance 2.0 model page — live Pro and Fast playgrounds
- Veo 3.1 vs SeeDance 2.0 — Western flagship comparison
- SeeDance 2.0 vs Kling 3.0 — Chinese flagship comparison
- Kling 3.0 and MiniMax Hailuo — alternative video APIs
- Suno — pair SeeDance 2.0 video with full-track music generation