SeeDance 2.0 API: Specs, Pricing, and How to Use It
SeeDance 2.0 is ByteDance's flagship video model. Up to 15-second clips, multimodal omni-reference (9 images, 3 videos, 3 audio), native audio. Full API guide.
SeeDance 2.0 is ByteDance's February 2026 video model. It brings multimodal omni-reference to the SeeDance line. One generate call accepts up to nine reference images, three reference video clips, and three reference audio clips, addressable by name (@Image1, @Video1, @Audio1) anywhere in the prompt. The model produces synchronized audio in the same pass, multi-shot narrative output with character consistency, and clips up to 15 seconds. This guide covers what SeeDance 2.0 does, how Pro and Fast differ on Unifically, what it costs, and how to call it.
TL;DR: SeeDance 2.0 comes in Pro and Fast variants on Unifically. Fast is $0.11 per second; Pro starts from $0.13 per second (Pro has two sub-variants). Output is 720p or 1080p, 4 to 15 seconds, in 16:9 / 9:16 / 1:1 / 4:3. Up to 9 images + 3 video clips + 3 audio clips per call as omni-reference, addressable via @Image1/@Video1/@Audio1 placeholders. Multi-shot narrative with character consistency in one generate call. Native synchronized audio with millisecond lip-sync across multiple languages. Live rates: see the pricing page.
What is SeeDance 2.0?
SeeDance 2.0 is ByteDance's flagship video model, released in early February 2026. The public API beta opened on April 14, 2026. It accepts a text prompt plus an optional multimodal reference set, and returns a 4 to 15 second MP4 at 720p or 1080p with synchronized audio.
Three things make it different from SeeDance 1.5 Pro:
- Multimodal omni-reference. Up to nine reference images, three video clips, and three audio clips per call, addressable in the prompt with placeholders.
- Native audio-video generation. Audio is generated jointly with the video (not as a post-process), with millisecond lip-sync across multiple languages.
- Multi-shot storytelling. Multiple connected shots in a single generate call, with the same character recognisable across them.
ByteDance also reports a 31.7-point physics benchmark improvement over SeeDance 1.5 Pro, mostly on water, cloth, and gravity simulation.
What's new in SeeDance 2.0
The biggest shift is that the prompt surface stops being a bottleneck. Before SeeDance 2.0, multi-asset creative briefs ("a video that combines the look of this still, the motion of this clip, and the audio mood of this track") had to be split into multiple generations and stitched together. SeeDance 2.0 takes the whole brief in one call.
The other shift is multi-shot output with character consistency. Earlier video models would render a multi-shot prompt as four separate-looking generations stitched together. SeeDance 2.0 holds the character (face, body, clothing) across shots in the same generate, which closes the biggest gap between AI video and proper short-form narrative work.
What you can do with SeeDance 2.0
Two variants on Unifically
- SeeDance 2.0 Pro (default): higher-quality model. Comes in two sub-variants for different output settings.
- SeeDance 2.0 Fast: same controls, tuned for speed and high-volume runs.
Both expose the same modes, the same omni-reference surface, the same duration range (4 to 15 seconds), and the same aspect-ratio set on Unifically.
Modes
- Text-to-video. A prompt is enough.
- First and last frame. Supply start and end images; SeeDance 2.0 fills in the motion.
- Omni-reference. Up to nine images, three video clips, and three audio clips, addressable via @Image1, @Video1, @Audio1 placeholders in the prompt.
Inputs
- prompt. Required.
- images. Up to 9 reference images.
- videos. Up to 3 reference video clips.
- audio. Up to 3 reference audio clips.
- start_frame_url, end_frame_url. Start and end images for first-and-last-frame mode.
- duration. 4 to 15 seconds.
- aspect_ratio. 16:9, 9:16, 1:1, 4:3.
- audio: false. Disable audio output for a silent MP4.
- seed. Optional, for reproducibility.
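Before spending credits on a generation, it can help to check a payload against the limits listed above. The sketch below is a hypothetical client-side helper. validateSeedanceInput is not part of any Unifically SDK. It assumes the field names from the Inputs list and that duration is given in whole seconds. The server remains the final authority.

```javascript
// Hypothetical pre-flight check against the limits listed in this guide.
// Field names are assumed from the Inputs list; confirm against the API reference.
const LIMITS = { images: 9, videos: 3, audio: 3 };
const ASPECT_RATIOS = new Set(['16:9', '9:16', '1:1', '4:3']);

function validateSeedanceInput(input) {
  const errors = [];
  if (typeof input.prompt !== 'string' || input.prompt.trim() === '') {
    errors.push('prompt is required');
  }
  for (const [field, max] of Object.entries(LIMITS)) {
    const value = input[field];
    // Only arrays count as reference lists; audio: false (silent output) is skipped here.
    if (Array.isArray(value) && value.length > max) {
      errors.push(`${field}: at most ${max} items, got ${value.length}`);
    }
  }
  if (input.duration !== undefined) {
    const d = input.duration;
    // Assumption: duration is a whole number of seconds.
    if (!Number.isInteger(d) || d < 4 || d > 15) {
      errors.push('duration must be an integer from 4 to 15 seconds');
    }
  }
  if (input.aspect_ratio !== undefined && !ASPECT_RATIOS.has(input.aspect_ratio)) {
    errors.push(`aspect_ratio must be one of ${[...ASPECT_RATIOS].join(', ')}`);
  }
  return errors; // empty array means the payload passed these local checks
}
```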
Output
- 720p or 1080p MP4 at 24 FPS.
- Native synchronized audio (unless disabled).
- Returned via task-based async polling.
SeeDance 2.0 Pro vs Fast
Both variants expose the same parameters. Pick by what the workload needs.
| Variant | Best for | Per-second price |
|---|---|---|
| SeeDance 2.0 Pro | Hero shots, paid placements, omni-reference work that needs the higher-quality model | from $0.13 |
| SeeDance 2.0 Fast | Draft loops, high-volume social content, speed-bound runs | $0.11 |
Pro is tuned for quality, Fast is tuned for speed. Switch between them in the playground tabs or by calling the matching model id (bytedance/seedance-2.0-pro or bytedance/seedance-2.0-fast). The request body is identical. Live prices: pricing page.
SeeDance 2.0 pricing and how it compares
Listed prices were accurate at the time of writing. Always check the pricing page for current rates, since they can change.
| Source | Variant | Per-second price | 5s clip | 10s clip | 15s clip |
|---|---|---|---|---|---|
| Unifically | SeeDance 2.0 Fast | $0.11 | $0.55 | $1.10 | $1.65 |
| Unifically | SeeDance 2.0 Pro | from $0.13 | from $0.65 | from $1.30 | from $1.95 |
| FAL.ai | SeeDance 2.0 Standard 720p | ~$0.30 | ~$1.50 | ~$3.00 | ~$4.50 |
| WaveSpeed | SeeDance 2.0 Standard 720p | ~$0.24 | ~$1.20 | ~$2.40 | ~$3.60 |
| WaveSpeed | SeeDance 2.0 Fast 720p | ~$0.20 | ~$1.00 | ~$2.00 | ~$3.00 |
| Replicate | SeeDance 2.0 | per-second rate | varies | varies | varies |
Source for third-party rates: published per-second pricing across FAL, WaveSpeed, Replicate, PiAPI, and EvoLink. Unifically's rates for SeeDance 2.0 are materially lower than those of any third-party provider we surveyed.
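To budget a batch, multiply the per-second rate by clip length and clip count. This sketch hard-codes the Unifically rates from the table above as a snapshot in integer cents, which avoids floating-point drift. It uses Pro's starting rate, so Pro estimates are a floor rather than an exact figure. The helper name and the rate map are illustrative assumptions, not a pricing API.

```javascript
// Snapshot of the listed Unifically rates, in cents per generated second.
// Pro uses its "from" rate; actual Pro cost depends on the sub-variant.
const RATE_CENTS_PER_SECOND = {
  'bytedance/seedance-2.0-fast': 11,
  'bytedance/seedance-2.0-pro': 13, // starting rate only
};

// Hypothetical helper: estimated spend in USD for `clips` clips of `durationSeconds` each.
function estimateCostUSD(model, durationSeconds, clips = 1) {
  const rate = RATE_CENTS_PER_SECOND[model];
  if (rate === undefined) throw new Error(`no rate for ${model}`);
  // Multiply in whole cents, convert to dollars once at the end.
  return (rate * durationSeconds * clips) / 100;
}
```

For example, estimateCostUSD('bytedance/seedance-2.0-fast', 5, 20) returns 11. That is $11.00 for twenty five-second Fast drafts at the listed rate.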
How to call SeeDance 2.0 on Unifically
The API is async: POST a generation, then poll the task endpoint until the MP4 is ready.
Text-to-video, multi-shot, with omni-reference
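```javascript
// Runs as an ES module (top-level await), e.g. a .mjs file on Node 18+.
const API = 'https://api.unifically.com';
const headers = {
  Authorization: `Bearer ${process.env.UNIFICALLY_API_KEY}`,
  'Content-Type': 'application/json',
};

// 1. Submit the generation. images[0] is @Image1, images[1] is @Image2, audio[0] is @Audio1.
const res = await fetch(`${API}/v1/tasks`, {
  method: 'POST',
  headers,
  body: JSON.stringify({
    model: 'bytedance/seedance-2.0-pro',
    input: {
      prompt:
        'Shot 1: the chef in @Image1 plates the dish from @Image2 in a sunlit kitchen. Shot 2: she walks the plate to the dining room. Shot 3: she serves a guest, who smiles. Soundtrack matches the mood of @Audio1.',
      aspect_ratio: '16:9',
      duration: 15,
      images: ['https://example.com/chef.jpg', 'https://example.com/dish.jpg'],
      audio: ['https://example.com/jazz-mood.mp3'],
    },
  }),
});
if (!res.ok) throw new Error(`Submit failed: HTTP ${res.status}`);
const start = await res.json();

// 2. Poll every 3 seconds until the task completes or fails, with an upper bound
//    (MAX_POLLS is an illustrative cap, not a documented limit).
const MAX_POLLS = 200;
let videoUrl;
for (let i = 0; i < MAX_POLLS && !videoUrl; i++) {
  await new Promise((r) => setTimeout(r, 3000));
  const task = await fetch(`${API}/v1/tasks/${start.data.task_id}`, { headers }).then((r) => r.json());
  if (task.data.status === 'completed') videoUrl = task.data.output.video_url;
  else if (task.data.status === 'failed') throw new Error(task.data.error?.message ?? 'Generation failed');
}
if (!videoUrl) throw new Error('Timed out waiting for the task');
console.log(videoUrl);
```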
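The original loop never gives up on a task that stays in progress. A hedged alternative is to move polling into a helper with an attempt cap. In this sketch, pollTask, intervalMs, and maxAttempts are hypothetical names and defaults. getTask is assumed to return the data object from the task endpoint ({ status, output, error }), as in the example above. Any status other than completed or failed is treated as still running.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical polling helper: resolves with task.output on success,
// rejects on failure or when maxAttempts checks pass without a final status.
async function pollTask(getTask, { intervalMs = 3000, maxAttempts = 200 } = {}) {
  for (let attempt = 1; ; attempt++) {
    const task = await getTask();
    if (task.status === 'completed') return task.output;
    if (task.status === 'failed') {
      throw new Error(task.error?.message ?? 'generation failed');
    }
    if (attempt >= maxAttempts) {
      throw new Error(`task not finished after ${maxAttempts} attempts`);
    }
    await sleep(intervalMs); // still running: wait before the next check
  }
}

// Usage with the example above (not executed here):
// const output = await pollTask(() =>
//   fetch(`${API}/v1/tasks/${start.data.task_id}`, { headers })
//     .then((r) => r.json())
//     .then((body) => body.data));
// console.log(output.video_url);
```

Taking getTask as a function keeps the helper independent of fetch, so it can be tested with a stub.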
First-and-last-frame mode
```javascript
// Reuses API and headers from the example above; poll start.data.task_id the same way.
const start = await fetch(`${API}/v1/tasks`, {
  method: 'POST',
  headers,
  body: JSON.stringify({
    model: 'bytedance/seedance-2.0-pro',
    input: {
      prompt: 'A slow rotation of the product, soft rim lighting, marble plinth',
      aspect_ratio: '1:1',
      duration: 6,
      start_frame_url: 'https://example.com/product-front.jpg', // first frame
      end_frame_url: 'https://example.com/product-three-quarter.jpg', // last frame
    },
  }),
}).then((r) => r.json());
```
Switching to Fast
Same payload, swap the model id to bytedance/seedance-2.0-fast.
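The request body is identical across variants. One way to draft on Fast and re-render finals on Pro is to build the body once and swap only the model id. The ids below come from this guide; withVariant is a hypothetical helper.

```javascript
// Model ids as given in this guide.
const MODEL_IDS = {
  pro: 'bytedance/seedance-2.0-pro',
  fast: 'bytedance/seedance-2.0-fast',
};

// Hypothetical helper: returns a copy of `body` targeting the chosen variant.
function withVariant(body, variant) {
  const model = MODEL_IDS[variant];
  if (!model) throw new Error(`unknown variant: ${variant}`);
  // Shallow copy: the original body is not mutated and the input object is shared.
  return { ...body, model };
}
```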
Working with omni-reference
Omni-reference is the differentiator, so it is worth calling out directly.
- Address references by placeholder. In the prompt, write @Image1, @Image2, @Video1, or @Audio1 to point a sentence at a specific reference asset. The model ties that clause to that file slot.
- Order matters. images[0] becomes @Image1, images[1] becomes @Image2, and so on. The same applies to videos[] and audio[]. Reorder the array if you want a different placeholder mapping.
- Don't max out the slots by default. Three to five well-chosen images plus one source clip usually beats nine mixed images. References compete for influence; loading too many dilutes the strongest one.
- Audio references shape mood, not literal output. Pass an audio clip and SeeDance 2.0 tries to match its mood, tempo, and feel, not splice the clip into the output. Use it the way you would use a "vibe reference" in a brief.
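You can check the placeholder ordering rule before submitting. This hypothetical helper maps each array entry to its placeholder (images[0] to @Image1, and so on). It then lists placeholders in the prompt that point at an empty slot, such as @Image3 when only two images are attached. It assumes placeholders are written exactly as @Image, @Video, or @Audio followed by a number.

```javascript
// Array field -> placeholder label, per the ordering rule in this guide.
const SLOTS = [
  ['images', 'Image'],
  ['videos', 'Video'],
  ['audio', 'Audio'],
];

// Hypothetical helper: { '@Image1': url, '@Audio1': url, ... }
function placeholderMap(input) {
  const map = {};
  for (const [field, label] of SLOTS) {
    const list = Array.isArray(input[field]) ? input[field] : [];
    list.forEach((url, i) => {
      map[`@${label}${i + 1}`] = url; // 1-based placeholder index
    });
  }
  return map;
}

// Placeholders used in the prompt that have no matching file, in order of first use.
function unresolvedPlaceholders(input) {
  const map = placeholderMap(input);
  const used = input.prompt.match(/@(Image|Video|Audio)\d+/g) ?? [];
  return [...new Set(used)].filter((tag) => !(tag in map));
}
```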
Things to know
- Asking for 4K. Not supported on Unifically. SeeDance 2.0 Pro and Fast cap at 1080p output. For 4K, use Veo 3.1 Quality + Upscale 4K or Kling 3.0 Ultra.
- Treating Pro and Fast as different feature surfaces. They expose the same parameters. Pro is the higher-quality model; Fast is tuned for speed. Choose based on quality needs and latency budget.
- Stuffing all 15 omni-reference slots. Three to five well-chosen references usually beat fifteen mixed ones. Send the minimum that anchors the look.
- Forgetting audio: false for silent runs. Audio is generated by default. If the downstream pipeline replaces the audio anyway, pass audio: false to skip generating a track you will discard.
- Confusing duration with shot count. Duration is the total clip length (4 to 15 seconds). Multi-shot narrative is a prompt-structure decision (Shot 1: ... Shot 2: ...). You can have 15 seconds of single-shot content or 15 seconds split across three or four shots.
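Shot count lives in the prompt and duration is a separate field. A small helper can assemble the Shot 1: ... Shot 2: ... structure and leave duration to the caller. multiShotPrompt is hypothetical, and its output is plain prompt text. It does not control how the model divides time between shots.

```javascript
// Hypothetical helper: joins shot descriptions into "Shot 1: ... Shot 2: ..." text.
// Each shot should carry its own closing punctuation; an optional suffix
// (e.g. an audio-mood clause) is appended after the last shot.
function multiShotPrompt(shots, suffix = '') {
  if (shots.length === 0) throw new Error('need at least one shot');
  const body = shots.map((s, i) => `Shot ${i + 1}: ${s.trim()}`).join(' ');
  return suffix ? `${body} ${suffix}` : body;
}
```

A 15-second, three-shot request would pair a three-element multiShotPrompt call with duration: 15. The same duration works just as well with a single shot.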
Frequently asked questions
What is SeeDance 2.0?
SeeDance 2.0 is ByteDance's February 2026 flagship video model. It accepts a text prompt plus up to nine reference images, three reference video clips, and three reference audio clips, and returns a 4 to 15 second MP4 at 720p or 1080p with synchronized audio. Multi-shot narrative output with character consistency works in a single call.
How much does SeeDance 2.0 cost?
On Unifically, SeeDance 2.0 Fast lists at $0.11 per second of generated video. SeeDance 2.0 Pro starts from $0.13 per second and has two sub-variants. Audio output is included. Check the pricing page for the current rates.
What is omni-reference?
Omni-reference is SeeDance 2.0's multimodal input surface: up to nine images, three video clips, and three audio clips per generation. Each asset is addressable in the prompt via placeholders (@Image1, @Video1, @Audio1), so each clause maps to a specific file slot.
What is the difference between SeeDance 2.0 Pro and Fast?
Both expose the same modes, the same omni-reference surface, the same 4 to 15 second duration range, and the same aspect ratios. Pro is the higher-quality model and has two sub-variants. Fast is tuned for speed and high-volume runs.
Does SeeDance 2.0 generate audio?
Yes. Audio is generated jointly with the video in the same pass, with millisecond lip-sync across multiple languages. Pass audio: false to disable audio output and return a silent MP4.
Related reading
- SeeDance 2.0 model page: live Pro and Fast playgrounds.
- Veo 3.1 vs SeeDance 2.0: Western flagship comparison.
- SeeDance 2.0 vs Kling 3.0: Chinese flagship comparison.
- Kling 3.0 and MiniMax Hailuo: other video APIs to consider.
- Suno: pair SeeDance 2.0 video with full-track music generation.



