SeeDance 2.0 API: Specs, Pricing, and How to Use It
SeeDance 2.0 is ByteDance's flagship video model. Up to 15-second clips, multimodal omni-reference (9 images, 3 videos, 3 audio), native audio. Full API guide.
SeeDance 2.0 is ByteDance's February 2026 video model and the version that brings multimodal omni-reference to the SeeDance line. One generate call accepts up to nine reference images, three reference video clips, and three reference audio clips — addressable by name (@Image1, @Video1, @Audio1) anywhere in the prompt. The model produces synchronized audio in the same pass, multi-shot narrative output with character consistency, and clip lengths up to 15 seconds. This guide covers what SeeDance 2.0 actually does, how Pro and Fast differ on Unifically, what it costs, and how to call it.
TL;DR: SeeDance 2.0 ships in Pro and Fast tiers on Unifically at $0.08 per second. Output is 720p, 4–15 seconds, in 16:9 / 9:16 / 1:1 / 4:3. Up to 9 images + 3 video clips + 3 audio clips per call as omni-reference, addressable via `@Image1`/`@Video1`/`@Audio1` placeholders. Multi-shot narrative with character consistency in one generate. Native synchronized audio with millisecond lip-sync across multiple languages.
What is SeeDance 2.0?
SeeDance 2.0 is ByteDance's flagship video model, released in early February 2026 with the public API beta opening on April 14, 2026. It accepts a text prompt plus an optional multimodal reference set, and returns an MP4 of 4 to 15 seconds at 720p with synchronized audio.
Three things make it different from SeeDance 1.5 Pro:
- Multimodal omni-reference — up to nine reference images, three video clips, and three audio clips per call, addressable in the prompt with placeholders.
- Native audio-video generation — audio is generated jointly with the visuals (not as a post-process), with millisecond lip-sync precision across multiple languages.
- Multi-shot storytelling — multiple connected shots in a single generate call, with the same character recognisable across them.
ByteDance also reports a +31.7 point physics benchmark improvement over SeeDance 1.5 Pro, mostly on water, cloth, and gravity simulation.
Why SeeDance 2.0 matters in 2026
The interesting unlock isn't any single feature — it's that the prompt surface stops being a bottleneck. Pre-SeeDance 2.0, multi-asset creative briefs ("a video that combines the look of this still, the motion of this clip, and the audio mood of this track") had to be broken into multiple generations and stitched together. SeeDance 2.0 takes the whole brief in one call.
The other shift is multi-shot output with character consistency. Earlier video models would happily render a multi-shot prompt as four separate-looking generations stitched together. SeeDance 2.0 holds the character — face, body, clothing — across shots in the same generate, which closes the biggest gap between AI video and proper short-form narrative work.
How SeeDance 2.0 works
Two tiers on Unifically
- SeeDance 2.0 Pro (default, `?model=pro`) — higher quality, the production pick.
- SeeDance 2.0 Fast (`?model=fast`) — same controls, throughput-optimised for drafting and high-volume runs.
Both expose the same modes, the same omni-reference surface, the same duration range (4–15 seconds), and the same 720p output ceiling on Unifically.
Modes
- Text-to-video — a prompt is enough.
- First and last frame — supply start and end images; SeeDance 2.0 interpolates the motion.
- Omni-reference — up to nine images, three video clips, and three audio clips, addressable via `@Image1`, `@Video1`, `@Audio1` placeholders in the prompt.
Inputs
- `prompt` — required.
- `images` — up to 9 reference images.
- `videos` — up to 3 reference video clips.
- `audio` — up to 3 reference audio clips.
- `duration` — 4–15 seconds.
- `aspect_ratio` — 16:9, 9:16, 1:1, 4:3.
- `audio: false` — disable audio output if you want a silent MP4.
- `seed` — optional, for reproducibility.
Output
- 720p MP4 at 24 FPS.
- Native synchronized audio (unless disabled).
- Returned via task-based async polling.
SeeDance 2.0 Pro vs Fast
Both tiers expose the same parameters; pick by what the workload needs.
| Tier | Best for | Per-second price |
|---|---|---|
| SeeDance 2.0 Pro | Hero shots, paid placements, omni-reference work that needs the higher quality model | $0.08 |
| SeeDance 2.0 Fast | Draft loops, high-volume social content, throughput-bound runs | $0.08 |
Same per-second list price, different model behaviour: Pro is tuned for fidelity, Fast for throughput. Switch between them via the playground tab or the `?model=` URL parameter — the request body is identical.
SeeDance 2.0 pricing and how it compares
| Source | Variant | Per-second price | 5s clip | 10s clip | 15s clip |
|---|---|---|---|---|---|
| Unifically | SeeDance 2.0 Pro | $0.08 | $0.40 | $0.80 | $1.20 |
| Unifically | SeeDance 2.0 Fast | $0.08 | $0.40 | $0.80 | $1.20 |
| FAL.ai | SeeDance 2.0 Standard 720p | ~$0.30 | ~$1.50 | ~$3.00 | ~$4.50 |
| WaveSpeed | SeeDance 2.0 Standard 720p | ~$0.24 | ~$1.20 | ~$2.40 | ~$3.60 |
| WaveSpeed | SeeDance 2.0 Fast 720p | ~$0.20 | ~$1.00 | ~$2.00 | ~$3.00 |
| Replicate | SeeDance 2.0 | per-second rate | varies | varies | varies |
Source for third-party rates: published per-second pricing across FAL, WaveSpeed, Replicate, PiAPI, and EvoLink. Unifically's $0.08/s rate is materially cheaper than any of the third-party providers we surveyed for SeeDance 2.0.
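The per-clip figures in the table follow directly from the per-second rate. A minimal sketch of that arithmetic, with the duration bounds from this guide (the function name is ours, not part of the API):

```javascript
// Estimate the Unifically cost of a SeeDance 2.0 clip.
// Rate defaults to the $0.08/s list price from the table above.
function estimateCostUsd(durationSeconds, perSecondRate = 0.08) {
  if (durationSeconds < 4 || durationSeconds > 15) {
    throw new RangeError('SeeDance 2.0 durations are 4-15 seconds');
  }
  // Round to cents to avoid floating-point artifacts like 0.8000000000000000444.
  return +(durationSeconds * perSecondRate).toFixed(2);
}
```

So a 5-second draft costs $0.40 and the 15-second maximum costs $1.20, matching the table.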
How to call SeeDance 2.0 on Unifically
The API is async: POST a generation, then poll the task endpoint until the MP4 is ready.
Text-to-video, multi-shot, with omni-reference
```javascript
const API = 'https://api.unifically.com';
const headers = {
  Authorization: `Bearer ${process.env.UNIFICALLY_API_KEY}`,
  'Content-Type': 'application/json',
};

// Kick off the generation: multi-shot prompt, two image refs, one audio ref.
const start = await fetch(`${API}/v1/tasks`, {
  method: 'POST',
  headers,
  body: JSON.stringify({
    model: 'bytedance/seedance-2.0-pro',
    input: {
      prompt:
        'Shot 1: the chef in @Image1 plates the dish from @Image2 in a sunlit kitchen. Shot 2: she walks the plate to the dining room. Shot 3: she serves a guest, who smiles. Soundtrack matches the mood of @Audio1.',
      aspect_ratio: '16:9',
      duration: 15,
      images: ['https://example.com/chef.jpg', 'https://example.com/dish.jpg'],
      audio: ['https://example.com/jazz-mood.mp3'],
    },
  }),
}).then((r) => r.json());

// Poll until the MP4 is ready.
while (true) {
  await new Promise((r) => setTimeout(r, 3000));
  const task = await fetch(`${API}/v1/tasks/${start.data.task_id}`, { headers }).then((r) => r.json());
  if (task.data.status === 'completed') {
    console.log(task.data.output.video_url);
    break;
  }
  if (task.data.status === 'failed') throw new Error(task.data.error?.message);
}
```
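The inline polling loop above can be factored into a reusable helper. This is a sketch, not part of the Unifically SDK: the endpoint shape (`/v1/tasks/:id`, `data.status`, `data.output`, `data.error`) mirrors the example, and the timeout guard is our addition.

```javascript
// Poll a Unifically task until it completes, fails, or times out.
// `api` and `headers` are the same values used to create the task.
async function pollTask(api, headers, taskId, { intervalMs = 3000, timeoutMs = 600000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    await new Promise((r) => setTimeout(r, intervalMs));
    const task = await fetch(`${api}/v1/tasks/${taskId}`, { headers }).then((r) => r.json());
    if (task.data.status === 'completed') return task.data.output;
    if (task.data.status === 'failed') {
      throw new Error(task.data.error?.message ?? 'generation failed');
    }
  }
  throw new Error(`task ${taskId} timed out after ${timeoutMs}ms`);
}
```

A 15-second generation can take a few minutes, so a generous timeout with a 3-second interval keeps request volume low without adding noticeable latency.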
First-and-last-frame mode
```javascript
const start = await fetch(`${API}/v1/tasks`, {
  method: 'POST',
  headers,
  body: JSON.stringify({
    model: 'bytedance/seedance-2.0-pro',
    input: {
      prompt: 'A slow rotation of the product, soft rim lighting, marble plinth',
      aspect_ratio: '1:1',
      duration: 6,
      start_frame_url: 'https://example.com/product-front.jpg',
      end_frame_url: 'https://example.com/product-three-quarter.jpg',
    },
  }),
}).then((r) => r.json());
```
Switching to Fast
Same payload — swap the model id to `bytedance/seedance-2.0-fast`.
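Since only the model id changes, the tier can be a one-line transform over an existing request body. A small sketch (the helper name is ours; the payload shape follows the examples above):

```javascript
// Return a copy of a SeeDance 2.0 request pointed at a different tier
// ('pro' or 'fast'). The input block is untouched — both tiers accept it.
function withTier(request, tier) {
  return { ...request, model: `bytedance/seedance-2.0-${tier}` };
}

const proRequest = {
  model: 'bytedance/seedance-2.0-pro',
  input: { prompt: 'Draft pass of the kitchen scene', duration: 8, aspect_ratio: '16:9' },
};
const fastRequest = withTier(proRequest, 'fast');
```

This makes draft-then-finalize loops trivial: run the draft on Fast, then re-send the identical payload through `withTier(request, 'pro')` for the hero render.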
Working with omni-reference
The omni-reference surface is the differentiator, so it's worth calling out explicitly.
- Address references by placeholder. In the prompt, write `@Image1`, `@Image2`, `@Video1`, `@Audio1` to point a sentence at a specific reference asset. The model ties the clause to that file slot.
- Order matters. `images[0]` becomes `@Image1`, `images[1]` becomes `@Image2`, and so on; the same applies to `videos[]` and `audio[]`. Re-order the array if you want a different placeholder mapping.
- Don't max out the slots by default. Three to five well-chosen images plus one source clip almost always beats nine mixed images. References compete for influence; overloading dilutes the strongest one.
- Audio references shape mood, not literal output. Pass an audio clip and SeeDance 2.0 will try to match its mood, tempo, and vibe — not splice the clip into the output. Use it the way you'd use a "vibe reference" in a brief.
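Because the placeholder mapping is purely positional, it's easy to validate before sending a request. A sketch of that check, assuming the `images[0]` → `@Image1` convention described above (both helper names are ours):

```javascript
// Build the placeholder -> URL mapping implied by the reference arrays.
function placeholderMap({ images = [], videos = [], audio = [] }) {
  const map = {};
  images.forEach((url, i) => (map[`@Image${i + 1}`] = url));
  videos.forEach((url, i) => (map[`@Video${i + 1}`] = url));
  audio.forEach((url, i) => (map[`@Audio${i + 1}`] = url));
  return map;
}

// Return every placeholder used in the prompt that has no backing asset —
// a cheap pre-flight check that catches off-by-one slot mistakes.
function missingPlaceholders(prompt, refs) {
  const map = placeholderMap(refs);
  const used = prompt.match(/@(?:Image|Video|Audio)\d+/g) ?? [];
  return [...new Set(used)].filter((p) => !(p in map));
}
```

Running this before the POST catches the common failure mode of editing a prompt without re-ordering the arrays to match.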
Common mistakes and gotchas
- Asking for 4K. Not supported on Unifically. SeeDance 2.0 Pro and Fast both cap at 720p output. For 4K, look at Veo 3.1 Quality + Upscale 4K or Kling 3.0 Ultra.
- Treating Pro and Fast as different feature surfaces. They expose identical parameters. Pro is the higher-fidelity model; Fast is throughput-optimised. Pick by latency, not by feature.
- Stuffing all 15 omni-reference slots. Three to five well-chosen references usually beat fifteen mixed ones. Send the minimum that anchors the look.
- Forgetting `audio: false` for silent runs. Audio is generated by default. If your downstream pipeline replaces the audio anyway, pass `audio: false` to keep generation focused.
- Confusing duration with shot count. Duration is the total clip length (4–15 seconds). Multi-shot narrative is a prompt-structure decision (`Shot 1: ... Shot 2: ...`). You can have 15 seconds of single-shot content or 15 seconds split across three or four shots.
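Putting the silent-run gotcha concretely, here is what a draft payload with audio disabled might look like, following the request shape from the examples earlier in this guide (the prompt and variable name are illustrative):

```javascript
// A silent draft run on the Fast tier: audio: false skips the synchronized
// audio output, useful when a downstream pipeline supplies its own track.
const silentDraft = {
  model: 'bytedance/seedance-2.0-fast',
  input: {
    prompt: 'Handheld walkthrough of a concept store, natural light',
    duration: 8,
    aspect_ratio: '9:16',
    audio: false, // returns a silent MP4
  },
};
```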
Frequently asked questions
What is SeeDance 2.0?
SeeDance 2.0 is ByteDance's February 2026 flagship video model. It accepts a text prompt plus up to nine reference images, three reference video clips, and three reference audio clips, and returns a 4–15 second 720p MP4 with synchronized audio. Multi-shot narrative output with character consistency is supported in a single call.
How much does SeeDance 2.0 cost?
On Unifically, both SeeDance 2.0 Pro and Fast list at $0.08 per second of generated video. That is $0.40 for a 5-second clip, $0.80 for 10 seconds, and $1.20 for the 15-second maximum. Audio output is included.
What is omni-reference?
Omni-reference is SeeDance 2.0's multimodal input surface: up to nine images, three video clips, and three audio clips per generation. Each asset is addressable in the prompt via placeholders (@Image1, @Video1, @Audio1) so each clause maps to a specific file slot.
What is the difference between SeeDance 2.0 Pro and Fast?
Both expose the same modes, the same omni-reference surface, the same 4–15 second duration range, and the same 720p output. Pro is the higher-quality model; Fast is throughput-optimised for drafting and high-volume runs. Same per-second list price on Unifically.
Does SeeDance 2.0 generate audio?
Yes. Audio is generated jointly with the video in the same pass, with millisecond lip-sync precision across multiple languages. Pass `audio: false` to disable audio output and return a silent MP4.
Related reading
- SeeDance 2.0 model page — live Pro and Fast playgrounds
- Veo 3.1 vs SeeDance 2.0 — Western flagship comparison
- SeeDance 2.0 vs Kling 3.0 — Chinese flagship comparison
- Kling 3.0 and MiniMax Hailuo — alternative video APIs
- Suno — pair SeeDance 2.0 video with full-track music generation