Wan 2.6 API

Text to Video · Image to Video · Reference to Video

Wan AI's latest model, with extended durations of up to 15 s, multi-shot generation, and audio support. Includes a Flash variant.


Documentation

Prompt (required)

Text description of the video. In reference-to-video (R2V) mode, refer to the reference inputs as character1, character2, and so on.

Mode

Generation mode

Resolution

Output resolution

Duration

Video duration 2-15s (R2V: 2-10s)
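The duration limits differ by mode. As a minimal client-side sketch, assuming hypothetical mode identifiers `t2v`, `i2v` and `r2v` (the actual parameter values are not shown on this page), a requested duration could be checked before the request is sent:

```python
# Hypothetical mode identifiers; only the ranges come from the field help text.
DURATION_LIMITS = {
    "t2v": (2, 15),  # text-to-video
    "i2v": (2, 15),  # image-to-video
    "r2v": (2, 10),  # reference-to-video has a shorter cap
}


def check_duration(mode: str, seconds: int) -> int:
    """Return seconds unchanged if it lies within the documented range for mode."""
    try:
        low, high = DURATION_LIMITS[mode]
    except KeyError:
        raise ValueError(f"unknown mode: {mode!r}") from None
    if not low <= seconds <= high:
        raise ValueError(f"{mode} duration must be {low}-{high}s, got {seconds}s")
    return seconds
```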

Custom Audio

Custom audio URL for audio-video sync (wav/mp3, max 15MB, 3-30s)


Upload by clicking or dragging and dropping a file: MP3, WAV, FLAC, or OGG, max 15MB.
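A sketch of a pre-submission check for the custom audio field, following the field help text (WAV or MP3, at most 15MB, 3 to 30 seconds); the upload widget additionally lists FLAC and OGG. Reading 15MB as 15 * 1024 * 1024 bytes and inferring the format from the URL's file extension are both assumptions, and `check_custom_audio` is a hypothetical helper:

```python
import posixpath
from urllib.parse import urlparse

ALLOWED_AUDIO_EXTENSIONS = {".wav", ".mp3"}  # per the field help text
MAX_AUDIO_BYTES = 15 * 1024 * 1024           # assumption: "15MB" read as 15 MiB
MIN_AUDIO_SECONDS, MAX_AUDIO_SECONDS = 3, 30


def check_custom_audio(url: str, size_bytes: int, duration_s: float) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    # Take the extension from the URL path, ignoring any query string.
    ext = posixpath.splitext(urlparse(url).path)[1].lower()
    if ext not in ALLOWED_AUDIO_EXTENSIONS:
        problems.append(f"unsupported extension {ext or '(none)'}")
    if size_bytes > MAX_AUDIO_BYTES:
        problems.append("file larger than 15MB")
    if not MIN_AUDIO_SECONDS <= duration_s <= MAX_AUDIO_SECONDS:
        problems.append("audio must be 3-30s long")
    return problems
```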

Multi-Shot Segments

Multi-shot narrative segments. Auto-calculates total duration.

Supports 2 to 8 shots.
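Because the total duration is derived from the segments, a client can reproduce that calculation. The `Shot` type and its field names below are hypothetical; only the 2 to 8 shot limit comes from this page:

```python
from dataclasses import dataclass


@dataclass
class Shot:
    prompt: str      # what happens in this shot (hypothetical field)
    duration_s: int  # selectable per-segment duration (hypothetical field)


MIN_SHOTS, MAX_SHOTS = 2, 8


def total_duration(shots: list[Shot]) -> int:
    """Validate the shot count and return the summed duration of all segments."""
    if not MIN_SHOTS <= len(shots) <= MAX_SHOTS:
        raise ValueError(f"need {MIN_SHOTS}-{MAX_SHOTS} shots, got {len(shots)}")
    if any(s.duration_s <= 0 for s in shots):
        raise ValueError("every shot needs a positive duration")
    return sum(s.duration_s for s in shots)
```

The returned total presumably still has to fall within the Duration range above, although that constraint is not stated explicitly for multi-shot mode.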

Negative Prompt

What to avoid in the video (max 500 chars, ignored for R2V)

Generate Audio

Auto-generate audio. Set to false for a silent video.

Prompt Extend

Intelligent prompt rewriting

Watermark

Add watermark to video

Seed

Seed for reproducibility (not supported for R2V)
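Some optional parameters depend on the mode: the negative prompt and the seed do not apply to R2V. A minimal sketch that assembles these fields, assuming hypothetical JSON field names (`negative_prompt`, `generate_audio`, `prompt_extend`, `watermark`, `seed`) and illustrative defaults, leaves out whatever R2V would ignore or reject:

```python
from typing import Any, Optional

MAX_NEGATIVE_PROMPT_CHARS = 500


def optional_params(
    mode: str,
    negative_prompt: Optional[str] = None,
    generate_audio: bool = True,  # illustrative default
    prompt_extend: bool = False,  # illustrative default
    watermark: bool = False,      # illustrative default
    seed: Optional[int] = None,
) -> dict[str, Any]:
    """Collect optional fields (hypothetical names), omitting those R2V does not use."""
    params: dict[str, Any] = {
        "generate_audio": generate_audio,  # False yields a silent video
        "prompt_extend": prompt_extend,    # intelligent prompt rewriting
        "watermark": watermark,
    }
    if negative_prompt and mode != "r2v":  # ignored for reference-to-video
        if len(negative_prompt) > MAX_NEGATIVE_PROMPT_CHARS:
            raise ValueError("negative prompt exceeds 500 characters")
        params["negative_prompt"] = negative_prompt
    if seed is not None and mode != "r2v":  # seed not supported for R2V
        params["seed"] = seed
    return params
```

Omitting ignored fields, rather than sending them anyway, keeps the payload limited to settings that can actually affect the output.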


Features

What the Wan 2.6 API offers

Text-to-video, image-to-video, and reference-to-video modes

720p or 1080p output

Durations from about 2 to 15 seconds (reference-to-video is limited to 2 to 10 seconds, per the UI)

Multi-shot mode with 2 to 8 segments and selectable segment durations

Optional generated audio, or an uploaded custom audio file for synchronization

Negative prompts for text-to-video and image-to-video (not applied in reference-to-video)

Optional intelligent prompt rewriting, watermark, and seed (seed not used in reference-to-video)

REST API with JSON responses
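Since the service is described only as a REST API with JSON responses, the following sketch uses the Python standard library to POST a JSON body and decode the JSON reply. The endpoint URL and the Bearer-token header are placeholders, not the documented interface:

```python
import json
import urllib.request

# Placeholder endpoint: the real URL is not given on this page.
API_URL = "https://api.example.com/v1/wan-2.6/generate"


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Serialize payload as JSON into a POST request (not sent here)."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Keeping `build_request` separate from `send` lets the request be inspected or tested without network access.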

Use cases

Built for

Primary

Marketing: promos from a prompt or a single start frame

Social: vertical or widescreen clips at 720p or 1080p

Character-led shorts: reference-to-video with several image or video references

Narrative tests: multi-shot segments for story beats

FAQ

About Wan 2.6 API

Three modes are available: text-to-video, image-to-video from a start image, and reference-to-video, where you supply up to five reference files (images and videos can be mixed, with at most three videos) that are mapped to characters in your prompt.
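Reference-to-video accepts up to five references, at most three of them videos, and the prompt names them character1, character2, and so on. The sketch below checks those limits and the placeholders. Matching references to character numbers by list order, and requiring at least one reference, are assumptions; `Reference` and `map_characters` are hypothetical names:

```python
import re
from typing import NamedTuple


class Reference(NamedTuple):
    kind: str  # "image" or "video"
    url: str


MAX_REFERENCES, MAX_VIDEO_REFERENCES = 5, 3


def map_characters(prompt: str, refs: list[Reference]) -> dict[str, str]:
    """Check reference limits and map characterN placeholders to reference URLs.

    Assumes references are matched to character1, character2, ... in list order.
    """
    if not 1 <= len(refs) <= MAX_REFERENCES:
        raise ValueError(f"supply 1-{MAX_REFERENCES} references, got {len(refs)}")
    if any(r.kind not in ("image", "video") for r in refs):
        raise ValueError("each reference must be an image or a video")
    if sum(r.kind == "video" for r in refs) > MAX_VIDEO_REFERENCES:
        raise ValueError(f"at most {MAX_VIDEO_REFERENCES} video references")
    mapping = {f"character{i}": r.url for i, r in enumerate(refs, start=1)}
    # Every placeholder used in the prompt must point at a supplied reference.
    for name in set(re.findall(r"\bcharacter\d+\b", prompt)):
        if name not in mapping:
            raise ValueError(f"{name} has no matching reference")
    return mapping
```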

By default, the API can generate audio along with the clip. You can turn this off for a silent video, or supply your own audio file for synchronization where supported.

Clips can be up to about 15 seconds long in text-to-video and image-to-video. Reference-to-video is limited to about 2 to 10 seconds, per the duration control's help text.

Negative prompts apply to text-to-video and image-to-video. They are ignored for reference-to-video in the current schema.