Wan 2.7 API

Prompt Extend: LLM-based prompt rewriting. Improves short prompts but adds latency.
Watermark: adds an "AI Generated" watermark in the lower-right corner.
Wan 2.7

What is Wan 2.7?

Wan 2.7 is Alibaba's March 2026 upgrade to the Wan video line. Three modes share one model ID. Text-to-video accepts a prompt and an optional audio_url for driving timing; multi-shot structure is controlled through natural language and timestamps in the prompt. Image-to-video covers first-frame, first-and-last-frame, video continuation (extending an existing clip), and lip-synced motion when an audio track is supplied. Reference-to-video accepts up to 5 reference images and up to 5 reference videos, addressed in the prompt by identifiers such as "Image 1" and "Video 1". Output runs at 720P or 1080P in 2 to 15 second clips. Negative prompts, prompt extend, and watermark are available in all three modes; seed is available in text-to-video and image-to-video but not in reference-to-video.
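To make those output limits concrete, here is a minimal sketch that assembles a text-to-video `input` object and rejects values outside the ranges stated on this page. The `mode`, `prompt`, `resolution`, and `ratio` keys mirror the curl example further down; sending `duration` as a whole number of seconds is an assumption, the defaults are this sketch's own choice, and `build_t2v_input` is a hypothetical helper, not part of any SDK.

```python
ALLOWED_RESOLUTIONS = {"720P", "1080P"}
ALLOWED_RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4"}


def build_t2v_input(prompt, resolution="720P", ratio="16:9", duration=None):
    """Return the "input" object for a text-to-video task (hypothetical helper)."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"resolution must be one of {sorted(ALLOWED_RESOLUTIONS)}")
    if ratio not in ALLOWED_RATIOS:
        raise ValueError(f"ratio must be one of {sorted(ALLOWED_RATIOS)}")

    payload = {
        "mode": "t2v",
        "prompt": prompt,
        "resolution": resolution,
        "ratio": ratio,
    }
    if duration is not None:
        # Clips run 2 to 15 seconds; the unit of this field is assumed to be seconds.
        if not 2 <= duration <= 15:
            raise ValueError("duration must be between 2 and 15 seconds")
        payload["duration"] = duration
    return payload
```

Checking on the client side only saves a round trip; the service remains the authority on what it accepts.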

Key features of Wan 2.7

Five features cover the surface area you'll actually use day to day.

First-and-last-frame in image-to-video

Supply a start frame and an optional end frame. The model interpolates the camera move between them while keeping aspect ratio from the input. Useful for storyboard-driven animation.
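A first-and-last-frame request might be assembled roughly like this. The `first_frame_url` and `last_frame_url` names come from the Limitations section of this page; the `"i2v"` mode value and the presence of a `prompt` field in this mode are assumptions, and `build_frames_input` is a hypothetical helper.

```python
def build_frames_input(prompt, first_frame_url, last_frame_url=None):
    """Build an image-to-video "input" with a start frame and optional end frame."""
    payload = {
        "mode": "i2v",  # assumed mode value, by analogy with "t2v"
        "prompt": prompt,
        "first_frame_url": first_frame_url,
    }
    # The end frame is optional; without it the model animates from the start frame only.
    if last_frame_url is not None:
        payload["last_frame_url"] = last_frame_url
    # No "ratio" key is set: the aspect ratio is taken from the input frame.
    return payload
```

Because `first_frame_url` is a required argument, the helper cannot produce an end frame without a start frame.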

Video continuation

Pass a 2–10 second source clip via `video_url`. The model generates new content up to the requested `duration`. Useful for stretching a hero clip past the per-call cap without restarting the look.
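The arithmetic behind continuation can be sketched as follows, reading `duration` as the length of the final clip (source plus new footage), which is how the FAQ on this page describes it. The requirement that `duration` exceed the source length is an assumption, as are the `"i2v"` mode value and the helper names.

```python
def new_seconds(source_seconds, duration):
    """Return how many seconds of new footage a continuation request asks for."""
    if not 2 <= source_seconds <= 10:
        raise ValueError("source clip must be 2 to 10 seconds long")
    if not 2 <= duration <= 15:
        raise ValueError("duration must be between 2 and 15 seconds")
    if duration <= source_seconds:
        # Assumption: a request that adds nothing is treated as an error here.
        raise ValueError("duration must be longer than the source clip")
    return duration - source_seconds


def build_continuation_input(video_url, duration):
    """Build an image-to-video "input" that extends an existing clip."""
    return {"mode": "i2v", "video_url": video_url, "duration": duration}
```

For a 3-second source and a requested `duration` of 15, `new_seconds(3, 15)` is 12.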

Lip-sync via driving audio

In image-to-video, pass an `audio_url` (wav/mp3, 2–30 seconds). The model times mouth movements and motion to the audio. No separate lip-sync pipeline needed.
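A pre-flight check for the driving audio could look something like the sketch below. It assumes you already know the clip's length and file size, judges the format by file extension only, and reads the 15MB cap listed under Limitations as 15 MiB; all three are simplifications, and `audio_problems` is a hypothetical helper.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

MAX_AUDIO_BYTES = 15 * 1024 * 1024  # assumption: "15MB" interpreted as 15 MiB


def audio_problems(audio_url, seconds, size_bytes):
    """Return a list of reasons the audio would likely be rejected (empty if none)."""
    problems = []
    # Look only at the URL path so query strings do not hide the extension.
    suffix = PurePosixPath(urlparse(audio_url).path).suffix.lower()
    if suffix not in {".wav", ".mp3"}:
        problems.append("audio must be wav or mp3")
    if not 2 <= seconds <= 30:
        problems.append("audio must be 2 to 30 seconds long")
    if size_bytes > MAX_AUDIO_BYTES:
        problems.append("audio must be 15MB or smaller")
    return problems
```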

Reference-to-video up to 5 inputs each

Mix up to 5 reference images and up to 5 reference videos. Reference each one in the prompt with "Image 1" or "Video 1". Useful for ads with multiple recurring characters or props.
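Since the prompt addresses references by number, it is easy to mention one you never supplied. The sketch below scans a prompt for "Image N" and "Video N" and flags numbers outside the supplied counts. It assumes identifiers are 1-based and follow the order in which references are passed; the page does not spell that out, and `unknown_references` is a hypothetical helper.

```python
import re


def unknown_references(prompt, image_count, video_count):
    """Return identifiers used in the prompt that have no matching reference."""
    if not 0 <= image_count <= 5 or not 0 <= video_count <= 5:
        raise ValueError("at most 5 reference images and 5 reference videos")
    unknown = []
    for kind, count in (("Image", image_count), ("Video", video_count)):
        for match in re.finditer(rf"\b{kind} (\d+)\b", prompt):
            number = int(match.group(1))
            if not 1 <= number <= count:
                unknown.append(f"{kind} {number}")
    return unknown
```

For example, a prompt that mentions "Image 3" while only two images are supplied yields `["Image 3"]`.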

Multi-shot from natural language

Control shot structure in the prompt itself, with timestamped beats like "Shot 1 [0-3s] wide shot…" The model interprets and chains shots without a separate `multi_prompt` array.
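One way to keep timestamps consistent is to generate the prompt from a list of shot lengths. The sketch below follows the "Shot 1 [0-3s] …" pattern quoted above; whether the model expects exactly this syntax is not confirmed here, and `multi_shot_prompt` is a hypothetical helper.

```python
def multi_shot_prompt(shots):
    """Join (seconds, description) pairs into one timestamped multi-shot prompt."""
    parts = []
    start = 0
    for index, (seconds, description) in enumerate(shots, start=1):
        end = start + seconds
        parts.append(f"Shot {index} [{start}-{end}s] {description}")
        start = end  # the next shot begins where this one ends
    # The total must fit the 2 to 15 second clip length.
    if not 2 <= start <= 15:
        raise ValueError("shots must add up to between 2 and 15 seconds")
    return " ".join(parts)
```

The resulting string goes in the ordinary `prompt` field.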

Best for

Last-frame interpolation

Supply start and end frames in image-to-video and the model lands on the target.

Video continuation

Stretch a finished clip past the per-call cap by feeding it back in as `video_url`.

Lip-synced character clips

Pass an audio track in image-to-video for mouth movements timed to the recording.

Multi-character reference-to-video

Up to 5 reference images and up to 5 reference videos for scenes with recurring characters and props.

Multi-shot narratives

Control shot boundaries with timestamps in the prompt, no separate multi-shot field.

Brand-restyled video

Pair text-to-video or image-to-video with prompt extend and seeds for repeatable brand-aligned output.

Limitations

`last_frame_url` requires `first_frame_url`. Video continuation (`video_url`) is mutually exclusive with the frame inputs. Audio inputs are capped at 30 seconds and 15MB. Reference videos are capped at 30 seconds and 100MB. Reference-to-video does not accept a seed. Aspect ratio in reference-to-video is ignored when `first_frame_url` is set.
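The structural rules above are simple enough to check before submitting. This is a minimal sketch covering only the three field-combination rules; the `"r2v"` mode value and the `seed` key name are assumptions, and `limitation_errors` is a hypothetical helper.

```python
def limitation_errors(task_input):
    """Return the field-combination rules that a task "input" dict violates."""
    errors = []
    has_first = "first_frame_url" in task_input
    has_last = "last_frame_url" in task_input
    has_video = "video_url" in task_input

    if has_last and not has_first:
        errors.append("last_frame_url requires first_frame_url")
    if has_video and (has_first or has_last):
        errors.append("video_url cannot be combined with frame inputs")
    # "r2v" and "seed" are assumed names for the mode value and the seed field.
    if task_input.get("mode") == "r2v" and "seed" in task_input:
        errors.append("reference-to-video does not accept a seed")
    return errors
```

The size and length caps are left out because they need file metadata rather than the request body.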

API examples

Call Wan 2.7 from any language by POSTing to /v1/tasks. Full parameter docs live at docs.unifically.com/models/video/alibaba/wan-2.7-video.

curl -X POST https://api.unifically.com/v1/tasks \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "alibaba/wan-2.7-video",
    "input": {
      "mode": "t2v",
      "prompt": "A kitten running in the moonlight",
      "resolution": "720P",
      "ratio": "16:9"
    }
  }'

A successful submission returns a task_id. Poll GET /v1/tasks/<task_id>, or set a callback_url on the request to receive the finished result.
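For reference, the same submit-then-poll flow can be sketched in Python with only the standard library. The URL, headers, and body follow the curl example; the shape of the task response is not shown on this page, so the polling loop takes a caller-supplied `is_finished` predicate instead of guessing a status field. Sending the same Bearer header on the GET request is an assumption, and all function names are hypothetical.

```python
import json
import time
import urllib.request

API_BASE = "https://api.unifically.com"


def build_task_request(api_key, task_input):
    """Build (but do not send) the POST /v1/tasks request."""
    body = {"model": "alibaba/wan-2.7-video", "input": task_input}
    return urllib.request.Request(
        f"{API_BASE}/v1/tasks",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def fetch_task(api_key, task_id):
    """GET /v1/tasks/<task_id> and return the decoded JSON body."""
    request = urllib.request.Request(
        f"{API_BASE}/v1/tasks/{task_id}",
        headers={"Authorization": f"Bearer {api_key}"},  # assumed: same auth as POST
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_task(fetch, is_finished, interval=5.0, max_polls=120, sleep=time.sleep):
    """Call fetch() until is_finished(task) is true, pausing between polls."""
    for _ in range(max_polls):
        task = fetch()
        if is_finished(task):
            return task
        sleep(interval)
    raise TimeoutError("task did not finish within the polling budget")
```

Send the request with `urllib.request.urlopen(build_task_request(...))`, read the task_id from the response, then call `wait_for_task(lambda: fetch_task(key, task_id), is_finished)` with a predicate written against the response schema in the linked parameter docs.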

FAQs

People also ask

Wan 2.7 is Alibaba's March 2026 upgrade to the Wan video line. It exposes three modes (text-to-video, image-to-video, reference-to-video) with new features including first-and-last-frame interpolation, video continuation, lip-sync from a driving audio track, and reference-to-video with up to 5 reference images and up to 5 reference videos.

Wan 2.7 adds last-frame control in image-to-video, video continuation that extends an existing clip, lip-sync via a driving audio_url in image-to-video, and a unified reference-to-video mode with up to 5 reference images and up to 5 reference videos, using "Image 1" / "Video 1" identifiers in the prompt. Multi-shot can be controlled by natural language with timestamps in the prompt.

Three: text-to-video (t2v) with optional audio; image-to-video (i2v) with first-frame, last-frame, video continuation, and lip-sync; and reference-to-video (r2v) with up to 5 reference images and up to 5 reference videos, using "Image 1" / "Video 1" identifiers in the prompt.

In image-to-video mode, pass a video_url instead of a first frame. The model extends the input clip up to the requested duration. If your input is 3 seconds and you ask for 15, the model generates 12 new seconds and the final output is 15.

Pass an audio_url (wav or mp3, 2–30 seconds, max 15MB) in image-to-video mode. The model times mouth movements and motion to the audio.

16:9, 9:16, 1:1, 4:3, and 3:4. The default is 16:9.

Yes. Wan 2.7 Video Edit is a separate model (alibaba/wan-2.7-video-edit) for editing or style-transferring an existing video with optional reference images.