Veo 3.1
What is Veo 3.1?
Veo 3.1 is Google's October 2025 update to the Veo video family, with a January 2026 refresh that added native vertical output and 4K upscaling. It's a video model built for footage that has to look real: shallow depth of field, believable motion blur, lighting that holds up at 1080p without the usual AI giveaways. Compared with the last Veo version, the clear wins are camera motion that no longer drifts on long pans, much better lip-synced speech, and characters that stay the same across multiple scenes. On Unifically it's available as four generation models, an Extend variant that continues a finished clip, and an Upscale variant that takes any finished task to 1080p or 4K.
Key features of Veo 3.1
Five features cover how Veo 3.1 fits into a production pipeline.

Three clip lengths: 4, 6, or 8 seconds
Four seconds is the default and works for any clip that loops: feed posts, hero banners, store previews, app onboarding. Six seconds gives a bit more room. Eight seconds covers voice-over, full camera moves, and dialogue.

Four generation variants for cost and quality control
Lite is the low-cost option on the standard queue. Lite Relaxed is cheaper still, with a longer queue, and works well for brainstorming. Fast handles most day-to-day work. Quality takes longer per render but gives the best motion and lighting. Each variant is a separate model with its own ID, so you choose based on what the shot needs.

Frame mode for storyboarded camera moves
Give Veo a start image and, if you want, an end image too. The model animates the camera move between them. Useful when you have stills from a shoot or concept art and want a working motion test in minutes instead of hours.

Reference mode for character and style continuity
Pass one to three reference images and Veo keeps silhouette, palette, and style in the new clip. Combine with a voice preset to get a lip-synced speech track in the same call, with no separate audio pipeline or manual lip sync afterwards.

Extend and Upscale for finished clips
Once you have a clip worth keeping, Extend continues it for another eight seconds with a fresh prompt while keeping aspect ratio and framing. Upscale takes the finished task up to 1080p or 4K. You only pay the upscale fee on the clips that need it.
Best for
Short product hero loops
A four-second clip at 16:9 or 9:16 is enough for a homepage hero or a paid-ad slot. Lock it with a seed once the result is right.
Vertical social with brand references
Fast or Lite reference mode accepts up to three campaign stills, so 9:16 cuts stay aligned with shoot photography.
First-and-last-frame storyboarding
Frame mode lets you set both the opening and the closing image. The model animates the camera move between them.
Lip-synced character dialogue
Combine reference mode with a voice preset to get a lip-synced voice track in the same call, with no separate audio pipeline.
High-quality master files at 4K
Generate cheaply on Lite Relaxed, lock the clip you want with Quality, then call Upscale once on the final result to deliver 4K.
Longer stories via Extend
Extend continues a finished clip with a new prompt. Each call is locked to 8 seconds and keeps the source aspect ratio, so cuts stay consistent.
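As a rough planning aid, the Extend arithmetic can be sketched in shell. The function name `extend_calls_needed` is hypothetical. The only facts taken from this page are that a first clip runs 4, 6, or 8 seconds and that each Extend call adds a fixed 8 seconds.

```shell
# Hypothetical helper: how many Extend calls reach a target running time?
# Assumes each Extend call adds exactly 8 seconds to the clip.
extend_calls_needed() {
  base=$1      # length of the first clip in seconds (4, 6, or 8)
  target=$2    # desired total length in seconds
  if [ "$target" -le "$base" ]; then
    echo 0
    return 0
  fi
  # Integer ceiling of (target - base) / 8
  echo $(( (target - base + 7) / 8 ))
}

extend_calls_needed 8 30   # prints 3 (8 + 3 * 8 = 32 seconds)
```

Because every Extend call returns a full 8 seconds, the result can overshoot the target. In the example, a 30-second goal yields 32 seconds of footage to trim.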
Variants
Veo 3.1 has six variants on the same API. Each one is a different model with its own price and turnaround. Pick the one that fits the shot.
Lite
The low-cost variant on the standard-priority queue. Good for brainstorming, first drafts, and quick A/B prompts before you spend money on a Quality render. Frame mode, reference mode, and voice presets are all supported here.
Lite Relaxed
Same Lite output, lower-priority queue, lower price. Use it when you can wait longer. Reference mode and voice presets are also available. Good for overnight batch runs.
Fast
The default for day-to-day work, with generation typically returning in around 100 seconds. Reference mode (one to three style or character images) and voice presets are available on Fast, Lite, and Lite Relaxed, so lip-synced audio work runs on any of these three variants.
Quality
Slowest per render at roughly 240 seconds, with the best motion quality, lighting, and lip-sync accuracy. Use it when the clip has to be production-ready. Frame mode is supported, but reference mode is not. If you need character continuity from reference images, run on Fast or Lite.
Extend
Continues a finished Veo 3.1 clip for another eight seconds with a new prompt while keeping aspect ratio and framing. Use it when you need to push past the per-clip cap without restarting the look.
Upscale
Takes any finished task up to 1080p or 4K. Run this only on the clips you want to deliver. Output comes back as a new finished task at the chosen resolution.
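One possible way to turn the variant descriptions above into a decision rule is sketched below. The function `suggest_variant` and its three yes/no inputs are hypothetical, and the names it prints are labels from this page, not model IDs. It is one reading of the trade-offs, not an official selection guide.

```shell
# Hypothetical helper: suggest a generation variant from three yes/no answers.
# The rules restate this page; the printed names are labels, not model IDs.
suggest_variant() {
  needs_references=$1   # "yes" if the shot uses reference images or a voice preset
  production_ready=$2   # "yes" if this render is meant to be the final look
  can_wait=$3           # "yes" if a longer queue is acceptable
  if [ "$production_ready" = "yes" ]; then
    if [ "$needs_references" = "yes" ]; then
      echo "Fast"           # Quality does not offer reference mode
    else
      echo "Quality"
    fi
  elif [ "$can_wait" = "yes" ]; then
    echo "Lite Relaxed"     # drafts that can sit in the slower queue
  else
    echo "Lite"             # drafts needed sooner
  fi
}

suggest_variant yes yes no   # prints Fast
suggest_variant no yes no    # prints Quality
suggest_variant no no yes    # prints Lite Relaxed
```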
Use cases
Build a four-second homepage hero: write the prompt, run Lite Relaxed, and lock the clip with a seed so the loop can be reproduced later. Build a vertical product reel by passing three campaign stills as references on Fast. Silhouette and brand color stay the same across the clip, and adding a voice preset gives you a voice-over cut in one call. Storyboard a complex camera move with frame mode by giving Veo a start frame and an end frame, then tweaking until the motion lands. Stretch a story past eight seconds by chaining Extend off a finished clip. Once you have the result you want, send it to Upscale at 4K for delivery.
Limitations
Reference mode is locked to 8-second clips, and it is not available on the Quality variant. If you want the best motion quality, you have to drop reference images and run on Quality without them. Frame mode and reference mode can't be used in the same request, so pick one. Extend always returns 8 seconds, so a 4-second tag is still an 8-second render. Upscale is a separate paid call on top of the first generation, so plan for it before you promise to deliver in 4K.
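The limits above can be checked before a request is sent. The sketch below is a hypothetical pre-flight helper. `check_request`, the variant labels, and the mode names are illustrative and not part of the API. Taking a single mode argument is how it models the rule that frame mode and reference mode cannot be combined.

```shell
# Hypothetical pre-flight check; prints "ok" or the first problem found.
# variant: lite | lite-relaxed | fast | quality   (labels, not model IDs)
# mode:    text | frame | reference               (one mode per request)
check_request() {
  variant=$1
  mode=$2
  duration=$3
  case "$duration" in
    4|6|8) ;;
    *) echo "duration must be 4, 6, or 8"; return 1 ;;
  esac
  case "$mode" in
    text|frame) ;;
    reference)
      if [ "$variant" = "quality" ]; then
        echo "reference mode is not available on Quality"; return 1
      fi
      if [ "$duration" -ne 8 ]; then
        echo "reference mode is locked to 8 seconds"; return 1
      fi
      ;;
    *) echo "unknown mode: $mode"; return 1 ;;
  esac
  echo "ok"
}

check_request fast reference 8              # prints ok
check_request quality reference 8 || true   # prints the Quality restriction
```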
API examples
Call Veo 3.1 from any language by POSTing to /v1/tasks. Full parameter docs live at docs.unifically.com/models/video/google/veo-3.1.
curl -X POST https://api.unifically.com/v1/tasks \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "google/veo-3.1-fast",
    "input": {
      "prompt": "A cinematic shot of a cat walking through a garden",
      "aspect_ratio": "16:9",
      "duration": 4
    }
  }'
Successful submission returns a task_id. Poll GET /v1/tasks/<task_id> or set a callback_url on the request to receive the finished video URL.
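A polling loop around GET /v1/tasks/<task_id> might look like the sketch below. This page does not document the response fields, so the helper names none of them. Instead, the caller supplies a grep pattern that they have confirmed marks a finished task. The function name `poll_task`, the `API_KEY` environment variable, and the default interval and attempt count are all assumptions.

```shell
# Sketch of a polling loop around GET /v1/tasks/<task_id>.
# Assumptions: API_KEY is set by the caller, and done_pattern is a grep
# pattern the caller has confirmed marks a finished task in the response body.
poll_task() {
  task_id=$1
  done_pattern=$2
  interval=${3:-10}       # seconds between polls
  max_attempts=${4:-60}   # give up after this many polls
  attempt=0
  while [ "$attempt" -lt "$max_attempts" ]; do
    body=$(curl -sS -H "Authorization: Bearer $API_KEY" \
      "https://api.unifically.com/v1/tasks/$task_id") || return 1
    if printf '%s\n' "$body" | grep -q -- "$done_pattern"; then
      printf '%s\n' "$body"   # hand the finished response back to the caller
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$interval"
  done
  echo "task $task_id not finished after $max_attempts polls" >&2
  return 1
}
```

The sketch does not detect a failed task, which would keep being polled until the attempt limit is reached. For long renders, the callback_url option avoids polling altogether.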
FAQs
People also ask
Veo 3.1 is Google's October 2025 video generation model. It produces short cinematic clips of 4, 6, or 8 seconds from a text prompt, with the option to guide it with a start frame, an end frame, or up to three reference images. With a voice preset it also produces lip-synced speech.
There are six variants. Four are generation variants (Lite, Lite Relaxed, Fast, and Quality), plus a dedicated Extend variant for continuing a finished clip and a separate Upscale variant that bumps a finished task up to 1080p or 4K.
Clips run 4, 6, or 8 seconds per generation, with 4 seconds as the default. Reference mode and the Extend variant are both locked to 8 seconds, so a four-second clip is only available in plain text-to-video and frame mode.
Four seconds is enough most of the time, when the clip is going to loop or play in a feed. Homepage hero loops, paid-ad slots, app onboarding stingers, and store-listing previews all fit comfortably in 4s. Use 6s when you need a bit more room, and 8s when there's voice-over or a longer camera move.
Yes, Veo 3.1 can generate speech and audio. Pass a voice preset on Fast, Lite, or Lite Relaxed (alongside at least one reference image) and Veo 3.1 returns a clip with a lip-synced voice track. Ambient sound and effects are made in the same pass.
Yes, 4K output is available via a separate upscale call. Generate the clip on any variant, then send the finished task to the Upscale endpoint with resolution set to 1080p or 4K. The upscale returns a new finished task at the chosen resolution.
Lite and Lite Relaxed are the same model with different queue priority. Lite Relaxed waits in a lower-priority lane for a lower price. Lite returns the result sooner. Output quality is the same, so Lite Relaxed is the better choice when you can wait longer.
Frame mode uses a start image and an optional end image to bracket the camera move and works on every variant. Reference mode uses one to three style or character images plus an optional voice preset to blend visuals, and it is available on Fast, Lite, and Lite Relaxed. The two modes can't be used together in one request.
Yes, renders are reproducible. Pass an integer seed alongside the prompt and any input frames. Re-sending the same prompt, frames, and seed returns the same render. Useful for A/B variants, regression tests, and locking in a clip you want to upscale.
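A seeded request could be built as sketched below. The field name `seed` and its placement inside `input` are assumptions, since this page only says the seed is an integer passed alongside the prompt. Check the full parameter docs before relying on it. The helper names `seeded_body` and `submit_seeded` are hypothetical.

```shell
# Assumption: the seed is an integer "seed" field inside "input", next to
# the documented prompt, aspect_ratio, and duration fields.
# This sketch does no JSON escaping, so the prompt must not contain
# double quotes or backslashes.
seeded_body() {
  prompt=$1
  seed=$2
  printf '{"model":"google/veo-3.1-fast","input":{"prompt":"%s","aspect_ratio":"16:9","duration":4,"seed":%d}}\n' \
    "$prompt" "$seed"
}

# Hypothetical submit helper; API_KEY must be set by the caller.
submit_seeded() {
  curl -sS -X POST https://api.unifically.com/v1/tasks \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $API_KEY" \
    -d "$(seeded_body "$1" "$2")"
}

seeded_body "A cat in a garden" 42   # prints the JSON body without sending it
```

Calling `submit_seeded` twice with the same prompt and seed would send two identical bodies, which is the condition described above for getting the same render back.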