Kling 3.0
What is Kling 3.0?
Kling 3.0 is Kuaishou's flagship video model. The headline upgrades over Kling 2.6 are multi-shot storytelling (2 to 6 connected scenes in one call), single-shot duration up to 15 seconds, and a 4K output mode on top of the existing 720p and 1080p. Native audio is on by default. The endpoint covers the full input range: pure text-to-video and image-to-video with start and optional end frames. Aspect ratios are 1:1, 16:9, or 9:16. Use 3.0 when the brief needs more than a single short cut from one model.
Key features of Kling 3.0
Four features cover what production teams build with 3.0.
Multi-shot mode (2 to 6 scenes per call)
Pass a multi_shots array instead of a single prompt and the model returns connected scenes that share characters and tone. Each shot has its own prompt and duration; total length stays between 3 and 15 seconds.
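A minimal sketch of the limits described here, checked client-side before a request goes out. The multi_shots name, the 2 to 6 shot range, and the 3 to 15 second total come from this page, and the 1-second per-shot minimum comes from the FAQ. The per-shot key names prompt and duration, and the helper build_multi_shot_input itself, are assumptions for illustration, so check the parameter docs before relying on them.

```python
from typing import Dict, List

MIN_SHOTS, MAX_SHOTS = 2, 6        # shots per call
MIN_SHOT_S = 1                     # per-shot minimum in seconds (from the FAQ)
MIN_TOTAL_S, MAX_TOTAL_S = 3, 15   # total clip length in seconds


def build_multi_shot_input(shots: List[Dict]) -> Dict:
    """Validate shots and return an input object carrying a multi_shots array.

    Each shot is a dict with "prompt" and "duration" (seconds).
    ASSUMPTION: these per-shot key names are illustrative, not confirmed API fields.
    """
    if not MIN_SHOTS <= len(shots) <= MAX_SHOTS:
        raise ValueError(f"expected {MIN_SHOTS} to {MAX_SHOTS} shots, got {len(shots)}")
    for i, shot in enumerate(shots):
        if not str(shot.get("prompt", "")).strip():
            raise ValueError(f"shot {i} has an empty prompt")
        if shot.get("duration", 0) < MIN_SHOT_S:
            raise ValueError(f"shot {i} must run at least {MIN_SHOT_S} second")
    total = sum(shot["duration"] for shot in shots)
    if not MIN_TOTAL_S <= total <= MAX_TOTAL_S:
        raise ValueError(f"total duration {total}s is outside {MIN_TOTAL_S}-{MAX_TOTAL_S}s")
    return {"multi_shots": shots}


# Two connected shots, 10 seconds in total.
example = build_multi_shot_input([
    {"prompt": "Barista pours latte art, close-up", "duration": 5},
    {"prompt": "Same barista hands the cup to a customer, wide shot", "duration": 5},
])
```

Validating locally keeps an out-of-range shot list from costing a round trip; the endpoint remains the source of truth for what it accepts.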
Single shots up to 15 seconds
That is 50% longer per call than Kling 2.6, which topped out at 10 seconds. Useful for premium social posts, pre-roll spots, and any cut where 10 seconds left you short.
720p, 1080p, or 4K output
Standard renders at 720p, Pro at 1080p, and a dedicated 4K mode delivers 4K straight from the endpoint. Pick the resolution that matches the placement.
Native AI audio on by default
native_audio defaults to true and the model produces an aligned audio track in the same call. Set it to false when you plan to score the clip in post.
Best for
Multi-shot ad arcs
Setup-beat-payoff structure in one call. Useful for short ads that need narrative pacing.
15-second hero clips
50% longer single-shot duration than 2.6. Useful for premium social posts and pre-roll ads.
Voiced cuts in one call
Native audio is on by default and produces a soundtrack aligned with the video, which removes the separate audio pass for most briefs.
4K delivery without an upscale step
Set mode to 4k for premium delivery straight from the endpoint.
Cinematic composition
The 3.0 generation produces stronger scene composition than 2.6 on complex prompts.
Variants
Kling 3.0 has three output modes plus an input-mode switch. Pick by output resolution and by whether the call is one shot or many.
Standard
The 720p output mode. Cheapest per clip and good for layout exploration and prompt iteration.
Pro
The 1080p output mode. The default for delivery cuts that need to land on a hero unit or paid placement.
4K
A dedicated 4K mode for premium delivery. Use it when the brief requires 4K masters straight out of the endpoint.
Multi-shot
Pass a multi_shots array (2 to 6 entries) instead of a single prompt. Each shot has its own prompt and duration. Useful for ad arcs and serial character work.
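The "pick by output resolution" rule can be sketched as a small lookup. The mode strings pro and 4k appear on this page; the string standard, the assumption that 4K means a 2160-pixel output height, and the helper pick_mode are all illustrative guesses rather than confirmed API values.

```python
# Output height in pixels -> mode string.
# ASSUMPTION: "standard" is a guessed spelling and 2160 is the assumed 4K height;
# only "pro" and "4k" are shown on this page.
MODE_BY_HEIGHT = {720: "standard", 1080: "pro", 2160: "4k"}


def pick_mode(target_height: int) -> str:
    """Return the smallest mode whose output height covers target_height."""
    for height in sorted(MODE_BY_HEIGHT):
        if target_height <= height:
            return MODE_BY_HEIGHT[height]
    raise ValueError(f"no mode renders {target_height}p or higher")


print(pick_mode(1080))
```

Choosing the smallest mode that covers the placement keeps iteration renders cheap and reserves 4K for the final master.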
Use cases
Build a 12-second product ad as three connected shots in one call: hero reveal, feature highlight, closing card. Pass each as an entry in the multi_shots array and let the model carry character and tone across the cuts.
Generate a 15-second hero loop for a homepage by running single-shot mode with native audio on, then deliver straight to the page without an extra audio pass.
Deliver in 4K without a separate upscale by switching mode to 4k on the final approved cut.
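As a rough illustration, the three scenarios could map to request bodies like the ones below. The model id and the prompt, duration, mode, and native_audio fields come from the API example on this page. Placing multi_shots inside input, the per-shot prompt and duration keys, and combining mode with multi_shots are assumptions, and the prompts are placeholders.

```python
import json

MODEL = "kuaishou/kling-3.0-video"  # model id from the API example on this page

# 1. 12-second product ad: three connected shots of 4 seconds each.
# ASSUMPTION: multi_shots sits inside "input" and each entry uses these keys.
product_ad = {
    "model": MODEL,
    "input": {
        "mode": "pro",
        "multi_shots": [
            {"prompt": "Hero reveal: the bottle rotates on a marble counter", "duration": 4},
            {"prompt": "Feature highlight: close-up of the pump dispensing", "duration": 4},
            {"prompt": "Closing card: the bottle beside the logo, soft light", "duration": 4},
        ],
    },
}

# 2. 15-second single-shot hero clip with the native audio track kept on.
hero_loop = {
    "model": MODEL,
    "input": {
        "prompt": "Slow aerial pass over a coastline at golden hour",
        "duration": 15,
        "mode": "pro",
        "native_audio": True,
    },
}

# 3. Final approved cut: same input as the hero clip, re-rendered in 4K.
final_4k = {"model": MODEL, "input": {**hero_loop["input"], "mode": "4k"}}

print(json.dumps(product_ad, indent=2))
```

Copying the approved input and changing only mode keeps the 4K render tied to the prompt that was signed off.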
API examples
Call Kling 3.0 from any language by POSTing to /v1/tasks. Full parameter docs live at docs.unifically.com/models/video/kling/kling-3.0.
curl -X POST https://api.unifically.com/v1/tasks \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "kuaishou/kling-3.0-video",
    "input": {
      "prompt": "A cinematic drone shot over a misty forest at dawn",
      "duration": 10,
      "mode": "pro",
      "native_audio": true
    }
  }'
A successful submission returns a task_id. Poll GET /v1/tasks/<task_id> or set a callback_url on the request to receive the finished result.
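The submit-then-poll flow might look like this in Python, using only the standard library. The host, the two paths, the headers, and task_id come from this page. The response shape is not documented here, so the status field and its completed and failed values are assumptions, as are the helper names build_submit_request, fetch_task, and wait_for_task.

```python
import json
import time
import urllib.request
from typing import Callable, Dict

API_BASE = "https://api.unifically.com"  # host from the curl example


def build_submit_request(api_key: str, payload: Dict) -> urllib.request.Request:
    """Build (but do not send) the POST /v1/tasks request."""
    return urllib.request.Request(
        f"{API_BASE}/v1/tasks",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def fetch_task(api_key: str, task_id: str) -> Dict:
    """GET /v1/tasks/<task_id> and decode the JSON body."""
    req = urllib.request.Request(
        f"{API_BASE}/v1/tasks/{task_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def wait_for_task(fetch: Callable[[], Dict], interval_s: float = 5.0,
                  timeout_s: float = 600.0) -> Dict:
    """Call fetch() until the task reaches a terminal status or time runs out.

    ASSUMPTION: the task object carries a "status" field whose terminal values
    are "completed" and "failed". Confirm against a real response first.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        task = fetch()
        if task.get("status") in ("completed", "failed"):
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError("task did not finish in time")
        time.sleep(interval_s)
```

To run it, send the built request with urllib.request.urlopen, read task_id from the JSON response, then call wait_for_task(lambda: fetch_task(api_key, task_id)). Passing the fetch step in as a callable keeps the loop testable without network access; for long renders, callback_url avoids polling altogether.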
FAQs
People also ask
What is Kling 3.0?
Kling 3.0 is Kuaishou's flagship video model. Single-shot mode generates 3 to 15 second clips from a prompt with optional start and end frames. Multi-shot mode generates 2 to 6 connected scenes in one call. Native AI audio is on by default.
What changed from Kling 2.6?
Three big changes. Multi-shot storytelling (2 to 6 connected scenes per call) replaces single-shot only. Single-shot duration extends from 10 to 15 seconds. Output adds a 4K mode on top of 720p and 1080p.
How does Kling 3.0 compare to Kling O1?
Kling O1 caps single clips at 10 seconds and skips multi-shot and native audio. Kling 3.0 stretches single shots to 15 seconds, adds multi-shot mode, and has native audio on by default. Pick 3.0 for narrative work and O1 for tightly referenced single shots.
How does multi-shot mode work?
One generate call returns 2 to 6 connected scenes. Each shot has its own prompt and duration (minimum 1 second), with the total length between 3 and 15 seconds. Useful for short ad arcs and serial content where the same character appears across cuts.
Does Kling 3.0 generate audio?
Yes. native_audio is on by default and the model produces an AI audio track aligned to the video. Disable it by setting native_audio to false when you plan to score the clip in post.