Kling 3.0 API

Kling 3.0

What is Kling 3.0?

Kling 3.0 is Kuaishou's flagship video model. The headline upgrades over Kling 2.6 are multi-shot storytelling (2 to 6 connected scenes in one call), single-shot duration up to 15 seconds, and a 4K output mode on top of the existing 720p and 1080p. Native audio is on by default. The endpoint covers the full input range: pure text-to-video and image-to-video with start and optional end frames. Aspect ratios are 1:1, 16:9, or 9:16. Use 3.0 when the brief needs more than a single short cut from one model.
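
As a rough illustration of these limits, the Python sketch below assembles a single-shot input and checks duration and aspect ratio before anything is sent. The keys prompt, duration, mode, and native_audio come from the request example on this page; the aspect_ratio key and the helper name build_single_shot_input are assumptions made for illustration, so confirm the real parameter name in the docs.

```python
ALLOWED_ASPECT_RATIOS = {"1:1", "16:9", "9:16"}
MIN_SECONDS, MAX_SECONDS = 3, 15  # single-shot range stated on this page


def build_single_shot_input(prompt, duration, aspect_ratio="16:9",
                            mode="pro", native_audio=True):
    """Return an `input` dict for a single-shot text-to-video call."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if not MIN_SECONDS <= duration <= MAX_SECONDS:
        raise ValueError(f"duration must be {MIN_SECONDS} to {MAX_SECONDS} seconds")
    if aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        raise ValueError(f"aspect_ratio must be one of {sorted(ALLOWED_ASPECT_RATIOS)}")
    return {
        "prompt": prompt,
        "duration": duration,
        "mode": mode,
        "native_audio": native_audio,
        "aspect_ratio": aspect_ratio,  # assumed key name, not confirmed on this page
    }


clip = build_single_shot_input(
    "A cinematic drone shot over a misty forest at dawn", duration=15
)
```

Validating locally like this only saves a round trip; the endpoint remains the authority on what it accepts.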

Key features of Kling 3.0

Four features cover what production teams build with 3.0.

Multi-shot mode (2 to 6 scenes per call)

Pass a multi_shots array instead of a single prompt and the model returns connected scenes that share characters and tone. Each shot has its own prompt and duration; total length stays between 3 and 15 seconds.
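
The constraints on a multi_shots array can be checked before a call with a small helper like the one below. It is a minimal sketch, not the service's own validation: the per-shot keys prompt and duration are assumed names, the helper validate_multi_shots is hypothetical, and the 1-second per-shot minimum is taken from the FAQ on this page.

```python
def validate_multi_shots(shots):
    """Check a multi_shots list against the limits described on this page.

    Returns the total length in seconds, or raises ValueError.
    """
    if not 2 <= len(shots) <= 6:
        raise ValueError("multi_shots needs 2 to 6 entries")
    for i, shot in enumerate(shots):
        if not shot.get("prompt"):
            raise ValueError(f"shot {i} has no prompt")
        if shot.get("duration", 0) < 1:
            raise ValueError(f"shot {i} must last at least 1 second")
    total = sum(shot["duration"] for shot in shots)
    if not 3 <= total <= 15:
        raise ValueError(f"total length is {total}s; it must be 3 to 15 seconds")
    return total


# "prompt" and "duration" are assumed key names for each entry.
shots = [
    {"prompt": "A courier cycles through a rainy street", "duration": 4},
    {"prompt": "The same courier hands over a parcel", "duration": 5},
]
total_seconds = validate_multi_shots(shots)
```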

Single shots up to 15 seconds

Single shots run up to 15 seconds, 50% longer than the 10-second cap in Kling 2.6. Useful for premium social posts, pre-roll spots, and any cut where 10 seconds was leaving you short.

720p, 1080p, or 4K output

Standard renders at 720p, Pro at 1080p, and a dedicated 4K mode delivers 4K straight from the endpoint. Pick the resolution that matches the placement.
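
One way to encode "pick the resolution that matches the placement" is a lookup from target height to mode, sketched below. The mode strings "pro" and "4k" appear elsewhere on this page; "standard" is an assumed spelling, and pick_mode is a hypothetical helper, so check the parameter docs before relying on either.

```python
# Output height in pixels for each mode. "pro" and "4k" appear on this page;
# "standard" is an assumed spelling of the 720p mode value.
MODE_BY_HEIGHT = {720: "standard", 1080: "pro", 2160: "4k"}


def pick_mode(target_height):
    """Return the lowest-resolution mode that covers target_height pixels."""
    for height in sorted(MODE_BY_HEIGHT):
        if height >= target_height:
            return MODE_BY_HEIGHT[height]
    raise ValueError(f"no mode renders {target_height}px or taller")


hero_mode = pick_mode(1080)
```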

Native AI audio on by default

native_audio defaults to true and the model produces an aligned audio track in the same call. Set it to false when you plan to score the clip in post.

Best for

Multi-shot ad arcs

Setup-beat-payoff structure in one call. Useful for short ads that need narrative pacing.

15-second hero clips

50% longer single-shot duration than 2.6. Useful for premium social posts and pre-roll ads.

Voiced cuts in one call

Native audio on by default produces an aligned soundtrack with the video, removing the separate audio pass for most briefs.

4K delivery without an upscale step

Set mode to 4k for premium delivery straight from the endpoint.

Cinematic composition

The 3.0 generation produces stronger scene composition than 2.6 on complex prompts.

Variants

Kling 3.0 has three output modes (Standard, Pro, and 4K) plus a switch between single-shot and multi-shot input. Pick by output resolution and by whether the call is one shot or many.

Standard

The 720p output mode. Cheapest per clip and good for layout exploration and prompt iteration.

Pro

The 1080p output mode. The default for delivery cuts that need to land on a hero unit or paid placement.

4K

A dedicated 4K mode for premium delivery. Use it when the brief requires 4K masters straight out of the endpoint.

Multi-shot

Pass a multi_shots array (2 to 6 entries) instead of a single prompt. Each shot has its own prompt and duration. Useful for ad arcs and serial character work.

Use cases

Build a 12-second product ad as three connected shots in one call: hero reveal, feature highlight, closing card. Pass each as an entry in the multi_shots array and let the model carry character and tone across the cuts.

Generate a 15-second hero loop for a homepage by running single-shot mode with native audio on, then deliver straight to the page without an extra audio pass.

Deliver in 4K without a separate upscale by switching mode to 4k on the final approved cut.
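
The three-shot product ad could translate into a request body along the lines of the sketch below. The model ID, mode, and native_audio come from the request example on this page. Placing multi_shots inside input, the per-shot keys prompt and duration, the even 4-second split, and the prompt wording are all assumptions for illustration.

```python
import json

# Three connected shots totalling 12 seconds. The per-shot keys are assumed names.
shots = [
    {"prompt": "Hero reveal: the bottle rises out of rippling water", "duration": 4},
    {"prompt": "Feature highlight: close-up of the cap twisting open", "duration": 4},
    {"prompt": "Closing card: the bottle on a clean white pedestal", "duration": 4},
]

payload = {
    "model": "kuaishou/kling-3.0-video",
    "input": {
        "multi_shots": shots,  # assumed to sit inside "input" in place of "prompt"
        "mode": "pro",
        "native_audio": True,
    },
}

total_seconds = sum(shot["duration"] for shot in shots)
body = json.dumps(payload, indent=2)  # JSON string ready to send as the POST body
```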

API examples

Call Kling 3.0 from any language by POSTing to /v1/tasks. Full parameter docs live at docs.unifically.com/models/video/kling/kling-3.0.

curl -X POST https://api.unifically.com/v1/tasks \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "kuaishou/kling-3.0-video",
    "input": {
      "prompt": "A cinematic drone shot over a misty forest at dawn",
      "duration": 10,
      "mode": "pro",
      "native_audio": true
    }
  }'

A successful submission returns a task_id. Poll GET /v1/tasks/<task_id> or set a callback_url on the request to receive the finished result.
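
The same submit-then-poll flow can be sketched in Python with only the standard library. The URL, headers, and paths mirror the curl example and the polling endpoint named above. The response shape is not documented on this page, so the "status" key, the in-progress values, and a top-level task_id are assumptions; the helper names are hypothetical.

```python
import json
import time
import urllib.request

API_BASE = "https://api.unifically.com"


def build_submit_request(api_key, payload):
    """Build the POST /v1/tasks request shown in the curl example."""
    return urllib.request.Request(
        f"{API_BASE}/v1/tasks",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def fetch_task(api_key, task_id):
    """GET /v1/tasks/<task_id> and decode the JSON body."""
    req = urllib.request.Request(
        f"{API_BASE}/v1/tasks/{task_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def wait_for_task(fetch, task_id, interval=5.0, max_polls=120, sleep=time.sleep):
    """Poll until the task leaves the in-progress states or polls run out.

    `fetch` is any callable that takes a task_id and returns the decoded task.
    The "status" key and the values below are assumptions, not documented here.
    A response without a "status" key is returned as-is on the first poll.
    """
    in_progress = {"pending", "queued", "processing"}  # hypothetical values
    for _ in range(max_polls):
        task = fetch(task_id)
        if task.get("status") not in in_progress:
            return task
        sleep(interval)
    raise TimeoutError(f"task {task_id} still running after {max_polls} polls")


# Usage (makes real network calls, so it is left commented out):
# with urllib.request.urlopen(build_submit_request(key, payload)) as resp:
#     task_id = json.load(resp)["task_id"]  # assumes a top-level task_id
# result = wait_for_task(lambda tid: fetch_task(key, tid), task_id)
```

Passing fetch in as a callable keeps the polling loop testable without network access. For long renders, a callback_url avoids polling altogether.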

FAQs

People also ask

Kling 3.0 is Kuaishou's flagship video model. Single-shot mode generates 3 to 15 second clips from a prompt with optional start and end frames. Multi-shot mode generates 2 to 6 connected scenes in one call. Native AI audio is on by default.

Three big changes separate Kling 3.0 from Kling 2.6. Multi-shot storytelling (2 to 6 connected scenes per call) is added alongside single-shot generation. Single-shot duration extends from 10 to 15 seconds. Output adds a 4K mode on top of 720p and 1080p.

Kling O1 caps single clips at 10 seconds and skips multi-shot and native audio. Kling 3.0 stretches single shots to 15 seconds, adds multi-shot mode, and has native audio on by default. Pick 3.0 for narrative work and O1 for tightly referenced single shots.

One generate call returns 2 to 6 connected scenes. Each shot has its own prompt and duration (minimum 1 second), with the total length between 3 and 15 seconds. Useful for short ad arcs and serial content where the same character appears across cuts.

Kling 3.0 generates audio natively. The native_audio parameter is on by default and the model produces an AI audio track aligned to the video. Disable it by setting native_audio to false when you plan to score the clip in post.