Kling 3.0 API


Latest Kling model with multi-shot support, 3-15s duration, and sound generation.


Documentation

Generation mode

Provide either a single prompt or a JSON array of shots. Per the API, the two are mutually exclusive.

Mode

Output quality mode

Aspect Ratio

Generate Audio

Generate audio for the output video (default: true)
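The either/or choice between a prompt and a JSON array of shots can be enforced before a request is sent. The sketch below is only an illustration: the function name and the `prompt` and `shots` keys are assumptions, since this page does not give the actual request schema.

```python
import json


def resolve_generation_input(prompt=None, shots_json=None):
    """Return {'prompt': ...} or {'shots': [...]}, never both.

    The key names are hypothetical; consult the API reference for the real ones.
    """
    has_prompt = bool(prompt and prompt.strip())
    has_shots = bool(shots_json and shots_json.strip())
    # Exactly one of the two inputs must be supplied.
    if has_prompt == has_shots:
        raise ValueError("provide exactly one of prompt or shots")
    if has_prompt:
        return {"prompt": prompt.strip()}
    # json.loads raises json.JSONDecodeError (a ValueError subclass) on bad JSON.
    shots = json.loads(shots_json)
    if not isinstance(shots, list):
        raise ValueError("shots must be a JSON array")
    return {"shots": shots}
```

Passing both arguments, or neither, raises `ValueError`, which mirrors the mutual exclusivity described for the generation mode.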


Features

What Kling 3.0 API offers

Text-to-video with optional start and end frame images (single-shot mode)

Multi-shot generation with 2 to 6 scenes, 3 to 15 seconds total (each shot 1 to 15 seconds)

Aspect ratios 16:9, 9:16, and 1:1

Standard (720p) or Pro (1080p) output

Optional generated audio for the output clip (on by default in the playground)

REST API with JSON request and response bodies
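As a rough illustration of the JSON request body a single-shot call might carry, the sketch below assembles a payload and checks it against the limits listed above. Every field name (`prompt`, `duration`, `aspect_ratio`, `mode`, `generate_audio`, `start_image_url`, `end_image_url`) and the `standard`/`pro` mode strings are assumptions made for illustration; this page does not specify the schema or the endpoint.

```python
import json

# Hypothetical enum values; the page names the options but not their wire format.
ASPECT_RATIOS = {"16:9", "9:16", "1:1"}
MODES = {"standard", "pro"}  # standard = 720p, pro = 1080p


def build_single_shot_body(prompt, duration, aspect_ratio="16:9",
                           mode="standard", generate_audio=True,
                           start_image_url=None, end_image_url=None):
    """Return a JSON string for a single-shot request, or raise ValueError."""
    if not prompt or not prompt.strip():
        raise ValueError("prompt must be a non-empty string")
    if not 3 <= duration <= 15:
        raise ValueError("duration must be between 3 and 15 seconds")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect ratio: {aspect_ratio}")
    if mode not in MODES:
        raise ValueError(f"unsupported mode: {mode}")

    body = {
        "prompt": prompt.strip(),
        "duration": duration,
        "aspect_ratio": aspect_ratio,
        "mode": mode,
        "generate_audio": generate_audio,  # on by default, as in the playground
    }
    # Start and end frames are optional and apply to single-shot mode only.
    if start_image_url:
        body["start_image_url"] = start_image_url
    if end_image_url:
        body["end_image_url"] = end_image_url
    return json.dumps(body)
```

The returned string could then be sent as the body of an HTTP POST with a `Content-Type: application/json` header; the URL and authentication scheme are not covered here.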

Use cases

Built for


Advertising - Single prompt or multi-shot clips up to 15 seconds for ads and promos

Short-form social - Portrait or square formats for reels and short video feeds

Story beats - Multi-shot prompts for simple scene changes within one render

Product and brand - Start and end frames to lock first and last looks when not using multi-shot

Tests and drafts - Fast iteration on motion, framing, and audio before a finishing pass

FAQ

About Kling 3.0 API

Kling 3.0 generates video from text. A single-shot request uses one prompt and a 3 to 15 second duration with optional start and end images. Multi-shot uses 2 to 6 timed scenes with a combined length of 3 to 15 seconds.

Use single-shot for one continuous scene, with optional start and end frame images and your choice of audio, aspect ratio, and duration. Use multi-shot when you want several prompted segments in one output; multi-shot does not accept start or end frame uploads.
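The multi-shot limits (2 to 6 shots, each 1 to 15 seconds, 3 to 15 seconds combined) can be checked client-side before a job is submitted. The following is a minimal sketch, assuming each shot is a dict with hypothetical `prompt` and `duration` keys; the real shot schema is not given on this page.

```python
def validate_shots(shots):
    """Check shots against the multi-shot limits and return the total seconds.

    The 'prompt' and 'duration' keys are assumed names, not the documented schema.
    """
    if not 2 <= len(shots) <= 6:
        raise ValueError("multi-shot needs 2 to 6 shots")
    total = 0
    for index, shot in enumerate(shots, start=1):
        if not str(shot.get("prompt", "")).strip():
            raise ValueError(f"shot {index} has no prompt")
        duration = shot.get("duration")
        if not isinstance(duration, (int, float)) or not 1 <= duration <= 15:
            raise ValueError(f"shot {index} must last 1 to 15 seconds")
        total += duration
    # The combined length is capped separately from the per-shot length.
    if not 3 <= total <= 15:
        raise ValueError("combined duration must be 3 to 15 seconds")
    return total


shots = [
    {"prompt": "Wide shot of a harbor at dawn", "duration": 4},
    {"prompt": "Close-up of a fishing net being hauled in", "duration": 6},
]
print(validate_shots(shots))  # prints 10
```

Because the combined cap is 15 seconds, two shots of 10 seconds each would pass the per-shot check but fail the total check.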

Standard maps to 720p and Pro to 1080p. Aspect ratio can be 16:9, 9:16, or 1:1.

Generated audio can be turned on or off when creating the job. The default in the playground is on.