Kling O1
What is Kling O1?
Kling O1 is Kuaishou's unified multimodal video model. It handles structured prompting with a full reference bundle in one call: up to 7 raw reference images (image_urls), persistent IMAGE-only elements, optional start and end frames, and an optional reference video with Reference or Transform behavior. The modes are: omitted (pure text-to-video with no reference inputs), elements (image_urls plus persistent elements, with a combined cap of 7), start_end_frame, transform, and video_reference. The constraints to know: O1 is single-shot only, caps clips at 10 seconds, and has no native audio. If the brief needs multi-shot, a 15-second single shot, or video elements, move to Kling 3.0 or 3.0 Omni.
Key features of Kling O1
Five features cover what production teams use O1 for.
Up to 7 raw reference images
Pass image_urls in elements mode and reference them inline as @image_1 through @image_7. Useful for brand work where the brief locks character, product, and environment in one composition.
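The inline reference syntax can be checked before a request is sent. The sketch below is a minimal pre-flight check, not part of any official client: the helper name check_image_refs is hypothetical, and it assumes that @image_N maps to the Nth entry of image_urls, counting from 1.

```python
import re

MAX_IMAGES = 7  # cap on raw reference images stated in the text


def check_image_refs(prompt: str, image_urls: list[str]) -> list[int]:
    """Return the sorted @image_N indices used in the prompt.

    Assumption: @image_N points at the Nth entry of image_urls (1-based).
    Raises ValueError when the list is too long or a reference has no match.
    """
    if len(image_urls) > MAX_IMAGES:
        raise ValueError(f"image_urls has {len(image_urls)} entries, cap is {MAX_IMAGES}")
    used = sorted({int(n) for n in re.findall(r"@image_(\d+)", prompt)})
    for n in used:
        if not 1 <= n <= len(image_urls):
            raise ValueError(f"@image_{n} has no matching entry in image_urls")
    return used
```

Running the check client-side catches a dangling @image_N before the call is made.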
Persistent IMAGE-only elements
The elements array stores reusable subjects on the Kling account. Each element accepts up to four image references. Reference them in the prompt as @element_1, @element_2.
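The two limits that apply here (up to four image references per element, and a combined cap of 7 across image_urls and elements) can be sketched as a small budget check. The shape of an element is not spelled out on this page, so the example models each element as a plain list of image URLs. That shape, and the helper name check_reference_budget, are assumptions for illustration only.

```python
MAX_REFS_PER_ELEMENT = 4  # "up to four image references" per element
MAX_COMBINED = 7          # combined cap of image_urls plus elements


def check_reference_budget(image_urls: list[str], elements: list[list[str]]) -> int:
    """Return the combined reference count, or raise ValueError if a cap is exceeded.

    Assumption: each element is represented as a list of image URLs.
    """
    for i, refs in enumerate(elements, start=1):
        if len(refs) > MAX_REFS_PER_ELEMENT:
            raise ValueError(f"@element_{i} has {len(refs)} images, cap is {MAX_REFS_PER_ELEMENT}")
    total = len(image_urls) + len(elements)
    if total > MAX_COMBINED:
        raise ValueError(f"{total} references in the bundle, cap is {MAX_COMBINED}")
    return total
```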
Start and end frame support
start_end_frame mode lets you anchor the opening and closing compositions with explicit images. Good for ads where the first and final beats are fixed.
Reference and Transform video modes
video_mode: video_reference treats the source as a style or motion guide. video_mode: transform reshapes the reference clip itself. Both cap inputs at 4.
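As a rough illustration of how the two video modes might be selected, the sketch below builds the input object for a reference-video request. The field names video_url, video_mode, and keep_audio appear on this page, but their placement inside input and the boolean type of keep_audio are assumptions. The helper name build_video_input is hypothetical.

```python
VIDEO_MODES = ("video_reference", "transform")


def build_video_input(prompt: str, video_url: str,
                      video_mode: str = "video_reference",
                      keep_audio: bool = False) -> dict:
    """Build an `input` object for a reference-video request.

    Assumptions: these fields sit directly under `input`, and
    keep_audio is a boolean flag.
    """
    if video_mode not in VIDEO_MODES:
        raise ValueError(f"video_mode must be one of {VIDEO_MODES}, got {video_mode!r}")
    if not video_url:
        raise ValueError("video_url is required for video_reference and transform")
    return {
        "prompt": prompt,
        "video_url": video_url,
        "video_mode": video_mode,
        "keep_audio": keep_audio,
    }
```

Defaulting to video_reference keeps the lighter conditioning as the baseline, so transform has to be requested explicitly.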
Standard or Pro output
Std outputs 720p for layout exploration; Pro outputs 1080p for delivery cuts. Duration runs from 3 to 10 seconds in either mode.
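The output limits translate into a short validation step. In this sketch the value "pro" comes from the request sample on this page, while "std" as the Standard value is an assumption. The helper name check_output is hypothetical.

```python
# "pro" appears in the sample request; "std" is an assumed value for Standard.
RESOLUTION_BY_MODE = {"std": "720p", "pro": "1080p"}
MIN_DURATION_S = 3
MAX_DURATION_S = 10


def check_output(mode: str, duration: int) -> str:
    """Return the output resolution for a mode, or raise ValueError on bad settings."""
    if mode not in RESOLUTION_BY_MODE:
        raise ValueError(f"mode must be one of {sorted(RESOLUTION_BY_MODE)}, got {mode!r}")
    if not MIN_DURATION_S <= duration <= MAX_DURATION_S:
        raise ValueError(
            f"duration must be {MIN_DURATION_S} to {MAX_DURATION_S} seconds, got {duration}"
        )
    return RESOLUTION_BY_MODE[mode]
```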
Best for
Heavily-referenced single shots
Up to 7 reference images plus persistent elements in one call. Useful for brand campaigns with locked subject continuity.
Continuity edits via bookend frames
Optional start and end frames anchor the opening and closing compositions in start_end_frame mode.
Style or timing transfer from a reference video
Reference or Transform modes on a reference clip. Useful for motion templates and style transfers in a single shot.
Audio-preserved performance clips
keep_audio carries the source soundtrack through when video_url is set, useful for music-driven cuts.
Structured agency briefs
When the brief specifies many fixed elements (character, product, environment), the reference bundle keeps them all in frame.
Variants
Two output tracks plus the video_mode switch on the optional reference video. Pick by resolution and by how aggressive the reference conditioning should be.
Std
The 720p output mode. Good for layout exploration and short-form social work.
Pro
The 1080p output mode. Use it when the clip is the delivery cut and resolution is the contract.
Reference mode (video_reference)
Treats the optional reference video as a style or motion guide. The prompt and other references shape the new content while inheriting feel from the reference clip.
Transform mode
Reshapes the reference clip's behavior. Use it when the source should be retimed or restyled rather than just referenced.
Use cases
- Build a heavily-referenced brand spot in a single 10-second cut by passing 5 image_urls (character, product, two environments, brand palette) and writing a prompt that names each as @image_1 through @image_5.
- Anchor the open and close of an ad with start_end_frame mode by passing the opening hero and the closing logo card as explicit frames.
- Match the look of an existing piece by handing the reference clip to video_reference mode and letting the model inherit feel without copying the source frame-for-frame.
When to use Kling O1
Use Kling 3.0 or 3.0 Omni instead when:
- You need multi-shot mode (2 to 6 connected scenes)
- You need a 15-second single shot (O1 caps at 10)
- You need VIDEO elements in your reference bundle (use Kling 3.0 Omni)
- You need native AI audio in the same call (use Kling 3.0)
- You need 4K output (use Kling 3.0 or 3.0 Omni)
Stay on Kling O1 when the brief is a heavily-referenced single shot at 720p or 1080p and the structured @image and @element syntax fits your client code.
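The checklist above can be encoded as a small helper that lists the reasons a brief falls outside O1. This is an illustrative sketch: the Brief fields and the function name are hypothetical, and the returned strings only restate the guidance in this section.

```python
from dataclasses import dataclass


@dataclass
class Brief:
    """Hypothetical summary of what a brief asks for."""
    shots: int = 1
    duration_s: int = 5
    video_elements: bool = False
    native_audio: bool = False
    resolution: str = "1080p"


def reasons_to_leave_o1(brief: Brief) -> list[str]:
    """Return the reasons to pick another model; an empty list means stay on O1."""
    reasons = []
    if brief.shots > 1:
        reasons.append("multi-shot: use Kling 3.0 or 3.0 Omni")
    if brief.duration_s > 10:
        reasons.append("single shot over 10 seconds: use Kling 3.0 or 3.0 Omni")
    if brief.video_elements:
        reasons.append("VIDEO elements: use Kling 3.0 Omni")
    if brief.native_audio:
        reasons.append("native AI audio: use Kling 3.0")
    if brief.resolution.lower() == "4k":
        reasons.append("4K output: use Kling 3.0 or 3.0 Omni")
    return reasons
```

Returning every matching reason, rather than a single model name, leaves the final choice to the caller when a brief triggers more than one condition.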
API examples
Call Kling O1 from any language by POSTing to /v1/tasks. Full parameter docs live at docs.unifically.com/models/video/kling/kling-o1-video.
curl -X POST https://api.unifically.com/v1/tasks \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "kuaishou/kling-o1-video",
    "input": {
      "prompt": "A cinematic drone shot over a misty forest at dawn",
      "duration": 10,
      "mode": "pro"
    }
  }'
A successful submission returns a task_id. Poll GET /v1/tasks/<task_id> or set a callback_url on the request to receive the finished result.
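The same submit-then-poll flow might look like this in Python, using only the standard library. It is a sketch under stated assumptions, not a reference client. The response schema is not described on this page, so the caller supplies its own is_done predicate instead of the code guessing a status field. The helper names are hypothetical.

```python
import json
import time
import urllib.request

API_BASE = "https://api.unifically.com"


def build_submit_request(api_key: str, task: dict) -> urllib.request.Request:
    """Build the POST /v1/tasks request without sending it."""
    return urllib.request.Request(
        f"{API_BASE}/v1/tasks",
        data=json.dumps(task).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def fetch_task(api_key: str, task_id: str) -> dict:
    """GET /v1/tasks/<task_id> and decode the JSON body."""
    req = urllib.request.Request(
        f"{API_BASE}/v1/tasks/{task_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def poll_until(fetch, is_done, interval_s: float = 5.0, max_tries: int = 60) -> dict:
    """Call fetch() until is_done(body) is true, sleeping between tries.

    is_done is caller-supplied because the status field name is an unknown here.
    """
    for _ in range(max_tries):
        body = fetch()
        if is_done(body):
            return body
        time.sleep(interval_s)
    raise TimeoutError(f"task not finished after {max_tries} polls")
```

A caller would send the request with urllib.request.urlopen(build_submit_request(...)), read task_id from the response, then pass lambda: fetch_task(api_key, task_id) to poll_until. Setting callback_url on the request avoids polling entirely.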
FAQs
Kling O1 is Kuaishou's unified multimodal video model. It accepts up to 7 raw reference images, persistent IMAGE-only elements, optional start and end frames, and an optional reference video with Reference or Transform behavior. Single-shot only, duration 3 to 10 seconds, output at Standard (720p) or Pro (1080p).
Use @image_1 through @image_7 for raw image_urls and @element_1, @element_2 for persistent elements. Reference them inline in the prompt string. Pass the reference clip via video_url for transform or video_reference modes.
O1 caps single-clip duration at 10 seconds, supports IMAGE-only elements, and skips multi-shot mode and native audio. Kling 3.0 Omni stretches single shots to 15 seconds, allows multi-shot stacks, supports IMAGE and VIDEO elements, has native audio, and adds 4K output.
Reference treats the clip as a style or motion guide; the prompt and other references shape the new content while inheriting feel from the reference. Transform reshapes the reference clip itself. Pick Reference for inspiration, Transform when the source should be reshaped.
Kling O1 and Kling 2.1 Master serve different niches. O1 is built around structured prompting with reference bundles. Master is the dedicated Pro-only path inside the older 2.1 family, with no reference image support. Pick O1 when you need reference bundles and Master when a simpler 2.1 Pro pipeline is enough.