Kling 3.0 Omni
What is Kling 3.0 Omni?
Kling 3.0 Omni is the full-reference variant of Kling 3.0. Where the base 3.0 endpoint covers prompt, start and end frames, multi-shot, native audio, and elements, Omni adds raw reference images (image_urls, max 7), an explicit video_mode switch, and two reference-video behaviors (Reference and Transform) on top. Modes are: omitted (pure text-to-video, no inputs), elements (image_urls and persistent elements, combined cap of 7), start_end_frame, transform, and video_reference. Output is 720p (Standard), 1080p (Pro), or 4K, in single-shot 3 to 15 second clips or multi-shot stacks of 2 to 6 scenes. Native audio is supported in the same call. Use Omni when the brief is heavy on reference assets or when the reference is a video.
Key features of Kling 3.0 Omni
Five features define what Omni adds over base Kling 3.0.
Up to 7 raw reference images
Pass image_urls in elements mode and reference them inline as @image_1 through @image_7. Raw images and persistent elements share a combined cap of 7. Useful when the brief locks character, product, environment, and brand assets all at once.
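A minimal Python sketch of the @image_N convention, added for illustration only. It assumes that @image_1 maps to the first entry of image_urls, and that image_urls and video_mode sit inside the input object next to prompt; neither detail is confirmed here, and the helper name check_image_tokens is hypothetical. The combined cap with elements is not checked in this sketch.

```python
import re

MAX_REFERENCE_IMAGES = 7  # cap on image_urls stated in this section


def check_image_tokens(prompt: str, image_urls: list[str]) -> list[int]:
    """Return the @image_N indices used in the prompt; raise if one is out of range."""
    if len(image_urls) > MAX_REFERENCE_IMAGES:
        raise ValueError(f"at most {MAX_REFERENCE_IMAGES} image_urls are allowed")
    used = sorted({int(n) for n in re.findall(r"@image_(\d+)", prompt)})
    for n in used:
        # Assumption: tokens are 1-based and follow the order of image_urls.
        if not 1 <= n <= len(image_urls):
            raise ValueError(f"@image_{n} has no matching entry in image_urls")
    return used


image_urls = [
    "https://example.com/character.png",    # @image_1
    "https://example.com/product.png",      # @image_2
    "https://example.com/environment.png",  # @image_3
]
prompt = "@image_1 holds @image_2 while walking through @image_3 at dusk"

# Assumed layout: reference fields live inside "input" alongside the prompt.
payload_input = {
    "prompt": prompt,
    "video_mode": "elements",
    "image_urls": image_urls,
}

print(check_image_tokens(prompt, image_urls))  # prints [1, 2, 3]
```

Running a check like this before submission catches a prompt that mentions @image_4 when only three images were supplied.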
Persistent elements for image or video assets
The elements array stores subjects on the Kling account for the duration of the call and supports both image and video element types. Each element accepts up to four image_urls or a single video_url.
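The per-element limits can be sketched as a small Python check. This is an illustration under assumptions: the keys image_urls and video_url are taken from the text above, but treating them as mutually exclusive is only one reading of "or", and the full element schema may carry other fields not shown here. The function name validate_element is hypothetical.

```python
MAX_IMAGES_PER_ELEMENT = 4  # "up to four image_urls" per element


def validate_element(element: dict) -> str:
    """Classify an element as 'image' or 'video'; raise on shapes the text rules out."""
    images = element.get("image_urls") or []
    video = element.get("video_url")
    if images and video:
        # Assumption: an element is either image-based or video-based, never both.
        raise ValueError("an element takes image_urls or a video_url, not both")
    if video:
        return "video"
    if 1 <= len(images) <= MAX_IMAGES_PER_ELEMENT:
        return "image"
    raise ValueError("an image element needs 1 to 4 image_urls")


elements = [
    {"image_urls": ["https://example.com/hero_front.png",
                    "https://example.com/hero_side.png"]},
    {"video_url": "https://example.com/mascot_turnaround.mp4"},
]

print([validate_element(e) for e in elements])  # prints ['image', 'video']
```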
Reference and Transform video modes
video_mode: video_reference treats the source clip as a style or motion guide. video_mode: transform reshapes the clip itself. Both cap the combined image_urls plus elements at 4 and require aspect_ratio: auto so the model picks the output ratio.
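The two constraints on reference-video calls can be checked client-side before submitting. The Python sketch below is illustrative: it uses the video_reference and transform mode values listed in the overview, assumes all fields sit inside the input object, and treats a missing aspect_ratio as an error, which may be stricter than the endpoint itself. The function name is hypothetical.

```python
REFERENCE_VIDEO_MODES = {"video_reference", "transform"}
MAX_INPUTS_WITH_VIDEO = 4  # combined image_urls + elements when a reference video is used


def validate_reference_input(inp: dict) -> None:
    """Raise ValueError if a reference-video input breaks the cap or aspect rule."""
    if inp.get("video_mode") not in REFERENCE_VIDEO_MODES:
        return  # the rules below only apply to the two reference-video modes
    combined = len(inp.get("image_urls") or []) + len(inp.get("elements") or [])
    if combined > MAX_INPUTS_WITH_VIDEO:
        raise ValueError(
            f"image_urls plus elements is {combined}, cap is {MAX_INPUTS_WITH_VIDEO}"
        )
    if inp.get("aspect_ratio") != "auto":
        raise ValueError("aspect_ratio must be 'auto' when a reference video is set")


ok_input = {
    "prompt": "Restyle the clip as a hand-painted watercolor",
    "video_mode": "transform",
    "video_url": "https://example.com/source_clip.mp4",
    "image_urls": ["https://example.com/style_a.png", "https://example.com/style_b.png"],
    "aspect_ratio": "auto",
}

validate_reference_input(ok_input)  # passes silently
```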
Multi-shot up to 6 scenes per call
multi_shots accepts 2 to 6 entries with per-shot prompts and durations. Total length stays within 3 to 15 seconds. Useful for narrative arcs and serial character work where consistency across cuts matters.
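As a rough illustration, the shot-count and total-length limits can be verified with a few lines of Python. The per-shot key names prompt and duration are assumptions based on the description above, not confirmed field names, and validate_multi_shots is a hypothetical helper.

```python
def validate_multi_shots(shots: list[dict]) -> int:
    """Check the 2-6 shot and 3-15 second limits; return the total duration in seconds."""
    if not 2 <= len(shots) <= 6:
        raise ValueError("multi_shots needs between 2 and 6 entries")
    total = sum(shot["duration"] for shot in shots)  # assumed key name: "duration"
    if not 3 <= total <= 15:
        raise ValueError(f"total duration is {total}s, must be within 3 to 15 seconds")
    return total


multi_shots = [
    {"prompt": "Wide shot: the courier leaves the depot at dawn", "duration": 4},
    {"prompt": "Tracking shot: the courier weaves through traffic", "duration": 4},
    {"prompt": "Close-up: the courier hands over the parcel", "duration": 4},
]

print(validate_multi_shots(multi_shots))  # prints 12
```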
720p, 1080p, or 4K with native audio
Standard renders at 720p, Pro at 1080p, and a dedicated 4K mode delivers 4K straight from the endpoint. native_audio is available on the same call.
Best for
Heavily-referenced campaign work
Up to 7 reference images plus persistent elements. Useful when the brief has many fixed assets (character, product, environment, brand kit).
Style or motion transfer from a reference video
video_mode: video_reference for style or motion transfer, video_mode: transform for retiming and reshaping. Useful for "match the motion of @video" and similar briefs.
Multi-shot with consistent characters
Combines the multi-shot mode of Kling 3.0 with the heavier reference handling. Useful for serial character work.
Auto aspect ratio across mixed references
When a reference video is set, the model picks the output aspect ratio. Useful when references span square, vertical, and widescreen.
Audio-preserved style transfers
Keep native audio aligned to the visual while restyling the look. Useful for music-driven cuts and serial brand work.
Variants
Three output modes plus the video_mode switch. Pick by resolution and by whether the brief leads with images, frames, or a reference video.
Standard
The 720p output mode. Cheaper per clip and faster, useful for layout exploration and concept passes.
Pro
The 1080p output mode. The default for delivery cuts that need to land on a hero unit or paid placement.
4K
A dedicated 4K mode for premium delivery straight from the endpoint.
Reference (video_mode: video_reference)
Treats the reference video as a style or motion guide for the prompted result. The combined image_urls plus elements cap is 4, and aspect_ratio must be auto.
Transform (video_mode: transform)
Applies stronger edits to the supplied reference clip, or retimes it. Same cap and aspect rules as Reference.
Multi-shot
Pass a multi_shots array of 2 to 6 entries. Each shot has its own prompt and duration. Total length stays between 3 and 15 seconds.
Use cases
Run a campaign that locks character, product, and environment across the cut by passing all three as raw reference images (max 7) and writing a prompt that references them as @image_1, @image_2, @image_3.
Match the motion of an existing trend clip by setting video_mode to video_reference, passing the trend video as video_url, and letting the model inherit feel without copying frames.
Reshape a piece of footage with a stronger restyle by switching to video_mode: transform.
Build a 12-second serial cut as three connected shots in the multi_shots array, with a persistent element holding the hero character across all three.
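The motion-matching use case above might translate into a request body like the one this Python sketch builds. The model name and the prompt and mode fields come from the API example on this page; placing video_mode, video_url, and aspect_ratio inside input is an assumption, and build_task is a hypothetical helper, so check the parameter docs before relying on this layout.

```python
import json


def build_task(prompt: str, **input_fields) -> dict:
    """Assemble a task body; extra keyword arguments are merged into "input"."""
    return {
        "model": "kuaishou/kling-3.0-omni-video",
        "input": {"prompt": prompt, **input_fields},
    }


body = build_task(
    "A dancer on a rooftop at golden hour, matching the motion of the reference clip",
    video_mode="video_reference",                    # clip acts as a style or motion guide
    video_url="https://example.com/trend_clip.mp4",  # placeholder URL
    aspect_ratio="auto",                             # required when a reference video is set
    mode="pro",
)

print(json.dumps(body, indent=2))
```

Switching video_mode to transform in the same call would cover the stronger-restyle use case.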
API examples
Call Kling 3.0 Omni from any language by POSTing to /v1/tasks. Full parameter docs live at docs.unifically.com/models/video/kling/kling-3.0-omni-video.
curl -X POST https://api.unifically.com/v1/tasks \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "kuaishou/kling-3.0-omni-video",
    "input": {
      "prompt": "A majestic eagle soaring through mountain peaks at sunset",
      "duration": 5,
      "mode": "pro"
    }
  }'
Successful submission returns a task_id. Poll GET /v1/tasks/<task_id> or set a callback_url on the request to receive the finished result.
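A hedged Python sketch of the submit-then-poll flow, using only the standard library. The URL, headers, and body mirror the curl call on this page. The response shape is an assumption: the status field and the "completed" and "failed" values are placeholders, not documented names, and should be replaced with whatever the task object actually returns. The function names are hypothetical.

```python
import json
import time
import urllib.request

API_BASE = "https://api.unifically.com"


def build_submit_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST /v1/tasks request."""
    return urllib.request.Request(
        f"{API_BASE}/v1/tasks",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def fetch_task(api_key: str, task_id: str) -> dict:
    """GET /v1/tasks/<task_id> and decode the JSON body."""
    req = urllib.request.Request(
        f"{API_BASE}/v1/tasks/{task_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def poll_task(fetch, task_id: str, interval: float = 5.0, max_attempts: int = 60,
              done_states=("completed", "failed")) -> dict:
    """Call fetch(task_id) until the task reports a terminal state.

    Assumption: the task object has a "status" field; done_states are placeholder values.
    """
    for _ in range(max_attempts):
        task = fetch(task_id)
        if task.get("status") in done_states:
            return task
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} did not finish after {max_attempts} polls")


payload = {
    "model": "kuaishou/kling-3.0-omni-video",
    "input": {
        "prompt": "A majestic eagle soaring through mountain peaks at sunset",
        "duration": 5,
        "mode": "pro",
    },
}
request = build_submit_request("YOUR_API_KEY", payload)
print(request.get_method(), request.full_url)
# To send: urllib.request.urlopen(request), then read task_id from the response and call
# poll_task(lambda tid: fetch_task("YOUR_API_KEY", tid), task_id)
```

Passing the fetch function in as an argument keeps the polling loop testable without network access. Setting a callback_url on the request avoids polling altogether.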
FAQs
People also ask
What is Kling 3.0 Omni?
Kling 3.0 Omni is the full-reference variant of Kling 3.0. One generate call accepts up to 7 reference images, persistent elements (image or video), optional start and end frames, and an optional reference video with Reference or Transform behavior. Single-shot 3 to 15 seconds, multi-shot 2 to 6 scenes, 720p / 1080p / 4K.
How is Omni different from base Kling 3.0?
Base Kling 3.0 covers prompt, frame images, multi-shot, native audio, and elements. Omni stacks on raw reference images (image_urls, max 7), a video_mode switch with explicit Transform and Video Reference modes, and Auto aspect ratio when a reference video is in play. The base endpoint has no video_url and no video_mode at all.
How do I reference images and elements in the prompt?
Use @image_1 through @image_7 for raw reference images and @element_1, @element_2 for persistent elements from the elements array. Both are referenced inline in the prompt string.
What is the difference between Reference and Transform?
Reference treats the clip as a guide for style or motion while the prompt steers a new result. Transform aims at stronger edits or retiming of the supplied clip. Both modes cap the combined image_urls plus elements at 4 and require aspect_ratio to be auto.
Which aspect ratios does Kling 3.0 Omni support?
When a reference video is set, aspect_ratio must be auto and the model picks the output ratio based on the input. For text-to-video and elements modes you can pass 1:1, 16:9, or 9:16 explicitly.