
Kling O1 API


Kling O1

What is Kling O1?

Kling O1 is Kuaishou's unified multimodal video model. It handles a structured prompt and a bundle of references in one call: up to 7 raw reference images (image_urls), persistent IMAGE-only elements, optional start and end frames, and an optional reference video with Reference or Transform behavior. The modes are: omitted (pure text-to-video, no inputs), elements (image_urls plus persistent elements, combined cap of 7), start_end_frame, transform, and video_reference. The constraints to know: O1 is single-shot only, caps clips at 10 seconds, and generates no native audio. If the brief needs multi-shot, a 15-second single shot, or video elements, move to Kling 3.0 or 3.0 Omni.

Key features of Kling O1

Five features cover what production teams use O1 for.

Up to 7 raw reference images

Pass image_urls in elements mode and reference them inline as @image_1 through @image_7. Useful for brand work where the brief locks character, product, and environment in one composition.
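A quick client-side check can catch prompts that name an image that was never uploaded. The sketch below assumes that @image_N refers to the Nth entry of image_urls, counting from 1; the function names are hypothetical and not part of the Unifically API.

```python
import re

MAX_IMAGES = 7  # cap on raw reference images stated on this page


def referenced_image_indices(prompt: str) -> list[int]:
    """Return the sorted, de-duplicated N values of every @image_N token."""
    return sorted({int(n) for n in re.findall(r"@image_(\d+)", prompt)})


def check_image_tokens(prompt: str, image_urls: list[str]) -> list[int]:
    """Validate @image_N tokens against the supplied image_urls.

    Assumption: @image_N points at the Nth URL, 1-based.
    """
    if len(image_urls) > MAX_IMAGES:
        raise ValueError(f"{len(image_urls)} image_urls given, cap is {MAX_IMAGES}")
    indices = referenced_image_indices(prompt)
    out_of_range = [i for i in indices if not 1 <= i <= len(image_urls)]
    if out_of_range:
        raise ValueError(f"prompt references missing images: {out_of_range}")
    return indices
```

Running this before submission turns a wasted generation into an immediate local error.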

Persistent IMAGE-only elements

The elements array stores reusable subjects on the Kling account. Each element accepts up to four image references. Reference them in the prompt as @element_1, @element_2.
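The limits on elements can be checked before a request is built. This is a minimal sketch under two assumptions: each element is modeled as a plain list of its image URLs (the real element schema lives in the full parameter docs), and each element counts as one slot toward the combined cap of 7 that this page gives for image_urls plus elements.

```python
MAX_REFERENCES = 7          # combined cap for image_urls plus elements (from this page)
MAX_IMAGES_PER_ELEMENT = 4  # per-element image cap (from this page)


def check_reference_budget(image_urls: list[str], elements: list[list[str]]) -> int:
    """Return the number of reference slots used, or raise if a cap is exceeded.

    Assumptions: an element is represented as a list of image URLs, and
    each element consumes exactly one slot of the combined cap.
    """
    for position, element_images in enumerate(elements, start=1):
        if not 1 <= len(element_images) <= MAX_IMAGES_PER_ELEMENT:
            raise ValueError(
                f"@element_{position} has {len(element_images)} images, "
                f"expected 1 to {MAX_IMAGES_PER_ELEMENT}"
            )
    used = len(image_urls) + len(elements)
    if used > MAX_REFERENCES:
        raise ValueError(f"{used} references used, combined cap is {MAX_REFERENCES}")
    return used
```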

Start and end frame support

start_end_frame mode lets you anchor the open and close compositions with explicit images. Good for ads where the first and final beats are fixed.

Reference and Transform video modes

video_mode: video_reference treats the source as a style or motion guide. video_mode: transform reshapes the reference clip itself. Both cap inputs at 4.

Standard or Pro output

Std renders at 720p and suits layout exploration. Pro renders at 1080p and suits delivery cuts. Either mode accepts durations from 3 to 10 seconds.
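The output limits are easy to enforce locally. In the sketch below, "pro" is the mode value used in this page's API example, while "std" is an assumed spelling for the Standard track; the helper name is hypothetical.

```python
# Resolution per output mode as described on this page.
# "std" is an assumed parameter value; only "pro" appears in the API example.
OUTPUT_MODES = {"std": "720p", "pro": "1080p"}
MIN_DURATION, MAX_DURATION = 3, 10  # seconds, same range for both modes


def output_settings(mode: str, duration: int) -> dict:
    """Return the mode and duration fields, rejecting out-of-range values."""
    if mode not in OUTPUT_MODES:
        raise ValueError(f"unknown mode {mode!r}, expected one of {sorted(OUTPUT_MODES)}")
    if not MIN_DURATION <= duration <= MAX_DURATION:
        raise ValueError(
            f"duration {duration}s outside {MIN_DURATION} to {MAX_DURATION} seconds"
        )
    return {"mode": mode, "duration": duration}
```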

Best for

Heavily referenced single shots

Up to 7 reference images plus persistent elements in one call. Useful for brand campaigns with locked subject continuity.

Continuity edits via bookend frames

Optional start and end frames anchor the open and close compositions in start_end_frame mode.

Style or timing transfer from a reference video

Reference or Transform modes on a reference clip. Useful for motion templates and style transfers in a single shot.

Audio-preserved performance clips

keep_audio carries the source soundtrack through when video_url is set, useful for music-driven cuts.
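The reference-video fields depend on each other, which a small helper can make explicit. This sketch uses the names video_url, video_mode, and keep_audio from this page; treating keep_audio as a boolean and rejecting it when no video_url is given are assumptions, as is the helper itself.

```python
from typing import Optional

VIDEO_MODES = ("video_reference", "transform")  # the two behaviors named on this page


def video_inputs(
    video_url: Optional[str] = None,
    video_mode: Optional[str] = None,
    keep_audio: bool = False,
) -> dict:
    """Return the reference-video fields to merge into the request input.

    Assumptions: keep_audio is a boolean flag and is only meaningful
    when video_url is set.
    """
    if video_url is None:
        if keep_audio or video_mode is not None:
            raise ValueError("video_mode and keep_audio require video_url")
        return {}
    if video_mode not in VIDEO_MODES:
        raise ValueError(f"video_mode must be one of {VIDEO_MODES}, got {video_mode!r}")
    return {"video_url": video_url, "video_mode": video_mode, "keep_audio": keep_audio}
```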

Structured agency briefs

When the brief specifies many fixed elements (character, product, environment), the reference bundle keeps them all in frame.

Variants

O1 offers two output tracks, Std and Pro, plus a video_mode switch for the optional reference video. Pick by resolution and by how aggressively the reference should condition the result.

Std

The 720p output mode. Good for layout exploration and short-form social work.

Pro

The 1080p output mode. Use it when the clip is the delivery cut and resolution is the contract.

Reference mode (video_reference)

Treats the optional reference video as a style or motion guide. The prompt and other references shape the new content while inheriting feel from the reference clip.

Transform mode

Reshapes the reference clip's behavior. Use it when the source should be retimed or restyled rather than just referenced.

Use cases

  • Build a heavily referenced brand spot in a single 10-second cut: pass 5 image_urls (character, product, two environments, brand palette) and write a prompt that names each as @image_1 through @image_5.
  • Anchor the open and close of an ad with start_end_frame mode: pass the opening hero and the closing logo card as explicit frames.
  • Match the look of an existing piece: hand the reference clip to video_reference mode and let the model inherit its feel without copying the source frame for frame.
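The five-image brand spot could be assembled as a request body along these lines. The model ID, prompt, duration, and mode fields follow this page's API example; placing image_urls inside input is an assumption, the example.com URLs are placeholders, and how elements mode is selected is not shown on this page, so the sketch leaves that out.

```python
import json

MODEL_ID = "kuaishou/kling-o1-video"  # model ID from this page's API example


def build_brand_spot(prompt: str, image_urls: list[str],
                     duration: int = 10, mode: str = "pro") -> dict:
    """Build a task body; image_urls inside "input" is an assumed placement."""
    return {
        "model": MODEL_ID,
        "input": {
            "prompt": prompt,
            "image_urls": list(image_urls),
            "duration": duration,
            "mode": mode,
        },
    }


# Placeholder URLs, ordered so that @image_N matches the Nth entry (assumed mapping).
image_urls = [
    "https://example.com/character.png",      # @image_1
    "https://example.com/product.png",        # @image_2
    "https://example.com/environment-a.png",  # @image_3
    "https://example.com/environment-b.png",  # @image_4
    "https://example.com/palette.png",        # @image_5
]
prompt = (
    "@image_1 carries @image_2 from @image_3 into @image_4, "
    "graded in the colors of @image_5"
)
body = build_brand_spot(prompt, image_urls)
print(json.dumps(body, indent=2))
```

Check the field placement against the full parameter docs before sending the body to /v1/tasks.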

When to use Kling O1

Use Kling 3.0 or 3.0 Omni instead when:

  • You need multi-shot mode (2 to 6 connected scenes)
  • You need a 15-second single shot (O1 caps at 10)
  • You need VIDEO elements in your reference bundle (use Kling 3.0 Omni)
  • You need native AI audio in the same call (use Kling 3.0)
  • You need 4K output (use Kling 3.0 or 3.0 Omni)

Stay on Kling O1 when the brief is a heavily referenced single shot at 720p or 1080p and the structured @image and @element syntax fits your client code.
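The checklist above can be encoded as a small routing helper. The Brief type and recommend function are hypothetical; the rules mirror the bullets, and where two requirements point at different models the ordering of the checks is a judgment call rather than documented behavior.

```python
from dataclasses import dataclass


@dataclass
class Brief:
    """Hypothetical summary of what a shot needs."""
    duration_seconds: int = 10
    multi_shot: bool = False
    video_elements: bool = False
    native_audio: bool = False
    needs_4k: bool = False


def recommend(brief: Brief) -> str:
    """Map a brief to the model family this page suggests."""
    if brief.video_elements:
        return "Kling 3.0 Omni"   # VIDEO elements in the reference bundle
    if brief.native_audio:
        return "Kling 3.0"        # native AI audio in the same call
    if brief.multi_shot or brief.duration_seconds > 10 or brief.needs_4k:
        return "Kling 3.0 or 3.0 Omni"
    return "Kling O1"             # single shot, up to 10 s, 720p or 1080p
```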

API examples

Call Kling O1 from any language by POSTing to /v1/tasks. Full parameter docs live at docs.unifically.com/models/video/kling/kling-o1-video.

curl -X POST https://api.unifically.com/v1/tasks \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "kuaishou/kling-o1-video",
    "input": {
      "prompt": "A cinematic drone shot over a misty forest at dawn",
      "duration": 10,
      "mode": "pro"
    }
  }'

A successful submission returns a task_id. Poll GET /v1/tasks/<task_id>, or set a callback_url on the request to receive the finished result.
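A polling loop might look like the sketch below. The GET URL and Bearer header come from this page; the response schema is not shown here, so the "finished" test is passed in by the caller instead of being guessed. The function names, interval, and attempt limit are illustrative choices.

```python
import json
import time
import urllib.request
from typing import Callable

API_BASE = "https://api.unifically.com"


def fetch_task(task_id: str, api_key: str) -> dict:
    """GET /v1/tasks/<task_id> and decode the JSON body."""
    request = urllib.request.Request(
        f"{API_BASE}/v1/tasks/{task_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def poll_task(
    fetch: Callable[[], dict],
    is_finished: Callable[[dict], bool],
    interval_seconds: float = 5.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch() until is_finished(task) is true or attempts run out.

    is_finished must be written against the real response schema,
    which this sketch deliberately does not assume.
    """
    for attempt in range(max_attempts):
        task = fetch()
        if is_finished(task):
            return task
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError(f"task not finished after {max_attempts} attempts")
```

A caller would wire it up as poll_task(lambda: fetch_task(task_id, api_key), is_finished=...), with the predicate taken from the full parameter docs. For long renders, the callback_url route avoids holding a polling loop open.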

FAQs

People also ask

Kling O1 is Kuaishou's unified multimodal video model. It accepts up to 7 raw reference images, persistent IMAGE-only elements, optional start and end frames, and an optional reference video with Reference or Transform behavior. Single-shot only, duration 3 to 10 seconds, output at Standard (720p) or Pro (1080p).

Use @image_1 through @image_7 for raw image_urls and @element_1, @element_2 for persistent elements. Reference them inline in the prompt string. Pass the reference clip via video_url for transform or video_reference modes.

O1 caps single-clip duration at 10 seconds, supports IMAGE-only elements, and skips multi-shot mode and native audio. Kling 3.0 Omni stretches single shots to 15 seconds, allows multi-shot stacks, supports IMAGE and VIDEO elements, has native audio, and adds 4K output.

Reference treats the clip as a style or motion guide; the prompt and other references shape the new content while inheriting feel from the reference. Transform reshapes the reference clip itself. Pick Reference for inspiration, Transform when the source should be reshaped.

O1 and Master fill different niches. O1 is built around structured prompting with reference bundles; Master is the dedicated Pro-only path inside the older 2.1 family and has no reference image support. Pick O1 when you need reference bundles and Master when a simpler 2.1 Pro pipeline is enough.