Grok Imagine Video 1.5 API

Available through Unifically: one endpoint, one API key, USD billing.

What is Grok Imagine Video 1.5?

Grok Imagine Video 1.5 is xAI's image-to-video model, released in preview on May 30, 2026. You give it a starting image and a text prompt, and it animates that frame into a clip with motion and native sound in one pass. The source image becomes the first frame, so it sets composition, color, and identity, and the model holds the subject's shape as it moves. There is no text-to-video here: every generation needs an input image. It debuted at the top of the Arena image-to-video board at 1474 Elo, just ahead of Seedance 2.0, and it does one thing well: turning a frame you already have into motion with matching sound.

Key features of Grok Imagine Video 1.5

Glass and refraction from a single still

From one abstract render, the model animates a glossy morphic form so its surfaces flow like liquid mercury and the prismatic bands refract as the shape deforms. Translucent materials and accurate reflection are where many video models break, so this is a strong quality signal.

Water physics with designed sound

A breaking wave crests, folds, and explodes into foam, with spray drifting in the wind and water rushing back in white rivulets. The native audio carries the boom of the swell, the hiss of pullback, and wind across the coast, all timed to the motion.

Lip-synced dialogue while moving

A skateboarder accelerates as the handheld camera tracks his face, and his spoken line stays lip-matched while the background streaks past. Dialogue, ambient street sound, and motion are all generated together, with no separate audio edit.

Micro-expressions with a held frame

The subject touches her cheek and shifts from a gentle smile to a wide one at the camera, while the products and text in frame stay locked. Subtle facial motion plus a stable composition is what makes a clip usable for ads.

Best for

Product photo to motion ad

Animate a packaging or hero shot into a short clip with a matching music bed and sound effects, then output 9:16 for paid social.

Talking character clips

Turn a portrait into a lip-synced scene where the subject speaks a short line, with dialogue baked into the same pass.

Water, glass, and fire motion

Add believable physics to a still, an area where many video models break, and let the native audio carry the swell, hiss, or crackle.

Short-form social clips

Reels, TikTok, and Shorts where audio and picture must match. Both are generated together, so the cut lands without a separate mix.

Concept frames to life

Bring a finished render or concept frame into motion for trailers and scene loops, with no separate audio edit needed.

Quick motion tests

Run the same image with different prompts to pick the best take before scaling up resolution or duration.
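
A quick motion test amounts to building several request bodies that share one image and differ only in the prompt. The sketch below shows one way to do that in Python. The body layout mirrors the curl request in the API examples section of this page; the helper name build_motion_tests and the 5-second default duration are assumptions made for illustration, so check the parameter docs for the values the API actually accepts.

```python
import json

# Model identifier as used in the curl example on this page.
MODEL = "xai/grok-imagine-video-1.5-preview"


def build_motion_tests(image_url, prompts, duration=5,
                       resolution="720p", aspect_ratio="16:9"):
    """Return one request body per prompt, all sharing the same source image.

    Hypothetical helper: the default duration is an assumption chosen to keep
    test takes short, not a documented recommendation.
    """
    return [
        {
            "model": MODEL,
            "input": {
                "image_urls": [image_url],
                "prompt": prompt,
                "aspect_ratio": aspect_ratio,
                "duration": duration,
                "resolution": resolution,
            },
        }
        for prompt in prompts
    ]


if __name__ == "__main__":
    bodies = build_motion_tests(
        "https://example.com/portrait.png",
        [
            "She turns her head slowly toward the window. The camera stays still.",
            "She laughs and looks down. The camera pushes in slightly.",
        ],
    )
    # Print the bodies so they can be inspected before being POSTed to /v1/tasks.
    for body in bodies:
        print(json.dumps(body, indent=2))
```

Each body can then be submitted as its own task, and the best take rerun at the final duration.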

Use cases

Start with a clean image, then let the prompt say only what moves. A cosmetics brand can take a packaging shot and get a 720p clip of a presenter touching her cheek and speaking to camera while the products stay locked in frame. A studio can animate an abstract render so its glass surfaces ripple and refract, useful for title cards and loops. A documentary-style edit can drop in a tracked sprint or a skateboarding beat with spoken lines lip-matched to the subject. Because audio rides along, an early-morning street scene can carry footsteps, breathing, and distant traffic without any post work. The 15-second ceiling and 24 fps make it a fit for ad spots, trailers, and social cuts rather than long sequences.

Limitations

This is an image-to-video model only. There is no text-to-video; for a clip from text alone, use the base Grok Imagine Video model instead. It is a preview release, so behavior can change. Negative prompts are ignored, so you describe what you want rather than what to avoid. Output tops out at 720p, and longer 15-second clips are more prone to artifacts than 5-to-8-second clips.
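
These limits can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: check_request_input is a hypothetical helper, the field names follow the curl example in the API examples section, and the negative_prompt key is an assumed name used only to flag a field the model would ignore. The API may enforce different or additional rules.

```python
MAX_DURATION_SECONDS = 15  # ceiling stated in the text above
MAX_HEIGHT = 720           # output tops out at 720p


def check_request_input(request_input):
    """Return a list of problems found in a request's "input" object.

    An empty list means no problem was detected by these client-side checks.
    """
    problems = []

    # Image-to-video only: every generation needs an input image.
    if not request_input.get("image_urls"):
        problems.append("image_urls is required: the model is image-to-video only")

    # ASSUMPTION: a non-empty prompt is required alongside the image.
    if not (request_input.get("prompt") or "").strip():
        problems.append("prompt is empty")

    duration = request_input.get("duration")
    if duration is not None and duration > MAX_DURATION_SECONDS:
        problems.append(f"duration {duration}s exceeds the {MAX_DURATION_SECONDS}s ceiling")

    resolution = request_input.get("resolution")
    if resolution is not None:
        try:
            height = int(str(resolution).lower().rstrip("p"))
        except ValueError:
            problems.append(f"unrecognized resolution: {resolution!r}")
        else:
            if height > MAX_HEIGHT:
                problems.append(f"resolution {resolution} exceeds {MAX_HEIGHT}p")

    # ASSUMPTION: "negative_prompt" is a hypothetical key name.
    if "negative_prompt" in request_input:
        problems.append("negative prompts are ignored; describe what you want instead")

    return problems
```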

Grok Imagine Video 1.5 vs Seedance 2.0

On the Arena image-to-video board, Grok Imagine Video 1.5 (720p) ranks first at 1474 Elo, one point above Seedance 2.0 (720p) at 1473. On community votes the two are effectively tied. The practical split is audio: Grok 1.5 generates synced sound and lip-matched dialogue in the same pass, which Seedance does not. Use Grok 1.5 when the clip needs sound baked in; the leaderboard gap alone is too small to decide on.
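
To see why a one-point gap is too small to decide on, it helps to convert it into an expected head-to-head win rate. The sketch below assumes the standard logistic Elo formula with a 400-point scale; the page does not say exactly how the Arena board computes its ratings, so treat the result as an approximation.

```python
def elo_expected_score(rating_a, rating_b):
    """Expected score of A against B under the standard Elo formula.

    ASSUMPTION: 400-point logistic scale, as in classic Elo.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


if __name__ == "__main__":
    p = elo_expected_score(1474, 1473)
    print(f"{p:.4f}")  # prints 0.5014
```

Under that assumption, a one-point lead corresponds to winning about 50.1% of pairwise votes, which is close to a coin flip.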

API examples

Call Grok Imagine Video 1.5 from any language by POSTing to /v1/tasks. Full parameter docs live at docs.unifically.com/models/video/xai/grok-imagine-video-1.5.

curl -X POST https://api.unifically.com/v1/tasks \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "xai/grok-imagine-video-1.5-preview",
    "input": {
      "image_urls": ["https://example.com/portrait.png"],
      "prompt": "She touches her cheek and smiles gently, then smiles wide looking into the camera. The camera stays still. AUDIO: soft ambient room tone, a light breath, a gentle acoustic note.",
      "aspect_ratio": "16:9",
      "duration": 6,
      "resolution": "720p"
    }
  }'

A successful submission returns a task_id. Poll GET /v1/tasks/<task_id>, or set a callback_url on the request, to receive the finished video URL.
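
The same submit-then-poll flow can be sketched in Python with only the standard library. The endpoint paths, headers, and the task_id field come from this page. The response layout is an assumption: the sketch expects task_id at the top level of the submit response and a status field whose values include "completed" or "failed", neither of which is confirmed here. Check the parameter docs for the real schema before relying on it.

```python
import json
import time
import urllib.request

API_BASE = "https://api.unifically.com"

# ASSUMPTION: hypothetical status values; the real API may use other names.
TERMINAL_STATUSES = {"completed", "failed"}


def _request_json(method, path, api_key, body=None):
    """Send one JSON request to the API and return the decoded JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        API_BASE + path,
        data=data,
        method=method,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def submit_task(api_key, body, send=_request_json):
    """POST the request body to /v1/tasks and return the task_id."""
    response = send("POST", "/v1/tasks", api_key, body)
    # ASSUMPTION: task_id sits at the top level of the response.
    return response["task_id"]


def wait_for_task(api_key, task_id, send=_request_json,
                  interval=5.0, max_polls=60, sleep=time.sleep):
    """Poll GET /v1/tasks/<task_id> until an assumed terminal status appears."""
    for _ in range(max_polls):
        task = send("GET", f"/v1/tasks/{task_id}", api_key)
        if task.get("status") in TERMINAL_STATUSES:
            return task
        sleep(interval)
    raise TimeoutError(f"task {task_id} did not finish after {max_polls} polls")
```

To use it, pass the same JSON body as the curl example to submit_task, then hand the returned task_id to wait_for_task and read the video URL from whatever field the response documents. The send and sleep parameters exist so the flow can be exercised without network access. For long jobs, a callback_url avoids polling altogether.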

FAQs