Grok Imagine Video API

Grok Imagine

What is Grok Imagine?

Grok Imagine is xAI's video generation model. One endpoint covers text-to-video and image-to-video, with output from 1 to 10 seconds at 480p or 720p across five aspect ratios (1:1, 2:3, 3:2, 9:16, 16:9). A `video_preset` field picks between four style modes (custom, spicy, fun, normal), and a separate Extend endpoint chains a 6 or 10 second continuation onto any finished clip. The trade-off is resolution and length: Grok Imagine caps at 720p and 10 seconds per clip, but stays cheap and fast. Use it when the brief is high-volume social motion rather than 4K master files.

Key features of Grok Imagine

Five features cover what you'll actually use day to day.

Text-to-video and image-to-video on one endpoint

Pass `prompt` alone for text-to-video. Add `image_url` and the model uses it as the opening frame, then animates from the prompt. Same code path for both flows.
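
A minimal sketch of that shared code path, assuming the request body has the same shape as the curl example under API examples. The helper name `build_generate_body` is hypothetical, and placing `image_url` next to `prompt` inside `input` is an assumption; the field names themselves come from this page.

```python
import json
from typing import Optional


def build_generate_body(prompt: str, image_url: Optional[str] = None) -> dict:
    """Build a /v1/tasks request body for text-to-video or image-to-video.

    Assumption: image_url sits inside "input" alongside prompt, like the
    other fields in this page's curl example.
    """
    body = {
        "model": "xai/grok-imagine-video",
        "input": {"prompt": prompt},
    }
    if image_url is not None:
        # With image_url set, the image is used as the opening frame.
        body["input"]["image_url"] = image_url
    return body


text_to_video = build_generate_body("A paper boat drifting down a rainy street")
image_to_video = build_generate_body(
    "Slow push-in on the product",
    image_url="https://example.com/packshot.png",  # placeholder URL
)
print(json.dumps(image_to_video, indent=2))
```

The only difference between the two bodies is the optional `image_url` key, which is what lets one function serve both flows.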

Five aspect ratios up to 10 seconds

1:1, 2:3, 3:2, 9:16, and 16:9. Duration is an integer between 1 and 10 seconds. The same prompt re-rendered at 9:16 and 16:9 gives a matched horizontal-and-vertical pair without a second prompt pass.
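
As a rough illustration, the limits above can be checked client-side before a request is sent. The helper names `generate_input` and `matched_pair` are hypothetical; the allowed values are the ones listed on this page.

```python
ASPECT_RATIOS = ("1:1", "2:3", "3:2", "9:16", "16:9")


def generate_input(prompt: str, aspect_ratio: str, duration: int) -> dict:
    """Validate aspect ratio and duration, then return the input fields."""
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"aspect_ratio must be one of {ASPECT_RATIOS}")
    if not isinstance(duration, int) or not 1 <= duration <= 10:
        raise ValueError("duration must be an integer from 1 to 10")
    return {"prompt": prompt, "aspect_ratio": aspect_ratio, "duration": duration}


def matched_pair(prompt: str, duration: int) -> list:
    """Same prompt at 9:16 and 16:9: a vertical and a horizontal cut."""
    return [generate_input(prompt, ratio, duration) for ratio in ("9:16", "16:9")]


for fields in matched_pair("Drone shot over a foggy pine forest", 8):
    print(fields["aspect_ratio"], fields["duration"])
```

Each dict would go into the `input` object of its own Generate request.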

Four style presets

Custom is the default, with the prompt doing all the steering. Normal is a neutral house style. Fun biases toward bright, energetic motion. Spicy biases toward stylized, dramatic motion. Switch presets without changing the prompt.
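
One way to act on that is to generate a set of inputs that differ only in `video_preset`. This is a sketch; the helper name `preset_variants` is hypothetical, and the preset values are the four named on this page.

```python
import copy

PRESETS = ("custom", "normal", "fun", "spicy")


def preset_variants(base_input: dict, presets=("normal", "fun", "spicy")) -> dict:
    """Return one input dict per preset, identical except for video_preset."""
    variants = {}
    for preset in presets:
        if preset not in PRESETS:
            raise ValueError(f"unknown video_preset: {preset!r}")
        variant = copy.deepcopy(base_input)  # leave the caller's dict untouched
        variant["video_preset"] = preset
        variants[preset] = variant
    return variants


base = {"prompt": "A barista pouring latte art, handheld camera", "aspect_ratio": "9:16"}
ab_set = preset_variants(base)
for name, fields in ab_set.items():
    print(name, "->", fields["video_preset"])
```

Submitting each variant as its own task gives a like-for-like comparison, since the prompt and framing stay fixed.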

480p draft mode for fast turnaround

Set `resolution` to 480p when you want a faster pass on a brainstorm or A/B prompt. Switch to 720p once the result is right. Same model, two cost-and-speed points.
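
A small sketch of the draft-then-final workflow: keep one set of input fields and change only `resolution`. The helper name `at_resolution` is hypothetical.

```python
RESOLUTIONS = ("480p", "720p")


def at_resolution(input_fields: dict, resolution: str) -> dict:
    """Return a copy of the input fields with resolution set."""
    if resolution not in RESOLUTIONS:
        raise ValueError(f"resolution must be one of {RESOLUTIONS}")
    return {**input_fields, "resolution": resolution}


base = {"prompt": "Neon sign flickering in the rain", "duration": 6}
draft = at_resolution(base, "480p")  # fast pass while iterating on the prompt
final = at_resolution(base, "720p")  # re-render once the prompt is settled
print(draft["resolution"], final["resolution"])
```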

Extend for clips longer than 10 seconds

A separate Extend endpoint takes a finished `task_id` and adds a 6 or 10 second continuation. Preset mode takes only `video_preset` (spicy or normal); custom mode lets you supply a new `prompt` together with `extend_at` and `extend_duration`.
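
The two Extend modes can be sketched as two payload builders. The field names come from this page, as do the preset values listed under Variants (spicy or normal). The Extend route and the way `task_id` is passed are not shown here, so treating `task_id` as a body field is an assumption, and both helper names are hypothetical.

```python
def extend_preset_fields(task_id: str, video_preset: str) -> dict:
    """Preset mode: only video_preset is supplied."""
    if video_preset not in ("spicy", "normal"):
        raise ValueError("preset mode accepts 'spicy' or 'normal'")
    # Assumption: task_id travels in the body alongside the other fields.
    return {"task_id": task_id, "video_preset": video_preset}


def extend_custom_fields(
    task_id: str, prompt: str, extend_at: float, extend_duration: int
) -> dict:
    """Custom mode: prompt, extend_at and extend_duration go together."""
    if not prompt:
        raise ValueError("custom mode needs a prompt")
    if extend_at < 0:
        raise ValueError("extend_at is a position in seconds and cannot be negative")
    if extend_duration not in (6, 10):
        raise ValueError("extend_duration must be 6 or 10")
    return {
        "task_id": task_id,
        "prompt": prompt,
        "extend_at": extend_at,
        "extend_duration": extend_duration,
    }


print(extend_preset_fields("task_123", "normal"))
print(extend_custom_fields("task_123", "The camera rises above the skyline", 10, 6))
```

Making all three custom-mode arguments required mirrors the rule that they must be supplied together.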

Best for

High-volume social ads

A 720p clip at 9:16 or 1:1 is the native delivery size for short-form social, with no upscale step required after the render.

Vertical short-form for TikTok and Reels

9:16 at 720p is the right size for in-feed playback. Default to the Normal or Fun preset, then re-roll on Spicy when a cut needs more visual punch.

Animated product shots from packaging stills

Pass a flat product photo as `image_url` and prompt a slow turntable or push-in. Six seconds is enough for an e-commerce hero loop.

Storyboard animatics for client review

480p drafts at 1:1 or 16:9 turn around fast, so a client meeting can look at five takes instead of arguing over the one.

Editorial B-roll on a deadline

Atmospheric prompts on the Normal preset produce neutral filler footage that drops into a cut without retiming.

Continued narratives via Extend

Chain a Generate call plus two Extend calls of 10 seconds each to build a 30-second cut without re-rendering the opening.
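
The arithmetic is 10 + 10 + 10 = 30 seconds. For other target lengths, a short planner can work out which mix of 6 and 10 second continuations fits. This sketch assumes each Extend call adds its full duration to the end of the clip; the function name `plan_extends` is hypothetical.

```python
from typing import List, Optional


def plan_extends(target_seconds: int, generate_seconds: int = 10) -> Optional[List[int]]:
    """Find Extend durations that bring a clip to exactly target_seconds.

    Tries 10 s steps first, which keeps the number of Extend calls low.
    Returns None when no mix of 6 s and 10 s continuations fits.
    """
    if not 1 <= generate_seconds <= 10:
        raise ValueError("a Generate call produces 1 to 10 seconds")
    remaining = target_seconds - generate_seconds
    if remaining < 0:
        return None
    for tens in range(remaining // 10, -1, -1):
        rest = remaining - 10 * tens
        if rest % 6 == 0:
            return [10] * tens + [6] * (rest // 6)
    return None


print(plan_extends(30))  # [10, 10]
print(plan_extends(22))  # [6, 6]
print(plan_extends(25))  # None
```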

Variants

Grok Imagine has four style presets in the Generate call plus a dedicated Extend route. Each one is a different setting on the same API.

Custom

The default preset. The prompt does all the steering, with no built-in tonal bias. Good when you want full control of the look and the prompt is doing the heavy lifting.

Normal

Neutral house style. No tonal bias either way. Good for editorial B-roll, product loops, and any cut that should read straight.

Fun

Brighter palette, more energetic motion. Good for short-form social where the brief is energy and approachability.

Spicy

Heightened motion, dramatic lighting, stylized tone. Good for ads and brand spots where the cut needs visual punch.

Extend

A separate endpoint that continues a finished clip for another 6 or 10 seconds. Preset mode takes only `video_preset` (spicy or normal); custom mode requires `prompt`, `extend_at`, and `extend_duration`. Chain two or three Extend calls to assemble a 30-second narrative without re-rendering the opening.

Use cases

Spin up a vertical product reel in under a minute by passing a packaging shot as `image_url`, setting `aspect_ratio` to 9:16, and prompting a slow camera move at 720p. Build an in-feed ad set by re-rendering the same prompt on the Normal, Fun, and Spicy presets and A/B testing the cuts. Run editorial B-roll for a newsroom by writing atmospheric prompts on Normal, then locking 720p. Stretch a 10-second clip into a 30-second narrative by chaining two Extend calls in custom mode and feeding each one its own continuation prompt.

Limitations

Grok Imagine caps at 720p and 10 seconds per Generate call. If you need 4K masters, finish elsewhere or run a separate upscale step after the render. Aspect ratio defaults to 1:1, so set it explicitly for vertical or widescreen work. Extend in custom mode requires `prompt`, `extend_at`, and `extend_duration` together; preset mode ignores all three. Image-to-video uses a single image as the opening frame; there is no end-frame parameter.

API examples

Call Grok Imagine from any language by POSTing to /v1/tasks. Full parameter docs live at docs.unifically.com/models/video/xai/grok-imagine-video.

curl -X POST https://api.unifically.com/v1/tasks \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "xai/grok-imagine-video",
    "input": {
      "prompt": "A futuristic cityscape at sunset with flying cars",
      "duration": 10,
      "resolution": "720p",
      "aspect_ratio": "16:9",
      "video_preset": "custom"
    }
  }'

A successful submission returns a `task_id`. Poll GET /v1/tasks/<task_id>, or set a `callback_url` on the request, to receive the finished video URL.
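
The same submit-and-poll flow can be sketched in Python with only the standard library. The URL, headers, and body shape mirror the curl example. Everything about the response is an assumption, because this page does not show the response schema: the caller supplies the `is_finished` check, and the `state` key in the demo is invented. Sending the same Authorization header on the GET is also assumed. All function names are hypothetical.

```python
import json
import time
import urllib.request
from typing import Callable

API_BASE = "https://api.unifically.com"


def build_submit_request(api_key: str, body: dict) -> urllib.request.Request:
    """Mirror the curl example: POST /v1/tasks with a JSON body."""
    return urllib.request.Request(
        f"{API_BASE}/v1/tasks",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def fetch_task(api_key: str, task_id: str) -> dict:
    """GET /v1/tasks/<task_id> and decode the JSON reply (performs a network call)."""
    request = urllib.request.Request(
        f"{API_BASE}/v1/tasks/{task_id}",
        headers={"Authorization": f"Bearer {api_key}"},  # assumed to be required
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def poll_until(
    fetch: Callable[[], dict],
    is_finished: Callable[[dict], bool],
    interval_seconds: float = 5.0,
    max_attempts: int = 60,
) -> dict:
    """Call fetch() until is_finished(task) is true or attempts run out."""
    for attempt in range(max_attempts):
        task = fetch()
        if is_finished(task):
            return task
        if attempt < max_attempts - 1:
            time.sleep(interval_seconds)
    raise TimeoutError(f"task not finished after {max_attempts} attempts")


# Offline demo with canned replies; the "state" key is made up for illustration.
canned = iter([{"state": "working"}, {"state": "finished"}])
result = poll_until(
    lambda: next(canned),
    lambda task: task["state"] == "finished",
    interval_seconds=0,
)
print(result)
```

For real use, pass `lambda: fetch_task(api_key, task_id)` as `fetch` and write `is_finished` against the fields documented in the full parameter docs. Setting `callback_url` avoids polling altogether.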

FAQs

People also ask

What is Grok Imagine?

Grok Imagine is xAI's video generation model. The same endpoint covers text-to-video and image-to-video, with output up to 10 seconds at 480p or 720p, five aspect ratios, and four style presets. A separate Extend endpoint chains a 6 or 10 second continuation onto a finished clip.

How long can a Grok Imagine video be?

1 to 10 seconds in a single Generate call, with 10 seconds as the default. To get past 10 seconds, use the Extend endpoint on a finished `task_id` to add a 6 or 10 second continuation.

How many aspect ratios does Grok Imagine support?

Five: 1:1, 2:3, 3:2, 9:16, and 16:9. The default is 1:1, so set the field explicitly when you need vertical 9:16 for short-form social or 16:9 for widescreen.

Does Grok Imagine support image-to-video?

Yes. Pass `image_url` on the Generate call and Grok Imagine uses the image as the opening frame, then animates from the prompt. Useful for product packaging shots that need to come alive.

How many style presets are there?

Four. Custom is the default and lets your prompt steer everything. Normal is a neutral house style. Fun biases toward bright, energetic motion. Spicy biases toward stylized, dramatic motion.

What resolutions does Grok Imagine support?

480p or 720p. The default is 720p. Pick 480p when you want a faster turnaround on a draft pass; pick 720p when the clip is going into a feed or paid slot.

How do I extend a clip past 10 seconds?

Run the Extend endpoint with the finished `task_id`. Preset mode takes only `video_preset` (spicy or normal). Custom mode requires `prompt`, `extend_at` (the second to start from), and `extend_duration` (6 or 10).