Grok Imagine Video API

Text to Video, Image to Video, Text to Image, Image to Image, Video to Video

xAI's multimodal generation platform. Create videos and images, and apply video edits with text guidance.

Features

What Grok Imagine API offers

xAI Grok Imagine video: text-to-video with optional first-frame image upload
Durations from 1s to 10s (slider in the playground)
480p or 720p output resolution
Aspect ratios: 1:1, 2:3, 3:2, 9:16, 16:9
Preset styles plus a custom mode (Spicy, Fun, Normal, or custom in the playground)
Second workflow: extend an existing rendered clip by its task_id, using either a preset mode or a custom extend_at plus extend_duration (6s or 10s extensions)
JSON requests with generate and poll endpoints for both the generate and extend flows
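
The generate-then-poll flow named in the last feature can be sketched roughly as follows. This is a minimal sketch, not the service's client: the page gives no endpoint URLs or response fields, so the HTTP call is passed in as a function, and the "status" key with its "completed" and "failed" values is an assumption made for illustration.

```python
import time
from typing import Callable, Dict


def poll_until_done(
    fetch_status: Callable[[str], Dict],
    task_id: str,
    interval_s: float = 2.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Call fetch_status(task_id) until it reports a terminal status.

    fetch_status is a caller-supplied function that performs the real
    poll request; this helper only handles the retry loop.
    """
    for _ in range(max_attempts):
        result = fetch_status(task_id)
        # "status", "completed", and "failed" are hypothetical names.
        if result.get("status") in ("completed", "failed"):
            return result
        sleep(interval_s)
    raise TimeoutError(f"task {task_id} not finished after {max_attempts} polls")
```

Passing the fetch function in keeps the loop testable without network access; the same helper would serve both the generate and extend flows if their poll responses share a shape, which the page does not confirm.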

Use cases

Built for

Social clips - Square or vertical ratios for short promos of ten seconds or less
Storyboard animatics - Rapid motion tests before a full production pipeline
Product teasers - Optional start image to anchor branding in the first frame
Continued takes - Extend mode for a follow-on segment from a prior task
Creative variants - Style presets when you want a different motion tone without rewriting tooling
Editorial B-roll - Quick atmospheric shots from prompts alone

FAQ

About Grok Imagine API

Two variants share one page. Generate creates fresh clips from a prompt, with an optional image_url plus duration (1 to 10 seconds), resolution, aspect_ratio, and video_preset settings. Extend continues a prior generation when you supply the source task_id.

Generate uses 1 to 10 second durations. Both 480p and 720p are available. Extension segments are 6 or 10 seconds depending on the extend_duration you select.
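
As an illustration of the Generate limits above, a request body could be checked client-side before it is sent. The field names image_url, duration, resolution, aspect_ratio, and video_preset come from this page; the "prompt" key for Generate, the default values, the integer duration, and the function name are assumptions, and the real API may accept or require something different.

```python
from typing import Dict, Optional

ALLOWED_RESOLUTIONS = {"480p", "720p"}
ALLOWED_ASPECT_RATIOS = {"1:1", "2:3", "3:2", "9:16", "16:9"}


def build_generate_payload(
    prompt: str,
    duration: int,
    resolution: str = "720p",      # assumed default
    aspect_ratio: str = "16:9",    # assumed default
    video_preset: Optional[str] = None,
    image_url: Optional[str] = None,
) -> Dict:
    """Validate Generate settings and return a JSON-ready dict."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if not 1 <= duration <= 10:
        raise ValueError("duration must be between 1 and 10 seconds")
    if resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"resolution must be one of {sorted(ALLOWED_RESOLUTIONS)}")
    if aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        raise ValueError(f"aspect_ratio must be one of {sorted(ALLOWED_ASPECT_RATIOS)}")

    payload: Dict = {
        "prompt": prompt,
        "duration": duration,
        "resolution": resolution,
        "aspect_ratio": aspect_ratio,
    }
    # Optional fields are only included when the caller sets them.
    if video_preset is not None:
        payload["video_preset"] = video_preset
    if image_url is not None:
        payload["image_url"] = image_url
    return payload
```

For example, build_generate_payload("a paper boat on a rainy street", 6, aspect_ratio="9:16") yields a dict that can be serialized with json.dumps and posted to the generate endpoint.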

Extend requires the task_id from a completed clip. In preset mode you pick a video_preset and the API ignores manual prompts. In custom mode you supply prompt, extend_at (seconds into the clip), and extend_duration.
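
The two Extend modes described above might be assembled as in the sketch below. The field names task_id, video_preset, prompt, extend_at, and extend_duration come from this page; the function name, the rule that supplying video_preset selects preset mode, and the omission of extend_duration in preset mode are assumptions, since the page does not show the request schema.

```python
from typing import Dict, Optional


def build_extend_payload(
    task_id: str,
    video_preset: Optional[str] = None,
    prompt: Optional[str] = None,
    extend_at: Optional[float] = None,
    extend_duration: Optional[int] = None,
) -> Dict:
    """Return a JSON-ready Extend body for preset or custom mode."""
    if not task_id:
        raise ValueError("task_id from a completed clip is required")

    if video_preset is not None:
        # Preset mode: the page says manual prompts are ignored,
        # so this sketch leaves the prompt out of the body entirely.
        return {"task_id": task_id, "video_preset": video_preset}

    # Custom mode: prompt, extend_at, and extend_duration are all needed.
    if prompt is None or extend_at is None or extend_duration is None:
        raise ValueError("custom mode needs prompt, extend_at, and extend_duration")
    if extend_at < 0:
        raise ValueError("extend_at is seconds into the clip and cannot be negative")
    if extend_duration not in (6, 10):
        raise ValueError("extend_duration must be 6 or 10 seconds")
    return {
        "task_id": task_id,
        "prompt": prompt,
        "extend_at": extend_at,
        "extend_duration": extend_duration,
    }
```

Dropping the prompt in preset mode avoids sending a value the API is documented to ignore, which keeps the request unambiguous when it is logged or retried.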

Generate accepts an optional image upload that biases the content of the opening frame, while the text prompt still drives the motion.