What modes does HappyHorse 1.0 API support?

HappyHorse 1.0 supports text-to-video, image-to-video (first frame), and reference-to-video for character-consistent generation.

How do I integrate HappyHorse 1.0 API?

Integrate HappyHorse 1.0 API through Unifically's REST endpoints with simple HTTP requests and JSON responses.
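As a rough illustration of what "simple HTTP requests and JSON responses" could look like, here is a minimal sketch using only the Python standard library. The endpoint URL, the bearer-token header, and the JSON field names (`mode`, `prompt`) are placeholders assumed for this example; none of them are given on this page, so check them against the actual API reference before use.

```python
import json
import urllib.request

# Hypothetical values: this page does not state the endpoint URL, the auth
# scheme, or the JSON field names. Replace them with the documented ones.
API_URL = "https://api.example.com/v1/happyhorse/generate"
API_KEY = "YOUR_API_KEY"


def build_request(payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a generation job."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",  # assumed bearer-token auth
        },
        method="POST",
    )


def send(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))


req = build_request({"mode": "t2v", "prompt": "A horse galloping at sunrise"})
# result = send(req)  # uncomment once URL, key, and field names are confirmed
```

Building the request separately from sending it makes it easy to inspect the method, headers, and body before any network call is made.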


HappyHorse 1.0 API

Text to Video, Image to Video, Reference to Video, Video to Video

Cinematic-grade video generation with joint audio-video output. Text-to-video, image-to-video, and reference-to-video with character consistency.


HappyHorse 1.0, HappyHorse 1.0 Edit

Mode

Generation mode: t2v (text prompt), i2v (first frame image), r2v (reference images with characterN in prompt)

Prompt

Text prompt (max 5000 non-Chinese chars or 2500 Chinese chars). For R2V, use characterN to reference images.

Resolution

Output resolution (720P or 1080P)

Duration

Output duration in seconds (3-15)

Watermark

Add watermark to the output

Seed

Random seed for reproducible generation (0-2147483647)
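The parameter limits above can be checked on the client before a request is sent. The sketch below is one possible way to do that. The JSON key names, the default values, and the rule that a Chinese character counts as two units toward a 5000-unit prompt budget are assumptions made for illustration; this page states only the limits themselves.

```python
MODES = {"t2v", "i2v", "r2v"}
RESOLUTIONS = {"720P", "1080P"}
MAX_SEED = 2147483647
MAX_PROMPT_UNITS = 5000


def prompt_units(prompt: str) -> int:
    """Count prompt length, weighting each Chinese character as 2 units.

    Assumption: "5000 non-Chinese or 2500 Chinese chars" is read as one
    shared budget. Only the basic CJK block (U+4E00..U+9FFF) is detected.
    """
    return sum(2 if "\u4e00" <= ch <= "\u9fff" else 1 for ch in prompt)


def build_payload(mode, prompt, resolution="720P", duration=5,
                  watermark=False, seed=None):
    """Validate the playground parameters and return a request payload.

    Key names and defaults are hypothetical, not taken from the API docs.
    """
    if mode not in MODES:
        raise ValueError(f"mode must be one of {sorted(MODES)}")
    if prompt_units(prompt) > MAX_PROMPT_UNITS:
        raise ValueError("prompt exceeds the length limit")
    if resolution not in RESOLUTIONS:
        raise ValueError(f"resolution must be one of {sorted(RESOLUTIONS)}")
    if not 3 <= duration <= 15:
        raise ValueError("duration must be between 3 and 15 seconds")
    payload = {
        "mode": mode,
        "prompt": prompt,
        "resolution": resolution,
        "duration": duration,
        "watermark": watermark,
    }
    if seed is not None:
        if not 0 <= seed <= MAX_SEED:
            raise ValueError(f"seed must be between 0 and {MAX_SEED}")
        payload["seed"] = seed  # omit the key entirely for a random seed
    return payload


payload = build_payload("t2v", "A horse galloping at sunrise", duration=8, seed=42)
```

Passing the same seed with the same parameters is what makes a generation reproducible; leaving `seed` unset omits the key so the service can pick its own.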


Features

What HappyHorse 1.0 API offers

Native multimodal architecture with joint audio-video generation for true audio-visual unity

Cinematic-grade visual quality with outstanding lighting expression

Smooth, stable camera movements and natural transitions

Highly realistic characters with lifelike facial expressions

Outstanding mid-to-close-range narrative capability

Text-to-video, image-to-video, and reference-to-video modes

Character consistency with characterN prompting in R2V mode (up to 9 references)

Multiple aspect ratios: 16:9, 9:16, 1:1, 4:3, 3:4

720P or 1080P output, configurable 3-15 second duration

REST API with JSON responses and optional seed for reproducibility

Use cases

Built for

Primary

Advertising & marketing: generate cinematic video clips from text descriptions or product images with matching audio

E-commerce showcases: animate product stills into dynamic video with first-frame image-to-video

Short-form drama production: reference-to-video with character consistency across scenes using up to 9 character images

Social media creativity: rapid creative validation and concept visualization from text prompts

Brand consistency: reference-to-video for style transfer and brand-aligned content at scale

FAQ

About HappyHorse 1.0 API

Built on a native multimodal architecture, HappyHorse 1.0 uses joint audio-video generation: it produces the video and generates matching audio at the same time, so sound and picture stay in sync. It targets advertising, e-commerce, short-form drama, and social media creation.

HappyHorse 1.0 supports three generation modes: text-to-video (t2v) for generating video from text descriptions, image-to-video (i2v) using a single static image as the first frame, and reference-to-video (r2v) for generating video with character consistency from reference images.

In R2V mode, you upload 1-9 reference images and refer to them with characterN identifiers (e.g., character1, character2) in your prompt. The model generates new video that preserves the style, motion, or structure of the references, which suits brand consistency and style transfer.
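A small client-side check can catch prompts whose characterN identifiers do not match the uploaded references. The sketch below assumes that identifiers are numbered from 1 in upload order and that an R2V prompt should mention at least one of them; both are inferred from the character1, character2 example rather than stated on this page.

```python
import re

MAX_REFERENCES = 9


def referenced_characters(prompt: str) -> list[int]:
    """Return the sorted, de-duplicated N values of characterN in the prompt."""
    return sorted({int(n) for n in re.findall(r"\bcharacter(\d+)\b", prompt)})


def check_r2v(prompt: str, num_images: int) -> list[int]:
    """Validate an R2V prompt against the number of uploaded reference images.

    Assumes 1-based numbering (character1 is the first image).
    """
    if not 1 <= num_images <= MAX_REFERENCES:
        raise ValueError("R2V mode takes between 1 and 9 reference images")
    used = referenced_characters(prompt)
    if not used:
        raise ValueError("prompt does not mention any characterN identifier")
    out_of_range = [n for n in used if not 1 <= n <= num_images]
    if out_of_range:
        raise ValueError(f"no reference image for: {out_of_range}")
    return used


used = check_r2v("character1 greets character2 in a sunlit cafe", num_images=2)
```

Here `used` is `[1, 2]`; a prompt mentioning character3 with only two uploaded images would raise a `ValueError` instead.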

HappyHorse 1.0 features cinematic-grade visual quality, smooth camera movements, highly realistic characters with lifelike expressions, and outstanding mid-to-close-range narrative capability. Its native multimodal architecture enables simultaneous audio-video generation.

HappyHorse 1.0 supports durations from 3 to 15 seconds at 720P or 1080P resolution, with aspect ratios including 16:9, 9:16, 1:1, 4:3, and 3:4.