HappyHorse 1.0 API

HappyHorse 1.0

What is HappyHorse 1.0?

HappyHorse 1.0 is a video generation model that produces video and matching audio in a single pass. The same endpoint covers three input modes: text-to-video for prompt-only generation, image-to-video that animates a first-frame image, and reference-to-video that takes 1 to 9 character images and locks them into the clip via characterN tags in the prompt. Output runs at 720p or 1080p across five aspect ratios (16:9, 9:16, 1:1, 4:3, 3:4) with clip lengths from 3 to 15 seconds. Because audio and video come back from the same render, sound and on-screen action stay in sync without a separate audio pipeline.

Key features of HappyHorse 1.0

Five features cover how HappyHorse 1.0 fits into a production pipeline.

Joint audio and video in one pass

The model generates the picture and the soundtrack together, so dialogue, footsteps, ambient noise, and on-screen events stay locked. No second TTS or foley pass, no manual lip sync, no drift between two pipelines.

Three input modes on one endpoint

Text-to-video for fresh briefs. Image-to-video for animating a still. Reference-to-video for character continuity. Switch by setting the `mode` field; everything else stays the same.

Up to nine reference images for character work

Reference-to-video takes 1 to 9 images. Refer to each in the prompt as `character1`, `character2`, and so on, and the model holds identity, costume, and silhouette across the clip.
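
As a rough sketch of the reference workflow described above, the helper below builds a request body and checks that every `characterN` tag in the prompt has an image behind it. The `model` id comes from the curl example on this page and `reference_image_urls` from the FAQ; the mode value `"r2v"` and the one-based mapping of tags onto the image list are assumptions, so check them against the parameter docs before relying on them.

```python
import json
import re

MODEL = "alibaba/happyhorse-1.0-video"  # model id from the curl example on this page


def build_reference_request(prompt, reference_image_urls, mode="r2v"):
    """Build a reference-to-video task body.

    Assumptions: the mode value "r2v" and the one-based mapping of
    characterN tags onto the image list are guesses, not confirmed here.
    """
    count = len(reference_image_urls)
    if not 1 <= count <= 9:
        raise ValueError("reference-to-video takes 1 to 9 images")
    # Every characterN tag in the prompt should point at a supplied image.
    for tag in re.findall(r"character(\d+)", prompt):
        if not 1 <= int(tag) <= count:
            raise ValueError(f"character{tag} has no matching reference image")
    return {
        "model": MODEL,
        "input": {
            "mode": mode,
            "prompt": prompt,
            "reference_image_urls": list(reference_image_urls),
        },
    }


body = build_reference_request(
    "character1 hands character2 a letter on a rainy platform",
    ["https://example.com/lead.png", "https://example.com/friend.png"],
)
print(json.dumps(body, indent=2))
```

Validating the tags locally is a design choice rather than an API requirement: it catches a prompt that mentions `character3` when only two images were uploaded before a task is submitted.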

Five aspect ratios and 720p or 1080p output

16:9, 9:16, 1:1, 4:3, and 3:4 cover horizontal hero, vertical social, square feed, and the older broadcast ratios in one model. Choose 720p for fast drafts or 1080p for delivery.

3 to 15 second clips with seed control

Clip length is a single integer field. Pair it with an integer seed (0 to 2,147,483,647) to get the same render back across re-runs. Useful for A/B prompts and locking a clip you want to keep.
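
The two ranges above are easy to check before a request is sent. The sketch below is one way to do that; `duration` appears in the curl example on this page, while the field name `seed` is an assumption inferred from the wording here, and `with_duration_and_seed` is a hypothetical helper name.

```python
MAX_SEED = 2_147_483_647  # upper bound quoted above, equal to 2**31 - 1


def with_duration_and_seed(input_fields, duration=5, seed=None):
    """Return a copy of input_fields with a validated duration and seed.

    Assumption: the seed travels in a field named "seed". The default of
    5 seconds matches the default stated in the FAQ.
    """
    if isinstance(duration, bool) or not isinstance(duration, int):
        raise ValueError("duration must be an integer")
    if not 3 <= duration <= 15:
        raise ValueError("duration must be between 3 and 15 seconds")
    fields = dict(input_fields, duration=duration)
    if seed is not None:
        if isinstance(seed, bool) or not isinstance(seed, int):
            raise ValueError("seed must be an integer")
        if not 0 <= seed <= MAX_SEED:
            raise ValueError(f"seed must be between 0 and {MAX_SEED}")
        fields["seed"] = seed
    return fields


draft = with_duration_and_seed({"mode": "t2v", "prompt": "A quiet harbour at dawn"}, 8, 42)
print(draft)
```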

Best for

Ads and marketing with native audio

Cinematic clips from text or product images that come back with matching audio in the same render. No second pass for voice or foley.

E-commerce hero shots

Animate a product still with image-to-video. The first frame anchors the look while the prompt drives the camera move and ambient sound.

Short-form drama and serialized clips

Reference-to-video keeps a character consistent across scenes. Use `character1`, `character2` tags in the prompt to lock identity over a sequence.

Social campaigns with multiple ratios

Five aspect ratios in one model means a 16:9 cut, a 9:16 cut, and a 1:1 cut all come from the same prompt without re-prompting per platform.
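
A minimal sketch of that fan-out, assuming the request shape from the curl example on this page: one base request is copied once per ratio, and only the `ratio` field changes. `fan_out_ratios` is a hypothetical helper name, not part of the API.

```python
import copy

SUPPORTED_RATIOS = ("16:9", "9:16", "1:1", "4:3", "3:4")

# Same body as the curl example on this page.
BASE_REQUEST = {
    "model": "alibaba/happyhorse-1.0-video",
    "input": {
        "mode": "t2v",
        "prompt": "A golden retriever running through a field of wildflowers at sunset",
        "resolution": "1080P",
        "ratio": "16:9",
        "duration": 5,
    },
}


def fan_out_ratios(base_request, ratios):
    """Return one independent copy of base_request per requested ratio."""
    requests = []
    for ratio in ratios:
        if ratio not in SUPPORTED_RATIOS:
            raise ValueError(f"unsupported ratio: {ratio}")
        request = copy.deepcopy(base_request)  # leave the base untouched
        request["input"]["ratio"] = ratio
        requests.append(request)
    return requests


cuts = fan_out_ratios(BASE_REQUEST, ["16:9", "9:16", "1:1"])
print([cut["input"]["ratio"] for cut in cuts])
```

Each copy would be submitted as its own task; the page does not say whether one call can return several ratios, so this sketch assumes it cannot.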

Brand-aligned reference work

Up to nine references per call lets you carry brand colours, costume, and product geometry across an entire campaign without retraining a custom model.

Quick turnaround dialogue scenes

Joint audio means a clip with spoken lines or beat-driven cuts works in one shot. Useful when audio sync is the part that has historically slowed a render.

Use cases

Build a product hero in one call by passing the packaging shot as first_frame_url with a 5-second camera-move prompt; with resolution set to 1080P, the result lands at 1080p with matching ambient audio. Make a vertical TikTok cut by switching ratio to 9:16 and re-running the same prompt. Run a serialized story by uploading 2 or 3 character references and writing prompts that mention character1 and character2 directly, so the cast stays consistent across scenes. Lock a winning result by passing a seed, an integer between 0 and 2,147,483,647, so you can re-render at a different duration without losing the look.
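
The product-hero case might look like the sketch below. `first_frame_url`, `resolution`, and `duration` come from this page; the mode value `"i2v"` and the field name `seed` are assumptions, and the example URL is a placeholder. The `ratio` field is left out because the FAQ lists aspect ratios for text-to-video and reference-to-video only.

```python
MODEL = "alibaba/happyhorse-1.0-video"  # model id from the curl example on this page


def build_image_request(prompt, first_frame_url, duration=5, seed=None, mode="i2v"):
    """Build an image-to-video task body anchored on a first-frame image.

    Assumptions: the mode value "i2v" and the "seed" field name are
    guesses; first_frame_url, resolution, and duration appear on this page.
    """
    fields = {
        "mode": mode,
        "prompt": prompt,
        "first_frame_url": first_frame_url,
        "resolution": "1080P",
        "duration": duration,
    }
    if seed is not None:
        fields["seed"] = seed
    return {"model": MODEL, "input": fields}


PROMPT = "Slow orbit around the box on a marble counter, soft kitchen ambience"
FRAME = "https://example.com/packaging.png"  # placeholder image URL

# A 5-second hero, then a longer re-render that reuses the same seed.
hero = build_image_request(PROMPT, FRAME, duration=5, seed=1234)
longer = build_image_request(PROMPT, FRAME, duration=8, seed=1234)
print(hero["input"]["duration"], longer["input"]["duration"])
```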

API examples

Call HappyHorse 1.0 from any language by POSTing to /v1/tasks. Full parameter docs live at docs.unifically.com/models/video/alibaba/happyhorse-1.0-video.

curl -X POST https://api.unifically.com/v1/tasks \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "alibaba/happyhorse-1.0-video",
    "input": {
      "mode": "t2v",
      "prompt": "A golden retriever running through a field of wildflowers at sunset",
      "resolution": "1080P",
      "ratio": "16:9",
      "duration": 5
    }
  }'

A successful submission returns a task_id. Poll GET /v1/tasks/<task_id> until the task finishes, or set a callback_url on the request to receive the finished video URL.
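
A minimal submit-and-poll sketch using only the Python standard library follows. The base URL, paths, and headers mirror the curl example; the response layout is not shown on this page, so the location of `task_id` and the `status` key with its values are assumptions, and `api_call` and `wait_for_task` are hypothetical helper names.

```python
import json
import time
import urllib.request

API_BASE = "https://api.unifically.com"  # host from the curl example


def api_call(method, path, api_key, body=None):
    """Send one JSON request and return the decoded JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(
        API_BASE + path,
        data=data,
        method=method,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_task(fetch_task, is_finished, interval=5.0, timeout=600.0, sleep=time.sleep):
    """Call fetch_task() until is_finished(task) is true or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        task = fetch_task()
        if is_finished(task):
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError("task did not finish in time")
        sleep(interval)


# Usage sketch (needs a real key, so it is left commented out):
# submitted = api_call("POST", "/v1/tasks", "YOUR_API_KEY", request_body)
# task_id = submitted["task_id"]  # assumed to sit at the top level
# final = wait_for_task(
#     lambda: api_call("GET", f"/v1/tasks/{task_id}", "YOUR_API_KEY"),
#     lambda task: task.get("status") in {"completed", "failed"},  # assumed values
# )
```

Passing the fetch and completion checks in as functions keeps the loop independent of the real response shape, which should be taken from the parameter docs.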

FAQs

What is HappyHorse 1.0?

HappyHorse 1.0 is a video generation model that produces video and matching audio in a single pass. It runs in three input modes (text-to-video, image-to-video, and reference-to-video) on the same endpoint, with output at 720p or 1080p, five aspect ratios, and clips from 3 to 15 seconds.

How many input modes does HappyHorse 1.0 support?

Three: text-to-video for prompt-only generation; image-to-video, which takes a single first-frame image; and reference-to-video, which takes 1 to 9 character images and uses characterN tags in the prompt to anchor each one in the clip.

How do reference images work in reference-to-video mode?

Pass one to nine images in reference_image_urls and refer to each as character1, character2, and so on inside the prompt. The model uses those references to lock character identity, costume, and silhouette across the generated clip.

What clip lengths, resolutions, and aspect ratios are supported?

3 to 15 seconds per clip, with 5 seconds as the default. Output runs at 720p or 1080p. Aspect ratios cover 16:9, 9:16, 1:1, 4:3, and 3:4 in text-to-video and reference-to-video modes.

Does HappyHorse 1.0 generate audio with the video?

Yes. Audio and video come back in the same render, so on-screen action and sound stay in sync without a second audio pass or manual lip sync afterwards.

How long can a prompt be?

Up to 5,000 characters of non-Chinese text or 2,500 Chinese characters per call. Anything past the cap is truncated, so trim long briefs before sending.
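
Because over-length prompts are truncated rather than rejected, a local check can help. The sketch below assumes that mixed prompts are counted against one 5,000-unit budget in which a Chinese character costs two units; the page gives only the two pure-text limits, so that weighting, the CJK ranges used to detect Chinese characters, and both function names are assumptions.

```python
PROMPT_BUDGET = 5000  # non-Chinese character limit quoted above


def is_chinese(char):
    """Rough test: CJK Unified Ideographs plus Extension A (an assumption)."""
    return "\u4e00" <= char <= "\u9fff" or "\u3400" <= char <= "\u4dbf"


def prompt_cost(prompt):
    """Count a Chinese character as 2 units and anything else as 1."""
    return sum(2 if is_chinese(char) else 1 for char in prompt)


def fits_prompt_cap(prompt):
    """True when the prompt stays within the assumed shared budget."""
    return prompt_cost(prompt) <= PROMPT_BUDGET


print(prompt_cost("A horse gallops"), prompt_cost("一匹马"))
```

Under this weighting the two stated limits fall out as special cases: 5,000 non-Chinese characters or 2,500 Chinese characters each cost exactly 5,000 units.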

Can I reproduce the same render?

Yes. Pass an integer seed in the range 0 to 2,147,483,647 alongside the prompt and any input frames. Re-sending the same inputs and seed returns the same render.