arielshemesh1999@gmail.com · Israel

API: Higgsfield

Thirty-plus image and video generation models — Nano Banana Pro, FLUX.2, Soul V2, Veo 3.1, Kling 3.0, Seedance 2.0 — behind one MIT-licensed CLI. Generative media from the terminal.

What it is

Higgsfield is the AI media platform whose CLI “generate(s) images, videos, and finished-video analysis from the terminal using 30+ Higgsfield AI models.” The pitch in one line: every brand-relevant text-to-image and text-to-video model in 2026 — OpenAI, Google, Kuaishou, ByteDance, Black Forest Labs, xAI, and the in-house Soul / Cinematic Studio models — behind one auth token and one binary. The whole CLI is MIT licensed and distributed through three channels: a curl install script, a Homebrew tap, and an npm global package.

What makes the project interesting isn’t a single model — it’s the abstraction. Every model, regardless of upstream provider, gets the same CLI surface: higgsfield generate create <model> --prompt … --wait. Swapping Veo 3.1 for Kling 3.0 is a one-string change. The CLI handles auth, polling, output formats, and disk persistence; you only pick the model and the prompt.

Architecture

The CLI is a thin client over the Higgsfield REST API. Each generate create command opens a job and gets back a job id. With --wait, the CLI then polls the status endpoint until the job completes or fails, and writes the finished artifact to --out. Without --wait, you get the job id back immediately, which is useful for fan-out scripts where one driver kicks off twenty generations and a second pass reaps them. Auth tokens are stored in ~/.config/higgsfield/ after higgsfield auth login, or you can skip the login flow entirely by exporting HIGGSFIELD_API_KEY for CI.
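
The open-then-poll cycle described above can be sketched in a few lines of shell. This is an illustration of the pattern only, not the CLI's actual implementation: poll_job is a hypothetical helper, the status check is whatever command the caller supplies, and the status strings "completed" and "failed" are assumptions rather than documented Higgsfield values.

```shell
#!/bin/sh
# poll_job CHECK_CMD [MAX_TRIES] [DELAY_SECONDS]
# CHECK_CMD is any caller-supplied command that prints the job's current
# state on stdout. The strings "completed" and "failed" are assumptions
# made for this sketch.
poll_job() {
  check_cmd=$1
  max_tries=${2:-30}
  delay=${3:-2}
  try=1
  while [ "$try" -le "$max_tries" ]; do
    state=$($check_cmd)
    case $state in
      completed) return 0 ;;   # artifact is ready
      failed)    return 1 ;;   # job ended in an error
    esac
    sleep "$delay"
    try=$((try + 1))
  done
  return 2   # still pending after max_tries checks
}
```

The three return codes let a driver script tell "done", "failed" and "gave up waiting" apart, which matters once a second pass is reaping many jobs at once.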

Under the unified surface, each underlying provider has its own quirks: Veo 3.1 and Kling 3.0 are video models, FLUX.2 and Nano Banana are image models, and Soul V2 carries the in-house face-consistency pipeline. The CLI hides the provider boundary: every job runs through the same auth, the same polling, the same output handling, and the same --json contract. Failures surface as structured error payloads with a stable code field, which makes wrapping the CLI in higher-level orchestration straightforward.
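
A stable code field is what lets a wrapper decide between retrying and giving up. The sketch below assumes the error payload is JSON with a top-level "code" key and requires jq; the helper names and both code values (rate_limited, timeout) are hypothetical placeholders, so check a real payload before relying on them.

```shell
#!/bin/sh
# error_code: read a JSON payload on stdin and print its "code" field.
# Prints nothing when the field is absent. The top-level position of
# "code" is an assumption about the payload shape.
error_code() {
  jq -r '.code // empty'
}

# is_retryable CODE: succeed for error codes worth retrying.
# Both code names are made up for illustration.
is_retryable() {
  case $1 in
    rate_limited|timeout) return 0 ;;
    *) return 1 ;;
  esac
}

# Example wiring, assuming the payload is printed on stdout:
#   payload=$(higgsfield generate create flux_2 --prompt "..." --wait --json)
#   code=$(printf '%s' "$payload" | error_code)
#   if [ -n "$code" ] && is_retryable "$code"; then echo "retrying"; fi
```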

The model catalogue

18 image models: Nano Banana Pro, Nano Banana 2, FLUX.2, Flux Kontext, GPT Image 2, Soul V2, Seedream 4.5, Grok Image, OpenAI Hazel, Z Image, Kling O1 Image, Cinematic Studio 2.5, Soul Cinematic, Soul Location, Marketing Studio Image and more.

17 video models: Virality Predictor, Google Veo 3.1, Kling 3.0, Kling 2.6, Seedance 2.0, Seedance 1.5 Pro, Wan 2.7, Grok Video, Cinematic Studio 3.0, Soul Cast, Marketing Studio Video and more.

Models are referenced by slug, not display name: nano_banana_2, kling3_0, veo_3_1, flux_2, soul_v2. The naming is regular enough that you can guess most slugs, but not all of them (kling3_0 drops the underscore that veo_3_1 keeps), so treat the output of higgsfield models list as the authoritative set.

Install

# macOS / Linux — one-line installer
curl -fsSL https://raw.githubusercontent.com/higgsfield-ai/cli/main/install.sh | sh

# Or Homebrew
brew install higgsfield-ai/tap/higgsfield

# Or npm
npm install -g @higgsfield/cli

# Verify
higgsfield --version

Auth

# Browser-based device login — opens a tab, signs you in, drops a token.
higgsfield auth login

# Or use a static API key (CI, servers)
export HIGGSFIELD_API_KEY=<your-key>

# Confirm the active account
higgsfield auth whoami

Treat HIGGSFIELD_API_KEY like any other generation credential: keep it in environment variables, never commit it to a repo, and rotate it if it ever shows up in a CI log. For local development, the device-login flow is friendlier; for Vercel cron or GitHub Actions, where there is no browser to complete the login, the static key is the only path that works.
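
In CI it helps to fail fast when the secret is missing, before any generation job is submitted. A minimal sketch, where require_api_key is a hypothetical helper name and only HIGGSFIELD_API_KEY and higgsfield auth whoami come from the CLI itself:

```shell
#!/bin/sh
# require_api_key: return non-zero when HIGGSFIELD_API_KEY is unset or empty.
# The key's value is never printed, so it cannot leak into the job log.
require_api_key() {
  if [ -z "${HIGGSFIELD_API_KEY:-}" ]; then
    echo "HIGGSFIELD_API_KEY is not set; add it as a CI secret" >&2
    return 1
  fi
}

# Typical first step of a pipeline:
#   require_api_key && higgsfield auth whoami
```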

Usage examples

# Image: Nano Banana 2 — wait for completion + save to ./out/
higgsfield generate create nano_banana_2 \
  --prompt "a quiet beach at sunrise, soft fog, 35mm film" \
  --wait --out ./out/

# Video: Kling 3.0, 5 seconds, vertical 9:16 for socials
higgsfield generate create kling3_0 \
  --prompt "forest clearing at dawn, slow dolly-in" \
  --duration 5 --aspect 9:16 --wait

# Train a character (“Soul ID”) on your own face
higgsfield soul-id create --name me --soul-2 --image ./photo.jpg

# Generate with a Soul ID baked in (consistent face across shots)
higgsfield generate create soul_v2 \
  --prompt "portrait, golden hour, shallow depth of field" \
  --soul-id me --wait --out ./out/

# Score an existing video for virality before posting
higgsfield generate create virality_predictor --video ./final.mp4 --wait

# Fan-out: kick off five variants without waiting; save each job payload for a later reaping pass
for i in 1 2 3 4 5; do
  higgsfield generate create flux_2 --prompt "variant $i" --json > job_$i.json
done
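
The reaping pass starts by pulling the job ids back out of the saved payloads. The sketch below assumes each job_N.json holds a JSON object with a top-level "id" field, which is a guess about the payload shape, and it requires jq. collect_job_ids is a hypothetical helper; how you then query each job depends on the status command the CLI provides, which is not shown here.

```shell
#!/bin/sh
# collect_job_ids DIR: print one job id per line from DIR/job_*.json.
# Assumes a top-level "id" field in each saved payload.
collect_job_ids() {
  for f in "$1"/job_*.json; do
    [ -e "$f" ] || continue   # the glob matched nothing
    jq -r '.id' "$f"
  done
}

# Usage: collect_job_ids . > pending_ids.txt
```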

Where Higgsfield earns its place

The closed image/video tools each lock you into one model and one UI. Higgsfield inverts that — same CLI, same auth, same output folder, thirty models — so the question becomes “which model is best for this prompt today,” not “which subscription do I still need.” The Soul ID training and Virality Predictor are the two features I haven’t seen anywhere else as first-class CLI flags: a face you can reuse across models without having to re-prompt the resemblance every time, and a numeric score on a finished video before you post it. Combined with --wait + --out, the whole workflow drops into a Makefile or a shell loop, which is exactly the shape I want for batch generation.
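
As a rough sketch of that shell-loop shape, the function below runs one blocking generation per line of a prompts file. generate_batch and the prompts-file layout are illustrative choices of mine; the command and flags (generate create, --prompt, --wait, --out) are the ones used in the examples above.

```shell
#!/bin/sh
# generate_batch MODEL PROMPTS_FILE OUT_DIR
# Runs one blocking generation per non-empty line of PROMPTS_FILE.
generate_batch() {
  model=$1
  prompts=$2
  out=$3
  mkdir -p "$out"
  # "|| [ -n ... ]" keeps a final line that lacks a trailing newline.
  while IFS= read -r prompt || [ -n "$prompt" ]; do
    [ -n "$prompt" ] || continue
    # </dev/null stops the CLI from consuming the rest of the prompts file.
    higgsfield generate create "$model" --prompt "$prompt" --wait --out "$out" < /dev/null
  done < "$prompts"
}

# Usage: generate_batch kling3_0 prompts.txt ./out/
```

Because the model is just the first argument, comparing two models on the same prompts means calling the function twice with different slugs and output folders.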

What’s new

The CLI catalogue is updated as new models ship from upstream providers: Veo 3.1, FLUX.2, Kling 3.0 and Seedance 2.0 are all post-April-2026 additions. Because models are referenced by slug (nano_banana_2, kling3_0), adopting a new one in an existing pipeline is a one-string change. The Soul V2 trainer cut its reference-image requirement from a dozen images to one. The Virality Predictor moved from a hidden preview flag to a first-class generate target.

Configuration and output

Most generation runs are configured purely through CLI flags: --aspect (16:9, 9:16, 1:1, 4:5), --duration (1–10 seconds for most video models), --seed for reproducibility, --negative-prompt for steering away from artifacts, and --steps on the models that expose it. Where a model needs structured inputs (a reference image, a mask, ControlNet-style hints), you pass them as additional --image, --mask or --ref flags, and the CLI uploads them to a temporary bucket before kicking off the job. Output filenames default to {slug}_{timestamp}_{seed}.{ext}, which keeps batch runs sortable and re-runnable.
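
Recovering the seed from a filename is what makes a batch re-runnable. Since slugs themselves contain underscores, the fields have to be peeled off from the right. A small sketch, assuming the timestamp and seed contain no underscores (the exact timestamp format is not specified here); parse_output_name is a hypothetical helper.

```shell
#!/bin/sh
# parse_output_name FILE: print "slug timestamp seed ext" for a file named
# {slug}_{timestamp}_{seed}.{ext}. Fields are split from the right because
# slugs such as nano_banana_2 contain underscores themselves.
parse_output_name() {
  name=${1##*/}       # drop any directory prefix
  ext=${name##*.}     # text after the last dot
  stem=${name%.*}     # everything before the last dot
  seed=${stem##*_}    # last underscore-separated field
  rest=${stem%_*}
  stamp=${rest##*_}   # second-to-last field
  slug=${rest%_*}     # whatever remains on the left
  printf '%s %s %s %s\n' "$slug" "$stamp" "$seed" "$ext"
}

# Usage: parse_output_name ./out/nano_banana_2_1767225600_42.png
```

The recovered seed can then be passed back through --seed to regenerate a specific image.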

For programmatic use, every command accepts --json, which prints the structured job payload to stdout instead of streaming progress dots. Pipe it to jq, persist it to a file, drive a Node or Python script around it, or wire it into a Makefile target. The CLI itself stays the same; the JSON is the API surface.

Why it matters / where I use it

For brand and marketing work I’d normally need three or four subscriptions — one for image gen, one for video, one for face consistency, one for analytics. Higgsfield collapses that to one CLI and one bill. The unified slug-based interface means switching from Kling to Veo because one of them handles a tricky camera move better is a one-line change in a script, not a re-architecture. The Soul ID pipeline alone saves the dozen-image-reference dance I used to do in Midjourney or SDXL to keep a face consistent across a campaign.

Where it stops being the right answer: anywhere you need fine-grained inference control (custom samplers, ControlNet, IP-Adapter, LoRA stacking) that the underlying providers don’t expose through the unified API. For that, you still drop down to ComfyUI or the native model SDKs. Higgsfield optimises for breadth and speed, not for the deepest end of any single model’s knob-set.

Source

CLI: github.com/higgsfield-ai/cli. Node SDK: higgsfield-ai/higgsfield-js. Site: higgsfield.ai. Community MCP wrapper: geopopos/higgsfield_ai_mcp. License: MIT.