Beyond Diffusion: Inside Apple’s New STARFlow Generative Model

Dec 2, 2025

The AI art world just got a curveball from Cupertino. While the rest of the industry is doubling down on Diffusion models (the tech behind Midjourney, Flux, and Stable Diffusion), Apple Research has quietly released STARFlow on Hugging Face: a powerful new generative model that takes a fundamentally different mathematical approach.

Here is everything you need to know about Apple’s latest open-source drop and why it matters for the future of AI generation.

What Just Happened?

Apple has released the code and weights for STARFlow (for images) and STARFlow-V (for video) on Hugging Face and GitHub. This isn't just a small utility model; it’s a high-performance generative engine capable of creating high-resolution images and videos from text prompts.

The Repository: apple/ml-starflow (GitHub)
The Tech: Normalizing Flows (specifically "Transformer Autoregressive Flow")
The Promise: High-quality generation that rivals diffusion models but with potentially faster sampling and better efficiency.

The "Secret Sauce": Flow vs. Diffusion

To understand why STARFlow is exciting, you have to understand the current meta. Almost every top-tier image generator today uses Diffusion. Diffusion models are trained by gradually adding noise to an image until nothing but static is left, and learning to reverse that process (denoising). To create an image from scratch, they start from pure static and denoise it step by step.
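
To make the "adding noise" half concrete, here is a minimal sketch of the forward noising step used in DDPM-style diffusion. The linear schedule, the 1,000 steps, and the four-value "image" are illustrative assumptions, not details of any model named in this article.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances, linearly spaced (an illustrative choice)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def signal_fraction(betas):
    """Running product of (1 - beta): how much of the original signal is left."""
    fractions, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        fractions.append(running)
    return fractions

def add_noise(pixels, alpha_bar, rng):
    """Noised sample: sqrt(alpha_bar) * pixel + sqrt(1 - alpha_bar) * gaussian."""
    keep = math.sqrt(alpha_bar)
    spread = math.sqrt(1.0 - alpha_bar)
    return [keep * p + spread * rng.gauss(0.0, 1.0) for p in pixels]

rng = random.Random(0)
alpha_bars = signal_fraction(linear_beta_schedule(1000))

image = [0.9, -0.3, 0.5, 0.1]  # toy stand-in for pixel values
slightly_noisy = add_noise(image, alpha_bars[10], rng)
almost_static = add_noise(image, alpha_bars[-1], rng)
```

By the last step almost none of the original signal survives, which is the "static" the model later learns to undo one step at a time.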

STARFlow is different. It is based on Normalizing Flows. Instead of iteratively removing noise, Flow-based models learn a direct, invertible mapping between a simple distribution (like random noise) and complex data (like a photo).

Diffusion: "Let me fix this static repeatedly until it looks like a cat." (Iterative, slow)
Flows (STARFlow): "Let me transform this noise directly into a cat using a complex mathematical formula." (Direct, invertible, exact; potentially faster, though an autoregressive flow still builds its output piece by piece.)
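
The "direct, invertible mapping" idea can be sketched with a toy autoregressive affine flow. This is not Apple's code: the `conditioner` function below is a hypothetical stand-in for the Transformer, and the numbers are made up. The point is only that the same simple formula runs in both directions.

```python
import math

def conditioner(prefix):
    """Hypothetical stand-in for the Transformer: earlier values -> (shift, log_scale)."""
    total = sum(prefix)
    shift = 0.5 * total
    log_scale = 0.1 * math.tanh(total)
    return shift, log_scale

def noise_to_data(noise):
    """Sampling direction. Each output needs the outputs before it, so this is sequential."""
    data = []
    for z in noise:
        shift, log_scale = conditioner(data)
        data.append(z * math.exp(log_scale) + shift)
    return data

def data_to_noise(data):
    """Inverse direction. All inputs are known up front, so positions are independent."""
    noise = []
    for i, x in enumerate(data):
        shift, log_scale = conditioner(data[:i])
        noise.append((x - shift) * math.exp(-log_scale))
    return noise

noise = [0.3, -1.2, 0.8, 0.05]
data = noise_to_data(noise)
recovered = data_to_noise(data)  # matches `noise` up to floating-point error
```

Note the asymmetry: going from data back to noise could be done for every position at once, while generating new data has to proceed one position at a time.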

Historically, Flow models struggled to match the quality of Diffusion. Apple claims STARFlow bridges that gap, achieving "state-of-the-art" results that compete with the best diffusion models while offering exact likelihood estimation. In plain terms, the model can compute the exact probability it assigns to any given image, rather than an approximation or a bound.
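
Exact likelihood falls out of the change-of-variables rule: log p(x) = log p(z) + log |dz/dx|, where z is the noise that maps to x. Here is a minimal sketch for a one-dimensional affine flow, checked against the textbook Gaussian density. The flow and its parameters are illustrative assumptions, far simpler than anything in STARFlow.

```python
import math

def standard_normal_logpdf(z):
    """Log-density of the base distribution, N(0, 1)."""
    return -0.5 * (z * z + math.log(2.0 * math.pi))

def flow_log_likelihood(x, shift, log_scale):
    """Exact log p(x) for the flow x = z * exp(log_scale) + shift, z ~ N(0, 1)."""
    z = (x - shift) * math.exp(-log_scale)  # run the flow backwards
    log_det = -log_scale                     # log |dz/dx| for an affine map
    return standard_normal_logpdf(z) + log_det

def gaussian_logpdf(x, mean, std):
    """Textbook Gaussian log-density, used only as an independent check."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std) - 0.5 * math.log(2.0 * math.pi)

shift, log_scale = 1.5, math.log(2.0)  # this flow produces samples from N(1.5, 2.0^2)
x = 2.7
from_flow = flow_log_likelihood(x, shift, log_scale)
from_formula = gaussian_logpdf(x, mean=1.5, std=2.0)
```

The two values agree up to floating-point error. A real flow stacks many such invertible steps and adds up their log-determinants, but the bookkeeping is the same.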

Key Specs & Features

Apple didn't just release a paper; they released the goods.

STARFlow (Images): ~3 Billion parameters. Uses a "deep-shallow" architecture to balance power and speed.
STARFlow-V (Video): ~7 Billion parameters. Capable of generating video (up to 480p resolution in initial tests) using causal attention, meaning each frame depends only on the frames before it, which helps keep motion consistent.
Training: Trained on massive internal datasets using Apple’s specialized infrastructure.
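
Causal attention, mentioned in the STARFlow-V spec above, is easy to picture with a mask. The sketch below is a generic illustration with made-up scores, not STARFlow-V's actual attention code: each frame may look at itself and earlier frames, never later ones.

```python
import math

def causal_mask(num_frames):
    """mask[i][j] is True when frame i may attend to frame j (j at or before i)."""
    return [[j <= i for j in range(num_frames)] for i in range(num_frames)]

def masked_softmax(scores, allowed):
    """Softmax over one row of scores; disallowed positions get zero weight."""
    peak = max(s for s, ok in zip(scores, allowed) if ok)
    exps = [math.exp(s - peak) if ok else 0.0 for s, ok in zip(scores, allowed)]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up raw scores: row i is how strongly frame i "wants" to look at each frame.
scores = [
    [0.2, 1.5, 0.3, 2.0],
    [1.0, 0.1, 2.5, 0.4],
    [0.3, 0.8, 0.5, 3.0],
    [0.9, 0.2, 0.7, 0.1],
]
mask = causal_mask(4)
weights = [masked_softmax(row, allowed) for row, allowed in zip(scores, mask)]
```

Even though the first frame's raw scores favor later frames, the mask forces all of its attention onto itself. Only the last frame gets to see everything.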

Why Should You Care?

Apple is Open Sourcing: For a company known for its "walled garden," releasing a research model of this caliber on Hugging Face is a significant signal. They are engaging with the open research community.
Alternatives to Diffusion: We are arguably hitting diminishing returns with Diffusion models. STARFlow suggests there is another path. If Flow models can match Diffusion in quality but beat them in speed (inference latency), the next generation of AI art tools might not be Diffusion-based at all.
On-Device Potential: If Normalizing Flows prove efficient at inference time, that would fit Apple's goal of running powerful AI locally on your iPhone or Mac, rather than in the cloud.

How to Try It

The model weights and code are live now. If you are a developer or researcher, you can clone the repo and run it (likely requires a beefy GPU or a Mac with Apple Silicon).

Hugging Face: apple/starflow
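
If you just want the files on disk, one option is the `huggingface_hub` package. The sketch below assumes it is installed (`pip install huggingface_hub`) and uses the repo ID listed above. The target folder name is an arbitrary choice, and the snippet says nothing about how the repo is laid out or how to run inference, so check the project's own README for that.

```python
from pathlib import Path

REPO_ID = "apple/starflow"  # Hugging Face repo ID as listed in this article

def download_weights(target_dir="starflow-weights"):
    """Download a snapshot of the model repo and return the local path."""
    # Imported here so the module loads even if huggingface_hub is missing.
    from huggingface_hub import snapshot_download
    return Path(snapshot_download(repo_id=REPO_ID, local_dir=target_dir))
```

Calling `download_weights()` pulls the whole repository, so expect a large download for models of this size.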

The Verdict

STARFlow is a "research preview," so don't expect a polished app just yet. But the technology underneath is a glimpse into one possible future of generative AI. Apple just showed up to the AI art party, and they brought their own math.