Fireworks AI is in talks to raise a new funding round that would value the inference startup at roughly $15 billion, according to a Bloomberg report dated May 27 — nearly four times the $4 billion valuation it carried just seven months ago. Existing backer Index Ventures is set to co-lead the round, which has not closed and whose terms could still change.
4x in seven months
The jump is steep even by 2026 standards. Fireworks set that $4 billion mark in its October 2025 Series C, a $250 million raise co-led by Lightspeed Venture Partners, Index Ventures and Evantic Capital, with Sequoia Capital participating. Its July 2024 Series B valued the company at just $552 million. A $15 billion price would mark roughly a 27x step-up in under two years.
The revenue trajectory is doing the talking. Research firm Sacra estimates Fireworks hit about $315 million in annualized revenue in February 2026, up 416% year over year — the kind of curve that explains why an investor already on the cap table would lead the next round rather than sit it out.
What the company sells
Founded in 2022 and based in Redwood City, Fireworks was started by CEO Lin Qiao, a former Meta engineer. Its pitch is narrow and increasingly valuable: run inference for open-weight LLMs and generative models faster and cheaper than teams can self-host. The platform exposes OpenAI-compatible endpoints, offers both serverless and dedicated-GPU deployments, and layers optimizations on serving stacks like vLLM, SGLang and TensorRT. Fireworks reportedly processes around 15 trillion tokens per day.
That positions it squarely in the serverless-inference tier alongside Together AI, DeepInfra and Replicate, with Baseten — fresh off a $300 million Series E at a $5 billion valuation in January 2026 — pushing hardest on the enterprise engineering angle. Custom-silicon vendors Groq, Cerebras and SambaNova attack the same workloads from the hardware side, competing on raw throughput.
Why inference is the trade
The round is a bet that serving models, not training them, is where durable margin sits. As frontier-lab capex balloons into the hundreds of billions, the practical question for enterprises is who runs their open-weight checkpoints at production scale without per-token costs spiraling. NVIDIA notes leading providers — Baseten, DeepInfra, Fireworks and Together among them — are cutting cost per token by up to 10x on Blackwell-class hardware, which is exactly what makes open-weight deployment viable against closed-API incumbents.
What it means for builders
For teams choosing an inference layer, a $15 billion Fireworks is a vendor with the balance sheet to commit GPU capacity and hold pricing — but also one under pressure to monetize. The recurring reasons teams migrate off any single provider remain the same: per-token cost at scale, dedicated-GPU control, model-catalog breadth for newer checkpoints, and fine-tune portability. The smart play is to keep deployments on OpenAI-compatible endpoints so the inference layer stays swappable, no matter whose valuation is climbing this quarter.



