Back to stories
Industry

Tensormesh Lands $20M from Nvidia, AMD and CoreWeave to Turn KV Caching Into an Inference Layer

Michael Ouroumis2 min read
Tensormesh Lands $20M from Nvidia, AMD and CoreWeave to Turn KV Caching Into an Inference Layer

Tensormesh, the startup commercializing the open-source LMCache project, has raised $20 million in a seed extension that brings total funding to $24.5 million — and the cap table is the headline. NVentures (Nvidia's venture arm), AMD Ventures and CoreWeave all co-invested, alongside Valley Capital Partners and Laude Ventures. The round, announced May 27, landed alongside the general availability of the company's flagship product, Tensormesh Inference.

The pitch: stop recomputing the KV cache

Every token an LLM processes generates key-value tensors. In multi-turn agents, RAG pipelines and long-context workloads, the same prefixes get recomputed on essentially every call — burning GPU cycles to regenerate state the model already produced. Tensormesh stores those KV tensors and reuses them across requests and nodes, eliminating the redundant prefill.

The company claims up to 10x reductions in both latency and GPU spend for agentic models, where long context is replayed repeatedly across steps. Some customers, it says, see cache hit rates above 70% — meaning more than two-thirds of prompts are served from cache rather than recomputed.

Why Nvidia, AMD and CoreWeave all wrote checks

The notable signal is who's funding it. A GPU vendor (Nvidia), its direct rival (AMD), and a GPU cloud (CoreWeave) rarely back the same infrastructure startup. The read: KV caching is being treated as a foundational layer of the inference stack rather than a vendor-specific tweak. LMCache, the open-source base, has 8,000+ GitHub stars and integrations spanning vLLM, SGLang, TensorRT, llm-d, NVIDIA Dynamo, AWS SageMaker and Oracle OCI Data Science — hardware-agnostic by design.

What ships

Tensormesh Inference is a SaaS platform with three deployment modes: a serverless, OpenAI-compatible API; on-demand GPU; and reserved enterprise capacity. It includes a cost dashboard that tracks cache hit rates so teams can see the savings directly. The company is led by founder and CEO Junchen Jiang, a University of Chicago faculty member and LMCache co-creator, with a team drawn from UChicago, UC Berkeley and Carnegie Mellon. Target workloads are the expensive ones: multi-step agents, long-context deployments, and applications that repeatedly analyze previously seen documents.

What changes for builders

For anyone running LLMs at production scale, inference is the dominant cost line, and agentic architectures make it worse by re-sending long histories on every turn. A productized, GPU-agnostic KV-cache layer that drops in behind an OpenAI-compatible endpoint is a concrete lever on that bill — and the fact that it now ships as a managed platform, not just an open-source library, lowers the integration cost of capturing it. The cross-vendor backing suggests the major infrastructure players expect cache reuse to become standard plumbing rather than a differentiator any one of them owns.

Learn AI for Free — FreeAcademy.ai

Take "AI for Business: Practical Implementation" — a free course with certificate to master the skills behind this story.

More in Industry

CXMT Clears China's Biggest IPO Since 2022 — $4.3B to Scale Domestic DRAM
Industry

CXMT Clears China's Biggest IPO Since 2022 — $4.3B to Scale Domestic DRAM

ChangXin Memory Technologies passed the Shanghai STAR Market listing review, targeting 29.5 billion yuan (~$4.3B) to expand DRAM capacity and develop HBM — China's biggest IPO since 2022.

1 hours ago2 min read
Snowflake Commits $6B to AWS Over Five Years, Doubling Its Largest-Ever Infrastructure Bet on Graviton and Agentic AI
Industry

Snowflake Commits $6B to AWS Over Five Years, Doubling Its Largest-Ever Infrastructure Bet on Graviton and Agentic AI

Snowflake signed a $6 billion, five-year strategic collaboration with AWS centered on Graviton CPUs and GPUs for agentic AI — its biggest infrastructure commitment ever — alongside a Q1 beat that lifted shares ~37%.

13 hours ago2 min read
Cognition Raises $1B at a $26B Valuation, 2.5x in Eight Months
Industry

Cognition Raises $1B at a $26B Valuation, 2.5x in Eight Months

Cognition, maker of the Devin coding agent and the Windsurf IDE, closed more than $1 billion at a $26 billion post-money valuation co-led by Lux Capital, General Catalyst, and 8VC, on a reported ~$492M annualized run-rate.

18 hours ago2 min read