How much did Tensormesh raise and who backed it?

A $20 million seed extension, bringing total funding to $24.5 million, from Nvidia's NVentures, AMD Ventures, CoreWeave, Valley Capital Partners and Laude Ventures.

What performance gains does Tensormesh claim?

Up to 10x reductions in latency and GPU spend for agentic models by reusing the KV cache across requests. The company says some customers hit cache rates above 70%, so more than two-thirds of prompts are served from cache.

How does it fit into an existing inference stack?

Tensormesh Inference exposes an OpenAI-compatible serverless API plus on-demand and reserved GPU deployments. Its open-source base, LMCache, already integrates with vLLM, SGLang, TensorRT, llm-d, NVIDIA Dynamo, AWS SageMaker and Oracle OCI Data Science.

Tensormesh Lands $20M from Nvidia, AMD and CoreWeave to Turn KV Caching Into an Inference Layer

Tensormesh, the startup commercializing the open-source LMCache project, has raised $20 million in a seed extension that brings total funding to $24.5 million — and the cap table is the headline. NVentures (Nvidia's venture arm), AMD Ventures and CoreWeave all co-invested, alongside Valley Capital Partners and Laude Ventures. The round, announced May 27, landed alongside the general availability of the company's flagship product, Tensormesh Inference.

The pitch: stop recomputing the KV cache

Every token an LLM processes generates key-value tensors. In multi-turn agents, RAG pipelines and long-context workloads, the same prefixes get recomputed on essentially every call — burning GPU cycles to regenerate state the model already produced. Tensormesh stores those KV tensors and reuses them across requests and nodes, eliminating the redundant prefill.

The company claims up to 10x reductions in both latency and GPU spend for agentic models, where long context is replayed repeatedly across steps. Some customers, it says, see cache hit rates above 70% — meaning more than two-thirds of prompts are served from cache rather than recomputed.

Why Nvidia, AMD and CoreWeave all wrote checks

The notable signal is who's funding it. A GPU vendor (Nvidia), its direct rival (AMD), and a GPU cloud (CoreWeave) rarely back the same infrastructure startup. The read: KV caching is being treated as a foundational layer of the inference stack rather than a vendor-specific tweak. LMCache, the open-source base, has 8,000+ GitHub stars and integrations spanning vLLM, SGLang, TensorRT, llm-d, NVIDIA Dynamo, AWS SageMaker and Oracle OCI Data Science — hardware-agnostic by design.

What ships

Tensormesh Inference is a SaaS platform with three deployment modes: a serverless, OpenAI-compatible API; on-demand GPU; and reserved enterprise capacity. It includes a cost dashboard that tracks cache hit rates so teams can see the savings directly. The company is led by founder and CEO Junchen Jiang, a University of Chicago faculty member and LMCache co-creator, with a team drawn from UChicago, UC Berkeley and Carnegie Mellon. Target workloads are the expensive ones: multi-step agents, long-context deployments, and applications that repeatedly analyze previously seen documents.

What changes for builders

For anyone running LLMs at production scale, inference is the dominant cost line, and agentic architectures make it worse by re-sending long histories on every turn. A productized, GPU-agnostic KV-cache layer that drops in behind an OpenAI-compatible endpoint is a concrete lever on that bill — and the fact that it now ships as a managed platform, not just an open-source library, lowers the integration cost of capturing it. The cross-vendor backing suggests the major infrastructure players expect cache reuse to become standard plumbing rather than a differentiator any one of them owns.

Tensormesh Lands $20M from Nvidia, AMD and CoreWeave to Turn KV Caching Into an Inference Layer

The pitch: stop recomputing the KV cache

Why Nvidia, AMD and CoreWeave all wrote checks

What ships

What changes for builders

More in Industry

CXMT Clears China's Biggest IPO Since 2022 — $4.3B to Scale Domestic DRAM

Snowflake Commits $6B to AWS Over Five Years, Doubling Its Largest-Ever Infrastructure Bet on Graviton and Agentic AI

Cognition Raises $1B at a $26B Valuation, 2.5x in Eight Months