Basecamp Research announced the Trillion Gene Atlas this week at both the SXSW Health Track and NVIDIA's GTC 2026 conference in San Jose. The landmark initiative aims to expand humanity's map of evolutionary genetic diversity by a factor of 100 and to use that data to train the next generation of AI models for drug discovery.
The Scale of the Ambition
The Atlas will collect novel genomic data from more than 100 million new species across thousands of sampling sites worldwide. The goal is to provide the vast, diverse training data that AI systems need to learn from billions of years of evolution and, ultimately, design new medicines on demand.
Basecamp Research estimates that processing the quadrillions of DNA base pairs involved would take more than 20 years using conventional methods. With the Atlas infrastructure, the company expects to compress that timeline to under two years.
A Powerhouse Partnership
The initiative brings together an unusual coalition. Anthropic, the AI safety company behind the Claude model family, is contributing AI capabilities. Ultima Genomics and PacBio are providing next-generation sequencing technology — with PacBio's HiFi sequencing selected specifically for its accuracy on long-read genomic data. NVIDIA's AI infrastructure, including the Parabricks genomic analysis toolkit, delivers a reported 10x speedup in data processing.
Building on EDEN
The Trillion Gene Atlas builds on Basecamp Research's EDEN foundation models, which launched earlier this year after training on more than 10 billion novel genes collected from over one million previously uncharacterized species. The Atlas represents a 100x expansion of the underlying dataset — a scale that the company argues is necessary for AI models to capture the full breadth of nature's molecular toolkit.
Global Scientific Network
Alongside the Atlas launch, Basecamp Research announced new biodiversity partnerships in Chile and Argentina, as well as an expanded collaboration in Antarctica. These additions extend the company's global network of scientific collaborators to 31 countries. The broader network is intended to make the captured genomic data reflect the planet's true biological diversity rather than only its well-studied ecosystems.
Why It Matters for Drug Discovery
Traditional drug discovery relies on screening known compounds against known targets — a process that is expensive, slow, and limited by the chemical space researchers have explored. Foundation models trained on evolutionary data offer a fundamentally different approach: they can predict protein structures, suggest novel molecular candidates, and identify therapeutic targets that conventional methods might miss entirely.
By dramatically expanding the training data available to these models, the Trillion Gene Atlas could unlock new classes of treatments for diseases where current drug pipelines have stalled. The initiative also raises the bar for what constitutes a competitive biological AI dataset, putting pressure on rivals to match Basecamp's data breadth or risk falling behind in the race to build effective biomedical foundation models.



