What is the Trillion Gene Atlas?

The Trillion Gene Atlas is a Basecamp Research initiative to collect novel genomic data from over 100 million species, expanding known evolutionary genetic diversity by 100x to train AI models for drug discovery.

Who are the partners behind the Trillion Gene Atlas?

The project is a collaboration between Basecamp Research, Anthropic, Ultima Genomics, and PacBio, powered by NVIDIA AI infrastructure including NVIDIA Parabricks for data processing.

How long will it take to complete the Trillion Gene Atlas?

Basecamp Research expects to compress what would have been over 20 years of biological data gathering and analysis into less than two years, thanks to AI-accelerated processing.

Basecamp Research Launches Trillion Gene Atlas to Revolutionize AI Drug Discovery

Basecamp Research announced the Trillion Gene Atlas this week at both the SXSW Health Track and NVIDIA's GTC 2026 conference in San Jose. The initiative aims to expand the known map of evolutionary genetic diversity by a factor of 100 and to use that data to train the next generation of AI models for drug discovery.

The Scale of the Ambition

The Atlas will collect novel genomic data from more than 100 million new species across thousands of sampling sites worldwide. The goal is to provide the vast, diverse training data that AI systems need to learn from billions of years of evolution and, ultimately, design new medicines on demand.

Basecamp Research estimates that processing the quadrillions of DNA base pairs involved would take more than 20 years using conventional methods. With AI-accelerated processing, the company expects to compress that timeline to under two years, about a tenfold reduction.

A Powerhouse Partnership

The initiative brings together an unusual coalition. Anthropic, the AI safety company behind the Claude model family, is contributing AI capabilities. Ultima Genomics and PacBio are providing next-generation sequencing technology, with PacBio's HiFi sequencing selected specifically for its accuracy on long-read genomic data. NVIDIA is supplying AI infrastructure, including the Parabricks genomic analysis toolkit, which delivers a reported 10x speedup in data processing.

Building on EDEN

The Trillion Gene Atlas builds on Basecamp Research's EDEN foundation models, which launched earlier this year after training on more than 10 billion novel genes collected from over one million previously uncharacterized species. The Atlas represents a 100x expansion of the underlying dataset, a scale that the company argues is necessary for AI models to capture the full breadth of nature's molecular toolkit.
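For readers who want to check how the headline numbers fit together, the short sketch below applies the reported 100x expansion to the EDEN baseline. The figures are the ones quoted in this article and are treated as lower bounds ("more than", "over"); the constant and function names are illustrative only and do not come from Basecamp Research.

```python
# Figures as quoted in the article; they are company-reported lower bounds,
# not independently verified measurements.
EDEN_GENES = 10_000_000_000   # "more than 10 billion novel genes"
EDEN_SPECIES = 1_000_000      # "over one million previously uncharacterized species"
EXPANSION_FACTOR = 100        # "100x expansion of the underlying dataset"


def scale(baseline: int, factor: int) -> int:
    """Return the baseline count multiplied by the expansion factor."""
    return baseline * factor


atlas_genes = scale(EDEN_GENES, EXPANSION_FACTOR)
atlas_species = scale(EDEN_SPECIES, EXPANSION_FACTOR)

print(f"{atlas_genes:,} genes")      # 1,000,000,000,000 genes
print(f"{atlas_species:,} species")  # 100,000,000 species
```

Under these assumptions, the gene count lands at one trillion, which matches the project's name, and the species count lands at the 100 million cited for the Atlas.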

Global Scientific Network

Alongside the Atlas launch, Basecamp Research announced new biodiversity partnerships in Chile and Argentina, as well as an expanded collaboration in Antarctica. These additions extend the company's global network of scientific collaborators to 31 countries, an expansion intended to make the captured genomic data reflect the planet's true biological diversity rather than only well-studied ecosystems.

Why It Matters for Drug Discovery

Traditional drug discovery relies on screening known compounds against known targets, a process that is expensive, slow, and limited by the chemical space researchers have already explored. Foundation models trained on evolutionary data offer a fundamentally different approach: they can predict protein structures, suggest novel molecular candidates, and identify therapeutic targets that conventional methods might miss entirely.

By dramatically expanding the training data available to these models, the Trillion Gene Atlas could unlock new classes of treatments for diseases where current drug pipelines have stalled. The initiative also raises the bar for what constitutes a competitive biological AI dataset, putting pressure on rivals to match Basecamp's data breadth or risk falling behind in the race to build effective biomedical foundation models.
