Recently, the Arc Institute, in collaboration with NVIDIA and researchers from Stanford University, the University of California, Berkeley, and the University of California, San Francisco, launched Evo2, the world's largest biological artificial intelligence model. The model was trained on 9.3 trillion nucleotides drawn from more than 128,000 genomes, a scale comparable to that of the most powerful generative AI language models.


Evo2's deep learning capabilities allow it to quickly identify patterns in the gene sequences of different organisms, patterns that would take researchers years to uncover by hand. The model can accurately identify mutations that cause human diseases and can design new genomes as long as those of simple bacteria. The development team released detailed information about the model on February 19, 2025, along with a user-friendly interface called Evo Designer. The code for Evo2 is publicly available on Arc's GitHub and has been integrated into NVIDIA's BioNeMo framework to support further scientific research.

Compared to its predecessor, Evo1, Evo2 broadens its training data to cover bacteria, archaea, viruses, and eukaryotes such as humans and plants. It also marks a significant moment in the field of generative biology, enabling machines to "read, write, and think" in the language of nucleotides.

On the technical side, Evo2 was trained on the NVIDIA DGX Cloud AI platform using over 2,000 NVIDIA H100 GPUs. The model can process sequences of up to 1 million nucleotides at once, which lets it capture relationships between distant parts of a genome. Its new AI architecture, "StripedHyena2," allowed Evo2 to be trained on 30 times more data than Evo1.

Evo2 has wide-ranging applications and excels at analyzing how genetic variations affect protein function and organism fitness. In tests on variants of the breast cancer-related gene BRCA1, Evo2 predicted which mutations are benign and which are potentially pathogenic with an accuracy exceeding 90%. Results like these can save significant laboratory time and funding and help speed the development of new drugs.
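To make the idea of scoring a variant more concrete, here is a minimal sketch in Python. It is not Evo2's code or API. It assumes a common approach for sequence models: compare how likely the model finds the mutated sequence against the reference sequence. A tiny k-mer Markov model stands in for the real model, and all function names (`train_markov`, `log_likelihood`, `variant_score`) are hypothetical.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"

def train_markov(sequences, k=2):
    """Count how often each nucleotide follows each k-mer context.

    Toy stand-in for a genomic language model (an assumption for illustration).
    """
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for i in range(k, len(seq)):
            counts[seq[i - k:i]][seq[i]] += 1
    return counts

def log_likelihood(seq, counts, k=2):
    """Sum of log-probabilities of each nucleotide given the previous k,
    using add-one smoothing so unseen transitions keep a nonzero probability."""
    total = 0.0
    for i in range(k, len(seq)):
        context = counts.get(seq[i - k:i], {})
        numerator = context.get(seq[i], 0) + 1
        denominator = sum(context.values()) + len(ALPHABET)
        total += math.log(numerator / denominator)
    return total

def variant_score(reference, position, alt, counts, k=2):
    """Log-likelihood of the mutated sequence minus that of the reference.

    A strongly negative score means the toy model finds the variant
    much less plausible than the reference.
    """
    mutant = reference[:position] + alt + reference[position + 1:]
    return log_likelihood(mutant, counts, k) - log_likelihood(reference, counts, k)

if __name__ == "__main__":
    model = train_markov(["ACGTACGTACGTACGT"])
    reference = "ACGTACGTACGT"
    # Substitute C -> T at index 5 and compare against the reference.
    print(variant_score(reference, 5, "T", model))
```

In a real workflow the toy Markov model would be replaced by likelihoods from a trained genomic model, and the scores would be compared against experimentally labeled variants to measure accuracy.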

Additionally, Evo2 can assist in designing new biological tools or therapies. For example, scientists could design gene therapies that act only in specific cell types, reducing side effects. The research team believes that more specialized AI models can be built on the foundation of Evo2, opening up further possibilities for genomic research and bioengineering.

To address ethical and safety risks, the researchers excluded pathogens that infect humans and other complex organisms from Evo2's training dataset, as part of an effort to develop and deploy the technology responsibly.

For more details on Evo2: https://arcinstitute.org/news/blog/evo2

Key Points:

🌱 Evo2 is the world's largest biological AI model, trained on data covering 128,000 genomes.  

🔍 The model can quickly identify disease mutations and design new genomes, significantly improving research efficiency.  

💡 Evo2 offers new possibilities for future bioengineering and gene therapy design.