Harvard and Columbia Release Open Dataset of 16 Million Protein Sequences, Solving the Private Data Issue for AlphaFold 2 Training!

新智元

Published in AI News · 2 min read · Sep 20, 2023

Researchers from institutions including Harvard University and Columbia University have released OpenProteinSet, an open-source dataset of 16 million protein multiple sequence alignments (MSAs) and related data. The dataset addresses a long-standing gap: the training data for DeepMind's AlphaFold 2 was never made public. AlphaFold 2 leads the field in protein structure prediction accuracy, but without access to its training data, other researchers have struggled to reproduce or build on it. OpenProteinSet covers protein chains from the Protein Data Bank (PDB) as well as sequence clusters derived from UniProt, making it suitable for training a wide range of AI models. The resource provides significant support for bioinformatics and protein machine learning, and is expected to benefit research in biology, drug development, and related fields.

Bioinformatics AlphaFold 2 Protein Structure

This article is from AIbase Daily

