Researchers at Sakana AI have made groundbreaking progress in artificial intelligence, using vision-language foundation models (FMs) for the first time to automate the search for Artificial Life (ALife) simulations. The new method, called ASAL (Automated Search for Artificial Life), marks a major shift in how artificial life is studied and is expected to accelerate progress in the field.
Traditional artificial life research has relied largely on manual design and trial and error; ASAL changes this status quo. The core idea is to use foundation models to evaluate videos rendered from simulations, and thereby search automatically for interesting ALife simulations. ASAL discovers lifelike simulations through three search mechanisms:
Supervised Target Search: finding simulations that produce a specific phenomenon described by a text prompt. For example, researchers can set targets such as "one cell" or "two cells" and let the system automatically identify simulations that match them.
Open-Endedness Search: seeking simulations that keep generating novelty over time, which helps discover simulations that remain interesting to a human observer.
Illumination (Diversity) Search: looking for a maximally diverse set of interesting simulations in order to reveal "alien worlds."
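To make the three mechanisms concrete, here is a minimal sketch in Python/NumPy of how each one can be expressed as a score over foundation-model embeddings. This is not the official ASAL code (the released repository is JAX-based); `embed_image`, `embed_text`, `frames`, and `final_frames` are hypothetical placeholders.

```python
# Sketch (hypothetical, not the official ASAL implementation) of the three
# search objectives, expressed as scores over foundation-model embeddings.
# embed_image() and embed_text() are assumed to return L2-normalized vectors
# (e.g. from CLIP); frames is a list of rendered frames from one simulation.
import numpy as np

def supervised_target_score(frames, prompt, embed_image, embed_text):
    """Similarity between the final frame and a text target such as "one cell"."""
    z_img = embed_image(frames[-1])
    z_txt = embed_text(prompt)
    return float(np.dot(z_img, z_txt))  # cosine similarity (vectors are normalized)

def open_endedness_score(frames, embed_image):
    """Historical novelty: each frame should be far from all earlier frames."""
    zs = np.stack([embed_image(f) for f in frames])
    novelty = []
    for t in range(1, len(zs)):
        sims = zs[:t] @ zs[t]             # similarity to every earlier frame
        novelty.append(1.0 - sims.max())  # distance to the nearest past state
    return float(np.mean(novelty))

def illumination_score(final_frames, embed_image):
    """Diversity of a *set* of simulations: mean distance to the nearest neighbor."""
    zs = np.stack([embed_image(f) for f in final_frames])
    sims = zs @ zs.T
    np.fill_diagonal(sims, -np.inf)
    return float(np.mean(1.0 - sims.max(axis=1)))
```

In each case the search loop simply proposes simulation parameters, runs the simulation, and keeps the candidates with the highest score.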
ASAL's versatility enables it to be effectively applied to various ALife substrates, including Boids, Particle Life, Game of Life, Lenia, and Neural Cellular Automata. Researchers have discovered unprecedented life forms within these substrates, such as unusual clustering patterns in Boids, new self-organizing cells in Lenia, and open-ended cellular automata similar to Conway's Game of Life.
Moreover, ASAL supports quantitative analysis of phenomena that could previously only be assessed qualitatively. Because the foundation models' representations align with human perception, ASAL can measure complexity in a way that matches human judgment. For instance, researchers can quantify plateau phases in Lenia simulations by measuring the rate of change of CLIP vectors over the course of a simulation.
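As a rough illustration of that last point, the rate of change can be computed as the distance between consecutive CLIP embeddings; the function names and the threshold below are illustrative, not taken from the ASAL code.

```python
# Hypothetical sketch: quantifying plateaus via the rate of change of CLIP
# embeddings over simulation time. clip_vectors is assumed to be a (T, d)
# array of L2-normalized CLIP image embeddings, one per timestep.
import numpy as np

def embedding_speed(clip_vectors):
    """Distance between consecutive CLIP embeddings; low values suggest a plateau."""
    diffs = np.diff(clip_vectors, axis=0)
    return np.linalg.norm(diffs, axis=1)

def plateau_mask(clip_vectors, threshold=1e-2):
    """Boolean mask of timesteps where the simulation has (approximately) settled."""
    return embedding_speed(clip_vectors) < threshold
```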
The key innovation of this research is its use of pre-trained foundation models, particularly CLIP (Contrastive Language-Image Pre-training), to evaluate simulation videos. CLIP aligns image and text representations through contrastive learning, allowing it to capture human notions of what is interesting or complex. ASAL is not tied to a specific foundation model or simulation substrate, so it should remain compatible with future models and substrates.
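For readers unfamiliar with CLIP, the snippet below shows how a rendered simulation frame can be scored against text prompts using the Hugging Face transformers API; the official ASAL code is JAX-based and may differ, and "frame.png" is a placeholder path.

```python
# Minimal sketch of scoring a rendered simulation frame against text prompts
# with CLIP (Hugging Face transformers API; not the official ASAL pipeline).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("frame.png")          # placeholder: one rendered frame
prompts = ["one cell", "two cells"]      # example targets from the article

inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image: similarity of the frame to each prompt (higher = closer match)
scores = outputs.logits_per_image.softmax(dim=-1)
for prompt, score in zip(prompts, scores[0].tolist()):
    print(f"{prompt}: {score:.3f}")
```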
The researchers also validated ASAL experimentally, testing it with different foundation models (such as CLIP and DINOv2) across various ALife substrates. The results indicate that CLIP slightly outperforms DINOv2 at producing diversity that aligns with human judgment, but both clearly surpass low-level pixel representations, underscoring the importance of deep foundation-model representations for measuring human notions of diversity.
This research opens new avenues for artificial life, letting researchers focus on higher-level questions, such as how best to describe the phenomena they want to observe, and then letting an automated process search for those outcomes. ASAL not only helps scientists discover new life forms but also enables quantitative analysis of complexity and open-endedness in life simulations. Ultimately, this technology may help us understand the nature of life and all the possible forms life could take in the universe.
Project Code: https://github.com/SakanaAI/asal/
Paper Link: https://arxiv.org/pdf/2412.17799