Recently, the Allen Institute for Artificial Intelligence (AI2) released a new open-source large language model (LLM), OLMoE. The model not only delivers strong performance but is also relatively cost-effective.

OLMoE employs a sparse mixture-of-experts (MoE) architecture: the model has 7 billion total parameters but activates only 1 billion parameters per input token. It comes in two versions: the general-purpose base model OLMoE-1B-7B and the instruction-tuned OLMoE-1B-7B-Instruct.
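
For readers who want to try the checkpoints, a minimal sketch of loading the instruction-tuned variant with Hugging Face transformers might look like the following. The repository id used here (allenai/OLMoE-1B-7B-0924-Instruct) is an assumption; the exact ids can be confirmed in the Hugging Face collection linked at the end of this article, and a recent transformers release with OLMoE support is assumed.

```python
# Sketch: loading an OLMoE checkpoint with Hugging Face transformers.
# The repository id below is assumed; check AI2's collection for the exact name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMoE-1B-7B-0924-Instruct"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # keep memory use modest
    device_map="auto",           # requires the accelerate package
)

prompt = "Explain what a sparse mixture-of-experts model is."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```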

Unlike most other mixture-of-experts models, which are closed-source, OLMoE is, AI2 emphasizes, fully open. As the team notes in the paper, "Most MoE models are closed-source: despite some releasing model weights, information about their training data, code, or recipes is extremely limited." That lack of transparency has left many academic researchers unable to study these models.

AI2 research scientist Nathan Lambert said on social media that OLMoE will aid policy-making and give the academic community's H100 clusters a starting point. He added that the release of OLMoE is part of AI2's commitment to developing open-source models that can match the performance of closed models.

In constructing the model, AI2 opted for fine-grained routing over 64 small experts, of which only eight are activated at runtime. Experiments show that OLMoE matches comparable models in performance while significantly reducing inference cost and memory footprint. OLMoE also builds on AI2's earlier open-source model OLMo 1.7-7B and supports a context window of 4,096 tokens. Its training data comes from multiple sources, including Common Crawl, Dolma CC, and Wikipedia.
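
To make the routing scheme concrete, here is a simplified, self-contained PyTorch sketch of top-k expert routing: a learned router scores 64 small experts for each token and only the top 8 are evaluated, which is what keeps the number of active parameters far below the full parameter count. This is an illustrative toy layer written for this article, not AI2's implementation, and the dimensions are arbitrary.

```python
# Toy sparse MoE layer illustrating top-k routing (not AI2's code):
# 64 small experts, only the top 8 are activated per token.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    def __init__(self, d_model=512, d_hidden=1024, n_experts=64, top_k=8):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)  # scores every expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        scores = self.router(x)                              # (tokens, n_experts)
        top_vals, top_idx = scores.topk(self.top_k, dim=-1)  # keep 8 of 64 experts
        weights = F.softmax(top_vals, dim=-1)                # normalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            idx = top_idx[:, slot]
            w = weights[:, slot:slot + 1]
            # Run each token only through the expert it was routed to in this slot.
            for e in idx.unique():
                mask = idx == e
                out[mask] += w[mask] * self.experts[int(e)](x[mask])
        return out

tokens = torch.randn(4, 512)        # 4 tokens, d_model = 512
print(ToyMoELayer()(tokens).shape)  # torch.Size([4, 512])
```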

In benchmark tests, OLMoE-1B-7B outperformed many existing models with a similar number of parameters, and even surpassed larger models such as Llama2-13B-Chat and DeepSeekMoE-16B.

One of AI2's goals is to give researchers more fully open-source AI models, including ones built on mixture-of-experts architectures. Although many developers are adopting MoE architectures, AI2 believes most other AI models still fall far short of being sufficiently open.

Hugging Face: https://huggingface.co/collections/allenai/olmoe-66cf678c047657a30c8cd3da

Paper: https://arxiv.org/abs/2409.02060

Key Points:

- 🌟 AI2's new open-source model, OLMoE, is competitive in both performance and cost.

- 📊 OLMoE employs a sparse mixture-of-experts architecture, effectively reducing inference costs and memory requirements.

- 🔍 AI2 is committed to providing fully open-source AI models to promote academic research and development.