In generative AI, Apple's efforts appear focused primarily on mobile devices, particularly the latest iOS 18. Yet the new Apple M4 chip, found in the recently released Mac Mini and MacBook Pro, delivers enough performance to run some of the most capable open-source large language models (LLMs) available today, including Meta's Llama 3.1 405B, Nvidia's Nemotron-70B, and Alibaba's Qwen2.5-Coder-32B.
Exo Labs is a startup founded in March 2024, dedicated to "democratizing access to artificial intelligence." Its co-founder, Alex Cheema, has successfully built a local computing cluster using multiple M4 devices.
He connected four Mac Mini M4s ($599 each) to a MacBook Pro with M4 Max ($1,599) and ran Alibaba's Qwen2.5-Coder-32B using Exo's open-source software. At roughly $5,000 in total, the cluster is far more cost-effective than a single Nvidia H100 GPU, which sells for $25,000 to $30,000.
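Once exo is running on each machine, the nodes discover one another on the local network and the cluster exposes a ChatGPT-compatible API. The snippet below is a minimal sketch of querying such a cluster from Python, not Exo's documented usage: the port (52415) and the model identifier ("qwen-2.5-coder-32b") are assumptions that may vary by exo version and configuration.

```python
import json
import urllib.request

# Minimal sketch: send one chat request to an exo cluster through its
# ChatGPT-compatible API. The port and model name below are assumptions;
# check your exo version for the actual values.
payload = {
    "model": "qwen-2.5-coder-32b",  # hypothetical model identifier
    "messages": [{"role": "user", "content": "Write a binary search in Python."}],
}
req = urllib.request.Request(
    "http://localhost:52415/v1/chat/completions",  # assumed default port
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    body = json.load(resp)

# Responses follow the OpenAI chat-completion shape.
print(body["choices"][0]["message"]["content"])
```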
The benefits of a local computing cluster over cloud services are clear: running AI models on hardware that users or businesses control cuts costs while improving privacy and security. Cheema said Exo Labs is continuously improving its enterprise software, that several companies already use Exo for local AI inference, and that he expects the trend to spread gradually to individuals and businesses alike.
Exo Labs attributes its recent success to the performance of the M4 chip, which Apple bills as having "the world's fastest CPU core."
Cheema revealed that Exo Labs' Mac Mini M4 cluster runs Qwen2.5-Coder-32B at 18 tokens per second and Nemotron-70B at 8 tokens per second. This shows that users can handle AI training and inference tasks efficiently without relying on cloud infrastructure, making AI more accessible to privacy- and cost-sensitive consumers and businesses.
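Those throughput figures are straightforward to sanity-check on your own hardware: time one generation and divide the completion-token count by the elapsed wall-clock time. A rough sketch, reusing the hypothetical endpoint from the example above and assuming the response includes an OpenAI-style `usage` field:

```python
import json
import time
import urllib.request

# Rough throughput check: completion tokens per second of wall-clock time.
# Endpoint, port, and model name are the same assumptions as above.
payload = {
    "model": "qwen-2.5-coder-32b",
    "messages": [{"role": "user", "content": "Explain quicksort in 200 words."}],
}
req = urllib.request.Request(
    "http://localhost:52415/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

start = time.perf_counter()
with urllib.request.urlopen(req) as resp:
    body = json.load(resp)
elapsed = time.perf_counter() - start

# Elapsed time includes prompt processing, so this slightly understates
# pure decode speed (the 18 tok/s and 8 tok/s figures quoted above).
tokens = body["usage"]["completion_tokens"]
print(f"{tokens} tokens in {elapsed:.1f}s -> {tokens / elapsed:.1f} tokens/s")
```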
To further support this wave of local AI innovation, Exo Labs plans to launch a free benchmarking website with detailed hardware-configuration comparisons, helping users choose the best setup for running LLMs based on their needs and budget.
Project link: https://github.com/exo-explore/exo
Key points:
🌟 Exo Labs successfully runs powerful open-source AI models on local computing clusters using the Apple M4 chip.
💰 Running AI models locally can reduce costs, enhance privacy and security, and avoid reliance on cloud services.
📊 Exo Labs will launch a benchmarking website to help users select suitable hardware configurations for AI tasks.