Artificial intelligence computing startup Cerebras Systems Inc. has officially launched what it calls the world's fastest AI inference service, a direct challenge to industry giant Nvidia Corp. Andrew Feldman, chief executive of Cerebras, said the new service is designed to complete AI inference tasks faster and at lower cost, responding to growing market demand for efficient inference solutions.


Cerebras' "High-Speed Inference" service is built on its powerful WSE-3 processor. This processor boasts over 900,000 computing cores and 44GB of on-board memory, with its core count being 52 times that of a single Nvidia H100 graphics processing unit. Cerebras claims that its inference service can reach a speed of 1,000 tokens per second, which is 20 times faster than similar cloud services using Nvidia's most powerful GPU. More notably, the service starts at just 10 cents per million tokens, reportedly offering a 100-fold increase in cost-effectiveness over existing AI inference workloads.

Cerebras' inference service offers three access tiers: a free tier, a developer tier, and an enterprise tier. The developer tier, accessed through API endpoints, prices the Llama 3.1 8B model at 10 cents per million tokens and the Llama 3.1 70B model at 60 cents per million tokens. The enterprise tier adds customization options and dedicated support, and is aimed at sustained workloads.
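The per-million-token pricing makes cost estimates straightforward. The short Python sketch below works through the arithmetic at the developer-tier prices quoted above; the dictionary keys and the 50-million-token daily workload are hypothetical examples, not Cerebras identifiers or figures.

```python
# Rough cost estimate at the developer-tier prices quoted above.
# Prices are dollars per million tokens; the workload size is a
# hypothetical example chosen for illustration.

PRICE_PER_MILLION = {
    "llama-3.1-8b": 0.10,
    "llama-3.1-70b": 0.60,
}

def estimate_cost(model: str, tokens: int) -> float:
    """Return the estimated dollar cost of processing `tokens` tokens."""
    return PRICE_PER_MILLION[model] * tokens / 1_000_000

daily_tokens = 50_000_000  # e.g., 50 million tokens per day
for model in PRICE_PER_MILLION:
    print(f"{model}: ${estimate_cost(model, daily_tokens):.2f}/day")
    # llama-3.1-8b:  $5.00/day
    # llama-3.1-70b: $30.00/day
```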

Several well-known organizations have signed on as early customers, including GlaxoSmithKline, Perplexity AI Inc. and Meter Inc. Dr. Andrew Ng, founder of DeepLearning.AI, praised Cerebras' fast inference capability, saying it is particularly helpful for agentic AI workflows that prompt large language models repeatedly.

In addition to the inference service, Cerebras announced several strategic partnerships intended to give customers a full set of AI development tools; partners include LangChain, LlamaIndex, Docker Inc., Weights & Biases Inc. and AgentOps Inc. The inference API is also compatible with OpenAI's Chat Completions API, so existing applications can migrate to the platform with minimal code changes.
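Because the API follows OpenAI's Chat Completions format, migration typically amounts to changing a client's base URL and model name. Below is a minimal sketch using the official `openai` Python package; the base URL and model identifier are illustrative assumptions rather than confirmed values, so consult Cerebras' documentation for the exact strings.

```python
# Sketch: pointing an existing OpenAI-style client at Cerebras.
# The base URL and model name are assumptions for illustration;
# check Cerebras' API docs for the actual values.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.cerebras.ai/v1",  # assumed Cerebras endpoint
    api_key="YOUR_CEREBRAS_API_KEY",
)

response = client.chat.completions.create(
    model="llama3.1-8b",  # assumed model identifier
    messages=[
        {"role": "user", "content": "Summarize wafer-scale inference in one sentence."}
    ],
)
print(response.choices[0].message.content)
```

Since only the constructor arguments change, the rest of an application's OpenAI-based code path can remain untouched.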