Artificial intelligence computing startup Cerebras Systems Inc. has officially launched what it calls the world's fastest AI inference service, a direct challenge to industry giant Nvidia Corp. Andrew Feldman, chief executive of Cerebras, said the new service is designed to complete AI inference tasks faster and at lower cost, responding to growing market demand for efficient inference solutions.


Cerebras' "High-Speed Inference" service is built on its powerful WSE-3 processor. This processor boasts over 900,000 computing cores and 44GB of on-board memory, with its core count being 52 times that of a single Nvidia H100 graphics processing unit. Cerebras claims that its inference service can reach a speed of 1,000 tokens per second, which is 20 times faster than similar cloud services using Nvidia's most powerful GPU. More notably, the service starts at just 10 cents per million tokens, reportedly offering a 100-fold increase in cost-effectiveness over existing AI inference workloads.

Cerebras' inference service offers three access tiers: a free tier, a developer tier, and an enterprise tier. The developer tier, accessed through API endpoints, prices the Llama 3.1 8B model at 10 cents per million tokens and the Llama 3.1 70B model at 60 cents per million tokens. The enterprise tier adds customization options and dedicated support, and is aimed at sustained workloads.
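The per-million-token pricing makes cost estimates straightforward. The short Python sketch below works through the arithmetic at the developer-tier prices quoted above; the dictionary keys and the 50-million-token daily workload are hypothetical examples, not Cerebras identifiers or figures.

```python
# Rough cost estimate at the developer-tier prices quoted above.
# Prices are dollars per million tokens; the workload size is a
# hypothetical example chosen for illustration.

PRICE_PER_MILLION = {
    "llama-3.1-8b": 0.10,
    "llama-3.1-70b": 0.60,
}

def estimate_cost(model: str, tokens: int) -> float:
    """Return the estimated dollar cost of processing `tokens` tokens."""
    return PRICE_PER_MILLION[model] * tokens / 1_000_000

daily_tokens = 50_000_000  # e.g., 50 million tokens per day
for model in PRICE_PER_MILLION:
    print(f"{model}: ${estimate_cost(model, daily_tokens):.2f}/day")
    # llama-3.1-8b:  $5.00/day
    # llama-3.1-70b: $30.00/day
```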

Several well-known organizations have signed on as early customers, including GlaxoSmithKline, Perplexity AI Inc. and Meter Inc. Dr. Andrew Ng, founder of DeepLearning.AI, praised Cerebras' fast inference capability, saying it is particularly helpful for agentic AI workflows that prompt large language models repeatedly.

In addition to the inference service, Cerebras announced several strategic partnerships intended to give customers a full set of AI development tools; partners include LangChain, LlamaIndex, Docker Inc., Weights & Biases Inc. and AgentOps Inc. The inference API is also compatible with OpenAI's Chat Completions API, so existing applications can migrate to the platform with minimal code changes.
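Because the API follows OpenAI's Chat Completions format, migration typically amounts to changing a client's base URL and model name. Below is a minimal sketch using the official `openai` Python package; the base URL and model identifier are illustrative assumptions rather than confirmed values, so consult Cerebras' documentation for the exact strings.

```python
# Sketch: pointing an existing OpenAI-style client at Cerebras.
# The base URL and model name are assumptions for illustration;
# check Cerebras' API docs for the actual values.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.cerebras.ai/v1",  # assumed Cerebras endpoint
    api_key="YOUR_CEREBRAS_API_KEY",
)

response = client.chat.completions.create(
    model="llama3.1-8b",  # assumed model identifier
    messages=[
        {"role": "user", "content": "Summarize wafer-scale inference in one sentence."}
    ],
)
print(response.choices[0].message.content)
```

Since only the constructor arguments change, the rest of an application's OpenAI-based code path can remain untouched.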