Etched Bets on the Transformer Architecture and Launches Revolutionary AI Chip Sohu

AIbase

Published in AI News · 9 min read · Jun 26, 2024

At a time when artificial intelligence is advancing at an unprecedented pace, a company named Etched is staking everything on a single AI architecture: the Transformer. The company recently announced Sohu, which it describes as the world's first Application-Specific Integrated Circuit (ASIC) chip designed specifically for the Transformer architecture. Etched claims that Sohu's performance far exceeds that of any GPU currently on the market and promises to transform the AI field.

Transformer Architecture Dominates the AI Field

In 2022, Etched made a bold prediction: the Transformer architecture would dominate the AI world. That prediction has held up. Today, from ChatGPT to Sora and from Gemini to Stable Diffusion 3, every state-of-the-art AI model employs the Transformer architecture. Acting on this prediction, Etched spent two years developing the Sohu chip.

The Sohu chip achieves its performance gains by embedding the Transformer architecture directly into the hardware. The trade-off is that Sohu cannot run most traditional AI models, such as the DLRM behind Instagram ads, the AlphaFold 2 protein folding model, or earlier image models like Stable Diffusion 2. For Transformer models, however, Etched says Sohu is far faster than any other chip.

Significant Performance Advantages

According to Etched, a server equipped with eight Sohu chips can process over 500,000 tokens per second when running the Llama 70B model. The company says this is an order of magnitude faster than Nvidia's upcoming next-generation Blackwell (B200) GPU, and at a lower cost.

Specifically, Etched claims that one 8xSohu server can replace 160 H100 GPUs. If that holds, Sohu chips could significantly reduce the operating costs of AI models while dramatically increasing processing speed.
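
The figures quoted above can be cross-checked with simple arithmetic. The sketch below uses only the numbers Etched has claimed (at least 500,000 tokens per second for an eight-chip server on Llama 70B, and one such server replacing 160 H100 GPUs). The variable names are illustrative, and the per-GPU figure is merely what those two claims imply together, not a measured benchmark.

```python
# Back-of-the-envelope check of Etched's claimed figures (not measured results).
SOHU_CHIPS_PER_SERVER = 8
SERVER_TOKENS_PER_SEC = 500_000  # claimed lower bound on Llama 70B
H100S_REPLACED = 160             # claimed H100 equivalence of one server

# Throughput attributed to a single Sohu chip.
tokens_per_sohu = SERVER_TOKENS_PER_SEC / SOHU_CHIPS_PER_SERVER

# Number of H100 GPUs each Sohu chip is claimed to replace.
h100s_per_sohu = H100S_REPLACED / SOHU_CHIPS_PER_SERVER

# H100 throughput implied by combining the two claims (an inference, not a benchmark).
implied_tokens_per_h100 = SERVER_TOKENS_PER_SEC / H100S_REPLACED

print(f"Tokens/s per Sohu chip:       {tokens_per_sohu:,.0f}")
print(f"H100s replaced per Sohu chip: {h100s_per_sohu:.0f}")
print(f"Implied tokens/s per H100:    {implied_tokens_per_h100:,.0f}")
```

Taken at face value, the claims amount to roughly a 20x per-chip advantage over the H100, which matches the 20-fold speedup Etched cites elsewhere.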

The Logic Behind the Bet

Etched's firm bet on the Transformer architecture rests on its reading of AI development trends. The company believes that scaling is the key to achieving superhuman intelligence. Over the past five years, AI models have surpassed humans on most standardized tests, primarily because of large increases in computing power. For example, Meta used 50,000 times more compute to train the Llama 400B model than OpenAI used to train GPT-2.

However, continuing to scale poses significant challenges. The cost of the next-generation data center could exceed the GDP of a small country. At the current development pace, our hardware, power grids, and financial resources cannot keep up. This is where the Sohu chip comes in.

The Inevitability of Specialized Chips

Etched believes that with Moore's Law slowing down, the only way to improve performance is through specialization. Before the Transformer architecture came to dominate the AI field, many companies were developing flexible AI chips and GPUs to handle a variety of architectures. Now that the market for Transformer inference has grown from about $50 million to billions of dollars and AI model architectures have converged, Etched argues that the emergence of specialized chips is inevitable.

When the training cost of a model reaches $1 billion and the inference cost exceeds $10 billion, even a 1% performance improvement is enough to justify a $50 million to $100 million custom chip project. In fact, the performance advantage of ASICs is far greater than that.
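
The break-even reasoning here can be made explicit. The sketch below assumes, purely for illustration, that a 1% performance improvement translates directly into a 1% saving on inference spend. That linear relationship and the function name are assumptions, not something Etched states.

```python
def inference_savings_usd(inference_cost_usd: int, improvement_percent: int) -> float:
    """Savings if a performance gain reduces inference spend proportionally (assumed)."""
    return inference_cost_usd * improvement_percent / 100

INFERENCE_COST_USD = 10_000_000_000    # $10 billion, the threshold cited in the article
CHIP_PROJECT_LOW_USD = 50_000_000      # low end of the custom chip project cost
CHIP_PROJECT_HIGH_USD = 100_000_000    # high end of the custom chip project cost

savings = inference_savings_usd(INFERENCE_COST_USD, 1)
covers_high_end = savings >= CHIP_PROJECT_HIGH_USD

print(f"Savings from a 1% improvement: ${savings / 1e6:,.0f}M")
print(f"Covers even the high-end project cost: {covers_high_end}")
```

Under that assumption, a 1% gain on $10 billion of inference spend is worth $100 million, which covers even the upper end of the quoted project cost.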

How the Sohu Chip Works

The Sohu chip achieves such high performance because it is optimized specifically for the Transformer architecture. By removing most control flow logic, Sohu can accommodate more mathematical operation units. According to Etched, this allows Sohu's FLOPS utilization to exceed 90%, compared with only around 30% for a GPU running TRT-LLM.

Etched explains that since most of the GPU's area is used to ensure programmability, a design specifically targeting Transformers can accommodate more computing units. In fact, only 3.3% of the 80 billion transistors in the H100 GPU are used for matrix multiplication. By focusing on Transformers, Sohu can accommodate more FLOPS on the chip without reducing precision or using sparsity techniques.
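
To put these figures in perspective, the arithmetic below works out how many H100 transistors a 3.3% share corresponds to, and how much more useful work a chip delivers at 90% utilization than at 30% for the same peak FLOPS. The equal-peak comparison is a simplifying assumption for illustration only, since the article does not give Sohu's peak FLOPS, and the variable names are hypothetical.

```python
H100_TRANSISTORS = 80_000_000_000
MATMUL_SHARE_PER_THOUSAND = 33   # 3.3%, kept as an integer to avoid float rounding

# Transistors devoted to matrix multiplication on the H100, per Etched's figure.
matmul_transistors = H100_TRANSISTORS * MATMUL_SHARE_PER_THOUSAND // 1000

SOHU_UTILIZATION_PERCENT = 90    # claimed lower bound for Sohu
GPU_UTILIZATION_PERCENT = 30     # approximate figure cited for a GPU running TRT-LLM

# Ratio of useful FLOPS at equal peak FLOPS (assumed, not stated in the article).
utilization_gain = SOHU_UTILIZATION_PERCENT / GPU_UTILIZATION_PERCENT

print(f"H100 transistors used for matmul: {matmul_transistors:,}")
print(f"Useful-FLOPS gain from utilization alone: {utilization_gain:.1f}x")
```

Utilization alone would account for about a 3x gain. The rest of the claimed advantage would have to come from fitting more compute units on the die, as the paragraph above describes.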

Software Ecosystem

Although the Sohu chip represents a significant advance at the hardware level, the software ecosystem is equally crucial. Compared with GPUs and TPUs, software development for Sohu is relatively simple because it only needs to support the Transformer architecture. Etched promises to open source all of its software, from drivers to kernels to the serving stack, which should make it much easier for developers to use and optimize the Sohu chip.

Future Outlook

If Etched's bet pays off, the Sohu chip could reshape the AI industry landscape. Many AI applications currently face performance bottlenecks. For example, Gemini takes over 60 seconds to answer a question about a video, coding agents cost more than software engineers and take hours to complete a task, and video models can generate only one frame per second.

Etched expects the Sohu chip to make AI models 20 times faster while significantly reducing costs. That would make real-time video generation, calls, intelligent agents, and search applications possible. Etched has already started accepting applications for early access to its Sohu developer cloud service and is actively recruiting talent to join its team.

The breakthrough in AI computing power could have profound implications, and Etched's Sohu chip is undoubtedly worth our close attention. As more details are disclosed and practical applications unfold, we will be better able to assess the potential of this technology and its impact on the AI field.
