Artificial intelligence company Anthropic has announced major updates to its Claude 3.5 model family: an upgraded Claude 3.5 Sonnet and a new Claude 3.5 Haiku. Alongside them comes a beta capability that lets Claude operate a computer much as a person would, performing basic actions such as keyboard input and mouse clicks in order to use the applications installed on the machine.
Significant coding capability improvements, surpassing the OpenAI o1-preview model
The upgraded Claude 3.5 Sonnet improves across the board, with the largest gains in coding. Its score on SWE-bench Verified rose from 33.4% to 49.0%, higher than any other publicly available model, including OpenAI's o1-preview.
Its results on TAU-bench, a benchmark of agentic tool use, also improved in both the retail and airline domains. Anthropic says these gains come at the same price and speed as the previous version.
Early customer feedback suggests the upgraded Claude 3.5 Sonnet is a substantial step forward for AI-assisted coding. GitLab, for example, tested the model on DevSecOps tasks and reported markedly stronger reasoning with no added latency.
Claude 3.5 Haiku is the next generation of Anthropic's fastest model. At the same cost and similar speed as its predecessor, it surpasses Claude 3 Opus on many intelligence benchmarks and performs especially well on coding tasks. Its low latency and more precise instruction following make it well suited to user-facing products and personalized experiences.
Operating computers the way people do
The newly introduced computer use capability is the most ambitious part of the release. Anthropic says the goal is not to build task-specific tools for Claude but to teach it general computer skills, so that it can work with the standard tools and software programs people already use. Developers can apply the capability to automate repetitive processes, build and test software, and conduct open-ended research.
Claude's computer skills are still far from polished. Actions that people perform without thinking, such as scrolling and dragging, remain difficult for it. On the safety side, Anthropic has developed new classifiers to detect when computer use is being put to harmful ends.
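For developers wondering what "simulating keyboard inputs and mouse clicks" could look like in practice, the following is a minimal sketch of the harness side: a small dispatcher that takes actions requested by a model and carries them out on a desktop. Everything here is an assumption made for illustration, including the `FakeDesktop` class, the `run_action` function, and the action names and fields. It is not Anthropic's API or tool schema, and it records actions in a log instead of touching a real screen.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDesktop:
    """Hypothetical stand-in for a real screen, keyboard, and mouse controller."""
    log: list = field(default_factory=list)

    def screenshot(self) -> str:
        self.log.append("screenshot")
        return "<image bytes would go here>"

    def click(self, x: int, y: int) -> None:
        self.log.append(f"click({x}, {y})")

    def type_text(self, text: str) -> None:
        self.log.append(f"type({text!r})")


def run_action(desktop: FakeDesktop, action: dict) -> str:
    """Carry out one requested action and return a short result string."""
    kind = action.get("action")
    if kind == "screenshot":
        return desktop.screenshot()
    if kind == "click":
        x, y = action["coordinate"]
        desktop.click(x, y)
        return "clicked"
    if kind == "type":
        desktop.type_text(action["text"])
        return "typed"
    # Anything the harness does not recognize is refused, not guessed at.
    return f"unsupported action: {kind}"


# Hypothetical sequence of actions a model might request (assumed format).
desktop = FakeDesktop()
requested = [
    {"action": "screenshot"},
    {"action": "click", "coordinate": [640, 360]},
    {"action": "type", "text": "Golden Gate Bridge sunrise"},
    {"action": "drag"},
]
results = [run_action(desktop, a) for a in requested]
```

In a real integration the result of each action, typically a fresh screenshot, would be sent back to the model so it can decide on the next step. Refusing unknown actions outright keeps the harness from doing anything the developer has not explicitly allowed.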
Jared Kaplan, Chief Science Officer at Anthropic, said in an interview: "We are entering a new era where AI can use all the tools you use as an individual to accomplish tasks." The update marks a significant step in Anthropic's effort to move commercial AI models beyond the traditional chat interface and toward full-fledged "AI agents."
In one demonstration, Claude was asked to plan a sunrise trip to the Golden Gate Bridge for a friend. It opened a web page, searched Google for a suitable viewing spot, and added the itinerary to a calendar application. The performance was impressive, but Wired noted that Claude did not supply further details, such as how to get to the destination.
In another demonstration, Claude was asked to build a simple website. It created one in Microsoft's Visual Studio Code and started a local server to test it. It ran into minor errors along the way, but fixed the code after being prompted.
The upgraded Claude 3.5 Sonnet is now available to all users. Starting today, developers can build with the computer use beta on the Anthropic API, Amazon Bedrock, and Google Cloud's Vertex AI. The new Claude 3.5 Haiku will be released later this month.
Key Points:
🌟 Claude 3.5 Sonnet has been upgraded and Claude 3.5 Haiku introduced, with notable improvements in coding capabilities.
💻 The new computer use capability, currently in beta, lets Claude operate a computer the way a person does, opening up more possibilities.
🔒 Letting an AI assistant operate a computer raises safety concerns, and Anthropic emphasizes gradual observation and improvement to address them.