2024-08-15 14:53:25 · AIbase · 11.1k
OpenAI Launches SWE-bench Verified: Enhancing AI Software Engineering Capability Assessment
OpenAI has released SWE-bench Verified, a benchmark intended to assess AI performance on software engineering tasks more accurately. It addresses limitations of the original SWE-bench: overly strict unit tests, ambiguous problem descriptions, and development environments that are difficult to set up. The new benchmark improves the consistency and reliability of evaluation by running it in a containerized Docker environment, and model scores rise markedly under it. GPT-4o solved 33.2% of the samples under the new benchmark, while the best open-source agent framework has...
2023-08-10 10:09:18 · AIbase · 279
ChatGPT Answers More Than Half of Software Engineering Questions Incorrectly
A new study has found that ChatGPT answers more than half of software engineering questions incorrectly: 52% of its answers were wrong. ChatGPT performs better on general questions, but 77% of its answers are still overly verbose.