SWE-bench Verified
AI model assessment tool for software engineering capabilities
Premium New Product, Programming, AI Assessment, Software Engineering
SWE-bench Verified is a human-validated subset of SWE-bench, released by OpenAI, that is designed to assess more reliably how well AI models solve real-world software issues. Each task gives the model a code repository and an issue description and asks it to generate a patch that resolves the described problem. The subset was developed to make evaluations of a model's ability to perform software engineering tasks autonomously more accurate, and OpenAI uses it within its Preparedness Framework to track the Medium risk level for model autonomy.
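The task format described above can be illustrated with a minimal sketch. It assumes the test-based scoring SWE-bench is commonly described as using: a patch counts as resolving an issue when the tests tied to the issue pass and previously passing tests still pass. The listing itself does not spell this out, and the names `Task`, `is_resolved`, and `resolve_rate` are hypothetical, not part of any official harness.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """One benchmark item (hypothetical schema, for illustration only)."""
    repo: str                 # repository the issue was filed against
    problem_statement: str    # the issue description shown to the model
    fail_to_pass: frozenset   # tests that fail before the fix and should pass after it
    pass_to_pass: frozenset   # tests that pass before the fix and must keep passing


def is_resolved(task, passing_tests):
    """Return True if the target tests pass and no existing test regresses."""
    passing = set(passing_tests)
    return task.fail_to_pass <= passing and task.pass_to_pass <= passing


def resolve_rate(results):
    """Fraction of tasks resolved; `results` is a list of booleans."""
    return sum(results) / len(results) if results else 0.0
```

In this sketch, a model's score would be the resolve rate over all tasks, where each boolean comes from running the repository's tests after applying the model's patch.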
SWE-bench Verified Visits Over Time
Monthly Visits: 546,526,496
Bounce Rate: 56.81%
Pages per Visit: 2.1
Visit Duration: 00:01:39