SWE-bench Verified is a human-validated subset of SWE-bench, released by OpenAI, that is intended to assess more reliably how well AI models solve real-world software issues. Each task gives the model a code repository and an issue description and asks it to generate a patch that resolves the described problem. The benchmark was developed to make evaluations of a model's ability to perform software engineering tasks autonomously more accurate, and it is one of the evaluations OpenAI uses under its Preparedness Framework, where it helps track the Medium risk level for model autonomy.
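
The pass/fail criterion behind "a patch that resolves the problem" can be sketched in a few lines of Python. This is a minimal illustration, not the official evaluation harness: the `TaskInstance` class and the `is_resolved` and `resolve_rate` functions are hypothetical names, and the test results are assumed to have been collected already by running the repository's test suite after applying the model's patch. The two test lists mirror the FAIL_TO_PASS and PASS_TO_PASS fields of the SWE-bench dataset: tests that the fix must turn green, and tests that must stay green.

```python
from dataclasses import dataclass


@dataclass
class TaskInstance:
    """Hypothetical, simplified view of one benchmark task."""
    instance_id: str
    repo: str
    problem_statement: str
    fail_to_pass: list[str]  # tests that fail before the fix and must pass after it
    pass_to_pass: list[str]  # tests that already pass and must not be broken


def is_resolved(task: TaskInstance, test_results: dict[str, bool]) -> bool:
    """Return True only if every required test passed after the patch.

    `test_results` maps a test name to True (passed) or False (failed).
    A required test that is missing from the results counts as a failure.
    """
    required = task.fail_to_pass + task.pass_to_pass
    return all(test_results.get(name, False) for name in required)


def resolve_rate(outcomes: list[bool]) -> float:
    """Fraction of tasks resolved; 0.0 for an empty list."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


if __name__ == "__main__":
    # Made-up task and results, for illustration only.
    task = TaskInstance(
        instance_id="demo__repo-1",
        repo="demo/repo",
        problem_statement="Crash on empty input",
        fail_to_pass=["test_empty_input"],
        pass_to_pass=["test_basic"],
    )
    results = {"test_empty_input": True, "test_basic": True}
    print(is_resolved(task, results))
```

Under this criterion a patch that fixes the reported issue but breaks an existing test does not count as resolved, which is why both test lists are checked together.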