Bug fixing is a perennial headache in software development. ByteDance's Doubao large model team has now released Multi-SWE-bench, the first multilingual software engineering (SWE) benchmark dataset, built to evaluate and improve large models' ability to automatically fix code errors.
Compared with previous monolingual datasets, Multi-SWE-bench significantly expands the scope of evaluation. In addition to Python, it covers seven other mainstream programming languages: Java, Go, Rust, C, C++, TypeScript, and JavaScript, making it a genuinely "full-stack" engineering benchmark from which developers working in any of these languages can benefit.
The dataset's construction process is also noteworthy. Multi-SWE-bench contains 1,632 real-world bug-fixing instances, all sourced from GitHub issue reports. To ensure quality, each instance was screened through standardized testing and review by professional developers, so that every sample comes with a clear problem description, a valid fix patch, and a reproducible test environment.
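To make that structure concrete, the sketch below shows what a single instance of this kind could look like in code. The class name and field names (repo, issue_description, fix_patch, test_command) are illustrative assumptions based on the description above, not the dataset's published schema.

```python
# Illustrative sketch only: the fields below are assumptions about what a
# Multi-SWE-bench-style instance could contain, inferred from the article's
# description (issue report, fix patch, reproducible test setup). They are
# not the dataset's actual published schema.
from dataclasses import dataclass


@dataclass
class SweBenchInstance:
    repo: str               # GitHub repository the issue was filed against
    language: str            # one of the eight covered languages, e.g. "Go"
    issue_description: str   # the original issue report text
    fix_patch: str           # the reviewed patch (unified diff) that resolves the issue
    test_command: str        # command run inside the reproducible environment to verify the fix


# Hypothetical placeholder record; real instances are drawn from actual GitHub issues.
example = SweBenchInstance(
    repo="example-org/example-repo",
    language="Go",
    issue_description="Panic when parsing an empty config file",
    fix_patch="--- a/config.go\n+++ b/config.go\n...",
    test_command="go test ./...",
)
```

In an evaluation setting, a model would be given the issue description, asked to produce its own patch, and judged by whether the instance's tests pass after the patch is applied in the reproducible environment.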
The Doubao team hopes the new dataset will enable systematic evaluation of large models across mainstream programming languages and real-world code environments, pushing their automatic programming capabilities in a more practical, engineering-oriented direction. Such progress would not only save developers time but also improve the efficiency and quality of software development.
In practice, bug fixing is not just a technical problem; it is also a major factor in project schedules and team morale. The launch of Multi-SWE-bench may therefore prove a crucial step toward automated software engineering.
This new dataset from ByteDance marks a significant step forward in automated code repair and promises to make life easier for developers worldwide.