Ant Group, in collaboration with Peking University, has released DevOps-Eval, a large language model evaluation benchmark designed specifically for the DevOps domain. The benchmark comprises 4,850 multiple-choice questions spanning eight categories: planning, coding, building, testing, releasing, deploying, operations, and monitoring. It has also been tailored to AIOps tasks, covering challenges such as log parsing, time series anomaly detection, time series classification, and root cause analysis. The evaluation results show that the tested models score relatively close to one another. Ant Group has stated that it will continue to improve the benchmark, enrich the evaluation dataset with a particular focus on AIOps, and expand the number of models evaluated.
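To illustrate how a multiple-choice benchmark of this kind is typically scored, the minimal sketch below computes per-category accuracy over hypothetical question records. The field names (`category`, `answer`, `prediction`) and the sample data are assumptions made for illustration only, not DevOps-Eval's actual schema or evaluation API.

```python
from collections import defaultdict

# Hypothetical multiple-choice records; DevOps-Eval's real data format may differ.
records = [
    {"category": "coding",     "answer": "B", "prediction": "B"},
    {"category": "coding",     "answer": "D", "prediction": "A"},
    {"category": "monitoring", "answer": "C", "prediction": "C"},
]

def per_category_accuracy(records):
    """Return {category: accuracy} for multiple-choice predictions."""
    correct, total = defaultdict(int), defaultdict(int)
    for r in records:
        total[r["category"]] += 1
        correct[r["category"]] += int(r["prediction"] == r["answer"])
    return {c: correct[c] / total[c] for c in total}

print(per_category_accuracy(records))
# e.g. {'coding': 0.5, 'monitoring': 1.0}
```

Aggregating accuracy per category in this way is what makes it possible to compare models across the eight DevOps stages and the AIOps subtasks, rather than reporting a single overall score.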