Ant Group Releases Benchmark for Large Model Evaluation in the DevOps Field
站长之家
Ant Group, in collaboration with Peking University, has released DevOps-Eval, a large language model evaluation benchmark designed specifically for the DevOps domain. The benchmark comprises 4,850 multiple-choice questions spanning eight categories: planning, coding, building, testing, releasing, deploying, operations, and monitoring. It also includes samples tailored to AIOps tasks, covering challenges such as log parsing, time series anomaly detection, time series classification, and root cause analysis. The published evaluation results show that the tested models score relatively close to one another. Ant Group says it will continue to improve the benchmark by enriching the evaluation dataset, with a particular focus on the AIOps field, and by expanding the number of models evaluated.
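To make the evaluation setup concrete, the sketch below shows how accuracy on a multiple-choice benchmark of this kind is typically computed, both overall and per category. The data schema, field names, and `predict` stub are illustrative assumptions for this article, not the actual DevOps-Eval format or harness.

```python
# Minimal sketch of scoring multiple-choice items in the style of DevOps-Eval.
# The item schema and the predict() stub are assumptions for illustration only.
from collections import defaultdict

# Hypothetical sample items: each has a category, question, options, and gold answer.
items = [
    {"category": "monitoring",
     "question": "Which tool scrapes metrics over HTTP?",
     "options": {"A": "Prometheus", "B": "Git", "C": "Maven", "D": "Ansible"},
     "answer": "A"},
    {"category": "building",
     "question": "Which file defines a Maven build?",
     "options": {"A": "Dockerfile", "B": "pom.xml", "C": "Jenkinsfile", "D": "Makefile"},
     "answer": "B"},
]

def predict(item):
    """Placeholder for a model call; a real harness would prompt an LLM with the
    question and options, then parse the chosen option letter from its output."""
    return "A"  # stand-in prediction

def evaluate(items):
    """Compute overall and per-category accuracy over the answered items."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for item in items:
        total[item["category"]] += 1
        if predict(item) == item["answer"]:
            correct[item["category"]] += 1
    per_category = {c: correct[c] / total[c] for c in total}
    overall = sum(correct.values()) / sum(total.values())
    return overall, per_category

if __name__ == "__main__":
    overall, per_category = evaluate(items)
    print(f"overall accuracy: {overall:.2%}")
    for cat, acc in per_category.items():
        print(f"  {cat}: {acc:.2%}")
```

In a benchmark with several thousand questions across eight categories, per-category breakdowns like this are what allow narrow score gaps between models to be compared meaningfully.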
© AIbase 2024. Source: https://www.aibase.com/news/2762