Alibaba recently launched DS Assistant, an AI data science assistant that automates the entire process from data exploration to model evaluation, making data science tasks simpler and more efficient.

DS Assistant is built on Modelscope-Agent, Alibaba's open-source agent framework, which is known for its rich tool ecosystem and flexible modular design. With DS Assistant, users without a deep background in data science can tackle complex data science problems.


The core advantage of DS Assistant lies in its automated workflow. Users only need to provide their requirements, and DS Assistant will automatically perform exploratory data analysis, data preprocessing, feature engineering, model training, and evaluation. This not only enhances efficiency but also lowers the entry barrier for data science tasks.

The Modelscope-Agent framework serves as the robust backbone behind DS Assistant, featuring:

Support for mainstream open-source models, served through inference backends such as vLLM and Ollama;

Provision of RAG components for quick integration with knowledge bases;

A rich tool ecosystem, supporting Modelscope community models and langchain tools.

DS Assistant employs the emerging plan-and-execute framework, which efficiently completes complex tasks through clear planning and execution steps. Its workflow includes task planning, sub-task scheduling, task execution, and result consolidation, significantly enhancing the efficiency and controllability of task execution.
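The four stages named above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not DS Assistant's actual code: the function names (plan, execute, run) and the hard-coded task list are hypothetical, and a real planner and executor would call an LLM rather than return fixed strings.

```python
from typing import Callable

# Hypothetical names throughout: a sketch of plan-and-execute, not the DS Assistant code.

def plan(request: str) -> list[str]:
    """Task planning. A real planner would ask an LLM; here the plan is hard-coded."""
    return ["explore data", "preprocess data", "train model", "evaluate model"]

def execute(task: str, history: dict[str, str]) -> str:
    """Task execution. A real executor would generate and run code for the task."""
    return f"done: {task} (saw {len(history)} earlier results)"

def run(
    request: str,
    planner: Callable[[str], list[str]] = plan,
    executor: Callable[[str, dict[str, str]], str] = execute,
) -> dict[str, str]:
    results: dict[str, str] = {}
    # Sub-task scheduling: take the tasks one at a time, in plan order.
    for task in planner(request):
        # Each task can read the results of the tasks that ran before it.
        results[task] = executor(task, results)
    # Result consolidation: return every task's outcome in one place.
    return results
```

Calling run("predict the target column") returns one entry per planned task, in plan order. Keeping the planner and executor as swappable arguments mirrors the separation between planning and execution that the approach relies on.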

In terms of system architecture, DS Assistant consists of four main modules: DS Assistant itself acts as the system's brain, responsible for overall scheduling; the Plan module generates task lists and performs topological sorting; the Execution module handles specific execution and result saving; the Memory management module records intermediate task execution results.
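To make the Plan module's topological sorting and the Memory module's role concrete, here is a small sketch using Python's standard graphlib module. The task names, the dependency map, and the helper functions schedule and run_plan are assumptions made for illustration; the source does not describe DS Assistant's internal data structures.

```python
from graphlib import TopologicalSorter

# Hypothetical task list: each task maps to the set of tasks it depends on.
tasks = {
    "eda": set(),
    "preprocess": {"eda"},
    "feature_engineering": {"preprocess"},
    "train": {"feature_engineering"},
    "evaluate": {"train"},
}

def schedule(task_deps: dict[str, set[str]]) -> list[str]:
    """Plan step: order tasks so every task comes after its dependencies."""
    return list(TopologicalSorter(task_deps).static_order())

def run_plan(task_deps: dict[str, set[str]]) -> dict[str, str]:
    """Execution step with a dict standing in for the memory module."""
    memory: dict[str, str] = {}
    for task in schedule(task_deps):
        # Look up the stored results of the tasks this one depends on.
        upstream = {dep: memory[dep] for dep in task_deps[task]}
        # A real executor would run generated code; here we only record a note.
        memory[task] = f"{task} finished using {sorted(upstream)}"
    return memory
```

Because the sort guarantees that dependencies run first, every lookup in memory succeeds. A cyclic plan would raise graphlib.CycleError, which is one reason to sort the plan before executing it.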

In practical cases, DS Assistant has been successfully applied to the ICR - Identifying Age-Related Conditions competition task on Kaggle. Through automated data processing and analysis, DS Assistant not only increased the success rate of task execution but also generated detailed process records for users.

DS Assistant was evaluated on ML-Benchmark using three metrics: Normalized Performance Score (NPS), total time, and total token count. On some complex data science tasks, it outperformed the open-source state of the art (SOTA).

The application value of DS Assistant lies in:

For users unfamiliar with the data analysis process, DS Assistant offers a quick way to understand data processing ideas and technical points;

For users familiar with the data analysis process, DS Assistant provides detailed method descriptions for experimental comparison;

For everyone, DS Assistant offers a quick, automated way to gain a deeper understanding of the file at hand.

In the future, DS Assistant will be optimized in three directions to further enhance the user experience: increasing the task execution success rate, supporting conversational interactive tasks, and supporting batch processing of multiple files for the same task.

This tool from Alibaba lowers the entry barrier to data science and gives data scientists a powerful automated assistant, which could change how data science work is done.

Official repository: https://github.com/modelscope/modelscope-agent/blob/master/examples/agents/data_science_assistant.ipynb

Reference: https://blog.langchain.dev/planning-agents/