Unlock New Ways to Handle Data! Alibaba Research Team Releases XiYan-SQL, an Efficient Text to SQL Conversion Tool

AIbase基地

Published in AI News · 5 min read · Nov 20, 2024


Natural Language to SQL (NL2SQL) is evolving rapidly and has become an important area of Natural Language Processing (NLP). It converts natural-language questions into Structured Query Language (SQL) statements, so users without a technical background can query complex databases and retrieve the information they need. Beyond opening large databases to exploration across industries, NL2SQL can also improve work efficiency and support decision-making.

However, NL2SQL implementations face a trade-off between query accuracy and adaptability: some methods struggle to stay accurate while also handling different types of databases. Existing solutions often rely on Large Language Models (LLMs), using prompt engineering to generate multiple candidate queries and then selecting the best one, but this increases the computational burden and is poorly suited to real-time applications. Supervised Fine-Tuning (SFT), by contrast, can produce targeted SQL but struggles with cross-domain applications and complex database operations. These limitations highlight the need for new frameworks.

The research team at Alibaba has released XiYan-SQL, an NL2SQL framework that uses a multi-generator ensemble strategy to combine the advantages of prompt engineering and SFT. A key innovation of XiYan-SQL is M-Schema, a semi-structured schema representation that improves the system's understanding of a database's hierarchical structure, including data types, primary keys, and example values, and thereby its ability to generate accurate, contextually relevant SQL queries.
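To make the idea of a semi-structured schema representation concrete, here is a minimal Python sketch that serializes a table's columns, data types, primary keys, and example values into a text block that could be placed in an LLM prompt. The class names, the `render_schema` function, and the output layout are all assumptions made for illustration; this is not the official M-Schema syntax, which is defined in the XiYan-SQL project itself.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    dtype: str
    primary_key: bool = False
    examples: list = field(default_factory=list)


@dataclass
class Table:
    name: str
    columns: list


def render_schema(db_id: str, tables: list) -> str:
    """Serialize tables into a semi-structured text block for a prompt.

    The layout is hypothetical and only illustrates the kind of
    information (types, keys, example values) such a format carries.
    """
    lines = [f"[DB_ID] {db_id}", "[Schema]"]
    for table in tables:
        lines.append(f"# Table: {table.name}")
        for col in table.columns:
            parts = [col.name, col.dtype]
            if col.primary_key:
                parts.append("Primary Key")
            if col.examples:
                parts.append(f"Examples: {col.examples}")
            lines.append("(" + ", ".join(parts) + ")")
    return "\n".join(lines)


# Hypothetical example database with a single table.
singer = Table("singer", [
    Column("singer_id", "INTEGER", primary_key=True, examples=[1, 2]),
    Column("name", "TEXT", examples=["Joe Sharp"]),
])
schema_text = render_schema("concert_singer", [singer])
print(schema_text)
```

Including example values alongside types and keys gives the model a hint about how data is actually stored, for instance whether a name column holds full names or surnames only.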

XiYan-SQL employs a three-stage process to generate and optimize SQL queries.

First, schema linking identifies the relevant database elements, cutting redundant information and focusing on key structures. Next, generators based on In-Context Learning (ICL) and SFT produce SQL candidates. Finally, error correction models refine the candidates and selection models choose the best query. XiYan-SQL integrates these steps into a single efficient pipeline that surpasses traditional methods.
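The three stages can be sketched in a few lines of Python. This is only an illustration under loud assumptions: keyword matching stands in for schema linking, plain functions returning fixed strings stand in for the ICL and SFT generators, and "discard queries that fail to execute, then take a majority vote on results" stands in for the error correction and selection models. None of the function names below come from XiYan-SQL.

```python
import sqlite3
from collections import Counter


def link_schema(question, schema):
    """Stage 1 stand-in: keep tables whose name or columns appear in the question."""
    words = set(question.lower().replace("?", "").split())
    return {
        table: cols for table, cols in schema.items()
        if table in words or words & set(cols)
    }


def generate_candidates(question, linked_schema, generators):
    """Stage 2: collect one SQL candidate from each generator."""
    return [gen(question, linked_schema) for gen in generators]


def select_best(candidates, conn):
    """Stage 3 stand-in: drop failing queries, then majority-vote on results."""
    results = {}
    for sql in candidates:
        try:
            results[sql] = tuple(conn.execute(sql).fetchall())
        except sqlite3.Error:
            continue  # a real system would attempt to repair the query here
    if not results:
        return None
    winning_rows, _ = Counter(results.values()).most_common(1)[0]
    return next(sql for sql, rows in results.items() if rows == winning_rows)


# Hypothetical in-memory database and schema description.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE singer (singer_id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    CREATE TABLE concert (concert_id INTEGER PRIMARY KEY, venue TEXT);
    INSERT INTO singer VALUES (1, 'Joe', 52), (2, 'Ann', 29);
""")
schema = {
    "singer": ["singer_id", "name", "age"],
    "concert": ["concert_id", "venue"],
}

# Fixed-output stand-ins for LLM-backed generators.
generators = [
    lambda q, s: "SELECT COUNT(*) FROM singer",
    lambda q, s: "SELECT COUNT(singer_id) FROM singer",
    lambda q, s: "SELECT COUNT(*) FROM singers",  # fails: no such table
]

question = "How many singer rows have an age?"
linked = link_schema(question, schema)
best = select_best(generate_candidates(question, linked, generators), conn)
print(linked)
print(best)
```

Voting on execution results rather than on SQL text lets differently phrased but equivalent queries support each other, which is one common motivation for generating several candidates in the first place.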

In benchmarks, XiYan-SQL performed strongly across multiple standard test sets, achieving an execution accuracy of 89.65% on the Spider test set and significantly outperforming previous top models.
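Execution accuracy measures how often a predicted query returns the same result as the reference ("gold") query when both are run against the database. The sketch below shows the idea with Python's built-in `sqlite3` module. It is a simplified illustration, not the official Spider evaluation script: the helper names are hypothetical, and results are compared as order-insensitive multisets of rows, which is an assumption of this example.

```python
import sqlite3
from collections import Counter


def run(conn, sql):
    """Return result rows as an order-insensitive multiset, or None on error."""
    try:
        return Counter(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def execution_accuracy(conn, pairs):
    """Fraction of (predicted, gold) SQL pairs whose execution results match."""
    correct = 0
    for predicted, gold in pairs:
        gold_rows = run(conn, gold)
        if gold_rows is not None and run(conn, predicted) == gold_rows:
            correct += 1
    return correct / len(pairs)


# Hypothetical toy database and predictions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE singer (singer_id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    INSERT INTO singer VALUES (1, 'Joe', 52), (2, 'Ann', 29);
""")

pairs = [
    # Different SQL text, same result: counted as correct.
    ("SELECT name FROM singer WHERE age > 30",
     "SELECT name FROM singer WHERE age >= 31"),
    # Valid SQL, wrong result: counted as incorrect.
    ("SELECT name FROM singer",
     "SELECT name FROM singer WHERE age > 30"),
    # Invalid column name: execution fails, counted as incorrect.
    ("SELECT nam FROM singer",
     "SELECT name FROM singer"),
    ("SELECT COUNT(*) FROM singer",
     "SELECT COUNT(singer_id) FROM singer"),
]

accuracy = execution_accuracy(conn, pairs)
print(f"Execution accuracy: {accuracy:.2%}")
```

Because the metric compares results rather than query text, two syntactically different queries can both count as correct as long as they return the same rows.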

XiYan-SQL also adapts to non-relational datasets, reaching an accuracy of 41.20% on the NL2GQL test set. These results indicate that XiYan-SQL is both flexible and accurate across a range of scenarios.

GitHub: https://github.com/XGenerationLab/XiYan-SQL

Highlights:

🌟 Innovative schema representation: M-Schema enhances the understanding of database hierarchies, improving query accuracy.

📊 Advanced candidate generation: XiYan-SQL utilizes multiple generators to produce diverse SQL candidates, improving query quality.

✅ Superior adaptability: Through benchmarking, XiYan-SQL demonstrates outstanding performance across various databases, setting a new standard for NL2SQL frameworks.


This article is from AIbase Daily
