Recently, two Chinese researchers from the Georgia Institute of Technology and NVIDIA proposed a new fine-tuning framework called RankRAG. The framework greatly simplifies the otherwise complex RAG pipeline by enabling a single Large Language Model (LLM) to handle both context ranking and answer generation, yielding a significant performance improvement.
RAG (Retrieval-Augmented Generation) is a widely used technique in LLM deployment, particularly suited to text-generation tasks that require a large amount of factual knowledge. A typical RAG pipeline works as follows: a dense retriever encodes the query and fetches the top-k text segments from an external database, and the LLM then reads those segments and generates the answer. Although this process is widely used, it has limitations, notably the choice of k. If k is too large, even LLMs that support long contexts struggle to process the input efficiently; if it is too small, a high-recall retrieval mechanism is required, and existing retrieval and ranking models each have their own shortcomings.
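The standard pipeline described above can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: the bag-of-words "dense" encoder and the placeholder `generate` function are assumptions standing in for a real embedding model and a real LLM call.

```python
from collections import Counter
import math

def embed(text):
    # Toy stand-in for a dense encoder: bag-of-words term counts.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_top_k(query, corpus, k):
    # Score every passage against the query and keep the k best.
    q = embed(query)
    ranked = sorted(corpus, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]

def generate(query, contexts):
    # Placeholder for the LLM's read-and-generate step.
    return f"Answer to '{query}' based on {len(contexts)} passages."

corpus = [
    "RankRAG fine-tunes a single LLM for ranking and generation.",
    "Dense retrievers encode queries and passages into vectors.",
    "The weather in Atlanta is warm in July.",
]
contexts = retrieve_top_k("How does RankRAG fine-tune an LLM?", corpus, k=2)
print(generate("How does RankRAG fine-tune an LLM?", contexts))
```

The k in `retrieve_top_k` is exactly the knob discussed above: a larger k passes more context to the LLM at higher cost, while a smaller k demands more of the retriever.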
To address these issues, the RankRAG framework proposes a new approach: through instruction fine-tuning, it extends the LLM's capabilities so that the same model ranks the retrieved contexts itself before generating the answer. Experimental results show that this method not only improves data efficiency but also significantly enhances model performance. In particular, the RankRAG-tuned Llama3 8B and 70B models outperform the ChatQA-1.5 8B and ChatQA-1.5 70B models, respectively, on multiple general benchmarks and knowledge-intensive biomedical benchmarks.
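The core idea — one model first reranks the retriever's candidates, then generates from only the best of them — could be sketched like this. Here `llm_score_relevance` is a hypothetical stand-in for prompting the fine-tuned LLM to judge a passage's relevance (a simple word-overlap proxy is used so the sketch runs), and the final answer string is a placeholder for actual generation.

```python
def llm_score_relevance(question, passage):
    # Stand-in for asking the fine-tuned LLM "Is this passage relevant
    # to the question?" and reading off a score. Proxy: Jaccard overlap.
    q = set(question.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / len(q | p) if q | p else 0.0

def rank_rag_answer(question, retrieved, keep_k):
    # Step 1: the same LLM reranks the retriever's candidates ...
    reranked = sorted(retrieved,
                      key=lambda p: llm_score_relevance(question, p),
                      reverse=True)
    # Step 2: ... then generates from only the top keep_k contexts.
    contexts = reranked[:keep_k]
    return f"Answer generated from {keep_k} of {len(retrieved)} passages.", contexts

retrieved = [
    "The capital of France is Paris.",
    "Paris hosted the 2024 Olympics.",
    "Bananas are rich in potassium.",
]
answer, contexts = rank_rag_answer("What is the capital of France?",
                                   retrieved, keep_k=2)
print(answer)
```

The design point is that the reranking step lets the retriever cast a wide net (large candidate pool, high recall) while the generator still sees only a small, high-precision context window.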
The RankRAG framework's recipe consists of two stages of instruction fine-tuning. The first stage is supervised fine-tuning (SFT) on a mixture of datasets to improve the LLM's instruction-following ability. The second stage's fine-tuning data blends various QA data, retrieval-augmented QA data, and context-ranking data, further enhancing the LLM's ranking and retrieval-augmented generation capabilities.
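The stage-two data blend could be assembled along these lines. The field names, prompt templates, and mixing scheme below are illustrative assumptions, not the paper's actual data format.

```python
import random

def format_qa(ex):
    # Plain QA: question in, answer out.
    return {"instruction": ex["question"], "input": "", "output": ex["answer"]}

def format_retrieval_qa(ex):
    # Retrieval-augmented QA: retrieved contexts accompany the question.
    return {"instruction": ex["question"],
            "input": "\n".join(ex["contexts"]),
            "output": ex["answer"]}

def format_ranking(ex):
    # Context ranking: the model judges whether a passage is relevant.
    return {"instruction": f"Is this passage relevant to: {ex['question']}?",
            "input": ex["passage"],
            "output": "Yes" if ex["relevant"] else "No"}

def build_stage_two(qa, retrieval_qa, ranking, seed=0):
    # Mix the three sources into one instruction-tuning dataset.
    data = ([format_qa(e) for e in qa]
            + [format_retrieval_qa(e) for e in retrieval_qa]
            + [format_ranking(e) for e in ranking])
    random.Random(seed).shuffle(data)
    return data
```

Casting the ranking task into the same instruction/input/output shape as QA is what lets a single fine-tuning run teach one model both behaviors.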
In experiments, RankRAG consistently outperformed ChatQA-1.5, the current open-source SOTA model, on nine general-domain datasets. On particularly challenging QA tasks, such as long-tail QA and multi-hop QA, RankRAG improved performance by more than 10% over ChatQA-1.5.
In summary, RankRAG not only excels at ranking and generation tasks but also shows strong adaptability on Mirage, a biomedical RAG benchmark. Even without fine-tuning on biomedical data, RankRAG outperforms many domain-specific open-source models on medical QA tasks.
With the introduction and continued improvement of the RankRAG framework, there is good reason to believe that AI-assisted knowledge work will keep getting better. Whether for independent developers or researchers, this innovative framework can inspire new ideas and possibilities, driving the development of both the technology and its applications.
Paper address: https://arxiv.org/abs/2407.02485