In the digital era, we encounter vast numbers of images every day. What if we could find the image we want using nothing more than a sketch, an artwork, or even a blurry photo? Researchers from Yuan Li's research group at Peking University, together with colleagues from Nanyang Technological University and Tsinghua University's Institute of Automation, have proposed an image retrieval technique that does exactly this: it accepts queries in diverse styles, whether sketches, artworks, or low-resolution images, and matches them to the intended images.
At the core of the work is the team's proposed "Universal Style Retrieval" method. Traditional image retrieval typically takes a text query; the new method also handles queries in other styles, as well as combined queries such as a sketch plus text or an artwork plus text. According to the team, this makes retrieval both more flexible and more accurate.
To achieve this goal, the research team constructed two datasets: DSR (Diverse-Style Retrieval Dataset) and ImageNet-X. DSR includes 10,000 natural images, each paired with queries in four retrieval styles and a corresponding text, while ImageNet-X contains 1 million natural images with annotations in various styles. Together, the two datasets supply training and test data for the new method.
The research team also proposed a framework named FreestyleRet. Existing retrieval models struggle to accommodate queries of different styles; FreestyleRet addresses this by extracting the style of the query image and injecting it into the retrieval model. The framework consists of three main modules: a style extraction module, a style space construction module, and a style-inspired prompt tuning module. Working together, these modules let the retrieval model understand and process queries in a variety of styles.
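To make the three modules more concrete, here is a toy sketch in Python. It is not the authors' implementation, and every name and design choice in it is an assumption made for illustration: style is extracted as a Gram matrix of feature channels (a technique borrowed from neural style transfer), the style space is one mean prototype per style label, and the "prompt" is a softmax-weighted mix of prototypes prepended to the model's input tokens.

```python
import math

def extract_style(feature_map):
    """Style extraction (assumed): flattened Gram matrix of C channels x N activations."""
    n = len(feature_map[0])
    return [sum(a * b for a, b in zip(ci, cj)) / n
            for ci in feature_map for cj in feature_map]

def build_style_space(style_vectors_by_label):
    """Style space construction (assumed): one mean prototype vector per style label."""
    space = {}
    for label, vectors in style_vectors_by_label.items():
        dim = len(vectors[0])
        space[label] = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    return space

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def style_prompt(style_vector, style_space, temperature=0.1):
    """Mix the prototypes, weighted by a softmax over similarity to the query style."""
    labels = sorted(style_space)
    scores = [cosine(style_vector, style_space[l]) / temperature for l in labels]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract max for numerical stability
    weights = [e / sum(exps) for e in exps]
    dim = len(style_vector)
    prompt = [sum(w * style_space[l][d] for w, l in zip(weights, labels))
              for d in range(dim)]
    return prompt, dict(zip(labels, weights))

def inject_prompt(prompt, tokens):
    """Prompt tuning step (assumed): prepend the style prompt as one extra token.
    A real model would first project the prompt to the token dimension."""
    return [prompt] + tokens

# Hypothetical 2-channel feature maps standing in for encoder activations.
sketch_map = [[1.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
art_map = [[0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]]
space = build_style_space({
    "sketch": [extract_style(sketch_map)],
    "art": [extract_style(art_map)],
})

query_map = [[0.9, 0.1, 1.0, 0.0], [0.0, 0.1, 0.0, 0.0]]  # looks sketch-like
prompt, weights = style_prompt(extract_style(query_map), space)
tokens = inject_prompt(prompt, [[0.0] * 4, [0.0] * 4])
print(weights)
```

In this toy setup the sketch-like query receives almost all of its weight from the "sketch" prototype, so the token prepended to the input mostly carries sketch style information. How the actual framework builds its style space and trains its prompts is described in the paper.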
In experiments, FreestyleRet achieved significant improvements in Recall@1 and Recall@5 on both the DSR and ImageNet-X datasets. It also showed good generalization and scalability across the different query styles.
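For readers unfamiliar with the metric, Recall@K is the fraction of queries for which the correct image appears among the top K retrieved results. The sketch below computes it in plain Python on made-up 2-D embeddings; the function names, the toy vectors, and the use of cosine similarity for ranking are assumptions for illustration, not details taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def recall_at_k(query_embs, gallery_embs, ground_truth, k):
    """Fraction of queries whose ground-truth gallery index is in the top-k results."""
    hits = 0
    for query, target in zip(query_embs, ground_truth):
        # Rank gallery indices from most to least similar to the query.
        ranked = sorted(range(len(gallery_embs)),
                        key=lambda i: cosine(query, gallery_embs[i]),
                        reverse=True)
        if target in ranked[:k]:
            hits += 1
    return hits / len(query_embs)

# Toy data: three gallery images and three queries (e.g. sketches of those images).
gallery = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
queries = [[0.9, 0.1], [0.6, 0.8], [0.2, 1.0]]
ground_truth = [0, 2, 2]  # index of the correct gallery image for each query

r1 = recall_at_k(queries, gallery, ground_truth, k=1)
r2 = recall_at_k(queries, gallery, ground_truth, k=2)
print(f"Recall@1 = {r1:.3f}, Recall@2 = {r2:.3f}")
```

The third query ranks its correct image second, so it counts as a miss at K=1 but as a hit at K=2. This is why Recall@5 is never lower than Recall@1 for the same model.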
The paper is available on arXiv, and the related code and datasets have been open-sourced so that interested researchers and developers can explore and build on the work.
The work is an advance for image retrieval research, and it could also bring real convenience to everyday life. Whether we are seeking inspiration, conducting academic research, or simply browsing for fun, tools like this could help us find the images we need more quickly and accurately.
Paper link: https://arxiv.org/pdf/2312.02428