Machine learning has long permeated online services, and online shopping is one of its most successful application areas. In recent years, it has been applied to a wide range of shopping tasks, including understanding user queries, modeling browsing history, analyzing reviews, and extracting product attributes. To support this work, many benchmarks have emerged that lower the barrier for researchers and engineers to develop and evaluate new solutions for real online shopping tasks.

However, existing models and benchmarks are often tailored to specific tasks and cannot fully capture the complexity of online shopping. Large Language Models (LLMs), with their multi-task and few-shot learning capabilities, have the potential to revolutionize the online shopping experience by reducing task-specific engineering work and offering users interactive conversations. Despite this potential, LLMs also face unique challenges in the online shopping domain, such as domain-specific shopping concepts, implicit knowledge, and heterogeneous user behaviors.

To address these challenges, researchers at Amazon proposed Shopping MMLU, a multi-task online shopping benchmark based on real Amazon data. Shopping MMLU includes 57 tasks covering four major shopping skills: concept understanding, knowledge reasoning, user behavior alignment, and multilingual capabilities, thus providing a comprehensive assessment of the potential of large language models as general shopping assistants.

Shopping MMLU is more than an ordinary test. Because its 57 tasks are drawn from real Amazon shopping data, it aims to assess whether an AI assistant can understand your needs like a real shopping guide and help you find the products you want.
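
To make the "many tasks, four skills" structure concrete, here is a minimal Python sketch of how per-task scores could be rolled up into per-skill scores. The task names, the numbers, and the choice of equal-weight averaging are all assumptions made for illustration; they are not the benchmark's actual tasks, results, or scoring rule.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-task results: (task name, skill, accuracy in [0, 1]).
# Task names and numbers are invented for illustration only.
RESULTS = [
    ("attribute_extraction", "concept understanding", 0.80),
    ("brand_recognition", "concept understanding", 0.60),
    ("numeric_reasoning", "knowledge reasoning", 0.50),
    ("next_query_prediction", "user behavior alignment", 0.40),
    ("multilingual_qa", "multilingual capabilities", 0.70),
]


def skill_scores(results):
    """Average the task scores within each skill."""
    by_skill = defaultdict(list)
    for _task, skill, score in results:
        by_skill[skill].append(score)
    return {skill: mean(scores) for skill, scores in by_skill.items()}


def overall_score(results):
    """Average the per-skill scores so that each skill counts equally."""
    return mean(list(skill_scores(results).values()))


if __name__ == "__main__":
    for skill, score in skill_scores(RESULTS).items():
        print(f"{skill}: {score:.2f}")
    print(f"overall: {overall_score(RESULTS):.3f}")
```

Averaging per skill first keeps a skill with many tasks from dominating the overall number, which is one reasonable choice among several.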

The researchers at Amazon tested over 20 existing AI models using Shopping MMLU and found that:

Well-known proprietary AI models, such as Claude 3 Sonnet and ChatGPT, performed impressively and ranked in the top tier. However, open-source AI models are catching up and show the potential to challenge these established leaders.

The results of Shopping MMLU also revealed an interesting phenomenon: online shopping is essentially a multi-task learning problem. This means that AI assistants need to master multiple skills simultaneously to be competent in this role.

Moreover, and perhaps surprisingly, AI models that perform well in general domains also perform well in the online shopping domain. This indicates that AI assistants can transfer general knowledge to specific areas and quickly learn new skills.

Of course, AI assistants are not perfect. The researchers found that some commonly used training methods, such as instruction fine-tuning (IFT), may lead to overfitting in certain cases and hurt performance.

Additionally, few-shot learning poses a significant challenge for AI assistants. This means that when faced with new tasks, AI assistants need to learn quickly and cannot always rely on large amounts of training data.
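
For readers unfamiliar with the term, few-shot learning in the LLM setting usually means showing the model a handful of solved examples inside the prompt instead of retraining it. The sketch below builds such a prompt in Python. The function name, the "Question/Answer" layout, and the sample products are illustrative assumptions, not the prompt format used by the benchmark.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Build a prompt from an instruction, solved examples, and a new query.

    `examples` is a list of (question, answer) pairs shown to the model
    before the unanswered `query`.
    """
    parts = [instruction.strip(), ""]
    for question, answer in examples:
        parts.append(f"Question: {question}")
        parts.append(f"Answer: {answer}")
        parts.append("")
    # The final question is left unanswered for the model to complete.
    parts.append(f"Question: {query}")
    parts.append("Answer:")
    return "\n".join(parts)


if __name__ == "__main__":
    prompt = build_few_shot_prompt(
        "Pick the product category.",
        [("USB-C cable", "Electronics"), ("Trail running shoes", "Sports")],
        "Yoga mat",
    )
    print(prompt)
```

With only two or three examples in the prompt, the model has to infer the task from very little evidence, which is why this setting is hard.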

In summary, Amazon's Shopping MMLU benchmark provides a clear direction for the development of AI assistants. In the future, we look forward to seeing smarter and more human-like online shopping AI assistants that make our shopping experiences more convenient and enjoyable.

The researchers also discovered some noteworthy details:

Shopping MMLU is more complex and challenging than other existing online shopping AI datasets.

Domain-specific instruction fine-tuning does not always help; it is effective only on strong models that already possess a large amount of general knowledge.

Currently, even the most advanced AI models do not perform as well as algorithms specifically designed for certain online shopping tasks.

The results of this research indicate that there is still a long way to go before a perfect online shopping AI assistant can be built. Future research directions include developing more effective AI training methods, creating more diverse online shopping AI datasets, and combining AI models with task-specific algorithms to create more powerful hybrid AI systems.
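
One simple way to picture such a hybrid system is a router that sends a request to a task-specific algorithm when one exists and falls back to a general model otherwise. The Python sketch below is only an illustration of that idea; the class, the task names, and the stand-in handlers are hypothetical and do not come from the paper.

```python
from typing import Callable, Dict

Handler = Callable[[str], str]


class HybridAssistant:
    """Route a request to a task-specific handler, else to a general model."""

    def __init__(self, fallback: Handler) -> None:
        self._fallback = fallback
        self._specialists: Dict[str, Handler] = {}

    def register(self, task: str, handler: Handler) -> None:
        """Attach a specialist algorithm to a named task."""
        self._specialists[task] = handler

    def answer(self, task: str, text: str) -> str:
        """Use the specialist for `task` if registered, else the fallback."""
        handler = self._specialists.get(task, self._fallback)
        return handler(text)


# Stand-ins for a real LLM call and a real ranking algorithm (hypothetical).
def fake_llm(text: str) -> str:
    return f"[general model] {text}"


def fake_ranker(text: str) -> str:
    return f"[specialist ranker] {text}"


if __name__ == "__main__":
    assistant = HybridAssistant(fallback=fake_llm)
    assistant.register("ranking", fake_ranker)
    print(assistant.answer("ranking", "wireless earbuds"))
    print(assistant.answer("review_summary", "wireless earbuds"))
```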

Finally, the researchers candidly pointed out some limitations of this study:

The data in Shopping MMLU mainly comes from Amazon and may not fully represent user behaviors on other e-commerce platforms.

Despite the researchers' efforts to avoid them, the data in Shopping MMLU may still contain some errors.

In conclusion, Amazon's research opens the door to the future of intelligent shopping. We believe that in the near future, online shopping AI assistants will become an indispensable part of our lives.

Paper link: https://arxiv.org/pdf/2410.20745