Salesforce has introduced xGen-MM, a set of open-source multimodal AI models that can understand text and images together and generate text in response. The release could broaden how multimodal AI is researched and applied.

The Salesforce AI research team has published a paper on arXiv detailing the xGen-MM framework. The framework includes not only pre-trained models but also datasets and fine-tuning code. Notably, the largest model has 4 billion parameters and performs competitively on multiple benchmarks, on par with similar open-source models.


This open-source initiative stands in contrast to the trend of many tech giants keeping advanced AI models proprietary. Salesforce aims to promote broader research and development by sharing models and datasets, allowing more researchers and developers to contribute to the advancement of multimodal AI technology.

A significant innovation of xGen-MM is its ability to handle "interleaved data," meaning a single input can mix multiple images and passages of text. This capability enables the model to perform more complex tasks, such as answering questions that span several images. Such applications could be highly beneficial in fields like medical diagnosis and autonomous driving.
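To make the idea of interleaved data concrete, here is a minimal Python sketch of one way such an input could be represented. It is not xGen-MM's actual API: the `Text` and `Image` classes, the `flatten` helper, the `<image>` placeholder, and the file names are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Text:
    content: str


@dataclass
class Image:
    path: str  # hypothetical file path, used only for illustration


Segment = Union[Text, Image]


def flatten(segments: List[Segment], placeholder: str = "<image>") -> Tuple[str, List[str]]:
    """Turn an interleaved sequence into a prompt string plus an ordered image list.

    Each image is replaced in the prompt by a placeholder token, and its path is
    recorded so that the n-th placeholder corresponds to the n-th image.
    """
    parts: List[str] = []
    images: List[str] = []
    for seg in segments:
        if isinstance(seg, Image):
            parts.append(placeholder)
            images.append(seg.path)
        else:
            parts.append(seg.content)
    return " ".join(parts), images


# An interleaved input: text and images alternate within one sequence.
sample = [
    Text("Here is the first scan:"),
    Image("scan_a.png"),
    Text("and here is the second:"),
    Image("scan_b.png"),
    Text("What changed between them?"),
]
prompt, image_paths = flatten(sample)
```

The point of the sketch is that order is preserved: a question at the end can refer to any of the images before it, which is what allows questions about multiple images in a single input.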

The release includes various optimized versions of the model, such as a basic pre-trained model, a model fine-tuned for following instructions, and a "safety-tuned" model designed to reduce harmful outputs. This diversity reflects the AI community's increasing focus on balancing capability with safety and ethics.

However, the release of powerful models has also sparked discussions about the potential risks and societal impacts of more advanced AI systems. While Salesforce has taken steps to mitigate risks with safety tuning, balancing innovation with safety remains a critical issue.

Salesforce's open-source release provides valuable tools for researchers to better understand and improve these powerful technologies. It also sets a new standard for transparency in the AI field, potentially encouraging other tech giants to be more open with their research.

Model access: https://huggingface.co/collections/Salesforce/xgen-mm-1-models-662971d6cecbf3a7f80ecc2e

Key Points:

🌟 xGen-MM is Salesforce's family of open-source multimodal AI models, which can understand text and images together.

🔍 The model can handle interleaved data, allowing it to answer questions that span multiple images, with broad application potential.

🔒 The release includes several model versions, among them a safety-tuned one designed to reduce harmful outputs, along with datasets and fine-tuning code for researchers.