AnyGPT

A multi-modal large-scale language model

Common Product, Productivity, Multi-modal, Chatbot
AnyGPT is a unified multi-modal large language model that uses discrete representations to process speech, text, images, and music in a uniform way. It can be trained stably without modifying the architecture or training paradigm of existing large language models. Instead, it relies entirely on data-level preprocessing, so a new modality can be integrated into the language model much as a new language would be. We have constructed a text-centric multi-modal dataset for multi-modal alignment pre-training. Using generative models, we have also created the first large-scale any-to-any multi-modal instruction dataset. It consists of 108,000 multi-turn dialogue examples that interleave different modalities, enabling the model to handle arbitrary combinations of input and output modalities. Experimental results indicate that AnyGPT supports any-to-any multi-modal dialogue and achieves performance comparable to specialized models across all modalities, demonstrating that discrete representations can effectively and conveniently unify multiple modalities within a language model.
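The idea of adding a modality "like a new language" through data-level preprocessing can be illustrated with a minimal sketch. This is not AnyGPT's actual code: the class `UnifiedVocab`, its methods, the start/end marker layout, and the vocabulary and codebook sizes are all hypothetical, chosen only to show how discrete codes from separate tokenizers could share one ID space with a text vocabulary so that the language model itself needs no architectural change.

```python
class UnifiedVocab:
    """Hypothetical shared ID space: text IDs first, then one block per modality."""

    def __init__(self, text_vocab_size):
        self.size = text_vocab_size  # IDs [0, text_vocab_size) belong to text
        self.modalities = {}         # name -> (start_id, end_id, offset, codebook_size)

    def add_modality(self, name, codebook_size):
        # Reserve two boundary markers plus one ID per discrete code.
        start_id = self.size
        end_id = self.size + 1
        offset = self.size + 2
        self.modalities[name] = (start_id, end_id, offset, codebook_size)
        self.size = offset + codebook_size

    def encode(self, name, codes):
        # Map modality-local codes into the shared space, wrapped in markers.
        start_id, end_id, offset, codebook_size = self.modalities[name]
        for code in codes:
            if not 0 <= code < codebook_size:
                raise ValueError(f"code {code} is outside the {name} codebook")
        return [start_id] + [offset + code for code in codes] + [end_id]


# Assumed sizes for illustration only.
vocab = UnifiedVocab(text_vocab_size=32000)
vocab.add_modality("image", codebook_size=8192)
vocab.add_modality("speech", codebook_size=1024)

# An interleaved sequence: some text token IDs followed by an image segment.
text_ids = [10, 11]
sequence = text_ids + vocab.encode("image", [0, 5])
print(sequence)
```

In a sketch like this, supporting another modality only means reserving another block of IDs and enlarging the embedding table to `vocab.size`; the sequence model still sees a single stream of token IDs.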

AnyGPT Visits Over Time

Monthly Visits: 590
Bounce Rate: 77.68%
Pages per Visit: 1.0
Visit Duration: 00:00:00
