The French open-source AI research lab Kyutai has launched Moshi, a new multimodal large model. The release is not just a technical breakthrough but also a bold challenge to existing AI technology.

In the early morning of July 4, Kyutai announced Moshi on its official website. The model's capabilities are comparable to those of the OpenAI GPT-4o demo shown in May: it can listen to spoken questions and reason through its answers in real time. However, unlike GPT-4o's voice mode, which will not be fully available until autumn, Moshi is already open to everyone.

Key Features:

  • Multimodal Capabilities: Moshi can listen to spoken questions and answer them with real-time reasoning. Its voice mode is already available, well ahead of GPT-4o's autumn launch.

  • Region-Free Access: You can use Moshi anywhere in the world.

  • Mobile App Support: Moshi can be used on mobile devices. Its Mandarin support is weak, but asking questions in English works perfectly well.

  • Upcoming Open Source: Kyutai plans to open-source Moshi soon, releasing the code, model weights, and papers.


Experience URL: https://top.aibase.com/tool/moshi-chat

The release of Moshi is a bold move in AI. The model can already listen and speak, and it may gain the ability to see in the future, which makes its further development worth watching. Using Moshi is also very simple: go to the official website, enter your email address, click to join, and you can start talking with Moshi.

Official Demonstration Video

Note that Moshi's Mandarin support still needs improvement, so English gives a better experience. Moshi is also region-free and can be used directly anywhere in the world, which is a real convenience for AI enthusiasts everywhere.

Official Demonstration

Kyutai's plans also reflect its commitment to open source. The lab intends to open-source Moshi soon, releasing the code, model weights, and papers so that developers and researchers worldwide can take part in developing and improving the model.

User Experience:

  • Fast Response: Even when accessed over a network connection from within China, Moshi answers questions with almost no delay.

  • Language Support: Moshi currently supports mainly English and French, while its Mandarin support needs improvement.

  • Convenient Use: The registration process is simple, only requiring you to submit your email address.

  • Ability Demonstration: Moshi can listen and speak, and it may gain the ability to see in the future. Its human-like tone is one of its standout features: it sounds far less robotic, which makes conversations feel more natural and fluid.

Admittedly, Moshi's answers are still fairly limited, offering only a general outline or summary. As the product continues to be iterated and optimized, however, its responses should become more detailed and accurate.

Moshi's release could also have a significant impact on education. For example, an AI assistant can explain material to students again and again, as many times as they need, which is a great help for learning. We look forward to more products like this that support more local languages and bring AI technology closer to people's everyday lives.