Meta recently and quietly released six research artifacts, bringing new applications and technical advances to the AI field. They include a multi-modal model, a text-to-music generation model, audio watermarking technology, a dataset, and more. Here is a closer look at each release.

Meta Chameleon (“Chameleon” Model)

First, the multi-modal model "Chameleon" can process text and images together, supporting interleaved text-and-image input and output, and offering a new approach to handling multi-modal data.

While most current late-fusion models use diffusion-based learning, Meta Chameleon employs tokenization for both text and images. This allows for a more unified approach and makes the model easier to design, maintain, and expand.
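To make the idea concrete, here is a minimal sketch of what early-fusion tokenization can look like: text tokens and discrete image tokens share one sequence, so a single autoregressive transformer models both. The tokenizer objects and sentinel tokens below are placeholders for illustration, not Meta's released API.

```python
# Hypothetical early-fusion sketch: interleave text tokens and discrete image
# tokens into one sequence for a single autoregressive transformer.
# `text_tok` and `image_tok` are placeholder tokenizer objects, not Meta's API.
def build_mixed_sequence(text_before, image, text_after, text_tok, image_tok):
    seq = []
    seq += text_tok.encode(text_before)           # ordinary text tokens
    seq.append(text_tok.token_to_id("<img>"))     # sentinel marking the start of an image span
    seq += image_tok.encode(image)                # discrete codes from a learned image tokenizer
    seq.append(text_tok.token_to_id("</img>"))    # sentinel closing the image span
    seq += text_tok.encode(text_after)
    return seq                                    # one token stream, one model
```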

The demo examples show it generating creative captions for an image and composing a new scene from a mix of text prompts and images.

Meta plans to release key components of the Chameleon 7B and 34B models under a research-only license. The released models have been safety-tuned, support mixed-modal input with text-only output, and are intended for research use. Meta emphasized that it will not release the Chameleon image generation model.

Product Entry: https://top.aibase.com/tool/meta-chameleon

Multi-Token Prediction

"Multi-Token Prediction" is a new language model training method that trains a model to predict several future words at once rather than one at a time, with the aim of improving capability and training efficiency.


With this approach, a language model learns to predict multiple future words simultaneously instead of only the next word, which improves capability and training efficiency and can also speed up generation. In the spirit of responsible open science, Meta is releasing pre-trained models for code completion under a non-commercial, research-only license.
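A minimal PyTorch-style sketch of the idea (not Meta's released code): a shared trunk produces one hidden state per position, and several small heads each predict the token 1, 2, ..., n steps ahead, with their losses averaged.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTokenPredictionHeads(nn.Module):
    """Illustrative multi-token prediction loss: n linear heads over a shared trunk."""

    def __init__(self, d_model, vocab_size, n_future=4):
        super().__init__()
        self.heads = nn.ModuleList(
            [nn.Linear(d_model, vocab_size) for _ in range(n_future)]
        )

    def forward(self, hidden, targets):
        # hidden: (batch, seq_len, d_model) from the shared transformer trunk
        # targets: (batch, seq_len) token ids
        loss = 0.0
        for k, head in enumerate(self.heads, start=1):
            logits = head(hidden[:, :-k])          # positions that have a token k steps ahead
            loss = loss + F.cross_entropy(
                logits.reshape(-1, logits.size(-1)),
                targets[:, k:].reshape(-1),
            )
        return loss / len(self.heads)
```

At inference time only the next-token head is strictly needed; the extra heads can also be used to draft several tokens at once for faster decoding.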

Product Entry: https://top.aibase.com/tool/multi-token-prediction

Text-to-Music Generation Model "JASCO"


While existing text-to-music models (such as MusicGen) rely mainly on text input to generate music, Meta's new model "Joint Audio and Symbolic Conditioning for Temporally Controlled Text-to-Music Generation" (JASCO) can accept additional conditional inputs, such as specific chords or beats, to improve control over the generated music. Specifically, information bottleneck layers combined with temporal blurring are used to extract only the information relevant to each control. This allows symbolic and audio-based conditions to be combined within the same text-to-music generation model.
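As an illustration of the temporal-blurring idea (a sketch, not the AudioCraft implementation), a frame-level condition such as a chord embedding can be average-pooled over coarse windows and then stretched back to the original frame rate, so the model receives rough timing rather than exact frame content:

```python
import torch
import torch.nn.functional as F

def temporally_blur(condition, window=16):
    """Blur a frame-level condition of shape (batch, channels, frames) over coarse windows."""
    pooled = F.avg_pool1d(condition, kernel_size=window, stride=window)   # information bottleneck
    # stretch back to the original frame rate so it aligns with the audio timeline
    return F.interpolate(pooled, size=condition.shape[-1], mode="nearest")
```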

JASCO is comparable to the evaluated baselines in generation quality while allowing better and more flexible control over the generated music. Meta is releasing the research paper and a sample page; the inference code will be released under the MIT license as part of the AudioCraft repository later this month, and the pre-trained models under CC-BY-NC.

Code Entry: https://top.aibase.com/tool/audiocraft

Audio Watermarking Technology "AudioSeal"


This is the first audio watermarking technology designed specifically for local detection of AI-generated speech, capable of accurately locating AI-generated segments within longer audio clips. AudioSeal improves upon traditional audio watermarking by focusing on detecting AI-generated content rather than steganography.

Unlike traditional methods that rely on complex decoding algorithms, AudioSeal's local detection approach enables faster and more efficient detection. Meta says this design makes detection up to 485 times faster than previous methods, making it suitable for large-scale and real-time applications, and that the approach achieves state-of-the-art robustness and imperceptibility for audio watermarking.
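To illustrate what "local detection" enables (a hypothetical post-processing sketch, not AudioSeal's released code), per-frame watermark probabilities from a detector can be turned directly into time spans flagged as AI-generated:

```python
import numpy as np

def flagged_segments(frame_probs, frame_rate_hz, threshold=0.5):
    """Return (start_s, end_s) spans where per-frame watermark scores exceed a threshold."""
    mask = np.asarray(frame_probs) > threshold
    segments, start = [], None
    for i, flagged in enumerate(mask):
        if flagged and start is None:
            start = i                                   # segment begins
        elif not flagged and start is not None:
            segments.append((start / frame_rate_hz, i / frame_rate_hz))
            start = None                                # segment ends
    if start is not None:                               # audio ends while still flagged
        segments.append((start / frame_rate_hz, len(mask) / frame_rate_hz))
    return segments
```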

AudioSeal is released under a commercial license.

Product Entry: https://top.aibase.com/tool/audioseal

PRISM Dataset

Meta also released the PRISM dataset in collaboration with external partners. It contains the dialogue data and preferences of 1,500 participants from around the world and is intended to help make large language models more diverse in their dialogues, more representative of varied preferences, and more socially beneficial.


This dataset maps individual preferences and fine-grained feedback to 8,011 real-time dialogues with 21 different LLMs.

Dataset Entry: https://huggingface.co/datasets/HannahRoseKirk/prism-alignment
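For readers who want to explore the data, here is a minimal sketch using the Hugging Face datasets library; the available configuration and split names should be checked against the dataset card, so the code lists them rather than assuming any.

```python
from datasets import get_dataset_config_names, load_dataset

# list the dataset's configurations before loading one (names are not assumed here)
configs = get_dataset_config_names("HannahRoseKirk/prism-alignment")
print(configs)

prism = load_dataset("HannahRoseKirk/prism-alignment", configs[0])
print(prism)                                # splits and sizes
first_split = next(iter(prism.values()))    # avoid assuming a particular split name
print(first_split[0])                       # one record with participant feedback
```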

"DIG In" Metrics


The "DIG In" metrics are used to evaluate geographic disparities in text-to-image generation models, providing additional reference data for improving them. To understand how people in different regions perceive geographic representation, Meta conducted a large-scale annotation study, collecting over 65,000 annotations and more than 20 survey responses per example, covering attractiveness, similarity, consistency, and shared recommendations for improving both automatic and human evaluation of text-to-image models.
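As a toy illustration of how such annotations can surface disparities (not Meta's DIG In code; the column names and values are made up), per-region averages of annotator scores already highlight where a model under-performs:

```python
import pandas as pd

# made-up example annotations, for illustration only
annotations = pd.DataFrame([
    {"region": "West Africa",    "attractiveness": 3, "consistency": 4},
    {"region": "West Africa",    "attractiveness": 4, "consistency": 3},
    {"region": "Western Europe", "attractiveness": 5, "consistency": 5},
])

# average score per region exposes geographic gaps in perceived quality
print(annotations.groupby("region")[["attractiveness", "consistency"]].mean())
```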

Code Entry: https://top.aibase.com/tool/dig-in

Together, these releases bring new technical advances and application prospects to the AI field and should help drive both the development and the adoption of AI technology.