Resemble AI, a voice cloning company, has released Detect-2B, its next-generation deepfake detection model, with a reported accuracy of approximately 94%. Detect-2B uses a series of pre-trained sub-models and fine-tuning to examine audio clips and determine whether they are AI-generated. According to the company, Detect-2B can identify AI-generated audio in more than 30 languages with accuracy above 94% in about 200 milliseconds. Efficient, multilingual detection of this kind is intended to help combat AI-generated audio fraud.
Product Entry: https://top.aibase.com/tool/detect-2b
The company stated in a blog post: "Building on the solid foundation of our initial Detect model, DETECT-2B has made significant advancements in model architecture, training data, and overall performance. The result is an extremely powerful and accurate deepfake detection model that has achieved remarkable performance on a dataset of both real and fake audio clips."
According to Resemble, the sub-models of Detect-2B "consist of a frozen audio representation model and an adaptive module inserted into its key layers." The adaptive module shifts the model's focus toward often-overlooked sounds that distinguish real audio from fake, namely the unintended sounds left in real recordings; most AI-generated audio clips sound "too clean." Detect-2B can predict which parts of a clip are AI-generated without being retrained for each new clip. The sub-models have also been trained on large-scale datasets.
Detect-2B aggregates the prediction scores from its sub-models and compares the result with a "carefully tuned threshold" to determine whether a recording is real or fake. Resemble says the way its researchers built Detect-2B makes it faster to train and means it does not require much computational power to deploy.
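The aggregate-then-threshold step can be sketched roughly as follows. This is a minimal illustration in Python, not Resemble's implementation: the function names, the use of a simple mean as the aggregation rule, and the 0.5 default threshold are all assumptions, since the company only says that scores are aggregated and compared with a tuned threshold.

```python
from statistics import fmean


def aggregate_scores(scores):
    """Combine per-sub-model "fake" scores into one value.

    A plain mean is an assumption; the actual aggregation rule
    used by Detect-2B is not described in the article.
    """
    if not scores:
        raise ValueError("at least one score is required")
    return fmean(scores)


def classify_clip(scores, threshold=0.5):
    """Label a clip "fake" if the aggregated score reaches the threshold.

    The 0.5 default is a placeholder, not Resemble's tuned value.
    """
    aggregated = aggregate_scores(scores)
    label = "fake" if aggregated >= threshold else "real"
    return label, aggregated


if __name__ == "__main__":
    # Hypothetical scores from three sub-models for a single clip.
    print(classify_clip([0.9, 0.8, 0.7]))
    print(classify_clip([0.1, 0.2, 0.3]))
```

In a setup like this, raising the threshold would trade missed fakes for fewer false alarms, which is presumably why the value is tuned rather than fixed.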
The model's architecture is based on Mamba-SSM, a state space model (SSM) architecture, which is described as not relying on static data or repetitive patterns. Instead, it is said to use a stochastic probabilistic model that is more responsive to different variables. Resemble says this architecture performs well in audio detection because it captures the different dynamics in audio clips, adapts to the various states of audio signals, and continues to work even with poor recording quality.
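For readers unfamiliar with state space models, the sketch below shows the basic recurrence they are built on: a hidden state is updated from each input sample, and the output is read from that state. It is a plain discrete-time linear SSM with a scalar state and fixed parameters, not Mamba itself, which makes its parameters depend on the input and uses a larger state. The function name, parameter names (`a`, `b`, `c`) and values are illustrative assumptions.

```python
def ssm_scan(inputs, a=0.9, b=0.1, c=1.0, state=0.0):
    """Run a scalar linear state space model over a sequence.

    Recurrence (fixed parameters, an assumption for illustration):
        state_k = a * state_{k-1} + b * u_k
        y_k     = c * state_k
    """
    outputs = []
    for u in inputs:
        state = a * state + b * u   # update the hidden state from the input
        outputs.append(c * state)   # read the output from the state
    return outputs


if __name__ == "__main__":
    # A single impulse followed by silence: the state decays over time,
    # so the model "remembers" earlier samples with fading weight.
    print(ssm_scan([1.0, 0.0, 0.0], a=0.5, b=1.0, c=1.0))
```

The point of the example is only that the state carries information forward through the sequence, which is the property the article credits for tracking changing dynamics in audio.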
To evaluate the model, Resemble tested Detect-2B on speakers it had not encountered before, on deepfake audio, and across different languages. The company claims the model correctly detected deepfake audio in six different languages with at least 93% accuracy.
Resemble launched its AI voice platform Rapid Voice Cloning in April. Detect-2B will be available via API and can be integrated into various applications.
Resemble is not the only company working on detecting AI clones. McAfee launched Project Mockingbird in January to detect AI audio. Meta is also developing a method to watermark AI-generated audio.
Key Points:
- Resemble AI's Detect-2B is a next-generation deepfake detection model with a reported accuracy of about 94%.
- Detect-2B uses pre-trained sub-models and fine-tuning to check whether audio clips are AI-generated.
- The model's architecture is based on Mamba-SSM, a state space model. Resemble says this makes it more sensitive to the different dynamics in audio signals and helps it detect deepfake audio across various languages.