Resemble AI, a leading voice cloning company, recently announced the release of its next-generation deepfake detection model, Detect-2B. According to the company, the new model detects AI-generated audio with an accuracy of approximately 94%, a notable step forward for deepfake detection technology.
Detect-2B employs a series of pre-trained sub-models and fine-tuning techniques to conduct in-depth inspections of audio segments to determine if they are AI-generated. Resemble AI stated on their blog that Detect-2B has achieved significant leaps in model architecture, training data, and overall performance over its predecessor, creating a highly robust and accurate detection model.
The sub-models of Detect-2B consist of a frozen audio representation model and adaptive modules inserted into key layers. These adaptive modules shift the model's focus to artifacts, the unexpected sounds left in recordings that typically distinguish real audio from AI-generated audio, which often sounds "too clean." Detect-2B can then estimate the probability that a clip is AI-generated without being retrained for each new clip it analyzes.
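The idea of a frozen model with small adaptive modules can be illustrated with a minimal sketch. Resemble AI has not published Detect-2B's code, so everything here is an assumption made for illustration: the names `FrozenLayer`, `Adapter`, and `adapted_forward` are hypothetical, and the residual bottleneck design is one common adapter style, not necessarily the one Detect-2B uses.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class FrozenLayer:
    """Hypothetical stand-in for one layer of a pre-trained audio model.

    Its weights are treated as fixed: nothing in this sketch updates them.
    """

    def __init__(self, weights: Matrix) -> None:
        self.weights = weights

    def forward(self, x: Vector) -> Vector:
        return matvec(self.weights, x)


class Adapter:
    """Small bottleneck module applied to a frozen layer's output (assumed design)."""

    def __init__(self, dim: int, bottleneck: int) -> None:
        # Fixed small values keep the example reproducible; real adapters
        # would normally use random initialization for the down-projection.
        self.down: Matrix = [[0.1] * dim for _ in range(bottleneck)]
        # Zero-initialized up-projection: the adapter starts as an identity map.
        self.up: Matrix = [[0.0] * bottleneck for _ in range(dim)]

    def forward(self, h: Vector) -> Vector:
        hidden = [max(0.0, z) for z in matvec(self.down, h)]  # ReLU
        delta = matvec(self.up, hidden)
        # Residual connection: frozen features plus a learned correction.
        return [h_i + d_i for h_i, d_i in zip(h, delta)]


def adapted_forward(layer: FrozenLayer, adapter: Adapter, x: Vector) -> Vector:
    """Run the frozen layer, then let the adapter adjust its output."""
    return adapter.forward(layer.forward(x))
```

In a setup like this, only the adapter's `down` and `up` matrices would be trained, which is why the large pre-trained model can stay untouched while the detector learns what to listen for.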
Resemble AI also mentioned that Detect-2B's architecture is based on Mamba-SSM, a type of state-space model. According to the company, state-space models do not rely on static data or repetitive patterns; instead, they use stochastic (probabilistic) models that respond better to changing variables. The company says this architecture is well suited to audio detection because it can capture the varied dynamics within an audio segment and adapt to the state of the audio signal, even when the recording quality is poor.
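For readers unfamiliar with state-space models, the sketch below shows the basic recurrence behind them: a hidden state is updated at every audio sample and the output is read from that state. This is a deliberately simplified, deterministic, diagonal linear model, not Mamba itself (Mamba additionally makes its parameters depend on the input) and not Detect-2B's implementation. The function name `ssm_scan` and the parameters `a`, `b`, and `c` are hypothetical.

```python
from typing import List, Sequence


def ssm_scan(
    a: Sequence[float],
    b: Sequence[float],
    c: Sequence[float],
    inputs: Sequence[float],
) -> List[float]:
    """Run a diagonal linear state-space model over a 1-D signal.

    For each input sample u, every state channel i is updated as
        h[i] = a[i] * h[i] + b[i] * u
    and the output for that step is
        y = sum(c[i] * h[i]).
    """
    if not (len(a) == len(b) == len(c)):
        raise ValueError("a, b and c must have the same length")

    state = [0.0] * len(a)  # hidden state starts at zero
    outputs: List[float] = []
    for u in inputs:
        # Each channel keeps a decaying memory of past samples.
        state = [a_i * h_i + b_i * u for a_i, h_i, b_i in zip(a, state, b)]
        outputs.append(sum(c_i * h_i for c_i, h_i in zip(c, state)))
    return outputs
```

For example, `ssm_scan([0.5], [1.0], [1.0], [1.0, 0.0, 0.0])` returns `[1.0, 0.5, 0.25]`: a single impulse keeps influencing later outputs through the state, which is how a model of this kind carries context along an audio signal.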
To evaluate the model's performance, Resemble AI ran extensive tests on Detect-2B that covered unseen speakers, deepfake-generated audio, and multiple languages. The company stated that the model correctly detects deepfake audio in six different languages with at least 93% accuracy.
Resemble AI launched its AI voice platform Rapid Voice Cloning in April. Detect-2B will be available via API and can be integrated into various applications, providing businesses with a powerful tool for deepfake detection.
As the 2024 U.S. presidential election approaches, identifying AI-generated voices and videos is becoming increasingly important. AI voices could make it easier to mislead voters and spread misinformation, and could undermine trust in brands. Tools like Detect-2B can help identify such forgeries before they reach the public.
Resemble AI is not the only company working on detecting AI clones. McAfee launched Project Mockingbird in January to detect AI audio, while Meta is developing a method to watermark AI-generated audio.
Resemble AI stated that as the capabilities of generative AI continue to advance, its detection capabilities must progress as well. The company has planned several research directions to further improve Detect-2B, focusing on areas such as representation learning, advanced model architectures, and data augmentation. This signals Resemble AI's intent to keep innovating in response to the challenges posed by deepfake technology.