ChinaZ.com, June 12: A joint research team from Beihang University and Nanyang Technological University has conducted an in-depth security test of the GPT-4o model. Using tens of thousands of API queries, the researchers evaluated the security of GPT-4o across the text, image, and audio modalities. The study found that although GPT-4o is more resistant to text jailbreak attacks, the newly introduced audio modality has expanded the attack surface, and its overall multimodal security is not as robust as that of its predecessor, GPT-4V.

Key Findings:

  • Enhanced text modality security, but with transfer risks: GPT-4o has strengthened its resistance to text jailbreak attacks, but attackers can still mount attacks through multimodal inputs.

  • New security challenges from audio modality: The newly introduced audio modality may provide new avenues for jailbreak attacks.

  • Insufficient multimodal security: GPT-4o's multimodal security is inferior to that of GPT-4V, indicating potential vulnerabilities in how different modalities are integrated.

Experimental Methods:

The study used more than 4,000 initial text queries, more than 8,000 response assessments, and more than 16,000 API queries.

The evaluation drew on open-source unimodal and multimodal jailbreak datasets, including AdvBench, RedTeam-2K, SafeBench, and MM-SafetyBench.

The team tested seven jailbreak methods, including template-based methods, GCG, AutoDAN, PAP, and BAP.

Evaluation Metrics:

Attack Success Rate (ASR) was used as the primary evaluation metric, reflecting the ease of jailbreaking the model.
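
As a rough illustration of the metric, ASR can be computed as the share of attack attempts that are judged successful, so a higher value means the model is easier to jailbreak. The sketch below assumes each response has already been labeled as a success or failure by some judge; the function name and the sample labels are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def attack_success_rate(judgments: Sequence[bool]) -> float:
    """Return the fraction of attempts judged successful (True = jailbroken)."""
    if not judgments:
        raise ValueError("need at least one judged response")
    # True counts as 1 and False as 0, so the sum is the number of successes.
    return sum(judgments) / len(judgments)


# Hypothetical labels for four adversarial queries (not data from the study).
example = [True, False, False, True]
print(f"ASR = {attack_success_rate(example):.0%}")  # ASR = 50%
```

Comparing this value for the same query set with and without a jailbreak method applied gives the kind of contrast the results below describe.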

Experimental Results:

In the text-only modality, GPT-4o was less safe than GPT-4V when no attack was applied, but it was more secure under attack.

Security in the audio modality was relatively high: it was difficult to jailbreak GPT-4o by directly converting text to audio.

Multimodal security tests showed that GPT-4o was more susceptible to attacks than GPT-4V in certain scenarios.

Conclusions and Recommendations:

The research team emphasized that despite the improvements in multimodal capabilities of GPT-4o, its security issues cannot be overlooked. They recommend raising awareness of the security risks associated with large multimodal models and prioritizing the development of alignment strategies and mitigation techniques. Additionally, due to the lack of multimodal jailbreak datasets, researchers call for the establishment of more comprehensive multimodal datasets to more accurately assess model security.

Paper Link: https://arxiv.org/abs/2406.06302

Project Link: https://github.com/NY1024/Jailbreak_GPT4o