With the rapid development of AIGC technology, image editing tools have become increasingly powerful, making image tampering both easier to perform and harder to detect. Although existing image forgery detection and localization (IFDL) methods are often effective, they face two major challenges: first, their "black box" nature, which leaves the basis for a detection unclear; second, their limited generalization, which makes it difficult to cope with diverse tampering methods (such as Photoshop editing, DeepFake, and AIGC-based editing).


To address these issues, a research team from Peking University has proposed an interpretable IFDL task and designed FakeShield, a multimodal framework that can assess the authenticity of an image, generate a mask of the tampered region, and provide a basis for its judgment grounded in pixel-level and image-level tampering clues.

Traditional IFDL methods output only the probability that an image is authentic and a map of tampered regions; they cannot explain how the detection was made. Because the accuracy of existing IFDL methods is limited, manual review is still needed afterwards. Yet the sparse information these methods provide offers little support for that review, so users must reanalyze suspicious images on their own.


In addition, tampering types in real-world scenarios are diverse, including Photoshop editing (copy-move, splicing, and removal), AIGC editing, DeepFake, and more. Existing IFDL methods can usually handle only one of these techniques, lacking comprehensive generalization. This forces users to identify the tampering type in advance and apply the corresponding specialized detection method, greatly reducing the practicality of these models.

To solve these two major problems of existing IFDL methods, the FakeShield framework leverages the powerful capabilities of large language models (LLMs), in particular multimodal large language models (M-LLMs), which align visual and textual features and thereby endow the LLM with stronger visual understanding. Because LLMs are pretrained on massive, diverse corpora of world knowledge, they show great potential in many application areas, such as machine translation, code completion, and visual understanding.


The core of the FakeShield framework is the Multimodal Tampering Description Dataset (MMTD-Set). This dataset uses GPT-4o to enhance existing IFDL datasets and contains triplets of tampered images, modified-region masks, and detailed descriptions of the edited areas. Leveraging the MMTD-Set, the research team fine-tuned the M-LLM and a visual segmentation model to produce complete analysis results, including a tampering verdict and an accurate mask of the tampered region.
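The triplet structure described above can be sketched as a simple record type. This is an illustrative sketch only: the field names and the `domain_tag` categories are assumptions for clarity, not the dataset's actual schema.

```python
from dataclasses import dataclass


@dataclass
class MMTDSample:
    """One hypothetical MMTD-Set entry: the paper describes triplets of
    (tampered image, modified-region mask, description of the edit)."""
    image_path: str    # path to the tampered image
    mask_path: str     # binary mask marking the edited region
    description: str   # GPT-4o-generated explanation of the tampering
    domain_tag: str    # tampering category, e.g. "photoshop", "deepfake", "aigc"


sample = MMTDSample(
    image_path="tampered/0001.png",
    mask_path="masks/0001.png",
    description="Splicing artifacts appear along the boundary of the inserted object.",
    domain_tag="photoshop",
)
print(sample.domain_tag)
```

A flat record like this makes it straightforward to pair each image with both its pixel-level supervision (the mask) and its textual supervision (the description) during fine-tuning.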

FakeShield also includes the Domain Tag-guided Explainable Forgery Detection Module (DTE-FDM) and the Multimodal Forgery Localization Module (MFLM): the former handles explainable detection across different tampering types, while the latter performs forgery localization guided by the detailed textual descriptions.
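The two-stage flow described above can be sketched as follows. All function names and return values here are placeholders that mirror the paper's description, not the actual FakeShield API: DTE-FDM first produces a verdict and a textual explanation, and MFLM then uses that text to guide mask generation.

```python
def dte_fdm(image, domain_tag):
    """Placeholder for the Domain Tag-guided Explainable Forgery Detection
    Module: returns an authenticity verdict plus a textual explanation."""
    verdict = "tampered"  # e.g. "authentic" or "tampered"
    explanation = f"[{domain_tag}] splicing artifacts near the object boundary"
    return verdict, explanation


def mflm(image, explanation):
    """Placeholder for the Multimodal Forgery Localization Module: the
    textual explanation guides segmentation of the tampered region."""
    return {"mask": "text-guided segmentation mask"}


def fakeshield_pipeline(image, domain_tag="photoshop"):
    """Illustrative end-to-end flow: detect and explain, then localize."""
    verdict, explanation = dte_fdm(image, domain_tag)
    mask = mflm(image, explanation) if verdict == "tampered" else None
    return verdict, explanation, mask


verdict, explanation, mask = fakeshield_pipeline("suspect.png")
print(verdict, mask is not None)
```

The key design point is that localization is conditioned on the explanation text, so the mask and the stated reasoning stay consistent with each other.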

Extensive experiments show that FakeShield can effectively detect and localize a variety of tampering techniques, providing an interpretable solution that outperforms previous IFDL methods.

This research marks the first attempt to apply M-LLM to interpretable IFDL, indicating significant progress in the field. FakeShield not only excels in tampering detection but also provides comprehensive explanations and precise localization, demonstrating strong generalization ability for various tampering types. These features make it a versatile practical tool suitable for various real-world applications.

In the future, this work could play an important role in multiple areas, such as informing laws and regulations on digital content manipulation, providing guidance for the development of generative AI, and promoting a clearer, more trustworthy online environment. FakeShield can also assist evidence collection in legal proceedings and help correct misinformation in public discourse, ultimately contributing to the integrity and reliability of digital media.

Project Page: https://zhipeixu.github.io/projects/FakeShield/

GitHub Address: https://github.com/zhipeixu/FakeShield

Paper Address: https://arxiv.org/pdf/2410.02761