AI-generated visuals are becoming increasingly realistic, making it difficult for humans (and existing detection systems) to distinguish between real and fake videos. To address this issue, researchers at Columbia University's School of Engineering, led by computer science professor Junfeng Yang, have developed a new tool called DIVID, short for DIffusion-generated VIdeo Detector. DIVID is an extension of the team's earlier release, Raidar, which detects AI-generated text by analyzing the text itself without needing to access the internal workings of large language models.


DIVID improves upon earlier detection methods, which were effective at identifying videos created by older AI models such as Generative Adversarial Networks (GANs). A GAN pairs two neural networks: one generates fake data, while the other evaluates it, trying to distinguish real from fake. Through continuous feedback, both networks improve, ultimately producing highly realistic synthetic video. Current AI detection tools look for telltale signs such as unusual pixel arrangements, unnatural movements, or inconsistencies between frames, artifacts that are typically absent from real videos.
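The two-network feedback loop described above can be sketched with a toy one-dimensional example. Everything here is an illustrative stand-in, not the architecture of any real video GAN: the generator is just a learned offset, the discriminator a logistic score, and the "data" are scalars.

```python
import numpy as np

rng = np.random.default_rng(0)

def generator(z, theta):
    """Toy generator: shift input noise by a learned offset theta."""
    return z + theta

def discriminator(x, w, b):
    """Toy discriminator: logistic score, near 1 for 'real', near 0 for 'fake'."""
    return 1.0 / (1.0 + np.exp(-(w * x + b)))

# "Real" data clusters around 3.0; the generator starts at offset 0.0.
theta, w, b, lr = 0.0, 0.1, 0.0, 0.05
for _ in range(2000):
    real = rng.normal(3.0, 0.5, 64)
    fake = generator(rng.normal(0.0, 0.5, 64), theta)

    # Discriminator feedback: raise scores on real samples, lower them on fakes.
    d_real = discriminator(real, w, b)
    d_fake = discriminator(fake, w, b)
    w += lr * np.mean((1.0 - d_real) * real - d_fake * fake)
    b += lr * np.mean((1.0 - d_real) - d_fake)

    # Generator feedback: move theta so fakes score as "real" (fool the critic).
    d_fake = discriminator(generator(rng.normal(0.0, 0.5, 64), theta), w, b)
    theta += lr * np.mean(1.0 - d_fake) * w
```

After training, the generator's offset has drifted toward the real data's center, which is the "both networks improve each other" dynamic in miniature.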


New-generation AI video tools, such as OpenAI's Sora, Runway Gen-2, and Pika, use diffusion models to create videos. A diffusion model generates images and video by gradually transforming random noise into a clear, realistic picture; for video, it refines each frame while keeping transitions between frames smooth, yielding high-quality, realistic output. As AI-generated video grows more sophisticated, determining whether a video is authentic becomes significantly harder.
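The "noise to clear image" process can be sketched with a minimal reverse-diffusion loop. A fixed target pattern stands in for a trained denoiser's prediction, and the step schedule is illustrative; a real diffusion model learns its predictions from data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "clean image": an 8x8 gradient. A trained diffusion model would
# predict the clean image (or the added noise) at each timestep instead.
target = np.linspace(0.0, 1.0, 64).reshape(8, 8)

def denoise_step(x, t, total):
    """One toy reverse-diffusion step: move x partway toward the
    denoiser's current estimate of the clean image (here, `target`)."""
    alpha = 1.0 / (total - t)   # later steps trust the estimate more
    return x + alpha * (target - x)

x = rng.normal(0.0, 1.0, (8, 8))    # start from pure random noise
steps = 20
errors = []
for t in range(steps):
    x = denoise_step(x, t, steps + 1)
    errors.append(float(np.abs(x - target).mean()))
```

Tracking `errors` shows the frame sharpening step by step: the distance to the clean image shrinks monotonically as the loop runs.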

Bernadette Young's team used a technique called DIRE (DIffusion Reconstruction Error) to detect diffusion-generated images. DIRE measures the difference between the input image and the corresponding output image reconstructed by a pre-trained diffusion model.
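The DIRE measurement can be sketched as follows. A simple box blur stands in for the pre-trained diffusion model's reconstruction step, and the sample "images" are synthetic; the intuition carried over is that images a model can regenerate almost perfectly score a low reconstruction error.

```python
import numpy as np

def reconstruct(image):
    """Stand-in for reconstruction by a pre-trained diffusion model:
    a 3x3 box blur. Smooth, 'model-like' images survive it almost
    unchanged; noisy, camera-like images do not."""
    padded = np.pad(image, 1, mode="edge")
    out = np.zeros_like(image)
    h, w = image.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + 3, j:j + 3].mean()
    return out

def dire(image):
    """DIRE-style score: mean reconstruction error between the input
    image and its reconstructed output."""
    return float(np.abs(image - reconstruct(image)).mean())

rng = np.random.default_rng(0)
smooth = np.linspace(0, 1, 16).reshape(1, 16).repeat(16, axis=0)  # "generated-like"
noisy = smooth + rng.normal(0, 0.2, (16, 16))                     # "camera-like"

score_generated = dire(smooth)
score_real = dire(noisy)
```

The generated-like image reconstructs with a much lower error than the noisy one, which is the signal DIRE thresholds on.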

Junfeng Yang, co-director of the Software Systems Laboratory, has been exploring how to detect AI-generated text and video. Earlier this year, with the release of Raidar, Yang and his collaborators demonstrated a method for detecting AI-generated text by analyzing the text itself, without needing access to the internal workings of large language models such as GPT-4, Gemini, or Llama. Raidar uses a language model to rephrase or modify a given text, then counts the edits the system makes. More edits suggest the text was likely written by a human, while fewer edits suggest it was likely machine-generated.
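The edit-counting measurement can be sketched with `difflib`. Raidar's actual prompting and scoring are more involved; here the LLM rewrites are hard-coded stand-ins, and the threshold is illustrative, not a value from the paper.

```python
import difflib

def count_edits(original, rewritten):
    """Count word-level insert/delete/replace operations between the
    given text and the model's rewrite (the Raidar-style signal)."""
    ops = difflib.SequenceMatcher(
        None, original.split(), rewritten.split()
    ).get_opcodes()
    return sum(1 for tag, *_ in ops if tag != "equal")

def looks_machine_generated(original, rewritten, threshold=3):
    """Few edits suggest the rewriting model found the text already
    'high quality', i.e. likely machine-generated."""
    return count_edits(original, rewritten) < threshold

# Hard-coded stand-ins for a real LLM's rewrites of each text.
human = "the cat sat, happily i think, on the mat near the door"
llm_rewrite_of_human = "The cat sat happily on the mat by the door."
machine = "The cat sat on the mat near the door."
llm_rewrite_of_machine = "The cat sat on the mat near the door."
```

Running both pairs through `looks_machine_generated` flags the fluent text (left nearly untouched) as machine-generated and the rougher human text (heavily edited) as human-written.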

Junfeng Yang stated, "The insight behind Raidar, that an AI typically considers another AI's output to be high quality and therefore makes fewer edits, is a powerful one, and it is not limited to text." He added, "Given that AI-generated videos are becoming increasingly realistic, we hope to leverage the insight from Raidar to create a tool that can accurately detect AI-generated videos."

The researchers developed DIVID on the same principle. This new method for detecting generated video identifies videos created by diffusion models. The paper was presented at the Computer Vision and Pattern Recognition Conference (CVPR) in Seattle on June 18, 2024, accompanied by the release of open-source code and datasets.

Paper link: https://arxiv.org/abs/2406.09601

Key Points:

- In response to increasingly realistic AI-generated videos, researchers at Columbia University's School of Engineering have developed a new tool, DIVID, which can detect AI-generated videos with 93.7% accuracy.

- DIVID improves upon previous methods for detecting new-generation AI videos, capable of identifying videos created by diffusion models, which can gradually transform random noise into high-quality, realistic video images.

- Researchers extended the insight behind Raidar's AI-generated text detection to video: a model is asked to regenerate the given content, and the amount it changes, measured as edit counts for text and reconstruction error for video, indicates whether the content is machine-generated.