As AI-generated imagery becomes increasingly realistic, many viewers find themselves asking while watching a video: is this real footage, or the work of AI?
Recently, a video by "QuantumBit" on Bilibili about using AI to identify AI-generated videos sparked heated discussion and quickly drew more than 1.68 million views. Let's take a look at how AI can "spot" AI at a glance.
The video introduces some tips for identifying AI-generated videos with the naked eye. For instance, viewers can check whether the characters show unnatural movements or expressions, and whether voice, mouth movements, and emotion stay in sync when someone speaks. However, given the sheer volume of video online, human effort alone is clearly insufficient, which is where AI comes into play.
AI has a unique advantage in identifying deepfake videos. Deepfake technology typically stitches synthetic parts onto the original video frame by frame. Where the human eye might only sense that something is "off," AI can accurately pinpoint these "stitching traces." Just as every person has unique fingerprints, the lighting, texture, and other properties of a given video are difficult to replicate perfectly, and these subtle mismatches are what AI detection relies on.
For videos generated entirely by AI, identification is more complex. Research teams have developed three classifiers, built respectively on model characteristics, motion characteristics, and geometric characteristics derived from monocular depth. Taking Sora-generated videos as an example, an unstable number of people or animals, abnormal changes in color and shadow as objects move, and perspective and proportion errors during camera movement all serve as important clues for AI detection.
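One of the clues above, an unstable number of people or animals, can be illustrated with a small sketch. This is not the researchers' classifier: it assumes some object detector (not shown, hypothetical) has already produced a count of detected subjects for each frame, and the function name `count_jumps` and its `max_step` parameter are made up for illustration.

```python
def count_jumps(counts_per_frame, max_step=0):
    """Return indices of frames whose subject count differs from the
    previous frame's count by more than max_step."""
    return [
        i
        for i in range(1, len(counts_per_frame))
        if abs(counts_per_frame[i] - counts_per_frame[i - 1]) > max_step
    ]


# Hypothetical per-frame counts, as an upstream detector might report them.
stable_clip = [3, 3, 3, 3, 3]
unstable_clip = [3, 3, 4, 3, 5]

print(count_jumps(stable_clip))    # no frame-to-frame changes
print(count_jumps(unstable_clip))  # frames where the count changed
```

In real footage, subjects also enter and leave the frame, so a signal like this would at most be one input among several rather than a verdict on its own.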
More interestingly, researchers have proposed a new method called DIVID. They found that when AI-generated videos and real videos are each regenerated by a diffusion model, the results differ markedly. The pixels of AI-generated videos tend to sit closer to the average of the training data, while human-shot videos show distinct individuality in many respects. The DIVID algorithm, built on this observation, achieves 93.7% accuracy in identifying videos generated by Sora.
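The regenerate-and-compare idea can be sketched in a few lines. This is a toy illustration under stated assumptions, not the DIVID implementation: `toy_reconstruct` is a hypothetical stand-in for a diffusion model (it simply pulls every pixel halfway toward the clip's mean), the video is assumed to be an array of shape (frames, height, width, channels) with values in [0, 1], and the fixed threshold is arbitrary where a real detector would use a trained classifier.

```python
import numpy as np


def reconstruction_error(frames, reconstruct):
    """Mean absolute difference between each frame and its regenerated copy.

    Returns one score per frame.
    """
    frames = np.asarray(frames, dtype=np.float64)
    regenerated = np.asarray(reconstruct(frames), dtype=np.float64)
    # Average over height, width, and channels, leaving the frame axis.
    return np.abs(frames - regenerated).mean(axis=(1, 2, 3))


def looks_generated(frames, reconstruct, threshold):
    """Flag a clip whose regenerated copy stays very close to the original."""
    return bool(reconstruction_error(frames, reconstruct).mean() < threshold)


def toy_reconstruct(video):
    # Hypothetical stand-in for a diffusion model: pull pixels toward the mean.
    return 0.5 * video + 0.5 * video.mean()


# A clip already sitting at its own average barely changes when regenerated.
flat_clip = np.full((4, 8, 8, 3), 0.5)
# A clip with lots of individual variation changes much more.
noisy_clip = np.random.default_rng(0).uniform(0.0, 1.0, size=(4, 8, 8, 3))

print(looks_generated(flat_clip, toy_reconstruct, threshold=0.05))   # True
print(looks_generated(noisy_clip, toy_reconstruct, threshold=0.05))  # False
```

The point of the sketch is only the shape of the signal: content that already resembles what the model tends to produce survives regeneration almost unchanged, while content with more individual detail does not.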
The emergence of these AI detection methods gives us a powerful tool against the spread of false information. Like the legendary "fiery golden eyes" that see through any disguise, they help us tell truth from falsehood in a sea of information.