Real scene or AI-generated? A sharp-eyed detector for AI-generated video is here, with accuracy as high as 93.7%

Today, AI video generation tools are transforming industries such as design, marketing, entertainment, and education by producing realistic video content. In particular, text-to-video models such as Sora and Gen-3 can generate realistic, coherent, high-quality videos from just a few lines of prompt text.

While this technology opens up countless possibilities for creators around the world, it also poses many harms and risks to the general public, especially through the spread of misinformation, propaganda, scams, and phishing.
Accurately identifying AI-generated videos has therefore become an issue that everyone needs to care about.

Recently, Professor Junfeng Yang's team at Columbia University developed a video detection tool called DIVID (DIffusion-generated VIdeo Detector). On videos generated by models such as Sora, Gen-2, and Pika, it reached a detection accuracy of 93.7%.

The research paper, released together with open-source code and a dataset, was presented at the Conference on Computer Vision and Pattern Recognition (CVPR) in Seattle last month.

How was DIVID created?

Existing deepfake detectors perform well on samples generated by generative adversarial networks (GANs), but they are not robust enough at detecting videos generated by diffusion models.

In this work, the research team built DIVID to detect AI-generated videos. According to reports, DIVID builds on Raidar, a method the team released earlier this year that detects AI-generated text by analyzing the text itself, without access to the internal workings of the large language model (LLM).

Raidar asks an LLM to rephrase or revise a given text and then measures how many edits the system makes to it. More edits mean the text is more likely to have been written by a human; fewer edits mean it is more likely to be machine-generated.
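To make the "count the edits" idea concrete, here is a minimal sketch of a rewrite-and-compare score. It assumes the edits can be counted as character-level Levenshtein distance, normalized by text length; the real Raidar features may differ. The `rewrite` function stands in for an LLM call and is hypothetical, as are the helper names and the 0.2 threshold.

```python
from typing import Callable


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def rewrite_edit_ratio(text: str, rewrite: Callable[[str], str]) -> float:
    """Edits the rewriter made, normalized by the length of the longer string."""
    rewritten = rewrite(text)
    longest = max(len(text), len(rewritten), 1)
    return levenshtein(text, rewritten) / longest


def looks_machine_generated(text: str, rewrite: Callable[[str], str],
                            threshold: float = 0.2) -> bool:
    """Few edits suggest machine-generated text; many edits suggest human-written text.
    The threshold is a hypothetical placeholder, not a value from Raidar."""
    return rewrite_edit_ratio(text, rewrite) < threshold


if __name__ == "__main__":
    # Toy stand-ins for an LLM rewriter (hypothetical, for illustration only).
    keeps_everything = lambda t: t
    rewrites_heavily = lambda t: t.upper()
    print(looks_machine_generated("hello world", keeps_everything))  # True
    print(looks_machine_generated("hello world", rewrites_heavily))  # False
```

In practice the rewriter would be a real LLM prompted to polish the text, and the threshold would be learned from labeled examples rather than fixed by hand.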

The team developed DIVID using the same concept: it reconstructs a video and compares the reconstruction with the original. It uses DIRE (DIffusion Reconstruction Error) values, which measure the difference between an input and its reconstruction by a diffusion model, to detect diffusion-generated videos. The method assumes that a diffusion-generated image and its diffusion reconstruction should be very similar, because both are sampled from the diffusion process distribution. If the reconstruction differs significantly from the original, the video was likely made by a human; if not, it was likely generated by AI.

Figure | DIVID detection process. In step 1, given a sequence of video frames, the research team first uses a diffusion model to generate a reconstructed version of each frame, then computes the DIRE value from each reconstructed frame and its corresponding input frame. In step 2, a CNN+LSTM detector is trained on the sequence of DIRE values together with the original RGB frames.
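Step 1 of this pipeline can be sketched as follows, assuming grayscale frames stored as nested lists and a per-pixel absolute difference as the DIRE map. The `reconstruct` callable is a hypothetical stand-in for the diffusion model's reconstruction. The final function is NOT DIVID's trained CNN+LSTM detector: it is a deliberately naive replacement that averages DIRE values and applies a hypothetical threshold, just to show which direction the signal points.

```python
from typing import Callable, List

Frame = List[List[float]]  # one grayscale frame, pixel values in [0, 1]
Reconstructor = Callable[[Frame], Frame]  # hypothetical stand-in for a diffusion model


def dire_map(frame: Frame, reconstruct: Reconstructor) -> Frame:
    """Per-pixel DIRE: |frame - reconstruction(frame)|."""
    recon = reconstruct(frame)
    return [[abs(p - q) for p, q in zip(row, rrow)] for row, rrow in zip(frame, recon)]


def dire_sequence(frames: List[Frame], reconstruct: Reconstructor) -> List[Frame]:
    """Step 1: one DIRE map per input frame (the sequence a detector would consume)."""
    return [dire_map(f, reconstruct) for f in frames]


def mean_dire(frames: List[Frame], reconstruct: Reconstructor) -> float:
    """Average DIRE value over every pixel of every frame."""
    maps = dire_sequence(frames, reconstruct)
    total = sum(v for m in maps for row in m for v in row)
    count = sum(len(row) for m in maps for row in m)
    return total / count


def naive_is_generated(frames: List[Frame], reconstruct: Reconstructor,
                       threshold: float = 0.05) -> bool:
    """Hypothetical stand-in for step 2 (not DIVID's CNN+LSTM):
    a low reconstruction error suggests the video came from a diffusion model."""
    return mean_dire(frames, reconstruct) < threshold


if __name__ == "__main__":
    video = [[[0.1, 0.5], [0.3, 0.7]], [[0.2, 0.4], [0.6, 0.8]]]  # 2 frames of 2x2 pixels
    faithful = lambda f: [row[:] for row in f]                      # reconstructs exactly
    lossy = lambda f: [[min(1.0, p + 0.2) for p in row] for row in f]  # shifts every pixel
    print(naive_is_generated(video, faithful))  # True
    print(naive_is_generated(video, lossy))     # False
```

In DIVID the DIRE maps and RGB frames are fed to a learned sequence model instead of a fixed threshold, which lets it pick up temporal patterns that a simple average would miss.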

The framework rests on the idea that AI generation tools create content from the statistical distribution of large datasets, producing "statistical mean" content: characteristic pixel intensity distributions, texture patterns, and noise in video frames, as well as subtle, unnatural inconsistencies between frames or anomalous patterns that are more likely to appear in diffusion-generated videos.

Figure | Detection performance on the in-domain test set. DIVID outperforms the baseline architectures in both accuracy (Acc.) and average precision (AP). RGB denotes the raw pixel values of the frames in the original video.

In contrast, human-made videos show individual character and deviate from statistical norms. DIVID achieves up to 93.7% detection accuracy on videos generated by Stable Video Diffusion, Sora, Pika, and Gen-2 in its benchmark dataset.

Future Outlook

Currently, DIVID is a command-line tool that analyzes a video and outputs whether it was generated by AI or by a human; it is available only to developers. The researchers noted that the technology could potentially be integrated into Zoom as a plug-in to detect deepfake calls in real time. The team is also considering building a website or browser plug-in to make DIVID available to ordinary users.

The researchers are currently improving DIVID's framework to handle different types of synthetic videos from open-source video generation tools. They are also using DIVID to collect videos that will expand the DIVID dataset.

"Our framework makes significant progress in detecting AI-generated content," said Dr. Yun-Yun Tsai, one of the authors of the paper. "There are too many bad actors using AI to generate videos, and it is critical to stop them and protect society."

Reference Links:

https://arxiv.org/abs/2406.09601

https://techxplore.com/news/2024-06-tool-ai-generated-videos-accuracy.html
