A covert data-acquisition operation at tech giant NVIDIA was recently exposed. According to a report from 404 Media, NVIDIA has been scraping vast amounts of YouTube video to train its artificial intelligence models, a practice that is legally and ethically ambiguous.


The report indicates that NVIDIA is using this video data to train several AI products, including the Cosmos deep learning model, autonomous driving algorithms, digital human avatars, and the 3D world-building tool Omniverse.

According to the report, NVIDIA took numerous measures to conceal its scraping, using multiple virtual machines and constantly rotating IP addresses to avoid detection by YouTube. Neither the video creators nor YouTube's parent company, Google, authorized this scraping. Internal communications at NVIDIA reveal a bold strategy: in one email, an executive described building a "video data factory" capable of generating a human lifetime's worth of visual experience in training data every day.

Notably, when employees raised concerns about the legality and ethics of this data acquisition, management appeared confident, asserting that the decisions had been made at the highest level. The email stated, "We have comprehensive approval for all data."

More troubling is that NVIDIA knowingly used HD-VG-130M, a dataset of 130 million YouTube videos that was originally created for academic research. Many experts strongly disapprove of using research data for commercial purposes.

As a key player in the AI industry, NVIDIA holds a significant market position: its graphics processing units (GPUs) are the foundation of many compute-intensive AI systems. Companies such as OpenAI, Microsoft, and Google, which collaborate with NVIDIA, have expressed concern over this behavior. A Google spokesperson noted that using YouTube data without permission clearly violates the platform's terms of service.

In response to media inquiries, NVIDIA said that its AI training practices are "fully in line with the spirit and letter of copyright law." But how will the creators of those videos view this claim?

Key Points:

📹 NVIDIA has been secretly scraping large amounts of YouTube video data for AI training, raising legal and ethical concerns.

💻 Internal emails show NVIDIA management confidently asserting that the data use had comprehensive approval from the highest level.

📜 Google points out that using YouTube data without permission clearly violates the platform's terms of service, while NVIDIA maintains that its practices comply with copyright law.