Recently, The New York Times and the Daily News jointly filed a lawsuit against OpenAI, accusing it of using their works to train artificial intelligence models without authorization.
The case has drawn public attention after the plaintiffs' legal team stated in the latest court documents that OpenAI's engineers accidentally deleted evidence that could be crucial to the case while handling the relevant data.
It is reported that OpenAI agreed this fall to provide two virtual machines so that the plaintiffs' legal team could search its training data for copyrighted content. A virtual machine is a software-emulated computer that runs inside another computer's operating system, typically used for testing, data backup, and running applications. Since November 1, legal advisors from The New York Times and the Daily News, along with their hired experts, have spent over 150 hours searching OpenAI's training data.
However, on November 14, OpenAI's engineers accidentally erased the search data stored on one of the virtual machines. According to a letter from the plaintiffs' lawyers, OpenAI attempted to recover the lost data and mostly succeeded, but because the folder structure and file names were "irrecoverable," the recovered data could not be used to determine how the news plaintiffs' articles were used to train OpenAI's models.
The plaintiffs' legal advisors noted that they do not believe the deletion was intentional, but said the incident shows that OpenAI "is in the best position to search its own dataset for potential infringing content." In other words, the plaintiffs argue that OpenAI should use its own tools to locate the relevant content more effectively.
In this case and other similar cases, OpenAI has consistently maintained that using publicly available data for model training constitutes fair use. In other words, OpenAI believes it does not need to pay licensing fees for the works it uses, even though it profits from the resulting models.
OpenAI has nonetheless signed licensing agreements with a growing number of news outlets, including the Associated Press, Business Insider, and the Financial Times, although the specific terms of these agreements have not been made public. It is reported that content partner Dotdash receives at least $16 million annually.
Despite the legal disputes, OpenAI has neither confirmed nor denied using specific copyrighted works for AI training without permission.
Key Points:
🌐 OpenAI is accused of accidentally deleting potentially important evidence in a copyright lawsuit.
🕒 The plaintiffs' legal team and experts had spent over 150 hours searching OpenAI's training data since November 1.
💼 OpenAI insists that its use of public data for training models falls under fair use.