Zuckerberg Aware of Meta Using Pirated Library Data to Train AI

AIbase基地

Published inAI News · 4 min read · Jan 15, 2025

123

Recently, documents revealed in a collective lawsuit concerning copyright against Meta have drawn significant attention to the company's use of a piracy eBook library called Library Genesis (LibGen) to train its latest AI chatbot, Llama3. These documents show that Meta's engineers discussed the potential risks of utilizing LibGen, a "shadow library," especially in light of the growing concerns surrounding copyright and data ownership. Despite the potential negative impacts and public relations risks, Meta's CEO Mark Zuckerberg approved this decision.

Library Study Reading (3)

Image Source Note: Image generated by AI, image licensed by Midjourney

At the court's request, confidential internal conversations about using the LibGen dataset were unsealed, revealing that Meta's executives explicitly stated in discussions with the AI research team that the data from LibGen was "known to be pirated," and they agreed to use this data to enhance Llama3's performance. In an email, Meta's product management director Sony Theakanath noted that while the decision to use LibGen posed public relations risks, other AI companies were also using similar data, leading Meta's team to feel that this path was not unique.

More concerning is that Meta employees discussed how to handle and filter the text from LibGen to remove copyright identifiers such as ISBNs and copyright notices. An internal memo stated that the materials provided by LibGen were "high quality and lengthy, making them very suitable for learning particularly specialized knowledge." This suggests that Meta appears to be attempting to obscure its use of unauthorized content.

Additionally, Meta employees mentioned in emails that directly using the company's IP address for torrent downloads might be inappropriate and expressed concerns about this practice. However, under Zuckerberg's "top-down push" to use the LibGen dataset, Meta's competitive drive in the AI race became apparent. This incident has once again raised external scrutiny and questions regarding major tech companies' handling of copyright issues.

The outcome of this copyright lawsuit could significantly impact other ongoing similar cases, especially those involving the use of creative works such as images, music, and literature. As tech companies' demand for original content continues to rise, the rights of original content creators will become a focal point of attention.

IBM Releases Granite 4.0 3B Vision: A New Tool for Enterprise Document Data Extraction

IBM launches Granite 4.0 3B Vision, a 3-billion-parameter visual-language model optimized for extracting complex enterprise document data. It excels in processing unstructured data in finance, law, and healthcare, particularly with tables, scans, and multi-modal layouts, by integrating visual understanding and language generation to accurately identify and extract key information.....

Perplexity AI Faces Class-Action Lawsuit Over Alleged Leak of User Chat Records to Meta and Google

Perplexity AI faces a U.S. class-action lawsuit for allegedly sharing sensitive user data with third parties like Meta and Google without consent, potentially exposing private financial and tax information even in incognito mode. The company states it has not yet received formal legal documents.....

Zhiyuan Robotics' 10,000th Expedition A3 Mass Production Unit Rolls Off the Line, Achieving a Tenfold Scale Increase in 15 Months

The 10,000th general-purpose embodied robot 'Expedition A3' by Zhiyuan Robot has rolled off the production line, marking its transition from R&D to large-scale industrial production. Since January 2025, production capacity has increased tenfold in 15 months, with the jump from 5,000 to 10,000 units taking just over three months, demonstrating exponential growth.....

Latest AI News

AI Daily Brief

AI Product Finder

AI Product Rankings

AI Product Submit

AI Tools Directory

GEO Brand Visibility

AI Visibility Audit

AI Search Visibility Checker

GEO Promotion Link Detection

GEO Ranking Optimization System

GEO Services​

MCP Servers

MCP Client

MCP Case Tutorials

MCP Ranking

MCP Service Submission

MCP Playground

MCP Inspector

LLM API Hub

AI Models Finder

Model Providers

LLM Leaderboard

Compare LLMs

LLM Cost Calculator

LLM Arena

AI Model Compatibility Checker

AI Deployment Calculator

Zuckerberg Aware of Meta Using Pirated Library Data to Train AI

AIbase基地

This article is from AIbase Daily

AI News Recommendations

Cursor 3 Released: Software Development Enters the Age of Agent Autonomy

Rejecting Compute Anxiety! Apple's LGTM Framework Launches: Enabling 4K-Grade 3D Rendering to Take Off on Vision Pro

Ministry of Commerce Responds to Meta's Acquisition of Manus: Support for Transnational Cooperation but Must Comply with Laws and Fulfill Obligations

WeChat Enhances AI, Modifies Video Governance, Handles 3,800 Violating Short Video Contents

IBM Releases Granite 4.0 3B Vision: A New Tool for Enterprise Document Data Extraction

Perplexity AI Faces Class-Action Lawsuit Over Alleged Leak of User Chat Records to Meta and Google

Meta Releases New Ray-Ban Smart Glasses: Support for Multiple Prescription Options, Starting at $499

Xiaomi's Self-Developed Input Method Exposed: Built-in Large Model, Supports System-Level AI Correction

Zhiyuan Robotics' 10,000th Expedition A3 Mass Production Unit Rolls Off the Line, Achieving a Tenfold Scale Increase in 15 Months

YouTube Flooded with AI-Generated Spam Videos: Platform Dilemma Under Algorithm-Driven Content Pressure

AI News Recommendations

Cursor 3 Released: Software Development Enters the Age of Agent Autonomy

Rejecting Compute Anxiety! Apple's LGTM Framework Launches: Enabling 4K-Grade 3D Rendering to Take Off on Vision Pro

Ministry of Commerce Responds to Meta's Acquisition of Manus: Support for Transnational Cooperation but Must Comply with Laws and Fulfill Obligations

WeChat Enhances AI, Modifies Video Governance, Handles 3,800 Violating Short Video Contents

IBM Releases Granite 4.0 3B Vision: A New Tool for Enterprise Document Data Extraction

Perplexity AI Faces Class-Action Lawsuit Over Alleged Leak of User Chat Records to Meta and Google

Meta Releases New Ray-Ban Smart Glasses: Support for Multiple Prescription Options, Starting at $499

Xiaomi's Self-Developed Input Method Exposed: Built-in Large Model, Supports System-Level AI Correction

Zhiyuan Robotics' 10,000th Expedition A3 Mass Production Unit Rolls Off the Line, Achieving a Tenfold Scale Increase in 15 Months

YouTube Flooded with AI-Generated Spam Videos: Platform Dilemma Under Algorithm-Driven Content Pressure

GEO Services