AI Copyright Case: Zuckerberg Cites YouTube in Defense

January 16, 2025

Meta CEO Defends Data Usage in AI Training

Mark Zuckerberg, CEO of Meta, appears to have cited YouTube’s efforts to combat piracy to justify his company’s use of a dataset containing copyrighted e-books, according to recently published excerpts of a deposition he gave late last year.

AI Copyright Litigation

The deposition is part of Kadrey v. Meta Platforms, one of several AI copyright cases now moving through the U.S. court system that pit AI developers against copyright owners.

Generally, AI companies involved in these lawsuits assert that training their models on copyrighted material constitutes “fair use.” However, this position is widely contested by copyright holders.

Zuckerberg’s YouTube Analogy

“YouTube may, at times, host illegally copied content, but they actively work to remove it,” Zuckerberg stated during his deposition, as per transcript portions released Wednesday. “I assume the majority of content on YouTube is legitimate and properly licensed.”

Insights from the Deposition

The released segments of Zuckerberg’s deposition offer some insight into his perspective on copyright and fair use. It’s important to note that the complete deposition transcript has not been made public.

TechCrunch has contacted Meta for further clarification and will update this report if a response is received.

The LibGen Dataset

Zuckerberg’s statements appear to be a defense of Meta’s decision to use LibGen, a collection of e-books, to train its Llama family of AI models. These Llama models are direct competitors to leading AI models developed by companies like OpenAI.

LibGen identifies itself as a “links aggregator,” offering access to copyrighted works from major publishers such as Cengage Learning, Macmillan Learning, McGraw Hill, and Pearson Education.

LibGen has been sued multiple times, ordered to shut down, and fined millions of dollars for copyright infringement.

Internal Concerns at Meta

Court documents unsealed this week indicate that Zuckerberg authorized the use of LibGen to train at least one Llama model, despite concerns raised by Meta’s AI and research teams about the potential legal ramifications.

Plaintiffs’ counsel, representing authors including Sarah Silverman and Ta-Nehisi Coates, presented evidence of Meta employees describing LibGen as a “data set we know to be pirated” and warning that its use “may undermine [Meta’s] negotiating position with regulators.”

Zuckerberg’s Claim of Unfamiliarity

During his deposition, Zuckerberg claimed he “hadn’t really heard of” LibGen.

“I understand you are attempting to elicit my opinion on LibGen, which is not something I am familiar with,” he stated. “I simply lack knowledge of that specific resource.”

Justification for Dataset Usage

Under questioning by attorney David Boies, Zuckerberg argued that a blanket prohibition on using datasets like LibGen would be unreasonable.

“Would I implement a policy against utilizing YouTube simply because some of its content may be copyrighted? No,” he responded. “[T]here are instances where such a broad restriction might not be appropriate.”

Caution Regarding Copyrighted Material

Zuckerberg did acknowledge that Meta should be “pretty careful” about training on copyrighted material.

“If a website is intentionally designed to infringe upon rights… obviously, we would need to be cautious about engaging with it, or potentially prevent our teams from doing so,” Zuckerberg explained in his deposition.

Updated Claims in Kadrey v. Meta Platforms

Since the case was first filed in 2023 in the U.S. District Court for the Northern District of California, San Francisco Division, the plaintiffs’ legal team in Kadrey v. Meta Platforms has repeatedly amended its complaint. The most recent amendment, submitted late Wednesday, adds further accusations against Meta, including a claim that the company cross-referenced titles in the LibGen repository of pirated books with copyrighted works available for licensing.

Meta allegedly used this comparison to decide whether pursuing licensing agreements with publishers was worth the investment.

The amended filing asserts that Meta used LibGen to train its newest Llama 3 models. Plaintiffs further contend that the dataset is being used to develop the next generation, Llama 4.

According to the revised complaint, Meta researchers tried to obscure the use of copyrighted material in training Llama models by including “supervised samples” during the fine-tuning phase.

The complaint also alleges that Meta continued to download pirated e-books from Z-Library for Llama model training as recently as April 2024. This practice reportedly persisted despite known legal risks.

Z-Library, often referred to as Z-Lib, has faced numerous legal challenges initiated by publishers. These actions have included domain name seizures and content removal orders.

In 2022, the Russian nationals identified as the operators of Z-Library were charged with copyright infringement, wire fraud, and money laundering. These charges highlight the serious legal implications associated with the platform’s operations.
