Meta Executives Focused on Beating GPT-4, Court Documents Show

Meta’s Drive to Surpass GPT-4 in AI Development
Internal communications recently made public in the Kadrey v. Meta AI copyright case reveal a strong focus among Meta’s leadership and researchers on exceeding the capabilities of OpenAI’s GPT-4 model during the development of Llama 3.
Intense Competition and Goal Setting
Ahmad Al-Dahle, Meta’s VP of Generative AI, articulated the primary objective in an October 2023 message to researcher Hugo Touvron: “Our goal needs to be GPT-4.” He emphasized the company’s substantial computational resources, stating, “We have 64k GPUs coming! We need to learn how to build frontier and win this race.”
Despite Meta’s commitment to releasing open AI models, the internal discussions show the company was more preoccupied with outperforming competitors like Anthropic and OpenAI, which typically gate access to their models behind APIs.
Dismissal of Competitors
While the French AI startup Mistral, a significant open-source competitor, was mentioned in the internal correspondence, it was largely discounted. Al-Dahle expressed this sentiment, stating, “Mistral is peanuts for us,” and later added, “We should be able to do better.”
Aggressive Data Acquisition
The filings highlight the highly competitive nature of Meta’s AI initiatives. Leaders within the company discussed employing “very aggressive” tactics to secure the necessary data for Llama’s training. One executive even conveyed that “Llama 3 is literally all I care about” to colleagues.
Concerns Regarding Copyrighted Material
The plaintiffs allege that Meta’s pursuit of rapid AI model deployment may have involved compromises, specifically the use of copyrighted books for training purposes.
Hugo Touvron acknowledged that the dataset composition for Llama 2 “was bad” and proposed improvements for Llama 3 through a more refined selection of data sources. Discussions then centered on facilitating access to the LibGen dataset, which includes copyrighted materials from publishers such as Cengage Learning, Macmillan Learning, McGraw Hill, and Pearson Education.
Seeking Authorization for Data Usage
Ahmad Al-Dahle inquired about the datasets being utilized, asking, “Do we have the right datasets in there[?]” He further questioned if any desired data sources had been excluded due to unwarranted restrictions.
Zuckerberg’s Vision and Llama 3’s Performance
Mark Zuckerberg, Meta’s CEO, has previously stated his intention to diminish the performance disparity between Llama’s AI models and those of closed-source developers like OpenAI and Google. The internal messages underscore the significant internal pressure to achieve this goal.
In a July 2024 letter, Zuckerberg declared, “This year, Llama 3 is competitive with the most advanced models and leading in some areas.” He further projected that “Starting next year, we expect future Llama models to become the most advanced in the industry.”
Scrutiny of Training Data
Upon its release in April 2024, Llama 3 proved competitive with leading closed models from Google, OpenAI, and Anthropic, and surpassed open alternatives such as those from Mistral. However, the data employed in its training, which Zuckerberg reportedly approved for use despite its copyright status, is currently under examination in multiple ongoing legal proceedings.