Will Smith Eating Spaghetti & Viral AI Benchmarks of 2024

The Rise of Unconventional AI Benchmarks

It has become a recurring pattern: whenever a new AI video generator is released, people quickly test it by generating a video of actor Will Smith eating spaghetti.

This practice has evolved into a popular meme and serves as an informal benchmark. The goal is to assess whether the new video generator can realistically portray Smith enjoying a bowl of noodles. The actor himself playfully acknowledged this trend with a post on Instagram in February.

Beyond Will Smith and Pasta

The pairing of Will Smith with pasta represents just one example of several unusual “unofficial” benchmarks that have gained traction within the AI community in 2024.

A 16-year-old developer built an app that gives AI control of Minecraft and tests its ability to design complex structures. Meanwhile, a British programmer created a platform that lets AI compete in games such as Pictionary and Connect 4.

While more rigorous, academic evaluations of AI performance do exist, the more eccentric tests have captured significant attention. But why is this the case?

Many established AI benchmarks fail to resonate with the general public. Companies frequently highlight their AI’s proficiency in solving problems from Math Olympiads or tackling challenges at the doctoral level.

However, most users – including this author – use chatbots for more commonplace tasks, like drafting email replies and doing preliminary research.

Limitations of Current Benchmarking Methods

Even crowdsourced industry metrics have their shortcomings. Platforms like Chatbot Arena, where users rate AI performance on tasks like web app creation or image generation, tend to draw an unrepresentative pool of raters.

The majority of raters are affiliated with the AI and technology sectors, and their evaluations are frequently influenced by subjective and difficult-to-quantify preferences.

Ethan Mollick, a professor at Wharton, recently highlighted another issue with many industry benchmarks: they often lack comparison to average human performance.

He noted the absence of diverse benchmarks across fields like medicine, law, and advisory services, despite the increasing use of AI in these areas.

Unconventional AI benchmarks, such as those involving Connect 4, Minecraft, and Will Smith, are not scientifically rigorous or broadly applicable. Success in the “Will Smith test” doesn’t guarantee proficiency in generating other visuals, like a depiction of a burger.

The Value of Understandability and Downstream Impact

One expert suggested that the AI community should prioritize evaluating the real-world effects of AI, rather than focusing solely on its performance in isolated domains. This is a logical approach.

However, it’s likely that these quirky benchmarks will persist. They are inherently entertaining – who wouldn’t enjoy watching AI construct Minecraft castles? – and they are easily understood.

Furthermore, as noted by a colleague, the industry continues to struggle with translating the complexities of AI into easily digestible marketing materials.

The question now is not whether unusual benchmarks will go viral in 2025, but which ones.

