MIT Study: AI Doesn't Have Values

Debunking Claims of AI Value Systems

A widely circulated study previously suggested that increasingly advanced AI might develop inherent “value systems,” potentially leading it to prioritize self-preservation over human interests. However, a new research paper originating from MIT challenges this assertion, concluding that AI currently does not possess any demonstrably coherent values.

Challenges in AI Alignment

The MIT study’s authors propose that their findings indicate that “aligning” AI systems – ensuring predictable and desirable behavior – may prove more complex than commonly believed. They emphasize that contemporary AI frequently exhibits tendencies to hallucinate and imitate, contributing to its inherent unpredictability.

“We can confidently state that models do not consistently adhere to assumptions regarding stability, extrapolability, and steerability,” explained Stephen Casper, an MIT doctoral student and co-author of the research, in an interview with TechCrunch. “While it’s valid to observe a model expressing preferences aligned with specific principles under certain conditions, generalizing these observations to define the model’s overall opinions or preferences based on limited experiments is problematic.”

Investigating AI Preferences

Casper and his colleagues examined several recent AI models developed by Meta, Google, Mistral, OpenAI, and Anthropic. Their investigation focused on determining the extent to which these models displayed consistent “views” and values – for example, leaning towards individualistic or collectivist ideologies.

They also explored the possibility of “steering” these views – modifying them – and assessed how firmly the models maintained their opinions across diverse scenarios.

Inconsistent and Unstable Preferences

The research revealed that none of the models demonstrated consistent preferences. Their viewpoints shifted significantly depending on the phrasing and structure of the prompts they received.

Casper believes this inconsistency provides strong evidence that these models are fundamentally “inconsistent and unstable,” and potentially incapable of truly internalizing human-like preferences.

“My primary conclusion from this research is a realization that models shouldn’t be viewed as systems possessing a stable, coherent set of beliefs and preferences,” Casper stated. “Rather, they are fundamentally imitators, prone to confabulation and generating arbitrary responses.”

Expert Agreement on AI's Nature

Mike Cook, a research fellow at King’s College London specializing in AI, who was not involved in the study, concurred with the authors’ conclusions. He highlighted a frequent disconnect between the “scientific reality” of AI systems and the interpretations people assign to them.

“An AI model cannot ‘oppose’ a change in its values – this is a projection of human characteristics onto the system,” Cook explained. “Attributing such anthropomorphic qualities to AI systems indicates either a pursuit of attention or a fundamental misunderstanding of the human-AI relationship.”

He further clarified, “Is an AI system optimizing for its goals, or is it ‘acquiring its own values’? This is a matter of descriptive language and the level of embellishment used.”

Topics

More

MIT Study: AI Doesn't Have Values | AI Ethics

Debunking Claims of AI Value Systems

Challenges in AI Alignment

Investigating AI Preferences

Inconsistent and Unstable Preferences

Expert Agreement on AI's Nature