OpenAI's o1 Model Thinks in Chinese - The Mystery Explained

An Unusual Phenomenon with OpenAI’s o1 Model
Soon after its release, OpenAI’s o1, the company’s initial “reasoning” AI model, exhibited a peculiar behavior that quickly garnered attention. Users observed instances where the model would spontaneously begin processing information in languages like Chinese or Persian, even when presented with an English-language query.
The Reasoning Process and Language Switching
When tasked with a problem – for example, determining the number of “R”s in the word “strawberry” – o1 would initiate a “thought” process, progressing through a series of reasoning steps to arrive at a solution. While the final response was consistently delivered in English, the model would sometimes execute portions of its reasoning in another language before reaching its conclusion.
One Reddit user reported that “[o1] randomly started thinking in Chinese midway through the process.”
A separate user on X (formerly Twitter) questioned, “Why did [o1] unexpectedly begin thinking in Chinese?” emphasizing that the preceding conversation, spanning over five messages, had not involved the Chinese language.
Possible Explanations for the Behavior
OpenAI has not offered an explanation for o1's unusual behavior, nor has it formally acknowledged the phenomenon. Several AI experts, however, have proposed theories.
The Role of Training Data
Many experts suggest that the phenomenon may be linked to the composition of the datasets used to train o1. Clément Delangue, CEO of Hugging Face, and others on X pointed out that reasoning models like o1 are trained on extensive datasets that contain a significant number of Chinese characters.
Ted Xiao, a researcher at Google DeepMind, posited that OpenAI and similar companies utilize third-party Chinese data labeling services. He suggests that o1’s shift to Chinese represents an instance of “Chinese linguistic influence on reasoning.”
Xiao explained in a post on X that “[Labs like] OpenAI and Anthropic utilize [third-party] data labeling services for PhD-level reasoning data for science, math, and coding,” adding that “for expert labor availability and cost reasons, many of these data providers are based in China.”
Understanding Data Labeling
Data labels, also known as tags or annotations, are crucial for helping AI models understand and interpret data during training. For instance, in image recognition, labels might involve markings around objects or captions describing the elements within an image.
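To make this concrete, the sketch below shows what a single labeled training record might look like for an image-recognition task. The schema and field names are hypothetical, chosen only to illustrate the idea of pairing raw data with human-supplied labels, not any particular vendor's actual format.

```python
# Hypothetical example of a labeled training record for image recognition.
# The schema (field names, coordinate format) is illustrative only.
annotation = {
    "image_id": "img_00042.jpg",
    "caption": "A bowl of strawberries on a wooden table",
    "objects": [
        {"label": "strawberry", "bbox": [120, 85, 210, 160]},  # x1, y1, x2, y2
        {"label": "bowl",       "bbox": [60, 70, 320, 240]},
    ],
}

# During training, a model sees the image pixels alongside these labels,
# so any systematic bias in how annotators apply them is learned too.
for obj in annotation["objects"]:
    print(obj["label"], obj["bbox"])
```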
Research indicates that biased labels can lead to biased models. As an example, annotators are statistically more likely to flag phrases in African-American Vernacular English (AAVE) as toxic, causing AI toxicity detectors trained on these labels to unfairly perceive AAVE as disproportionately toxic.
Alternative Perspectives and Tokenization
However, not all experts agree with the data labeling hypothesis. Some argue that o1 is equally likely to switch to languages like Hindi or Thai, suggesting a more general pattern.
These experts propose that o1 and other reasoning models may simply be leveraging languages they perceive as most efficient for achieving a given objective, or that the behavior could be a form of hallucination.
Matthew Guzdial, an AI researcher and assistant professor at the University of Alberta, stated to TechCrunch, “The model doesn’t comprehend the concept of language or that languages differ; it simply processes text.”
AI models don’t directly process words; instead, they utilize tokens. These tokens can be entire words, like “fantastic,” or smaller units such as syllables (“fan,” “tas,” “tic”), or even individual characters (“f,” “a,” “n,” “t,” “a,” “s,” “t,” “i,” “c”).
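As a rough illustration, OpenAI's open-source tiktoken library can show how a string is broken into tokens. The exact pieces depend on the tokenizer's learned vocabulary, so the splits you see may differ from the syllable example above.

```python
# Rough sketch of how text becomes tokens, using OpenAI's open-source
# tiktoken library (pip install tiktoken). How a given word splits
# depends on the tokenizer's learned vocabulary.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # a tokenizer used by recent OpenAI models

for text in ["fantastic", "strawberry", "为什么"]:
    token_ids = enc.encode(text)
    pieces = [enc.decode([tid]) for tid in token_ids]
    print(text, "->", pieces)
    # A word may come back as one token, a few subword chunks, or
    # (for some scripts) byte-level fragments that don't decode cleanly
    # on their own.
```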
Similar to labeling, tokenization can introduce biases. Many word-to-token converters assume that a space marks the start of a new word, a convention not followed by all languages; Chinese, for instance, is written without spaces between words.
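A minimal sketch of why that space-based assumption matters: splitting on whitespace works passably for English but collapses an unsegmented Chinese sentence into a single chunk. The example strings are my own, chosen only to illustrate the point.

```python
# Naive whitespace "tokenizer": workable for English, useless for Chinese,
# which does not put spaces between words. Example strings are illustrative.
def naive_tokenize(text: str) -> list[str]:
    return text.split()

english = "how many r's are in strawberry"
chinese = "草莓这个词里有几个r"  # roughly the same question, written without spaces

print(naive_tokenize(english))  # ['how', 'many', "r's", 'are', 'in', 'strawberry']
print(naive_tokenize(chinese))  # ['草莓这个词里有几个r'] -- one undivided chunk
```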
The Probabilistic Nature of AI and the Need for Transparency
Tiezhen Wang, a software engineer at Hugging Face, aligns with Guzdial, suggesting that the language inconsistencies may stem from associations formed during training.
“By embracing every linguistic nuance, we expand the model’s worldview and allow it to learn from the full spectrum of human knowledge,” Wang wrote on X. He illustrated the point with his own habits: he prefers doing mental math in Chinese because each digit is a single syllable, but automatically switches to English for topics like unconscious bias, because that is the language in which he first encountered those ideas.
This theory aligns with the understanding that models are probabilistic machines, learning patterns from numerous examples to make predictions, such as anticipating “it may concern” following “to whom” in an email.
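A toy sketch of that probabilistic behavior: given counts of what followed a phrase in training text, the model's job reduces to picking the highest-probability continuation. The counts below are invented purely for illustration.

```python
# Toy next-word predictor: invented counts standing in for patterns a model
# would learn from its training data.
from collections import Counter

continuations = {
    "to whom": Counter({"it": 950, "this": 30, "he": 20}),
}

def predict_next(context: str) -> str:
    counts = continuations[context]
    total = sum(counts.values())
    # Convert counts to probabilities and take the most likely continuation.
    probs = {word: n / total for word, n in counts.items()}
    return max(probs, key=probs.get)

print(predict_next("to whom"))  # "it", as in "to whom it may concern"
```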
However, Luca Soldaini, a research scientist at the Allen Institute for AI, cautioned against definitive conclusions. They emphasized to TechCrunch that “This type of observation on a deployed AI system is impossible to back up due to how opaque these models are,” and that “It’s one of the many cases for why transparency in how AI systems are built is fundamental.”
Concluding Thoughts
Without a direct explanation from OpenAI, we are left to speculate about why o1 might contemplate songs in French but approach synthetic biology in Mandarin. The incident highlights the complexities and often unpredictable nature of advanced AI systems.