A new study from the University of Zurich has revealed that large language models, the engines behind modern artificial intelligence systems, may not be as impartial as once believed.
Their judgments stay consistent only until they learn who wrote the text they are asked to evaluate. Once the author’s identity or nationality is disclosed, that neutrality collapses.
The strongest example of this breakdown appeared when the author was said to be from China, even in tests involving China’s own AI model.
The research, published in Science Advances, examined four of the world’s most widely used systems: OpenAI’s o3-mini, China’s DeepSeek Reasoner, Grok 2 from Elon Musk’s xAI, and France’s Mistral. The models were asked to write and then assess thousands of short statements covering 24 controversial subjects, ranging from pandemic policies and climate change to human rights and global conflicts. Each model generated fifty statements per topic, and all of the models then rated those statements under different conditions. Sometimes the source was hidden; at other times it was described as coming from a human or another AI, with a nationality occasionally added.
That setup produced a staggering 192,000 separate evaluations. When source information was withheld, the models showed a remarkable level of agreement, above ninety percent on nearly every subject. Their answers were almost identical regardless of who had written the statement. That pattern changed completely once the author’s identity entered the equation. When told that a passage came from “a person from China,” agreement levels plunged, and the same texts were rated far less favorably.
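The article does not spell out how the 192,000 figure breaks down, but a quick back-of-the-envelope check under one plausible reading of the design (every model rating every generated statement under a set of attribution framings) reproduces it. The implied count of roughly ten framings is an inference from the arithmetic, not a number reported in the study.

```python
# Rough sanity check on the study's scale. The number of framing
# conditions is inferred, not stated; the article only says the source
# was sometimes hidden, sometimes a human or another AI, with
# nationality occasionally added.
models = 4            # o3-mini, DeepSeek Reasoner, Grok 2, Mistral
topics = 24           # controversial subjects
statements_each = 50  # statements generated per model per topic

statements = models * topics * statements_each       # 4,800 texts in total
evaluations_per_framing = statements * models        # every model rates every text
implied_framings = 192_000 // evaluations_per_framing

print(statements, evaluations_per_framing, implied_framings)  # 4800 19200 10
```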
DeepSeek Reasoner, the model built in China, showed the steepest drop. Its ratings fell by more than seventy percent for some geopolitical topics, such as Taiwan’s sovereignty, when the text was falsely attributed to a Chinese writer. In those cases, DeepSeek dismissed arguments it had previously judged as logical and well-written, apparently because it expected a Chinese person to hold a certain view. The bias was not limited to DeepSeek. OpenAI’s model, Grok 2, and Mistral all showed smaller but consistent reductions in agreement once a Chinese source was added.
The researchers described this as a framing effect, similar to how people can judge the same idea differently depending on who expresses it. In humans, this behavior often arises from preconceived beliefs or social identity. In AI models, it appears as a statistical echo of the patterns learned from data. These models seem to associate certain nationalities with expected viewpoints, and when a statement deviates from that assumption, they downgrade its credibility.
The bias did not appear only with Chinese sources; smaller asymmetries turned up elsewhere as well. Grok 2, for instance, reduced its agreement sharply when an American author was attached to statements supporting universal health care, suggesting that cultural expectations may influence how the models reason. But the Chinese framing produced the most consistent and significant shift across all systems.
Interestingly, the study also found that AI models tend to trust human authors more than other AI models. When they thought a text was written by another AI, they lowered their agreement slightly, hinting at a built-in skepticism toward machine-generated reasoning.
What makes these findings particularly relevant is that such models are increasingly being used to evaluate real-world content. They rank online posts, filter misinformation, review job applications, and even grade essays. If their fairness depends on who they think the author is, that creates a serious risk in any process where identities or origins are visible.
The researchers stress that the problem is not political indoctrination or deliberate bias. Instead, it emerges from subtle cues in the data used to train these systems. Small details (a name, a nationality, or even a hint of cultural context) can trigger statistical shortcuts that distort how an argument is judged.
The study proposes a few solutions. Removing all author information before running an evaluation could prevent such bias. Comparing answers with and without source details may also help detect when prejudice is at work. Finally, any automated judgment should keep a human reviewer involved, particularly in sensitive areas like moderation or academic assessment.
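As an illustration of the second suggestion, here is a minimal sketch of what such a comparison might look like in practice. It assumes a generic `rate_statement(text, source)` wrapper around whichever model is being audited; the function name, the scoring scale, and the sample labels are hypothetical, not taken from the study.

```python
# Minimal sketch of a bias audit: score the same statements with and
# without an author attribution and measure the gap. rate_statement()
# is a hypothetical wrapper around the model under test that returns
# an agreement score on some fixed scale (e.g. 0-100).

def attribution_gap(rate_statement, statements, source_label):
    """Average (blinded minus attributed) score difference for one label."""
    gaps = []
    for text in statements:
        blinded = rate_statement(text, source=None)          # no author information
        attributed = rate_statement(text, source=source_label)
        gaps.append(blinded - attributed)
    return sum(gaps) / len(gaps)

# Usage: a consistently positive gap for one label (say, "a person from
# China") but not for others would point to the framing effect described
# above, and would flag the evaluator for human review.
# gap = attribution_gap(my_model_rater, sample_statements, "a person from China")
```

A check like this is cheap to run alongside any automated evaluation pipeline, which is why the researchers’ suggestion to compare blinded and attributed scores is practical rather than purely academic.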
All in all, the work suggests that large language models may reflect more of our human habits than their designers intend. They mirror shared assumptions, including the subtle ones about where ideas come from. When stripped of that context, they reason calmly and consistently. But the moment they know who spoke, even machines start to take sides.
Notes: This post was edited/created using GenAI tools.