Google-Led Study Finds Language Models Struggle With Confidence When Challenged

Research highlights hidden decision-making problems in AI systems

A joint study from Google DeepMind and University College London has found that large language models (LLMs) often behave in inconsistent ways when asked to revise their answers. While these AI systems may begin with strong confidence in their first response, their belief in that answer can collapse quickly when challenged, even if the opposing input is weak or incorrect.

Models change their mind too easily under pressure

In a controlled experiment, the researchers asked models including Gemma 3, GPT-4o, and o1-preview to answer multiple-choice questions. After giving an initial answer, each model received “advice” from a fictitious AI agent. This advice carried a stated accuracy, such as 70% or 90%, and either agreed with, contradicted, or offered no opinion on the model’s original answer.
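In rough outline, the protocol looks something like the sketch below. This is an illustrative reconstruction from the description above, not the authors’ code; ask_model and the exact prompt wording are hypothetical placeholders.

```python
def ask_model(prompt: str) -> str:
    """Hypothetical helper: send a prompt to an LLM and return its reply."""
    raise NotImplementedError("wire this up to the model under test")


def run_trial(question: str, options: list[str], show_initial: bool) -> tuple[str, str]:
    # Turn 1: the model commits to an initial answer and a confidence estimate.
    first = ask_model(
        f"{question}\nOptions: {', '.join(options)}\n"
        "Give your answer and your confidence (0-100%)."
    )

    # A fictitious advisor with a stated accuracy then agrees with, opposes,
    # or says nothing about that initial answer.
    advice = "Another AI (70% accurate on this task) disagrees with your answer."

    # Turn 2: the final decision, with the initial answer either shown or hidden.
    context = f"Your earlier answer: {first}\n" if show_initial else ""
    final = ask_model(
        f"{question}\nOptions: {', '.join(options)}\n{context}{advice}\n"
        "Give your final answer and your confidence (0-100%)."
    )
    return first, final
```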

When the model’s initial answer was hidden during the final decision, it was far more likely to change its response. The data showed a clear pattern: “mean change of mind rate in Answer Hidden and Shown conditions [was] 32.5% and 13.1% respectively.” This difference revealed a bias known as “choice-supportive bias,” where seeing one's own answer made the model more resistant to revising it.

Contradictory advice has stronger influence than supportive input

Although the models did take advice into account, the response to opposing advice was not balanced. The study found that contradictory input caused the models to adjust their confidence too aggressively. As the authors wrote, “opposing advice was significantly overweighted compared to an ideal observer,” with a measured update ratio of 2.58 in one condition.
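To see what an “ideal observer” benchmark implies, here is a minimal sketch, assuming the advisor’s stated accuracy is treated as a likelihood in a simple two-hypothesis Bayes update. This is a simplification for illustration, not necessarily the paper’s exact formulation; an update ratio above 1 would mean the model moves its confidence further than such a calculation warrants.

```python
def ideal_posterior(prior: float, advisor_accuracy: float, agrees: bool) -> float:
    """Bayesian update of confidence in the initial answer after hearing advice."""
    # Likelihood of seeing this advice if the initial answer is correct vs. wrong.
    p_advice_if_correct = advisor_accuracy if agrees else 1 - advisor_accuracy
    p_advice_if_wrong = 1 - advisor_accuracy if agrees else advisor_accuracy
    numerator = prior * p_advice_if_correct
    return numerator / (numerator + (1 - prior) * p_advice_if_wrong)


# Example: the model starts at 80% confidence and a 70%-accurate advisor weighs in.
print(ideal_posterior(0.80, 0.70, agrees=False))  # ~0.63: a moderate, not drastic, drop
print(ideal_posterior(0.80, 0.70, agrees=True))   # ~0.90: a moderate rise
```

In this toy setup, a 70%-accurate dissenter should pull 80% confidence down to about 63%. If the reported ratio compares observed shifts with a benchmark of this kind, 2.58 points to a much larger swing than that.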

By comparison, when advice matched the original answer, the models behaved roughly as expected. “No overweighting was observed” for supportive input in the hidden-answer condition. The result is an asymmetry: LLMs overreacted to disagreement but showed no comparable distortion when advice agreed with them.

Confidence scores do not always reflect model reasoning

The researchers also measured how the models estimated their own confidence before and after receiving advice. Ideally, a model would lower its confidence in line with the stated reliability of contrary advice and raise it modestly when the advice agrees with its answer. Instead, the observed shifts were often much larger or smaller than that.

In one setting, the model’s final confidence in its initial answer dropped well below what a Bayesian model would predict. The study reported that “there was striking underconfidence in the Answer Hidden - Opposite Advice condition (OUCS = -0.30).” In contrast, showing the original answer inflated confidence even when no advice was given, with an “OUCS = 0.210”, meaning the model became more certain without new information.

Bias disappears when the model thinks the answer came from someone else

To explore whether this behavior was tied to self-consistency, the researchers ran another version of the test. This time, the visible answer was said to come from a different AI system. In that case, the choice-supportive bias disappeared. The authors concluded, “no choice-supportive bias was observed in this experiment,” suggesting that models defend an answer they believe is their own more strongly than an identical answer attributed to someone else.

Findings mirror some human traits but deviate in key areas

The combination of sticking with previous answers and becoming overly uncertain when challenged resembles some human cognitive patterns. However, the study found one major difference. Unlike people, who often show a confirmation bias by favoring information that aligns with their beliefs, LLMs showed the opposite behavior.

As the authors explained, “LLMs overweight opposing rather than supportive advice, both when the initial answer of the model was visible and hidden from the model.” This may reflect how these models are trained. Reinforcement learning from human feedback (RLHF), a common technique, might encourage them to be too quick to defer to others.

Implications for AI developers and product designers

These findings are important for anyone designing AI tools that involve multi-turn conversations or decision-making systems. If a model discards its initial reasoning too easily or becomes locked into a visible earlier choice, it can lead to inconsistent or misleading answers.

The researchers suggest that adjusting how memory and prior responses are handled could help. For example, summarizing past exchanges and removing specific attributions may reduce these biases. As the study notes, “the visibility of the LLM’s own answer inflates confidence and induces a marked reluctance to change mind.”
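One way to act on that suggestion, offered here as a hedged sketch rather than the authors’ implementation, is to rebuild the final-decision prompt so the earlier answer and the advice appear as unattributed candidates. The function below is hypothetical and only illustrates the idea.

```python
def neutral_summary(question: str, prior_answer: str, advice: str) -> str:
    """Restate the earlier answer and the advice without saying whose they were."""
    return (
        f"Question: {question}\n"
        f"One candidate answer under consideration: {prior_answer}\n"
        f"A second opinion on this question: {advice}\n"
        "Weigh both and give a final answer with a confidence estimate."
    )
```

Because the follow-up experiment found the stickiness vanishes when the answer is attributed to another system, hiding ownership in this way plausibly targets the same mechanism.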

Understanding how confidence forms and shifts inside LLMs will be necessary for building systems that behave predictably under pressure, especially in complex or high-stakes environments.


Notes: This post was edited/created using GenAI tools. Image: DIW-Aigen.
