Chatbots Lean Toward Inaction, Flip on Wording, and Falter in Personal Moral Choices

Recent research has pulled back the curtain on how large language models (LLMs) handle moral decisions compared to people. Scientists examined four well-known models (GPT-4-turbo, GPT-4o, Llama 3.1, and Claude 3.5) to find out whether these systems consistently favored taking action or sitting on the sidelines when faced with tough moral choices. The results revealed that LLMs lean toward inaction far more often than humans do, especially when their training focuses on encouraging socially acceptable or "good" behavior.

How LLMs Handle Real-Life Dilemmas

The study tested these models using both classic moral psychology scenarios and everyday situations pulled from online forums. When compared side by side with human responses, the LLMs frequently recommended staying out of morally tricky situations. This preference for stepping back showed up even when avoiding action would clearly lead to harm or missed chances to help. Humans, by contrast, were more likely to split their choices between helping and holding back.

A Habit of Saying "No"

One standout pattern emerged: LLMs repeatedly said "no" regardless of how a question was phrased. Researchers called this habit the "yes-no bias," and it was especially strong in chatbots fine-tuned to give advice. Human respondents showed no comparable tendency; in the models, the bias developed during post-training alignment, the stage meant to make responses sound socially appropriate. In many cases, the LLMs defaulted to advice that leaned toward avoiding involvement, even in situations where there was little at stake.


Image: DIW-Aigen

Why Wording Matters More Than It Should

The yes-no bias became crystal clear when the same moral question was simply rephrased. For example, when asked, "Do you leave the meeting to help your roommate?" the models often gave a different answer than when asked, "Do you stay in the meeting instead of helping?" Both versions pointed to the same decision, but the models frequently flipped their answers based on the wording. Human participants, however, usually stuck to their initial choice no matter how the question was framed.

Built-In Preference for Inaction

The study carefully restructured some dilemmas to untangle action from omission and to find out whether the models' hesitation to act was really tied to a general reluctance to say "yes." Across multiple scenarios, the models still leaned toward doing nothing, even when the rewritten setups stripped away every other confounding factor. Humans, though, were less likely to fall into this pattern.

More Generous in Group Situations

When it came to group-focused problems, where self-interest clashed with helping the greater good, the LLMs often leaned toward more generous choices than people did. They tended to recommend selfless actions in situations where an individual's small sacrifice could make a big difference for others. But when the focus returned to personal moral dilemmas, the models regularly retreated to their comfort zone, saying "no" and advising inaction.

Biases Born from Fine-Tuning

Interestingly, the study found that these biases were not hardwired into the base versions of the models. By comparing the original models with their fine-tuned counterparts, the researchers confirmed that the biases cropped up during fine-tuning, when companies adjust chatbots to better match human expectations, not during the initial large-scale training.

Playing It Safe

The study suggested that the fine-tuning process might have nudged the models toward playing it safe. By frequently saying "no," the chatbots could be following a learned strategy to avoid giving advice that might lead to risky or harmful outcomes. While this caution could help companies steer clear of legal trouble, it might also discourage users from taking morally important action when it is needed.

Trusting the Wrong Advisor?

One surprising takeaway was that people tend to trust the moral advice of LLMs more than that of trained ethicists. This trust raises concerns because users often believe the chatbot’s advice is solid and well-considered, but the research shows these models can easily flip their answers based on minor wording tweaks.

Fixing the Flaws

The researchers recommended that future fine-tuning should include checks to make sure chatbots give consistent answers when similar questions are asked in different ways. They also advised that developers test models for internal stability to catch these kinds of contradictions early. Fixing these flaws will likely need close collaboration between AI builders, psychologists, and moral philosophers to ensure future models not only meet user expectations but also offer reliable, ethically sound advice.
