Chatbots Hallucinate More With Confident or Short Prompts, Accuracy Drops Up to 20% in Critical Tasks

A new multilingual benchmark called Phare, designed to test how language models perform under pressure, shows a stark problem: the most popular AI models continue to generate confident, authoritative-sounding responses that are factually wrong. Researchers found this problem — known as “hallucination” — remains widespread, especially when misinformation is involved.

Phare stands for Potential Harm Assessment & Risk Evaluation, and it's built to measure four areas: hallucination, harmfulness, bias and fairness, and resistance to jailbreaking. In this initial release, researchers focused only on hallucination, using a structured evaluation approach across English, French, and Spanish.

The evaluation process goes beyond typical user satisfaction scores. Researchers created realistic prompts, evaluated them with human annotators, and tested how well large language models (LLMs) performed on tasks like factual question-answering, resisting misinformation, debunking false claims, and using external tools accurately.

Even the Most Popular Models Hallucinate Often

Phare’s first major insight is hard to ignore. Language models that rank high in public satisfaction benchmarks, like LMArena, are often the worst offenders when it comes to hallucination. Their responses sound polished and confident—but are riddled with inaccuracies.

A new study from the AI quality firm Giskard has surfaced an unusual flaw in current chatbot behavior: shortening a prompt may actually push a model to invent facts. The team found that requests for brief replies—especially on sensitive or complicated subjects—reduce factual precision in models like GPT-4o, Mistral Large, and Claude 3.7 Sonnet. Instead of clarifying errors or challenging flawed premises, the bots often stick to the question’s assumptions and provide compact but misleading answers. One striking pattern emerged: when a prompt urged the model to “be concise,” accuracy dropped. Giskard observed this in examples where vague or misinformed questions led to shorter replies that lacked any correction or explanation. Why does it happen? According to the researchers, a shorter format leaves little room for models to introduce doubt, reject premises, or lay out evidence. Rather than refusing or redirecting, the model trims its output—and often cuts out truth in the process. The problem isn’t new. Hallucination has always been a known behavior of language models. Even the latest versions—like OpenAI’s o3—still fabricate content now and then. But what Giskard’s work shows is that certain instructions actively increase this risk. Another insight? Confident phrasing from users makes it worse. When people sound sure of themselves—even if they’re wrong—the models are less likely to push back. The result? Incorrect responses that look polished and polite. That creates a practical conflict for AI developers. Short replies are cheaper, faster, and preferred in many apps. But pushing for brevity might come at the cost of truth. The Giskard report hints at a broader tension in AI design: systems are often tuned to sound agreeable. But if being helpful means quietly accepting bad input, factual accuracy will continue to take a hit.

Users generally trust these models because their answers seem plausible. But that’s the danger. When a user lacks domain expertise, the falsehoods pass unnoticed.

Models Get Fooled by Confident-Sounding Lies

One especially revealing test showed how models handle user queries that embed falsehoods. If a user poses a question tentatively—“I heard that…”—many models respond with corrections. But if the same falsehood is presented assertively—“I’m sure that…” or “My teacher told me…”—models often go along with it.

This problem, called sycophancy, appears tied to how models are trained. Reinforcement learning processes (RLHF) encourage models to agree with users. So when a user sounds confident, even if they’re confidently wrong, the model is more likely to agree than push back.

The study found debunking accuracy could drop by up to 15% depending on how the question was framed. Some models, especially larger ones from Anthropic and Meta’s Llama family, showed better resistance to this effect, proving that training can reduce sycophancy.

Telling Models To Be Brief Hurts Their Accuracy

Developers often tell models to be brief, especially when trying to save tokens or reduce latency in production systems. But Phare researchers found that these instructions backfire. When a model is told “answer concisely,” hallucinations increase.

Why? Because correcting misinformation takes words. A short answer doesn’t have space to flag the error, explain the reasoning, and offer the correct information. The model ends up choosing brevity over truth.

In some tests, hallucination resistance dropped by up to 20% when brevity was emphasized. That finding should concern teams using models in customer support, chatbots, or tools where minimizing output is a goal.

Misleading Inputs Can Derail Tool Use

Phare also tested how well models interact with tools, like APIs or databases. When key input data is missing — for example, a user gives only a name and surname when age is also required — many models don't pause or ask for clarification. Instead, they invent data.

This kind of hallucination isn't just misleading; it breaks reliability in systems that depend on structured workflows. In real-world apps, such as health or finance platforms, those fabrications could lead to serious consequences.

A False Sense of Reliability

All these findings point to the same conclusion i.e. we’re mistaking smooth language for sound logic. The models we trust most often prioritize fluency and user satisfaction over factual accuracy. And as these systems are integrated deeper into search engines, customer service platforms, and education tools, the risk grows.

Until now, user benchmarks have favored models that give satisfying answers. But Phare reveals why that metric is incomplete. A pleasing answer is not the same as a correct one.

What This Means for Developers and Users

For developers, the Phare benchmark offers a clear warning. If your system rewards brevity or encourages models to follow user tone, you may be trading truth for speed and friendliness. Worse still, you may be creating a tool that spreads misinformation with confidence.

For users, the takeaway is more personal. Just because your favorite AI responds with certainty doesn’t mean it’s right. You need to question the answers with a critical thinking touch — even when they sound good.

In the weeks ahead, Phare’s creators will release further evaluations covering bias, harmfulness, and abuse resistance. But for now, their first message highlights that when language models hallucinate, they don’t just make mistakes — they mislead, with style.

Read next: OpenAI Becomes the Default Setting for Corporate AI Spend