Phare stands for Potential Harm Assessment & Risk Evaluation, and it's built to measure four areas: hallucination, harmfulness, bias and fairness, and resistance to jailbreaking. In this initial release, researchers focused only on hallucination, using a structured evaluation approach across English, French, and Spanish.
The evaluation process goes beyond typical user satisfaction scores. Researchers created realistic prompts, evaluated them with human annotators, and tested how well large language models (LLMs) performed on tasks like factual question-answering, resisting misinformation, debunking false claims, and using external tools accurately.
Even the Most Popular Models Hallucinate Often
Phare’s first major insight is hard to ignore. Language models that rank high in public satisfaction benchmarks, like LMArena, are often the worst offenders when it comes to hallucination. Their responses sound polished and confident—but are riddled with inaccuracies.
Users generally trust these models because their answers seem plausible. But that’s the danger. When a user lacks domain expertise, the falsehoods pass unnoticed.
Models Get Fooled by Confident-Sounding Lies
One especially revealing test showed how models handle user queries that embed falsehoods. If a user poses a question tentatively—“I heard that…”—many models respond with corrections. But if the same falsehood is presented assertively—“I’m sure that…” or “My teacher told me…”—models often go along with it.
This problem, called sycophancy, appears tied to how models are trained. Reinforcement learning processes (RLHF) encourage models to agree with users. So when a user sounds confident, even if they’re confidently wrong, the model is more likely to agree than push back.
The study found debunking accuracy could drop by up to 15% depending on how the question was framed. Some models, especially larger ones from Anthropic and Meta’s Llama family, showed better resistance to this effect, proving that training can reduce sycophancy.
Telling Models To Be Brief Hurts Their Accuracy
Developers often tell models to be brief, especially when trying to save tokens or reduce latency in production systems. But Phare researchers found that these instructions backfire. When a model is told “answer concisely,” hallucinations increase.
Why? Because correcting misinformation takes words. A short answer doesn’t have space to flag the error, explain the reasoning, and offer the correct information. The model ends up choosing brevity over truth.
In some tests, hallucination resistance dropped by up to 20% when brevity was emphasized. That finding should concern teams using models in customer support, chatbots, or tools where minimizing output is a goal.
Misleading Inputs Can Derail Tool Use
Phare also tested how well models interact with tools, like APIs or databases. When key input data is missing — for example, a user gives only a name and surname when age is also required — many models don't pause or ask for clarification. Instead, they invent data.
This kind of hallucination isn't just misleading; it breaks reliability in systems that depend on structured workflows. In real-world apps, such as health or finance platforms, those fabrications could lead to serious consequences.
A False Sense of Reliability
All these findings point to the same conclusion i.e. we’re mistaking smooth language for sound logic. The models we trust most often prioritize fluency and user satisfaction over factual accuracy. And as these systems are integrated deeper into search engines, customer service platforms, and education tools, the risk grows.
Until now, user benchmarks have favored models that give satisfying answers. But Phare reveals why that metric is incomplete. A pleasing answer is not the same as a correct one.
What This Means for Developers and Users
For developers, the Phare benchmark offers a clear warning. If your system rewards brevity or encourages models to follow user tone, you may be trading truth for speed and friendliness. Worse still, you may be creating a tool that spreads misinformation with confidence.
For users, the takeaway is more personal. Just because your favorite AI responds with certainty doesn’t mean it’s right. You need to question the answers with a critical thinking touch — even when they sound good.
In the weeks ahead, Phare’s creators will release further evaluations covering bias, harmfulness, and abuse resistance. But for now, their first message highlights that when language models hallucinate, they don’t just make mistakes — they mislead, with style.
Read next: OpenAI Becomes the Default Setting for Corporate AI Spend