Study Uncovers Gender-Based Disparities in Career Advice from Popular AI Models

Artificial intelligence tools designed to assist with career advice are quietly showing signs of gender bias, a new study finds. Popular chatbots and language models were tested with identical user profiles that differed only in gender, and the results suggest women are being nudged toward lower salary expectations and less assertive career-planning choices.

Salary Suggestions Shift Based on Gender Alone

The researchers tested five widely used large language models (LLMs), including versions of ChatGPT, Claude, Qwen, Llama, and Mixtral. Each was asked to recommend a starting salary for a user preparing for a job interview. The user’s profile remained the same across all scenarios (same job title, location, education, and experience) except for a single detail: gender.
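The study’s exact prompts are not reproduced here, but the paired-prompt idea can be sketched roughly as follows. The template wording, field names, and the build_prompts helper below are illustrative assumptions rather than the researchers’ own code; the only point carried over from the study is that the two prompts differ in nothing but the stated gender.

```python
# Minimal sketch of a paired-prompt setup: the profile text is identical
# except for the gender field, so any difference in the model's answer
# can be attributed to that single change.
# (Template wording and helper names are hypothetical, not from the study.)
PROFILE_TEMPLATE = (
    "I am a {gender} {job_title} with {years} years of experience, "
    "based in {city}, holding a {degree}. "
    "I am preparing for a job interview. "
    "What starting salary should I ask for? Reply with a single number in USD."
)

def build_prompts(job_title: str, years: int, city: str, degree: str) -> dict:
    """Return two prompts that differ only in the stated gender."""
    return {
        gender: PROFILE_TEMPLATE.format(
            gender=gender, job_title=job_title, years=years,
            city=city, degree=degree,
        )
        for gender in ("male", "female")
    }

if __name__ == "__main__":
    prompts = build_prompts("senior physician", 10, "Denver", "medical degree")
    for gender, prompt in prompts.items():
        print(f"--- {gender} ---\n{prompt}\n")
```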


Even with all other inputs held constant, women were consistently advised to aim lower. In some cases, the suggested salaries diverged by tens of thousands of dollars. For example, in senior medical roles, Claude 3.5 recommended that men ask for $150,000, while women were told to request $100,000. Similar disparities appeared in engineering and law.

Out of 400 gender-based comparisons run by the researchers, over 27% showed statistically significant differences in pay advice. These were not isolated glitches. The patterns were strong enough to suggest that the bias is baked into the models themselves, shaped by the training data they were built on.

Career Advice Also Varies by Gender

Beyond salary suggestions, the models were asked to offer tips on workplace behavior, goal-setting, and role expectations. The tone and content of the responses shifted depending on whether the user was labeled as male or female.


Women were often encouraged to be more cautious, more agreeable, and less aggressive in negotiation scenarios. Male users, by contrast, received advice that leaned toward assertiveness and confidence. Even subtle cues, such as the level of encouragement offered for career advancement, differed. These differences raise red flags, especially as AI assistants become default tools for job seekers.

Bias Runs Deeper in Economic Contexts

The study didn’t just explore surface stereotypes. It used three different types of experiments to test for hidden bias. When the researchers ran standard knowledge quizzes, results for men and women were largely consistent. But once financial decisions were introduced, such as salary negotiation, the gaps widened quickly.

This pattern repeated across other identities, too. Refugee applicants, for example, were offered lower salary suggestions than expatriates, and people labeled as Hispanic or Black received smaller salary ranges than Asian or White applicants with the same profile. In fact, when the study paired the personae that drew the lowest and highest salary suggestions (“female Hispanic refugee” vs. “male Asian expatriate”), the models showed bias in 87.5% of the scenarios tested.

These figures weren’t flukes. They came from controlled, repeatable prompts: each combination was run 30 times, and the results were averaged to smooth out randomness and surface consistent patterns in the models’ output.
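The study’s exact statistical procedure isn’t spelled out in this summary, so the sketch below uses a Welch t-test from SciPy as an illustrative stand-in for how repeated runs can be compared. The query_model callable, the run_comparison helper, and the 0.05 threshold are assumptions for demonstration; only the 30-run repetition per persona is taken from the description above.

```python
# Minimal sketch of the repeat-and-compare step: each persona pair is queried
# many times, the numeric salary suggestions are collected, and a two-sample
# test flags pairs whose average advice differs beyond what chance would explain.
# `query_model` is a placeholder for whatever API call returns a salary figure;
# the Welch t-test and alpha=0.05 are illustrative choices, not the study's.
from statistics import mean
from scipy.stats import ttest_ind

N_RUNS = 30  # number of repetitions per persona, as described above

def run_comparison(query_model, prompt_a: str, prompt_b: str, alpha: float = 0.05):
    salaries_a = [query_model(prompt_a) for _ in range(N_RUNS)]
    salaries_b = [query_model(prompt_b) for _ in range(N_RUNS)]
    t_stat, p_value = ttest_ind(salaries_a, salaries_b, equal_var=False)
    return {
        "mean_a": mean(salaries_a),
        "mean_b": mean(salaries_b),
        "p_value": p_value,
        "significant": p_value < alpha,
    }
```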

Model Memory May Amplify Inequality

The researchers also pointed to a growing concern tied to memory-based AI. With newer LLMs increasingly retaining user history to personalize responses, biases may not require explicit input. If a chatbot remembers a user’s gender or background from previous interactions, it may adjust future advice automatically, reinforcing disparities without users even noticing.

That personalization, once pitched as a benefit, could turn into a structural flaw. Instead of offering objective support, AI systems might quietly tilt the playing field, especially in high-stakes areas like salary talks or job applications.

Technical Fixes Alone Won’t Solve It

While some improvements can come from tweaking prompts or filtering training data, the authors of the study say the solution needs to go deeper. They call for clear ethical standards, independent audits, and transparency in how these models are trained and evaluated. One-off technical patches, they argue, won’t be enough to stop systemic issues from surfacing again.

Debiasing techniques must be paired with policy. If models are being used to guide life decisions, like what salary to ask for or how to approach a job interview, they need to be held to higher standards. This is particularly important as LLMs become embedded in professional and educational tools.

A Quiet Signal with Loud Consequences

The report makes one thing clear: gender bias in AI doesn’t always shout; it often whispers. It shows up in subtle nudges, conservative estimates, and toned-down advice. But those quiet differences, if repeated over time, could shape careers, influence confidence, and widen existing gaps.

As people rely more on AI to navigate complex decisions, the advice these systems give will increasingly shape human behavior. And when two people with the same skills are told to ask for different salaries, that’s not just a data problem; it’s a fairness problem.

Notes: This post was edited/created using GenAI tools. Image: DIW-Aigen.

