A University of Queensland study has shown that Large Language Models (LLMs) used in AI content moderation may be prone to subtle biases that undermine their neutrality.
A team led by data scientist Professor Gianluca Demartini from UQ’s School of Electrical Engineering and Computer Science used persona prompting to test the tendency of AI chatbots to encode and reproduce political biases, and found significant behavioural shifts.
The research team asked six LLMs – including vision models – to moderate thousands of examples of hateful text and memes through the lens of different ideologically diverse AI personas.
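Persona prompting of this kind typically works by prepending a synthetic identity description to the moderation instruction before it is sent to the model. The sketch below is illustrative only: the persona wording, prompt template, and function name are assumptions, not the study's actual protocol.

```python
def build_moderation_prompt(persona: str, content: str) -> str:
    """Prepend a persona description to a hate-speech moderation instruction.

    Illustrative only: the prompts used in the UQ study may differ.
    """
    return (
        f"Adopt the following persona: {persona}\n\n"
        "Acting as this persona, decide whether the content below is hateful.\n"
        "Answer with exactly one word: HATEFUL or NOT_HATEFUL.\n\n"
        f"Content: {content}"
    )

# Hypothetical personas spanning different ideological positions
left_persona = "a progressive activist who campaigns for social justice"
right_persona = "a conservative commentator who values tradition and free markets"

prompt = build_moderation_prompt(left_persona, "example post text")
```

The same content string is sent once per persona, so any divergence in the verdicts can be attributed to the persona conditioning rather than the content.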
Professor Demartini said the exercise revealed that AI political personas, even without significantly altering overall accuracy, were prone to introducing consistent ideological biases and divergences in chatbot content moderation judgments.
“It has already been established that persona conditioning can shift the political stance expressed by LLMs,” Professor Demartini said.
“Now we have shown through political personas that there is an underlying risk that LLMs will lean towards certain perspectives when identifying and responding to hateful and harmful comments.”
“It demonstrates a need to rigorously examine the ideological robustness of AI systems used in tasks where even subtle biases can affect fairness, inclusivity and public trust.”
The AI personas used in the study were from a database of 200,000 synthetic identities ranging from schoolteachers to musicians, sports stars and political activists.
Each persona was put through a political compass test to determine its ideological positioning, and the 400 personas with the most 'extreme' positions were then asked to identify hateful online content.
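One plausible way to select the most ideologically 'extreme' personas is to score each on the two political-compass axes (economic and social) and rank by distance from the centre. This is a sketch under that assumption, with invented scores; the study's actual selection criterion may differ.

```python
import math

# Hypothetical (economic, social) compass scores for a few synthetic personas
personas = {
    "schoolteacher": (-1.5, -0.5),
    "musician": (-4.0, -6.0),
    "sports star": (2.0, 1.0),
    "political activist": (-8.5, -7.0),
}

def extremity(score):
    """Euclidean distance from the compass centre (0, 0)."""
    econ, social = score
    return math.hypot(econ, social)

# Keep the k personas furthest from the centre (the study kept 400 of 200,000)
k = 2
most_extreme = sorted(personas, key=lambda p: extremity(personas[p]), reverse=True)[:k]
```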
Professor Demartini said his team found that assigning a persona to an LLM chatbot altered its precision and recall in line with its ideological leanings, rather than changing the overall accuracy of hate speech detection.
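The distinction matters because two moderators can have identical overall accuracy while trading precision against recall in opposite directions. A small worked example with invented confusion-matrix counts:

```python
def metrics(tp, fp, fn, tn):
    """Compute precision, recall, and accuracy from a confusion matrix."""
    total = tp + fp + fn + tn
    return {
        "precision": tp / (tp + fp),   # flagged posts that were truly hateful
        "recall": tp / (tp + fn),      # hateful posts that were caught
        "accuracy": (tp + tn) / total,  # all correct decisions
    }

# Persona A over-flags: catches more hate (high recall) but more false alarms
persona_a = metrics(tp=45, fp=15, fn=5, tn=35)
# Persona B under-flags: fewer false alarms (high precision) but misses more
persona_b = metrics(tp=35, fp=5, fn=15, tn=45)
```

Both personas score 80% accuracy, yet one systematically over-removes and the other under-removes, which is exactly the kind of shift an accuracy-only evaluation would miss.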
However, the team found LLMs – especially larger models – exhibited strong ideological cohesion and alignment between personas from the same ideological ‘region’.
Professor Demartini said this suggested larger AI models tend to internalise ideological framings, as opposed to smoothing them out or ‘neutralising’ them.
“As LLMs become more capable at persona adoption, they also encode ideological ‘in-groups’ more distinctly,” Professor Demartini said.
“On politically targeted tasks like hate speech detection this manifested as partisan bias, with LLMs judging criticism directed at their ideological in-group more harshly than content aimed at their opponents.”
Professor Demartini said larger LLMs also displayed more complex patterns, including a tendency towards defensive bias.
“Left personas showed heightened sensitivity to anti-left hate, and right-wing personas were more sensitive to anti-right hate speech,” Professor Demartini said.
“This suggests that ideological alignment not only shifts detection thresholds globally, but also conditions the model to prioritise protection of its ‘in-group’ while downplaying harmfulness directed at opposing groups.”
Researchers said the project highlighted how crucial it is for high-stakes content moderation tasks to be overseen by neutral arbiters, so that fairness and public trust are maintained and the health and wellbeing of vulnerable demographics are protected.
“People interact with AI programs trusting and believing they are completely neutral,” Professor Demartini said.
“But concerns remain about their tendency to encode and reproduce political biases, raising important questions about AI ethics and deployment.
“In content moderation the outputs of these models reflect embedded ideological biases that can disproportionately affect certain groups, potentially leading to unfair treatment of billions of users.”
PhD candidates Stefano Civelli, Pietro Bernadelle and research assistant Nardiena Pratama collaborated on the study.
The research is published in Transactions on Intelligent Systems and Technology.
This article is republished from The University of Queensland under a Creative Commons license.
