AI Debiasing Revolution: Brock University Unveils a Game-Changer

Researchers at Brock University have developed a new way to test whether techniques for removing bias from AI language models, such as ChatGPT, work reliably, and whether the models can tell the difference between appropriate and inappropriate content they generate.

A study led by Brock University student Robert Morabito and Assistant Professor Ali Emami, along with Jad Kabbara of MIT, was published in Findings of the Association for Computational Linguistics: ACL 2023. The researchers examined how to ensure that AI systems do not produce content that is harmful or inappropriate toward marginalized groups.

This research matters because AI systems, which people increasingly use to search for information online, can exhibit biases related to characteristics such as race, gender, or age.

One common method for reducing this bias is Self-Debiasing, which steers a model away from toxic or inappropriate language as it generates text. However, the research team found that Self-Debiasing does not respond properly when its instructions change. For example, if it is first instructed to be polite and then instructed to be rude, its language should change accordingly, but it does not. This makes the method unreliable.
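For readers curious about the mechanism: in the paper that introduced Self-Debiasing (Schick et al., 2021), the model scores possible next words twice, once for the ordinary prompt and once with a prefix describing the unwanted behavior (for example, that the text is rude or toxic). Words that become more likely under that prefix are then made less likely in the actual output. The Python sketch below shows only this reweighting step on made-up probabilities; the function name, the toy words and numbers, and the decay value are illustrative assumptions, and a real implementation applies the step across a model's whole vocabulary at every generation step.

```python
import math


def self_debias(p_plain, p_primed, decay=50.0):
    """Shift next-word probabilities away from words a "be toxic" prefix makes likelier.

    p_plain:  word -> probability given the ordinary prompt
    p_primed: word -> probability given the same prompt with a prefix
              describing the unwanted behavior (e.g. rude or toxic text)
    decay:    how strongly to suppress such words (illustrative value)
    """
    rescaled = {}
    for word, p in p_plain.items():
        delta = p - p_primed[word]
        # Words the toxic prefix makes *less* likely keep their full weight;
        # words it makes *more* likely are shrunk exponentially.
        alpha = 1.0 if delta >= 0 else math.exp(decay * delta)
        rescaled[word] = alpha * p
    total = sum(rescaled.values())
    # Renormalize so the probabilities sum to 1 again.
    return {word: r / total for word, r in rescaled.items()}


# Toy numbers, not from the study: "idiot" becomes much likelier when the
# model is primed to be rude, so its probability is pushed toward zero.
plain = {"friend": 0.5, "idiot": 0.3, "neighbor": 0.2}
primed = {"friend": 0.2, "idiot": 0.7, "neighbor": 0.1}
debiased = self_debias(plain, primed)
```

The consistency question the Brock team raises can be phrased in these terms: if the prefix described polite text instead, the suppression should fall on polite words, and the output should shift the other way.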

To address this, the team created a checklist of three tests for evaluating debiasing methods. The first test checks whether the AI system responds when its instructions are reversed, for example from polite to rude; if its language does not change, it fails. The second test checks whether the AI follows the specific content of its instructions and adjusts its language accordingly; if it does not, it fails this test too. The final test checks whether these results still hold when the AI is given ordinary, everyday prompts.
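The checklist's pass/fail logic can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the study's code: it assumes each setup has already been run and its outputs scored for toxicity on a 0 to 1 scale, it reads the second test as "an unrelated instruction should not work as well as the real one", and the field names, the 0.05 tolerance and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ToxicityScores:
    """Average toxicity (0 = harmless, 1 = toxic) of outputs on one prompt set.

    All fields are hypothetical measurements for illustration.
    """
    baseline: float    # no debiasing applied
    polite: float      # method instructed to aim for polite text
    rude: float        # same method with the instruction reversed
    unrelated: float   # instruction swapped for unrelated text (assumed reading of test 2)


def passes_reversal(s: ToxicityScores) -> bool:
    # Test 1: "be polite" should lower toxicity and "be rude" should raise it.
    return s.polite < s.baseline < s.rude


def passes_specificity(s: ToxicityScores, tolerance: float = 0.05) -> bool:
    # Test 2: the instruction's content should matter, so an unrelated
    # instruction should leave noticeably more toxicity than the real one.
    return s.unrelated - s.polite > tolerance


def run_checklist(test_prompts: ToxicityScores, everyday: ToxicityScores) -> dict:
    # Tests 1 and 2 use the main test prompts; test 3 repeats them on everyday prompts.
    return {
        "reversal": passes_reversal(test_prompts),
        "specificity": passes_specificity(test_prompts),
        "everyday": passes_reversal(everyday) and passes_specificity(everyday),
    }


# A method that lowers toxicity whatever it is told fails every test.
ignores_instructions = ToxicityScores(baseline=0.5, polite=0.2, rude=0.21, unrelated=0.22)
# A method that follows its instructions passes all three.
follows_instructions = ToxicityScores(baseline=0.5, polite=0.2, rude=0.8, unrelated=0.45)

result_ignores = run_checklist(ignores_instructions, ignores_instructions)
result_follows = run_checklist(follows_instructions, follows_instructions)
```

The point of the design is that a low toxicity score alone is not enough: the first example achieves the same "polite" score as the second, yet fails because its behavior does not track what it was told.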

The team also proposed a new method called Instructive Debiasing, which pairs a prompt with a plain-language instruction telling the AI how to behave when responding to it, such as being polite. The method is simple to use, and the researchers compared it with Self-Debiasing using their checklist.
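Because Instructive Debiasing works through the prompt itself, its core step is easy to sketch. The Python below is a hypothetical illustration: it assumes the instruction is simply placed in front of the user's prompt, and the instruction wordings and helper names are made up rather than taken from the study. It also builds the reversed and unrelated variants that a checklist like the team's would compare.

```python
# Hypothetical instruction wordings; the study's exact phrasing is not reproduced here.
INSTRUCTIONS = {
    "polite": "Be polite, respectful and kind in your response to the following:",
    "rude": "Be rude, disrespectful and offensive in your response to the following:",
    "unrelated": "The following text was written on a sunny day:",
}


def instructive_prompt(prompt: str, style: str = "polite") -> str:
    """Place a plain-language behavior instruction in front of the prompt."""
    return f"{INSTRUCTIONS[style]}\n{prompt}"


def checklist_variants(prompt: str) -> dict:
    """Build one version of the prompt per instruction for side-by-side comparison."""
    return {style: instructive_prompt(prompt, style) for style in INSTRUCTIONS}


variants = checklist_variants("Tell me about my new neighbors.")
```

Keeping the instruction separate from the prompt makes it trivial to reverse or replace, which is exactly what the first two checklist tests require.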

The researchers hope their work will inspire further research and become a standard for evaluating debiasing methods, helping to make AI models fairer and more reliable.

