Research By Top Minds Shows Flaws In Hate Speech Detection By AI

Researchers from a range of institutes and universities are coming together to develop software that's more sophisticated at identifying hate speech online.

One would think that with the rise of PC (political correctness) culture, people would be more careful about their online antics. The disappointing answer is: not at all. In fact, current online culture shows intense friction when it comes to displaying even a smidge of empathy. White supremacists have ransacked every online platform despite the best efforts of moderators and developers, and Twitter in particular has had a huge problem with neo-Nazis constantly spamming the service. Former US president Donald Trump got kicked off of every available online platform for spreading hateful rhetoric and actively supporting the recent Capitol invasion.

To top it off, the algorithms developed by social media companies are having a rather hard time tracking hate speech online. Hateful rhetoric doesn't confine itself to any particular set of words and phrases, and the nuance and fluidity of human language isn't something machine learning can fully comprehend. There's also the problem of issues constantly evolving and changing, with new ones gaining relevance.

The bright minds at Oxford, Utrecht, Sheffield, and the Alan Turing Institute (named after a man who was himself subjected to bigotry) have taken up the task with pride. Putting their heads together, they've developed a new benchmark for hate speech detection called HateCheck. Built by reviewing research on the topic and conversing with NGOs from across the world, the benchmark has been used to highlight flaws in current detection algorithms.

To quickly sum up its abilities, HateCheck encompasses 29 modes that rigorously test algorithms to see where their detection breaks down. 18 of the modes cover hate speech aimed at distinct groups (women, minorities, and the like), and the remaining 11 test the distinction between hate speech and normal speech that sounds similar ("I love *blank*" vs "I hate *blank*", for example). Initial experiments tested two models developed under the neural network DistilBERT (labelled DB-D and DB-F), and the Perspective model developed by Google and Jigsaw, sister companies under their parent Alphabet.
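To make the template idea concrete, here is a minimal Python sketch of how such test cases might be generated and scored. The templates, target groups, and the classify() stub are illustrative placeholders, not HateCheck's actual test cases or any model's real API.

```python
# Minimal sketch of HateCheck-style template testing (illustrative only).
from itertools import product

# Each template pairs a sentence pattern with the label a well-behaved
# classifier should return for it.
TEMPLATES = [
    ("I love {group}.", "non-hate"),
    ("I hate {group}.", "hate"),
    ("{group} are wonderful people.", "non-hate"),
]

GROUPS = ["women", "immigrants", "disabled people"]  # placeholder target groups


def classify(text: str) -> str:
    """Stand-in for a real hate speech model; always predicts 'non-hate'."""
    return "non-hate"


def run_benchmark() -> float:
    """Expand every template for every group and score the classifier."""
    cases = [(tpl.format(group=g), label)
             for (tpl, label), g in product(TEMPLATES, GROUPS)]
    correct = sum(classify(text) == label for text, label in cases)
    return correct / len(cases)


if __name__ == "__main__":
    print(f"accuracy: {run_benchmark():.1%}")
```

Because the cases are generated from templates with known labels, any failure can be traced back to a specific mode, such as hate directed at one group or a benign sentence that merely resembles hate speech.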

Some glaring flaws were noted across the board. The DB-D/F models showed bias and inaccuracy in dealing with certain groups (one scored 39.9% accuracy on women-related discourse but only 25.4% on content targeting disabled individuals), while Perspective was confused by spelling variations, such as extra spaces or misspelled phrases.
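As a rough illustration of how per-group figures like these can be produced, the sketch below scores a stand-in classifier separately for each target group and adds a crude spelling perturbation of the kind that tripped up Perspective. The cases, groups, and classify() stub are hypothetical, not the study's actual data or models.

```python
# Hypothetical breakdown of accuracy by target group, plus a simple
# spelling perturbation to mimic the "extra spaces" failure mode.
from collections import defaultdict


def classify(text: str) -> str:
    """Naive stand-in model: flags text only if the word 'hate' appears intact."""
    return "hate" if "hate" in text.lower() else "non-hate"


def add_spacing_noise(text: str) -> str:
    """Crude spelling perturbation: insert a space inside each longer word."""
    return " ".join(w[:1] + " " + w[1:] if len(w) > 2 else w for w in text.split())


def accuracy_by_group(cases):
    """cases: iterable of (text, expected_label, target_group) tuples."""
    totals, hits = defaultdict(int), defaultdict(int)
    for text, label, group in cases:
        totals[group] += 1
        hits[group] += classify(text) == label
    return {group: hits[group] / totals[group] for group in totals}


if __name__ == "__main__":
    cases = [
        ("I hate women.", "hate", "women"),
        ("Women are great.", "non-hate", "women"),
        ("I hate disabled people.", "hate", "disabled people"),
        (add_spacing_noise("I hate disabled people."), "hate", "disabled people"),
    ]
    for group, acc in accuracy_by_group(cases).items():
        print(f"{group}: {acc:.1%}")
```

In this toy run, the perturbed sentence slips past the stand-in model, which is exactly the sort of gap a per-group, per-mode breakdown is meant to expose.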

The researchers ended the study with a few pointers. First, they suggest that AI models be trained on larger and more expansive datasets, rather than just feeding off old knowledge and reported comments. Secondly, developers were advised to be wary of flagging reclaimed phrases, such as the use of the N word within black communities, as such action only further alienates and punishes those groups. One way to watch for that is sketched below.
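A hedged way to act on that second pointer is to track how often a model flags test cases that are known to be non-hateful, broken down by category (reclaimed language, counter-speech, and so on). The sketch below is purely illustrative; the categories, placeholder text, and classify() stub are assumptions rather than the study's data or method.

```python
# Illustrative false-positive tracking for non-hateful test categories.
from collections import defaultdict


def classify(text: str) -> str:
    """Stand-in for a real model; replace with an actual classifier."""
    return "non-hate"


def false_positive_rate_by_category(cases):
    """cases: (text, category) tuples where every case is genuinely non-hateful."""
    totals, flagged = defaultdict(int), defaultdict(int)
    for text, category in cases:
        totals[category] += 1
        flagged[category] += classify(text) == "hate"
    return {category: flagged[category] / totals[category] for category in totals}


if __name__ == "__main__":
    cases = [
        ("[reclaimed in-group phrase]", "reclaimed"),
        ("Saying you hate a group is wrong.", "counter-speech"),
    ]
    print(false_positive_rate_by_category(cases))
```

A rising false-positive rate on the reclaimed or counter-speech slice would signal that the model is punishing the very communities it is supposed to protect.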


Read next: Researchers believe that curators show bias while training AI models for algorithm-generated art