New Study Says Understanding The Background Of Annotators Who Label Data Is Crucial For Building Reliable AI Models

A growing number of social media firms are relying on increasingly complex algorithms and AI to detect offensive behavior among users on the internet.

But both kinds of technology rely heavily on data to determine what exactly counts as offensive. The question is: who is behind such datasets, and do their backgrounds actually affect their decisions?

A new study covers just that topic, and it shows that knowing more about the people recording the data is critically important and can drastically affect the outcome. According to an assistant professor at the University of Michigan School of Information, it is also a key part of building the most reliable AI models.

The researchers found that the backgrounds of those who label text, online data, and videos play an integral part in the outcome, and it's high time people understood that before it's too late.

Factors such as an annotator's life journey, demographics, and background are crucial, since they shape how data gets labeled. Only by assembling a balanced group of crowd workers can researchers expect to limit bias in their datasets.
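To picture what that balancing could look like in practice, here is a minimal sketch in Python; the file name, column names, and quota are hypothetical stand-ins, not details from the study.

```python
import pandas as pd

# Hypothetical pool of available crowd workers with self-reported demographics.
workers = pd.read_csv("worker_pool.csv")  # assumed columns: worker_id, gender, age_group, race

# Recruit the same number of annotators from each age group so that no single
# demographic dominates the labeling task. The quota of 25 is arbitrary.
PER_GROUP = 25
balanced_pool = (
    workers.groupby("age_group", group_keys=False)
    .apply(lambda g: g.sample(n=min(PER_GROUP, len(g)), random_state=0))
)
print(balanced_pool["age_group"].value_counts())
```

Real recruitment would need quotas across several attributes at once, but the underlying idea of stratified sampling stays the same.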

After analyzing close to 6,000 comments from Reddit, the researchers showed that the beliefs of those recording the data can shape the machine learning models that decide which content on the web gets flagged as appropriate or not.
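The study's full methodology isn't reproduced here, but the heart of such an analysis can be pictured as a simple group-by over the collected ratings. In this sketch, the file and columns are assumed purely for illustration:

```python
import pandas as pd

# Hypothetical file: one row per (comment, annotator) pair, with the
# annotator's self-reported demographics attached to each rating.
ratings = pd.read_csv("reddit_annotations.csv")
# assumed columns: comment_id, rating, gender, age_group, race

# Average offensiveness score per demographic group: if these means differ,
# the "ground truth" a model learns depends on who did the labeling.
for attribute in ["gender", "age_group", "race"]:
    print(ratings.groupby(attribute)["rating"].agg(["mean", "count"]))
```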

Remember, anything regarded as polite by one sector of the population may be deemed less polite by another.

The researchers noted that AI systems all over the globe are trained on this type of data, and that studies like theirs can help underline the importance of knowing who labels it.

For instance, if the data is labeled only by people from one segment of the population, the resulting AI model won't represent the viewpoints of the average person.
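One way to guard against that, offered here as an illustration rather than the paper's own method, is to aggregate labels so every demographic group contributes equally to the final training signal:

```python
import pandas as pd

ratings = pd.read_csv("reddit_annotations.csv")  # same hypothetical columns as above

def balanced_label(comment_rows: pd.DataFrame, by: str = "age_group") -> float:
    """Mean of per-group means, so an over-represented group's votes don't
    drown out everyone else's when producing the final training label."""
    return comment_rows.groupby(by)["rating"].mean().mean()

# One aggregated label per comment, with each demographic group weighted equally.
labels = ratings.groupby("comment_id").apply(balanced_label)
print(labels.head())
```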

The research dug into the details of the differences among annotator identities and how life journeys and experiences can affect their decisions. Previous work tended to look at only a single aspect of identity, such as gender.

Hence, the study's authors said their aim is to help AI models better represent the beliefs and thoughts of the broader population, not just a select few.

The findings of this study showed no major difference between the ratings given by men and women, although previous studies had suggested otherwise. One point worth noting, however, is that non-binary annotators tended to rate content as less offensive than the rest.

Next, annotators above the age of 60 tended to assign higher offensiveness scores than those who were middle-aged.

There were also major differences along racial lines: Black participants rated the same content as slightly more offensive than the rest.

Last but not least, no major differences were found linked to the annotators' level of education.
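This summary doesn't spell out which statistical tests the researchers used, but whether a gap like the one between older and middle-aged annotators is real rather than noise is commonly checked with a non-parametric test such as Mann-Whitney U; the group labels here are hypothetical:

```python
import pandas as pd
from scipy.stats import mannwhitneyu

ratings = pd.read_csv("reddit_annotations.csv")  # assumed columns: rating, age_group

# Compare scores from annotators over 60 with those from middle-aged annotators;
# a small p-value suggests the gap is unlikely to be chance alone.
older = ratings.loc[ratings["age_group"] == "60+", "rating"]
middle = ratings.loc[ratings["age_group"] == "40-59", "rating"]
stat, p_value = mannwhitneyu(older, middle, alternative="two-sided")
print(f"U statistic = {stat:.1f}, p-value = {p_value:.4f}")
```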

Hence, after looking at the results of this study, we can see how important it is to account for such differences. Otherwise, the end product risks marginalizing important groups in society and producing results that aren't a true representation of the population.

AI models are only as good as the data they are trained on. To build reliable models, it is important to consider the biases and perspectives of the people who label that data.

Read next: New Alert Issued As AI Can Now Decode Users’ Keystrokes While Typing Sensitive Data During Calls