Researchers have found that even the best speech recognition systems are biased

Researchers from the University of Amsterdam, the Netherlands Cancer Institute, and the Delft University of Technology have found that even state-of-the-art automatic speech recognition (ASR) systems struggle to identify the accents of people from different regions. According to the researchers, an ASR system for the Dutch language recognized speakers of certain age groups, genders, and even countries of origin much better than others.

Speech recognition has come a very long way since IBM developed and demonstrated the very first speech recognition system, the Shoebox machine, in the early 1960s, widely regarded as the forerunner of today's voice recognition systems, and since Worlds of Wonder launched its famous Julie doll in 1987. Despite all the progress achieved with the help of artificial intelligence, current speech recognition systems are still not up to the mark and are also considered biased. A study commissioned by the Washington Post found that the popular smart speakers made by Google and Amazon were less likely to understand non-American accents than American ones.

Very recently, the Cambridge-based advocacy organization Algorithmic Justice League's Voice Erasure project found that the speech recognition systems of tech giants including Apple, Amazon, Microsoft, IBM, and Google collectively averaged a word error rate of about 35% for African American voices, compared to just 19% for white American voices.
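The word error rate figures cited in these studies come from comparing an ASR transcript against a human reference transcript: the number of word substitutions, insertions, and deletions divided by the length of the reference. A minimal sketch in Python (the function name and example strings are illustrative, not taken from any of the studies):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + insertions + deletions) / reference word count,
    computed as word-level Levenshtein distance via dynamic programming."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i          # delete all remaining reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j          # insert all remaining hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # match/substitution
    return dp[len(ref)][len(hyp)] / len(ref)

# A hypothesis that drops one word out of six gives a WER of 1/6 ≈ 17%.
wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
```

A 35% word error rate therefore means roughly one in three words was transcribed wrongly, dropped, or invented.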

The co-authors of this latest research set out to investigate how well a Dutch automatic speech recognition system performs when recognizing speech from different groups of speakers. They designed a series of experiments to examine whether a Dutch ASR system can keep up with diversity in speech across factors including gender, age, and accent.

To begin, the ASR system was trained on sample data from the CGN (Corpus Gesproken Nederlands), a corpus used to familiarize AI language models with the Dutch language. The CGN contains recordings of people aged 18 to 65 from different regions of the Netherlands and Flanders in Belgium, covering speaking styles from telephone conversations to broadcast news. It comprises an astonishing 483 hours of speech from 1,678 men and 1,185 women. To expand the training material, the co-authors applied data augmentation techniques to increase the training data up to ninefold.
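Data augmentation for speech typically multiplies training hours by creating modified copies of existing recordings, for example by perturbing playback speed. The sketch below shows one common variant, speed perturbation via resampling, using NumPy; the factors and function names are illustrative assumptions, as the article does not describe the study's exact augmentation pipeline:

```python
import numpy as np

def speed_perturb(samples: np.ndarray, factor: float) -> np.ndarray:
    """Resample a waveform so it plays `factor` times faster
    (factor < 1 slows it down and lengthens the signal)."""
    n_out = int(round(len(samples) / factor))
    old_positions = np.arange(len(samples))
    new_positions = np.linspace(0, len(samples) - 1, n_out)
    # Linear interpolation at the stretched time positions.
    return np.interp(new_positions, old_positions, samples)

def augment(samples: np.ndarray, factors=(0.9, 1.0, 1.1)) -> list:
    """Return one copy of the recording per speed factor,
    tripling the amount of training audio in this example."""
    return [speed_perturb(samples, f) for f in factors]

# Example: augment a synthetic one-second sine "recording".
wave = np.sin(np.linspace(0, 2 * np.pi * 440, 16000))
copies = augment(wave)
```

Stacking several such transforms (speed, noise, pitch) over the same corpus is how a training set can grow severalfold without recording new speakers.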

When the ASR system was tested on CGN data, the results showed that female speech was recognized more reliably than male speech, regardless of speaking style. The system also had a harder time recognizing speech from older people than from younger ones, likely because the younger speakers articulated more clearly. Native speech also proved easier to recognize than non-native speech: Dutch children had an error rate of about 20%, which was still better than that of the non-native speaker group.

The researchers believe this speech bias cannot be reduced through datasets alone and must also be addressed at the algorithmic level. They concluded in their paper that direct bias mitigation concerns diversifying datasets and aiming for balanced representation in them, while indirect bias mitigation deals with variables such as age, region, and gender. Working together, the two approaches can help create a more inclusive environment for automatic speech recognition (ASR).

Photo: Marko Geber / Getty Images

H/T: VB.
