Meta’s Latest AI Dataset Vows To Enhance The Performance Of ASR Tools

Meta’s newest AI dataset comes with some big promises, and they are all tied to enhancing the performance of ASR (automatic speech recognition) tools.

For those who might not be aware, it’s 2023, yet most mobile assistants still have a tough time hearing users and speaking back. It’s almost as if they’re running on models that are decades old.

But you can finally breathe a sigh of relief after tech giant Meta’s announcement today. The company claims automatic speech recognition systems can perform markedly better thanks to a clustering-based speech technique.

For a while now, Meta has been working on and off to improve these tools, including training speech models without any transcripts. The company also claims its systems can analyse nearly 4,000 different languages and even carry out lip reading faster than humans can.

One issue, however, is that ASR datasets have classically been organized along demographic lines like age, sex, and nationality. At the end of the day, grouping speakers this way captures only a limited range of pronunciations and puts a barrier in the way of serving a true cross-section of people.

Therefore, to combat such issues, an entirely new dataset was created. It relies on clustering techniques rather than dividing speakers by demographics: utterances are grouped by how similar they sound, not by who said them.
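For a rough sense of what that clustering looks like under the hood, here’s a minimal Python sketch. It assumes each utterance has already been turned into a fixed-size embedding vector by some speech encoder; the random stand-in embeddings and the choice of eight clusters are illustrative assumptions, not details Meta has published.

```python
# Minimal sketch: group utterances by acoustic similarity instead of
# demographic labels. Assumes each utterance is already encoded as a
# fixed-size embedding; the stand-in data and k=8 are illustrative.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 256))  # stand-in for real speech embeddings

kmeans = KMeans(n_clusters=8, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(embeddings)

# Each utterance now carries a cluster id derived purely from the audio,
# with no reference to the speaker's age, sex, or nationality.
for c in range(8):
    print(f"cluster {c}: {np.sum(cluster_ids == c)} utterances")
```

The grouping falls out of the audio itself, so no demographic labels ever enter the pipeline.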

Models are then trained on the clusters outlined, and fairness datasets are used afterwards to gauge how a model performs across several types of demographic subsets.
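To make that fairness-evaluation step concrete, here’s a small Python sketch that scores a model’s transcripts separately per demographic subset using word error rate. The tiny sample set, the age groupings, and the use of the open-source jiwer library are all illustrative assumptions rather than Meta’s actual evaluation setup.

```python
# Minimal sketch of a fairness check: compute word error rate (WER)
# separately for each demographic subset of an evaluation set. The
# sample data and groupings below are illustrative only.
from collections import defaultdict
from jiwer import wer  # pip install jiwer

eval_set = [
    {"ref": "play some jazz", "hyp": "play some jazz", "age_group": "18-30"},
    {"ref": "call my sister", "hyp": "call my sister", "age_group": "66-85"},
    {"ref": "set an alarm for six", "hyp": "set alarm for six", "age_group": "66-85"},
]

by_group = defaultdict(lambda: {"refs": [], "hyps": []})
for ex in eval_set:
    by_group[ex["age_group"]]["refs"].append(ex["ref"])
    by_group[ex["age_group"]]["hyps"].append(ex["hyp"])

# A large WER gap between groups signals the model is underserving one of them.
for group, data in sorted(by_group.items()):
    print(f"{group}: WER = {wer(data['refs'], data['hyps']):.2%}")
```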

The dataset we’re seeing today holds nearly 27K commands collected from some 600 American volunteers. The utterances cover alerts, texts, music, calls, and captures, among others. Researchers can then use it to train voice-enabled models and assistants.

As for how the commands were generated, speakers were asked to imagine scenarios such as voice-searching for a song or making plans with friends while deciding on a location to meet.

To gauge the approach, the tech giant trained a model on publicly available videos from the Facebook app, and researchers then tested it against two datasets.

The initial results were certainly positive, with improvements seen across demographic groups throughout the dataset. Overall, the researchers witnessed a 10% rise in ASR performance through the clustering technique, with the biggest gains coming from the 66 to 85 age group, one that is heavily underrepresented in voice technology.
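To show what a “10% performance rise” can mean in concrete terms, here’s a quick back-of-the-envelope calculation that reads it as a relative drop in word error rate per age group. All the numbers are invented for illustration and are not Meta’s reported figures.

```python
# Illustrative arithmetic only: reading a "10% ASR performance rise"
# as a relative reduction in word error rate (WER), broken out per
# age group. All numbers below are made up for demonstration.
baseline_wer = {"18-30": 0.080, "31-45": 0.090, "46-65": 0.100, "66-85": 0.180}
clustered_wer = {"18-30": 0.073, "31-45": 0.081, "46-65": 0.089, "66-85": 0.140}

for group, base in baseline_wer.items():
    rel_gain = (base - clustered_wer[group]) / base
    print(f"{group}: WER {base:.1%} -> {clustered_wer[group]:.1%} "
          f"({rel_gain:.1%} relative improvement)")
```

Under these made-up figures, the oldest group sees the largest relative gain, which mirrors the pattern Meta describes.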

Meta added that its long-term goal is a sharper focus on the world of AI, and this work is just one step closer to that.

