Google's Voice Access Can Now Detect In-App Icons On Android Devices, All Thanks To Artificial Intelligence

Google has just updated Voice Access, and with the new version Android users should be better able to control their devices with voice commands.

The new version takes advantage of a more advanced machine learning model that can automatically detect the icons present on the screen from UI screenshots. This lets Voice Access handle elements such as images and icons regardless of whether they carry accessibility labels.

For those unfamiliar with accessibility labels, they allow the device’s accessibility services to refer to exactly one on-screen element at a time, and they let users know when they have cycled through the whole UI. Until now, some elements had no labels at all, and that is the gap the new version of Voice Access aims to close.

Version 5.0 of Voice Access incorporates a new vision-based object detection model called IconNet. It can currently detect 31 different icon types, and that number is expected to grow to 70 in time. Explaining how IconNet works in a blog post, Google said the model is based on the novel CenterNet architecture: it first extracts the relevant features from the input image and then predicts the exact locations and sizes of the app icons in it.
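To make the "locations and sizes" step more concrete, here is a minimal sketch of how a CenterNet-style output could be decoded into boxes. It is not Google's implementation: the function name `decode_centers`, the single-channel heatmap, the 0.5 threshold, and the plain nested-list inputs are all assumptions made for illustration. The idea is that each cell of a heatmap scores how likely it is to be the center of an icon, and separate maps give the predicted width and height at that cell.

```python
from typing import List, Tuple

# (x_min, y_min, x_max, y_max, score), in heatmap-cell units
Box = Tuple[float, float, float, float, float]
Grid = List[List[float]]


def decode_centers(heatmap: Grid, widths: Grid, heights: Grid,
                   threshold: float = 0.5) -> List[Box]:
    """Turn a center heatmap plus per-cell size maps into boxes (illustrative)."""
    rows, cols = len(heatmap), len(heatmap[0])
    boxes: List[Box] = []
    for r in range(rows):
        for c in range(cols):
            score = heatmap[r][c]
            if score < threshold:
                continue
            # Keep only local peaks: no 8-connected neighbour may score higher.
            neighbours = [
                heatmap[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if any(n > score for n in neighbours):
                continue
            # The size maps give the box extent around the peak cell.
            w, h = widths[r][c], heights[r][c]
            boxes.append((c - w / 2, r - h / 2, c + w / 2, r + h / 2, score))
    return boxes
```

A real detector would use one heatmap channel per icon type and map cell coordinates back to screen pixels; both steps are left out here to keep the sketch short.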

Users, for their part, only need to refer to an icon by name through Voice Access, e.g., “Tap ‘menu’,” and IconNet will locate it for them.
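As a rough illustration of that last step, the sketch below maps a spoken command such as “Tap ‘menu’” to a point on the screen. Everything in it is hypothetical: the `resolve_tap` function, the dictionary of detections keyed by icon name, and the pixel boxes are assumptions for the example, not part of Voice Access or IconNet.

```python
from typing import Dict, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels

# Straight and curly quote characters that may surround the icon name.
QUOTES = "'\"‘’“”"


def resolve_tap(command: str, detections: Dict[str, Box]) -> Optional[Tuple[float, float]]:
    """Return the center of the named icon, or None if the command does not match."""
    words = command.strip().split(maxsplit=1)
    if len(words) != 2 or words[0].lower() != "tap":
        return None
    # Drop surrounding quotes, spaces, and a trailing period from the name.
    label = words[1].strip(QUOTES + " .").lower()
    box = detections.get(label)
    if box is None:
        return None
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2, (y_min + y_max) / 2)


# Hypothetical detections for one screenshot.
detections = {"menu": (0, 0, 48, 48), "search": (900, 0, 948, 48)}
tap_point = resolve_tap("Tap 'menu'", detections)  # center of the "menu" box
```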

Google's engineers began their work on IconNet by collecting and labeling more than 700,000 app screenshots. The team then streamlined the process with heuristics, auxiliary models, and data augmentation techniques, which proved effective at identifying rarer icons and at enriching the existing screenshots with infrequent icons.
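One simple way to picture "enriching screenshots with infrequent icons" is to paste a rare icon onto an existing screenshot and record where it went, producing a new labeled training example. The sketch below does exactly that on toy grayscale images stored as nested lists. It is an assumption about the general technique, not a description of Google's pipeline, and the `paste_icon` helper is hypothetical.

```python
import random
from typing import List, Tuple

Image = List[List[int]]          # toy grayscale image, one int per pixel
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), max exclusive


def paste_icon(screenshot: Image, icon: Image, rng: random.Random) -> Tuple[Image, Box]:
    """Paste `icon` at a random position and return the new image and its box."""
    rows, cols = len(screenshot), len(screenshot[0])
    icon_h, icon_w = len(icon), len(icon[0])
    if icon_h > rows or icon_w > cols:
        raise ValueError("icon is larger than the screenshot")
    top = rng.randrange(rows - icon_h + 1)
    left = rng.randrange(cols - icon_w + 1)
    # Copy the rows so the original screenshot is left untouched.
    augmented = [row[:] for row in screenshot]
    for r in range(icon_h):
        augmented[top + r][left:left + icon_w] = icon[r]
    return augmented, (left, top, left + icon_w, top + icon_h)
```

Passing a seeded `random.Random` keeps the augmentation reproducible, which makes it easier to regenerate the same training set later.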

As a result, IconNet is optimized to run on-device in mobile environments, with a compact size and fast inference time that together make for a seamless user experience.

Google also plans to widen the range of elements IconNet supports to include generic images, text, and buttons. In the near future, IconNet may also be able to differentiate between two or more similar-looking icons based on their functionality.

On the developer front, Google also aims to increase the number of apps with valid content descriptions by offering improved tools that suggest content descriptions for different elements while developers are building their applications.

Google first released a beta of Voice Access in 2016, alongside other mobile accessibility efforts. The company is also working on Lookout, an accessibility-focused app that can identify packaged foods using computer vision and scan documents so that letters and emails are easier to review, among other features.

We have also been hearing about Project Euphonia, which is dedicated to people with speech impairments; Live Relay, which uses on-device speech recognition and text-to-speech so that a phone can listen and speak on its owner's behalf; and Project Diva, which lets people give Google Assistant commands without speaking.
