Researchers Discovered That Their State-of-the-Art Computer Vision AI Can Be Fooled Rather Easily. Here Is Why

Researchers at OpenAI have found that their state-of-the-art computer vision system can be fooled by tools as basic as a pen and paper. A technology advanced enough to recognize objects in a way loosely comparable to the human brain was tripped up by a simple experiment conducted by the company. The researchers placed a handwritten label on top of an object, and that alone was enough to trick the software into misidentifying what it sees. For example, when they wrote "iPod" on a piece of paper and stuck it on an apple, the AI classified the image as an iPod rather than an apple with 99.7 percent confidence.
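
To get a feel for what that kind of zero-shot comparison looks like in practice, here is a minimal sketch using the open-source CLIP package that OpenAI released. The file name apple_with_ipod_label.jpg and the candidate captions are illustrative assumptions, not the exact setup from OpenAI's experiment.

```python
# pip install torch pillow git+https://github.com/openai/CLIP.git
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Hypothetical photo of an apple with a handwritten "iPod" label stuck on it.
image = preprocess(Image.open("apple_with_ipod_label.jpg")).unsqueeze(0).to(device)
labels = ["an apple", "an iPod", "a piece of paper"]
text = clip.tokenize([f"a photo of {label}" for label in labels]).to(device)

with torch.no_grad():
    # CLIP scores the image against each caption; softmax turns scores into probabilities.
    logits_per_image, _ = model(image, text)
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()[0]

for label, p in zip(labels, probs):
    print(f"{label}: {p:.3f}")
```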

In a blog post, the OpenAI researchers describe these failures as "typographic attacks." The attack exploits the model's ability to read text robustly: even a handwritten label can convince it that an object is whatever the text says it is. The researchers note that such attacks are similar in spirit to adversarial images.

Adversarial images are images crafted to mislead machine vision systems, and they pose a real threat to products that rely on artificial intelligence. As an example, the researchers pointed to Tesla's self-driving mode, which can change lanes on its own based on the signs and markings it sees on the road. A single wrong detection could lead to a fatal crash.

However, there is nothing to worry about at the moment, as OpenAI's software is not used in any commercial product available to the public. The company does acknowledge, though, that the unusual machine learning architecture of CLIP (Contrastive Language–Image Pre-training) is what created the weakness that allows this attack to succeed.

According to the company, CLIP was trained on roughly 400 million image–text pairs collected from the internet, which lets it recognize signs and signals without human-labeled guidance. To understand its inner workings, the researchers opened the model up and published a paper on their findings this month. Inside the network, they discovered individual components that respond not only to photos of an object but also to sketches, cartoons, and associated text, which they call "multimodal neurons." The company finds these neurons interesting because they mirror the way a single cell in the human brain can respond to an abstract concept rather than to specific examples. Yet while any human being can tell the difference between an apple and a piece of paper with the word "apple" written on it, software like CLIP cannot.
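
For readers curious about how a model can learn from raw image–text pairs without human labels, here is a simplified sketch of CLIP-style contrastive training, loosely following the pseudocode in OpenAI's paper. The feature tensors and dimensions below are placeholder assumptions standing in for real encoder outputs, not CLIP's actual architecture.

```python
import torch
import torch.nn.functional as F

def clip_style_loss(image_features, text_features, temperature=0.07):
    """Symmetric contrastive loss over a batch of matched image-text pairs.

    image_features, text_features: tensors of shape (batch, dim) produced by
    placeholder image/text encoders (an assumption, not CLIP's real encoders).
    """
    # Normalize so dot products become cosine similarities.
    image_features = F.normalize(image_features, dim=-1)
    text_features = F.normalize(text_features, dim=-1)

    # Similarity of every image to every caption in the batch.
    logits = image_features @ text_features.t() / temperature

    # The i-th image matches the i-th caption; every other pairing is a negative.
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_i = F.cross_entropy(logits, targets)        # image -> text direction
    loss_t = F.cross_entropy(logits.t(), targets)    # text -> image direction
    return (loss_i + loss_t) / 2

# Toy usage with random features standing in for encoder outputs.
imgs = torch.randn(8, 512)
txts = torch.randn(8, 512)
print(clip_style_loss(imgs, txts).item())
```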

Because these multimodal neurons learn from text and images scraped from the internet, they also encode exactly the sort of biases you might expect from that source. The researchers note, for example, that the neuron for "Middle East" is also associated with terrorism, and that they found "a neuron that fires for both dark-skinned people and gorillas." The latter replicates an infamous error in Google's image recognition system, which tagged Black people as gorillas. It is another reminder of how much AI differs from human intelligence, and why we should thoroughly understand how a system works before building it into commercial products: even a minor overlooked error could have serious consequences, because AI is nowhere near actual human intelligence.

