Deep Fake Speeches are Undetectable by Humans

Artificial intelligence has changed how we perceive the world: from helping humans make better decisions, to improving risk management and forecasting, to producing in-depth analysis that takes businesses to new heights, AI seems capable of it all. For the most part, people have talked about how beneficial AI's impact on their lives is. Still, at the same time, some tech experts have been raising ethical concerns about how much power AI is gaining over the way we navigate our lives.

Among AI's most prominent ethical concerns and risks is the advancement of deep fake technology: algorithmic systems that mimic a natural human voice or physical appearance. Deep fakes are built with machine learning models trained on data sets of a person's speech or imagery; the models learn the underlying behavior patterns and impersonate a voice or bodily features so accurately that the result is almost indistinguishable from an original audio or video recording.

Research by University College London found that, at present, humans can correctly detect generated deep fake speech only 73% of the time, for both English and Mandarin.

In the initial stages of deep fake development, thousands of hours of sample audio were required to generate speech of a quality comparable to the original. A core principle of machine learning is that the more data the system is fed, the more precise its results become, in less time. The latest deep fake algorithms can therefore reproduce a person's voice with relatively high accuracy from just a three-second audio clip. And because of the availability of open-source tools, people can now train such algorithms in only a few days, with expert help readily accessible.

Major tech companies like Apple have made official statements about AI-driven software for iPhone and iPad that can create an accurate copy of a person's voice from only 15 minutes of authentic recordings. The research team at University College London generated 50 deep fake voice samples using text-to-speech (TTS) algorithms. The AI models were trained on publicly available data sets: one model for English and the other for Mandarin.

The deep fake samples and authentic samples were then played to 529 participants to test whether they could tell the real recordings from the fake ones. The participants correctly identified only 73% of the fake speech recordings, and that figure improved only slightly after they received training in spotting deep fake speech.
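To make the 73% figure concrete, here is a minimal sketch (not the study's actual code, and with hypothetical responses) of how a detection rate is computed in a listening test of this kind: each trial records a clip's true label and a participant's guess, and accuracy is the fraction of matches.

```python
def detection_rate(labels, guesses):
    """Fraction of clips whose true label ('real' or 'fake') was guessed correctly."""
    correct = sum(1 for truth, guess in zip(labels, guesses) if truth == guess)
    return correct / len(labels)

# Hypothetical responses for a handful of trials:
truth   = ["fake", "fake", "real", "fake", "real"]
guesses = ["fake", "real", "real", "fake", "real"]

print(detection_rate(truth, guesses))  # 4 of 5 correct -> 0.8
```

In the study, the same calculation over hundreds of trials per condition is what yields the roughly 73% detection rate for fake clips.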

Kimberly Mai, a computer scientist at University College London and an author of the study, said the findings show that humans cannot reliably identify deep fake speech. More alarming still was the revelation that the samples most people failed to spot were generated with older machine learning algorithms, which raises a thought-provoking question: if deep fake speeches are made with newer, more precise algorithms, will humans be able to detect them at all?

The advancing ability of deep fakes to produce ever more accurate recordings is highly concerning, particularly if the technology falls into the hands of criminals, who can use it to harm people immensely. In 2019, a criminal used a deep fake recording that mimicked the voice of a British energy company CEO's boss to convince him to transfer thousands of pounds to a fraudulent supplier.
