Latest Deepfake Technology Creates More Convincing Videos Based on Audio Source Than Ever Before

A video shows a woman first saying “Knowledge is one thing, virtue is another” and then saying “Knowledge is a virtue”. The same voice appearing in two conflicting videos is the work of an artificial intelligence (AI) system. She only said the first statement; in the second, the audio was turned into video through AI technology.

Researchers from three institutions, Nanyang Technological University in Singapore, the National Laboratory of Pattern Recognition in China and SenseTime, collaborated to create the new method for generating deepfakes from audio sources.

The AI combines an audio clip with video of the same person, or of someone else, speaking and produces a convincing video. The person shown in the video acts like a puppet, mouthing the words of the audio source.

First, a 3D face model is fitted to every frame of the video to capture the geometry, pose and expressions of the person. From this model, 2D facial landmarks are derived, with the focus on mouth movements. This way, the algorithm is trained on facial features only rather than on the whole scene.

Then the 3D face mesh is reconstructed so that the lip movements match the source audio clip, much like the text-to-video method.
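To make the landmark step more concrete, here is a minimal Python sketch of how 3D face points might be projected to 2D and narrowed down to the mouth region. It is not the researchers' code. The weak-perspective projection, the 68-point landmark layout (with indices 48 to 67 outlining the mouth) and the function names `project_weak_perspective` and `mouth_landmarks` are all assumptions made for illustration.

```python
import math

# Assumption: the widely used 68-point landmark layout,
# in which indices 48-67 outline the mouth. The paper may use another layout.
MOUTH_INDICES = range(48, 68)


def project_weak_perspective(points_3d, yaw=0.0, scale=1.0, tx=0.0, ty=0.0):
    """Project 3D points to 2D (hypothetical helper).

    Rotates each point about the vertical (y) axis by `yaw` radians,
    drops the depth coordinate, then applies a uniform scale and a 2D shift.
    """
    c, s = math.cos(yaw), math.sin(yaw)
    projected = []
    for x, y, z in points_3d:
        x_rotated = c * x + s * z  # x coordinate after the head turn
        projected.append((scale * x_rotated + tx, scale * y + ty))
    return projected


def mouth_landmarks(landmarks_2d):
    """Keep only the mouth points from a full set of 68 2D landmarks."""
    return [landmarks_2d[i] for i in MOUTH_INDICES]


# Usage with made-up 3D landmarks for a single frame
frame_landmarks_3d = [(float(i), 2.0 * i, 0.0) for i in range(68)]
landmarks_2d = project_weak_perspective(frame_landmarks_3d, scale=2.0, tx=1.0, ty=-1.0)
mouth = mouth_landmarks(landmarks_2d)
print(len(mouth))
```

In a real system the 3D points would come from a face model fitted to each frame, and the pose parameters would be estimated rather than supplied by hand.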

According to the researchers, the method can achieve highly deceptive audio-visual results. The results are said to be better than Face2Face from 2016 and “Synthesizing Obama” from 2017, with only minor artifacts visible to the naked eye.

An online poll was conducted in which 55% of the 100 participants rated the video as real.

It is the first end-to-end learnable audio-based video editing method. The video would raise doubts only if the voice in the audio were unconvincing. AI engineers and deepfake developers have been working on fake audio for years, and algorithmically generated voices can sound quite realistic.

With this method, a person's real voice can be used while the words are reshaped as required, which produces a more realistic deepfake and marks a step forward in deepfake realism.

The Bulletin of the Atomic Scientists, keepers of the Doomsday Clock, named deepfakes as one of the reasons the world is moving closer to doomsday. It said algorithmically generated videos can make it difficult for ordinary people and decision-makers to distinguish truth from fakery.

In their paper, the researchers acknowledged that this advancing technology could be misused and manipulated. They therefore support measures to minimize potential exploitative practices.

Aqsa Rasool
