Facebook Provides More Insight into Hyper-realistic Virtual Avatars Through a New Research Paper!

Facebook’s AR/VR R&D group, Facebook Reality Labs, recently provided more insight into how it would implement hyper-realistic, real-time virtual avatars. The team has devised a system that animates virtual avatars in real time with exceptional fidelity using compact headset hardware.

Three cameras placed inside the headset capture the user’s eyes and mouth, a setup that represents the user’s complex facial movements far better than earlier methods did. What sets the new method apart is how the incoming images are used to drive the user’s virtual representation.

The method relies on computer vision and machine learning. According to one of the authors of the detailed research behind the system, it runs live in real time and handles a wide range of expressions, including puffed cheeks and lip biting. Even minute details such as pimples are accounted for. Facebook Reality Labs also released a technical video explaining the new solution.

A full research paper explaining the methodology and the math behind the new system was published as well. The work, titled “VR Facial Animation via Multiview Image Translation”, is authored by Shih-En Wei, Jason Saragih, and others.

The paper explains why two different experimental headsets had to be built: a “Training” headset and a “Tracking” headset.

The larger “Training” headset carries nine cameras to capture a wider range of detail from the user’s face and eyes. This makes it easier to find correspondence between the incoming images and a digital scan of the same user captured beforehand; that correspondence is found automatically via self-supervised multiview image translation.
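For intuition only, the sketch below shows the general shape of an image-to-image translation step of the kind described: a toy encoder–decoder network translates headset camera crops toward the appearance of renders from the user’s pre-captured scan. This is a minimal, assumption-laden illustration, not the authors’ implementation; the camera count, image sizes, network layers, and L1 loss are all placeholders.

```python
# Hypothetical sketch (not the published system): a tiny image-to-image
# translation network that maps headset camera crops toward the appearance
# of rendered views of the user's pre-captured avatar scan.
# All names, sizes, and the loss are assumptions for illustration.
import torch
import torch.nn as nn

class TranslationNet(nn.Module):
    """Translates a headset camera image into a rendered-avatar-style image."""
    def __init__(self, channels: int = 1):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(channels, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, channels, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))

# One toy training step: translate each of the nine training-headset views
# and score them against stand-in renders of the scanned avatar.
net = TranslationNet()
optimizer = torch.optim.Adam(net.parameters(), lr=1e-4)
headset_views = torch.rand(9, 1, 64, 64)   # stand-in for nine camera crops
avatar_renders = torch.rand(9, 1, 64, 64)  # stand-in for matching avatar renders

translated = net(headset_views)
loss = nn.functional.l1_loss(translated, avatar_renders)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(f"translation loss: {loss.item():.4f}")
```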

Once the correspondence is established, the “Tracking” headset comes into play. Its three cameras are positioned to mirror three of the nine cameras on the “Training” headset, which lets the slimmer hardware drive an accurate avatar representation.
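To make the tracking stage concrete, here is a rough sketch of the kind of lightweight regressor that could map the three tracking-headset views to an expression code driving the avatar. The network shape, the 256-dimensional code, and the input sizes are assumptions for illustration, not details taken from the paper.

```python
# Hypothetical sketch (not the published system): a small regressor mapping
# the three tracking-headset views to a vector of avatar expression
# parameters. Camera count, image size, and code dimension are assumptions.
import torch
import torch.nn as nn

class ExpressionRegressor(nn.Module):
    def __init__(self, num_cameras: int = 3, code_dim: int = 256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(num_cameras, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(64, code_dim)

    def forward(self, views: torch.Tensor) -> torch.Tensor:
        # views: (batch, num_cameras, H, W) -- the camera crops stacked as channels
        return self.head(self.backbone(views))

regressor = ExpressionRegressor()
frame = torch.rand(1, 3, 64, 64)    # eyes + mouth crops for one frame (placeholder)
expression_code = regressor(frame)  # would drive the avatar decoder downstream
print(expression_code.shape)        # torch.Size([1, 256])
```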



The paper stresses the system’s accuracy, drawing comparisons with previous methods. Capturing facial expressions, and keeping what the eyes and mouth are doing in correspondence, is certainly impressive. What is even more impressive is that a user whose face is largely hidden by the headset gets an accurate reconstruction of it from extremely close-range camera views.

Although the solution sounds ideal, several factors still keep it from rolling out to the general public. The need for an initial scan of each user, captured with the “Training” headset, would require the establishment of “scanning centers”. Since VR is not yet a major communication medium, such centers are hard to justify. Still, there is reason for optimism, especially given the advances in automatic correspondence and sensing technology.