Flexible expressions could lift 3D-generated faces out of the uncanny valley

Creating 3D-rendered faces is now commonplace in major film and game productions, but accurately capturing and animating these faces to appear natural presents significant challenges. Disney Research is actively developing solutions to streamline this process, including a machine learning tool designed to simplify the generation and manipulation of 3D faces while avoiding the unsettling effect known as the uncanny valley.
The technology behind digital faces has advanced considerably since the days of stiff expressions and limited detail. Today, high-resolution, believable 3D faces can be animated efficiently; however, replicating the nuances of human expression remains difficult. The sheer variety of expressions, combined with our sensitivity to even slight inaccuracies, makes achieving realism a complex undertaking.
Consider how a person’s entire facial structure shifts when they smile. The shift is unique to each individual, yet we instinctively tell a genuine smile from a forced one. Replicating that level of detail in an artificial face is a key hurdle.
Current “linear” modeling techniques simplify the complexities of expression, letting artists dial emotions like “happiness” or “anger” up or down, but often at the expense of precision. Not only can they fail to represent every possible facial configuration, they can also produce anatomically impossible ones. More recent neural models learn the relationships between expressions from data, but their internal processes are often opaque and difficult to control. Their effectiveness may also be limited to the faces they were trained on, and they may not give artists in film or games the degree of control they need, potentially resulting in faces that appear subtly, but noticeably, unnatural.
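To make “linear” concrete, here is a minimal blendshape-style sketch (hypothetical names and placeholder data, not Disney’s code): each expression is stored as a fixed offset from a neutral mesh, and animating is just a weighted sum. Nothing in that math constrains the sliders, which is how anatomically impossible combinations slip in.

```python
import numpy as np

# A minimal linear blendshape sketch (illustrative only, not Disney's model).
# The face is V vertices in 3D; each expression is a fixed offset ("delta")
# from the neutral mesh, and any pose is a weighted sum of those deltas.

V = 5000                                    # vertex count (arbitrary)
neutral = np.random.rand(V, 3)              # placeholder neutral face mesh
deltas = {
    "smile": np.random.randn(V, 3) * 0.01,  # placeholder expression offsets
    "anger": np.random.randn(V, 3) * 0.01,
}

def linear_face(weights: dict[str, float]) -> np.ndarray:
    """Blend expressions linearly: face = neutral + sum(w_i * delta_i)."""
    face = neutral.copy()
    for name, w in weights.items():
        face += w * deltas[name]
    return face

# The sliders are unconstrained: nothing stops weight combinations from
# producing geometry no real face could make.
face = linear_face({"smile": 0.8, "anger": 0.5})
```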
Researchers at Disney Research propose a new model that combines the strengths of both approaches: a “semantic deep face model.” Setting the technical details aside, the core innovation is a neural model that understands how an expression affects the entire face, without being tied to a specific individual. Just as important, it is nonlinear, allowing flexibility in how expressions interact with the facial geometry and with one another.
To illustrate: a linear model lets you apply an expression, such as a smile or a kiss, on a scale of 0 to 100 to any 3D face, but the result may lack realism. A neural model can apply a learned expression realistically on a scale of 0 to 100, but only to the face it was trained on. This model can apply an expression from 0 to 100 smoothly and realistically to any 3D face. That is a simplification, but it conveys the central idea.
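Since the paper’s internals aren’t spelled out here, the sketch below is only a guess at the general shape of such a model: identity and expression live in separate latent codes, a shared nonlinear decoder turns any pairing into a mesh, and the 0-to-100 slider is just a scale on the expression code. The layer sizes, code dimensions, and tanh nonlinearity are illustrative assumptions, not details from the Disney paper.

```python
import numpy as np

rng = np.random.default_rng(0)

V = 5000                    # vertex count (arbitrary)
ID_DIM, EXPR_DIM = 64, 32   # assumed latent sizes, not from the paper
HID = 256

# Untrained placeholder weights; a real model would learn these from
# scans of many people making many expressions.
W1 = rng.standard_normal((ID_DIM + EXPR_DIM, HID)) * 0.01
W2 = rng.standard_normal((HID, V * 3)) * 0.01

def decode(identity: np.ndarray, expression: np.ndarray) -> np.ndarray:
    """Nonlinear decoder: maps an (identity, expression) code pair to a
    full-face mesh, so an expression can reshape the whole face rather
    than add a fixed per-vertex offset."""
    z = np.concatenate([identity, expression])
    h = np.tanh(z @ W1)     # the nonlinearity is what linear models lack
    return (h @ W2).reshape(V, 3)

# The "0 to 100" slider: scale one learned expression code and decode it
# against several identity codes -- the same smile transfers across faces.
smile = rng.standard_normal(EXPR_DIM)        # stands in for a learned code
for identity in (rng.standard_normal(ID_DIM) for _ in range(3)):
    for slider in (0.0, 0.5, 1.0):           # 0%, 50%, 100%
        mesh = decode(identity, slider * smile)
```

The payoff of keeping the two codes separate shows up in the final loop: a single expression code drives three different identities, which is exactly the transfer a model trained on one face can’t do.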
Image Credits: Disney Research

The potential applications are substantial. It becomes possible to generate a multitude of faces with varying shapes and skin tones, and then animate them all with the same expressions without additional effort. This could lead to the creation of diverse, computer-generated crowds with just a few clicks, or game characters exhibiting realistic facial expressions regardless of whether they were meticulously crafted by hand.
This technology is not a complete solution, but rather one component of the ongoing advancements being made by artists and engineers across various industries. Markerless face tracking, improved skin deformation, and realistic eye movements are just a few of the other important areas of development.
The Disney Research paper was presented at the International Conference on 3D Vision and is available to read in full online.