Hand Animating the Faces in The Last of Us Part I

Since announcing The Last of Us Part I, Naughty Dog has shared two side-by-side character comparison clips: first one of Tess and then one of Joel. These are great comparisons that emphasize the new character models, lighting, and environment detail. The characters look less shiny and young and more like 40ish-year-old smugglers in a post-apocalyptic Boston.

The game uses the exact same performances as the original game. The nuances gained in these new models and animations convey far more subtlety. Watching these clips, I remembered that the original game, while using mocap for movement and performances, did not use facial capture. The faces were hand-animated.

“We don’t do facial capture. We don’t track eye movements on stage. It’s just the motion capture data. Everything that you see on the faces is hand keyed. You can see this is all her mo-cap data. And so when I am doing something like this I go back and forth to performances she was giving and I watch just this section over and over and over again.” – Marianne Hayden

My brain then wonders, “Is Naughty Dog re-hand-animating these faces?” Thankfully, Neil Druckmann addressed this during the reveal at Summer Game Fest.

“Yeah, actually we came up with a process where we could take the original animation that we did for the faces and kind of like retarget it on these new rigs that have a lot more fidelity. Animators went back and – (Geoff interrupts about side-by-side shots) – everything is rebuilt from the ground up. The same art director re-art-directed the whole thing from the ground up. But the great thing about these faces is that they’re closer to the original performance. All the animators went and studied those videos and got it closer to what you (Ashley Johnson and Troy Baker) did on set than we could have achieved before.”

To me, a person with zero programming experience, it sounds like the studio is taking the facial animation data from the PS3/4 game and pointing it to these new PS5 character models. My brain imagines that skeleton song: the PS3 eyes are connected to the PS5 eyes.

It’s much catchier in my head.

Then, just like they did in the early 2010s, the animators studied the facial reference footage of the performances to give us the results we see today. It's wild to see techniques from two hardware generations ago adapted to modern development practices while still keeping that human touch of hand animation.

Another note on these particular comparison shots: They are presented at 720p resolution. That’s the native resolution of the PS3 version. Part I likely will have a native 4K output option, which is nine times the number of pixels. Toss in the inherent compression of web video and the PS5 shots here are being crushed well below their native presentation. The clips do say that the PS3 footage was captured on said console. I wonder how much we are missing solely from the compression.
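For anyone who wants to check the "nine times" figure, here is a quick sketch of the arithmetic. It assumes the standard frame sizes of 1280x720 for 720p and 3840x2160 for 4K UHD; the variable and function names are just for illustration.

```python
def pixel_count(width: int, height: int) -> int:
    """Total pixels in a single frame of the given dimensions."""
    return width * height

# Assumed standard frame sizes (not stated in the clips themselves).
ps3_pixels = pixel_count(1280, 720)    # 720p
uhd_pixels = pixel_count(3840, 2160)   # 4K UHD

# How many times more pixels a 4K frame holds than a 720p frame.
ratio = uhd_pixels / ps3_pixels

print(f"720p: {ps3_pixels:,} pixels")
print(f"4K:   {uhd_pixels:,} pixels")
print(f"4K has {ratio:g}x the pixels of 720p")
```

Each side of the frame is three times longer at 4K, so the total comes out to 3 x 3 = 9 times the pixels.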

One more fun tidbit: According to Naughty Dog Senior Editor Samuel Prince, any clips longer than seven seconds have to go through the ESRB.