Performance Art
Wednesday, 13 January 2010 16:44

By Alex Wiltshire

If you’d asked us back in September whether we cared about the death of the cinematic narrative in a non-linear, interactive medium, we would probably have shrugged pensively before going back to stroking our beards in front of Slither Link. It took Uncharted 2 to remind us that we don’t actually mind games that try to be films – so long as those films are pretty good. Historically, however, games have swung wildly at the lowest-hanging fruits of cinema, and yet still somehow failed to reach them. Naughty Dog, meanwhile, managed to combine a tight script with snappy direction, breathing life into its CG cast with credible human expressions, transforming them into complex, sympathetic characters with emotive voice-acting. Who would have guessed this might be a winning combination?

Of course, if it was that easy, we’d have engaging cinematic drama gushing from our consoles. More often than not we have horrifying digital mannequins, jerking about like malfunctioning wind-up toys while bored stage-actors drone through the dialogue, wondering if they’ll ever play Hamlet again. Clearly there are many hurdles – technical, practical and ideological – that lie in the way of getting as complete and evocative a performance as Naughty Dog has achieved. But there are now also solutions for developers, offered by third-party companies specialising in staging CG performances, combining directorial nous, cutting-edge capture technology and animation expertise, reshaped to fit varying budgets, scheduling and technical requirements.

“I think developers are now acutely aware that they need to have believable characters that can carry a story,” says Mick Morris, MD of Audiomotion, which provides capture services for TV, film and games. “But purely from a technical point of view, it’s only in the last few years that we’ve been able to wrestle a good solution for that out of the available technology.”


Imagination Studios' John Klepper finds that a limited number of markers tends to produce the best effect: "It's mostly about placement rather than number".

Morris points to the latest generation of consoles as the leap in capacity that enabled detailed, lifelike animation in realtime. And nowhere is this truer than the face, the subtleties of which can only be conveyed through a comparatively large expenditure of a game’s technology budget – or so it is often thought.

“Game companies have been avoiding the face,” says CEO Mike Starkenburg of facial mo-cap specialist Image Metrics. “We call it ‘the illegitimate helmet’. There are guys in gasmasks but never any gas in sight. Faces are so difficult to do right that it’s risky, and in terms of the engine’s tech budget, the face can easily be three times as expensive to move as the body.

“The geometric shape of the head – that’s the mesh. The process of moving that mesh in a believable way is rigging. You could move it one vertex at a time, but that would take forever. Instead, animators create a set of controls. One, for example, will open the eye and another will close the eye. I’m simplifying – most mouth rigs have 20 different controls. The body has probably 20 controls. The face can have 60, sometimes a couple of hundred. You can get a really good facial rig with relatively few controls – equal to that of the body – but it’s an optimisation problem, and most people simply haven’t done enough facial rigs to get good at solving it. We have. Inadvertently, we’ve become world experts in facial rigging. Many animators will spend their lives animating bodies and cars and so on, but how many faces do they really do? Even a game like Grand Theft Auto IV has, like, 80 characters. We’ve done literally thousands. So we always look at the facial rigs when we walk in and try and persuade developers to adopt our strategies.”
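To make that optimisation problem concrete, here is a minimal sketch – in Python, with invented names, and not a representation of Image Metrics’ actual tooling – of the control-based rigging Starkenburg describes: instead of moving the mesh one vertex at a time, each named control blends a pre-sculpted offset into the neutral face.

```python
import numpy as np

class FaceRig:
    """Minimal blendshape-style rig: named controls drive vertex offsets."""

    def __init__(self, neutral_mesh):
        self.neutral = neutral_mesh   # (num_vertices, 3) rest positions
        self.deltas = {}              # control name -> (num_vertices, 3) offset
        self.values = {}              # control name -> weight in [0, 1]

    def add_control(self, name, delta):
        """Register a pre-sculpted offset, e.g. 'eye_L_close'."""
        self.deltas[name] = delta
        self.values[name] = 0.0

    def set(self, name, weight):
        self.values[name] = float(np.clip(weight, 0.0, 1.0))

    def evaluate(self):
        """Blend every active control's offset into the neutral mesh."""
        mesh = self.neutral.copy()
        for name, weight in self.values.items():
            mesh += weight * self.deltas[name]
        return mesh

# Toy four-vertex 'face' with two of the dozens of controls a real rig has.
rig = FaceRig(np.zeros((4, 3)))
rig.add_control("eye_L_close", np.array([[0.0, -1.0, 0.0]] * 4))
rig.add_control("mouth_smile", np.array([[1.0, 0.0, 0.0]] * 4))
rig.set("eye_L_close", 0.5)    # the animator keys one scalar, not 3D vertices
print(rig.evaluate())
```

The craft Starkenburg alludes to is in choosing which offsets to sculpt, so that a few dozen controls – rather than hundreds – can reach every expression a performance needs.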

No other area is as crucial to producing an emotional performance as the face, says John Klepper, CEO of mo-cap firm Imagination Studios: “The face is everything – we as humans look at faces from the day we are born, and we have an amazing perception of its subtleties. In order to be able to transfer those extreme subtleties to animation takes an in-depth understanding of how to rig and skin and weight a character correctly. Eighty per cent of the rigs we get sent look like hell – there’s not a lot of competence, and the difference can be huge if you have something mere millimetres out of place.”

That in-depth understanding is an elusive thing. As Klepper, Morris and Starkenburg say in near unison, the major stumbling block for developers has been in building up the required level of in-house expertise – not simply on the technical side of animation, but in understanding the vicissitudes of motion-capture itself, a process which requires not only a keen knowledge of the technology but also other skillsets: directing, acting and cinematography.

Audiomotion’s frequent collaborator, Side, specialises in delivering casting and direction for mo-cap productions. “This is what we do day in, day out for a multitude of different companies,” says Side’s managing director, Andy Emery. “Developers have to come up with a pipeline for getting these performances in-game, but that’s a problem we might have already solved on another project. We get to learn from all the different processes that people try. A developer just doesn’t have that exposure. They might do one project in two years – we’ll do 25 similar projects in that same time.”

Equally, Klepper’s experience working for Starbreeze has left him an advocate of outsourcing animation: “The lifespan of the game can be 18 to 24 months, but the time that animation is required, if it’s well planned out, may only be six or nine months. So, if you have it in-house, you crunch like crazy during that period, but what are you doing for the rest of it? I came over from LA to get Starbreeze’s internal mo-cap studio up and running. I got a very in-depth view of the pros and cons of having a large internal team. The result of it was they had to close the mo-cap studio down because it was too expensive. They still have a small team of very competent internal animators but they outsource quite extensively.”

Not all studios are beholden to outsourcing, of course, as Morris points out, citing EA’s mo-cap studio in Canada, Sony’s San Diego studio and Activision’s capture setup in Santa Monica. But, even then, he suggests that outsourcing may bring interdisciplinary expertise that would otherwise be unavailable.

“There’s always going to be that pressure to use those internal resources,” says Morris. “But the breadth of work we do, bringing experience from Hollywood blockbusters, music promos, commercials – perhaps those internal teams don’t have this much exposure to those influences.”

One other reason that the industry has struggled to squeeze scintillating performances from its CG casts is simply timing. The production methods of film clash with the fluid, ever-changing nature of game development, says Side’s Emery: “The process of capturing performances for games has often been driven by horrible phrases like ‘vertical slice’ – it doesn’t work. You need to get creative people engaged, involved and on a contract for a period of time before they move on to other projects. The technology and capture method has too often defined how we get these performances rather than being driven by the fundamentals of the performance itself.”

“Having to lock down your script is a terrifying thing for developers,” says Morris. “But there are huge benefits. If they do draw a line in the sand and work back from there in terms of rehearsing actors and having the director spend as much time with them as possible, then ultimately they’re going to get much stronger performances.”



But whether developers prioritise the performance-capture schedule or not, there are always awkward practicalities that both Side and Audiomotion are adept at working around. Emery describes full performance-capture – capturing sound, body and facial animation all at once – as the holy grail, but it is sometimes impossible to implement.

“Often, you have to record the voice, and then do the mo-cap with completely different actors,” he explains. “Take Guitar Hero – those tracks had already been recorded. So we got someone who was bloody good at miming to belt out those tracks, and we got the facial animation from those sessions. Sometimes, because of the constraints of the vocal talent, the voice work is already recorded in LA, the developer brings audio files to the mo-cap session and you’ll have actors mime to the pre-recorded stuff. It’s not the best way to do it, but if that’s the only option, we’ll take that approach.”

Nor is it ever a given that your voice-actors will actually be capable of the physical performance required for motion-capture. And, if you aim for A-list celebrity talent, it may simply be more economical to use other actors as their bodies for the lengthier mo-cap shoots.

“If you want Vin Diesel’s voice for the main character,” says Klepper, “you’re going to have to spend an unbelievable fortune to get him in a studio for three or four weeks while you shoot all the body data. Alternatively, you spend one or two days in a voiceover booth and get all the audio, then get somebody else to play the physical performance. In that situation where you have pre-existing voice, you then have the tricky task of getting the performances to match.”

Emery is sceptical of the benefits of getting in big names for this very reason. “It’s not about getting triple-A actors,” he says. “I don’t think Uncharted 2 has triple-A actors – it has great actors. We find it frustrating when what we want to do is rehearse, try a few different things, maybe go through it a bit slowly, work on the script – and that’s a very difficult sell when you hire a Hollywood star for just a few hours.”

But actors are only one of three essential ingredients, continues Emery: “There’s a misconception that a good actor will make something out of a bad script. They will make it better than a bad actor doing it, but they won’t make it. Script, casting and direction are the base tools. You can put great actors together, but without good direction they’ll soon lose their focus. If it looks like you don’t know how to get a performance out of them, they can disengage from a project quite quickly.”

This is not wisdom that has percolated down to all game developers, however, many of whom, for budgetary or aspirational reasons, repurpose members of staff from elsewhere in the company to act as directors or writers, when they are perhaps not as well qualified as they might believe. While Emery says the majority of Side’s clients now use external directors, there are still some horror stories: “Sometimes people will say: ‘Oh, well, I used to work in this drama group’. That’s fine, but it is the equivalent of me saying: ‘Well, I did a bit of claymation at school – can you let me have a go at one of your character models?’ When our directors aren’t working for us, they’re directing theatre or directing TV. They are directors – that’s what they do. We’re vocal enough that developers know the implications of using a director without directing experience, but we can’t go too much further than that.”


Image Metrics' videos show that it can match a video of a face without markers; Starkenburg says, "We have a version of Emily (above) that'll run in a game engine - we just can't find one that can do it!"

“What we see ranges from the animator writing the dialogue to productions similar to a Hollywood movie with a director and a second,” says Image Metrics’ Starkenburg. “There is a pattern developing, and the people who are most organised tend to be the people working on franchises. Storytelling in games is coming. People are doing more of it, but they’re used to being limited by the technology. I don’t think they believe they can get the subtlety in the movement, so they don’t do all the other stuff that’s required to make that a good shot. You need great writing, great casting, great acting, great directing – if you have all that and a great engine there’s no reason we can’t put that performance in a game. It’s just that all this has rarely come together for games.”

Perceptions are changing, however, and developers are slowly waking up not only to the importance of getting drama right, but also the humbling fact that they might not always be the best ones to do it. Emery explains: “We used to see a lot of work which was just a case of putting a cast together and letting the developer direct and record it, however hit or miss that may be. And that was a progression from: ‘Have you got a studio we can record in?’ It’s an important step-change we’ve seen already.”


More from Image Metrics’ face-matching videos

“People are slowly starting to realise that mo-cap can be a good and easy thing,” says Klepper when we ask if Imagination Studios finds that there’s still a need to educate developers about mo-cap’s pitfalls and solutions. “Unfortunately there’s a whole history of studios that churn out mo-cap data with the intent of producing quantity over quality. It’s tricky because mo-cap data can be a real pain in the ass. How it’s solved to the rig, what kind of rig it is, how it gets put into the engine – all kinds of things can go wrong, especially when studios just hand over data. I’d say 60 per cent of the clients who come to us say that they hate motion-capture – and then they realise pretty soon that it can actually be a pleasure.”

“It’s about changing the way people work as much as it is about changing the amount of budget they allocate,” says Emery. “To get the most out of a full performance-capture scenario, it’s about getting people involved early. It’s about being organised and considering that you might want to do your voice at the same time as your motion-capture. It’s about having had a great scriptwriter involved from an early stage.”

There are, as Audiomotion’s Morris observes, many ways to skin a cat. The choices facing developers are a little overwhelming – the primary decision being whether, and how, to break the capture down into multiple sessions, separating motion and audio. But there are also different technologies involved in capture, each with their own champions, benefits and drawbacks.

The method most will be familiar with involves markers being placed all over a body, and a large number of static cameras tracking the movement of these markers to build up a 3D picture of the body’s movements. It’s a technique that can easily be scaled up to include multiple actors interacting on a soundstage, or scaled down to capture the minor movements of the face. Voice can also be recorded at these same sessions, permitting the full performance to be captured in a single sitting.
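As a rough illustration of how those cameras turn 2D marker sightings into a 3D position – this is textbook linear triangulation, not any particular studio’s pipeline – each calibrated camera contributes two linear constraints on the marker’s location, and the best fit falls out of a singular value decomposition:

```python
import numpy as np

def triangulate_marker(projections, cameras):
    """Least-squares 3D marker position from several camera views (DLT).

    projections: list of (u, v) image coordinates, one per camera
    cameras:     list of 3x4 projection matrices, one per camera
    """
    rows = []
    for (u, v), P in zip(projections, cameras):
        rows.append(u * P[2] - P[0])   # two linear constraints per camera
        rows.append(v * P[2] - P[1])
    _, _, vt = np.linalg.svd(np.vstack(rows))
    X = vt[-1]                         # null-space vector = homogeneous point
    return X[:3] / X[3]

# Two toy cameras: one at the origin, one translated along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
marker = np.array([0.5, 0.2, 4.0, 1.0])          # ground truth, homogeneous
uv1 = (P1 @ marker)[:2] / (P1 @ marker)[2]
uv2 = (P2 @ marker)[:2] / (P2 @ marker)[2]
print(triangulate_marker([uv1, uv2], [P1, P2]))  # ~ [0.5, 0.2, 4.0]
```

With dozens of cameras on a soundstage, the same least-squares machinery keeps each marker tracked even as some views lose sight of it.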



“In an ideal world, I’d go for full performance-capture every time,” repeats Emery. “But whether you’re capturing that physical or facial performance or not, we all give a different performance when we’re moving. We work very hard to try and incorporate that physicality into as many performances as possible. For a long time we’ve had a lot of disparate elements put together to create a single performance, and I think we’ve all seen how they can suffer from that. Anything you can do to tie those elements together is enormously beneficial.”

But there are plenty of instances when this isn’t possible – and the video-capture technology pioneered by Image Metrics offers a useful alternative to marker-based facial-capture methods.

“You can’t get the same degree of subtlety with markers,” claims Starkenburg. “The process of putting markers on is complicated, and you can easily occlude some of the dots when you scrunch your face up – or they can fall off. Body motion-capture works really well, but the limbs are fairly large and well-spaced. Faces are so small and to capture an emotion requires so many little movements. What a lot of people do right now is mo-cap the body and hand-animate the face. But if you’re doing it by hand, it’s time-consuming. [Video-capture] is affordable and efficient. You can divide that up between really high quality or really high volume, but either way it’s much more effective than doing it by hand.

“We capture from video – any video,” Starkenburg continues. “We’ve actually had people take stuff on their iPhone and been able to use it. Most of our customers capture video while they’re doing a voiceover, but we can use any picture of a face that is relatively straight on – we can go 20 degrees in either direction. That’s where the maths comes in. We then plot changes in the values of texture and light from frame to frame and use some statistical validation to say: ‘This is a face, so that must be an eye’. The human face has a statistical average: the eyes can only be a certain amount apart, the nose will be above the mouth, the tip of the nose will be in front of the cheek, and so on. We actually started off a lot more generalised, not just dealing with faces, and have since spun out a medical company to look at X-rays. They’ll look at an image and say statistically a spine looks like this, and if they then take a long series of X-rays – say, one a month for two years – they can look at the changes in the same way we look at frames in a video, and diagnose disease.”
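Image Metrics’ method is proprietary, but in spirit the ‘statistical validation’ Starkenburg describes resembles the point-distribution models of academic face tracking: per-frame landmark estimates are projected into a low-dimensional face space learned from many faces, and implausible shapes are clamped back towards anatomical plausibility. A hypothetical sketch:

```python
import numpy as np

def constrain_to_face_prior(candidate, mean_shape, modes, limit=2.0):
    """Snap raw tracked landmarks back to a statistically plausible face.

    candidate:  (2N,) flattened x,y landmark positions from the tracker
    mean_shape: (2N,) average face shape learned from training data
    modes:      (2N, k) PCA modes, each scaled by its standard deviation
    limit:      cap on each coefficient, in standard deviations
    """
    coeffs = modes.T @ (candidate - mean_shape)  # project into 'face space'
    coeffs = np.clip(coeffs, -limit, limit)      # reject implausible shapes
    return mean_shape + modes @ coeffs           # reconstruct a valid face

# Toy example: three landmarks (six coordinates), one mode of variation.
mean = np.array([0.0, 0.0, 1.0, 0.0, 0.5, 1.0])
mode = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 1.0])   # unit mode vector
tracked = mean + 5.0 * mode                        # tracker overshot badly
print(constrain_to_face_prior(tracked, mean, mode[:, None]))
```

This is how a statement like “the eyes can only be a certain amount apart” becomes maths: any frame whose landmarks fall outside the learned distribution is pulled back to the nearest shape that is still a face.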

The quality of the results is astounding – so much so that one of Image Metrics’ demonstration videos tricks you into thinking you are watching the capture footage when in fact it is the CG model. But though, as Starkenburg says, technology which allows you to pull 3D data from a 2D image is “really very cool”, it’s clear that it is not without its drawbacks.

“The problem I have with the video-capture method is that the actor has to look straight down a camera,” says Emery. “You can have head cameras, and with a single actor that’s OK, but feeding back the comments I’ve heard from actors and directors, if you’ve got a group of actors performing an action scene or an intimate scene and they’ve all got little cameras looking back at their faces, it can be quite difficult. Interestingly, if you have markers attached to your face, you soon forget about it. It’s not the same barrier to performance. The video-capture tech works well for things like RPGs where you’ve got a vast quantity of dialogue – I can see real merit in those scenarios. And if you’re using a marker-based technology then you still have to do the eyes. Video-capture can track the eyes.”

“The thing you notice with video-capture is how much unusual eye movement there is,” says Starkenburg. “If you walk down a street and you see a girl sitting on a bench and you’re checking her out, your eyes are all over the place – they’re not doing what you think they’re doing. An animator trying to estimate that doesn’t get it right. We recently did this sports game – when the guys are running around and jumping and dunking, their faces are really expressive and it’s not something a hand-animator would think about.”

There are circumstances in which hand animation does the trick, however – particularly if you want larger-than-life results. The expressive faces of Uncharted 2’s cast, for example, were animated by hand.

“A lot of that’s to do with the uncanny valley issue,” says Richard Scott, managing director of Axis Animation, another collaborator with Audiomotion and Side. “An expression might look fine on a real person; when it’s superimposed on a computer-generated character it loses its realness. You want to be able to exaggerate. A subtle smile might not read so well on a CG character, so you want to push that smile a little bit. So that’s why people choose to refine the motion-capture or keyframe the faces from scratch. We actually chose to keyframe all of the Killzone 2 intro animation. We couldn’t get Brian Cox in a full-performance setup, or get a camera to shoot Brian while he was doing the voice – but keyframing gave us that little bit of extra flexibility to push him into hyper-realism.”


Sean Pertwee playing Killzone 2's Colonel Radec

Of course, there’s one remaining question for developers: what can you afford? The studios we speak to are quick to insist on the relative good value of this investment (“It’s a cost-effective way of engaging your player – getting that extra percentage score, getting those extra column inches,” says Emery) but hiring out soundstages, directors and so on is clearly something that comes with a substantial price tag. Imagination Studios and Audiomotion are positioning themselves as high-end services, looking to the needs of big-budget titles. As such, they are reluctant to jeopardise their reputation for quality by offering cheaper options.

“Even if a client asks us for raw mo-cap data, we won’t deliver it,” says Klepper. “We don’t insist on building the rigs ourselves but we do insist on solving the data to the rigs, so that we can at least ensure that when it leaves Imagination Studios it looks pretty good. If we can tweak the rig, or build a new one, we can get it much, much better. The benefit that our clients get is trust. We’re going to listen to everything they need, we’re going to send them tests, so they know that everything is perfect before any production work gets under way.”

“We are, and always have been, about quality – so we can’t launch a budget range,” echoes Morris. “But there are other solutions. You’ve got software where you load in the audio files from your voiceover session and it generates mouth shapes (see ‘Marker my words’). That, for lots of people, gives satisfactory results, but it misses all the nuance of real performance from real actors. There are subtle things that a director can bring to a session that you’re not going to get out of a piece of software.”
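Tools of the kind Morris mentions typically have a speech recogniser label the audio with timed phonemes, then map each phoneme to one of a handful of visemes – the mouth shapes the rig actually exposes as controls. A hypothetical sketch, with the phoneme set and mapping invented for illustration:

```python
# Invented phoneme-to-viseme table -- real tools and phoneme sets differ.
PHONEME_TO_VISEME = {
    "AA": "open",   "IY": "wide",   "UW": "round",
    "M":  "closed", "B":  "closed", "P":  "closed",
    "F":  "teeth",  "V":  "teeth",  "SIL": "rest",
}

def visemes_from_phonemes(timed_phonemes):
    """Turn a timed phoneme track into mouth-shape keyframes.

    timed_phonemes: list of (start_seconds, phoneme) tuples, assumed to
    come from a speech recogniser run over the voiceover audio.
    """
    keys = []
    for start, phoneme in timed_phonemes:
        viseme = PHONEME_TO_VISEME.get(phoneme, "rest")
        if not keys or keys[-1][1] != viseme:   # drop redundant keyframes
            keys.append((start, viseme))
    return keys

# 'map' -> M, AA, P
print(visemes_from_phonemes([(0.00, "M"), (0.08, "AA"), (0.21, "P")]))
# [(0.0, 'closed'), (0.08, 'open'), (0.21, 'closed')]
```

The mechanical mapping is exactly why Morris calls the results merely satisfactory: every actor who utters “AA” gets the same mouth, which is the nuance a directed session preserves and a lookup table cannot.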

But you don’t always need that level of subtlety in a game, where a mass of background characters may be suitably furnished with rudimentary face animations. Image Metrics’ video-capture technology is thus the most scalable in terms of cost, since it doesn’t require a stage, markers or an elaborate camera setup, and can be used to rapidly produce the 3D facial data for reams and reams of dialogue.

“The top tier of games and the bottom tier of film are almost the same,” says Starkenburg. “That tier is what we call the ‘hero shot’. But you don’t need that when the character is 35 feet away and facing three-quarters in the opposite direction. And there’s lots of that in games, and what we offer is less than half the price of hand animation.”

Regardless of the level of animation, the one thing that all of the studios interviewed here stress is the need for a quality script. All the technology in the world will come to nought if the dialogue is leaden and absurd.

“The reality is we aren’t telling great stories yet,” explains Starkenburg. “It’s not really about the ability to get good motion-capture, it’s about writing and direction. But it would be frustrating for a great director to come to games if he was unable to project his vision. So if our technology becomes more available, they will invest more in the front-end of actually getting these great performances.”

Be it using markers or video-capture, recorded as a full performance or in disparate sessions, the technology is there to produce the data developers need to create convincing movement. The next step is working out how to move the player.