MoCap MIT: an essay on mocap from MIT
Motion Capture
by Maureen Furniss
Right from the beginning, one finds that talking about motion capture can be difficult. Deciding which term to use is one of the challenges, since so many exist:
Also heard are 'virtual theater', 'digital puppetry', and 'real-time animation'.
The variety of terms one hears reflects not only the newness of the field, which is still trying to define itself and perfect its technology, but also the many directions in which motion capture is going. It is employed in music, fine art dance/performance, sign language, gesture recognition, rehabilitation/medicine, biomechanics, special effects for live-action films, and computer animation of all types, as well as in defense and in athletic analysis and training. In this paper, I will use the generic term 'motion capture' or 'mocap' for the sake of simplicity. With so many options, perhaps I should address the most basic question: "What is Motion Capture?"
Writing an overview 'White Paper' on the subject in 1995, Scott Dyer, Jeff Martin, and John Zulauf explain that motion capture "involves measuring an object's position and orientation in physical space, then recording that information in a computer-usable form. Objects of interest include human and non-human bodies, facial expressions, camera or light positions, and other elements in a scene." In most instances, a live subject, most likely human (but possibly an animal or puppet), is used as the source of data that is transformed into another form. However, an inanimate object also can be used for motion capture purposes. Perhaps the best known of the inanimate motion capture models is the 'Monkey', a 23-inch articulated mannequin with 39 joints, created by Digital Image Design. A Monkey, of course, differs from a live subject in that it provides pose data but no timing information. Dyer, et al., explain that the Monkey provides an easy transition for animators trained in stop motion, since it relies on posed models rather than the direction of live performers. As with traditional animation and many other arts, mocap actually consists of a number of phases. Dancer Lisa Marie Naugle, an Assistant Professor at the University of California, Irvine, identifies them as follows:
There is another question about motion capture I will consider briefly, but not attempt to answer: "Is it Animation?" In a 1999 article in Animation World Magazine's special issue on motion capture, Brad deGraf and Emre Yilmaz respond that the debate is one of semantics. I tend to agree. I am less concerned with arguing for or against motion capture, or comparing it qualitatively to other animation techniques, than with exploring its current state and (eventually) the ways in which spectators in entertainment and fine art applications respond to figures based on 'realistic' motion capture data (that is, characters based on the performance of humans).
The Question of Artistry
Most observers agree that the term motion capture already has a negative connotation. In a way, I see motion capture as being in a situation similar to that of limited animation, which also has been derided for its aesthetics. Like rotoscoping, limited animation and motion capture are seen as "technical cheats," to use the words of Greg Pair, of AM Pnyc. He notes that the same stigma attached to CGI when it first appeared, but he thinks "when technology and output improves, motion capture will be seen as yet another new medium and not a replacement for the traditional media." He also suggests that some of the disdain for motion capture stems from a fear of losing jobs to this technology. Throughout the history of animation, producers have sought ways of reducing the time and money needed to create animated images. Certainly, limited animation and rotoscoping can be conceived of as time-saving devices, though creative uses of these techniques can lengthen the production process and raise its costs. Motion capture (like computer animation in general) is often touted as saving both time and money, but it is not yet at the point where it is always a 'better deal' than traditional techniques.
One must know the technology well enough to judge when it actually will produce more economical results. Nonetheless, it is common to hear statistics like those published in a 1999 Animation World Magazine article by Deborah Reber about the Nickelodeon network's use of motion capture; she explains, "based on Nickelodeon's model, a half-hour motion capture animation program could cost as little as $200,000 versus a minimum of $400,000 per episode for a traditional cel animated half-hour program." In a 1997 Los Angeles Times article, Marla Matzer provides another perspective on costs: "Medialab... estimates that a 'full body' character can be animated for as little as $1,000 per minute; over a series of shows, the price can go even lower. Cel animation, by contrast, can cost as much as $5,000 a minute." She explains, however, that "these prices don't include other production elements, such as backgrounds"; presumably, they also do not include post-production enhancements to facial features and other character movements. Also, most quoted prices are for the capture of a single character, typically a host on a children's show who has no physical interaction with other characters, since such interaction costs more to capture. Seth Rosenthal sums up the situation by saying, "Motion capture is irreplaceable for some applications and inappropriate for others. When used well, it can be incredibly cost effective, but, in practice, some of those savings are displaced to other parts of the production." In 1995, Dyer, et al., wrote that "motion capture is one of the hottest topics in computer graphics today.
It is also often poorly understood and oversold." Rosenthal seems to agree, saying one of the biggest problems is "that the community of vendors have exaggerated the capabilities of the technology and are given to making claims that are almost guaranteed to offend anyone working in animation: usually, regarding how much cheaper it is, how much superior it is to traditional animation, etc." Today, it seems that America remains less accepting of mocap than other countries, where the technology is used in entertainment to a greater extent. In any case, I find it interesting that this process lacks respect within the 'mass art' of animation but enjoys much more support within the fine arts, particularly dance and music (not to mention other areas, such as the sciences). Usually, it is the other way around; in this case, however, I have found more fine artists who look favorably on motion capture, while the immediate response of animators (except those working with motion capture, of course) has been strongly negative. Within the entertainment industry, one sees various indications that motion capture is being resisted. For example, a June 1999 Vanel article reports that the Academy of Television Arts and Sciences did not allow "Donkey Kong Country," produced by Nelvana Communications, to compete in the animation category of the Emmy Awards. In the same article, Bob Kurtz is quoted as saying, "Animation is about creating an illusion of motion that doesn't otherwise exist. (Mocap) doesn't involve the same artistic input and creativity." Note that the emphasis in Kurtz's statement is on the origin of the image (within a creative mind). In motion capture, a significant part of the creative process occurs during post-production, when data is manipulated to become animated imagery.
Sometimes imagery is manipulated 'on the fly,' as the performance of real-time animation is taking place; this might be compared to disc jockeys who mix and scratch records as they are played to create new and original musical compositions from pre-recorded music. Seth Rosenthal, of ILM, argues that creativity does in fact occur early in the mocap process. He insists that "Mocap is much more about performing, directing talent and benefiting from the spontaneity of live performance." Chris Walker, one of the founders of Modern Cartoons, offers an altogether different view of the 'artistry' of motion capture. He contends that "Motion Capture is a different art form. It's not classical animation but it is animation. It's transporting the viewer away from reality." Here the emphasis is on the final product and its status as a fantasy construct. Dr. Norman Badler of the Center for Human Modeling and Simulation argues that motion capture is an aspect of animation. He says, "Motion Capture is basically 3-D rotoscoping. If you accept rotoscoping as a form of animation then you have to accept motion capture." The problem with his argument is that many people do not accept rotoscoping as a form of animation. Greg Pair believes that "the stigma [of the rotoscope] seems to have rolled over" to motion capture. [23] Certainly, mocap shares with roto animation a close relation to a model's form (human motion). Although the extent to which the two (model and image) are related can vary dramatically, some feeling of the 'presence' of a human being still exists in most animation of these types. When watching a film like Gulliver's Travels (1939), it is of course easy to sense that Gulliver is quite different in essence from the little people around him; you don't have to know that rotoscoped footage was used to create him in order to sense that difference.
A related animation technique involves the use of 'reference footage'; that is, a live model/performer is filmed and artists study his or her movements closely (frame by frame) in order to create animated movement. In this case, the live person is more distant from the end product, since the reference footage only suggests how images should look and does not literally provide their basis. In a way, the artist using reference footage is making a drawn record of what he or she sees: a print record (ultimately transformed into a motion one) of a series of movements. From this perspective, we might see the animator's work as a form of visual 'notation'. That is how Lisa Marie Naugle describes motion capture in terms of her dance performance work. Notation, as I use the term here, generally refers to a recording of movement in print form, so that it might be preserved, studied, and perhaps re-enacted at some future time. Ethnographic researchers can use notation, for example, to record ceremonial dances that are on the verge of 'extinction' because the people who perform them are becoming integrated into another culture. One of the best known forms of dance notation is the Laban dance notation system (in fact, a software program called LabanWriter can be integrated into the motion capture process). Naugle compares Laban and motion capture, as two forms of notation, with video and film recording. She notes that the benefits of motion capture over other sorts of notation are that it allows analysis from any point of view and that it can be visualized in 3D form. She explains, "Looking at dance images from different locations and perspectives, notators, choreographers and dancers can create annotations or list notes about the work... While video may be used repeatedly to extract information about color, motion, and, to a limited extent, depth, it is often lacking in detail or definition.
Even if the video has been edited from several different perspectives, the medium does not allow for a full exploration of movement in three dimensions." This sentiment is echoed in an article about a dance performance by Bill T. Jones that employs mocap technology:
Dancers stroke a metaphorical canvas, sketching ephemeral lines that are lost in the moment of creation. The invention of film and video immortalized some of this century's greatest dancers, preserving their movement for the next generation. But the quality of such recordings betrays the vitality of the dancer, often leaving the viewer with snatches of their genius seen through a murky lens. Motion capturing, the product of body sensors that create a constellation-like skeleton reassembled with computer brushes and palettes, may change the dancer's predicament de rigueur by invigorating a self they didn't know they had. Paul Kaiser and Shelley Eshkar of Riverbed have motion captured the legendary dancer Bill T. Jones for their exhibit at Cooper Union. The artists call it "Ghostcatching," the term Native Americans gave to photography which some believed stole their souls.
Motion capture also can be described as a sort of 'sampling', a term that is perhaps most familiar from music, where bits of pre-recorded music, dialogue or other sounds are recorded and mixed into a new composition (I've already compared record 'scratching' to real-time animation). It seems common for supporters of motion capture technology to compare its process to music recording. Brad deGraf and Emre Yilmaz go so far as to describe it as "a new kind of jazz." It is also not difficult to see parallels between motion capture and certain electronic instruments, such as the Theremin, which translates human movement into sound.
This is not to suggest that all or even most motion capture has reached the level of artistry one might associate with accomplished musicians and music technology, but rather to offer a way of conceptualizing how motion capture might work and, in some cases, does. In animation, motion capture can be used for different aspects of production. For example, captured motions can be used directly, in real time, with or without secondary animation of hands and face in post-production. The captured data also can be applied to characters and modified completely during post-production. Sometimes mocap is used only as reference material, in the way filmed reference footage is used. As Richard Cray, founding director of the Performance Animation Society, notes, "very often animators rehearse their character's moves themselves prior to keyframing or hand positioning the various elements. They can be seen dancing about, crawling on the floor and performing other acts of physical movement in their workspaces in the process of designing their character's motion. All Motion Capture does is add the capability of tracking these rehearsals if an animator chooses to do them and helps package their performance in a digital format for them to use and reference as they see fit." According to Seth Rosenthal, mocap is used as the basis of animatics at ILM. He says, "We have been using magnetic for animatic work because it is fast and inexpensive, and we have been using optical for feature work because it is accurate."
Types of Motion Capture
It is sometimes suggested that the roots of motion capture can be seen in the motion studies of Eadweard Muybridge and Etienne-Jules Marey. In the form we think of it today, mocap technology has been developing since the 1970s, when it was created for military use, and has been used in entertainment since the mid-1980s. Over the years, mocap has taken many forms, each with its own strengths and weaknesses.
Following is a summary of three types of mocap used in entertainment and the ways in which they work. Examples and further description of all these types of motion capture systems are available on a website at La Trobe University, "Introduction to Motion Capture in Music."
1. Mechanical: the performer wears a human-shaped set of straight metal pieces (like a very basic skeleton) that is hooked onto the performer's back; as the performer moves, this exoskeleton is forced to move as well, and sensors in each joint feel the rotations. Other types of mechanical motion capture involve gloves, mechanical arms, or articulated models (like Monkey), which are used for 'key framing'.
2. Optical: the performer wears reflective dots that are followed by several cameras, and the information is triangulated between them. Markers are either reflective, as in systems manufactured by Vicon or Motion Analysis, or infrared-emitting. Many optical systems have been developed for musical applications (such as conducting), though the technology was developed primarily for biomedical applications (sports injuries, analysis of athletic performance, etc.).
3. Electromagnetic (magnetic): the performer wears an array of magnetic receivers that track location with respect to a static magnetic transmitter. One of the first uses was military, to track the head movements of pilots. Often this type of motion capture is layered with animation from other input devices. The two main manufacturers of this type of motion capture equipment are Polhemus and Ascension.
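The optical entry above says that marker information is "triangulated" between cameras. To make that idea concrete, here is a minimal sketch in Python with NumPy. It assumes each camera has already been calibrated into a 3x4 projection matrix and that the same marker has been identified in every view; the two-camera rig, the focal length, and the function names `triangulate_marker` and `project` are hypothetical, and commercial systems use their own, more robust methods. The technique shown is standard linear (DLT) triangulation: each view contributes two linear equations in the marker's homogeneous position, and the least-squares solution comes from a singular value decomposition.

```python
import numpy as np


def project(P, point):
    """Project a 3D point through a 3x4 camera matrix P to pixel coordinates (u, v)."""
    x = P @ np.append(point, 1.0)
    return x[:2] / x[2]


def triangulate_marker(projections, pixels):
    """Estimate one marker's 3D position from its pixel coordinates in several cameras.

    projections: list of 3x4 projection matrices (assumed known from calibration)
    pixels: list of (u, v) coordinates of the same marker, one per camera
    """
    rows = []
    for P, (u, v) in zip(projections, pixels):
        P = np.asarray(P, dtype=float)
        # Each view gives two linear equations in the homogeneous point X.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.vstack(rows)
    # The right singular vector with the smallest singular value minimizes |A @ X|.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]  # back from homogeneous to ordinary 3D coordinates


# Hypothetical rig: two identical cameras, the second one metre to the right.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

marker = np.array([0.2, -0.1, 3.0])  # true position in metres
pixels = [project(P1, marker), project(P2, marker)]
estimate = triangulate_marker([P1, P2], pixels)
print(estimate)
```

With noise-free pixel coordinates, as here, the estimate recovers the original marker position. With real camera noise, each additional camera adds two more equations, which generally makes the least-squares estimate more stable and helps when a marker is hidden from some views.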
In their 1995 "White Paper," Dyer, Martin and Zulauf explain:
The typical magnetic motion capture session is run much like a film shoot. Careful rehearsal ensures that the performers are familiar with the constraints of the tethers and the available 'active' space for capture. Rehearsal often includes the grips for the cables to ensure that their motion aligns to the motion of the performers. The script is broken down into manageable shot lengths and is often story boarded prior to motion capture. Each shot may be recorded several times, and an audio track is often used as a synchronizing element. Because the magnetic systems provide data in real-time, the director and actors can observe the results of the motion capture both during the actual take and immediately after, with audio playback and unlimited ability to adjust the camera for a better view. The tight feedback loop makes magnetic motion capture ideally suited for situations in which the motion range is limited and direct interaction between the actor, director, and computer character is important.[34]
Today, wireless magnetic systems are available, from Ascension for example, though the performer still must wear a relatively bulky pack of equipment on his or her suit. Other types of motion capture technologies include: sonic, which employs ultrasound and is subject to several types of interference; biofeedback sensing, which measures bodily activity of the heart, brain, retina, eyes, skin, and muscles, and is used extensively in biomechanical and sports-related work, but also has been used for music performance; electric field sensing, in which the body works either as a transmitter or as a source of interference, which is measured; inertial systems, which measure acceleration, orientation, angle of incline and other characteristics; and video, employing optical technologies that can detect changes in luminescence and color.
Part of what makes motion capture technology such a challenge is the speed at which everything must occur. In real-time mocap, within 1/30th of a second (the length of one frame of video), motion must be sampled, data must be applied to a digital scene representing the various body parts of a character, and the scene must be rendered into a digital image. Depending on the system used, signal interference can impede accurate collection of data. In full body motion capture, sensors, or markers, are typically placed at selected joints on the performer. Several markers make up a body segment, and each has a 'weight', that is, influence or priority in the bone hierarchy (degrees of freedom). Movements of performers captured in real-time mocap can be supplemented with automated movements, such as blinking, breathing, hand gestures, or secondary actions (for example, when a foot hits the ground, its toes spread out). These are called 'expressions': program components written to control a number of low-level features from one high-level attribute, so that the movement becomes more interesting. Related to this are voice recognition systems, such as one developed by Shane Cooper at Protozoa, which allow almost real-time synchronization of mouth movements with words; however, the movements necessarily follow the words and so are at least slightly out of sync. Cooper says that the proper way to work is actually through the audio bank: to have visuals rendered from audio information, not vice versa. He has not been able to perfect this approach, though, because the mocap systems he works with are led by visual material (that is, movement of some sort other than sound waves). Although the above categories (mechanical, optical, magnetic, sonic, biofeedback, electric field, inertial, and video) are common ways of classifying mocap, another means is by whether a system is 'active' or 'passive'.
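The idea of an 'expression', one high-level attribute driving several low-level features, can be sketched in a few lines of Python using the essay's own example of toes spreading when a foot hits the ground. Everything here is a hypothetical illustration rather than any particular animation package's expression language: the attribute `foot_contact` is derived from the captured foot height with assumed thresholds, and the per-toe spread angles are invented values.

```python
from dataclasses import dataclass

# Hypothetical maximum spread for each toe in degrees (big toe to little toe).
TOE_SPREAD_MAX_DEG = [4.0, 2.0, 0.0, -2.0, -4.0]


@dataclass
class ToePose:
    spread_deg: list  # one low-level joint angle per toe


def foot_contact(foot_height_m, contact_height_m=0.02, blend_m=0.03):
    """High-level attribute derived from captured data: 1.0 when the foot is on
    the ground, fading linearly to 0.0 as it rises through the blend zone."""
    if foot_height_m <= contact_height_m:
        return 1.0
    if foot_height_m >= contact_height_m + blend_m:
        return 0.0
    return 1.0 - (foot_height_m - contact_height_m) / blend_m


def toe_spread_expression(contact):
    """Expression: one high-level value drives several low-level toe angles."""
    return ToePose([contact * max_deg for max_deg in TOE_SPREAD_MAX_DEG])


# Usage: captured foot heights (metres) for three frames of a hypothetical step.
for height in [0.10, 0.035, 0.01]:
    pose = toe_spread_expression(foot_contact(height))
    print(f"foot at {height:.3f} m -> toe spread {pose.spread_deg}")
```

The performer never has to control the toes directly: the captured foot height feeds the single attribute, and the expression fans it out to five joints, which is the division of labor the essay describes.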
Active devices include magnetic equipment and synchronized lights, if used in optical motion capture, while passive systems most commonly refer to the use of reflective markers in optical mocap. Passive systems usually are more economical, since active devices undergo more 'wear and tear'. However, the passive optical mocap process overall generally costs more than the active magnetic process, since optical usually involves more post-production, while magnetic is often employed for real-time animation without labor-intensive post-processing. Motion capture itself is also divided into categories, most commonly body movement, facial capture, and hand gestures. Special facial capture systems and gloves can be used to record the subtler movements of faces and hands that add personality to animated images. Typically, this recording is done separately from the body capture, though it can be done all at once. Shane Cooper notes that "people are very critical of facial animation," [38] in terms of sensing the 'realism' of a character. Though audiences are perhaps not as critical of characters' hands, these appendages are nonetheless very important to the animation process (recall that Lotte Reiniger relied on hand gestures to give dimension to her silhouette animation; she felt that "with silhouettes, the hands are one of the few ways which can convey characters' feelings"). As a result, most 'high quality' mocap involves post-production work on facial animation and hand gestures, to create nuances that make characters more complex. In low-budget productions, however, little post work is done, and as a result close-ups tend to be avoided, as Deborah Reber observes in her Animation World Magazine article on the new "Voltron" series. In another AWM article, Reber explains an approach to facial animation that is used in more elaborate productions.
The feature-length Sinbad: Beyond the Veil of Mists, produced by Pentafour in 1999, employed one performer for the body of each character and another for its facial data. Body performers were chosen because they matched the height and body shape of the characters in the film. Voice performers for the film are recognizable stars (including Brendan Fraser, Leonard Nimoy, Mark Hamill, Jennifer Hale, and John Rhys-Davies), a common marketing decision for feature films. Incidentally, shooting of the film's studio material lasted eight weeks, and the film's cost is estimated to be under US$20 million, about one-fifth that of a Disney animated feature.
Future Directions
Is there a direction in which motion capture development seems to be heading? It is difficult to tell, because opinions vary on which technology is most promising. Shane Cooper, who is presently working at the innovative ZKM Center for Art and Media in Germany, feels that magnetic systems offer the best quality data; however, he sees a lot of research going into optical systems because of their flexibility. Seth Rosenthal notes that optical mocap usually is employed in feature film production (examples can be found in Titanic and Star Wars Episode I: The Phantom Menace) because of its higher quality data, while magnetic is used for live, interactive situations. In any case, one can find assorted issues being addressed in motion capture research and development as a whole. Considering these points can help the analyst assess the aesthetics of motion capture work. Often, one will find hardware and software manufacturers emphasizing the ways that their products solve some of the following problems: geometric dissimilarity, different movement qualities in performers and cartoony characters, increasing data collection ability, and increasing the number of performers that can be captured simultaneously.
Geometric dissimilarity between the performer and the character he or she is playing has always been one of "the most difficult problems facing motion capture animators." In 1999, deGraf and Yilmaz discussed this point, saying "a crucial step in going beyond motion capture is re-proportioning data to fit non-human shaped characters. Making human-shaped data work on one of these characters, without introducing ugly artifacts like skating feet [feet that slide along the ground, as an effect of mismatched proportions], is a challenge and an art." There also tend to be differences between the ways in which humans and classically animated characters move: their rates of acceleration and deceleration differ. Because motion capture records movement at the even pace of natural human motion, it can be difficult to create truly cartoony movement. Only relatively limited amounts of data can be collected, which tends to be seen as a liability. However, some find it is actually better to use fewer data points. DeGraf and Yilmaz are among them, saying, "Ironically, it's often better to have less data than more: we usually use only 12 body sensors. If you had one sensor for every single moving part of the body, you'd have a lot more information tying you to the human form, but for our purposes we just want enough sensors to convey the broad lines and arcs of the body." [46] By not tying a character too closely to the actual human form, one can minimize the obvious link between the character and its human model. Seth Rosenthal's opinion is that the capability for capturing data is now quite good, but incorporating that data into sophisticated animation remains difficult. That is, the capture technology itself is more advanced than the processing and post-production technology. He explains, "the software tools are not adequate at this point, so it requires a high level of expertise to use the data effectively."
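One simple way to see why mismatched proportions produce 'skating feet', and how re-proportioning can reduce them, is to track a single planted foot during a step. The Python sketch below works in one horizontal dimension and assumes that the character reuses the performer's joint angles, so the foot's offset from the pelvis (root) scales with the leg-length ratio. Scaling the root's path by the same ratio keeps the foot planted. This is a common simplified heuristic, not deGraf and Yilmaz's method, and the leg lengths, positions, and function names are hypothetical.

```python
def retarget_root_x(root_xs, performer_leg_len, character_leg_len):
    """Scale the captured root's horizontal path by the leg-length ratio."""
    s = character_leg_len / performer_leg_len
    return [x * s for x in root_xs]


def character_foot_x(root_xs, foot_offsets, performer_leg_len, character_leg_len):
    """Foot position = root position + limb offset. Reusing the performer's joint
    angles on a differently sized character scales the offset by the leg ratio."""
    s = character_leg_len / performer_leg_len
    return [r + off * s for r, off in zip(root_xs, foot_offsets)]


def foot_slide(foot_xs):
    """How far a supposedly planted foot travels during the stance frames."""
    return max(foot_xs) - min(foot_xs)


# Hypothetical stance phase: the performer's foot stays planted at x = 0 while the
# pelvis (root) travels from -0.2 m to +0.2 m, so the foot's offset from the root
# goes from +0.2 m to -0.2 m.
performer_root = [-0.2, 0.0, 0.2]
foot_offset = [0.2, 0.0, -0.2]
PERFORMER_LEG, CHARACTER_LEG = 0.9, 0.45  # a character with half-length legs

naive = character_foot_x(performer_root, foot_offset, PERFORMER_LEG, CHARACTER_LEG)
scaled_root = retarget_root_x(performer_root, PERFORMER_LEG, CHARACTER_LEG)
fixed = character_foot_x(scaled_root, foot_offset, PERFORMER_LEG, CHARACTER_LEG)

print("slide with raw root path:", foot_slide(naive))     # the foot 'skates'
print("slide with scaled root path:", foot_slide(fixed))  # the foot stays planted
```

Uniform scaling only works when the character is roughly a resized human; practical retargeting tools typically add further constraints, such as inverse kinematics that pin feet to the ground, to handle genuinely non-human proportions.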
Motion capture often requires performers to be recorded separately. This is true of multiple motion capture characters, or even of one character composed of data from separate voice and body performers. When performers are composited, the characters are less integrated with each other, because the performers were not together when they were 'filmed'. The same is true of dubbing sessions: sometimes voice performers are recorded separately and sometimes as an ensemble, and when performers are not together as they are recorded, they cannot act off each other. However, when groups of performers are captured simultaneously, other problems arise. For example, the number of polygons available to be digitized for each performer is decreased, reducing image quality. Other goals being set for future mocap development include: enhancement of performance conditions through lack of tethering and simplification of performance apparel; increased speed of the technology; increased 'volume', or area in which performances can be captured; lower cost, so that consumers and independent artists can have access and experiment with and expand the technology; increased accuracy of the results, including improved physical abilities, so that characters can touch each other and feet meet solidly on the ground; greater ability to capture data from multiple characters; and a combination of virtual reality and existing mocap technologies, which could aid in technological development.
Concluding Remarks
This paper has summarized some basic historical, technological and aesthetic points related to motion capture, particularly in terms of entertainment and fine art. My future research will explore the connection between these two groups more deeply. As I see it, fine artists using motion capture share some interests I have found in my research on experimental animators. Harry Smith, Len Lye and a host of other abstract artists are among those who explored physiological issues in their work.
The whole concept of 'expanded' perception, a common focus for experimental filmmakers (and artists in general) beginning in the 1950s and explored in the writing of P. Adams Sitney, among others, is another possible link to the aesthetics of motion capture. One might also consider how artwork that explores the concept of synaesthesia, an overlapping of the senses, relates to experiments in motion capture. [50] For example, how does the imagery produced by Oskar Fischinger's Lumigraph, which translated human movement into abstract visuals, link to motion capture practice? A 1996 Wired article by Evantheia Schibsted suggests that motion capture taps into intangible elements of experience. In the article, the author refers to Merce Cunningham's opinion of a specific type of software called Lifeforms, which "is not revolutionizing dance but expanding it, because you see movement in a way that was always there, but wasn't visible to the naked eye." Dancer and software designer Thecla Schiphorst extends this idea, adding that "the nonlinguistic knowledge inherent in physical training is a richly technical world that can inform technological development." A more concrete example of the way in which motion capture taps into bodily rhythms can be found in another Wired article, a May 1998 report on a motion capture device called Smartpen. This pen, which is used as a security device, detects the movement made while writing rather than the signature produced. The article's author, Tom Standage, explains that "Smartpen, from LCI Computer Group, doesn't give a hoot about your scrawl; it uses sensors to detect the motion of the pen, another unique data 'signature.' Looks like a pen, writes like a pen, acts like a biometric capture device." He adds that the Smartpen is more subtle than a fingerprint or iris scan and that, unlike a signature, the biometric identity it records can't be forged.
In an article on computer generated animation, Leslie Bishko discusses different types of viewer responses that animation can create, ranging from gut-level to evocative to conceptual.[53] I think it is worth exploring how viewers respond to motion capture imagery in animation, including their perception of facial and hand movements. All these areas are of interest to me and, I think, help pave the way for discussion of how motion capture might expand the expressive possibilities of animation in both entertainment and fine art.
Thanks to Linda Simensky for her research assistance and to Shane Cooper, Richard Cray and Greg Pair for replying to my research questions. Thanks also to Lisa Naugle for sharing her research with me and to Deanna Morse for sending me a copy of her video and other materials related to her work.