Week 4 | THE PAST, PRESENT AND FUTURE OF MOTION CAPTURE

The release of Cyberpunk 2077 was undoubtedly one of the most eagerly anticipated events in the game industry over the past year. The production of such an immersive game is inseparable from one advanced technology: motion capture.

Motion capture, also known as mocap, is the process of transferring motion from the real world into three-dimensional digital systems.[1] Nowadays, mocap is widely used in the film and game industries.

In fact, mocap as a technology has existed for less than forty years. The concept of capturing movement, however, known as rotoscoping, was proposed as early as the beginning of the last century, before the birth of 3D computer graphics. Adopted by Disney in the 1930s, many of its celebrated early titles, such as Snow White and the Seven Dwarfs, Cinderella and Alice in Wonderland, were created with this method.[2] The required scenes were first filmed in live action to record the actors’ performances; the film was then projected onto transparent easels so that an animator could trace the action frame by frame as a reference.

The success of Snow White led Disney to move ahead with more feature-film productions. Rotoscoping itself, however, came from Disney’s chief competitor, Fleischer Studios. In 1914, Max Fleischer demonstrated his new device, the rotoscope. The original version of the technique was showcased in his animated series Out of the Inkwell [3], in which Max played a painter interacting with his cartoon character, Koko the Clown. To make the clown’s actions come alive, Max’s brother Dave acted out the movements on film, becoming the first person in history to be captured.

Requiring every frame to be traced accurately on a two-dimensional plane, traditional rotoscoping was painstaking and laborious work. The prototype of today’s motion capture in three-dimensional space appeared in the early 1960s, when an engineer named Lee Harrison III introduced his animated stick figures. A figure composed of bones, joints, eyes and moving lips was displayed on a cathode ray tube (CRT). The idea evolved into a giant hardware contraption called ANIMAC, which could generate real-time animation controlled by an actor wearing body-mounted sensors.[4] Although the design behind it was remarkable and ahead of its time, these glowing stick figures attracted little attention.

Unfortunately, a lack of funds and the limits of hardware capability and computing power made ANIMAC a commercial failure. After Harrison, the development of motion capture entered a long period of stagnation, until around 1980, when Tom Calvert of Simon Fraser University made a major breakthrough with a mechanical capture suit instrumented with attached potentiometers. At almost the same time, Ginsberg and Maxwell at MIT invented the Graphical Marionette. This system was designed to generate animation scripts by acting out the motions, using an early optical motion capture system called Op-Eye, which relied on sequenced LEDs. These LEDs, placed at the key joints of the skeleton, played the role of active markers. Two detectors then sent the 2-D position of each marker to the computer, from which a 3-D coordinate was obtained. As an advantage, the location information could be stored for later rendering of a more detailed character.[5]
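As a rough illustration of the geometry behind such an optical system, the sketch below triangulates one marker’s 3-D position from its 2-D image coordinates in two calibrated detectors. It assumes the two 3×4 camera projection matrices are already known from calibration; the function and variable names are my own, not taken from Op-Eye or the Graphical Marionette.

```python
import numpy as np

def triangulate_marker(P1, P2, uv1, uv2):
    """Recover a marker's 3-D position from two calibrated views
    using the linear (DLT) method.
    P1, P2  : 3x4 camera projection matrices from calibration.
    uv1, uv2: (u, v) pixel coordinates of the same marker."""
    (u1, v1), (u2, v2) = uv1, uv2
    # Each view gives two linear constraints on the homogeneous
    # 3-D point X, e.g. u1 * (P1[2] @ X) = P1[0] @ X.
    A = np.stack([
        u1 * P1[2] - P1[0],
        v1 * P1[2] - P1[1],
        u2 * P2[2] - P2[0],
        v2 * P2[2] - P2[1],
    ])
    # Solve A @ X = 0 in the least-squares sense via SVD.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # de-homogenise to (x, y, z)
```

Running something like this once per LED and per frame yields a stream of joint positions that, exactly as described above, can be stored and replayed later on a more detailed character.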

The emergence of these technologies laid the foundation for the rapid development of motion capture that followed, including today’s performance capture, which refers to recording an actor’s face and body motion simultaneously.[1]

Around 1992, SimGraphics developed a facial tracking system called the “face waldo”. One of its most successful applications was Mario’s real-time performance at Nintendo’s trade fairs.[6] Driven by an actor wearing the face waldo, Mario could converse and joke with the audience, which attracted a great deal of media and public attention at the time. However, body and facial animation were still produced separately in the traditional workflow, and this disconnection lasted for some time.

At the same time, mocap began to be applied in the game industry. One of the earliest uses dates back to 1994, when Sega, following the huge success of the world’s first 3D fighting game Virtua Fighter in Japan, launched its sequel.[7] In order to produce a realistic visual experience, Sega recruited Japanese martial artists to perform various professional techniques in mocap equipment. This put Virtua Fighter 2 in a leading position among games throughout the 1990s. Director Mamoru Oshii even intended to include an iconic move from Virtua Fighter 2 in his anime film Ghost in the Shell.[8]

Meanwhile, the film industry was coming to rely on CGI, and the key to making CGI characters look lifelike was mocap. In 1999, Star Wars: Episode I – The Phantom Menace made a historic leap and became a milestone in motion capture’s journey into Hollywood. With the help of mocap, the comic-relief character Jar Jar Binks [9] was portrayed as a bumbling fool, which also earned him a reputation as one of the most hated characters in film history.

If humanoids could be made with CGI, why not use a fully CGI cast instead of real actors? In 2001, Final Fantasy: The Spirits Within put this idea into practice: all the characters in this photorealistic movie were entirely computer-generated with mocap technology.[10] Notably, the heroine, Dr. Aki Ross, was rendered in incredible detail, down to the 60,000 hairs on her head, in a production that took four years and cost nearly $150m.[11] However, audiences did not embrace the end product. The ambitious project flopped and drove its studio, Square Pictures, to the wall.

Unlike Jar Jar Binks, who was hated for his personality, the digital humans in Final Fantasy gave viewers a persistent feeling of creepiness. As critic Peter Bradshaw put it, the “solemnly realist human faces looked strikingly phoney”[12]. The imperfect depiction of human characters was magnified on the big screen, exposing the immaturity of mocap technology at the time.

So how did motion capture fare with cartoon characters? The 2004 animated film The Polar Express received a great deal of criticism for its misuse of mocap.[12] From the children to Santa, every character looked creepy, with a deadness in their faces that clashed with the film’s atmosphere.

While the debate over motion capture rumbled on, performance capture gradually took the place of traditional mocap. The convincing depiction of hideous creatures such as Gollum in The Lord of the Rings proved that performance capture could be embraced by the mainstream, because in that case creepiness was a required element of the performance, concealing the technical flaws of the time.

The problems caused by technology must be solved by technology. The filming of the blockbuster Avatar (2009) used a real-time performance capture system that allowed the director and the actors to preview the final shot[13], significantly increasing the efficiency of filmmaking. The time saved could then be spent fine-tuning details for the final render.

In the past few years, cognitive automation has come of age. As rapidly advancing technology brings explosive growth in computing power, researchers are concentrating more on machine learning innovations, aiming to bring mocap out of large studios and into everyday life. The future of mocap is markerless.

During the 2018 Game Developers Conference, Cubic Motion presented a digital character, “Siren”, driven by a mocap actor in real time through Unreal Engine.[14] One of the most impressive advances was that her subtly detailed facial performance was recreated entirely by an image-based camera tracking system, without the use of markers. Such a high-fidelity, immediate-feedback demonstration showed great potential for cutting-edge industries such as Augmented Reality (AR) and Virtual Reality (VR).
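To give a flavour of what markerless face tracking looks like in code, here is a minimal sketch using off-the-shelf open-source tools (OpenCV and MediaPipe Face Mesh) rather than Cubic Motion’s proprietary pipeline: it detects hundreds of facial landmarks from an ordinary webcam image, with no physical markers on the performer.

```python
import cv2
import mediapipe as mp

# Markerless facial landmark tracking from a plain webcam feed.
face_mesh = mp.solutions.face_mesh.FaceMesh(max_num_faces=1)
cap = cv2.VideoCapture(0)

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # MediaPipe expects RGB input, OpenCV delivers BGR.
    results = face_mesh.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_face_landmarks:
        h, w = frame.shape[:2]
        # Draw each detected landmark; a real pipeline would instead
        # retarget these points onto a digital character's face rig.
        for lm in results.multi_face_landmarks[0].landmark:
            cv2.circle(frame, (int(lm.x * w), int(lm.y * h)), 1, (0, 255, 0), -1)
    cv2.imshow("markerless face tracking", frame)
    if cv2.waitKey(1) & 0xFF == 27:  # press Esc to quit
        break

cap.release()
cv2.destroyAllWindows()
```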

Today we can remove the markers when driving a digital character, so might it be possible to remove the actors one day? KTH Royal Institute of Technology recently developed a speech-driven motion generator by adapting a deep-learning-based motion synthesis method called MoGlow.[15] Given only high-level speech input, it produces richly varied, natural body motion for a virtual speaker, who can be made to convey a desired personality or mood. Such control relies only on a large motion capture training dataset. Even though finer performance details remain beyond this method, it points towards the holy grail of motion capture.
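The sampling loop of such a speech-driven generator can be sketched roughly as follows. MoGlow itself is a conditional normalising flow trained on motion capture data; the toy stand-in below only mimics the shape of that sampling step (recent poses plus speech features plus noise in, next pose out) with a random linear map so the sketch stays runnable, and every name and dimension here is my own illustration rather than anything from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
POSE_DIM, SPEECH_DIM, CONTEXT = 45, 26, 10   # illustrative sizes only

# Hypothetical stand-in for a trained conditional flow: a real system
# would load learned weights instead of this random linear map.
W = rng.normal(scale=0.1,
               size=(POSE_DIM, CONTEXT * POSE_DIM + SPEECH_DIM + POSE_DIM))

def sample_next_pose(pose_history, speech_feat):
    """One autoregressive step: condition on recent poses and the
    current speech features, then transform a fresh noise sample."""
    z = rng.normal(size=POSE_DIM)                    # latent noise
    cond = np.concatenate([pose_history.ravel(), speech_feat, z])
    return W @ cond

# Drive a whole clip frame by frame from per-frame speech features
# (e.g. spectrogram frames extracted from the audio track).
speech = rng.normal(size=(300, SPEECH_DIM))          # placeholder features
poses = [np.zeros(POSE_DIM)] * CONTEXT               # seed with a rest pose
for feat in speech:
    history = np.stack(poses[-CONTEXT:])
    poses.append(sample_next_pose(history, feat))
motion = np.stack(poses[CONTEXT:])                    # (300, POSE_DIM) pose track
```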

The pursuit of realism is both the goal of motion capture and the driving force behind it. Motion capture, or performance capture, is essentially a tool for matching the level of realism across every aspect of a character. We will surely move beyond the “creepiness”, and perhaps in the near future that creepiness will even grow into a new aesthetic symbolizing this era of innovation.

Bibliography

[1] En.wikipedia.org. 2021. Motion capture. [online] Available at: <https://en.wikipedia.org/wiki/Motion_capture>.

[2] Intofilm.org. 2021. Animation: Rotoscoping. [online] Available at: <https://www.intofilm.org/films/filmlist/87>.

[3] En.wikipedia.org. 2021. Out of the Inkwell. [online] Available at: <https://en.wikipedia.org/wiki/Out_of_the_Inkwell>.

[4] Vasulka.org. 2021. [online] Available at: <http://www.vasulka.org/Kitchen/PDF_Eigenwelt/pdf/092-095.pdf>.

[5] Www6.uniovi.es. 2021. A Brief History of Motion Capture for Computer Character Animation. [online] Available at: <http://www6.uniovi.es/hypgraph/animation/character_animation/motion_capture/history1.htm>.

[6] Super Mario Wiki. 2021. Mario in Real Time. [online] Available at: <https://www.mariowiki.com/Mario_in_Real_Time>.

[7] En.wikipedia.org. 2021. Virtua Fighter 2. [online] Available at: <https://en.wikipedia.org/wiki/Virtua_Fighter_2>.

[8] Virtua Fighter dot com. 2021. Virtua Ghost in the Shell. [online] Available at: <https://virtuafighter.com/threads/virtua-ghost-in-the-shell.21086/>.

[9] En.wikipedia.org. 2021. Jar Jar Binks. [online] Available at: <https://en.wikipedia.org/wiki/Jar_Jar_Binks>.

[10] En.wikipedia.org. 2021. Final Fantasy: The Spirits Within. [online] Available at: <https://en.wikipedia.org/wiki/Final_Fantasy:_The_Spirits_Within>.

[11] Empire. 2021. A History Of CGI In The Movies. [online] Available at: <https://www.empireonline.com/movies/features/history-cgi/>.

[12] En.wikipedia.org. 2021. Uncanny valley. [online] Available at: <https://en.wikipedia.org/wiki/Uncanny_valley>.

[13] En.wikipedia.org. 2021. Avatar (2009 film). [online] Available at: <https://en.wikipedia.org/wiki/Avatar_(2009_film)>.

[14] Cdn2.unrealengine.com. 2021. [online] Available at: <https://cdn2.unrealengine.com/Unreal+Engine%2Fperformance-capture-whitepaper%2FLPC_Whitepaper_final-7f4163190d9926a15142eafcca15e8da5f4d0701.pdf>.

[15] Alexanderson, S., Henter, G., Kucherenko, T. and Beskow, J., 2021. Style-Controllable Speech-Driven Gesture Synthesis Using Normalising Flows. [online] Diglib.eg.org. Available at: <https://diglib.eg.org/handle/10.1111/cgf13946>.
