Voices of CS: Purva Tendulkar

The fourth-year PhD student's childhood love of cartoons became the springboard for a research career advancing realism in human-centric generative computer vision.

As kids, many of us spent countless hours watching cartoons, getting lost in the colorful worlds and playful characters. For most, these shows are simply a source of entertainment, a fun escape into magical realms. But for Purva Tendulkar, those endless hours of watching animated movies and shows became something more—a spark that ignited a lifelong curiosity about how animation works.

While other kids pressed play to enjoy the story, Tendulkar found herself drawn to the process behind the scenes. She would watch “making of” videos, fascinated by the creative techniques that brought her favorite characters to life. This early passion for animation has stayed with her, growing into a dedicated pursuit in her PhD studies, where she now works with Carl Vondrick on using computer vision and graphics techniques to make films and video games come alive. Her research, recognized with the prestigious Apple Scholars in AI/ML PhD fellowship, explores how to create digital humans that interact more authentically with their environments, aiming to push the boundaries of perceptive and generative tools for designers and developers.

Photos: (1) Tendulkar at CVPR; (2) Purva Tendulkar, Samir Gadre, Revant Teotia, Basile Van Hoorick, Ruoshi Liu, and Sachit Menon; (3) Dídac Surís, Purva Tendulkar, Scott Geng, Arjun Mani, Sachit Menon, Ishaan Chandratreya, Basile Van Hoorick, Carl Vondrick, Sruthi Sudhakar, Mia Chiquier, and Revant Teotia

In this interview, Tendulkar delves into the inspiration behind her research and her vision for the future of realistic human motion in digital environments.

Q: Can you describe your research focus and what motivates your work?
My research interests lie at the intersection of Computer Vision, Machine Learning, and Computer Graphics. My vision is to emulate the varied facets of human behavior authentically. I work on understanding and synthesizing humans and how they interact with their physical surroundings. This has applications in developing cutting-edge video games, robotic simulators, and immersive AR/VR experiences – all of which cater to human needs and behaviors.

I grew up watching Disney movies and have been fascinated by the magic created on-screen by bringing animated characters to life, and by the emotions they evoked in me. I would find myself watching hours of behind-the-scenes footage from these movies, seeing how the artists hand-designed each character’s personality and refined the animations (e.g., sword fights) to be more realistic.

This directly shaped my choice of topic in the first year of my PhD: synthesizing all the realistic ways humans might interact with the world. This theme has stayed consistent throughout my PhD.


Q: What challenges and questions drive your research?
Humans interact with the world, and with each other, in a variety of physically and logically plausible ways that come very naturally to us. However, it is difficult to teach machines what is plausible and what is not, and one of the biggest challenges is data. Presently, human motion data comes from highly accurate but extremely expensive motion capture systems tailored to a specific scene. On the other hand, there is abundant information in internet videos (e.g., on YouTube) that could not possibly be captured in a studio. Through my research, I aim to creatively combine the benefits of these complementary data sources to build powerful generative models that wouldn’t be possible with just one source.

I have worked on generating 3D human-object interactions. Concretely, given a 3D object in a 3D environment (e.g., a mug on a kitchen countertop), my method, called FLEX, generates a number of diverse human avatars realistically grasping the object. Such a tool could serve as a template for animation artists to work with.

Earlier works that tackled this problem collected expensive full-body 3D human-object interaction data and trained models on it. However, such methods suffer from the limitations of the dataset and do not scale when the object appears in a configuration not seen during training. For example, when an object was placed on the floor, previous methods would generate humans sinking into the floor rather than kneeling or squatting.

Instead of training a model to learn full-body grasping poses directly, we decided to combine two information sources: systems that can generate “hand-only” grasps, and full-body pose data captured without objects. We then developed an algorithm that optimizes for a full-body pose consistent with the “hand-only” grasp. We found that our approach, which uses no full-body grasping data, outperforms methods trained on such data, challenging existing data collection paradigms for 3D humans and advocating for better utilization of existing sources.
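To make the idea concrete, here is a minimal, illustrative sketch of this kind of optimization. It is not the published FLEX implementation: the hand-only grasp generator, the full-body pose prior, the joint counts, and the loss weights below are all hypothetical toy stand-ins, and only the latent body pose is optimized so that the body’s hand lands on the proposed grasp.

```python
# Illustrative sketch of a FLEX-style optimization (toy stand-ins;
# not the published implementation).
import torch

torch.manual_seed(0)
W = torch.randn(32, 52 * 3)  # frozen weights of a toy body-pose decoder

def body_decoder(z):
    # Stand-in for a full-body pose prior trained on motion-capture
    # data without objects: maps a latent code to 52 body joints in 3D.
    return (z @ W).reshape(52, 3)

def hand_only_grasp(object_points):
    # Stand-in for a pretrained hand-only grasp generator: proposes
    # target 3D positions for 15 hand joints around the object.
    return object_points.mean(dim=0) + 0.05 * torch.randn(15, 3)

def hand_joints(body):
    # Stand-in selector: take the 15 joints belonging to the hand.
    return body[-15:]

object_points = torch.rand(100, 3)            # toy object point cloud
target_hand = hand_only_grasp(object_points)  # grasp proposed around it

z = torch.zeros(1, 32, requires_grad=True)    # latent full-body pose
opt = torch.optim.Adam([z], lr=0.05)

for step in range(200):
    body = body_decoder(z)
    # Pull the body's hand onto the hand-only grasp, while a small
    # regularizer keeps the latent (and hence the pose) plausible.
    match = ((hand_joints(body) - target_hand) ** 2).mean()
    loss = match + 1e-3 * (z ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final hand-matching loss: {match.item():.5f}")
```

One would expect the real system to use far richer models and loss terms; the point of the sketch is only the structure: gradient descent through a frozen body prior toward a hand-only grasp target, which is how the two data sources get combined.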


Q: Do you play video games, and how do they influence your research?
I quite enjoy playing video games in my free time. Still, I often find myself spending more time appreciating or critiquing the design than finishing the game. Some games that I think have great graphics are Horizon Zero Dawn, God of War, and Uncharted.

I think the visuals in current video games are truly impressive from a graphics/rendering standpoint – clothing, skin, object textures, and lighting effects are all very realistic and convincing. But there’s still a lot of room for improvement when it comes to how characters move and interact.

Most characters walk the same way, which strips away their unique personalities. Even human-object interaction feels templated and unrealistic: objects sometimes appear to just “stick” to a hand instead of being grasped realistically. Sometimes, you see characters behave unpredictably when interacting with the world around them. For instance, if you try to move your character into a spot that isn’t pre-programmed, like a tight space in a wall, the game just freaks out, and you might end up walking through the wall! And when it comes to how characters react to each other beyond scripted scenes, it feels a bit off. Say you have your character run circles around others or bump into them; the other characters barely react, going on about their business as if nothing’s happening.

I find all these problems very fascinating! Getting characters to behave in a truly convincing way would really prove how deeply we understand human behavior. After all, our world is built by humans, for humans, and I am excited to continue pushing the frontiers of 3D human research.