The video at the end, although it's 30 minutes long, is well worth watching; it gives a good picture of the state of the art in graphics, AI and the Omniverse.
In a turducken of a demo, NVIDIA researchers stuffed four AI models into a serving of digital avatar technology for SIGGRAPH 2021’s Real-Time Live showcase — winning the Best in Show award. The showcase, one of the most anticipated events at the world’s largest computer graphics conference, held virtually this year, celebrates cutting-edge real-time projects spanning game technology, augmented reality and scientific visualization. It featured a lineup of jury-reviewed interactive projects, with presenters hailing from Unity Technologies, Rensselaer Polytechnic Institute, the NYU Future Reality Lab and more.
The demo featured tools to generate digital avatars from a single photo, animate avatars with natural 3D facial motion and convert text to speech.
“Making digital avatars is a notoriously difficult, tedious and expensive process,” said Bryan Catanzaro, vice president of applied deep learning research at NVIDIA, in the presentation. But with AI tools, “there is an easy way to create digital avatars for real people as well as cartoon characters. It can be used for video conferencing, storytelling, virtual assistants and many other applications.”
AI Aces the Interview

In the demo, two NVIDIA research scientists played the part of an interviewer and a prospective hire speaking over video conference. Over the course of the call, the interviewee showed off the capabilities of AI-driven digital avatar technology to communicate with the interviewer.
The researcher playing the part of the interviewee relied on an NVIDIA RTX laptop throughout, while the other used a desktop workstation powered by RTX A6000 GPUs. The entire pipeline can also be run on GPUs in the cloud.
While sitting in a campus coffee shop, wearing a baseball cap and a face mask, the interviewee used the Vid2Vid Cameo model to appear clean-shaven in a collared shirt on the video call (seen in the image above). The AI model creates realistic digital avatars from a single photo of the subject — no 3D scan or specialized training images required.
“The digital avatar creation is instantaneous, so I can quickly create a different avatar by using a different photo,” he said, demonstrating the capability with another two images of himself.
Instead of transmitting a video stream, the researcher’s system sent only his voice — which was then fed into the NVIDIA Omniverse Audio2Face app. Audio2Face generates natural motion of the head, eyes and lips to match audio input in real time on a 3D head model. This facial animation went into Vid2Vid Cameo to synthesize natural-looking motion with the presenter’s digital avatar.
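To make that data flow concrete, here is a toy, self-contained sketch of the receiving end of such a pipeline. The classes below are stand-ins invented purely for illustration; they are not the real Audio2Face or Vid2Vid Cameo interfaces, which NVIDIA ships as Omniverse apps and research models rather than a public Python library. The only point is to show how a stream of audio chunks plus a single reference photo becomes a stream of avatar frames.

import numpy as np

# Toy stand-ins for the pipeline described above; NOT real NVIDIA APIs.

class FakeAudio2Face:
    """Maps a chunk of audio samples to a (very fake) facial-motion vector."""
    def animate(self, audio_chunk):
        # Real Audio2Face infers head, eye and lip motion for a 3D head model;
        # here we just derive a couple of numbers from the audio energy.
        energy = float(np.abs(audio_chunk).mean())
        return {"jaw_open": energy, "head_nod": energy * 0.1}

class FakeVid2VidCameo:
    """Warps a single source photo according to the facial-motion input."""
    def __init__(self, source_photo):
        self.source_photo = source_photo          # one photo defines the avatar
    def render(self, motion):
        # Real Vid2Vid Cameo synthesizes a photoreal frame; we fake it by
        # brightening the photo in proportion to the jaw motion.
        return np.clip(self.source_photo * (1.0 + motion["jaw_open"]), 0, 255)

def avatar_stream(audio_chunks, source_photo):
    face_animator = FakeAudio2Face()
    renderer = FakeVid2VidCameo(source_photo)
    for chunk in audio_chunks:                    # only audio crosses the network
        motion = face_animator.animate(chunk)     # audio -> facial animation
        yield renderer.render(motion)             # facial animation -> avatar frame

# Example: one second of (silent) audio split into 30 chunks -> 30 avatar frames.
photo = np.zeros((256, 256, 3))
frames = list(avatar_stream(np.array_split(np.zeros(16000), 30), photo))
print(len(frames))  # 30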
The technology isn't limited to photorealistic digital avatars: the researcher also fed his speech through Audio2Face and Vid2Vid Cameo to voice an animated character. Using NVIDIA StyleGAN, he explained, developers can create an unlimited range of digital avatars modeled after cartoon characters or paintings.
The models, optimized to run on NVIDIA RTX GPUs, easily deliver video at 30 frames per second. The pipeline is also highly bandwidth efficient, since the presenter sends only audio data over the network instead of a high-resolution video feed.
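As a rough back-of-the-envelope check on that claim, using typical assumed bitrates rather than figures from the demo: compressed speech at around 64 kbps versus a 1080p video-call stream at around 3 Mbps is roughly a 40-50x reduction in upstream bandwidth.

# Back-of-the-envelope bandwidth comparison (assumed typical bitrates,
# not numbers quoted in the demo).
audio_kbps = 64        # compressed mono speech
video_kbps = 3000      # a typical 1080p video-call stream
print(f"Sending only audio uses about 1/{video_kbps // audio_kbps} of the bandwidth")
# -> Sending only audio uses about 1/46 of the bandwidth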
Taking it a step further, the researcher showed that when his coffee shop surroundings got too loud, the RAD-TTS model could convert typed messages into his voice, replacing the audio fed into Audio2Face. The deep learning-based text-to-speech tool can synthesize lifelike speech from arbitrary text input in milliseconds.
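Continuing the toy sketch above, swapping the microphone for synthesized speech only changes the front of the pipeline. The stand-in below is not the real RAD-TTS interface; it just emits a waveform of plausible length so the same avatar_stream() from the earlier sketch keeps working, which is the structural point of the demo: text replaces audio, and everything downstream stays the same.

import numpy as np

# Hypothetical stand-in for a text-to-speech model such as RAD-TTS.
# The real model synthesizes lifelike speech; this just returns silence
# with a crude words-to-duration guess.
def fake_text_to_speech(text, sample_rate=16000):
    seconds = max(1, len(text.split()) // 3)
    return np.zeros(seconds * sample_rate)

# The synthesized waveform replaces the microphone audio and is split
# into chunks before being fed to the same avatar_stream() as before.
speech = fake_text_to_speech("Sorry, it is loud in here, so I am typing instead.")
chunks = np.array_split(speech, 30)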
RAD-TTS can synthesize a variety of voices, helping developers bring book characters to life or even rap “The Real Slim Shady” by Eminem, as the research team showed in the demo’s finale.