This week Microsoft unveiled its hyper-realistic talking-head AI:
VASA is a framework for generating lifelike talking faces of virtual characters with visual affective skills (VAS).
All from a single static image and audio clip.
Their first model, VASA-1, can:
• Synchronise facial expressions and lip movements with audio
• Capture a large spectrum of facial nuances and head motion
• Generate 512×512 videos at up to 40 FPS
We are advancing in leaps and bounds towards real-time engagement with lifelike avatars that emulate human conversational behaviours.
Additional videos below
Link to Microsoft Research page