In news that has made pranksters around the world pay attention, there is now a computer program that can create a realistic simulated video of someone speaking.
Researchers at the University of Washington demonstrated the technique by creating a lip-synced video of former US president Barack Obama that blends existing audio and footage.
The program uses artificial intelligence (AI) to match audio of a person speaking with realistic mouth shapes, which it then grafts onto an existing video. After analysing millions of video frames of stock footage to learn how mouth shapes correspond to sound patterns, the program is able to produce highly realistic simulations.
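The core idea described above - learning which mouth shape goes with which sound from many paired examples, then applying that mapping to new audio - can be illustrated with a deliberately simplified sketch. This is not the researchers' actual system (which works on raw video and audio with neural networks); all the names and the "mouth openness" values here are made up for illustration.

```python
# Toy illustration of learning a sound-to-mouth-shape mapping from
# paired examples, then predicting mouth shapes for new audio.
# All function names and data are hypothetical.
from collections import defaultdict

def learn_mouth_shapes(pairs):
    """Average the observed mouth openness (0-1) for each sound unit."""
    totals = defaultdict(lambda: [0.0, 0])
    for sound, openness in pairs:
        totals[sound][0] += openness
        totals[sound][1] += 1
    return {sound: total / count for sound, (total, count) in totals.items()}

def synthesize(model, audio_units):
    """Map each sound unit in new audio to a predicted mouth shape,
    defaulting to a closed mouth (0.0) for unseen sounds."""
    return [model.get(unit, 0.0) for unit in audio_units]

# Training: sound units observed alongside measured mouth openness.
training = [("ah", 1.0), ("ah", 0.5), ("m", 0.0), ("m", 0.25), ("ee", 0.5)]
model = learn_mouth_shapes(training)
print(synthesize(model, ["m", "ah", "ee"]))  # [0.125, 0.75, 0.5]
```

The real system learns from roughly 14 hours of footage rather than five labelled examples, and produces full photorealistic mouth textures rather than a single openness number, but the learn-then-apply structure is the same.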
Faking it in the film industry
The researchers say the technology has the potential to be used in special effects. Currently the process for audio-to-video conversion involves filming lots of people saying the same sentence and attempting to find a correlation between sounds and mouth shapes. As well as being tedious and time-consuming, it also creates what is known as the “uncanny valley” problem, where videos are fairly realistic, but not quite realistic enough. Instead of looking convincing, they tend to look creepy.
The technology could also improve the experience of poor-quality video calls, and could help hearing-impaired people by letting them lip-read video synthesized from over-the-phone audio.
The team also suggests that by reversing the process – feeding video into the program instead of just audio – they could potentially develop an algorithm to detect whether a video is real or fake.
The aim is to improve the algorithms so they generalize across situations and can recognize a person’s voice and speech patterns from less data – for example, one hour of video to learn from instead of the current 14 hours.
The program is only capable of creating video from words spoken by the same person: you can’t yet put your words in someone else’s mouth.