AI-based app can now modify talking heads in videos to cover up muffed lines

Those in the world of movies and television are well aware of actors muffing their lines. This often necessitates retakes or some clever editing.

But now, thanks to technology, a video editor can modify the visuals using a text transcript, adding new words to the dialogue or deleting unwanted ones.

That’s exactly what researchers have now developed. A team from Stanford University, the Max Planck Institute for Informatics, Princeton University and Adobe Research has come up with an algorithm for editing talking-head videos – videos showing speakers from the shoulders up. It is a striking application of AI to video editing.

Here’s how it works:

The new app uses the transcript to extract speech motions from various pieces of the video.

Then, with machine learning, it converts those into a final video that appears natural to the viewer – lip-synced and all.

So, essentially, when an actor flubs or mispronounces a word, the new technique allows the editor to edit the transcript, and the application assembles the right word from words or portions of words spoken elsewhere in the video. It’s the equivalent of rewriting with video, much as a writer retypes a misspelled or ill-chosen word.
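To make the idea concrete, here is a minimal sketch of that assembly step: given a phoneme-level index of where each sound was spoken in the source footage, a new word can be stitched together from snippets found elsewhere in the video. The function names and data structures are illustrative assumptions, not the researchers’ actual pipeline, which is considerably more sophisticated.

```python
def build_phoneme_index(aligned_transcript):
    """Map each phoneme to the (start, end) times where it was spoken."""
    index = {}
    for phoneme, start, end in aligned_transcript:
        index.setdefault(phoneme, []).append((start, end))
    return index

def assemble_word(target_phonemes, index):
    """Pick one source snippet per phoneme of the edited word (naively,
    the first occurrence; the real system scores candidates for fit)."""
    segments = []
    for p in target_phonemes:
        if p not in index:
            raise ValueError(f"phoneme {p!r} never spoken in source video")
        segments.append(index[p][0])
    return segments

# Toy example: assemble the word "cat" (phonemes K-AE-T) from snippets
# of other words in the source recording.
aligned = [("K", 0.0, 0.1), ("AE", 0.1, 0.2), ("T", 0.5, 0.6), ("K", 1.0, 1.1)]
index = build_phoneme_index(aligned)
print(assemble_word(["K", "AE", "T"], index))
```

This also hints at why so much source footage is needed: every sound in the edited word must have been spoken somewhere in the original video.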

What is required, though, is at least 40 minutes of original video as input.

According to an article on the Stanford site, the process is visually seamless. “There’s no need to rerecord anything,” says Ohad Fried in that report. He is the first author of a paper about the research, published on the preprint website arXiv.

Fried works in the lab of Maneesh Agrawala, the Forest Baskett Professor in the School of Engineering and senior author of the paper. The project began around two years ago when Fried was a graduate student working with computer scientist Adam Finkelstein at Princeton.

To make the video appear more natural, the algorithm applies intelligent smoothing to the motion parameters and renders a 3D animated version of the desired result.

For now, the rendered face is still far from realistic. As a final step, a machine-learning technique called neural rendering converts the low-fidelity digital model into a photorealistic video in perfect lip-sync. AI in video editing, it seems, is well on its way.

Content Credit: Stanford News

Video Credit: YouTube
