The goal of this work is to determine the audio-video synchronisation between mouth motion and speech in a video.
We propose a two-stream ConvNet architecture in which the mapping between the sound and the mouth images is learnt end-to-end from unlabelled data. The trained network is then used to determine the lip-sync error in a video.
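To make the setup concrete, the sketch below shows one way such a two-stream network could be implemented in PyTorch. The MFCC-style audio input, the stack of grayscale mouth crops, the layer sizes, and the contrastive loss on in-sync versus artificially time-shifted audio-video pairs are all illustrative assumptions, not the exact configuration used in this work.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AudioStream(nn.Module):
    """ConvNet embedding a short window of audio features (e.g. MFCCs)."""
    def __init__(self, embed_dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.BatchNorm2d(128), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),          # collapse to (B, 128, 1, 1)
        )
        self.fc = nn.Linear(128, embed_dim)

    def forward(self, x):                     # x: (B, 1, n_mfcc, n_steps)
        return self.fc(self.conv(x).flatten(1))

class VisualStream(nn.Module):
    """ConvNet embedding a stack of consecutive grayscale mouth crops."""
    def __init__(self, n_frames=5, embed_dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(n_frames, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.BatchNorm2d(128), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(128, embed_dim)

    def forward(self, x):                     # x: (B, n_frames, H, W)
        return self.fc(self.conv(x).flatten(1))

def contrastive_loss(a, v, label, margin=1.0):
    """label = 1 for in-sync audio-video pairs, 0 for shifted pairs.

    Pulls genuine pairs together in embedding space and pushes
    out-of-sync pairs apart up to the margin -- one natural
    self-supervised objective for this kind of two-stream setup.
    """
    d = F.pairwise_distance(a, v)
    return (label * d.pow(2) + (1 - label) * F.relu(margin - d).pow(2)).mean()
```

Given trained embeddings of this form, the lip-sync error of a clip can be estimated by sliding the audio window over a range of temporal offsets and taking the offset that minimises the embedding distance averaged over the clip.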
We apply the network to two further tasks: active speaker detection and lip reading. On both tasks we set a new state of the art on standard benchmark datasets.