Last year the iBUG group at Imperial College London and the Samsung AI Centre published a paper on speech reconstruction from video. The model presented is novel in its ability to generate intelligible speech from video alone, even for previously unseen speakers. The main ML engine of the workflow is a Wasserstein GAN, one of a collection of networks working together to generate speech. The model is composed of three parts: a generator network, a critic that pushes the generator toward ‘natural’ sounding waveforms, and a speech encoder.
The generator network is responsible for transforming the sequence of video frames into a waveform. During the training phase, the critic network drives the generator to produce waveforms that sound similar to natural speech. Finally, a pretrained speech encoder is used to preserve the speech content of the waveform.
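To make the division of labor concrete, here is a minimal sketch (not the authors' code) of how the three losses fit together. The critic scores, encoder embeddings, and weighting are placeholder values purely for illustration; in the real model these come from the critic and the pretrained speech encoder networks.

```python
import numpy as np

# Hypothetical critic scores for a batch of real and generated waveforms.
# In the actual model these come from the critic network; here they are
# placeholder arrays just to illustrate the Wasserstein objectives.
critic_real = np.array([0.9, 1.1, 0.8])    # critic(real waveform)
critic_fake = np.array([-0.5, 0.2, -0.1])  # critic(generator(video))

# The critic tries to widen the gap between real and generated scores,
# i.e. it minimizes -(E[critic(real)] - E[critic(fake)]).
critic_loss = -(critic_real.mean() - critic_fake.mean())

# The generator tries to make the critic score its waveforms highly.
generator_loss = -critic_fake.mean()

# The pretrained speech encoder contributes a content term that keeps
# the generated waveform's speech content close to the ground truth
# (placeholder embeddings below).
emb_real = np.array([0.2, 0.4, 0.1])  # encoder(real waveform)
emb_fake = np.array([0.3, 0.1, 0.2])  # encoder(generated waveform)
content_loss = np.mean((emb_real - emb_fake) ** 2)

total_generator_loss = generator_loss + content_loss
print(critic_loss, generator_loss, total_generator_loss)
```

The key design point is that the critic alone only encourages *some* natural-sounding audio, while the encoder term ties the generated audio back to the actual words being spoken.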
The model is trained on the GRID dataset, a freely available audiovisual corpus of participants reading sentences, and was evaluated on both sound quality and the accuracy of the spoken words. The authors also posted videos of the model’s performance and comparisons with another recent model, Lip2AudSpec, with quite impressive results.
If you’d like to learn more about the authors, check out their pages on iBUG. If you’d like to see their work, you can find the first and second authors on GitHub. Aaaand…if you’re still interested in more lip-reading fun, take a look at this video of Rasputin killing it at some Beyoncé karaoke.
Written by Rebecca Minich, Product Analyst, Data Science at Google. Opinions expressed are solely my own and do not reflect the views or opinions of my employer.