Video-Driven Speech Reconstruction with GANs! #GAN #ArtificialIntelligence #MachineLearning #AI #Speech @imperialcollege @MajaPantic70 « Adafruit Industries – Makers, hackers, artists, designers and engineers!

February 20, 2020 at 1:46 am

From the publication, “Video-Driven Speech Reconstruction using Generative Adversarial Networks”. https://arxiv.org/abs/1906.06301

Last year the iBUG group at Imperial College London and the Samsung AI Centre published a paper on speech reconstruction from video. The model is novel in its ability to generate intelligible speech from silent video alone, even for speakers it has not seen during training. The main ML engine for the workflow is a Wasserstein GAN, which is one of a collection of networks working together to generate speech. The model is composed of three parts: the generator network, the critic, which forces the generation of ‘natural’ sounding waveforms, and a speech encoder.

The generator network…is responsible for transforming the sequence of video frames into a waveform. During the training phase the critic network drives the generator to produce waveforms that sound similar to natural speech. Finally, a pretrained speech encoder is used to conserve the speech content of the waveform.
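To make the division of labor concrete, here is a minimal sketch of how the three parts could interact during training. It is not the authors' code: the function names, the toy numbers, the L1 distance on encoder features, and the `content_weight` parameter are all assumptions for illustration, and the Lipschitz constraint a Wasserstein critic needs (weight clipping or a gradient penalty) is left out for brevity.

```python
from statistics import fmean


def critic_loss(real_scores, fake_scores):
    """Wasserstein-style critic loss (hypothetical helper).

    Minimizing it pushes critic scores up on real waveforms and down on
    generated ones. The Lipschitz constraint is omitted here.
    """
    return fmean(fake_scores) - fmean(real_scores)


def generator_loss(fake_scores, real_features, fake_features, content_weight=1.0):
    """Generator loss (hypothetical helper): adversarial term plus content term.

    The adversarial term rewards waveforms the critic scores highly.
    The content term compares speech-encoder features of the real and
    generated audio; an L1 distance is assumed for this sketch.
    """
    adversarial = -fmean(fake_scores)
    content = fmean([abs(r - f) for r, f in zip(real_features, fake_features)])
    return adversarial + content_weight * content


# Toy numbers standing in for network outputs on one small batch.
real_scores = [0.9, 1.1]          # critic scores for real speech
fake_scores = [-0.2, 0.4]         # critic scores for generated speech
real_features = [1.0, 2.0, 3.0]   # speech-encoder features, real audio
fake_features = [1.5, 2.0, 2.0]   # speech-encoder features, generated audio

c_loss = critic_loss(real_scores, fake_scores)
g_loss = generator_loss(fake_scores, real_features, fake_features)
print(f"critic loss: {c_loss:.3f}, generator loss: {g_loss:.3f}")
```

In a real system the scores and features would come from neural networks and the losses would be minimized by gradient descent, with the critic and generator updated in alternation.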

The model is trained on the GRID dataset, a freely available audiovisual corpus of participants reading sentences. The model was evaluated on “sound quality” and on “accuracy of the spoken words”. The authors also posted videos of the model's performance alongside comparisons with Lip2AudSpec, another recent model, and the results are quite impressive.
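One common way to put a number on “accuracy of the spoken words” is word error rate (WER): the word-level edit distance between a reference transcript and a transcript of the generated audio, divided by the reference length. The sketch below assumes that metric for illustration. Whether it matches the paper's exact evaluation protocol is an assumption, and the function name and example sentences are made up.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so
    # far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# A short command-style sentence with one word misrecognized.
reference = "bin blue at f two now"
hypothesis = "bin blue at s two now"
print(f"WER: {word_error_rate(reference, hypothesis):.3f}")
```

Lower is better: a WER of 0 means every word was recovered, and values can exceed 1 when the hypothesis contains many extra words.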

If you’d like to learn more about the authors, check out their pages on iBUG. If you’d like to check out their work, you can find the first and second authors on GitHub. Aaaand…if you’re still interested in more lip reading fun, take a look at this video of Rasputin killing it at some Beyoncé karaoke.

Written by Rebecca Minich, Product Analyst, Data Science at Google. Opinions expressed are solely my own and do not express the views or opinions of my employer.

Filed under: Artificial intelligence, Data Science, Deep Learning, machine learning, speech —
Tags: artificial intelligence, deep learning, facial recognition, GAN, generative adversarial network, machine learning, speech, speech synthesis — by Becca
