This is pretty amazing.

John Hardy Lambda September 9, 2016 1 Minute

Via Emlyn O’Regan

Originally shared by Vincent Vanhoucke

File under ‘Big Deals’.

Neal Stephenson’s ‘The Diamond Age’ has a funky premise: in a future where technology has pretty much solved everything, one problem remains unsolved, generating human-sounding speech, so people called ‘ractors’ are hired to act out lectures delivered in spoken form.

For a while, it actually seemed believable that delivering human-sounding speech would be very hard, as synthesis technology consistently failed to produce anything that would fool anyone for more than a few words.

Today the work from our colleagues at DeepMind feels like it’s leaped over the Uncanny Valley, and then some. The examples they provide sound fantastic. This is very exciting. I am really interested to hear what very long-form text sounds like, because that remains the ultimate challenge for TTS.

https://deepmind.com/blog/wavenet-generative-model-raw-audio/

Published by John Hardy

Closures are a poor man's object. Objects are a poor man's closure.

Published September 9, 2016

4 thoughts on “This is pretty amazing.”

John Jainschigg says:

September 9, 2016 at 1:22 pm

Damn, Daniel … this is good stuff. What blew my mind in the samples was actually hearing ‘vocal fry,’ as you do from human speakers but never from parametric, and rarely from concatenative systems.

John Hardy says:

September 9, 2016 at 1:42 pm

Cool. I wasn’t aware of that concept before but it’s certainly ubiquitous.

John Hardy says:

September 9, 2016 at 1:44 pm

I love this jumbled English:

https://storage.googleapis.com/deepmind-media/pixie/knowing-what-to-say/first-list/speaker-1.wav

https://storage.googleapis.com/deepmind-media/pixie/knowing-what-to-say/first-list/speaker-2.wav

https://storage.googleapis.com/deepmind-media/pixie/knowing-what-to-say/first-list/speaker-3.wav

https://storage.googleapis.com/deepmind-media/pixie/knowing-what-to-say/first-list/speaker-4.wav

https://storage.googleapis.com/deepmind-media/pixie/knowing-what-to-say/first-list/speaker-5.wav

https://storage.googleapis.com/deepmind-media/pixie/knowing-what-to-say/first-list/speaker-6.wav

Michael Tufekci says:

September 9, 2016 at 6:07 pm

Excellent research results. But like other botware I’m not looking forward to deceptive voice spam.

