If you’re reading this post, you probably visited the text-to-speech page that we put up. We hope you found it fun to use. Below we’ll outline how we created the voices on the page, including the tech as well as the process.
How the voices were made
First off, all of these voices were created using Microsoft’s Custom Voice product. You can read more about it here.
To create a custom voice font, you need to send in small chunks of monologue along with a matching transcript. Microsoft suggests providing 30 minutes to 40 hours of audio, cut into these small chunks (30 minutes works out to hundreds of them, 40 hours to tens of thousands). For this project, we used anywhere from 5 to 10 minutes of audio, for a couple of reasons:
- It takes a lot of time to get each voice into the data format needed. We decided to go for quantity of voices over quality.
- We wanted the voices to have the essence of the person, but not be so realistic that they could be used for nefarious purposes, such as deepfakes.
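For the curious, here is a rough Python sketch of the prep work described above: pairing each audio chunk with its transcript line and adding up how much audio you have. It is only an illustration. The tab-separated transcript layout, the chunk ids, and the function names (build_transcript, wav_duration_seconds, make_silent_wav) are our own assumptions for this example, not the exact format the Custom Voice product requires, so check Microsoft's documentation before preparing real data.

```python
from __future__ import annotations

import io
import wave


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Return the length of an in-memory WAV file in seconds."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def build_transcript(chunks: dict[str, bytes], texts: dict[str, str]) -> tuple[str, float]:
    """Pair every audio chunk with its transcript line.

    Returns the transcript body (one "<chunk id><TAB><text>" line per chunk,
    an assumed layout) and the total audio duration in seconds.
    Raises ValueError if a chunk has no transcript or a transcript has no chunk.
    """
    unmatched = set(chunks) ^ set(texts)
    if unmatched:
        raise ValueError(f"chunks and transcripts do not match: {sorted(unmatched)}")

    lines = []
    total = 0.0
    for chunk_id in sorted(chunks):
        total += wav_duration_seconds(chunks[chunk_id])
        lines.append(f"{chunk_id}\t{texts[chunk_id].strip()}")
    return "\n".join(lines) + "\n", total


def make_silent_wav(seconds: float, rate: int = 16000) -> bytes:
    """Create a silent mono 16-bit WAV, used here only to demo the functions."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


if __name__ == "__main__":
    # Two stand-in chunks; real ones would be recordings of the speaker.
    demo_chunks = {"0001": make_silent_wav(2.0), "0002": make_silent_wav(3.0)}
    demo_texts = {"0001": "Hello there.", "0002": "Welcome to the show."}
    body, total = build_transcript(demo_chunks, demo_texts)
    print(body, end="")
    print(f"total audio: {total / 60:.2f} minutes")
```

Checking that chunks and transcript lines match one-to-one before uploading is the main point here: a single mismatch is much easier to catch in a script like this than after a training run.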
Why did we do this?
We are always interested in learning more and testing new ways that audio can be shared on social media, and thought having these voices available could be a good way to spark some interest.
The last reason is purely for fun. We were testing out the custom voice platform and found ourselves playing around with the Mark Zuckerberg voice in particular (it definitely derailed a few product meetings), so we thought that if we were getting a laugh from this, others might too.
How will this tech be incorporated into Headliner?
In 2019 we plan to allow some users to create their own custom voice fonts for a text-to-speech option within Headliner. We imagine this will initially be used for creating flash briefings and video versions of blog posts and articles. Essentially, it will give people the ability to quickly add voice-over to any of their social videos, using a custom voice specific to their site or brand.