Google Text-to-Speech converts text into spoken words. In this post, we'll show you how to use Google Text-to-Speech and its human-like WaveNet voices to make your blog posts more accessible. You don't need to be a software developer to follow our guide.
Why you should add audio to your blog posts
Whether you're an independent author or a big company, you write to create value for and build connections with your audience. Audio is a powerful tool for that. For evidence, look to the popularity of podcasts.
In our case, we received a nice email from a reader asking us to add audio recordings of our blog posts to support the visually impaired. It was more than a great suggestion. It was an opportunity for us to do good and provide some relief.
We want our content to reach as many people as want to consume it. Our reader taught us that we needed to make our content more accessible, and that meant distributing it in multiple formats.
Choosing the right audio solution
To create and distribute audio recordings of your blog posts, you'll need four tools:
- A person or text-to-speech service to read your blog posts
- A podcast host
- Audio editing software, like GarageBand, Logic, or Audacity
- The Google Chrome web browser
A quick Google search turns up plenty of text-to-speech services. Some are good. Many sound robotic and unnatural. None is perfect, and none could be mistaken for an actual person. You have three good options.
We chose Google Text-to-Speech. Specifically, we chose Google's WaveNet voices, which use deep learning to generate natural-sounding speech. To our ears, they sounded the most natural of the services we tried.
If you're price sensitive, Google's free plan will convert up to 1,000,000 characters each month. That's a lot of blog posts. For perspective, we converted just over 121,000 characters to record our last five blog posts, and that total included a lot of trial and error while we were getting familiar with the service.
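You don't need code to use the free plan, but if you'd like to check the allowance against your own publishing schedule, the arithmetic is simple. Here's an optional Python sketch using our figures as example inputs. The 121,000-character total is rounded, and the function name is our own.

```python
def posts_per_month(free_chars: int, chars_used: int, posts: int) -> float:
    """Estimate how many posts fit in a monthly free character allowance."""
    # Average characters per post, including any trial-and-error re-runs.
    chars_per_post = chars_used / posts
    return free_chars / chars_per_post


# Our numbers: about 121,000 characters for five posts, 1,000,000 free per month.
estimate = posts_per_month(free_chars=1_000_000, chars_used=121_000, posts=5)
print(f"About {estimate:.0f} posts per month")  # About 41 posts per month
```

At roughly 24,200 characters per post, the free plan covers about 41 posts a month, far more than most blogs publish.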
IBM Watson, made famous by its performance on Jeopardy!, offers the text-to-speech service that was our second choice. Watson's voices sounded slightly less natural than Google's. Its free plan covers only 10,000 characters per month, but it's still cost-effective: once you exceed the free plan's limit, it charges only $0.02 per thousand characters.
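To see what that pricing means in practice, here's an optional Python sketch that estimates a month's Watson bill. It assumes billing is strictly proportional to characters beyond the free 10,000 (we haven't confirmed whether IBM rounds up to whole blocks of a thousand), and the names are hypothetical.

```python
FREE_CHARS = 10_000        # Watson free plan, characters per month (as of this post)
PRICE_PER_THOUSAND = 0.02  # US dollars per 1,000 characters beyond the free plan


def watson_monthly_cost(chars: int) -> float:
    """Estimate one month's cost, assuming linear per-character billing."""
    billable = max(0, chars - FREE_CHARS)
    return billable / 1000 * PRICE_PER_THOUSAND


# Our five-post total of about 121,000 characters.
print(f"${watson_monthly_cost(121_000):.2f}")  # $2.22
```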
Amazon Polly sounds less natural than Google's WaveNet voices or IBM Watson. It still sounds great, though, and it's free. Of the text-to-speech services covered in this guide, Polly is the easiest to use. If you want to skip some of the work in this guide, Polly is a great choice.
Create a Google Cloud account
Google Text-to-Speech is part of Google Cloud, Google's developer platform, a suite of products similar to those offered through Amazon Web Services and Microsoft Azure. You'll need Google Cloud to make our text-to-speech solution work. However, as we said earlier, you don't need to be a developer. You won't be writing any code.
Accept the terms and conditions
Visit Google Cloud's terms and conditions page to try Google Cloud for free. To proceed, you'll need to accept Google's terms. Review them and make sure you're comfortable with them before you continue. We're not offering legal advice in this article.
Create your account
Google requires a credit card to proceed. As of this post's publication, Google's terms state that it won't automatically charge you after you use up your $300 of complimentary credit. Read them carefully, because terms can change. Once you're satisfied, enter your billing address and payment details. Then, continue.
Customize your Google Cloud Platform landing page
Google gives you the option to personalize your Google Cloud Platform landing page. You can invest some time customizing it or skip it. You can always return and do it later.
Create your first Google Cloud project
Google automatically creates your first project, named "My First Project." It's a blank canvas. To give it any capabilities, you need to search for each capability you want and add it.
Add the Text-to-Speech API to your project
Search for "Text-to-Speech." Results appear as you type. Select "Cloud Text-to-Speech API."
Enable the API. Note the pricing for API usage. As of this post's date, WaveNet voices are free for up to 1,000,000 characters per month.
Click "Credentials" in the "APIs & Services" section.
Choose "API key" to create your API key.
Restrict your API key
Choose "Restrict key."
Your API key is essentially a password: anyone who has it can use your Google Cloud resources. Don't share it with anyone.
Under "API restrictions," select "Google Cloud Text-to-Speech API." This tells Google that your API key can only be used for text-to-speech. It's not perfect security, but it's a step in the right direction.
Notice the options at the top of the screen to regenerate or delete your API key. Regenerating the key gives your existing configuration a new API key, like changing your password. Deleting the key ends access to the API. If you later decide you no longer want Google to read your blog posts for you, delete your key.
Save your changes.
While you're editing your API key, give it a name that describes its purpose, so you'll recognize it if you return to it later. We suggest "WaveNet for Google Chrome," since you'll use the key with Google Chrome through an extension.
Again, save your changes. You're done. To recap, here's what you accomplished:
- You created your Google Cloud account
- You created your first project in Google Cloud
- You activated the Text-to-Speech API in your project
- You created an API key to use with your project
- You restricted the use of your API key, so it can only be used with the Text-to-Speech API
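You won't need code for anything in this guide, but if you're curious what your restricted key actually unlocks, here's an optional Python sketch of a direct request to the Cloud Text-to-Speech REST endpoint (`v1/text:synthesize`) using only the standard library. The voice name `en-US-Wavenet-D` and the helper names are our own choices for illustration; check Google's API reference for the current voice list. We don't know exactly how the WaveNet for Chrome extension builds its requests, so please don't read this as its implementation.

```python
import base64
import json
import urllib.parse
import urllib.request

# Google's REST endpoint for synthesizing speech (v1).
ENDPOINT = "https://texttospeech.googleapis.com/v1/text:synthesize"


def build_request(text: str, api_key: str,
                  language_code: str = "en-US",
                  voice_name: str = "en-US-Wavenet-D") -> urllib.request.Request:
    """Build a POST request asking Text-to-Speech to read `text` aloud as MP3."""
    body = {
        "input": {"text": text},
        "voice": {"languageCode": language_code, "name": voice_name},
        "audioConfig": {"audioEncoding": "MP3"},
    }
    # The key travels as the `key` query parameter. Anyone holding the key
    # could send this same request, which is why restricting it matters.
    query = urllib.parse.urlencode({"key": api_key})
    return urllib.request.Request(
        f"{ENDPOINT}?{query}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_mp3(text: str, api_key: str, path: str) -> None:
    """Send the request and write the returned audio to an MP3 file."""
    with urllib.request.urlopen(build_request(text, api_key)) as response:
        payload = json.load(response)
    # The response carries the audio as base64 text in "audioContent".
    with open(path, "wb") as audio_file:
        audio_file.write(base64.b64decode(payload["audioContent"]))
```

With a real key in place of the placeholder, `synthesize_to_mp3("Hello from WaveNet.", "YOUR_API_KEY", "hello.mp3")` would save a short MP3 to your working folder.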
Panicked, got stuck, or changed your mind?
It happens. Don't worry. You can delete your project and API key, and if you're no longer using them, deleting them is good housekeeping.
Shut down your project
Shutting down your project is like deleting it. Select your project in the Google Cloud console. Navigate to the project settings. Choose "Shut down." Then, follow the on-screen instructions to terminate the project.
Since your API key is attached to your project, it's removed when you shut down your project. You don't need to separately delete your API key.
Install WaveNet for Chrome
WaveNet for Chrome is a Chrome extension that will allow you to use Google Text-to-Speech in your browser when paired with the API key you created. Go to the Chrome web store to find and install the extension. Before installing it, review and understand the permissions the extension requests, a good practice for any software you install.
Retrieve your API key
Go back to the Google Cloud console. Open the main menu. Select "APIs & Services" and choose "Credentials" from the context menu. Copy the API key you created. You can now close the Google Cloud console.
Add your Google Text-to-Speech API key to the Chrome extension
Open the WaveNet for Chrome extension and paste your API key into it. With this, you've completed the setup process. You won't need to repeat it unless you change or delete your API key. You can now use the WaveNet for Chrome extension to read text aloud and download it as an MP3 file.
Choose a Text-to-Speech voice
Visit the Google Cloud Text-to-Speech home page. Enter some sentences. Try different WaveNet voices. Play with the pitch and speed sliders until you find a voice you like. Then, open the WaveNet for Chrome extension again, and adjust your "Voice Settings" to match.
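The speed and pitch sliders most likely correspond to the `speakingRate` and `pitch` fields of the API's `audioConfig`. If you want to write down the values you settle on, here's an optional Python sketch that records them and rejects out-of-range numbers. The ranges (0.25 to 4.0 for speaking rate, -20.0 to 20.0 semitones for pitch) are from Google's documentation at the time of writing, and the helper name is hypothetical.

```python
def audio_config(speaking_rate: float = 1.0, pitch: float = 0.0) -> dict:
    """Return an MP3 audioConfig, rejecting out-of-range slider values."""
    # Ranges as documented for the Text-to-Speech AudioConfig at the time of writing.
    if not 0.25 <= speaking_rate <= 4.0:
        raise ValueError("speaking_rate must be between 0.25 and 4.0")
    if not -20.0 <= pitch <= 20.0:
        raise ValueError("pitch must be between -20.0 and 20.0 semitones")
    return {"audioEncoding": "MP3", "speakingRate": speaking_rate, "pitch": pitch}


# A slightly slower, slightly lower voice.
print(audio_config(speaking_rate=0.9, pitch=-2.0))
# {'audioEncoding': 'MP3', 'speakingRate': 0.9, 'pitch': -2.0}
```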
Design your new publishing workflow
Workflows are a matter of personal preference. Here, we'll tell you what we do. You can adapt elements of our workflow to your own way of working.
Write your blog post or article
There's nothing special about this step. Don't change how you write to make it sound better when read by Google Text-to-Speech; you don't want to sacrifice the quality of one medium for another. Later, you'll edit a separate copy of your text so it sounds better when Google WaveNet reads it.
After you've written your article, and before proceeding to the next step, proofread your work. Spelling and grammar errors are easier to fix when you catch them early than when you find them later in the process.
Create a plain text file
Copy the text from your article. Then, paste it as plain text into a new text document. Google Text-to-Speech will read from this file. This is where you'll make minor edits so your writing sounds good when Google reads it back to you. Save your changes along the way.
Prepare your article for Google Text-to-Speech
Add punctuation to each heading
You need all the headings in your article to read like sentences. Each one has to end with a period, question mark, or exclamation point. The same is true for your article title and the author's name.
Depending on how you write your content, you might choose to delete some or all of your headings. You can decide after you hear Google WaveNet read your article to you.
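Adding the punctuation by hand is quick, but if you have many headings, a tiny script can do it. Here's an optional Python sketch; it assumes you give it only the lines you know are headings, titles, or bylines, since it can't tell a heading from any other line. The function name is hypothetical.

```python
TERMINAL = (".", "?", "!")


def punctuate(line: str) -> str:
    """Make a heading, title, or byline end like a sentence."""
    stripped = line.rstrip()
    # Leave lines that already end with terminal punctuation alone.
    if stripped and not stripped.endswith(TERMINAL):
        return stripped + "."
    return stripped


print(punctuate("Why you should add audio to your blog posts"))
# Why you should add audio to your blog posts.
print(punctuate("Panicked, got stuck, or changed your mind?"))
# Panicked, got stuck, or changed your mind?
```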
Replace colons with periods
It's common to use a colon before starting a bulleted or numbered list. We've done it in this article. Google Text-to-Speech doesn't pause when it reads colons. Do a find-and-replace. Change any colons to periods.
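One caution: a blind find-and-replace also changes times like 10:30 and web addresses like https://. Here's an optional Python sketch that only replaces colons followed by whitespace or the end of the text. That rule is our assumption about how colons appear in prose, so skim the result for edge cases. The function name is hypothetical.

```python
import re


def fix_colons(text: str) -> str:
    """Turn colons that end a clause into periods so WaveNet pauses there."""
    # Only colons followed by whitespace or the end of the text are replaced,
    # so times such as 10:30 and URLs stay intact.
    return re.sub(r":(?=\s|$)", ".", text)


print(fix_colons("You'll need four tools:\nA podcast host."))
```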
Organize list items
Google Text-to-Speech doesn't read bullets. Put each list item on its own line. As you did for your headings, add a period to the end of each item in a list. The WaveNet voice will pause between items, which makes the reading sound more natural and easier to understand.
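If your lists come across with their bullet or number markers still attached, here's an optional Python sketch that removes a leading "-", "*", "•", or "1." style marker and ends the item with a period. It assumes you pass it only list lines, and the function name is hypothetical.

```python
import re

# A leading dash, asterisk, bullet character (U+2022), or "1." / "1)" marker.
BULLET = re.compile(r"^\s*(?:[-*\u2022]|\d+[.)])\s+")


def clean_list_item(line: str) -> str:
    """Drop a leading bullet or number and end the item with a period."""
    item = BULLET.sub("", line).rstrip()
    if item and not item.endswith((".", "?", "!")):
        item += "."
    return item


items = ["- A podcast host", "- The Google Chrome web browser"]
print("\n".join(clean_list_item(item) for item in items))
```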
Delete unnecessary text
Some text may work in your written article, but not in its audio counterpart. You'll probably have a sentence or two like that in every article you write. Delete the text that doesn't add value. Your audio article doesn't need to be a carbon copy of the written one. Once you've finished, save your work as a plain text document.
Convert your article to speech and save it as MP3 files
Open your text document in Google Chrome. Highlight a large section of your text, up to 5,000 characters long. Make sure your selection ends at the end of a sentence, not in the middle of one.
Then, right-click the selected text. You'll see the WaveNet for Chrome extension in the context menu. Choose "Download as MP3." Repeat this process until you reach the end of your article.
Depending on how long your article is, you may have a handful of files. This is fine.
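Finding break points under 5,000 characters that also land on a sentence ending is the fiddliest part of this step. If you'd rather have a script find them, here's an optional Python sketch that splits your plain text into pieces of at most 5,000 characters, breaking only after a period, question mark, or exclamation point. It treats any such mark followed by whitespace as a sentence end, so abbreviations like "e.g." may also become break points. The function and file names are hypothetical.

```python
import re

LIMIT = 5_000  # maximum characters per selection, as described above


def chunk_text(text: str, limit: int = LIMIT) -> list[str]:
    """Split text into pieces of at most `limit` characters, breaking only after a sentence."""
    # Split right after ., ?, or ! when whitespace follows. The whitespace stays
    # with the next piece, so paragraph breaks inside a chunk survive.
    sentences = re.split(r"(?<=[.!?])(?=\s)", text)
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if len(sentence.strip()) > limit:
            raise ValueError("A single sentence is longer than the limit.")
        if current and len(current) + len(sentence) > limit:
            # Adding this sentence would overflow, so close the current chunk.
            chunks.append(current.rstrip())
            current = sentence.lstrip()
        else:
            current = current + sentence if current else sentence.lstrip()
    if current.strip():
        chunks.append(current.rstrip())
    return chunks
```

For example, `chunk_text(open("article.txt", encoding="utf-8").read())` returns a list of pieces you could paste into Chrome one at a time, each ending on a complete sentence.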
Listen to the audio
Listen to each MP3 created by Google Text-to-Speech. You might want to make changes to your text file and download a new MP3 of that selection. Importantly, hearing your own words read back to you is an efficient way to discover errors. If you find mistakes in your writing, fix them in your written article as well as your text file, catching them before you publish.
Stitch your audio files together
You can use GarageBand, Logic, Audacity, or another audio editor for this step. Open your MP3 files in your editor of choice. Place the tracks end to end, in order. Then, export them as a single track.
Publish your podcast
Upload your audio blog post to your podcast host. Each audio post should be an episode of a podcast. Keep your blog episodes separate from any other podcast episodes; in other words, your blog needs to be its own podcast. That's less confusing for your readers and listeners, and it makes it easier for your listeners to refer to the written source material, where they may want to look at screenshots or embedded video. You can see an example of this on our podcast mini-site.
Publish your audio posts as YouTube videos
Some podcast providers will automatically create and publish YouTube videos of your podcast whenever you publish a new episode. This is great. Create a separate playlist for your blog posts, and let your provider publish them on YouTube. You can look at the LEAP WORKS channel for an example.
Embed your podcast player
Most podcast hosts provide an embeddable player. Embed your audio episode in your written blog post. Your audience can then choose to read your content or listen to it on the page. They might even listen while looking at the screenshots and images in your post. When someone listens to your blog audio on your written post's page, your site and your podcast each get credit for the traffic.
Once you've embedded your player, you can publish your blog post. Nice work! You improved the quality of your content and made it more accessible at the same time.