Text-to-speech (TTS) converts written text to synthesized speech, enabling various voice interfaces and applications such as virtual assistants, audiobooks, and accessibility tools.
In our previous post, we explored on-device TTS solutions that synthesize speech directly on a user's device.
This post compares top cloud-based TTS systems, which process text input in the cloud and transmit the audio output back to users' devices. For use cases where internet connectivity and privacy are not a concern, cloud-based TTS systems offer higher-quality voices across many languages and accents.
We will explore the TTS Python APIs of three major tech companies: Google Cloud Text-to-Speech, Amazon Polly, and Microsoft Azure Text-to-Speech.
Google Cloud Text-to-Speech
Google Cloud Text-to-Speech leverages neural network models created by
DeepMind and supports hundreds of voices across languages, dialects, and accents.
To get started, sign up for a Google Cloud Platform account, create a new project, enable the Cloud Text-to-Speech API for it, and set up your credentials.
Refer to the documentation for more details.
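Once the google-cloud-texttospeech package from the steps below is installed, the client looks up credentials through Application Default Credentials. The sketch below assumes you authenticate with a service-account key saved as a JSON file; the key path and the make_tts_client helper are hypothetical placeholders, not part of the original post.

```python
from typing import Optional

# Hypothetical placeholder: point this at your own service-account key file.
KEY_PATH = "path/to/service-account-key.json"


def make_tts_client(key_path: Optional[str] = None):
    """Create a Google Cloud TTS client (hypothetical helper)."""
    # Imported here so this file can be loaded even where the SDK is not installed.
    from google.cloud import texttospeech

    if key_path is not None:
        # Read credentials directly from an explicit service-account key file.
        return texttospeech.TextToSpeechClient.from_service_account_file(key_path)
    # Otherwise fall back to Application Default Credentials, e.g. the
    # GOOGLE_APPLICATION_CREDENTIALS environment variable or
    # `gcloud auth application-default login`.
    return texttospeech.TextToSpeechClient()


# Example: client = make_tts_client(KEY_PATH)
```

Relying on Application Default Credentials (calling make_tts_client() with no argument) keeps key paths out of your code, which is usually preferable outside quick experiments.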
To synthesize speech, follow these steps:
- Install the Google TTS Python library (google-cloud-texttospeech).
- Import the library and create a client.
- Write a function that synthesizes speech from text.
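Putting the three steps together might look like the sketch below. It assumes the google-cloud-texttospeech package and working credentials; the function name synthesize_text, the example voice en-US-Wavenet-D, and the MP3 output path are illustrative choices rather than the post's original code.

```python
# Step 1 (shell): pip install google-cloud-texttospeech


def synthesize_text(text: str, output_path: str = "output.mp3",
                    language_code: str = "en-US",
                    voice_name: str = "en-US-Wavenet-D") -> str:
    """Synthesize `text` with Google Cloud TTS and save it as an MP3 file."""
    # Step 2: import the library and create a client (uses Application Default Credentials).
    # Imported here so this file can be loaded even where the SDK is not installed.
    from google.cloud import texttospeech

    client = texttospeech.TextToSpeechClient()

    # Step 3: describe the input text, the voice, and the audio format, then call the API.
    synthesis_input = texttospeech.SynthesisInput(text=text)
    voice = texttospeech.VoiceSelectionParams(
        language_code=language_code, name=voice_name
    )
    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    )
    response = client.synthesize_speech(
        input=synthesis_input, voice=voice, audio_config=audio_config
    )

    # response.audio_content holds the encoded MP3 bytes.
    with open(output_path, "wb") as out:
        out.write(response.audio_content)
    return output_path


# Example: synthesize_text("Hello from Google Cloud Text-to-Speech!")
```

Calling the function as in the final comment should write output.mp3 to the working directory. Passing a voice name pins a specific voice; the language code alone would also be a valid voice selection.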
To learn about the available voices and languages, visit Google's documentation.
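Voices can also be listed programmatically. A minimal sketch, assuming the same google-cloud-texttospeech package; list_google_voices is a hypothetical helper name.

```python
from typing import List, Tuple


def list_google_voices(language_code: str = "en-US") -> List[Tuple[str, str, int]]:
    """Return (name, gender, sample rate in Hz) for each Google voice in a language."""
    # Imported here so this file can be loaded even where the SDK is not installed.
    from google.cloud import texttospeech

    client = texttospeech.TextToSpeechClient()
    # list_voices filters by BCP-47 language code; call it without one to list every voice.
    response = client.list_voices(language_code=language_code)
    return [
        (
            voice.name,
            texttospeech.SsmlVoiceGender(voice.ssml_gender).name,
            voice.natural_sample_rate_hertz,
        )
        for voice in response.voices
    ]


# Example: for name, gender, rate in list_google_voices("en-GB"): print(name, gender, rate)
```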
Amazon Polly
Amazon Polly offers two main types of voices: Standard TTS, which uses concatenative synthesis, and Neural TTS, which uses neural networks to produce more natural, human-like voices.
To get started, create an AWS account and set up your credentials, for example as a named profile in your AWS credentials file.
Then follow these steps:
- Install boto3, the AWS SDK for Python, which provides the Polly client.
- Import the library and create a Polly client from a session for your AWS profile. YourProfileName stands for the name of that profile.
- Write a function that synthesizes speech by calling Polly's synthesize_speech operation.
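The Polly steps might be sketched as follows, assuming boto3 and a named profile (YourProfileName is a placeholder). The function name synthesize_polly, the example voice Joanna, and the optional region_name parameter are illustrative assumptions; if region_name is left as None, the region must come from the profile's configuration. The engine argument selects between the Standard and Neural offerings described above.

```python
from typing import Optional

# Step 1 (shell): pip install boto3


def synthesize_polly(text: str, output_path: str = "output.mp3",
                     voice_id: str = "Joanna", engine: str = "neural",
                     profile_name: str = "YourProfileName",
                     region_name: Optional[str] = None) -> str:
    """Synthesize `text` with Amazon Polly and save the MP3 stream to a file."""
    # Step 2: import boto3 and create a Polly client from a named AWS profile.
    # Imported here so this file can be loaded even where boto3 is not installed.
    import boto3

    session = boto3.Session(profile_name=profile_name, region_name=region_name)
    polly = session.client("polly")

    # Step 3: request speech. engine="standard" selects concatenative voices and
    # engine="neural" neural ones; not every voice supports every engine.
    response = polly.synthesize_speech(
        Text=text, OutputFormat="mp3", VoiceId=voice_id, Engine=engine
    )

    # AudioStream is a streaming body: read it fully and write the bytes out.
    with open(output_path, "wb") as out:
        out.write(response["AudioStream"].read())
    return output_path


# Example: synthesize_polly("Hello from Amazon Polly!", engine="standard")
```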
Check Amazon's documentation for the available voices.
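Polly's describe_voices operation returns the same information programmatically. A sketch assuming boto3 and the placeholder profile from above; list_polly_voices is a hypothetical helper, and the loop follows NextToken in case the results are paginated.

```python
from typing import List, Tuple


def list_polly_voices(language_code: str = "en-US", engine: str = "neural",
                      profile_name: str = "YourProfileName") -> List[Tuple[str, str]]:
    """Return (voice ID, gender) pairs for Polly voices in a language and engine."""
    # Imported here so this file can be loaded even where boto3 is not installed.
    import boto3

    polly = boto3.Session(profile_name=profile_name).client("polly")
    voices: List[Tuple[str, str]] = []
    kwargs = {"LanguageCode": language_code, "Engine": engine}
    while True:
        page = polly.describe_voices(**kwargs)
        voices.extend((v["Id"], v["Gender"]) for v in page["Voices"])
        # NextToken is only present when more results remain.
        token = page.get("NextToken")
        if not token:
            return voices
        kwargs["NextToken"] = token


# Example: print(list_polly_voices("en-GB", engine="standard"))
```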
Microsoft Azure TTS
Microsoft offers text-to-speech as part of its Azure AI Speech service. Like Google and Amazon, it can synthesize speech in a variety of languages, voices, and dialects, and it also places an emphasis on training custom voice models.
To synthesize speech, first sign up for an Azure account and create a Speech resource in the Azure portal. Then follow these steps:
- Install the Azure Speech SDK for Python (azure-cognitiveservices-speech).
- Set the environment variables SPEECH_KEY and SPEECH_REGION to the key and region of the Speech resource you created in the Azure portal.
- Import the library in Python.
- Write a function that synthesizes speech from text.
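A sketch of these steps, assuming the azure-cognitiveservices-speech package and the SPEECH_KEY and SPEECH_REGION environment variables. The function name synthesize_azure, the example voice en-US-JennyNeural, and the WAV output path are illustrative choices.

```python
import os

# Step 1 (shell): pip install azure-cognitiveservices-speech
# Step 2: export SPEECH_KEY and SPEECH_REGION before running.


def synthesize_azure(text: str, output_path: str = "output.wav",
                     voice_name: str = "en-US-JennyNeural") -> str:
    """Synthesize `text` with Azure AI Speech and save it as a WAV file."""
    # Step 3: import the SDK.
    # Imported here so this file can be loaded even where the SDK is not installed.
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(
        subscription=os.environ["SPEECH_KEY"], region=os.environ["SPEECH_REGION"]
    )
    speech_config.speech_synthesis_voice_name = voice_name
    # Write the audio to a file instead of playing it on the default speaker.
    audio_config = speechsdk.audio.AudioOutputConfig(filename=output_path)
    synthesizer = speechsdk.SpeechSynthesizer(
        speech_config=speech_config, audio_config=audio_config
    )

    # speak_text_async returns a future; .get() blocks until synthesis finishes.
    result = synthesizer.speak_text_async(text).get()
    if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
        return output_path
    if result.reason == speechsdk.ResultReason.Canceled:
        details = result.cancellation_details
        raise RuntimeError(f"Synthesis canceled: {details.reason} {details.error_details}")
    raise RuntimeError(f"Unexpected synthesis result: {result.reason}")


# Example: synthesize_azure("Hello from Azure AI Speech!")
```

Raising on cancellation surfaces problems such as a wrong key or region, which would otherwise fail silently and leave an empty or missing file.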
For available languages and voices, check Microsoft's documentation.
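The SDK can also list voices for a locale. A minimal sketch, assuming the same package and environment variables; list_azure_voices is a hypothetical helper built on SpeechSynthesizer.get_voices_async.

```python
import os
from typing import List, Tuple


def list_azure_voices(locale: str = "en-US") -> List[Tuple[str, str]]:
    """Return (short name, locale) pairs for Azure voices in a locale."""
    # Imported here so this file can be loaded even where the SDK is not installed.
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(
        subscription=os.environ["SPEECH_KEY"], region=os.environ["SPEECH_REGION"]
    )
    # No audio output is needed just to query the voice list.
    synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=None)
    result = synthesizer.get_voices_async(locale).get()
    if result.reason != speechsdk.ResultReason.VoicesListRetrieved:
        raise RuntimeError(f"Could not list voices: {result.reason}")
    return [(voice.short_name, voice.locale) for voice in result.voices]


# Example: print(list_azure_voices("de-DE"))
```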
Conclusion
In summary, the top cloud providers offer high-quality TTS services accessible via Python.
Each provides standard voices and higher-quality neural voices at different price points.
Beyond the examples above, the APIs let you adjust speech parameters such as rate, pitch, and speaking style, and they support Speech Synthesis Markup Language (SSML) for fine-grained control over synthesis.
They also support creating custom voices by engaging with the respective sales teams.
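To illustrate the SSML and parameter controls, here is a sketch targeting Google Cloud Text-to-Speech only (Polly and Azure also accept SSML, but their supported tags and attribute values differ). It assumes google-cloud-texttospeech; build_ssml and synthesize_ssml_google are hypothetical helpers. Rate and pitch can be set inside the SSML with a prosody element, or globally through the speaking_rate and pitch fields of AudioConfig.

```python
from xml.sax.saxutils import escape


def build_ssml(text: str, rate: str = "medium", pitch: str = "default") -> str:
    """Wrap plain text in a minimal SSML document with a <prosody> element."""
    # escape() turns &, < and > into entities so arbitrary text stays valid XML.
    # rate and pitch are inserted as-is and are assumed to be trusted values.
    return (f'<speak><prosody rate="{rate}" pitch="{pitch}">'
            f"{escape(text)}</prosody></speak>")


def synthesize_ssml_google(ssml: str, output_path: str = "output.mp3",
                           speaking_rate: float = 1.0, pitch: float = 0.0) -> str:
    """Synthesize SSML with Google Cloud TTS, also applying global rate and pitch."""
    # Imported here so this file can be loaded even where the SDK is not installed.
    from google.cloud import texttospeech

    client = texttospeech.TextToSpeechClient()
    response = client.synthesize_speech(
        input=texttospeech.SynthesisInput(ssml=ssml),
        voice=texttospeech.VoiceSelectionParams(language_code="en-US"),
        audio_config=texttospeech.AudioConfig(
            audio_encoding=texttospeech.AudioEncoding.MP3,
            speaking_rate=speaking_rate,  # 1.0 is the voice's normal speed
            pitch=pitch,  # semitones relative to the voice's default pitch
        ),
    )
    with open(output_path, "wb") as out:
        out.write(response.audio_content)
    return output_path


# Example:
# ssml = build_ssml("Fish & chips, served slowly.", rate="slow", pitch="+2st")
# synthesize_ssml_google(ssml, speaking_rate=0.9)
```

Keeping SSML construction in a separate pure function makes it easy to inspect or unit-test the markup before paying for an API call.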