Text-to-Speech (TTS) technology, also known as Speech Synthesis, converts text into human-like speech. The rise of deep learning has led to major advancements in TTS quality and naturalness, but at the cost of increased computational requirements. Most big tech companies offer cloud-based TTS APIs, like Google Text-to-Speech, Amazon Polly, or Microsoft Text-to-Speech, and new companies with similar offerings have emerged, such as ElevenLabs and Coqui Studio. While convenient, these services require an internet connection, raise privacy concerns, and stop working during network outages. On-device solutions allow for more flexibility and privacy by synthesizing speech directly on the user's device. However, few options exist for on-device TTS. This article explores three open-source Python libraries and Picovoice Orca Text-to-Speech.

PyTTSx3

PyTTSx3 is a Python library that uses the popular eSpeak speech synthesis engine on Linux (NSSpeechSynthesizer on macOS and SAPI5 on Windows). Getting started is straightforward:

  1. Install PyTTSx3:
  1. Save synthesized speech to a file in Python:
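
The two steps above can be sketched as follows. This is a minimal example, assuming the library is installed from PyPI under the package name `pyttsx3`. The helper name `save_speech_pyttsx3` and the output path are illustrative and not part of the library.

```python
# Step 1 (run in a shell, not in Python): pip install pyttsx3


def save_speech_pyttsx3(text, path):
    """Synthesize `text` and write the audio to `path` (hypothetical helper)."""
    # Imported inside the function so this sketch can be loaded even when
    # pyttsx3 is not installed yet.
    import pyttsx3

    engine = pyttsx3.init()          # selects the platform's speech driver
    engine.save_to_file(text, path)  # queues the request; nothing is written yet
    engine.runAndWait()              # processes the queue and writes the file
    return path
```

Calling `save_speech_pyttsx3("Hello from on-device TTS.", "speech.wav")` should then produce the audio file. The actual audio format may depend on the platform driver.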

While PyTTSx3 is simple to use, eSpeak's voice sounds robotic compared to more modern TTS systems.

Coqui TTS

Coqui TTS is the open-source counterpart to Coqui Studio. Developers can use Coqui's pretrained models or train custom voices. To synthesize speech, follow these steps:

  1. Install Coqui TTS:
  1. List available models in Python:
  1. Choose a model name and save synthesized speech to a file:
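
A minimal sketch of these three steps is shown below, assuming the package is installed from PyPI as `TTS` and exposes the `TTS.api.TTS` class. The helper names are hypothetical, the model name is only an example choice, and the exact return type of `list_models()` has varied between releases, so treat the listing as something to print and inspect.

```python
# Step 1 (run in a shell, not in Python): pip install TTS

# Example model name; any entry from the model listing could be used instead.
EXAMPLE_MODEL = "tts_models/en/ljspeech/tacotron2-DDC"


def list_coqui_models():
    """Step 2: return the catalogue of pretrained models."""
    from TTS.api import TTS  # lazy import: the package is large and slow to load

    return TTS().list_models()


def save_speech_coqui(text, path, model_name=EXAMPLE_MODEL):
    """Step 3: load `model_name` and write synthesized speech to `path`."""
    from TTS.api import TTS

    tts = TTS(model_name=model_name)            # downloads the model on first use
    tts.tts_to_file(text=text, file_path=path)  # runs synthesis and saves the audio
    return path
```

A typical session would print `list_coqui_models()`, pick one name from the output, and pass it as `model_name` to `save_speech_coqui`.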

Coqui offers high-quality voices with natural prosody, at the cost of larger model sizes and longer processing times.

Mimic3 from Mycroft

Mycroft is a free and open-source virtual assistant that offers a TTS system called Mimic3. Mimic3 currently lacks a pure Python API, so we will call its command-line tool through Python's subprocess module:

  1. Install Mimic3:
  1. Synthesize speech and save the file to the directory OUTPUT/DIR:
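
The steps above might look like the sketch below. It assumes that Mimic3 is installed from PyPI as `mycroft-mimic3-tts`, that this provides a `mimic3` executable on the PATH, and that the executable accepts an `--output-dir` option. Check `mimic3 --help` on your installation before relying on these details. The helper names are hypothetical.

```python
# Step 1 (run in a shell, not in Python): pip install mycroft-mimic3-tts
import subprocess


def build_mimic3_command(text, output_dir="OUTPUT/DIR"):
    """Build the argument list for the (assumed) mimic3 command-line call."""
    return ["mimic3", "--output-dir", output_dir, text]


def run_mimic3(text, output_dir="OUTPUT/DIR"):
    """Run mimic3 and raise CalledProcessError if it exits with an error."""
    # Passing a list (rather than a shell string) avoids quoting problems
    # when `text` contains spaces or punctuation.
    subprocess.run(build_mimic3_command(text, output_dir), check=True)
    return output_dir
```

Keeping the command construction separate from the `subprocess.run` call makes it easy to inspect or log the exact command before executing it.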

For prototyping on-device TTS, Mimic3 from Mycroft provides a balance of quality and performance.

Orca Text-to-Speech

Picovoice Orca Text-to-Speech uses state-of-the-art TTS models to provide high-quality voices while remaining small and efficient.

  1. Install the Orca Text-to-Speech Python SDK:
  1. Import Orca and create an Orca instance:

Sign up or log in to Picovoice Console, copy your access key, and replace ${ACCESS_KEY} with it.

  1. Synthesize your desired text:
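
Put together, the Orca steps could be sketched as follows, assuming the SDK is installed from PyPI as `pvorca` and that synthesis to a file is done with the instance's `synthesize_to_file` method. The helper name `save_speech_orca` is hypothetical, and `${ACCESS_KEY}` is the placeholder to replace with the key from Picovoice Console.

```python
# Step 1 (run in a shell, not in Python): pip install pvorca


def save_speech_orca(text, output_path, access_key="${ACCESS_KEY}"):
    """Create an Orca instance, synthesize `text`, and save it to `output_path`."""
    # Imported inside the function so this sketch can be loaded even when
    # pvorca is not installed yet.
    import pvorca

    orca = pvorca.create(access_key=access_key)     # step 2: create the engine
    try:
        orca.synthesize_to_file(text, output_path)  # step 3: synthesize to a file
    finally:
        orca.delete()                               # release native resources
    return output_path
```

The `try`/`finally` block releases the engine even when synthesis fails, for example because of an invalid access key or unsupported characters in the text.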

For more information, refer to the Orca Text-to-Speech Python SDK Documentation.

Conclusion

On-device TTS removes privacy concerns and internet requirements, and it minimizes latency. With Python solutions like PyTTSx3, Coqui TTS, and Mimic3, developers have several options for synthesizing speech directly on devices based on their needs. However, each solution comes with drawbacks such as poor voice quality, large resource requirements, or lack of flexible APIs. Another alternative is Orca Text-to-Speech, which combines state-of-the-art neural TTS with efficiency, making it possible to synthesize high-quality speech even on a Raspberry Pi.