Learn how to add streaming text-to-speech (TTS) to your Node.js application with on-device voice generation and no cloud APIs. Solutions like Amazon Polly, Google Cloud Text-to-Speech, Azure Speech, OpenAI TTS, ElevenLabs, and Cartesia have become popular, but they are all cloud-based, which introduces latency, privacy risks, and reliability issues for enterprise applications.

Orca Streaming Text-to-Speech synthesizes speech locally with zero network dependency while maintaining natural-sounding voice output. That makes it a good fit for accessibility tools, voice assistants, content readers, and real-time applications that need to read text aloud.

Challenge: Building voice-enabled applications typically requires choosing between cloud TTS APIs (with latency and privacy concerns) or low-quality local synthesis.

Solution: Orca Streaming Text-to-Speech provides cloud-quality text-to-speech entirely on-device, with streaming synthesis for real-time applications.

See our open-source TTS latency benchmark comparing Orca against Amazon Polly, Google Cloud TTS, Azure Speech, OpenAI, and ElevenLabs. Orca achieves 130ms first-token-to-speech latency—6.5x faster than ElevenLabs and up to 16x faster than other cloud solutions.

This tutorial shows you how to implement text-to-speech in Node.js with the Orca Streaming Text-to-Speech Node.js SDK. This cross-platform solution runs across Windows, macOS, Linux, and Raspberry Pi—converting text into natural-sounding speech for real-time applications.

Step-by-Step Guide: Streaming TTS in Node.js

Prerequisites

  1. Download Node.js (v18 or newer)
  2. Sign up for a Picovoice Console account and copy your AccessKey
  3. Ensure your machine has a working audio output device (speaker or headphones)

1. Install Node.js Packages

Install the Node.js packages for Orca Streaming Text-to-Speech and PvSpeaker by running npm install @picovoice/orca-node @picovoice/pvspeaker-node in your project directory.

2. Initialize Streaming TTS Engine & Audio Playback Library

Initialize Orca Streaming Text-to-Speech with your AccessKey.

Initialize PvSpeaker with the appropriate sample rate (orca.sampleRate) and a 16-bit depth so it can correctly play audio generated by Orca.
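A minimal sketch of this step follows. It assumes the constructor shapes new Orca(accessKey) and new PvSpeaker(sampleRate, bitsPerSample), plus orca.streamOpen() for opening a stream. The helper name initTts is hypothetical, and the SDK classes are passed in as arguments so the sketch stays self-contained.

```javascript
// Hypothetical helper: wires Orca and PvSpeaker together.
// `Orca` and `PvSpeaker` are expected to be the classes exported by
// @picovoice/orca-node and @picovoice/pvspeaker-node respectively.
function initTts({ Orca, PvSpeaker }, accessKey) {
  if (!accessKey) {
    throw new Error("An AccessKey from Picovoice Console is required");
  }

  // Create the engine and open a stream for incremental synthesis.
  const orca = new Orca(accessKey);
  const stream = orca.streamOpen();

  // Match the speaker to Orca's output: same sample rate, 16-bit samples.
  const BITS_PER_SAMPLE = 16;
  const speaker = new PvSpeaker(orca.sampleRate, BITS_PER_SAMPLE);
  speaker.start();

  return { orca, stream, speaker };
}

// Example wiring with the real packages (not executed here):
//   const { Orca } = require("@picovoice/orca-node");
//   const { PvSpeaker } = require("@picovoice/pvspeaker-node");
//   const { orca, stream, speaker } = initTts({ Orca, PvSpeaker }, "YOUR_ACCESS_KEY");
```

Starting the speaker during setup means later calls to write can queue audio immediately.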

3. Synthesize Speech from a Text Stream

Open an OrcaStream with orca.streamOpen(), then feed text into it chunk by chunk (e.g. token by token) by calling synthesize. OrcaStream buffers incoming text until there is enough context to generate audio: synthesize returns null while it is still waiting for more text, and PCM audio otherwise.

After all text has been passed to synthesize, call flush to generate any remaining PCM audio. Add all generated PCM to a buffer to prepare for audio playback.

The text stream could be from any streaming text source, such as a large language model (LLM) response.
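The loop described in this step can be sketched as below. The helper synthesizeAll and the exampleTokens generator are hypothetical names. The sketch assumes that synthesize and flush each return an Int16Array of samples, or null when no audio is ready.

```javascript
// Hypothetical helper: feeds text chunks into an open OrcaStream and
// collects every PCM chunk it produces.
// `stream` is assumed to expose synthesize(text) and flush(), each
// returning an Int16Array of samples or null when no audio is ready.
function synthesizeAll(stream, textChunks) {
  const pcmChunks = [];

  for (const chunk of textChunks) {
    const pcm = stream.synthesize(chunk);
    if (pcm !== null) {
      pcmChunks.push(pcm);
    }
  }

  // Generate audio for whatever text is still buffered inside the stream.
  const remaining = stream.flush();
  if (remaining !== null) {
    pcmChunks.push(remaining);
  }

  return pcmChunks;
}

// Stand-in for a streaming text source such as an LLM response.
function* exampleTokens() {
  yield* ["Hello", " from", " a", " streaming", " text", " source."];
}
```

Because textChunks only needs to be iterable, an array of tokens or a generator such as exampleTokens() can be passed in unchanged.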

4. Play Synthesized Speech

Pass the PCM buffer to PvSpeaker for playback. PvSpeaker.write may not write the entire buffer at once, so it returns the number of samples successfully written. Use this value to drop the written samples from the buffer and retry with the remainder.

Once all PCM has been written, call flush on PvSpeaker. This blocks the thread until all PCM has been played.
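A sketch of the write-then-flush pattern follows, under the assumption that write accepts an ArrayBuffer and returns the number of samples it took. The helper name playPcm is hypothetical.

```javascript
// Hypothetical helper: plays a list of Int16Array PCM chunks.
// `speaker` is assumed to expose write(arrayBuffer), returning the number
// of samples it accepted, and flush(), which blocks until playback ends.
function playPcm(speaker, pcmChunks) {
  for (const pcm of pcmChunks) {
    let offset = 0; // samples of this chunk already handed to the speaker

    // write() may accept fewer samples than offered when its internal
    // buffer is nearly full, so keep retrying with the unwritten part.
    while (offset < pcm.length) {
      // slice() copies the unwritten samples into a fresh buffer, so the
      // ArrayBuffer passed to write() holds exactly the remaining audio.
      const remaining = pcm.slice(offset);
      offset += speaker.write(remaining.buffer);
    }
  }

  // Wait for everything queued in the speaker to finish playing.
  speaker.flush();
}
```

Copying with slice rather than taking a view with subarray matters here: the buffer property of a subarray still refers to the whole underlying ArrayBuffer, so already-played samples would be sent again.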

We've separated PCM generation from playback for simplicity, but in production they should run simultaneously so audio can begin playing without having to wait for all audio to be generated first.
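One way to run generation and playback together on a single thread is to offer queued audio to the speaker after every synthesize call. The sketch below assumes the same interfaces as the earlier steps (synthesize and flush returning an Int16Array or null, write taking an ArrayBuffer and returning a sample count). The name streamAndPlay is hypothetical.

```javascript
// Hypothetical helper: synthesizes and plays in the same loop so playback
// can start as soon as the first PCM chunk is available.
function streamAndPlay(stream, speaker, textChunks) {
  const pending = []; // PCM chunks not yet fully handed to the speaker

  // Offer queued audio to the speaker without waiting for free space.
  const drain = () => {
    while (pending.length > 0) {
      const pcm = pending[0];
      const written = speaker.write(pcm.slice().buffer);
      if (written < pcm.length) {
        // Speaker buffer is full: keep the unwritten tail for later.
        pending[0] = pcm.slice(written);
        return;
      }
      pending.shift();
    }
  };

  for (const chunk of textChunks) {
    const pcm = stream.synthesize(chunk);
    if (pcm !== null) pending.push(pcm);
    drain();
  }

  const tail = stream.flush();
  if (tail !== null) pending.push(tail);

  // Push out whatever is left, then block until playback has finished.
  while (pending.length > 0) drain();
  speaker.flush();
}
```

The final loop simply retries until the speaker has room, which is acceptable for a sketch. An application with other work to do would likely yield to the event loop between attempts.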

5. Clean Up

When finished, stop the audio output device with PvSpeaker's stop, close the OrcaStream object with close, and call release on both PvSpeaker and Orca to free memory.
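The cleanup order can be sketched as follows. The helper cleanUp is a hypothetical name, and it assumes the methods stop and release on PvSpeaker, close on OrcaStream, and release on Orca.

```javascript
// Hypothetical helper: releases everything created during setup.
function cleanUp({ orca, stream, speaker }) {
  const steps = [
    () => speaker.stop(),    // stop the audio output device
    () => stream.close(),    // close the OrcaStream
    () => speaker.release(), // free PvSpeaker resources
    () => orca.release(),    // free Orca resources
  ];

  // Run every step even if an earlier one throws, then report the first error.
  let firstError = null;
  for (const step of steps) {
    try {
      step();
    } catch (err) {
      firstError = firstError ?? err;
    }
  }
  if (firstError !== null) throw firstError;
}
```

Calling a helper like this from a finally block keeps native resources from leaking when synthesis or playback fails partway through.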

Complete Demo: Real-Time TTS in Node.js

Here is a complete example that synthesizes speech from user input and plays it to your audio output device.
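The sketch below strings the five steps together. It is an illustration under stated assumptions rather than the official demo: the function name speak, the word-splitting of the input, and the PICOVOICE_ACCESS_KEY environment variable in the closing comment are all hypothetical, and the SDK classes are passed in as arguments so the code stays self-contained.

```javascript
// End-to-end sketch. `Orca` and `PvSpeaker` are expected to be the classes
// exported by @picovoice/orca-node and @picovoice/pvspeaker-node.
function speak({ Orca, PvSpeaker }, accessKey, text) {
  const orca = new Orca(accessKey);
  let stream = null;
  let speaker = null;

  try {
    stream = orca.streamOpen();
    speaker = new PvSpeaker(orca.sampleRate, 16);
    speaker.start();

    // Split the input into word-sized chunks to mimic a token stream.
    const chunks = text.match(/\S+\s*/g) ?? [];

    // Generate PCM for every chunk, then for any text still buffered.
    const pcmChunks = [];
    for (const chunk of chunks) {
      const pcm = stream.synthesize(chunk);
      if (pcm !== null) pcmChunks.push(pcm);
    }
    const tail = stream.flush();
    if (tail !== null) pcmChunks.push(tail);

    // Play the PCM, retrying with the remainder after partial writes.
    for (const pcm of pcmChunks) {
      let offset = 0;
      while (offset < pcm.length) {
        offset += speaker.write(pcm.slice(offset).buffer);
      }
    }
    speaker.flush(); // blocks until playback has finished
    speaker.stop();
  } finally {
    if (stream !== null) stream.close();
    if (speaker !== null) speaker.release();
    orca.release();
  }
}

// Example wiring with the real packages (not executed here):
//   const { Orca } = require("@picovoice/orca-node");
//   const { PvSpeaker } = require("@picovoice/pvspeaker-node");
//   speak({ Orca, PvSpeaker }, process.env.PICOVOICE_ACCESS_KEY,
//         process.argv.slice(2).join(" "));
```

Generation and playback are kept sequential here for readability, so audio starts only after all text has been synthesized.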

This demo uses the following packages: @picovoice/orca-node and @picovoice/pvspeaker-node.

For a more detailed guide, refer to the Orca Streaming Text-to-Speech and PvSpeaker Node.js SDK documentation.

For a complete demo application, check out the Orca Streaming Text-to-Speech Node.js Demo on GitHub.

Troubleshooting Common Issues

No Audio Output

  • Check that your system audio output is working and that your application has permission to use it.
  • Make sure PvSpeaker is properly initialized and started using start(), and that write() is called with valid PCM frames.

Speech Cuts Off or Stutters

  • Provide text to Orca as soon as it is available to ensure smooth streaming.
  • If your text streamer has a low token-per-second (TPS) rate, buffer a few seconds of text before sending it to Orca.
  • Use an intermediate audio buffer rather than streaming PCM directly from Orca to PvSpeaker, so you can handle cases where write() receives more audio than its internal buffer can fit.
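For the slow-token case above, one possible approach is to hold back text until a small head start has accumulated and then pass later tokens straight through. The helper prebufferText is hypothetical, and its character threshold is only a rough stand-in for "a few seconds of text".

```javascript
// Hypothetical helper: holds back text until `minChars` characters have
// arrived, emits them as one piece, then passes later tokens through.
// The threshold is a rough proxy for a few seconds of speech; tune it
// for your token rate.
function* prebufferText(tokens, minChars = 80) {
  let held = "";
  let primed = false;

  for (const token of tokens) {
    if (primed) {
      yield token;
      continue;
    }
    held += token;
    if (held.length >= minChars) {
      primed = true;
      yield held;
      held = "";
    }
  }

  // The source ended before the threshold was reached: emit what we have.
  if (held.length > 0) {
    yield held;
  }
}
```

Each yielded piece would then be passed to synthesize as usual. For an asynchronous token source, the same logic should carry over to an async generator that consumes its input with for await.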

You can configure the size of PvSpeaker's internal buffer with the bufferSizeSecs parameter, though a greater buffer size will require more memory. The default is 20 seconds.
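To get a feel for the memory trade-off, the PCM payload of the buffer can be estimated from the sample rate, bit depth, and duration. This is a back-of-the-envelope sketch that assumes single-channel audio and ignores any internal overhead. The function name pcmBufferBytes is hypothetical, and 22050 Hz is only an example rate.

```javascript
// Rough size in bytes of `bufferSizeSecs` seconds of single-channel PCM.
function pcmBufferBytes(sampleRate, bitsPerSample, bufferSizeSecs) {
  return sampleRate * (bitsPerSample / 8) * bufferSizeSecs;
}

// 22050 Hz is an example rate; read orca.sampleRate in real code.
const defaultBytes = pcmBufferBytes(22050, 16, 20); // 882000 bytes
const largerBytes = pcmBufferBytes(22050, 16, 60);  // 2646000 bytes
console.log(defaultBytes, largerBytes);

// Assumed option shape for a larger buffer (not executed here):
//   new PvSpeaker(orca.sampleRate, 16, { bufferSizeSecs: 60 });
```

At these rates even a 60-second buffer stays under 3 MB, so memory is mainly a concern on constrained devices.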

Next Steps For Your Enterprise Voice Solution

Enhance your enterprise voice application by integrating additional speech recognition technology:
