How to Implement Streaming Text-to-Speech in Node.js


Learn how to add streaming text-to-speech (TTS) to your Node.js application with on-device voice generation and no cloud APIs. Solutions like Amazon Polly, Google Cloud Text-to-Speech, Azure Speech, OpenAI TTS, ElevenLabs, and Cartesia are popular, but they are all cloud-based, which introduces latency, privacy risks, and reliability issues for enterprise applications.

Orca Streaming Text-to-Speech synthesizes speech locally with zero network dependency while maintaining natural-sounding voice output. This makes it a good fit for accessibility tools, voice assistants, content readers, and real-time applications that need to read text aloud.

Challenge: Building voice-enabled applications typically requires choosing between cloud TTS APIs (with latency and privacy concerns) or low-quality local synthesis.

Solution: Orca Streaming Text-to-Speech provides cloud-quality text-to-speech entirely on-device, with streaming synthesis for real-time applications.

See our open-source TTS latency benchmark comparing Orca against Amazon Polly, Google Cloud TTS, Azure Speech, OpenAI, and ElevenLabs. Orca achieves 130 ms first-token-to-speech latency, which is 6.5x faster than ElevenLabs and up to 16x faster than other cloud solutions.

This tutorial shows you how to implement text-to-speech in Node.js with the Orca Streaming Text-to-Speech Node.js SDK. This cross-platform solution runs across Windows, macOS, Linux, and Raspberry Pi—converting text into natural-sounding speech for real-time applications.

Step-by-Step Guide: Streaming TTS in Node.js

Prerequisites

Download Node.js (v18 or newer)
Sign up for a Picovoice Console account and copy your AccessKey
Ensure your machine has a working audio output device (speaker or headphones)

1. Install Node.js Packages

Install the Node.js packages for Orca Streaming Text-to-Speech and PvSpeaker:

npm install @picovoice/orca-node @picovoice/pvspeaker-node

2. Initialize Streaming TTS Engine & Audio Playback Library

Initialize Orca Streaming Text-to-Speech with your AccessKey.

Initialize PvSpeaker with the appropriate sample rate (orca.sampleRate) and a 16-bit depth so it can correctly play audio generated by Orca.

const { Orca } = require("@picovoice/orca-node");
const { PvSpeaker } = require("@picovoice/pvspeaker-node");

async function main() {
  const orca = new Orca("${ACCESS_KEY}"); // AccessKey from Picovoice Console
  const stream = orca.streamOpen();

  const bitsPerSample = 16;
  const speaker = new PvSpeaker(orca.sampleRate, bitsPerSample);
  speaker.start();
}

3. Synthesize Speech from a Text Stream

Feed text into the OrcaStream object chunk by chunk (e.g. token by token). OrcaStream buffers incoming text until there is enough context to generate audio. If there is not yet enough text, synthesize returns null; otherwise it returns a chunk of PCM audio.

let pcmBuffer = [];

for (const token of textStream) {
  const pcm = stream.synthesize(token);
  if (pcm !== null) {
    pcmBuffer.push(...pcm);
  }
}

const flushedPcm = stream.flush();
if (flushedPcm !== null) {
  pcmBuffer.push(...flushedPcm);
}

After all text has been passed to synthesize, call flush to generate any remaining PCM audio. Add all generated PCM to a buffer to prepare for audio playback.

The text stream could be from any streaming text source, such as a large language model (LLM) response.
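If no LLM is wired up yet, a stand-in text stream can help while testing the loop above. The following is a minimal sketch: simulatedTokenStream is a hypothetical helper name, not part of the Orca SDK, and it simply splits a string into word-sized chunks the way the complete demo later in this tutorial does.

```javascript
// Hypothetical stand-in for an LLM token stream (not part of any SDK).
// Yields the text one word at a time; each chunk keeps its trailing
// whitespace so word boundaries survive when the chunks are concatenated.
// Leading whitespace in the input is skipped by the pattern.
function* simulatedTokenStream(text) {
  const chunks = text.match(/\S+\s*/g) || [];
  for (const chunk of chunks) {
    yield chunk;
  }
}

// Print each chunk as it would be handed to stream.synthesize().
for (const token of simulatedTokenStream("Streaming text, one chunk at a time.")) {
  console.log(JSON.stringify(token));
}
```

Because a generator is iterable, it can be used directly as the textStream in the for...of loop shown in this step.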

4. Play Synthesized Speech

Pass the PCM buffer to PvSpeaker for playback. PvSpeaker.write may not accept the entire buffer at once, so it returns the number of samples actually written. Use this value to drop the written samples from the front of the buffer.

while (pcmBuffer.length > 0) {
  const arrayBuffer = new Int16Array(pcmBuffer).buffer;
  const written = speaker.write(arrayBuffer);
  pcmBuffer = pcmBuffer.slice(written);
}

speaker.flush();

Once all PCM has been written, call flush. This blocks the thread until all PCM has been played.

We've separated PCM generation from playback for simplicity, but in production they should run simultaneously so audio can begin playing without having to wait for all audio to be generated first.
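One way to interleave the two steps is sketched below. It assumes stream behaves like the OrcaStream used above (synthesize and flush return PCM samples or null) and speaker behaves like a started PvSpeaker (write returns the number of samples it accepted). The helper name speakWhileStreaming is hypothetical, and this is an illustration rather than the SDK's recommended implementation.

```javascript
// Sketch: start playback while text is still arriving.
// `stream` is assumed to expose synthesize(text) and flush(), each returning
// PCM samples or null; `speaker` is assumed to expose write(ArrayBuffer),
// returning the number of samples accepted, and flush().
function speakWhileStreaming(textStream, stream, speaker) {
  let pending = []; // samples generated but not yet accepted by the speaker

  // Offer the pending samples to the speaker once and keep what it did not take.
  const drainOnce = () => {
    if (pending.length === 0) return;
    const written = speaker.write(new Int16Array(pending).buffer);
    pending = pending.slice(written);
  };

  for (const token of textStream) {
    const pcm = stream.synthesize(token);
    if (pcm !== null) {
      for (const sample of pcm) pending.push(sample);
    }
    drainOnce(); // play what we have before synthesizing the next token
  }

  // Generate audio for any text still buffered inside the stream.
  const tail = stream.flush();
  if (tail !== null) {
    for (const sample of tail) pending.push(sample);
  }

  // Hand over the remaining samples, then wait for playback to finish.
  while (pending.length > 0) {
    drainOnce();
  }
  speaker.flush();
}
```

Calling speakWhileStreaming(textStream, stream, speaker) would replace the separate loops from steps 3 and 4. Samples are appended with a plain loop instead of push(...pcm) so that a very large chunk cannot exceed the engine's argument limit.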

5. Clean Up

When finished, stop the audio output device, close the OrcaStream object, and release all resources to free memory.

speaker.stop();
stream.close();

speaker.release();
orca.release();

Complete Demo: Real-Time TTS in Node.js

Here is a complete example that synthesizes speech from user input and plays it to your audio output device.

const { Orca } = require("@picovoice/orca-node");
const { PvSpeaker } = require("@picovoice/pvspeaker-node");
const readline = require("readline");

async function main() {
  let orca = null;
  let speaker = null;

  try {
    orca = new Orca("${ACCESS_KEY}");
    const stream = orca.streamOpen();

    const bitsPerSample = 16;
    speaker = new PvSpeaker(orca.sampleRate, bitsPerSample);
    speaker.start();

    const rl = readline.createInterface({
      input: process.stdin,
      output: process.stdout,
    });

    console.log("Streaming TTS ready. Type something and press ENTER.");

    const userInput = await new Promise((resolve) => {
      rl.question("> ", resolve);
    });

    rl.close();

    const words = userInput.match(/\S+\s*/g) || [];

    console.log("Streaming speech...");

    let pcmBuffer = [];
    for (const word of words) {
      process.stdout.write(word);

      const pcm = stream.synthesize(word);
      if (pcm !== null) {
        pcmBuffer.push(...pcm);
      }

      if (pcmBuffer.length > 0) {
        const arrayBuffer = new Int16Array(pcmBuffer).buffer;
        const written = speaker.write(arrayBuffer);
        pcmBuffer = pcmBuffer.slice(written);
      }
    }
    process.stdout.write("\n");

    const flushedPcm = stream.flush();
    if (flushedPcm !== null) {
      pcmBuffer.push(...flushedPcm);
    }

    const arrayBuffer = new Int16Array(pcmBuffer).buffer;
    speaker.flush(arrayBuffer);

    speaker.stop();
    stream.close();
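  } catch (err) {
    console.error("Error:", err);
    process.exitCode = 1; // report the failure to the calling shell
  } finally {
    console.log("Releasing resources...");
    // Either object is still null if its constructor threw.
    if (speaker !== null) {
      speaker.release();
    }
    if (orca !== null) {
      orca.release();
    }
    console.log("Done.");
    process.exit(); // exits with process.exitCode (0 unless set above)
  }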
}

main();

This demo uses the @picovoice/orca-node and @picovoice/pvspeaker-node packages.

For a more detailed guide, refer to the documentation for these packages.

For a complete demo application, check out the Orca Streaming Text-to-Speech Node.js Demo on GitHub.

Troubleshooting Common Issues

No Audio Output

Check that your system audio output is working and that your application has permission to use it.
Make sure PvSpeaker is properly initialized and started using start(), and that write() is called with valid PCM frames.

Speech Cuts Off or Stutters

Provide text to Orca as soon as it is available to ensure smooth streaming.
If your text streamer has a low token-per-second (TPS) rate, buffer a few seconds of text before sending it to Orca.
Use an intermediate audio buffer rather than streaming PCM directly from Orca to PvSpeaker, so you can handle cases where write() receives more audio than its internal buffer can fit.
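The intermediate audio buffer from the last tip could look something like the sketch below. PcmQueue is a hypothetical helper, not part of the Orca or PvSpeaker SDKs. It assumes PCM chunks arrive as Int16Array and that the write callback takes an ArrayBuffer and returns the number of samples it accepted, as PvSpeaker.write is described in this tutorial.

```javascript
// Hypothetical intermediate PCM queue (not part of either SDK).
class PcmQueue {
  constructor() {
    this.chunks = []; // Int16Array chunks waiting to be played
    this.length = 0; // total number of queued samples
  }

  // Queue a chunk produced by synthesize() or flush(); null chunks are ignored.
  push(pcm) {
    if (pcm === null || pcm.length === 0) return;
    this.chunks.push(pcm);
    this.length += pcm.length;
  }

  // Offer the oldest chunk to `write` and keep whatever it did not accept.
  // Returns the number of samples that `write` reported as written.
  drain(write) {
    if (this.chunks.length === 0) return 0;
    const head = this.chunks[0];
    // slice() copies, so the ArrayBuffer holds exactly this chunk's samples.
    const written = write(head.slice().buffer);
    if (written >= head.length) {
      this.chunks.shift();
    } else if (written > 0) {
      this.chunks[0] = head.subarray(written); // view over the unplayed rest
    }
    this.length -= Math.min(written, head.length);
    return written;
  }
}
```

In the playback loop, each result of synthesize would go through queue.push(pcm), followed by queue.drain((buf) => speaker.write(buf)). Keeping chunks as typed arrays avoids rebuilding one large array on every write.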

You can configure the size of PvSpeaker's internal buffer with the bufferSizeSecs parameter, though a greater buffer size will require more memory. The default is 20 seconds.
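To get a feel for the memory trade-off, the raw PCM held by a buffer of a given duration can be estimated as sample rate × bytes per sample × seconds. The sketch below assumes single-channel audio and uses 22050 Hz purely as an illustrative sample rate (read orca.sampleRate for the real value). speakerBufferBytes is a hypothetical helper, and the actual memory PvSpeaker allocates may differ.

```javascript
// Estimate the raw PCM size of a playback buffer (single channel assumed).
function speakerBufferBytes(sampleRate, bitsPerSample, bufferSizeSecs) {
  return sampleRate * (bitsPerSample / 8) * bufferSizeSecs;
}

// 22050 Hz is an assumed sample rate for illustration only.
const bytes = speakerBufferBytes(22050, 16, 20);
console.log(`${bytes} bytes for a 20-second buffer`);
```

Under these assumptions, a 20-second buffer of 16-bit audio is 22050 × 2 × 20 = 882,000 bytes, and the estimate scales linearly with bufferSizeSecs.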

Next Steps For Your Enterprise Voice Solution

Enhance your enterprise voice application by integrating additional speech recognition technology:

Realtime Transcription & On-Device LLMs: Integrate with Cheetah Streaming Speech-to-Text and picoLLM On-Device LLM to build conversational AI assistants.
Speech Detection: Use Cobra Voice Activity Detection to automatically pause TTS when the user begins speaking.
Custom Keyword Spotting: Add Porcupine Wake Word Detection to activate TTS hands-free via a custom wake phrase.
Custom Voice Commands: Implement Rhino Speech-to-Intent to build applications that interpret and respond intelligently to user commands.
