Text-to-speech (TTS) has become a core feature in modern applications—whether it's used to read outputs from large language models (LLMs) to provide voice responses in conversational AI, narrate content, or power accessibility tools. Traditional TTS solutions often depend on cloud services, which can introduce latency and raise privacy concerns, as data must be sent to external servers for processing.
Orca Streaming Text-to-Speech addresses these challenges by running entirely on-device. With Orca Streaming Text-to-Speech, enterprise .NET applications generate high-quality, human-like speech in real time—without relying on the cloud. This approach not only minimizes delays but also keeps sensitive information private, making it ideal for HIPAA-compliant healthcare apps, financial services, and other privacy-sensitive applications.
In this post, we'll guide you through the process of integrating Orca Streaming Text-to-Speech into your .NET C# app—covering installation, speech synthesis, and audio playback for enterprise-grade deployments.
Use picoLLM as your on-device LLM inference engine to add real-time AI capabilities directly in your .NET app—enabling chat, question answering, and content generation while keeping all data private.
Step-by-Step: Implement Streaming TTS in .NET
Ensure that your environment meets the following .NET requirements:
- Windows (x86_64): .NET Framework 4.6.1+, .NET Standard 2.0+, or .NET Core 3.0+
- macOS (x86_64): .NET Standard 2.0+ or .NET Core 3.0+
- macOS (arm64), Windows (arm64), Linux (x86_64), Raspberry Pi (3, 4, 5): .NET 6.0+
Next, sign up for a Picovoice Console account and copy your AccessKey.
1. Install the NuGet Package
Install the Picovoice.Orca NuGet package:
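For example, using the .NET CLI:

```console
dotnet add package Picovoice.Orca
```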
2. Initialize Orca
Create a new instance of Orca in your application, passing in your AccessKey. Then, create an Orca.OrcaStream object:
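Here's a minimal sketch, assuming the binding lives in the `Pv` namespace and exposes an `Orca.Create` factory and a `StreamOpen()` method; replace the placeholder with your own AccessKey:

```csharp
using Pv;

// Placeholder: AccessKey obtained from the Picovoice Console
const string accessKey = "${ACCESS_KEY}";

// Create the Orca engine, then open a stream for incremental synthesis
Orca orca = Orca.Create(accessKey);
Orca.OrcaStream orcaStream = orca.StreamOpen();
```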
Orca Streaming Text-to-Speech also supports single synthesis, where the entire text is synthesized in one call. If you don't need streaming synthesis, check out the Orca .NET API docs for more details on single synthesis.
3. Synthesize Speech From Text
Orca synthesizes speech from real-time or partial text inputs, like LLM-generated tokens. The audio is then available in buffers for immediate playback.
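A sketch of the synthesis loop, assuming the stream exposes `Synthesize` for partial text and `Flush` for whatever audio remains; `GetNextTextChunks()` and `HandlePcm()` are hypothetical stand-ins for your token source (e.g., an LLM) and your playback buffer:

```csharp
// Feed text to the stream as it arrives; each call may return a PCM chunk
foreach (string textChunk in GetNextTextChunks())   // hypothetical token source
{
    short[] pcm = orcaStream.Synthesize(textChunk);
    if (pcm != null && pcm.Length > 0)
    {
        HandlePcm(pcm);   // hand off to your playback buffer (see below)
    }
}

// Flush any audio still held by the stream once all text has been sent
short[] remainingPcm = orcaStream.Flush();
if (remainingPcm != null && remainingPcm.Length > 0)
{
    HandlePcm(remainingPcm);
}
```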
See the next section for audio playback instructions.
4. Clean Up Resources
Dispose of your Orca instance when you're done to free memory and resources.
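For example, assuming the OrcaStream object is also disposable, release it before the engine:

```csharp
orcaStream.Dispose();
orca.Dispose();
```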
Step-by-Step: Implement Audio Playback in .NET
Handling audio playback in .NET can be complex, especially when dealing with various audio formats, streams, and device compatibility. PvSpeaker simplifies this process by integrating seamlessly with Orca to play back the synthesized speech.
1. Install the NuGet Package
Install the PvSpeaker NuGet package:
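For example, using the .NET CLI:

```console
dotnet add package PvSpeaker
```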
2. Initialize PvSpeaker
Create an instance of PvSpeaker, providing the sample rate and bit depth (bitsPerSample) of the audio you'll be playing:
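A minimal sketch, assuming the constructor parameters are named sampleRate and bitsPerSample as referenced in this step:

```csharp
using Pv;

// Match the speaker to Orca's output: Orca's sample rate and 16-bit samples
PvSpeaker speaker = new PvSpeaker(
    sampleRate: orca.SampleRate,
    bitsPerSample: 16);
```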
You can obtain the sample rate from Orca using orca.SampleRate. Orca's internal bit depth is 16.
3. Start Audio Playback Device
Start the audio playback device:
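For example, assuming a parameterless Start() method:

```csharp
speaker.Start();
```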
4. Write Audio Data For Playback
Write PCM data to the PvSpeaker instance for playback:
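A sketch of a single write, assuming Write accepts raw PCM bytes; the 16-bit samples returned by Orca are converted to a byte array first:

```csharp
// Convert Orca's 16-bit samples into raw bytes for the speaker
byte[] pcmBytes = new byte[pcm.Length * sizeof(short)];
Buffer.BlockCopy(pcm, 0, pcmBytes, 0, pcmBytes.Length);

// Returns the number of samples actually written
int samplesWritten = speaker.Write(pcmBytes);
```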
Important: PvSpeaker may not write all audio frames in one call, depending on its buffer capacity. Each Write call returns the number of samples written, which you can use to manage an audio buffer at the application level.
5. Implement a Thread-Safe Audio Buffer
Implement a buffer to store the synthesized PCM audio data while it waits to be written to the speaker. Here's a custom thread-safe class using a linked list and locking mechanism:
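One possible implementation (class and member names here are illustrative):

```csharp
using System.Collections.Generic;

// A simple thread-safe double-ended queue for buffering PCM chunks
public class ThreadSafeDeque<T>
{
    private readonly LinkedList<T> _list = new LinkedList<T>();
    private readonly object _lock = new object();

    public int Count
    {
        get { lock (_lock) { return _list.Count; } }
    }

    // Producer side: append a chunk synthesized by Orca
    public void AddLast(T item)
    {
        lock (_lock) { _list.AddLast(item); }
    }

    // Consumer side: take the oldest chunk for playback
    public bool TryRemoveFirst(out T item)
    {
        lock (_lock)
        {
            if (_list.Count == 0)
            {
                item = default(T);
                return false;
            }
            item = _list.First.Value;
            _list.RemoveFirst();
            return true;
        }
    }
}
```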
Then use the ThreadSafeDeque to feed buffered audio from the synthesis loop to PvSpeaker for playback:
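A sketch of the playback loop, assuming the synthesis thread enqueues converted byte[] chunks with AddLast and an application-level isSynthesizing flag signals when no more audio is coming:

```csharp
ThreadSafeDeque<byte[]> pcmBuffer = new ThreadSafeDeque<byte[]>();

// Playback loop: drain buffered chunks into PvSpeaker in order
while (isSynthesizing || pcmBuffer.Count > 0)
{
    if (pcmBuffer.TryRemoveFirst(out byte[] chunk))
    {
        int offset = 0;
        while (offset < chunk.Length)
        {
            // Write may accept fewer samples than offered; advance by what was taken
            byte[] remaining = new byte[chunk.Length - offset];
            Buffer.BlockCopy(chunk, offset, remaining, 0, remaining.Length);
            offset += speaker.Write(remaining) * sizeof(short);
        }
    }
    else
    {
        Thread.Sleep(10);   // nothing buffered yet; avoid a busy wait
    }
}
```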
6. Wait For Audio To Finish Playing
After all audio data has been written, call Flush. This will block the thread until all audio has been played.
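For example:

```csharp
speaker.Flush();   // blocks until buffered audio has finished playing
```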
7. Stop & Release Resources
To stop the audio playback device, call Stop(). If you no longer need PvSpeaker, call Dispose() to free memory:
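```csharp
speaker.Stop();
speaker.Dispose();
```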
Complete Demo: Real-Time Voice Output in .NET
Here's a simple .NET console application that demonstrates real-time text-to-speech using Orca Streaming Text-to-Speech and PvSpeaker:
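The sketch below assumes the same API shapes used in the earlier steps (Orca.Create, StreamOpen, Synthesize, Flush, and a byte[]-based PvSpeaker.Write) and reuses the ThreadSafeDeque class from above; treat it as a starting point rather than a drop-in implementation:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Pv;

class Program
{
    // Placeholder: your AccessKey from the Picovoice Console
    private const string AccessKey = "${ACCESS_KEY}";

    private static volatile bool _isSynthesizing = true;

    static void Main()
    {
        // In a real app these chunks could arrive token by token from an LLM
        string[] textChunks =
        {
            "Hello! This is Orca, ",
            "an on-device streaming text-to-speech engine ",
            "running inside a .NET console application."
        };

        using Orca orca = Orca.Create(AccessKey);
        using Orca.OrcaStream orcaStream = orca.StreamOpen();
        using PvSpeaker speaker = new PvSpeaker(sampleRate: orca.SampleRate, bitsPerSample: 16);

        var pcmBuffer = new ThreadSafeDeque<byte[]>();
        speaker.Start();

        // Playback thread: drains the buffer into the speaker in order
        Task playback = Task.Run(() =>
        {
            while (_isSynthesizing || pcmBuffer.Count > 0)
            {
                if (pcmBuffer.TryRemoveFirst(out byte[] chunk))
                {
                    int offset = 0;
                    while (offset < chunk.Length)
                    {
                        byte[] remaining = new byte[chunk.Length - offset];
                        Buffer.BlockCopy(chunk, offset, remaining, 0, remaining.Length);
                        offset += speaker.Write(remaining) * sizeof(short);
                    }
                }
                else
                {
                    Thread.Sleep(10);
                }
            }
        });

        // Synthesis loop: feed text to Orca and enqueue the resulting audio
        foreach (string chunk in textChunks)
        {
            Console.Write(chunk);
            EnqueuePcm(orcaStream.Synthesize(chunk), pcmBuffer);
        }
        EnqueuePcm(orcaStream.Flush(), pcmBuffer);
        _isSynthesizing = false;

        playback.Wait();
        speaker.Flush();   // block until everything written has been played
        speaker.Stop();
        Console.WriteLine();
    }

    private static void EnqueuePcm(short[] pcm, ThreadSafeDeque<byte[]> buffer)
    {
        if (pcm == null || pcm.Length == 0) return;

        byte[] bytes = new byte[pcm.Length * sizeof(short)];
        Buffer.BlockCopy(pcm, 0, bytes, 0, bytes.Length);
        buffer.AddLast(bytes);
    }
}
```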
This is a simplified example but contains all the essential components to demonstrate text-to-speech and audio playback. For a complete .NET demo application, see the Orca .NET demo on GitHub.
This tutorial uses the following packages:
- Picovoice.Orca (Orca Streaming Text-to-Speech)
- PvSpeaker (audio playback)
Explore the documentation for more details:
- Orca Streaming TTS .NET Quick Start
- Orca Streaming TTS .NET API
- PvSpeaker .NET Quick Start
- PvSpeaker .NET API
Best Practices & Tips: Streaming TTS in .NET
- Buffer Management: For real-time performance, stream tokens to Orca Streaming Text-to-Speech as soon as they're produced. Each synthesized PCM chunk should be added to a thread-safe audio buffer managed by your application. This buffer is responsible for feeding PCM data to PvSpeaker in sequence to maintain smooth, uninterrupted playback.
- Threading: To optimize performance, especially in real-time applications, it's essential to handle text-to-speech synthesis and audio playback in separate threads. This prevents blocking your UI and ensures that both tasks run concurrently without impacting responsiveness.







