
Text-to-speech (TTS) has become a core feature in modern applications—whether it's used to read outputs from large language models (LLMs) to provide voice responses in conversational AI, narrate content, or power accessibility tools. Traditional TTS solutions often depend on cloud services, which can introduce latency and raise privacy concerns, as data must be sent to external servers for processing.

Orca Streaming Text-to-Speech addresses these challenges by running entirely on-device. Enterprise .NET applications can use it to generate high-quality, human-like speech in real time without relying on the cloud. Keeping synthesis local minimizes delays and keeps sensitive text on the device, which makes it a strong fit for healthcare apps subject to HIPAA, financial services, and other privacy-sensitive applications.

In this post, we'll guide you through the process of integrating Orca Streaming Text-to-Speech into your .NET C# app—covering installation, speech synthesis, and audio playback for enterprise-grade deployments.

Use picoLLM as your on-device LLM inference engine to add real-time AI capabilities directly in your .NET app—enabling chat, question answering, and content generation while keeping all data private.

Step-by-Step: Implement Streaming TTS in .NET

Ensure that your environment meets the following .NET requirements:

  • Windows (x86_64): .NET Framework 4.6.1+, .NET Standard 2.0+, or .NET Core 3.0+
  • macOS (x86_64): .NET Standard 2.0+ or .NET Core 3.0+
  • macOS (arm64), Windows (arm64), Linux (x86_64), Raspberry Pi (3, 4, 5): .NET 6.0+

Next, sign up for a Picovoice Console account and copy your AccessKey.

1. Install the NuGet Package

Install the Picovoice.Orca NuGet package, for example with the .NET CLI: dotnet add package Picovoice.Orca

2. Initialize Orca

Create an Orca instance in your application by passing in your AccessKey, then open an Orca.OrcaStream from it for streaming synthesis.
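A minimal console sketch of this step, assuming the Pv namespace from the Picovoice.Orca package and a hypothetical PV_ACCESS_KEY environment variable holding your AccessKey (so it is not hard-coded). The stream's Dispose() call assumes Orca.OrcaStream is disposable; confirm this against the Orca .NET API docs.

```csharp
using System;
using Pv;

public static class OrcaInitExample
{
    public static void Main()
    {
        // Hypothetical environment variable holding your Picovoice Console AccessKey.
        string accessKey = Environment.GetEnvironmentVariable("PV_ACCESS_KEY");
        if (string.IsNullOrEmpty(accessKey))
        {
            Console.Error.WriteLine("Set PV_ACCESS_KEY to your Picovoice Console AccessKey.");
            return;
        }

        Orca orca = Orca.Create(accessKey);
        try
        {
            // Open a streaming session for incremental, token-by-token synthesis.
            Orca.OrcaStream stream = orca.StreamOpen();
            try
            {
                // The sample rate is needed later to configure audio playback.
                Console.WriteLine($"Orca sample rate: {orca.SampleRate} Hz");
            }
            finally
            {
                stream.Dispose();
            }
        }
        finally
        {
            orca.Dispose();
        }
    }
}
```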

Orca Streaming Text-to-Speech also supports single synthesis, where the entire text is synthesized in one call. If you don't need streaming synthesis, check out the Orca .NET API docs for more details on single synthesis.

3. Synthesize Speech From Text

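Orca synthesizes speech from real-time or partial text inputs, such as LLM-generated tokens. Pass each chunk to the stream as it arrives; once enough text has accumulated, Orca returns a buffer of PCM audio that is ready for immediate playback. When the input ends, flush the stream to synthesize any text it is still holding.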
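The sketch below feeds a sequence of text chunks to an open stream. It assumes Synthesize(string) returns a short[] of PCM, or null when Orca needs more text before producing audio, and that Flush() returns any remaining audio once input ends. The StreamingSynthesis class and its method name are illustrative, not part of the SDK.

```csharp
using System.Collections.Generic;
using Pv;

public static class StreamingSynthesis
{
    // Feeds text chunks (e.g. LLM tokens) to an open Orca stream and
    // collects every PCM buffer it returns, in order.
    public static List<short[]> SynthesizeChunks(Orca.OrcaStream stream, IEnumerable<string> chunks)
    {
        var buffers = new List<short[]>();
        foreach (string chunk in chunks)
        {
            // Orca may need more text before it can emit audio.
            short[] pcm = stream.Synthesize(chunk);
            if (pcm != null && pcm.Length > 0)
            {
                buffers.Add(pcm);
            }
        }

        // Synthesize whatever text is still buffered inside the stream.
        short[] remaining = stream.Flush();
        if (remaining != null && remaining.Length > 0)
        {
            buffers.Add(remaining);
        }
        return buffers;
    }
}
```

In a real app you would hand each buffer to playback as soon as it is returned rather than collecting them in a list; collecting simply keeps the example short.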

See the next section for audio playback instructions.

4. Clean Up Resources

Dispose of the Orca.OrcaStream and the Orca instance when you're done to free memory and resources.

Step-by-Step: Implement Audio Playback in .NET

Handling audio playback in .NET can be complex, especially when dealing with various audio formats, streams, and device compatibility. PvSpeaker simplifies this by playing raw PCM at the sample rate and bit depth you specify, so it can play Orca's synthesized speech directly.

1. Install the NuGet Package

Install the PvSpeaker NuGet package, for example with the .NET CLI: dotnet add package PvSpeaker

2. Initialize PvSpeaker

Create an instance of PvSpeaker, providing the sample rate and bit depth (bitsPerSample) of the audio you'll be playing.

You can get the sample rate from Orca with orca.SampleRate. Orca outputs 16-bit PCM, so set bitsPerSample to 16.
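A small sketch of matching PvSpeaker to Orca's output, assuming PvSpeaker lives in the Pv namespace and its constructor takes parameters named sampleRate and bitsPerSample. SpeakerSetup is a hypothetical helper.

```csharp
using Pv;

public static class SpeakerSetup
{
    // Orca outputs 16-bit PCM, so playback uses 16 bits per sample.
    public const int OrcaBitsPerSample = 16;

    // Creates a PvSpeaker configured for Orca's output format.
    // The caller is responsible for calling Start() and, later, Dispose().
    public static PvSpeaker CreateFor(Orca orca)
    {
        return new PvSpeaker(sampleRate: orca.SampleRate, bitsPerSample: OrcaBitsPerSample);
    }
}
```

Named arguments are used deliberately: both parameters are integers, so swapping them positionally would compile but play audio at the wrong rate.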

3. Start Audio Playback Device

Call Start() on the PvSpeaker instance to start the audio playback device.

4. Write Audio Data For Playback

Pass PCM data to the PvSpeaker instance's Write method for playback.
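Orca returns samples as short[]. This sketch assumes PvSpeaker.Write takes raw PCM as a byte[] (confirm in the PvSpeaker .NET API reference), so the hypothetical PcmConverter.ToBytes helper copies the 16-bit samples into a byte array without changing their values.

```csharp
using System;

public static class PcmConverter
{
    // Copies 16-bit samples into a byte array (2 bytes per sample) using the
    // platform's native byte order, which is little-endian on x86_64 and arm64.
    public static byte[] ToBytes(short[] pcm)
    {
        var bytes = new byte[pcm.Length * sizeof(short)];
        Buffer.BlockCopy(pcm, 0, bytes, 0, bytes.Length);
        return bytes;
    }
}
```

Under that assumption, a write looks like: int samplesWritten = speaker.Write(PcmConverter.ToBytes(pcm));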

Important: PvSpeaker may not accept all of the audio in a single call if its internal buffer is full. Write returns the number of samples it actually wrote; use that count to keep the unwritten remainder in an application-level buffer and write it again later.
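Because Write reports samples rather than bytes, the unwritten tail has to be computed with the sample width (2 bytes for Orca's 16-bit audio). A hypothetical helper for that, assuming the byte[] input described above:

```csharp
using System;

public static class PartialWrite
{
    // Given the bytes passed to Write and the number of samples it reports
    // as written, returns the bytes that still need to be played (possibly empty).
    public static byte[] Remainder(byte[] pcm, int samplesWritten, int bytesPerSample)
    {
        int bytesWritten = samplesWritten * bytesPerSample;
        if (bytesWritten >= pcm.Length)
        {
            return Array.Empty<byte>();
        }

        var rest = new byte[pcm.Length - bytesWritten];
        Array.Copy(pcm, bytesWritten, rest, 0, rest.Length);
        return rest;
    }
}
```

The returned remainder is what you keep in your application-level buffer and write again on a later attempt.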

5. Implement a Thread-Safe Audio Buffer

Implement a buffer to store the synthesized PCM audio data while it waits to be written to the speaker. A straightforward option is a custom thread-safe class built on a linked list and a lock.

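Then use this buffer (ThreadSafeDeque) to feed audio to PvSpeaker in order, returning any samples that Write did not accept to the front of the buffer so they play next.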
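A minimal sketch of such a buffer and the write pattern. The method names (AddLast, AddFirst, TryRemoveFirst) and the DequePlayback helper are illustrative assumptions, and the write delegate stands in for speaker.Write (assumed to take a byte[] and return samples written). Pushing a remainder back with AddFirst keeps audio in order as long as only one thread consumes from the deque.

```csharp
using System;
using System.Collections.Generic;

// A minimal thread-safe deque: a LinkedList guarded by a single lock.
public class ThreadSafeDeque<T>
{
    private readonly LinkedList<T> _items = new LinkedList<T>();
    private readonly object _lock = new object();

    public void AddLast(T item) { lock (_lock) { _items.AddLast(item); } }

    // Puts an item back at the head, e.g. audio the speaker could not accept yet.
    public void AddFirst(T item) { lock (_lock) { _items.AddFirst(item); } }

    public bool TryRemoveFirst(out T item)
    {
        lock (_lock)
        {
            if (_items.Count == 0) { item = default(T); return false; }
            item = _items.First.Value;
            _items.RemoveFirst();
            return true;
        }
    }

    public int Count { get { lock (_lock) { return _items.Count; } } }
}

public static class DequePlayback
{
    // Writes queued PCM chunks until the deque is empty or the speaker stops
    // accepting data. Any unwritten tail goes back to the front of the deque.
    public static void WritePending(ThreadSafeDeque<byte[]> deque, Func<byte[], int> write, int bytesPerSample)
    {
        byte[] chunk;
        while (deque.TryRemoveFirst(out chunk))
        {
            int bytesWritten = write(chunk) * bytesPerSample;
            if (bytesWritten < chunk.Length)
            {
                var rest = new byte[chunk.Length - bytesWritten];
                Array.Copy(chunk, bytesWritten, rest, 0, rest.Length);
                deque.AddFirst(rest);
                return; // Speaker buffer is full; try again later.
            }
        }
    }
}
```

The synthesis thread calls AddLast for each new chunk, while the playback thread calls WritePending whenever audio is queued.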

6. Wait For Audio To Finish Playing

After all audio data has been written, call Flush(). It blocks the calling thread until all buffered audio has been played.

7. Stop & Release Resources

To stop the audio playback device, call Stop(). If you no longer need PvSpeaker, call Dispose() to free memory.
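Steps 6 and 7 usually run together at the end of playback. A minimal sketch, assuming Flush() can be called without arguments (check the PvSpeaker .NET API reference, which may also let you pass remaining PCM to it); SpeakerShutdown is a hypothetical helper.

```csharp
using Pv;

public static class SpeakerShutdown
{
    // Plays out everything already written, then stops and frees the speaker.
    public static void Finish(PvSpeaker speaker)
    {
        try
        {
            speaker.Flush(); // Blocks until all written audio has played.
            speaker.Stop();  // Stops the audio playback device.
        }
        finally
        {
            speaker.Dispose(); // Frees memory even if Flush or Stop throws.
        }
    }
}
```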

Complete Demo: Real-Time Voice Output in .NET

Here's a simple .NET console application that demonstrates real-time text-to-speech using Orca Streaming Text-to-Speech and PvSpeaker:
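A minimal sketch of such an app, assuming the AccessKey is passed as the first command-line argument, that PvSpeaker.Write accepts a byte[] and returns the number of samples written, and that Orca.OrcaStream and PvSpeaker both expose Dispose(). For brevity it writes audio on the main thread instead of using the thread-safe buffer from step 5, and the token array is a stand-in for streamed LLM output.

```csharp
using System;
using System.Threading;
using Pv;

public static class OrcaSpeakerDemo
{
    public static void Main(string[] args)
    {
        if (args.Length == 0)
        {
            Console.Error.WriteLine("Usage: OrcaSpeakerDemo <AccessKey>");
            return;
        }

        // Stand-in for text streamed from an LLM, one token at a time.
        string[] tokens = { "Hello ", "from ", "Orca ", "running ", "on ", "device." };

        Orca orca = Orca.Create(args[0]);
        Orca.OrcaStream stream = null;
        PvSpeaker speaker = null;
        try
        {
            stream = orca.StreamOpen();

            // Orca outputs 16-bit PCM at orca.SampleRate.
            speaker = new PvSpeaker(sampleRate: orca.SampleRate, bitsPerSample: 16);
            speaker.Start();

            foreach (string token in tokens)
            {
                Play(speaker, stream.Synthesize(token));
            }

            // Synthesize any text Orca is still holding, then play it.
            Play(speaker, stream.Flush());

            // Block until all written audio has played, then stop the device.
            speaker.Flush();
            speaker.Stop();
        }
        finally
        {
            speaker?.Dispose();
            stream?.Dispose();
            orca.Dispose();
        }
    }

    // Writes one PCM buffer, retrying until PvSpeaker has accepted all of it.
    private static void Play(PvSpeaker speaker, short[] pcm)
    {
        if (pcm == null || pcm.Length == 0)
        {
            return; // Orca had no new audio for this chunk.
        }

        var bytes = new byte[pcm.Length * sizeof(short)];
        Buffer.BlockCopy(pcm, 0, bytes, 0, bytes.Length);

        int offset = 0;
        while (offset < bytes.Length)
        {
            var rest = new byte[bytes.Length - offset];
            Array.Copy(bytes, offset, rest, 0, rest.Length);

            // Write reports samples written; convert to bytes to advance.
            int bytesWritten = speaker.Write(rest) * sizeof(short);
            offset += bytesWritten;

            if (bytesWritten == 0)
            {
                Thread.Sleep(10); // Speaker buffer is full; let playback drain it.
            }
        }
    }
}
```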

This is a simplified example but contains all the essential components to demonstrate text-to-speech and audio playback. For a complete .NET demo application, see the Orca .NET demo on GitHub.

This tutorial uses the Picovoice.Orca and PvSpeaker NuGet packages.

Explore the Orca .NET API docs and the PvSpeaker documentation for more details.

Best Practices & Tips: Streaming TTS in .NET

  • Buffer Management: For real-time performance, stream tokens to Orca Streaming Text-to-Speech as soon as they're produced. Each synthesized PCM chunk should be added to a thread-safe audio buffer managed by your application. This buffer is responsible for feeding PCM data to PvSpeaker in sequence to maintain smooth, uninterrupted playback.
  • Threading: To optimize performance, especially in real-time applications, it's essential to handle text-to-speech synthesis and audio playback in separate threads. This prevents blocking your UI and ensures that both tasks run concurrently without impacting responsiveness.
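One way to apply both tips is a producer/consumer pipeline: a synthesis thread pushes PCM into a queue and a playback thread drains it in order. This sketch swaps in .NET's BlockingCollection for the custom deque; the synthesize and play delegates are placeholders for OrcaStream.Synthesize and your PvSpeaker write logic. Run blocks its caller until both threads finish, so a UI app would invoke it from a background thread.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

public static class TtsPipeline
{
    // Runs synthesis and playback on separate threads connected by a
    // thread-safe FIFO queue. A null result from `synthesize` means "no audio yet".
    public static void Run(IEnumerable<string> tokens, Func<string, short[]> synthesize, Action<short[]> play)
    {
        using (var queue = new BlockingCollection<short[]>())
        {
            var producer = new Thread(() =>
            {
                try
                {
                    foreach (string token in tokens)
                    {
                        short[] pcm = synthesize(token);
                        if (pcm != null && pcm.Length > 0) queue.Add(pcm);
                    }
                }
                finally
                {
                    // Lets the consumer's loop end once the queue drains.
                    queue.CompleteAdding();
                }
            });

            var consumer = new Thread(() =>
            {
                // Blocks until a buffer is available; ends after CompleteAdding once empty.
                foreach (short[] pcm in queue.GetConsumingEnumerable()) play(pcm);
            });

            producer.Start();
            consumer.Start();
            producer.Join();
            consumer.Join();
        }
    }
}
```

A fuller version would also call the stream's Flush() after the last token and enqueue that audio before CompleteAdding, and would add error handling for exceptions thrown inside either thread.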