Text-to-speech (TTS) has become a core feature in modern applications—whether it's used to read outputs from large language models (LLMs) to provide voice responses in conversational AI, narrate content, or power accessibility tools. Traditional TTS solutions often depend on cloud services, which can introduce latency and raise privacy concerns, as data must be sent to external servers for processing.
Orca Streaming Text-to-Speech addresses these challenges by running entirely on-device. With Orca Streaming Text-to-Speech, enterprise .NET applications generate high-quality, human-like speech in real time—without relying on the cloud. This approach not only minimizes delays but also keeps sensitive information private, making it ideal for HIPAA-compliant healthcare apps, financial services, and other privacy-sensitive applications.
In this post, we'll guide you through the process of integrating Orca Streaming Text-to-Speech into your .NET C# app—covering installation, speech synthesis, and audio playback for enterprise-grade deployments.
Use picoLLM as your on-device LLM inference engine to add real-time AI capabilities directly in your .NET app—enabling chat, question answering, and content generation while keeping all data private.
Step-by-Step: Implement Streaming TTS in .NET
Ensure that your environment meets the following .NET requirements:
- Windows (x86_64): .NET Framework 4.6.1+, .NET Standard 2.0+, or .NET Core 3.0+
- macOS (x86_64): .NET Standard 2.0+ or .NET Core 3.0+
- macOS (arm64), Windows (arm64), Linux (x86_64), Raspberry Pi (3, 4, 5): .NET 6.0+
Next, sign up for a Picovoice Console account and copy your AccessKey.
1. Install the NuGet Package
Install the Picovoice.Orca NuGet package:
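For example, using the .NET CLI:

```console
dotnet add package Picovoice.Orca
```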
2. Initialize Orca
Create a new instance of Orca in your application, passing in your AccessKey. Then, create an Orca.OrcaStream object:
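Here's a minimal sketch, assuming the binding lives in the `Pv` namespace and exposes an `Orca.Create` factory and a `StreamOpen()` method; replace the placeholder with your own AccessKey:

```csharp
using Pv;

// Placeholder: AccessKey obtained from the Picovoice Console
const string accessKey = "${ACCESS_KEY}";

// Create the Orca engine, then open a stream for incremental synthesis
Orca orca = Orca.Create(accessKey);
Orca.OrcaStream orcaStream = orca.StreamOpen();
```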
Orca Streaming Text-to-Speech also supports single synthesis, where the entire text is synthesized in one call. If you don't need streaming synthesis, check out the Orca .NET API docs for more details on single synthesis.
3. Synthesize Speech From Text
Orca synthesizes speech from real-time or partial text inputs, like LLM-generated tokens. The audio is then available in buffers for immediate playback.
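A sketch of the synthesis loop, assuming the stream exposes `Synthesize` for partial text and `Flush` for whatever audio remains; `GetNextTextChunks()` and `HandlePcm()` are hypothetical stand-ins for your token source (e.g., an LLM) and your playback buffer:

```csharp
// Feed text to the stream as it arrives; each call may return a PCM chunk
foreach (string textChunk in GetNextTextChunks())   // hypothetical token source
{
    short[] pcm = orcaStream.Synthesize(textChunk);
    if (pcm != null && pcm.Length > 0)
    {
        HandlePcm(pcm);   // hand off to your playback buffer (see below)
    }
}

// Flush any audio still held by the stream once all text has been sent
short[] remainingPcm = orcaStream.Flush();
if (remainingPcm != null && remainingPcm.Length > 0)
{
    HandlePcm(remainingPcm);
}
```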
See the next section for audio playback instructions.
4. Clean Up Resources
Dispose of your Orca instance when you're done to free memory and resources.
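For example, assuming the OrcaStream object is also disposable, release it before the engine:

```csharp
orcaStream.Dispose();
orca.Dispose();
```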
Step-by-Step: Implement Audio Playback in .NET
Handling audio playback in .NET can be complex, especially when dealing with various audio formats, streams, and device compatibility. PvSpeaker simplifies this process by integrating seamlessly with Orca to play back the synthesized speech.
1. Install the NuGet Package
Install the PvSpeaker NuGet package:
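For example, using the .NET CLI:

```console
dotnet add package PvSpeaker
```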
2. Initialize PvSpeaker
Create an instance of PvSpeaker, providing the sample rate and bit depth (bitsPerSample) of the audio you'll be playing:
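A minimal sketch, assuming the constructor parameters are named sampleRate and bitsPerSample as referenced in this step:

```csharp
using Pv;

// Match the speaker to Orca's output: Orca's sample rate and 16-bit samples
PvSpeaker speaker = new PvSpeaker(
    sampleRate: orca.SampleRate,
    bitsPerSample: 16);
```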
You can obtain the sample rate from Orca using orca.SampleRate. Orca's internal bit depth is 16.
3. Start Audio Playback Device
Start the audio playback device:
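For example, assuming a parameterless Start() method:

```csharp
speaker.Start();
```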
4. Write Audio Data For Playback
Write PCM data to the PvSpeaker instance for playback:
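A sketch of a single write, assuming Write accepts raw PCM bytes; the 16-bit samples returned by Orca are converted to a byte array first:

```csharp
// Convert Orca's 16-bit samples into raw bytes for the speaker
byte[] pcmBytes = new byte[pcm.Length * sizeof(short)];
Buffer.BlockCopy(pcm, 0, pcmBytes, 0, pcmBytes.Length);

// Returns the number of samples actually written
int samplesWritten = speaker.Write(pcmBytes);
```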
Important: PvSpeaker may not write all audio frames in one call, depending on its buffer capacity. Each Write call returns the number of samples written, which you can use to manage an audio buffer at the application level.
5. Implement a Thread-Safe Audio Buffer
Implement a buffer to store the synthesized PCM audio data while it waits to be written to the speaker. Here's a custom thread-safe class using a linked list and locking mechanism:
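One possible implementation (class and member names here are illustrative):

```csharp
using System.Collections.Generic;

// A simple thread-safe double-ended queue for buffering PCM chunks
public class ThreadSafeDeque<T>
{
    private readonly LinkedList<T> _list = new LinkedList<T>();
    private readonly object _lock = new object();

    public int Count
    {
        get { lock (_lock) { return _list.Count; } }
    }

    // Producer side: append a chunk synthesized by Orca
    public void AddLast(T item)
    {
        lock (_lock) { _list.AddLast(item); }
    }

    // Consumer side: take the oldest chunk for playback
    public bool TryRemoveFirst(out T item)
    {
        lock (_lock)
        {
            if (_list.Count == 0)
            {
                item = default(T);
                return false;
            }
            item = _list.First.Value;
            _list.RemoveFirst();
            return true;
        }
    }
}
```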
Then use the ThreadSafeDeque to feed buffered audio from the synthesis loop to PvSpeaker for playback:
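A sketch of the playback loop, assuming the synthesis thread enqueues converted byte[] chunks with AddLast and an application-level isSynthesizing flag signals when no more audio is coming:

```csharp
ThreadSafeDeque<byte[]> pcmBuffer = new ThreadSafeDeque<byte[]>();

// Playback loop: drain buffered chunks into PvSpeaker in order
while (isSynthesizing || pcmBuffer.Count > 0)
{
    if (pcmBuffer.TryRemoveFirst(out byte[] chunk))
    {
        int offset = 0;
        while (offset < chunk.Length)
        {
            // Write may accept fewer samples than offered; advance by what was taken
            byte[] remaining = new byte[chunk.Length - offset];
            Buffer.BlockCopy(chunk, offset, remaining, 0, remaining.Length);
            offset += speaker.Write(remaining) * sizeof(short);
        }
    }
    else
    {
        Thread.Sleep(10);   // nothing buffered yet; avoid a busy wait
    }
}
```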
6. Wait For Audio To Finish Playing
After all audio data has been written, call Flush. This will block the thread until all audio has been played.
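For example:

```csharp
speaker.Flush();   // blocks until buffered audio has finished playing
```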
7. Stop & Release Resources
To stop the audio playback device, call Stop(). If you no longer need PvSpeaker, call Dispose() to free memory:
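```csharp
speaker.Stop();
speaker.Dispose();
```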
Complete Demo: Real-Time Voice Output in .NET
Here's a simple .NET console application that demonstrates real-time text-to-speech using Orca Streaming Text-to-Speech and PvSpeaker:
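The sketch below assumes the same API shapes used in the earlier steps (Orca.Create, StreamOpen, Synthesize, Flush, and a byte[]-based PvSpeaker.Write) and reuses the ThreadSafeDeque class from above; treat it as a starting point rather than a drop-in implementation:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Pv;

class Program
{
    // Placeholder: your AccessKey from the Picovoice Console
    private const string AccessKey = "${ACCESS_KEY}";

    private static volatile bool _isSynthesizing = true;

    static void Main()
    {
        // In a real app these chunks could arrive token by token from an LLM
        string[] textChunks =
        {
            "Hello! This is Orca, ",
            "an on-device streaming text-to-speech engine ",
            "running inside a .NET console application."
        };

        using Orca orca = Orca.Create(AccessKey);
        using Orca.OrcaStream orcaStream = orca.StreamOpen();
        using PvSpeaker speaker = new PvSpeaker(sampleRate: orca.SampleRate, bitsPerSample: 16);

        var pcmBuffer = new ThreadSafeDeque<byte[]>();
        speaker.Start();

        // Playback thread: drains the buffer into the speaker in order
        Task playback = Task.Run(() =>
        {
            while (_isSynthesizing || pcmBuffer.Count > 0)
            {
                if (pcmBuffer.TryRemoveFirst(out byte[] chunk))
                {
                    int offset = 0;
                    while (offset < chunk.Length)
                    {
                        byte[] remaining = new byte[chunk.Length - offset];
                        Buffer.BlockCopy(chunk, offset, remaining, 0, remaining.Length);
                        offset += speaker.Write(remaining) * sizeof(short);
                    }
                }
                else
                {
                    Thread.Sleep(10);
                }
            }
        });

        // Synthesis loop: feed text to Orca and enqueue the resulting audio
        foreach (string chunk in textChunks)
        {
            Console.Write(chunk);
            EnqueuePcm(orcaStream.Synthesize(chunk), pcmBuffer);
        }
        EnqueuePcm(orcaStream.Flush(), pcmBuffer);
        _isSynthesizing = false;

        playback.Wait();
        speaker.Flush();   // block until everything written has been played
        speaker.Stop();
        Console.WriteLine();
    }

    private static void EnqueuePcm(short[] pcm, ThreadSafeDeque<byte[]> buffer)
    {
        if (pcm == null || pcm.Length == 0) return;

        byte[] bytes = new byte[pcm.Length * sizeof(short)];
        Buffer.BlockCopy(pcm, 0, bytes, 0, bytes.Length);
        buffer.AddLast(bytes);
    }
}
```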
This is a simplified example but contains all the essential components to demonstrate text-to-speech and audio playback. For a complete .NET demo application, see the Orca .NET demo on GitHub.
This tutorial uses the following packages:
- Picovoice.Orca (Orca Streaming Text-to-Speech)
- PvSpeaker (audio playback)
Explore the documentation for more details:
- Orca Streaming TTS .NET Quick Start
- Orca Streaming TTS .NET API
- PvSpeaker .NET Quick Start
- PvSpeaker .NET API
Best Practices & Tips: Streaming TTS in .NET
- Buffer Management: For real-time performance, stream tokens to Orca Streaming Text-to-Speech as soon as they're produced. Each synthesized PCM chunk should be added to a thread-safe audio buffer managed by your application. This buffer is responsible for feeding PCM data to PvSpeaker in sequence to maintain smooth, uninterrupted playback.
- Threading: To optimize performance, especially in real-time applications, it's essential to handle text-to-speech synthesis and audio playback in separate threads. This prevents blocking your UI and ensures that both tasks run concurrently without impacting responsiveness.







