How to Add Streaming Speech-to-Text to a .NET App

Speech-to-text (STT) technology has become an essential part of modern applications. From live transcription and meeting notes to hands-free dictation and accessibility tools, converting spoken language into text enables users to interact naturally and efficiently with software.

For C# developers, implementing real-time transcription in .NET can be challenging. You need fast, accurate results with minimal latency—ideally running completely on-device to ensure privacy and reliability.

That's where Cheetah Streaming Speech-to-Text comes in. It provides real-time, offline transcription, so your .NET application can process speech as it's spoken and display text as soon as it is recognized.

In this tutorial, you'll learn how to integrate Cheetah Streaming Speech-to-Text into a .NET C# application—from installing the SDK to capturing audio and receiving live transcription results.

If you don't need real-time transcription, use Leopard Speech-to-Text. Leopard Speech-to-Text is designed for batch transcription and includes additional features such as processing audio files and providing word-level metadata.

Custom Vocabulary & Boost Words

Cheetah Streaming Speech-to-Text's custom vocabulary feature ensures accurate transcription of specialized terminology. Developers can add unique words, adjust pronunciations, and boost recognition of specific phrases, which is ideal for applications in healthcare, finance, manufacturing, or customer support.

You can create and manage custom models using the Picovoice Console. See our guide on creating a custom Cheetah model, or watch the Cheetah Console Tutorial on YouTube.

If customization isn't required, you can use one of the default speech-to-text models.

How to Add Real-time Transcription to a .NET App

Before we start, make sure your environment meets the following .NET requirements:

Windows (x86_64): .NET Framework 4.6.1+, .NET Standard 2.0+, or .NET Core 3.0+
macOS (x86_64): .NET Standard 2.0+ or .NET Core 3.0+
macOS (arm64), Windows (arm64), Linux (x86_64), Raspberry Pi (3, 4, 5): .NET 6.0+
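To check which of the rows above applies to a given machine, a small console program can print the framework and architecture it is running on. This is a minimal sketch, assuming a runtime where System.Runtime.InteropServices.RuntimeInformation is available (.NET Core and later include it); RuntimeReport is a hypothetical helper name, not part of any Picovoice SDK.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper for illustration only.
static class RuntimeReport
{
    // Builds a short, human-readable summary of the current runtime.
    public static string Describe()
    {
        return $"Framework: {RuntimeInformation.FrameworkDescription}{Environment.NewLine}" +
               $"OS: {RuntimeInformation.OSDescription}{Environment.NewLine}" +
               $"Process architecture: {RuntimeInformation.ProcessArchitecture}";
    }
}

class Program
{
    static void Main()
    {
        Console.WriteLine(RuntimeReport.Describe());
    }
}
```

Compare the printed framework and architecture against the list above before installing the SDK.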

1. Get Your AccessKey

Sign up for a free Picovoice Console account and obtain your AccessKey. The AccessKey is only required for authentication and authorization.
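Rather than hard-coding the AccessKey in source, you may prefer to read it from the environment at startup. The sketch below shows one way to do that; the variable name PICOVOICE_ACCESS_KEY and the AccessKeyLoader class are assumptions made for illustration, not something the SDK defines.

```csharp
using System;

// Hypothetical helper for illustration only.
static class AccessKeyLoader
{
    // Assumed variable name; pick whatever fits your deployment.
    public const string VariableName = "PICOVOICE_ACCESS_KEY";

    // Returns the trimmed AccessKey, or throws if the variable is unset or blank.
    public static string Load()
    {
        string key = Environment.GetEnvironmentVariable(VariableName);
        if (string.IsNullOrWhiteSpace(key))
        {
            throw new InvalidOperationException(
                $"Set the {VariableName} environment variable to your Picovoice AccessKey.");
        }
        return key.Trim();
    }
}

class Program
{
    static void Main()
    {
        try
        {
            string accessKey = AccessKeyLoader.Load();
            Console.WriteLine($"Loaded an AccessKey of length {accessKey.Length}.");
        }
        catch (InvalidOperationException e)
        {
            Console.Error.WriteLine(e.Message);
        }
    }
}
```

The returned string can then be passed wherever this tutorial shows the "${ACCESS_KEY}" placeholder.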

2. Install the NuGet Package

Install the Picovoice.Cheetah NuGet package:

dotnet add package Picovoice.Cheetah

3. Initialize Cheetah

Create a new Cheetah instance in your application, providing your AccessKey:

using Pv;

Cheetah cheetah = Cheetah.Create(
    "${ACCESS_KEY}", // AccessKey from Picovoice Console
    modelPath: "${MODEL_FILE_PATH}", // Your custom model (omit to use the default English model)
    enableAutomaticPunctuation: true); // Optional: insert punctuation into the transcript

4. Streaming Audio & Receiving Text

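Pass audio frames to Cheetah one at a time and append the partial transcripts it returns. When Cheetah detects an endpoint (a pause that ends the utterance), call Flush() to retrieve any remaining text.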

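// Returns the next audio frame: cheetah.FrameLength samples of
// 16-bit mono PCM recorded at cheetah.SampleRate.
short[] GetNextAudioFrame()
{
    // ...
}

string transcript = "";

while (true)
{
    CheetahTranscript transcriptObj = cheetah.Process(GetNextAudioFrame());
    transcript += transcriptObj.Transcript;

    // Endpoint detected: flush the text Cheetah is still buffering.
    if (transcriptObj.IsEndpoint)
    {
        CheetahTranscript finalTranscriptObj = cheetah.Flush();
        transcript += finalTranscriptObj.Transcript;
    }
}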

If your application doesn't capture audio yet, refer to Recording Audio in .NET Applications.
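If the audio is already in memory (for example, decoded from a file), GetNextAudioFrame can be as simple as slicing a sample buffer into fixed-size frames. The following is a minimal sketch under that assumption: FrameSource is a hypothetical helper, the samples are assumed to be 16-bit mono PCM at Cheetah's sample rate, and the frame length of 512 is a placeholder for cheetah.FrameLength.

```csharp
using System;

// Hypothetical helper: hands out fixed-size frames from a buffer of 16-bit mono PCM samples.
class FrameSource
{
    private readonly short[] samples;
    private readonly int frameLength;
    private int offset;

    public FrameSource(short[] samples, int frameLength)
    {
        if (frameLength <= 0) throw new ArgumentOutOfRangeException(nameof(frameLength));
        this.samples = samples ?? throw new ArgumentNullException(nameof(samples));
        this.frameLength = frameLength;
    }

    // Returns false once fewer than frameLength samples remain.
    // The trailing partial frame is dropped here; zero-padding it is another option.
    public bool TryGetNextFrame(out short[] frame)
    {
        if (offset + frameLength > samples.Length)
        {
            frame = null;
            return false;
        }
        frame = new short[frameLength];
        Array.Copy(samples, offset, frame, 0, frameLength);
        offset += frameLength;
        return true;
    }
}

class Program
{
    static void Main()
    {
        const int frameLength = 512; // placeholder; use cheetah.FrameLength in real code
        short[] samples = new short[16000]; // silent buffer, for illustration only
        var source = new FrameSource(samples, frameLength);

        int frameCount = 0;
        while (source.TryGetNextFrame(out short[] frame))
        {
            // This is where cheetah.Process(frame) would be called.
            frameCount++;
        }
        Console.WriteLine($"Produced {frameCount} full frames.");
    }
}
```

Because the source ends here, the loop terminates on its own; once the last frame has been processed, a final call to Flush() would collect any text still buffered.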

5. Releasing Resources

Once you no longer need Cheetah, call Dispose() to free acquired memory.

cheetah.Dispose();

Real-time Transcription .NET Demo

Below is a complete C# demo that streams microphone input to Cheetah Streaming Speech-to-Text and prints the transcribed text in real time. Audio capture uses PvRecorder, which is installed as a separate NuGet package:

using System;
using System.Threading;
using Pv;

class CheetahDemo
{
    // Set by the main thread to tell the processing thread to exit its loop.
    private static volatile bool stopRequested;

    static void Main()
    {
        Console.WriteLine("Initializing Cheetah & PvRecorder...");
        Cheetah cheetah = Cheetah.Create(
            "${ACCESS_KEY}", // AccessKey from Picovoice Console
            "${MODEL_FILE_PATH}", // Your custom model (omit to use the default English model)
            enableAutomaticPunctuation: true); // Optional: insert punctuation into the transcript
        PvRecorder recorder = PvRecorder.Create(frameLength: cheetah.FrameLength);

        string transcript = "";
        var processingThread = new Thread(() =>
        {
            while (!stopRequested)
            {
                short[] audioFrame = recorder.Read();
                CheetahTranscript transcriptObj = cheetah.Process(audioFrame);
                transcript += transcriptObj.Transcript;

                if (transcriptObj.IsEndpoint)
                {
                    // End of utterance: collect the remaining text and finish the line.
                    CheetahTranscript finalTranscriptObj = cheetah.Flush();
                    transcript += finalTranscriptObj.Transcript;
                    Console.WriteLine("\r" + transcript);
                    transcript = "";
                }
                else
                {
                    // Redraw the partial transcript on the current line.
                    Console.Write("\r" + transcript);
                }
            }
        });

        recorder.Start();
        processingThread.Start();

        Console.WriteLine("Listening... Press Enter to stop.");
        Console.ReadLine();

        Console.WriteLine("Enter pressed. Stopping...");
        // Let the thread finish its current frame before stopping the recorder,
        // so Read() is never called on a stopped recorder.
        stopRequested = true;
        processingThread.Join();
        recorder.Stop();

        cheetah.Dispose();
        recorder.Dispose();

        Console.WriteLine("Stopped and resources released.");
    }
}

For a complete .NET application, see the Cheetah .NET demo on GitHub.

Best Practices & Developer Tips

Audio format: Record audio as mono, 16-bit PCM, using the sample rate and frame length defined by Cheetah.SampleRate and Cheetah.FrameLength.
Custom models: Create a custom domain-specific model if your app needs to recognize specialized terms or abbreviations.
Threading: Handle audio capture and transcription in a background thread to keep your UI responsive.
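The threading tip can be taken one step further by decoupling capture from transcription with a bounded queue, so a slow transcription step does not stall audio capture. The sketch below illustrates that producer/consumer pattern using BlockingCollection from the .NET base class library. FramePipeline and its delegates are hypothetical stand-ins: in a real app readFrame would wrap recorder.Read() and handleFrame would wrap cheetah.Process(frame). Error handling is deliberately omitted.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical helper for illustration only.
static class FramePipeline
{
    // Runs a producer and a consumer on separate tasks, connected by a bounded queue.
    // Returns the number of frames the consumer handled.
    public static int Run(Func<short[]> readFrame, Action<short[]> handleFrame,
                          int frameCount, int capacity = 50)
    {
        int handled = 0;
        using (var queue = new BlockingCollection<short[]>(capacity))
        {
            Task producer = Task.Run(() =>
            {
                try
                {
                    for (int i = 0; i < frameCount; i++)
                    {
                        queue.Add(readFrame()); // blocks while the queue is full
                    }
                }
                finally
                {
                    queue.CompleteAdding(); // lets the consumer loop end
                }
            });

            Task consumer = Task.Run(() =>
            {
                foreach (short[] frame in queue.GetConsumingEnumerable())
                {
                    handleFrame(frame);
                    handled++;
                }
            });

            Task.WaitAll(producer, consumer);
        }
        return handled;
    }
}

class Program
{
    static void Main()
    {
        const int frameLength = 512; // placeholder; use cheetah.FrameLength in real code
        int handled = FramePipeline.Run(
            readFrame: () => new short[frameLength],  // stand-in for recorder.Read()
            handleFrame: frame => { },                // stand-in for cheetah.Process(frame)
            frameCount: 100);
        Console.WriteLine($"Handled {handled} frames.");
    }
}
```

The queue capacity bounds how far capture can run ahead of transcription; a production version would also need to decide what happens when either side throws.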

Integrate With Other Speech Recognition Engines

Integrate Cheetah Streaming Speech-to-Text with complementary voice technologies to build richer interfaces:

Porcupine Wake Word: activate transcription after a wake phrase.
Rhino Speech-to-Intent: interpret the meaning of spoken commands.

For conversational applications, you can also connect Cheetah with picoLLM On-Device LLM Inference in .NET to stream transcribed text into a fully local language model, enabling hands-free voice assistants without cloud dependencies.
