Speech-to-text (STT) technology has become an essential part of modern applications. From live transcription and meeting notes to hands-free dictation and accessibility tools, converting spoken language into text enables users to interact naturally and efficiently with software.
For C# developers, implementing real-time transcription in .NET can be challenging. You need fast, accurate results with minimal latency—ideally running completely on-device to ensure privacy and reliability.
That's where Cheetah Streaming Speech-to-Text comes in. It provides real-time, offline transcription, so your .NET application can process speech as it is spoken and display text with minimal delay.
In this tutorial, you'll learn how to integrate Cheetah Streaming Speech-to-Text into a .NET C# application—from installing the SDK to capturing audio and receiving live transcription results.
If you don't need real-time transcription, use Leopard Speech-to-Text instead. Leopard is designed for batch transcription of audio files and also provides word-level metadata.
Custom Vocabulary & Boost Words
Cheetah Streaming Speech-to-Text's custom vocabulary feature ensures accurate transcription of specialized terminology. Developers can add unique words, adjust pronunciations, and boost recognition of specific phrases, which is ideal for applications in healthcare, finance, manufacturing, or customer support.
You can create and manage custom models using the Picovoice Console. See our guide on creating a custom Cheetah model, or watch the Cheetah Console Tutorial on YouTube.
If customization isn't required, you can use one of the default speech-to-text models.
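As a rough sketch of how a custom model might be loaded, the snippet below passes a model file to Cheetah.Create through its optional modelPath argument. The file path is a hypothetical placeholder for a model downloaded from the Picovoice Console, and `${ACCESS_KEY}` stands in for your own AccessKey.

```csharp
using System;
using Pv;

class CustomModelExample
{
    static void Main()
    {
        // Placeholder: replace with the AccessKey from your Picovoice Console account.
        const string accessKey = "${ACCESS_KEY}";

        // Hypothetical path to a custom model file exported from the Picovoice Console.
        const string modelPath = "path/to/custom-cheetah-model.pv";

        // Omitting modelPath makes Cheetah fall back to its default model.
        using (Cheetah cheetah = Cheetah.Create(accessKey, modelPath: modelPath))
        {
            Console.WriteLine($"Loaded Cheetah {cheetah.Version} with a custom model.");
        }
    }
}
```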
How to Add Real-time Transcription to a .NET App
Before we start, make sure your environment meets the following .NET requirements:
- Windows (x86_64): .NET Framework 4.6.1+, .NET Standard 2.0+, or .NET Core 3.0+
- macOS (x86_64): .NET Standard 2.0+ or .NET Core 3.0+
- macOS (arm64), Windows (arm64), Linux (x86_64), Raspberry Pi (3, 4, 5): .NET 6.0+
1. Get Your AccessKey
Sign up for a free Picovoice Console account and obtain your AccessKey. The AccessKey is used only for authentication and authorization; the transcription itself runs offline, on the device.
2. Install the NuGet Package
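Install the Picovoice.Cheetah NuGet package, for example with the .NET CLI command dotnet add package Picovoice.Cheetah, run from your project directory.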
3. Initialize Cheetah
Create a new Cheetah instance in your application, providing your AccessKey:
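A minimal sketch of this step, assuming the SDK's Pv namespace and a placeholder `${ACCESS_KEY}` that you replace with your own key. It also prints the sample rate and frame length the engine expects, since later steps depend on them.

```csharp
using System;
using Pv;

class InitExample
{
    static void Main()
    {
        // Placeholder: replace with the AccessKey from your Picovoice Console account.
        const string accessKey = "${ACCESS_KEY}";

        // Create the engine with the default model.
        Cheetah cheetah = Cheetah.Create(accessKey);

        // These values describe the audio Cheetah expects in each Process() call.
        Console.WriteLine($"Cheetah version: {cheetah.Version}");
        Console.WriteLine($"Sample rate:     {cheetah.SampleRate} Hz");
        Console.WriteLine($"Frame length:    {cheetah.FrameLength} samples");

        cheetah.Dispose();
    }
}
```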
4. Streaming Audio & Receiving Text
Send audio frames to Cheetah and handle the resulting partial transcripts.
If your application doesn't capture audio yet, refer to Recording Audio in .NET Applications.
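The processing loop could look roughly like the sketch below. The Transcriber class and its frames parameter are illustrative names rather than part of the SDK: frames stands for any audio source that yields mono, 16-bit PCM frames of exactly cheetah.FrameLength samples.

```csharp
using System;
using System.Collections.Generic;
using Pv;

static class Transcriber
{
    // Feeds audio frames to Cheetah and prints text as it becomes available.
    public static void Transcribe(Cheetah cheetah, IEnumerable<short[]> frames)
    {
        foreach (short[] frame in frames)
        {
            // Process() returns whatever text was finalized by this frame (possibly empty).
            CheetahTranscript partial = cheetah.Process(frame);
            Console.Write(partial.Transcript);

            // An endpoint means the speaker paused; Flush() returns any remaining buffered text.
            if (partial.IsEndpoint)
            {
                CheetahTranscript remainder = cheetah.Flush();
                Console.WriteLine(remainder.Transcript);
            }
        }
    }
}
```

Printing partial results with Console.Write and finishing each utterance with Console.WriteLine keeps one utterance per line in the console.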
5. Releasing Resources
Once you no longer need Cheetah, call Dispose() to free acquired memory.
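Two common ways to do this are sketched below: an explicit Dispose() call in a finally block, or a using statement that disposes the instance automatically when the block exits. `${ACCESS_KEY}` is a placeholder.

```csharp
using Pv;

class DisposeExample
{
    static void Main()
    {
        const string accessKey = "${ACCESS_KEY}"; // placeholder

        // Option 1: explicit disposal, guarded so it also runs if an exception is thrown.
        Cheetah cheetah = Cheetah.Create(accessKey);
        try
        {
            // ... process audio here ...
        }
        finally
        {
            cheetah.Dispose();
        }

        // Option 2: a using statement calls Dispose() when the block exits.
        using (Cheetah scoped = Cheetah.Create(accessKey))
        {
            // ... process audio here ...
        }
    }
}
```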
Real-time Transcription .NET Demo
Below is a complete working C# demo that streams microphone input to Cheetah Streaming Speech-to-Text and displays transcribed text in real time. PvRecorder is used for audio capture:
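The following console program is a sketch of such a demo, not the official sample. It assumes the Picovoice.Cheetah and PvRecorder NuGet packages are installed, uses the default input device (deviceIndex: -1), and treats `${ACCESS_KEY}` as a placeholder. Press Ctrl+C to stop.

```csharp
using System;
using System.Threading;
using Pv;

class Program
{
    static void Main()
    {
        const string accessKey = "${ACCESS_KEY}"; // placeholder: use your own AccessKey

        // Ctrl+C requests a clean shutdown instead of killing the process.
        var cts = new CancellationTokenSource();
        Console.CancelKeyPress += (sender, e) =>
        {
            e.Cancel = true;
            cts.Cancel();
        };

        using (Cheetah cheetah = Cheetah.Create(accessKey))
        using (PvRecorder recorder = PvRecorder.Create(
            frameLength: cheetah.FrameLength, // frames sized the way Cheetah expects
            deviceIndex: -1))                 // -1 selects the default microphone
        {
            recorder.Start();
            Console.WriteLine($"Listening on '{recorder.SelectedDevice}' (Ctrl+C to stop)...");

            try
            {
                while (!cts.IsCancellationRequested)
                {
                    short[] frame = recorder.Read();

                    // Print text as soon as Cheetah finalizes it.
                    CheetahTranscript partial = cheetah.Process(frame);
                    Console.Write(partial.Transcript);

                    // On a pause in speech, flush the remaining text and end the line.
                    if (partial.IsEndpoint)
                    {
                        Console.WriteLine(cheetah.Flush().Transcript);
                    }
                }

                // Emit anything still buffered before exiting.
                Console.WriteLine(cheetah.Flush().Transcript);
            }
            finally
            {
                recorder.Stop();
            }
        }
    }
}
```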
For a complete .NET application, see the Cheetah .NET demo on GitHub.
This tutorial uses the Picovoice.Cheetah NuGet package; the demo additionally uses PvRecorder for audio capture.
Explore the Picovoice documentation for more details.
Best Practices & Developer Tips
- Audio format: Record audio as mono, 16-bit PCM, using the sample rate and frame length defined by cheetah.SampleRate and cheetah.FrameLength.
- Custom models: Create a custom domain-specific model if your app needs to recognize specialized terms or abbreviations.
- Threading: Handle audio capture and transcription in a background thread to keep your UI responsive.
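One way to apply the threading tip is sketched below: capture and transcription run on a dedicated long-running task, and recognized text is handed to a callback. BackgroundTranscriber and onText are illustrative names, not part of the SDK. In a UI application the callback would still need to marshal updates onto the UI thread (for example through the framework's dispatcher).

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Pv;

static class BackgroundTranscriber
{
    // Starts capture and transcription on a worker thread; cancel the token to stop.
    public static Task Run(string accessKey, Action<string> onText, CancellationToken token)
    {
        return Task.Factory.StartNew(() =>
        {
            using (Cheetah cheetah = Cheetah.Create(accessKey))
            using (PvRecorder recorder = PvRecorder.Create(
                frameLength: cheetah.FrameLength, deviceIndex: -1))
            {
                recorder.Start();
                try
                {
                    while (!token.IsCancellationRequested)
                    {
                        CheetahTranscript partial = cheetah.Process(recorder.Read());
                        if (!string.IsNullOrEmpty(partial.Transcript))
                        {
                            onText(partial.Transcript);
                        }

                        if (partial.IsEndpoint)
                        {
                            // End of utterance: deliver the remaining text plus a line break.
                            onText(cheetah.Flush().Transcript + Environment.NewLine);
                        }
                    }
                }
                finally
                {
                    recorder.Stop();
                }
            }
        }, token, TaskCreationOptions.LongRunning, TaskScheduler.Default);
    }
}
```

A caller might start it with `BackgroundTranscriber.Run(accessKey, text => Console.Write(text), cts.Token)` and call `cts.Cancel()` when the user stops dictation.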
Integrate With Other Speech Recognition Engines
Integrate Cheetah Streaming Speech-to-Text with complementary voice technologies to build richer interfaces:
- Porcupine Wake Word: activate transcription after a wake phrase.
- Rhino Speech-to-Intent: interpret the meaning of spoken commands.
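As an illustration of the wake word pairing, the sketch below listens for the built-in "porcupine" keyword and only then forwards audio to Cheetah until the next endpoint. It assumes the Porcupine NuGet package is installed and that both engines report the same frame length, which the code checks at startup rather than taking for granted. `${ACCESS_KEY}` is a placeholder.

```csharp
using System;
using System.Collections.Generic;
using Pv;

class WakeWordThenTranscribe
{
    static void Main()
    {
        const string accessKey = "${ACCESS_KEY}"; // placeholder

        using (Porcupine porcupine = Porcupine.FromBuiltInKeywords(
            accessKey, new List<BuiltInKeyword> { BuiltInKeyword.PORCUPINE }))
        using (Cheetah cheetah = Cheetah.Create(accessKey))
        {
            // Assumption check: one recorder can feed both engines only if frame sizes match.
            if (porcupine.FrameLength != cheetah.FrameLength)
            {
                throw new InvalidOperationException("Engines expect different frame lengths.");
            }

            using (PvRecorder recorder = PvRecorder.Create(
                frameLength: cheetah.FrameLength, deviceIndex: -1))
            {
                recorder.Start();
                bool transcribing = false;

                // Runs until the process is terminated (e.g. Ctrl+C).
                while (true)
                {
                    short[] frame = recorder.Read();

                    if (!transcribing)
                    {
                        // Process() returns the index of the detected keyword, or -1 if none.
                        if (porcupine.Process(frame) >= 0)
                        {
                            Console.WriteLine("Wake word detected, transcribing...");
                            transcribing = true;
                        }
                        continue;
                    }

                    CheetahTranscript partial = cheetah.Process(frame);
                    Console.Write(partial.Transcript);

                    if (partial.IsEndpoint)
                    {
                        // End of utterance: print the rest and wait for the wake word again.
                        Console.WriteLine(cheetah.Flush().Transcript);
                        transcribing = false;
                    }
                }
            }
        }
    }
}
```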