Voice Activity Detection (VAD), also known as speech activity detection or speech detection, plays a critical role in modern speech applications. From transcription and wake-word detection to call analytics, accurate speech detection improves clarity, responsiveness, and user experiences.
For enterprise .NET developers, achieving high accuracy in voice detection is essential. A fast and highly accurate VAD improves speech-to-text quality, reduces false triggers, and optimizes bandwidth and storage by filtering out silence and background noise.
Cobra Voice Activity Detection delivers best-in-class accuracy while remaining lightweight, making it ideal for real-time applications where performance and efficiency matter. Designed for precision and minimal resource usage, Cobra VAD lets your .NET applications reliably detect human speech without taxing CPU or memory.
In this guide, you'll learn how to integrate Cobra Voice Activity Detection into your .NET C# application—from installing the SDK to capturing microphone input and visualizing voice activity in real time.
Step-by-step: Add VAD to a .NET App
First, ensure your environment meets the following .NET requirements:
- Windows (x86_64): .NET Framework 4.6.1+, .NET Standard 2.0+, or .NET Core 3.0+
- macOS (x86_64): .NET Standard 2.0+ or .NET Core 3.0+
- macOS (arm64), Windows (arm64), Linux (x86_64), Raspberry Pi (3, 4, 5): .NET 6.0+
1. Get Your AccessKey
Sign up for a free Picovoice Console account and obtain your AccessKey. The AccessKey is only required for authentication and authorization.
2. Install the NuGet Package
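Install the Cobra NuGet package, for example by running dotnet add package Cobra in your project directory or through the NuGet package manager in your IDE.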
3. Initialize Cobra
Create a new Cobra instance in your application:
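A minimal sketch of the initialization step, assuming the Cobra NuGet package exposes the Cobra class in the Pv namespace with a constructor that takes the AccessKey (as in the published .NET binding; verify against the version you install). The ${ACCESS_KEY} string is a placeholder, and the top-level statements assume C# 9 or later:

```csharp
using System;
using Pv;

// Placeholder: replace with the AccessKey from your Picovoice Console account.
const string accessKey = "${ACCESS_KEY}";

// Create the engine instance.
Cobra cobra = new Cobra(accessKey);

// The engine reports the audio format it expects.
Console.WriteLine($"Sample rate: {cobra.SampleRate} Hz");
Console.WriteLine($"Frame length: {cobra.FrameLength} samples");
```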
You can now start sending PCM audio frames (16-bit, mono) to Cobra for processing.
4. Process Audio Frames
Feed audio data to Cobra and retrieve the voice activity probability for each frame.
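The processing step might look like the sketch below. It assumes cobra.Process accepts one frame of cobra.FrameLength 16-bit samples and returns a float, as in the published .NET binding. GetNextAudioFrame is a hypothetical stand-in for your audio source; here it returns silence so the sketch is self-contained:

```csharp
using System;
using Pv;

const string accessKey = "${ACCESS_KEY}"; // placeholder

Cobra cobra = new Cobra(accessKey);

// Hypothetical audio source: returns one frame of 16-bit mono PCM.
// This stub yields silence; swap in real microphone or file input.
short[] GetNextAudioFrame() => new short[cobra.FrameLength];

for (int i = 0; i < 100; i++)
{
    // One probability per frame.
    float voiceProbability = cobra.Process(GetNextAudioFrame());
    Console.WriteLine($"Frame {i}: {voiceProbability:F2}");
}

// Free native resources once processing is finished (see step 5).
cobra.Dispose();
```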
The returned value (voiceProbability) ranges from 0.0 to 1.0, where:
- 0.0 → no human speech detected
- 1.0 → definite human speech
You can use this value to decide when to start recording, trigger an event, or feed audio into another model (like Cheetah Streaming Speech-to-Text or Rhino Speech-to-Intent for voice commands).
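One common way to turn the per-frame probability into start and stop decisions is hysteresis: a higher threshold to enter the speaking state, a lower one to leave it, and a few quiet frames of grace before stopping. The sketch below is not part of the Cobra SDK; SpeechGate, its thresholds, and the hangover count are illustrative assumptions to tune for your audio:

```csharp
using System;

// Illustrative helper (not part of the Cobra SDK): converts per-frame
// voice probabilities into start/stop transitions using hysteresis.
public sealed class SpeechGate
{
    private readonly float _startThreshold;
    private readonly float _stopThreshold;
    private readonly int _hangoverFrames;
    private int _quietFrames;

    public bool IsSpeaking { get; private set; }

    public SpeechGate(float startThreshold = 0.7f, float stopThreshold = 0.3f, int hangoverFrames = 10)
    {
        if (stopThreshold > startThreshold)
            throw new ArgumentException("stopThreshold must not exceed startThreshold.");
        if (hangoverFrames < 1)
            throw new ArgumentOutOfRangeException(nameof(hangoverFrames));

        _startThreshold = startThreshold;
        _stopThreshold = stopThreshold;
        _hangoverFrames = hangoverFrames;
    }

    // Returns true when the speaking state changed on this frame.
    public bool Update(float voiceProbability)
    {
        if (!IsSpeaking)
        {
            if (voiceProbability < _startThreshold)
                return false;

            IsSpeaking = true;
            _quietFrames = 0;
            return true;
        }

        // While speaking, count consecutive frames below the stop threshold.
        _quietFrames = voiceProbability < _stopThreshold ? _quietFrames + 1 : 0;
        if (_quietFrames < _hangoverFrames)
            return false;

        IsSpeaking = false;
        _quietFrames = 0;
        return true;
    }
}
```

Call Update once per frame with the value returned by Cobra. When it returns true, check IsSpeaking to see whether speech just started or just ended, then start recording, raise an event, or begin forwarding audio accordingly.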
If your application doesn't capture audio yet, refer to Recording Audio in .NET Applications.
5. Release Resources
Always dispose of the Cobra instance when done to free resources:
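A sketch of deterministic cleanup, assuming Cobra implements IDisposable as in the published .NET binding. Wrapping the instance in a using block releases its resources as soon as the block exits:

```csharp
using Pv;

const string accessKey = "${ACCESS_KEY}"; // placeholder

// The using block calls Dispose() on exit, even if an exception is thrown.
using (Cobra cobra = new Cobra(accessKey))
{
    // ... call cobra.Process(...) on each audio frame here ...
}
```

If the instance has to outlive a single scope, for example as a field on a long-lived service, call Dispose() explicitly when the owner shuts down.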
Real-time VAD Visualization in .NET
Below is a complete example that visualizes voice activity probability from your microphone in real time. It uses PvRecorder for capturing audio input:
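The sketch below shows one way to build such a visualizer as a console app. It assumes the Cobra and PvRecorder NuGet packages are installed and that their APIs match the published .NET bindings (PvRecorder.Create, Start, Read, Stop, and cobra.Process); check these against the versions you install. The bar width and the ${ACCESS_KEY} placeholder are illustrative:

```csharp
using System;
using System.Threading;
using Pv;

const string accessKey = "${ACCESS_KEY}"; // placeholder
const int barWidth = 40;                  // characters in the level meter

using Cobra cobra = new Cobra(accessKey);

// deviceIndex -1 selects the default input device.
using PvRecorder recorder = PvRecorder.Create(frameLength: cobra.FrameLength, deviceIndex: -1);

// Stop cleanly on Ctrl+C instead of killing the process mid-read.
using var cts = new CancellationTokenSource();
Console.CancelKeyPress += (_, e) =>
{
    e.Cancel = true;
    cts.Cancel();
};

recorder.Start();
Console.WriteLine("Listening... press Ctrl+C to stop.");

while (!cts.IsCancellationRequested)
{
    short[] frame = recorder.Read();
    float voiceProbability = cobra.Process(frame);

    // Map the probability to a bar, clamped defensively to [0, barWidth].
    int filled = (int)Math.Round(voiceProbability * barWidth);
    filled = Math.Max(0, Math.Min(barWidth, filled));
    string bar = new string('#', filled) + new string(' ', barWidth - filled);

    // Carriage return redraws the meter on the same console line.
    Console.Write($"\r[{bar}] {voiceProbability:F2}");
}

recorder.Stop();
Console.WriteLine();
```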
For a complete .NET application, see the Cobra .NET demo on GitHub.
This tutorial uses the following packages: Cobra for voice activity detection and PvRecorder for microphone capture.
Explore our documentation for more details:
Best Practices
- Audio format: Use 16-bit PCM, mono, with the sample rate and frame length defined by Cobra.SampleRate and Cobra.FrameLength.
- Threading: Run audio capture and processing on a background thread to keep your UI responsive.
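The threading advice above can be sketched with a long-running task and a cancellation token. VadWorker is a hypothetical helper, not part of any Picovoice package; readFrame and process are stand-ins that would typically wrap the recorder's read call and cobra.Process:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: runs the capture/process loop off the UI thread.
public static class VadWorker
{
    public static Task Run(
        Func<short[]> readFrame,        // e.g. () => recorder.Read()
        Func<short[], float> process,   // e.g. frame => cobra.Process(frame)
        Action<float> onProbability,    // invoked on the background thread
        CancellationToken token)
    {
        return Task.Factory.StartNew(() =>
        {
            while (!token.IsCancellationRequested)
            {
                float probability = process(readFrame());
                onProbability(probability);
            }
        }, token, TaskCreationOptions.LongRunning, TaskScheduler.Default);
    }
}
```

Because onProbability runs on the background thread, a UI application needs to marshal updates back to the UI thread (for instance through its dispatcher or a captured SynchronizationContext) before touching controls. Cancel the token and wait for the task to finish before disposing the recorder and the Cobra instance.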
Combine Speech Detection with Other Picovoice Engines
Cobra integrates seamlessly with other Picovoice technologies for end-to-end voice interfaces:
- Orca Streaming Text-to-Speech: While Orca is speaking, Cobra can detect when the user starts talking and automatically stop the audio output.
- Rhino Speech-to-Intent: Use Cobra to feed the voice command engine only active speech, improving intent detection accuracy.
- Cheetah Streaming Speech-to-Text: Combine Cobra with streaming speech-to-text to start transcribing when speech begins and pause during silence.
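To illustrate feeding a downstream engine only active speech, the sketch below forwards frames while the voice probability is above a threshold and keeps a short pre-roll buffer so the onset of an utterance is not clipped. SpeechForwarder, its threshold, and the pre-roll length are illustrative assumptions; the downstream callback stands in for whichever engine you pair with Cobra:

```csharp
using System;
using System.Collections.Generic;

// Illustrative helper (not part of any Picovoice SDK): forwards frames
// downstream only while speech is likely, plus a short pre-roll.
public sealed class SpeechForwarder
{
    private readonly Queue<short[]> _preRoll = new Queue<short[]>();
    private readonly Action<short[]> _downstream;
    private readonly float _threshold;
    private readonly int _preRollFrames;
    private bool _forwarding;

    public SpeechForwarder(Action<short[]> downstream, float threshold = 0.5f, int preRollFrames = 5)
    {
        _downstream = downstream ?? throw new ArgumentNullException(nameof(downstream));
        if (preRollFrames < 0)
            throw new ArgumentOutOfRangeException(nameof(preRollFrames));

        _threshold = threshold;
        _preRollFrames = preRollFrames;
    }

    public void Push(short[] frame, float voiceProbability)
    {
        if (voiceProbability >= _threshold)
        {
            if (!_forwarding)
            {
                // Flush the frames buffered just before speech was detected.
                while (_preRoll.Count > 0)
                    _downstream(_preRoll.Dequeue());
                _forwarding = true;
            }
            _downstream(frame);
            return;
        }

        // Below threshold: stop forwarding and keep only the newest frames.
        _forwarding = false;
        _preRoll.Enqueue(frame);
        if (_preRoll.Count > _preRollFrames)
            _preRoll.Dequeue();
    }
}
```

The helper stores references to the frames it buffers, so if your audio source reuses a single buffer between reads, pass a copy of each frame to Push.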







