Voice Activity Detection (VAD), also known as speech activity detection or speech detection, plays a critical role in modern speech applications. From transcription and wake-word detection to call analytics, accurate speech detection improves clarity, responsiveness, and the overall user experience.

For enterprise .NET developers, achieving high accuracy in voice detection is essential. A fast and highly accurate VAD improves speech-to-text quality, reduces false triggers, and optimizes bandwidth and storage by filtering out silence and background noise.

Cobra Voice Activity Detection delivers best-in-class accuracy while remaining lightweight, making it ideal for real-time applications where performance and efficiency matter. Designed for precision and minimal resource usage, Cobra VAD lets your .NET applications reliably detect human speech without taxing CPU or memory.

In this guide, you'll learn how to integrate Cobra Voice Activity Detection into your .NET C# application—from installing the SDK to capturing microphone input and visualizing voice activity in real time.

Step-by-step: Add VAD to a .NET App

First, ensure your environment meets the following .NET requirements:

  • Windows (x86_64): .NET Framework 4.6.1+, .NET Standard 2.0+, or .NET Core 3.0+
  • macOS (x86_64): .NET Standard 2.0+ or .NET Core 3.0+
  • macOS (arm64), Windows (arm64), Linux (x86_64), Raspberry Pi (3, 4, 5): .NET 6.0+

1. Get Your AccessKey

Sign up for a free Picovoice Console account and obtain your AccessKey. The AccessKey is only required for authentication and authorization.

2. Install the NuGet Package

Install the Cobra NuGet package, either through the NuGet Package Manager or with the .NET CLI (dotnet add package Cobra).

3. Initialize Cobra

Create a new Cobra instance in your application:
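A minimal initialization sketch, assuming the Cobra NuGet package exposes a Cobra class in the Pv namespace whose constructor takes the AccessKey as a string. The key shown is a placeholder, and reading SampleRate and FrameLength from the instance is an assumption about how the binding exposes them.

```csharp
using System;
using Pv; // assumed namespace of the Cobra NuGet package

// Placeholder: replace with the AccessKey from Picovoice Console.
const string accessKey = "${ACCESS_KEY}";

// Assumed constructor: takes the AccessKey and loads the engine.
Cobra cobra = new Cobra(accessKey);

// The engine dictates the audio format it expects.
Console.WriteLine($"Sample rate: {cobra.SampleRate} Hz");
Console.WriteLine($"Frame length: {cobra.FrameLength} samples");

// Free the engine when finished with it.
cobra.Dispose();
```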

You can now start sending PCM audio frames (16-bit, mono) to Cobra for processing.
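If your audio arrives as raw bytes (for example from a file or a network stream), it has to be converted to 16-bit samples and cut into fixed-size frames first. The helper below is a sketch of that step. PcmFraming and ToFrames are hypothetical names, not part of the Cobra SDK, and the input is assumed to be little-endian, mono, and already at the sample rate Cobra expects.

```csharp
using System;
using System.Collections.Generic;

public static class PcmFraming
{
    // Converts little-endian 16-bit mono PCM bytes into frames of
    // frameLength samples. Trailing samples that do not fill a whole
    // frame are dropped; a real app would buffer them for the next call.
    public static List<short[]> ToFrames(byte[] pcmBytes, int frameLength)
    {
        if (frameLength <= 0)
            throw new ArgumentOutOfRangeException(nameof(frameLength));

        int sampleCount = pcmBytes.Length / 2;
        int frameCount = sampleCount / frameLength;
        var frames = new List<short[]>(frameCount);

        for (int f = 0; f < frameCount; f++)
        {
            var frame = new short[frameLength];
            for (int i = 0; i < frameLength; i++)
            {
                int offset = 2 * (f * frameLength + i);
                // Low byte first, then high byte (little-endian).
                frame[i] = (short)(pcmBytes[offset] | (pcmBytes[offset + 1] << 8));
            }
            frames.Add(frame);
        }
        return frames;
    }
}
```

In practice the frameLength argument would come from the Cobra instance rather than being hard-coded.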

4. Process Audio Frames

Feed audio data to Cobra and retrieve the voice activity probability for each frame.
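A sketch of the processing loop, assuming Cobra exposes a Process method that takes one frame of FrameLength samples as short[] and returns a float. GetNextAudioFrame is a hypothetical stand-in for your capture code and returns silence here.

```csharp
using System;
using Pv; // assumed namespace of the Cobra NuGet package

const string accessKey = "${ACCESS_KEY}"; // placeholder
Cobra cobra = new Cobra(accessKey);

// Hypothetical audio source: replace with real capture code.
// It must return exactly cobra.FrameLength samples per call.
short[] GetNextAudioFrame() => new short[cobra.FrameLength]; // silence

for (int i = 0; i < 100; i++)
{
    // Assumed signature: float Process(short[] pcm).
    float voiceProbability = cobra.Process(GetNextAudioFrame());
    Console.WriteLine($"Voice probability: {voiceProbability:F2}");
}

cobra.Dispose();
```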

The returned value (voiceProbability) ranges from 0.0 to 1.0, where:

  • 0.0 → no human speech detected
  • 1.0 → definite human speech

You can use this value to decide when to start recording, trigger an event, or feed audio into another model (like Cheetah Streaming Speech-to-Text or Rhino Speech-to-Intent for voice commands).
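One common way to turn the per-frame probability into start and stop events is a threshold with a short hangover, so brief pauses between words do not end the segment. The sketch below is an illustration, not part of the Cobra SDK: SpeechGate, its 0.5 threshold, and the hangover length are hypothetical and should be tuned for your audio.

```csharp
using System;

public sealed class SpeechGate
{
    private readonly float _threshold;
    private readonly int _hangoverFrames;
    private int _silentFrames;

    // True while the gate considers speech to be in progress.
    public bool IsOpen { get; private set; }

    public SpeechGate(float threshold = 0.5f, int hangoverFrames = 15)
    {
        _threshold = threshold;
        _hangoverFrames = hangoverFrames;
    }

    // Feeds one probability; returns true when the gate state changed.
    public bool Update(float voiceProbability)
    {
        if (voiceProbability >= _threshold)
        {
            _silentFrames = 0;
            if (!IsOpen) { IsOpen = true; return true; }
            return false;
        }

        // Below threshold: close only after more than _hangoverFrames
        // consecutive quiet frames.
        if (IsOpen && ++_silentFrames > _hangoverFrames)
        {
            IsOpen = false;
            _silentFrames = 0;
            return true;
        }
        return false;
    }
}
```

Call Update once per frame with the probability for that frame. A true return with IsOpen set marks the start of speech, and a true return with IsOpen cleared marks the end.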

If your application doesn't capture audio yet, refer to Recording Audio in .NET Applications.

5. Release Resources

Always dispose of the Cobra instance when done to free resources:
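Two ways to guarantee disposal, sketched under the assumption that Cobra implements IDisposable and lives in the Pv namespace. The key is a placeholder.

```csharp
using System;
using Pv; // assumed namespace of the Cobra NuGet package

const string accessKey = "${ACCESS_KEY}"; // placeholder

// Option 1: a using block disposes the engine when the block exits,
// even if an exception is thrown inside it.
using (Cobra cobra = new Cobra(accessKey))
{
    float voiceProbability = cobra.Process(new short[cobra.FrameLength]);
    Console.WriteLine(voiceProbability);
}

// Option 2: explicit disposal in a finally block.
Cobra engine = new Cobra(accessKey);
try
{
    // ... process frames ...
}
finally
{
    engine.Dispose();
}
```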

Real-time VAD Visualization in .NET

Below is a complete example that visualizes voice activity probability from your microphone in real time. It uses PvRecorder for capturing audio input:
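The following is a sketch of such a visualizer rather than the original listing. It assumes the PvRecorder package also lives in the Pv namespace and provides PvRecorder.Create, Start, Read, and Stop as used here. RenderBar and the 40-character bar width are illustrative choices, and the key is a placeholder.

```csharp
using System;
using System.Threading;
using Pv; // assumed namespace for both the Cobra and PvRecorder packages

public static class Program
{
    private const int BarWidth = 40;

    // Renders a probability in [0, 1] as a fixed-width text bar.
    public static string RenderBar(float probability)
    {
        float clamped = Math.Max(0f, Math.Min(1f, probability));
        int filled = (int)Math.Round(clamped * BarWidth);
        return "[" + new string('#', filled) + new string(' ', BarWidth - filled) + "]";
    }

    public static void Main()
    {
        const string accessKey = "${ACCESS_KEY}"; // placeholder

        using (var cts = new CancellationTokenSource())
        using (Cobra cobra = new Cobra(accessKey))
        // Assumed PvRecorder factory; -1 selects the default input device.
        using (PvRecorder recorder = PvRecorder.Create(frameLength: cobra.FrameLength, deviceIndex: -1))
        {
            Console.CancelKeyPress += (sender, e) =>
            {
                e.Cancel = true; // let the loop exit so resources get disposed
                cts.Cancel();
            };

            recorder.Start();
            Console.WriteLine("Listening... press Ctrl+C to stop.");

            while (!cts.IsCancellationRequested)
            {
                short[] frame = recorder.Read();
                float voiceProbability = cobra.Process(frame);
                // \r redraws the bar on the same console line.
                Console.Write($"\r{RenderBar(voiceProbability)} {voiceProbability:F2}");
            }

            recorder.Stop();
            Console.WriteLine();
        }
    }
}
```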

For a complete .NET application, see the Cobra .NET demo on GitHub.

This tutorial uses the Cobra and PvRecorder NuGet packages.

See the Cobra .NET documentation for more details.

Best Practices

  • Audio format: Use 16-bit PCM, mono, with the sample rate and frame length defined by Cobra.SampleRate and Cobra.FrameLength.
  • Threading: Run audio capture and processing on a background thread to keep your UI responsive.
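The threading advice above can be sketched as a small worker that runs the capture and processing loop off the UI thread. VadWorker is a hypothetical helper, and the readFrame and process delegates stand in for the recorder's read call and Cobra's process call so that the sketch does not depend on either package.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class VadWorker
{
    // Runs the capture/process loop on a thread-pool thread.
    // onProbability is invoked on that background thread, so UI code
    // must marshal the update to its UI thread itself.
    public static Task Run(
        Func<short[]> readFrame,
        Func<short[], float> process,
        Action<float> onProbability,
        CancellationToken token)
    {
        return Task.Run(() =>
        {
            // The token is checked once per frame, so cancellation
            // takes effect after the current frame is handled.
            while (!token.IsCancellationRequested)
            {
                short[] frame = readFrame();
                onProbability(process(frame));
            }
        });
    }
}
```

Cancel the token and wait for the returned task to finish before disposing the recorder and the engine, so the loop never touches a disposed object.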

Combine Speech Detection with Other Picovoice Engines

Cobra integrates with other Picovoice engines, such as Cheetah Streaming Speech-to-Text and Rhino Speech-to-Intent, to build end-to-end voice interfaces.