How to Add Voice Activity Detection to a .NET App

Voice Activity Detection (VAD), also known as speech activity detection or speech detection, identifies which parts of an audio stream contain human speech. It underpins transcription, wake-word detection, and call analytics, where accurate speech detection improves clarity, responsiveness, and the overall user experience.

For enterprise .NET developers, achieving high accuracy in voice detection is essential. A fast and highly accurate VAD improves speech-to-text quality, reduces false triggers, and optimizes bandwidth and storage by filtering out silence and background noise.

Cobra Voice Activity Detection delivers best-in-class accuracy while remaining lightweight, making it ideal for real-time applications where performance and efficiency matter. Designed for precision and minimal resource usage, Cobra VAD lets your .NET applications reliably detect human speech without taxing CPU or memory.

In this guide, you'll learn how to integrate Cobra Voice Activity Detection into your .NET C# application, from installing the SDK to capturing microphone input and visualizing voice activity in real time.

Step-by-step: Add VAD to a .NET App

First, ensure your environment meets the following .NET requirements:

Windows (x86_64): .NET Framework 4.6.1+, .NET Standard 2.0+, or .NET Core 3.0+
macOS (x86_64): .NET Standard 2.0+ or .NET Core 3.0+
macOS (arm64), Windows (arm64), Linux (x86_64), Raspberry Pi (3, 4, 5): .NET 6.0+

1. Get Your AccessKey

Sign up for a free Picovoice Console account and obtain your AccessKey. The AccessKey is only required for authentication and authorization.

2. Install the NuGet Package

Install the Cobra NuGet package:

dotnet add package Cobra

3. Initialize Cobra

Create a new Cobra instance in your application:

using Pv;

// AccessKey from Picovoice Console
Cobra cobra = new Cobra("${ACCESS_KEY}");

You can now start sending PCM audio frames (16-bit, mono) to Cobra for processing.
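If your audio arrives as raw bytes (for example from a file or a network stream) rather than as short[] frames, it has to be cut into frames of the length Cobra expects before it can be processed. The following is a minimal sketch, assuming 16-bit little-endian mono PCM that is already at the required sample rate. PcmFrames and Split are hypothetical names, not part of the Cobra SDK, and the frameLength parameter stands in for cobra.FrameLength.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the Cobra SDK.
static class PcmFrames
{
    // Converts 16-bit little-endian mono PCM bytes into frames of
    // `frameLength` samples. Trailing samples that do not fill a whole
    // frame are dropped.
    public static List<short[]> Split(byte[] pcmBytes, int frameLength)
    {
        if (pcmBytes == null)
            throw new ArgumentNullException(nameof(pcmBytes));
        if (frameLength <= 0)
            throw new ArgumentOutOfRangeException(nameof(frameLength));

        int sampleCount = pcmBytes.Length / 2;       // 2 bytes per 16-bit sample
        int frameCount = sampleCount / frameLength;  // whole frames only
        var frames = new List<short[]>(frameCount);

        for (int f = 0; f < frameCount; f++)
        {
            var frame = new short[frameLength];
            for (int i = 0; i < frameLength; i++)
            {
                int byteIndex = (f * frameLength + i) * 2;
                // Low byte first, then high byte (little-endian).
                frame[i] = unchecked((short)(pcmBytes[byteIndex] | (pcmBytes[byteIndex + 1] << 8)));
            }
            frames.Add(frame);
        }
        return frames;
    }
}
```

Each returned frame could then be passed to cobra.Process(frame). Leftover samples are dropped in this sketch; a streaming application would more likely carry them over to the next buffer.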

4. Process Audio Frames

Feed audio data to Cobra and retrieve the voice activity probability for each frame.

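// Placeholder: return the next frame of 16-bit mono PCM audio,
// exactly cobra.FrameLength samples long.
short[] GetNextAudioFrame()
{
    // ...
}

while (true)
{
    // Process returns the probability (0.0 to 1.0) that the frame contains speech.
    float voiceProbability = cobra.Process(GetNextAudioFrame());
    Console.WriteLine($"Voice probability: {voiceProbability}");
}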

The returned value (voiceProbability) ranges from 0.0 to 1.0, where:

0.0 → no human speech detected
1.0 → definite human speech

You can use this value to decide when to start recording, trigger an event, or feed audio into another model (like Cheetah Streaming Speech-to-Text or Rhino Speech-to-Intent for voice commands).
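Turning the per-frame probability into start and stop decisions usually needs a threshold plus some tolerance for short pauses, otherwise every brief dip below the threshold ends the recording. Below is a minimal sketch of one way to do that. SpeechGate is a hypothetical helper, not part of the Cobra SDK, and the default threshold of 0.5 and hangover of 10 frames are illustrative assumptions that would need tuning for a real application.

```csharp
using System;

// Hypothetical helper, not part of the Cobra SDK.
class SpeechGate
{
    private readonly float _threshold;
    private readonly int _hangoverFrames;
    private int _silentFrames;

    public bool IsSpeaking { get; private set; }

    public event Action SpeechStarted;
    public event Action SpeechEnded;

    // threshold: probability at or above which a frame counts as speech.
    // hangoverFrames: consecutive non-speech frames required to end speech.
    public SpeechGate(float threshold = 0.5f, int hangoverFrames = 10)
    {
        _threshold = threshold;
        _hangoverFrames = hangoverFrames;
    }

    // Call once per frame with the value returned by cobra.Process(...).
    public void Update(float voiceProbability)
    {
        if (voiceProbability >= _threshold)
        {
            _silentFrames = 0;
            if (!IsSpeaking)
            {
                IsSpeaking = true;
                SpeechStarted?.Invoke();
            }
        }
        else if (IsSpeaking && ++_silentFrames >= _hangoverFrames)
        {
            // Enough consecutive quiet frames: treat the utterance as finished.
            IsSpeaking = false;
            _silentFrames = 0;
            SpeechEnded?.Invoke();
        }
    }
}
```

In the processing loop above, this would amount to calling gate.Update(voiceProbability) after each cobra.Process call and subscribing to SpeechStarted and SpeechEnded to start or stop recording.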

If your application doesn't capture audio yet, refer to Recording Audio in .NET Applications.

5. Release Resources

Always dispose of the Cobra instance when done to free resources:

cobra.Dispose();

Real-time VAD Visualization in .NET

Below is a complete example that visualizes voice activity probability from your microphone in real time. It uses PvRecorder for capturing audio input, which you can add to your project with dotnet add package PvRecorder:

using System;
using System.Threading;
using Pv;

class CobraDemo
{
    static void Main()
    {
        Console.WriteLine("Initializing Cobra & PvRecorder...");
        Cobra cobra = new Cobra("${ACCESS_KEY}"); // AccessKey from Picovoice Console
        PvRecorder recorder = PvRecorder.Create(frameLength: cobra.FrameLength);

        var processingThread = new Thread(() =>
        {
            while (recorder.IsRecording)
            {
                short[] audioFrame = recorder.Read();
                float voiceProbability = cobra.Process(audioFrame);

                // Draw a 10-character bar for the current probability,
                // unless recording was stopped while this frame was processed
                if (recorder.IsRecording)
                {
                    int barLength = (int)(voiceProbability * 10);
                    string bar = new string('#', barLength).PadRight(10);
                    Console.Write($"\r[{bar}] {voiceProbability:P1}");
                }
            }
        });

        recorder.Start();
        processingThread.Start();

        Console.WriteLine("Listening... Press Enter to stop.");
        Console.ReadLine();

        Console.WriteLine("Enter pressed. Stopping...");
        recorder.Stop();
        processingThread.Join();

        cobra.Dispose();
        recorder.Dispose();

        Console.WriteLine("Stopped and resources released.");
    }
}

For a complete .NET application, see the Cobra .NET demo on GitHub.
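The bar drawing in the example above can also be pulled out into a pure function, which makes it easy to unit test and to reuse with a different width. This is a small sketch under that assumption; VoiceMeter and Render are hypothetical names. It additionally clamps the input so that an out-of-range value cannot produce a negative bar length.

```csharp
using System;

// Hypothetical helper for rendering a text-based level meter.
static class VoiceMeter
{
    // Returns a fixed-width bar such as "[#######   ]" for a probability
    // between 0.0 and 1.0. Values outside that range are clamped.
    public static string Render(float voiceProbability, int width = 10)
    {
        if (width <= 0)
            throw new ArgumentOutOfRangeException(nameof(width));

        float clamped = Math.Max(0f, Math.Min(1f, voiceProbability));
        int filled = (int)(clamped * width);   // truncates, as in the demo
        return "[" + new string('#', filled).PadRight(width) + "]";
    }
}
```

Inside the processing thread, the output line would then become Console.Write("\r" + VoiceMeter.Render(voiceProbability)).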

This tutorial uses the following packages: Cobra and PvRecorder, both available on NuGet.

Best Practices

Audio format: Use 16-bit PCM, mono, with the sample rate and frame length reported by the Cobra instance (cobra.SampleRate and cobra.FrameLength).
Threading: Run audio capture and processing on a background thread to keep your UI responsive.
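One way to follow the threading advice is to put capture and processing on separate background tasks connected by a bounded queue, so that a slow processing step does not stall the caller. The sketch below is an illustration under stated assumptions, not the approach used by the Cobra demo. FramePipeline and RunAsync are hypothetical names, readFrame stands in for something like recorder.Read(), processFrame stands in for a call to cobra.Process, and error handling and cancellation are left out for brevity.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical helper, not part of any Picovoice SDK.
static class FramePipeline
{
    // Runs `readFrame` on one background task and `processFrame` on another,
    // connected by a bounded queue. `readFrame` signals the end of the
    // stream by returning null. The returned task completes with the number
    // of frames processed.
    public static async Task<int> RunAsync(
        Func<short[]> readFrame,
        Action<short[]> processFrame,
        int capacity = 32)
    {
        using (var queue = new BlockingCollection<short[]>(capacity))
        {
            Task producer = Task.Run(() =>
            {
                try
                {
                    short[] frame;
                    while ((frame = readFrame()) != null)
                        queue.Add(frame);     // blocks while the queue is full
                }
                finally
                {
                    queue.CompleteAdding();   // lets the consumer loop finish
                }
            });

            int processed = 0;
            Task consumer = Task.Run(() =>
            {
                foreach (short[] frame in queue.GetConsumingEnumerable())
                {
                    processFrame(frame);
                    processed++;
                }
            });

            await Task.WhenAll(producer, consumer);
            return processed;
        }
    }
}
```

A UI application could await RunAsync from an event handler without blocking the UI thread. Note that with a live microphone, a full queue means the capture side waits, so the capacity should be large enough to absorb short processing delays.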

Combine Speech Detection with Other Picovoice Engines

Cobra integrates seamlessly with other Picovoice technologies for end-to-end voice interfaces:

Orca Streaming Text-to-Speech: While Orca is speaking, Cobra can detect when the user starts talking and automatically stop the audio output.
Rhino Speech-to-Intent: Use Cobra to feed the voice command engine only active speech, improving intent detection accuracy.
Cheetah Streaming Speech-to-Text: Combine Cobra with streaming speech-to-text to start transcribing when speech begins and pause during silence.
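When only active speech is forwarded to a downstream engine, the first frames of an utterance can be lost, because the probability typically rises only after speech has already begun. A common remedy is a short pre-roll buffer that keeps the most recent non-speech frames and flushes them as soon as speech is detected. The following is a minimal sketch of that idea. SpeechForwarder is a hypothetical helper, the sink delegate stands in for whatever consumes the audio (for example a call into a speech-to-text engine), and the threshold and pre-roll length are illustrative assumptions.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of any Picovoice SDK.
class SpeechForwarder
{
    private readonly Queue<short[]> _preRoll = new Queue<short[]>();
    private readonly Action<short[]> _sink;
    private readonly float _threshold;
    private readonly int _preRollFrames;

    public SpeechForwarder(Action<short[]> sink, float threshold = 0.5f, int preRollFrames = 3)
    {
        _sink = sink ?? throw new ArgumentNullException(nameof(sink));
        _threshold = threshold;
        _preRollFrames = preRollFrames;
    }

    // Call once per frame with the frame and its voice probability.
    // Frames are stored by reference, so copy them first if the audio
    // source reuses its buffer between reads.
    public void OnFrame(short[] frame, float voiceProbability)
    {
        if (voiceProbability >= _threshold)
        {
            // Flush buffered frames first so the start of speech is not clipped.
            while (_preRoll.Count > 0)
                _sink(_preRoll.Dequeue());
            _sink(frame);
        }
        else
        {
            // Keep only the most recent non-speech frames, dropping the oldest.
            _preRoll.Enqueue(frame);
            if (_preRoll.Count > _preRollFrames)
                _preRoll.Dequeue();
        }
    }
}
```

This sketch forwards frames strictly by threshold; combining it with a hangover period would keep short pauses inside an utterance from being cut out.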
