Recording audio from a web browser is more challenging than it might seem at first glance. While the browser's abstraction from the hardware it's running on has its benefits, it can make it difficult to communicate with certain peripherals - e.g. a user's connected microphone. Luckily for us modern developers, the Web Audio API and the MediaStream API came along over a decade ago and solved many of these problems.

The Web Audio API is a powerful tool for manipulating audio in the browser. It allows developers to analyze, synthesize, and manipulate audio in real-time using some simple JavaScript. The MediaStream API allows developers to open streams of media content from many sources, including the microphone. In this article, we will look at how to use the Web Audio API and the MediaStream API to capture microphone audio in any modern web browser.

Setting up a basic HTML page

First, let's create a basic HTML page that we can use to control audio capture from the microphone. Create a new file called index.html and add the following code:

Capturing audio from the microphone

Now that we have our HTML page, let's create the main JavaScript file to capture microphone audio. Create a new file called main.js and add the following code:

With this code, we are able to capture microphone audio using the Web Audio API and the MediaStream API. When the user clicks the Start Capture button, we create an AudioContext and request access to the user's microphone. Once we know we have access to the microphone audio, we then create an audio processing graph using a MediaStreamAudioSourceNode to capture the audio and an AudioWorkletNode to process it.

Audio Context Graph

The Web Audio API and MediaStream API are supported on Google Chrome, Firefox, Safari, Microsoft Edge and Opera. A host of mobile web browsers are also supported.

Processing the captured audio data

Now that we have set up the basic infrastructure for capturing microphone audio, we can start processing the real-time audio data. To do this, we will need to define the behaviour of the AudioWorkletNode with an AudioWorkletProcessor implementation of our own.

Create a new file called my-audio-processor.js and add the following code:

In the process function that we've defined, we can access the input audio data and perform various operations on it. For example, we can use the Web Audio API's AnalyserNode to analyze the frequency spectrum or buffer the audio to send to a speech recognition engine.

With this final addition, we can now capture real-time microphone audio from the HTML page we created earlier.

Capturing audio from the browser on Easy Mode

Now, you might be thinking, "this approach seems complicated and limited (i.e. can't choose the sample rate of the incoming audio, audio processing on the main thread seems bad, etc.)", and you would be right. That's why we created the Web Voice Processor library.

At Picovoice, we ran into a multitude of challenges getting audio from the web browser for speech recognition. We require specific audio properties for our speech recognition engines, and - since our audio processing happens all in the browser - we want the processing to happen on a worker thread. We found ourselves building out a complex array of utility functions to help, which we eventually merged into an open-source library, @picovoice/web-voice-processor.

With Web Voice Processor imported, our main.js file would look like this:

In addition to simplifying the audio capture process, Web Voice Processor adds options for resampling the input audio, selecting the audio device to record with and running audio processing on a Worker Thread.

Explore

The Web Voice Processor is open-source and available on GitHub. There is also a demo in the repository that explores more of the features of the library.