Recording audio from a web browser is more challenging than it might seem at first glance. While the browser's abstraction from the hardware it's running on has its benefits, it can make it difficult to communicate with certain peripherals - e.g. a user's connected microphone. Luckily for us modern developers, the Web Audio API and the MediaStream API came along over a decade ago and solved many of these problems.
The MediaStream API allows developers to open streams of media content from many sources, including the microphone. In this article, we will look at how to use the Web Audio API and the MediaStream API to capture microphone audio in any modern web browser.
Setting up a basic HTML page
First, let's create a basic HTML page that we can use to control audio capture from the microphone. Create a new file called
index.html and add the following code:
Capturing audio from the microphone
Next, create a new file called main.js and add the following code:
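A minimal sketch of what main.js could contain is given here, not a definitive implementation. It assumes the page has two buttons with the hypothetical ids startButton and stopButton, that the script is loaded after those buttons exist (for example at the end of the body), and that the worklet module registers a processor under the assumed name my-audio-processor.

```javascript
// main.js (sketch). Element ids and the processor name are assumptions.
let audioContext = null;
let micStream = null;

// Builds the graph: microphone -> MediaStreamAudioSourceNode -> AudioWorkletNode.
async function startCapture() {
  // Creating the context inside a click handler satisfies autoplay policies.
  audioContext = new AudioContext();

  // Ask the user for microphone access (needs a secure context such as HTTPS).
  micStream = await navigator.mediaDevices.getUserMedia({ audio: true });

  // The processor module must be loaded before a node can use it.
  await audioContext.audioWorklet.addModule('my-audio-processor.js');

  const source = audioContext.createMediaStreamSource(micStream);
  const worklet = new AudioWorkletNode(audioContext, 'my-audio-processor');
  source.connect(worklet);
  return worklet;
}

// Releases the microphone and shuts the audio graph down.
async function stopCapture() {
  if (micStream) {
    micStream.getTracks().forEach((track) => track.stop());
  }
  if (audioContext) {
    await audioContext.close();
  }
  micStream = null;
  audioContext = null;
}

// Wire up the buttons only when running inside a page.
if (typeof document !== 'undefined') {
  document.getElementById('startButton')?.addEventListener('click', startCapture);
  document.getElementById('stopButton')?.addEventListener('click', stopCapture);
}
```

Stopping every track on the stream is what releases the microphone; closing the AudioContext alone would leave the browser's recording indicator on.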
With this code, we are able to capture microphone audio using the Web Audio API and the
MediaStream API. When the user clicks the
Start Capture button, we create an
AudioContext and request access to the user's microphone. Once we know we have access to the microphone audio, we then create an audio processing graph using a
MediaStreamAudioSourceNode to capture the audio and an
AudioWorkletNode to process it.
The Web Audio API and the MediaStream API are supported in all major desktop browsers, including Microsoft Edge and Opera. A host of mobile web browsers are also supported.
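Since support can still vary by browser version and context, a page may want to check for the relevant APIs before offering a capture button. The function below is a hedged sketch of such a check; the name canCaptureMicrophone and its scope parameter are hypothetical and exist only for illustration.

```javascript
// Returns true when the APIs this article relies on appear to be available.
// `scope` defaults to the global object; passing it in makes the check testable.
function canCaptureMicrophone(scope = globalThis) {
  // navigator.mediaDevices is undefined in insecure (non-HTTPS) contexts.
  const hasGetUserMedia =
    typeof scope.navigator?.mediaDevices?.getUserMedia === 'function';
  const hasAudioContext = typeof scope.AudioContext === 'function';
  const hasAudioWorklet = typeof scope.AudioWorkletNode === 'function';
  return hasGetUserMedia && hasAudioContext && hasAudioWorklet;
}
```

A page could call canCaptureMicrophone() on load and disable its capture controls when the result is false.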
Processing the captured audio data
Now that we have set up the basic infrastructure for capturing microphone audio, we can start processing the real-time audio data. To do this, we will need to define the behaviour of the
AudioWorkletNode with an
AudioWorkletProcessor implementation of our own.
Create a new file called
my-audio-processor.js and add the following code:
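One possible shape for this file is sketched below, under a few assumptions: the processor is registered under the assumed name my-audio-processor (it must match whatever name main.js passes to the AudioWorkletNode constructor), only the first channel of the first input is of interest, and each block is simply forwarded to the main thread. The helper forwardFirstChannel is a hypothetical name introduced for illustration.

```javascript
// my-audio-processor.js (sketch).

// Copies the first channel of the first input and posts it over `port`.
function forwardFirstChannel(inputs, port) {
  const channel = inputs[0] && inputs[0][0];
  if (channel && channel.length > 0) {
    // Copy before posting: the audio engine may reuse the underlying buffer.
    port.postMessage(channel.slice());
  }
  return true; // returning true keeps the processor alive
}

// AudioWorkletProcessor and registerProcessor only exist inside an audio
// worklet scope, so the guard lets this file load elsewhere (e.g. in tests).
if (typeof AudioWorkletProcessor !== 'undefined') {
  class MyAudioProcessor extends AudioWorkletProcessor {
    // Called by the audio engine once per block of samples.
    process(inputs) {
      return forwardFirstChannel(inputs, this.port);
    }
  }

  registerProcessor('my-audio-processor', MyAudioProcessor);
}
```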
In the process function that we've defined, we can access the input audio data and perform various operations on it. For example, we can buffer the audio to send to a speech recognition engine, or connect the Web Audio API's AnalyserNode to the graph to analyze the frequency spectrum.
With this final addition, we can now capture real-time microphone audio from the HTML page we created earlier.
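To illustrate the buffering idea mentioned above, here is a hedged sketch of a helper that collects the small blocks delivered by the audio engine into fixed-size frames, which is the shape many speech engines expect. The class name FrameBuffer, its callback, and any frame length you choose are assumptions for illustration, not part of any API.

```javascript
// Accumulates incoming sample blocks into frames of a fixed length.
class FrameBuffer {
  constructor(frameLength, onFrame) {
    this.frame = new Float32Array(frameLength);
    this.filled = 0; // number of samples currently stored in `frame`
    this.onFrame = onFrame; // called with a copy of each completed frame
  }

  // Appends `samples` (a Float32Array), emitting every frame that fills up.
  push(samples) {
    let offset = 0;
    while (offset < samples.length) {
      const room = this.frame.length - this.filled;
      const count = Math.min(samples.length - offset, room);
      this.frame.set(samples.subarray(offset, offset + count), this.filled);
      this.filled += count;
      offset += count;

      if (this.filled === this.frame.length) {
        this.onFrame(this.frame.slice()); // hand out a copy, then reuse the buffer
        this.filled = 0;
      }
    }
  }
}
```

On the main thread, each block received on the AudioWorkletNode's port could be passed to push, with onFrame forwarding complete frames to the recognition engine.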
Capturing audio from the browser on Easy Mode
Now, you might be thinking, "this approach seems complicated and limited (e.g. you can't choose the sample rate of the incoming audio, and audio processing on the main thread seems bad)", and you would be right. That's why we created the
Web Voice Processor library.
At Picovoice, we ran into a multitude of challenges getting audio from the web browser for speech recognition. We require specific audio properties for our speech recognition engines, and - since our audio processing happens all in the browser - we want the processing to happen on a worker thread. We found ourselves building out a complex array of utility functions to help, which we eventually merged into an open-source library, Web Voice Processor.
With Web Voice Processor imported, our main.js file would look like this:
In addition to simplifying the audio capture process,
Web Voice Processor adds options for resampling the input audio, selecting the audio device to record with and running audio processing on a