Recording audio from a web browser is more challenging than it might seem at first glance. While the browser's abstraction from the hardware it's running on has its benefits, it can make it difficult to communicate with certain peripherals - e.g. a user's connected microphone. Luckily for us modern developers, the Web Audio API and the MediaStream API came along over a decade ago and solved many of these problems.
The Web Audio API is a powerful tool for working with audio in the browser: it allows developers to analyze, synthesize, and manipulate audio in real time using some simple JavaScript. The MediaStream API allows developers to open streams of media content from many sources, including the microphone. In this article, we will look at how to use the Web Audio API and the MediaStream API to capture microphone audio in any modern web browser.
Setting up a basic HTML page
First, let's create a basic HTML page that we can use to control audio capture from the microphone. Create a new file called index.html and add the following code:
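A minimal page only needs a couple of buttons and a script tag. The sketch below is one way to lay it out; the startCapture and stopCapture IDs are placeholders of our own that the JavaScript in the next section will assume:

```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8" />
    <title>Microphone Capture</title>
  </head>
  <body>
    <h1>Microphone Capture</h1>
    <!-- Buttons that main.js uses to start and stop audio capture -->
    <button id="startCapture">Start Capture</button>
    <button id="stopCapture" disabled>Stop Capture</button>
    <script src="main.js"></script>
  </body>
</html>
```

Note that getUserMedia and AudioWorklet require a secure context, so serve the page over HTTPS or localhost rather than opening the file directly.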
Capturing audio from the microphone
Now that we have our HTML page, let's create the main JavaScript file to capture microphone audio. Create a new file called main.js and add the following code:
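A sketch of what main.js could look like is shown below; the element IDs and the "my-audio-processor" name are assumptions carried over from the HTML sketch above and the processor file we will write in the next section:

```javascript
// main.js: a sketch of microphone capture with the Web Audio API
let audioContext = null;
let mediaStream = null;

const startButton = document.getElementById("startCapture");
const stopButton = document.getElementById("stopCapture");

startButton.addEventListener("click", async () => {
  // Create the audio context and request access to the user's microphone
  audioContext = new AudioContext();
  mediaStream = await navigator.mediaDevices.getUserMedia({ audio: true });

  // Load our custom processor (defined below in my-audio-processor.js)
  await audioContext.audioWorklet.addModule("my-audio-processor.js");

  // Build the audio processing graph: microphone -> worklet
  const source = audioContext.createMediaStreamSource(mediaStream);
  const workletNode = new AudioWorkletNode(audioContext, "my-audio-processor");
  source.connect(workletNode);

  startButton.disabled = true;
  stopButton.disabled = false;
});

stopButton.addEventListener("click", async () => {
  // Stop the microphone tracks and tear down the audio context
  if (mediaStream !== null) {
    mediaStream.getTracks().forEach((track) => track.stop());
    mediaStream = null;
  }
  if (audioContext !== null) {
    await audioContext.close();
    audioContext = null;
  }
  startButton.disabled = false;
  stopButton.disabled = true;
});
```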
With this code, we are able to capture microphone audio using the Web Audio API and the MediaStream API. When the user clicks the Start Capture button, we create an AudioContext and request access to the user's microphone. Once we know we have access to the microphone audio, we then create an audio processing graph using a MediaStreamAudioSourceNode to capture the audio and an AudioWorkletNode to process it.
The Web Audio API and MediaStream API are supported in Google Chrome, Firefox, Safari, Microsoft Edge, and Opera, as well as in a host of mobile web browsers.
Processing the captured audio data
Now that we have set up the basic infrastructure for capturing microphone audio, we can start processing the real-time audio data. To do this, we will need to define the behaviour of the AudioWorkletNode with an AudioWorkletProcessor implementation of our own.
Create a new file called my-audio-processor.js and add the following code:
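A bare-bones processor might look something like this; the "my-audio-processor" registration name matches the name the main.js sketch above passes to the AudioWorkletNode constructor:

```javascript
// my-audio-processor.js: runs on the dedicated audio rendering thread
class MyAudioProcessor extends AudioWorkletProcessor {
  process(inputs, outputs, parameters) {
    // inputs[0] is the first connected input; inputs[0][0] is a Float32Array
    // of samples (typically 128 per call) from its first channel
    const input = inputs[0];
    if (input.length > 0) {
      // Forward the raw samples to the main thread; a real application might
      // buffer or analyze them here instead
      this.port.postMessage(input[0]);
    }
    // Returning true keeps the processor alive
    return true;
  }
}

registerProcessor("my-audio-processor", MyAudioProcessor);
```

Because process runs on the audio rendering thread in small blocks, any heavy lifting is best posted back to the main thread or a worker rather than done inline.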
In the process function that we've defined, we can access the input audio data and perform various operations on it. For example, we can use the Web Audio API's AnalyserNode to analyze the frequency spectrum, or buffer the audio to send to a speech recognition engine.
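As a rough illustration of the buffering approach, the main thread could collect the frames posted by the worklet sketch above like this:

```javascript
// Main-thread side (e.g. inside the start-capture handler, after creating
// workletNode): collect the frames posted by the worklet so they can be
// buffered and later handed to a speech recognition engine.
const recordedFrames = [];
workletNode.port.onmessage = (event) => {
  // event.data is the Float32Array posted from process()
  recordedFrames.push(event.data);
};
```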
With this final addition, we can now capture real-time microphone audio from the HTML page we created earlier.
Capturing audio from the browser on Easy Mode
Now, you might be thinking, "this approach seems complicated and limited (e.g. we can't choose the sample rate of the incoming audio, audio processing on the main thread seems bad, etc.)", and you would be right. That's why we created the Picovoice Audio Recorders.
At Picovoice, we ran into a multitude of challenges getting audio from the web browser for speech recognition. We require specific audio properties for our speech recognition engines, and - since our audio processing happens all in the browser - we want the processing to happen on a worker thread. We found ourselves building out a complex array of utility functions to help, which we eventually merged into an open-source library: Picovoice Web Voice Processor.
With Web Voice Processor imported, our main.js file would look like this:
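The sketch below assumes the subscribe-style interface of the @picovoice/web-voice-processor package, where an "engine" (any object or Web Worker with an onmessage handler) receives frames of 16-bit PCM audio; check the library's README for the exact API of the version you install:

```javascript
// main.js: a sketch using the Picovoice Web Voice Processor.
// NOTE: assumes the subscribe-based interface of @picovoice/web-voice-processor;
// consult the library's README for the API of the installed version.
import { WebVoiceProcessor } from "@picovoice/web-voice-processor";

// An "engine" is any object (or Web Worker) with an onmessage handler;
// it receives frames of downsampled 16-bit PCM audio.
const recorderEngine = {
  onmessage: (event) => {
    if (event.data.command === "process") {
      // event.data.inputFrame is an Int16Array of audio samples
      console.log(`Received ${event.data.inputFrame.length} samples`);
    }
  },
};

const startButton = document.getElementById("startCapture");
const stopButton = document.getElementById("stopCapture");

startButton.addEventListener("click", async () => {
  // Subscribing starts audio capture (and prompts for microphone access)
  await WebVoiceProcessor.subscribe(recorderEngine);
});

stopButton.addEventListener("click", async () => {
  await WebVoiceProcessor.unsubscribe(recorderEngine);
});
```

Since this version uses an ES module import, the script would be loaded with type="module" (or through a bundler) after installing the package from npm.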
In addition to simplifying the audio capture process, Web Voice Processor adds options for resampling the input audio, selecting the audio device to record with, and running audio processing on a Worker Thread.
It takes less than 90 seconds to start recording audio from a web browser.
Explore
The Web Voice Processor is open-source and available on GitHub. There is also a demo in the repository that explores more of the features of the library.