Real-time transcription is the process of converting spoken words into text immediately as they are spoken. While it is common to use cloud-based services for real-time transcription, there are also options available for running it on-device.

When using cloud-based real-time transcription, the audio is recorded and sent to a vendor server that houses the transcription engine. This server then transcribes the audio, and sends the transcription back to the client. This method can be susceptible to delays or interruptions in transcription due to network latency or connectivity issues. In contrast, on-device real-time transcription performs the transcription directly on a local device, eliminating these inherent latency and reliability challenges.

Picovoice's Cheetah Streaming Speech-to-Text is on-device software that performs speech-to-text locally. Because audio is processed on the device and never sent to a server, your voice data remains private (i.e. Cheetah is GDPR and HIPAA-compliant by design). Local processing also removes the unpredictable network delays that can interrupt cloud-based transcription.

Cheetah Streaming Speech-to-Text can run on Linux, macOS, Windows, Raspberry Pi, and NVIDIA Jetson, as well as in web browsers, which is the target of this tutorial.

In just a few minutes, you can start transcribing speech to text in real time using the Cheetah Streaming Speech-to-Text JavaScript SDK. Let's get started!

1. Project setup

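Make sure Node.js is installed, then create a new folder and initialize an npm project in it (for example, by running npm init -y inside the folder).

Next, install @picovoice/web-voice-processor and @picovoice/cheetah-web (for example, npm install @picovoice/web-voice-processor @picovoice/cheetah-web).

Also install http-server as a development dependency (for example, npm install --save-dev http-server), so we can view our project on localhost.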

2. HTML

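Create an index.html file with <script> tags that load the two packages installed above from node_modules.

Add a start entry to the scripts section of the project's package.json that launches http-server on port 5000 (for example, "start": "http-server -a localhost -p 5000").

You'll now be able to run the local server (npm run start, with the example entry above) to load the page.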

You can see the page at http://localhost:5000. This will just look like a blank page for now.

3. Picovoice Console

Sign up for a free Picovoice Console account and copy your AccessKey, found on the main dashboard.

Download the default model and put it in the project's root directory. If you're adding Cheetah to an existing project, put the model in the public (or equivalent) directory instead.

Instead of using the default model, you can also use Picovoice Console to create a custom model that adds custom vocabulary or boosts the probability of certain words.

4. Initialize Cheetah

In a <script> tag within the <body> of the HTML file, create an instance of CheetahWorker with your Picovoice AccessKey and a transcriptCallback function.
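As a rough sketch of this step, the helper below creates the worker. It assumes the SDK exposes a static CheetahWorker.create(accessKey, transcriptCallback, model) method that returns a promise and takes a model object with a publicPath field, and that the default model file is named cheetah_params.pv; check both against the SDK version you installed. The CheetahWorker class is passed in as a parameter (a choice made for this sketch, not something the SDK requires) so the function can be exercised outside a browser. createCheetah is a hypothetical name.

```javascript
// Hypothetical helper: creates a Cheetah worker from an injected class.
// Assumed SDK call: CheetahWorker.create(accessKey, transcriptCallback, model),
// where model.publicPath points at the model file served by your web server.
function createCheetah(CheetahWorker, accessKey, transcriptCallback,
                       modelPath = "cheetah_params.pv") {
  if (!accessKey) {
    throw new Error("An AccessKey from Picovoice Console is required");
  }
  // Returns the promise from create(); await it before using the worker.
  return CheetahWorker.create(accessKey, transcriptCallback, {
    publicPath: modelPath,
  });
}
```

Inside the page's <script> tag this might be called from an async function as createCheetah(CheetahWeb.CheetahWorker, accessKey, transcriptCallback), assuming the script build exposes a CheetahWeb global.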

Once audio has been processed, Cheetah passes a transcript string and an isEndpoint boolean to the transcriptCallback function.

  • transcript represents the most recent portion of the transcription
  • isEndpoint is a flag that is set to true when Cheetah detects a stretch of audio without speech (1 second by default) following an utterance
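To illustrate how these two values might be combined, here is a minimal sketch of a callback that appends each partial transcript to the utterance in progress and starts a new line whenever an endpoint is reported. It assumes the callback receives a single object with transcript and isEndpoint fields; if your SDK version passes them differently, adjust the destructuring. createTranscriptBuffer is a hypothetical helper, not part of the SDK.

```javascript
// Sketch: accumulate Cheetah's partial transcripts into finished lines.
// Assumption: the callback argument is an object with `transcript` (string)
// and `isEndpoint` (boolean) fields.
function createTranscriptBuffer() {
  const lines = [];  // utterances completed so far
  let current = "";  // text of the utterance still in progress

  return {
    // Pass this function as the transcriptCallback.
    onTranscript({ transcript, isEndpoint }) {
      current += transcript;
      if (isEndpoint) {
        // Silence after an utterance: close the current line.
        if (current.trim() !== "") lines.push(current.trim());
        current = "";
      }
    },
    // Full text so far: finished lines plus the in-progress one.
    text() {
      return [...lines, current].filter((s) => s !== "").join("\n");
    },
  };
}
```

In a page, the string returned by text() could be written to an element's innerText each time the callback fires.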

5. Start Transcribing

To begin transcribing speech, we need to capture audio and pass it to Cheetah. The Web Audio API and the MediaStream API are commonly used to work with audio in web browsers. Although powerful, they can be fairly complex to set up. This is why we created Web Voice Processor, an open-source library that handles recording audio and passing it to Cheetah for you.

To start transcribing, subscribe cheetah to WebVoiceProcessor.

To stop processing audio, unsubscribe cheetah.
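The subscribe and unsubscribe steps could be wrapped in a pair of start/stop functions along these lines. This is a sketch under assumptions: it takes the voice processor as a parameter and assumes it exposes asynchronous subscribe(engine) and unsubscribe(engine) methods, and createMicControls is a hypothetical name introduced here for illustration.

```javascript
// Hypothetical helper: wraps subscribe/unsubscribe behind start/stop.
// Assumption: voiceProcessor exposes async subscribe(engine) and
// unsubscribe(engine) methods.
function createMicControls(voiceProcessor, cheetah) {
  let listening = false;
  return {
    async start() {
      if (listening) return;  // already transcribing; nothing to do
      await voiceProcessor.subscribe(cheetah);   // start feeding audio to Cheetah
      listening = true;
    },
    async stop() {
      if (!listening) return; // not transcribing; nothing to do
      await voiceProcessor.unsubscribe(cheetah); // stop feeding audio to Cheetah
      listening = false;
    },
    isListening: () => listening,
  };
}
```

In the page, start and stop would typically be attached to button click handlers, with WebVoiceProcessor.WebVoiceProcessor passed as the first argument (assuming the script build exposes it under that global name).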

6. Complete HTML

Add some HTML elements and app logic to see Cheetah in action. It might look something like this:

Finally, go back to http://localhost:5000. Click "Start Cheetah" and speak into your microphone to see the live transcription!

Adding to Existing Project?

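If you are working within an existing project that uses a module bundler, you can use the import syntax instead of <script> tags, importing CheetahWorker from @picovoice/cheetah-web and WebVoiceProcessor from @picovoice/web-voice-processor.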


For more information, check out the Cheetah Streaming Speech-to-Text product page or refer to the Cheetah Streaming Speech-to-Text JavaScript SDK quick start guide.