Real-time transcription is the process of converting spoken words into text as they are spoken. While it is common to use cloud-based services for real-time transcription, there are also options for running it on-device. In cloud-based real-time transcription, the audio is recorded and sent to a vendor server that houses the transcription engine. The server transcribes the audio and sends the transcription back to the client. This method is susceptible to delays or interruptions due to network latency or connectivity issues. In contrast, on-device real-time transcription performs the transcription directly on a local device, eliminating these latency and reliability challenges.
Cheetah Streaming Speech-to-Text is on-device software designed to perform real-time transcription. Cheetah ensures your voice data remains private (i.e. it is HIPAA-compliant by design). Additionally, it guarantees a real-time experience by eliminating unpredictable delays.
Cheetah Streaming Speech-to-Text can run on several platforms, including Raspberry Pi.
1. Project setup
Create a new folder and initialize an npm project, for example by running npm init inside it.
Also install http-server as a development dependency, so we can view our project on a local server.
Create an index.html file that loads the Cheetah and Web Voice Processor scripts.
Add a script to the project's package.json that starts http-server on port 5000.
You'll now be able to run the local server to load the page.
You can see the page at http://localhost:5000. This will just look like a blank page for now.
3. Picovoice Console
Sign up for a free Picovoice Console account and copy your AccessKey, found on the main dashboard.
Download the default model and put it in the project's root directory. If you're adding Cheetah to an existing project, put the model in the public (or equivalent) directory instead.
4. Initialize Cheetah
In a <script> tag within the <body> of the html file, create an instance of CheetahWorker with your Picovoice AccessKey and a transcriptCallback function.
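The initialization step can be sketched as follows. This is a minimal sketch under stated assumptions, not the SDK's definitive usage: it assumes the CheetahWorker class exposes a static create(accessKey, transcriptCallback, model) method and that the model is described by an object whose publicPath points at the downloaded model file. The helper name initCheetah is hypothetical, and the class is passed in as a parameter so the sketch does not depend on how the SDK was loaded.

```javascript
// Sketch only. Assumptions (not confirmed by this article):
//  - CheetahWorkerClass.create(accessKey, transcriptCallback, model) exists
//  - the model is described as { publicPath: "<path to the model file>" }
// initCheetah is a hypothetical helper name.
function initCheetah(CheetahWorkerClass, accessKey, transcriptCallback, modelPath) {
  if (typeof accessKey !== "string" || accessKey.length === 0) {
    throw new Error("A Picovoice AccessKey is required");
  }
  if (typeof transcriptCallback !== "function") {
    throw new Error("transcriptCallback must be a function");
  }
  // Whatever create() returns is passed straight back to the caller.
  // With the real SDK this is expected to be a Promise for the worker instance.
  return CheetahWorkerClass.create(accessKey, transcriptCallback, {
    publicPath: modelPath,
  });
}
```

In the page, the call might look like initCheetah(CheetahWeb.CheetahWorker, accessKey, transcriptCallback, "cheetah_params.pv"), where both the global name CheetahWeb and the model file name are assumptions to check against the SDK version you installed.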
When audio has been processed, Cheetah returns a transcript string and an isEndpoint flag via the transcriptCallback function:
- transcript represents the most recent portion of the transcription
- isEndpoint is a flag that is set to true when Cheetah detects a chunk of audio (1s by default) after an utterance without any speech in it
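One way to handle these two values is to append each partial transcript to the current line and start a new line whenever an endpoint is reported. The sketch below assumes the callback receives a single object with transcript and isEndpoint fields. The names createTranscriptCollector and onUpdate are hypothetical and not part of the Cheetah SDK.

```javascript
// Sketch only: builds a transcriptCallback that accumulates partial transcripts.
// Assumption: the callback argument looks like { transcript: string, isEndpoint: boolean }.
function createTranscriptCollector(onUpdate) {
  const finishedLines = []; // utterances that have already ended
  let currentLine = "";     // utterance still in progress

  return function transcriptCallback(cheetahTranscript) {
    // Each call carries only the newest portion, so append it.
    currentLine += cheetahTranscript.transcript;

    if (cheetahTranscript.isEndpoint) {
      // The speaker paused: close the current line if it has any text.
      const line = currentLine.trim();
      if (line.length > 0) {
        finishedLines.push(line);
      }
      currentLine = "";
    }

    // Hand a copy of the state to the caller, e.g. to render it on the page.
    onUpdate({ finishedLines: finishedLines.slice(), currentLine });
  };
}
```

In a browser, onUpdate could write the finished lines and the current line into the textContent of an element on the page. Keeping the rendering outside the collector makes the accumulation logic easy to exercise without a microphone.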
5. Start Detecting Voice
In order to begin transcribing speech, we need to be able to access audio and pass it to Cheetah. The Web Audio API and the MediaStream API are commonly used by developers to work with audio in web browsers. Although powerful, these APIs can be fairly complex to set up. This is why we created Web Voice Processor, an open-source library that handles recording audio and passing it to Cheetah for you.
To start detecting voice, simply subscribe the CheetahWorker instance to Web Voice Processor.
To stop processing audio, unsubscribe it.
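The start and stop steps can be sketched as a small controller. It assumes Web Voice Processor exposes subscribe(engine) and unsubscribe(engine) methods that return Promises. The helper name createMicController is hypothetical, and the processor is passed in as a parameter rather than read from a global.

```javascript
// Sketch only. Assumption: voiceProcessor has subscribe(engine) and
// unsubscribe(engine), each returning a Promise.
// createMicController is a hypothetical helper, not part of any SDK.
function createMicController(voiceProcessor, engine) {
  let listening = false;

  return {
    // Begin streaming microphone audio to the engine (no-op if already started).
    start() {
      if (listening) return Promise.resolve();
      listening = true;
      return Promise.resolve(voiceProcessor.subscribe(engine)).catch((err) => {
        listening = false; // e.g. microphone permission was denied
        throw err;
      });
    },
    // Stop streaming audio to the engine (no-op if not started).
    stop() {
      if (!listening) return Promise.resolve();
      listening = false;
      return Promise.resolve(voiceProcessor.unsubscribe(engine));
    },
    isListening() {
      return listening;
    },
  };
}
```

The listening flag guards against subscribing the same engine twice when a start button is clicked repeatedly, which is a design choice of this sketch rather than a documented requirement of the library.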
6. Complete HTML
Add html elements and app logic to see Cheetah in action, such as a "Start Cheetah" button and an element that displays the transcript.
Finally, go back to http://localhost:5000. Click "Start Cheetah" and speak into your microphone to see the live transcription!
Adding to Existing Project?
If you are working within an existing project that has a module bundler, you can load the packages with the import syntax instead of script tags.