Real-time transcription is the process of converting spoken words into text immediately as they are spoken. While it is common to use cloud-based services for real-time transcription, there are also options available for running it on-device.
When using cloud-based real-time transcription, the audio is recorded and sent to a vendor server that houses the transcription engine. This server then transcribes the audio and sends the transcription back to the client. This method can be susceptible to delays or interruptions in transcription due to network latency or connectivity issues. In contrast, on-device real-time transcription performs the transcription directly on a local device, eliminating these network-related latency and reliability challenges.
Picovoice's Cheetah Streaming Speech-to-Text is on-device software designed to perform speech-to-text locally. Because audio is processed on the device, Cheetah keeps your voice data private (i.e., it is GDPR- and HIPAA-compliant by design). It also delivers a real-time experience by eliminating unpredictable network delays.
Cheetah Streaming Speech-to-Text can run on Linux, macOS, Windows, Raspberry Pi, and NVIDIA Jetson.
In just a few minutes, you can start transcribing speech to text in real time using the Cheetah Streaming Speech-to-Text JavaScript SDK. Let's get started!
1. Project setup
Create a new folder and initialize an npm project:
Ensure Node.js is installed. Next, install @picovoice/web-voice-processor and @picovoice/cheetah-web:
Also install http-server as a development dependency, so we can view our project on localhost:
2. HTML
Create an index.html file with the following scripts:
Add the following line to the scripts section of the project's package.json:
You'll now be able to run the local server to load the page:
You can see the page at http://localhost:5000. It will be blank for now.
3. Picovoice Console
Sign up for a free Picovoice Console account and copy your AccessKey, found on the main dashboard.
Download the default model and put it in the project's root directory. If you're adding Cheetah to an existing project, put the model in the public (or equivalent) directory instead.
Instead of using the default model, you can also use the Picovoice Console to create a custom model that adds custom vocabulary and/or boosts the probability of certain words.
4. Initialize Cheetah
In a <script> tag within the <body> of the HTML file, create an instance of CheetahWorker with your Picovoice AccessKey and a transcriptCallback function.
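As a rough sketch of this step, the initialization could be wrapped in a small helper. The helper name initCheetah, the argument order passed to create (AccessKey, callback, model), the publicPath model field, and the file name cheetah_params.pv are all assumptions for illustration; confirm them against the SDK documentation. The worker class is passed in as a parameter so the logic can be tried with a stand-in object.

```javascript
// Sketch only: `CheetahWorkerClass` is injected so this snippet does not
// depend on how the SDK is loaded. The argument order
// (accessKey, callback, model) and the `publicPath` field are assumptions.
async function initCheetah(CheetahWorkerClass, accessKey, modelPath, onTranscript) {
  if (!accessKey) {
    throw new Error("A Picovoice AccessKey is required");
  }
  return CheetahWorkerClass.create(accessKey, onTranscript, {
    publicPath: modelPath, // where the model file is served from
  });
}

// Stand-in for the real worker class, for demonstration only.
const fakeCheetahWorker = {
  create: async (accessKey, callback, model) => ({ accessKey, callback, model }),
};

const initDemo = initCheetah(
  fakeCheetahWorker,
  "YOUR_ACCESS_KEY",     // placeholder, not a real key
  "cheetah_params.pv",   // placeholder model file name
  (result) => console.log(result.transcript)
);
```

In a real page, the stand-in object would be replaced by the CheetahWorker class exposed by the SDK script.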
When audio has been processed, Cheetah passes a transcript string and an isEndpoint boolean to the transcriptCallback function.
transcript represents the most recent portion of the transcription.
isEndpoint is a flag that is set to true when Cheetah detects a stretch of audio (1 s by default) without any speech in it after an utterance.
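To illustrate how these two values work together, here is a minimal sketch of a callback that stitches partial transcripts into complete utterances. The helper createTranscriptAccumulator is hypothetical and not part of the SDK, and the sketch assumes the callback receives a single object with transcript and isEndpoint fields.

```javascript
// Hypothetical helper (not part of the Cheetah SDK): collects partial
// transcripts and closes an utterance whenever an endpoint is reported.
function createTranscriptAccumulator() {
  const utterances = []; // finished utterances, one entry per endpoint
  let current = "";      // text of the utterance still in progress

  // Shaped like a transcriptCallback. Assumed input: an object with a
  // `transcript` string and an optional `isEndpoint` flag.
  function onTranscript({ transcript = "", isEndpoint = false }) {
    current += transcript;
    if (isEndpoint) {
      const finished = current.trim();
      if (finished.length > 0) utterances.push(finished);
      current = "";
    }
  }

  return {
    onTranscript,
    getUtterances: () => utterances.slice(), // copy, so callers cannot mutate state
    getPending: () => current,
  };
}

// Simulated callback invocations, standing in for live audio.
const accumulator = createTranscriptAccumulator();
accumulator.onTranscript({ transcript: "hello ", isEndpoint: false });
accumulator.onTranscript({ transcript: "world", isEndpoint: false });
accumulator.onTranscript({ transcript: "", isEndpoint: true });
accumulator.onTranscript({ transcript: "next", isEndpoint: false });

console.log(accumulator.getUtterances()); // one finished utterance: "hello world"
console.log(accumulator.getPending());    // text still in progress: "next"
```

Keeping finished and in-progress text separate makes it easy to render completed utterances as their own lines while the current one is still growing.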
5. Start Transcribing
In order to begin transcribing speech, we need to be able to access and pass audio to Cheetah. The Web Audio API and the MediaStream API are commonly used by developers to work with audio in web browsers. Although powerful, setup for the Web Audio and MediaStream APIs can be fairly complex. This is why we created Web Voice Processor - an open-source library that handles recording audio and passing it to Cheetah for you.
To start transcribing, simply subscribe cheetah to WebVoiceProcessor.
To stop processing audio, unsubscribe cheetah.
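One way to wire subscribing and unsubscribing to start and stop buttons is sketched below. The helper createRecorderControls is hypothetical; it assumes the processor exposes subscribe and unsubscribe methods that return promises, and it takes both objects as parameters so it can be exercised with stand-ins rather than a real microphone.

```javascript
// Hypothetical helper: wraps subscribe/unsubscribe in a start/stop pair.
// `processor` stands in for WebVoiceProcessor and `engine` for the
// CheetahWorker instance (assumed to expose promise-returning methods).
function createRecorderControls(processor, engine) {
  let listening = false;

  async function start() {
    if (listening) return false; // already subscribed, nothing to do
    await processor.subscribe(engine);
    listening = true;
    return true;
  }

  async function stop() {
    if (!listening) return false; // not subscribed, nothing to do
    await processor.unsubscribe(engine);
    listening = false;
    return true;
  }

  return { start, stop, isListening: () => listening };
}

// Stand-in objects, for demonstration only.
const recordedCalls = [];
const fakeProcessor = {
  subscribe: async (engine) => { recordedCalls.push(["subscribe", engine.name]); },
  unsubscribe: async (engine) => { recordedCalls.push(["unsubscribe", engine.name]); },
};
const controls = createRecorderControls(fakeProcessor, { name: "cheetah" });

const controlsDemo = (async () => {
  await controls.start();
  await controls.start(); // ignored: already listening
  await controls.stop();
  return recordedCalls;
})();
```

Tracking the listening state guards against subscribing the same engine twice when a button is clicked repeatedly.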
6. Complete HTML
Add some HTML elements and app logic to see Cheetah in action. It might look something like this:
Finally, go back to http://localhost:5000. Click "Start Cheetah" and speak into your microphone to see the live transcription!
Adding to Existing Project?
If you are working within an existing project that has a module bundler, you can use the import syntax instead:
For more information, check out the Cheetah Streaming Speech-to-Text product page or refer to its JavaScript SDK quick start guide.