Learn how to transcribe speech to text using the Picovoice Leopard Speech-to-Text Node.js SDK. The SDK runs on Linux, macOS, Windows, Raspberry Pi, and NVIDIA Jetson.
Speech-to-text (STT), automatic speech recognition (ASR), automatic transcription, and large-vocabulary speech recognition all refer to the same technology. If you are looking for any of these in Node.js, this is it!
Install Speech-to-Text Node.js SDK
Create a project and install the SDK from npm:
    npm init -y
    npm install @picovoice/leopard-node
Sign up for Picovoice Console
Log in to (or sign up for) Picovoice Console. It is free, and no credit card is required!
Copy your AccessKey to the clipboard.
Implement transcription in JavaScript
Create an instance of Leopard with your AccessKey:
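A minimal sketch of this step, assuming the SDK is installed as `@picovoice/leopard-node` and the AccessKey is kept in an environment variable. The variable name `PICOVOICE_ACCESS_KEY` and the `createLeopard` helper are choices made for this example, not part of the SDK. The SDK is loaded inside the helper so a missing key is reported before anything else happens.

```javascript
// Hypothetical helper: builds a Leopard engine from an AccessKey.
// PICOVOICE_ACCESS_KEY is an assumed environment variable name.
function createLeopard(accessKey = process.env.PICOVOICE_ACCESS_KEY) {
  if (!accessKey) {
    throw new Error("Missing AccessKey: set PICOVOICE_ACCESS_KEY or pass one in");
  }
  // Loaded lazily; requires `npm install @picovoice/leopard-node`.
  const { Leopard } = require("@picovoice/leopard-node");
  return new Leopard(accessKey);
}
```

Calling `const leopard = createLeopard();` returns the engine. Call `leopard.release()` once it is no longer needed to free its native resources.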
Transcribe an audio file. Leopard ASR engine supports almost any audio format, including FLAC, MP3, MP4, m4a, Ogg, WAV, and WebM.
Explore ASR Features
Leopard provides more than just the transcript. It offers:
Custom Vocabulary
Keyword Boosting
Word Timestamps
Word-Level Confidence
Truecasing
Automatic Punctuation
Custom Vocabulary & Keyword Boosting
ASR engines recognize most common words in a language. If you are transcribing within a specialized domain (e.g. technical, medical, legal, or sales), some words will not be recognized by the engine. These are called Out-Of-Vocabulary (OOV) words. Leopard addresses this by letting developers teach it the OOV words and create custom models using Picovoice Console.
Additionally, you sometimes know in advance that certain words are likely to occur often. You can improve accuracy by telling Leopard about these expected keywords and boosting its sensitivity towards them. Picovoice Console also enables you to do this.
Learn more by checking out the Picovoice Console STT documentation.
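Once a custom model has been trained and downloaded from Picovoice Console, it can be handed to the engine through the SDK's `modelPath` option. The sketch below assumes the SDK is installed as `@picovoice/leopard-node`; the `createLeopardWithModel` helper and its file check are illustrative and not part of the SDK.

```javascript
const fs = require("fs");

// Hypothetical helper: creates Leopard with a custom model file (.pv)
// downloaded from Picovoice Console.
function createLeopardWithModel(accessKey, modelPath) {
  if (!fs.existsSync(modelPath)) {
    throw new Error(`Model file not found: ${modelPath}`);
  }
  // Loaded lazily; requires `npm install @picovoice/leopard-node`.
  const { Leopard } = require("@picovoice/leopard-node");
  return new Leopard(accessKey, { modelPath });
}
```

Checking the path first gives a clearer error than letting the engine fail on a missing file.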
Word Timestamps & Confidence
Word timestamps are essential for creating subtitles and for searching within a recording. Word confidence identifies portions of the transcription that the ASR engine is unsure of. This certainty information is useful for manual correction and as an additional feature for downstream NLU or NLP tasks (e.g. intent inference or sentiment analysis).
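To illustrate the subtitle use case, here is a sketch that turns word metadata into SubRip (SRT) cues. It assumes each word object carries `word`, `startSec`, and `endSec`, as Leopard's word metadata does. The helper names and the fixed cue size of seven words are arbitrary choices for this example.

```javascript
// Converts seconds to the SRT timestamp form HH:MM:SS,mmm.
function toSrtTime(sec) {
  const ms = Math.round(sec * 1000);
  const pad = (n, width = 2) => String(n).padStart(width, "0");
  const h = Math.floor(ms / 3600000);
  const m = Math.floor((ms % 3600000) / 60000);
  const s = Math.floor((ms % 60000) / 1000);
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms % 1000, 3)}`;
}

// Groups words into cues of at most `maxWords` words and renders SRT text.
// Each cue spans from its first word's start to its last word's end.
function wordsToSrt(words, maxWords = 7) {
  const cues = [];
  for (let i = 0; i < words.length; i += maxWords) {
    const chunk = words.slice(i, i + maxWords);
    const start = toSrtTime(chunk[0].startSec);
    const end = toSrtTime(chunk[chunk.length - 1].endSec);
    const text = chunk.map((w) => w.word).join(" ");
    cues.push(`${cues.length + 1}\n${start} --> ${end}\n${text}\n`);
  }
  return cues.join("\n");
}
```

A production subtitle generator would more likely break cues at pauses or punctuation. Fixed-size chunks keep the sketch short.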
Inspect the word timestamps and confidence:
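One way to do this is to print one line per word. The sketch assumes the word objects returned in `words` expose `word`, `startSec`, `endSec`, and `confidence`. The `formatWords` helper is hypothetical, and the sample entries are made up for illustration rather than taken from real engine output.

```javascript
// Formats word metadata as one line per word.
// Each entry is expected to carry `word`, `startSec`, `endSec`, `confidence`.
function formatWords(words) {
  return words.map(
    ({ word, startSec, endSec, confidence }) =>
      `${word.padEnd(12)} ${startSec.toFixed(2)}s - ${endSec.toFixed(2)}s  conf ${confidence.toFixed(2)}`
  );
}

// Made-up entries for illustration only, not real engine output.
const sampleWords = [
  { word: "hello", startSec: 0.1, endSec: 0.42, confidence: 0.97 },
  { word: "world", startSec: 0.5, endSec: 0.9, confidence: 0.61 },
];
console.log(formatWords(sampleWords).join("\n"));
```

With a real engine, pass `leopard.processFile(audioPath).words` in place of `sampleWords`.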
The output depends on the input audio; each entry lists a word together with its start time, end time, and confidence.
Truecasing & Automatic Punctuation
Truecasing and Automatic Punctuation help with the readability of the transcription. Create an instance of Leopard with Truecasing and Automatic Punctuation:
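A sketch of this step, assuming the SDK's `enableAutomaticPunctuation` constructor option. To our understanding, truecasing is applied together with this option rather than through a separate flag, so confirm against the API reference for your SDK version. The constant and helper names are illustrative.

```javascript
// Options passed as the second argument to the Leopard constructor.
// Assumption: truecasing is enabled along with automatic punctuation.
const PUNCTUATION_OPTIONS = { enableAutomaticPunctuation: true };

// Hypothetical helper: creates Leopard with automatic punctuation switched on.
function createPunctuatedLeopard(accessKey) {
  // Loaded lazily; requires `npm install @picovoice/leopard-node`.
  const { Leopard } = require("@picovoice/leopard-node");
  return new Leopard(accessKey, PUNCTUATION_OPTIONS);
}
```

Transcripts produced by an engine created this way should come back with capitalization and punctuation marks already in place.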
Have you seen our other Node.js tutorials? Don’t forget to check out Real-time Transcription with Node.js, Speaker Recognition with Node.js, and Voice Activity Detection with Node.js.