Learn how to transcribe speech to text in real time using Picovoice Cheetah Streaming Speech-to-Text Python SDK. Cheetah performs speech recognition locally, keeping your voice data private (i.e., GDPR and HIPAA compliant by design). The SDK runs on Linux, macOS, Windows, Raspberry Pi, and NVIDIA Jetson.
Cheetah can also run on Android, iOS, and even inside a Web Browser!
Speech-to-Text (STT), Automatic Speech Recognition (ASR), Automatic Transcription, and Large-Vocabulary Speech Recognition (LVSR) all name the same technology. Similarly, Real-Time, Online, and Streaming STT (ASR) all refer to an engine that makes the transcription available with minimal delay as the user speaks.
Install Streaming Speech-to-Text Python SDK
Install the SDK from PyPI with pip, for example by running pip3 install pvcheetah in a terminal.
Sign up for Picovoice Console
Log in to (sign up for) Picovoice Console. It is free, and no credit card is required!
Copy your AccessKey to the clipboard.
Implementation
The transcription implementation has only three steps.
Step 1
Import Cheetah STT package:
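The package is imported under the name pvcheetah. The guarded import below is only a sketch: a plain `import pvcheetah` is enough once the SDK is installed, and the fallback message is an addition for illustration.

```python
try:
    # The PyPI package name and the import name are both `pvcheetah`.
    import pvcheetah
except ImportError:
    # Fallback added for this sketch: keep the name defined and explain the fix.
    pvcheetah = None
    print("pvcheetah is not installed; install it with: pip3 install pvcheetah")
```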
Step 2
Create an instance of the STT object with your AccessKey:
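A minimal sketch of this step, assuming the AccessKey is kept out of the source code. The helper name `create_cheetah` and the environment variable `PICOVOICE_ACCESS_KEY` are hypothetical choices made for this example; `pvcheetah.create(access_key=...)` is the SDK call that builds the engine.

```python
import os


def create_cheetah(access_key=None):
    """Create a Cheetah engine, falling back to an environment variable for the key."""
    # PICOVOICE_ACCESS_KEY is a hypothetical variable name chosen for this sketch.
    access_key = access_key or os.environ.get("PICOVOICE_ACCESS_KEY")
    if not access_key:
        raise ValueError("No AccessKey given; copy it from Picovoice Console.")

    # Imported here so that a missing key is reported before a missing package.
    import pvcheetah

    return pvcheetah.create(access_key=access_key)
```

With a valid key, `handle = create_cheetah("${ACCESS_KEY}")` returns the engine object used in the remaining steps.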
Step 3
Implement audio recording. The audio might come from a microphone or from a stream you receive from another program. In what follows, we assume a function is available that returns the next available audio chunk (frame).
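One possible stand-in for such a function, useful for testing without a microphone, is a generator that reads frames from a WAV file. This is a sketch under assumptions: the name `read_frames` is hypothetical, the file is expected to hold single-channel 16-bit PCM, and `frame_length` should be taken from the engine (`handle.frame_length`) rather than hard-coded. The sample rate of the file should also match `handle.sample_rate`.

```python
import struct
import wave


def read_frames(wav_path, frame_length):
    """Yield consecutive frames of `frame_length` 16-bit samples from a mono WAV file."""
    with wave.open(wav_path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected single-channel, 16-bit PCM audio")
        while True:
            data = wav.readframes(frame_length)
            if len(data) < frame_length * 2:
                break  # end of file; a trailing partial frame is dropped
            # Unpack little-endian signed 16-bit samples into a list of ints.
            yield list(struct.unpack(f"<{frame_length}h", data))
```

Dropping the last partial frame keeps every chunk the same size; zero-padding it instead would be an equally reasonable choice.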
Transcribe an audio stream in real time:
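The loop can be sketched as follows. The function name `transcribe` and the choice to collect the text into one string are assumptions made for this example; the engine is expected to behave like a Cheetah handle, where `process(frame)` returns a pair `(partial_transcript, is_endpoint)` and `flush()` returns any text still buffered.

```python
def transcribe(engine, frames):
    """Feed audio frames to a Cheetah-like engine and return the full transcript."""
    transcript = ""
    for frame in frames:
        # `process` returns newly available text and whether the speaker paused.
        partial_transcript, is_endpoint = engine.process(frame)
        transcript += partial_transcript
        if is_endpoint:
            # At an endpoint, flush to collect the text still held by the engine.
            transcript += engine.flush()
    # Flush once more when the audio ends so that no trailing text is lost.
    transcript += engine.flush()
    return transcript
```

In a live application you would typically print each partial transcript as it arrives instead of accumulating it, and call `handle.delete()` when finished to release the engine's resources.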