Learn how to transcribe speech to text in real time using the Picovoice Cheetah Streaming Speech-to-Text Python SDK. Cheetah performs speech recognition locally on the device, so your voice data stays private, which makes it GDPR and HIPAA compliant by design. The SDK runs on Linux, macOS, Windows, Raspberry Pi, and NVIDIA Jetson.

Cheetah can also run on Android, iOS, and even inside a web browser!

Speech-to-Text (STT), Automatic Speech Recognition (ASR), Automatic Transcription, and Large-Vocabulary Speech Recognition (LVSR) are different names for the same thing. Similarly, Real-Time, Online, and Streaming STT (ASR) all refer to an engine that makes the transcription available while the user is still speaking, with minimal delay.

Install Streaming Speech-to-Text Python SDK

Install the SDK:
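The package is published on PyPI under the name pvcheetah, so the usual command is pip3 install pvcheetah. As a small sketch, the snippet below checks from Python whether the package is visible to the interpreter you are using; the helper name is_installed is made up for this illustration and is not part of the SDK.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if the import system can locate the named top-level package."""
    return importlib.util.find_spec(package_name) is not None


# Prints True once `pip3 install pvcheetah` has succeeded for this interpreter.
print("pvcheetah installed:", is_installed("pvcheetah"))
```

If this prints False after installing, pip and python most likely point at different environments.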

Sign up for Picovoice Console

Log in to (or sign up for) Picovoice Console. It is free, and no credit card is required! Copy your AccessKey to the clipboard.

Implementation

The transcription implementation has only three steps.

Step 1

Import the Cheetah STT package:
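The module name is pvcheetah. In a script a plain import pvcheetah is enough; the guarded form below is only a sketch for code that should keep loading, and report the problem later, when the SDK is missing.

```python
try:
    import pvcheetah  # Picovoice Cheetah streaming speech-to-text
except ImportError:
    # SDK not installed in this environment; callers can test for None.
    pvcheetah = None
```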

Step 2

Create an instance of the STT object with your AccessKey:
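A minimal sketch of this step follows. pvcheetah.create is the SDK's factory function and takes the AccessKey as access_key. Reading the key from an environment variable named PICOVOICE_ACCESS_KEY is an assumption made here to avoid hard-coding a secret, and both helper names are hypothetical.

```python
import os


def load_access_key(env_var="PICOVOICE_ACCESS_KEY"):
    """Fetch the AccessKey from the environment (the variable name is an assumption)."""
    access_key = os.environ.get(env_var)
    if not access_key:
        raise ValueError("Set %s to the AccessKey from Picovoice Console" % env_var)
    return access_key


def create_cheetah(access_key):
    """Create a Cheetah instance; requires the pvcheetah package to be installed."""
    import pvcheetah  # imported here so load_access_key works without the SDK

    return pvcheetah.create(access_key=access_key)


# Typical use:
#   handle = create_cheetah(load_access_key())
#   ...
#   handle.delete()  # release native resources when finished
```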

Step 3

Implement audio recording. The audio might come from a microphone or from a stream you receive from another program. In what follows, we assume a function is available that returns the next audio chunk (frame).
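As one possible stand-in for such a function, the sketch below reads fixed-size frames from a WAV file using only the standard library. It assumes 16-bit mono PCM input; the generator name wav_frames is hypothetical. With Cheetah you would pass handle.frame_length as the frame size and use audio recorded at handle.sample_rate.

```python
import struct
import wave


def wav_frames(path, frame_length):
    """Yield lists of `frame_length` 16-bit samples from a mono WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM audio")
        while True:
            data = wav.readframes(frame_length)
            if len(data) < frame_length * 2:
                break  # end of file; a trailing partial frame is dropped
            # Little-endian signed 16-bit samples, as stored in WAV files.
            yield list(struct.unpack("<%dh" % frame_length, data))


# Typical use with an engine instance named `handle`:
#   for frame in wav_frames("speech.wav", handle.frame_length):
#       ...
```

For live input, a microphone library that returns frames of the same length can take the place of this generator.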

Transcribe an audio stream in real time:
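The loop can be sketched as below. It relies on two methods of the Cheetah object: process(frame), which returns a partial transcript together with a flag marking the end of an utterance (endpoint), and flush(), which returns whatever text is still buffered. The wrapper transcribe_stream and its (text, is_final) output format are illustrative choices, not part of the SDK.

```python
def transcribe_stream(engine, frames):
    """Yield (text, is_final) pairs as audio frames are fed to the engine.

    `engine` is any object with Cheetah-style process() and flush() methods.
    """
    for frame in frames:
        partial_transcript, is_endpoint = engine.process(frame)
        if partial_transcript:
            yield partial_transcript, False
        if is_endpoint:
            # The speaker paused: collect the rest of the utterance.
            yield engine.flush(), True
    # The stream ended: emit any text still held by the engine.
    remainder = engine.flush()
    if remainder:
        yield remainder, True


# Typical use, printing words as they arrive:
#   for text, is_final in transcribe_stream(handle, frame_source):
#       print(text, end="\n" if is_final else "", flush=True)
```

Keeping the loop independent of the audio source means the same code works for a microphone, a file, or a network stream.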