Real-Time Transcription in Python

Build compliant, low-latency AI apps in Python without sending user data to third-party servers.

Learn how to transcribe speech to text in real time using the Picovoice Cheetah Streaming Speech-to-Text Python SDK. Cheetah performs speech recognition locally, so your voice data stays on the device (which is what makes it GDPR and HIPAA compliant by design). The SDK runs on Linux, macOS, Windows, Raspberry Pi, and NVIDIA Jetson.

Cheetah can also run on Android, iOS, and even inside a web browser!

Speech-to-Text (STT), Automatic Speech Recognition (ASR), Automatic Transcription, and Large-Vocabulary Speech Recognition (LVSR) all refer to the same technology. Similarly, Real-Time, Online, and Streaming STT (ASR) all refer to an engine that makes the transcription available while the user is still speaking, with minimal delay.

Install Streaming Speech-to-Text Python SDK

Install the SDK:

pip install pvcheetah

Log in to (or sign up for) Picovoice Console. It is free, and no credit card is required. Copy your AccessKey to the clipboard.
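Rather than pasting the AccessKey into source code, one option is to read it from an environment variable. The sketch below is only an illustration: the variable name PICOVOICE_ACCESS_KEY and the helper load_access_key are assumptions made here, not something the SDK defines or looks up on its own.

```python
import os


def load_access_key(env_var="PICOVOICE_ACCESS_KEY"):
    """Return the AccessKey stored in an environment variable.

    The variable name is a hypothetical convention, not part of the SDK.
    """
    access_key = os.environ.get(env_var, "").strip()
    if not access_key:
        # Fail early with a clear message instead of passing an empty key on.
        raise RuntimeError(f"Set {env_var} to the AccessKey from Picovoice Console")
    return access_key
```

Calling access_key = load_access_key() then gives the string that Step 2 passes to pvcheetah.create.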

Implementation

The implementation takes only three steps.

Step 1

Import the Cheetah STT package:

import pvcheetah

Step 2

Create an instance of the STT object with your AccessKey:

# access_key is the AccessKey string copied from Picovoice Console
handle = pvcheetah.create(access_key=access_key)

Step 3

Implement audio recording. The audio can come from a microphone or from a stream you receive from another program. Cheetah expects each frame to contain handle.frame_length 16-bit samples recorded at handle.sample_rate. The following assumes a function that returns the next available audio chunk (frame):

def get_next_audio_frame():
    # Placeholder: return the next frame of audio samples.
    pass
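The function above is left as a stub. As one possible way to fill it in without a microphone, the sketch below reads fixed-size frames from a mono, 16-bit WAV file using only the standard library. The helper name wav_frames is hypothetical, and the defaults of 512 samples per frame and 16000 Hz are placeholders; in a real program, pass handle.frame_length and handle.sample_rate instead.

```python
import struct
import wave


def wav_frames(path, frame_length=512, sample_rate=16000):
    """Yield fixed-size lists of 16-bit samples from a mono WAV file.

    frame_length and sample_rate are placeholder defaults; take the real
    values from the engine (handle.frame_length, handle.sample_rate).
    """
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected mono, 16-bit PCM audio")
        if wav.getframerate() != sample_rate:
            raise ValueError(f"expected {sample_rate} Hz audio")
        while True:
            raw = wav.readframes(frame_length)
            if len(raw) < 2 * frame_length:
                break  # end of file; a trailing partial frame is dropped
            # Decode little-endian signed 16-bit samples into Python ints.
            yield list(struct.unpack(f"<{frame_length}h", raw))
```

With a Cheetah handle available, iterating over wav_frames("speech.wav", handle.frame_length, handle.sample_rate) produces frames that can be passed to handle.process one at a time; speech.wav is a placeholder file name.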

Transcribe an audio stream in real time:

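while True:
    partial_transcript, is_endpoint = handle.process(get_next_audio_frame())
    # Print words as soon as Cheetah emits them.
    print(partial_transcript, end="", flush=True)
    if is_endpoint:
        # An endpoint marks a pause in speech; flush returns any remaining text.
        final_transcript = handle.flush()
        print(final_transcript)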
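The loop above runs forever, which suits a live microphone. For a finite source such as a file, it can help to collect the partial transcripts into whole utterances and to flush once more when the audio runs out. The following is a minimal sketch of that idea, not the SDK's own code: transcribe and ScriptedEngine are hypothetical names, and ScriptedEngine is only a stand-in that mimics the process/flush interface from Step 3 so the example runs without the SDK installed.

```python
def transcribe(engine, frames):
    """Yield one transcript string per utterance from a finite frame source.

    engine is any object exposing process(frame) -> (text, is_endpoint)
    and flush() -> text, the interface used in Step 3.
    """
    pieces = []
    for frame in frames:
        partial, is_endpoint = engine.process(frame)
        pieces.append(partial)
        if is_endpoint:
            # A pause was detected: collect the remaining text and emit.
            pieces.append(engine.flush())
            yield "".join(pieces)
            pieces = []
    # The stream ended without an endpoint: flush to collect buffered text.
    pieces.append(engine.flush())
    tail = "".join(pieces)
    if tail:
        yield tail


class ScriptedEngine:
    """Hypothetical stand-in for the Cheetah handle, used only for this demo."""

    def __init__(self, script, flush_texts):
        self._script = iter(script)       # (text, is_endpoint) per frame
        self._flush_texts = iter(flush_texts)

    def process(self, frame):
        return next(self._script)

    def flush(self):
        return next(self._flush_texts, "")
```

In a real program, the handle returned by pvcheetah.create would take the place of ScriptedEngine, and frames would be whatever iterable supplies the audio.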