Leopard Speech-to-Text
Python Quick Start

Platforms

Linux (x86_64)
macOS (x86_64, arm64)
Windows (x86_64, arm64)
Raspberry Pi (3, 4, 5)

Requirements

Picovoice Account & AccessKey
Python 3.9+
PIP

Picovoice Account & AccessKey

Sign up or log in to Picovoice Console to get your AccessKey. Keep your AccessKey secret.
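One common way to keep the AccessKey out of source code is to read it from an environment variable. The sketch below assumes a variable named PICOVOICE_ACCESS_KEY and a helper named load_access_key; both names are illustrative choices and are not defined by the SDK.

```python
import os


def load_access_key(env_var="PICOVOICE_ACCESS_KEY"):
    """Return the AccessKey stored in an environment variable.

    The variable name is a hypothetical convention, not one required
    by Picovoice.
    """
    access_key = os.environ.get(env_var, "").strip()
    if not access_key:
        raise RuntimeError(
            "Set the %s environment variable to your AccessKey" % env_var)
    return access_key


# Usage (requires the pvleopard package):
#   leopard = pvleopard.create(access_key=load_access_key())
```

Failing early with a clear message avoids passing an empty key to the engine.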

Quick Start

Setup

Install Python 3.9 or later.
Install the pvleopard Python package using PIP:

pip3 install pvleopard

Usage

Create an instance of the engine and transcribe an audio file:

import pvleopard

leopard = pvleopard.create(access_key='${ACCESS_KEY}')

transcript, words = leopard.process_file('${AUDIO_PATH}')
print(transcript)
for word in words:
    print(
      "{word=\"%s\" start_sec=%.2f end_sec=%.2f confidence=%.2f}"
      % (word.word, word.start_sec, word.end_sec, word.confidence))

Transcribe raw audio data (16 kHz sample rate, 16-bit linear PCM, single channel):

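def get_audio_data():
    # Placeholder: return the audio as a sequence of 16-bit PCM samples
    raise NotImplementedError("return your audio samples here")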

transcript, words = leopard.process(get_audio_data())
print(transcript)
for word in words:
    print(
      "{word=\"%s\" start_sec=%.2f end_sec=%.2f confidence=%.2f}"
      % (word.word, word.start_sec, word.end_sec, word.confidence))
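One possible source of raw audio in the required format is a WAV file. The sketch below uses only the Python standard library to load 16-bit, single-channel samples. The helper name read_pcm_samples is hypothetical and not part of pvleopard; the 16 kHz default follows the format stated above.

```python
import struct
import wave


def read_pcm_samples(wav_path, expected_sample_rate=16000):
    """Read a mono, 16-bit WAV file into a list of integer samples."""
    with wave.open(wav_path, "rb") as wav_file:
        if wav_file.getnchannels() != 1:
            raise ValueError("audio must be single-channel")
        if wav_file.getsampwidth() != 2:
            raise ValueError("audio must be 16-bit")
        if wav_file.getframerate() != expected_sample_rate:
            raise ValueError(
                "audio must be sampled at %d Hz" % expected_sample_rate)
        raw_bytes = wav_file.readframes(wav_file.getnframes())

    # WAV stores 16-bit samples as little-endian signed integers.
    return list(struct.unpack("<%dh" % (len(raw_bytes) // 2), raw_bytes))


# Usage (requires an existing Leopard instance):
#   transcript, words = leopard.process(read_pcm_samples("${AUDIO_PATH}"))
```

The checks reject files that do not match the expected format instead of silently producing a poor transcript.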

Free resources used by Leopard Speech-to-Text:

leopard.delete()
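If transcription raises an exception, the call to delete() may never run. A minimal sketch of one way to guarantee cleanup is a small context manager; released_on_exit is a hypothetical helper, not part of pvleopard, and it only assumes the engine object has a delete() method as shown above.

```python
from contextlib import contextmanager


@contextmanager
def released_on_exit(engine):
    """Yield the engine and call its delete() method on exit, even on error."""
    try:
        yield engine
    finally:
        engine.delete()


# Usage (requires the pvleopard package):
#   with released_on_exit(pvleopard.create(access_key='${ACCESS_KEY}')) as leopard:
#       transcript, words = leopard.process_file('${AUDIO_PATH}')
```

A plain try/finally block around the transcription calls achieves the same result without a helper.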

Model File

The Leopard Speech-to-Text Python SDK comes preloaded with a default English language model (.pv file). Default models for other supported languages can be found in the Leopard Speech-to-Text GitHub repository.

Create custom language models using the Picovoice Console, where you can train language models with custom vocabulary and boost words in the existing vocabulary.

Pass in the .pv file via the model_path argument:

leopard = pvleopard.create(
    access_key='${ACCESS_KEY}',
    model_path='${MODEL_FILE_PATH}')

Word Metadata

Along with the transcript, Leopard Speech-to-Text returns metadata for each transcribed word. Available metadata items are:

Start Time: Indicates when the word started in the transcribed audio. Value is in seconds.
End Time: Indicates when the word ended in the transcribed audio. Value is in seconds.
Confidence: Leopard Speech-to-Text's confidence that the transcribed word is accurate. It is a number within [0, 1].
Speaker Tag: If speaker diarization is enabled on initialization, the speaker tag is a non-negative integer identifying unique speakers, with 0 reserved for unknown speakers. If speaker diarization is not enabled, the value will always be -1.
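As an illustration of how the speaker tag might be used, the sketch below merges consecutive words from the same speaker into turns. It assumes each word object exposes a speaker_tag attribute alongside the word attribute used earlier; the Word tuple here is a stand-in for Leopard's word objects so the example runs without the SDK, and group_by_speaker is a hypothetical helper. How diarization is enabled at initialization is not covered in this guide, so check the API documentation for that option.

```python
from collections import namedtuple

# Stand-in for the word objects returned by Leopard (assumed attribute names).
Word = namedtuple(
    "Word", ["word", "start_sec", "end_sec", "confidence", "speaker_tag"])


def group_by_speaker(words):
    """Merge consecutive words with the same speaker tag into turns.

    Returns a list of (speaker_tag, text) tuples in spoken order.
    """
    turns = []
    for word in words:
        if turns and turns[-1][0] == word.speaker_tag:
            turns[-1][1].append(word.word)
        else:
            turns.append((word.speaker_tag, [word.word]))
    return [(tag, " ".join(tokens)) for tag, tokens in turns]


# Usage with the words returned by process() or process_file():
#   for speaker_tag, text in group_by_speaker(words):
#       print("speaker %d: %s" % (speaker_tag, text))
```

When diarization is disabled, every word carries the tag -1, so the function returns a single turn containing the whole transcript.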

Demo

For the Leopard Speech-to-Text Python SDK, we offer demo applications that demonstrate how to use the Speech-to-Text engine on audio files.

Setup

Install the pvleoparddemo Python package using PIP:

pip3 install pvleoparddemo

This package installs command-line utilities for the Leopard Speech-to-Text Python demos.

Usage

Use the --help flag to see the usage options for the demo:

leopard_demo_file --help

Run the following command to transcribe one or more audio files:

leopard_demo_file --access_key ${ACCESS_KEY} \
                  --audio_paths ${AUDIO_PATH1} ${AUDIO_PATH2} ...

For more information on our Leopard Speech-to-Text demos for Python, head over to our GitHub repository.
