Speech-to-Text, also known as Automatic Speech Recognition, is a technology that converts spoken audio into text. The technology has a wide range of applications, from video transcription to hands-free user interfaces.

While many cloud Speech-to-Text APIs are available on the market, most can only transcribe in English. Picovoice's Leopard Speech-to-Text engine, however, supports 8 different languages and achieves state-of-the-art performance, all while running locally on-device.

In this tutorial, we will walk through the process of using the Leopard Speech-to-Text Python SDK to transcribe Spanish audio in just a few lines of code.

Prerequisites

Sign up for a free Picovoice Console account. Once you've created an account, copy your AccessKey on the main dashboard.

Install Python (version 3.7 or higher) and ensure it is successfully installed:

Install the pvleopard Python SDK package:

Leopard Speech-to-Text Model File

To initialize Leopard Speech-to-Text, we will need a Leopard Speech-to-Text model file. The Leopard Speech-to-Text model files for all supported languages are publicly available on GitHub. For Spanish Speech-to-Text, download the leopard_params_es.pv model file.

Implementation

After completing the setup, the actual implementation of the Speech-to-Text system can be written in just a few lines of code.

Import the pvleopard package:

Set the paths for all the required files. Make sure to replace ${ACCESS_KEY} with your actual AccessKey from the Picovoice Console, ${MODEL_FILE} with the Spanish Leopard Speech-to-Text model file and ${AUDIO_FILE} with the audio file you want to transcribe:

Initialize Leopard Speech-to-Text and transcribe the audio file:

Leopard Speech-to-Text also provides start and end time-stamps, as well as confidence scores for each word:

Additional Languages

Leopard Speech-to-Text supports 8 different languages, all of which are equally straightforward to use. Simply download the corresponding model file from GitHub, initialize Leopard Speech-to-Text with the file, and begin transcribing.