Speech-to-Text, also known as Automatic Speech Recognition, is a technology that converts spoken audio into text. The technology has a wide range of applications, from video transcription to hands-free user interfaces.
While many cloud Speech-to-Text APIs are available on the market, most can only transcribe in English. Picovoice's Leopard Speech-to-Text engine, however, supports 8 different languages and achieves state-of-the-art performance, all while running locally on-device.
In this tutorial, we will walk through the process of using the Leopard Speech-to-Text Python SDK to transcribe Spanish audio in just a few lines of code.
Prerequisites
Sign up for a free Picovoice Console account.
Once you've created an account, copy your AccessKey from the main dashboard.
Install Python (version 3.7 or higher) and confirm that the installation succeeded by running python3 --version in a terminal.
Install the pvleopard Python SDK package with pip (pip3 install pvleopard).
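As a quick sanity check of the prerequisites above, the following minimal sketch reports whether the running interpreter meets the 3.7 minimum and whether the pvleopard package can be found. The helper names python_is_supported and sdk_is_installed are illustrative and not part of the SDK.

```python
import importlib.util
import sys

# Minimum Python version stated in the prerequisites.
MIN_PYTHON = (3, 7)


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def sdk_is_installed(package: str = "pvleopard") -> bool:
    """Return True if the package can be located, without importing it."""
    return importlib.util.find_spec(package) is not None


print("Python version OK:", python_is_supported())
print("pvleopard installed:", sdk_is_installed())
```

If the second line prints False, the pip installation most likely went to a different Python environment than the one running the script.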
Leopard Speech-to-Text Model File
To initialize Leopard Speech-to-Text, we will need a Leopard Speech-to-Text model file.
The Leopard Speech-to-Text model files for all supported languages are publicly available on GitHub.
For Spanish Speech-to-Text, download the leopard_params_es.pv model file.
Implementation
After completing the setup, the actual implementation of the Speech-to-Text system can be written in just a few lines of code.
Import the pvleopard package at the top of your script (import pvleopard).
Set the paths for all the required files. Make sure to replace ${ACCESS_KEY} with your actual AccessKey from the Picovoice Console, ${MODEL_FILE} with the path to the Spanish Leopard Speech-to-Text model file, and ${AUDIO_FILE} with the path to the audio file you want to transcribe:
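One way to lay out these settings is sketched below. The three placeholders are the ones named above. The find_problems helper and the PICOVOICE_ACCESS_KEY environment variable are illustrative assumptions rather than part of the SDK; they only catch placeholders that were left unreplaced and paths that do not exist.

```python
import os
from pathlib import Path
from typing import List

# Replace the placeholders with real values. Reading the key from an
# environment variable is optional; the variable name is hypothetical.
ACCESS_KEY = os.environ.get("PICOVOICE_ACCESS_KEY", "${ACCESS_KEY}")
MODEL_FILE = "${MODEL_FILE}"  # e.g. the path to leopard_params_es.pv
AUDIO_FILE = "${AUDIO_FILE}"  # the path to the audio file to transcribe


def find_problems(access_key: str, model_file: str, audio_file: str) -> List[str]:
    """Return human-readable descriptions of any unusable settings."""
    problems = []
    if not access_key or access_key.startswith("${"):
        problems.append("AccessKey is missing or still a placeholder")
    if not Path(model_file).is_file():
        problems.append("model file not found: {}".format(model_file))
    if not Path(audio_file).is_file():
        problems.append("audio file not found: {}".format(audio_file))
    return problems


for problem in find_problems(ACCESS_KEY, MODEL_FILE, AUDIO_FILE):
    print("Fix before continuing:", problem)
```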
Initialize Leopard Speech-to-Text and transcribe the audio file:
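A minimal sketch of this step, wrapped in a helper so that the engine's resources are released even if transcription fails. It uses the SDK's create, process_file, and delete calls. The transcribe_file wrapper itself is illustrative, and the import sits inside it only so that the function can be defined before the SDK is installed.

```python
def transcribe_file(access_key: str, model_path: str, audio_path: str):
    """Transcribe one audio file and return (transcript, words)."""
    # Imported here rather than at module level (a choice made for this sketch).
    import pvleopard

    # Create the engine with the language-specific model file.
    leopard = pvleopard.create(access_key=access_key, model_path=model_path)
    try:
        # process_file returns the transcript and per-word metadata.
        transcript, words = leopard.process_file(audio_path)
    finally:
        # Release the native resources held by the engine.
        leopard.delete()
    return transcript, words
```

Calling transcribe_file with the AccessKey, the Spanish model path, and the audio path, then printing the first element of the result, shows the transcript.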
Leopard Speech-to-Text also provides start and end timestamps, as well as a confidence score, for each word:
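The sketch below shows one way to present this metadata, assuming each word object exposes word, start_sec, end_sec, and confidence attributes, as the objects returned by process_file do. The format_word and low_confidence_words helpers and the 0.5 threshold are illustrative choices, not SDK features.

```python
from typing import Iterable, List


def format_word(word) -> str:
    """Render one word as tab-separated text, start, end, and confidence."""
    return "{}\t{:.2f}\t{:.2f}\t{:.2f}".format(
        word.word, word.start_sec, word.end_sec, word.confidence
    )


def low_confidence_words(words: Iterable, threshold: float = 0.5) -> List[str]:
    """Return the words whose confidence falls below the threshold."""
    return [w.word for w in words if w.confidence < threshold]


def print_words(words: Iterable) -> None:
    """Print a header followed by one line per word."""
    print("word\tstart\tend\tconfidence")
    for word in words:
        print(format_word(word))
```

Listing the low-confidence words separately is a simple way to decide which parts of a transcript deserve a manual review.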
Additional Languages
Leopard Speech-to-Text supports 8 different languages, all of which are equally straightforward to use. Simply download the corresponding model file from GitHub, initialize Leopard Speech-to-Text with the file, and begin transcribing.
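Switching languages then comes down to choosing a different model file. The sketch below derives a file name from a two-letter language code. It assumes the other model files follow the same leopard_params_<code>.pv pattern as the Spanish one and that the English file carries no suffix; both are assumptions, so check the actual file names on GitHub. The model_filename helper is illustrative.

```python
def model_filename(language_code: str) -> str:
    """Guess the model file name for a two-letter language code.

    The naming pattern is an assumption extrapolated from
    leopard_params_es.pv; verify it against the repository.
    """
    code = language_code.strip().lower()
    if len(code) != 2 or not code.isalpha():
        raise ValueError("expected a two-letter language code, got {!r}".format(language_code))
    if code == "en":
        # Assumed: the English model file has no language suffix.
        return "leopard_params.pv"
    return "leopard_params_{}.pv".format(code)


print(model_filename("es"))
```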