Ubuntu Speech-to-Text Tutorial


We love Ubuntu at Picovoice. Our standard dev machines are running Ubuntu. No offence to macOS and Windows fans 😉

Today you can run Ubuntu on a single-board computer (SBC) such as a Raspberry Pi, NVIDIA Jetson, or BeagleBone, as well as on a server or a desktop. Below, we look at the options for running speech-to-text on an Ubuntu machine, then dive deeper into running the Picovoice Leopard Speech-to-Text engine on Ubuntu.

Speech-to-Text on Ubuntu

API

You can use any cloud API: Google Speech-to-Text, Amazon Transcribe, IBM Watson Speech-to-Text, or Azure Cognitive Services Speech-to-Text. The upside is that they are relatively accurate. The downside? They are pretty expensive for anything other than a proof of concept. Additionally, you need to send raw audio data to the cloud, which means extra power consumption and bandwidth cost. The bandwidth cost is mainly a concern if you are on a cellular connection.
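To put the bandwidth cost in perspective, here is a rough back-of-the-envelope sketch. It assumes uncompressed 16 kHz, 16-bit, mono PCM audio; that format is an assumption made for illustration, and a given API may accept compressed audio that needs far less bandwidth.

```python
def pcm_bytes_per_hour(sample_rate_hz=16_000, bits_per_sample=16, channels=1):
    """Bytes needed to stream one hour of uncompressed PCM audio."""
    bytes_per_second = sample_rate_hz * (bits_per_sample // 8) * channels
    return bytes_per_second * 3600


# Assumed format: 16 kHz, 16-bit, mono (illustrative, not tied to any API).
megabytes = pcm_bytes_per_hour() / 1_000_000
print(f"{megabytes:.1f} MB per hour of uncompressed audio")
```

Under these assumptions, an hour of audio comes to a little over 100 MB, which adds up quickly on a metered cellular plan.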

FOSS

Alternatively, you can use free and open-source software (FOSS), such as Kaldi (and derivatives such as Vosk), Mozilla DeepSpeech (and derivatives such as Coqui), and many more. The upside is that they are free. The downside is that they hardly match the accuracy of API-based ASRs and may lack features you require (e.g. custom words and keyword boosting). They are also not necessarily optimized for runtime efficiency. These can be good starting points if you decide to build your own.

Picovoice

Picovoice Leopard Speech-to-Text processes voice locally on the device while matching the accuracy of API alternatives from Big Tech. Developers can start transcribing in seconds with Picovoice’s Free Plan, even for commercial projects.

Leopard's total package size is 20 MB (compared to the gigabytes required by FOSS alternatives). Its runtime efficiency lets it run even on a Raspberry Pi 3 using only a quarter of a single CPU core.

Leopard Python SDK

Install the Leopard Python package using pip:

pip3 install pvleopard

Sign up for Picovoice Console and copy your AccessKey to the clipboard. AccessKey handles authentication and authorization.
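Since the AccessKey is a credential, you may prefer not to hard-code it in your script. One possible approach, sketched below, is to read it from an environment variable. The variable name PICOVOICE_ACCESS_KEY and the helper load_access_key are hypothetical names chosen for this example and are not part of the Leopard SDK.

```python
import os


def load_access_key(env=os.environ, var_name="PICOVOICE_ACCESS_KEY"):
    """Return the AccessKey stored in `var_name`, or raise if it is missing.

    Both the helper and the variable name are illustrative assumptions.
    """
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set {var_name} to your Picovoice AccessKey")
    return key
```

With this in place, you could run `export PICOVOICE_ACCESS_KEY=...` in your shell and use `access_key = load_access_key()` in your script.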

Create an instance of Leopard STT and transcribe a file:

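import pvleopard

access_key = "${ACCESS_KEY}"  # AccessKey copied from Picovoice Console
path = "${AUDIO_PATH}"  # path to the audio file to transcribe

leopard = pvleopard.create(access_key=access_key)
transcript, words = leopard.process_file(path)
print(transcript)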
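Besides the transcript, process_file also returns words. Below is a minimal sketch of how you might print per-word timing. It assumes each word object exposes word, start_sec, end_sec, and confidence attributes, as we understand the pvleopard documentation to describe; the Word namedtuple is only a stand-in so the example runs without an AccessKey, and format_words is a hypothetical helper.

```python
from collections import namedtuple

# Stand-in for the word objects returned by Leopard. The field names are
# assumed from the pvleopard documentation; check them against your version.
Word = namedtuple("Word", ["word", "start_sec", "end_sec", "confidence"])


def format_words(words, min_confidence=0.0):
    """Return one 'start-end word' line per word at or above min_confidence."""
    lines = []
    for w in words:
        if w.confidence >= min_confidence:
            lines.append(f"{w.start_sec:.2f}-{w.end_sec:.2f} {w.word}")
    return lines


# Made-up sample data for illustration only.
sample = [Word("hello", 0.10, 0.42, 0.95), Word("ubuntu", 0.50, 1.02, 0.40)]
for line in format_words(sample, min_confidence=0.5):
    print(line)
```

In a real script, you would pass the words returned by leopard.process_file(path) instead of the sample list.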

Node.js, Rust, Go, Java, .NET, ...
