Speech-to-Text Transcription on Linux

Linux is versatile: it powers servers, desktops, and embedded boards such as Raspberry Pi and NVIDIA Jetson. Picovoice Speech-to-Text engines support all of these Linux-based platforms.

Picovoice Speech-to-Text engines can also run on macOS, Windows, Android, iOS, and modern web browsers (using WebAssembly).

There are two types of Speech-to-Text: Real-Time and Batch.

Real-Time Speech-to-Text

A real-time Speech-to-Text engine makes text available while the user is still talking, much as people do: we transcribe speech to text in our heads as others talk to us. The downside? We sometimes make hearing or semantic mistakes and only recognize them when the sentence finishes, or once we are more accustomed to the speaker's voice. A real-time engine works under the same constraint, since it must commit to text before it has heard the rest of the sentence.

Real-Time Speech-to-Text, Online Automatic Speech Recognition, and Streaming Speech-to-Text all refer to the same technology.

Picovoice offers Cheetah Streaming Speech-to-Text to convert speech to text in real time. Cheetah stands out because it runs all voice recognition on your device, which avoids network delays and minimizes the time between a user speaking and the transcription appearing.

Below is the list of SDKs supported by Cheetah with corresponding code snippets and quick-start links.

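import pvcheetah

# access_key is the AccessKey from the Picovoice Console
cheetah = pvcheetah.create(access_key)

# get_next_audio_frame() stands in for your own audio source
partial_transcript, is_endpoint = cheetah.process(get_next_audio_frame())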
Build with Python
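
The snippet above processes a single audio frame. In practice that call sits inside a loop, as in the minimal sketch below. It assumes a 16-bit mono WAV file recorded at the engine's sample rate as the audio source. `read_frames`, `stream_transcribe`, and `transcribe_wav` are illustrative names, not part of the Cheetah SDK, and the SDK members used here (`pvcheetah.create`, `process`, `flush`, `frame_length`, `delete`) should be checked against the current Python API documentation.

```python
import struct
import wave


def read_frames(wav_path, frame_length):
    """Yield fixed-size frames of 16-bit mono PCM samples from a WAV file."""
    with wave.open(wav_path, "rb") as wav:
        while True:
            raw = wav.readframes(frame_length)
            if len(raw) < frame_length * 2:  # drop the trailing partial frame
                break
            yield struct.unpack(f"<{frame_length}h", raw)


def stream_transcribe(engine, frames):
    """Feed frames to a Cheetah-like engine and return the finalized segments."""
    segments, current = [], ""
    for frame in frames:
        partial, is_endpoint = engine.process(frame)
        current += partial  # text is available while audio is still arriving
        if is_endpoint:  # the engine detected the end of an utterance
            current += engine.flush()
            if current:
                segments.append(current)
            current = ""
    current += engine.flush()  # collect whatever is still buffered
    if current:
        segments.append(current)
    return segments


def transcribe_wav(access_key, wav_path):
    """Assumed wiring to the real SDK; requires `pip install pvcheetah`."""
    import pvcheetah  # imported lazily so the helpers above work without it

    cheetah = pvcheetah.create(access_key=access_key)
    try:
        frames = read_frames(wav_path, cheetah.frame_length)
        return stream_transcribe(cheetah, frames)
    finally:
        cheetah.delete()  # release native resources
```

Because `stream_transcribe` only needs an object with `process` and `flush`, the same loop should work with frames from a microphone instead of a file: swap `read_frames` for any iterable that yields frames of the expected length.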

Batch Speech-to-Text

Batch Speech-to-Text differs from its real-time counterpart in that it requires the whole utterance before it produces a transcription. Why use it? Because it is more accurate and more runtime-efficient than the real-time variant. It is more accurate because it can see the words that come later and adjust its linguistic and acoustic decisions accordingly. It is more efficient because it does not need to juggle listening and transcribing.

Picovoice offers Leopard Speech-to-Text for batch transcription. It stands out because it processes voice data entirely on your device and is therefore private by design (HIPAA and GDPR compliant). You can also add custom vocabulary and boost specific phrases using the Picovoice Console.

Below is the list of SDKs supported by Leopard with corresponding code snippets and quick-start links.

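import pvleopard

# access_key is the AccessKey from the Picovoice Console
leopard = pvleopard.create(access_key)

# path points to the audio file to transcribe
transcript, words = leopard.process_file(path)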
Build with Python
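
Besides the transcript, `process_file` returns per-word metadata, which makes tasks such as captioning straightforward. The sketch below turns that metadata into SRT subtitle blocks. It assumes each word object exposes `word`, `start_sec`, and `end_sec` attributes, and that a custom model exported from the Picovoice Console can be passed to `pvleopard.create` through a `model_path` argument. Both assumptions should be verified against the current Leopard documentation. `format_timestamp`, `words_to_srt`, and `transcribe_to_srt` are illustrative names.

```python
def format_timestamp(seconds):
    """Convert seconds to the SRT timestamp form HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def words_to_srt(words, max_words=8):
    """Group words into subtitle blocks of at most `max_words` words each."""
    blocks = []
    for number, i in enumerate(range(0, len(words), max_words), start=1):
        chunk = words[i:i + max_words]
        start = format_timestamp(chunk[0].start_sec)
        end = format_timestamp(chunk[-1].end_sec)
        text = " ".join(w.word for w in chunk)
        blocks.append(f"{number}\n{start} --> {end}\n{text}\n")
    return "\n".join(blocks)


def transcribe_to_srt(access_key, audio_path, model_path=None):
    """Assumed wiring to the real SDK; requires `pip install pvleopard`."""
    import pvleopard  # imported lazily so the helpers above work without it

    # model_path=None is assumed to select the default model; pass the path
    # of a Console-trained model to use custom vocabulary or boosted phrases.
    leopard = pvleopard.create(access_key=access_key, model_path=model_path)
    try:
        _, words = leopard.process_file(audio_path)
        return words_to_srt(words)
    finally:
        leopard.delete()  # release native resources
```

Since the whole file is available up front, the subtitle grouping can run once over the complete word list rather than being patched up as new audio arrives.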
