Speech to Text Transcription in Android Tutorial

Android is an incredibly versatile platform, powering smartphones, tablets, and various embedded devices such as smartwatches and IoT devices. Fortunately, Picovoice's Speech-to-Text technology seamlessly integrates into the Android ecosystem.

In addition to Android, Picovoice's Speech-to-Text engines are compatible with a wide array of environments, such as iOS, Linux, macOS, Windows, and modern web browsers (via WebAssembly).

With Speech-to-Text transcription, there are two main approaches: Real-Time and Batch.

Real-Time Speech-to-Text

Real-time Speech-to-Text engines produce text as the user speaks, much as a listener decodes words during a conversation. The downside is that an engine that commits to words immediately can make mistakes caused by acoustic or semantic ambiguities, which resolve only once the sentence is complete or the speaker's voice has become familiar. Weigh these trade-offs before deciding that an application requires real-time transcription.

Real-Time Speech-to-Text, Online Automatic Speech Recognition, and Streaming Speech-to-Text all refer to the same core technology.

For Android devices, Picovoice provides Cheetah Streaming Speech-to-Text, a unique technology that performs all voice recognition directly on the device. This approach eliminates network-related delays and minimizes the latency between the user's speech input and the transcription output.

Cheetah supports several software development kits (SDKs), each with its own quick-start guide. The Python snippet below shows the basic usage.

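import pvcheetah

# access_key is your AccessKey from the Picovoice Console
o = pvcheetah.create(access_key)

# get_next_audio_frame() stands in for your own audio capture code
partial_transcript, is_endpoint = o.process(get_next_audio_frame())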
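
The two calls above are usually wrapped in a loop that runs for as long as audio keeps arriving. The following is a minimal sketch of such a loop, not Picovoice's reference implementation. The helpers split_into_frames and transcribe_stream are hypothetical names introduced here. The engine is assumed to return (partial_transcript, is_endpoint) from process(), as in the snippet above, and to offer a flush() method that returns any text still buffered; verify both against the Cheetah SDK documentation before relying on them.

```python
from typing import Iterable, Iterator, List, Sequence


def split_into_frames(pcm: Sequence[int], frame_length: int) -> Iterator[Sequence[int]]:
    """Yield consecutive fixed-size frames; a trailing partial frame is dropped."""
    for start in range(0, len(pcm) - frame_length + 1, frame_length):
        yield pcm[start:start + frame_length]


def transcribe_stream(engine, frames: Iterable[Sequence[int]]) -> List[str]:
    """Feed frames to a streaming engine and return one string per utterance.

    Assumed interface (check the SDK docs):
      engine.process(frame) -> (partial_transcript, is_endpoint)
      engine.flush()        -> text still buffered inside the engine
    """
    utterances: List[str] = []
    current = ""
    for frame in frames:
        partial, is_endpoint = engine.process(frame)
        current += partial
        if is_endpoint:
            # The speaker paused: collect buffered text and close the utterance.
            current += engine.flush()
            if current:
                utterances.append(current)
            current = ""
    # The audio ended without a final endpoint: flush whatever remains.
    current += engine.flush()
    if current:
        utterances.append(current)
    return utterances
```

Because the engine is passed in as a parameter, the loop can be exercised with a stub object in unit tests and with a real engine instance in the application.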

Batch Speech-to-Text

In contrast to the real-time approach, Batch Speech-to-Text requires the entire spoken phrase before returning a transcription. This approach offers higher accuracy and runtime efficiency than the real-time alternative. Because the full utterance is available, the engine can use the words that follow as context when making linguistic and acoustic adjustments. It also eliminates the need to switch between listening and transcribing, which improves efficiency.

For Android-based devices, Picovoice offers Leopard Speech-to-Text, a state-of-the-art technology for batch transcription tasks. Leopard processes all voice audio data solely on device, ensuring privacy by design and compliance with regulations such as HIPAA and GDPR. Furthermore, users can enhance the technology by incorporating a custom vocabulary and boosting specific phrases via the Picovoice Console.

Leopard supports several SDKs, each with its own quick-start guide. The Python snippet below shows the basic usage.

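import pvleopard

# access_key is your AccessKey from the Picovoice Console
o = pvleopard.create(access_key)

# path points to the audio file to transcribe
transcript, words = o.process_file(path)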
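
The second value returned by process_file() carries per-word metadata, which a batch workflow can use to flag words worth a second look. The following is a rough sketch under stated assumptions: transcribe_file and WordSpan are hypothetical names introduced here, and each word object is assumed to expose word, start_sec, end_sec, and confidence attributes. Check the Leopard SDK documentation for the actual field names.

```python
from typing import List, NamedTuple, Tuple


class WordSpan(NamedTuple):
    """Hypothetical per-word record; the SDK's real field names may differ."""
    word: str
    start_sec: float
    end_sec: float
    confidence: float


def transcribe_file(
    engine, path: str, min_confidence: float = 0.5
) -> Tuple[str, List[WordSpan]]:
    """Transcribe a whole recording and list words scored below min_confidence.

    Assumes engine.process_file(path) -> (transcript, words), as in the
    snippet above, with the word attributes described in the lead-in.
    """
    transcript, words = engine.process_file(path)
    uncertain = [
        WordSpan(w.word, w.start_sec, w.end_sec, w.confidence)
        for w in words
        if w.confidence < min_confidence
    ]
    return transcript, uncertain
```

Words that are flagged repeatedly, such as product names or jargon, may be candidates for the custom vocabulary and phrase boosting available through the Picovoice Console.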
