Speech Recognition in Android Tutorial

Build compliant and low-latency AI applications running entirely on mobile without sharing user data with 3rd parties.

This article serves as a comprehensive guide for adding on-device Speech Recognition to an Android app.

In software, the term Speech Recognition is often misunderstood. Many assume it refers solely to Speech-to-Text. However, Speech-to-Text is only one facet of Speech Recognition. Other technologies under the umbrella of Speech Recognition include Wake Word Detection, Voice Command Recognition, and Voice Activity Detection (VAD).

Here's a handy guide for selecting an appropriate Speech Recognition approach for your Android application:

Identify whether and when a person is speaking -> Cobra VAD
Recognize specific phrases or words -> Porcupine Wake Word
Understand voice commands and extract intent with details (i.e., slot values) -> Rhino Speech-to-Intent
Transcribe speech to text in real time -> Cheetah Streaming Speech-to-Text
Transcribe large volumes of audio data to text in batches -> Leopard Speech-to-Text

SDKs are also available for iOS, as well as for the cross-platform mobile frameworks Flutter and React Native.

Cobra VAD

To integrate the Cobra VAD SDK into your Android project, ensure you have included mavenCentral() in your top-level build.gradle file, then add the following dependency to your app’s build.gradle file:

dependencies {
    implementation 'ai.picovoice:cobra-android:${LATEST_VERSION}'
}

Sign up for a free Picovoice Console account and obtain your AccessKey. The AccessKey is only required for authentication and authorization.
Add the following to the app's AndroidManifest.xml file to enable recording with an Android device's microphone.

<uses-permission android:name="android.permission.INTERNET" />
<uses-permission android:name="android.permission.RECORD_AUDIO" />

Create an instance of the VAD engine:

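import ai.picovoice.cobra.Cobra;
import ai.picovoice.cobra.CobraException;

// Declared outside the try block so the processing loop can use it
Cobra cobra = null;
try {
    cobra = new Cobra("${ACCESS_KEY}");
} catch (CobraException e) {
    // handle initialization error (e.g. an invalid AccessKey)
}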

Find the probability of voice by passing in audio frames to the .process function:

while (true) {
    try {
        short[] audioFrame = getNextAudioFrame();
        float voiceProbability = cobra.process(audioFrame);
        // take action based on probability of voice
    } catch (CobraException e) { }
}
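
The "take action based on probability of voice" step is left open above. One possible way to fill it in, offered as a sketch and not as anything the Cobra SDK prescribes, is a small hysteresis gate that reports when speech starts and when it has stopped for a few consecutive frames. The class name VoiceActivityGate, both thresholds, and the hangover frame count are hypothetical and would need tuning for a real app.

```java
/**
 * Turns per-frame voice probabilities into speech start/end events.
 * Thresholds and frame counts are illustrative assumptions, not SDK defaults.
 */
final class VoiceActivityGate {
    enum Event { NONE, SPEECH_STARTED, SPEECH_ENDED }

    private final float startThreshold;
    private final float endThreshold;
    private final int hangoverFrames;
    private boolean speaking = false;
    private int quietFrames = 0;

    VoiceActivityGate(float startThreshold, float endThreshold, int hangoverFrames) {
        if (endThreshold > startThreshold || hangoverFrames < 1) {
            throw new IllegalArgumentException("need endThreshold <= startThreshold and hangoverFrames >= 1");
        }
        this.startThreshold = startThreshold;
        this.endThreshold = endThreshold;
        this.hangoverFrames = hangoverFrames;
    }

    /** Feeds one probability and reports whether the speaking state changed. */
    Event update(float voiceProbability) {
        if (!speaking) {
            if (voiceProbability >= startThreshold) {
                speaking = true;
                quietFrames = 0;
                return Event.SPEECH_STARTED;
            }
            return Event.NONE;
        }
        if (voiceProbability < endThreshold) {
            // Only end speech after enough consecutive quiet frames
            quietFrames++;
            if (quietFrames >= hangoverFrames) {
                speaking = false;
                quietFrames = 0;
                return Event.SPEECH_ENDED;
            }
        } else {
            quietFrames = 0;
        }
        return Event.NONE;
    }

    boolean isSpeaking() {
        return speaking;
    }
}
```

Inside the processing loop, each voiceProbability would be passed to update(...), and the app would react only to SPEECH_STARTED and SPEECH_ENDED instead of to every frame.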

For further details, visit the Cobra VAD product page or refer to Cobra's Android SDK quick start guide.

Porcupine Wake Word

To integrate the Porcupine Wake Word SDK into your Android project, ensure you have included mavenCentral() in your top-level build.gradle file, then add the following dependency to your app’s build.gradle file:

dependencies {
    implementation 'ai.picovoice:porcupine-android:${LATEST_VERSION}'
}

Sign up for a free Picovoice Console account and obtain your AccessKey. The AccessKey is only required for authentication and authorization.
Add the following to the app's AndroidManifest.xml file to enable recording with an Android device's microphone.

<uses-permission android:name="android.permission.INTERNET" />
<uses-permission android:name="android.permission.RECORD_AUDIO" />

Create a custom wake word model using Picovoice Console.
Download the .ppn model file and copy it into your Android assets folder (${ANDROID_APP}/src/main/assets).
Initialize the Porcupine Wake Word engine with the .ppn file name (or path relative to the assets folder):

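import ai.picovoice.porcupine.*;

Porcupine porcupine = null;
try {
    porcupine = new Porcupine.Builder()
            .setAccessKey("${ACCESS_KEY}")
            // setKeywordPaths takes a String array, one entry per keyword file
            .setKeywordPaths(new String[]{"${KEYWORD_FILE_NAME}"})
            .build(appContext);
} catch (PorcupineException e) {
    // handle initialization error
}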

Detect the keyword by passing in audio frames to the .process function:

while (true) {
   try {
      short[] audioFrame = getNextAudioFrame();
      int keywordIndex = porcupine.process(audioFrame);
      if (keywordIndex > -1) {
         // keyword detected
      }
   } catch (PorcupineException e) { }
}
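
The loops in this article call getNextAudioFrame(), which is left for the app to provide. Microphone buffers rarely arrive in exactly the fixed frame size an engine expects, so one option is to regroup incoming samples into fixed-length frames. The following is a minimal sketch under that assumption. FrameChunker is a hypothetical helper, not part of any Picovoice SDK, and the frame length would come from the engine itself (the Porcupine SDK is understood to expose it through getFrameLength(), which is worth confirming in the quick start guide).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Regroups arbitrarily sized 16-bit PCM buffers into fixed-length frames. */
final class FrameChunker {
    private final short[] pending;
    private int pendingLength = 0;

    FrameChunker(int frameLength) {
        if (frameLength <= 0) {
            throw new IllegalArgumentException("frameLength must be positive");
        }
        this.pending = new short[frameLength];
    }

    /** Appends samples and returns every frame they complete, in order. */
    List<short[]> push(short[] samples) {
        List<short[]> frames = new ArrayList<>();
        int offset = 0;
        while (offset < samples.length) {
            int toCopy = Math.min(pending.length - pendingLength, samples.length - offset);
            System.arraycopy(samples, offset, pending, pendingLength, toCopy);
            pendingLength += toCopy;
            offset += toCopy;
            if (pendingLength == pending.length) {
                // Copy so later writes to the buffer do not alter emitted frames
                frames.add(Arrays.copyOf(pending, pending.length));
                pendingLength = 0;
            }
        }
        return frames;
    }

    /** Number of samples waiting for the next frame to fill up. */
    int pendingSamples() {
        return pendingLength;
    }
}
```

Each buffer read from the recorder would be passed to push(...), and every returned frame handed to the engine's .process function. Leftover samples stay buffered until the next call.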

For further details, visit the Porcupine Wake Word product page or refer to Porcupine's Android SDK quick start guide.

Rhino Speech-to-Intent

To integrate the Rhino Speech-to-Intent SDK into your Android project, ensure you have included mavenCentral() in your top-level build.gradle file, then add the following dependency to your app’s build.gradle file:

dependencies {
    implementation 'ai.picovoice:rhino-android:${LATEST_VERSION}'
}

Sign up for a free Picovoice Console account and obtain your AccessKey. The AccessKey is only required for authentication and authorization.
Add the following to the app's AndroidManifest.xml file to enable recording with an Android device's microphone.

<uses-permission android:name="android.permission.INTERNET" />
<uses-permission android:name="android.permission.RECORD_AUDIO" />

Create a custom context model using Picovoice Console.
Download the .rhn model file and copy it into your Android assets folder (${ANDROID_APP}/src/main/assets).
Initialize the Rhino Speech-to-Intent engine with the .rhn file name (or path relative to the assets folder):

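import ai.picovoice.rhino.*;
import java.util.Map;

Rhino rhino = null;
try {
    rhino = new Rhino.Builder()
          .setAccessKey("${ACCESS_KEY}")
          .setContextPath("${CONTEXT_FILE_PATH}")
          .build(appContext);
} catch (RhinoException e) {
    // handle initialization error
}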

Infer the user's intent by passing in audio frames to the .process function:

while (true) {
   try {
      short[] audioFrame = getNextAudioFrame();
      boolean isFinalized = rhino.process(audioFrame);
      if (isFinalized) {
         RhinoInference inference = rhino.getInference();
         if (inference.getIsUnderstood()) {
            final String intent = inference.getIntent();
            final Map<String, String> slots = inference.getSlots();
            // take action based on inferred intent and slot values
         }
         else {
            // handle unsupported commands
         }
      }
   } catch (RhinoException e) { }
}
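
Once an intent and its slot values are available, the app still has to decide what to do with them. A lookup table from intent name to handler is one simple way to keep that logic out of the audio loop. The sketch below assumes this design; IntentRouter and the "changeLight" intent mentioned in the usage note are hypothetical and depend entirely on the context model created in Picovoice Console.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Routes an inferred intent and its slots to an app-defined handler. */
final class IntentRouter {
    private final Map<String, Function<Map<String, String>, String>> handlers = new HashMap<>();

    /** Registers the handler to run when the given intent is inferred. */
    void register(String intent, Function<Map<String, String>, String> handler) {
        handlers.put(intent, handler);
    }

    /** Runs the matching handler, or reports that the intent has none. */
    String route(String intent, Map<String, String> slots) {
        Function<Map<String, String>, String> handler = handlers.get(intent);
        if (handler == null) {
            return "unhandled intent: " + intent;
        }
        return handler.apply(slots);
    }
}
```

With a handler registered for "changeLight", the branch that reads the intent and slots could call router.route(intent, slots) and show or log the returned description.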

For further details, visit the Rhino Speech-to-Intent product page or refer to Rhino's Android SDK quick start guide.

Cheetah Streaming Speech-to-Text

To integrate the Cheetah Streaming Speech-to-Text SDK into your Android project, ensure you have included mavenCentral() in your top-level build.gradle file, then add the following dependency to your app’s build.gradle file:

dependencies {
    implementation 'ai.picovoice:cheetah-android:${LATEST_VERSION}'
}

Sign up for a free Picovoice Console account and obtain your AccessKey. The AccessKey is only required for authentication and authorization.
Add the following to the app's AndroidManifest.xml file to enable recording with an Android device's microphone.

<uses-permission android:name="android.permission.INTERNET" />
<uses-permission android:name="android.permission.RECORD_AUDIO" />

Download the .pv language model file from the Cheetah GitHub repository and copy it into your Android assets folder (${ANDROID_APP}/src/main/assets).
Initialize the Cheetah Streaming Speech-to-Text engine with the .pv file name (or path relative to the assets folder):

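import ai.picovoice.cheetah.*;

// Declared outside the try block so the processing loop can use it
Cheetah cheetah = null;
try {
   cheetah = new Cheetah.Builder()
      .setAccessKey("${ACCESS_KEY}")
      .setModelPath("${MODEL_FILE_NAME}")
      .build(appContext);
} catch (CheetahException ex) {
   // handle initialization error
}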

Transcribe speech to text in real time by passing in audio frames to the .process function:

String transcript = "";
while (true) {
   try {
      short[] audioFrame = getNextAudioFrame();
      CheetahTranscript partialResult = cheetah.process(audioFrame);
      transcript += partialResult.getTranscript();

      if (partialResult.getIsEndpoint()) {
         CheetahTranscript finalResult = cheetah.flush();
         transcript += finalResult.getTranscript();
      }
   } catch (CheetahException ex) { }
}
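
The loop above appends everything to a single string. If the app needs separate utterances, for example one chat bubble per sentence, the endpoint signal can be used to close off each one. The following is a sketch of that idea only. TranscriptAssembler is a hypothetical helper that works on plain strings and does not call the Cheetah SDK.

```java
import java.util.ArrayList;
import java.util.List;

/** Collects partial transcripts and splits them into utterances at endpoints. */
final class TranscriptAssembler {
    private final StringBuilder current = new StringBuilder();
    private final List<String> utterances = new ArrayList<>();

    /** Adds text returned while the speaker is still talking. */
    void appendPartial(String text) {
        current.append(text);
    }

    /** Call at an endpoint, passing the remaining text returned by the flush. */
    void endUtterance(String flushedText) {
        current.append(flushedText);
        String utterance = current.toString().trim();
        if (!utterance.isEmpty()) {
            utterances.add(utterance);
        }
        current.setLength(0);
    }

    /** Text of the utterance currently being spoken. */
    String inProgress() {
        return current.toString();
    }

    /** Copy of all finished utterances, oldest first. */
    List<String> completed() {
        return new ArrayList<>(utterances);
    }
}
```

In the loop, partialResult.getTranscript() would go to appendPartial(...), and finalResult.getTranscript() to endUtterance(...) inside the endpoint branch.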

For further details, visit the Cheetah Streaming Speech-to-Text product page or refer to Cheetah's Android SDK quick start guide.

Leopard Speech-to-Text

To integrate the Leopard Speech-to-Text SDK into your Android project, ensure you have included mavenCentral() in your top-level build.gradle file, then add the following dependency to your app’s build.gradle file:

dependencies {
    implementation 'ai.picovoice:leopard-android:${LATEST_VERSION}'
}

Sign up for a free Picovoice Console account and obtain your AccessKey. The AccessKey is only required for authentication and authorization.
Add the following to the app's AndroidManifest.xml file to enable recording with an Android device's microphone.

<uses-permission android:name="android.permission.INTERNET" />
<uses-permission android:name="android.permission.RECORD_AUDIO" />

Download the .pv language model file from the Leopard GitHub repository and copy it into your Android assets folder (${ANDROID_APP}/src/main/assets).
Create an instance of Leopard for speech-to-text transcription:

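import ai.picovoice.leopard.*;

// Declared outside the try block so later calls can use it
Leopard leopard = null;
try {
   leopard = new Leopard.Builder()
      .setAccessKey("${ACCESS_KEY}")
      .setModelPath("${MODEL_FILE_NAME}")
      .build(appContext);
} catch (LeopardException ex) {
   // handle initialization error
}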

Transcribe speech to text by passing an audio file to the .processFile function:

File audioFile = new File("${AUDIO_FILE_PATH}");
try {
   LeopardTranscript result = leopard.processFile(audioFile.getAbsolutePath());
   for (LeopardTranscript.Word word : result.getWordArray()) {
      System.out.format(
         "%10s - %5.2f - %5.2f - %5.2f\n",
         word.getWord(),
         word.getStartSec(),
         word.getEndSec(),
         word.getConfidence());
   }
} catch (LeopardException ex) { }
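
Word-level timestamps and confidence values make simple post-processing possible, such as dropping uncertain words before showing captions. The sketch below illustrates that under stated assumptions. WordTimeline and TimedWord are hypothetical stand-ins whose fields mirror the getters used above, so the example runs without the Leopard SDK, and the confidence cut-off is arbitrary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Formats timed words for display, skipping low-confidence ones. */
final class WordTimeline {
    /** Stand-in for the SDK's word metadata (word, start, end, confidence). */
    static final class TimedWord {
        final String word;
        final float startSec;
        final float endSec;
        final float confidence;

        TimedWord(String word, float startSec, float endSec, float confidence) {
            this.word = word;
            this.startSec = startSec;
            this.endSec = endSec;
            this.confidence = confidence;
        }
    }

    /** Returns "[start-end] word" lines for words at or above minConfidence. */
    static List<String> format(List<TimedWord> words, float minConfidence) {
        List<String> lines = new ArrayList<>();
        for (TimedWord w : words) {
            if (w.confidence >= minConfidence) {
                // Locale.ROOT keeps the decimal separator stable across devices
                lines.add(String.format(Locale.ROOT, "[%.2f-%.2f] %s", w.startSec, w.endSec, w.word));
            }
        }
        return lines;
    }
}
```

In a real app, each LeopardTranscript.Word would be copied into a TimedWord (or the method adapted to take the SDK type directly) before formatting.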

For further details, visit the Leopard Speech-to-Text product page or refer to Leopard's Android SDK quick start guide.
