Speech Recognition in iOS Tutorial

This article serves as a comprehensive guide for adding on-device Speech Recognition to an iOS app.

The term Speech Recognition is used loosely in software. Many people associate it exclusively with Speech-to-Text, but Speech-to-Text is only one part of the broader field of speech technology. Speech Recognition also covers Wake Word Detection, Voice Command Recognition, and Voice Activity Detection (VAD).

Here's a helpful guide to assist you in choosing the right Speech Recognition approach for your iOS application:

- Detect if someone is speaking and when they are speaking: Cobra VAD
- Recognize specific phrases or words: Porcupine Wake Word
- Understand voice commands and extract intent (including slot values): Rhino Speech-to-Intent
- Convert spoken words into written text in real time: Cheetah Streaming Speech-to-Text
- Perform batch transcription of large audio datasets: Leopard Speech-to-Text

SDKs are also available for Android and for the cross-platform mobile frameworks Flutter and React Native.

All Picovoice iOS SDKs are distributed via the CocoaPods package manager.

Now, let's delve into each of the Speech Recognition approaches for iOS.

Cobra VAD

To integrate the Cobra VAD SDK into your iOS project, add the following to the project's Podfile:

pod 'Cobra-iOS'

Sign up for a free Picovoice Console account and obtain your AccessKey. The AccessKey is only required for authentication and authorization.
Add the following to the app's Info.plist file to enable recording with an iOS device's microphone:

<key>NSMicrophoneUsageDescription</key>
<string>[Permission explanation]</string>

Create an instance of the VAD engine:

import Cobra

do {
    // `cobra` is assumed to be declared elsewhere, e.g. as a property
    cobra = try Cobra(accessKey: "${ACCESS_KEY}")
} catch {
    // handle initialization errors
}

Find the probability of voice by passing in audio frames to the .process function:

while true {
    do {
        let voiceProbability = try cobra.process(getNextAudioFrame())
        // take action based on the probability of voice
    } catch { }
}
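
Acting directly on a single frame's probability can make the app flicker between "speaking" and "silent". The following is a minimal sketch of one way to smooth that decision, using two thresholds and a short hangover period. `VoiceGate`, its default thresholds, and the hangover length are hypothetical choices for illustration and are not part of the Cobra SDK.

```swift
// Hypothetical helper, not part of the Cobra SDK: turns per-frame voice
// probabilities into a stable speaking / not-speaking state.
struct VoiceGate {
    let startThreshold: Float  // at or above this, speech is considered started
    let stopThreshold: Float   // below this, a frame counts as silent
    let hangoverFrames: Int    // consecutive silent frames needed to end speech

    private(set) var isSpeaking = false
    private var silentFrames = 0

    init(startThreshold: Float = 0.7, stopThreshold: Float = 0.3, hangoverFrames: Int = 3) {
        self.startThreshold = startThreshold
        self.stopThreshold = stopThreshold
        self.hangoverFrames = hangoverFrames
    }

    // Feeds one voice probability and returns the updated state.
    mutating func update(_ probability: Float) -> Bool {
        if isSpeaking {
            if probability < stopThreshold {
                silentFrames += 1
                if silentFrames >= hangoverFrames {
                    isSpeaking = false
                    silentFrames = 0
                }
            } else {
                // Any non-silent frame resets the hangover counter.
                silentFrames = 0
            }
        } else if probability >= startThreshold {
            isSpeaking = true
            silentFrames = 0
        }
        return isSpeaking
    }
}
```

In the loop above, each `voiceProbability` would be passed to `update(_:)`, and the app would react only when the returned state changes.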

For further details, visit the Cobra VAD product page or refer to the Cobra iOS SDK quick start guide.

Porcupine Wake Word

To integrate the Porcupine Wake Word SDK into your iOS project, add the following to the project's Podfile:

pod 'Porcupine-iOS'

Sign up for a free Picovoice Console account and obtain your AccessKey. The AccessKey is only required for authentication and authorization.
Add the following to the app's Info.plist file to enable recording with an iOS device's microphone:

<key>NSMicrophoneUsageDescription</key>
<string>[Permission explanation]</string>

Create a custom wake word model using Picovoice Console.
Download the .ppn model file and add it to the app as a bundled resource (in Xcode, under Build Phases > Copy Bundle Resources). Then, get its path from the app bundle:

let keywordPath = Bundle(for: type(of: self)).path(
        forResource: "${KEYWORD_FILE}",
        ofType: "ppn")
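
Note that `path(forResource:ofType:)` returns an optional `String`, which is `nil` when the file is not in the bundle. The sketch below shows one possible way to unwrap it and fail early with a descriptive error. `requireResourcePath` and `MissingResourceError` are hypothetical names introduced for this example.

```swift
import Foundation

// Hypothetical error type for this sketch.
struct MissingResourceError: Error {
    let name: String
    let ext: String
}

// Returns the path of a bundled resource, or throws if the bundle
// does not contain a file with the given name and extension.
func requireResourcePath(name: String, ext: String, in bundle: Bundle = .main) throws -> String {
    guard let path = bundle.path(forResource: name, ofType: ext) else {
        throw MissingResourceError(name: name, ext: ext)
    }
    return path
}
```

The non-optional result can then be passed on wherever a model or keyword path is expected.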

Initialize the Porcupine Wake Word engine with the .ppn resource:

import Porcupine

do {
    let porcupine = try Porcupine(
        accessKey: "${ACCESS_KEY}",
        keywordPath: keywordPath)
} catch { }

Detect the keyword by passing in audio frames to the .process function:

while true {
    do {
        let audioFrame = getNextAudioFrame()
        let keywordIndex = try porcupine.process(audioFrame)
        if keywordIndex > -1 {
            // keyword detected
        }
    } catch {
        // handle processing errors
    }
}
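
`getNextAudioFrame()` in the snippets is a placeholder. Audio recorders usually deliver samples in chunks of arbitrary size, while a frame-based engine expects frames of a fixed length. The following is a minimal sketch of a buffer that regroups incoming 16-bit samples into fixed-length frames. `FrameBuffer` is a hypothetical helper, and the frame length must be taken from the SDK documentation rather than from this example.

```swift
// Hypothetical helper, not part of the Porcupine SDK: collects incoming
// 16-bit PCM samples and emits frames of a fixed length.
struct FrameBuffer {
    let frameLength: Int
    private var pending: [Int16] = []

    init(frameLength: Int) {
        precondition(frameLength > 0, "frameLength must be positive")
        self.frameLength = frameLength
    }

    // Number of samples waiting for the next complete frame.
    var pendingCount: Int { pending.count }

    // Appends samples and returns every complete frame now available.
    mutating func append(_ samples: [Int16]) -> [[Int16]] {
        pending.append(contentsOf: samples)
        var frames: [[Int16]] = []
        var start = 0
        while pending.count - start >= frameLength {
            frames.append(Array(pending[start..<(start + frameLength)]))
            start += frameLength
        }
        // Keep only the leftover samples for the next call.
        pending.removeFirst(start)
        return frames
    }
}
```

Each frame returned by `append(_:)` could then be handed to the `.process` function in place of `getNextAudioFrame()`.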

For further details, visit the Porcupine Wake Word product page or refer to the Porcupine iOS SDK quick start guide.

Rhino Speech-to-Intent

To integrate the Rhino Speech-to-Intent SDK into your iOS project, add the following to the project's Podfile:

pod 'Rhino-iOS'

Sign up for a free Picovoice Console account and obtain your AccessKey. The AccessKey is only required for authentication and authorization.
Add the following to the app's Info.plist file to enable recording with an iOS device's microphone:

<key>NSMicrophoneUsageDescription</key>
<string>[Permission explanation]</string>

Create a custom context model using Picovoice Console.
Download the .rhn model file and add it to the app as a bundled resource (in Xcode, under Build Phases > Copy Bundle Resources). Then, get its path from the app bundle:

let contextPath = Bundle(for: type(of: self)).path(
        forResource: "${CONTEXT_FILE}",
        ofType: "rhn")

Initialize the Rhino Speech-to-Intent engine with the .rhn resource:

import Rhino

do {
    let rhino = try Rhino(
        accessKey: "${ACCESS_KEY}",
        contextPath: contextPath)
} catch { }

Infer the user's intent by passing in audio frames to the .process function:

while true {
    do {
        let audioFrame = getNextAudioFrame()
        let isFinalized = try rhino.process(audioFrame)
        if isFinalized {
            let inference = try rhino.getInference()
            if inference.isUnderstood {
                let intent = inference.intent
                let slots = inference.slots
                // take action based on intent and slot values
            } else {
                // handle unsupported commands
            }
        }
    } catch {
        // handle processing errors
    }
}
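
Acting on the intent and slot values typically comes down to a switch over the intent name. The sketch below illustrates the idea with a stand-in type so that it runs without the SDK. `CommandInference`, the intent names `setTimer` and `cancelTimer`, the `minutes` slot, and the assumption that slots are a string-to-string dictionary are all hypothetical; a real app would use the intents and slots defined in its own context model.

```swift
// Hypothetical stand-in for the fields read from the inference above.
struct CommandInference {
    let isUnderstood: Bool
    let intent: String
    let slots: [String: String]
}

// Maps an inference to a description of the action to take.
// Intent and slot names are made up for illustration.
func describeAction(for inference: CommandInference) -> String {
    guard inference.isUnderstood else {
        return "Sorry, I didn't understand that."
    }
    switch inference.intent {
    case "setTimer":
        // Slot values arrive as text, so they are validated before use.
        guard let text = inference.slots["minutes"], let minutes = Int(text) else {
            return "How long should the timer run?"
        }
        return "Starting a \(minutes)-minute timer."
    case "cancelTimer":
        return "Timer cancelled."
    default:
        return "Unhandled intent: \(inference.intent)"
    }
}
```

Keeping the dispatch in a plain function like this makes it possible to unit test command handling without a microphone or a model file.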

For further details, visit the Rhino Speech-to-Intent product page or refer to the Rhino iOS SDK quick start guide.

Cheetah Streaming Speech-to-Text

To integrate the Cheetah Streaming Speech-to-Text SDK into your iOS project, add the following to the project's Podfile:

pod 'Cheetah-iOS'

Sign up for a free Picovoice Console account and obtain your AccessKey. The AccessKey is only required for authentication and authorization.
Add the following to the app's Info.plist file to enable recording with an iOS device's microphone:

<key>NSMicrophoneUsageDescription</key>
<string>[Permission explanation]</string>

Download the .pv language model file from the Cheetah GitHub repository and add it to the app as a bundled resource (in Xcode, under Build Phases > Copy Bundle Resources). Then, get its path from the app bundle:

let modelPath = Bundle(for: type(of: self)).path(
        forResource: "${MODEL_FILE}",
        ofType: "pv")

Initialize the Cheetah Streaming Speech-to-Text engine with the .pv resource:

import Cheetah

do {
    let cheetah = try Cheetah(
        accessKey: "${ACCESS_KEY}",
        modelPath: modelPath)
} catch {
    // handle initialization errors
}

Transcribe speech to text in real time by passing in audio frames to the .process function:

while true {
    do {
        let audioFrame = getNextAudioFrame()
        // destructure the returned tuple into its two values
        let (partialTranscript, isEndpoint) = try cheetah.process(audioFrame)
        if isEndpoint {
            let finalTranscript = try cheetah.flush()
        }
    } catch {
        // handle processing errors
    }
}
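
The loop above leaves open what to do with the partial and final text. One option is to collect the pieces into complete utterances, as in the minimal sketch below. It assumes that each call to `.process` returns only newly transcribed text and that the flush call returns whatever text remains at the endpoint. `TranscriptAccumulator` is a hypothetical helper and is not part of the Cheetah SDK.

```swift
// Hypothetical helper, not part of the Cheetah SDK: joins partial
// transcripts into complete utterances.
struct TranscriptAccumulator {
    private var current = ""
    private(set) var utterances: [String] = []

    init() {}

    // Adds the text returned for one audio frame (may be empty).
    mutating func append(partial: String) {
        current += partial
    }

    // Called at an endpoint with the flushed text; closes the current
    // utterance, stores it, and returns it.
    mutating func finish(withFlushed flushed: String) -> String {
        current += flushed
        let utterance = current
        utterances.append(utterance)
        current = ""
        return utterance
    }
}
```

In the loop, `partialTranscript` would go to `append(partial:)` on every frame, and `finalTranscript` would go to `finish(withFlushed:)` when `isEndpoint` is true.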

For further details, visit the Cheetah Streaming Speech-to-Text product page or refer to the Cheetah iOS SDK quick start guide.

Leopard Speech-to-Text

To integrate the Leopard Speech-to-Text SDK into your iOS project, add the following to the project's Podfile:

pod 'Leopard-iOS'

Sign up for a free Picovoice Console account and obtain your AccessKey. The AccessKey is only required for authentication and authorization.
Add the following to the app's Info.plist file to enable recording with an iOS device's microphone:

<key>NSMicrophoneUsageDescription</key>
<string>[Permission explanation]</string>

Download the .pv language model file from the Leopard GitHub repository and add it to the app as a bundled resource (in Xcode, under Build Phases > Copy Bundle Resources). Then, get its path from the app bundle:

let modelPath = Bundle(for: type(of: self)).path(
        forResource: "${MODEL_FILE}",
        ofType: "pv")

Create an instance of Leopard for speech-to-text transcription:

import Leopard

do {
    let leopard = try Leopard(
        accessKey: "${ACCESS_KEY}",
        modelPath: modelPath)
} catch {
    // handle initialization errors
}

Transcribe speech to text by passing an audio file to the .processFile function:

do {
    let audioPath = Bundle(for: type(of: self)).path(
        forResource: "${AUDIO_FILE_NAME}",
        ofType: "${AUDIO_FILE_EXTENSION}")
    let result = try leopard.processFile(audioPath)
    print(result.transcript)
} catch {
    // handle transcription errors
}
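
For batch transcription, the same call is repeated over many files, and one bad file should not abort the whole run. The following is a rough sketch of such a loop. `transcribeAll` is a hypothetical helper, and its `transcribe` closure stands in for the call to `.processFile` so that the example runs without the SDK.

```swift
// Hypothetical helper, not part of the Leopard SDK: runs a transcription
// closure over many files and records failures instead of stopping.
func transcribeAll(
    _ paths: [String],
    using transcribe: (String) throws -> String
) -> (transcripts: [String: String], failures: [String]) {
    var transcripts: [String: String] = [:]
    var failures: [String] = []
    for path in paths {
        do {
            transcripts[path] = try transcribe(path)
        } catch {
            // Remember the failing file and continue with the next one.
            failures.append(path)
        }
    }
    return (transcripts, failures)
}
```

In an app, the closure body would call the Leopard instance and return the transcript from its result.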

For further details, visit the Leopard Speech-to-Text product page or refer to the Leopard iOS SDK quick start guide.
