Simplifying Speech Recognition in Flutter

While Flutter makes it convenient to deploy apps to almost any platform, many developers get stuck when they need device-specific capabilities. Speech-to-text is one such feature, and it can be difficult to add to a cross-platform framework. Picovoice’s Leopard Speech-to-Text SDK for Flutter lets us add cross-platform, on-device speech recognition to a Flutter app with minimal code. Because transcription happens on the device, user audio never leaves it to be transcribed in the cloud. We avoid a network round trip, which reduces latency, and our users gain privacy.

Setting Up a Flutter Project

Create a new Flutter project or open an existing one, and ensure you have the necessary permissions enabled for each platform:

Android Permissions

For Android, add the following to your AndroidManifest.xml:

<uses-permission android:name="android.permission.RECORD_AUDIO"/>
<uses-permission android:name="android.permission.INTERNET"/>

The INTERNET permission is only required for Picovoice AccessKey validation; audio is never streamed off the device.

iOS Permissions

For iOS, add the following to your Info.plist:

<key>NSMicrophoneUsageDescription</key>
<string>[Permission explanation]</string>

To incorporate Leopard Speech-to-Text into your Flutter project, you’ll need to add the leopard_flutter plugin as a dependency. Open your project's pubspec.yaml file and add the following:

dependencies:
  leopard_flutter: ^{LATEST_VERSION}

Place the desired language model into your project’s assets/ folder and add the file to your pubspec.yaml:

flutter:
  assets:
    - assets/{MODEL_FILE}.pv

Lastly, you will need a Picovoice AccessKey, which can be obtained with a free Picovoice Console account.
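The snippets in this article hard-code the AccessKey for brevity. One way to keep it out of source control is to inject it at build time. The sketch below assumes the key is supplied with flutter run --dart-define=PICOVOICE_ACCESS_KEY=...; the variable name PICOVOICE_ACCESS_KEY and the helper requireAccessKey are our own choices, not part of the Picovoice SDK.

```dart
// Hypothetical compile-time variable name; supply it with
// `flutter run --dart-define=PICOVOICE_ACCESS_KEY=<your key>`.
// Evaluates to '' when the variable was not defined.
const String picovoiceAccessKey =
    String.fromEnvironment('PICOVOICE_ACCESS_KEY');

/// Returns the AccessKey, or throws if none was provided at build time.
/// (Hypothetical helper, not part of leopard_flutter.)
String requireAccessKey([String key = picovoiceAccessKey]) {
  if (key.isEmpty) {
    throw StateError(
        'Missing AccessKey: pass --dart-define=PICOVOICE_ACCESS_KEY=...');
  }
  return key;
}
```

Failing fast with a clear message is usually easier to debug than letting Leopard.create reject an empty key.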

Transcribing Audio with Leopard Speech-to-Text

Import leopard_flutter and create an instance of the Leopard class:

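import 'package:leopard_flutter/leopard.dart';
import 'package:leopard_flutter/leopard_error.dart';

final String accessKey = "{YOUR_ACCESS_KEY}"; // from Picovoice Console
final String modelPath = "assets/{MODEL_FILE}.pv"; // model asset path

// Declared outside the try block so it stays in scope afterwards
late Leopard leopard;

Future<void> initLeopard() async {
    try {
        leopard = await Leopard.create(accessKey, modelPath);
    } on LeopardException catch (err) {
        // handle Leopard init error
    }
}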

Now, let’s assume we’re buffering audio data from the device’s microphone elsewhere in the app (we’ll implement this in the next section). We’ll take this array of 16-bit PCM samples, recorded at leopard.sampleRate, and pass it straight into Leopard to be converted to text:

// process() is asynchronous and returns a LeopardTranscript
LeopardTranscript result = await leopard.process(audioData);
print(result.transcript);
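Leopard expects single-channel, 16-bit PCM samples at leopard.sampleRate. If audio arrives as raw bytes instead (for example, from another plugin), those bytes have to be decoded into samples first. Here is a minimal pure-Dart sketch, assuming little-endian mono audio already at the right sample rate; pcmBytesToSamples is a hypothetical helper, not part of leopard_flutter, and it does not parse WAV headers.

```dart
import 'dart:typed_data';

/// Decodes raw 16-bit little-endian PCM bytes into signed samples.
/// Assumes mono audio already at Leopard's sample rate; a trailing odd
/// byte, if any, is ignored. (Hypothetical helper.)
List<int> pcmBytesToSamples(Uint8List bytes) {
  final ByteData data = ByteData.sublistView(bytes);
  final int sampleCount = bytes.lengthInBytes ~/ 2;
  final List<int> samples = List<int>.filled(sampleCount, 0);
  for (int i = 0; i < sampleCount; i++) {
    // getInt16 sign-extends, so the bytes FF FF decode to -1
    samples[i] = data.getInt16(i * 2, Endian.little);
  }
  return samples;
}
```

The resulting list can then be passed to leopard.process in the same way as the buffered microphone data.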

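The LeopardTranscript also contains useful per-word metadata. A start time, an end time (both in seconds) and a confidence score are generated for each word:

for (LeopardWord word in result.words) {
    print(word.word);        // the transcribed word
    print(word.startSec);    // start time in seconds
    print(word.endSec);      // end time in seconds
    print(word.confidence);  // confidence score between 0 and 1
}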
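As one illustration of what this metadata enables, the sketch below renders each word with its timestamps and flags low-confidence words. So that it runs without the plugin, it uses a hypothetical WordInfo class mirroring the word, startSec, endSec and confidence fields shown above; the 0.5 threshold is an arbitrary assumption to tune for your app.

```dart
/// Hypothetical stand-in for LeopardWord so the sketch runs standalone.
class WordInfo {
  final String word;
  final double startSec;
  final double endSec;
  final double confidence;
  const WordInfo(this.word, this.startSec, this.endSec, this.confidence);
}

/// Formats each word as "[start-end] word", appending "?" when its
/// confidence is below [minConfidence].
List<String> formatWords(List<WordInfo> words, {double minConfidence = 0.5}) {
  return words.map((w) {
    final String flag = w.confidence < minConfidence ? '?' : '';
    final String start = w.startSec.toStringAsFixed(2);
    final String end = w.endSec.toStringAsFixed(2);
    return '[$start-$end] ${w.word}$flag';
  }).toList();
}
```

For example, a word "hello" spanning 0.32 to 0.8 seconds with confidence 0.95 renders as "[0.32-0.80] hello". In an app, each LeopardWord would be mapped to WordInfo (or the function rewritten to take LeopardWord directly).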

How to Record Audio in Flutter

As with many other cross-platform frameworks, recording audio in Flutter can be challenging: it requires a native implementation on each platform, unified behind an interface that Flutter code can call. To simplify this for our own demos, we created an audio capture plugin called flutter_voice_processor that handles all of this complexity for us. Add the plugin to your pubspec.yaml, then add the following code to start buffering audio data:

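import 'package:flutter_voice_processor/flutter_voice_processor.dart';

// Buffered 16-bit PCM samples, appended to as frames arrive
List<int> audioData = [];

final int frameLength = 512; // number of samples per captured frame
late VoiceProcessor voiceProcessor;

Future<void> startBuffering() async {
    // Record at the sample rate Leopard expects
    voiceProcessor = VoiceProcessor.getVoiceProcessor(
            frameLength,
            leopard.sampleRate);

    voiceProcessor.addListener((buffer) {
        List<int> frame = (buffer as List<dynamic>).cast<int>();
        audioData.addAll(frame);
    });

    // Only start capturing once microphone permission is granted
    if (await voiceProcessor.hasRecordAudioPermission()) {
        await voiceProcessor.start();
    }
}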
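Because audioData grows for as long as the recorder runs, it can help to track how much audio has been buffered and cap it. A minimal sketch, assuming mono samples at Leopard's sample rate; BoundedPcmBuffer and its maxSeconds limit are hypothetical, not part of either plugin.

```dart
/// Accumulates PCM frames but stops accepting samples once [maxSeconds]
/// of audio (at [sampleRate] samples per second) has been buffered.
/// (Hypothetical helper.)
class BoundedPcmBuffer {
  final int sampleRate;
  final double maxSeconds;
  final List<int> samples = [];

  BoundedPcmBuffer(this.sampleRate, this.maxSeconds);

  int get maxSamples => (sampleRate * maxSeconds).floor();

  /// Seconds of audio currently buffered.
  double get durationSec => samples.length / sampleRate;

  bool get isFull => samples.length >= maxSamples;

  /// Appends as much of [frame] as fits; returns false once full.
  bool addFrame(List<int> frame) {
    final int remaining = maxSamples - samples.length;
    if (remaining <= 0) return false;
    samples.addAll(
        frame.length <= remaining ? frame : frame.sublist(0, remaining));
    return !isFull;
  }

  void clear() => samples.clear();
}
```

Inside the voice processor listener, addFrame would replace audioData.addAll, and a false return value is a natural signal to call voiceProcessor.stop().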

Putting It All Together

This is a simplified example, but it contains all the code needed to get started. The fields and methods below are meant to live inside the State class of a StatefulWidget. For a complete demo application, see the Leopard Speech-to-Text Flutter demo on our GitHub repository.

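import 'package:leopard_flutter/leopard.dart';
import 'package:leopard_flutter/leopard_error.dart';
import 'package:leopard_flutter/leopard_transcript.dart';
import 'package:flutter_voice_processor/flutter_voice_processor.dart';

final String accessKey = "{YOUR_ACCESS_KEY}";
final String modelPath = "assets/{MODEL_FILE}.pv";

// Nullable because initialization is asynchronous and may fail
Leopard? leopard;
VoiceProcessor? voiceProcessor;

final int frameLength = 512;
List<int> audioData = [];

@override
void initState() {
    super.initState();
    initLeopard();
}

@override
void dispose() {
    // Release Leopard's native resources when the widget goes away
    leopard?.delete();
    super.dispose();
}

Future<void> initLeopard() async {
    try {
        final Leopard instance = await Leopard.create(accessKey, modelPath);
        leopard = instance;
        // Record at the sample rate Leopard expects
        initVoiceProcessor(instance.sampleRate);
    } on LeopardException catch (err) {
        // handle Leopard init error
    }
}

void initVoiceProcessor(int sampleRate) {
    final VoiceProcessor processor = VoiceProcessor.getVoiceProcessor(
        frameLength,
        sampleRate);
    // Append each captured frame to the buffer
    processor.addListener((buffer) {
        List<int> frame = (buffer as List<dynamic>).cast<int>();
        audioData.addAll(frame);
    });
    voiceProcessor = processor;
}

Future<void> startRecording() async {
    final VoiceProcessor? processor = voiceProcessor;
    if (processor == null) return; // not initialized yet, or init failed
    audioData = [];
    if (await processor.hasRecordAudioPermission()) {
        await processor.start();
    }
}

Future<void> stopRecording() async {
    final VoiceProcessor? processor = voiceProcessor;
    final Leopard? engine = leopard;
    if (processor == null || engine == null) return;
    await processor.stop();
    try {
        LeopardTranscript result = await engine.process(audioData);
        print(result.transcript);
    } on LeopardException catch (err) {
        // handle Leopard processing error
    }
}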

Real-time Transcription

You may have noticed that the Leopard Speech-to-Text engine operates on a complete buffer of recorded audio. For most speech recognition applications, this is the use case we’ll be dealing with. However, if we want to give the user live feedback while they speak, we need a real-time transcription engine. Picovoice’s Cheetah Streaming Speech-to-Text engine streams transcription results while audio is still being captured. Check out the cheetah_flutter plugin if real-time feedback is a requirement for your project.
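A streaming engine consumes audio incrementally, typically in frames of a fixed number of samples (check the cheetah_flutter API reference for Cheetah's exact frame length). When the recorder's chunk size does not match that frame length, the chunks have to be regrouped. The pure-Dart sketch below shows one way to do this; FrameAccumulator is a hypothetical helper, not part of cheetah_flutter, and carrying leftover samples forward is one possible policy.

```dart
/// Collects arbitrarily sized chunks of PCM samples and emits frames of
/// exactly [frameLength] samples, carrying leftovers into the next call.
/// (Hypothetical helper.)
class FrameAccumulator {
  final int frameLength;
  final List<int> _pending = [];

  FrameAccumulator(this.frameLength) {
    if (frameLength <= 0) {
      throw ArgumentError.value(frameLength, 'frameLength', 'must be positive');
    }
  }

  /// Adds [chunk] and returns every complete frame now available.
  List<List<int>> add(List<int> chunk) {
    _pending.addAll(chunk);
    final List<List<int>> frames = [];
    int offset = 0;
    while (_pending.length - offset >= frameLength) {
      frames.add(_pending.sublist(offset, offset + frameLength));
      offset += frameLength;
    }
    _pending.removeRange(0, offset); // keep only the incomplete remainder
    return frames;
  }

  /// Number of samples waiting to complete the next frame.
  int get pendingCount => _pending.length;
}
```

Each frame returned by add would then go to the streaming engine's per-frame processing call. If the voice processor's frameLength is already set to the engine's frame length, this regrouping step is unnecessary.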

Cross-Platform Alternatives to Flutter

If Flutter is not your cross-platform framework of choice, there are also React Native SDKs for both Leopard and Cheetah. Cross-platform desktop and web are also supported through SDKs in Python, .NET, and JavaScript, to name a few. Check out the Leopard and Cheetah docs to see all the available SDKs and a wide array of helpful demos.