Real-Time Transcription for Flutter: On-Device STT

Speech-to-text is one of the most natural ways to interact with devices. From note-taking apps to hands-free controls, it opens up new levels of accessibility and user experience.

Real-time speech-to-text takes it one step further, letting your app transcribe audio instantly without needing to wait for the recording to finish. While Flutter simplifies cross-platform development, handling continuous audio streams and low-latency processing can be challenging.

That's where Picovoice's Cheetah Streaming Speech-to-Text Flutter SDK comes in. It delivers continuous, low-latency transcription directly on-device—no cloud, no delay, and full privacy. In this post, we'll show how to integrate streaming speech-to-text into your Flutter app for fast and reliable real-time voice interaction.

Cheetah Streaming Speech-to-Text delivers higher accuracy than Google Streaming ASR, even though it runs entirely on-device and is much smaller than Google's cloud-based service.

This guide shows how to add custom, on-device streaming speech-to-text to a Flutter app with Cheetah Streaming Speech-to-Text.

What you'll learn:

How to set up audio recording permissions for iOS and Android
How to record audio in Flutter with VoiceProcessor
How to add Cheetah Streaming Speech-to-Text to your Flutter app

What you need:

Flutter SDK (2.8.1+)
Android SDK (21+)
JDK (11+)
Xcode (13+)

Enable Microphone Permissions

This tutorial requires recording audio, so before we begin, you'll need to configure your Flutter project to request audio recording permissions from the user. Make sure the appropriate permissions are enabled for each platform:

iOS

Add the following block to Info.plist:

<key>NSMicrophoneUsageDescription</key>
<string>[Permission explanation]</string>

Android

Add the following block to AndroidManifest.xml:

<uses-permission android:name="android.permission.INTERNET" />
<uses-permission android:name="android.permission.RECORD_AUDIO" />

Internet is required only for licensing and usage tracking. Audio remains on-device, and is not streamed. Once Cheetah has been initialized, it can run offline.

Recording Audio with VoiceProcessor

As you'll see later, Cheetah is easy to use: just pass it audio, and it returns text. But how do you record that audio? As with many cross-platform frameworks, recording media in Flutter can be challenging. To simplify this process, we created an audio capture plugin that handles the complexity for us: flutter_voice_processor.

In this section, we'll show how to record audio in Flutter using this plugin. In the next section, we'll show how to pass this audio to Cheetah for transcription.

Add the flutter_voice_processor plugin as a dependency. Open your project's pubspec.yaml file and add the following:

dependencies:
  flutter_voice_processor: ^{LATEST_VERSION}

VoiceProcessor: Step-by-Step Code Walkthrough

Create an instance of VoiceProcessor and add frame listeners. Eventually, we'll pass the audio to Cheetah for transcription, but for now, we won't actually do anything useful with the recorded audio:

import 'package:flutter_voice_processor/flutter_voice_processor.dart';

// ...
  _voiceProcessor = VoiceProcessor.instance;
  _voiceProcessor?.addFrameListener(_onFrame);
  _voiceProcessor?.addErrorListener(_onError);
  
  void _onFrame(List<int> frame) {
    print('Received frame of length: ${frame.length}');
  }

  void _onError(VoiceProcessorException error) {
    print('Voice processor error: $error');
  }

Call hasRecordAudioPermission() to prompt the user to give audio recording permissions. Once accepted, call start() to begin recording:

import 'package:flutter_voice_processor/flutter_voice_processor.dart';

// ...
  final hasPermission =
      await _voiceProcessor?.hasRecordAudioPermission() ?? false;
  if (!hasPermission) {
    print("Record audio permission not granted.");
    return;
  }

  await _voiceProcessor?.start(_frameLength, _sampleRate);

Call stop() to stop recording audio:

await _voiceProcessor?.stop();

When you no longer need to record audio, clean up the frame listeners.

_voiceProcessor?.removeFrameListener(_onFrame);
_voiceProcessor?.removeErrorListener(_onError);

Below is a fully implemented widget you can add to your project to see flutter_voice_processor in action:

import 'package:flutter/material.dart';
import 'package:flutter_voice_processor/flutter_voice_processor.dart';

enum AppState { ready, loading, listening }

class VoiceProcessorWidget extends StatefulWidget {
  const VoiceProcessorWidget({super.key});

  @override
  State<VoiceProcessorWidget> createState() => _VoiceProcessorWidgetState();
}

class _VoiceProcessorWidgetState extends State<VoiceProcessorWidget> {
  AppState _appState = AppState.ready;
  VoiceProcessor? _voiceProcessor;
  final int _frameLength = 512;
  final int _sampleRate = 16000;

  // 1. Initialize VoiceProcessor and add frame listeners
  @override
  void initState() {
    super.initState();
    _voiceProcessor = VoiceProcessor.instance;
    _voiceProcessor?.addFrameListener(_onFrame);
    _voiceProcessor?.addErrorListener(_onError);
  }

  void _onFrame(List<int> frame) {
    print('Received frame of length: ${frame.length}');
  }

  void _onError(VoiceProcessorException error) {
    print('Voice processor error: $error');
  }

  // 2. Request audio recording permission and start recording audio
  Future<void> _start() async {
    setState(() => _appState = AppState.loading);
    try {
      final hasPermission =
          await _voiceProcessor?.hasRecordAudioPermission() ?? false;
      if (!hasPermission) {
        print("Record audio permission not granted.");
        setState(() => _appState = AppState.ready);
        return;
      }

      await _voiceProcessor?.start(_frameLength, _sampleRate);
      setState(() => _appState = AppState.listening);
    } on VoiceProcessorException catch (err) {
      print("Error starting voice processor: $err");
      setState(() => _appState = AppState.ready);
    }
  }

  // 3. Stop recording audio
  Future<void> _stop() async {
    setState(() => _appState = AppState.loading);
    try {
      await _voiceProcessor?.stop();
    } on VoiceProcessorException catch (err) {
      print("Error stopping voice processor: $err");
    }
    setState(() => _appState = AppState.ready);
  }

  // 4. Clean up resources
  @override
  void dispose() {
    _voiceProcessor?.stop();
    _voiceProcessor?.removeFrameListener(_onFrame);
    _voiceProcessor?.removeErrorListener(_onError);
    super.dispose();
  }

  @override
  Widget build(BuildContext context) {
    return Column(
      children: [
        ElevatedButton(
          onPressed: _appState == AppState.ready ? _start : null,
          child: const Text('Start'),
        ),
        ElevatedButton(
          onPressed: _appState == AppState.listening ? _stop : null,
          child: const Text('Stop'),
        ),
      ],
    );
  }
}

This is a simplified example that includes all the essential code to get you started. If you'd like to see a complete working app, check out the Flutter Voice Processor demo on our GitHub repository.
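To get a feel for how often _onFrame fires, it helps to convert the frame length and sample rate into a duration. The sketch below is plain Dart with no plugin dependency; frameDuration and framesPerSecond are hypothetical helper names introduced here for illustration, not part of flutter_voice_processor.

```dart
/// Duration of one audio frame, given its length in samples and the
/// sample rate in Hz. (Hypothetical helper, not part of the plugin.)
Duration frameDuration(int frameLength, int sampleRate) {
  if (frameLength <= 0 || sampleRate <= 0) {
    throw ArgumentError('frameLength and sampleRate must be positive');
  }
  // samples / (samples per second) = seconds; work in microseconds
  // so the integer division does not lose the fractional part.
  final micros = (frameLength * Duration.microsecondsPerSecond) ~/ sampleRate;
  return Duration(microseconds: micros);
}

/// Approximate number of frame callbacks delivered per second.
double framesPerSecond(int frameLength, int sampleRate) =>
    sampleRate / frameLength;
```

With the values used in the widget above (512 samples at 16000 Hz), each frame covers 32 ms of audio, so the listener is called roughly 31 times per second. Any work done inside _onFrame should comfortably fit in that window.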

Streaming Speech-to-Text with Cheetah

Now that we know how to record audio in Flutter, the next step is to pass the recorded audio to Cheetah for streaming speech-to-text. This section builds on the VoiceProcessor code from the previous section.

Add Cheetah Flutter Plugin: To use Cheetah Streaming Speech-to-Text in your Flutter project, add the cheetah_flutter plugin as a dependency. Open your project's pubspec.yaml file and add the following:

dependencies:
  flutter_voice_processor: ^{LATEST_VERSION} # from previous section
  cheetah_flutter: ^{LATEST_VERSION}

Get Your Picovoice Access Key: Sign up for a free Picovoice Console account and obtain your AccessKey. The AccessKey is only required for authentication and authorization.
Train Custom Model: Create and download a custom model using the Picovoice Console. This is useful if you need Cheetah to recognize words outside the standard vocabulary, prioritize certain words for easier recognition, or modify pronunciations of certain words. If you do not need a custom model, use one of the default models.

If you'd like to see a video walkthrough for this step, check out Picovoice Console Tutorial: Leopard & Cheetah Speech-to-Text.

Add Model Files to Your Project: Place your model file into your project's assets/ folder and add the file path to your pubspec.yaml:

flutter:
  assets:
    - assets/{MODEL_FILE} # e.g. assets/cheetah_params.pv

Cheetah: Step-by-Step Code Walkthrough

Create an instance of Cheetah:

import "package:cheetah_flutter/cheetah.dart";

// ...
  Cheetah? _cheetah;

  _cheetah = await Cheetah.create(
    "{ACCESS_KEY}",
    "assets/{MODEL_FILE}", // e.g. assets/cheetah_params.pv
    enableAutomaticPunctuation: true,
  );
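Instead of hard-coding the AccessKey in source, one option is to inject it at build time. The following is a minimal sketch, assuming a build-time variable named PICOVOICE_ACCESS_KEY (a hypothetical name chosen for this example) supplied through Flutter's --dart-define flag; requireAccessKey is likewise a hypothetical helper.

```dart
// Hypothetical variable name; supply it at build time, for example:
//   flutter run --dart-define=PICOVOICE_ACCESS_KEY=<your key>
// String.fromEnvironment yields '' when the variable is not defined.
const String accessKeyFromEnv =
    String.fromEnvironment('PICOVOICE_ACCESS_KEY');

/// Returns [key] trimmed, or throws if it is empty or is still the
/// "{ACCESS_KEY}" placeholder used in this tutorial.
String requireAccessKey(String key) {
  final trimmed = key.trim();
  if (trimmed.isEmpty || trimmed == '{ACCESS_KEY}') {
    throw StateError(
        'AccessKey missing: pass --dart-define=PICOVOICE_ACCESS_KEY=...');
  }
  return trimmed;
}
```

The result of requireAccessKey(accessKeyFromEnv) could then be passed as the first argument to Cheetah.create in place of the literal string, so a missing key fails early with a clear message.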

To transcribe a chunk of audio, pass it to process(). We will be using flutter_voice_processor to handle recording and passing audio to Cheetah, but for now we'll omit the implementation for brevity:

// getAudioFrame() is a placeholder for your audio source
List<int> buffer = getAudioFrame();

CheetahTranscript partialResult = await _cheetah!.process(buffer);
print(partialResult.transcript);
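The full widget later in this section starts the recorder with _cheetah.frameLength, so every frame it delivers already has the size Cheetah expects. If your audio comes from somewhere else as one long buffer of PCM samples, it needs to be cut into frames of that size first. Here is a rough sketch in plain Dart; splitIntoFrames is a hypothetical helper, and dropping the trailing partial frame is an assumption made for simplicity (zero-padding it would be another reasonable choice).

```dart
/// Splits [pcm] into consecutive frames of exactly [frameLength] samples.
/// A trailing partial frame is dropped. (Hypothetical helper.)
List<List<int>> splitIntoFrames(List<int> pcm, int frameLength) {
  if (frameLength <= 0) {
    throw ArgumentError.value(frameLength, 'frameLength', 'must be positive');
  }
  final frames = <List<int>>[];
  // Advance one full frame at a time; stop when fewer than
  // frameLength samples remain.
  for (var start = 0;
      start + frameLength <= pcm.length;
      start += frameLength) {
    frames.add(pcm.sublist(start, start + frameLength));
  }
  return frames;
}
```

Each returned frame could then be passed to process() in order, one call per frame.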

When you no longer need Cheetah, call delete() to release the acquired resources:

await _cheetah?.delete();

Below is a fully implemented widget you can add to your project to see Cheetah and VoiceProcessor in action. Be sure to replace {ACCESS_KEY} with your own AccessKey from Picovoice Console and {MODEL_FILE} with your model file.

import "package:flutter/material.dart";
import "package:flutter_voice_processor/flutter_voice_processor.dart";
import "package:cheetah_flutter/cheetah.dart";
import "package:cheetah_flutter/cheetah_error.dart";

enum AppState { initial, loading, ready, listening }

class CheetahWidget extends StatefulWidget {
  const CheetahWidget({super.key});

  @override
  State<CheetahWidget> createState() => _CheetahWidgetState();
}

class _CheetahWidgetState extends State<CheetahWidget> {
  VoiceProcessor? _voiceProcessor;
  Cheetah? _cheetah;
  AppState _appState = AppState.initial;

  // Pass audio to Cheetah and handle outputted transcript
  void _onFrame(List<int> frame) async {
    if (_cheetah == null) return;

    try {
      final CheetahTranscript partialResult = await _cheetah!.process(frame);

      if (partialResult.isEndpoint) {
        final CheetahTranscript finalResult = await _cheetah!.flush();

        String finalTranscript = partialResult.transcript;
        if (finalResult.transcript.isNotEmpty) {
          finalTranscript += " ${finalResult.transcript}";
        }

        print("Final: $finalTranscript");
      } else if (partialResult.transcript.isNotEmpty) {
        print("Partial: ${partialResult.transcript}");
      }
    } on CheetahException catch (error) {
      print("Cheetah error: ${error.message}");
    }
  }

  void _onError(VoiceProcessorException error) {
    print("Voice processor error: $error");
  }

  // 1. Initialize Voice Processor & Cheetah, request record audio permissions
  Future<void> _init() async {
    setState(() => _appState = AppState.loading);

    _voiceProcessor ??= VoiceProcessor.instance;
    final hasPermission =
        await _voiceProcessor?.hasRecordAudioPermission() ?? false;
    if (!hasPermission) {
      print("Record audio permission not granted.");
      setState(() => _appState = AppState.initial);
      return;
    }

    if (_cheetah == null) {
      try {
        _cheetah = await Cheetah.create(
          "{ACCESS_KEY}",
          "assets/{MODEL_FILE}", // e.g. assets/cheetah_params.pv
          enableAutomaticPunctuation: true,
        );

        _voiceProcessor?.addFrameListener(_onFrame);
        _voiceProcessor?.addErrorListener(_onError);

        setState(() => _appState = AppState.ready);
      } on CheetahException catch (err) {
        print("Cheetah initialization failed: ${err.message}");
        setState(() => _appState = AppState.initial);
      }
    }
  }

  // 2. Start recording audio 
  Future<void> _start() async {
    if (_cheetah == null) return;
    setState(() => _appState = AppState.loading);

    try {
      await _voiceProcessor?.start(_cheetah!.frameLength, _cheetah!.sampleRate);
      setState(() => _appState = AppState.listening);
    } on VoiceProcessorException catch (err) {
      print("Error starting voice processor: $err");
      setState(() => _appState = AppState.ready);
    }
  }

  // 3. Stop recording audio
  Future<void> _stop() async {
    setState(() => _appState = AppState.loading);

    try {
      await _voiceProcessor?.stop();
      setState(() => _appState = AppState.ready);
    } on VoiceProcessorException catch (err) {
      print("Error stopping voice processor: $err");
      setState(() => _appState = AppState.listening);
    }
  }

  // 4. Cleanup resources
  Future<void> _delete() async {
    setState(() => _appState = AppState.loading);

    await _stop();
    _voiceProcessor?.removeFrameListener(_onFrame);
    _voiceProcessor?.removeErrorListener(_onError);

    await _cheetah?.delete();
    _cheetah = null;

    setState(() => _appState = AppState.initial);
  }

  @override
  void dispose() {
    _voiceProcessor?.stop();
    _voiceProcessor?.removeFrameListener(_onFrame);
    _voiceProcessor?.removeErrorListener(_onError);
    _cheetah?.delete();
    super.dispose();
  }

  @override
  Widget build(BuildContext context) {
    return Column(
      children: [
        ElevatedButton(
          onPressed: _appState == AppState.initial ? _init : null,
          child: const Text("Init"),
        ),
        ElevatedButton(
          onPressed: _appState == AppState.ready ? _start : null,
          child: const Text("Start"),
        ),
        ElevatedButton(
          onPressed: _appState == AppState.listening ? _stop : null,
          child: const Text("Stop"),
        ),
        ElevatedButton(
          onPressed: _appState == AppState.ready ? _delete : null,
          child: const Text("Delete"),
        ),
      ],
    );
  }
}

This is a simplified example that includes all the essential code to get you started. If you'd like to see a complete working app, check out the Cheetah Flutter demo on our GitHub repository.
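The widget above only prints each result. To show a running transcript in the UI, the pieces need to be stitched together. The sketch below is one possible approach in plain Dart, assuming each process() call returns only the text recognized since the previous call and that flush() returns whatever remains at an endpoint. TranscriptAccumulator is a hypothetical class introduced for illustration; it is not part of cheetah_flutter.

```dart
/// Collects incremental transcript pieces into display-ready text.
/// (Hypothetical helper, not part of cheetah_flutter.)
class TranscriptAccumulator {
  final List<String> _finished = []; // completed utterances
  final StringBuffer _current = StringBuffer(); // utterance in progress

  /// Appends newly transcribed text, e.g. from a process() result.
  void addPartial(String text) => _current.write(text);

  /// Closes the current utterance, first appending any remaining text,
  /// e.g. from a flush() result.
  void endUtterance(String flushedText) {
    _current.write(flushedText);
    final utterance = _current.toString().trim();
    if (utterance.isNotEmpty) _finished.add(utterance);
    _current.clear();
  }

  /// Finished utterances, one per line, followed by in-progress text.
  String get text {
    final inProgress = _current.toString().trim();
    return [..._finished, if (inProgress.isNotEmpty) inProgress].join('\n');
  }
}
```

In _onFrame, addPartial would take partialResult.transcript and endUtterance would take finalResult.transcript when isEndpoint is true; the text getter could then feed a Text widget inside setState.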

Batch Transcription

Streaming Speech-to-Text is ideal when you want transcripts in real time, without waiting for the audio to finish. If real-time transcription isn't critical and you need features like word-level metadata, consider Leopard Speech-to-Text instead, which is available for Flutter as the leopard_flutter package.
