Enterprise developers building real-time speech recognition in C face challenges with audio pipeline complexity, memory management, and cross-platform compatibility. Streaming speech-to-text (STT) solutions must also work reliably on Linux, Windows, macOS, and Raspberry Pi while maintaining minimal latency.

Cloud-based services like Azure Real-Time STT, Amazon Transcribe Streaming, and Google Streaming ASR require constant internet connectivity and send audio data to remote servers. For applications that handle sensitive audio or run in environments with unreliable internet connectivity, on-device speech recognition is essential.

This tutorial shows how to implement cross-platform streaming speech-to-text in C using Cheetah Streaming Speech-to-Text, an on-device engine compatible with Linux, Windows, macOS, and Raspberry Pi. You'll learn to capture live microphone input, process audio frames in real time, and generate accurate transcriptions.

By the end, you'll have a working C application that performs real-time transcription across all major platforms from a single codebase.

Important: This guide builds on How to Record Audio in C. If you haven't completed that setup yet, start with that tutorial to get your recording environment in place.

Prerequisites

  • C99-compatible compiler
  • Windows: MinGW

Supported Platforms

  • Linux (x86_64)
  • macOS (x86_64, arm64)
  • Windows (x86_64, arm64)
  • Raspberry Pi (3, 4, 5)

Project Setup

This is the folder structure used in this tutorial. You can organize your files differently if you like, but make sure to update the paths in the examples accordingly:
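
The layout below is one possible arrangement; folder and file names are illustrative, and the Cheetah header, library, and model files come from the downloads in Step 1:

    cheetah_tutorial/
    ├── cheetah/
    │   ├── include/            (Cheetah header files, e.g. pv_cheetah.h, picovoice.h)
    │   ├── lib/                (platform-specific Cheetah library, e.g. libpv_cheetah.so)
    │   └── cheetah_params.pv   (Cheetah model file)
    ├── pvrecorder/             (PvRecorder headers and library from the recording tutorial)
    └── cheetah_tutorial.c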

To set up audio capture (pvrecorder), refer to How to Record Audio in C.

Step 1. Add Cheetah library files

  1. Create a folder named cheetah/.
  2. Download the Cheetah header files from GitHub and place them in:
  3. Download a Cheetah model file and the correct library file for your platform and place them in:

If your application needs to recognize custom vocabulary, boost recognition of specific phrases, or handle custom pronunciations, train a custom STT model instead of using one of the default model files.

Implement Dynamic Loading

Cheetah ships as pre-built platform libraries, which means:

  • the shared library (.so, .dylib, .dll) is not linked at compile time
  • the program loads it at runtime
  • functions must be retrieved by name

So, we need to write small helper functions to:

  1. open the shared library
  2. look up function pointers
  3. close the library

Step 2. Include platform-specific headers

Why these matter

  • On Windows systems, windows.h provides the LoadLibrary function to load a shared library and GetProcAddress to retrieve individual function pointers.
  • On Unix-based systems, dlopen and dlsym from the dlfcn.h header provide the same functionality.
  • Lastly, signal.h allows us to handle Ctrl-C later in this example.
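
A sketch of the top of cheetah_tutorial.c, with the platform switch handled by the preprocessor (the Cheetah and PvRecorder headers are the ones added to the project in Step 1 and the recording tutorial):

    #include <signal.h>
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    #if defined(_WIN32) || defined(_WIN64)
    #include <windows.h>   // LoadLibrary, GetProcAddress, FreeLibrary
    #else
    #include <dlfcn.h>     // dlopen, dlsym, dlclose
    #endif

    #include "pv_cheetah.h"    // Cheetah types, status codes, and function declarations
    #include "pv_recorder.h"   // PvRecorder (microphone capture)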

Step 3. Define dynamic loading helper functions

3a. Open the shared library
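
A minimal sketch, assuming the headers from Step 2 are included:

    // open a shared library at runtime; returns an opaque handle or NULL on failure
    static void *open_dl(const char *dl_path) {
    #if defined(_WIN32) || defined(_WIN64)
        return (void *) LoadLibrary(dl_path);
    #else
        return dlopen(dl_path, RTLD_NOW);
    #endif
    }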

3b. Load function symbols
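
The same pattern works for symbol lookup, using GetProcAddress on Windows and dlsym elsewhere:

    // look up a function by name in an already-opened shared library
    static void *load_symbol(void *dl, const char *symbol_name) {
    #if defined(_WIN32) || defined(_WIN64)
        return (void *) GetProcAddress((HMODULE) dl, symbol_name);
    #else
        return dlsym(dl, symbol_name);
    #endif
    }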

3c. Close the library
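
A sketch of the matching cleanup helper:

    // release the shared library handle when the program no longer needs it
    static void close_dl(void *dl) {
    #if defined(_WIN32) || defined(_WIN64)
        FreeLibrary((HMODULE) dl);
    #else
        dlclose(dl);
    #endif
    }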

3d. Print platform-correct errors
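
And a sketch that reports the most recent loader error on each platform:

    // print the most recent dynamic-loading error in a platform-appropriate way
    static void print_dl_error(const char *message) {
    #if defined(_WIN32) || defined(_WIN64)
        fprintf(stderr, "%s with code '%lu'.\n", message, GetLastError());
    #else
        fprintf(stderr, "%s with '%s'.\n", message, dlerror());
    #endif
    }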

Implement Streaming Speech-to-Text

Now that we've set up dynamic loading, we can actually use the Cheetah API.

Step 4. Load the library file

Download the correct library file for your platform and point library_path to the file.
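
For example, using the helpers from Step 3 (the path below is a placeholder for a Linux x86_64 setup; use the .dylib on macOS and the .dll on Windows):

    const char *library_path = "cheetah/lib/libpv_cheetah.so";  // placeholder path

    void *cheetah_dl = open_dl(library_path);
    if (!cheetah_dl) {
        print_dl_error("Failed to open the Cheetah library");
        exit(EXIT_FAILURE);
    }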

Step 5. Initialize Cheetah

  1. Sign up for a free account on Picovoice Console and obtain your AccessKey
  2. Replace ${ACCESS_KEY} with your AccessKey
  3. Download a model file and point model_path to the file. You can choose between default and fast models for each supported language.

Call pv_cheetah_init to create a Cheetah instance:
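
A minimal sketch, assuming cheetah_dl was opened as in Step 4; the function-pointer signature mirrors the declaration in pv_cheetah.h (verify against your copy of the header), and the model path is a placeholder:

    // resolve pv_cheetah_init by name from the loaded library
    typedef pv_status_t (*pv_cheetah_init_fn)(
            const char *access_key,
            const char *model_path,
            float endpoint_duration_sec,
            bool enable_automatic_punctuation,
            pv_cheetah_t **object);

    pv_cheetah_init_fn pv_cheetah_init_func =
            (pv_cheetah_init_fn) load_symbol(cheetah_dl, "pv_cheetah_init");
    if (!pv_cheetah_init_func) {
        print_dl_error("Failed to load 'pv_cheetah_init'");
        exit(EXIT_FAILURE);
    }

    const char *access_key = "${ACCESS_KEY}";              // your Picovoice Console AccessKey
    const char *model_path = "cheetah/cheetah_params.pv";  // placeholder -- point at your .pv file

    pv_cheetah_t *cheetah = NULL;
    pv_status_t status = pv_cheetah_init_func(
            access_key,
            model_path,
            1.f,     // endpoint_duration_sec
            true,    // enable_automatic_punctuation
            &cheetah);
    if (status != PV_STATUS_SUCCESS) {
        fprintf(stderr, "Failed to initialize Cheetah.\n");
        exit(EXIT_FAILURE);
    }

The remaining Cheetah and PvRecorder functions used below are resolved with load_symbol in exactly the same way.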

Explanation of parameters:

  • access_key: Picovoice Console AccessKey
  • model_path: Choose desired language model or train a custom model
  • endpoint_duration_sec: Duration of the endpoint in seconds. A speech endpoint is detected when an utterance is followed by a segment of audio of this duration that contains no speech. Set to 0 to disable endpoint detection.
  • enable_automatic_punctuation: Set to true to enable automatic punctuation insertion.

Step 6. Transcribe audio

Pass recorded audio frames (with PvRecorder) to Cheetah for processing:
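
A sketch of the processing loop. It assumes recorder is a started PvRecorder instance (see the recording tutorial), is_interrupted is a flag set by a SIGINT handler, and the pv_cheetah_*_func and pv_recorder_read_func pointers have been resolved with load_symbol just like pv_cheetah_init; transcript strings are released with the transcript-delete function declared in pv_cheetah.h:

    const int32_t frame_length = pv_cheetah_frame_length_func();  // samples per frame
    int16_t *pcm = malloc(frame_length * sizeof(int16_t));

    while (!is_interrupted) {
        // read one frame of audio from the microphone (PvRecorder)
        pv_recorder_read_func(recorder, pcm);

        // feed the frame to Cheetah; a partial transcript is returned once enough context exists
        char *partial_transcript = NULL;
        bool is_endpoint = false;
        pv_cheetah_process_func(cheetah, pcm, &partial_transcript, &is_endpoint);
        if (partial_transcript) {
            printf("%s", partial_transcript);
            fflush(stdout);
            pv_cheetah_transcript_delete_func(partial_transcript);
        }

        // on an endpoint, flush buffered audio to get the final piece of the utterance
        if (is_endpoint) {
            char *remaining_transcript = NULL;
            pv_cheetah_flush_func(cheetah, &remaining_transcript);
            printf("%s\n", remaining_transcript);
            pv_cheetah_transcript_delete_func(remaining_transcript);
        }
    }

    // after the loop, flush once more so no buffered audio is lost
    char *final_transcript = NULL;
    pv_cheetah_flush_func(cheetah, &final_transcript);
    printf("%s\n", final_transcript);
    pv_cheetah_transcript_delete_func(final_transcript);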

Explanation:

  • pv_cheetah_frame_length: Required number of samples per frame.
  • pv_cheetah_process: Buffers audio until sufficient context is available, then returns a partial transcript; otherwise, it returns NULL.
    • is_endpoint: Indicates a natural pause in speech, marking a possible end of an utterance.
  • pv_cheetah_flush: Transcribes any remaining buffered audio.

Step 7. Cleanup

When done, delete Cheetah to free memory:
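
For example, using the objects and function pointers from the previous steps:

    pv_cheetah_delete_func(cheetah);  // release the Cheetah instance
    free(pcm);                        // release the audio frame buffer
    close_dl(cheetah_dl);             // unload the shared library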

Complete Example: On-device Streaming Transcription in C

Here is the complete cheetah_tutorial.c you can copy, build, and run (complete with PvRecorder). Before building:

  • Replace ${ACCESS_KEY} with your AccessKey from Picovoice Console
  • Update model_path to point to the Cheetah model file (.pv)
  • Update library_path to point to the correct Cheetah library for your platform
  • Update pv_recorder_library_path to point to the correct PvRecorder library for your platform
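
The following is a condensed sketch of how the pieces above fit together, not the verbatim file: the dynamic-loading helpers from Step 3 and the per-function load_symbol resolutions are elided into comments, the paths are placeholders, and the PvRecorder initialization is assumed to follow the recording tutorial:

    #include <signal.h>
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    #if defined(_WIN32) || defined(_WIN64)
    #include <windows.h>
    #else
    #include <dlfcn.h>
    #endif

    #include "pv_cheetah.h"
    #include "pv_recorder.h"

    /* open_dl, load_symbol, close_dl, print_dl_error from Step 3 go here */

    static volatile bool is_interrupted = false;

    static void handle_interrupt(int signum) {
        (void) signum;
        is_interrupted = true;
    }

    int main(void) {
        signal(SIGINT, handle_interrupt);

        // load the Cheetah and PvRecorder shared libraries (placeholder paths)
        void *cheetah_dl = open_dl("cheetah/lib/libpv_cheetah.so");
        void *recorder_dl = open_dl("pvrecorder/lib/libpv_recorder.so");
        if (!cheetah_dl || !recorder_dl) {
            print_dl_error("Failed to open a shared library");
            return 1;
        }

        /* resolve the pv_cheetah_*_func and pv_recorder_*_func pointers
           with load_symbol, as shown in Step 5 */

        // initialize Cheetah
        pv_cheetah_t *cheetah = NULL;
        pv_status_t status = pv_cheetah_init_func(
                "${ACCESS_KEY}",              // Picovoice Console AccessKey
                "cheetah/cheetah_params.pv",  // placeholder model path
                1.f,                          // endpoint_duration_sec
                true,                         // enable_automatic_punctuation
                &cheetah);
        if (status != PV_STATUS_SUCCESS) {
            fprintf(stderr, "Failed to initialize Cheetah.\n");
            return 1;
        }

        // initialize and start PvRecorder (see "How to Record Audio in C")
        pv_recorder_t *recorder = NULL;
        /* pv_recorder_init_func(...); pv_recorder_start_func(recorder); */

        // stream microphone audio to Cheetah until Ctrl-C
        const int32_t frame_length = pv_cheetah_frame_length_func();
        int16_t *pcm = malloc(frame_length * sizeof(int16_t));
        while (!is_interrupted) {
            pv_recorder_read_func(recorder, pcm);

            char *transcript = NULL;
            bool is_endpoint = false;
            pv_cheetah_process_func(cheetah, pcm, &transcript, &is_endpoint);
            if (transcript) {
                printf("%s", transcript);
                fflush(stdout);
                pv_cheetah_transcript_delete_func(transcript);
            }
            if (is_endpoint) {
                char *remaining = NULL;
                pv_cheetah_flush_func(cheetah, &remaining);
                printf("%s\n", remaining);
                pv_cheetah_transcript_delete_func(remaining);
            }
        }

        // flush any remaining buffered audio before shutting down
        char *final_transcript = NULL;
        pv_cheetah_flush_func(cheetah, &final_transcript);
        printf("%s\n", final_transcript);
        pv_cheetah_transcript_delete_func(final_transcript);

        // cleanup
        free(pcm);
        /* pv_recorder_stop_func(recorder); pv_recorder_delete_func(recorder); */
        pv_cheetah_delete_func(cheetah);
        close_dl(recorder_dl);
        close_dl(cheetah_dl);
        return 0;
    }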

This is a simplified example but includes all the necessary components to get started. Check out the Cheetah C demo on GitHub for a complete demo application.

Build & Run

Build and run the application:
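
The commands below are examples: the include paths assume the folder structure shown earlier, and the output name cheetah_tutorial is arbitrary. Adjust them to match your project layout.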

Linux (gcc) and Raspberry Pi (gcc)
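
    gcc -std=c99 -O3 -o cheetah_tutorial -I cheetah/include -I pvrecorder/include cheetah_tutorial.c -ldl
    ./cheetah_tutorial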

macOS (clang)
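
    clang -std=c99 -O3 -o cheetah_tutorial -I cheetah/include -I pvrecorder/include cheetah_tutorial.c
    ./cheetah_tutorial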

Windows (MinGW)
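
    gcc -std=c99 -O3 -o cheetah_tutorial.exe -I cheetah/include -I pvrecorder/include cheetah_tutorial.c
    cheetah_tutorial.exe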


Troubleshooting Common Issues

1. Speech-to-Text Returns Silence or No Transcription

Make sure you're capturing audio from the correct microphone. If you're using PvRecorder, check that it's set up properly before proceeding.

2. Partial Words or Truncated Transcriptions

If words appear cut off or transcriptions seem incomplete, you may be terminating the audio stream before all buffered audio has been processed. Cheetah Streaming Speech-to-Text maintains an internal buffer to ensure accurate context-based recognition.

Solution: Always call pv_cheetah_flush after you've finished streaming audio. This function processes any remaining buffered audio and returns the final transcript segment.

3. Increase Transcription Speed

If transcription is not as fast as your application needs with the default model, the model variant is the place to look.

Solution: Switch to a fast model variant designed for lower latency. Fast models process audio more quickly with a minor reduction in accuracy—typically acceptable for real-time applications where responsiveness is critical.

4. Library Initialization Fails on Target Platform

If Cheetah fails to initialize, you may be using an incorrect library binary for your system architecture.

Solution: Download the correct library file for your specific platform and architecture combination (e.g., Linux x86_64, macOS ARM64, Windows x86_64, Raspberry Pi). The library file extension varies by platform: .so (Linux), .dylib (macOS), .dll (Windows).


Frequently Asked Questions

Can I use multiple STT engines simultaneously in C?
Yes. You can run each engine on its own thread with separate audio buffers. Ensure proper synchronization to avoid race conditions.
What is the ideal audio frame size for streaming STT in C?
Frame sizes of 256–1024 samples are common; Picovoice engines typically require a frame size of 512. Smaller frames reduce latency but increase CPU usage; larger frames reduce CPU load but increase latency.
How do I compile for cross-platform deployment from a single codebase?
The code in this tutorial is already cross-platform. Use conditional compilation directives (e.g. "#if defined(_WIN32)") to handle platform-specific library loading. Compile with the appropriate compiler for each target platform: gcc for Linux and Raspberry Pi, clang for macOS, MinGW for Windows.
How do I handle microphone input on different platforms?
Use a cross-platform library like PvRecorder. It abstracts away platform-specific APIs and provides a consistent interface for capturing live audio from microphones on Linux, Windows, macOS, and Raspberry Pi.