Streaming Text-to-Speech (TTS) enables Android apps to generate and play audio incrementally as text arrives, which is essential for real-time voice interfaces, accessibility features, and conversational assistants.
A major limitation on Android is that the native TextToSpeech API cannot produce streaming or token-by-token audio. It requires complete text before synthesis, making it unsuitable for real-time applications like handling partial LLM outputs.
Cloud-based TTS services such as Amazon Polly, Azure TTS, ElevenLabs, and OpenAI TTS introduce additional issues: network latency, dependency on connectivity, and privacy concerns. Even the fastest cloud engines can add hundreds to thousands of milliseconds of delay, whereas on-device TTS begins synthesizing immediately and delivers audio 6.5x faster than the closest competitor (ElevenLabs).
The solution is on-device, streaming speech synthesis with Orca Streaming Text-to-Speech. This tutorial demonstrates how to build real-time speech generation on Android using Orca for voice synthesis and Android's AudioTrack API for PCM audio streaming. The approach works with any streaming text source—including live LLM output (ChatGPT, Claude, or picoLLM On-device LLM Inference) or dynamically generated content.
What you'll learn:
- Initialize an on-device TTS engine in Android
- Stream text to the TTS engine to generate real-time speech (PCM data)
- Handle PCM audio playback with AudioTrack
Key benefits for enterprise developers:
- Low-latency streaming: Audio plays as text arrives; 130 ms first-word latency
- On-device processing: Runs in environments with unreliable network connectivity
- Flexible text sources: Works with LLMs, user input, or any streaming text source
How to Build Streaming TTS on Android
Prerequisites
Before you begin, make sure you have the following:
- Android Studio
- Android device or emulator running Android 7.0 (API 24) or higher
- USB debugging enabled on your Android device
- Picovoice Account and AccessKey
1. Project Setup
This tutorial demonstrates a project built with Kotlin and Jetpack Compose, targeting Android 15 (API level 35) with a minimum supported version of Android 7.0 (API level 24).
Add Internet Permission
Include this in your AndroidManifest.xml:
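The declaration is the standard Android INTERNET permission, placed inside the manifest's root element:

```xml
<uses-permission android:name="android.permission.INTERNET" />
```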
Orca Streaming Text-to-Speech requires internet connectivity only for authenticating your AccessKey. All speech synthesis runs entirely on-device.
2. Add Orca Library and Model File
2a. Add Orca Library via Maven Central
In app/build.gradle.kts, add orca-android to your dependencies:
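Assuming a version-catalog alias named orca-android (defined in the next step), the dependency declaration would look like:

```kotlin
dependencies {
    implementation(libs.orca.android)
}
```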
Then in gradle/libs.versions.toml (replace {LATEST_VERSION}, e.g. 1.2.0):
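A sketch of the catalog entry; the Maven coordinates (group ai.picovoice, artifact orca-android) follow Picovoice's published Android packages:

```toml
[versions]
orca = "{LATEST_VERSION}"

[libraries]
orca-android = { group = "ai.picovoice", name = "orca-android", version.ref = "orca" }
```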
Execute a Gradle sync.
2b. Add Orca Model File
Orca uses model files (.pv) for different languages and voices.
- Download your desired model from the Orca GitHub repository. The filename indicates the language and gender of the speaker.
- Place the model in your Android project under:
{ANDROID_APP}/src/main/assets
3. Implement Speech Synthesis with Orca
3a. Initialize Orca
Use Orca.Builder to create an Orca instance:
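A minimal initialization sketch. {ACCESS_KEY} and {ORCA_MODEL_FILE} are placeholders for your Picovoice AccessKey and the .pv file you placed under assets; the builder calls follow the Orca Android SDK:

```kotlin
import ai.picovoice.orca.Orca
import ai.picovoice.orca.OrcaException

val orca: Orca = try {
    Orca.Builder()
        .setAccessKey("{ACCESS_KEY}")      // your Picovoice AccessKey
        .setModelPath("{ORCA_MODEL_FILE}") // .pv file under src/main/assets
        .build(applicationContext)
} catch (e: OrcaException) {
    // Initialization failed (invalid AccessKey, missing model file, etc.)
    throw e
}
```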
3b. Create OrcaStream
Open an OrcaStream object:
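A sketch of opening a stream; the speech-rate setter is shown as an example of an optional OrcaSynthesizeParams setting:

```kotlin
import ai.picovoice.orca.OrcaSynthesizeParams

val params = OrcaSynthesizeParams.Builder()
    .setSpeechRate(1.0f) // optional: 1.0 is the default rate
    .build()
val orcaStream = orca.streamOpen(params)
```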
Optionally, OrcaSynthesizeParams.Builder can be used to configure settings such as speech rate.
3c. Streaming Text to Speech
We'll simulate text streaming by looping through an array of words and passing each word to Orca one at a time:
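A sketch of the word-by-word loop, assuming pcmQueue is a thread-safe queue (e.g. ConcurrentLinkedQueue<ShortArray>) shared with the playback thread set up in step 4:

```kotlin
val words = "Streaming text to speech lets audio start before the sentence is finished".split(" ")

for (word in words) {
    // Orca buffers text until it has enough context; synthesize() may return null
    val pcm: ShortArray? = orcaStream.synthesize("$word ")
    if (pcm != null && pcm.isNotEmpty()) {
        pcmQueue.add(pcm) // hand off to the playback thread
    }
}

// Force synthesis of any text still buffered inside Orca
val remainingPcm: ShortArray? = orcaStream.flush()
if (remainingPcm != null && remainingPcm.isNotEmpty()) {
    pcmQueue.add(remainingPcm)
}
```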
Orca synthesizes speech from text incrementally using a streaming interface. Orca buffers incoming text internally until it has enough context to generate speech.
- synthesize() returns null if Orca needs more text to generate audio.
- Call flush() after passing all text to ensure that any remaining buffered text is synthesized.
- PCM audio chunks are added to a queue for playback, allowing audio to play while more text is still being synthesized.
4. Playing Synthesized Speech with AudioTrack
Once you have PCM audio chunks in a queue, you can play them using AudioTrack, which streams raw PCM audio to the device's speakers.
4a. Configure AudioTrack
Orca outputs mono, 16-bit PCM, with a sample rate of 22050 Hz, which is a common format for speech synthesis. Using AudioTrack in streaming mode allows you to play audio chunks incrementally, keeping latency low.
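A configuration sketch; the sample rate is read from the engine (22050 Hz for current Orca models), and the doubled buffer size is an assumption to give extra headroom against underruns:

```kotlin
import android.media.AudioAttributes
import android.media.AudioFormat
import android.media.AudioManager
import android.media.AudioTrack

val sampleRate = orca.sampleRate
val minBufferSize = AudioTrack.getMinBufferSize(
    sampleRate,
    AudioFormat.CHANNEL_OUT_MONO,
    AudioFormat.ENCODING_PCM_16BIT
)

val audioTrack = AudioTrack(
    AudioAttributes.Builder()
        .setUsage(AudioAttributes.USAGE_MEDIA)
        .setContentType(AudioAttributes.CONTENT_TYPE_SPEECH)
        .build(),
    AudioFormat.Builder()
        .setSampleRate(sampleRate)
        .setEncoding(AudioFormat.ENCODING_PCM_16BIT)
        .setChannelMask(AudioFormat.CHANNEL_OUT_MONO)
        .build(),
    minBufferSize * 2, // extra headroom to avoid underruns
    AudioTrack.MODE_STREAM,
    AudioManager.AUDIO_SESSION_ID_GENERATE
)
audioTrack.play()
```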
Explanation of key settings:
- ENCODING_PCM_16BIT: Matches Orca's 16-bit PCM output.
- CHANNEL_OUT_MONO: Single-channel audio for voice playback; matches Orca's mono PCM output.
- MODE_STREAM: Enables incremental writing of audio data as it's synthesized, instead of buffering everything first.
4b. Play Audio from PCM Queue
Once AudioTrack is configured, you can continuously write PCM chunks from a queue filled by Orca. Using a queue allows synthesis and playback to run simultaneously on separate threads:
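A playback-thread sketch. The isQueueing flag and 10 ms back-off are assumptions: the synthesis thread clears the flag after enqueuing the flush() output, and the short sleep avoids spinning when the queue is momentarily empty:

```kotlin
import java.util.concurrent.ConcurrentLinkedQueue
import java.util.concurrent.atomic.AtomicBoolean

val pcmQueue = ConcurrentLinkedQueue<ShortArray>()
val isQueueing = AtomicBoolean(true) // cleared by the synthesis thread after flush()

val playbackThread = Thread {
    while (isQueueing.get() || pcmQueue.isNotEmpty()) {
        val pcm = pcmQueue.poll()
        if (pcm != null) {
            // Blocking write streams the chunk straight to the audio hardware
            audioTrack.write(pcm, 0, pcm.size)
        } else {
            Thread.sleep(10) // nothing queued yet; back off briefly
        }
    }
}
playbackThread.start()
```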
Key points:
- isQueueing.get(): Ensures playback continues while new audio chunks are being synthesized.
- pcmQueue.poll(): Fetches the next available PCM chunk for immediate playback.
- audioTrack.write(): Streams PCM data directly to the audio hardware.
5. Stop & Clean Up Resources
When done, always clean up resources to free memory:
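A teardown sketch matching the objects created above; the ordering (drain playback, then release the track, then close the stream before deleting the engine) is the assumption here:

```kotlin
isQueueing.set(false) // let the playback thread drain and exit
playbackThread.join()

audioTrack.stop()
audioTrack.release()

orcaStream.close() // release the stream before the engine
orca.delete()
```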
Complete Example: Android Streaming TTS
Below is a simplified but complete example demonstrating:
- State handling (Initial, Loading, Ready, Streaming)
- Buttons to initialize Orca, stream text, and stop/cleanup
- Multithreaded PCM synthesis and playback
Replace {ORCA_MODEL_FILE} with your model file (.pv) and {ACCESS_KEY} with your Picovoice AccessKey.
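Below is a condensed sketch of such an activity, not the full demo: error handling and audio-focus handling are omitted, the sample sentence is illustrative, and the placeholders are left for you to fill in:

```kotlin
import android.media.*
import android.os.Bundle
import androidx.activity.ComponentActivity
import androidx.activity.compose.setContent
import androidx.compose.foundation.layout.Column
import androidx.compose.material3.Button
import androidx.compose.material3.Text
import androidx.compose.runtime.*
import ai.picovoice.orca.Orca
import ai.picovoice.orca.OrcaSynthesizeParams
import java.util.concurrent.ConcurrentLinkedQueue
import java.util.concurrent.atomic.AtomicBoolean

enum class UiState { Initial, Loading, Ready, Streaming }

class MainActivity : ComponentActivity() {
    private var orca: Orca? = null
    private val pcmQueue = ConcurrentLinkedQueue<ShortArray>()
    private val isQueueing = AtomicBoolean(false)

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContent {
            var state by remember { mutableStateOf(UiState.Initial) }
            Column {
                Button(enabled = state == UiState.Initial, onClick = {
                    state = UiState.Loading
                    Thread {
                        orca = Orca.Builder()
                            .setAccessKey("{ACCESS_KEY}")
                            .setModelPath("{ORCA_MODEL_FILE}")
                            .build(applicationContext)
                        runOnUiThread { state = UiState.Ready }
                    }.start()
                }) { Text("Initialize") }

                Button(enabled = state == UiState.Ready, onClick = {
                    state = UiState.Streaming
                    streamAndPlay { state = UiState.Ready }
                }) { Text("Stream Text") }

                Button(onClick = {
                    isQueueing.set(false)
                    orca?.delete()
                    orca = null
                    state = UiState.Initial
                }) { Text("Stop & Clean Up") }
            }
        }
    }

    private fun streamAndPlay(onDone: () -> Unit) {
        val engine = orca ?: return
        isQueueing.set(true)

        // Synthesis thread: feed words to Orca one at a time
        Thread {
            val stream = engine.streamOpen(OrcaSynthesizeParams.Builder().build())
            val words = "Hello! This audio is synthesized on-device, word by word.".split(" ")
            for (word in words) {
                stream.synthesize("$word ")?.let { if (it.isNotEmpty()) pcmQueue.add(it) }
            }
            stream.flush()?.let { if (it.isNotEmpty()) pcmQueue.add(it) }
            stream.close()
            isQueueing.set(false)
        }.start()

        // Playback thread: drain the PCM queue into AudioTrack
        Thread {
            val minBuf = AudioTrack.getMinBufferSize(
                engine.sampleRate, AudioFormat.CHANNEL_OUT_MONO, AudioFormat.ENCODING_PCM_16BIT
            )
            val track = AudioTrack(
                AudioAttributes.Builder().setUsage(AudioAttributes.USAGE_MEDIA)
                    .setContentType(AudioAttributes.CONTENT_TYPE_SPEECH).build(),
                AudioFormat.Builder().setSampleRate(engine.sampleRate)
                    .setEncoding(AudioFormat.ENCODING_PCM_16BIT)
                    .setChannelMask(AudioFormat.CHANNEL_OUT_MONO).build(),
                minBuf * 2, AudioTrack.MODE_STREAM, AudioManager.AUDIO_SESSION_ID_GENERATE
            )
            track.play()
            while (isQueueing.get() || pcmQueue.isNotEmpty()) {
                pcmQueue.poll()?.let { track.write(it, 0, it.size) } ?: Thread.sleep(10)
            }
            track.stop(); track.release()
            runOnUiThread(onDone)
        }.start()
    }
}
```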
For a complete Android application, see the Orca Streaming Text-to-Speech Android demo on GitHub.
This tutorial uses the following package:
Explore our documentation for more details:
Troubleshooting
- Initialization fails: Ensure the model file exists in assets and is copied to internal storage.
- No audio output: Verify your device's volume and audio routing, and confirm that the AudioTrack sample rate and channel configuration match Orca's output (mono, 16-bit PCM at a sample rate of 22050 Hz).
- Latency or gaps in streaming: Use proper queue management, pass text chunks incrementally as they become available, and call flush() when the stream completes.
Next Steps
Optimize Streaming TTS for Production Android Applications
- Permissions: If your app targets Android 12 (API 31) or later, review runtime permission requests carefully. While the current example only requires INTERNET for authentication, additional network or audio features may require dynamic permission handling.
- Audio focus: To avoid conflicts with other audio apps, request audio focus when playing TTS. Consider handling focus loss gracefully (pause/resume) for a better user experience.
- Threading and lifecycle management: When streaming is done, cancel background threads and clean up Orca and AudioTrack to prevent memory leaks or audio glitches.
- Error handling: For production, display user-friendly messages when initialization fails or streaming errors occur.
Further Improvements
Once you have streaming Text-to-Speech implemented, consider building a complete voice AI assistant for Android by integrating:
- Cheetah Streaming Speech-to-Text: for real-time, on-device speech-to-text
- picoLLM On-Device LLM Inference: for on-device LLM inference, enabling live text generation for conversational experiences
With Orca, Cheetah, and picoLLM, you can implement fully on-device voice AI that streams LLM output to TTS with minimal latency, offering a secure, private, and responsive solution suitable for enterprise Android apps.