Orca Streaming Text-to-Speech
Android Quick Start
Platforms
- Android (5.0+, API 21+)
Requirements
- Picovoice Account and AccessKey
- Android Studio
- Android device with USB debugging enabled or an Android emulator
Picovoice Account & AccessKey
Sign up or log in to the Picovoice Console to get your AccessKey. Make sure to keep your AccessKey secret.
Quick Start
Setup
Install Android Studio.
Include the `mavenCentral()` repository in the top-level `build.gradle`. Then add the following to the app's `build.gradle`:
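The dependency below is a sketch; confirm the exact artifact coordinates and latest version of `ai.picovoice:orca-android` on Maven Central or in the Orca repository before adding it.

```groovy
dependencies {
    // Orca Streaming Text-to-Speech SDK (replace ${LATEST_VERSION} with the current release)
    implementation 'ai.picovoice:orca-android:${LATEST_VERSION}'
}
```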
Add the following to the app's `AndroidManifest.xml` file to enable `AccessKey` validation:
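A minimal sketch of the expected entry, assuming the internet permission is all that `AccessKey` validation requires:

```xml
<uses-permission android:name="android.permission.INTERNET" />
```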
Model File
Orca Streaming Text-to-Speech can synthesize speech with various voices, each of which is characterized by a model file located in the Orca GitHub repository.
To add an Orca Streaming Text-to-Speech voice model file to your Android application:
- Download an Orca Streaming Text-to-Speech voice model file from the Orca GitHub Repository.
- Add the model as a bundled resource by placing it under the `${ANDROID_APP}/src/main/assets` directory of your Android project.
Usage
Create an instance of the Orca Streaming Text-to-Speech engine:
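A minimal sketch in Java, assuming the builder-style API used across Picovoice Android SDKs; `${ACCESS_KEY}` and `${MODEL_FILE_PATH}` are placeholders for your AccessKey and the bundled voice model, and `appContext` stands in for your Android application context:

```java
import ai.picovoice.orca.*;

final String accessKey = "${ACCESS_KEY}"; // AccessKey obtained from Picovoice Console

try {
    Orca orca = new Orca.Builder()
            .setAccessKey(accessKey)
            .setModelPath("${MODEL_FILE_PATH}") // e.g. a file placed under src/main/assets
            .build(appContext);                 // appContext: your Android application context
} catch (OrcaException e) {
    // handle initialization errors (invalid AccessKey, missing model file, etc.)
}
```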
Orca Streaming Text-to-Speech supports two modes of operation: streaming and single synthesis. In the streaming synthesis mode, Orca processes an incoming text stream in real-time and generates audio in parallel. In the single synthesis mode, a complete text is synthesized in a single call to the Orca engine.
Streaming synthesis
To synthesize a text stream, create an `Orca.OrcaStream` object and add text to it one-by-one:
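A hedged sketch, assuming the engine exposes a `streamOpen` method and the stream's `synthesize` call returns 16-bit PCM samples (or `null` when more text is needed); `textGenerator()` is a hypothetical source of text chunks:

```java
Orca.OrcaStream orcaStream = orca.streamOpen(new OrcaSynthesizeParams.Builder().build());

for (String textChunk : textGenerator()) { // e.g. tokens streamed from an LLM response
    short[] pcm = orcaStream.synthesize(textChunk);
    if (pcm != null) {
        // enough context was available: play or buffer the generated audio
    }
}
```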
The `textGenerator()` function can be any stream generating text, such as an LLM response. The `Orca.OrcaStream` object buffers input text until there is enough context to generate audio. If there is not enough text to generate audio, `null` is returned.
Once the text stream is complete, call the `flush` method to synthesize the remaining text:
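Continuing the sketch above:

```java
short[] flushedPcm = orcaStream.flush();
if (flushedPcm != null) {
    // play or buffer the remaining audio
}
```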
When done with streaming text synthesis, the `Orca.OrcaStream` object needs to be closed:
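For example:

```java
orcaStream.close();
```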
Single synthesis
Synthesize speech by calling one of the `synthesize` methods:
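A sketch, assuming an in-memory `synthesize` method plus a `synthesizeToFile` variant that writes a WAV file and returns the word alignment; the exact method names and return types are assumptions to verify against the SDK:

```java
// In-memory synthesis: generates the full audio for the given text
OrcaAudio audio = orca.synthesize("${TEXT}", new OrcaSynthesizeParams.Builder().build());

// File-based synthesis: writes a single-channel 16-bit PCM WAV file to ${OUTPUT_PATH}
OrcaWord[] alignment = orca.synthesizeToFile(
        "${TEXT}",
        "${OUTPUT_PATH}",
        new OrcaSynthesizeParams.Builder().build());
```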
Replace `${TEXT}` with the text to be synthesized and `${OUTPUT_PATH}` with the path to save the generated audio as a single-channel 16-bit PCM WAV file.
In single synthesis mode, Orca Streaming Text-to-Speech returns alignment metadata of the synthesized audio in the form of an array of `OrcaWord` objects. The `OrcaWord` object has the following properties:
- Word: String representation of the word.
- Start Time: Indicates when the word started in the synthesized audio. Value is in seconds.
- End Time: Indicates when the word ended in the synthesized audio. Value is in seconds.
- Phonemes: An array of `OrcaPhoneme` objects.
The `OrcaPhoneme` object has the following properties:
- Phoneme: String representation of the phoneme.
- Start Time: Indicates when the phoneme started in the synthesized audio. Value is in seconds.
- End Time: Indicates when the phoneme ended in the synthesized audio. Value is in seconds.
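As an illustration, the alignment returned by the file-based sketch above could be inspected like this; field names such as `word`, `startSec`, `endSec`, `phoneme`, and `phonemeArray` are assumptions, so check the SDK for the actual names:

```java
// Assumed field names; consult the Orca Android SDK for the exact API.
for (OrcaWord orcaWord : alignment) {
    android.util.Log.i("OrcaAlignment", String.format(
            "%s [%.2fs - %.2fs]", orcaWord.word, orcaWord.startSec, orcaWord.endSec));
    for (OrcaPhoneme orcaPhoneme : orcaWord.phonemeArray) {
        android.util.Log.i("OrcaAlignment", String.format(
                "  %s [%.2fs - %.2fs]", orcaPhoneme.phoneme, orcaPhoneme.startSec, orcaPhoneme.endSec));
    }
}
```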
When done, make sure to explicitly release the resources using:
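For example, assuming the SDK follows the `delete()` convention used by other Picovoice Android libraries:

```java
orca.delete();
```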
Demos
For the Orca Streaming Text-to-Speech Android SDK, we offer a demo application that demonstrates how to use the Orca engine.
Setup
Clone the Orca Streaming Text-to-Speech repository from GitHub using HTTPS:
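For example:

```console
git clone https://github.com/Picovoice/orca.git
```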
Usage
- Open the Android demo using Android Studio.
- Copy your `AccessKey` from Picovoice Console into the `ACCESS_KEY` variable in `MainActivity.java`.
- Run the application using a connected Android device or an Android emulator.