Orca Streaming Text-to-Speech
Android Quick Start
Platforms
- Android (5.0+, API 21+)
Requirements
- Picovoice Account and AccessKey
- Android Studio
- Android device with USB debugging enabled or an Android emulator
Picovoice Account & AccessKey
Sign up or log in to the Picovoice Console to get your AccessKey. Make sure to keep your AccessKey secret.
Quick Start
Setup
Install Android Studio.
Include the `mavenCentral()` repository in the top-level `build.gradle`. Then add the following to the app's `build.gradle`:
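The dependency below is a sketch; confirm the exact artifact coordinates and latest version of `ai.picovoice:orca-android` on Maven Central or in the Orca repository before adding it.

```groovy
dependencies {
    // Orca Streaming Text-to-Speech SDK (replace ${LATEST_VERSION} with the current release)
    implementation 'ai.picovoice:orca-android:${LATEST_VERSION}'
}
```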
Add the following to the app's `AndroidManifest.xml` file to enable `AccessKey` validation:
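A minimal sketch of the expected entry, assuming the internet permission is all that `AccessKey` validation requires:

```xml
<uses-permission android:name="android.permission.INTERNET" />
```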
Model File
Orca Streaming Text-to-Speech can synthesize speech with various voices, each of which is characterized by a model file located in the Orca GitHub repository.
To add an Orca Streaming Text-to-Speech voice model file to your Android application:
- Download an Orca Streaming Text-to-Speech voice model file from the Orca GitHub Repository.
- Add the model as a bundled resource by placing it under the `${ANDROID_APP}/src/main/assets` directory of your Android project.
Usage
Create an instance of the Orca Streaming Text-to-Speech engine:
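A minimal sketch in Java, assuming the builder-style API used across Picovoice Android SDKs; `${ACCESS_KEY}` and `${MODEL_FILE_PATH}` are placeholders for your AccessKey and the bundled voice model, and `appContext` stands in for your Android application context:

```java
import ai.picovoice.orca.*;

final String accessKey = "${ACCESS_KEY}"; // AccessKey obtained from Picovoice Console

try {
    Orca orca = new Orca.Builder()
            .setAccessKey(accessKey)
            .setModelPath("${MODEL_FILE_PATH}") // e.g. a file placed under src/main/assets
            .build(appContext);                 // appContext: your Android application context
} catch (OrcaException e) {
    // handle initialization errors (invalid AccessKey, missing model file, etc.)
}
```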
Orca Streaming Text-to-Speech supports two modes of operation: streaming and single synthesis. In the streaming synthesis mode, Orca processes an incoming text stream in real-time and generates audio in parallel. In the single synthesis mode, a complete text is synthesized in a single call to the Orca engine.
Streaming synthesis
To synthesize a text stream, create an `Orca.OrcaStream` object and add text to it one-by-one:
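A hedged sketch, assuming the engine exposes a `streamOpen` method and the stream's `synthesize` call returns 16-bit PCM samples (or `null` when more text is needed); `textGenerator()` is a hypothetical source of text chunks:

```java
Orca.OrcaStream orcaStream = orca.streamOpen(new OrcaSynthesizeParams.Builder().build());

for (String textChunk : textGenerator()) { // e.g. tokens streamed from an LLM response
    short[] pcm = orcaStream.synthesize(textChunk);
    if (pcm != null) {
        // enough context was available: play or buffer the generated audio
    }
}
```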
The `textGenerator()` function can be any stream generating text, such as an LLM response. The `Orca.OrcaStream` object buffers input text until there is enough context to generate audio. If there is not enough text to generate audio, `null` is returned.
Once the text stream is complete, call the `flush` method to synthesize the remaining text:
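Continuing the sketch above:

```java
short[] flushedPcm = orcaStream.flush();
if (flushedPcm != null) {
    // play or buffer the remaining audio
}
```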
When done with streaming text synthesis, the `Orca.OrcaStream` object needs to be closed:
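For example:

```java
orcaStream.close();
```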
Single synthesis
Synthesize speech by calling one of the `synthesize` methods:
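A sketch, assuming an in-memory `synthesize` method plus a `synthesizeToFile` variant that writes a WAV file and returns the word alignment; the exact method names and return types are assumptions to verify against the SDK:

```java
// In-memory synthesis: generates the full audio for the given text
OrcaAudio audio = orca.synthesize("${TEXT}", new OrcaSynthesizeParams.Builder().build());

// File-based synthesis: writes a single-channel 16-bit PCM WAV file to ${OUTPUT_PATH}
OrcaWord[] alignment = orca.synthesizeToFile(
        "${TEXT}",
        "${OUTPUT_PATH}",
        new OrcaSynthesizeParams.Builder().build());
```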
Replace `${TEXT}` with the text to be synthesized and `${OUTPUT_PATH}` with the path to save the generated audio as a single-channel 16-bit PCM WAV file.
In single synthesis mode, Orca Streaming Text-to-Speech returns alignment metadata of the synthesized audio in the form of an array of `OrcaWord` objects. The `OrcaWord` object has the following properties:
- Word: String representation of the word.
- Start Time: Indicates when the word started in the synthesized audio. Value is in seconds.
- End Time: Indicates when the word ended in the synthesized audio. Value is in seconds.
- Phonemes: An array of `OrcaPhoneme` objects.
The `OrcaPhoneme` object has the following properties:
- Phoneme: String representation of the phoneme.
- Start Time: Indicates when the phoneme started in the synthesized audio. Value is in seconds.
- End Time: Indicates when the phoneme ended in the synthesized audio. Value is in seconds.
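As an illustration, the alignment returned by the file-based sketch above could be inspected like this; field names such as `word`, `startSec`, `endSec`, `phoneme`, and `phonemeArray` are assumptions, so check the SDK for the actual names:

```java
// Assumed field names; consult the Orca Android SDK for the exact API.
for (OrcaWord orcaWord : alignment) {
    android.util.Log.i("OrcaAlignment", String.format(
            "%s [%.2fs - %.2fs]", orcaWord.word, orcaWord.startSec, orcaWord.endSec));
    for (OrcaPhoneme orcaPhoneme : orcaWord.phonemeArray) {
        android.util.Log.i("OrcaAlignment", String.format(
                "  %s [%.2fs - %.2fs]", orcaPhoneme.phoneme, orcaPhoneme.startSec, orcaPhoneme.endSec));
    }
}
```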
When done, make sure to explicitly release the resources using:
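For example, assuming the SDK follows the `delete()` convention used by other Picovoice Android libraries:

```java
orca.delete();
```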
Demos
For the Orca Streaming Text-to-Speech Android SDK, we offer a demo application that demonstrates how to use the Orca engine.
Setup
Clone the Orca Streaming Text-to-Speech repository from GitHub using HTTPS:
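For example:

```console
git clone https://github.com/Picovoice/orca.git
```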
Usage
- Open the Android demo using Android Studio.
- Copy your `AccessKey` from Picovoice Console into the `ACCESS_KEY` variable in `MainActivity.java`.
- Run the application using a connected Android device or an Android emulator.