Orca Streaming Text-to-Speech
iOS Quick Start
Platforms
- iOS (13.0+)
Requirements
Picovoice Account & AccessKey
Sign up or log in to Picovoice Console to get your AccessKey. Make sure to keep your AccessKey secret.
Quick Start
Setup
- Install Xcode.
- Install CocoaPods.
- Import the Orca-iOS binding by adding the following line to your Podfile (first snippet below).
- Run the following from the project directory (second snippet below).
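A one-line Podfile entry for the binding; the pod name `Orca-iOS` is taken from the import step above, so verify it against the Orca GitHub repository:

```ruby
pod 'Orca-iOS'
```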
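Then resolve the dependency with standard CocoaPods usage:

```console
pod install
```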
Model File
Orca Streaming Text-to-Speech can synthesize speech with various voices, each of which is characterized by a model file located in the Orca GitHub repository.
To add an Orca Streaming Text-to-Speech voice model file to your iOS application:
- Download an Orca Streaming Text-to-Speech voice model file from the Orca GitHub Repository.
- Add the model as a bundled resource by selecting Build Phases and adding it to the Copy Bundle Resources step.
Usage
Create an instance of the Orca Streaming Text-to-Speech engine:
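A minimal sketch of engine creation; the model file name `orca_params_en_female.pv` is an assumption, so substitute the voice model you bundled in the previous step:

```swift
import Orca

let accessKey = "${ACCESS_KEY}" // AccessKey obtained from Picovoice Console

do {
    // Resolve the voice model added under Copy Bundle Resources
    let modelPath = Bundle.main.path(
        forResource: "orca_params_en_female", // assumed model file name
        ofType: "pv")!
    let orca = try Orca(accessKey: accessKey, modelPath: modelPath)
} catch {
    // Handle initialization errors (e.g., invalid AccessKey or model path)
}
```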
Alternatively, you can provide `modelPath` as an absolute path to the model file on device.
Orca Streaming Text-to-Speech supports two modes of operation: streaming and single synthesis. In the streaming synthesis mode, Orca processes an incoming text stream in real-time and generates audio in parallel. In the single synthesis mode, a complete text is synthesized in a single call to the Orca engine.
Streaming synthesis
To synthesize a text stream, create an `Orca.OrcaStream` object and add text to it one-by-one:
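A sketch of the streaming loop, following the `streamOpen()` / `synthesize(_:)` shape shown in the Orca GitHub repository; `textGenerator()` and `playAudio(_:)` are hypothetical placeholders:

```swift
let orcaStream = try orca.streamOpen()

for textChunk in textGenerator() { // hypothetical text stream, e.g. an LLM response
    if let pcm = try orcaStream.synthesize(textChunk) {
        playAudio(pcm) // hypothetical playback helper
    }
}
```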
The `textGenerator()` function can be any stream generating text, such as an LLM response. The `Orca.OrcaStream` object buffers input text until there is enough context to generate audio. If there is not enough text to generate audio, `nil` is returned.
Once the text stream is complete, call the `flush` method to synthesize the remaining text:
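Continuing the sketch above:

```swift
if let pcm = try orcaStream.flush() {
    playAudio(pcm) // hypothetical playback helper
}
```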
When done with streaming text synthesis, the `Orca.OrcaStream` object needs to be closed:
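```swift
orcaStream.close() // releases the resources held by the stream
```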
Single synthesis
If the complete text is known before synthesis, single synthesis mode can be used to generate speech in a single call to Orca Streaming Text-to-Speech:
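A sketch of both single-synthesis calls, modeled on the examples in the Orca GitHub repository; treat the exact parameter labels (`text:`, `outputPath:`) as assumptions to verify against the repository:

```swift
// Return raw PCM samples plus word alignment metadata
let (pcm, wordArray) = try orca.synthesize(text: "${TEXT}")

// Alternatively, write the synthesized audio directly to a WAV file
let fileWordArray = try orca.synthesizeToFile(text: "${TEXT}", outputPath: "${OUTPUT_PATH}")
```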
Replace `${TEXT}` with the text to be synthesized and `${OUTPUT_PATH}` with the path to save the generated audio as a single-channel 16-bit PCM WAV file.
In single synthesis mode, Orca Streaming Text-to-Speech returns metadata of the synthesized audio in the form of an array of `OrcaWord` objects. The `OrcaWord` object has the following properties:
- Word: String representation of the word.
- Start Time: Indicates when the word started in the synthesized audio. Value is in seconds.
- End Time: Indicates when the word ended in the synthesized audio. Value is in seconds.
- Phonemes: An array of `OrcaPhoneme` objects.
The `OrcaPhoneme` object has the following properties:
- Phoneme: String representation of the phoneme.
- Start Time: Indicates when the phoneme started in the synthesized audio. Value is in seconds.
- End Time: Indicates when the phoneme ended in the synthesized audio. Value is in seconds.
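As an illustration, the alignment metadata from the single-synthesis call could be inspected like this (the Swift property names `word`, `startSec`, `endSec`, `phonemes`, and `phoneme` are assumptions derived from the lists above):

```swift
for orcaWord in wordArray {
    print("\(orcaWord.word): \(orcaWord.startSec)s - \(orcaWord.endSec)s")
    for orcaPhoneme in orcaWord.phonemes {
        print("  \(orcaPhoneme.phoneme): \(orcaPhoneme.startSec)s - \(orcaPhoneme.endSec)s")
    }
}
```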
When done, make sure to explicitly release the resources using:
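```swift
orca.delete() // frees the native resources held by the engine
```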
For more information on our Orca Streaming Text-to-Speech iOS SDK, head over to our Orca GitHub repository.
Demos
For the Orca Streaming Text-to-Speech iOS SDK, we offer a demo application that demonstrates how to use the Text-to-Speech engine.
Setup
Clone the Repository
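For reference, the clone command, assuming the public Picovoice Orca repository URL:

```console
git clone https://github.com/Picovoice/orca.git
```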
Usage
- Install dependencies, using the command shown after this list.
- Replace `let ACCESS_KEY = "${YOUR_ACCESS_KEY_HERE}"` in the ViewModel.swift file with a valid AccessKey.
- Open OrcaDemo.xcworkspace in Xcode and run the demo.
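A sketch of the dependency-install step; the demo directory path `demo/ios/OrcaDemo` is an assumption, so verify it against the repository layout:

```console
cd demo/ios/OrcaDemo
pod install
```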
For more information on our Orca demos for iOS, head over to our GitHub repository.