Orca Streaming Text-to-Speech
iOS Quick Start
Platforms
- iOS (13.0+)
Requirements
Picovoice Account & AccessKey
Sign up or log in to Picovoice Console to get your AccessKey. Make sure to keep your AccessKey secret.
Quick Start
Setup
- Install Xcode.
- Install CocoaPods.
- Import the Orca-iOS binding by adding the following line to your Podfile (first snippet below).
- Run the following from the project directory (second snippet below).
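A one-line Podfile entry for the binding; the pod name `Orca-iOS` is taken from the import step above, so verify it against the Orca GitHub repository:

```ruby
pod 'Orca-iOS'
```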
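Then resolve the dependency with standard CocoaPods usage:

```console
pod install
```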
Model File
Orca Streaming Text-to-Speech can synthesize speech with various voices, each of which is characterized by a model file located in the Orca GitHub repository.
To add an Orca Streaming Text-to-Speech voice model file to your iOS application:
- Download an Orca Streaming Text-to-Speech voice model file from the Orca GitHub Repository.
- Add the model as a bundled resource by selecting Build Phases and adding it to the Copy Bundle Resources step.
Usage
Create an instance of the Orca Streaming Text-to-Speech engine:
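A minimal sketch of engine creation; the model file name `orca_params_en_female.pv` is an assumption, so substitute the voice model you bundled in the previous step:

```swift
import Orca

let accessKey = "${ACCESS_KEY}" // AccessKey obtained from Picovoice Console

do {
    // Resolve the voice model added under Copy Bundle Resources
    let modelPath = Bundle.main.path(
        forResource: "orca_params_en_female", // assumed model file name
        ofType: "pv")!
    let orca = try Orca(accessKey: accessKey, modelPath: modelPath)
} catch {
    // Handle initialization errors (e.g., invalid AccessKey or model path)
}
```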
Alternatively, you can provide `modelPath` as an absolute path to the model file on device.
Orca Streaming Text-to-Speech supports two modes of operation: streaming and single synthesis. In the streaming synthesis mode, Orca processes an incoming text stream in real-time and generates audio in parallel. In the single synthesis mode, a complete text is synthesized in a single call to the Orca engine.
Streaming synthesis
To synthesize a text stream, create an `Orca.OrcaStream` object and add text to it one-by-one:
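A sketch of the streaming loop, following the `streamOpen()` / `synthesize(_:)` shape shown in the Orca GitHub repository; `textGenerator()` and `playAudio(_:)` are hypothetical placeholders:

```swift
let orcaStream = try orca.streamOpen()

for textChunk in textGenerator() { // hypothetical text stream, e.g. an LLM response
    if let pcm = try orcaStream.synthesize(textChunk) {
        playAudio(pcm) // hypothetical playback helper
    }
}
```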
The `textGenerator()` function can be any stream generating text, such as an LLM response. The `Orca.OrcaStream` object buffers input text until there is enough context to generate audio. If there is not enough text to generate audio, `nil` is returned.
Once the text stream is complete, call the `flush` method to synthesize the remaining text:
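Continuing the sketch above:

```swift
if let pcm = try orcaStream.flush() {
    playAudio(pcm) // hypothetical playback helper
}
```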
When done with streaming text synthesis, the `Orca.OrcaStream` object needs to be closed:
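```swift
orcaStream.close() // releases the resources held by the stream
```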
Single synthesis
If the complete text is known before synthesis, single synthesis mode can be used to generate speech in a single call to Orca Streaming Text-to-Speech:
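A sketch of both single-synthesis calls, modeled on the examples in the Orca GitHub repository; treat the exact parameter labels (`text:`, `outputPath:`) as assumptions to verify against the repository:

```swift
// Return raw PCM samples plus word alignment metadata
let (pcm, wordArray) = try orca.synthesize(text: "${TEXT}")

// Alternatively, write the synthesized audio directly to a WAV file
let fileWordArray = try orca.synthesizeToFile(text: "${TEXT}", outputPath: "${OUTPUT_PATH}")
```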
Replace `${TEXT}` with the text to be synthesized and `${OUTPUT_PATH}` with the path to save the generated audio as a single-channel 16-bit PCM WAV file.
In single synthesis mode, Orca Streaming Text-to-Speech returns metadata of the synthesized audio in the form of an array of `OrcaWord` objects. The `OrcaWord` object has the following properties:
- Word: String representation of the word.
- Start Time: Indicates when the word started in the synthesized audio. Value is in seconds.
- End Time: Indicates when the word ended in the synthesized audio. Value is in seconds.
- Phonemes: An array of `OrcaPhoneme` objects.
The `OrcaPhoneme` object has the following properties:
- Phoneme: String representation of the phoneme.
- Start Time: Indicates when the phoneme started in the synthesized audio. Value is in seconds.
- End Time: Indicates when the phoneme ended in the synthesized audio. Value is in seconds.
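As an illustration, the alignment metadata from the single-synthesis call could be inspected like this (the Swift property names `word`, `startSec`, `endSec`, `phonemes`, and `phoneme` are assumptions derived from the lists above):

```swift
for orcaWord in wordArray {
    print("\(orcaWord.word): \(orcaWord.startSec)s - \(orcaWord.endSec)s")
    for orcaPhoneme in orcaWord.phonemes {
        print("  \(orcaPhoneme.phoneme): \(orcaPhoneme.startSec)s - \(orcaPhoneme.endSec)s")
    }
}
```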
When done, make sure to explicitly release the resources using:
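```swift
orca.delete() // frees the native resources held by the engine
```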
For more information on our Orca Streaming Text-to-Speech iOS SDK, head over to our Orca GitHub repository.
Demos
For the Orca Streaming Text-to-Speech iOS SDK, we offer a demo application that demonstrates how to use the Text-to-Speech engine.
Setup
Clone the Repository
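For reference, the clone command, assuming the public Picovoice Orca repository URL:

```console
git clone https://github.com/Picovoice/orca.git
```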
Usage
- Install dependencies, using the command shown after this list.
- Replace `let ACCESS_KEY = "${YOUR_ACCESS_KEY_HERE}"` in the ViewModel.swift file with a valid AccessKey.
- Open OrcaDemo.xcworkspace in Xcode and run the demo.
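A sketch of the dependency-install step; the demo directory path `demo/ios/OrcaDemo` is an assumption, so verify it against the repository layout:

```console
cd demo/ios/OrcaDemo
pod install
```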
For more information on our Orca demos for iOS, head over to our GitHub repository.