Orca Streaming Text-to-Speech
Node.js API
API Reference for the Orca Node.js SDK (npm)
Orca
Class for the Orca Streaming Text-to-Speech engine.
Orca can be initialized using the class constructor().
Resources should be released with the release() method when you are done.
Orca.constructor()
Orca constructor.
Parameters
accessKey string : AccessKey obtained from Picovoice Console.
options OrcaOptions : Optional configuration arguments:
  modelPath string : Path to the file containing model parameters (.pv).
  libraryPath string : Path to the Orca dynamic library (.node).
Returns
Orca : An instance of the Orca engine.
Orca.release()
Releases resources acquired by Orca.
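The constructor and release() pair up naturally in a try/finally block. The following is a minimal sketch, not part of the SDK: withOrca and createEngine are hypothetical names, and the engine is passed in through a factory so the pattern is visible without a real AccessKey.

```javascript
// In a real project the class would come from the SDK, for example:
//   const { Orca } = require("@picovoice/orca-node");
// `createEngine` is a hypothetical factory such as () => new Orca(accessKey).

function withOrca(createEngine, useEngine) {
  const orca = createEngine();
  try {
    // Run the caller's work while the engine is alive.
    return useEngine(orca);
  } finally {
    // Always free the engine's resources, even if useEngine throws.
    orca.release();
  }
}
```

A call might look like withOrca(() => new Orca(accessKey), (orca) => orca.version), where accessKey is your own key from Picovoice Console.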
Orca.version
Getter for version.
Returns
string : Current Orca version.
Orca.validCharacters
Getter for the valid characters accepted as input to the synthesize functions.
Returns
string[] : Valid characters accepted as input to the synthesize functions.
Orca.sampleRate
Getter for the audio sample rate of the synthesized speech.
Returns
number : Audio sample rate of the synthesized speech.
Orca.maxCharacterLimit
Getter for the maximum number of characters allowed in a single synthesis request.
Returns
number : Maximum number of characters allowed in a single synthesis request.
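The validCharacters and maxCharacterLimit getters can be used to pre-check text before a synthesis call. The helper below is a rough sketch under stated assumptions: checkText is a hypothetical name, each entry of validCharacters is assumed to be a single character, and length is measured with JavaScript's string length, which may not match how Orca itself counts characters.

```javascript
// Hypothetical pre-check; `orca` is any object exposing the documented
// validCharacters (string[]) and maxCharacterLimit (number) getters.
function checkText(orca, text) {
  const allowed = new Set(orca.validCharacters);
  // Collect each distinct character of the text that is not allowed.
  const invalidCharacters = [...new Set(text)].filter((ch) => !allowed.has(ch));
  return {
    // Assumption: JavaScript string length approximates Orca's count.
    tooLong: text.length > orca.maxCharacterLimit,
    invalidCharacters,
  };
}
```

An empty invalidCharacters array together with tooLong === false suggests the text is likely acceptable; the engine remains the final authority.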
Orca.synthesize()
Generates audio from text. The returned audio contains the speech representation of the text.
To save the synthesized speech to a file, consider using Orca.synthesizeToFile().
Parameters
text string : Text to be converted to audio. The maximum number of characters per call is Orca.maxCharacterLimit. Allowed characters can be retrieved via Orca.validCharacters. Custom pronunciations can be embedded in the text via the syntax "{word|pronunciation}". The pronunciation is expressed in ARPAbet phonemes, for example: "{read|R IY D} this as {read|R EH D}".
synthesizeParams OrcaSynthesizeParams : Optional configuration arguments.
  speechRate number : Speed of generated speech. Valid values are within [0.7, 1.3]. Higher values produce faster speech and lower values produce slower speech. The default is 1.0.
  randomState number : Random seed for the synthesis process. This can be used to ensure that the synthesized speech is deterministic across different runs. Valid values are all non-negative integers. If not provided, a random seed will be chosen and the synthesis process will be non-deterministic.
Returns
OrcaSynthesizeResult : An object containing the generated audio as a sequence of 16-bit linearly-encoded integers and an array of OrcaAlignment objects representing the word alignments.
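As an illustration of how the result of Orca.synthesize() relates to Orca.sampleRate, the sketch below computes the audio duration from the returned samples. synthesizeWithDuration is a hypothetical helper, the parameter values are arbitrary, and the duration formula assumes single-channel audio.

```javascript
// Hypothetical helper; `orca` is an initialized Orca instance.
function synthesizeWithDuration(orca, text) {
  // speechRate and randomState are the documented optional parameters;
  // the values chosen here are arbitrary illustrations.
  const params = { speechRate: 1.1, randomState: 42 };
  const { pcm, alignments } = orca.synthesize(text, params);
  // Assuming mono audio: one element per sample, so samples / rate = seconds.
  const durationSec = pcm.length / orca.sampleRate;
  return { pcm, alignments, durationSec };
}
```

Passing the same randomState on every call is what the documentation describes as the way to get deterministic output across runs.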
Orca.synthesizeToFile()
Generates audio from text and saves it to a WAV file. The file contains the speech representation of the text.
Parameters
text string : Text to be converted to audio. For details see the documentation of Orca.synthesize().
outputPath string : Absolute path at which to save the generated audio as a single-channel 16-bit PCM WAV file.
synthesizeParams OrcaSynthesizeParams : Optional configuration arguments.
  speechRate number : Speed of generated speech. For details see the documentation of Orca.synthesize().
  randomState number : Random seed for the synthesis process. For details see the documentation of Orca.synthesize().
Returns
OrcaSynthesizeToFileResult : An array of OrcaAlignment objects representing the word alignments.
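A short sketch of calling Orca.synthesizeToFile() and summarizing the returned alignments follows. synthesizeToFileSummary is a hypothetical helper, and it assumes the return value is the plain array of OrcaAlignment objects described above; the caller is expected to supply an absolute outputPath.

```javascript
// Hypothetical helper; `orca` is an initialized Orca instance and
// `outputPath` is an absolute path chosen by the caller.
function synthesizeToFileSummary(orca, text, outputPath) {
  const alignments = orca.synthesizeToFile(text, outputPath, { speechRate: 1.0 });
  const last = alignments[alignments.length - 1];
  return {
    outputPath,
    wordCount: alignments.length,
    // End time of the last aligned word, or 0 when nothing was aligned.
    lastWordEndSec: last ? last.endSec : 0,
  };
}
```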
Orca.streamOpen()
Opens an OrcaStream object for streaming input text synthesis.
Parameters
synthesizeParams OrcaSynthesizeParams : Optional configuration arguments.
  speechRate number : Speed of generated speech. For details see the documentation of Orca.synthesize().
  randomState number : Random seed for the synthesis process. For details see the documentation of Orca.synthesize().
Returns
OrcaStream : An instance of OrcaStream.
OrcaStream
Class for handling input text streaming synthesis.
An OrcaStream object is created via the Orca.streamOpen() method and must be closed with the OrcaStream.close() method.
OrcaStream.synthesize()
Adds a chunk of text to the OrcaStream object and generates audio if enough text has been added.
This function is expected to be called multiple times with consecutive chunks of text from a text stream. The incoming text is buffered as it arrives until there is enough context to convert a chunk of the buffered text into audio. The caller needs to use OrcaStream.flush() to generate the audio chunk for the remaining text that has not yet been synthesized.
Parameters
text string : A chunk of text (e.g. an LLM token) from a text input stream, composed of valid characters. For details see the documentation of Orca.synthesize().
Returns
OrcaStreamSynthesizeResult : The generated audio as a sequence of 16-bit linearly-encoded integers, or null if no audio chunk has been produced.
OrcaStream.flush()
Generates audio for all buffered text that was added to the OrcaStream object via OrcaStream.synthesize().
Returns
OrcaStreamSynthesizeResult : The generated audio as a sequence of 16-bit linearly-encoded integers, or null if no audio chunk has been produced.
OrcaStream.close()
Closes the OrcaStream object and releases resources acquired by it.
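The open, synthesize, flush, close sequence described in this section can be sketched as a single loop. streamSynthesize and onAudio are hypothetical names introduced for illustration; only streamOpen(), synthesize(), flush(), and close() come from the reference above.

```javascript
// Hypothetical helper; `orca` is an initialized Orca instance,
// `textChunks` is any iterable of strings (e.g. LLM tokens), and
// `onAudio` is a caller-supplied callback receiving each audio chunk.
function streamSynthesize(orca, textChunks, onAudio) {
  const stream = orca.streamOpen();
  try {
    for (const chunk of textChunks) {
      // Returns null while Orca is still buffering text for context.
      const pcm = stream.synthesize(chunk);
      if (pcm !== null) onAudio(pcm);
    }
    // Generate audio for whatever text is still buffered.
    const remaining = stream.flush();
    if (remaining !== null) onAudio(remaining);
  } finally {
    // Release the stream's resources even if synthesis throws.
    stream.close();
  }
}
```

Forgetting the flush() call would drop the tail of the text, since synthesize() only emits audio once enough context has accumulated.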
OrcaOptions
Orca options type.
modelPath string : Path to the file containing model parameters (.pv).
libraryPath string : Path to the Orca dynamic library (.node).
OrcaSynthesizeParams
Orca synthesize params type.
speechRate number : Optional configuration to control the speed of the generated speech. Valid values are within [0.7, 1.3]. A higher value produces faster speech, and a lower value produces slower speech. The default is 1.0.
randomState number : Optional configuration to set the random state for sampling during synthesis. This can be used to ensure that the synthesized speech is deterministic across different runs. Valid values are all non-negative integers. If not provided, a random seed will be chosen and the synthesis process will be non-deterministic.
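The documented constraints on these two fields can be checked before they are passed to the engine. This is an illustrative sketch only: checkSynthesizeParams is a hypothetical helper, and the engine performs its own validation regardless.

```javascript
// Hypothetical validator for an OrcaSynthesizeParams-shaped object.
// Returns a list of human-readable problems (empty when none are found).
function checkSynthesizeParams(params = {}) {
  const problems = [];
  const { speechRate, randomState } = params;
  // Documented range for speechRate is [0.7, 1.3]; NaN fails this check too.
  if (speechRate !== undefined && !(speechRate >= 0.7 && speechRate <= 1.3)) {
    problems.push("speechRate must be within [0.7, 1.3]");
  }
  // Documented domain for randomState is the non-negative integers.
  if (randomState !== undefined && !(Number.isInteger(randomState) && randomState >= 0)) {
    problems.push("randomState must be a non-negative integer");
  }
  return problems;
}
```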
OrcaAlignment
Orca word alignment type.
word string : Synthesized word.
startSec number : Start time of the word in seconds.
endSec number : End time of the word in seconds.
phonemes OrcaPhoneme[] : Orca phonemes.
OrcaPhoneme
Orca phoneme alignment type.
phoneme string : Synthesized phoneme.
startSec number : Start time of the phoneme in seconds.
endSec number : End time of the phoneme in seconds.
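Word alignments lend themselves to simple timeline output, for example for captions. The sketch below is a hypothetical formatter (formatAlignments is not part of the SDK) that relies only on the word, startSec, endSec, and phonemes fields of OrcaAlignment.

```javascript
// Hypothetical formatter: one line per aligned word, with times in seconds.
function formatAlignments(alignments) {
  return alignments.map((a) => {
    const start = a.startSec.toFixed(2);
    const end = a.endSec.toFixed(2);
    return `${start}-${end}s ${a.word} (${a.phonemes.length} phonemes)`;
  });
}
```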
OrcaSynthesizeResult
Orca synthesize result type.
pcm
Int16Array : The output audio, represented as a 16-bit linearly-encoded integer array.alignments
OrcaAlignment[] : Orca alignments.
OrcaSynthesizeToFileResult
Orca synthesize to file result type.
OrcaAlignment
OrcaAlignment[] : An array of OrcaAlignment objects representing the word alignments.
OrcaStreamSynthesizeResult
OrcaStream synthesize result type.
This value will be either the generated audio as a sequence of 16-bit linearly-encoded integers, or null if no audio chunk has been produced.
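Because each streaming result is either an audio chunk or null, collecting a whole utterance means skipping the nulls and joining the rest. The following is a minimal sketch, assuming the chunks are Int16Array values as in OrcaSynthesizeResult.pcm; concatPcm is a hypothetical helper name.

```javascript
// Hypothetical helper: joins streaming results into one Int16Array,
// ignoring null entries (calls that produced no audio).
function concatPcm(chunks) {
  const audio = chunks.filter((c) => c !== null);
  const total = audio.reduce((n, c) => n + c.length, 0);
  const out = new Int16Array(total);
  let offset = 0;
  for (const c of audio) {
    // Copy this chunk's samples at the current write position.
    out.set(c, offset);
    offset += c.length;
  }
  return out;
}
```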
Errors
Exceptions thrown if an error occurs within the Orca Text-to-Speech engine.
Exceptions: