Orca Streaming Text-to-Speech
Python API
API Reference for the Python Orca SDK (PyPI).
pvorca.create()
Factory method for Orca Streaming Text-to-Speech engine.
Parameters
access_key str : AccessKey obtained from Picovoice Console.
model_path Optional[str] : Absolute path to the file containing model parameters (.pv). This file determines the voice of the synthesized speech.
library_path Optional[str] : Absolute path to Orca's dynamic library.
Returns
Orca: An instance of the Orca Streaming Text-to-Speech engine.
Throws
pvorca.Orca
Class for the Orca Streaming Text-to-Speech engine.
Orca can be initialized either using the module-level create() function
or directly using the class __init__() method.
When you are done, release resources with the delete() method.
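The create/use/delete lifecycle described above can be sketched as follows. This is a minimal illustration rather than an official snippet: the helper name synthesize_once is hypothetical, and it assumes that leaving model_path as None selects a default model shipped with the package.

```python
def synthesize_once(access_key, text, model_path=None):
    """Create an engine, synthesize `text`, and always release the engine."""
    # Imported lazily so this helper can be defined without the SDK installed.
    import pvorca

    # Assumption: model_path=None falls back to a default model.
    orca = pvorca.create(access_key=access_key, model_path=model_path)
    try:
        # synthesize() returns the audio and the word alignments as a tuple.
        pcm, alignments = orca.synthesize(text)
        return pcm, alignments
    finally:
        # delete() releases the resources acquired by Orca, even on error.
        orca.delete()
```

Wrapping the call in try/finally means delete() still runs if synthesis raises.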
pvorca.Orca.version
The version string of the Orca library.
pvorca.Orca.valid_characters
The set of valid characters that Orca accepts in the text input to the synthesis methods.
pvorca.Orca.sample_rate
The audio sample rate of the synthesized speech.
pvorca.Orca.max_character_limit
The maximum number of characters allowed in a single synthesis request.
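The valid_characters and max_character_limit properties can be used to check text before sending it to the engine. The sketch below is a hypothetical helper, not part of the SDK. It assumes every entry of valid_characters is a single character, and it does not account for the custom pronunciation markup, whose handling by the engine is not described here.

```python
def check_text(text, valid_characters, max_character_limit):
    """Return a list of problems found in `text`; an empty list means none were found."""
    problems = []
    if len(text) > max_character_limit:
        problems.append(
            f"text has {len(text)} characters, limit is {max_character_limit}"
        )
    allowed = set(valid_characters)
    # Sorted so the report is stable from run to run.
    invalid = sorted(set(text) - allowed)
    if invalid:
        problems.append("invalid characters: " + "".join(invalid))
    return problems
```

With an engine instance at hand, a call could look like check_text(text, orca.valid_characters, orca.max_character_limit).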
pvorca.Orca.__init__()
__init__ method for Orca Streaming Text-to-Speech engine.
Parameters
access_key str : AccessKey obtained from Picovoice Console.
model_path str : Absolute path to the file containing model parameters (.pv). This file determines the voice of the synthesized speech.
library_path str : Absolute path to Orca's dynamic library.
Returns
Orca: An instance of the Orca Streaming Text-to-Speech engine.
Throws
pvorca.Orca.delete()
Releases resources acquired by Orca.
pvorca.Orca.synthesize()
Generates audio from text. The returned audio contains the speech representation of the text.
If you wish to save the synthesized speech to a file, consider
using Orca.synthesize_to_file().
Parameters
text str : Text to be converted to audio. The maximum number of characters per call is self.max_character_limit. Allowed characters can be retrieved from self.valid_characters. Custom pronunciations can be embedded in the text via the syntax "{word|pronunciation}". The pronunciation is expressed in ARPAbet phonemes, for example: "{read|R IY D} this as {read|R EH D}".
speech_rate Optional[float] : Speed of generated speech. Valid values are within [0.7, 1.3]. Higher values produce faster speech and lower values produce slower speech. The default is 1.0.
random_state Optional[int] : Random seed for the synthesis process. This can be used to ensure that the synthesized speech is deterministic across different runs. Valid values are all non-negative integers. If not provided, a random seed will be chosen and the synthesis process will be non-deterministic.
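To make the custom pronunciation syntax and the speech_rate range concrete, here is a small sketch. Both helper names are hypothetical and exist only for illustration. They build and validate arguments locally and do not call the engine.

```python
def with_pronunciation(word, phonemes):
    """Wrap `word` in the "{word|pronunciation}" syntax.

    `phonemes` is a list of ARPAbet symbols, e.g. ["R", "IY", "D"].
    """
    return "{" + word + "|" + " ".join(phonemes) + "}"


def check_speech_rate(speech_rate):
    """Return `speech_rate` unchanged, or raise if it is outside [0.7, 1.3]."""
    if not 0.7 <= speech_rate <= 1.3:
        raise ValueError(f"speech_rate must be within [0.7, 1.3], got {speech_rate}")
    return speech_rate


# Rebuilds the example sentence from the parameter description.
text = (
    with_pronunciation("read", ["R", "IY", "D"])
    + " this as "
    + with_pronunciation("read", ["R", "EH", "D"])
)
```

The resulting text and a checked speech_rate could then be passed to synthesize().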
Returns
Tuple[Sequence[int], Sequence[WordAlignment]]: A tuple containing the generated audio as a sequence of 16-bit linearly-encoded integers and a sequence of WordAlignment objects representing the word alignments.
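If the returned samples need to be kept in memory as WAV data rather than written by synthesize_to_file(), they can be packed with the Python standard library. The sketch below uses a hypothetical helper name and assumes the samples are signed 16-bit values for a single channel.

```python
import io
import struct
import wave


def pcm_to_wav_bytes(pcm, sample_rate):
    """Encode a sequence of 16-bit samples as single-channel WAV data."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)        # mono
        wav_file.setsampwidth(2)        # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        # "<h" is a little-endian signed 16-bit integer (an assumption
        # about the sample encoding).
        wav_file.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
    return buffer.getvalue()
```

With an engine instance, sample_rate would come from orca.sample_rate, and the duration of the audio in seconds is len(pcm) / orca.sample_rate.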
Throws
pvorca.Orca.synthesize_to_file()
Generates audio from text and saves it to a WAV file. The file contains the speech representation of the text.
Parameters
text str : Text to be converted to audio. For details see the documentation of Orca.synthesize().
output_path str : Absolute path to save the generated audio as a single-channel 16-bit PCM WAV file.
speech_rate Optional[float] : Speed of generated speech. For details see the documentation of Orca.synthesize().
random_state Optional[int] : Random seed for the synthesis process. For details see the documentation of Orca.synthesize().
Returns
Sequence[WordAlignment]: A sequence of WordAlignment objects representing the word alignments.
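Because output_path is documented as an absolute path, a thin wrapper can resolve a relative file name before calling the method. This is a sketch with a hypothetical helper name. The orca argument is assumed to be an initialized engine instance.

```python
import os


def save_speech(orca, text, filename, speech_rate=None, random_state=None):
    """Resolve `filename` to an absolute path and synthesize `text` into it."""
    output_path = os.path.abspath(filename)
    alignments = orca.synthesize_to_file(
        text=text,
        output_path=output_path,
        speech_rate=speech_rate,
        random_state=random_state,
    )
    # Return the resolved path so the caller knows where the file was written.
    return output_path, alignments
```

Passing a fixed random_state, for example save_speech(orca, "Hello", "hello.wav", random_state=42), is one way to get repeatable output across runs.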
Throws
pvorca.Orca.stream_open()
Opens an Orca.OrcaStream object for streaming input text synthesis.
Parameters
speech_rate Optional[float] : Speed of speech generated by OrcaStream.synthesize(). For details see the documentation of Orca.synthesize().
random_state Optional[int] : Random seed for the synthesis process. For details see the documentation of Orca.synthesize().
Returns
Orca.OrcaStream: An instance of Orca.OrcaStream.
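Since a stream opened with stream_open() has to be closed again, one option is to pair the two calls in a context manager. The wrapper below is a hypothetical convenience, not something the SDK is stated to provide. The orca argument is assumed to be an initialized engine instance.

```python
from contextlib import contextmanager


@contextmanager
def open_stream(orca, speech_rate=None, random_state=None):
    """Open a stream and guarantee that close() is called on exit."""
    stream = orca.stream_open(speech_rate=speech_rate, random_state=random_state)
    try:
        yield stream
    finally:
        # Releases the resources acquired by the stream object.
        stream.close()
```

Usage would follow the usual pattern: with open_stream(orca, speech_rate=1.1) as stream: ...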
Throws
pvorca.Orca.WordAlignment
Metadata representing the alignment of a word in the synthesized audio.
word str : Synthesized word.
start_sec float : Start time of the word in seconds.
end_sec float : End time of the word in seconds.
phonemes List[PhonemeAlignment] : List of phoneme alignments for the word.
pvorca.Orca.PhonemeAlignment
Metadata representing the alignment of a phoneme in the synthesized audio.
phoneme str : Synthesized phoneme.
start_sec float : Start time of the phoneme in seconds.
end_sec float : End time of the phoneme in seconds.
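The alignment metadata forms a two-level structure: words, each holding its phonemes. As a sketch of how it could be traversed, the hypothetical helper below turns a sequence of WordAlignment-like objects into printable lines. It relies only on the attribute names listed above.

```python
def format_alignments(alignments):
    """Return one line per word, followed by indented lines for its phonemes."""
    lines = []
    for word in alignments:
        lines.append(f"{word.word}: {word.start_sec:.2f}-{word.end_sec:.2f} s")
        for phoneme in word.phonemes:
            lines.append(
                f"  {phoneme.phoneme}: "
                f"{phoneme.start_sec:.2f}-{phoneme.end_sec:.2f} s"
            )
    return lines
```

The same loop could feed a subtitle generator or a highlight-as-you-speak display instead of building strings.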
pvorca.Orca.OrcaStream
Class for handling input text streaming synthesis.
An Orca.OrcaStream object is initialized via the Orca.stream_open() method
and needs to be closed with the Orca.OrcaStream.close() method.
pvorca.Orca.OrcaStream.synthesize()
Adds a chunk of text to the Orca.OrcaStream object and generates audio if enough text has been added.
This function is expected to be called multiple times with consecutive chunks of text from a text stream.
The incoming text is buffered as it arrives until there is enough context to convert a chunk of the
buffered text into audio. The caller needs to use Orca.OrcaStream.flush() to generate
the audio chunk for the remaining text that has not yet been synthesized.
Parameters
text str : A chunk of text (e.g. an LLM token) from a text input stream, comprised of valid characters. For details see the documentation of Orca.synthesize().
Returns
Optional[Sequence[int]]: The generated audio as a sequence of 16-bit linearly-encoded integers, None if no audio chunk has been produced.
Throws
pvorca.Orca.OrcaStream.flush()
Generates audio for all the buffered text that was added to the Orca.OrcaStream object
via Orca.OrcaStream.synthesize().
Returns
Optional[Sequence[int]]: The generated audio as a sequence of 16-bit linearly-encoded integers, None if no audio chunk has been produced.
Throws
pvorca.Orca.OrcaStream.close()
Closes the Orca.OrcaStream object and releases resources acquired by it.
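Putting the streaming methods together, a typical loop feeds text chunks to synthesize(), collects whatever audio comes back, calls flush() once the text stream ends, and closes the stream. The sketch below uses a hypothetical helper name. The orca argument is assumed to be an initialized engine instance, and text_chunks any iterable of strings.

```python
def synthesize_stream(orca, text_chunks):
    """Feed text chunks to a stream and return the audio chunks it produces."""
    stream = orca.stream_open()
    audio_chunks = []
    try:
        for chunk in text_chunks:
            # Returns None while the stream is still buffering text.
            pcm = stream.synthesize(chunk)
            if pcm is not None:
                audio_chunks.append(pcm)
        # Synthesize the text that is still buffered.
        pcm = stream.flush()
        if pcm is not None:
            audio_chunks.append(pcm)
    finally:
        stream.close()
    return audio_chunks
```

In a real-time application each chunk would usually be handed to an audio player as soon as it is produced rather than collected in a list.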
pvorca.OrcaError
Error thrown if an error occurs within the Orca Text-to-Speech engine.
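Callers that prefer a fallback over a crash can catch this error around a synthesis call. The helper below is a hypothetical sketch. It assumes synthesize() reports engine failures by raising pvorca.OrcaError, and the orca argument is assumed to be an initialized engine instance.

```python
def synthesize_or_none(orca, text):
    """Return the synthesized audio, or None if the engine reports an error."""
    # Imported lazily so this helper can be defined without the SDK installed.
    import pvorca

    try:
        pcm, _alignments = orca.synthesize(text)
    except pvorca.OrcaError as error:
        print(f"Orca synthesis failed: {error}")
        return None
    return pcm
```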
Exceptions