Orca Streaming Text-to-Speech
C API

API Reference for the Orca C SDK.

pv_orca_t

typedef struct pv_orca pv_orca_t;

Container representing the Orca Streaming Text-to-Speech object.

pv_orca_stream_t

typedef struct pv_orca_stream pv_orca_stream_t;

Container representing the OrcaStream object used to synthesize audio from a stream of text.

pv_orca_init()

pv_status_t pv_orca_init(
        const char *access_key,
        const char *model_path,
        pv_orca_t **object);

Creates an Orca instance. When you are done with the instance, release its resources with pv_orca_delete().

Parameters

access_key const char * : AccessKey obtained from Picovoice Console.
model_path const char * : Absolute path to the file containing model parameters (.pv). This file determines the voice of the synthesized speech.
object pv_orca_t ** : Constructed instance of Orca.

Returns

pv_status_t : Status code.

pv_orca_delete()

void pv_orca_delete(pv_orca_t *object);

Releases resources acquired by Orca.

Parameters

object pv_orca_t * : Orca object.

pv_orca_valid_characters()

pv_status_t pv_orca_valid_characters(
        const pv_orca_t *object,
        int32_t *num_characters,
        const char ***characters);

Getter for the valid characters accepted as input to the Orca synthesize functions.

Parameters

object pv_orca_t * : Orca object.
num_characters int32_t * : The number of valid characters.
characters const char *** : The array of valid characters.

Returns

pv_status_t : Status code.
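
One way the returned array could be used is to pre-check text before synthesis. The sketch below is not part of the SDK: text_is_valid is a hypothetical helper, and it assumes each entry of characters is a NUL-terminated string holding one valid character (possibly several bytes long). It does not account for the "{word|pronunciation}" syntax.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

// Hypothetical helper (not an SDK function): returns true if `text` can be
// split entirely into entries of `characters`, an array assumed to have the
// shape returned by pv_orca_valid_characters().
bool text_is_valid(const char *text, const char **characters, int32_t num_characters) {
    while (*text != '\0') {
        size_t matched = 0;
        // Find the longest valid character that starts at the current position.
        for (int32_t i = 0; i < num_characters; i++) {
            size_t n = strlen(characters[i]);
            if (n > matched && strncmp(text, characters[i], n) == 0) {
                matched = n;
            }
        }
        if (matched == 0) {
            return false; // nothing in the list matches here
        }
        text += matched;
    }
    return true;
}
```

The array itself would still be released with pv_orca_valid_characters_delete() once the check is done.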

pv_orca_valid_characters_delete()

void pv_orca_valid_characters_delete(const char **characters);

Deletes the resources acquired when calling pv_orca_valid_characters().

Parameters

characters const char ** : The array of valid characters.

pv_orca_sample_rate()

pv_status_t pv_orca_sample_rate(
        const pv_orca_t *object,
        int32_t *sample_rate);

Getter for the sample rate of the audio produced by Orca.

Parameters

object pv_orca_t * : Orca object.
sample_rate int32_t * : Sample rate of the audio produced by Orca.

Returns

pv_status_t : Status code.
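
As a rough illustration of how the sample rate relates to playback time, the sketch below converts a sample count into seconds. pcm_duration_sec is a hypothetical helper, not part of the SDK, and it assumes the sample count refers to single-channel audio.

```c
#include <stdint.h>

// Hypothetical helper (not an SDK function): duration in seconds of
// `num_samples` single-channel samples played at `sample_rate` Hz.
// Returns 0.0 if either argument is not positive.
double pcm_duration_sec(int32_t num_samples, int32_t sample_rate) {
    if (num_samples <= 0 || sample_rate <= 0) {
        return 0.0;
    }
    return (double) num_samples / (double) sample_rate;
}
```

The arguments would typically be the num_samples reported by a synthesize call and the value reported by pv_orca_sample_rate().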

pv_orca_max_character_limit()

pv_status_t pv_orca_max_character_limit(
        const pv_orca_t *object,
        int32_t *max_character_limit);

Getter for the maximum number of characters that can be synthesized at once.

Parameters

object pv_orca_t * : Orca object.
max_character_limit int32_t * : Maximum number of characters.

Returns

pv_status_t : Status code.
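
Text longer than the limit has to be split before it is synthesized. The sketch below shows one possible way to pick a split point. next_chunk_length is a hypothetical helper, not part of the SDK, and it assumes plain ASCII text so that one byte equals one character.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical helper (not an SDK function): returns how many bytes of
// `text` to use for the next chunk so that the chunk holds at most `limit`
// bytes. Prefers to split right after the last space inside the window.
size_t next_chunk_length(const char *text, int32_t limit) {
    if (limit <= 0) {
        return 0;
    }
    size_t remaining = strlen(text);
    if (remaining <= (size_t) limit) {
        return remaining; // the rest fits in one chunk
    }
    // Walk backwards through the window looking for a space to split after.
    for (size_t i = (size_t) limit; i > 0; i--) {
        if (text[i - 1] == ' ') {
            return i;
        }
    }
    return (size_t) limit; // no space found: hard split at the limit
}
```

A caller would copy that many bytes into a NUL-terminated buffer, synthesize it, advance the text pointer, and repeat until the returned length is 0.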

pv_orca_synthesize()

pv_status_t pv_orca_synthesize(
        const pv_orca_t *object,
        const char *text,
        const pv_orca_synthesize_params_t *synthesize_params,
        int32_t *num_samples,
        int16_t **pcm,
        int32_t *num_alignments,
        pv_orca_word_alignment_t ***alignments);

Generates audio from text. The returned audio contains the speech representation of the text. This function returns PV_STATUS_INVALID_STATE if an OrcaStream object is open.

The memory of the returned audio and the alignment metadata is allocated by Orca and needs to be deleted with pv_orca_pcm_delete() and pv_orca_word_alignments_delete(), respectively.

If you wish to save the synthesized speech to a file, consider using pv_orca_synthesize_to_file().

Parameters

object pv_orca_t * : Orca object.
text const char * : Text to be converted to audio. The maximum length can be obtained by calling pv_orca_max_character_limit(). Allowed characters can be retrieved by calling pv_orca_valid_characters(). Custom pronunciations can be embedded in the text via the syntax "{word|pronunciation}". The pronunciation is expressed in ARPAbet format, e.g.: "I {live|L IH V} in {Sevilla|S EH V IY Y AH}".
synthesize_params pv_orca_synthesize_params_t * : Global parameters that give control over the voice generation. See pv_orca_synthesize_params_t for details.
num_samples int32_t * : The length of the output audio.
pcm int16_t ** : The output audio, represented as a 16-bit linearly-encoded integer array.
num_alignments int32_t * : The number of word alignments.
alignments pv_orca_word_alignment_t *** : The word alignments and their associated metadata.

Returns

pv_status_t : Status code.
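
The custom pronunciation syntax described for the text parameter can be assembled with standard string formatting. The sketch below is only an illustration: format_pronunciation is a hypothetical helper, not part of the SDK, and it does not check that the pronunciation is valid ARPAbet.

```c
#include <stddef.h>
#include <stdio.h>

// Hypothetical helper (not an SDK function): writes "{word|pronunciation}"
// into `out`. Returns 0 on success, or -1 if formatting fails or the result
// does not fit in `out_size` bytes (including the terminating NUL).
int format_pronunciation(char *out, size_t out_size, const char *word, const char *arpabet) {
    int n = snprintf(out, out_size, "{%s|%s}", word, arpabet);
    return (n >= 0 && (size_t) n < out_size) ? 0 : -1;
}
```

For example, passing "live" and "L IH V" produces "{live|L IH V}", which can then be placed inside the text handed to a synthesize function.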

pv_orca_synthesize_to_file()

pv_status_t pv_orca_synthesize_to_file(
        const pv_orca_t *object,
        const char *text,
        const pv_orca_synthesize_params_t *synthesize_params,
        const char *output_path,
        int32_t *num_alignments,
        pv_orca_word_alignment_t ***alignments);

Generates audio from text and saves it to a file. The file contains the speech representation of the text. This function returns PV_STATUS_INVALID_STATE if an OrcaStream object is open.

The memory of the alignment metadata is allocated by Orca and needs to be deleted with pv_orca_word_alignments_delete().

Parameters

object pv_orca_t * : Orca object.
text const char * : Text to be converted to audio. For details see the documentation of pv_orca_synthesize().
synthesize_params pv_orca_synthesize_params_t * : Global parameters that give control over the voice generation. See pv_orca_synthesize_params_t for details.
output_path const char * : Absolute path to save the generated audio as a single-channel 16-bit PCM WAV file.
num_alignments int32_t * : The number of word alignments.
alignments pv_orca_word_alignment_t *** : The word alignments and their associated metadata.

Returns

pv_status_t : Status code.

pv_orca_pcm_delete()

void pv_orca_pcm_delete(int16_t *pcm);

Deletes the audio previously generated by the pv_orca_synthesize() function.

Parameters

pcm int16_t * : The audio generated by pv_orca_synthesize().

pv_orca_word_alignments_delete()

pv_status_t pv_orca_word_alignments_delete(
        int32_t num_alignments,
        pv_orca_word_alignment_t **alignments);

Deletes word alignments returned from Orca synthesize functions.

Parameters

num_alignments int32_t : Number of alignments.
alignments pv_orca_word_alignment_t ** : Alignments returned from Orca synthesize functions.

Returns

pv_status_t : Status code.

pv_orca_stream_open()

pv_status_t pv_orca_stream_open(
        pv_orca_t *object,
        const pv_orca_synthesize_params_t *synthesize_params,
        pv_orca_stream_t **stream);

Opens an OrcaStream object to synthesize audio from a stream of text.

Parameters

object pv_orca_t * : Orca object.
synthesize_params pv_orca_synthesize_params_t * : Global parameters that give control over the voice generation. See pv_orca_synthesize_params_t for details.
stream pv_orca_stream_t ** : The OrcaStream object.

Returns

pv_status_t : Status code.

pv_orca_stream_close()

void pv_orca_stream_close(pv_orca_stream_t *object);

Closes the OrcaStream object and deletes the resources acquired by it.

Parameters

object pv_orca_stream_t * : OrcaStream object.

pv_orca_stream_synthesize()

pv_status_t pv_orca_stream_synthesize(
        pv_orca_stream_t *object,
        const char *text,
        int32_t *num_samples,
        int16_t **pcm);

Adds a chunk of text to the OrcaStream object and generates audio if enough text has been added. This function is expected to be called multiple times with consecutive chunks of text from a text stream. The incoming text is buffered as it arrives until there is enough context to convert a chunk of the buffered text into audio. The caller needs to use pv_orca_stream_flush() to generate the audio chunk for the remaining text that has not yet been synthesized. The caller is responsible for deleting the generated audio with pv_orca_pcm_delete().

Parameters

object pv_orca_stream_t * : The OrcaStream object.
text const char * : A chunk of text from a text input stream, comprised of valid characters. This is typically a word or a token from an LLM response. For more details on the format, see the documentation of pv_orca_synthesize().
num_samples int32_t * : The length of the pcm produced, 0 if no audio chunk has been produced.
pcm int16_t ** : The output audio chunk, NULL if no audio chunk has been produced.

Returns

pv_status_t : Status code.

pv_orca_stream_flush()

pv_status_t pv_orca_stream_flush(
        pv_orca_stream_t *object,
        int32_t *num_samples,
        int16_t **pcm);

Generates a final audio chunk corresponding to the buffered text added to the OrcaStream object via pv_orca_stream_synthesize(). The caller is responsible for deleting the generated audio with pv_orca_pcm_delete().

Parameters

object pv_orca_stream_t * : The OrcaStream object.
num_samples int32_t * : The length of the pcm, 0 if no audio chunk has been produced.
pcm int16_t ** : The output audio, NULL if no audio chunk has been produced.

Returns

pv_status_t : Status code.
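
Because pv_orca_stream_synthesize() and pv_orca_stream_flush() may each return either an audio chunk or nothing, a caller that wants the whole utterance in one buffer has to handle both cases. The sketch below shows one possible accumulation step. pcm_buffer_t and pcm_buffer_append are hypothetical names, not part of the SDK.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical growable buffer (not an SDK type) for collected audio.
typedef struct {
    int16_t *samples;
    int32_t num_samples;
} pcm_buffer_t;

// Hypothetical helper (not an SDK function): copies `chunk` to the end of
// `buffer`. A NULL chunk or a zero length, which is what the stream
// functions report when no audio was produced, is treated as a no-op.
// Returns 0 on success and -1 if memory could not be allocated.
int pcm_buffer_append(pcm_buffer_t *buffer, const int16_t *chunk, int32_t num_samples) {
    if (chunk == NULL || num_samples <= 0) {
        return 0;
    }
    size_t new_count = (size_t) buffer->num_samples + (size_t) num_samples;
    int16_t *grown = realloc(buffer->samples, new_count * sizeof(int16_t));
    if (grown == NULL) {
        return -1; // the existing buffer is left untouched
    }
    memcpy(grown + buffer->num_samples, chunk, (size_t) num_samples * sizeof(int16_t));
    buffer->samples = grown;
    buffer->num_samples = (int32_t) new_count;
    return 0;
}
```

Since the helper copies the samples, each chunk returned by Orca would still be released with pv_orca_pcm_delete() after the append, and the accumulated buffer with free().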

pv_orca_version()

const char *pv_orca_version(void);

Getter for version.

Returns

const char * : Orca version.

pv_orca_synthesize_params_t

typedef struct pv_orca_synthesize_params pv_orca_synthesize_params_t;

Object holding global parameters that give control over the voice generation. This object is an argument to the Orca synthesize functions.

An instance can be created with pv_orca_synthesize_params_init() and must be deleted with pv_orca_synthesize_params_delete().

Use the pv_orca_synthesize_params_set_* functions to set a parameter to the desired value. Use the pv_orca_synthesize_params_get_* functions to get the current value of a parameter.

pv_orca_synthesize_params_init()

pv_status_t pv_orca_synthesize_params_init(pv_orca_synthesize_params_t **object);

Creates a Synthesize params object. All parameters are set to their default values.

Parameters

object pv_orca_synthesize_params_t ** : Constructed instance of Synthesize params.

Returns

pv_status_t : Status code.

pv_orca_synthesize_params_delete()

void pv_orca_synthesize_params_delete(pv_orca_synthesize_params_t *object);

Releases resources acquired by Synthesize params.

Parameters

object pv_orca_synthesize_params_t * : Synthesize params object.

pv_orca_synthesize_params_set_speech_rate()

pv_status_t pv_orca_synthesize_params_set_speech_rate(
        pv_orca_synthesize_params_t *object,
        float speech_rate);

Setter for the speech rate.

Parameters

object pv_orca_synthesize_params_t * : Synthesize params object.
speech_rate float : Speed of generated speech. Valid values are within [0.7, 1.3]. Higher (lower) values produce faster (slower) speech. The default is 1.0.

Returns

pv_status_t : Status code.
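
If the speech rate comes from user input, it may be convenient to bring it into the documented range before calling the setter. The sketch below is one way to do that. clamp_speech_rate is a hypothetical helper, not part of the SDK, and the bounds are copied from the parameter description above.

```c
// Hypothetical helper (not an SDK function): limits a requested speech rate
// to the documented valid range [0.7, 1.3].
float clamp_speech_rate(float requested) {
    const float min_rate = 0.7f;
    const float max_rate = 1.3f;
    if (requested < min_rate) {
        return min_rate;
    }
    if (requested > max_rate) {
        return max_rate;
    }
    return requested;
}
```

Clamping silently changes the caller's value. Passing the value through unchanged and checking the status code returned by the setter is an equally reasonable design.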

pv_orca_synthesize_params_get_speech_rate()

pv_status_t pv_orca_synthesize_params_get_speech_rate(
        const pv_orca_synthesize_params_t *object,
        float *speech_rate);

Getter for the speech rate.

Parameters

object pv_orca_synthesize_params_t * : Synthesize params object.
speech_rate float * : Speed of generated speech. For details see the documentation of pv_orca_synthesize_params_set_speech_rate().

Returns

pv_status_t : Status code.

pv_orca_synthesize_params_set_random_state()

pv_status_t pv_orca_synthesize_params_set_random_state(
        pv_orca_synthesize_params_t *object,
        int64_t random_state);

Setter for the random state.

Parameters

object pv_orca_synthesize_params_t * : Synthesize params object.
random_state int64_t : Random seed for the synthesis process. This can be used to ensure that the synthesized speech is deterministic across different runs. Valid values are all non-negative integers. If not provided, a random seed will be chosen and the synthesis process will be non-deterministic.

Returns

pv_status_t : Status code.

pv_orca_synthesize_params_get_random_state()

pv_status_t pv_orca_synthesize_params_get_random_state(
        const pv_orca_synthesize_params_t *object,
        int64_t *random_state);

Getter for the random state.

Parameters

object pv_orca_synthesize_params_t * : Synthesize params object.
random_state int64_t * : Random seed for the synthesis process. For details see the documentation of pv_orca_synthesize_params_set_random_state().

Returns

pv_status_t : Status code.

pv_orca_word_alignment_t

typedef struct {
    char *word; /** Synthesized word. */
    float start_sec; /** Start of word in seconds. */
    float end_sec; /** End of word in seconds. */

    int32_t num_phonemes; /** Number of phonemes in the word. */
    pv_orca_phoneme_alignment_t **phonemes; /** Array of phonemes in the word. */
} pv_orca_word_alignment_t;

Struct for a synthesized word and its associated metadata.

pv_orca_phoneme_alignment_t

typedef struct {
    char *phoneme; /** Synthesized phoneme. */
    float start_sec; /** Start of phoneme in seconds. */
    float end_sec; /** End of phoneme in seconds. */
} pv_orca_phoneme_alignment_t;

Struct for a synthesized phoneme and its associated metadata.
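
The alignment structs can be used, for example, to find which word is being spoken at a given playback time. The sketch below is an illustration only. word_at is a hypothetical helper, not part of the SDK, and the two structs are repeated from above solely so the sketch compiles on its own; real code would take them from the SDK header. Treating each word as the half-open interval [start_sec, end_sec) is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

// Mirrored from the definitions above so the sketch is self-contained.
typedef struct {
    char *phoneme;
    float start_sec;
    float end_sec;
} pv_orca_phoneme_alignment_t;

typedef struct {
    char *word;
    float start_sec;
    float end_sec;
    int32_t num_phonemes;
    pv_orca_phoneme_alignment_t **phonemes;
} pv_orca_word_alignment_t;

// Hypothetical helper (not an SDK function): returns the first word whose
// interval [start_sec, end_sec) contains `time_sec`, or NULL if none does.
const pv_orca_word_alignment_t *word_at(
        pv_orca_word_alignment_t **alignments,
        int32_t num_alignments,
        float time_sec) {
    for (int32_t i = 0; i < num_alignments; i++) {
        const pv_orca_word_alignment_t *w = alignments[i];
        if (time_sec >= w->start_sec && time_sec < w->end_sec) {
            return w;
        }
    }
    return NULL;
}
```

The same loop shape applies one level down: each word's phonemes array holds num_phonemes entries with their own start_sec and end_sec.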

pv_status_t

typedef enum {
    PV_STATUS_SUCCESS = 0,
    PV_STATUS_OUT_OF_MEMORY,
    PV_STATUS_IO_ERROR,
    PV_STATUS_INVALID_ARGUMENT,
    PV_STATUS_STOP_ITERATION,
    PV_STATUS_KEY_ERROR,
    PV_STATUS_INVALID_STATE,
    PV_STATUS_RUNTIME_ERROR,
    PV_STATUS_ACTIVATION_ERROR,
    PV_STATUS_ACTIVATION_LIMIT_REACHED,
    PV_STATUS_ACTIVATION_THROTTLED,
    PV_STATUS_ACTIVATION_REFUSED
} pv_status_t;

Status code enum.
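
Callers often want to treat groups of status codes alike. The sketch below groups the four PV_STATUS_ACTIVATION_* values purely by their names; this reference does not describe what each one means, so the grouping is an assumption. status_is_activation_failure is a hypothetical helper, not part of the SDK, and the enum is repeated from above only so the sketch compiles on its own.

```c
#include <stdbool.h>

// Mirrored from the definition above so the sketch is self-contained.
typedef enum {
    PV_STATUS_SUCCESS = 0,
    PV_STATUS_OUT_OF_MEMORY,
    PV_STATUS_IO_ERROR,
    PV_STATUS_INVALID_ARGUMENT,
    PV_STATUS_STOP_ITERATION,
    PV_STATUS_KEY_ERROR,
    PV_STATUS_INVALID_STATE,
    PV_STATUS_RUNTIME_ERROR,
    PV_STATUS_ACTIVATION_ERROR,
    PV_STATUS_ACTIVATION_LIMIT_REACHED,
    PV_STATUS_ACTIVATION_THROTTLED,
    PV_STATUS_ACTIVATION_REFUSED
} pv_status_t;

// Hypothetical helper (not an SDK function): true for any of the
// PV_STATUS_ACTIVATION_* codes, false for everything else.
bool status_is_activation_failure(pv_status_t status) {
    switch (status) {
        case PV_STATUS_ACTIVATION_ERROR:
        case PV_STATUS_ACTIVATION_LIMIT_REACHED:
        case PV_STATUS_ACTIVATION_THROTTLED:
        case PV_STATUS_ACTIVATION_REFUSED:
            return true;
        default:
            return false;
    }
}
```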

pv_status_to_string()

const char *pv_status_to_string(pv_status_t status);

Parameters

status pv_status_t : Status code.

Returns

const char * : String representation of status code.

pv_get_error_stack()

pv_status_t pv_get_error_stack(
        char ***message_stack,
        int32_t *message_stack_depth);

If a function returns a failure (any pv_status_t other than PV_STATUS_SUCCESS), this function can be called to get a series of error messages related to the failure. It can be called only once per failure status returned by another function. The memory for message_stack must be freed using pv_free_error_stack().

Regardless of the return status of this function, if message_stack is not NULL, then message_stack contains valid memory. However, a failure status on this function indicates that future error messages may not be reported.

Parameters

message_stack char *** : Array of messages relating to the failure. Messages are null-terminated strings. The array and messages must be freed using pv_free_error_stack().
message_stack_depth int32_t * : The number of messages in the message_stack array.

Returns

pv_status_t : Status code.

pv_free_error_stack()

void pv_free_error_stack(char **message_stack);

This function frees the memory used by error messages allocated by pv_get_error_stack().

Parameters

message_stack char ** : Array of messages relating to the failure.
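
Once a message stack has been retrieved, it can be reported in whatever form suits the application. The sketch below shows one possible formatting step. print_error_stack is a hypothetical helper, not part of the SDK, and it only reads the array; retrieving and freeing it remain the job of pv_get_error_stack() and pv_free_error_stack().

```c
#include <stdint.h>
#include <stdio.h>

// Hypothetical helper (not an SDK function): writes each message on its own
// indexed line to `out` and returns how many messages were written.
int32_t print_error_stack(FILE *out, char **message_stack, int32_t depth) {
    if (message_stack == NULL) {
        return 0;
    }
    int32_t written = 0;
    for (int32_t i = 0; i < depth; i++) {
        if (message_stack[i] != NULL) {
            fprintf(out, "  [%d] %s\n", (int) i, message_stack[i]);
            written++;
        }
    }
    return written;
}
```

A typical call would pass stderr together with the message_stack and message_stack_depth values filled in by pv_get_error_stack().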
