Cheetah Speech-to-Text
Python API
API Reference for the Python Cheetah SDK (PyPI).
pvcheetah.create()
Factory method for the Cheetah Speech-to-Text engine.
Parameters
access_key str : AccessKey obtained from Picovoice Console.
library_path Optional[str] : Absolute path to Cheetah's dynamic library.
model_path Optional[str] : Absolute path to the file containing model parameters.
endpoint_duration_sec Optional[float] : Duration of endpoint in seconds. A speech endpoint is detected when an utterance is followed by a stretch of audio of this duration that contains no speech. Set to None to disable endpoint detection.
enable_automatic_punctuation bool : Set to True to enable automatic punctuation insertion.
Returns
Cheetah: An instance of Cheetah Speech-to-Text engine.
Throws
CheetahError: If an error occurs within the Cheetah Speech-to-Text engine.
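A minimal sketch of calling create() with the keyword arguments listed above and handling a failure. The helper name create_cheetah is hypothetical, and its default values (a 1-second endpoint, punctuation enabled) are choices made for this example, not values taken from this reference; library_path and model_path are simply omitted. Replace the access_key argument with your own AccessKey from Picovoice Console.

```python
import sys
from typing import Optional


def create_cheetah(access_key: str,
                   endpoint_duration_sec: Optional[float] = 1.0,
                   enable_automatic_punctuation: bool = True):
    """Hypothetical helper: create a Cheetah instance, or return None on failure."""
    import pvcheetah  # imported lazily so this module loads even without pvcheetah

    try:
        return pvcheetah.create(
            access_key=access_key,
            # Pass None here to disable endpoint detection entirely.
            endpoint_duration_sec=endpoint_duration_sec,
            enable_automatic_punctuation=enable_automatic_punctuation,
        )
    except pvcheetah.CheetahError as e:
        # CheetahError is the error type raised from within the engine.
        print(f"Failed to create Cheetah: {e}", file=sys.stderr)
        return None
```

Returning None is only one way to surface the error; re-raising is just as reasonable when the caller cannot continue without an engine.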
pvcheetah.Cheetah
Class for the Cheetah Speech-to-Text engine.
Cheetah can be initialized either with the module-level create() function
or directly with the class __init__() method.
When you are done with an instance, release its resources with the delete() method.
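Because an instance holds resources until delete() is called, it can help to tie that call to a with block. The sketch below uses a hypothetical helper, cheetah_session, built on contextlib.contextmanager; it accepts any zero-argument factory (for example, a lambda wrapping pvcheetah.create()) and calls delete() on exit, even when the body of the with block raises.

```python
from contextlib import contextmanager
from typing import Any, Callable, Iterator


@contextmanager
def cheetah_session(factory: Callable[[], Any]) -> Iterator[Any]:
    """Hypothetical helper: create an engine with `factory`, always delete() it on exit.

    `factory` is any zero-argument callable returning an object with a
    delete() method, e.g. lambda: pvcheetah.create(access_key=...).
    """
    engine = factory()  # if creation fails, there is nothing to delete
    try:
        yield engine
    finally:
        # Runs on normal exit and when the with-body raises.
        engine.delete()
```

For example, with cheetah_session(lambda: pvcheetah.create(access_key=access_key)) as cheetah: followed by the processing loop guarantees the engine is released afterwards.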
pvcheetah.Cheetah.version
The version string of the Cheetah library.
pvcheetah.Cheetah.frame_length
The number of audio samples per frame that Cheetah accepts.
pvcheetah.Cheetah.sample_rate
The audio sample rate (in Hz) that Cheetah accepts.
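Since frame_length is a number of samples and sample_rate is samples per second, the duration of one frame follows directly. The two functions below are hypothetical helpers, not part of pvcheetah; for instance, 512 samples at 16000 Hz would give 32 ms per frame and 32 frames to cover one second. Those numbers are only an illustration; read the real values from an instance.

```python
import math


def frame_duration_ms(frame_length: int, sample_rate: int) -> float:
    """Duration of one frame in milliseconds (samples / samples-per-second * 1000)."""
    return 1000.0 * frame_length / sample_rate


def frames_for_seconds(seconds: float, frame_length: int, sample_rate: int) -> int:
    """Number of whole frames needed to cover `seconds` of audio (rounded up)."""
    return math.ceil(seconds * sample_rate / frame_length)
```

With an instance, call frame_duration_ms(cheetah.frame_length, cheetah.sample_rate).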
pvcheetah.Cheetah.__init__()
__init__ method for the Cheetah Speech-to-Text engine.
Parameters
access_key str : AccessKey obtained from Picovoice Console.
model_path str : Absolute path to the file containing model parameters.
library_path str : Absolute path to Cheetah's dynamic library.
endpoint_duration_sec float : Duration of endpoint in seconds.
enable_automatic_punctuation bool : Set to True to enable automatic punctuation insertion.
Returns
Cheetah: An instance of Cheetah Speech-to-Text engine.
Throws
CheetahError: If an error occurs within the Cheetah Speech-to-Text engine.
pvcheetah.Cheetah.delete()
Releases resources acquired by Cheetah.
pvcheetah.Cheetah.process()
Processes a frame of audio and returns any newly transcribed text along with a flag indicating whether an endpoint has been detected. Upon detection of an endpoint, the client may invoke .flush() to retrieve any remaining transcription.
The number of samples per frame is given by the .frame_length property. The incoming audio must have a sample rate equal to .sample_rate and be 16-bit linearly encoded. Furthermore, Cheetah operates on single-channel audio.
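Audio captured or stored as raw bytes has to be turned into frames of .frame_length 16-bit samples before it is passed to .process(). A minimal sketch, assuming the bytes are little-endian, signed 16-bit, single-channel PCM already at .sample_rate; the helper name pcm16_frames is hypothetical. It zero-pads the final partial frame so every frame has exactly frame_length samples; dropping that frame instead is an equally valid choice.

```python
import struct
from typing import Iterator, List


def pcm16_frames(data: bytes, frame_length: int) -> Iterator[List[int]]:
    """Hypothetical helper: split little-endian 16-bit mono PCM bytes into frames."""
    num_samples = len(data) // 2  # 2 bytes per sample; a stray odd byte is ignored
    samples = struct.unpack(f"<{num_samples}h", data[: num_samples * 2])
    for start in range(0, num_samples, frame_length):
        frame = list(samples[start:start + frame_length])
        # Zero-pad the last frame so process() always receives frame_length samples.
        frame.extend([0] * (frame_length - len(frame)))
        yield frame
```

Each yielded list can be passed straight to cheetah.process(), with cheetah.frame_length as the second argument.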
Parameters
pcm Sequence[int] : A frame of audio samples.
Returns
Tuple[str, bool]: Any newly transcribed text (an empty string if none is available) and a flag indicating whether an endpoint has been detected.
Throws
CheetahError: If an error occurs within the Cheetah Speech-to-Text engine.
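The endpoint flag returned by .process() can be used to split a stream into utterances. This sketch accumulates the partial text from each frame, calls .flush() whenever an endpoint is reported, and calls it once more when the stream ends. The function name transcribe_stream is hypothetical, and cheetah can be any object exposing process() and flush() with the return types documented here.

```python
from typing import Iterable, List, Sequence


def transcribe_stream(cheetah, frames: Iterable[Sequence[int]]) -> List[str]:
    """Hypothetical helper: return one transcript per detected utterance."""
    utterances: List[str] = []
    current = ""
    for frame in frames:
        partial, is_endpoint = cheetah.process(frame)
        current += partial
        if is_endpoint:
            # Endpoint detected: flush() returns any text still buffered.
            current += cheetah.flush()
            utterances.append(current)
            current = ""
    # End of stream: collect whatever has not been returned yet.
    current += cheetah.flush()
    if current:
        utterances.append(current)
    return utterances
```

In a live application the frames would typically come from a microphone recorder that delivers .frame_length samples at a time.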
pvcheetah.Cheetah.flush()
Marks the end of the audio stream, flushes the object's internal state, and returns any remaining transcribed text.
Returns
str: Any remaining transcribed text. If none is available then an empty string is returned.
Throws
CheetahError: If an error occurs within the Cheetah Speech-to-Text engine.
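A sketch of transcribing a complete WAV file with the standard-library wave module, calling .flush() once at the end to mark the end of the stream. It assumes the file is single-channel, 16-bit PCM at .sample_rate and raises ValueError otherwise; the helper name transcribe_wav is hypothetical. The endpoint flag is ignored here, and the final partial frame is zero-padded so that every frame passed to .process() has exactly .frame_length samples.

```python
import struct
import wave


def transcribe_wav(cheetah, path: str) -> str:
    """Hypothetical helper: transcribe a mono 16-bit WAV file in one pass."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1:
            raise ValueError("Cheetah expects single-channel audio")
        if wav.getsampwidth() != 2:
            raise ValueError("Cheetah expects 16-bit samples")
        if wav.getframerate() != cheetah.sample_rate:
            raise ValueError(
                f"expected {cheetah.sample_rate} Hz, got {wav.getframerate()} Hz")
        raw = wav.readframes(wav.getnframes())

    num_samples = len(raw) // 2
    samples = struct.unpack(f"<{num_samples}h", raw[: num_samples * 2])

    transcript = ""
    n = cheetah.frame_length
    for start in range(0, num_samples, n):
        frame = list(samples[start:start + n])
        frame.extend([0] * (n - len(frame)))  # zero-pad the final partial frame
        partial, _ = cheetah.process(frame)
        transcript += partial
    # End of stream: flush() returns any text still buffered inside the engine.
    transcript += cheetah.flush()
    return transcript
```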
pvcheetah.CheetahError
Error thrown if an error occurs within the Cheetah Speech-to-Text engine.
Exceptions