Cheetah Speech-to-Text
Python API
API Reference for the Python Cheetah SDK (PyPI).
pvcheetah.create()
Factory method for Cheetah Speech-to-Text engine.
Parameters
access_key
str : AccessKey obtained from Picovoice Console.
library_path
Optional[str] : Absolute path to Cheetah's dynamic library.
model_path
Optional[str] : Absolute path to the file containing model parameters.
endpoint_duration_sec
Optional[float] : Duration of endpoint in seconds. A speech endpoint is detected when an utterance is followed by a chunk of audio of this duration that contains no speech. Set to None to disable endpoint detection.
enable_automatic_punctuation
bool : Set to True to enable automatic punctuation insertion.
Returns
Cheetah : An instance of Cheetah Speech-to-Text engine.
Throws
CheetahError
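As a minimal sketch of how the factory might be called, the helper below forwards the documented keyword arguments to pvcheetah.create(). The helper name create_cheetah and its factory parameter are assumptions made for illustration (the hook only exists so the call can be exercised without the SDK installed); they are not part of the SDK.

```python
from typing import Any, Callable, Optional


def create_cheetah(
    access_key: str,
    endpoint_duration_sec: Optional[float],
    enable_automatic_punctuation: bool,
    factory: Optional[Callable[..., Any]] = None,
) -> Any:
    """Create an engine by forwarding the documented keyword arguments.

    `factory` is a hypothetical hook for illustration; when it is omitted,
    the real pvcheetah.create() is imported and used.
    """
    if factory is None:
        import pvcheetah  # requires the pvcheetah package from PyPI

        factory = pvcheetah.create

    return factory(
        access_key=access_key,
        # None disables endpoint detection, per the parameter description.
        endpoint_duration_sec=endpoint_duration_sec,
        enable_automatic_punctuation=enable_automatic_punctuation,
    )
```

With the SDK installed, create_cheetah("${ACCESS_KEY}", 1.0, True) would be expected to return a Cheetah instance; the access key placeholder and the 1.0 second endpoint duration are illustrative values only.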
pvcheetah.Cheetah
Class for the Cheetah Speech-to-Text engine.
Cheetah can be initialized either using the module-level create() function or directly using the class __init__() method.
Release resources with the delete() method when you are done using the engine.
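One way to make sure delete() always runs is to wrap the instance in a context manager. The sketch below assumes only that the object exposes a delete() method as described above; the name cheetah_session is hypothetical and not part of the SDK.

```python
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def cheetah_session(engine: Any) -> Iterator[Any]:
    """Yield the engine and always call its delete() method on exit."""
    try:
        yield engine
    finally:
        # Runs on normal exit and when the body raises an exception.
        engine.delete()


# Possible usage, assuming pvcheetah is installed and a valid AccessKey:
#
#     import pvcheetah
#     with cheetah_session(pvcheetah.create(access_key="${ACCESS_KEY}")) as cheetah:
#         print(cheetah.version)
```

A plain try/finally block around the engine achieves the same effect; the wrapper just keeps the cleanup next to the creation.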
pvcheetah.Cheetah.version
The version string of the Cheetah library.
pvcheetah.Cheetah.frame_length
The number of audio samples per frame that Cheetah accepts.
pvcheetah.Cheetah.sample_rate
The audio sample rate that Cheetah accepts.
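The two audio properties together determine how much audio each frame covers. The helpers below are a small sketch of that arithmetic; their names are hypothetical, and they only assume an object with frame_length and sample_rate attributes.

```python
from typing import Any


def frame_duration_ms(engine: Any) -> float:
    """Return the duration of one audio frame in milliseconds."""
    return 1000.0 * engine.frame_length / engine.sample_rate


def frames_needed(engine: Any, seconds: float) -> int:
    """Return how many whole frames fit in `seconds` of audio."""
    return int(seconds * engine.sample_rate) // engine.frame_length
```

For example, a frame length of 512 samples at 16000 Hz would give frames of 32 ms each. These numbers are illustrative; read the actual values from the instance.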
pvcheetah.Cheetah.__init__()
__init__ method for Cheetah Speech-to-Text engine.
Parameters
access_key
str : AccessKey obtained from Picovoice Console.
model_path
str : Absolute path to the file containing model parameters.
library_path
str : Absolute path to Cheetah's dynamic library.
endpoint_duration_sec
float : Duration of endpoint in seconds.
enable_automatic_punctuation
bool : Set to True to enable automatic punctuation insertion.
Returns
Cheetah : An instance of Cheetah Speech-to-Text engine.
Throws
CheetahError
pvcheetah.Cheetah.delete()
Releases resources acquired by Cheetah.
pvcheetah.Cheetah.process()
Processes a frame of audio and returns newly-transcribed text and a flag indicating if an endpoint has been detected. Upon detection of an endpoint, the client may invoke .flush() to retrieve any remaining transcription.
The number of samples per frame can be obtained from .frame_length. The incoming audio needs to have a sample rate equal to .sample_rate and be 16-bit linearly-encoded. Furthermore, Cheetah operates on single-channel audio.
Parameters
pcm
Sequence[int] : A frame of audio samples.
Returns
Tuple[str, bool] : Any newly-transcribed speech (if none is available then an empty string is returned) and a flag indicating if an endpoint has been detected.
Throws
CheetahError
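The audio requirements above (single channel, 16-bit linear PCM, matching sample rate, fixed frame size) can be checked before any frame reaches process(). The following sketch uses only the Python standard library; the function names and the choice to drop a trailing partial frame are assumptions made for illustration.

```python
import struct
import wave
from typing import BinaryIO, Iterator, List, Sequence, Union


def read_pcm16_mono(source: Union[str, BinaryIO], sample_rate: int) -> List[int]:
    """Read a WAV file and return its samples after checking the format."""
    with wave.open(source, "rb") as wav:
        if wav.getnchannels() != 1:
            raise ValueError("expected single-channel audio")
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        if wav.getframerate() != sample_rate:
            raise ValueError("expected a sample rate of %d Hz" % sample_rate)
        raw = wav.readframes(wav.getnframes())
    # WAV stores 16-bit samples as little-endian signed integers.
    return list(struct.unpack("<%dh" % (len(raw) // 2), raw))


def iter_frames(pcm: Sequence[int], frame_length: int) -> Iterator[Sequence[int]]:
    """Yield consecutive full frames; a trailing partial frame is dropped."""
    for start in range(0, len(pcm) - frame_length + 1, frame_length):
        yield pcm[start:start + frame_length]
```

A caller could then pass each frame from iter_frames(read_pcm16_mono("audio.wav", cheetah.sample_rate), cheetah.frame_length) to cheetah.process(); the file name audio.wav is a placeholder.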
pvcheetah.Cheetah.flush()
Marks the end of the audio stream, flushes internal state of the object, and returns any remaining transcribed text.
Returns
str : Any remaining transcribed text. If none is available then an empty string is returned.
Throws
CheetahError
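Putting process() and flush() together, a streaming loop might accumulate partial text, call flush() whenever an endpoint is reported, and flush once more at the end of the stream. This is a sketch under those assumptions rather than a prescribed pattern; transcribe_utterances is a hypothetical name, and the engine argument is any object with the process() and flush() methods described above.

```python
from typing import Any, Iterable, List, Sequence


def transcribe_utterances(engine: Any, frames: Iterable[Sequence[int]]) -> List[str]:
    """Return one transcript per utterance, splitting at detected endpoints."""
    utterances: List[str] = []
    current = ""
    for frame in frames:
        partial, is_endpoint = engine.process(frame)
        current += partial
        if is_endpoint:
            # Retrieve whatever text is still buffered for this utterance.
            current += engine.flush()
            if current:
                utterances.append(current)
            current = ""
    # End of stream: flush once more to collect any remaining text.
    current += engine.flush()
    if current:
        utterances.append(current)
    return utterances
```

Collecting text per utterance is a design choice for this example; an application that only needs a running transcript could simply concatenate every returned string.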
pvcheetah.CheetahError
Error thrown if an error occurs within the Cheetah Speech-to-Text engine.
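A caller will typically want to catch this error around engine creation or processing. The helper below is a generic sketch: create_or_report is a hypothetical name, and the factory and error type are passed in so the pattern can be shown without the SDK installed.

```python
from typing import Any, Callable, Optional, Tuple, Type


def create_or_report(
    factory: Callable[..., Any],
    error_type: Type[Exception],
    **kwargs: Any,
) -> Tuple[Optional[Any], Optional[str]]:
    """Call `factory` and return (engine, None) or (None, error message)."""
    try:
        return factory(**kwargs), None
    except error_type as error:
        return None, "%s: %s" % (type(error).__name__, error)


# Possible usage, assuming pvcheetah is installed:
#
#     import pvcheetah
#     cheetah, problem = create_or_report(
#         pvcheetah.create, pvcheetah.CheetahError, access_key="${ACCESS_KEY}"
#     )
```

Returning the message instead of re-raising is only one option; letting the error propagate is equally reasonable when the caller cannot recover.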
Exceptions