Leopard Speech-to-Text
Python API
API Reference for the Python Leopard SDK (PyPI).
pvleopard.create()
Factory method for the Leopard Speech-to-Text engine.
Parameters
access_key str : AccessKey obtained from Picovoice Console.
model_path Optional[str] : Absolute path to the file containing model parameters.
device Optional[str] : String representation of the device (e.g., CPU or GPU) to use. If set to "best", the most suitable device is selected automatically. If set to "gpu", the engine uses the first available GPU device. To select a specific GPU device, set this argument to "gpu:${GPU_INDEX}", where ${GPU_INDEX} is the index of the target GPU. If set to "cpu", the engine runs on the CPU with the default number of threads. To specify the number of threads, set this argument to "cpu:${NUM_THREADS}", where ${NUM_THREADS} is the desired number of threads.
library_path Optional[str] : Absolute path to Leopard's dynamic library.
enable_automatic_punctuation bool : Set to True to enable automatic punctuation insertion.
enable_diarization bool : Set to True to enable speaker diarization, which allows Leopard to differentiate speakers as part of the transcription process. Word metadata will include a speaker_tag to identify unique speakers.
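The device argument accepts a handful of string forms. As a minimal sketch, a hypothetical helper, device_string(), can assemble them from a kind and an optional GPU index or thread count. device_string() and create_leopard() are illustrative names, not part of pvleopard, and the range checks are this sketch's own assumptions.

```python
from typing import Optional


def device_string(kind: str, value: Optional[int] = None) -> str:
    """Build a value for the device argument of pvleopard.create() (hypothetical helper).

    kind  -- "best", "gpu", or "cpu"
    value -- GPU index for "gpu", thread count for "cpu"; must be omitted for "best"
    """
    if kind == "best":
        if value is not None:
            raise ValueError("'best' takes no index or thread count")
        return "best"
    if kind not in ("gpu", "cpu"):
        raise ValueError(f"unknown device kind: {kind!r}")
    if value is None:
        # "gpu" -> first available GPU; "cpu" -> default number of threads
        return kind
    # Assumption: GPU indices start at 0, and at least one CPU thread is required.
    if value < 0 or (kind == "cpu" and value == 0):
        raise ValueError(f"invalid value for {kind}: {value}")
    # Produces "gpu:${GPU_INDEX}" or "cpu:${NUM_THREADS}"
    return f"{kind}:{value}"


def create_leopard(access_key: str, device: str = "best"):
    # Imported lazily so device_string() can be used without the SDK installed.
    import pvleopard
    return pvleopard.create(access_key=access_key, device=device)
```

For example, device_string("cpu", 4) yields "cpu:4" and device_string("gpu", 1) yields "gpu:1".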
Returns
Leopard: An instance of Leopard Speech-to-Text engine.
Throws
pvleopard.available_devices()
Lists all devices that Leopard can use for inference. Each entry in the list can be passed as the device argument
of the create() factory method or the Leopard constructor.
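One way to use this list is to prefer a GPU when one is reported and otherwise fall back to the CPU. The sketch below assumes entries look like "gpu:0" or "cpu"; the actual strings returned may differ by platform. pick_device() and create_on_preferred_device() are hypothetical helpers, not part of pvleopard.

```python
from typing import Sequence


def pick_device(devices: Sequence[str]) -> str:
    """Return the first GPU entry, else the first CPU entry, else "best".

    Assumes entries look like "gpu:0" or "cpu"; the exact strings
    returned by pvleopard.available_devices() may differ.
    """
    for prefix in ("gpu", "cpu"):
        for device in devices:
            if device == prefix or device.startswith(prefix + ":"):
                return device
    # Nothing recognizable: let the engine choose.
    return "best"


def create_on_preferred_device(access_key: str):
    import pvleopard  # lazy import: the SDK is only needed here
    device = pick_device(pvleopard.available_devices())
    return pvleopard.create(access_key=access_key, device=device)
```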
Parameters
library_path Optional[str] : Absolute path to Leopard's dynamic library. If not set, the default location is used.
Returns
Sequence[str]: List of all available devices that Leopard can use for inference.
Throws
pvleopard.Leopard
Class for the Leopard Speech-to-Text engine.
Leopard can be initialized either with the module-level create() function
or directly with the class __init__() method.
When you are done with the engine, call delete() to release its resources.
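Because delete() should be called even when processing fails, a small context manager can tie the engine's lifetime to a with block. leopard_session() is a hypothetical wrapper, not part of pvleopard; its factory parameter defaults to pvleopard.create and exists only so the wrapper can be tested without the SDK.

```python
from contextlib import contextmanager
from typing import Any, Callable, Iterator, Optional


@contextmanager
def leopard_session(
    access_key: str,
    factory: Optional[Callable[..., Any]] = None,
    **kwargs: Any,
) -> Iterator[Any]:
    """Yield a Leopard instance and always call delete() on exit (hypothetical helper)."""
    if factory is None:
        import pvleopard
        factory = pvleopard.create
    # Extra keyword arguments (e.g. enable_automatic_punctuation) are forwarded to the factory.
    leopard = factory(access_key=access_key, **kwargs)
    try:
        yield leopard
    finally:
        # Release the engine's resources even if the body raised.
        leopard.delete()


# Usage (requires pvleopard and a valid AccessKey):
# with leopard_session("${ACCESS_KEY}", enable_automatic_punctuation=True) as leopard:
#     transcript, words = leopard.process_file("/absolute/path/to/audio.wav")
```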
pvleopard.Leopard.version
The version string of the Leopard library.
pvleopard.Leopard.sample_rate
The audio sample rate that Leopard accepts.
pvleopard.Leopard.__init__()
__init__ method for the Leopard Speech-to-Text engine.
Parameters
access_key str : AccessKey obtained from Picovoice Console.
model_path str : Absolute path to the file containing model parameters.
device str : String representation of the device (e.g., CPU or GPU) to use. If set to "best", the most suitable device is selected automatically. If set to "gpu", the engine uses the first available GPU device. To select a specific GPU device, set this argument to "gpu:${GPU_INDEX}", where ${GPU_INDEX} is the index of the target GPU. If set to "cpu", the engine runs on the CPU with the default number of threads. To specify the number of threads, set this argument to "cpu:${NUM_THREADS}", where ${NUM_THREADS} is the desired number of threads.
library_path str : Absolute path to Leopard's dynamic library.
enable_automatic_punctuation bool : Set to True to enable automatic punctuation insertion.
enable_diarization bool : Set to True to enable speaker diarization, which allows Leopard to differentiate speakers as part of the transcription process. Word metadata will include a speaker_tag to identify unique speakers.
Returns
Leopard: An instance of Leopard Speech-to-Text engine.
Throws
pvleopard.Leopard.delete()
Releases resources acquired by Leopard.
pvleopard.Leopard.Word
Metadata associated with a transcribed word.
word str : Transcribed word.
start_sec float : Start of the word, in seconds.
end_sec float : End of the word, in seconds.
confidence float : Transcription confidence.
speaker_tag int : -1 if diarization was not enabled during initialization; otherwise, a non-negative integer identifying a unique speaker, with 0 reserved for unknown speakers.
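With diarization enabled, consecutive words that share a speaker_tag can be merged into speaker turns. A minimal sketch, assuming only the word, start_sec, end_sec and speaker_tag fields listed above; Turn and speaker_turns() are hypothetical names, not part of pvleopard.

```python
from typing import Iterable, List, NamedTuple


class Turn(NamedTuple):
    speaker_tag: int
    start_sec: float
    end_sec: float
    text: str


def speaker_turns(words: Iterable) -> List[Turn]:
    """Merge consecutive words with the same speaker_tag into turns.

    Accepts any objects exposing word, start_sec, end_sec and speaker_tag,
    such as the Word entries returned by process() and process_file().
    Without diarization every tag is -1, so all words form a single turn.
    """
    turns: List[Turn] = []
    for w in words:
        if turns and turns[-1].speaker_tag == w.speaker_tag:
            # Same speaker as the previous word: extend the current turn.
            last = turns[-1]
            turns[-1] = last._replace(end_sec=w.end_sec, text=f"{last.text} {w.word}")
        else:
            # Speaker changed (or first word): start a new turn.
            turns.append(Turn(w.speaker_tag, w.start_sec, w.end_sec, w.word))
    return turns
```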
pvleopard.Leopard.process()
Processes the given audio data and returns its transcription. The audio must be single-channel, 16-bit linearly encoded, and have a sample rate equal to .sample_rate.
If you wish to process audio with a different sample rate or format, consider using .process_file().
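process() expects raw samples rather than a file. A minimal sketch of loading a WAV file with Python's standard wave module, assuming the file is already single-channel, 16-bit linear PCM at the engine's sample rate; read_pcm() is a hypothetical helper that rejects other inputs instead of converting them.

```python
import array
import sys
import wave
from typing import List


def read_pcm(path: str, sample_rate: int) -> List[int]:
    """Read a mono, 16-bit linear PCM WAV file into a list of ints (hypothetical helper).

    Raises ValueError rather than resampling or downmixing; audio in other
    formats can be passed to process_file() instead.
    """
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1:
            raise ValueError("expected single-channel audio")
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        if wav.getframerate() != sample_rate:
            raise ValueError(f"expected {sample_rate} Hz, got {wav.getframerate()} Hz")
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h")  # signed 16-bit integers (C short on common platforms)
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return samples.tolist()


# Usage (requires a Leopard instance):
# transcript, words = leopard.process(read_pcm("/absolute/path/to/audio.wav", leopard.sample_rate))
```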
Parameters
pcm Sequence[int] : Audio data.
Returns
Tuple[str, Sequence[Word]]: Inferred transcription and sequence of transcribed words and their associated metadata.
Throws
pvleopard.Leopard.process_file()
Processes the given audio file and returns its transcription. Supported formats are 3gp (AMR), FLAC, MP3, MP4/m4a (AAC), Ogg, WAV, and WebM.
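A caller can reject obviously unsupported files before invoking the engine. In the sketch below, the mapping from the formats above to file extensions is an assumption (for example, .m4a for AAC in MP4), the extension check is only a heuristic that does not replace the engine's own error handling, and looks_supported() and transcribe_file() are hypothetical helpers.

```python
import os
from typing import Any, Tuple

# Assumed extensions for the listed formats; not taken from pvleopard itself.
SUPPORTED_EXTENSIONS = frozenset(
    {".3gp", ".flac", ".mp3", ".mp4", ".m4a", ".ogg", ".wav", ".webm"}
)


def looks_supported(audio_path: str) -> bool:
    """Cheap pre-check by file extension (case-insensitive)."""
    return os.path.splitext(audio_path)[1].lower() in SUPPORTED_EXTENSIONS


def transcribe_file(leopard: Any, audio_path: str) -> Tuple[str, Any]:
    # process_file() is documented to take an absolute path.
    audio_path = os.path.abspath(audio_path)
    if not looks_supported(audio_path):
        raise ValueError(f"unsupported audio format: {audio_path}")
    return leopard.process_file(audio_path)
```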
Parameters
audio_path str : Absolute path to the audio file.
Returns
Tuple[str, Sequence[Word]]: Inferred transcription and sequence of transcribed words and their associated metadata.
Throws
pvleopard.list_hardware_devices()
Lists all devices that Leopard can use for inference. Each entry in the list can be passed as the device argument
of the create() factory method or the Leopard constructor.
Internal method; use the higher-level pvleopard.available_devices() instead.
Parameters
library_path str : Absolute path to Leopard's dynamic library.
Returns
Sequence[str]: List of all available devices that Leopard can use for inference.
Throws
pvleopard.LeopardError
Error raised when an error occurs within the Leopard Speech-to-Text engine.
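When transcribing a batch of files, catching LeopardError per file lets a failing input be recorded without aborting the rest. transcribe_many() is a hypothetical helper; its error_type parameter defaults to pvleopard.LeopardError and exists only so the function can be exercised without the SDK.

```python
from typing import Any, Dict, Iterable, Optional, Tuple, Type


def transcribe_many(
    leopard: Any,
    paths: Iterable[str],
    error_type: Optional[Type[BaseException]] = None,
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Transcribe several files, recording engine failures per file instead of aborting."""
    if error_type is None:
        import pvleopard
        error_type = pvleopard.LeopardError
    transcripts: Dict[str, str] = {}
    errors: Dict[str, str] = {}
    for path in paths:
        try:
            transcript, _words = leopard.process_file(path)
        except error_type as exc:
            # Keep the message and move on to the next file.
            errors[path] = str(exc)
        else:
            transcripts[path] = transcript
    return transcripts, errors
```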
Exceptions