Leopard Speech-to-Text
.NET API
API Reference for the .NET Leopard SDK (NuGet)
namespace: Pv
Leopard
Class for the Leopard Speech-to-Text engine.
Leopard.Create()
Leopard constructor.
Parameters
accessKey string : AccessKey obtained from Picovoice Console.
modelPath string : Absolute path to the file containing model parameters (.pv).
device string : String representation of the device (e.g., CPU or GPU) to use. If set to "best", the most suitable device is selected automatically. If set to "gpu", the engine uses the first available GPU device. To select a specific GPU device, set this argument to "gpu:${GPU_INDEX}", where ${GPU_INDEX} is the index of the target GPU. If set to "cpu", the engine runs on the CPU with the default number of threads. To specify the number of threads, set this argument to "cpu:${NUM_THREADS}", where ${NUM_THREADS} is the desired number of threads.
enableAutomaticPunctuation bool : Whether to enable automatic punctuation.
enableDiarization bool : Whether to enable speaker diarization. Set to true to let Leopard differentiate speakers as part of the transcription process; each word's metadata will then include a SpeakerTag identifying its speaker.
Returns
Leopard: An instance of the Leopard Speech-to-Text engine.
Throws
LeopardException: If an error occurs while creating an instance of the Leopard Speech-to-Text engine.
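A minimal sketch of building the device strings described above and calling Leopard.Create() with automatic punctuation and diarization enabled. The helper names CpuDevice, GpuDevice and CreateEngine are hypothetical; the positional argument order follows the parameter list above, and the named arguments assume the SDK's parameter names match that list.

```csharp
using System;
using Pv;

public static class LeopardSetup
{
    // Hypothetical helper: formats the "cpu:${NUM_THREADS}" device string.
    public static string CpuDevice(int numThreads)
    {
        if (numThreads < 1)
            throw new ArgumentOutOfRangeException(nameof(numThreads), "Must be at least 1.");
        return $"cpu:{numThreads}";
    }

    // Hypothetical helper: formats the "gpu:${GPU_INDEX}" device string.
    public static string GpuDevice(int gpuIndex)
    {
        if (gpuIndex < 0)
            throw new ArgumentOutOfRangeException(nameof(gpuIndex), "Must be non-negative.");
        return $"gpu:{gpuIndex}";
    }

    // Creates an engine on 4 CPU threads with punctuation and diarization on.
    // Throws LeopardException if creation fails (e.g., an invalid AccessKey).
    public static Leopard CreateEngine(string accessKey, string modelPath)
    {
        return Leopard.Create(
            accessKey,
            modelPath,
            CpuDevice(4),
            enableAutomaticPunctuation: true,
            enableDiarization: true);
    }
}
```

Because the engine holds native resources, wrapping it in a using statement (using Leopard leopard = LeopardSetup.CreateEngine(accessKey, modelPath);) should release them promptly, assuming Leopard implements IDisposable.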
Leopard.Process()
Processes the given audio data and returns its transcription. The incoming audio must have a sample rate equal
to .SampleRate, be 16-bit linearly-encoded, and be single-channel (mono). To process audio in a different
sample rate or format, consider using .ProcessFile().
Parameters
pcm short[] : Audio data.
Returns
LeopardTranscript: object which contains the transcription results of the engine.
Throws
LeopardException: If there is an error while processing the audio data.
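Process() takes raw samples rather than a file, so audio captured elsewhere has to be converted to short[] first. The sketch below assumes the input bytes are headerless, little-endian, 16-bit mono PCM already at leopard.SampleRate; it performs no resampling or channel mixing. PcmHelpers, BytesToPcm and Transcribe are hypothetical names.

```csharp
using System;
using Pv;

public static class PcmHelpers
{
    // Converts headerless little-endian 16-bit PCM bytes into samples.
    public static short[] BytesToPcm(byte[] raw)
    {
        if (raw.Length % 2 != 0)
            throw new ArgumentException("16-bit PCM needs an even number of bytes.", nameof(raw));

        short[] pcm = new short[raw.Length / 2];
        for (int i = 0; i < pcm.Length; i++)
        {
            // Low byte first, then high byte; unchecked keeps the sign wrap-around.
            pcm[i] = unchecked((short)(raw[2 * i] | (raw[2 * i + 1] << 8)));
        }
        return pcm;
    }

    // Assumes raw is mono and already at leopard.SampleRate.
    public static string Transcribe(Leopard leopard, byte[] raw)
    {
        short[] pcm = BytesToPcm(raw);
        LeopardTranscript transcript = leopard.Process(pcm);
        return transcript.TranscriptString;
    }
}
```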
Leopard.ProcessFile()
Processes a given audio file and returns its transcription.
Parameters
audioPath string : Absolute path to the audio file. The supported audio file formats are: 3gp (AMR), FLAC, MP3, MP4/m4a (AAC), Ogg, WAV and WebM.
Returns
LeopardTranscript: object which contains the transcription results of the engine.
Throws
LeopardException: If there is an error while processing the audio file.
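A sketch that resolves the path to an absolute one and rejects extensions outside the formats listed above before calling ProcessFile(). The extension check is a hypothetical convenience (Leopard does its own decoding, so a file's name does not guarantee it will be accepted); FileTranscriber, IsSupportedExtension and TranscribeFile are illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Pv;

public static class FileTranscriber
{
    // Extensions matching the formats listed for ProcessFile().
    private static readonly HashSet<string> SupportedExtensions =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        { ".3gp", ".flac", ".mp3", ".mp4", ".m4a", ".ogg", ".wav", ".webm" };

    public static bool IsSupportedExtension(string audioPath) =>
        SupportedExtensions.Contains(Path.GetExtension(audioPath));

    public static string TranscribeFile(Leopard leopard, string audioPath)
    {
        // ProcessFile() expects an absolute path.
        string fullPath = Path.GetFullPath(audioPath);
        if (!IsSupportedExtension(fullPath))
            throw new ArgumentException($"Unsupported audio format: {fullPath}", nameof(audioPath));

        LeopardTranscript transcript = leopard.ProcessFile(fullPath);
        return transcript.TranscriptString;
    }
}
```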
Leopard.SampleRate
Getter for audio sample rate accepted by Picovoice.
Returns
int: Audio sample rate accepted by Picovoice.
Leopard.Version
Getter for version.
Returns
string: Current Leopard version.
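The two getters are typically read once after construction, for logging or to validate input. The sketch below prints both and uses SampleRate to report the duration of a PCM buffer. EngineInfo, DurationSeconds and PrintInfo are hypothetical names, and Version and SampleRate are assumed to be instance properties, as the getter headings suggest.

```csharp
using System;
using Pv;

public static class EngineInfo
{
    // Duration in seconds of a mono buffer with sampleCount samples.
    public static double DurationSeconds(int sampleCount, int sampleRate)
    {
        if (sampleRate <= 0)
            throw new ArgumentOutOfRangeException(nameof(sampleRate));
        return sampleCount / (double)sampleRate;
    }

    public static void PrintInfo(Leopard leopard, short[] pcm)
    {
        Console.WriteLine($"Leopard version: {leopard.Version}");
        Console.WriteLine($"Expected sample rate: {leopard.SampleRate} Hz");
        Console.WriteLine($"Buffer duration: {DurationSeconds(pcm.Length, leopard.SampleRate):F2} s");
    }
}
```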
Leopard.GetAvailableDevices()
Retrieves a list of hardware devices that can be specified when constructing Leopard.
Returns
string[]: An array of available hardware devices.
Throws
LeopardException: If an error occurs while retrieving the hardware devices.
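A sketch that queries the available devices and prefers a GPU when one is listed, falling back to the CPU. It assumes the returned strings use the same "gpu..." and "cpu" forms accepted by the device argument of Leopard.Create(), and that the flags after device are optional; DeviceSelection, PreferGpu and CreateOnPreferredDevice are hypothetical names.

```csharp
using System;
using System.Linq;
using Pv;

public static class DeviceSelection
{
    // Returns the first entry starting with "gpu", or "cpu" if there is none.
    public static string PreferGpu(string[] devices)
    {
        var gpu = devices.FirstOrDefault(
            d => d.StartsWith("gpu", StringComparison.OrdinalIgnoreCase));
        return gpu ?? "cpu";
    }

    public static Leopard CreateOnPreferredDevice(string accessKey, string modelPath)
    {
        string[] devices = Leopard.GetAvailableDevices();
        Console.WriteLine($"Available devices: {string.Join(", ", devices)}");
        return Leopard.Create(accessKey, modelPath, PreferGpu(devices));
    }
}
```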
LeopardTranscript
Class that contains transcription results returned from Leopard.Process()
and Leopard.ProcessFile().
Parameters
transcriptString String : Inferred transcription.
wordArray LeopardWord[] : Transcribed words and their associated metadata.
LeopardTranscript.TranscriptString
Getter for the inferred transcription.
Returns
String: Inferred transcription.
LeopardTranscript.WordArray
Getter for transcribed words and their associated metadata.
Returns
LeopardWord[]: Transcribed words and their associated metadata.
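A sketch that prints the full transcript followed by one timed line per entry in WordArray. TranscriptPrinter, FormatTimestamp and Print are hypothetical names, and the timestamp format assumes offsets under 24 hours.

```csharp
using System;
using System.Globalization;
using Pv;

public static class TranscriptPrinter
{
    // Formats an offset in seconds as hh:mm:ss.fff (assumes less than 24 h).
    public static string FormatTimestamp(float seconds) =>
        TimeSpan.FromSeconds((double)seconds)
            .ToString(@"hh\:mm\:ss\.fff", CultureInfo.InvariantCulture);

    public static void Print(LeopardTranscript transcript)
    {
        Console.WriteLine(transcript.TranscriptString);
        foreach (LeopardWord word in transcript.WordArray)
        {
            Console.WriteLine(
                $"{FormatTimestamp(word.StartSec)} --> {FormatTimestamp(word.EndSec)}  {word.Word}");
        }
    }
}
```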
LeopardWord
Class for storing word metadata.
Parameters
word String : Transcribed word.
confidence float : Transcription confidence. It is a number within [0, 1].
startSec float : Start of word in seconds.
endSec float : End of word in seconds.
speakerTag int : The speaker tag is -1 if diarization is not enabled during initialization; otherwise, it is a non-negative integer identifying unique speakers, with 0 reserved for unknown speakers.
LeopardWord.Word
Getter for the transcribed word.
Returns
String: Transcribed word.
LeopardWord.Confidence
Getter for the transcription confidence.
Returns
float: Transcription confidence. It is a number within [0, 1].
LeopardWord.StartSec
Getter for the start of word in seconds.
Returns
float: Start of word in seconds.
LeopardWord.EndSec
Getter for the end of word in seconds.
Returns
float: End of word in seconds.
LeopardWord.SpeakerTag
Getter for the speaker tag.
Returns
int: Speaker tag identifying the speaker of the word (-1 if diarization is not enabled).
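With diarization enabled, consecutive words from the same speaker share a speaker tag, so merging runs of equal SpeakerTag values yields readable speaker turns. The sketch below groups plain (tag, word) pairs so the logic does not depend on how LeopardWord is constructed; SpeakerTurns, Group and PrintTurns are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Pv;

public static class SpeakerTurns
{
    // Merges consecutive words that have the same speaker tag into one turn.
    public static List<(int SpeakerTag, string Text)> Group(
        IEnumerable<(int SpeakerTag, string Word)> words)
    {
        var turns = new List<(int SpeakerTag, string Text)>();
        foreach (var (tag, word) in words)
        {
            if (turns.Count > 0 && turns[turns.Count - 1].SpeakerTag == tag)
            {
                var last = turns[turns.Count - 1];
                turns[turns.Count - 1] = (tag, last.Text + " " + word);
            }
            else
            {
                turns.Add((tag, word));
            }
        }
        return turns;
    }

    public static void PrintTurns(LeopardTranscript transcript)
    {
        var words = transcript.WordArray.Select(w => (w.SpeakerTag, w.Word));
        foreach (var (tag, text) in Group(words))
        {
            // -1: diarization disabled; 0: unknown speaker.
            string label = tag switch
            {
                -1 => "(no diarization)",
                0 => "Unknown speaker",
                _ => $"Speaker {tag}",
            };
            Console.WriteLine($"{label}: {text}");
        }
    }
}
```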
LeopardException
Exception thrown if an error occurs within the Leopard Speech-to-Text engine.
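A sketch that keeps a LeopardException (raised, for example, when creating the engine fails or an audio file cannot be processed) from crashing the caller, while letting other exception types propagate. It assumes LeopardException derives from System.Exception and that Leopard implements IDisposable; SafeTranscription, Run and TranscribeFile are hypothetical names.

```csharp
using System;
using Pv;

public static class SafeTranscription
{
    // Runs a transcription step; maps LeopardException to exit code 1.
    public static int Run(Func<string> transcribe)
    {
        try
        {
            Console.WriteLine(transcribe());
            return 0;
        }
        catch (LeopardException e)
        {
            Console.Error.WriteLine($"Leopard error: {e.Message}");
            return 1;
        }
    }

    public static int TranscribeFile(string accessKey, string modelPath, string audioPath)
    {
        // Both Create() and ProcessFile() run inside Run(), so either failure is caught.
        return Run(() =>
        {
            using (Leopard leopard = Leopard.Create(accessKey, modelPath))
            {
                return leopard.ProcessFile(audioPath).TranscriptString;
            }
        });
    }
}
```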
Exceptions: