Leopard Speech-to-Text
Android API
API Reference for the Android Leopard SDK (leopard-android)
package: ai.picovoice.leopard
Leopard
Class for the Leopard Speech-to-Text engine.
Leopard must be initialized using the Leopard.Builder class. When you are done using Leopard, release its resources by calling delete().
Leopard.delete()
Releases resources acquired by Leopard.
Leopard.getSampleRate()
Getter for required audio sample rate for PCM data.
Returns
int: Required audio sample rate for PCM data.
Leopard.getVersion()
Getter for version.
Returns
String: Current Leopard version.
Leopard.process()
Processes given audio data and returns its transcription. The incoming audio needs to have a sample rate equal to .getSampleRate() and be 16-bit linearly-encoded.
Furthermore, Leopard operates on single-channel audio. To process audio with a different sample rate or format, consider using .processFile().
Parameters
pcm short[] : A frame of audio samples.
Returns
LeopardTranscript: Inferred transcription and word metadata.
Throws
LeopardException: If there is an error while processing the audio frame.
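The input requirements above (16-bit linear PCM, single channel) often mean raw audio bytes need some preparation before they can be passed to process(). The following is a minimal sketch of that preparation, not part of the Leopard SDK. It assumes the raw bytes are little-endian and that stereo input is interleaved left/right; the class PcmPrep and its method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of the Leopard SDK.
final class PcmPrep {
    private PcmPrep() {}

    /** Decodes 16-bit PCM bytes into samples. Assumes little-endian byte order. */
    static short[] bytesToShorts(byte[] raw) {
        if (raw.length % 2 != 0) {
            throw new IllegalArgumentException("16-bit PCM needs an even number of bytes");
        }
        short[] samples = new short[raw.length / 2];
        ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer().get(samples);
        return samples;
    }

    /** Averages interleaved left/right samples into a single channel. */
    static short[] stereoToMono(short[] interleaved) {
        if (interleaved.length % 2 != 0) {
            throw new IllegalArgumentException("Interleaved stereo needs an even number of samples");
        }
        short[] mono = new short[interleaved.length / 2];
        for (int i = 0; i < mono.length; i++) {
            // Sum in int to avoid overflow before halving.
            int sum = interleaved[2 * i] + interleaved[2 * i + 1];
            mono[i] = (short) (sum / 2);
        }
        return mono;
    }
}
```

The resulting short[] could then be passed to process(), provided the audio was recorded at the rate reported by getSampleRate(). This sketch does not resample.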
Leopard.processFile()
Processes a given audio file and returns its transcription.
Parameters
path String : Absolute path to the audio file on device. The supported audio file formats are: 3gp (AMR), FLAC, MP3, MP4/m4a (AAC), Ogg, WAV and WebM.
Returns
LeopardTranscript: Inferred transcription and word metadata.
Throws
LeopardException: If there is an error while processing the audio file.
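Because processFile() throws a LeopardException when it cannot handle a file, an app may want a cheap pre-check before calling it. The sketch below is one possible approach and is not part of the Leopard SDK. It only inspects the file extension, and the mapping from the formats listed above to extensions is an assumption; the class AudioFileCheck is hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of the Leopard SDK.
final class AudioFileCheck {
    // Assumed extensions for the formats listed in the processFile() parameters.
    private static final Set<String> EXTENSIONS = new HashSet<>(
            Arrays.asList("3gp", "flac", "mp3", "mp4", "m4a", "ogg", "wav", "webm"));

    private AudioFileCheck() {}

    /** Returns true if the file name ends in one of the assumed extensions. */
    static boolean hasSupportedExtension(String path) {
        int dot = path.lastIndexOf('.');
        int sep = Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\'));
        // No dot, dot belongs to a directory name, or nothing after the dot.
        if (dot < 0 || dot < sep || dot == path.length() - 1) {
            return false;
        }
        String ext = path.substring(dot + 1).toLowerCase(Locale.ROOT);
        return EXTENSIONS.contains(ext);
    }
}
```

An extension check is only a heuristic: a file can carry a matching extension and still fail to decode, so the LeopardException from processFile() still needs to be handled.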
Leopard.Builder
Builder for creating an instance of Leopard, using default values for any arguments that are not set.
Leopard.Builder.build()
Creates an instance of Leopard Speech-to-Text engine.
Parameters
context Context : The Android app context.
Returns
Leopard: An instance of Leopard Speech-to-Text engine.
Throws
LeopardException: If an error occurs while creating an instance of Leopard Speech-to-Text engine.
Leopard.Builder.setAccessKey()
Sets the AccessKey of the builder.
Parameters
accessKey String : AccessKey obtained from Picovoice Console.
Returns
Leopard.Builder: Modified Leopard.Builder object.
Leopard.Builder.setModelPath()
Sets the model path of the builder.
Parameters
modelPath String : Path to the file containing model parameters (.pv). Can be either a path relative to the project's assets folder or an absolute path to the file on device.
Returns
Leopard.Builder: Modified Leopard.Builder object.
Leopard.Builder.setEnableAutomaticPunctuation()
Setter for enabling automatic punctuation insertion.
Parameters
enableAutomaticPunctuation boolean : Set to true to enable automatic punctuation insertion.
Returns
Leopard.Builder: Modified Leopard.Builder object.
Leopard.Builder.setEnableDiarization()
Setter for enabling speaker diarization.
Parameters
enableDiarization boolean : Set to true to enable speaker diarization, which allows Leopard to differentiate speakers as part of the transcription process. Word metadata will include a speakerTag to identify unique speakers.
Returns
Leopard.Builder: Modified Leopard.Builder object.
LeopardException
Exception thrown if an error occurs within Leopard Speech-to-Text engine.
Exceptions:
LeopardTranscript
Class that contains transcription results returned from Leopard.process() and Leopard.processFile().
Parameters
transcriptString String : Inferred transcription.
wordArray LeopardTranscript.Word[] : Transcribed words and their associated metadata.
LeopardTranscript.getTranscriptString()
Getter for the inferred transcription.
Returns
String: Inferred transcription.
LeopardTranscript.getWordArray()
Getter for transcribed words and their associated metadata.
Returns
LeopardTranscript.Word[]: Transcribed words and their associated metadata.
LeopardTranscript.Word
Class for storing word metadata from a LeopardTranscript.
Parameters
word String : Transcribed word.
confidence float : Transcription confidence. It is a number within [0, 1].
startSec float : Start of word in seconds.
endSec float : End of word in seconds.
speakerTag int : The speaker tag is -1 if diarization is not enabled during initialization; otherwise, it is a non-negative integer identifying unique speakers, with 0 reserved for unknown speakers.
LeopardTranscript.Word.getWord()
Getter for the transcribed word.
Returns
String: Transcribed word.
LeopardTranscript.Word.getConfidence()
Getter for the transcription confidence.
Returns
float: Transcription confidence. It is a number within [0, 1].
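One common use of the confidence value is flagging words that may need manual review. The sketch below illustrates the idea with a hypothetical stand-in class, WordInfo, that mirrors the getWord() and getConfidence() getters so the example runs without the SDK. The class ConfidenceFilter and the threshold are likewise assumptions, not part of Leopard.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the Leopard SDK.
final class ConfidenceFilter {
    private ConfidenceFilter() {}

    /** Stand-in mirroring two getters of LeopardTranscript.Word. */
    static final class WordInfo {
        private final String word;
        private final float confidence;

        WordInfo(String word, float confidence) {
            this.word = word;
            this.confidence = confidence;
        }

        String getWord() { return word; }
        float getConfidence() { return confidence; }
    }

    /** Returns the words whose confidence is strictly below the threshold, in order. */
    static List<String> lowConfidenceWords(List<WordInfo> words, float threshold) {
        List<String> flagged = new ArrayList<>();
        for (WordInfo w : words) {
            if (w.getConfidence() < threshold) {
                flagged.add(w.getWord());
            }
        }
        return flagged;
    }
}
```

With real results, the same loop could iterate over the array returned by getWordArray(). A suitable threshold depends on the application and would need to be chosen empirically.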
LeopardTranscript.Word.getStartSec()
Getter for the start of word in seconds.
Returns
float: Start of word in seconds.
LeopardTranscript.Word.getEndSec()
Getter for the end of word in seconds.
Returns
float: End of word in seconds.
LeopardTranscript.Word.getSpeakerTag()
Getter for the speaker tag.
Returns
int: Speaker tag associated with speaker.
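When diarization is enabled, the speaker tag can be used to split a transcript into speaker turns. The following is a minimal sketch of that grouping. It uses a hypothetical stand-in class, TaggedWord, in place of LeopardTranscript.Word so that it runs without the SDK; the class SpeakerTurns and the label strings are assumptions made for illustration. The tag semantics (-1 when diarization is disabled, 0 for unknown speakers) follow the speakerTag description above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the Leopard SDK.
final class SpeakerTurns {
    private SpeakerTurns() {}

    /** Stand-in carrying the word text and speaker tag of a transcribed word. */
    static final class TaggedWord {
        final String word;
        final int speakerTag;

        TaggedWord(String word, int speakerTag) {
            this.word = word;
            this.speakerTag = speakerTag;
        }
    }

    /** Joins consecutive words with the same speaker tag into one labeled line. */
    static List<String> toTurns(List<TaggedWord> words) {
        List<String> turns = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int currentTag = Integer.MIN_VALUE; // sentinel: no turn started yet
        for (TaggedWord w : words) {
            if (w.speakerTag != currentTag) {
                if (current.length() > 0) {
                    turns.add(current.toString());
                }
                current.setLength(0);
                current.append(label(w.speakerTag)).append(':');
                currentTag = w.speakerTag;
            }
            current.append(' ').append(w.word);
        }
        if (current.length() > 0) {
            turns.add(current.toString());
        }
        return turns;
    }

    /** Maps a speaker tag to a display label. */
    static String label(int tag) {
        if (tag == -1) {
            return "Speaker"; // diarization was not enabled
        }
        if (tag == 0) {
            return "Unknown"; // tag reserved for unknown speakers
        }
        return "Speaker " + tag;
    }
}
```

With real results, the word and tag would come from getWord() and getSpeakerTag() on each element of getWordArray().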