Leopard Speech-to-Text
Android API
API Reference for the Android Leopard SDK (leopard-android)
package: ai.picovoice.leopard
Leopard
Class for the Leopard Speech-to-Text engine.
Leopard must be initialized using the Leopard.Builder class. Call delete() to release resources when you are done with the instance.
Leopard.delete()
Releases resources acquired by Leopard.
Leopard.getSampleRate()
Getter for required audio sample rate for PCM data.
Returns
int : Required audio sample rate for PCM data.
Leopard.getVersion()
Getter for version.
Returns
String : Current Leopard version.
Leopard.process()
Processes the given audio data and returns its transcription. The incoming audio must have a sample rate equal to .getSampleRate() and be 16-bit linearly-encoded. Leopard operates on single-channel audio. To process audio with a different sample rate or format, consider using .processFile().
Parameters
pcm
short[] : A frame of audio samples.
Returns
LeopardTranscript : Inferred transcription and word metadata.
Throws
LeopardException : If there is an error while processing the audio frame.
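Because process() takes a short[] of 16-bit mono samples, audio that arrives as raw bytes has to be repacked first. The following is a minimal sketch of that step, assuming the bytes are headerless little-endian PCM. PcmConverter and its methods are hypothetical helpers for illustration and are not part of the Leopard SDK.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration; not part of the Leopard SDK.
final class PcmConverter {
    private PcmConverter() {}

    // Repacks raw 16-bit PCM bytes into samples.
    // Assumption: the bytes are little-endian, mono, and carry no file header.
    static short[] toShorts(byte[] littleEndianPcm) {
        if (littleEndianPcm.length % 2 != 0) {
            throw new IllegalArgumentException("16-bit PCM needs an even number of bytes");
        }
        short[] samples = new short[littleEndianPcm.length / 2];
        ByteBuffer.wrap(littleEndianPcm)
                .order(ByteOrder.LITTLE_ENDIAN)
                .asShortBuffer()
                .get(samples);
        return samples;
    }

    // Duration in seconds of a mono buffer at the given sample rate.
    static double durationSec(int sampleCount, int sampleRate) {
        if (sampleRate <= 0) {
            throw new IllegalArgumentException("sampleRate must be positive");
        }
        return (double) sampleCount / sampleRate;
    }
}
```

The resulting array could then be handed to process(), provided the audio was captured at the rate reported by getSampleRate().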
Leopard.processFile()
Processes a given audio file and returns its transcription.
Parameters
path
String : Absolute path to the audio file on device. The supported audio file formats are: 3gp (AMR), FLAC, MP3, MP4/m4a (AAC), Ogg, WAV, and WebM.
Returns
LeopardTranscript : Inferred transcription and word metadata.
Throws
LeopardException : If there is an error while processing the audio file.
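An app may want to reject obviously unsupported files before calling processFile(). The sketch below does this with a file-extension check. The mapping from the formats listed above to extensions is an assumption, the check says nothing about the actual file contents, and AudioFileCheck is a hypothetical helper rather than part of the SDK.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration; not part of the Leopard SDK.
final class AudioFileCheck {
    // Assumption: extensions commonly used for the documented formats.
    private static final Set<String> EXTENSIONS = new HashSet<>(
            Arrays.asList("3gp", "flac", "mp3", "mp4", "m4a", "ogg", "wav", "webm"));

    private AudioFileCheck() {}

    // Returns true if the file name ends in one of the extensions above.
    // This is a heuristic on the name only; the file is never opened.
    static boolean hasSupportedExtension(String path) {
        String name = path.substring(path.lastIndexOf('/') + 1);
        int dot = name.lastIndexOf('.');
        if (dot < 0 || dot == name.length() - 1) {
            return false;
        }
        return EXTENSIONS.contains(name.substring(dot + 1).toLowerCase(Locale.ROOT));
    }
}
```

A failed check is only a hint. A LeopardException thrown by processFile() still needs to be handled for files that pass it.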
Leopard.Builder
Builder for creating an instance of Leopard with a mixture of default and custom arguments.
Leopard.Builder.build()
Creates an instance of the Leopard Speech-to-Text engine.
Parameters
context
Context : The Android app context.
Returns
Leopard : An instance of the Leopard Speech-to-Text engine.
Throws
LeopardException : If an error occurs while creating an instance of the Leopard Speech-to-Text engine.
Leopard.Builder.setAccessKey()
Sets the AccessKey of the builder.
Parameters
accessKey
String : AccessKey obtained from Picovoice Console.
Returns
Leopard.Builder : Modified Leopard.Builder object.
Leopard.Builder.setModelPath()
Sets the model path of the builder.
Parameters
modelPath
String : Path to the file containing model parameters (.pv). Can be either a path relative to the project's assets folder or an absolute path to the file on device.
Returns
Leopard.Builder : Modified Leopard.Builder object.
Leopard.Builder.setEnableAutomaticPunctuation()
Setter for enabling automatic punctuation insertion.
Parameters
enableAutomaticPunctuation
boolean : Set to true to enable automatic punctuation insertion.
Returns
Leopard.Builder : Modified Leopard.Builder object.
Leopard.Builder.setEnableDiarization()
Setter for enabling speaker diarization.
Parameters
enableDiarization
boolean : Set to true to enable speaker diarization, which allows Leopard to differentiate speakers as part of the transcription process. Word metadata will include a speakerTag to identify unique speakers.
Returns
Leopard.Builder : Modified Leopard.Builder object.
LeopardException
Exception thrown if an error occurs within the Leopard Speech-to-Text engine.
Exceptions:
LeopardTranscript
Class that contains transcription results returned from Leopard.process() and Leopard.processFile().
Parameters
transcriptString
String : Inferred transcription.
wordArray
LeopardTranscript.Word[] : Transcribed words and their associated metadata.
LeopardTranscript.getTranscriptString()
Getter for the inferred transcription.
Returns
String : Inferred transcription.
LeopardTranscript.getWordArray()
Getter for transcribed words and their associated metadata.
Returns
LeopardTranscript.Word[] : Transcribed words and their associated metadata.
LeopardTranscript.Word
Class for storing word metadata from a LeopardTranscript.
Parameters
word
String : Transcribed word.
confidence
float : Transcription confidence. It is a number within [0, 1].
startSec
float : Start of word in seconds.
endSec
float : End of word in seconds.
speakerTag
int : The speaker tag is -1 if diarization is not enabled during initialization; otherwise, it's a non-negative integer identifying unique speakers, with 0 reserved for unknown speakers.
LeopardTranscript.Word.getWord()
Getter for the transcribed word.
Returns
String : Transcribed word.
LeopardTranscript.Word.getConfidence()
Getter for the transcription confidence.
Returns
float : Transcription confidence. It is a number within [0, 1].
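One possible use of the per-word confidence is to flag words a reviewer should double-check. The sketch below assumes the words and confidences have already been copied out of the word array (via getWord() and getConfidence()) into parallel arrays. ConfidenceMarker and the threshold value are hypothetical choices for illustration.

```java
// Hypothetical helper for illustration; not part of the Leopard SDK.
final class ConfidenceMarker {
    private ConfidenceMarker() {}

    // Joins the words with spaces, wrapping any word whose confidence is
    // below the threshold in square brackets.
    static String annotate(String[] words, float[] confidences, float threshold) {
        if (words.length != confidences.length) {
            throw new IllegalArgumentException("words and confidences must have equal length");
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < words.length; i++) {
            if (i > 0) {
                out.append(' ');
            }
            if (confidences[i] < threshold) {
                out.append('[').append(words[i]).append(']');
            } else {
                out.append(words[i]);
            }
        }
        return out.toString();
    }
}
```

A suitable threshold depends on the application. The value 0.5f used when trying this out is arbitrary and not a recommendation from the SDK.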
LeopardTranscript.Word.getStartSec()
Getter for the start of word in seconds.
Returns
float : Start of word in seconds.
LeopardTranscript.Word.getEndSec()
Getter for the end of word in seconds.
Returns
float : End of word in seconds.
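The start and end times are plain seconds, so they can be turned into subtitle-style timestamps with a little arithmetic. The sketch below formats values such as those returned by getStartSec() and getEndSec() in the SubRip (SRT) HH:MM:SS,mmm layout. WordTiming is a hypothetical helper, and SRT is just one possible target format.

```java
import java.util.Locale;

// Hypothetical helper for illustration; not part of the Leopard SDK.
final class WordTiming {
    private WordTiming() {}

    // Formats a time in seconds as HH:MM:SS,mmm, rounded to the nearest millisecond.
    static String srtTimestamp(float seconds) {
        if (seconds < 0) {
            throw new IllegalArgumentException("seconds must be non-negative");
        }
        long totalMs = Math.round(seconds * 1000.0);
        long ms = totalMs % 1000;
        long s = (totalMs / 1000) % 60;
        long m = (totalMs / 60_000) % 60;
        long h = totalMs / 3_600_000;
        return String.format(Locale.ROOT, "%02d:%02d:%02d,%03d", h, m, s, ms);
    }

    // Builds the "start --> end" line used by SRT cues.
    static String cueRange(float startSec, float endSec) {
        return srtTimestamp(startSec) + " --> " + srtTimestamp(endSec);
    }
}
```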
LeopardTranscript.Word.getSpeakerTag()
Getter for the speaker tag.
Returns
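int : Speaker tag of the word's speaker. It is -1 if diarization was not enabled during initialization; otherwise it is a non-negative integer, with 0 reserved for unknown speakers.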
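With diarization enabled, consecutive words that share a speaker tag can be merged into speaker turns. The sketch below assumes the words and tags were copied out of the word array (via getWord() and getSpeakerTag()) into parallel arrays. SpeakerTurns and the label strings are hypothetical; only the meaning of the tag values (-1, 0, positive) comes from the description above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the Leopard SDK.
final class SpeakerTurns {
    private SpeakerTurns() {}

    // Maps a speaker tag to a display label (label text is an arbitrary choice).
    static String label(int tag) {
        if (tag < 0) {
            return "No diarization"; // -1: diarization was not enabled
        }
        if (tag == 0) {
            return "Unknown"; // 0: reserved for unknown speakers
        }
        return "Speaker " + tag;
    }

    // Merges runs of consecutive words with the same tag into "label: words" lines.
    static List<String> group(String[] words, int[] speakerTags) {
        if (words.length != speakerTags.length) {
            throw new IllegalArgumentException("words and speakerTags must have equal length");
        }
        List<String> turns = new ArrayList<>();
        StringBuilder current = null;
        int currentTag = 0;
        for (int i = 0; i < words.length; i++) {
            if (current == null || speakerTags[i] != currentTag) {
                if (current != null) {
                    turns.add(current.toString());
                }
                currentTag = speakerTags[i];
                current = new StringBuilder(label(currentTag)).append(": ").append(words[i]);
            } else {
                current.append(' ').append(words[i]);
            }
        }
        if (current != null) {
            turns.add(current.toString());
        }
        return turns;
    }
}
```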