Leopard Speech-to-Text
iOS API
API Reference for the iOS Leopard SDK (Cocoapod)
Leopard
Class for the Leopard Speech-to-Text engine.
When you are done with the engine, release its resources by calling the delete() function.
Leopard.init()
Initializer for the Leopard Speech-to-Text engine. Two overloads are available: one takes the model as a file path (modelPath), the other as a URL (modelURL).
Parameters
- accessKey String : The AccessKey obtained from Picovoice Console.
- modelPath String : Absolute path to file containing model parameters (.pv).
- enableAutomaticPunctuation Bool : Set to true to enable automatic punctuation insertion.
- enableDiarization Bool : Set to true to enable speaker diarization, which allows Leopard to differentiate speakers as part of the transcription process. Word metadata will include a speaker_tag to identify unique speakers.
Throws
LeopardError: If an error occurs while creating an instance of the Leopard Speech-to-Text engine.
Parameters
- accessKey String : The AccessKey obtained from Picovoice Console.
- modelURL URL : URL to file containing model parameters (.pv).
- enableAutomaticPunctuation Bool : Set to true to enable automatic punctuation insertion.
- enableDiarization Bool : Set to true to enable speaker diarization, which allows Leopard to differentiate speakers as part of the transcription process. Word metadata will include a speaker_tag to identify unique speakers.
Throws
LeopardError: If an error occurs while creating an instance of the Leopard Speech-to-Text engine.
Leopard.delete()
Releases resources acquired by the Leopard engine.
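One way to make sure delete() is always called is Swift's defer statement, which runs on every exit path of the enclosing scope. The sketch below illustrates only that pattern. StandInEngine, StandInError and transcribeOnce are hypothetical names introduced so the example runs without the SDK, and they are not part of the Leopard API.

```swift
// Hypothetical stand-in for the engine so the sketch runs without the SDK.
// Only the method names process() and delete() mirror this reference.
enum StandInError: Error {
    case emptyAudio
}

final class StandInEngine {
    private(set) var isDeleted = false

    func process(_ pcm: [Int16]) throws -> String {
        if pcm.isEmpty { throw StandInError.emptyAudio }
        return "\(pcm.count) samples processed"
    }

    func delete() {
        isDeleted = true
    }
}

// Runs one transcription and always releases the engine afterwards.
func transcribeOnce(with engine: StandInEngine, pcm: [Int16]) -> String? {
    defer { engine.delete() }  // runs whether process() returns or throws
    do {
        return try engine.process(pcm)
    } catch {
        return nil
    }
}
```

With a real engine instance the same shape should apply: create it, register the defer, then call process() or processFile().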
Leopard.process()
Processes given audio data with the Leopard Speech-to-Text engine.
Parameters
- pcm [Int16] : The incoming audio needs to have a sample rate equal to Leopard.sampleRate and be 16-bit linearly-encoded. Furthermore, Leopard operates on single-channel audio.
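Audio captured on iOS often arrives as normalized floating-point samples or as interleaved stereo, neither of which matches the 16-bit, single-channel input described above. The helpers below are a minimal sketch of the two conversions. Both function names are hypothetical, and resampling to Leopard.sampleRate is not covered.

```swift
// Converts normalized Float samples (expected range -1.0...1.0) to 16-bit linear PCM.
// Values outside the range are clamped before scaling.
func floatToInt16PCM(_ samples: [Float]) -> [Int16] {
    return samples.map { (sample: Float) -> Int16 in
        let clamped = max(-1.0, min(1.0, sample))
        return Int16((clamped * Float(Int16.max)).rounded())
    }
}

// Downmixes interleaved stereo (L, R, L, R, ...) to mono by averaging each pair.
// A trailing unpaired sample, if any, is dropped.
func downmixStereoToMono(_ interleaved: [Int16]) -> [Int16] {
    var mono: [Int16] = []
    mono.reserveCapacity(interleaved.count / 2)
    var i = 0
    while i + 1 < interleaved.count {
        // Widen to Int32 so the sum cannot overflow.
        let sum = Int32(interleaved[i]) + Int32(interleaved[i + 1])
        mono.append(Int16(sum / 2))
        i += 2
    }
    return mono
}
```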
Returns
- String, [LeopardWord] : Inferred transcription and sequence of transcribed words with their associated metadata.
Throws
LeopardError: If there is an error while processing the audio data.
Leopard.processFile()
Processes a given audio file with the Leopard Speech-to-Text engine.
Parameters
- audioPath String : Absolute path to the audio file. The supported formats are: 3gp (AMR), FLAC, MP3, MP4/m4a (AAC), Ogg, WAV and WebM.
Returns
- String, [LeopardWord] : Inferred transcription and sequence of transcribed words with their associated metadata.
Throws
LeopardError: If there is an error while processing the audio file.
Leopard.processFile()
Processes a given audio file with the Leopard Speech-to-Text engine.
Parameters
- audioURL URL : URL of the audio file. The supported formats are: 3gp (AMR), FLAC, MP3, MP4/m4a (AAC), Ogg, WAV and WebM.
Returns
- String, [LeopardWord] : Inferred transcription and sequence of transcribed words with their associated metadata.
Throws
LeopardError: If there is an error while processing the audio file.
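Before handing a file to processFile(), an app may want a cheap pre-check that rejects obviously unsupported files. The sketch below compares the file extension against the formats listed above. The mapping from format names to extensions is an assumption made for illustration, and the constant and function names are hypothetical. An extension match does not guarantee the file will decode.

```swift
import Foundation

// Assumed mapping from the documented formats to common file extensions.
let assumedSupportedExtensions: Set<String> = [
    "3gp", "flac", "mp3", "mp4", "m4a", "ogg", "wav", "webm"
]

// Returns true if the URL's extension (case-insensitive) is in the assumed set.
func hasSupportedExtension(_ audioURL: URL) -> Bool {
    return assumedSupportedExtensions.contains(audioURL.pathExtension.lowercased())
}
```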
Leopard.sampleRate
Audio sample rate accepted by Leopard.
Leopard.version
Current Leopard version.
LeopardError
Error thrown if an error occurs within the Leopard Speech-to-Text engine.
LeopardWord
Struct for storing word metadata returned from the Leopard engine.
LeopardWord.word
The transcribed word.
LeopardWord.confidence
Transcription confidence. It is a number within [0, 1].
LeopardWord.startSec
Start of word in seconds.
LeopardWord.endSec
End of word in seconds.
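Word timings in seconds are often turned into display timestamps, for example for captions. The helper below is a minimal sketch that formats a value such as startSec or endSec in the SubRip style HH:MM:SS,mmm. It assumes the fields are Float values, and both function names are hypothetical.

```swift
// Left-pads a non-negative integer with zeros to the given width.
func zeroPadded(_ value: Int, width: Int) -> String {
    let digits = String(value)
    return String(repeating: "0", count: max(0, width - digits.count)) + digits
}

// Formats a time offset in seconds as "HH:MM:SS,mmm".
// Negative inputs are treated as zero.
func captionTimestamp(_ seconds: Float) -> String {
    let totalMillis = Int((Double(max(0, seconds)) * 1000).rounded())
    let hours = totalMillis / 3_600_000
    let minutes = (totalMillis % 3_600_000) / 60_000
    let secs = (totalMillis % 60_000) / 1000
    let millis = totalMillis % 1000
    return zeroPadded(hours, width: 2) + ":" +
        zeroPadded(minutes, width: 2) + ":" +
        zeroPadded(secs, width: 2) + "," +
        zeroPadded(millis, width: 3)
}
```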
LeopardWord.speakerTag
Speaker tag is -1 if diarization is not enabled during initialization;
otherwise, it's a non-negative integer identifying unique speakers, with 0 reserved for unknown speakers.
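When diarization is enabled, consecutive words with the same speaker tag can be merged into speaker turns. The sketch below shows one way to do that. WordInfo is a hypothetical stand-in that mirrors the documented LeopardWord fields so the example runs without the SDK; the field types are assumptions, and SpeakerTurn and speakerTurns(from:) are illustrative names.

```swift
// Hypothetical stand-in mirroring the LeopardWord fields documented above.
struct WordInfo {
    let word: String
    let startSec: Float
    let endSec: Float
    let speakerTag: Int
}

// A run of consecutive words attributed to the same speaker tag.
struct SpeakerTurn: Equatable {
    let speakerTag: Int
    var text: String
    let startSec: Float
    var endSec: Float
}

// Merges consecutive words that share a speaker tag into turns, in order.
func speakerTurns(from words: [WordInfo]) -> [SpeakerTurn] {
    var turns: [SpeakerTurn] = []
    for w in words {
        if var last = turns.last, last.speakerTag == w.speakerTag {
            // Same speaker as the previous word: extend the current turn.
            last.text += " " + w.word
            last.endSec = w.endSec
            turns[turns.count - 1] = last
        } else {
            // Speaker changed (or first word): start a new turn.
            turns.append(SpeakerTurn(speakerTag: w.speakerTag, text: w.word,
                                     startSec: w.startSec, endSec: w.endSec))
        }
    }
    return turns
}
```

If diarization is disabled, every word carries the tag -1, so the function returns a single turn containing the whole transcript.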