Leopard Speech-to-Text
iOS API
API Reference for the iOS Leopard SDK (Cocoapod)
Leopard
Class for the Leopard Speech-to-Text engine.
Release its resources with the delete() function when you are done using it.
Leopard.init()
init method for the Leopard Speech-to-Text engine. Two overloads are provided: one takes modelPath and the other takes modelURL.
Parameters
accessKey
String : The AccessKey obtained from Picovoice Console.
modelPath
String : Absolute path to the file containing model parameters (.pv).
enableAutomaticPunctuation
Bool : Set to true to enable automatic punctuation insertion.
enableDiarization
Bool : Set to true to enable speaker diarization, which allows Leopard to differentiate speakers as part of the transcription process. Word metadata will include a speakerTag to identify unique speakers.
Throws
LeopardError : If an error occurs while creating an instance of the Leopard Speech-to-Text engine.
Parameters
accessKey
String : The AccessKey obtained from Picovoice Console.
modelURL
URL : URL to the file containing model parameters (.pv).
enableAutomaticPunctuation
Bool : Set to true to enable automatic punctuation insertion.
enableDiarization
Bool : Set to true to enable speaker diarization, which allows Leopard to differentiate speakers as part of the transcription process. Word metadata will include a speakerTag to identify unique speakers.
Throws
LeopardError : If an error occurs while creating an instance of the Leopard Speech-to-Text engine.
Leopard.delete()
Releases resources acquired by the Leopard engine.
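The init and delete() descriptions above suggest a simple lifecycle: create the engine, use it, then release it. The following is a minimal sketch of that pattern, not a definitive implementation. The access key and model path are hypothetical placeholders, and the sketch assumes that sampleRate and version are static properties and that LeopardError conforms to Swift's Error protocol.

```swift
import Foundation
import Leopard

// Hypothetical placeholders: supply your own AccessKey and model file.
let accessKey = "${YOUR_ACCESS_KEY}"
let modelPath = "/absolute/path/to/leopard_params.pv"

do {
    let leopard = try Leopard(
        accessKey: accessKey,
        modelPath: modelPath,
        enableAutomaticPunctuation: true,
        enableDiarization: false)
    // Release the engine's resources when this scope exits,
    // even if a later call throws.
    defer { leopard.delete() }

    // Assumption: sampleRate and version are static properties.
    print("Leopard \(Leopard.version) expects \(Leopard.sampleRate) Hz audio")
} catch let error as LeopardError {
    print("Leopard initialization failed: \(error)")
} catch {
    print("Unexpected error: \(error)")
}
```

Using defer keeps the delete() call next to the initialization, which makes it harder to leak the engine on an early return or a thrown error.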
Leopard.process()
Processes given audio data with the Leopard Speech-to-Text engine.
Parameters
pcm
[Int16] : The incoming audio needs to have a sample rate equal to Leopard.sampleRate and be 16-bit linearly-encoded. Furthermore, Leopard operates on single-channel audio.
Returns
String, [LeopardWord] : Inferred transcription and sequence of transcribed words with their associated metadata.
Throws
LeopardError : If there is an error while processing the audio data.
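A call to process() might look like the sketch below. It assumes the pcm argument is passed without a label and that the result can be destructured into a transcript and a word list. The transcribe(pcm:with:) helper is a hypothetical name introduced for illustration and is not part of the SDK.

```swift
import Leopard

/// Hypothetical helper: transcribes single-channel, 16-bit PCM that is
/// already sampled at Leopard.sampleRate.
func transcribe(pcm: [Int16], with leopard: Leopard) throws -> String {
    // Assumption: process takes the samples as an unlabeled first argument
    // and returns the transcript together with per-word metadata.
    let (transcript, words) = try leopard.process(pcm)

    for word in words {
        print("\(word.word) [\(word.startSec)s - \(word.endSec)s] confidence \(word.confidence)")
    }
    return transcript
}
```

The helper leaves error handling to the caller, so a LeopardError raised during processing propagates unchanged.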
Leopard.processFile()
Processes a given audio file with the Leopard Speech-to-Text engine.
Parameters
audioPath
String : Absolute path to the audio file. The supported formats are: 3gp (AMR), FLAC, MP3, MP4/m4a (AAC), Ogg, WAV and WebM.
Returns
String, [LeopardWord] : Inferred transcription and sequence of transcribed words with their associated metadata.
Throws
LeopardError : If there is an error while processing the audio file.
Leopard.processFile()
Processes a given audio file with the Leopard Speech-to-Text engine.
Parameters
audioURL
URL : URL of the audio file. The supported formats are: 3gp (AMR), FLAC, MP3, MP4/m4a (AAC), Ogg, WAV and WebM.
Returns
String, [LeopardWord] : Inferred transcription and sequence of transcribed words with their associated metadata.
Throws
LeopardError : If there is an error while processing the audio file.
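As an illustration of the URL-based overload, the sketch below transcribes a file and returns nil on failure. It assumes the audioURL argument is passed without a label. The transcribeFile(at:with:) helper is a hypothetical name, not part of the SDK.

```swift
import Foundation
import Leopard

/// Hypothetical helper: transcribes an audio file in one of the supported
/// formats and returns nil if Leopard reports an error.
func transcribeFile(at audioURL: URL, with leopard: Leopard) -> String? {
    do {
        // Assumption: processFile takes the URL as an unlabeled first argument.
        // The word metadata is ignored here.
        let (transcript, _) = try leopard.processFile(audioURL)
        return transcript
    } catch {
        print("Transcription failed: \(error)")
        return nil
    }
}
```

Passing a String instead of a URL should select the audioPath overload described earlier.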
Leopard.sampleRate
Audio sample rate accepted by Leopard.
Leopard.version
Current Leopard version.
LeopardError
Error thrown if an error occurs within Leopard Speech-to-Text engine.
LeopardWord
Struct for storing word metadata returned from the Leopard engine.
LeopardWord.word
The transcribed word.
LeopardWord.confidence
Transcription confidence. It is a number within [0, 1].
LeopardWord.startSec
Start of word in seconds.
LeopardWord.endSec
End of word in seconds.
LeopardWord.speakerTag
Speaker tag is -1 if diarization is not enabled during initialization; otherwise, it's a non-negative integer identifying unique speakers, with 0 reserved for unknown speakers.
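One way to use the word metadata above is to group consecutive words by speaker tag into turns. The sketch below is self-contained and does not import the SDK: Word is a hypothetical stand-in that mirrors the documented LeopardWord fields, speakerTurns is an illustrative helper, and the sample values are made up.

```swift
// Hypothetical stand-in mirroring the documented LeopardWord fields,
// so the sketch runs without the SDK.
struct Word {
    let word: String
    let confidence: Float
    let startSec: Float
    let endSec: Float
    let speakerTag: Int
}

/// Groups consecutive words that share a speaker tag into (speakerTag, text) turns.
func speakerTurns(_ words: [Word]) -> [(speakerTag: Int, text: String)] {
    var turns: [(speakerTag: Int, text: String)] = []
    for w in words {
        if let last = turns.last, last.speakerTag == w.speakerTag {
            // Same speaker as the previous word: extend the current turn.
            turns[turns.count - 1].text += " " + w.word
        } else {
            // Speaker changed (or first word): start a new turn.
            turns.append((speakerTag: w.speakerTag, text: w.word))
        }
    }
    return turns
}

// Made-up sample data for illustration only.
let sample = [
    Word(word: "hello", confidence: 0.9, startSec: 0.0, endSec: 0.4, speakerTag: 1),
    Word(word: "there", confidence: 0.8, startSec: 0.5, endSec: 0.9, speakerTag: 1),
    Word(word: "hi", confidence: 0.95, startSec: 1.2, endSec: 1.4, speakerTag: 2),
]

for turn in speakerTurns(sample) {
    print("Speaker \(turn.speakerTag): \(turn.text)")
}
```

With real results, the same grouping could be applied to the [LeopardWord] array directly, provided diarization was enabled at initialization so that the tags are not all -1.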