Leopard Speech-to-Text
iOS API
API Reference for the iOS Leopard SDK (Cocoapod)
Leopard
Class for the Leopard Speech-to-Text engine.
When you are done with an instance, release its resources by calling delete().
Leopard.getAvailableDevices()
Retrieves a list of devices that can be specified when constructing Leopard.
Returns
- [String] : An array of available devices.
Throws
LeopardError: If an error occurs while retrieving the devices.
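Here is a minimal sketch of querying devices before the engine is constructed. It makes three assumptions: getAvailableDevices() is a static method (the description says it runs before construction), the pod is imported as the Leopard module, and GPU entries in the returned array start with "gpu". chooseDevice(from:) and availableLeopardDevices() are hypothetical helpers, not SDK API.

```swift
import Leopard

/// Hypothetical helper (not part of the SDK): picks a `device` argument for
/// Leopard.init from the names returned by getAvailableDevices().
/// Assumes GPU entries start with "gpu"; falls back to "best" otherwise.
func chooseDevice(from available: [String]) -> String {
    if let gpu = available.first(where: { $0.hasPrefix("gpu") }) {
        return gpu
    }
    return "best"
}

/// Hypothetical helper: lists the devices Leopard can use, or an empty array on failure.
func availableLeopardDevices() -> [String] {
    do {
        return try Leopard.getAvailableDevices()
    } catch {
        // getAvailableDevices() throws a LeopardError if the query fails.
        print("Could not list Leopard devices: \(error)")
        return []
    }
}
```

For example, chooseDevice(from: availableLeopardDevices()) yields a string intended for the device parameter of Leopard.init().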
Leopard.init()
Initializers for the Leopard Speech-to-Text engine. There are two overloads. One takes the model as a file path (modelPath) and the other takes it as a URL (modelURL).
Parameters (modelPath overload)
- accessKey (String): The AccessKey obtained from Picovoice Console.
- modelPath (String): Absolute path to the file containing model parameters (.pv).
- device (String?): String representation of the device (e.g., CPU or GPU) to use. If set to best, the most suitable device is selected automatically. If set to gpu, the engine uses the first available GPU device. To select a specific GPU device, set this argument to gpu:${GPU_INDEX}, where ${GPU_INDEX} is the index of the target GPU. If set to cpu, the engine runs on the CPU with the default number of threads. To specify the number of threads, set this argument to cpu:${NUM_THREADS}, where ${NUM_THREADS} is the desired number of threads.
- enableAutomaticPunctuation (Bool): Set to true to enable automatic punctuation insertion.
- enableDiarization (Bool): Set to true to enable speaker diarization, which allows Leopard to differentiate speakers as part of the transcription process. Word metadata will then include a speakerTag identifying unique speakers.
Throws
LeopardError: If an error occurs while creating an instance of Leopard Speech-to-Text engine.
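A small enum can generate the device string formats described above, which keeps typos such as "gpu0" out of the code. This is a sketch. LeopardDevice and makeLeopard(accessKey:modelPath:) are hypothetical, and the sketch assumes the initializer's argument labels and their order match the parameter list above.

```swift
import Leopard

/// Hypothetical helper (not part of the SDK) that renders the `device`
/// argument formats described above.
enum LeopardDevice {
    case best
    case gpu(index: Int?)     // nil -> "gpu" (first available GPU)
    case cpu(threads: Int?)   // nil -> "cpu" (default number of threads)

    var argument: String {
        switch self {
        case .best:
            return "best"
        case .gpu(let index):
            return index.map { "gpu:\($0)" } ?? "gpu"
        case .cpu(let threads):
            return threads.map { "cpu:\($0)" } ?? "cpu"
        }
    }
}

/// Hypothetical factory: an engine on the CPU with two threads and punctuation enabled.
func makeLeopard(accessKey: String, modelPath: String) throws -> Leopard {
    return try Leopard(
        accessKey: accessKey,
        modelPath: modelPath,
        device: LeopardDevice.cpu(threads: 2).argument,
        enableAutomaticPunctuation: true,
        enableDiarization: false)
}
```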
Parameters (modelURL overload)
- accessKey (String): The AccessKey obtained from Picovoice Console.
- modelURL (URL): URL of the file containing model parameters (.pv).
- device (String?): String representation of the device (e.g., CPU or GPU) to use. If set to best, the most suitable device is selected automatically. If set to gpu, the engine uses the first available GPU device. To select a specific GPU device, set this argument to gpu:${GPU_INDEX}, where ${GPU_INDEX} is the index of the target GPU. If set to cpu, the engine runs on the CPU with the default number of threads. To specify the number of threads, set this argument to cpu:${NUM_THREADS}, where ${NUM_THREADS} is the desired number of threads.
- enableAutomaticPunctuation (Bool): Set to true to enable automatic punctuation insertion.
- enableDiarization (Bool): Set to true to enable speaker diarization, which allows Leopard to differentiate speakers as part of the transcription process. Word metadata will then include a speakerTag identifying unique speakers.
Throws
LeopardError: If an error occurs while creating an instance of Leopard Speech-to-Text engine.
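If the .pv model ships inside the app bundle, the URL overload pairs naturally with Bundle.main.url(forResource:withExtension:). This is a sketch that assumes the argument labels match the parameter names above. ModelLoadingError, makeLeopardFromBundle, and the modelName you pass are hypothetical.

```swift
import Foundation
import Leopard

/// Hypothetical error for a model file that is missing from the app bundle.
enum ModelLoadingError: Error {
    case missingModel(String)
}

/// Locates "<modelName>.pv" in the main bundle and initializes Leopard from its URL.
func makeLeopardFromBundle(accessKey: String, modelName: String) throws -> Leopard {
    guard let modelURL = Bundle.main.url(forResource: modelName, withExtension: "pv") else {
        throw ModelLoadingError.missingModel(modelName)
    }
    return try Leopard(
        accessKey: accessKey,
        modelURL: modelURL,
        device: "best",
        enableAutomaticPunctuation: true,
        enableDiarization: true)
}
```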
Leopard.delete()
Releases resources acquired by the Leopard engine.
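The engine holds native resources, so pairing construction with delete() in a defer block guarantees cleanup even when a later call throws. This is a sketch. transcribeOnce is a hypothetical name, the argument labels follow the parameter names above, and the tuple destructuring assumes that the String, [LeopardWord] value documented for processFile() is a two-element tuple.

```swift
import Leopard

/// Hypothetical helper: transcribes one file and always releases the engine afterwards.
func transcribeOnce(accessKey: String, modelPath: String, audioPath: String) throws -> String {
    let leopard = try Leopard(
        accessKey: accessKey,
        modelPath: modelPath,
        device: "best",
        enableAutomaticPunctuation: true,
        enableDiarization: false)
    // `defer` runs on every exit path, including when processFile throws.
    defer { leopard.delete() }

    let (transcript, _) = try leopard.processFile(audioPath)
    return transcript
}
```

For repeated transcriptions, it is usually cheaper to keep one instance alive and call delete() once when you are finished.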
Leopard.process()
Processes given audio data with the Leopard Speech-to-Text engine.
Parameters
- pcm ([Int16]): Audio samples to transcribe. The audio must have a sample rate equal to Leopard.sampleRate, be 16-bit linearly encoded, and be single-channel (mono).
Returns
- String, [LeopardWord] : Inferred transcription and the sequence of transcribed words with their associated metadata.
Throws
LeopardError: If there is an error while processing the audio.
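Audio captured on iOS is often 32-bit float and sometimes multi-channel, so it usually needs converting before it meets the requirements above. The following helpers are hypothetical and not part of the SDK. They convert sample format and downmix channels, but they do not resample.

```swift
/// Converts Float samples in [-1, 1] (the range AVAudioEngine typically delivers)
/// into 16-bit linear PCM. Out-of-range values are clamped.
func toInt16PCM(_ samples: [Float]) -> [Int16] {
    return samples.map { sample in
        let clamped = max(-1.0, min(1.0, sample))
        return Int16(clamped * Float(Int16.max))
    }
}

/// Averages interleaved multi-channel Int16 audio down to a single channel.
/// Trailing samples that do not form a complete frame are dropped.
func downmixToMono(_ interleaved: [Int16], channels: Int) -> [Int16] {
    precondition(channels > 0, "channels must be positive")
    let frameCount = interleaved.count / channels
    var mono = [Int16]()
    mono.reserveCapacity(frameCount)
    for frame in 0..<frameCount {
        var sum = 0
        for channel in 0..<channels {
            sum += Int(interleaved[frame * channels + channel])
        }
        mono.append(Int16(sum / channels))
    }
    return mono
}
```

Resampling to Leopard.sampleRate is a separate step, for example with AVAudioConverter. The resulting array can then be passed to process(), e.g. try leopard.process(pcm), assuming the pcm parameter is unlabeled.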
Leopard.processFile()
Processes an audio file, given by its absolute path, with the Leopard Speech-to-Text engine.
Parameters
- audioPath (String): Absolute path to the audio file. Supported formats: 3gp (AMR), FLAC, MP3, MP4/m4a (AAC), Ogg, WAV, and WebM.
Returns
- String, [LeopardWord] : Inferred transcription and the sequence of transcribed words with their associated metadata.
Throws
LeopardError: If there is an error while processing the audio file.
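An app can reject unsupported files early by checking the extension against the formats listed above. This is a hypothetical pre-check, not SDK behavior. An extension only hints at the container format, so processFile() may still throw for a mislabeled or corrupt file.

```swift
import Foundation

/// Hypothetical extension list mirroring the supported formats above.
let leopardAudioExtensions: Set<String> = ["3gp", "flac", "mp3", "mp4", "m4a", "ogg", "wav", "webm"]

/// Returns true if the path ends in one of the listed extensions (case-insensitive).
func hasSupportedAudioExtension(_ path: String) -> Bool {
    let ext = URL(fileURLWithPath: path).pathExtension.lowercased()
    return leopardAudioExtensions.contains(ext)
}
```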
Leopard.processFile()
Processes an audio file, given as a URL, with the Leopard Speech-to-Text engine.
Parameters
- audioURL (URL): URL of the audio file. Supported formats: 3gp (AMR), FLAC, MP3, MP4/m4a (AAC), Ogg, WAV, and WebM.
Returns
- String, [LeopardWord] : Inferred transcription and the sequence of transcribed words with their associated metadata.
Throws
LeopardError: If there is an error while processing the audio file.
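Files that the user picks from outside the app sandbox (for example through UIDocumentPickerViewController) generally arrive as security-scoped URLs, which need startAccessingSecurityScopedResource() before they can be read. This is a sketch. transcribePickedFile is hypothetical, and the tuple destructuring assumes that the String, [LeopardWord] return value is a two-element tuple.

```swift
import Foundation
import Leopard

/// Hypothetical helper: transcribes a user-picked file, opening security-scoped
/// access for the duration of the call when the URL requires it.
func transcribePickedFile(_ leopard: Leopard, at url: URL) throws -> String {
    let didStartAccess = url.startAccessingSecurityScopedResource()
    defer {
        if didStartAccess {
            url.stopAccessingSecurityScopedResource()
        }
    }
    let (transcript, _) = try leopard.processFile(url)
    return transcript
}
```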
Leopard.sampleRate
Audio sample rate, in Hz, accepted by Leopard.
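Leopard.sampleRate can drive the AVFoundation format used for capture or conversion. This sketch assumes sampleRate is a static integer property and converts it with Double(...), so its exact integer type does not matter. leopardInputFormat() and sampleCount(forSeconds:) are hypothetical helpers.

```swift
import AVFoundation
import Leopard

/// Describes the audio process() expects: 16-bit integer samples, one channel,
/// at Leopard.sampleRate. Usable as the output format of an AVAudioConverter.
func leopardInputFormat() -> AVAudioFormat? {
    return AVAudioFormat(
        commonFormat: .pcmFormatInt16,
        sampleRate: Double(Leopard.sampleRate),
        channels: 1,
        interleaved: true)
}

/// Number of samples needed to hold `seconds` of audio at Leopard's sample rate.
func sampleCount(forSeconds seconds: Double) -> Int {
    return Int((seconds * Double(Leopard.sampleRate)).rounded())
}
```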
Leopard.version
Current Leopard version.
LeopardError
Error thrown if an error occurs within the Leopard Speech-to-Text engine.
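LeopardError is the documented error type, so callers can handle it separately from other failures. This is a sketch. transcriptOrMessage is hypothetical, and the tuple destructuring assumes that processFile() returns a two-element tuple.

```swift
import Leopard

/// Hypothetical helper: returns the transcript, or a readable message on failure.
func transcriptOrMessage(_ leopard: Leopard, audioPath: String) -> String {
    do {
        let (transcript, _) = try leopard.processFile(audioPath)
        return transcript
    } catch let error as LeopardError {
        // Failures reported by the Leopard engine itself.
        return "Leopard error: \(error)"
    } catch {
        // Swift requires a catch-all because `throws` is untyped.
        return "Unexpected error: \(error)"
    }
}
```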
LeopardWord
Struct for storing word metadata returned from the Leopard engine.
LeopardWord.word
The transcribed word.
LeopardWord.confidence
Transcription confidence. It is a number within [0, 1].
LeopardWord.startSec
Start of word in seconds.
LeopardWord.endSec
End of word in seconds.
LeopardWord.speakerTag
Speaker tag is -1 if diarization was not enabled during initialization; otherwise, it is a non-negative integer identifying unique speakers, with 0 reserved for unknown speakers.
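The speakerTag and confidence fields make it straightforward to turn word metadata into speaker turns. This is a sketch. Both speakerTurns overloads are hypothetical, and the numeric types of confidence and speakerTag are converted explicitly because they are not stated above. Diarization must be enabled for the tags to distinguish speakers. Without it, every tag is -1 and a non-empty transcript collapses into a single turn.

```swift
import Leopard

/// Groups consecutive words that share a speaker tag into turns.
func speakerTurns(_ words: [(text: String, speakerTag: Int)]) -> [(speakerTag: Int, text: String)] {
    var turns: [(speakerTag: Int, text: String)] = []
    for word in words {
        if let last = turns.last, last.speakerTag == word.speakerTag {
            turns[turns.count - 1].text += " " + word.text
        } else {
            turns.append((speakerTag: word.speakerTag, text: word.text))
        }
    }
    return turns
}

/// Adapter for LeopardWord values that drops words below `minConfidence` first.
func speakerTurns(_ words: [LeopardWord], minConfidence: Double = 0) -> [(speakerTag: Int, text: String)] {
    let kept = words.filter { Double($0.confidence) >= minConfidence }
    return speakerTurns(kept.map { (text: $0.word, speakerTag: Int($0.speakerTag)) })
}
```

Tag 0 marks unknown speakers, so a UI may want to label those turns differently from tags 1 and above.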