Leopard Speech-to-Text
iOS API

API Reference for the iOS Leopard SDK (Cocoapod)


Leopard

public class Leopard { }

Class for the Leopard Speech-to-Text engine. When you are done with an instance, call delete() to release its resources.


Leopard.getAvailableDevices()

public static func getAvailableDevices() throws -> [String]

Retrieves a list of devices that can be specified when constructing Leopard.

Returns

  • [String] : An array of available devices.

Throws

  • LeopardError: If an error occurs while retrieving the devices.

Leopard.init()

Initializers for the Leopard Speech-to-Text engine.

public init(
    accessKey: String,
    modelPath: String,
    device: String? = nil,
    enableAutomaticPunctuation: Bool = false,
    enableDiarization: Bool = false) throws

Parameters

  • accessKey String : The AccessKey obtained from Picovoice Console.
  • modelPath String : Absolute path to file containing model parameters (.pv).
  • device String? : String representation of the device (e.g., CPU or GPU) to use. If set to best, the most suitable device is selected automatically. If set to gpu, the engine uses the first available GPU device. To select a specific GPU device, set this argument to gpu:${GPU_INDEX}, where ${GPU_INDEX} is the index of the target GPU. If set to cpu, the engine will run on the CPU with the default number of threads. To specify the number of threads, set this argument to cpu:${NUM_THREADS}, where ${NUM_THREADS} is the desired number of threads.
  • enableAutomaticPunctuation Bool : Set to true to enable automatic punctuation insertion.
  • enableDiarization Bool : Set to true to enable speaker diarization, which allows Leopard to differentiate speakers as part of the transcription process. Word metadata will include a speakerTag to identify unique speakers.

Throws

  • LeopardError: If an error occurs while creating an instance of the Leopard Speech-to-Text engine.

public init(
    accessKey: String,
    modelURL: URL,
    device: String? = nil,
    enableAutomaticPunctuation: Bool = false,
    enableDiarization: Bool = false) throws

Parameters

  • accessKey String : The AccessKey obtained from Picovoice Console.
  • modelURL URL : URL to file containing model parameters (.pv).
  • device String? : String representation of the device (e.g., CPU or GPU) to use. If set to best, the most suitable device is selected automatically. If set to gpu, the engine uses the first available GPU device. To select a specific GPU device, set this argument to gpu:${GPU_INDEX}, where ${GPU_INDEX} is the index of the target GPU. If set to cpu, the engine will run on the CPU with the default number of threads. To specify the number of threads, set this argument to cpu:${NUM_THREADS}, where ${NUM_THREADS} is the desired number of threads.
  • enableAutomaticPunctuation Bool : Set to true to enable automatic punctuation insertion.
  • enableDiarization Bool : Set to true to enable speaker diarization, which allows Leopard to differentiate speakers as part of the transcription process. Word metadata will include a speakerTag to identify unique speakers.

Throws

  • LeopardError: If an error occurs while creating an instance of Leopard Speech-to-Text engine.
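
The device strings described above (best, cpu, cpu:${NUM_THREADS}, gpu, gpu:${GPU_INDEX}) are easy to mistype when written by hand. The following is a minimal sketch of a helper that builds them. DeviceSelection and stringValue are hypothetical names, not part of the Leopard SDK, and the helper does not validate that the thread count or GPU index is sensible for the target hardware.

```swift
// Hypothetical helper (not part of the Leopard SDK): builds the device
// strings documented for the `device` parameter.
enum DeviceSelection {
    case best
    case cpu(threads: Int?)   // nil means the default number of threads
    case gpu(index: Int?)     // nil means the first available GPU

    var stringValue: String {
        switch self {
        case .best:
            return "best"
        case .cpu(let threads):
            return threads.map { "cpu:\($0)" } ?? "cpu"
        case .gpu(let index):
            return index.map { "gpu:\($0)" } ?? "gpu"
        }
    }
}
```

For example, DeviceSelection.cpu(threads: 4).stringValue yields "cpu:4", which could then be passed as the device argument of either initializer.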

Leopard.delete()

public func delete()

Releases resources acquired by the Leopard engine.


Leopard.process()

public func process(pcm: [Int16]) throws -> (transcript: String, words: [LeopardWord])

Processes given audio data with the Leopard Speech-to-Text engine.

Parameters

  • pcm [Int16] : Audio data to transcribe. The audio must be single-channel, 16-bit linearly-encoded, and sampled at Leopard.sampleRate.

Returns

  • String, [LeopardWord] : Inferred transcription and sequence of transcribed words with their associated metadata.

Throws

  • LeopardError: If there is an error while processing the audio frame.
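
Because process() expects single-channel, 16-bit linear PCM, audio captured as floating-point samples or as interleaved stereo has to be converted first. The sketch below shows one way to do that in plain Swift. Both function names are hypothetical and not part of the Leopard SDK, and resampling to Leopard.sampleRate is deliberately left out.

```swift
// Hypothetical helpers (not part of the Leopard SDK).

/// Converts normalized Float samples in [-1, 1] to 16-bit linear PCM.
/// Out-of-range values are clamped; non-finite values become silence.
func int16Samples(fromFloat samples: [Float]) -> [Int16] {
    return samples.map { (sample: Float) -> Int16 in
        guard sample.isFinite else { return 0 }
        let clamped = max(-1.0, min(1.0, sample))
        return Int16((clamped * 32767.0).rounded())
    }
}

/// Downmixes interleaved stereo (L, R, L, R, ...) to mono by averaging
/// each pair. A trailing unpaired sample is dropped.
func monoSamples(fromInterleavedStereo stereo: [Int16]) -> [Int16] {
    var mono = [Int16]()
    mono.reserveCapacity(stereo.count / 2)
    var i = 0
    while i + 1 < stereo.count {
        // Widen to Int32 so the sum of two samples cannot overflow.
        let sum = Int32(stereo[i]) + Int32(stereo[i + 1])
        mono.append(Int16(sum / 2))
        i += 2
    }
    return mono
}
```

The resulting [Int16] array can be passed as the pcm argument once its sample rate matches Leopard.sampleRate.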

Leopard.processFile()

public func processFile(audioPath: String) throws -> (transcript: String, words: [LeopardWord])

Processes a given audio file with the Leopard Speech-to-Text engine.

Parameters

  • audioPath String : Absolute path to the audio file. The supported formats are: 3gp (AMR), FLAC, MP3, MP4/m4a (AAC), Ogg, WAV and WebM.

Returns

  • String, [LeopardWord] : Inferred transcription and sequence of transcribed words with their associated metadata.

Throws

  • LeopardError: If there is an error while processing the audio file.

Leopard.processFile()

public func processFile(audioURL: URL) throws -> (transcript: String, words: [LeopardWord])

Processes a given audio file with the Leopard Speech-to-Text engine.

Parameters

  • audioURL URL : URL of the audio file. The supported formats are: 3gp (AMR), FLAC, MP3, MP4/m4a (AAC), Ogg, WAV and WebM.

Returns

  • String, [LeopardWord] : Inferred transcription and sequence of transcribed words with their associated metadata.

Throws

  • LeopardError: If there is an error while processing the audio file.

Leopard.sampleRate

public static let sampleRate: UInt32

Audio sample rate accepted by Leopard.


Leopard.version

public static let version: String

Current Leopard version.


LeopardError

public class LeopardError : LocalizedError { }

Error thrown if an error occurs within Leopard Speech-to-Text engine.

public class LeopardMemoryError : LeopardError {}
public class LeopardIOError : LeopardError {}
public class LeopardInvalidArgumentError : LeopardError {}
public class LeopardStopIterationError : LeopardError {}
public class LeopardKeyError : LeopardError {}
public class LeopardInvalidStateError : LeopardError {}
public class LeopardRuntimeError : LeopardError {}
public class LeopardActivationError : LeopardError {}
public class LeopardActivationLimitError : LeopardError {}
public class LeopardActivationThrottledError : LeopardError {}
public class LeopardActivationRefusedError : LeopardError {}
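
Since every subclass listed above derives from LeopardError, specific cases can be caught before falling back to the base class. The sketch below combines this with the documented initializer, processFile(audioPath:), and delete(). It assumes the CocoaPod exposes a module named Leopard. The function name transcribeFile and the printed messages are illustrative only, and the canImport guard is there so the snippet still compiles where the SDK is not installed.

```swift
import Foundation

#if canImport(Leopard)
import Leopard  // Assumption: the module name matches the class name.

// Sketch only: `transcribeFile` is a hypothetical wrapper, not an SDK function.
func transcribeFile(accessKey: String, modelPath: String, audioPath: String) -> String? {
    do {
        let leopard = try Leopard(
            accessKey: accessKey,
            modelPath: modelPath,
            enableAutomaticPunctuation: true)
        // Release engine resources on every exit path from this scope.
        defer { leopard.delete() }

        let result = try leopard.processFile(audioPath: audioPath)
        return result.transcript
    } catch let error as LeopardActivationLimitError {
        // Most specific cases first; the base class is caught further down.
        print("Activation limit error: \(error.localizedDescription)")
    } catch let error as LeopardInvalidArgumentError {
        print("Invalid argument error: \(error.localizedDescription)")
    } catch let error as LeopardError {
        print("Leopard error: \(error.localizedDescription)")
    } catch {
        print("Unexpected error: \(error)")
    }
    return nil
}
#endif
```

Placing delete() in a defer block right after construction keeps the cleanup next to the allocation, so it runs whether processFile returns or throws.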

LeopardWord

public struct LeopardWord { }

Struct for storing word metadata returned from the Leopard engine.


LeopardWord.word

LeopardWord.word: String

The transcribed word.


LeopardWord.confidence

LeopardWord.confidence: Float

Transcription confidence. It is a number within [0, 1].
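
One possible use of the confidence value is to flag uncertain words for review. The following sketch marks words that fall below a threshold. ScoredWord is a hypothetical stand-in that mirrors the word and confidence fields so the snippet compiles without the SDK, and the 0.5 default threshold is an arbitrary assumption, not a recommended value.

```swift
// Hypothetical stand-in mirroring two documented LeopardWord fields.
struct ScoredWord {
    let word: String
    let confidence: Float
}

/// Joins words into one string, wrapping low-confidence words as "[word?]".
func annotatedTranscript(_ words: [ScoredWord], threshold: Float = 0.5) -> String {
    return words
        .map { $0.confidence < threshold ? "[\($0.word)?]" : $0.word }
        .joined(separator: " ")
}
```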


LeopardWord.startSec

LeopardWord.startSec: Float

Start of word in seconds.


LeopardWord.endSec

LeopardWord.endSec: Float

End of word in seconds.
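
Word offsets in seconds are often shown to users as timestamps. As a minimal sketch, the hypothetical helper below (not part of the Leopard SDK) formats a startSec or endSec value as mm:ss.mmm. It treats negative or non-finite input as zero and caps the value at 100 hours, both of which are assumptions made for this example.

```swift
/// Left-pads a non-negative integer with zeros to the given width.
func zeroPadded(_ value: Int, width: Int) -> String {
    let digits = String(value)
    return String(repeating: "0", count: max(0, width - digits.count)) + digits
}

/// Formats an offset in seconds as "mm:ss.mmm".
/// Negative or non-finite input is treated as zero; values are capped at 100 hours.
func timestamp(_ seconds: Float) -> String {
    let safe: Float = seconds.isFinite ? min(max(0, seconds), 360_000) : 0
    // Work in whole milliseconds to avoid repeated floating-point rounding.
    let totalMillis = Int((Double(safe) * 1000).rounded())
    let minutes = totalMillis / 60_000
    let secs = (totalMillis / 1000) % 60
    let millis = totalMillis % 1000
    return zeroPadded(minutes, width: 2) + ":" + zeroPadded(secs, width: 2)
        + "." + zeroPadded(millis, width: 3)
}
```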


LeopardWord.speakerTag

LeopardWord.speakerTag: Int

Speaker tag is -1 if diarization is not enabled during initialization; otherwise, it's a non-negative integer identifying unique speakers, with 0 reserved for unknown speakers.
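
With diarization enabled, consecutive words that share a speakerTag can be merged into speaker turns. The sketch below illustrates one way to do this. Word is a hypothetical stand-in that mirrors the documented LeopardWord fields so the snippet compiles without the SDK, and SpeakerTurn and speakerTurns(from:) are likewise illustrative names.

```swift
// Hypothetical stand-in mirroring the documented LeopardWord fields.
struct Word {
    let word: String
    let startSec: Float
    let endSec: Float
    let speakerTag: Int
}

// Illustrative result type: one uninterrupted stretch of speech by one speaker.
struct SpeakerTurn {
    let speakerTag: Int
    var text: String
    let startSec: Float
    var endSec: Float
}

/// Merges consecutive words that share a speaker tag into a single turn.
func speakerTurns(from words: [Word]) -> [SpeakerTurn] {
    var turns = [SpeakerTurn]()
    for w in words {
        if let last = turns.last, last.speakerTag == w.speakerTag {
            // Same speaker as the previous word: extend the current turn.
            turns[turns.count - 1].text += " " + w.word
            turns[turns.count - 1].endSec = w.endSec
        } else {
            // Speaker changed (or this is the first word): start a new turn.
            turns.append(SpeakerTurn(
                speakerTag: w.speakerTag,
                text: w.word,
                startSec: w.startSec,
                endSec: w.endSec))
        }
    }
    return turns
}
```

When diarization is disabled every word carries the tag -1, so the same function returns a single turn containing the whole transcript.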
