picoLLM Inference Engine
iOS API

API Reference for the picoLLM iOS SDK (Cocoapod)


PicoLLM

public class PicoLLM { }

Class for the picoLLM Inference Engine.

Resources should be cleaned up when you are done by calling the delete() function.


PicoLLM.model

public let model: String

Getter for the model's name.

Returns

  • String : Model name.

PicoLLM.contextLength

public let contextLength: Int32

Getter for the model's context length.

Returns

  • Int32 : Context length.

PicoLLM.version

public static let version: String

Current picoLLM version.


PicoLLM.maxTopChoices

public static let maxTopChoices: Int32

Maximum number of top choices for .generate().


PicoLLM.init()

public init(
    accessKey: String,
    modelPath: String,
    device: String = "best:0"
) throws

Initializer for the picoLLM Inference Engine.

Parameters

  • accessKey String : The AccessKey obtained from Picovoice Console.
  • modelPath String : Absolute path to file containing model parameters (.pllm).
  • device String : String representation of the device (e.g., CPU or GPU) to use for inference. If set to best, picoLLM picks the most suitable device. If set to gpu, the engine uses the first available GPU device. To select a specific GPU device, set this argument to gpu:${GPU_INDEX}, where ${GPU_INDEX} is the index of the target GPU. If set to cpu, the engine will run on the CPU with the default number of threads. To specify the number of threads, set this argument to cpu:${NUM_THREADS}, where ${NUM_THREADS} is the desired number of threads.

Throws

  • PicoLLMError: If an error occurs while creating an instance of picoLLM Inference Engine.
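
For reference, a minimal usage sketch is shown below. The AccessKey string and model path are placeholders, not values from this document.

import PicoLLM

do {
    // accessKey and modelPath are placeholders; supply your own values.
    let picollm = try PicoLLM(
        accessKey: "${ACCESS_KEY}",
        modelPath: "${MODEL_FILE_PATH}")

    // ... use the engine ...

    picollm.delete()
} catch {
    print("Failed to initialize picoLLM: \(error)")
}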

PicoLLM.delete()

public func delete()

Releases resources acquired by the picoLLM Inference Engine.

PicoLLM.generate()

public func generate(
    prompt: String,
    completionTokenLimit: Int32? = nil,
    stopPhrases: [String]? = nil,
    seed: Int32? = nil,
    presencePenalty: Float = 0.0,
    frequencyPenalty: Float = 0.0,
    temperature: Float = 0.0,
    topP: Float = 1.0,
    numTopChoices: Int32 = 0,
    streamCallback: ((String) -> Void)? = nil
) throws -> PicoLLMCompletion

Given a text prompt and a set of generation parameters, creates a completion text and relevant metadata.

Parameters

  • prompt String : Text prompt.
  • completionTokenLimit Int32? : Maximum number of tokens in the completion. If the generation process stops due to reaching this limit, the endpoint output argument will be PicoLLMEndpoint.completionTokenLimitReached. Set to nil to impose no limit.
  • stopPhrases [String]? : The generation process stops when it encounters any of these phrases in the completion. The already generated completion, including the encountered stop phrase, will be returned. The endpoint output argument will be PicoLLMEndpoint.stopPhraseEncountered. Set to nil to turn off this feature.
  • seed Int32? : If set to a positive integer value, the internal random number generator uses it as its seed. Seeding enforces deterministic outputs. Set to nil for randomized outputs for a given prompt.
  • presencePenalty Float : If set to a positive value, it penalizes logits already appearing in the partial completion. If set to 0.0, it has no effect.
  • frequencyPenalty Float : If set to a positive floating-point value, it penalizes logits proportional to the frequency of their appearance in the partial completion. If set to 0.0, it has no effect.
  • temperature Float : Sampling temperature. Temperature is a non-negative floating-point value that controls the randomness of the sampler. A higher temperature smooths the sampler's output, increasing the randomness. In contrast, a lower temperature creates a narrower distribution and reduces variability. Setting it to 0 selects the maximum logit during sampling.
  • topP Float : A positive floating-point number within (0, 1]. It restricts the sampler's choices to high-probability logits that form the topP portion of the probability mass. Hence, it avoids randomly selecting unlikely logits. A value of 1.0 enables the sampler to pick any token with non-zero probability, turning off the feature.
  • numTopChoices Int32 : If set to a positive value, picoLLM returns the list of the highest-probability tokens for any generated token. Set to 0 to turn off the feature. The maximum number of top choices is maxTopChoices.
  • streamCallback ((String) -> Void)? : If not set to nil, picoLLM executes this callback every time a new piece of the completion string becomes available.

Returns

  • PicoLLMCompletion : Object containing stats and generated tokens.

Throws

  • PicoLLMError: If there is an error while generating the completion.
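
A minimal sketch of streaming generation, assuming a picollm instance created as shown in the .init() example above; the prompt text and token limit are illustrative.

do {
    let completion = try picollm.generate(
        prompt: "How do I make a smart home?",
        completionTokenLimit: 128,
        streamCallback: { piece in
            // Invoked each time a new piece of the completion is available.
            print(piece, terminator: "")
        })
    print("\nEndpoint: \(completion.endpoint)")
} catch {
    print("Generation failed: \(error)")
}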

PicoLLM.interrupt()

public func interrupt() throws

Interrupts .generate() if generation is in progress. Otherwise, it has no effect.

Throws

  • PicoLLMError: If interrupt fails.
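
A sketch of interrupting an in-flight generation, assuming a picollm instance; running .generate() off the main thread (here via DispatchQueue from Foundation) lets interrupt() be called from elsewhere, e.g., a "Stop" button handler.

import Foundation

// Run generation on a background queue so it does not block the caller.
DispatchQueue.global().async {
    let completion = try? picollm.generate(prompt: "Write a long story.")
    print(completion?.completion ?? "")
}

// Later, from another thread:
try? picollm.interrupt()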

PicoLLM.tokenize()

public func tokenize(
    text: String,
    bos: Bool,
    eos: Bool
) throws -> [Int32]

Tokenizes a given text using the model's tokenizer. This is a low-level function meant for benchmarking and advanced usage. .generate() should be used when possible.

Parameters

  • text String : Text.
  • bos Bool : If set to true, the tokenizer prepends the beginning-of-sentence token to the result.
  • eos Bool : If set to true, the tokenizer appends the end-of-sentence token to the result.

Returns

  • [Int32] : Tokens representing the input text.

Throws

  • PicoLLMError: If there is an error while tokenizing.
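
A short sketch, assuming a picollm instance; the input text is illustrative.

do {
    // Prepend the beginning-of-sentence token; no end-of-sentence token.
    let tokens = try picollm.tokenize(text: "Hello, world!", bos: true, eos: false)
    print("Token count: \(tokens.count)")
} catch {
    print("Tokenization failed: \(error)")
}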

PicoLLM.forward()

public func forward(token: Int32) throws -> [Float]

Performs a single forward pass given a token and returns the logits. This is a low-level function for benchmarking and advanced usage; .generate() should be used when possible.

Parameters

  • token Int32 : Input token.

Returns

  • [Float] : Logits.

Throws

  • PicoLLMError: If there is an error while executing a forward.

PicoLLM.reset()

public func reset() throws

Resets the internal state of the LLM. It should be called in conjunction with .forward() when processing a new sequence of tokens. This is a low-level function for benchmarking and advanced usage; .generate() should be used when possible.

Throws

  • PicoLLMError: If there is an error while resetting.
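
A low-level sketch combining .reset(), .tokenize(), and .forward(), assuming a picollm instance; it feeds a short token sequence manually and inspects the final logits.

do {
    // Start a fresh sequence before feeding tokens manually.
    try picollm.reset()
    let tokens = try picollm.tokenize(text: "Hello", bos: true, eos: false)
    var logits: [Float] = []
    for token in tokens {
        // Each call returns logits over the vocabulary for the next token.
        logits = try picollm.forward(token: token)
    }
    print("Logit count: \(logits.count)")
} catch {
    print("Forward pass failed: \(error)")
}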

PicoLLM.getAvailableDevices()

public static func getAvailableDevices() throws -> [String]

Gets a list of hardware devices that can be specified when calling .init().

Returns

  • [String] : Array of available hardware devices.
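
A brief sketch of enumerating devices before initialization; any of the returned strings can be passed as the device argument of .init().

do {
    let devices = try PicoLLM.getAvailableDevices()
    for device in devices {
        print(device)
    }
} catch {
    print("Failed to list devices: \(error)")
}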

PicoLLMError

public class PicoLLMError : LocalizedError { }

Error thrown if an error occurs within the picoLLM Inference Engine.

public class PicoLLMMemoryError : PicoLLMError {}
public class PicoLLMIOError : PicoLLMError {}
public class PicoLLMInvalidArgumentError : PicoLLMError {}
public class PicoLLMStopIterationError : PicoLLMError {}
public class PicoLLMKeyError : PicoLLMError {}
public class PicoLLMInvalidStateError : PicoLLMError {}
public class PicoLLMRuntimeError : PicoLLMError {}
public class PicoLLMActivationError : PicoLLMError {}
public class PicoLLMActivationLimitError : PicoLLMError {}
public class PicoLLMActivationThrottledError : PicoLLMError {}
public class PicoLLMActivationRefusedError : PicoLLMError {}

PicoLLMUsage

public struct PicoLLMUsage { }

Struct for the number of tokens in the prompt and completion.


PicoLLMUsage.promptTokens

PicoLLMUsage.promptTokens: Int

Number of tokens in the prompt.


PicoLLMUsage.completionTokens

PicoLLMUsage.completionTokens: Int

Number of tokens in the completion.


PicoLLMEndpoint

public enum PicoLLMEndpoint: Codable {
    case endOfSentence
    case completionTokenLimitReached
    case stopPhraseEncountered
}

Enum for the endpoint detection types.
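
As an illustration, the endpoint of a result can be inspected with a switch; completion is assumed to be the value returned by a .generate() call.

switch completion.endpoint {
case .endOfSentence:
    print("The model finished naturally.")
case .completionTokenLimitReached:
    print("completionTokenLimit was reached.")
case .stopPhraseEncountered:
    print("A stop phrase ended the generation.")
}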


PicoLLMToken

public struct PicoLLMToken { }

Struct for a token and its associated log probability.


PicoLLMToken.token

PicoLLMToken.token: String

Token.


PicoLLMToken.logProb

PicoLLMToken.logProb: Float

Log probability.


PicoLLMCompletionToken

public struct PicoLLMCompletionToken { }

Struct for a token within completion and top alternative tokens.


PicoLLMCompletionToken.token

PicoLLMCompletionToken.token: PicoLLMToken

Token. See PicoLLMToken.


PicoLLMCompletionToken.topChoices

PicoLLMCompletionToken.topChoices: [PicoLLMToken]

Top choices. See PicoLLMToken.


PicoLLMCompletion

public struct PicoLLMCompletion { }

Result object containing stats and generated tokens.


PicoLLMCompletion.usage

PicoLLMCompletion.usage: PicoLLMUsage

Usage. See PicoLLMUsage.


PicoLLMCompletion.endpoint

PicoLLMCompletion.endpoint: PicoLLMEndpoint

Endpoint. See PicoLLMEndpoint.


PicoLLMCompletion.completionTokens

PicoLLMCompletion.completionTokens: [PicoLLMCompletionToken]

Completion Tokens. See PicoLLMCompletionToken.


PicoLLMCompletion.completion

PicoLLMCompletion.completion: String

Completion string.
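
A sketch of walking a completion's tokens and their alternatives, assuming .generate() was called with numTopChoices set to a positive value.

for completionToken in completion.completionTokens {
    let token = completionToken.token
    print("token: \(token.token) logProb: \(token.logProb)")
    for choice in completionToken.topChoices {
        print("  alternative: \(choice.token) logProb: \(choice.logProb)")
    }
}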


PicoLLMDialog

public protocol PicoLLMDialog {}

Protocol representing the picoLLM dialog interface.


PicoLLMDialog.init()

init(history: Int32?, system: String?) throws

init method for PicoLLMDialog.

Parameters

  • history Int32? : Number of latest back-and-forth exchanges to include in the prompt. Set to nil to embed the entire dialog in the prompt.
  • system String? : System instruction to embed in the prompt for configuring the model's responses. Set to nil to omit.

Throws

  • PicoLLMError: If an error occurs while creating an instance of PicoLLMDialog.

PicoLLMDialog.addHumanRequest()

func addHumanRequest(content: String) throws

Adds the human's request to the dialog.

Parameters

  • content String : Human's request.

Throws

  • PicoLLMError: If an error occurs while adding the human's request.

PicoLLMDialog.addLLMResponse()

func addLLMResponse(content: String) throws

Adds the LLM's response to the dialog.

Parameters

  • content String : LLM's response.

Throws

  • PicoLLMError: If an error occurs while adding the LLM's response.

PicoLLMDialog.prompt()

func prompt() throws -> String

Creates a prompt string given the parameters passed to the constructor and the dialog's content.

Returns

  • String : Formatted prompt.

Throws

  • PicoLLMError: If an error occurs while creating the prompt.
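
A minimal chat-turn sketch combining a dialog helper with .generate(), assuming a picollm instance; Phi2ChatDialog and the request text are illustrative, and any concrete dialog class conforming to PicoLLMDialog works the same way.

do {
    let dialog = try Phi2ChatDialog(history: nil, system: nil)
    try dialog.addHumanRequest(content: "What is the capital of France?")

    // Format the accumulated dialog into a model-specific prompt.
    let promptString = try dialog.prompt()
    let completion = try picollm.generate(prompt: promptString)

    try dialog.addLLMResponse(content: completion.completion)
    print(completion.completion)
} catch {
    print("Dialog round failed: \(error)")
}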

BasePicoLLMDialog

public class BasePicoLLMDialog { }

BasePicoLLMDialog is a helper class that stores a chat dialog and formats it according to an instruction-tuned LLM's chat template. BasePicoLLMDialog is the base class. Each supported instruction-tuned LLM has an accompanying concrete subclass.


Phi2Dialog

public class Phi2Dialog: BasePicoLLMDialog { }

Dialog helper for phi-2. This is a base class; use one of the mode-specific subclasses (Phi2QADialog or Phi2ChatDialog).


Phi2Dialog.init()

init(
    humanRequestsTag: String,
    llmResponsesTag: String,
    history: Int32?,
    system: String?
) throws

init method for Phi2Dialog.

Parameters

  • humanRequestsTag String : Tag to classify human requests.
  • llmResponsesTag String : Tag to classify LLM responses.
  • history Int32? : Number of latest back-and-forth exchanges to include in the prompt. Set to nil to embed the entire dialog in the prompt.
  • system String? : System instruction to embed in the prompt for configuring the model's responses. Set to nil to omit.

Throws

  • PicoLLMError: If an error occurs while creating an instance of Phi2Dialog.

Phi2QADialog

public class Phi2QADialog: Phi2Dialog { }

Dialog helper for phi-2 qa mode.


Phi2ChatDialog

public class Phi2ChatDialog: Phi2Dialog { }

Dialog helper for phi-2 chat mode.


Phi3ChatDialog

public class Phi3ChatDialog: BasePicoLLMDialog { }

Dialog helper for phi-3.


Phi35ChatDialog

public class Phi35ChatDialog: Phi3ChatDialog { }

Dialog helper for phi-3.5.


MistralChatDialog

public class MistralChatDialog: BasePicoLLMDialog { }

Dialog helper for mistral-7b-instruct-v0.1 and mistral-7b-instruct-v0.2.


MixtralChatDialog

public class MixtralChatDialog: MistralChatDialog { }

Dialog helper for mixtral-8x7b-instruct-v0.1.


Llama2ChatDialog

public class Llama2ChatDialog: BasePicoLLMDialog { }

Dialog helper for llama-2-7b-chat, llama-2-13b-chat, and llama-2-70b-chat.


Llama3ChatDialog

public class Llama3ChatDialog: BasePicoLLMDialog { }

Dialog helper for llama-3-8b-instruct and llama-3-70b-instruct.


Llama32ChatDialog

public class Llama32ChatDialog: Llama3ChatDialog { }

Dialog helper for llama-3.2-1b-instruct and llama-3.2-3b-instruct.


GemmaChatDialog

public class GemmaChatDialog: BasePicoLLMDialog { }

Dialog helper for gemma-2b-it and gemma-7b-it.

