picoLLM Inference Engine
Python API

API Reference for the picoLLM Python SDK (PyPI).


picollm.available_devices()

def available_devices(library_path: Optional[str] = None) -> Sequence[str]

Lists all available devices that picoLLM can use for inference. Each entry in the list can be used as the device argument of the .create() factory method or the PicoLLM constructor.

Parameters

  • library_path Optional[str] : Absolute path to picoLLM's dynamic library. If not set, it will default to the default location.

Returns

  • Sequence[str] : List of all available devices that picoLLM can use for inference.

Throws

  • PicoLLMError
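
For example, the following sketch prints every device picoLLM reports on the current machine. No AccessKey or model is required for this call, and the exact device strings depend on your hardware.

import picollm

# Each entry (e.g., 'cpu', 'cpu:4', 'gpu:0') can be passed as the device
# argument of picollm.create() or the PicoLLM constructor.
for device in picollm.available_devices():
    print(device)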

picollm.create()

def create(
        access_key: str,
        model_path: str,
        device: Optional[str] = None,
        library_path: Optional[str] = None) -> PicoLLM

Factory method for the picoLLM inference engine.

Parameters

  • access_key str : AccessKey obtained from Picovoice Console.
  • model_path str : Absolute path to the file containing LLM parameters.
  • device Optional[str] : String representation of the device (e.g., CPU or GPU) to use for inference. If set to best, picoLLM picks the most suitable device. If set to gpu, the engine uses the first available GPU device. To select a specific GPU device, set this argument to gpu:${GPU_INDEX}, where ${GPU_INDEX} is the index of the target GPU. If set to cpu, the engine will run on the CPU with the default number of threads. To specify the number of threads, set this argument to cpu:${NUM_THREADS}, where ${NUM_THREADS} is the desired number of threads. If set to None, the best device will be used.
  • library_path Optional[str] : Absolute path to picoLLM's dynamic library. If not set, it will be set to the default location.

Returns

  • PicoLLM : An instance of the picoLLM inference engine.

Throws

  • PicoLLMError
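
A minimal usage sketch is shown below. ${ACCESS_KEY} and ${MODEL_PATH} are placeholders for your own AccessKey and a downloaded .pllm file; the engine is released in a finally block using the .release() method documented further down.

import picollm

# Placeholders: substitute your AccessKey from Picovoice Console and the
# absolute path to a .pllm model file.
pllm = picollm.create(
    access_key='${ACCESS_KEY}',
    model_path='${MODEL_PATH}',
    device='best')

try:
    res = pllm.generate(prompt='How do solar panels work?')
    print(res.completion)
finally:
    pllm.release()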

picollm.PicoLLM

class PicoLLM(object)

Class for the picoLLM Inference Engine.

PicoLLM can be initialized either using the module-level create() function or directly using the class __init__() method. Resources should be cleaned up using the release() method when you are done.


picollm.PicoLLM.model

@property
def model(self) -> str

Getter for model's name.

Returns

  • str : Model name.

picollm.PicoLLM.context_length

@property
def context_length(self) -> int

Getter for model's context length.

Returns

  • int : Context length.

picollm.PicoLLM.version

@property
def version(self) -> str

Getter for version.

Returns

  • str : Version string.

picollm.PicoLLM.max_top_choices

@property
def max_top_choices(self) -> int

Getter for maximum number of top choices.

Returns

  • int : Maximum number of top choices.

picollm.PicoLLM.__init__()

def __init__(
        self,
        access_key: str,
        model_path: str,
        device: str,
        library_path: str) -> None

Constructor for the PicoLLM class.

Parameters

  • access_key str : AccessKey obtained from Picovoice Console.
  • model_path str : Absolute path to the file containing LLM parameters (.pllm).
  • device str : String representation of the device (e.g., CPU or GPU) to use for inference. If set to best, picoLLM picks the most suitable device. If set to gpu, the engine uses the first available GPU device. To select a specific GPU device, set this argument to gpu:${GPU_INDEX}, where ${GPU_INDEX} is the index of the target GPU. If set to cpu, the engine will run on the CPU with the default number of threads. To specify the number of threads, set this argument to cpu:${NUM_THREADS}, where ${NUM_THREADS} is the desired number of threads.
  • library_path str : Absolute path to picoLLM's dynamic library.

Throws

  • PicoLLMError

picollm.PicoLLM.generate()

def generate(
        self,
        prompt: str,
        completion_token_limit: Optional[int] = None,
        stop_phrases: Optional[Set[str]] = None,
        seed: Optional[int] = None,
        presence_penalty: float = 0.,
        frequency_penalty: float = 0.,
        temperature: float = 0.,
        top_p: float = 1.,
        num_top_choices: int = 0,
        stream_callback: Optional[Callable[[str], None]] = None) -> PicoLLMCompletion

Given a text prompt and a set of generation parameters, creates a completion text and relevant metadata.

Parameters

  • prompt str : Prompt.
  • completion_token_limit Optional[int] : Maximum number of tokens in the completion. If the generation process stops due to reaching this limit, the .endpoint parameter in PicoLLMCompletion output will be PicoLLMEndpoints.COMPLETION_TOKEN_LIMIT_REACHED. Set to None to impose no limit.
  • stop_phrases Optional[Set[str]] : The generation process stops when it encounters any of these phrases in the completion. The already generated completion, including the encountered stop phrase, will be returned. The endpoint parameter in PicoLLMCompletion output will be PicoLLMEndpoints.STOP_PHRASE_ENCOUNTERED. Set to None to turn off this feature.
  • seed Optional[int] : The internal random number generator uses it as its seed if set to a positive integer value. Seeding enforces deterministic outputs. Set to None for randomized outputs for a given prompt.
  • presence_penalty float : It penalizes logits already appearing in the partial completion if set to a positive value. If set to 0.0, it has no effect.
  • frequency_penalty float : If set to a positive floating-point value, it penalizes logits proportional to the frequency of their appearance in the partial completion. If set to 0.0, it has no effect.
  • temperature float : Sampling temperature. Temperature is a non-negative floating-point value that controls the randomness of the sampler. A higher temperature smooths the sampler's output, increasing randomness. In contrast, a lower temperature creates a narrower distribution and reduces variability. Setting it to 0 selects the maximum logit during sampling.
  • top_p float : A positive floating-point number between 0 and 1. It restricts the sampler's choices to high-probability logits that form the top_p portion of the probability mass. Hence, it avoids randomly selecting unlikely logits. A value of 1.0 enables the sampler to pick any token with non-zero probability, effectively turning the feature off.
  • num_top_choices int : If set to a positive value, picoLLM returns the list of highest-probability tokens for each generated token. Set to 0 to turn off the feature. The maximum number of top choices is .max_top_choices.
  • stream_callback Callable[[str], None] : If not set to None, picoLLM executes this callback every time a new piece of completion string becomes available.

Returns

  • PicoLLMCompletion : Completion result.

Throws

  • PicoLLMError
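
The sketch below illustrates streaming generation with a token limit and stop phrases. The prompt and parameter values are arbitrary examples, and pllm is assumed to be an existing PicoLLM instance created as shown earlier.

res = pllm.generate(
    prompt='Write a haiku about embedded AI.',
    completion_token_limit=128,
    stop_phrases={'</s>'},
    temperature=0.7,
    top_p=0.9,
    # Print each piece of the completion as soon as it becomes available.
    stream_callback=lambda text: print(text, end='', flush=True))

print()              # newline after the streamed output
print(res.endpoint)  # reason the generation process ended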

picollm.PicoLLM.interrupt()

def interrupt(self) -> None

Interrupts .generate() if generation is in progress. Otherwise, it has no effect.

Throws

  • PicoLLMError
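
Because .generate() blocks until the completion is finished, .interrupt() is typically called from another thread. The following is a sketch of that pattern; the two-second timeout is arbitrary and pllm is an existing PicoLLM instance.

import threading

# Run generation on a worker thread so the main thread can interrupt it.
def worker():
    res = pllm.generate(prompt='Tell me a very long story.')
    # res.endpoint is PicoLLMEndpoints.INTERRUPTED if generation was interrupted.
    print(res.endpoint)

t = threading.Thread(target=worker)
t.start()

t.join(timeout=2.0)   # let generation run for a couple of seconds
if t.is_alive():
    pllm.interrupt()  # stop the in-progress generation
    t.join()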

picollm.PicoLLM.tokenize()

def tokenize(self, text: str, bos: bool, eos: bool) -> Sequence[int]

Tokenizes a given text using the model's tokenizer. This is a low-level function meant for benchmarking and advanced usage. .generate() should be used when possible.

Parameters

  • text str : Text.
  • bos bool : If set to True, the tokenizer prepends the beginning-of-sentence token to the result.
  • eos bool : If set to True, the tokenizer appends the end-of-sentence token to the result.

Returns

  • Sequence[int] : Tokens representing the input text.

Throws

  • PicoLLMError

picollm.PicoLLM.forward()

def forward(self, token: int) -> Sequence[float]

Performs a single forward pass given a token and returns the logits. This is a low-level function for benchmarking and advanced usage. .generate() should be used when possible.

Parameters

  • token int : Input token.

Returns

  • Sequence[float] : Logits.

Throws

  • PicoLLMError

picollm.PicoLLM.reset()

def reset(self) -> None

Resets the internal state of the LLM. It should be called in conjunction with .forward() when processing a new sequence of tokens. This is a low-level function for benchmarking and advanced usage. .generate() should be used when possible.

Throws

  • PicoLLMError
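
Together, .tokenize(), .forward(), and .reset() can be used to run a manual forward pass. The sketch below feeds a prompt through the model one token at a time and picks the highest-scoring next token; it assumes the index of a logit equals the token id, and pllm is an existing PicoLLM instance. Prefer .generate() for normal use.

pllm.reset()  # start a fresh token sequence

tokens = pllm.tokenize('The capital of France is', bos=True, eos=False)

logits = []
for token in tokens:
    logits = pllm.forward(token)

# Index of the largest logit, i.e., the model's greedy pick for the next token.
next_token = max(range(len(logits)), key=lambda i: logits[i])
print(next_token)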

picollm.PicoLLM.release()

def release(self) -> None

Releases resources acquired by picoLLM.


picollm.PicoLLM.get_dialog()

def get_dialog(
        self,
        mode: Optional[str] = None,
        history: Optional[int] = 0,
        system: Optional[str] = None) -> Dialog

Returns the Dialog object corresponding to the loaded model. The model needs to be instruction-tuned and have a specific chat template.

Parameters

  • mode Optional[str] : Some models (e.g., phi-2) define multiple chat template modes. For example, phi-2 allows both qa and chat templates.
  • history Optional[int] : History refers to the number of latest back-and-forths to include in the prompt. Setting history to None will embed the entire dialog in the prompt.
  • system Optional[str] : System instruction to embed in the prompt for configuring the model's responses.

Returns

  • Dialog : Constructed dialog object.

Throws

  • PicoLLMError
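
A simple interactive chat loop built on .get_dialog() might look like the sketch below. It assumes the loaded model is instruction-tuned and that pllm was created as shown earlier; the system instruction is an arbitrary example.

# The dialog object stores the conversation and formats the chat-template prompt.
dialog = pllm.get_dialog(system='You are a concise assistant.')

while True:
    dialog.add_human_request(input('> '))

    res = pllm.generate(
        prompt=dialog.prompt(),
        stream_callback=lambda text: print(text, end='', flush=True))
    print()

    dialog.add_llm_response(res.completion)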

picollm.PicoLLMError

class PicoLLMError(Exception)

Error thrown if an error occurs within the picoLLM Inference Engine.

Exceptions

class PicoLLMActivationError(PicoLLMError)
class PicoLLMActivationLimitError(PicoLLMError)
class PicoLLMActivationRefusedError(PicoLLMError)
class PicoLLMActivationThrottledError(PicoLLMError)
class PicoLLMIOError(PicoLLMError)
class PicoLLMInvalidArgumentError(PicoLLMError)
class PicoLLMInvalidStateError(PicoLLMError)
class PicoLLMKeyError(PicoLLMError)
class PicoLLMMemoryError(PicoLLMError)
class PicoLLMRuntimeError(PicoLLMError)
class PicoLLMStopIterationError(PicoLLMError)
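
All of the subclasses derive from PicoLLMError, so callers can catch the base class or a specific failure mode. A sketch, assuming the subclasses are importable from the picollm package as listed above and using placeholder credentials:

import picollm

try:
    pllm = picollm.create(
        access_key='${ACCESS_KEY}',
        model_path='${MODEL_PATH}')
except picollm.PicoLLMActivationLimitError:
    print('AccessKey has reached its activation limit.')
except picollm.PicoLLMError as e:
    print(f'Failed to initialize picoLLM: {e}')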

picollm.PicoLLMUsage

@dataclass
class PicoLLMUsage:
    prompt_tokens: int
    completion_tokens: int

Usage information.

  • prompt_tokens int : Number of tokens in the prompt.
  • completion_tokens int : Number of tokens in the completion.

picollm.PicoLLMEndpoints

class PicoLLMEndpoints(Enum)

Reasons for ending the generation process.

  • END_OF_SENTENCE : 0
  • COMPLETION_TOKEN_LIMIT_REACHED : 1
  • STOP_PHRASE_ENCOUNTERED : 2
  • INTERRUPTED : 3

picollm.PicoLLMToken

@dataclass
class PicoLLMToken:
    token: str
    log_prob: float

Generated token and its log probability.

  • token str : Token.
  • log_prob float : Log probability.

picollm.PicoLLMCompletionToken

@dataclass
class PicoLLMCompletionToken:
    token: PicoLLMToken
    top_choices: Sequence[PicoLLMToken]

Generated token within completion and top alternative tokens.

  • token PicoLLMToken : Token.
  • top_choices Sequence[PicoLLMToken] : Top choices.

picollm.PicoLLMCompletion

@dataclass
class PicoLLMCompletion:
    usage: PicoLLMUsage
    endpoint: PicoLLMEndpoints
    completion_tokens: Sequence[PicoLLMCompletionToken]
    completion: str

LLM completion result.

  • usage PicoLLMUsage : Usage information.
  • endpoint PicoLLMEndpoints : Reason for ending the generation process.
  • completion_tokens Sequence[PicoLLMCompletionToken] : Generated tokens within completion and top alternative tokens.
  • completion str : Completion string.
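
For instance, when num_top_choices is set on .generate(), the returned PicoLLMCompletion exposes usage, the stop reason, and the alternatives considered for each generated token. A sketch, with pllm as an existing PicoLLM instance and an arbitrary prompt:

res = pllm.generate(
    prompt='The three primary colors are',
    num_top_choices=min(3, pllm.max_top_choices))

print(res.usage.prompt_tokens, res.usage.completion_tokens)
print(res.endpoint)  # e.g., PicoLLMEndpoints.END_OF_SENTENCE

for completion_token in res.completion_tokens:
    print(completion_token.token.token, completion_token.token.log_prob)
    for choice in completion_token.top_choices:
        print('    ', choice.token, choice.log_prob)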

picollm.Dialog

class Dialog(object)

Dialog is a helper class that stores a chat dialog and formats it according to an instruction-tuned LLM's chat template. Dialog is the base class. Each supported instruction-tuned LLM has an accompanying concrete subclass.


picollm.Dialog.__init__()

def __init__(self, history: Optional[int] = None, system: Optional[str] = None) -> None

Constructor for the Dialog class.

Parameters

  • history Optional[int] : History refers to the number of latest back-and-forths to include in the prompt. Setting history to None will embed the entire dialog in the prompt.
  • system Optional[str] : System instruction to embed in the prompt for configuring the model's responses.

Throws

  • ValueError : If history is set to a negative value.

picollm.Dialog.add_human_request()

def add_human_request(self, content: str) -> None

Adds a human's request to the dialog.

Parameters

  • content str : Human's request.

Throws

  • RuntimeError : If a human request is added before the previous request has received an LLM response.

picollm.Dialog.add_llm_response()

def add_llm_response(self, content: str) -> None

Adds the LLM's response to the dialog.

Parameters

  • content str : LLM's response.

Throws

  • RuntimeError : If an LLM response is added without a preceding human request.

picollm.GemmaChatDialog

class GemmaChatDialog(Dialog)

Dialog helper for gemma-2b-it and gemma-7b-it.


picollm.GemmaChatDialog.prompt()

def prompt(self) -> str

Creates a prompt string for gemma-2b-it and gemma-7b-it models.

Returns

  • str : Formatted prompt.

Throws

  • RuntimeError : If a prompt is created without an outstanding human request.
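
Dialog subclasses can also be constructed directly instead of via .get_dialog(). A sketch, assuming GemmaChatDialog is importable from the picollm package:

from picollm import GemmaChatDialog

# Keep only the last two back-and-forths in the formatted prompt.
dialog = GemmaChatDialog(history=2)
dialog.add_human_request('What is Gemma?')

print(dialog.prompt())  # prompt formatted with the Gemma chat template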

picollm.Llama2ChatDialog

class Llama2ChatDialog(Dialog)

Dialog helper for llama-2-7b-chat, llama-2-13b-chat, and llama-2-70b-chat.


picollm.Llama2ChatDialog.prompt()

def prompt(self) -> str

Creates a prompt string for llama-2-7b-chat, llama-2-13b-chat, and llama-2-70b-chat models.

Returns

  • str : Formatted prompt.

Throws

  • RuntimeError : If a prompt is created without an outstanding human request.

picollm.Llama3ChatDialog

class Llama3ChatDialog(Dialog)

Dialog helper for llama-3-8b-instruct and llama-3-70b-instruct.


picollm.Llama3ChatDialog.prompt()

def prompt(self) -> str

Creates a prompt string for llama-3-8b-instruct and llama-3-70b-instruct models.

Returns

  • str : Formatted prompt.

Throws

  • RuntimeError : If a prompt is created without an outstanding human request.

picollm.Llama32ChatDialog

class Llama32ChatDialog(Llama3ChatDialog)

Dialog helper for llama-3.2-1b-instruct and llama-3.2-3b-instruct.


picollm.Llama32ChatDialog.prompt()

def prompt(self) -> str

Creates a prompt string for llama-3.2-1b-instruct and llama-3.2-3b-instruct models.

Returns

  • str : Formatted prompt.

Throws

  • RuntimeError : If a prompt is created without an outstanding human request.

picollm.MistralChatDialog

class MistralChatDialog(Dialog)

Dialog helper for mistral-7b-instruct-v0.1 and mistral-7b-instruct-v0.2.


picollm.MistralChatDialog.prompt()

def prompt(self) -> str

Creates a prompt string for mistral-7b-instruct-v0.1 and mistral-7b-instruct-v0.2 models.

Returns

  • str : Formatted prompt.

Throws

  • RuntimeError : If a prompt is created without an outstanding human request.

picollm.MixtralChatDialog

class MixtralChatDialog(MistralChatDialog)

Dialog helper for mixtral-8x7b-instruct-v0.1. This class inherits methods from MistralChatDialog.


picollm.Phi2Dialog

class Phi2Dialog(Dialog)

Dialog helper for phi-2. This is a base class; use one of the mode-specific subclasses.


picollm.Phi2Dialog.__init__()

def __init__(
        self,
        human_tag: str,
        llm_tag: str,
        history: Optional[int] = None,
        system: Optional[str] = None) -> None

Constructor for the Phi2Dialog class.

Parameters

  • human_tag str : Tag marking the human's requests in the prompt.
  • llm_tag str : Tag marking the LLM's responses in the prompt.
  • history Optional[int] : History refers to the number of latest back-and-forths to include in the prompt. Setting history to None will embed the entire dialog in the prompt.
  • system Optional[str] : System instruction to embed in the prompt for configuring the model's responses.

picollm.Phi2Dialog.prompt()

def prompt(self) -> str

Creates a prompt string for the phi-2 model.

Returns

  • str : Formatted prompt.

Throws

  • RuntimeError : If a prompt is created without an outstanding human request.

picollm.Phi2QADialog

class Phi2QADialog(Phi2Dialog)

Dialog helper for phi-2 in qa mode. This class inherits methods from Phi2Dialog.


picollm.Phi2ChatDialog

class Phi2ChatDialog(Phi2Dialog)

Dialog helper for phi-2 in chat mode. This class inherits methods from Phi2Dialog.


picollm.Phi3Dialog

class Phi3Dialog(Dialog)

Dialog helper for phi3.


picollm.Phi35Dialog

class Phi35Dialog(Phi3Dialog)

Dialog helper for phi3.5.

