picoLLM Inference Engine
Python API

API Reference for the picoLLM Python SDK (PyPI).

picollm.`available_devices()`

def available_devices(library_path: Optional[str] = None) -> Sequence[str]

Lists all available devices that picoLLM can use for inference. Each entry in the list can be used as the device argument of the .create() factory method or the PicoLLM constructor.

Parameters

library_path Optional[str] : Absolute path to picoLLM's dynamic library. If not set, it will default to the location.

Returns

Sequence[str] : List of all available devices that picoLLM can use for inference.

Throws

PicoLLMError

picollm.`create()`

def create(
        access_key: str,
        model_path: str,
        device: Optional[str] = None,
        library_path: Optional[str] = None) -> PicoLLM

Factory method for picoLLM inference engine.

Parameters

access_key str : AccessKey obtained from Picovoice Console.
model_path str : Absolute path to the file containing LLM parameters.
device Optional[str] : String representation of the device (e.g., CPU or GPU) to use for inference. If set to best, picoLLM picks the most suitable device. If set to gpu, the engine uses the first available GPU device. To select a specific GPU device, set this argument to gpu:${GPU_INDEX}, where ${GPU_INDEX} is the index of the target GPU. If set to cpu, the engine will run on the CPU with the default number of threads. To specify the number of threads, set this argument to cpu:${NUM_THREADS}, where ${NUM_THREADS} is the desired number of threads. If set to None, best device will be used.
library_path Optional[str] : Absolute path to picoLLM's dynamic library. If not set it will be set to the default location.

Returns

PicoLLM : An instance of picoLLM inference engine.

Throws

PicoLLMError

PicoLLM can be initialized either using the module level create() function or directly using the class __init__() method. Resources should be cleaned when you are done using the delete() method.

picollm.PicoLLM.`model`

@property
def model(self) -> str

Getter for model's name.

Returns

str : Model name.

picollm.PicoLLM.`context_length`

@property
def context_length(self) -> int

Getter for model's context length.

Returns

int : Context length.

picollm.PicoLLM.`version`

@property
def version(self) -> str

Getter for version.

Returns

str : Version string.

picollm.PicoLLM.`max_top_choices`

@property
def max_top_choices(self) -> int

Getter for maximum number of top choices.

Returns

int : Maximum number of top choices.

picollm.PicoLLM.`init()`

def __init__(
        self,
        access_key: str,
        model_path: str,
        device: str,
        library_path: str) -> None

Constructor for the PicoLLM class.

Parameters

access_key str : AccessKey obtained from Picovoice Console.
model_path str : Absolute path to the file containing LLM parameters (.pllm).
device str : String representation of the device (e.g., CPU or GPU) to use for inference. If set to best, picoLLM picks the most suitable device. If set to gpu, the engine uses the first available GPU device. To select a specific GPU device, set this argument to gpu:${GPU_INDEX}, where ${GPU_INDEX} is the index of the target GPU. If set to cpu, the engine will run on the CPU with the default number of threads. To specify the number of threads, set this argument to cpu:${NUM_THREADS}, where ${NUM_THREADS} is the desired number of threads.
library_path str : Absolute path to picoLLM's dynamic library.

Throws

PicoLLMError

picollm.PicoLLM.`generate()`

def generate(
        self,
        prompt: str,
        completion_token_limit: Optional[int] = None,
        stop_phrases: Optional[Set[str]] = None,
        seed: Optional[int] = None,
        presence_penalty: float = 0.,
        frequency_penalty: float = 0.,
        temperature: float = 0.,
        top_p: float = 1.,
        num_top_choices: int = 0,
        stream_callback: Callable[[str], None] = None) -> PicoLLMCompletion

Given a text prompt and a set of generation parameters, creates a completion text and relevant metadata.

Parameters

prompt str : Prompt.
completion_token_limit Optional[int] : Maximum number of tokens in the completion. If the generation process stops due to reaching this limit, the .endpoint parameter in PicoLLMCompletion output will be PicoLLMEndpoints.COMPLETION_TOKEN_LIMIT_REACHED. Set to None to impose no limit.
stop_phrases Optional[Set[str]] : The generation process stops when it encounters any of these phrases in the completion. The already generated completion, including the encountered stop phrase, will be returned. The endpoint parameter in PicoLLMCompletion output will be PicoLLMEndpoints.STOP_PHRASE_ENCOUNTERED. Set to None to turn off this feature.
seed Optional[int] : The internal random number generator uses it as its seed if set to a positive integer value. Seeding enforces deterministic outputs. Set to None for randomized outputs for a given prompt.
presence_penalty float : It penalizes logits already appearing in the partial completion if set to a positive value. If set to 0.0, it has no effect.
frequency_penalty float : If set to a positive floating-point value, it penalizes logits proportional to the frequency of their appearance in the partial completion. If set to 0.0, it has no effect.
temperature float : Sampling temperature. Temperature is a non-negative floating-point value that controls the randomness of the sampler. A higher temperature smoothens the samplers' output, increasing the randomness. In contrast, a lower temperature creates a narrower distribution and reduces variability. Setting it to 0 selects the maximum logit during sampling.
top_p float : A positive floating-point number within 0 and 1. It restricts the sampler's choices to high-probability logits that form the top_p portion of the probability mass. Hence, it avoids randomly selecting unlikely logits. A value of 1. enables the sampler to pick any token with non-zero probability, turning off the feature.
num_top_choices int : If set to a positive value, picoLLM returns the list of the highest probability tokens for any generated token. Set to 0 to turn off the feature. The maximum number of top choices is .max_top_choices.
stream_callback Callable[[str], None] : If not set to None, picoLLM executes this callback every time a new piece of completion string becomes available.

Returns

PicoLLMCompletion : Completion result.

Throws

PicoLLMError

picollm.PicoLLM.`interrupt()`

def interrupt() -> None

Interrupts .generate() if generation is in progress. Otherwise, it has no effect.

Throws

PicoLLMError

picollm.PicoLLM.`tokenize()`

def tokenize(self, text: str, bos: bool, eos: bool) -> Sequence[int]

Tokenizes a given text using the model's tokenizer. This is a low-level function meant for benchmarking and advanced usage. .generate() should be used when possible.

Parameters

text str : Text.
bos bool : If set to True, the tokenizer prepends the beginning of the sentence token to the result.
eos bool : If set to True, the tokenizer appends the end of the sentence token to the result.

Returns

Sequence[int] : Tokens representing the input text.

Throws

PicoLLMError

picollm.PicoLLM.`forward()`

def forward(self, token: int) -> Sequence[float]

Performs a single forward pass given a token and returns the logits. This is a low-level function for benchmarking and advanced usage. .generate() should be used when possible.

Parameters

token int : Input token.

Returns

Sequence[float] : Logits.

Throws

PicoLLMError

picollm.PicoLLM.`reset()`

def reset(self) -> None

Resets the internal state of LLM. It should be called in conjunction with .forward() when processing a new sequence of tokens. This is a low-level function for benchmarking and advanced usage. .generate() should be used when possible.

Throws

PicoLLMError

picollm.PicoLLM.`release()`

def release(self) -> None

Releases resources acquired by picoLLM.

picollm.PicoLLM.`get_dialog()`

def get_dialog(
        self,
        mode: Optional[str] = None,
        history: Optional[int] = 0,
        system: Optional[str] = None) -> Dialog

Return the Dialog object corresponding to the loaded model. The model needs to be instruction-tuned and have a specific chat template.

Parameters

mode Optional[str] : Some models (e.g., phi-2) define multiple chat template modes. For example, phi-2 allows both qa and chat templates.
history Optional[int] : History refers to the number of latest back-and-forths to include in the prompt. Setting history to None will embed the entire dialog in the prompt.
system Optional[str] : System instruction to embed in the prompt for configuring the model's responses.

Returns

Dialog : Constructed dialog object.

Throws

PicoLLMError

picollm.PicoLLMError

class PicoLLMError(Exception)

Error thrown if an error occurs within picoLLM Inference Engine.

Exceptions

class PicoLLMActivationError(PicoLLMError)
class PicoLLMActivationLimitError(PicoLLMError)
class PicoLLMActivationRefusedError(PicoLLMError)
class PicoLLMActivationThrottledError(PicoLLMError)
class PicoLLMIOError(PicoLLMError)
class PicoLLMInvalidArgumentError(PicoLLMError)
class PicoLLMInvalidStateError(PicoLLMError)
class PicoLLMKeyError(PicoLLMError)
class PicoLLMMemoryError(PicoLLMError)
class PicoLLMRuntimeError(PicoLLMError)
class PicoLLMStopIterationError(PicoLLMError)

picollm.PicoLLMUsage

@dataclass
class PicoLLMUsage:
    prompt_tokens: int
    completion_tokens: int

Usage information.

prompt_tokens int : Number of tokens in the prompt.
completion_tokens int : Number of tokens in the completion.

picollm.PicoLLMEndpoints

class PicoLLMEndpoints(Enum)

Reasons for ending the generation process.

END_OF_SENTENCE : 0
COMPLETION_TOKEN_LIMIT_REACHED : 1
STOP_PHRASE_ENCOUNTERED : 2
INTERRUPTED : 3

picollm.PicoLLMToken

@dataclass
class PicoLLMToken:
    token: str
    log_prob: float

Generated token and its log probability.

token str : Token.
log_prob float : Log probability.

picollm.PicoLLMCompletionToken

@dataclass
class PicoLLMCompletionToken:
    token: PicoLLMToken
    top_choices: Sequence[PicoLLMToken]

Generated token within completion and top alternative tokens.

token PicoLLMToken : Token.
top_choices Sequence[PicoLLMToken] : Top choices.

picollm.PicoLLMCompletion

@dataclass
class PicoLLMCompletionToken:
    usage: PicoLLMUsage
    endpoint: PicoLLMEndpoints
    completion_tokens: Sequence[PicoLLMCompletionToken]
    completion: str

LLM completion result.

usage PicoLLMUsage : Usage information.
endpoint PicoLLMEndpoints : Reason for ending the generation process.
completion_tokens Sequence[PicoLLMCompletionToken] : Generated tokens within completion and top alternative tokens.
completion str : Completion string.

picollm.Dialog

class Dialog(object)

Dialog is a helper class that stores a chat dialog and formats it according to an instruction-tuned LLM's chat template. Dialog is the base class. Each supported instruction-tuned LLM has an accompanying concrete subclass.

picollm.Dialog.`init()`

def __init__(self, history: Optional[int] = None, system: Optional[str] = None) -> None

Constructor for the Dialog class.

Parameters

history Optional[int] : History refers to the number of latest back-and-forths to include in the prompt. Setting history to None will embed the entire dialog in the prompt.
system Optional[str] : System instruction to embed in the prompt for configuring the model's responses.

Throws

ValueError : If history is set to a negative value.

picollm.Dialog.`add_human_request()`

def add_human_request(self, content: str) -> None

Adds a human's request to the dialog.

Parameters

content str : Human's request.

Throws

RuntimeError : If a human request is added without entering the last LLM response.

picollm.Dialog.`add_llm_response()`

def add_llm_response(self, content: str) -> None

Adds LLM's response to the dialog.

Parameters

content str : LLM's response.

Throws

RuntimeError : If an LLM response is added without entering the human request.

picollm.GemmaChatDialog

class GemmaChatDialog(Dialog)

Dialog helper for gemma-2b-it and gemma-7b-it.

picollm.GemmaChatDialog.`prompt()`

def prompt(self) -> str

Creates a prompt string for gemma-2b-it and gemma-7b-it models.

Returns

str : Formatted prompt.

Throws

RuntimeError : If a prompt is created without an outstanding human request.

picollm.Llama2ChatDialog

class Llama2ChatDialog(Dialog)

Dialog helper for llama-2-7b-chat, llama-2-13b-chat, and llama-2-70b-chat.

picollm.Llama2ChatDialog.`prompt()`

def prompt(self) -> str

Creates a prompt string for llama-2-7b-chat, llama-2-13b-chat, and llama-2-70b-chat models.

Returns

str : Formatted prompt.

Throws

RuntimeError : If a prompt is created without an outstanding human request.

picollm.Llama3ChatDialog

class Llama3ChatDialog(Dialog)

Dialog helper for llama-3-8b-instruct and llama-3-70b-instruct.

picollm.Llama3ChatDialog.`prompt()`

def prompt(self) -> str

Creates a prompt string for llama-3-8b-instruct and llama-3-70b-instruct models.

Returns

str : Formatted prompt.

Throws

RuntimeError : If a prompt is created without an outstanding human request.

picollm.Llama32ChatDialog

class Llama32ChatDialog(Llama3ChatDialog)

Dialog helper for llama-3.2-1b-instruct and llama-3.2-3b-instruct.

picollm.Llama32ChatDialog.`prompt()`

def prompt(self) -> str

Creates a prompt string for llama-3.2-1b-instruct and llama-3.2-3b-instruct models.

Returns

str : Formatted prompt.

Throws

RuntimeError : If a prompt is created without an outstanding human request.

picollm.MistralChatDialog

class MistralChatDialog(Dialog)

Dialog helper for mistral-7b-instruct-v0.1 and mistral-7b-instruct-v0.2.

picollm.MistralChatDialog.`prompt()`

def prompt(self) -> str

Creates a prompt string for mistral-7b-instruct-v0.1 and mistral-7b-instruct-v0.2 models.

Returns

str : Formatted prompt.

Throws

RuntimeError : If a prompt is created without an outstanding human request.

picollm.MixtralChatDialog

class MixtralChatDialog(MistralChatDialog)

Dialog helper for mixtral-8x7b-instruct-v0.1. This class inherits methods from MistralChatDialog.

picollm.Phi2Dialog

class Phi2Dialog(Dialog)

Dialog helper for phi-2. This is a base class, use one of the mode-specific subclasses.

picollm.Phi2Dialog.`init()`

def __init__(
  self,
  human_tag: str,
  llm_tag: str,
  history: Optional[int] = None,
  system: Optional[str] = None) -> None

Constructor for the Phi2Dialog class.

Parameters

human_tag str : Tag for human input.
llm_tag str : Tag for LLM input.
history Optional[int] : History refers to the number of latest back-and-forths to include in the prompt. Setting history to None will embed the entire dialog in the prompt.
system Optional[str] : System instruction to embed in the prompt for configuring the model's responses.