picoLLM Inference Engine
Python API
API Reference for the picoLLM Python SDK (PyPI).
picollm.available_devices()
Lists all available devices that picoLLM can use for inference. Each entry in the list can be used as the device argument of the .create() factory method or the PicoLLM constructor.
Parameters
library_path
Optional[str] : Absolute path to picoLLM's dynamic library. If not set, it will be set to the default location.
Returns
Sequence[str]
: List of all available devices that picoLLM can use for inference.
Throws
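As a rough illustration of how the list returned by available_devices() might be used, the sketch below picks a device string to pass on to .create(). The helper names pick_device and list_devices are hypothetical, and the assumption that GPU entries start with "gpu" is inferred from the device strings described in this reference, not guaranteed by it.

```python
from typing import Sequence


def pick_device(devices: Sequence[str]) -> str:
    """Return the first GPU entry if one is listed, otherwise fall back to 'best'.

    Assumption: GPU entries start with 'gpu' (e.g. 'gpu:0').
    """
    for device in devices:
        if device.startswith("gpu"):
            return device
    return "best"


def list_devices() -> Sequence[str]:
    """Query the SDK for devices (requires the picollm package to be installed)."""
    import picollm  # imported lazily so pick_device works without the SDK

    return picollm.available_devices()
```

For example, pick_device(list_devices()) could supply the device argument of .create().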
picollm.create()
Factory method for picoLLM inference engine.
Parameters
access_key
str : AccessKey obtained from Picovoice Console.
model_path
str : Absolute path to the file containing LLM parameters.
device
Optional[str] : String representation of the device (e.g., CPU or GPU) to use for inference. If set to best, picoLLM picks the most suitable device. If set to gpu, the engine uses the first available GPU device. To select a specific GPU device, set this argument to gpu:${GPU_INDEX}, where ${GPU_INDEX} is the index of the target GPU. If set to cpu, the engine will run on the CPU with the default number of threads. To specify the number of threads, set this argument to cpu:${NUM_THREADS}, where ${NUM_THREADS} is the desired number of threads. If set to None, the best device will be used.
library_path
Optional[str] : Absolute path to picoLLM's dynamic library. If not set, it will be set to the default location.
Returns
PicoLLM
: An instance of picoLLM inference engine.
Throws
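The device argument follows a small grammar (best, gpu, gpu:${GPU_INDEX}, cpu, cpu:${NUM_THREADS}, or None). The sketch below is a hypothetical helper, parse_device, that checks a string against that grammar before it is handed to create(). It is not part of the SDK, and the engine may accept or reject strings differently.

```python
from typing import Optional, Tuple


def parse_device(device: Optional[str]) -> Tuple[str, Optional[int]]:
    """Split a device string into (kind, index_or_thread_count).

    None is treated as 'best', mirroring the parameter description above.
    """
    if device is None:
        return ("best", None)
    kind, separator, argument = device.partition(":")
    if kind not in ("best", "gpu", "cpu"):
        raise ValueError(f"unknown device kind: {device!r}")
    if not separator:
        return (kind, None)
    # Only 'gpu' and 'cpu' take a numeric suffix.
    if kind == "best" or not argument.isdigit():
        raise ValueError(f"malformed device string: {device!r}")
    return (kind, int(argument))


# Typical call once the string is validated (placeholders left as-is):
# import picollm
# pllm = picollm.create(
#     access_key="${ACCESS_KEY}",
#     model_path="${MODEL_PATH}",
#     device="cpu:4")
```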
picollm.PicoLLM
Class for the picoLLM Inference Engine.
PicoLLM can be initialized either using the module-level create() function or directly using the class __init__() method. Resources should be released with the release() method when you are done.
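One way to make sure resources are always freed is to wrap the engine in a context manager that calls release(), the cleanup method documented in this reference. The helper managed_engine below is a hypothetical convenience, not something the SDK provides. It accepts any zero-argument factory so that it can be exercised without a model file.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def managed_engine(factory: Callable[[], object]) -> Iterator[object]:
    """Create an engine via `factory` and release it even if the body raises."""
    engine = factory()
    try:
        yield engine
    finally:
        engine.release()


# Possible usage with the real SDK (placeholders left as-is):
# import picollm
# with managed_engine(lambda: picollm.create(
#         access_key="${ACCESS_KEY}",
#         model_path="${MODEL_PATH}")) as pllm:
#     print(pllm.model, pllm.context_length, pllm.version)
```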
picollm.PicoLLM.model
Getter for model's name.
Returns
str
: Model name.
picollm.PicoLLM.context_length
Getter for model's context length.
Returns
int
: Context length.
picollm.PicoLLM.version
Getter for version.
Returns
str
: Version string.
picollm.PicoLLM.max_top_choices
Getter for maximum number of top choices.
Returns
int
: Maximum number of top choices.
picollm.PicoLLM.__init__()
Constructor for the PicoLLM class.
Parameters
access_key
str : AccessKey obtained from Picovoice Console.
model_path
str : Absolute path to the file containing LLM parameters (.pllm).
device
str : String representation of the device (e.g., CPU or GPU) to use for inference. If set to best, picoLLM picks the most suitable device. If set to gpu, the engine uses the first available GPU device. To select a specific GPU device, set this argument to gpu:${GPU_INDEX}, where ${GPU_INDEX} is the index of the target GPU. If set to cpu, the engine will run on the CPU with the default number of threads. To specify the number of threads, set this argument to cpu:${NUM_THREADS}, where ${NUM_THREADS} is the desired number of threads.
library_path
str : Absolute path to picoLLM's dynamic library.
Throws
picollm.PicoLLM.generate()
Given a text prompt and a set of generation parameters, creates a completion text and relevant metadata.
Parameters
prompt
str : Prompt.
completion_token_limit
Optional[int] : Maximum number of tokens in the completion. If the generation process stops due to reaching this limit, the endpoint parameter in PicoLLMCompletion output will be PicoLLMEndpoints.COMPLETION_TOKEN_LIMIT_REACHED. Set to None to impose no limit.
stop_phrases
Optional[Set[str]] : The generation process stops when it encounters any of these phrases in the completion. The already generated completion, including the encountered stop phrase, will be returned. The endpoint parameter in PicoLLMCompletion output will be PicoLLMEndpoints.STOP_PHRASE_ENCOUNTERED. Set to None to turn off this feature.
seed
Optional[int] : If set to a positive integer value, the internal random number generator uses it as its seed. Seeding enforces deterministic outputs. Set to None for randomized outputs for a given prompt.
presence_penalty
float : If set to a positive value, it penalizes logits already appearing in the partial completion. If set to 0.0, it has no effect.
frequency_penalty
float : If set to a positive floating-point value, it penalizes logits proportional to the frequency of their appearance in the partial completion. If set to 0.0, it has no effect.
temperature
float : Sampling temperature. Temperature is a non-negative floating-point value that controls the randomness of the sampler. A higher temperature smooths the sampler's output, increasing the randomness. In contrast, a lower temperature creates a narrower distribution and reduces variability. Setting it to 0 selects the maximum logit during sampling.
top_p
float : A positive floating-point number between 0 and 1. It restricts the sampler's choices to high-probability logits that form the top_p portion of the probability mass. Hence, it avoids randomly selecting unlikely logits. A value of 1.0 enables the sampler to pick any token with non-zero probability, turning off the feature.
num_top_choices
int : If set to a positive value, picoLLM returns the list of the highest probability tokens for any generated token. Set to 0 to turn off the feature. The maximum number of top choices is .max_top_choices.
stream_callback
Callable[[str], None] : If not set to None, picoLLM executes this callback every time a new piece of completion string becomes available.
Returns
PicoLLMCompletion
: Completion result.
Throws
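To make the temperature and top_p descriptions concrete, here is a minimal, self-contained sketch of temperature scaling followed by top-p (nucleus) filtering over a list of logits. It illustrates the general technique only. The function sample_token is hypothetical and is not how picoLLM implements sampling internally.

```python
import math
import random
from typing import Optional, Sequence


def sample_token(
        logits: Sequence[float],
        temperature: float = 1.0,
        top_p: float = 1.0,
        rng: Optional[random.Random] = None) -> int:
    """Return the index of a sampled token."""
    if temperature == 0:
        # Temperature 0 degenerates to picking the maximum logit.
        return max(range(len(logits)), key=logits.__getitem__)
    rng = rng or random.Random()
    # Softmax over temperature-scaled logits (shifted for numerical stability).
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the most likely tokens until their cumulative mass reaches top_p.
    order = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    kept, mass = [], 0.0
    for index in order:
        kept.append(index)
        mass += probs[index]
        if mass >= top_p:
            break
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]
```

With top_p close to 0 only the single most likely token survives the filter, while top_p of 1.0 keeps every token, which matches the "turning off the feature" behavior described above.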
picollm.PicoLLM.interrupt()
Interrupts .generate() if generation is in progress. Otherwise, it has no effect.
Throws
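A possible way to combine generate(), stream_callback, and interrupt() is to bound generation time with a timer. The sketch below assumes interrupt() may be called from another thread while generate() is blocking, which this reference does not state explicitly. The helper generate_with_timeout is hypothetical.

```python
import threading
from typing import Callable, Optional


def generate_with_timeout(
        engine,
        prompt: str,
        timeout_sec: float,
        on_piece: Optional[Callable[[str], None]] = None):
    """Run engine.generate() and call engine.interrupt() after `timeout_sec`."""
    timer = threading.Timer(timeout_sec, engine.interrupt)
    timer.start()
    try:
        # on_piece receives each new piece of completion text as it arrives.
        return engine.generate(prompt=prompt, stream_callback=on_piece)
    finally:
        # If generation finished first, make sure the timer never fires.
        timer.cancel()
```

With a real engine, passing on_piece=lambda piece: print(piece, end="", flush=True) would print the completion as it streams.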
picollm.PicoLLM.tokenize()
Tokenizes a given text using the model's tokenizer. This is a low-level function meant for benchmarking and advanced usage. .generate() should be used when possible.
Parameters
text
str : Text.
bos
bool : If set to True, the tokenizer prepends the beginning-of-sentence (BOS) token to the result.
eos
bool : If set to True, the tokenizer appends the end-of-sentence (EOS) token to the result.
Returns
Sequence[int]
: Tokens representing the input text.
Throws
picollm.PicoLLM.forward()
Performs a single forward pass given a token and returns the logits. This is a low-level function for benchmarking and advanced usage. .generate() should be used when possible.
Parameters
token
int : Input token.
Returns
Sequence[float]
: Logits.
Throws
picollm.PicoLLM.reset()
Resets the internal state of the LLM. It should be called in conjunction with .forward() when processing a new sequence of tokens. This is a low-level function for benchmarking and advanced usage. .generate() should be used when possible.
Throws
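The three low-level methods are meant to be used together: tokenize() turns text into tokens, reset() clears the state before a new sequence, and forward() consumes one token at a time. The sketch below feeds a prompt through the model and returns the index of the largest logit after the last token. The helper greedy_next_token is hypothetical, and treating the argmax index as the next token id is an assumption about how logits are laid out.

```python
def greedy_next_token(engine, text: str) -> int:
    """Return the argmax of the logits produced after the final prompt token."""
    tokens = engine.tokenize(text, bos=True, eos=False)
    if not tokens:
        raise ValueError("tokenizer returned no tokens")
    # Start from a clean state because this is a new sequence of tokens.
    engine.reset()
    logits = None
    for token in tokens:
        logits = engine.forward(token)
    return max(range(len(logits)), key=logits.__getitem__)
```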
picollm.PicoLLM.release()
Releases resources acquired by picoLLM.
picollm.PicoLLM.get_dialog()
Returns the Dialog object corresponding to the loaded model. The model needs to be instruction-tuned and have a specific chat template.
Parameters
mode
Optional[str] : Some models (e.g., phi-2) define multiple chat template modes. For example, phi-2 allows both qa and chat templates.
history
Optional[int] : History refers to the number of latest back-and-forths to include in the prompt. Setting history to None will embed the entire dialog in the prompt.
system
Optional[str] : System instruction to embed in the prompt for configuring the model's responses.
Returns
Dialog
: Constructed dialog object.
Throws
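A dialog returned by get_dialog() is typically driven in a request, prompt, generate, response cycle. The sketch below shows one such turn using only methods documented in this reference. The helper name chat_turn is hypothetical.

```python
def chat_turn(engine, dialog, user_text: str) -> str:
    """Run one back-and-forth and record it in the dialog."""
    dialog.add_human_request(user_text)
    # dialog.prompt() formats the conversation with the model's chat template.
    result = engine.generate(prompt=dialog.prompt())
    dialog.add_llm_response(result.completion)
    return result.completion


# Possible usage with the real SDK (placeholders left as-is):
# import picollm
# pllm = picollm.create(access_key="${ACCESS_KEY}", model_path="${MODEL_PATH}")
# dialog = pllm.get_dialog(system="You are a concise assistant.")
# print(chat_turn(pllm, dialog, "What is a context length?"))
# pllm.release()
```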
picollm.PicoLLMError
Error thrown if an error occurs within picoLLM Inference Engine.
Exceptions
picollm.PicoLLMUsage
Usage information.
prompt_tokens
int : Number of tokens in the prompt.
completion_tokens
int : Number of tokens in the completion.
picollm.PicoLLMEndpoints
Reasons for ending the generation process.
END_OF_SENTENCE
: 0
COMPLETION_TOKEN_LIMIT_REACHED
: 1
STOP_PHRASE_ENCOUNTERED
: 2
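Callers often branch on the endpoint to decide whether a completion was cut short. The sketch below uses a stand-in enum, Endpoint, that mirrors the three values listed above so the snippet runs without the SDK. With the real SDK the same check would be applied to the endpoint field of a PicoLLMCompletion. Comparing by member name is a choice made here for illustration.

```python
from enum import Enum


class Endpoint(Enum):
    """Hypothetical stand-in mirroring the PicoLLMEndpoints values above."""
    END_OF_SENTENCE = 0
    COMPLETION_TOKEN_LIMIT_REACHED = 1
    STOP_PHRASE_ENCOUNTERED = 2


def was_truncated(endpoint) -> bool:
    """True if generation stopped only because the token limit was reached."""
    return endpoint.name == "COMPLETION_TOKEN_LIMIT_REACHED"
```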
picollm.PicoLLMToken
Generated token and its log probability.
token
str : Token.
log_prob
float : Log probability.
picollm.PicoLLMCompletionToken
Generated token within completion and top alternative tokens.
token
PicoLLMToken : Token.
top_choices
Sequence[PicoLLMToken] : Top choices.
picollm.PicoLLMCompletion
LLM completion result.
usage
PicoLLMUsage : Usage information.
endpoint
PicoLLMEndpoints : Reason for ending the generation process.
completion_tokens
Sequence[PicoLLMCompletionToken] : Generated tokens within completion and top alternative tokens.
completion
str : Completion string.
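The fields of a completion nest a few levels deep: usage, endpoint, and a list of completion tokens that each carry a token plus its top choices. The sketch below walks that structure and formats it as text. The helper summarize_completion is hypothetical, and converting log_prob with math.exp assumes natural-log probabilities, which this reference does not specify.

```python
import math
from typing import List


def summarize_completion(completion) -> List[str]:
    """Describe usage, endpoint, and per-token alternatives as text lines."""
    usage = completion.usage
    lines = [
        f"tokens: {usage.prompt_tokens} prompt + {usage.completion_tokens} completion",
        f"endpoint: {completion.endpoint.name}",
    ]
    for item in completion.completion_tokens:
        # Assumption: log_prob is a natural logarithm, so exp() gives a probability.
        alternatives = ", ".join(
            f"{choice.token!r} ({math.exp(choice.log_prob):.2f})"
            for choice in item.top_choices)
        chosen = f"{item.token.token!r} ({math.exp(item.token.log_prob):.2f})"
        lines.append(f"{chosen} | top: {alternatives}")
    return lines
```

The top_choices lists are only populated when generate() is called with a positive num_top_choices.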
picollm.Dialog
Dialog is a helper class that stores a chat dialog and formats it according to an instruction-tuned LLM's chat template. Dialog is the base class. Each supported instruction-tuned LLM has an accompanying concrete subclass.
picollm.Dialog.__init__()
Constructor for the Dialog class.
Parameters
history
Optional[int] : History refers to the number of latest back-and-forths to include in the prompt. Setting history to None will embed the entire dialog in the prompt.
system
Optional[str] : System instruction to embed in the prompt for configuring the model's responses.
Throws
ValueError
: If history is set to a negative value.
picollm.Dialog.add_human_request()
Adds a human's request to the dialog.
Parameters
content
str : Human's request.
Throws
RuntimeError
: If a human request is added without entering the last LLM response.
picollm.Dialog.add_llm_response()
Adds LLM's response to the dialog.
Parameters
content
str : LLM's response.
Throws
RuntimeError
: If an LLM response is added without entering the human request.
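The rules above (strict alternation of requests and responses, and a history window over past back-and-forths) can be pictured with a toy class. MiniDialog below is a hypothetical illustration, not the SDK's Dialog implementation. In particular, the assumption that history counts completed exchanges and that the outstanding request is always included is a guess at the intended semantics.

```python
from typing import List, Optional, Tuple


class MiniDialog:
    """Toy model of the alternation and history rules described above."""

    def __init__(self, history: Optional[int] = None, system: Optional[str] = None):
        if history is not None and history < 0:
            raise ValueError("`history` must be None or non-negative")
        self._history = history
        self._system = system
        self._requests: List[str] = []
        self._responses: List[str] = []

    def add_human_request(self, content: str) -> None:
        if len(self._requests) > len(self._responses):
            raise RuntimeError("enter the last LLM response first")
        self._requests.append(content)

    def add_llm_response(self, content: str) -> None:
        if len(self._requests) == len(self._responses):
            raise RuntimeError("enter a human request first")
        self._responses.append(content)

    def visible_turns(self) -> List[Tuple[str, Optional[str]]]:
        """Exchanges that would be embedded in a prompt, oldest first."""
        if len(self._requests) == len(self._responses):
            raise RuntimeError("no outstanding human request")
        # zip stops at the shorter list, so only completed exchanges are paired.
        past = list(zip(self._requests, self._responses))
        if self._history is not None:
            past = past[-self._history:] if self._history > 0 else []
        return past + [(self._requests[-1], None)]
```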
picollm.GemmaChatDialog
Dialog helper for gemma-2b-it and gemma-7b-it.
picollm.GemmaChatDialog.prompt()
Creates a prompt string for gemma-2b-it and gemma-7b-it models.
Returns
str
: Formatted prompt.
Throws
RuntimeError
: If a prompt is created without an outstanding human request.
picollm.Llama2ChatDialog
Dialog helper for llama-2-7b-chat, llama-2-13b-chat, and llama-2-70b-chat.
picollm.Llama2ChatDialog.prompt()
Creates a prompt string for llama-2-7b-chat, llama-2-13b-chat, and llama-2-70b-chat models.
Returns
str
: Formatted prompt.
Throws
RuntimeError
: If a prompt is created without an outstanding human request.
picollm.Llama3ChatDialog
Dialog helper for llama-3-8b-instruct and llama-3-70b-instruct.
picollm.Llama32ChatDialog
Dialog helper for llama-3.2-1b-instruct and llama-3.2-3b-instruct.
picollm.Llama3ChatDialog.prompt()
Creates a prompt string for llama-3-8b-instruct and llama-3-70b-instruct models.
Returns
str
: Formatted prompt.
Throws
RuntimeError
: If a prompt is created without an outstanding human request.
picollm.Llama32ChatDialog.prompt()
Creates a prompt string for llama-3.2-1b-instruct and llama-3.2-3b-instruct models.
Returns
str
: Formatted prompt.
Throws
RuntimeError
: If a prompt is created without an outstanding human request.
picollm.MistralChatDialog
Dialog helper for mistral-7b-instruct-v0.1 and mistral-7b-instruct-v0.2.
picollm.MistralChatDialog.prompt()
Creates a prompt string for mistral-7b-instruct-v0.1 and mistral-7b-instruct-v0.2 models.
Returns
str
: Formatted prompt.
Throws
RuntimeError
: If a prompt is created without an outstanding human request.
picollm.MixtralChatDialog
Dialog helper for mixtral-8x7b-instruct-v0.1. This class inherits methods from MistralChatDialog.
picollm.Phi2Dialog
Dialog helper for phi-2. This is a base class; use one of the mode-specific subclasses.
picollm.Phi2Dialog.__init__()
Constructor for the Phi2Dialog class.
Parameters
human_tag
str : Tag for human input.
llm_tag
str : Tag for LLM input.
history
Optional[int] : History refers to the number of latest back-and-forths to include in the prompt. Setting history to None will embed the entire dialog in the prompt.
system
Optional[str] : System instruction to embed in the prompt for configuring the model's responses.
picollm.Phi2Dialog.prompt()
Creates a prompt string for the phi-2 model.
Returns
str
: Formatted prompt.
Throws
RuntimeError
: If a prompt is created without an outstanding human request.
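To give a feel for how human_tag and llm_tag could shape a prompt, the sketch below renders a dialog with one tag per speaker and leaves the final LLM tag open for the model to complete. The function render_tagged_prompt, the tag values used in the example, and the exact layout (colons, spaces, newlines) are all assumptions for illustration. The SDK's actual phi-2 template may differ.

```python
from typing import List, Optional, Tuple


def render_tagged_prompt(
        turns: List[Tuple[str, Optional[str]]],
        human_tag: str,
        llm_tag: str,
        system: Optional[str] = None) -> str:
    """Render (request, response) turns; the last response must be None."""
    if not turns or turns[-1][1] is not None:
        raise RuntimeError("no outstanding human request")
    parts = []
    if system is not None:
        parts.append(f"{system}\n")
    for request, response in turns:
        parts.append(f"{human_tag}: {request}\n{llm_tag}:")
        if response is not None:
            parts.append(f" {response}\n")
    return "".join(parts)


# Example with hypothetical tag values:
# render_tagged_prompt([("Hi", "Hello"), ("How are you?", None)], "Human", "AI")
```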
picollm.Phi2QADialog
Dialog helper for phi-2 in qa mode. This class inherits methods from Phi2Dialog.
picollm.Phi2ChatDialog
Dialog helper for phi-2 in chat mode. This class inherits methods from Phi2Dialog.
picollm.Phi3Dialog
Dialog helper for phi3.
picollm.Phi35Dialog
Dialog helper for phi3.5.