picoLLM Inference Engine
C API
API Reference for the picoLLM C SDK.
pv_picollm_t
Container representing the picoLLM Inference Engine.
pv_picollm_init()
Creates a picoLLM instance. Release its resources with pv_picollm_delete() when you are done using it.
Parameters
access_key
const char * : AccessKey obtained from Picovoice Console.
model_path
const char * : Absolute path to the file containing model parameters (.pllm).
device
const char * : String representation of the device (e.g., CPU or GPU) to use for inference. If set to best, picoLLM picks the most suitable device. If set to gpu, the engine uses the first available GPU device. To select a specific GPU device, set this argument to gpu:${GPU_INDEX}, where ${GPU_INDEX} is the index of the target GPU. If set to cpu, the engine will run on the CPU with the default number of threads. To specify the number of threads, set this argument to cpu:${NUM_THREADS}, where ${NUM_THREADS} is the desired number of threads.
object
pv_picollm_t * * : Constructed instance of picoLLM.
Returns
- pv_status_t : Status code.
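The device argument follows a small string format (best, gpu, gpu:${GPU_INDEX}, cpu, cpu:${NUM_THREADS}). As a minimal sketch, the string can be assembled as shown here. The helper format_device() is hypothetical and not part of the picoLLM SDK; it only builds the string and does not check whether the device exists.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the picoLLM SDK).
 * Writes a device string such as "best", "gpu", "gpu:1", "cpu", or "cpu:4"
 * into `buffer`. `kind` must be "best", "gpu", or "cpu". A negative `value`
 * omits the ":<value>" suffix; "best" never takes a suffix.
 * Returns 0 on success, -1 on invalid arguments or if the buffer is too small.
 */
int format_device(char *buffer, size_t size, const char *kind, int value) {
    if (buffer == NULL || kind == NULL || size == 0) {
        return -1;
    }

    int is_best = strcmp(kind, "best") == 0;
    int is_gpu = strcmp(kind, "gpu") == 0;
    int is_cpu = strcmp(kind, "cpu") == 0;
    if (!is_best && !is_gpu && !is_cpu) {
        return -1;
    }
    if (is_best && value >= 0) {
        return -1;
    }

    int written;
    if (value < 0) {
        written = snprintf(buffer, size, "%s", kind);
    } else {
        written = snprintf(buffer, size, "%s:%d", kind, value);
    }

    /* snprintf reports the length it wanted to write; detect truncation. */
    if (written < 0 || (size_t) written >= size) {
        return -1;
    }
    return 0;
}
```

The resulting buffer can then be passed as the device argument of pv_picollm_init().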
pv_picollm_delete()
Releases resources acquired by picoLLM.
Parameters
object
pv_picollm_t * : picoLLM object.
pv_picollm_generate()
Given a text prompt and a set of generation parameters, creates a completion text and relevant metadata. The caller is responsible for freeing completion and meta objects using pv_picollm_delete_completion() and pv_picollm_delete_completion_tokens().
Parameters
object
pv_picollm_t * : picoLLM object.
prompt
const char * : Text prompt.
completion_token_limit
int32_t : Maximum number of tokens in the completion. If the generation process stops due to reaching this limit, the endpoint output argument will be PV_PICOLLM_ENDPOINT_COMPLETION_TOKEN_LIMIT_REACHED. Set to -1 to impose no limit.
stop_phrases
const char * : The generation process stops when it encounters any of these phrases in the completion. The already generated completion, including the encountered stop phrase, will be returned. The endpoint output argument will be PV_PICOLLM_ENDPOINT_STOP_PHRASE_ENCOUNTERED. Set to NULL to turn off this feature.
num_stop_phrases
int32_t : Number of stop phrases. Set to 0 to turn off this feature.
seed
int32_t : If set to a positive integer value, the internal random number generator uses it as its seed. Seeding enforces deterministic outputs. Set to -1 for randomized outputs for a given prompt.
presence_penalty
float : If set to a positive value, it penalizes logits already appearing in the partial completion. If set to 0.0, it has no effect.
frequency_penalty
float : If set to a positive floating-point value, it penalizes logits proportional to the frequency of their appearance in the partial completion. If set to 0.0, it has no effect.
temperature
float : Sampling temperature. Temperature is a non-negative floating-point value that controls the randomness of the sampler. A higher temperature smooths the sampler's output, increasing the randomness. In contrast, a lower temperature creates a narrower distribution and reduces variability. Setting it to 0 selects the maximum logit during sampling.
top_p
float : A positive floating-point number within (0, 1]. It restricts the sampler's choices to high-probability logits that form the top_p portion of the probability mass. Hence, it avoids randomly selecting unlikely logits. A value of 1.0 enables the sampler to pick any token with non-zero probability, turning off the feature.
num_top_choices
int32_t : If set to a positive value, picoLLM returns the list of the highest probability tokens for any generated token. Set to 0 to turn off the feature. The maximum number of top choices is pv_picollm_max_top_choices().
stream_callback
pv_picollm_stream_callback_t : If not set to NULL, picoLLM executes this callback every time a new piece of completion string becomes available.
stream_callback_context
void * : Pointer containing user-defined data that is passed to stream_callback on every invocation.
usage
pv_picollm_usage_t * : Number of tokens in the prompt and completion.
endpoint
pv_picollm_endpoint_t * : Indicates the reason for termination of the generation process.
completion_tokens
pv_picollm_completion_token_t * * : Token-level information about the generated completion.
num_completion_tokens
int32_t * : Number of tokens in the completion.
completion
char * * : Completion.
Returns
- pv_status_t : Status code.
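To make the presence_penalty, frequency_penalty, and top_p descriptions concrete, the sketch below shows one common way such parameters are applied. It illustrates the concepts only: the exact formulas picoLLM uses internally are not stated in this reference, so the penalty formula and both function names (apply_penalties, nucleus_size) are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Applies presence and frequency penalties to `logits` in place.
 * `counts[i]` is how often token i already appears in the partial completion.
 * Assumed formula (a common convention, not confirmed for picoLLM):
 *   logit -= presence_penalty + frequency_penalty * count   (only if count > 0)
 */
void apply_penalties(
        float *logits,
        const int32_t *counts,
        size_t num_tokens,
        float presence_penalty,
        float frequency_penalty) {
    for (size_t i = 0; i < num_tokens; i++) {
        if (counts[i] > 0) {
            logits[i] -= presence_penalty + frequency_penalty * (float) counts[i];
        }
    }
}

/*
 * Returns the size of the smallest set of highest-probability tokens whose
 * cumulative probability reaches `top_p`.
 * `sorted_probs` must be sorted in descending order and sum to 1.
 */
size_t nucleus_size(const float *sorted_probs, size_t num_tokens, float top_p) {
    float cumulative = 0.0f;
    for (size_t i = 0; i < num_tokens; i++) {
        cumulative += sorted_probs[i];
        if (cumulative >= top_p) {
            return i + 1;
        }
    }
    return num_tokens;
}
```

With top_p set to 1.0, nucleus_size() covers every token, which matches the description of 1.0 turning the feature off. With both penalties at 0.0, apply_penalties() leaves the logits unchanged.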
pv_picollm_interrupt()
Interrupts pv_picollm_generate() if generation is in progress. Otherwise, it has no effect.
Parameters
object
pv_picollm_t * : picoLLM object.
Returns
- pv_status_t : Status code.
pv_picollm_delete_completion_tokens()
Deletes completion tokens returned from pv_picollm_generate().
Parameters
completion_tokens
pv_picollm_completion_token_t * : Completion tokens.
num_completion_tokens
int32_t : Number of completion tokens.
pv_picollm_delete_completion()
Deletes completion text returned from pv_picollm_generate().
Parameters
completion
char * : Completion text.
pv_picollm_tokenize()
Tokenizes a given text using the model's tokenizer. The caller is responsible for freeing the returned tokens buffer using pv_picollm_delete_tokens(). This is a low-level function meant for benchmarking and advanced usage. pv_picollm_generate() should be used when possible.
Parameters
object
pv_picollm_t * : picoLLM object.
text
const char * : Text.
bos
bool : If set to true, the tokenizer prepends the beginning of the sentence token to the result.
eos
bool : If set to true, the tokenizer appends the end of the sentence token to the result.
num_tokens
int32_t * : Number of tokens.
tokens
int32_t * : Tokens representing the input text.
Returns
- pv_status_t : Status code.
pv_picollm_delete_tokens()
Deletes tokens returned from pv_picollm_tokenize().
Parameters
tokens
int32_t * : Tokens.
pv_picollm_forward()
Performs a single forward pass given a token and returns the logits. The caller is responsible for freeing the logits buffer using pv_picollm_delete_logits(). This is a low-level function for benchmarking and advanced usage. pv_picollm_generate() should be used when possible.
Parameters
object
pv_picollm_t * : picoLLM object.
token
int32_t : Input token.
num_logits
int32_t * : Number of logits.
logits
float * * : Logits.
Returns
- pv_status_t : Status code.
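A typical use of the returned logits is greedy decoding, which picks the maximum logit (the behavior described for temperature 0). The sketch below is a hypothetical helper, not part of the SDK, and it assumes that the index into the logits array corresponds to the token ID.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the picoLLM SDK).
 * Returns the index of the largest logit, or -1 if the input is empty.
 * Ties resolve to the lowest index.
 * Assumption: index i in `logits` corresponds to token ID i.
 */
int32_t argmax_logit(const float *logits, int32_t num_logits) {
    if (logits == NULL || num_logits <= 0) {
        return -1;
    }

    int32_t best = 0;
    for (int32_t i = 1; i < num_logits; i++) {
        if (logits[i] > logits[best]) {
            best = i;
        }
    }
    return best;
}
```

The logits buffer still has to be released with pv_picollm_delete_logits() after the index has been read.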
pv_picollm_delete_logits()
Deletes logits returned from pv_picollm_forward().
Parameters
logits
float * : Logits.
pv_picollm_reset()
Resets the internal state of the LLM. It should be called in conjunction with pv_picollm_forward() when processing a new sequence of tokens. This is a low-level function for benchmarking and advanced usage. pv_picollm_generate() should be used when possible.
Parameters
object
pv_picollm_t * : picoLLM object.
Returns
- pv_status_t : Status code.
pv_picollm_model()
Getter for the model's information.
Parameters
object
pv_picollm_t * : picoLLM object.
model
const char * : Model information.
Returns
- pv_status_t : Status code.
pv_picollm_context_length()
Getter for the model's context length.
Parameters
object
pv_picollm_t * : picoLLM object.
context_length
int32_t * : Context length.
Returns
- pv_status_t : Status code.
pv_picollm_version()
Getter for version.
Returns
- const char * : Version.
pv_picollm_max_top_choices()
Getter for maximum number of top choices for pv_picollm_generate().
Returns
- int32_t : Maximum number of top choices.
pv_picollm_list_hardware_devices()
Gets a list of hardware devices that can be specified when calling pv_picollm_init().
Parameters
hardware_devices
const char * * : Array of available hardware devices. Devices are NULL-terminated strings. The array must be freed using pv_picollm_free_hardware_devices().
num_hardware_devices
int32_t * : The number of devices in the hardware_devices array.
Returns
- pv_status_t : Status code.
pv_picollm_free_hardware_devices()
Frees the memory allocated for the hardware devices list returned by pv_picollm_list_hardware_devices().
Parameters
hardware_devices
const char * * : Array of available hardware devices.
num_hardware_devices
int32_t : The number of devices in the hardware_devices array.
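Once the device list is available, an application may want to pick an entry before calling pv_picollm_init(). The sketch below searches the array for the first entry with a given prefix. The helper find_device() is hypothetical, and it assumes the listed strings use the same format as the device argument (for example gpu:0), which this reference does not spell out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the picoLLM SDK).
 * Returns the first entry of `hardware_devices` that starts with `prefix`,
 * or NULL if none matches. The returned pointer refers to the original array,
 * so it is only valid until the array is freed.
 */
const char *find_device(
        const char **hardware_devices,
        int32_t num_hardware_devices,
        const char *prefix) {
    if (hardware_devices == NULL || prefix == NULL) {
        return NULL;
    }

    size_t prefix_length = strlen(prefix);
    for (int32_t i = 0; i < num_hardware_devices; i++) {
        if (strncmp(hardware_devices[i], prefix, prefix_length) == 0) {
            return hardware_devices[i];
        }
    }
    return NULL;
}
```

Because the result points into the list, copy it or finish calling pv_picollm_init() before freeing the list with pv_picollm_free_hardware_devices().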
pv_status_t
Status code enum.
pv_picollm_usage_t
Struct for the number of tokens in the prompt and completion.
pv_picollm_endpoint_t
Enum for the endpoint detection types.
pv_picollm_token_t
Struct for a token and its associated log probability.
pv_picollm_completion_token_t
Struct for a token within completion and top alternative tokens.
pv_picollm_stream_callback_t
Stream callback function type used in the pv_picollm_generate() function.
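As an illustration of how a stream callback and its context can work together, the sketch below appends each streamed piece to a fixed-size buffer. The typedef stream_callback_t is a local stand-in whose shape (a text piece plus the user context pointer) is an assumption based on the stream_callback and stream_callback_context descriptions; check the SDK header for the actual definition of pv_picollm_stream_callback_t. The names stream_buffer_t and collect_stream are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Local stand-in for pv_picollm_stream_callback_t (assumed shape). */
typedef void (*stream_callback_t)(const char *completion, void *context);

/* User-defined context: accumulates streamed text, always NULL-terminated. */
typedef struct {
    char text[256];
    size_t length;
} stream_buffer_t;

/*
 * Appends the new piece of completion text to the buffer passed as context.
 * Text that does not fit is silently truncated.
 */
void collect_stream(const char *completion, void *context) {
    stream_buffer_t *buffer = (stream_buffer_t *) context;
    if (buffer == NULL || completion == NULL) {
        return;
    }

    size_t space = sizeof(buffer->text) - 1 - buffer->length;
    size_t n = strlen(completion);
    if (n > space) {
        n = space;
    }

    memcpy(buffer->text + buffer->length, completion, n);
    buffer->length += n;
    buffer->text[buffer->length] = '\0';
}
```

Under that assumption, collect_stream would be passed as stream_callback and a pointer to a zero-initialized stream_buffer_t as stream_callback_context.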
pv_status_to_string()
Parameters
status
pv_status_t : Status code.
Returns
- const char * : String representation of the status code.
pv_get_error_stack()
If a function returns a failure (any pv_status_t other than PV_STATUS_SUCCESS), this function can be called to get a series of error messages related to the failure. This function can be called only once per failure status on another function. The memory for message_stack must be freed using pv_free_error_stack().
Regardless of the return status of this function, if message_stack is not NULL, then message_stack contains valid memory. However, a failure status on this function indicates that future error messages may not be reported.
Parameters
message_stack
char * * : Array of messages relating to the failure. Messages are NULL-terminated strings. The array must be freed using pv_free_error_stack().
message_stack_depth
int32_t * : The number of messages in the message_stack array.
Returns
- pv_status_t : Status code.
pv_free_error_stack()
This function frees the memory used by error messages allocated by pv_get_error_stack().
Parameters
message_stack
char * * : Array of messages relating to the failure. Messages are NULL-terminated strings.
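After pv_get_error_stack() fills message_stack and message_stack_depth, the messages are usually logged before the stack is freed. The sketch below formats such an array into a single string. The helper format_error_stack() is hypothetical and works on any array of strings; it does not call the SDK itself.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of the picoLLM SDK).
 * Writes each message as "[index] message\n" into `out`.
 * Returns the number of characters written (excluding the terminator),
 * or -1 if the arguments are invalid or `out` is too small.
 */
int format_error_stack(
        char *out,
        size_t size,
        char **message_stack,
        int32_t message_stack_depth) {
    if (out == NULL || size == 0 || (message_stack == NULL && message_stack_depth > 0)) {
        return -1;
    }

    out[0] = '\0';
    size_t used = 0;
    for (int32_t i = 0; i < message_stack_depth; i++) {
        int written = snprintf(out + used, size - used, "[%d] %s\n", (int) i, message_stack[i]);
        /* snprintf reports the length it wanted to write; detect truncation. */
        if (written < 0 || (size_t) written >= size - used) {
            return -1;
        }
        used += (size_t) written;
    }
    return (int) used;
}
```

The formatted text can be printed or logged, after which message_stack should be released with pv_free_error_stack().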