picoLLM Inference Engine
C API
API Reference for the picoLLM C SDK.
pv_picollm_t
Container representing the picoLLM Inference Engine.
pv_picollm_init()
Creates a picoLLM instance. Release its resources with pv_picollm_delete() when you are done.
Parameters
- access_key const char * : AccessKey obtained from Picovoice Console.
- model_path const char * : Absolute path to the file containing model parameters (.pllm).
- device const char * : String representation of the device (e.g., CPU or GPU) to use for inference. If set to best, picoLLM picks the most suitable device. If set to gpu, the engine uses the first available GPU device. To select a specific GPU device, set this argument to gpu:${GPU_INDEX}, where ${GPU_INDEX} is the index of the target GPU. If set to cpu, the engine will run on the CPU with the default number of threads. To specify the number of threads, set this argument to cpu:${NUM_THREADS}, where ${NUM_THREADS} is the desired number of threads.
- object pv_picollm_t * * : Constructed instance of picoLLM.
Returns
- pv_status_t : Status code.
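The device string formats described above can be assembled programmatically. The following is a minimal sketch; make_device_string() is a hypothetical helper, not part of the picoLLM SDK, and it only builds the string that would be passed as the device argument.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

// Hypothetical helper (not part of the picoLLM SDK): writes a device string
// such as "best", "gpu", "gpu:1", "cpu", or "cpu:4" into `out`.
// `kind` must be "best", "gpu", or "cpu". `value` is the GPU index or the
// thread count; pass a negative number to omit the ":N" suffix.
// Returns 0 on success, -1 on invalid arguments or if `out` is too small.
int make_device_string(const char *kind, int32_t value, char *out, size_t out_size) {
    if (kind == NULL || out == NULL || out_size == 0) {
        return -1;
    }
    const int is_best = strcmp(kind, "best") == 0;
    const int is_gpu = strcmp(kind, "gpu") == 0;
    const int is_cpu = strcmp(kind, "cpu") == 0;
    if (!is_best && !is_gpu && !is_cpu) {
        return -1;
    }
    if (is_cpu && value == 0) {
        return -1;  // a thread count of zero is not meaningful
    }
    int written;
    if (is_best || value < 0) {
        // "best" takes no suffix; a negative value means "use the default".
        written = snprintf(out, out_size, "%s", kind);
    } else {
        written = snprintf(out, out_size, "%s:%d", kind, (int) value);
    }
    return (written < 0 || (size_t) written >= out_size) ? -1 : 0;
}
```

For instance, make_device_string("cpu", 4, buffer, sizeof(buffer)) fills buffer with cpu:4, which can then be passed as the device argument of pv_picollm_init().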
pv_picollm_delete()
Releases resources acquired by picoLLM.
Parameters
- object pv_picollm_t * : picoLLM object.
pv_picollm_generate()
Given a text prompt and a set of generation parameters, creates a completion text and relevant metadata. The caller is responsible for freeing completion and meta objects using pv_picollm_delete_completion() and pv_picollm_delete_completion_tokens().
Parameters
- object pv_picollm_t * : picoLLM object.
- prompt const char * : Text prompt.
- completion_token_limit int32_t : Maximum number of tokens in the completion. If the generation process stops due to reaching this limit, the endpoint output argument will be PV_PICOLLM_ENDPOINT_COMPLETION_TOKEN_LIMIT_REACHED. Set to -1 to impose no limit.
- stop_phrases const char * : The generation process stops when it encounters any of these phrases in the completion. The already generated completion, including the encountered stop phrase, will be returned. The endpoint output argument will be PV_PICOLLM_ENDPOINT_STOP_PHRASE_ENCOUNTERED. Set to NULL to turn off this feature.
- num_stop_phrases int32_t : Number of stop phrases. Set to 0 to turn off this feature.
- seed int32_t : If set to a positive integer value, the internal random number generator uses it as its seed. Seeding enforces deterministic outputs. Set to -1 for randomized outputs for a given prompt.
- presence_penalty float : If set to a positive value, it penalizes logits already appearing in the partial completion. If set to 0.0, it has no effect.
- frequency_penalty float : If set to a positive floating-point value, it penalizes logits proportional to the frequency of their appearance in the partial completion. If set to 0.0, it has no effect.
- temperature float : Sampling temperature. Temperature is a non-negative floating-point value that controls the randomness of the sampler. A higher temperature smooths the sampler's output distribution, increasing the randomness. In contrast, a lower temperature creates a narrower distribution and reduces variability. Setting it to 0 selects the maximum logit during sampling.
- top_p float : A positive floating-point number within (0, 1]. It restricts the sampler's choices to high-probability logits that form the top_p portion of the probability mass. Hence, it avoids randomly selecting unlikely logits. A value of 1.0 enables the sampler to pick any token with non-zero probability, turning off the feature.
- num_top_choices int32_t : If set to a positive value, picoLLM returns the list of the highest probability tokens for any generated token. Set to 0 to turn off the feature. The maximum number of top choices is pv_picollm_max_top_choices().
- stream_callback pv_picollm_stream_callback_t : If not set to NULL, picoLLM executes this callback every time a new piece of completion string becomes available.
- stream_callback_context void * : Pointer containing user-defined data that is passed to stream_callback on every invocation.
- usage pv_picollm_usage_t * : Number of tokens in the prompt and completion.
- endpoint pv_picollm_endpoint_t * : Indicates the reason for termination of the generation process.
- completion_tokens pv_picollm_completion_token_t * * : Token-level information about the generated completion.
- num_completion_tokens int32_t * : Number of tokens in the completion.
- completion char * * : Completion.
Returns
- pv_status_t : Status code.
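To make the sampling parameters above more concrete, here is a rough sketch of how penalties, temperature, and top_p are commonly applied in LLM samplers. It follows the widely used convention (subtract the presence penalty once per seen token, the frequency penalty once per occurrence, divide logits by the temperature, then keep the smallest high-probability set whose mass reaches top_p). This is an assumption for illustration only; it is not confirmed to match picoLLM's internal implementation, and adjust_logits() and apply_top_p() are hypothetical names.

```c
#include <stdint.h>
#include <stdlib.h>

// Illustrative only: a conventional sampler sketch, not picoLLM's internals.
// `counts[i]` is how many times token i already appears in the partial
// completion. A temperature of 0 leaves the logits unscaled; the caller
// would then simply pick the maximum logit.
void adjust_logits(
        float *logits,
        const int32_t *counts,
        int32_t n,
        float presence_penalty,
        float frequency_penalty,
        float temperature) {
    for (int32_t i = 0; i < n; i++) {
        if (counts[i] > 0) {
            logits[i] -= presence_penalty + frequency_penalty * (float) counts[i];
        }
        if (temperature > 0.f) {
            logits[i] /= temperature;
        }
    }
}

// Keeps the smallest set of highest-probability tokens whose cumulative mass
// reaches `top_p` (expected to lie in (0, 1]), zeroes the rest, and
// renormalizes. Uses an O(n^2) selection to keep the sketch short.
void apply_top_p(float *probs, int32_t n, float top_p) {
    if (n <= 0) {
        return;
    }
    uint8_t *keep = calloc((size_t) n, sizeof(uint8_t));
    if (keep == NULL) {
        return;  // leave the distribution untouched on allocation failure
    }
    float mass = 0.f;
    while (mass < top_p) {
        int32_t best = -1;
        for (int32_t i = 0; i < n; i++) {
            if (!keep[i] && (best < 0 || probs[i] > probs[best])) {
                best = i;
            }
        }
        if (best < 0) {
            break;  // every token is already kept
        }
        keep[best] = 1;
        mass += probs[best];
    }
    for (int32_t i = 0; i < n; i++) {
        probs[i] = (keep[i] && mass > 0.f) ? probs[i] / mass : 0.f;
    }
    free(keep);
}
```

With top_p set to 1.0 the loop keeps every token, which matches the description of that value turning the feature off.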
pv_picollm_interrupt()
Interrupts pv_picollm_generate() if generation is in progress. Otherwise, it has no effect.
Parameters
- object pv_picollm_t * : picoLLM object.
Returns
- pv_status_t : Status code.
pv_picollm_delete_completion_tokens()
Deletes completion tokens returned from pv_picollm_generate().
Parameters
- completion_tokens pv_picollm_completion_token_t * : Completion tokens.
- num_completion_tokens int32_t : Number of completion tokens.
pv_picollm_delete_completion()
Deletes completion text returned from pv_picollm_generate().
Parameters
- completion char * : Completion text.
pv_picollm_tokenize()
Tokenizes a given text using the model's tokenizer. The caller is responsible for freeing the returned tokens buffer using pv_picollm_delete_tokens(). This is a low-level function meant for benchmarking and advanced usage. pv_picollm_generate() should be used when possible.
Parameters
- object pv_picollm_t * : picoLLM object.
- text const char * : Text.
- bos bool : If set to true, the tokenizer prepends the beginning-of-sentence token to the result.
- eos bool : If set to true, the tokenizer appends the end-of-sentence token to the result.
- num_tokens int32_t * : Number of tokens.
- tokens int32_t * : Tokens representing the input text.
Returns
- pv_status_t : Status code.
pv_picollm_delete_tokens()
Deletes tokens returned from pv_picollm_tokenize().
Parameters
- tokens int32_t * : Tokens.
pv_picollm_forward()
Performs a single forward pass given a token and returns the logits. The caller is responsible for freeing the logits buffer using pv_picollm_delete_logits(). This is a low-level function for benchmarking and advanced usage. pv_picollm_generate() should be used when possible.
Parameters
- object pv_picollm_t * : picoLLM object.
- token int32_t : Input token.
- num_logits int32_t * : Number of logits.
- logits float * * : Logits.
Returns
- pv_status_t : Status code.
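A caller working with the raw logits typically needs to turn them into a next-token choice. The sketch below shows the simplest option, greedy selection of the maximum logit. argmax_logit() is a hypothetical helper, and it assumes (this page does not state it) that the index of a logit corresponds to a token ID.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper (not part of the picoLLM SDK): returns the index of the
// largest logit, or -1 if the buffer is empty. Ties resolve to the lowest
// index. Assumes the logit index maps to a token ID.
int32_t argmax_logit(const float *logits, int32_t num_logits) {
    if (logits == NULL || num_logits <= 0) {
        return -1;
    }
    int32_t best = 0;
    for (int32_t i = 1; i < num_logits; i++) {
        if (logits[i] > logits[best]) {
            best = i;
        }
    }
    return best;
}
```

The result could be fed back into the next pv_picollm_forward() call, and the logits buffer should still be released with pv_picollm_delete_logits() after each pass.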
pv_picollm_delete_logits()
Deletes logits returned from pv_picollm_forward().
Parameters
- logits float * : Logits.
pv_picollm_reset()
Resets the internal state of the LLM. It should be called in conjunction with pv_picollm_forward() when processing a new sequence of tokens. This is a low-level function for benchmarking and advanced usage. pv_picollm_generate() should be used when possible.
Parameters
- object pv_picollm_t * : picoLLM object.
Returns
- pv_status_t : Status code.
pv_picollm_model()
Getter for the model's information.
Parameters
- object pv_picollm_t * : picoLLM object.
- model const char * : Model information.
Returns
- pv_status_t : Status code.
pv_picollm_context_length()
Getter for the model's context length.
Parameters
- object pv_picollm_t * : picoLLM object.
- context_length int32_t * : Context length.
Returns
- pv_status_t : Status code.
pv_picollm_version()
Getter for version.
Returns
- const char * : Version.
pv_picollm_max_top_choices()
Getter for maximum number of top choices for pv_picollm_generate().
Returns
- int32_t : Maximum number of top choices.
pv_picollm_list_hardware_devices()
Gets a list of hardware devices that can be specified when calling pv_picollm_init().
Parameters
- hardware_devices const char * * : Array of available hardware devices. Devices are NULL-terminated strings. The array must be freed using pv_picollm_free_hardware_devices().
- num_hardware_devices int32_t * : The number of devices in the hardware_devices array.
Returns
- pv_status_t : Status code.
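Once the device list is available, an application may want to pick an entry before calling pv_picollm_init(). The following sketch searches the array for the first device whose name starts with a given prefix. find_device_with_prefix() is a hypothetical helper, and it assumes the returned strings use the same format as the device argument of pv_picollm_init() (for example gpu:0).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical helper (not part of the picoLLM SDK): returns the index of the
// first device whose name starts with `prefix`, or -1 if none matches.
int32_t find_device_with_prefix(
        const char *const *devices,
        int32_t num_devices,
        const char *prefix) {
    if (devices == NULL || prefix == NULL) {
        return -1;
    }
    const size_t prefix_length = strlen(prefix);
    for (int32_t i = 0; i < num_devices; i++) {
        if (devices[i] != NULL && strncmp(devices[i], prefix, prefix_length) == 0) {
            return i;
        }
    }
    return -1;
}
```

Copy the selected string (or pass it to pv_picollm_init()) before calling pv_picollm_free_hardware_devices(), since the array is owned by the library.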
pv_picollm_free_hardware_devices()
Frees the memory allocated for the hardware devices list returned by pv_picollm_list_hardware_devices().
Parameters
- hardware_devices const char * * : Array of available hardware devices.
- num_hardware_devices int32_t : The number of devices in the hardware_devices array.
pv_status_t
Status code enum.
pv_picollm_usage_t
Struct for the number of tokens in the prompt and completion.
pv_picollm_endpoint_t
Enum for the endpoint detection types.
pv_picollm_token_t
Struct for a token and its associated log probability.
pv_picollm_completion_token_t
Struct for a token within completion and top alternative tokens.
pv_picollm_stream_callback_t
Stream callback function type used in the pv_picollm_generate() function.
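As an illustration of how a stream callback and its context can work together, the sketch below accumulates streamed pieces into a fixed-size buffer. The callback signature (a completion string followed by the user context pointer) is an assumption, since this page does not spell it out; stream_callback_t is a local stand-in for pv_picollm_stream_callback_t so the sketch compiles without the SDK header, and stream_buffer_t and collect_stream() are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

// Local stand-in for pv_picollm_stream_callback_t. The signature is assumed.
typedef void (*stream_callback_t)(const char *completion, void *context);

// User-defined state that would be passed as stream_callback_context.
typedef struct {
    char text[256];
    size_t length;
} stream_buffer_t;

// Appends each streamed piece to the buffer, silently dropping whatever does
// not fit, and keeps the buffer NULL-terminated.
void collect_stream(const char *completion, void *context) {
    stream_buffer_t *buffer = (stream_buffer_t *) context;
    if (buffer == NULL || completion == NULL) {
        return;
    }
    const size_t room = sizeof(buffer->text) - 1 - buffer->length;
    size_t n = strlen(completion);
    if (n > room) {
        n = room;
    }
    memcpy(buffer->text + buffer->length, completion, n);
    buffer->length += n;
    buffer->text[buffer->length] = '\0';
}
```

Under these assumptions, collect_stream would be passed as stream_callback and a pointer to a zero-initialized stream_buffer_t as stream_callback_context when calling pv_picollm_generate().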
pv_status_to_string()
Parameters
- status pv_status_t : Status code.
Returns
- const char * : String representation of the status code.
pv_get_error_stack()
If a function returns a failure (any pv_status_t other than PV_STATUS_SUCCESS), this function can be called to get a series of error messages related to the failure. It can be called only once per failure status returned by another function. The memory for message_stack must be freed using pv_free_error_stack(). Regardless of the return status of this function, if message_stack is not NULL, then message_stack contains valid memory. However, a failure status on this function indicates that future error messages may not be reported.
Parameters
- message_stack char * * : Array of messages relating to the failure. Messages are NULL-terminated strings. The array must be freed using pv_free_error_stack().
- message_stack_depth int32_t * : The number of messages in the message_stack array.
Returns
- pv_status_t : Status code.
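After retrieving the message stack, a common next step is to turn it into a single log entry. The sketch below formats an array of messages and its depth into one numbered, newline-separated string. format_error_stack() is a hypothetical helper, not part of the SDK; it only reads the array and does not free it.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

// Hypothetical helper (not part of the picoLLM SDK): writes one "[i] message"
// line per entry into `out` and returns how many messages fit completely.
// `out` is always left NULL-terminated when out_size > 0.
int32_t format_error_stack(
        char **message_stack,
        int32_t depth,
        char *out,
        size_t out_size) {
    if (out == NULL || out_size == 0) {
        return 0;
    }
    out[0] = '\0';
    if (message_stack == NULL) {
        return 0;
    }
    size_t used = 0;
    int32_t count = 0;
    for (int32_t i = 0; i < depth; i++) {
        const int written = snprintf(
                out + used, out_size - used, "[%d] %s\n", (int) i, message_stack[i]);
        if (written < 0 || (size_t) written >= out_size - used) {
            out[used] = '\0';  // discard the partially written line
            break;
        }
        used += (size_t) written;
        count++;
    }
    return count;
}
```

The caller would still release the array with pv_free_error_stack() once the formatted text has been logged.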
pv_free_error_stack()
This function frees the memory used by error messages allocated by pv_get_error_stack().
Parameters
- message_stack char * * : Array of messages relating to the failure. Messages are NULL-terminated strings.