picoLLM Inference Engine
C API
API Reference for the picoLLM C SDK.
pv_picollm_t
Container representing the picoLLM Inference Engine.
pv_picollm_init()
Creates a picoLLM instance. When the instance is no longer needed, its resources must be released using pv_picollm_delete().
Parameters
- access_key const char * : AccessKey obtained from Picovoice Console.
- model_path const char * : Absolute path to the file containing model parameters (.pllm).
- device const char * : String representation of the device (e.g., CPU or GPU) to use for inference. If set to best, picoLLM picks the most suitable device. If set to gpu, the engine uses the first available GPU device. To select a specific GPU device, set this argument to gpu:${GPU_INDEX}, where ${GPU_INDEX} is the index of the target GPU. If set to cpu, the engine runs on the CPU with the default number of threads. To specify the number of threads, set this argument to cpu:${NUM_THREADS}, where ${NUM_THREADS} is the desired number of threads.
- object pv_picollm_t * * : Constructed instance of picoLLM.
Returns
- pv_status_t : Status code.
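As a sketch of typical initialization (the AccessKey and model path below are placeholders, and the header name pv_picollm.h is an assumption):

```c
#include <stdio.h>

#include "pv_picollm.h"

int main(void) {
    // Placeholder values -- replace with a real AccessKey and model path.
    const char *access_key = "${ACCESS_KEY}";
    const char *model_path = "${MODEL_PATH}";

    pv_picollm_t *picollm = NULL;

    // "best" lets the engine pick the most suitable device; alternatives
    // include "cpu:4" or "gpu:0" as described in the parameter list above.
    pv_status_t status = pv_picollm_init(access_key, model_path, "best", &picollm);
    if (status != PV_STATUS_SUCCESS) {
        fprintf(stderr, "init failed: %s\n", pv_status_to_string(status));
        return 1;
    }

    // ... use the engine ...

    pv_picollm_delete(picollm);
    return 0;
}
```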
pv_picollm_delete()
Releases resources acquired by picoLLM.
Parameters
- object pv_picollm_t * : picoLLM object.
pv_picollm_generate()
Given a text prompt and a set of generation parameters, generates completion text and relevant metadata. The caller is responsible for freeing the completion text and metadata using pv_picollm_delete_completion() and pv_picollm_delete_completion_tokens().
Parameters
- object pv_picollm_t * : picoLLM object.
- prompt const char * : Text prompt.
- completion_token_limit int32_t : Maximum number of tokens in the completion. If the generation process stops due to reaching this limit, the endpoint output argument will be PV_PICOLLM_ENDPOINT_COMPLETION_TOKEN_LIMIT_REACHED. Set to -1 to impose no limit.
- stop_phrases const char * : The generation process stops when it encounters any of these phrases in the completion. The already-generated completion, including the encountered stop phrase, will be returned. The endpoint output argument will be PV_PICOLLM_ENDPOINT_STOP_PHRASE_ENCOUNTERED. Set to NULL to turn off this feature.
- num_stop_phrases int32_t : Number of stop phrases. Set to 0 to turn off this feature.
- seed int32_t : If set to a positive integer, the internal random number generator uses it as its seed. Seeding enforces deterministic outputs. Set to -1 for randomized outputs for a given prompt.
- presence_penalty float : If set to a positive value, penalizes logits already appearing in the partial completion. If set to 0.0, it has no effect.
- frequency_penalty float : If set to a positive floating-point value, penalizes logits proportionally to the frequency of their appearance in the partial completion. If set to 0.0, it has no effect.
- temperature float : Sampling temperature. Temperature is a non-negative floating-point value that controls the randomness of the sampler. A higher temperature smooths the sampler's output, increasing randomness; a lower temperature creates a narrower distribution and reduces variability. Setting it to 0 selects the maximum logit during sampling.
- top_p float : A positive floating-point number within (0, 1]. It restricts the sampler's choices to the high-probability logits that form the top_p portion of the probability mass, avoiding the random selection of unlikely logits. A value of 1.0 enables the sampler to pick any token with non-zero probability, turning off the feature.
- num_top_choices int32_t : If set to a positive value, picoLLM returns the list of highest-probability tokens for each generated token. Set to 0 to turn off the feature. The maximum number of top choices is pv_picollm_max_top_choices().
- stream_callback pv_picollm_stream_callback_t : If not set to NULL, picoLLM executes this callback every time a new piece of completion text becomes available.
- stream_callback_context void * : Pointer containing user-defined data that is passed to stream_callback on every invocation.
- usage pv_picollm_usage_t * : Number of tokens in the prompt and completion.
- endpoint pv_picollm_endpoint_t * : Indicates the reason for termination of the generation process.
- completion_tokens pv_picollm_completion_token_t * * : Token-level information about the generated completion.
- num_completion_tokens int32_t * : Number of tokens in the completion.
- completion char * * : Completion text.
Returns
- pv_status_t : Status code.
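A hedged sketch of calling pv_picollm_generate(): the argument order follows the parameter list above, and the stream-callback signature (completion text first, user context second) is an assumption rather than confirmed by this reference. The prompt is illustrative.

```c
#include <stdio.h>

#include "pv_picollm.h"

// Streams each new piece of completion text to stdout as it is produced.
// NOTE: the parameter order of this callback is an assumption.
static void on_stream(const char *completion, void *context) {
    (void) context;
    printf("%s", completion);
    fflush(stdout);
}

int demo_generate(pv_picollm_t *picollm) {
    pv_picollm_usage_t usage;
    pv_picollm_endpoint_t endpoint;
    pv_picollm_completion_token_t *completion_tokens = NULL;
    int32_t num_completion_tokens = 0;
    char *completion = NULL;

    pv_status_t status = pv_picollm_generate(
            picollm,
            "Tell me a joke.",  // prompt (illustrative)
            128,                // completion_token_limit
            NULL,               // stop_phrases (feature off)
            0,                  // num_stop_phrases
            -1,                 // seed: randomized outputs
            0.f,                // presence_penalty: no effect
            0.f,                // frequency_penalty: no effect
            0.f,                // temperature: pick the maximum logit
            1.f,                // top_p: feature off
            0,                  // num_top_choices: feature off
            on_stream,          // stream_callback
            NULL,               // stream_callback_context
            &usage,
            &endpoint,
            &completion_tokens,
            &num_completion_tokens,
            &completion);
    if (status != PV_STATUS_SUCCESS) {
        fprintf(stderr, "generate failed: %s\n", pv_status_to_string(status));
        return -1;
    }

    // The caller owns the outputs and must free them.
    pv_picollm_delete_completion_tokens(completion_tokens, num_completion_tokens);
    pv_picollm_delete_completion(completion);
    return 0;
}
```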
pv_picollm_generate_with_image()
Given a text prompt, an image, and a set of generation parameters, generates completion text and relevant metadata. The caller is responsible for freeing the completion text and metadata using pv_picollm_delete_completion() and pv_picollm_delete_completion_tokens().
For use with vision models only.
Parameters
- object pv_picollm_t * : picoLLM object.
- prompt const char * : Text prompt.
- image_width int32_t : Width of the image in pixels.
- image_height int32_t : Height of the image in pixels.
- image const uint8_t * : Image pixel data in 8-bit, RGB format.
- completion_token_limit int32_t : Maximum number of tokens in the completion. If the generation process stops due to reaching this limit, the endpoint output argument will be PV_PICOLLM_ENDPOINT_COMPLETION_TOKEN_LIMIT_REACHED. Set to -1 to impose no limit.
- stop_phrases const char * : The generation process stops when it encounters any of these phrases in the completion. The already-generated completion, including the encountered stop phrase, will be returned. The endpoint output argument will be PV_PICOLLM_ENDPOINT_STOP_PHRASE_ENCOUNTERED. Set to NULL to turn off this feature.
- num_stop_phrases int32_t : Number of stop phrases. Set to 0 to turn off this feature.
- seed int32_t : If set to a positive integer, the internal random number generator uses it as its seed. Seeding enforces deterministic outputs. Set to -1 for randomized outputs for a given prompt.
- presence_penalty float : If set to a positive value, penalizes logits already appearing in the partial completion. If set to 0.0, it has no effect.
- frequency_penalty float : If set to a positive floating-point value, penalizes logits proportionally to the frequency of their appearance in the partial completion. If set to 0.0, it has no effect.
- temperature float : Sampling temperature. Temperature is a non-negative floating-point value that controls the randomness of the sampler. A higher temperature smooths the sampler's output, increasing randomness; a lower temperature creates a narrower distribution and reduces variability. Setting it to 0 selects the maximum logit during sampling.
- top_p float : A positive floating-point number within (0, 1]. It restricts the sampler's choices to the high-probability logits that form the top_p portion of the probability mass, avoiding the random selection of unlikely logits. A value of 1.0 enables the sampler to pick any token with non-zero probability, turning off the feature.
- num_top_choices int32_t : If set to a positive value, picoLLM returns the list of highest-probability tokens for each generated token. Set to 0 to turn off the feature. The maximum number of top choices is pv_picollm_max_top_choices().
- stream_callback pv_picollm_stream_callback_t : If not set to NULL, picoLLM executes this callback every time a new piece of completion text becomes available.
- stream_callback_context void * : Pointer containing user-defined data that is passed to stream_callback on every invocation.
- prompt_progress_callback pv_picollm_progress_callback_t : If not set to NULL, picoLLM uses this callback to report the prompt evaluation progress as a floating-point number within (0, 100]. A value of 100 indicates that prompt evaluation is complete and completion tokens are now being generated.
- prompt_progress_callback_context void * : Pointer containing user-defined data that is passed to prompt_progress_callback on every invocation.
- usage pv_picollm_usage_t * : Number of tokens in the prompt and completion.
- endpoint pv_picollm_endpoint_t * : Indicates the reason for termination of the generation process.
- completion_tokens pv_picollm_completion_token_t * * : Token-level information about the generated completion.
- num_completion_tokens int32_t * : Number of tokens in the completion.
- completion char * * : Completion text.
Returns
- pv_status_t : Status code.
pv_picollm_generate_embeddings()
Generates numerical vector representations of the input text prompt. The caller is responsible for freeing the embeddings using pv_picollm_delete_embeddings().
For use with embedding models only.
Parameters
- object pv_picollm_t * : picoLLM object.
- prompt const char * : Text prompt.
- num_embeddings int32_t * : Number of generated embeddings.
- embeddings float * * : Generated embeddings.
Returns
- pv_status_t : Status code.
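A minimal sketch of generating embeddings for a prompt (assumes an instance created from an embedding model; the prompt is illustrative):

```c
#include <stdio.h>

#include "pv_picollm.h"

// Generates embeddings for a prompt and prints the first few values.
int demo_embeddings(pv_picollm_t *picollm) {
    int32_t num_embeddings = 0;
    float *embeddings = NULL;

    pv_status_t status = pv_picollm_generate_embeddings(
            picollm,
            "The quick brown fox",
            &num_embeddings,
            &embeddings);
    if (status != PV_STATUS_SUCCESS) {
        fprintf(stderr, "embeddings failed: %s\n", pv_status_to_string(status));
        return -1;
    }

    for (int32_t i = 0; i < num_embeddings && i < 4; i++) {
        printf("%f\n", embeddings[i]);
    }

    pv_picollm_delete_embeddings(embeddings);
    return 0;
}
```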
pv_picollm_generate_ocr()
Generates a completion text representing text found in the given image. The caller is responsible for freeing the completion text using pv_picollm_delete_completion().
For use with OCR (Optical Character Recognition) models only.
Parameters
- object pv_picollm_t * : picoLLM object.
- image_width int32_t : Width of the image in pixels.
- image_height int32_t : Height of the image in pixels.
- image const uint8_t * : Image pixel data in 8-bit, RGB format.
- completion_token_limit int32_t : Maximum number of tokens in the completion. If the generation process stops due to reaching this limit, the endpoint output argument will be PV_PICOLLM_ENDPOINT_COMPLETION_TOKEN_LIMIT_REACHED. Set to -1 to impose no limit.
- stream_callback pv_picollm_stream_callback_t : If not set to NULL, picoLLM executes this callback every time a new piece of completion text becomes available.
- stream_callback_context void * : Pointer containing user-defined data that is passed to stream_callback on every invocation.
- prompt_progress_callback pv_picollm_progress_callback_t : If not set to NULL, picoLLM uses this callback to report the prompt evaluation progress as a floating-point number within (0, 100]. A value of 100 indicates that prompt evaluation is complete and completion tokens are now being generated.
- prompt_progress_callback_context void * : Pointer containing user-defined data that is passed to prompt_progress_callback on every invocation.
- endpoint pv_picollm_endpoint_t * : Indicates the reason for termination of the generation process.
- completion char * * : Completion text.
Returns
- pv_status_t : Status code.
pv_picollm_interrupt()
Interrupts pv_picollm_generate(), pv_picollm_generate_with_image(), and pv_picollm_generate_ocr() if generation is in progress. Otherwise, it has no effect.
Parameters
- object pv_picollm_t * : picoLLM object.
Returns
- pv_status_t : Status code.
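Because the generate functions block the calling thread, pv_picollm_interrupt() is typically invoked from a second thread. A hedged sketch using POSIX threads (the 10-second timeout is a hypothetical policy, not part of the API):

```c
#include <pthread.h>
#include <unistd.h>

#include "pv_picollm.h"

// Watchdog thread: interrupts a long-running generation after a timeout.
static void *watchdog(void *arg) {
    pv_picollm_t *picollm = (pv_picollm_t *) arg;
    sleep(10);  // hypothetical timeout
    // No effect if generation has already finished.
    (void) pv_picollm_interrupt(picollm);
    return NULL;
}

// Spawn the watchdog before calling a blocking generate function.
void start_watchdog(pv_picollm_t *picollm, pthread_t *thread) {
    pthread_create(thread, NULL, watchdog, picollm);
}
```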
pv_picollm_delete_completion_tokens()
Deletes completion tokens returned from pv_picollm_generate() and pv_picollm_generate_with_image().
Parameters
- completion_tokens pv_picollm_completion_token_t * : Completion tokens.
- num_completion_tokens int32_t : Number of completion tokens.
pv_picollm_delete_completion()
Deletes completion text returned from pv_picollm_generate(), pv_picollm_generate_with_image() and pv_picollm_generate_ocr().
Parameters
- completion char * : Completion text.
pv_picollm_delete_embeddings()
Deletes embeddings returned by pv_picollm_generate_embeddings().
Parameters
- embeddings float * : Embeddings.
pv_picollm_tokenize()
Tokenizes a given text using the model's tokenizer. The caller is responsible for freeing the returned tokens buffer using pv_picollm_delete_tokens(). This is a low-level function meant for benchmarking and advanced usage. pv_picollm_generate() should be used when possible.
Parameters
- object pv_picollm_t * : picoLLM object.
- text const char * : Text.
- bos bool : If set to true, the tokenizer prepends the beginning-of-sentence token to the result.
- eos bool : If set to true, the tokenizer appends the end-of-sentence token to the result.
- num_tokens int32_t * : Number of tokens.
- tokens int32_t * : Tokens representing the input text.
Returns
- pv_status_t : Status code.
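A sketch of tokenizing a string and printing the resulting token IDs (the tokens output is assumed to be an engine-allocated buffer returned through a pointer, consistent with the pv_picollm_delete_tokens() requirement):

```c
#include <stdbool.h>
#include <stdio.h>

#include "pv_picollm.h"

// Tokenizes a string with the model's tokenizer and prints the token IDs.
int demo_tokenize(pv_picollm_t *picollm) {
    int32_t num_tokens = 0;
    int32_t *tokens = NULL;

    pv_status_t status = pv_picollm_tokenize(
            picollm,
            "Hello, world!",
            true,   // bos: prepend the beginning-of-sentence token
            false,  // eos: do not append the end-of-sentence token
            &num_tokens,
            &tokens);
    if (status != PV_STATUS_SUCCESS) {
        fprintf(stderr, "tokenize failed: %s\n", pv_status_to_string(status));
        return -1;
    }

    for (int32_t i = 0; i < num_tokens; i++) {
        printf("%d ", tokens[i]);
    }
    printf("\n");

    pv_picollm_delete_tokens(tokens);
    return 0;
}
```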
pv_picollm_delete_tokens()
Deletes tokens returned from pv_picollm_tokenize().
Parameters
- tokens int32_t * : Tokens.
pv_picollm_forward()
Performs a single forward pass given a token and returns the logits. The caller is responsible for freeing the logits buffer using pv_picollm_delete_logits(). This is a low-level function for benchmarking and advanced usage. pv_picollm_generate() should be used when possible.
Parameters
- object pv_picollm_t * : picoLLM object.
- token int32_t : Input token.
- num_logits int32_t * : Number of logits.
- logits float * * : Logits.
Returns
- pv_status_t : Status code.
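A sketch combining pv_picollm_reset() and pv_picollm_forward(): reset the state for a fresh sequence, run one forward pass, then take the argmax over the returned logits (a greedy next-token pick, shown only to illustrate the low-level flow):

```c
#include <stdio.h>

#include "pv_picollm.h"

// Runs one forward pass for a token and prints the highest-scoring
// next-token ID (greedy argmax over the logits).
int demo_forward(pv_picollm_t *picollm, int32_t token) {
    int32_t num_logits = 0;
    float *logits = NULL;

    // Start a fresh sequence before feeding tokens via pv_picollm_forward().
    pv_status_t status = pv_picollm_reset(picollm);
    if (status != PV_STATUS_SUCCESS) {
        return -1;
    }

    status = pv_picollm_forward(picollm, token, &num_logits, &logits);
    if (status != PV_STATUS_SUCCESS) {
        return -1;
    }

    int32_t best = 0;
    for (int32_t i = 1; i < num_logits; i++) {
        if (logits[i] > logits[best]) {
            best = i;
        }
    }
    printf("most likely next token: %d\n", best);

    pv_picollm_delete_logits(logits);
    return 0;
}
```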
pv_picollm_delete_logits()
Deletes logits returned from pv_picollm_forward().
Parameters
- logits float * : Logits.
pv_picollm_reset()
Resets the internal state of the LLM. It should be called in conjunction with pv_picollm_forward() when processing a new sequence of tokens. This is a low-level function for benchmarking and advanced usage. pv_picollm_generate() should be used when possible.
Parameters
- object pv_picollm_t * : picoLLM object.
Returns
- pv_status_t : Status code.
pv_picollm_model()
Getter for the model's information.
Parameters
- object pv_picollm_t * : picoLLM object.
- model const char * : Model information.
Returns
- pv_status_t : Status code.
pv_picollm_context_length()
Getter for the model's context length.
Parameters
- object pv_picollm_t * : picoLLM object.
- context_length int32_t * : Context length.
Returns
- pv_status_t : Status code.
pv_picollm_version()
Getter for version.
Returns
- const char * : Version.
pv_picollm_max_top_choices()
Getter for maximum number of top choices for pv_picollm_generate() and pv_picollm_generate_with_image().
Returns
- int32_t : Maximum number of top choices.
pv_picollm_list_hardware_devices()
Gets a list of hardware devices that can be specified when calling pv_picollm_init().
Parameters
- hardware_devices const char * * : Array of available hardware devices. Devices are NULL-terminated strings. The array must be freed using pv_picollm_free_hardware_devices().
- num_hardware_devices int32_t * : The number of devices in the hardware_devices array.
Returns
- pv_status_t : Status code.
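A sketch of enumerating the devices that can be passed as the device argument of pv_picollm_init() (the exact device-string values printed are assumptions):

```c
#include <stdio.h>

#include "pv_picollm.h"

// Lists hardware devices usable with pv_picollm_init().
int demo_list_devices(void) {
    const char **hardware_devices = NULL;
    int32_t num_hardware_devices = 0;

    pv_status_t status = pv_picollm_list_hardware_devices(
            &hardware_devices,
            &num_hardware_devices);
    if (status != PV_STATUS_SUCCESS) {
        fprintf(stderr, "list devices failed: %s\n", pv_status_to_string(status));
        return -1;
    }

    for (int32_t i = 0; i < num_hardware_devices; i++) {
        printf("%s\n", hardware_devices[i]);  // e.g., "cpu", "gpu:0" (assumed format)
    }

    pv_picollm_free_hardware_devices(hardware_devices, num_hardware_devices);
    return 0;
}
```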
pv_picollm_free_hardware_devices()
Frees the memory allocated for the hardware devices list returned by pv_picollm_list_hardware_devices().
Parameters
- hardware_devices const char * * : Array of available hardware devices.
- num_hardware_devices int32_t : The number of devices in the hardware_devices array.
pv_status_t
Status code enum.
pv_picollm_usage_t
Struct for the number of tokens in the prompt and completion.
pv_picollm_endpoint_t
Enum for the endpoint detection types.
pv_picollm_token_t
Struct for a token and its associated log probability.
pv_picollm_completion_token_t
Struct for a token within completion and top alternative tokens.
pv_picollm_stream_callback_t
Stream callback function type used in the pv_picollm_generate(), pv_picollm_generate_with_image(), and pv_picollm_generate_ocr() functions.
pv_picollm_progress_callback_t
Progress function type used in pv_picollm_generate_with_image() and pv_picollm_generate_ocr().
pv_status_to_string()
Parameters
statuspv_status_t : Status code.
Returns
- const char * : String representation of the status code.
pv_get_error_stack()
If a function returns a failure (any pv_status_t other than PV_STATUS_SUCCESS), this function can be called to get a series of error messages related to the failure. It can only be called once per failure status on another function. The memory for message_stack must be freed using pv_free_error_stack(). Regardless of the return status of this function, if message_stack is not NULL, then message_stack contains valid memory. However, a failure status on this function indicates that future error messages may not be reported.
Parameters
- message_stack char * * : Array of messages relating to the failure. Messages are NULL-terminated strings. The array must be freed using pv_free_error_stack().
- message_stack_depth int32_t * : The number of messages in the message_stack array.
Returns
- pv_status_t : Status code.
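A sketch of a reusable failure handler combining pv_status_to_string() and the error stack (whether pv_free_error_stack() also takes the depth is not stated here, so the single-argument call below is an assumption):

```c
#include <stdio.h>

#include "pv_picollm.h"

// On any failure status, prints the status name and the detailed error stack.
void report_failure(pv_status_t status) {
    fprintf(stderr, "error: %s\n", pv_status_to_string(status));

    char **message_stack = NULL;
    int32_t message_stack_depth = 0;

    // Must be called at most once per failure status on another function.
    if (pv_get_error_stack(&message_stack, &message_stack_depth) == PV_STATUS_SUCCESS) {
        for (int32_t i = 0; i < message_stack_depth; i++) {
            fprintf(stderr, "  [%d] %s\n", i, message_stack[i]);
        }
        pv_free_error_stack(message_stack);  // assumed single-argument form
    }
}
```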
pv_free_error_stack()
This function frees the memory used by error messages allocated by pv_get_error_stack().
Parameters
- message_stack char * * : Array of messages relating to the failure. Messages are NULL-terminated strings.