picoLLM Inference Engine
C API
API Reference for the picoLLM C SDK.
pv_picollm_t
Container representing the picoLLM Inference Engine.
pv_picollm_init()
Creates a picoLLM instance. When the instance is no longer needed, its resources must be released using pv_picollm_delete().
Parameters
- access_key const char * : AccessKey obtained from Picovoice Console.
- model_path const char * : Absolute path to the file containing model parameters (.pllm).
- device const char * : String representation of the device (e.g., CPU or GPU) to use for inference. If set to best, picoLLM picks the most suitable device. If set to gpu, the engine uses the first available GPU device. To select a specific GPU device, set this argument to gpu:${GPU_INDEX}, where ${GPU_INDEX} is the index of the target GPU. If set to cpu, the engine runs on the CPU with the default number of threads. To specify the number of threads, set this argument to cpu:${NUM_THREADS}, where ${NUM_THREADS} is the desired number of threads.
- object pv_picollm_t * * : Constructed instance of picoLLM.
Returns
- pv_status_t : Status code.
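As a sketch of typical initialization (the AccessKey and model path below are placeholders, and the header name pv_picollm.h is an assumption):

```c
#include <stdio.h>

#include "pv_picollm.h"

int main(void) {
    // Placeholder values -- replace with a real AccessKey and model path.
    const char *access_key = "${ACCESS_KEY}";
    const char *model_path = "${MODEL_PATH}";

    pv_picollm_t *picollm = NULL;

    // "best" lets the engine pick the most suitable device; alternatives
    // include "cpu:4" or "gpu:0" as described in the parameter list above.
    pv_status_t status = pv_picollm_init(access_key, model_path, "best", &picollm);
    if (status != PV_STATUS_SUCCESS) {
        fprintf(stderr, "init failed: %s\n", pv_status_to_string(status));
        return 1;
    }

    // ... use the engine ...

    pv_picollm_delete(picollm);
    return 0;
}
```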
pv_picollm_delete()
Releases resources acquired by picoLLM.
Parameters
- object pv_picollm_t * : picoLLM object.
pv_picollm_generate()
Given a text prompt and a set of generation parameters, generates completion text and relevant metadata. The caller is responsible for freeing the completion text and metadata using pv_picollm_delete_completion() and pv_picollm_delete_completion_tokens().
Parameters
- object pv_picollm_t * : picoLLM object.
- prompt const char * : Text prompt.
- completion_token_limit int32_t : Maximum number of tokens in the completion. If the generation process stops due to reaching this limit, the endpoint output argument will be PV_PICOLLM_ENDPOINT_COMPLETION_TOKEN_LIMIT_REACHED. Set to -1 to impose no limit.
- stop_phrases const char * : The generation process stops when it encounters any of these phrases in the completion. The already-generated completion, including the encountered stop phrase, will be returned. The endpoint output argument will be PV_PICOLLM_ENDPOINT_STOP_PHRASE_ENCOUNTERED. Set to NULL to turn off this feature.
- num_stop_phrases int32_t : Number of stop phrases. Set to 0 to turn off this feature.
- seed int32_t : If set to a positive integer, the internal random number generator uses it as its seed. Seeding enforces deterministic outputs. Set to -1 for randomized outputs for a given prompt.
- presence_penalty float : If set to a positive value, penalizes logits already appearing in the partial completion. If set to 0.0, it has no effect.
- frequency_penalty float : If set to a positive floating-point value, penalizes logits proportionally to the frequency of their appearance in the partial completion. If set to 0.0, it has no effect.
- temperature float : Sampling temperature. Temperature is a non-negative floating-point value that controls the randomness of the sampler. A higher temperature smooths the sampler's output, increasing randomness; a lower temperature creates a narrower distribution and reduces variability. Setting it to 0 selects the maximum logit during sampling.
- top_p float : A positive floating-point number within (0, 1]. It restricts the sampler's choices to the high-probability logits that form the top_p portion of the probability mass, avoiding the random selection of unlikely logits. A value of 1.0 enables the sampler to pick any token with non-zero probability, turning off the feature.
- num_top_choices int32_t : If set to a positive value, picoLLM returns the list of highest-probability tokens for each generated token. Set to 0 to turn off the feature. The maximum number of top choices is pv_picollm_max_top_choices().
- stream_callback pv_picollm_stream_callback_t : If not set to NULL, picoLLM executes this callback every time a new piece of completion text becomes available.
- stream_callback_context void * : Pointer containing user-defined data that is passed to stream_callback on every invocation.
- usage pv_picollm_usage_t * : Number of tokens in the prompt and completion.
- endpoint pv_picollm_endpoint_t * : Indicates the reason for termination of the generation process.
- completion_tokens pv_picollm_completion_token_t * * : Token-level information about the generated completion.
- num_completion_tokens int32_t * : Number of tokens in the completion.
- completion char * * : Completion text.
Returns
- pv_status_t : Status code.
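A hedged sketch of calling pv_picollm_generate(): the argument order follows the parameter list above, and the stream-callback signature (completion text first, user context second) is an assumption rather than confirmed by this reference. The prompt is illustrative.

```c
#include <stdio.h>

#include "pv_picollm.h"

// Streams each new piece of completion text to stdout as it is produced.
// NOTE: the parameter order of this callback is an assumption.
static void on_stream(const char *completion, void *context) {
    (void) context;
    printf("%s", completion);
    fflush(stdout);
}

int demo_generate(pv_picollm_t *picollm) {
    pv_picollm_usage_t usage;
    pv_picollm_endpoint_t endpoint;
    pv_picollm_completion_token_t *completion_tokens = NULL;
    int32_t num_completion_tokens = 0;
    char *completion = NULL;

    pv_status_t status = pv_picollm_generate(
            picollm,
            "Tell me a joke.",  // prompt (illustrative)
            128,                // completion_token_limit
            NULL,               // stop_phrases (feature off)
            0,                  // num_stop_phrases
            -1,                 // seed: randomized outputs
            0.f,                // presence_penalty: no effect
            0.f,                // frequency_penalty: no effect
            0.f,                // temperature: pick the maximum logit
            1.f,                // top_p: feature off
            0,                  // num_top_choices: feature off
            on_stream,          // stream_callback
            NULL,               // stream_callback_context
            &usage,
            &endpoint,
            &completion_tokens,
            &num_completion_tokens,
            &completion);
    if (status != PV_STATUS_SUCCESS) {
        fprintf(stderr, "generate failed: %s\n", pv_status_to_string(status));
        return -1;
    }

    // The caller owns the outputs and must free them.
    pv_picollm_delete_completion_tokens(completion_tokens, num_completion_tokens);
    pv_picollm_delete_completion(completion);
    return 0;
}
```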
pv_picollm_generate_with_image()
Given a text prompt, an image, and a set of generation parameters, generates completion text and relevant metadata. The caller is responsible for freeing the completion text and metadata using pv_picollm_delete_completion() and pv_picollm_delete_completion_tokens().
For use with vision models only.
Parameters
- object pv_picollm_t * : picoLLM object.
- prompt const char * : Text prompt.
- image_width int32_t : Width of the image in pixels.
- image_height int32_t : Height of the image in pixels.
- image const uint8_t * : Image pixel data in 8-bit, RGB format.
- completion_token_limit int32_t : Maximum number of tokens in the completion. If the generation process stops due to reaching this limit, the endpoint output argument will be PV_PICOLLM_ENDPOINT_COMPLETION_TOKEN_LIMIT_REACHED. Set to -1 to impose no limit.
- stop_phrases const char * : The generation process stops when it encounters any of these phrases in the completion. The already-generated completion, including the encountered stop phrase, will be returned. The endpoint output argument will be PV_PICOLLM_ENDPOINT_STOP_PHRASE_ENCOUNTERED. Set to NULL to turn off this feature.
- num_stop_phrases int32_t : Number of stop phrases. Set to 0 to turn off this feature.
- seed int32_t : If set to a positive integer, the internal random number generator uses it as its seed. Seeding enforces deterministic outputs. Set to -1 for randomized outputs for a given prompt.
- presence_penalty float : If set to a positive value, penalizes logits already appearing in the partial completion. If set to 0.0, it has no effect.
- frequency_penalty float : If set to a positive floating-point value, penalizes logits proportionally to the frequency of their appearance in the partial completion. If set to 0.0, it has no effect.
- temperature float : Sampling temperature. Temperature is a non-negative floating-point value that controls the randomness of the sampler. A higher temperature smooths the sampler's output, increasing randomness; a lower temperature creates a narrower distribution and reduces variability. Setting it to 0 selects the maximum logit during sampling.
- top_p float : A positive floating-point number within (0, 1]. It restricts the sampler's choices to the high-probability logits that form the top_p portion of the probability mass, avoiding the random selection of unlikely logits. A value of 1.0 enables the sampler to pick any token with non-zero probability, turning off the feature.
- num_top_choices int32_t : If set to a positive value, picoLLM returns the list of highest-probability tokens for each generated token. Set to 0 to turn off the feature. The maximum number of top choices is pv_picollm_max_top_choices().
- stream_callback pv_picollm_stream_callback_t : If not set to NULL, picoLLM executes this callback every time a new piece of completion text becomes available.
- stream_callback_context void * : Pointer containing user-defined data that is passed to stream_callback on every invocation.
- prompt_progress_callback pv_picollm_progress_callback_t : If not set to NULL, picoLLM uses this callback to report the prompt evaluation progress as a floating-point number within (0, 100]. A value of 100 indicates that prompt evaluation is complete and completion tokens are now being generated.
- prompt_progress_callback_context void * : Pointer containing user-defined data that is passed to prompt_progress_callback on every invocation.
- usage pv_picollm_usage_t * : Number of tokens in the prompt and completion.
- endpoint pv_picollm_endpoint_t * : Indicates the reason for termination of the generation process.
- completion_tokens pv_picollm_completion_token_t * * : Token-level information about the generated completion.
- num_completion_tokens int32_t * : Number of tokens in the completion.
- completion char * * : Completion text.
Returns
- pv_status_t : Status code.
pv_picollm_generate_embeddings()
Generates numerical vector representations of the input text prompt. The caller is responsible for freeing the embeddings using pv_picollm_delete_embeddings().
For use with embedding models only.
Parameters
- object pv_picollm_t * : picoLLM object.
- prompt const char * : Text prompt.
- num_embeddings int32_t * : Number of generated embeddings.
- embeddings float * * : Generated embeddings.
Returns
- pv_status_t : Status code.
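A minimal sketch of generating embeddings for a prompt (assumes an instance created from an embedding model; the prompt is illustrative):

```c
#include <stdio.h>

#include "pv_picollm.h"

// Generates embeddings for a prompt and prints the first few values.
int demo_embeddings(pv_picollm_t *picollm) {
    int32_t num_embeddings = 0;
    float *embeddings = NULL;

    pv_status_t status = pv_picollm_generate_embeddings(
            picollm,
            "The quick brown fox",
            &num_embeddings,
            &embeddings);
    if (status != PV_STATUS_SUCCESS) {
        fprintf(stderr, "embeddings failed: %s\n", pv_status_to_string(status));
        return -1;
    }

    for (int32_t i = 0; i < num_embeddings && i < 4; i++) {
        printf("%f\n", embeddings[i]);
    }

    pv_picollm_delete_embeddings(embeddings);
    return 0;
}
```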
pv_picollm_generate_ocr()
Generates a completion text representing text found in the given image. The caller is responsible for freeing the completion text using pv_picollm_delete_completion().
For use with OCR (Optical Character Recognition) models only.
Parameters
- object pv_picollm_t * : picoLLM object.
- image_width int32_t : Width of the image in pixels.
- image_height int32_t : Height of the image in pixels.
- image const uint8_t * : Image pixel data in 8-bit, RGB format.
- completion_token_limit int32_t : Maximum number of tokens in the completion. If the generation process stops due to reaching this limit, the endpoint output argument will be PV_PICOLLM_ENDPOINT_COMPLETION_TOKEN_LIMIT_REACHED. Set to -1 to impose no limit.
- stream_callback pv_picollm_stream_callback_t : If not set to NULL, picoLLM executes this callback every time a new piece of completion text becomes available.
- stream_callback_context void * : Pointer containing user-defined data that is passed to stream_callback on every invocation.
- prompt_progress_callback pv_picollm_progress_callback_t : If not set to NULL, picoLLM uses this callback to report the prompt evaluation progress as a floating-point number within (0, 100]. A value of 100 indicates that prompt evaluation is complete and completion tokens are now being generated.
- prompt_progress_callback_context void * : Pointer containing user-defined data that is passed to prompt_progress_callback on every invocation.
- endpoint pv_picollm_endpoint_t * : Indicates the reason for termination of the generation process.
- completion char * * : Completion text.
Returns
- pv_status_t : Status code.
pv_picollm_interrupt()
Interrupts pv_picollm_generate(), pv_picollm_generate_with_image(), and pv_picollm_generate_ocr() if generation is in progress. Otherwise, it has no effect.
Parameters
- object pv_picollm_t * : picoLLM object.
Returns
- pv_status_t : Status code.
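Because the generate functions block the calling thread, pv_picollm_interrupt() is typically invoked from a second thread. A hedged sketch using POSIX threads (the 10-second timeout is a hypothetical policy, not part of the API):

```c
#include <pthread.h>
#include <unistd.h>

#include "pv_picollm.h"

// Watchdog thread: interrupts a long-running generation after a timeout.
static void *watchdog(void *arg) {
    pv_picollm_t *picollm = (pv_picollm_t *) arg;
    sleep(10);  // hypothetical timeout
    // No effect if generation has already finished.
    (void) pv_picollm_interrupt(picollm);
    return NULL;
}

// Spawn the watchdog before calling a blocking generate function.
void start_watchdog(pv_picollm_t *picollm, pthread_t *thread) {
    pthread_create(thread, NULL, watchdog, picollm);
}
```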
pv_picollm_delete_completion_tokens()
Deletes completion tokens returned from pv_picollm_generate() and pv_picollm_generate_with_image().
Parameters
- completion_tokens pv_picollm_completion_token_t * : Completion tokens.
- num_completion_tokens int32_t : Number of completion tokens.
pv_picollm_delete_completion()
Deletes completion text returned from pv_picollm_generate(), pv_picollm_generate_with_image() and pv_picollm_generate_ocr().
Parameters
- completion char * : Completion text.
pv_picollm_delete_embeddings()
Deletes embeddings returned by pv_picollm_generate_embeddings().
Parameters
- embeddings float * : Embeddings.
pv_picollm_tokenize()
Tokenizes a given text using the model's tokenizer. The caller is responsible for freeing the returned tokens buffer using pv_picollm_delete_tokens(). This is a low-level function meant for benchmarking and advanced usage. pv_picollm_generate() should be used when possible.
Parameters
- object pv_picollm_t * : picoLLM object.
- text const char * : Text.
- bos bool : If set to true, the tokenizer prepends the beginning-of-sentence token to the result.
- eos bool : If set to true, the tokenizer appends the end-of-sentence token to the result.
- num_tokens int32_t * : Number of tokens.
- tokens int32_t * : Tokens representing the input text.
Returns
- pv_status_t : Status code.
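A sketch of tokenizing a string and printing the resulting token IDs (the tokens output is assumed to be an engine-allocated buffer returned through a pointer, consistent with the pv_picollm_delete_tokens() requirement):

```c
#include <stdbool.h>
#include <stdio.h>

#include "pv_picollm.h"

// Tokenizes a string with the model's tokenizer and prints the token IDs.
int demo_tokenize(pv_picollm_t *picollm) {
    int32_t num_tokens = 0;
    int32_t *tokens = NULL;

    pv_status_t status = pv_picollm_tokenize(
            picollm,
            "Hello, world!",
            true,   // bos: prepend the beginning-of-sentence token
            false,  // eos: do not append the end-of-sentence token
            &num_tokens,
            &tokens);
    if (status != PV_STATUS_SUCCESS) {
        fprintf(stderr, "tokenize failed: %s\n", pv_status_to_string(status));
        return -1;
    }

    for (int32_t i = 0; i < num_tokens; i++) {
        printf("%d ", tokens[i]);
    }
    printf("\n");

    pv_picollm_delete_tokens(tokens);
    return 0;
}
```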
pv_picollm_delete_tokens()
Deletes tokens returned from pv_picollm_tokenize().
Parameters
- tokens int32_t * : Tokens.
pv_picollm_forward()
Performs a single forward pass given a token and returns the logits. The caller is responsible for freeing the logits buffer using pv_picollm_delete_logits(). This is a low-level function for benchmarking and advanced usage. pv_picollm_generate() should be used when possible.
Parameters
- object pv_picollm_t * : picoLLM object.
- token int32_t : Input token.
- num_logits int32_t * : Number of logits.
- logits float * * : Logits.
Returns
- pv_status_t : Status code.
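A sketch combining pv_picollm_reset() and pv_picollm_forward(): reset the state for a fresh sequence, run one forward pass, then take the argmax over the returned logits (a greedy next-token pick, shown only to illustrate the low-level flow):

```c
#include <stdio.h>

#include "pv_picollm.h"

// Runs one forward pass for a token and prints the highest-scoring
// next-token ID (greedy argmax over the logits).
int demo_forward(pv_picollm_t *picollm, int32_t token) {
    int32_t num_logits = 0;
    float *logits = NULL;

    // Start a fresh sequence before feeding tokens via pv_picollm_forward().
    pv_status_t status = pv_picollm_reset(picollm);
    if (status != PV_STATUS_SUCCESS) {
        return -1;
    }

    status = pv_picollm_forward(picollm, token, &num_logits, &logits);
    if (status != PV_STATUS_SUCCESS) {
        return -1;
    }

    int32_t best = 0;
    for (int32_t i = 1; i < num_logits; i++) {
        if (logits[i] > logits[best]) {
            best = i;
        }
    }
    printf("most likely next token: %d\n", best);

    pv_picollm_delete_logits(logits);
    return 0;
}
```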
pv_picollm_delete_logits()
Deletes logits returned from pv_picollm_forward().
Parameters
- logits float * : Logits.
pv_picollm_reset()
Resets the internal state of the LLM. It should be called in conjunction with pv_picollm_forward() when processing a new sequence of tokens. This is a low-level function for benchmarking and advanced usage. pv_picollm_generate() should be used when possible.
Parameters
- object pv_picollm_t * : picoLLM object.
Returns
- pv_status_t : Status code.
pv_picollm_model()
Getter for the model's information.
Parameters
- object pv_picollm_t * : picoLLM object.
- model const char * : Model information.
Returns
- pv_status_t : Status code.
pv_picollm_context_length()
Getter for the model's context length.
Parameters
- object pv_picollm_t * : picoLLM object.
- context_length int32_t * : Context length.
Returns
- pv_status_t : Status code.
pv_picollm_version()
Getter for version.
Returns
- const char * : Version.
pv_picollm_max_top_choices()
Getter for maximum number of top choices for pv_picollm_generate() and pv_picollm_generate_with_image().
Returns
- int32_t : Maximum number of top choices.
pv_picollm_list_hardware_devices()
Gets a list of hardware devices that can be specified when calling pv_picollm_init().
Parameters
- hardware_devices const char * * : Array of available hardware devices. Devices are NULL-terminated strings. The array must be freed using pv_picollm_free_hardware_devices().
- num_hardware_devices int32_t * : The number of devices in the hardware_devices array.
Returns
- pv_status_t : Status code.
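A sketch of enumerating the devices that can be passed as the device argument of pv_picollm_init() (the exact device-string values printed are assumptions):

```c
#include <stdio.h>

#include "pv_picollm.h"

// Lists hardware devices usable with pv_picollm_init().
int demo_list_devices(void) {
    const char **hardware_devices = NULL;
    int32_t num_hardware_devices = 0;

    pv_status_t status = pv_picollm_list_hardware_devices(
            &hardware_devices,
            &num_hardware_devices);
    if (status != PV_STATUS_SUCCESS) {
        fprintf(stderr, "list devices failed: %s\n", pv_status_to_string(status));
        return -1;
    }

    for (int32_t i = 0; i < num_hardware_devices; i++) {
        printf("%s\n", hardware_devices[i]);  // e.g., "cpu", "gpu:0" (assumed format)
    }

    pv_picollm_free_hardware_devices(hardware_devices, num_hardware_devices);
    return 0;
}
```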
pv_picollm_free_hardware_devices()
Frees the memory allocated for the hardware devices list returned by pv_picollm_list_hardware_devices().
Parameters
- hardware_devices const char * * : Array of available hardware devices.
- num_hardware_devices int32_t : The number of devices in the hardware_devices array.
pv_status_t
Status code enum.
pv_picollm_usage_t
Struct for the number of tokens in the prompt and completion.
pv_picollm_endpoint_t
Enum for the endpoint detection types.
pv_picollm_token_t
Struct for a token and its associated log probability.
pv_picollm_completion_token_t
Struct for a token within completion and top alternative tokens.
pv_picollm_stream_callback_t
Stream callback function type used in the pv_picollm_generate(), pv_picollm_generate_with_image(), and pv_picollm_generate_ocr() functions.
pv_picollm_progress_callback_t
Progress function type used in pv_picollm_generate_with_image() and pv_picollm_generate_ocr().
pv_status_to_string()
Parameters
statuspv_status_t : Status code.
Returns
- const char * : String representation of the status code.
pv_get_error_stack()
If a function returns a failure (any pv_status_t other than PV_STATUS_SUCCESS), this function can be called to get a series of error messages related to the failure. It can only be called once per failure status on another function. The memory for message_stack must be freed using pv_free_error_stack(). Regardless of the return status of this function, if message_stack is not NULL, then message_stack contains valid memory. However, a failure status on this function indicates that future error messages may not be reported.
Parameters
- message_stack char * * : Array of messages relating to the failure. Messages are NULL-terminated strings. The array must be freed using pv_free_error_stack().
- message_stack_depth int32_t * : The number of messages in the message_stack array.
Returns
- pv_status_t : Status code.
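A sketch of a reusable failure handler combining pv_status_to_string() and the error stack (whether pv_free_error_stack() also takes the depth is not stated here, so the single-argument call below is an assumption):

```c
#include <stdio.h>

#include "pv_picollm.h"

// On any failure status, prints the status name and the detailed error stack.
void report_failure(pv_status_t status) {
    fprintf(stderr, "error: %s\n", pv_status_to_string(status));

    char **message_stack = NULL;
    int32_t message_stack_depth = 0;

    // Must be called at most once per failure status on another function.
    if (pv_get_error_stack(&message_stack, &message_stack_depth) == PV_STATUS_SUCCESS) {
        for (int32_t i = 0; i < message_stack_depth; i++) {
            fprintf(stderr, "  [%d] %s\n", i, message_stack[i]);
        }
        pv_free_error_stack(message_stack);  // assumed single-argument form
    }
}
```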
pv_free_error_stack()
This function frees the memory used by error messages allocated by pv_get_error_stack().
Parameters
- message_stack char * * : Array of messages relating to the failure. Messages are NULL-terminated strings.