picoLLM Inference Engine
iOS API
API Reference for the picoLLM iOS SDK (Cocoapod)
PicoLLM
Class for the picoLLM Inference Engine.
Release its resources with the delete() function when you are done using it.
PicoLLM.model
Getter for the model's information.
Returns
String : Model name.
PicoLLM.contextLength
Getter for the model's context length.
Returns
Int32 : Context length.
PicoLLM.version
Current picoLLM version.
PicoLLM.maxTopChoices
Maximum number of top choices for .generate().
PicoLLM.init()
init method for the picoLLM Inference Engine with a mixture of arguments.
Parameters
accessKey
String : The AccessKey obtained from Picovoice Console.
modelPath
String : Absolute path to the file containing model parameters (.pllm).
device
String : String representation of the device (e.g., CPU or GPU) to use for inference. If set to best, picoLLM picks the most suitable device. If set to gpu, the engine uses the first available GPU device. To select a specific GPU device, set this argument to gpu:${GPU_INDEX}, where ${GPU_INDEX} is the index of the target GPU. If set to cpu, the engine will run on the CPU with the default number of threads. To specify the number of threads, set this argument to cpu:${NUM_THREADS}, where ${NUM_THREADS} is the desired number of threads.
Throws
PicoLLMError : If an error occurs while creating an instance of the picoLLM Inference Engine.
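The device string formats described above can be illustrated with a small parser. This is a minimal sketch for validating a string before passing it on, not code from the SDK: the names DeviceSpec and parseDevice are hypothetical, and the rejection of a zero thread count is an assumption.

```swift
import Foundation

// Hypothetical mirror of the documented `device` string formats.
enum DeviceSpec: Equatable {
    case best
    case gpu(index: Int?)     // nil means "first available GPU"
    case cpu(threads: Int?)   // nil means "default number of threads"
}

// Parses strings such as "best", "gpu", "gpu:1", "cpu", and "cpu:4".
// Returns nil for anything that does not match those documented forms.
func parseDevice(_ text: String) -> DeviceSpec? {
    let parts = text.split(separator: ":", omittingEmptySubsequences: false)
    guard let kind = parts.first, parts.count <= 2 else { return nil }

    // Optional numeric suffix after the colon.
    var number: Int? = nil
    if parts.count == 2 {
        guard let value = Int(parts[1]), value >= 0 else { return nil }
        number = value
    }

    switch String(kind) {
    case "best":
        return number == nil ? DeviceSpec.best : nil
    case "gpu":
        return DeviceSpec.gpu(index: number)
    case "cpu":
        // Assumption: a thread count below one is treated as invalid.
        if let n = number, n < 1 { return nil }
        return DeviceSpec.cpu(threads: number)
    default:
        return nil
    }
}
```

For example, parseDevice("gpu:1") yields the GPU case with index 1, while an unrecognized string such as "tpu" yields nil.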
PicoLLM.delete()
Releases resources acquired by picoLLM Inference Engine.
PicoLLM.generate()
Given a text prompt and a set of generation parameters, creates a completion text and relevant metadata.
Parameters
prompt
String : Text prompt.
completionTokenLimit
Int32? : Maximum number of tokens in the completion. If the generation process stops due to reaching this limit, the endpoint of the returned completion will be PicoLLMEndpoint.completionTokenLimitReached. Set to nil to impose no limit.
stopPhrases
[String]? : The generation process stops when it encounters any of these phrases in the completion. The already generated completion, including the encountered stop phrase, will be returned. The endpoint of the returned completion will be PicoLLMEndpoint.stopPhraseEncountered. Set to nil to turn off this feature.
seed
Int32? : If set to a positive integer value, the internal random number generator uses it as its seed. Seeding enforces deterministic outputs. Set to nil for randomized outputs for a given prompt.
presencePenalty
Float : If set to a positive value, it penalizes logits already appearing in the partial completion. If set to 0.0, it has no effect.
frequencyPenalty
Float : If set to a positive floating-point value, it penalizes logits proportional to the frequency of their appearance in the partial completion. If set to 0.0, it has no effect.
temperature
Float : Sampling temperature. Temperature is a non-negative floating-point value that controls the randomness of the sampler. A higher temperature smooths the sampler's output, increasing the randomness. In contrast, a lower temperature creates a narrower distribution and reduces variability. Setting it to 0 selects the maximum logit during sampling.
topP
Float : A positive floating-point number within (0, 1]. It restricts the sampler's choices to high-probability logits that form the topP portion of the probability mass. Hence, it avoids randomly selecting unlikely logits. A value of 1.0 enables the sampler to pick any token with non-zero probability, turning off the feature.
numTopChoices
Int32 : If set to a positive value, picoLLM returns the list of the highest probability tokens for any generated token. Set to 0 to turn off the feature. The maximum number of top choices is .maxTopChoices.
stream_callback
((String) -> Void)? : If not set to nil, picoLLM executes this callback every time a new piece of completion string becomes available.
Returns
PicoLLMCompletion : Object containing stats and generated tokens.
Throws
PicoLLMError : If there is an error while generating the completion.
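To make the interplay of presencePenalty, frequencyPenalty, temperature, and topP concrete, here is a toy sketch of how such parameters are commonly applied to a vector of logits. It is an illustration of the general technique under simple assumptions, not the engine's actual sampler; the function names penalize, probabilities, and nucleus are hypothetical.

```swift
import Foundation

// Illustrative only: applies presence and frequency penalties to raw logits.
// counts[i] is how many times token i already appears in the partial completion.
func penalize(logits: [Float], counts: [Int],
              presencePenalty: Float, frequencyPenalty: Float) -> [Float] {
    var result = logits
    for i in result.indices where counts[i] > 0 {
        result[i] -= presencePenalty                      // flat penalty for having appeared
        result[i] -= frequencyPenalty * Float(counts[i])  // grows with each repetition
    }
    return result
}

// Softmax with temperature. A temperature of 0 degenerates to the maximum logit.
func probabilities(logits: [Float], temperature: Float) -> [Float] {
    guard let maxLogit = logits.max() else { return [] }
    if temperature == 0 {
        let best = logits.firstIndex(of: maxLogit)!
        return logits.indices.map { $0 == best ? Float(1) : Float(0) }
    }
    // Subtracting the maximum keeps exp() numerically stable.
    let scaled = logits.map { exp(($0 - maxLogit) / temperature) }
    let total = scaled.reduce(0, +)
    return scaled.map { $0 / total }
}

// Indices of the smallest set of most likely tokens whose probability
// mass reaches topP. Tokens with zero probability are never kept.
func nucleus(probs: [Float], topP: Float) -> [Int] {
    let ranked = probs.indices
        .filter { probs[$0] > 0 }
        .sorted { probs[$0] > probs[$1] }
    var kept: [Int] = []
    var mass: Float = 0
    for index in ranked {
        kept.append(index)
        mass += probs[index]
        if mass >= topP { break }
    }
    return kept
}
```

In this sketch a sampler would then draw one token at random from the indices returned by nucleus, weighted by their probabilities. Raising the temperature flattens the distribution, and lowering topP shrinks the candidate set.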
PicoLLM.interrupt()
Interrupts .generate() if generation is in progress. Otherwise, it has no effect.
Throws
PicoLLMError : If interrupt fails.
PicoLLM.tokenize()
Tokenizes a given text using the model's tokenizer. This is a low-level function meant for benchmarking and advanced usage. .generate() should be used when possible.
Parameters
text
String : Text.
bos
Bool : If set to true, the tokenizer prepends the beginning-of-sentence token to the result.
eos
Bool : If set to true, the tokenizer appends the end-of-sentence token to the result.
Returns
[Int32] : Tokens representing the input text.
Throws
PicoLLMError : If there is an error while tokenizing.
PicoLLM.forward()
Performs a single forward pass given a token and returns the logits. .generate() should be used when possible.
Parameters
token
Int32 : Input token.
Returns
[Float] : Logits.
Throws
PicoLLMError : If there is an error while executing a forward pass.
PicoLLM.reset()
Resets the internal state of the LLM. It should be called in conjunction with .forward() when processing a new sequence of tokens. This is a low-level function for benchmarking and advanced usage. .generate() should be used when possible.
Throws
PicoLLMError : If there is an error while resetting.
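The way .reset() and .forward() fit together can be sketched as a greedy decoding loop. The example below is a minimal sketch that runs against a toy stand-in rather than the SDK: the ForwardModel protocol, the ToyModel class, and greedyDecode are hypothetical names, and the protocol only assumes the shapes documented above (an Int32 token in, an array of Float logits out).

```swift
// Hypothetical protocol mirroring the documented shapes of .forward() and .reset().
protocol ForwardModel {
    func reset() throws
    func forward(token: Int32) throws -> [Float]
}

// Toy stand-in for a real model: over a 4-token vocabulary it always
// favours the token that follows the input, wrapping around at the end.
final class ToyModel: ForwardModel {
    private(set) var resetCount = 0

    func reset() throws { resetCount += 1 }

    func forward(token: Int32) throws -> [Float] {
        var logits = [Float](repeating: 0, count: 4)
        logits[Int((token + 1) % 4)] = 1
        return logits
    }
}

// Greedy decoding: reset the state, feed the prompt tokens one by one,
// then repeatedly feed back the token with the highest logit.
func greedyDecode(model: ForwardModel, promptTokens: [Int32], steps: Int) throws -> [Int32] {
    precondition(!promptTokens.isEmpty, "need at least one prompt token")
    try model.reset()  // start a new sequence from a clean state

    var logits: [Float] = []
    for token in promptTokens {
        logits = try model.forward(token: token)
    }

    var generated: [Int32] = []
    for _ in 0..<steps {
        guard let best = logits.indices.max(by: { logits[$0] < logits[$1] }) else { break }
        let next = Int32(best)
        generated.append(next)
        logits = try model.forward(token: next)
    }
    return generated
}
```

Calling reset once per sequence is the key point: without it, state left over from a previous sequence would influence the logits of the new one.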
PicoLLM.getAvailableDevices()
Gets a list of hardware devices that can be specified when calling .init().
Returns
[String] : Array of available hardware devices.
PicoLLMError
Error thrown if an error occurs within the picoLLM Inference Engine.
PicoLLMUsage
Struct for the number of tokens in the prompt and completion.
PicoLLMUsage.promptTokens
Number of tokens in the prompt.
PicoLLMUsage.completionTokens
Number of tokens in the completion.
PicoLLMEndpoint
Enum for the endpoint detection types.
PicoLLMToken
Struct for a token and its associated log probability.
PicoLLMToken.token
Token.
PicoLLMToken.logProb
Log probability.
PicoLLMCompletionToken
Struct for a token within completion and top alternative tokens.
PicoLLMCompletionToken.token
Token. See PicoLLMToken.
PicoLLMCompletionToken.topChoices
Top choices. See PicoLLMToken.
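Log probabilities are often easier to read once converted to plain probabilities. The sketch below shows that conversion on a hypothetical stand-in struct, TokenLogProb, which only assumes the two fields documented for PicoLLMToken; the String type of the token field and the helper name toProbabilities are assumptions.

```swift
import Foundation

// Hypothetical stand-in with the two documented fields of PicoLLMToken.
// Assumption: the token is represented as a String.
struct TokenLogProb {
    let token: String
    let logProb: Float
}

// Converts each natural-log probability to a plain probability in [0, 1].
func toProbabilities(_ choices: [TokenLogProb]) -> [(token: String, probability: Float)] {
    return choices.map { (token: $0.token, probability: exp($0.logProb)) }
}
```

A log probability of 0 corresponds to a probability of 1, and increasingly negative values correspond to increasingly unlikely tokens.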
PicoLLMCompletion
Result object containing stats and generated tokens.
PicoLLMCompletion.usage
Usage. See PicoLLMUsage.
PicoLLMCompletion.endpoint
Endpoint. See PicoLLMEndpoint.
PicoLLMCompletion.completionTokens
Completion Tokens. See PicoLLMCompletionToken.
PicoLLMCompletion.completion
Completion string.
PicoLLMDialog
Protocol representing the picoLLM dialog interface.
PicoLLMDialog.init()
init method for PicoLLMDialog.
Parameters
history
Int32? : Number of latest back-and-forths to include in the prompt. Set to nil to embed the entire dialog in the prompt.
system
String? : System instruction to embed in the prompt for configuring the model's responses.
Throws
PicoLLMError : If an error occurs while creating an instance of PicoLLMDialog.
PicoLLMDialog.addHumanRequest()
Adds human's request to the dialog.
Parameters
content
String : Human's request.
Throws
PicoLLMError : If an error occurs while adding the human's request.
PicoLLMDialog.addLLMResponse()
Adds LLM's response to the dialog.
Parameters
content
String : LLM's response.
Throws
PicoLLMError : If an error occurs while adding the LLM's response.
PicoLLMDialog.prompt()
Creates a prompt string given the parameters passed to the constructor and the dialog's content.
Returns
String : Formatted prompt.
Throws
PicoLLMError : If an error occurs while creating the prompt.
BasePicoLLMDialog
BasePicoLLMDialog is a helper class that stores a chat dialog and formats it according to an instruction-tuned LLM's chat template. BasePicoLLMDialog is the base class. Each supported instruction-tuned LLM has an accompanying concrete subclass.
Phi2Dialog
Dialog helper for phi-2. This is a base class; use one of the mode-specific subclasses.
Phi2Dialog.init()
init method for Phi2Dialog.
Parameters
humanRequestsTag
String : Tag to classify human requests.
llmResponsesTag
String : Tag to classify LLM responses.
history
Int32? : Number of latest back-and-forths to include in the prompt. Set to nil to embed the entire dialog in the prompt.
system
String? : System instruction to embed in the prompt for configuring the model's responses.
Throws
PicoLLMError : If an error occurs while creating an instance of Phi2Dialog.
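The idea of a tagged dialog helper can be sketched with a toy class that stores alternating turns and formats them into a prompt. This is not the SDK's implementation: ToyTaggedDialog and ToyDialogError are hypothetical, the "tag: content" layout is an invented template, and the sketch assumes that history limits the prompt to the most recent completed exchanges and that system is prepended as the first line.

```swift
// Hypothetical error type for the toy helper below.
enum ToyDialogError: Error { case outOfOrder }

// Toy dialog helper; the real chat templates are model-specific.
final class ToyTaggedDialog {
    private let humanTag: String
    private let llmTag: String
    private let history: Int32?
    private let system: String?
    private var humanRequests: [String] = []
    private var llmResponses: [String] = []

    init(humanRequestsTag: String, llmResponsesTag: String,
         history: Int32? = nil, system: String? = nil) {
        self.humanTag = humanRequestsTag
        self.llmTag = llmResponsesTag
        self.history = history
        self.system = system
    }

    // A human request is only valid once the previous one has been answered.
    func addHumanRequest(content: String) throws {
        guard humanRequests.count == llmResponses.count else { throw ToyDialogError.outOfOrder }
        humanRequests.append(content)
    }

    // An LLM response is only valid right after a human request.
    func addLLMResponse(content: String) throws {
        guard humanRequests.count == llmResponses.count + 1 else { throw ToyDialogError.outOfOrder }
        llmResponses.append(content)
    }

    // Formats the pending request, preceded by the retained exchanges.
    func prompt() throws -> String {
        guard humanRequests.count == llmResponses.count + 1 else { throw ToyDialogError.outOfOrder }
        let completed = llmResponses.count
        // Keep only the `history` most recent completed exchanges, if a limit is set.
        let start = history.map { max(0, completed - Int($0)) } ?? 0

        var lines: [String] = []
        if let system = system { lines.append(system) }
        for i in start..<completed {
            lines.append("\(humanTag): \(humanRequests[i])")
            lines.append("\(llmTag): \(llmResponses[i])")
        }
        lines.append("\(humanTag): \(humanRequests[completed])")
        lines.append("\(llmTag):")  // left open for the model to complete
        return lines.joined(separator: "\n")
    }
}
```

In this sketch, the string returned by prompt() would be handed to a text generation call, and the generated completion would be recorded with addLLMResponse before the next human request is added.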
Phi2QADialog
Dialog helper for phi-2 qa mode.
Phi2ChatDialog
Dialog helper for phi-2 chat mode.
MistralChatDialog
Dialog helper for mistral-7b-instruct-v0.1 and mistral-7b-instruct-v0.2.
MixtralChatDialog
Dialog helper for mixtral-8x7b-instruct-v0.1.
Llama2ChatDialog
Dialog helper for llama-2-7b-chat, llama-2-13b-chat, and llama-2-70b-chat.
Llama3ChatDialog
Dialog helper for llama-3-8b-instruct and llama-3-70b-instruct.
GemmaChatDialog
Dialog helper for gemma-2b-it and gemma-7b-it.