picoLLM Inference Engine
Node.js API
API Reference for the picoLLM Node.js SDK (npm)
PicoLLM
Class for the picoLLM Inference Engine.
picoLLM can be initialized using the class constructor().
Resources should be cleaned up using the release() method when you are done.
PicoLLM.constructor()
picoLLM constructor.
Parameters
accessKey string : AccessKey obtained from Picovoice Console.
modelPath string : Absolute path to the file containing LLM parameters (.pllm).
options PicoLLMInitOptions : Optional init configuration arguments.
Returns
PicoLLM: An instance of the picoLLM engine.
Throws
PicoLLMError: If an error is encountered.
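For reference, a minimal construction sketch (the npm package name @picovoice/picollm-node, the placeholder AccessKey, and the model path are assumptions, not part of this reference):

```typescript
import { PicoLLM } from "@picovoice/picollm-node";

// Placeholder values; substitute your own AccessKey and .pllm file path.
const pllm = new PicoLLM(
  "${ACCESS_KEY}",          // AccessKey from Picovoice Console
  "/path/to/model.pllm",    // absolute path to the LLM parameters file
  { device: "best" }        // optional PicoLLMInitOptions
);

console.log(`picoLLM v${pllm.version}, context length: ${pllm.contextLength}`);

// ... use the engine ...

pllm.release(); // free native resources when done
```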
PicoLLM.generate()
Given a text prompt and a set of generation parameters, creates a completion text and relevant metadata.
Parameters
prompt string : Prompt text.
options PicoLLMGenerateOptions : Generate configuration arguments; see PicoLLMGenerateOptions for details.
Returns
PicoLLMCompletion: Completion result.
Throws
PicoLLMError: If an error is encountered.
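A usage sketch, assuming generate() can be awaited and resolves to a PicoLLMCompletion (the prompt and option values are placeholders):

```typescript
import { PicoLLM } from "@picovoice/picollm-node";

async function complete(pllm: PicoLLM): Promise<void> {
  const res = await pllm.generate("Tell me a short story about a robot.", {
    completionTokenLimit: 128,
    temperature: 0.7,
  });

  console.log(res.completion);
  console.log(`endpoint: ${res.endpoint}, completion tokens: ${res.usage.completionTokens}`);
}
```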
PicoLLM.interrupt()
Interrupts .generate() if generation is in progress. Otherwise, it has no effect.
Throws
PicoLLMError: If interrupt fails.
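A sketch of interrupting an in-flight generation; it assumes generate() runs asynchronously so interrupt() can be called while it is in progress:

```typescript
import { PicoLLM } from "@picovoice/picollm-node";

async function generateWithTimeout(pllm: PicoLLM, prompt: string): Promise<string> {
  // Interrupt the generation if it has not finished after 5 seconds.
  const timer = setTimeout(() => pllm.interrupt(), 5000);
  const res = await pllm.generate(prompt);
  clearTimeout(timer);
  return res.completion; // partial completion if the generation was interrupted
}
```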
PicoLLM.tokenize()
Tokenizes a given text using the model's tokenizer. This is a low-level function meant for benchmarking and
advanced usage. .generate() should be used when possible.
Parameters
text string : Text string.
bos boolean : If set to true, the tokenizer prepends the beginning-of-sentence token to the result.
eos boolean : If set to true, the tokenizer appends the end-of-sentence token to the result.
Returns
number[]: Tokens representing the input text.
Throws
PicoLLMError: If an error is encountered.
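A tokenization sketch, assuming tokenize() returns the token array synchronously (the printed IDs are illustrative, not real values):

```typescript
import { PicoLLM } from "@picovoice/picollm-node";

function inspectTokens(pllm: PicoLLM): void {
  // Prepend the beginning-of-sentence token, omit the end-of-sentence token.
  const tokens = pllm.tokenize("Hello, world!", true, false);
  console.log(tokens); // token IDs, e.g. [1, 15043, ...] (illustrative values)
}
```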
PicoLLM.forward()
Performs a single forward pass given a token and returns the logits. This is a low-level function for
benchmarking and advanced usage. .generate() should be used when possible.
Parameters
token number : Input token.
Returns
number[]: Logits.
Throws
PicoLLMError: If an error is encountered.
PicoLLM.reset()
Resets the internal state of the LLM. It should be called in conjunction with .forward() when processing a new
sequence of tokens. This is a low-level function for benchmarking and advanced usage. .generate() should be
used when possible.
Throws
PicoLLMError: If an error is encountered.
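A low-level sketch combining .reset(), .tokenize(), and .forward() to greedily pick the next token; it assumes these calls are synchronous:

```typescript
import { PicoLLM } from "@picovoice/picollm-node";

function greedyNextToken(pllm: PicoLLM, text: string): number {
  // Start a fresh sequence before feeding tokens through .forward().
  pllm.reset();

  const tokens = pllm.tokenize(text, true, false);

  let logits: number[] = [];
  for (const token of tokens) {
    logits = pllm.forward(token);
  }

  // Greedy pick: the index of the maximum logit is the most likely next token.
  let best = 0;
  for (let i = 1; i < logits.length; i++) {
    if (logits[i] > logits[best]) {
      best = i;
    }
  }
  return best;
}
```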
PicoLLM.getDialog()
Returns the Dialog object corresponding to the loaded model. The model needs to be instruction-tuned and have a specific chat template.
Parameters
mode string? : Some models (e.g., phi-2) define multiple chat template modes. For example, phi-2 allows both qa and chat templates.
history number? : The number of latest back-and-forths to include in the prompt. Setting history to undefined will embed the entire dialog in the prompt.
system string? : System instruction to embed in the prompt for configuring the model's responses.
Returns
Dialog: Dialog object corresponding to the loaded model.
Throws
PicoLLMError: If an error is encountered.
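A sketch of one chat turn built on .getDialog() and .generate(); the system instruction and user message are placeholders, and generate() is assumed to be awaitable:

```typescript
import { PicoLLM } from "@picovoice/picollm-node";

async function chatTurn(pllm: PicoLLM, userMessage: string): Promise<string> {
  // Keep the full history (undefined) and set an optional system instruction.
  const dialog = pllm.getDialog(undefined, undefined, "You are a concise assistant.");

  dialog.addHumanRequest(userMessage);
  const res = await pllm.generate(dialog.prompt());
  dialog.addLLMResponse(res.completion);

  return res.completion;
}
```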
PicoLLM.release()
Releases resources acquired by the picoLLM Node.js SDK.
PicoLLM.listAvailableDevices()
Lists all available devices that picoLLM can use for inference. Each entry in the list can be used as the device argument of the constructor.
Returns
string[]: All available devices that picoLLM can use for inference.
Throws
PicoLLMError: If an error is encountered.
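A device-discovery sketch; whether listAvailableDevices() is a static method is an assumption here:

```typescript
import { PicoLLM } from "@picovoice/picollm-node";

// Assumed here to be a static method; each entry can be passed as the
// `device` init option of the constructor.
const devices = PicoLLM.listAvailableDevices();
console.log(devices); // e.g. ["cpu", "gpu:0"] (illustrative output)
```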
PicoLLM.contextLength
Model's context length.
PicoLLM.maxTopChoices
Maximum number of top choices for .generate().
PicoLLM.model
Model's name.
PicoLLM.version
picoLLM version.
PicoLLMEndpoint
Enum of picoLLM endpoints.
PicoLLMInitOptions
picoLLM init options type.
device string? : String representation of the device (e.g., CPU or GPU) to use for inference. If set to best, picoLLM picks the most suitable device. If set to gpu, the engine uses the first available GPU device. To select a specific GPU device, set this argument to gpu:${GPU_INDEX}, where ${GPU_INDEX} is the index of the target GPU. If set to cpu, the engine will run on the CPU with the default number of threads. To specify the number of threads, set this argument to cpu:${NUM_THREADS}, where ${NUM_THREADS} is the desired number of threads.
libraryPath string? : Absolute path to picoLLM's dynamic library.
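An init-options sketch showing a device string (the AccessKey and model path are placeholders):

```typescript
import { PicoLLM } from "@picovoice/picollm-node";

// Other valid values, per the description above: "best", "gpu", "gpu:1", "cpu".
const pllm = new PicoLLM("${ACCESS_KEY}", "/path/to/model.pllm", {
  device: "cpu:4", // run on the CPU with 4 threads
});
```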
PicoLLMGenerateOptions
picoLLM generate options type.
completionTokenLimit number : Maximum number of tokens in the completion. If the generation process stops due to reaching this limit, the endpoint parameter in the PicoLLMCompletion output will be PicoLLMEndpoint.COMPLETION_TOKEN_LIMIT_REACHED. Set to undefined to impose no limit.
stopPhrases string[] : The generation process stops when it encounters any of these phrases in the completion. The already generated completion, including the encountered stop phrase, will be returned. The endpoint parameter in the PicoLLMCompletion output will be PicoLLMEndpoint.STOP_PHRASE_ENCOUNTERED. Set to undefined to turn off this feature.
seed number : The internal random number generator uses it as its seed if set to a positive integer value. Seeding enforces deterministic outputs. Set to undefined for randomized outputs for a given prompt.
presencePenalty number : It penalizes logits already appearing in the partial completion if set to a positive value. If set to 0 or undefined, it has no effect.
frequencyPenalty number : If set to a positive floating-point value, it penalizes logits proportional to the frequency of their appearance in the partial completion. If set to 0 or undefined, it has no effect.
temperature number : Sampling temperature. Temperature is a non-negative floating-point value that controls the randomness of the sampler. A higher temperature smoothens the sampler's output, increasing the randomness. In contrast, a lower temperature creates a narrower distribution and reduces variability. Setting it to 0 or undefined selects the maximum logit during sampling.
topP number : A positive floating-point number within (0, 1]. It restricts the sampler's choices to high-probability logits that form the topP portion of the probability mass. Hence, it avoids randomly selecting unlikely logits. A value of 1 or undefined enables the sampler to pick any token with non-zero probability, turning off the feature.
numTopChoices number : If set to a positive value, picoLLM returns the list of the highest-probability tokens for any generated token. Set to 0 to turn off the feature. The maximum number of top choices is .maxTopChoices.
streamCallback (token: string) => void : If not set to undefined, picoLLM executes this callback every time a new piece of completion string becomes available.
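A sketch exercising several of these options at once, including token streaming; the values are arbitrary and generate() is assumed to be awaitable:

```typescript
import { PicoLLM } from "@picovoice/picollm-node";

async function streamCompletion(pllm: PicoLLM, prompt: string): Promise<void> {
  const res = await pllm.generate(prompt, {
    completionTokenLimit: 256,
    stopPhrases: ["\n\n"],
    seed: 42,          // deterministic output for the same prompt
    temperature: 0.8,
    topP: 0.9,
    numTopChoices: 3,
    // Print each piece of the completion as soon as it becomes available.
    streamCallback: (token: string) => process.stdout.write(token),
  });
  console.log(`\nstopped because: ${res.endpoint}`);
}
```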
PicoLLMUsage
picoLLM usage type.
promptTokens number : Number of tokens in the prompt.
completionTokens number : Number of tokens in the completion.
PicoLLMToken
picoLLM token type.
token string : Token string.
logProb number : Log probability.
PicoLLMCompletionToken
picoLLM completion token type.
token PicoLLMToken : Generated token.
topChoices PicoLLMToken[] : Top PicoLLMToken choices.
PicoLLMCompletion
picoLLM completion type.
usage PicoLLMUsage : Usage information.
endpoint PicoLLMEndpoint : Reason for ending the generation process.
completionTokens PicoLLMCompletionToken[] : Generated tokens within the completion and top alternative tokens.
completion string : Completion string.
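A sketch of walking the completion structure to inspect per-token top choices and log probabilities; field names follow the types above, and generate() is assumed to be awaitable:

```typescript
import { PicoLLM } from "@picovoice/picollm-node";

async function inspectCompletion(pllm: PicoLLM, prompt: string): Promise<void> {
  const res = await pllm.generate(prompt, { numTopChoices: 3 });

  console.log(`prompt tokens: ${res.usage.promptTokens}`);
  console.log(`completion tokens: ${res.usage.completionTokens}`);

  for (const ct of res.completionTokens) {
    const alternatives = ct.topChoices.map((c) => `${c.token} (${c.logProb.toFixed(2)})`);
    console.log(`${ct.token.token}: ${alternatives.join(", ")}`);
  }
}
```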
PicoLLMError
Error thrown if an error occurs within picoLLM Inference Engine.
Errors:
Dialog
Dialog is a helper class that stores a chat dialog and formats it according to an instruction-tuned LLM's chat template. Dialog is the base class. Each supported instruction-tuned LLM has an accompanying concrete subclass.
Dialog.constructor
Dialog constructor.
Parameters
history number? : The number of latest back-and-forths to include in the prompt. Setting history to undefined will embed the entire dialog in the prompt.
system string? : Instruction to embed in the prompt for configuring the model's responses.
Returns
Dialog: An instance of the Dialog class.
Dialog.addHumanRequest
Adds human's request to the dialog.
Parameters
contentstring : Human's request.
Dialog.addLLMResponse
Adds LLM's response to the dialog.
Parameters
contentstring : LLM's response.
Dialog.prompt
Creates a prompt string given the parameters passed to the constructor and the dialog's content.
Returns
string: Formatted prompt.
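A dialog sketch, assuming the concrete helpers (here Llama3ChatDialog) are exported from the same package and share the Dialog constructor signature; the exchange text is illustrative:

```typescript
import { Llama3ChatDialog } from "@picovoice/picollm-node";

// Keep only the last 2 back-and-forths in the prompt and set a system instruction.
const dialog = new Llama3ChatDialog(2, "You answer in one sentence.");

dialog.addHumanRequest("What is the capital of France?");
dialog.addLLMResponse("Paris.");
dialog.addHumanRequest("And of Italy?");

// Formats the stored exchanges according to the model's chat template.
console.log(dialog.prompt());
```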
GemmaChatDialog
Dialog helper for gemma-2b-it.
Llama2ChatDialog
Dialog helper for llama-2-7b-chat.
Llama3ChatDialog
Dialog helper for llama-3-8b-instruct.
Llama32ChatDialog
Dialog helper for llama-3.2-1b-instruct and llama-3.2-3b-instruct.
MistralChatDialog
Dialog helper for mistral-7b-instruct-v0.1 and mistral-7b-instruct-v0.2.
MixtralChatDialog
Dialog helper for mixtral-8x7b-instruct-v0.1.
Phi2Dialog
Base class for phi-2 dialog helpers.
Phi2Dialog.constructor
Phi2Dialog constructor.
Parameters
humanTag string : Tag to classify human requests.
llmTag string : Tag to classify LLM responses.
history number? : The number of latest back-and-forths to include in the prompt. Setting history to undefined will embed the entire dialog in the prompt.
system string? : Instruction to embed in the prompt for configuring the model's responses.
Returns
Phi2Dialog: An instance of the Phi2Dialog class.
Phi2QADialog
Dialog helper for phi-2 qa mode.
Phi2ChatDialog
Dialog helper for phi-2 chat mode.
Phi3ChatDialog
Dialog helper for phi3.
Phi35ChatDialog
Dialog helper for phi3.5.