picoLLM Inference Engine
Web API
API Reference for the picoLLM Web SDK (npm)
PicoLLM
Class for the picoLLM Inference Engine.
PicoLLM.create()
Creates an instance of the picoLLM Inference Engine using a .pllm file in the public directory. Because the model file is large, the engine reuses the copy already in storage if one exists; otherwise, it saves the model to storage.
Parameters
accessKey
string : AccessKey obtained from Picovoice Console.
model
PicoLLMModel : picoLLM model representation, see PicoLLMModel for details.
options
PicoLLMInitOptions : Optional init configuration arguments, see PicoLLMInitOptions for details.
Returns
PicoLLM : An instance of the picoLLM Inference Engine.
Throws
PicoLLMError : If an error is encountered.
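The call described above can be sketched as follows. This is a minimal sketch, not the SDK's own sample: to stay self-contained it declares local stand-ins for the documented types and receives the engine class as a parameter instead of importing the package. The helper name loadEngine is hypothetical, and the sketch assumes that .create() returns a promise.

```typescript
// Local stand-ins for the documented types (assumption: only the fields used here).
interface ModelLike {
  modelFile: string;
  cacheFilePath?: string;
}

interface InitOptionsLike {
  device?: string;
}

// Shape of the static .create() described above, generic over the engine type.
// Assumption: the call is asynchronous and resolves to the engine instance.
interface EngineFactory<T> {
  create(accessKey: string, model: ModelLike, options?: InitOptionsLike): Promise<T>;
}

// Hypothetical helper: builds the model description and delegates to .create().
async function loadEngine<T>(
  factory: EngineFactory<T>,
  accessKey: string,
  modelUrl: string
): Promise<T> {
  const model: ModelLike = { modelFile: modelUrl };
  // "best" asks picoLLM to pick the most suitable device.
  return factory.create(accessKey, model, { device: "best" });
}
```

With the SDK installed, the PicoLLM class itself would presumably be passed as the factory argument. A failure inside .create() would then surface as a rejected promise carrying a PicoLLMError.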
PicoLLM.generate()
Given a text prompt and a set of generation parameters, creates a completion text and relevant metadata.
Parameters
prompt
string : Prompt text.
options
PicoLLMGenerateOptions : Generate configuration arguments, see PicoLLMGenerateOptions for details.
Returns
PicoLLMCompletion : Completion result.
Throws
PicoLLMError : If an error is encountered.
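A hedged sketch of a .generate() call that consumes streamed pieces through streamCallback. The interfaces are local stand-ins that copy only the documented fields used here. The helper name generateStreaming and the limit of 128 tokens are illustrative assumptions, as is the promise return type.

```typescript
// Local stand-ins for the documented types (assumption: only the fields used here).
interface GenerateOptionsLike {
  completionTokenLimit?: number;
  streamCallback?: (token: string) => void;
}

interface CompletionLike {
  completion: string;
}

// Assumption: .generate() is asynchronous and resolves to the completion.
interface GeneratorLike {
  generate(prompt: string, options?: GenerateOptionsLike): Promise<CompletionLike>;
}

// Hypothetical helper: forwards each streamed piece to onPiece and also
// returns the collected pieces together with the final completion string.
async function generateStreaming(
  engine: GeneratorLike,
  prompt: string,
  onPiece: (piece: string) => void
): Promise<{ pieces: string[]; completion: string }> {
  const pieces: string[] = [];
  const result = await engine.generate(prompt, {
    completionTokenLimit: 128, // illustrative limit
    streamCallback: (token) => {
      pieces.push(token);
      onPiece(token);
    },
  });
  return { pieces, completion: result.completion };
}
```

In a page, onPiece would typically append each piece to an output element so the text appears while generation is still running.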
PicoLLM.interrupt()
Interrupts .generate() if generation is in progress. Otherwise, it has no effect.
Throws
PicoLLMError : If interrupt fails.
PicoLLM.tokenize()
Tokenizes a given text using the model's tokenizer. This is a low-level function meant for benchmarking and advanced usage. .generate() should be used when possible.
Parameters
text
string : Text string.
bos
boolean : If set to true, the tokenizer prepends the beginning-of-sentence token to the result.
eos
boolean : If set to true, the tokenizer appends the end-of-sentence token to the result.
Returns
number[] : Tokens representing the input text.
Throws
PicoLLMError : If an error is encountered.
PicoLLM.forward()
Performs a single forward pass given a token and returns the logits. This is a low-level function for benchmarking and advanced usage. .generate() should be used when possible.
Parameters
token
number : Input token.
Returns
number[] : Logits.
Throws
PicoLLMError : If an error is encountered.
PicoLLM.reset()
Resets the internal state of the LLM. It should be called in conjunction with .forward() when processing a new sequence of tokens. This is a low-level function for benchmarking and advanced usage. .generate() should be used when possible.
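The three low-level calls (.tokenize(), .forward(), .reset()) can be combined into a greedy next-token step, sketched below. The interface is a local stand-in and the helper names are hypothetical. The sketch assumes that each method returns a promise and that the index of a logit corresponds to a token id.

```typescript
// Local stand-in for the three low-level methods described above.
// Assumption: each one returns a promise.
interface LowLevelLike {
  tokenize(text: string, bos: boolean, eos: boolean): Promise<number[]>;
  forward(token: number): Promise<number[]>;
  reset(): Promise<void>;
}

// Index of the largest logit, i.e. the greedy choice.
function argmax(logits: number[]): number {
  let best = 0;
  for (let i = 1; i < logits.length; i++) {
    if (logits[i] > logits[best]) best = i;
  }
  return best;
}

// Hypothetical helper: feeds a prompt token by token and returns the index of
// the largest logit after the last token, or -1 if the prompt yields no tokens.
async function greedyNextToken(engine: LowLevelLike, text: string): Promise<number> {
  await engine.reset(); // start a new sequence
  const tokens = await engine.tokenize(text, true, false);
  let logits: number[] = [];
  for (const token of tokens) {
    logits = await engine.forward(token); // only the last step's logits are kept
  }
  return logits.length > 0 ? argmax(logits) : -1;
}
```

Calling .reset() first matters because .forward() accumulates state across calls; without it, the new prompt would be appended to whatever sequence was processed before.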
PicoLLM.getDialog()
Returns the Dialog object corresponding to the loaded model. The model needs to be instruction-tuned and have a specific chat template.
Parameters
mode
string? : Some models (e.g., phi-2) define multiple chat template modes. For example, phi-2 allows both qa and chat templates.
history
number? : The number of latest back-and-forths to include in the prompt. Setting history to undefined will embed the entire dialog in the prompt.
system
string? : System instruction to embed in the prompt for configuring the model's responses.
Returns
Dialog : Dialog object corresponding to the loaded model.
Throws
PicoLLMError : If an error is encountered.
PicoLLM.release()
Releases resources acquired by the picoLLM Web SDK.
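Since .release() frees what the engine acquired, one way to make sure it always runs is a try/finally wrapper, sketched below. The helper name withEngine is hypothetical, and the sketch assumes that .release() returns a promise.

```typescript
// Local stand-in: anything exposing the .release() described above.
// Assumption: .release() returns a promise.
interface ReleasableLike {
  release(): Promise<void>;
}

// Hypothetical helper: runs `work` and releases the engine afterwards,
// whether `work` resolves or throws.
async function withEngine<E extends ReleasableLike, R>(
  engine: E,
  work: (engine: E) => Promise<R>
): Promise<R> {
  try {
    return await work(engine);
  } finally {
    await engine.release();
  }
}
```

The error thrown by `work` still propagates to the caller; the wrapper only guarantees that the release step is not skipped.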
PicoLLM.listAvailableDevices()
Lists all available devices that picoLLM can use for inference. Each entry in the list can be passed as the device argument of the .create method.
Returns
string[] : All available devices that picoLLM can use for inference.
Throws
PicoLLMError : If an error is encountered.
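Because each listed entry is a valid device argument, a caller can check a preferred device against the list before creating the engine. The helper below is a hypothetical sketch; it falls back to best, the value that lets picoLLM choose. The actual device strings are whatever the SDK reports and are not assumed here.

```typescript
// Hypothetical helper: returns `preferred` when it is one of the listed devices;
// otherwise falls back to "best", which lets picoLLM pick a device.
function chooseDevice(available: string[], preferred: string): string {
  return available.indexOf(preferred) >= 0 ? preferred : "best";
}
```

The returned string would then be passed as the device field of the init options.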
PicoLLM.contextLength
Model's context length.
PicoLLM.maxTopChoices
Maximum number of top choices for generate.
PicoLLM.model
Model's name.
PicoLLM.version
picoLLM version.
PicoLLMEndpoint
Enum of picoLLM endpoints.
PicoLLMModel
picoLLM model type.
modelFile
string | File | Blob | (string | File | Blob)[] : The model file, given either as a single item or as an ordered array of chunks, where each item is one of:
- URL string of the model file.
- File object containing the contents of the model file.
- Blob object containing the bytes of the model file.
cacheFilePath
string? : Custom path to save the model in storage. Set to a different name to use multiple models across picoLLM instances.
cacheFileVersion
number? : picoLLM model version. Set to a higher number to update the model file.
cacheFileOverwrite
boolean? : Flag to force overwrite the model in storage even if it exists.
numFetchRetries
number? : Number of retries to try and fetch the model file.
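A sketch of how the fields above fit together for a model served as ordered chunk URLs. The interface is a local stand-in restricted to URL strings. The helper name chunkedModel, the retry count of 3, and any URLs passed in are illustrative assumptions.

```typescript
// Local stand-in for the documented PicoLLMModel fields (URL strings only).
interface ModelConfigLike {
  modelFile: string | string[];
  cacheFilePath?: string;
  cacheFileVersion?: number;
  cacheFileOverwrite?: boolean;
  numFetchRetries?: number;
}

// Hypothetical helper: describes a model served as one or more ordered chunk
// URLs and cached under its own path so several models can coexist in storage.
function chunkedModel(name: string, chunkUrls: string[], version: number): ModelConfigLike {
  const [first, ...rest] = chunkUrls;
  if (first === undefined) {
    throw new Error("at least one chunk URL is required");
  }
  return {
    // A single URL is passed as-is; several are passed as an ordered array.
    modelFile: rest.length === 0 ? first : [first, ...rest],
    cacheFilePath: name,
    cacheFileVersion: version, // raise this number to refresh the stored copy
    numFetchRetries: 3, // illustrative value
  };
}
```

Using a distinct cacheFilePath per model is what keeps two models from overwriting each other in storage.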
PicoLLMInitOptions
picoLLM init options type.
device
string? : String representation of the device to use for inference. If set to best, picoLLM picks the most suitable device. If set to cpu, the engine will run on the CPU with the default number of threads. To specify the number of threads, set this argument to cpu:${NUM_THREADS}, where ${NUM_THREADS} is the desired number of threads. The number of threads is capped at the max available cores determined by the browser.
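The cpu:${NUM_THREADS} form can be built with a small helper such as the hypothetical one below. It mirrors the cap described above on the caller's side; the core count is passed in (in a browser it would typically come from navigator.hardwareConcurrency) so the function itself has no environment dependency.

```typescript
// Hypothetical helper: builds a "cpu:${NUM_THREADS}" device string.
// `availableCores` is supplied by the caller so the function stays
// independent of the runtime environment.
function cpuDevice(requestedThreads: number, availableCores: number): string {
  const cores = Math.max(1, Math.floor(availableCores));
  // Use at least one thread and never more than the available cores.
  const threads = Math.min(Math.max(1, Math.floor(requestedThreads)), cores);
  return `cpu:${threads}`;
}
```

For example, requesting 16 threads on a machine reporting 8 cores yields the string cpu:8.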
PicoLLMGenerateOptions
picoLLM generate options type.
completionTokenLimit
number : Maximum number of tokens in the completion. If the generation process stops due to reaching this limit, the endpoint parameter in the PicoLLMCompletion output will be PicoLLMEndpoint.COMPLETION_TOKEN_LIMIT_REACHED. Set to undefined to impose no limit.
stopPhrases
string[] : The generation process stops when it encounters any of these phrases in the completion. The already generated completion, including the encountered stop phrase, will be returned. The endpoint parameter in the PicoLLMCompletion output will be PicoLLMEndpoint.STOP_PHRASE_ENCOUNTERED. Set to undefined to turn off this feature.
seed
number : The internal random number generator uses it as its seed if set to a positive integer value. Seeding enforces deterministic outputs. Set to undefined for randomized outputs for a given prompt.
presencePenalty
number : It penalizes logits already appearing in the partial completion if set to a positive value. If set to 0 or undefined, it has no effect.
frequencyPenalty
number : If set to a positive floating-point value, it penalizes logits proportional to the frequency of their appearance in the partial completion. If set to 0 or undefined, it has no effect.
temperature
number : Sampling temperature. Temperature is a non-negative floating-point value that controls the randomness of the sampler. A higher temperature smooths the sampler's output, increasing the randomness. In contrast, a lower temperature creates a narrower distribution and reduces variability. Setting it to 0 or undefined selects the maximum logit during sampling.
topP
number : A positive floating-point number between 0 and 1. It restricts the sampler's choices to high-probability logits that form the topP portion of the probability mass. Hence, it avoids randomly selecting unlikely logits. A value of 1 or undefined enables the sampler to pick any token with non-zero probability, turning off the feature.
numTopChoices
number : If set to a positive value, picoLLM returns the list of the highest probability tokens for any generated token. Set to 0 to turn off the feature. The maximum number of top choices is .maxTopChoices.
streamCallback
(token: string) => void : If not set to undefined, picoLLM executes this callback every time a new piece of completion string becomes available.
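The value ranges described above can be checked before calling .generate(). The sketch below is a hypothetical helper, not part of the SDK. It flags values that fall outside what the descriptions cover, using a local stand-in for the options type and a maxTopChoices value supplied by the caller.

```typescript
// Local stand-in for the documented PicoLLMGenerateOptions fields checked below.
interface SamplingOptionsLike {
  seed?: number;
  temperature?: number;
  topP?: number;
  numTopChoices?: number;
}

// Hypothetical helper: returns a list of problems; an empty list means the
// options stay within the ranges described above.
function checkSamplingOptions(options: SamplingOptionsLike, maxTopChoices: number): string[] {
  const problems: string[] = [];
  const { seed, temperature, topP, numTopChoices } = options;
  if (seed !== undefined && (seed <= 0 || Math.floor(seed) !== seed)) {
    problems.push("seed should be a positive integer or undefined");
  }
  if (temperature !== undefined && temperature < 0) {
    problems.push("temperature should be non-negative");
  }
  if (topP !== undefined && (topP <= 0 || topP > 1)) {
    problems.push("topP should be greater than 0 and at most 1");
  }
  if (numTopChoices !== undefined && (numTopChoices < 0 || numTopChoices > maxTopChoices)) {
    problems.push("numTopChoices should be between 0 and maxTopChoices");
  }
  return problems;
}
```

A fixed positive seed combined with stop phrases and a token limit is a common way to get repeatable, bounded output; leaving seed undefined gives randomized output for the same prompt.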
PicoLLMUsage
picoLLM usage type.
promptTokens
number : Number of tokens in the prompt.
completionTokens
number : Number of tokens in the completion.
PicoLLMToken
picoLLM token type.
token
string : Token string.
logProb
number : Log probability.
PicoLLMCompletionToken
picoLLM completion token type.
token
PicoLLMToken : Generated PicoLLMToken.
topChoices
PicoLLMToken[] : Top PicoLLMToken choices.
PicoLLMCompletion
picoLLM completion type.
usage
PicoLLMUsage : Usage information.
endpoint
PicoLLMEndpoint : Reason for ending the generation process.
completionTokens
PicoLLMCompletionToken[] : Generated tokens within completion and top alternative tokens.
completion
string : Completion string.
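The per-token log probabilities in completionTokens allow simple statistics over a completion. As an illustration, the sketch below computes perplexity, the exponential of the negative mean log probability. The types are local stand-ins, the helper name is hypothetical, and logProb is assumed to be a natural logarithm.

```typescript
// Local stand-ins for the documented token types.
interface TokenLike {
  token: string;
  logProb: number;
}

interface CompletionTokenLike {
  token: TokenLike;
  topChoices: TokenLike[];
}

// Hypothetical helper: perplexity of the generated tokens, i.e. the exponential
// of the negative mean log probability. Assumes natural-log probabilities.
// Returns NaN for an empty token list.
function perplexity(tokens: CompletionTokenLike[]): number {
  if (tokens.length === 0) return NaN;
  let sum = 0;
  for (const t of tokens) {
    sum += t.token.logProb;
  }
  return Math.exp(-sum / tokens.length);
}
```

Lower values indicate that the model assigned higher probability to the tokens it produced.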
PicoLLMError
Error thrown if an error occurs within the picoLLM Inference Engine.
Errors:
PicoLLMWorker
Class for running the picoLLM Inference Engine on a Web Worker.
PicoLLMWorker.create()
Creates an instance of PicoLLMWorker using a .pllm file in the public directory. Because the model file is large, the engine reuses the copy already in storage if one exists; otherwise, it saves the model to storage.
Parameters
accessKey
string : AccessKey obtained from Picovoice Console.
model
PicoLLMModel : picoLLM model representation, see PicoLLMModel for details.
options
PicoLLMInitOptions : Optional init configuration arguments, see PicoLLMInitOptions for details.
Returns
PicoLLMWorker : An instance of the picoLLM Inference Engine on a Web Worker.
Throws
PicoLLMError : If an error is encountered.
PicoLLMWorker.generate()
Given a text prompt and a set of generation parameters, creates a completion text and relevant metadata.
Parameters
prompt
string : Prompt text.
options
PicoLLMGenerateOptions : Generate configuration arguments, see PicoLLMGenerateOptions for details.
Returns
PicoLLMCompletion : Completion result.
Throws
PicoLLMError : If an error is encountered.
PicoLLMWorker.interrupt()
Interrupts .generate() if generation is in progress. Otherwise, it has no effect.
Throws
PicoLLMError : If interrupt fails.
PicoLLMWorker.tokenize()
Tokenizes a given text using the model's tokenizer. This is a low-level function meant for benchmarking and advanced usage. .generate() should be used when possible.
Parameters
text
string : Text string.
bos
boolean : If set to true, the tokenizer prepends the beginning-of-sentence token to the result.
eos
boolean : If set to true, the tokenizer appends the end-of-sentence token to the result.
Returns
number[] : Tokens representing the input text.
Throws
PicoLLMError : If an error is encountered.
PicoLLMWorker.forward()
Performs a single forward pass given a token and returns the logits. This is a low-level function for benchmarking and advanced usage. .generate() should be used when possible.
Parameters
token
number : Input token.
Returns
number[] : Logits.
Throws
PicoLLMError : If an error is encountered.
PicoLLMWorker.reset()
Resets the internal state of the LLM. It should be called in conjunction with .forward() when processing a new sequence of tokens. This is a low-level function for benchmarking and advanced usage. .generate() should be used when possible.
Throws
PicoLLMError : If an error is encountered.
PicoLLMWorker.getDialog()
Returns the Dialog object corresponding to the loaded model. The model needs to be instruction-tuned and have a specific chat template.
Parameters
mode
string? : Some models (e.g., phi-2) define multiple chat template modes. For example, phi-2 allows both qa and chat templates.
history
number? : The number of latest back-and-forths to include in the prompt. Setting history to undefined will embed the entire dialog in the prompt.
system
string? : System instruction to embed in the prompt for configuring the model's responses.
Returns
Dialog : Dialog object corresponding to the loaded model.
PicoLLMWorker.release()
Releases resources acquired by the picoLLM Web SDK.
PicoLLMWorker.listAvailableDevices()
Lists all available devices that picoLLM can use for inference. Each entry in the list can be passed as the device argument of the .create method.
Returns
string[] : All available devices that picoLLM can use for inference.
PicoLLMWorker.contextLength
Model's context length.
PicoLLMWorker.maxTopChoices
Maximum number of top choices for generate.
PicoLLMWorker.model
Model's name.
PicoLLMWorker.version
picoLLM version.
Dialog
Dialog is a helper class that stores a chat dialog and formats it according to an instruction-tuned LLM's chat template. Dialog is the base class. Each supported instruction-tuned LLM has an accompanying concrete subclass.
Dialog.constructor
Dialog constructor.
Parameters
history
number? : The number of latest back-and-forths to include in the prompt. Setting history to undefined will embed the entire dialog in the prompt.
system
string? : Instruction to embed in the prompt for configuring the model's responses.
Returns
Dialog : An instance of the Dialog class.
Dialog.addHumanRequest
Adds human's request to the dialog.
Parameters
content
string : Human's request.
Dialog.addLLMResponse
Adds LLM's response to the dialog.
Parameters
content
string : LLM's response.
Dialog.prompt
Creates a prompt string given the parameters passed to the constructor and the dialog's content.
Returns
string : Formatted prompt.
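The Dialog methods above combine with .generate() into a chat turn: add the request, format the prompt, generate, then record the response. The sketch below uses local stand-in interfaces and a hypothetical helper named chatTurn. It assumes that prompt is invoked as a method and that .generate() returns a promise.

```typescript
// Local stand-in for the Dialog members described above.
// Assumption: prompt is invoked as a method returning the formatted string.
interface DialogLike {
  addHumanRequest(content: string): void;
  addLLMResponse(content: string): void;
  prompt(): string;
}

// Local stand-in for the engine; only .generate() is needed here.
interface ChatEngineLike {
  generate(prompt: string): Promise<{ completion: string }>;
}

// Hypothetical helper: performs one request/response exchange and keeps the
// dialog in sync so the next turn includes this one in its prompt.
async function chatTurn(
  engine: ChatEngineLike,
  dialog: DialogLike,
  request: string
): Promise<string> {
  dialog.addHumanRequest(request);
  const result = await engine.generate(dialog.prompt());
  dialog.addLLMResponse(result.completion);
  return result.completion;
}
```

Recording the response with addLLMResponse is what lets the history setting decide how many earlier exchanges are embedded in later prompts.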
GemmaChatDialog
Dialog helper for gemma-2b-it.
Llama2ChatDialog
Dialog helper for llama-2-7b-chat.
Llama3ChatDialog
Dialog helper for llama-3-8b-instruct.
MistralChatDialog
Dialog helper for mistral-7b-instruct-v0.1 and mistral-7b-instruct-v0.2.
MixtralChatDialog
Dialog helper for mixtral-8x7b-instruct-v0.1.
Phi2Dialog
Base class for the phi-2 dialog helpers.
Phi2Dialog.constructor
Phi2Dialog constructor.
Parameters
humanTag
string : Tag to classify human requests.
llmTag
string : Tag to classify LLM responses.
history
number? : The number of latest back-and-forths to include in the prompt. Setting history to undefined will embed the entire dialog in the prompt.
system
string? : Instruction to embed in the prompt for configuring the model's responses.
Returns
Phi2Dialog : An instance of the Phi2Dialog class.
Phi2QADialog
Dialog helper for phi-2 qa mode.
Phi2ChatDialog
Dialog helper for phi-2 chat mode.