picoLLM Inference Engine
Web API

API Reference for the picoLLM Web SDK (npm)


PicoLLM

class PicoLLM {}

Class for the picoLLM Inference Engine.


PicoLLM.create()

static async function create(
accessKey: string,
model: PicoLLMModel,
options: PicoLLMInitOptions = {}
): Promise<PicoLLM>

Creates an instance of the picoLLM Inference Engine from a .pllm model file (e.g., one served from the app's public directory). Because model files are large, an existing copy in storage is reused if present; otherwise the model is downloaded and saved to storage.

Parameters

  • accessKey string : AccessKey obtained from Picovoice Console.
  • model PicoLLMModel : picoLLM model representation, see PicoLLMModel for details.
  • options PicoLLMInitOptions : Optional init configuration arguments, see PicoLLMInitOptions for details.

Returns

  • PicoLLM : An instance of the picoLLM Inference Engine.

Throws

  • PicoLLMError: If an error is encountered.
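
Below is a minimal initialization sketch based on the signature above, assuming the SDK's npm package name is @picovoice/picollm-web; the AccessKey placeholder and the model path /models/model.pllm are hypothetical and should be replaced with your own values.

import { PicoLLM } from '@picovoice/picollm-web';

// AccessKey from Picovoice Console (placeholder value).
const accessKey = '${ACCESS_KEY}';

// Load a .pllm file served from the app's public directory.
// On first load the model is saved to browser storage and reused afterwards.
const picoLLM = await PicoLLM.create(accessKey, {
  modelFile: '/models/model.pllm',
});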

PicoLLM.generate()

async function generate(
prompt: string,
options: PicoLLMGenerateOptions = {}
): Promise<PicoLLMCompletion>

Given a text prompt and a set of generation parameters, creates a completion text and relevant metadata.

Parameters

  • prompt string : Prompt text.
  • options PicoLLMGenerateOptions : Generate configuration arguments, see PicoLLMGenerateOptions for details.

Returns

  • PicoLLMCompletion : Completion result.

Throws

  • PicoLLMError: If an error is encountered.
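
A short usage sketch, assuming picoLLM is an instance created as shown earlier; the prompt text and option values are illustrative only.

let streamed = '';
const res = await picoLLM.generate('Explain edge AI in one sentence.', {
  completionTokenLimit: 128,                         // cap the completion length
  temperature: 0.7,                                  // mild sampling randomness
  stopPhrases: ['\n\n'],                             // stop early at a blank line
  streamCallback: (token) => { streamed += token; }, // receive pieces as they arrive
});
console.log(res.completion); // full completion text
console.log(res.endpoint);   // why generation stopped (a PicoLLMEndpoint value)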

PicoLLM.interrupt()

function interrupt(): void

Interrupts .generate() if generation is in progress. Otherwise, it has no effect.

Throws

  • PicoLLMError: If interrupt fails.

PicoLLM.tokenize()

async function tokenize(
text: string,
bos: boolean,
eos: boolean
): Promise<number[]>

Tokenizes a given text using the model's tokenizer. This is a low-level function meant for benchmarking and advanced usage. .generate() should be used when possible.

Parameters

  • text string : Text string.
  • bos boolean : If set to true, the tokenizer prepends the beginning-of-sentence token to the result.
  • eos boolean : If set to true, the tokenizer appends the end-of-sentence token to the result.

Returns

  • number[] : Tokens representing the input text.

Throws

  • PicoLLMError: If an error is encountered.

PicoLLM.forward()

async function forward(token: number): Promise<number[]>

Performs a single forward pass given a token and returns the logits. This is a low-level function for benchmarking and advanced usage. .generate() should be used when possible.

Parameters

  • token number : Input token.

Returns

  • number[] : Logits.

Throws

  • PicoLLMError: If an error is encountered.

PicoLLM.reset()

async function reset(): Promise<void>

Resets the internal state of the LLM. It should be called in conjunction with .forward() when processing a new sequence of tokens. This is a low-level function for benchmarking and advanced usage. .generate() should be used when possible.
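
The low-level sketch below combines .tokenize(), .forward(), and .reset() to greedily pick a next token; it assumes an existing picoLLM instance and is for illustration only, since .generate() covers the common case.

// Start a fresh sequence.
await picoLLM.reset();

// Tokenize the prompt, prepending the beginning-of-sentence token.
const tokens = await picoLLM.tokenize('Hello, world', true, false);

// Feed the tokens one at a time; the logits from the last call
// score every candidate for the next token.
let logits: number[] = [];
for (const token of tokens) {
  logits = await picoLLM.forward(token);
}

// Greedy decoding: the index of the largest logit is the most likely next token.
const nextToken = logits.indexOf(Math.max(...logits));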


PicoLLM.getDialog()

function getDialog(
mode?: string,
history?: number,
system?: string
): Dialog

Returns a Dialog object corresponding to the loaded model. The model needs to be instruction-tuned and have a specific chat template.

Parameters

  • mode string? : Some models (e.g., phi-2) define multiple chat template modes. For example, phi-2 allows both qa and chat templates.
  • history number? : The number of latest back-and-forths to include in the prompt. Setting history to undefined embeds the entire dialog in the prompt.
  • system string? : System instruction to embed in the prompt for configuring the model's responses.

Returns

  • Dialog : Constructed dialog object.

Throws

  • PicoLLMError: If an error is encountered.
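
A sketch of a multi-turn chat loop using .getDialog() together with .generate(); it assumes an instruction-tuned model is loaded and an existing picoLLM instance.

const dialog = picoLLM.getDialog();

// First turn: dialog.prompt() formats the history with the model's chat template.
dialog.addHumanRequest('What is the capital of France?');
const first = await picoLLM.generate(dialog.prompt());
dialog.addLLMResponse(first.completion);

// Second turn: the prompt now includes the earlier exchange.
dialog.addHumanRequest('And roughly how many people live there?');
const second = await picoLLM.generate(dialog.prompt());
dialog.addLLMResponse(second.completion);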

PicoLLM.release()

async function release(): Promise<void>

Releases resources acquired by the picoLLM Web SDK.


PicoLLM.listAvailableDevices()

static async function listAvailableDevices(): Promise<string[]>

Lists all available devices that picoLLM can use for inference. Each entry in the list can be used as the device option of the .create() method.

Returns

  • string[] : All available devices that picoLLM can use for inference.

Throws

  • PicoLLMError: If an error is encountered.

PicoLLM.contextLength

get contextLength(): number

Model's context length.


PicoLLM.maxTopChoices

get maxTopChoices(): number

Maximum number of top choices for .generate().


PicoLLM.model

get model(): string

Model's name.


PicoLLM.version

get version(): string

picoLLM version.


PicoLLMEndpoint

enum PicoLLMEndpoint {
END_OF_SENTENCE,
COMPLETION_TOKEN_LIMIT_REACHED,
STOP_PHRASE_ENCOUNTERED,
}

Enum of picoLLM endpoints, indicating the reason the generation process ended.


PicoLLMModel

type PicoLLMModel = {
modelFile: string | File | Blob | (string | File | Blob)[];
cacheFilePath?: string;
cacheFileVersion?: number;
cacheFileOverwrite?: boolean;
numFetchRetries?: number;
}

picoLLM model type.

  • modelFile string | File | Blob | (string | File | Blob)[]: The model file can be one of, or an ordered array of chunks of, the following:
    • URL string of the model file.
    • File object containing the contents of the model file.
    • Blob object containing the bytes of the model file.
  • cacheFilePath string?: Custom path to save the model in storage. Set to a different name to use multiple models across picoLLM instances.
  • cacheFileVersion number? : picoLLM model version. Set to a higher number to update the model file.
  • cacheFileOverwrite boolean? : Flag to force overwrite the model in storage even if it exists.
  • numFetchRetries number? : Number of times to retry fetching the model file.

PicoLLMInitOptions

type PicoLLMInitOptions = {
device?: string;
};

picoLLM init options type.

  • device string? : String representation of the device to use for inference. If set to best, picoLLM picks the most suitable device. If set to cpu, the engine will run on the CPU with the default number of threads. To specify the number of threads, set this argument to cpu:${NUM_THREADS}, where ${NUM_THREADS} is the desired number of threads. The number of threads is capped at the max available cores determined by the browser.
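
A sketch of selecting a device explicitly, assuming accessKey and a PicoLLMModel object model are defined as in the earlier examples; the device strings returned by .listAvailableDevices() vary by browser, and the thread count below is an arbitrary example.

// Inspect which devices this browser can use.
const devices = await PicoLLM.listAvailableDevices();
console.log(devices);

// Run inference on the CPU with four threads.
const picoLLM = await PicoLLM.create(accessKey, model, {
  device: 'cpu:4',
});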

PicoLLMGenerateOptions

type PicoLLMGenerateOptions = {
completionTokenLimit?: number;
stopPhrases?: string[];
seed?: number;
presencePenalty?: number;
frequencyPenalty?: number;
temperature?: number;
topP?: number;
numTopChoices?: number;
streamCallback?: (token: string) => void;
}

picoLLM generate options type.

  • completionTokenLimit number : Maximum number of tokens in the completion. If the generation process stops due to reaching this limit, the .endpoint parameter in PicoLLMCompletion output will be PicoLLMEndpoint.COMPLETION_TOKEN_LIMIT_REACHED. Set to undefined to impose no limit.
  • stopPhrases string[] : The generation process stops when it encounters any of these phrases in the completion. The already generated completion, including the encountered stop phrase, will be returned. The endpoint parameter in PicoLLMCompletion output will be PicoLLMEndpoint.STOP_PHRASE_ENCOUNTERED. Set to undefined to turn off this feature.
  • seed number : If set to a positive integer, it is used as the seed for the internal random number generator, making outputs deterministic. Set to undefined for randomized outputs for a given prompt.
  • presencePenalty number : If set to a positive value, it penalizes logits of tokens already appearing in the partial completion. If set to 0 or undefined, it has no effect.
  • frequencyPenalty number : If set to a positive floating-point value, it penalizes logits proportional to the frequency of their appearance in the partial completion. If set to 0 or undefined, it has no effect.
  • temperature number : Sampling temperature. Temperature is a non-negative floating-point value that controls the randomness of the sampler. A higher temperature flattens the sampler's output distribution, increasing randomness, while a lower temperature creates a narrower distribution and reduces variability. Setting it to 0 or undefined selects the maximum logit during sampling (greedy decoding).
  • topP number : A positive floating-point number within (0, 1]. It restricts the sampler's choices to the high-probability logits that form the topP portion of the probability mass, avoiding randomly selected unlikely logits. A value of 1 or undefined allows the sampler to pick any token with non-zero probability, turning off the feature.
  • numTopChoices number : If set to a positive value, picoLLM returns the list of the highest probability tokens for any generated token. Set to 0 to turn off the feature. The maximum number of top choices is .maxTopChoices.
  • streamCallback (token: string) => void : If not set to undefined, picoLLM executes this callback every time a new piece of completion string becomes available.

PicoLLMUsage

type PicoLLMUsage = {
promptTokens: number;
completionTokens: number;
}

picoLLM usage type.

  • promptTokens number : Number of tokens in the prompt.
  • completionTokens number : Number of tokens in the completion.

PicoLLMToken

type PicoLLMToken = {
token: string;
logProb: number;
}

picoLLM token type.

  • token string : Token string.
  • logProb number : Log probability.

PicoLLMCompletionToken

type PicoLLMCompletionToken = {
token: PicoLLMToken;
topChoices: PicoLLMToken[];
}

picoLLM completion token type.

  • token PicoLLMToken : The generated token.
  • topChoices PicoLLMToken[] : The highest-probability alternative tokens at this position (populated when numTopChoices is set).

PicoLLMCompletion

type PicoLLMCompletion = {
usage: PicoLLMUsage;
endpoint: PicoLLMEndpoint;
completionTokens: PicoLLMCompletionToken[];
completion: string;
}

picoLLM completion type.

  • usage PicoLLMUsage : Usage information.
  • endpoint PicoLLMEndpoint : Reason for ending the generation process.
  • completionTokens PicoLLMCompletionToken[] : Generated tokens within completion and top alternative tokens.
  • completion string : Completion string.
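
A sketch of inspecting a PicoLLMCompletion, assuming an existing picoLLM instance; numTopChoices is enabled so each generated token carries its top alternatives.

const res = await picoLLM.generate('The sky is', { numTopChoices: 3 });

console.log(res.usage.promptTokens, res.usage.completionTokens);

for (const ct of res.completionTokens) {
  // The sampled token and its log probability.
  console.log(ct.token.token, ct.token.logProb);
  // The highest-probability alternatives at this position.
  for (const alt of ct.topChoices) {
    console.log('  alt:', alt.token, alt.logProb);
  }
}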

PicoLLMError

class PicoLLMError extends Error { }

Error thrown if an error occurs within the picoLLM Inference Engine.

Errors:

class PicoLLMActivationError extends PicoLLMError { }
class PicoLLMActivationLimitError extends PicoLLMError { }
class PicoLLMActivationRefusedError extends PicoLLMError { }
class PicoLLMActivationThrottledError extends PicoLLMError { }
class PicoLLMIOError extends PicoLLMError { }
class PicoLLMInvalidArgumentError extends PicoLLMError { }
class PicoLLMInvalidStateError extends PicoLLMError { }
class PicoLLMKeyError extends PicoLLMError { }
class PicoLLMMemoryError extends PicoLLMError { }
class PicoLLMRuntimeError extends PicoLLMError { }
class PicoLLMStopIterationError extends PicoLLMError { }
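
A sketch of handling specific error subclasses, assuming these classes are exported by the Web SDK package alongside PicoLLM, and that accessKey and model are defined elsewhere.

import {
  PicoLLM,
  PicoLLMError,
  PicoLLMActivationLimitError,
} from '@picovoice/picollm-web';

try {
  const picoLLM = await PicoLLM.create(accessKey, model);
} catch (e) {
  if (e instanceof PicoLLMActivationLimitError) {
    // The AccessKey has reached its activation limit.
  } else if (e instanceof PicoLLMError) {
    console.error(e.message);
  } else {
    throw e;
  }
}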

PicoLLMWorker

class PicoLLMWorker {}

Class for running the picoLLM Inference Engine on a Web Worker.


PicoLLMWorker.create()

static async function create(
accessKey: string,
model: PicoLLMModel,
options: PicoLLMInitOptions = {}
): Promise<PicoLLMWorker>

Creates an instance of PicoLLMWorker from a .pllm model file (e.g., one served from the app's public directory). Because model files are large, an existing copy in storage is reused if present; otherwise the model is downloaded and saved to storage.

Parameters

  • accessKey string : AccessKey obtained from Picovoice Console.
  • model PicoLLMModel : picoLLM model representation, see PicoLLMModel for details.
  • options PicoLLMInitOptions : Optional init configuration arguments, see PicoLLMInitOptions for details.

Returns

  • PicoLLMWorker : An instance of the picoLLM Inference Engine on a Web Worker.

Throws

  • PicoLLMError: If an error is encountered.
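
A sketch of running inference off the main thread with PicoLLMWorker, which keeps the page responsive during generation; the AccessKey placeholder and model path are hypothetical.

import { PicoLLMWorker } from '@picovoice/picollm-web';

const worker = await PicoLLMWorker.create('${ACCESS_KEY}', {
  modelFile: '/models/model.pllm',
});

const res = await worker.generate('Write a haiku about the sea.');
console.log(res.completion);

await worker.release();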

PicoLLMWorker.generate()

async function generate(
prompt: string,
options: PicoLLMGenerateOptions = {}
): Promise<PicoLLMCompletion>

Given a text prompt and a set of generation parameters, creates a completion text and relevant metadata.

Parameters

  • prompt string : Prompt text.
  • options PicoLLMGenerateOptions : Generate configuration arguments, see PicoLLMGenerateOptions for details.

Returns

  • PicoLLMCompletion : Completion result.

Throws

  • PicoLLMError: If an error is encountered.

PicoLLMWorker.interrupt()

function interrupt(): void

Interrupts .generate() if generation is in progress. Otherwise, it has no effect.

Throws

  • PicoLLMError: If interrupt fails.

PicoLLMWorker.tokenize()

async function tokenize(
text: string,
bos: boolean,
eos: boolean
): Promise<number[]>

Tokenizes a given text using the model's tokenizer. This is a low-level function meant for benchmarking and advanced usage. .generate() should be used when possible.

Parameters

  • text string : Text string.
  • bos boolean : If set to true, the tokenizer prepends the beginning-of-sentence token to the result.
  • eos boolean : If set to true, the tokenizer appends the end-of-sentence token to the result.

Returns

  • number[] : Tokens representing the input text.

Throws

  • PicoLLMError: If an error is encountered.

PicoLLMWorker.forward()

async function forward(token: number): Promise<number[]>

Performs a single forward pass given a token and returns the logits. This is a low-level function for benchmarking and advanced usage. .generate() should be used when possible.

Parameters

  • token number : Input token.

Returns

  • number[] : Logits.

Throws

  • PicoLLMError: If an error is encountered.

PicoLLMWorker.reset()

async function reset(): Promise<void>

Resets the internal state of the LLM. It should be called in conjunction with .forward() when processing a new sequence of tokens. This is a low-level function for benchmarking and advanced usage. .generate() should be used when possible.

Throws

  • PicoLLMError: If an error is encountered.

PicoLLMWorker.getDialog()

function getDialog(
mode?: string,
history?: number,
system?: string
): Dialog

Returns a Dialog object corresponding to the loaded model. The model needs to be instruction-tuned and have a specific chat template.

Parameters

  • mode string? : Some models (e.g., phi-2) define multiple chat template modes. For example, phi-2 allows both qa and chat templates.
  • history number? : The number of latest back-and-forths to include in the prompt. Setting history to undefined embeds the entire dialog in the prompt.
  • system string? : System instruction to embed in the prompt for configuring the model's responses.

Returns

  • Dialog : Constructed dialog object.

PicoLLMWorker.release()

async function release(): Promise<void>

Releases resources acquired by the picoLLM Web SDK.


PicoLLMWorker.listAvailableDevices()

static async function listAvailableDevices(): Promise<string[]>

Lists all available devices that picoLLM can use for inference. Each entry in the list can be used as the device option of the .create() method.

Returns

  • string[] : All available devices that picoLLM can use for inference.

PicoLLMWorker.contextLength

get contextLength(): number

Model's context length.


PicoLLMWorker.maxTopChoices

get maxTopChoices(): number

Maximum number of top choices for .generate().


PicoLLMWorker.model

get model(): string

Model's name.


PicoLLMWorker.version

get version(): string

picoLLM version.


Dialog

class Dialog {}

Dialog is a helper class that stores a chat dialog and formats it according to an instruction-tuned LLM's chat template. Dialog is the base class. Each supported instruction-tuned LLM has an accompanying concrete subclass.


Dialog.constructor

Dialog.constructor(
history?: number,
system?: string
)

Dialog constructor.

Parameters

  • history number? : The number of latest back-and-forths to include in the prompt. Setting history to undefined will embed the entire dialog in the prompt.
  • system string? : Instruction to embed in the prompt for configuring the model's responses.

Returns

  • Dialog: An instance of Dialog class.

Dialog.addHumanRequest

function addHumanRequest(content: string): void

Adds the human's request to the dialog.

Parameters

  • content string : Human's request.

Dialog.addLLMResponse

function addLLMResponse(content: string): void

Adds the LLM's response to the dialog.

Parameters

  • content string : LLM's response.

Dialog.prompt

function prompt(): string

Creates a prompt string given the parameters passed to the constructor and the dialog's content.

Returns

  • string : Formatted prompt.
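
Concrete subclasses (listed below) can also be constructed directly instead of via .getDialog(); here is a sketch assuming Llama3ChatDialog is exported by the same package.

import { Llama3ChatDialog } from '@picovoice/picollm-web';

// Keep only the last two exchanges and embed a system instruction.
const dialog = new Llama3ChatDialog(2, 'You are a terse assistant.');
dialog.addHumanRequest('Name three prime numbers.');

// The prompt is formatted according to llama-3's chat template.
console.log(dialog.prompt());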

GemmaChatDialog

class GemmaChatDialog extends Dialog {}

Dialog helper for gemma-2b-it.


Llama2ChatDialog

class Llama2ChatDialog extends Dialog {}

Dialog helper for llama-2-7b-chat.


Llama3ChatDialog

class Llama3ChatDialog extends Dialog {}

Dialog helper for llama-3-8b-instruct.


Llama32ChatDialog

class Llama32ChatDialog extends Llama3ChatDialog {}

Dialog helper for llama-3.2-1b-instruct and llama-3.2-3b-instruct.


MistralChatDialog

class MistralChatDialog extends Dialog {}

Dialog helper for mistral-7b-instruct-v0.1 and mistral-7b-instruct-v0.2.


MixtralChatDialog

class MixtralChatDialog extends Dialog {}

Dialog helper for mixtral-8x7b-instruct-v0.1.


Phi2Dialog

class Phi2Dialog extends Dialog {}

Base dialog helper class for phi-2.


Phi2Dialog.constructor

Phi2Dialog.constructor(
humanTag: string,
llmTag: string,
history?: number,
system?: string
)

Phi2Dialog constructor.

Parameters

  • humanTag string : Tag to classify human requests.
  • llmTag string : Tag to classify LLM responses.
  • history number? : The number of latest back-and-forths to include in the prompt. Setting history to undefined will embed the entire dialog in the prompt.
  • system string? : Instruction to embed in the prompt for configuring the model's responses.

Returns

  • Phi2Dialog: An instance of Phi2Dialog class.

Phi2QADialog

class Phi2QADialog extends Phi2Dialog {}

Dialog helper for phi-2 qa mode.


Phi2ChatDialog

class Phi2ChatDialog extends Phi2Dialog {}

Dialog helper for phi-2 chat mode.


Phi3ChatDialog

class Phi3ChatDialog extends Dialog {}

Dialog helper for phi3.


Phi35ChatDialog

class Phi35ChatDialog extends Phi3ChatDialog {}

Dialog helper for phi3.5.
