Large Language Models (LLMs), such as Llama 2 and Llama 3, represent significant advancements in artificial intelligence, fundamentally changing how we interact with technology. These AI models can understand and generate human-like text, making them useful for applications like chatbots, voice assistants, and natural language processing.
One major drawback of LLMs is their high resource requirements. However, offline LLMs that run cross-platform (Windows, Mac, Linux) can take advantage of the CPUs and GPUs already available on users' machines. Running on local hardware eliminates network latency and addresses privacy concerns, since data never leaves the user's device. By running models like Llama 2 or Llama 3 locally, users gain enhanced privacy, reliability, and efficiency without needing an internet connection.
Picovoice's picoLLM Inference engine makes it easy to perform offline LLM inference. picoLLM Inference is a lightweight inference engine that operates locally, ensuring privacy compliance with regulations such as GDPR and HIPAA. Llama models compressed with picoLLM Compression have a smaller footprint, making them ideal for real-time applications.
In just a few lines of code, you can start performing LLM inference using the picoLLM Inference Node.js SDK. Let’s get started!
Before Running Llama with Node.js
Install Packages
Create a project and install @picovoice/picollm-node.
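For example, using npm from your project directory:

```console
npm install @picovoice/picollm-node
```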
Sign up for Picovoice Console
Next, create a Picovoice Console account and copy your AccessKey from the main dashboard. Creating an account is free, and no credit card is required!
Model File
Download any Llama 2 or Llama 3 model file (.pllm) from the picoLLM page on Picovoice Console and place the file in your project.
Building a Simple Application with Llama
Create an instance of picoLLM with your AccessKey and model file (.pllm):
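A minimal sketch is shown below; the AccessKey and model path values are placeholders that you should replace with your own:

```javascript
const { PicoLLM } = require("@picovoice/picollm-node");

// Placeholders: substitute your AccessKey from Picovoice Console
// and the path to the .pllm file you downloaded.
const accessKey = "${ACCESS_KEY}";
const modelPath = "${MODEL_PATH}";

// Create the picoLLM instance used in the rest of this article
const pllm = new PicoLLM(accessKey, modelPath);
```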
Pass your prompt to the generate function. You may also pass a streamCallback function to handle response tokens as soon as they are available:
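Continuing with the instance created above, a sketch might look like the following (assuming generate is asynchronous and resolves to a result containing the full completion; the prompt text is illustrative):

```javascript
(async () => {
  const res = await pllm.generate("Give me three fun facts about dolphins.", {
    // Print each response token as soon as it is produced
    streamCallback: (token) => process.stdout.write(token),
  });

  // The full generated text is also available once generation finishes
  console.log(res.completion);
})();
```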
There are many configuration options in addition to streamCallback. For the full list of options, check out the picoLLM Inference API docs.
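As a rough illustration, a call with a few such options could look like this; the option names below are assumptions, so confirm the exact names in the API docs:

```javascript
// Illustrative options (assumed names); run inside an async context
const res = await pllm.generate("Summarize the benefits of on-device LLMs.", {
  completionTokenLimit: 256, // cap the length of the response
  temperature: 0.7,          // control randomness of sampling
});
console.log(res.completion);
```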
When done, be sure to release the resources explicitly:
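With the instance from above, that looks like:

```javascript
// Free the resources held by the picoLLM instance
pllm.release();
```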
For a complete working project, take a look at the picoLLM Inference Node.js Demo. You can also view the picoLLM Inference Node.js API docs for details on the package.