Local LLM for Windows, Mac, Linux: Run Llama with Node.js

Large Language Models (LLMs), such as Llama 2 and Llama 3, represent a significant advance in artificial intelligence and are changing how we interact with technology. These models can understand and generate human-like text, making them useful for applications like chatbots, voice assistants, and other natural language processing tasks.

One major drawback of LLMs is their high resource requirements. Running LLMs offline on Windows, Mac, or Linux lets them use the CPUs and GPUs already available on local machines. Using local hardware also eliminates network latency and addresses privacy concerns, since data stays on the user's device. By running models like Llama 2 or Llama 3 locally, users gain privacy, reliability, and efficiency without needing an internet connection.

Picovoice's picoLLM Inference engine makes it easy to perform offline LLM inference. It is a lightweight inference engine that runs locally, so prompts and completions stay on the device, which supports compliance with privacy regulations such as GDPR and HIPAA. Llama models compressed by picoLLM Compression have a smaller footprint, which makes them well suited to real-time applications.

In just a few lines of code, you can start performing LLM inference using the picoLLM Inference Node.js SDK. Let’s get started!

Before Running Llama with Node.js

Install Packages

Create a project and install @picovoice/picollm-node.

npm install @picovoice/picollm-node

Sign up for Picovoice Console

Next, create a Picovoice Console account, and copy your AccessKey from the main dashboard. Creating an account is free, and no credit card is required!
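
An AccessKey pasted directly into source code is easy to leak through version control. As a minimal sketch, the key can be read from an environment variable instead. The helper name getAccessKey and the variable name PICOVOICE_ACCESS_KEY are assumptions made for illustration; neither is part of the SDK.

```javascript
// Hypothetical helper: reads the AccessKey from an environment variable.
// PICOVOICE_ACCESS_KEY is an assumed name, not an SDK convention.
function getAccessKey(env = process.env) {
  const key = (env.PICOVOICE_ACCESS_KEY || "").trim();
  if (key.length === 0) {
    throw new Error(
      "Set PICOVOICE_ACCESS_KEY to the AccessKey from Picovoice Console"
    );
  }
  return key;
}
```

The returned string can then be passed wherever the snippets in this article use the "${ACCESS_KEY}" placeholder.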

Model File

Download any Llama 2 or Llama 3 model file (.pllm) from the picoLLM page on Picovoice Console and place the file in your project.
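
A wrong or missing model path is a common first stumbling block, so it can help to check the file before handing it to the engine. The sketch below uses only Node.js built-ins; resolveModelPath is a hypothetical helper, not an SDK function.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical helper: resolves the model path and fails early with a
// clear message if the file is not a .pllm file or does not exist.
function resolveModelPath(modelPath) {
  const resolved = path.resolve(modelPath);
  if (path.extname(resolved).toLowerCase() !== ".pllm") {
    throw new Error(`Expected a .pllm model file, got: ${resolved}`);
  }
  if (!fs.existsSync(resolved)) {
    throw new Error(`Model file not found: ${resolved}`);
  }
  return resolved;
}
```

The absolute path it returns can be used in place of the "${MODEL_PATH}" placeholder in the next section.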

Building a Simple Application with Llama

Create an instance of picoLLM with your AccessKey and model file (.pllm):

const { PicoLLM } = require("@picovoice/picollm-node");

const pllm = new PicoLLM(
  "${ACCESS_KEY}", // Replace with your Picovoice AccessKey
  "${MODEL_PATH}"  // Replace with the path to the downloaded model file
);

Pass your prompt to the generate function. generate is asynchronous, so await it from inside an async function. You may also pass a streamCallback function to handle response tokens as soon as they are available:

const result = await pllm.generate("${PROMPT}", {
  streamCallback: (token) => process.stdout.write(token)
});
console.log(result.completion); // the full generated text
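
If you want to both display tokens as they arrive and keep the full text for later use, the streamCallback can accumulate tokens while forwarding them. The following is a sketch rather than an SDK feature: generateStreaming and its write parameter are hypothetical names, and the only assumption about the engine is that its generate method accepts a streamCallback option, as shown above.

```javascript
// Sketch: stream tokens to a writer while also collecting the full text.
// `engine` is any object whose generate(prompt, options) method accepts a
// streamCallback option, such as the PicoLLM instance created earlier.
async function generateStreaming(
  engine,
  prompt,
  write = (token) => process.stdout.write(token)
) {
  const tokens = [];
  await engine.generate(prompt, {
    streamCallback: (token) => {
      tokens.push(token); // keep a copy for the return value
      write(token);       // forward to the caller's writer right away
    },
  });
  return tokens.join("");
}
```

Inside an async function, a call would look like const text = await generateStreaming(pllm, "${PROMPT}");. Passing a different write function sends tokens somewhere other than the terminal, for example to a web socket.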

There are many configuration options in addition to streamCallback. For the full list of options, check out the picoLLM Inference API docs.

When done, be sure to release the resources explicitly:

pllm.release()
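
An exception thrown during generation would skip a release call placed at the end of the script. One way to guard against that is a try/finally wrapper. This is a sketch under the assumption that the engine exposes release() as shown above; withEngine and createEngine are hypothetical names.

```javascript
// Sketch: run `task` with an engine and always release it afterwards.
// `createEngine` is a hypothetical factory, for example
// () => new PicoLLM(accessKey, modelPath).
async function withEngine(createEngine, task) {
  const engine = createEngine();
  try {
    return await task(engine);
  } finally {
    engine.release(); // runs whether the task succeeds or throws
  }
}
```

With this in place, the generate call from the previous step could be passed as the task, and the engine's resources are freed even when the call fails.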

For a complete working project, take a look at the picoLLM Inference Node.js Demo. You can also view the picoLLM Inference Node.js API docs for details on the package.
