The increasing interest in running large language models like Llama2 and Llama3 on local machines stems from several compelling reasons: local LLM inference offers enhanced privacy and reduced latency, and it eliminates dependency on cloud services and their associated costs. However, this trend has posed challenges for web developers. Most inference engines rely on WebGPU, a technology that is not universally supported across browsers and often requires enabling experimental features. Moreover, the need for a GPU limits the potential user base for web applications leveraging these models. These factors have made integrating LLMs into web applications a less attractive option.
picoLLM runs LLMs locally within web browsers, like Chrome, Firefox, and Safari. This allows developers to achieve local LLM inference with just a few lines of JavaScript code, without requiring a GPU. This opens up many use cases for LLMs in web applications, such as summarization, proofreading, text generation, and question-answering.
What you need to run Llama
To follow along, ensure you have the following:
- Node.js: Download and install it from the Node.js download page.
- Picovoice AccessKey: Obtain a free AccessKey by creating an account on Picovoice Console.
- picoLLM model file (`.pllm`): Visit the picoLLM tab on Picovoice Console and download a model file for either Llama2 or Llama3.
Building a Simple Web Application with Llama
- Initialize a new project (see the command sketch after this list).
- Install the required packages (also sketched below):
  - `http-server`: enables local server creation on your machine.
  - `@picovoice/picollm-web`: the `picollm-web` package for web-based inference.
- Create an HTML file (`index.html`) with the code sketched after this list.
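A minimal sketch of the setup commands, assuming an npm-based project (the `-y` flag and exact invocation are illustrative):

```bash
# Initialize a new npm project with default settings
npm init -y

# Install the local server and the picoLLM Web SDK
npm install http-server @picovoice/picollm-web
```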
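Below is a sketch of what `index.html` could look like. The script path to the `picollm-web` IIFE build, the `PicollmWeb` global, and the `modelFile` option passed to `PicoLLMWorker.create()` are assumptions; confirm the exact names against the picollm-web API documentation.

```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8" />
    <title>picoLLM Web Demo</title>
    <!-- Assumption: the path to the picollm-web IIFE build may differ;
         copy or serve it from node_modules as needed. -->
    <script src="node_modules/@picovoice/picollm-web/dist/iife/index.js"></script>
  </head>
  <body>
    <!-- File picker for the .pllm model file downloaded from Picovoice Console -->
    <input type="file" id="modelFile" accept=".pllm" />
    <button id="runButton">Run</button>
    <pre id="result"></pre>

    <script>
      async function demo() {
        // 1. Retrieve the model file selected by the user.
        const modelFile = document.getElementById("modelFile").files[0];

        // 2. Create a PicoLLMWorker instance with your AccessKey and the model file.
        //    The `PicollmWeb` global and the `modelFile` option name are assumptions;
        //    verify them against the picollm-web API documentation.
        const picoLLM = await PicollmWeb.PicoLLMWorker.create(
          "${ACCESS_KEY}", // replace with your Picovoice AccessKey
          { modelFile: modelFile }
        );

        // 3. Generate a text completion and show it on the page
        //    (assumes the result object exposes a `completion` field).
        const res = await picoLLM.generate("Suggest three names for a coffee shop.");
        document.getElementById("result").textContent = res.completion;

        // Release resources when done (assumed to be available on the worker).
        await picoLLM.release();
      }

      document.getElementById("runButton").addEventListener("click", demo);
    </script>
  </body>
</html>
```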
This code defines a main function called `demo`. Here's a breakdown of its steps:
Retrieves the model file:
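In the sketch above, the model file is read from the page's file picker (a URL or other source may also work, depending on the SDK):

```javascript
// Read the .pllm file selected through the <input type="file"> element.
const modelFile = document.getElementById("modelFile").files[0];
```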
Creates a `PicoLLMWorker` instance: this requires your Picovoice AccessKey and the selected model file.
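The creation step from the sketch, with the same caveats about the `PicollmWeb` global and the `modelFile` option name:

```javascript
const picoLLM = await PicollmWeb.PicoLLMWorker.create(
  "${ACCESS_KEY}",          // your Picovoice AccessKey from the Console
  { modelFile: modelFile }  // the .pllm file selected above (option name assumed)
);
```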
Generates a text completion:
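And the completion step, assuming `generate()` resolves to an object with a `completion` field:

```javascript
const res = await picoLLM.generate("Suggest three names for a coffee shop.");
document.getElementById("result").textContent = res.completion;
```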
Running the Llama Web Application
Start a local server to run the project:
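For example, using the `http-server` package installed earlier (`-p 5000` matches the URL below; `http-server` defaults to port 8080 otherwise):

```bash
npx http-server -p 5000
```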
Open your browser and navigate to http://localhost:5000 to see the demo in action.
Delve Deeper
This guide has provided a basic introduction to running Llama models in your browser using `picollm-web`. To further your exploration, consider these resources:
- Explore our web demos on GitHub for practical examples and more advanced use cases, such as building a chatbot.
- Refer to the API documentation for detailed information on the `picollm-web` SDK and its features.
- To understand the underlying technology behind `picollm-web`, read our cross-browser local LLM inference using WebAssembly blog post.