The increasing interest in running large language models like Llama2 and Llama3 on local machines stems from several compelling reasons. Local LLM inference offers enhanced privacy and reduced latency, and eliminates dependency on cloud services and their associated costs. However, this trend has posed challenges for web developers. Most inference engines rely on WebGPU, a technology that is not universally supported across browsers and often requires enabling experimental features. Moreover, the need for a GPU limits the potential user base for web applications leveraging these models. These factors have made integrating LLMs into web applications a less attractive option.

picoLLM runs LLMs locally within web browsers such as Chrome, Firefox, and Safari. This allows developers to achieve local LLM inference with just a few lines of JavaScript code, without requiring a GPU. This opens up many use cases for LLMs in web applications, such as summarization, proofreading, text generation, and question-answering.

What You Need to Run Llama

To follow along, ensure you have the following:

  • Node.js: Download and install it from the Node.js download page.
  • Picovoice AccessKey: Obtain a free AccessKey by creating an account on Picovoice Console.
  • picoLLM Model File (.pllm): Visit the picoLLM tab on Picovoice Console and download a model file for either Llama2 or Llama3.

Building a Simple Web Application with Llama

  1. Initialize a new project:
  2. Install the required packages:
  • http-server: Enables local server creation on your machine.
  • @picovoice/picollm-web: The picollm-web package for web-based inference.
  3. Create an HTML file (index.html) with the following code:
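The setup for steps 1 and 2 amounts to two commands:

```shell
npm init -y
npm install http-server @picovoice/picollm-web
```

For step 3, a minimal index.html might look like the following sketch. The element IDs, the import path into the package's published build, the `${ACCESS_KEY}` placeholder, and the `modelFile` field name are illustrative assumptions; verify them against the picollm-web API reference:

```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8" />
    <title>picoLLM Llama Demo</title>
  </head>
  <body>
    <!-- File picker for the .pllm model downloaded from Picovoice Console -->
    <input type="file" id="modelFile" accept=".pllm" />
    <button id="startButton">Run demo</button>
    <pre id="output"></pre>
    <script type="module">
      // Adjust this path to match the package's published build layout.
      import { PicoLLMWorker } from "./node_modules/@picovoice/picollm-web/dist/esm/index.js";

      async function demo() {
        // Retrieve the .pllm model file selected by the user.
        const modelFile = document.getElementById("modelFile").files[0];

        // Create a PicoLLMWorker from your AccessKey and the model file
        // (the `modelFile` option name is an assumption).
        const picoLLM = await PicoLLMWorker.create("${ACCESS_KEY}", { modelFile });

        // Generate a text completion and show it on the page.
        const result = await picoLLM.generate("Suggest three names for a coffee shop.");
        document.getElementById("output").textContent = result.completion;

        // Release worker resources when done.
        await picoLLM.release();
      }

      document.getElementById("startButton").addEventListener("click", demo);
    </script>
  </body>
</html>
```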

This code defines a main function called demo. Here's a breakdown of its steps:

  • Retrieves the model file:

  • Creates a PicoLLMWorker instance:

    This requires your Picovoice AccessKey and the selected model file.

  • Generates a text completion:
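For longer completions you will usually want tokens on screen as they are produced rather than after the full completion returns. A sketch of the generation step with streaming, where the `streamCallback` and `completionTokenLimit` option names are assumptions to be checked against the picollm-web API reference:

```javascript
// Sketch: generate a completion and stream tokens to the page as they arrive.
// `streamCallback` and `completionTokenLimit` are assumed option names;
// verify them against the picollm-web documentation for your version.
async function complete(picoLLM, prompt) {
  const output = document.getElementById("output");
  output.textContent = "";
  const result = await picoLLM.generate(prompt, {
    completionTokenLimit: 256, // cap the completion length
    streamCallback: (token) => {
      output.textContent += token; // append each token as it is generated
    },
  });
  return result.completion; // the full completion is also returned at the end
}
```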

Running the Llama Web Application

Start a local server to run the project:
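For example, using the http-server package installed earlier (the port flag is chosen to match the URL below; without it, http-server defaults to port 8080):

```shell
npx http-server . -p 5000
```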

Open your browser and navigate to http://localhost:5000 to see the demo in action.

Delve Deeper

This guide has provided a basic introduction to running Llama models in your browser using picollm-web. To further your exploration, consider these resources: