The increasing interest in running large language models like Llama2 and Llama3 on local machines stems from several compelling reasons: local LLM inference offers enhanced privacy and reduced latency, and it eliminates dependency on cloud services and their associated costs. However, this trend has posed challenges for web developers. Most inference engines rely on WebGPU, a technology that is not universally supported across browsers and often requires enabling experimental features. Moreover, the need for a GPU limits the potential user base for web applications leveraging these models. These factors have made integrating LLMs into web applications a less attractive option.
picoLLM runs LLMs locally within web browsers, like Chrome, Firefox, and Safari. This allows developers to achieve local LLM inference with just a few lines of JavaScript code, without requiring a GPU. This opens up many use cases for LLMs in web applications, such as summarization, proofreading, text generation, and question-answering.
What you need to run Llama
To follow along, ensure you have the following:
- Node.js: Download and install it from the Node.js download page.
- Picovoice AccessKey: Obtain a free AccessKey by creating an account on Picovoice Console.
- picoLLM model file (`.pllm`): Visit the picoLLM tab on Picovoice Console and download a model file for either Llama2 or Llama3.
Building a Simple Web Application with Llama
- Initialize a new project (see the command sketch after this list).
- Install the required packages (also sketched below):
  - `http-server`: enables local server creation on your machine.
  - `@picovoice/picollm-web`: the `picollm-web` package for web-based inference.
- Create an HTML file (`index.html`) with the code sketched after this list.
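A minimal sketch of the setup commands, assuming an npm-based project (the `-y` flag and exact invocation are illustrative):

```bash
# Initialize a new npm project with default settings
npm init -y

# Install the local server and the picoLLM Web SDK
npm install http-server @picovoice/picollm-web
```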
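Below is a sketch of what `index.html` could look like. The script path to the `picollm-web` IIFE build, the `PicollmWeb` global, and the `modelFile` option passed to `PicoLLMWorker.create()` are assumptions; confirm the exact names against the picollm-web API documentation.

```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8" />
    <title>picoLLM Web Demo</title>
    <!-- Assumption: the path to the picollm-web IIFE build may differ;
         copy or serve it from node_modules as needed. -->
    <script src="node_modules/@picovoice/picollm-web/dist/iife/index.js"></script>
  </head>
  <body>
    <!-- File picker for the .pllm model file downloaded from Picovoice Console -->
    <input type="file" id="modelFile" accept=".pllm" />
    <button id="runButton">Run</button>
    <pre id="result"></pre>

    <script>
      async function demo() {
        // 1. Retrieve the model file selected by the user.
        const modelFile = document.getElementById("modelFile").files[0];

        // 2. Create a PicoLLMWorker instance with your AccessKey and the model file.
        //    The `PicollmWeb` global and the `modelFile` option name are assumptions;
        //    verify them against the picollm-web API documentation.
        const picoLLM = await PicollmWeb.PicoLLMWorker.create(
          "${ACCESS_KEY}", // replace with your Picovoice AccessKey
          { modelFile: modelFile }
        );

        // 3. Generate a text completion and show it on the page
        //    (assumes the result object exposes a `completion` field).
        const res = await picoLLM.generate("Suggest three names for a coffee shop.");
        document.getElementById("result").textContent = res.completion;

        // Release resources when done (assumed to be available on the worker).
        await picoLLM.release();
      }

      document.getElementById("runButton").addEventListener("click", demo);
    </script>
  </body>
</html>
```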
This code defines a main function called `demo`. Here's a breakdown of its steps:
Retrieves the model file:
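In the sketch above, the model file is read from the page's file picker (a URL or other source may also work, depending on the SDK):

```javascript
// Read the .pllm file selected through the <input type="file"> element.
const modelFile = document.getElementById("modelFile").files[0];
```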
Creates a `PicoLLMWorker` instance: this requires your Picovoice AccessKey and the selected model file.
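The creation step from the sketch, with the same caveats about the `PicollmWeb` global and the `modelFile` option name:

```javascript
const picoLLM = await PicollmWeb.PicoLLMWorker.create(
  "${ACCESS_KEY}",          // your Picovoice AccessKey from the Console
  { modelFile: modelFile }  // the .pllm file selected above (option name assumed)
);
```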
Generates a text completion:
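And the completion step, assuming `generate()` resolves to an object with a `completion` field:

```javascript
const res = await picoLLM.generate("Suggest three names for a coffee shop.");
document.getElementById("result").textContent = res.completion;
```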
Running the Llama Web Application
Start a local server to run the project:
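For example, using the `http-server` package installed earlier (`-p 5000` matches the URL below; `http-server` defaults to port 8080 otherwise):

```bash
npx http-server -p 5000
```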
Open your browser and navigate to http://localhost:5000 to see the demo in action.
Delve Deeper
This guide has provided a basic introduction to running Llama models in your browser using `picollm-web`. To further your exploration, consider these resources:
- Explore our web demos on GitHub for practical examples and more advanced use cases, such as building a chatbot.
- Refer to the API documentation for detailed information on the `picollm-web` SDK and its features.
- To understand the underlying technology behind `picollm-web`, read our cross-browser local LLM inference using WebAssembly blog post.