How to Run a Local LLM on Android

🚀 Best-in-class Voice AI!

Build compliant and low-latency AI applications running entirely on mobile without sharing user data with 3rd parties.

Running large language models (LLMs) on Android mobile devices presents a unique set of challenges and opportunities. Mobile devices are constrained by limited computational power, memory, and battery life, making it difficult to reasonably run popular AI models such as Microsoft's Phi-2 and Google's Gemma. However, the emergence of model compression and hardware-accelerated inference is transforming this landscape. picoLLM offers a variety of hyper-compressed, open-weight models that can be run on-device using the picoLLM Inference Engine. By enabling local AI inference, picoLLM enhances user privacy, reduces latency, and ensures more stable access to AI-powered applications. These benefits make picoLLM an ideal solution for users seeking robust AI capabilities without depending on the cloud.

The picoLLM Inference Engine is a cross-platform library that supports Linux, macOS, Windows, Raspberry Pi, Android, iOS and Web Browsers. picoLLM has SDKs for Python, Node.js, Android, iOS, and JavaScript.

The following guide will walk you through all the steps required to run a local LLM on an Android device. For this guide, we're going to use the picoLLM Chat app as a jumping-off point.

Setup

Install Android Studio.
Clone the picoLLM repository from GitHub:

git clone https://github.com/Picovoice/picollm.git

Connect an Android device in dev mode or launch an Android simulator.

Running the Chat App

Open the picoLLM Chat app with Android Studio.
Go to Picovoice Console to download a picoLLM model file (.pllm) and retrieve your AccessKey.
Upload the .pllm file to your device using Android Studio's Device Explorer or using adb push:

adb push ~/model.pllm /sdcard/Download/

Replace the value of ${YOUR_ACCESS_KEY_HERE} in MainActivity.java with your Picovoice AccessKey.
Build and run the demo on the connected the device.
The app will prompt you to load a .pllm file. Browse to where you uploaded the model file (ensure Large files toggle is on) and select the file.
Enter a text prompt to begin a chat with the LLM.

Integrating into your App

Ensure mavenCentral() repository has been added to the top-level build.gradle:

repositories {
    mavenCentral()
}

Add picollm-android to as a dependency in your app's build.gradle:

dependencies {
    implementation 'ai.picovoice:picollm-android:${LATEST_VERSION}'
}

Create an instance of the engine:

import ai.picovoice.picollm.*;

try {
    PicoLLM picollm = new PicoLLM.Builder()
        .setAccessKey("${ACCESS_KEY}")
        .setModelPath("${MODEL_PATH}")
        .build();
} catch (PicoLLMException e) { }

Pass in a text prompt to generate an LLM completion:

try {
    PicoLLMCompletion res = picollm.generate(
        "${PROMPT}",
        new PicoLLMGenerateParams.Builder().build());
} catch (PicoLLMException e) { }

When done, be sure to release the resources explicitly:

picollm.delete()

How to Run a Local LLM on Android

Setup

Running the Chat App

Integrating into your App

More from Picovoice