As Large Language Models (LLMs), such as Llama 2 and Llama 3, continue to push the boundaries of artificial intelligence, they are changing how we interact with the technology around us. Voice assistants have long been integrated into our mobile phones, but until now their ability to understand and process requests has been quite limited. LLM-based models, by contrast, can understand and generate human-like text, making them ideal candidates to augment applications like voice assistants, chatbots, and other natural language processing tasks.
However, a major limitation of these AI models is that they require significant resources to run their computations. While desktop applications can make use of powerful CPUs and GPUs, mobile phones have much more limited hardware. To make things more difficult, since our mobile devices are with us almost all the time, privacy is a much larger concern. Network connectivity is also an issue, as a fast, reliable signal is never guaranteed. Therefore, to make the most of a model like Llama 3 on an Android device, we have to run it offline, on the device itself.
Luckily, Picovoice's picoLLM Inference Engine makes it easy to perform offline LLM inference. picoLLM Inference is a lightweight inference engine that operates locally, ensuring privacy compliance with GDPR and HIPAA regulations, as well as usability where network connectivity is a concern. Llama models compressed by picoLLM Compression are small enough to run on most Android devices.
The picoLLM Inference Engine also runs on iOS, Linux, Windows, macOS, Raspberry Pi, and modern web browsers.
In just a few lines of code, you can start performing LLM inference using the picoLLM Inference Android SDK. Let’s get started!
Before Running Llama on Android
Install picoLLM Packages
The picollm-android package is hosted on the Maven Central Repository. To include the package in your Android project, ensure you have included mavenCentral() in your top-level build.gradle file, and then add the following to your app's build.gradle:
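A minimal sketch of the dependency declaration; the version shown is a placeholder, so check Maven Central for the latest release:

```groovy
dependencies {
    // picoLLM Inference Engine for Android; replace the version with the latest from Maven Central
    implementation 'ai.picovoice:picollm-android:1.0.0'
}
```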
Sign up for Picovoice Console
Next, create a Picovoice Console account, and copy your AccessKey from the main dashboard. Creating an account is free, and no credit card is required.
Download the picoLLM Compressed Llama 2 or 3 Model File
Download any of the Llama 2 or Llama 3 picoLLM model files (.pllm) from the picoLLM page on Picovoice Console.
Model files are also available for other open-weight models, such as Gemma, Mistral, Mixtral, and Phi-2.
The model needs to be transferred to the device; there are several ways to do this, depending on the application's use case. For testing, it is easiest to use the Android Debug Bridge (ADB) command adb push to transfer the model file directly to a connected device, as shown below.
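For example, assuming the model was downloaded to your local Downloads folder (the file name and on-device path here are illustrative):

```console
# Copy the model file to the device's shared storage (paths are illustrative)
adb push ~/Downloads/llama-3-8b-326.pllm /sdcard/Download/
```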
Using picoLLM in an Android Application
Create an instance of picoLLM with your AccessKey and model file path (.pllm):
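A minimal sketch in Java, following the builder pattern used across Picovoice SDKs; ${ACCESS_KEY} and ${MODEL_PATH} are placeholders for your Console AccessKey and the absolute on-device path to the .pllm file:

```java
import ai.picovoice.picollm.PicoLLM;
import ai.picovoice.picollm.PicoLLMException;

PicoLLM picollm = null;
try {
    picollm = new PicoLLM.Builder()
            .setAccessKey("${ACCESS_KEY}")   // from the Picovoice Console dashboard
            .setModelPath("${MODEL_PATH}")   // e.g., the path the model was pushed to via adb
            .build();
} catch (PicoLLMException e) {
    // Handle initialization errors (invalid AccessKey, missing model file, etc.)
}
```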
Pass your text prompt to the generate function to generate an LLM completion. You may also use .setStreamCallback() to provide a function that handles response tokens as soon as they are available:
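A sketch of both approaches, assuming the picollm instance from above; the PicoLLMGenerateParams and PicoLLMCompletion names follow the SDK's documented pattern, but verify the exact signatures against the API docs:

```java
import ai.picovoice.picollm.PicoLLMCompletion;
import ai.picovoice.picollm.PicoLLMException;
import ai.picovoice.picollm.PicoLLMGenerateParams;

try {
    // Print each response token as soon as it is produced
    PicoLLMGenerateParams params = new PicoLLMGenerateParams.Builder()
            .setStreamCallback(token -> System.out.print(token))
            .build();

    // generate() blocks until the completion is finished; the callback fires per token along the way
    PicoLLMCompletion result = picollm.generate(
            "What is the capital of France?",
            params);
    System.out.println(result.getCompletion());
} catch (PicoLLMException e) {
    // Handle generation errors
}
```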
There are many configuration options in addition to .setStreamCallback(). For the full list of options, check out the picoLLM Inference Android API docs.
When done, be sure to release the resources explicitly:
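Assuming the delete() convention Picovoice SDKs use for freeing native resources:

```java
// Free the native resources held by the engine once inference is finished
picollm.delete();
```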
For a complete working project, take a look at the picoLLM Completion Android Demo or the picoLLM Chat Android Demo. You can also view the picoLLM Inference Android API docs for complete details on the Android SDK.