Running large language models (LLMs) on iOS devices presents a unique set of challenges and opportunities. Mobile devices are constrained by limited computational power, memory, and battery life, which makes it difficult to run popular AI models such as Microsoft's Phi-2 and Google's Gemma. However, the emergence of model compression and hardware-accelerated inference is transforming this landscape. picoLLM offers a variety of hyper-compressed, open-weight models that can run on-device using the picoLLM Inference Engine. By enabling local AI inference, picoLLM enhances user privacy, reduces latency, and ensures more stable access to AI-powered applications. These benefits make picoLLM an ideal solution for users seeking robust AI capabilities without depending on the cloud.
The picoLLM Inference Engine is a cross-platform library that supports Linux, macOS, Windows, Raspberry Pi, Android, iOS and Web Browsers. picoLLM has SDKs for Python, Node.js, Android, iOS, and JavaScript.
The following guide will walk you through all the steps required to run a local LLM on an iOS device. For this guide, we're going to use the picoLLM Chat app as our starting point.
Setup
- Connect an iOS device in developer mode or launch an iOS simulator.
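If you don't have a physical device at hand, a simulator can also be started from the command line. The device name below is only an example and must match one of the simulators listed on your machine:

```console
# List available simulators, then boot one (the name is an example)
xcrun simctl list devices
xcrun simctl boot "iPhone 15"
open -a Simulator
```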
Running the Chat App
- Go to the picoLLM Chat app directory and run:
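The demo manages its native dependencies with CocoaPods (the next step opens a generated .xcworkspace), so the command to run here is presumably the standard CocoaPods install:

```console
pod install
```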
- Open the generated `PicoLLMChatDemo.xcworkspace` with Xcode.
- Go to Picovoice Console to download a picoLLM model file (`.pllm`) and retrieve your `AccessKey`.
- Upload the `.pllm` file to your device using Apple AirDrop, or via USB and Finder on your Mac.
- Replace the value of `${YOUR_ACCESS_KEY_HERE}` in `ViewModel.swift` with your Picovoice `AccessKey` (see the sketch after this list).
- Build and run the demo on the connected device.
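As a rough sketch, the placeholder in `ViewModel.swift` is a string constant along these lines; the exact property name in the demo source may differ:

```swift
// Hypothetical sketch of the AccessKey placeholder in ViewModel.swift;
// the actual property name in the demo may differ.
private let ACCESS_KEY = "${YOUR_ACCESS_KEY_HERE}"  // obtained from Picovoice Console
```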
Integrating into your App
- Import the picoLLM-iOS binding by adding the following line to your `Podfile`:
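Assuming the binding is published under the pod name used by Picovoice's other iOS SDKs, the entry looks like this:

```ruby
# Podfile
pod 'picoLLM-iOS'
```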
- Run the following from the project directory:
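With the `Podfile` entry in place, installing the dependency generates an `.xcworkspace` that you open instead of the `.xcodeproj`:

```console
pod install
```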
- Create an instance of the engine:
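A minimal sketch, assuming the Swift binding exposes a throwing `PicoLLM` initializer that takes your `AccessKey` and the path to the downloaded `.pllm` model file (both placeholders below are yours to fill in):

```swift
import PicoLLM

// Create the inference engine from an AccessKey and a .pllm model file.
let picollm = try PicoLLM(
    accessKey: "${YOUR_ACCESS_KEY_HERE}",
    modelPath: "${MODEL_FILE_PATH}")
```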
- Pass in a text prompt to generate an LLM completion:
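And a sketch of generating a completion, assuming a `generate(prompt:)` method that returns a result object carrying the completion text:

```swift
// Generate a completion for a text prompt and print it.
let result = try picollm.generate(prompt: "${PROMPT}")
print(result.completion)
```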