How to Run a Local LLM on iOS

Running large language models (LLMs) on iOS devices presents both challenges and opportunities. Mobile devices have limited compute, memory, and battery life, which makes it hard to run popular models such as Microsoft's Phi-2 and Google's Gemma. Model compression and hardware-accelerated inference are changing this. picoLLM offers a variety of hyper-compressed, open-weight models that run on-device using the picoLLM Inference Engine. By running inference locally, picoLLM improves user privacy, reduces latency, and gives more stable access to AI-powered features. This makes picoLLM a good fit for applications that need LLM capabilities without depending on the cloud.

The picoLLM Inference Engine is a cross-platform library that supports Linux, macOS, Windows, Raspberry Pi, Android, iOS and Web Browsers. picoLLM has SDKs for Python, Node.js, Android, iOS, and JavaScript.

The following guide will walk you through all the steps required to run a local LLM on an iOS device. For this guide, we're going to use the picoLLM Chat app as our starting point.

Setup

Install Xcode.
Install CocoaPods.
Clone the picoLLM repository from GitHub:

git clone https://github.com/Picovoice/picollm.git

Connect an iOS device in developer mode or launch an iOS simulator.

Running the Chat App

Go to the picoLLM Chat app directory and run:

pod install

Open the generated PicoLLMChatDemo.xcworkspace with Xcode.
Go to Picovoice Console to download a picoLLM model file (.pllm) and retrieve your AccessKey.
Upload the .pllm file to your device using Apple AirDrop or via USB and Finder on your Mac.
Replace the value of ${YOUR_ACCESS_KEY_HERE} in ViewModel.swift with your Picovoice AccessKey.
Build and run the demo on the connected device.

Integrating into your App

Import the picoLLM-iOS binding by adding the following line to your Podfile:

pod 'picoLLM-iOS'

Run the following from the project directory:

pod install

Create an instance of the engine:

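import PicoLLM

// Declared outside the do block so the engine stays in scope for later calls
let picollm: PicoLLM
do {
    picollm = try PicoLLM(
        accessKey: "${ACCESS_KEY}",
        modelPath: "${MODEL_PATH}")
} catch {
    // Initialization failed, for example due to an invalid AccessKey or model path
    fatalError("Failed to initialize picoLLM: \(error)")
}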
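
The `${MODEL_PATH}` placeholder has to resolve to a real file on the device. The following is a minimal sketch of one way to find it, assuming the `.pllm` file was copied into the app's Documents directory; the demo app may locate the file differently. `findModelFile` is a hypothetical helper written for this example, not part of the picoLLM SDK, and it uses only Foundation.

```swift
import Foundation

/// Hypothetical helper (not part of picoLLM): returns the first file with the
/// given extension found directly inside `directory`, or nil if there is none.
func findModelFile(in directory: URL, withExtension ext: String = "pllm") -> URL? {
    // A missing or unreadable directory is treated as "no model found"
    let contents = (try? FileManager.default.contentsOfDirectory(
        at: directory,
        includingPropertiesForKeys: nil)) ?? []
    return contents
        .filter { $0.pathExtension.lowercased() == ext }
        // Sort by name so the choice is stable when several models are present
        .sorted { $0.lastPathComponent < $1.lastPathComponent }
        .first
}

// Assumption: the model file lives in the app's Documents directory
let documentsURL = FileManager.default.urls(for: .documentDirectory, in: .userDomainMask).first
if let documentsURL = documentsURL, let modelURL = findModelFile(in: documentsURL) {
    // modelURL.path is the string to pass as modelPath
    print("Model found at \(modelURL.path)")
} else {
    print("No .pllm file found")
}
```

Returning an optional lets the app show a clear "model missing" message instead of failing inside the engine initializer.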

Pass in a text prompt to generate an LLM completion:

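do {
    // generate can throw, so the call must be marked with try
    let res = try picollm.generate(prompt: "${PROMPT}")
    print(res.completion)
} catch {
    print("Generation failed: \(error)")
}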
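
On-device generation can take several seconds, so an app will usually want to keep it off the main thread. The following is a rough sketch of that pattern, assuming `generate` blocks until the completion is ready. `runInBackground` is a hypothetical helper written for this example, not part of the picoLLM SDK, and it is written against Swift 5 concurrency checking.

```swift
import Foundation

/// Hypothetical helper (not part of picoLLM): runs a blocking, throwing task on a
/// background queue and delivers its result on `callbackQueue` (main by default).
func runInBackground<T>(
    on callbackQueue: DispatchQueue = .main,
    _ work: @escaping () throws -> T,
    completion: @escaping (Result<T, Error>) -> Void
) {
    DispatchQueue.global(qos: .userInitiated).async {
        // Capture either the returned value or the thrown error
        let result = Result { try work() }
        callbackQueue.async { completion(result) }
    }
}

// Usage sketch, assuming `picollm` is an initialized engine and that the value
// returned by generate exposes the generated text as `completion`:
//
// runInBackground({ try picollm.generate(prompt: "${PROMPT}").completion }) { result in
//     switch result {
//     case .success(let text): print(text)
//     case .failure(let error): print("Generation failed: \(error)")
//     }
// }
```

Delivering the result on the main queue by default means the completion handler can update UI state directly.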
