Running large language models (LLMs) on iOS devices presents a unique set of challenges and opportunities. Mobile hardware is constrained by limited computational power, memory, and battery life, making it impractical to run popular AI models such as Microsoft's Phi-2 and Google's Gemma out of the box. However, model compression and hardware-accelerated inference are transforming this landscape. picoLLM offers a variety of hyper-compressed, open-weight models that run on-device using the picoLLM Inference Engine. By performing inference locally, picoLLM enhances user privacy, reduces latency, and provides more reliable access to AI-powered applications. These benefits make picoLLM an ideal solution for users who want robust AI capabilities without depending on the cloud.

The picoLLM Inference Engine is a cross-platform library that supports Linux, macOS, Windows, Raspberry Pi, Android, iOS, and web browsers, with SDKs for Python, Node.js, Android, iOS, and JavaScript.

The following guide will walk you through all the steps required to run a local LLM on an iOS device. For this guide, we're going to use the picoLLM Chat app as our starting point.


  1. Install Xcode.

  2. Install CocoaPods.

  3. Clone the picoLLM repository from GitHub:
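
     ```shell
     git clone https://github.com/Picovoice/picollm.git
     ```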

  4. Connect an iOS device in developer mode or launch an iOS simulator.

Running the Chat App

  1. Go to the picoLLM Chat app directory and run:
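
     ```shell
     pod install
     ```
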
  2. Open the generated PicoLLMChatDemo.xcworkspace in Xcode.

  3. Go to Picovoice Console to download a picoLLM model file (.pllm) and retrieve your AccessKey.

  4. Transfer the .pllm file to your device using Apple AirDrop, or via USB and Finder on your Mac.

  5. Replace the value of ${YOUR_ACCESS_KEY_HERE} in ViewModel.swift with your Picovoice AccessKey.

  6. Build and run the demo on the connected device.

Integrating into your App

  1. Import the picoLLM-iOS binding by adding the following line to your Podfile:
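
     ```ruby
     pod 'picoLLM-iOS'
     ```
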
  2. Run the following from the project directory:
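
     ```shell
     pod install
     ```
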
  3. Create an instance of the engine:
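
     A minimal sketch, assuming the picoLLM iOS SDK's Swift API (a PicoLLM initializer taking an accessKey and a modelPath):

     ```swift
     import PicoLLM

     // Placeholders: your AccessKey from Picovoice Console and the
     // path to the .pllm model file on the device.
     let picollm = try PicoLLM(
         accessKey: "${YOUR_ACCESS_KEY_HERE}",
         modelPath: "${MODEL_FILE_PATH}")
     ```
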
  4. Pass in a text prompt to generate an LLM completion:
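
     A minimal sketch, assuming the SDK's generate(prompt:) method returns a result whose completion property holds the generated text:

     ```swift
     // Run a single completion and print the generated text.
     let res = try picollm.generate(prompt: "${PROMPT}")
     print(res.completion)
     ```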