# picoLLM Inference Engine
picoLLM Inference Engine is a highly accurate and cross-platform SDK optimized for running compressed large language models.
picoLLM Inference Engine is:

- Accurate; picoLLM Compression improves GPTQ by up to 98%.
- Private; LLM inference runs 100% locally.
- Cross-Platform:
  - Linux (x86_64), macOS (arm64, x86_64), and Windows (x86_64)
  - Raspberry Pi (5 and 4)
  - Android and iOS
  - Chrome, Safari, Edge, and Firefox
- Runs on CPU and GPU (see the device-selection sketch after this list)
- Free for open-weight models
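
The device used for inference is chosen when the engine is created. The following is a minimal sketch, assuming the Python SDK and its `device` option; the AccessKey and model path are placeholders, `'best'` asks picoLLM to pick the most suitable device, and `'gpu'`/`'cpu'` pin inference to a specific one.

```python
import picollm

# `device` selects where inference runs:
#   'best' - let picoLLM pick the most suitable device
#   'gpu'  - run on the first available GPU
#   'cpu'  - run on the CPU
# The AccessKey and model path below are placeholders.
pllm = picollm.create(
    access_key='${ACCESS_KEY}',
    model_path='${MODEL_PATH}',
    device='best')

pllm.release()
```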
## Get Started
Anyone using Picovoice needs a valid AccessKey. The AccessKey is your authentication and authorization token for using Picovoice; it also verifies that your usage is within the limits of your account. You must keep your AccessKey secret!
### Sign up for Picovoice Console
Sign up for Picovoice Console. It is free, no credit card required.
### Obtain your AccessKey
Log in to your Picovoice Console account and copy your AccessKey from the home page.
### Download SDK
Picovoice SDKs are available both on GitHub and via SDK-specific package managers. Follow one of the quick starts to get started with picoLLM using your newly-created AccessKey.
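
For example, a minimal generation flow with the Python SDK (`pip3 install picollm`) looks like the sketch below; `${ACCESS_KEY}` is the AccessKey copied above, and `${MODEL_PATH}` points to a `.pllm` model file downloaded from Picovoice Console (see Download Model below).

```python
import picollm

# Placeholders: the AccessKey from Picovoice Console and a downloaded `.pllm` model file.
pllm = picollm.create(
    access_key='${ACCESS_KEY}',
    model_path='${MODEL_PATH}')

# Generate a completion for the given prompt and print it.
res = pllm.generate('${PROMPT}')
print(res.completion)

pllm.release()
```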
## Models
picoLLM Inference Engine supports the following open-weight models:
- Gemma
  - `gemma-2b`
  - `gemma-2b-it`
  - `gemma-7b`
  - `gemma-7b-it`
- Llama-2
  - `llama-2-7b`
  - `llama-2-7b-chat`
  - `llama-2-13b`
  - `llama-2-13b-chat`
  - `llama-2-70b`
  - `llama-2-70b-chat`
- Llama-3
  - `llama-3-8b`
  - `llama-3-8b-instruct`
  - `llama-3-70b`
  - `llama-3-70b-instruct`
- Mistral
  - `mistral-7b-v0.1`
  - `mistral-7b-instruct-v0.1`
  - `mistral-7b-instruct-v0.2`
- Mixtral
  - `mixtral-8x7b-v0.1`
  - `mixtral-8x7b-instruct-v0.1`
- Phi-2
  - `phi2`
## Download Model
Log in to the Picovoice Console and navigate to the picoLLM page. Choose a model file that you would like to download.
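
After downloading, pass the file's location to the engine via `model_path`. The sketch below assumes a hypothetical local file `./gemma-2b-it.pllm` and streams tokens as they are produced; the `stream_callback` keyword used here is an assumption based on the Python SDK's generate API, so verify it against your SDK version.

```python
import picollm

pllm = picollm.create(
    access_key='${ACCESS_KEY}',
    model_path='./gemma-2b-it.pllm')  # hypothetical path to the downloaded model file

# `stream_callback` (assumed from the Python SDK's generate API) is invoked
# with each token as it is produced; `res.completion` holds the full text.
res = pllm.generate(
    'Tell me a short story.',
    stream_callback=lambda token: print(token, end='', flush=True))
print()

pllm.release()
```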