# picoLLM Inference Engine
picoLLM Inference Engine is a highly accurate and cross-platform SDK optimized for running compressed large language models.
picoLLM Inference Engine is:

- Accurate; picoLLM Compression improves GPTQ by up to 98%.
- Private; LLM inference runs 100% locally.
- Cross-Platform:
  - Linux (x86_64), macOS (arm64, x86_64), and Windows (x86_64)
  - Raspberry Pi (5 and 4)
  - Android and iOS
  - Chrome, Safari, Edge, and Firefox
- Runs on CPU and GPU (see the device-selection sketch after this list)
- Free for open-weight models
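
The device used for inference is chosen when the engine is created. The following is a minimal sketch, assuming the Python SDK and its `device` option; the AccessKey and model path are placeholders, `'best'` asks picoLLM to pick the most suitable device, and `'gpu'`/`'cpu'` pin inference to a specific one.

```python
import picollm

# `device` selects where inference runs:
#   'best' - let picoLLM pick the most suitable device
#   'gpu'  - run on the first available GPU
#   'cpu'  - run on the CPU
# The AccessKey and model path below are placeholders.
pllm = picollm.create(
    access_key='${ACCESS_KEY}',
    model_path='${MODEL_PATH}',
    device='best')

pllm.release()
```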
## Get Started
Anyone using Picovoice needs a valid AccessKey. The AccessKey is your authentication and authorization token for using Picovoice; it also verifies that your usage is within the limits of your account. You must keep your AccessKey secret!
### Sign up for Picovoice Console
Sign up for Picovoice Console. It is free, no credit card required.
### Obtain your AccessKey
Log in to your Picovoice Console account and copy your AccessKey from the home page.
### Download SDK
Picovoice SDKs are available both on GitHub and via SDK-specific package managers. Follow one of the quick starts to get started with picoLLM using your newly-created AccessKey.
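
For example, a minimal generation flow with the Python SDK (`pip3 install picollm`) looks like the sketch below; `${ACCESS_KEY}` is the AccessKey copied above, and `${MODEL_PATH}` points to a `.pllm` model file downloaded from Picovoice Console (see Download Model below).

```python
import picollm

# Placeholders: the AccessKey from Picovoice Console and a downloaded `.pllm` model file.
pllm = picollm.create(
    access_key='${ACCESS_KEY}',
    model_path='${MODEL_PATH}')

# Generate a completion for the given prompt and print it.
res = pllm.generate('${PROMPT}')
print(res.completion)

pllm.release()
```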
## Models
picoLLM Inference Engine supports the following open-weight models:
- Gemma
  - `gemma-2b`
  - `gemma-2b-it`
  - `gemma-7b`
  - `gemma-7b-it`
- Llama-2
  - `llama-2-7b`
  - `llama-2-7b-chat`
  - `llama-2-13b`
  - `llama-2-13b-chat`
  - `llama-2-70b`
  - `llama-2-70b-chat`
- Llama-3
  - `llama-3-8b`
  - `llama-3-8b-instruct`
  - `llama-3-70b`
  - `llama-3-70b-instruct`
- Mistral
  - `mistral-7b-v0.1`
  - `mistral-7b-instruct-v0.1`
  - `mistral-7b-instruct-v0.2`
- Mixtral
  - `mixtral-8x7b-v0.1`
  - `mixtral-8x7b-instruct-v0.1`
- Phi-2
  - `phi2`
## Download Model
Log in to the Picovoice Console and navigate to the picoLLM page. Choose a model file that you would like to download.
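
After downloading, pass the file's location to the engine via `model_path`. The sketch below assumes a hypothetical local file `./gemma-2b-it.pllm` and streams tokens as they are produced; the `stream_callback` keyword used here is an assumption based on the Python SDK's generate API, so verify it against your SDK version.

```python
import picollm

pllm = picollm.create(
    access_key='${ACCESS_KEY}',
    model_path='./gemma-2b-it.pllm')  # hypothetical path to the downloaded model file

# `stream_callback` (assumed from the Python SDK's generate API) is invoked
# with each token as it is produced; `res.completion` holds the full text.
res = pllm.generate(
    'Tell me a short story.',
    stream_callback=lambda token: print(token, end='', flush=True))
print()

pllm.release()
```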