Running an LLM locally inside a .NET application is increasingly valuable for developers who need private, low-latency AI features without relying on cloud APIs. In this tutorial, you will learn how to add on-device LLM inference to a .NET C# application using picoLLM On-Device LLM Inference, Picovoice's lightweight, cross-platform runtime for local text generation.
picoLLM is designed for scenarios where developers want fast inference, predictable performance, and full data privacy. All processing happens locally, and it runs in desktop (Windows, macOS, Linux), embedded (Raspberry Pi), and restricted-connectivity environments. This walkthrough shows how to initialize picoLLM in a .NET project, load a model, and run basic text generation using the .NET API.
By the end of the tutorial, you will have a minimal but complete implementation of on-device LLM inference in .NET.
Key Takeaways
- You can run LLMs fully locally in .NET using picoLLM with C#.
- Works on Windows, Linux, macOS, and Raspberry Pi.
- Text generation, chat, and instruct models run without cloud APIs.
How to Run Local LLM Inference in a .NET C# Project
Easily integrate on-device LLM inference into your .NET C# applications using picoLLM. Follow these steps to install, initialize, and run your first text generation model locally.
Prerequisites
- Windows (x86_64): .NET Framework 4.6.1+, .NET Standard 2.0+, or .NET Core 3.0+
- macOS (x86_64): .NET Standard 2.0+ or .NET Core 3.0+
- macOS (arm64), Windows (arm64), Linux (x86_64), Raspberry Pi (4, 5): .NET 6.0+
1. Install the NuGet Package
Add the official PicoLLM NuGet package to your .NET project to enable local LLM inference:
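Using the .NET CLI, the install looks like the following (the package ID is assumed to be PicoLLM; confirm the exact ID on NuGet):

```sh
dotnet add package PicoLLM
```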
This package provides all the APIs needed to load models and perform text generation and chat-based LLM inference on-device.
2. Download a picoLLM Model
Sign up for a Picovoice Console account for free and copy your AccessKey from the main dashboard.
Download a picoLLM model file (.pllm) from the picoLLM page. If you want to follow this guide for chat-based interactions, select a model that supports chat functionality.
3. Initialize the Local LLM Inference Engine
Initialize the LLM inference engine in your C# project using your AccessKey and the path to the downloaded model:
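The sketch below shows the general shape, assuming the binding lives in the Pv namespace and exposes a PicoLLM.Create factory like other Picovoice .NET SDKs; check the picoLLM .NET API reference for the exact signatures:

```csharp
using Pv;

// AccessKey from the Picovoice Console and the path to the downloaded .pllm model file.
const string accessKey = "${ACCESS_KEY}";
const string modelPath = "/path/to/model.pllm";

// Creates the on-device inference engine; from here on, prompts are processed entirely locally.
PicoLLM pllm = PicoLLM.Create(accessKey, modelPath);
```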
This sets up your local LLM inference engine, ready to process prompts without requiring cloud connectivity.
4. Create a Dialog and Add User Prompts
Use PicoLLMDialog to manage conversational context. Capture user input and send it to the model:
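Here is a sketch of a single chat turn, continuing from the snippet above. It assumes the dialog exposes AddHumanRequest, AddLLMResponse, and Prompt methods, and that Generate returns a PicoLLMCompletion with a Completion property, as described in the picoLLM API docs:

```csharp
// A dialog accumulates the conversation so chat-capable models see the full history.
PicoLLMDialog dialog = pllm.GetDialog();

// Add the user's turn; dialog.Prompt() renders the history with the model's chat template.
dialog.AddHumanRequest("Summarize the benefits of on-device inference in one sentence.");
PicoLLMCompletion result = pllm.Generate(dialog.Prompt());

// Record the model's reply so the next turn keeps the full context.
dialog.AddLLMResponse(result.Completion);
Console.WriteLine(result.Completion);
```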
This allows your model to maintain context across multiple turns, which is essential for chat-based LLM applications.
5. Configure Text Generation Parameters
Control the behavior of your LLM inference with parameters like temperature, completionTokenLimit, stopPhrases, presencePenalty, and frequencyPenalty. Adjust these to customize response creativity, length, and style:
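A sketch using the parameters named above; verify the exact parameter names and types against the .NET API reference:

```csharp
PicoLLMCompletion result = pllm.Generate(
    dialog.Prompt(),
    completionTokenLimit: 256,            // cap the length of the response
    stopPhrases: new string[] { "###" },  // stop early if any of these phrases is produced
    temperature: 0.7f,                    // higher = more varied, lower = more deterministic
    presencePenalty: 0.5f,                // discourage reusing tokens that already appeared
    frequencyPenalty: 0.5f);              // discourage frequent repetition of the same tokens
```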
See the picoLLM API docs for the full list of available parameters. Fine-tuning these parameters helps produce outputs that best suit your application's needs, whether for question answering, summarization, or dialogue generation.
6. Generate Text and Handle Streaming Output
Generate responses incrementally and process tokens in real time using a streamCallback function. This is useful for displaying progressive output or streaming responses to a UI:
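A sketch, assuming streamCallback accepts a delegate that receives each newly generated piece of text as a string:

```csharp
// The callback is invoked as each piece of text is produced, enabling progressive display.
PicoLLMCompletion result = pllm.Generate(
    dialog.Prompt(),
    completionTokenLimit: 256,
    streamCallback: token => Console.Write(token));

Console.WriteLine(); // finish the line once generation completes
```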
You can also interrupt ongoing generation if the user cancels or changes their input:
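Assuming the engine exposes an Interrupt method, as suggested by the picoLLM API docs:

```csharp
// Call from another thread (e.g., a UI event handler) to stop an in-flight Generate() call;
// Generate() then returns with whatever text was produced before the interruption.
pllm.Interrupt();
```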
This ensures your on-device LLM inference is responsive and interactive for real-time applications.
7. Clean Up Resources
PicoLLM's resources are eventually freed by the garbage collector, but to release them deterministically as soon as you are done, wrap the instance in a using statement (or call Dispose() directly):
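For example (a sketch reusing the accessKey and modelPath values from the earlier steps):

```csharp
using (PicoLLM pllm = PicoLLM.Create(accessKey, modelPath))
{
    PicoLLMCompletion result = pllm.Generate("What is the capital of France?");
    Console.WriteLine(result.Completion);
} // Dispose() runs here, releasing the model and native resources immediately
```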
C# Code Example: Local Text Generation Without Cloud APIs
Integrating picoLLM into a .NET application requires careful handling of the input loop and the generation thread. The console chat loop below shows one way to structure it:
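The following is an illustrative sketch rather than the official demo. It reuses the assumed API names from the earlier steps (PicoLLM.Create, GetDialog, Generate, Interrupt) and assumes Interrupt may be called from another thread while Generate is running:

```csharp
using System;
using System.Threading.Tasks;
using Pv;

class Program
{
    static void Main()
    {
        const string accessKey = "${ACCESS_KEY}";        // from the Picovoice Console
        const string modelPath = "/path/to/model.pllm";  // downloaded .pllm file

        using (PicoLLM pllm = PicoLLM.Create(accessKey, modelPath))
        {
            PicoLLMDialog dialog = pllm.GetDialog();

            // Map Ctrl+C to interrupting the current generation instead of killing the process.
            Console.CancelKeyPress += (sender, e) =>
            {
                e.Cancel = true;
                pllm.Interrupt();
            };

            while (true)
            {
                Console.Write("\n> ");
                string userInput = Console.ReadLine();
                if (string.IsNullOrWhiteSpace(userInput))
                {
                    break; // empty input ends the session
                }

                dialog.AddHumanRequest(userInput);

                // Run generation on a worker task so the main thread stays free to handle Ctrl+C.
                Task<PicoLLMCompletion> generation = Task.Run(() => pllm.Generate(
                    dialog.Prompt(),
                    completionTokenLimit: 256,
                    streamCallback: token => Console.Write(token)));

                PicoLLMCompletion result = generation.Result;
                dialog.AddLLMResponse(result.Completion);
                Console.WriteLine();
            }
        }
    }
}
```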
This is a simplified example that includes only the essential code to get you started. To see a complete .NET application, check out the picoLLM On-Device LLM Inference .NET demo on GitHub.
Tips & Best Practices: LLM Inference in .NET
- Prompt clarity: Provide clear and concise prompts to get the best results.
- Context management: Keep track of conversation state for multi-turn interactions. If you're using picoLLM LLM Inference, you can configure this with the history parameter of GetDialog.
- Choosing the right chat template: Some models define multiple chat template modes. For example, phi-2 allows both qa and chat templates. Set the mode you wish to use in GetDialog for the best results.
- Resource efficiency: Dispose of LLM instances when not needed to minimize memory use.
Common Issues & Troubleshooting
Model load errors
If a model fails to load, ensure that the .pllm file path is correct and accessible. Verify that your Picovoice AccessKey is valid and that the model version matches your picoLLM runtime.
Performance bottlenecks
Common bottlenecks include large prompt histories, oversized models, or running multiple inference sessions simultaneously on limited-core devices. Monitoring CPU and memory usage during runtime can help identify which part of your workflow is slowing down. Consider limiting context length or using lighter models for faster responses.
Incorrect or unexpected responses
If the model produces irrelevant or confusing output, check your prompt clarity and context history. Overly long or ambiguous prompts can confuse the model. Trimming unnecessary conversation history or refining prompts can improve response quality.
Enhance Your App with Speech Recognition
picoLLM LLM Inference can be combined with Picovoice speech engines to build fully voice-powered apps:
- Porcupine Wake Word: to trigger picoLLM only after a wake word.
- Rhino Speech-to-Intent: for structured command understanding.
- Cheetah Streaming Speech-to-Text: to convert user speech into text prompts for picoLLM.
- Orca Streaming Text-to-Speech: to speak picoLLM responses back to the user.
To see a complete voice assistant demo, check out the .NET LLM Voice Assistant on GitHub.
Frequently Asked Questions
What is on-device LLM inference in .NET?
On-device LLM inference allows a .NET application to run a local language model for text generation or chat without relying on cloud APIs. This ensures low latency, predictable performance, and full data privacy.
Which platforms does picoLLM support in .NET?
picoLLM works on Windows, macOS, Linux, and Raspberry Pi (x86_64 and ARM64), supporting desktop, mobile, and embedded .NET applications.
Does picoLLM need an internet connection?
Once the model is downloaded, all processing happens locally on your device. Internet is required only for licensing and usage tracking.
Can picoLLM maintain conversation context across turns?
Yes. By using the PicoLLMDialog object, you can manage conversation history and maintain context across multiple prompts and responses.







