As enterprises look to bring AI capabilities closer to their data, on-device LLM inference is emerging as a powerful solution. By integrating local language models into .NET C# applications, developers can enable features like text generation, question answering, document summarization, and workflow automation—all without sending sensitive data to the cloud. Running quantized models directly on desktop, mobile, or embedded devices reduces latency, removes cloud dependencies, and enhances privacy, making it ideal for secure or offline enterprise scenarios.
picoLLM On-Device LLM Inference enables fast, private, and scalable LLM inference directly on-device. In this tutorial, we'll show you how to implement picoLLM On-Device LLM Inference in a .NET app, from installing the PicoLLM .NET SDK and loading a model to running inference and handling responses.
Step-by-Step: Run an LLM Locally in a .NET App
First, make sure your environment meets the minimum .NET requirements:
- Windows (x86_64): .NET Framework 4.6.1+, .NET Standard 2.0+, or .NET Core 3.0+
- macOS (x86_64): .NET Standard 2.0+ or .NET Core 3.0+
- macOS (arm64), Windows (arm64), Linux (x86_64), Raspberry Pi (3, 4, 5): .NET 6.0+
Next, sign up for a Picovoice Console account and copy your AccessKey.
1. Install the NuGet Package
Add the PicoLLM NuGet package to your .NET project using the CLI:
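For example, from your project directory (the package ID PicoLLM is assumed here; confirm it against the picoLLM .NET quick start):

```
dotnet add package PicoLLM
```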
2. Initialize picoLLM
Download a picoLLM model file (.pllm) from the Picovoice Console. As this tutorial will demonstrate chat functionality, choose a model that supports chat if you wish to follow along.
Initialize an instance of PicoLLM, passing in your AccessKey and downloaded model:
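A minimal initialization sketch, assuming the Pv namespace and a PicoLLM.Create factory method as in the picoLLM .NET SDK (replace the placeholders with your AccessKey and model path):

```csharp
using Pv;

// Replace with your AccessKey from Picovoice Console and the path to the downloaded .pllm file.
const string accessKey = "${ACCESS_KEY}";
const string modelPath = "${MODEL_PATH}";

// Load the model and create the inference engine.
PicoLLM pllm = PicoLLM.Create(accessKey, modelPath);
```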
3. Send Prompts
Create a PicoLLMDialog object. Then, add your prompt to the dialog object via dialog.AddHumanRequest:
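Continuing the sketch above (GetDialog and AddHumanRequest are the calls named in this tutorial; the prompt text is only illustrative):

```csharp
// Create a dialog to track the conversation state.
PicoLLMDialog dialog = pllm.GetDialog();

// Add the user's prompt to the dialog.
string prompt = "Summarize the benefits of on-device inference.";
dialog.AddHumanRequest(prompt);
```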
4. Handle Responses
Call Generate with dialog.Prompt() and a streamCallback function to handle the output. Be sure to also add the response to the dialog via dialog.AddLLMResponse:
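A sketch of streaming generation, assuming Generate returns a completion object with a Completion property as described in the picoLLM .NET API docs:

```csharp
// Generate a response, printing tokens to the console as they are produced.
PicoLLMCompletion result = pllm.Generate(
    dialog.Prompt(),
    streamCallback: token => Console.Write(token));
Console.WriteLine();

// Store the response in the dialog so the next turn has the full context.
dialog.AddLLMResponse(result.Completion);
```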
There are many additional parameters you can pass to Generate to configure responses, such as the token limit and stop phrases. Refer to the picoLLM .NET API docs for the full list.
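For instance, a token limit and stop phrases can be supplied alongside the callback (the parameter names completionTokenLimit and stopPhrases below are assumptions; check the API docs for the exact names):

```csharp
// Assumed parameter names; verify against the picoLLM .NET API docs.
PicoLLMCompletion limited = pllm.Generate(
    dialog.Prompt(),
    completionTokenLimit: 128,                     // cap the response length
    stopPhrases: new string[] { "###" },           // stop early on these phrases
    streamCallback: token => Console.Write(token));
```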
5. Interrupt Response Generation
You can interrupt an ongoing Generate if the user cancels or changes their input:
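For example, assuming the engine exposes an Interrupt method (Generate would typically be running on a separate thread at this point):

```csharp
// Call from another thread (e.g., a UI or input handler) to stop the ongoing generation early.
pllm.Interrupt();
```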
6. Clean Up Resources
PicoLLM's resources will eventually be freed by the garbage collector, but to release them immediately after use, wrap the instance in a using statement (or call Dispose() directly):
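For example, using the same Create call as above:

```csharp
// Dispose() is called automatically when the using block ends.
using (PicoLLM pllm = PicoLLM.Create(accessKey, modelPath))
{
    // ... create a dialog and run Generate here ...
}
```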
Complete Example: On-Device LLM in .NET
Integrating picoLLM into a .NET application requires proper handling of loops and threads. Here is a complete console demo in C#:
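The sketch below is a minimal interactive console loop built from the calls introduced above (Create, GetDialog, AddHumanRequest, Generate with a streamCallback, AddLLMResponse, Interrupt, Dispose); argument parsing and error handling are left out for brevity, and signatures should be checked against the picoLLM .NET API docs:

```csharp
using System;
using System.Threading.Tasks;
using Pv;

class ChatDemo
{
    static void Main()
    {
        const string accessKey = "${ACCESS_KEY}";  // from Picovoice Console
        const string modelPath = "${MODEL_PATH}";  // downloaded .pllm file

        // Create the engine; the using declaration disposes it when Main exits.
        using PicoLLM pllm = PicoLLM.Create(accessKey, modelPath);
        PicoLLMDialog dialog = pllm.GetDialog();

        // Ctrl+C interrupts the current generation instead of killing the app.
        Console.CancelKeyPress += (sender, e) =>
        {
            e.Cancel = true;
            pllm.Interrupt();
        };

        Console.WriteLine("picoLLM chat. Enter a prompt (empty line to quit).");
        while (true)
        {
            Console.Write("> ");
            string prompt = Console.ReadLine();
            if (string.IsNullOrWhiteSpace(prompt))
            {
                break;
            }

            dialog.AddHumanRequest(prompt);

            // Run generation on a worker thread so Ctrl+C stays responsive;
            // tokens are streamed to the console as they arrive.
            Task<PicoLLMCompletion> generation = Task.Run(() => pllm.Generate(
                dialog.Prompt(),
                streamCallback: token => Console.Write(token)));

            PicoLLMCompletion result = generation.Result;
            Console.WriteLine();

            // Keep the response in the dialog for multi-turn context.
            dialog.AddLLMResponse(result.Completion);
        }
    }
}
```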
This is a simplified example that includes only the essential code to get you started. To see a complete .NET application, check out the picoLLM On-Device LLM Inference .NET demo on GitHub.
Tips & Best Practices: LLM Inference in .NET
- Prompt clarity: Provide clear and concise prompts to get the best results.
- Context management: Keep track of conversation state for multi-turn interactions. If you're using picoLLM LLM Inference, you can configure this with the history parameter of GetDialog (see the sketch after this list).
- Choosing the right chat template: Some models define multiple chat template modes. For example, phi-2 allows both qa and chat templates. Set the mode you wish to use in GetDialog for the best results.
- Resource efficiency: Dispose of LLM instances when not needed to minimize memory use.
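A quick sketch of the two GetDialog settings above (the history and mode parameter names come from this tutorial; exact types and defaults should be checked in the API docs):

```csharp
// Limit how much prior conversation is kept in the prompt context.
PicoLLMDialog dialog = pllm.GetDialog(history: 2);

// For models with multiple chat templates (e.g., phi-2), select the template mode explicitly.
PicoLLMDialog qaDialog = pllm.GetDialog(mode: "qa");
```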
Enhancing Your App with Speech Recognition
picoLLM LLM Inference can be combined with speech recognition engines for full voice-powered apps:
- Porcupine Wake Word: to trigger picoLLM only after a wake word.
- Rhino Speech-to-Intent: for structured command understanding.
- Cheetah Streaming Speech-to-Text: to convert user speech into text prompts for picoLLM.
- Orca Streaming Text-to-Speech: to speak picoLLM responses back to the user.
To see a complete voice assistant demo, check out the .NET LLM Voice Assistant on GitHub.