
Running an LLM locally inside a .NET application is increasingly valuable for developers who need private, low-latency AI features without relying on cloud APIs. In this tutorial, you will learn how to add on-device LLM inference to a .NET C# application using picoLLM On-Device LLM Inference, Picovoice's lightweight, cross-platform runtime for local text generation.

picoLLM is designed for scenarios where developers want fast inference, predictable performance, and full data privacy. All processing happens locally, and it runs in desktop (Windows, macOS, Linux), embedded (Raspberry Pi), and restricted-connectivity environments. This walkthrough shows how to initialize picoLLM in a .NET project, load a model, and run basic text generation using the .NET API.

By the end of the tutorial, you will have a minimal but complete implementation of on-device LLM inference in .NET.

Key Takeaways

  • You can run LLMs fully locally in .NET C# applications using picoLLM.
  • Works on Windows, Linux, macOS, and Raspberry Pi.
  • Text generation, chat, and instruct models run without cloud APIs.

How to Run Local LLM Inference in a .NET C# Project

Easily integrate on-device LLM inference into your .NET C# applications using picoLLM. Follow these steps to install, initialize, and run your first text generation model locally.

Prerequisites

  • Windows (x86_64): .NET Framework 4.6.1+, .NET Standard 2.0+, or .NET Core 3.0+
  • macOS (x86_64): .NET Standard 2.0+ or .NET Core 3.0+
  • macOS (arm64), Windows (arm64), Linux (x86_64), Raspberry Pi (4, 5): .NET 6.0+

1. Install the NuGet Package

Add the official PicoLLM NuGet package to your .NET project to enable local LLM inference:
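Assuming the package is published under the PicoLLM ID on nuget.org (verify the exact ID there), it can be added with the .NET CLI:

```
dotnet add package PicoLLM
```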

This package provides all the APIs needed to load models and perform text generation and chat-based LLM inference on-device.

2. Download a picoLLM Model

Sign up for a Picovoice Console account for free and copy your AccessKey from the main dashboard.

Download a picoLLM model file (.pllm) from the picoLLM page. If you want to follow this guide for chat-based interactions, select a model that supports chat functionality.

3. Initialize the Local LLM Inference Engine

Initialize the LLM inference engine in your C# project using your AccessKey and the path to the downloaded model:
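The snippet below is a minimal sketch; it assumes the Pv namespace and the PicoLLM.Create factory used by the picoLLM .NET SDK, so confirm the exact names and parameter order against the API docs.

```csharp
using Pv;

// Placeholders: your AccessKey from Picovoice Console and the path to the downloaded .pllm file.
const string accessKey = "${ACCESS_KEY}";
const string modelPath = "path/to/model.pllm";

// Create the on-device inference engine; generation itself never leaves the machine.
PicoLLM picollm = PicoLLM.Create(accessKey, modelPath);
```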

This sets up your local LLM inference engine, ready to process prompts without requiring cloud connectivity.

4. Create a Dialog and Add User Prompts

Use PicoLLMDialog to manage conversational context. Capture user input and send it to the model:
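A sketch of one chat turn, assuming the dialog object exposes AddHumanRequest, AddLLMResponse, and a Prompt method that renders the accumulated history (consult the API docs for the exact member names):

```csharp
// Create a dialog that tracks the conversation for a chat-capable .pllm model.
PicoLLMDialog dialog = picollm.GetDialog();

// Capture the user's message and append it to the conversation history.
Console.Write("> ");
string userInput = Console.ReadLine();
dialog.AddHumanRequest(userInput);

// Render the history into a model-ready prompt and generate a reply.
PicoLLMCompletion result = picollm.Generate(dialog.Prompt());
Console.WriteLine(result.Completion);

// Store the reply so the next turn keeps the full context.
dialog.AddLLMResponse(result.Completion);
```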

This allows your model to maintain context across multiple turns, which is essential for chat-based LLM applications.

5. Configure Text Generation Parameters

Control the behavior of your LLM inference with parameters like temperature, completionTokenLimit, stopPhrases, presencePenalty, and frequencyPenalty. Adjust these to customize response creativity, length, and style:
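A sketch assuming Generate accepts these settings as optional named arguments (the values below are illustrative, not recommendations):

```csharp
PicoLLMCompletion result = picollm.Generate(
    dialog.Prompt(),
    completionTokenLimit: 256,         // cap the length of the generated reply
    stopPhrases: new[] { "</s>" },     // stop early if any of these strings is produced
    temperature: 0.7f,                 // higher values add randomness; 0 is deterministic
    presencePenalty: 0.2f,             // discourage tokens that have already appeared at all
    frequencyPenalty: 0.3f);           // discourage tokens in proportion to how often they appeared
```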

See the picoLLM API docs for the full list of available parameters. Fine-tuning these parameters helps produce outputs that best suit your application's needs, whether for question answering, summarization, or dialogue generation.

6. Generate Text and Handle Streaming Output

Generate responses incrementally and process tokens in real-time using a streamCallback function. This is useful for displaying progressive outputs or streaming responses to a UI:
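A sketch assuming streamCallback takes a delegate that is invoked once per decoded token:

```csharp
// Print each token as soon as it is decoded so the user sees progress immediately.
PicoLLMCompletion result = picollm.Generate(
    dialog.Prompt(),
    streamCallback: token => Console.Write(token));

Console.WriteLine(); // finish the line once generation completes
dialog.AddLLMResponse(result.Completion);
```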

You can also interrupt ongoing generation if the user cancels or changes their input:
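A sketch assuming the engine exposes an Interrupt method; generation runs on a worker task (System.Threading.Tasks) so another thread remains free to cancel it:

```csharp
// Run generation in the background so the calling thread stays responsive.
Task<PicoLLMCompletion> generation = Task.Run(() =>
    picollm.Generate(
        dialog.Prompt(),
        streamCallback: token => Console.Write(token)));

// When the user cancels or submits a new prompt, stop decoding early.
picollm.Interrupt();

// Generate returns whatever had been produced before the interrupt.
PicoLLMCompletion partial = generation.Result;
```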

This ensures your on-device LLM inference is responsive and interactive for real-time applications.

7. Clean Up Resources

PicoLLM resources are eventually released by the garbage collector, but to free the model and its native resources immediately after use, wrap the instance in a using statement (or call Dispose() directly):
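For example, reusing the names assumed in the earlier sketches:

```csharp
using (PicoLLM picollm = PicoLLM.Create(accessKey, modelPath))
{
    PicoLLMCompletion result = picollm.Generate("Write a haiku about local inference.");
    Console.WriteLine(result.Completion);
} // Dispose() runs here, releasing the model and native resources right away
```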

C# Code Example: Local Text Generation Without Cloud APIs

Integrating picoLLM into a .NET application also requires a read-generate loop and, for responsive UIs, careful handling of the generation thread. Here is a compact console demo in C# that ties the previous steps together:
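The sketch below reuses the assumed API shape from the earlier steps (Pv namespace, PicoLLM.Create, GetDialog, AddHumanRequest, Generate with streamCallback, AddLLMResponse); the official demo linked below additionally handles interruption on a separate thread.

```csharp
using System;
using Pv;

class Program
{
    static void Main()
    {
        const string accessKey = "${ACCESS_KEY}";      // AccessKey from Picovoice Console
        const string modelPath = "path/to/model.pllm"; // downloaded .pllm model file

        // Dispose the engine deterministically when the loop ends.
        using (PicoLLM picollm = PicoLLM.Create(accessKey, modelPath))
        {
            PicoLLMDialog dialog = picollm.GetDialog();
            Console.WriteLine("Local LLM ready. Type a prompt (empty line to quit).");

            while (true)
            {
                Console.Write("\n> ");
                string prompt = Console.ReadLine();
                if (string.IsNullOrWhiteSpace(prompt))
                {
                    break;
                }

                // Add the user's turn, then generate while streaming tokens to the console.
                dialog.AddHumanRequest(prompt);
                PicoLLMCompletion result = picollm.Generate(
                    dialog.Prompt(),
                    completionTokenLimit: 256,
                    streamCallback: token => Console.Write(token));
                Console.WriteLine();

                // Record the reply so the next turn keeps the conversation context.
                dialog.AddLLMResponse(result.Completion);
            }
        }
    }
}
```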

This is a simplified example that includes only the essential code to get you started. To see a complete .NET application, check out the picoLLM On-Device LLM Inference .NET demo on GitHub.

Tips & Best Practices: LLM Inference in .NET

  • Prompt clarity: Provide clear and concise prompts to get the best results.
  • Context management: Keep track of conversation state for multi-turn interactions. If you're using picoLLM LLM Inference, you can configure this with the history parameter of GetDialog (see the sketch after this list).
  • Choosing the right chat template: Some models define multiple chat template modes. For example, phi-2 allows both qa and chat templates. Set the mode you wish to use in GetDialog for the best results.
  • Resource efficiency: Dispose of LLM instances when not needed to minimize memory use.
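The sketch below shows both settings together, assuming GetDialog exposes them as optional named parameters (history and mode, as described above); the accepted mode values depend on the model.

```csharp
// Limit the prompt to the four most recent exchanges and select the chat template.
// Pass mode only for models that define more than one template (e.g., phi-2's qa/chat).
PicoLLMDialog dialog = picollm.GetDialog(mode: "chat", history: 4);
```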

Common Issues & Troubleshooting

Model load errors

If a model fails to load, ensure that the .pllm file path is correct and accessible. Verify that your Picovoice AccessKey is valid and that the model version matches your picoLLM runtime.

Performance bottlenecks

Common bottlenecks include large prompt histories, oversized models, or running multiple inference sessions simultaneously on limited-core devices. Monitoring CPU and memory usage during runtime can help identify which part of your workflow is slowing down. Consider limiting context length or using lighter models for faster responses.

Incorrect or unexpected responses

If the model produces irrelevant or confusing output, check your prompt clarity and context history. Overly long or ambiguous prompts can confuse the model. Trimming unnecessary conversation history or refining prompts can improve response quality.

Enhance Your App with Speech Recognition

picoLLM LLM Inference can be combined with a speech recognition engine, such as Picovoice's Cheetah Streaming Speech-to-Text, to build fully voice-powered apps:
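A conceptual sketch of the glue code: TranscribeUtterance is a hypothetical placeholder for whichever speech-to-text engine you use, and the picoLLM calls mirror the earlier steps.

```csharp
// Hypothetical helper: capture microphone audio and return the recognized text.
string transcript = TranscribeUtterance();

// Feed the transcript to the local LLM as a normal chat turn.
dialog.AddHumanRequest(transcript);
PicoLLMCompletion reply = picollm.Generate(
    dialog.Prompt(),
    streamCallback: token => Console.Write(token));
dialog.AddLLMResponse(reply.Completion);

// Optionally pass reply.Completion to a text-to-speech engine to speak the answer.
```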

To see a complete voice assistant demo, check out the .NET LLM Voice Assistant on GitHub.


Frequently Asked Questions

What is on-device LLM inference in .NET?

On-device LLM inference allows a .NET application to run a local language model for text generation or chat without relying on cloud APIs. This ensures low latency, predictable performance, and full data privacy.

Which platforms support picoLLM for .NET?

picoLLM works on Windows, macOS, Linux, and Raspberry Pi (x86_64 and arm64), supporting desktop and embedded .NET applications.

Can I run picoLLM fully offline in .NET?

Yes. Once the model is downloaded, all processing happens locally on your device. Internet is required only for licensing and usage tracking.

Can picoLLM handle multi-turn conversations?

Yes. By using the PicoLLMDialog object, you can manage conversation history and maintain context across multiple prompts and responses.