Running an LLM locally inside a .NET application is increasingly valuable for developers who need private, low-latency AI features without relying on cloud APIs. In this tutorial, you will learn how to add on-device LLM inference to a .NET C# application using picoLLM On-Device LLM Inference, Picovoice's lightweight, cross-platform runtime for local text generation.
picoLLM is designed for scenarios where developers want fast inference, predictable performance, and full data privacy. All processing happens locally, and it runs in desktop (Windows, macOS, Linux), embedded (Raspberry Pi), and restricted-connectivity environments. This walkthrough shows how to initialize picoLLM in a .NET project, load a model, and run basic text generation using the .NET API.
By the end of the tutorial, you will have a minimal but complete implementation of on-device LLM inference in .NET.
Key Takeaways
- You can run LLMs fully locally in .NET using picoLLM with C#.
- Works on Windows, Linux, macOS, and Raspberry Pi.
- Text generation, chat, and instruct models run without cloud APIs.
How to Run Local LLM Inference in a .NET C# Project
Easily integrate on-device LLM inference into your .NET C# applications using picoLLM. Follow these steps to install, initialize, and run your first text generation model locally.
Prerequisites
- Windows (x86_64): .NET Framework 4.6.1+, .NET Standard 2.0+, or .NET Core 3.0+
- macOS (x86_64): .NET Standard 2.0+ or .NET Core 3.0+
- macOS (arm64), Windows (arm64), Linux (x86_64), Raspberry Pi (4, 5): .NET 6.0+
1. Install the NuGet Package
Add the official PicoLLM NuGet package to your .NET project to enable local LLM inference:
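Using the .NET CLI, the install looks like the following (the package ID is assumed to be PicoLLM; confirm the exact ID on NuGet):

```sh
dotnet add package PicoLLM
```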
This package provides all the APIs needed to load models and perform text generation and chat-based LLM inference on-device.
2. Download a picoLLM Model
Sign up for a Picovoice Console account for free and copy your AccessKey from the main dashboard.
Download a picoLLM model file (.pllm) from the picoLLM page. If you want to follow this guide for chat-based interactions, select a model that supports chat functionality.
3. Initialize the Local LLM Inference Engine
Initialize the LLM inference engine in your C# project using your AccessKey and the path to the downloaded model:
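The sketch below shows the general shape, assuming the binding lives in the Pv namespace and exposes a PicoLLM.Create factory like other Picovoice .NET SDKs; check the picoLLM .NET API reference for the exact signatures:

```csharp
using Pv;

// AccessKey from the Picovoice Console and the path to the downloaded .pllm model file.
const string accessKey = "${ACCESS_KEY}";
const string modelPath = "/path/to/model.pllm";

// Creates the on-device inference engine; from here on, prompts are processed entirely locally.
PicoLLM pllm = PicoLLM.Create(accessKey, modelPath);
```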
This sets up your local LLM inference engine, ready to process prompts without requiring cloud connectivity.
4. Create a Dialog and Add User Prompts
Use PicoLLMDialog to manage conversational context. Capture user input and send it to the model:
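Here is a sketch of a single chat turn, continuing from the snippet above. It assumes the dialog exposes AddHumanRequest, AddLLMResponse, and Prompt methods, and that Generate returns a PicoLLMCompletion with a Completion property, as described in the picoLLM API docs:

```csharp
// A dialog accumulates the conversation so chat-capable models see the full history.
PicoLLMDialog dialog = pllm.GetDialog();

// Add the user's turn; dialog.Prompt() renders the history with the model's chat template.
dialog.AddHumanRequest("Summarize the benefits of on-device inference in one sentence.");
PicoLLMCompletion result = pllm.Generate(dialog.Prompt());

// Record the model's reply so the next turn keeps the full context.
dialog.AddLLMResponse(result.Completion);
Console.WriteLine(result.Completion);
```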
This allows your model to maintain context across multiple turns, which is essential for chat-based LLM applications.
5. Configure Text Generation Parameters
Control the behavior of your LLM inference with parameters like temperature, completionTokenLimit, stopPhrases, presencePenalty, and frequencyPenalty. Adjust these to customize response creativity, length, and style:
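A sketch using the parameters named above; verify the exact parameter names and types against the .NET API reference:

```csharp
PicoLLMCompletion result = pllm.Generate(
    dialog.Prompt(),
    completionTokenLimit: 256,            // cap the length of the response
    stopPhrases: new string[] { "###" },  // stop early if any of these phrases is produced
    temperature: 0.7f,                    // higher = more varied, lower = more deterministic
    presencePenalty: 0.5f,                // discourage reusing tokens that already appeared
    frequencyPenalty: 0.5f);              // discourage frequent repetition of the same tokens
```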
See the picoLLM API docs for the full list of available parameters. Fine-tuning these parameters helps produce outputs that best suit your application's needs, whether for question answering, summarization, or dialogue generation.
6. Generate Text and Handle Streaming Output
Generate responses incrementally and process tokens in real time using a streamCallback function. This is useful for displaying progressive output or streaming responses to a UI:
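A sketch, assuming streamCallback accepts a delegate that receives each newly generated piece of text as a string:

```csharp
// The callback is invoked as each piece of text is produced, enabling progressive display.
PicoLLMCompletion result = pllm.Generate(
    dialog.Prompt(),
    completionTokenLimit: 256,
    streamCallback: token => Console.Write(token));

Console.WriteLine(); // finish the line once generation completes
```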
You can also interrupt ongoing generation if the user cancels or changes their input:
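Assuming the engine exposes an Interrupt method, as suggested by the picoLLM API docs:

```csharp
// Call from another thread (e.g., a UI event handler) to stop an in-flight Generate() call;
// Generate() then returns with whatever text was produced before the interruption.
pllm.Interrupt();
```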
This ensures your on-device LLM inference is responsive and interactive for real-time applications.
7. Clean Up Resources
PicoLLM's resources are eventually freed by the garbage collector, but to release them deterministically as soon as you are done, wrap the instance in a using statement (or call Dispose() directly):
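For example (a sketch reusing the accessKey and modelPath values from the earlier steps):

```csharp
using (PicoLLM pllm = PicoLLM.Create(accessKey, modelPath))
{
    PicoLLMCompletion result = pllm.Generate("What is the capital of France?");
    Console.WriteLine(result.Completion);
} // Dispose() runs here, releasing the model and native resources immediately
```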
C# Code Example: Local Text Generation Without Cloud APIs
Integrating picoLLM into a .NET application requires careful handling of the input loop and the generation thread. The console chat loop below shows one way to structure it:
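The following is an illustrative sketch rather than the official demo. It reuses the assumed API names from the earlier steps (PicoLLM.Create, GetDialog, Generate, Interrupt) and assumes Interrupt may be called from another thread while Generate is running:

```csharp
using System;
using System.Threading.Tasks;
using Pv;

class Program
{
    static void Main()
    {
        const string accessKey = "${ACCESS_KEY}";        // from the Picovoice Console
        const string modelPath = "/path/to/model.pllm";  // downloaded .pllm file

        using (PicoLLM pllm = PicoLLM.Create(accessKey, modelPath))
        {
            PicoLLMDialog dialog = pllm.GetDialog();

            // Map Ctrl+C to interrupting the current generation instead of killing the process.
            Console.CancelKeyPress += (sender, e) =>
            {
                e.Cancel = true;
                pllm.Interrupt();
            };

            while (true)
            {
                Console.Write("\n> ");
                string userInput = Console.ReadLine();
                if (string.IsNullOrWhiteSpace(userInput))
                {
                    break; // empty input ends the session
                }

                dialog.AddHumanRequest(userInput);

                // Run generation on a worker task so the main thread stays free to handle Ctrl+C.
                Task<PicoLLMCompletion> generation = Task.Run(() => pllm.Generate(
                    dialog.Prompt(),
                    completionTokenLimit: 256,
                    streamCallback: token => Console.Write(token)));

                PicoLLMCompletion result = generation.Result;
                dialog.AddLLMResponse(result.Completion);
                Console.WriteLine();
            }
        }
    }
}
```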
This is a simplified example that includes only the essential code to get you started. To see a complete .NET application, check out the picoLLM On-Device LLM Inference .NET demo on GitHub.
Tips & Best Practices: LLM Inference in .NET
- Prompt clarity: Provide clear and concise prompts to get the best results.
- Context management: Keep track of conversation state for multi-turn interactions. If you're using picoLLM LLM Inference, you can configure this with the history parameter of GetDialog.
- Choosing the right chat template: Some models define multiple chat template modes. For example, phi-2 allows both qa and chat templates. Set the mode you wish to use in GetDialog for the best results.
- Resource efficiency: Dispose of LLM instances when not needed to minimize memory use.
Common Issues & Troubleshooting
Model load errors
If a model fails to load, ensure that the .pllm file path is correct and accessible. Verify that your Picovoice AccessKey is valid and that the model version matches your picoLLM runtime.
Performance bottlenecks
Common bottlenecks include large prompt histories, oversized models, or running multiple inference sessions simultaneously on limited-core devices. Monitoring CPU and memory usage during runtime can help identify which part of your workflow is slowing down. Consider limiting context length or using lighter models for faster responses.
Incorrect or unexpected responses
If the model produces irrelevant or confusing output, check your prompt clarity and context history. Overly long or ambiguous prompts can confuse the model. Trimming unnecessary conversation history or refining prompts can improve response quality.
Enhance Your App with Speech Recognition
picoLLM LLM Inference can be combined with Picovoice speech engines to build fully voice-powered apps:
- Porcupine Wake Word: to trigger picoLLM only after a wake word.
- Rhino Speech-to-Intent: for structured command understanding.
- Cheetah Streaming Speech-to-Text: to convert user speech into text prompts for picoLLM.
- Orca Streaming Text-to-Speech: to speak picoLLM responses back to the user.
To see a complete voice assistant demo, check out the .NET LLM Voice Assistant on GitHub.
Frequently Asked Questions
What is on-device LLM inference in .NET?
On-device LLM inference allows a .NET application to run a local language model for text generation or chat without relying on cloud APIs. This ensures low latency, predictable performance, and full data privacy.
Which platforms does picoLLM support in .NET?
picoLLM works on Windows, macOS, Linux, and Raspberry Pi (x86_64 and ARM64), supporting desktop, mobile, and embedded .NET applications.
Does picoLLM need an internet connection?
Once the model is downloaded, all processing happens locally on your device. Internet is required only for licensing and usage tracking.
Can picoLLM maintain conversation context across turns?
Yes. By using the PicoLLMDialog object, you can manage conversation history and maintain context across multiple prompts and responses.







