Kernel Memory in .NET — Microsoft's Out-of-the-Box RAG Service Reviewed
Kernel Memory packages the whole "do RAG" workflow into a service you can run in-process or standalone. Useful at the start. Constraining when you outgrow it.
Microsoft Kernel Memory is the closest thing to "RAG-as-a-service" in the .NET world. It bundles ingestion, chunking, embedding, retrieval, and synthetic-memory pipelines behind a small API. I've used it on two projects. Here's the honest verdict.
In-process mode is the minimum-friction RAG starter:
```csharp
using Microsoft.KernelMemory;

// azureCfg is an AzureOpenAIConfig with your endpoint, deployment, and key
var memory = new KernelMemoryBuilder()
    .WithAzureOpenAITextGeneration(azureCfg)
    .WithAzureOpenAITextEmbeddingGeneration(azureCfg)
    .WithPostgresMemoryDb(connStr)   // Microsoft.KernelMemory.Postgres, needs pgvector
    .Build<MemoryServerless>();

await memory.ImportDocumentAsync("policy.pdf");
var answer = await memory.AskAsync("What's the refund window?");
Console.WriteLine(answer.Result);
```
That's a working RAG pipeline. From scratch, in fifteen minutes, with PDF ingestion and Postgres + pgvector storage. Solid.
Service mode (Microsoft.KernelMemory.Service.AspNetCore) runs Kernel Memory as a separate process you call over HTTP. Useful when multiple apps share the same knowledge base, or when ingestion is heavy and you want it isolated from your API.
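Calling the service from another app goes through the client package, which exposes the same interface over HTTP. A minimal sketch, assuming the MemoryWebClient client and the service's default local port; adjust the URL and API-key handling to your deployment:

```csharp
using Microsoft.KernelMemory;

// MemoryWebClient implements the same IKernelMemory interface as the
// serverless build, so in-process code ports over almost unchanged.
var memory = new MemoryWebClient("http://127.0.0.1:9001");

await memory.ImportDocumentAsync("policy.pdf");
var answer = await memory.AskAsync("What's the refund window?");
Console.WriteLine(answer.Result);
```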
What's good:
- The pipeline handlers are pluggable. You can write a custom chunker, a custom extractor, your own embedding step, and slot them in. The architecture is open (see the handler sketch after this list).
- PDF/Office/markdown ingestion is built in. You don't have to wire up iText or PdfPig yourself.
- The synthetic memory concept (LLM-generated summaries stored alongside chunks) is a real win on long documents: the model gets to query summaries first.
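To make that pluggability concrete, here's roughly what a custom pipeline step looks like. The handler interface has shifted across Kernel Memory releases, so treat the exact signature below as an assumption to verify against the version you're running:

```csharp
using System.Threading;
using System.Threading.Tasks;
using Microsoft.KernelMemory.Pipeline;

// Skeleton of a custom step you could slot in place of the built-in
// partitioning handler. The (ReturnType, DataPipeline) tuple is the shape
// recent packages use; older versions returned a bool.
public sealed class MyChunkingHandler : IPipelineStepHandler
{
    public string StepName => "my_chunking";

    public Task<(ReturnType returnType, DataPipeline updatedPipeline)> InvokeAsync(
        DataPipeline pipeline, CancellationToken cancellationToken = default)
    {
        // Inspect pipeline.Files, produce your own partitions, attach them
        // as generated files, then pass the pipeline to the next step.
        return Task.FromResult((ReturnType.Success, pipeline));
    }
}
```

Registration differs by mode (serverless builds attach handlers through the orchestrator; service mode registers them at host startup), so check the docs for the hosting model you picked.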
What constrains you:
- The retrieval pipeline is opinionated. Want to do hybrid search with BM25 + vectors + custom RRF? You're fighting the framework. Want a cross-encoder rerank step from Cohere? Custom handler.
- The chunking strategies are limited compared to writing your own. For specialised corpora (code, legal, transcripts), the built-in chunkers aren't enough.
- Performance debugging is harder. The pipeline abstraction hides where time goes. Add OpenTelemetry early (a minimal wiring sketch follows this list).
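On that last point, a minimal sketch, assuming the OpenTelemetry and OpenTelemetry.Exporter.Console packages and the memory instance from earlier; the source and span names are mine, not Kernel Memory's:

```csharp
using System.Diagnostics;
using OpenTelemetry;
using OpenTelemetry.Trace;

// One ActivitySource for your RAG layer; export spans to the console
// while developing, to OTLP in production.
var source = new ActivitySource("MyApp.Rag");

using var tracer = Sdk.CreateTracerProviderBuilder()
    .AddSource("MyApp.Rag")
    .AddConsoleExporter()
    .Build();

using (var activity = source.StartActivity("rag.import"))
{
    await memory.ImportDocumentAsync("policy.pdf");
}

using (var activity = source.StartActivity("rag.ask"))
{
    var answer = await memory.AskAsync("What's the refund window?");
    activity?.SetTag("rag.answer.length", answer.Result.Length);
}
```

Even this coarse split (import vs. ask) tells you whether your latency problem is ingestion or retrieval before you start digging into individual pipeline steps.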
When to use Kernel Memory:
- You're starting a new RAG project and you want the boring 80% paved.
- Your corpus is general-purpose (mixed PDFs, markdown, web pages).
- You're going to ship in a sprint and improve later.
When to skip it and build your own pipeline with raw Semantic Kernel:
- You already know your retrieval will need hybrid + rerank + adaptive routing (see the fusion sketch after this list).
- Your documents have a structure you want to exploit (medical records, legal contracts, code).
- You care about every 50ms of latency.
- You're operating at a scale where you'll need to swap pieces independently.
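To make the hybrid-retrieval point concrete: once you own the retrieval loop, reciprocal rank fusion is about a dozen lines. A sketch, assuming the two inputs are document IDs already ranked by your BM25 and vector searches:

```csharp
using System.Collections.Generic;
using System.Linq;

// Reciprocal rank fusion: score(d) = sum over lists of 1 / (k + rank(d)),
// with k = 60 as the conventional smoothing constant.
static List<string> ReciprocalRankFusion(
    IReadOnlyList<string> bm25Ranked,
    IReadOnlyList<string> vectorRanked,
    int k = 60)
{
    var scores = new Dictionary<string, double>();
    foreach (var ranked in new[] { bm25Ranked, vectorRanked })
    {
        for (int rank = 0; rank < ranked.Count; rank++)
        {
            scores.TryGetValue(ranked[rank], out var current);
            scores[ranked[rank]] = current + 1.0 / (k + rank + 1);
        }
    }
    return scores.OrderByDescending(pair => pair.Value)
                 .Select(pair => pair.Key)
                 .ToList();
}
```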
The pattern I've landed on: start with Kernel Memory, get to a working baseline, measure on a real eval set. If the baseline meets the bar, ship it. If not, replace components — first the chunker, then the retriever, then the prompt template. Eventually most of Kernel Memory is gone and you've got a custom Semantic Kernel pipeline. That graduation is fine and expected. Don't fight it by trying to make Kernel Memory do everything from day one.
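One habit that makes the graduation cheap: put your own interfaces in front of the pieces you expect to replace, from day one. The names below are hypothetical (mine, not Kernel Memory's or Semantic Kernel's); the point is the seam, not the exact shape:

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical seams for the components that get replaced first. Start
// with adapters that delegate to Kernel Memory, then swap implementations
// one at a time as your eval set exposes weaknesses.
public interface IChunker
{
    IReadOnlyList<string> Chunk(string document);
}

public interface IRetriever
{
    Task<IReadOnlyList<string>> RetrieveAsync(string query, int topK);
}

public interface IAnswerer
{
    Task<string> AnswerAsync(string query, IReadOnlyList<string> context);
}
```

The first implementations just forward to Kernel Memory; the last one standing is usually a hand-rolled Semantic Kernel prompt.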