Kernel Memory in .NET — Microsoft's Out-of-the-Box RAG Service Reviewed
Kernel Memory packages the whole "do RAG" workflow into a service you can run in-process or standalone. Useful at the start. Constraining when you outgrow it.
Microsoft Kernel Memory is the closest thing to "RAG-as-a-service" in the .NET world. It bundles ingestion, chunking, embedding, retrieval, and synthetic-memory pipelines behind a small API. I've used it on two projects. Here's the honest verdict.
In-process mode is the minimum-friction RAG starter:
```csharp
using Microsoft.KernelMemory;

// azureCfg is an AzureOpenAIConfig with your endpoint, deployment, and key
var memory = new KernelMemoryBuilder()
    .WithAzureOpenAITextGeneration(azureCfg)
    .WithAzureOpenAITextEmbeddingGeneration(azureCfg)
    .WithPostgresMemoryDb(connStr)   // Microsoft.KernelMemory.Postgres, needs pgvector
    .Build<MemoryServerless>();

await memory.ImportDocumentAsync("policy.pdf");
var answer = await memory.AskAsync("What's the refund window?");
Console.WriteLine(answer.Result);
```
That's a working RAG pipeline. From scratch, in fifteen minutes, with PDF ingestion and Postgres + pgvector storage. Solid.
Service mode (Microsoft.KernelMemory.Service.AspNetCore) runs Kernel Memory as a separate process you call over HTTP. Useful when multiple apps share the same knowledge base, or when ingestion is heavy and you want it isolated from your API.
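Calling the service from another app goes through the client package, which exposes the same interface over HTTP. A minimal sketch, assuming the MemoryWebClient client and the service's default local port; adjust the URL and API-key handling to your deployment:

```csharp
using Microsoft.KernelMemory;

// MemoryWebClient implements the same IKernelMemory interface as the
// serverless build, so in-process code ports over almost unchanged.
var memory = new MemoryWebClient("http://127.0.0.1:9001");

await memory.ImportDocumentAsync("policy.pdf");
var answer = await memory.AskAsync("What's the refund window?");
Console.WriteLine(answer.Result);
```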
What's good:
- The pipeline handlers are pluggable. You can write a custom chunker, a custom extractor, your own embedding step, and slot them in. The architecture is open (see the handler sketch after this list).
- PDF/Office/markdown ingestion is built in. You don't have to wire up iText or PdfPig yourself.
- The synthetic memory concept (LLM-generated summaries stored alongside chunks) is a real win on long documents: the model gets to query summaries first.
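To make that pluggability concrete, here's roughly what a custom pipeline step looks like. The handler interface has shifted across Kernel Memory releases, so treat the exact signature below as an assumption to verify against the version you're running:

```csharp
using System.Threading;
using System.Threading.Tasks;
using Microsoft.KernelMemory.Pipeline;

// Skeleton of a custom step you could slot in place of the built-in
// partitioning handler. The (ReturnType, DataPipeline) tuple is the shape
// recent packages use; older versions returned a bool.
public sealed class MyChunkingHandler : IPipelineStepHandler
{
    public string StepName => "my_chunking";

    public Task<(ReturnType returnType, DataPipeline updatedPipeline)> InvokeAsync(
        DataPipeline pipeline, CancellationToken cancellationToken = default)
    {
        // Inspect pipeline.Files, produce your own partitions, attach them
        // as generated files, then pass the pipeline to the next step.
        return Task.FromResult((ReturnType.Success, pipeline));
    }
}
```

Registration differs by mode (serverless builds attach handlers through the orchestrator; service mode registers them at host startup), so check the docs for the hosting model you picked.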
What constrains you:
- The retrieval pipeline is opinionated. Want to do hybrid search with BM25 + vectors + custom RRF? You're fighting the framework. Want a cross-encoder rerank step from Cohere? Custom handler.
- The chunking strategies are limited compared to writing your own. For specialised corpora (code, legal, transcripts), the built-in chunkers aren't enough.
- Performance debugging is harder. The pipeline abstraction hides where time goes. Add OpenTelemetry early (a minimal wiring sketch follows this list).
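On that last point, a minimal sketch, assuming the OpenTelemetry and OpenTelemetry.Exporter.Console packages and the memory instance from earlier; the source and span names are mine, not Kernel Memory's:

```csharp
using System.Diagnostics;
using OpenTelemetry;
using OpenTelemetry.Trace;

// One ActivitySource for your RAG layer; export spans to the console
// while developing, to OTLP in production.
var source = new ActivitySource("MyApp.Rag");

using var tracer = Sdk.CreateTracerProviderBuilder()
    .AddSource("MyApp.Rag")
    .AddConsoleExporter()
    .Build();

using (var activity = source.StartActivity("rag.import"))
{
    await memory.ImportDocumentAsync("policy.pdf");
}

using (var activity = source.StartActivity("rag.ask"))
{
    var answer = await memory.AskAsync("What's the refund window?");
    activity?.SetTag("rag.answer.length", answer.Result.Length);
}
```

Even this coarse split (import vs. ask) tells you whether your latency problem is ingestion or retrieval before you start digging into individual pipeline steps.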
When to use Kernel Memory:
- You're starting a new RAG project and you want the boring 80% paved.
- Your corpus is general-purpose (mixed PDFs, markdown, web pages).
- You're going to ship in a sprint and improve later.
When to skip it and build your own pipeline with raw Semantic Kernel:
- You already know your retrieval will need hybrid + rerank + adaptive routing (see the fusion sketch after this list).
- Your documents have a structure you want to exploit (medical records, legal contracts, code).
- You care about every 50ms of latency.
- You're operating at a scale where you'll need to swap pieces independently.
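To make the hybrid-retrieval point concrete: once you own the retrieval loop, reciprocal rank fusion is about a dozen lines. A sketch, assuming the two inputs are document IDs already ranked by your BM25 and vector searches:

```csharp
using System.Collections.Generic;
using System.Linq;

// Reciprocal rank fusion: score(d) = sum over lists of 1 / (k + rank(d)),
// with k = 60 as the conventional smoothing constant.
static List<string> ReciprocalRankFusion(
    IReadOnlyList<string> bm25Ranked,
    IReadOnlyList<string> vectorRanked,
    int k = 60)
{
    var scores = new Dictionary<string, double>();
    foreach (var ranked in new[] { bm25Ranked, vectorRanked })
    {
        for (int rank = 0; rank < ranked.Count; rank++)
        {
            scores.TryGetValue(ranked[rank], out var current);
            scores[ranked[rank]] = current + 1.0 / (k + rank + 1);
        }
    }
    return scores.OrderByDescending(pair => pair.Value)
                 .Select(pair => pair.Key)
                 .ToList();
}
```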
The pattern I've landed on: start with Kernel Memory, get to a working baseline, measure on a real eval set. If the baseline meets the bar, ship it. If not, replace components — first the chunker, then the retriever, then the prompt template. Eventually most of Kernel Memory is gone and you've got a custom Semantic Kernel pipeline. That graduation is fine and expected. Don't fight it by trying to make Kernel Memory do everything from day one.
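One habit that makes the graduation cheap: put your own interfaces in front of the pieces you expect to replace, from day one. The names below are hypothetical (mine, not Kernel Memory's or Semantic Kernel's); the point is the seam, not the exact shape:

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical seams for the components that get replaced first. Start
// with adapters that delegate to Kernel Memory, then swap implementations
// one at a time as your eval set exposes weaknesses.
public interface IChunker
{
    IReadOnlyList<string> Chunk(string document);
}

public interface IRetriever
{
    Task<IReadOnlyList<string>> RetrieveAsync(string query, int topK);
}

public interface IAnswerer
{
    Task<string> AnswerAsync(string query, IReadOnlyList<string> context);
}
```

The first implementations just forward to Kernel Memory; the last one standing is usually a hand-rolled Semantic Kernel prompt.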