
GraphRAG in .NET — When Vector Search Can't Reason Across Documents

Vector search is great at "find me a relevant chunk." It's bad at "find me chunks that mention X and Y in a particular relationship." That's what GraphRAG is for.

If your knowledge base has questions like "which engineers worked on projects that shipped in Q3 and used Azure?", no amount of vector search will save you. The answer isn't in one chunk. It's in the connections between chunks. That's the case for GraphRAG.

The idea: during ingestion, extract entities (people, projects, technologies, dates) and the relationships between them. Store them as a graph. At query time, parse the question into a graph traversal and run it.

The ingest step is where the LLM earns its keep:

var extractor = new ChatCompletionAgent
{
    Instructions = """
        Extract entities and relationships from the text.
        Respond with JSON only — no prose, no code fences:
        {
          "entities": [{"id": "...", "type": "Person|Project|Tech|Date", "name": "..."}],
          "relations": [{"from": "id", "to": "id", "type": "WORKED_ON|USED|SHIPPED_IN"}]
        }
        """,
    Kernel = kernel
};

await foreach (var msg in extractor.InvokeAsync(chunkText))
{
    // Guard against empty or malformed model output before touching the graph.
    var extract = JsonSerializer.Deserialize<GraphExtract>(msg.Content ?? "{}");
    if (extract is not null)
        StoreInGraph(extract);
}
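The deserialization above assumes a GraphExtract type mirroring the prompt's JSON contract. A minimal sketch — the type and property names here are illustrative, not from any SK package:

```csharp
using System.Text.Json.Serialization;

// DTOs matching the extractor prompt's JSON schema. Adjust the
// JsonPropertyName attributes if your prompt uses different keys.
public sealed record GraphEntity(
    [property: JsonPropertyName("id")] string Id,
    [property: JsonPropertyName("type")] string Type,   // Person | Project | Tech | Date
    [property: JsonPropertyName("name")] string Name);

public sealed record GraphRelation(
    [property: JsonPropertyName("from")] string FromId,
    [property: JsonPropertyName("to")] string ToId,
    [property: JsonPropertyName("type")] string Type);  // WORKED_ON | USED | SHIPPED_IN

public sealed record GraphExtract(
    [property: JsonPropertyName("entities")] List<GraphEntity> Entities,
    [property: JsonPropertyName("relations")] List<GraphRelation> Relations);
```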

Storage: Neo4j is the natural fit. If you don't want to run another database, Postgres (with the pgvector extension for the embedding column) handles a graph schema fine:

CREATE TABLE entities (id TEXT PRIMARY KEY, type TEXT, name TEXT, embedding VECTOR(1536));
CREATE TABLE relations (id BIGSERIAL PRIMARY KEY,
                        from_id TEXT REFERENCES entities(id),
                        to_id TEXT REFERENCES entities(id),
                        type TEXT, source_chunk_id TEXT);
CREATE INDEX ON entities USING ivfflat (embedding vector_cosine_ops);
CREATE INDEX ON relations (from_id, type);
CREATE INDEX ON relations (to_id, type);  -- traversals below walk relations backward

Note the embedding column on entities — useful for fuzzy matching ("Lyndsey" vs "Lindsay") at query time.
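With that column in place, fuzzy entity resolution is one nearest-neighbour query. A sketch using Npgsql with the Pgvector.Npgsql plugin — the data-source setup and the EmbedAsync helper are assumptions standing in for your own wiring:

```csharp
using Npgsql;
using Pgvector;

// Resolve a free-text mention ("Lyndsey") to the closest stored entity id.
// Assumes the NpgsqlDataSource was built with .UseVector() (Pgvector.Npgsql)
// and that EmbedAsync wraps whatever embedding model you used at ingest.
async Task<string?> ResolveEntityAsync(NpgsqlDataSource db, string mention)
{
    Vector queryVec = await EmbedAsync(mention);  // hypothetical helper

    await using var cmd = db.CreateCommand(
        """
        SELECT id FROM entities
        ORDER BY embedding <=> $1   -- pgvector cosine distance
        LIMIT 1
        """);
    cmd.Parameters.AddWithValue(queryVec);

    return (string?)await cmd.ExecuteScalarAsync();
}
```

In practice you'd also threshold the distance so "Lyndsey" doesn't silently match an unrelated entity when no good candidate exists.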

At query time, the agent translates the question into a traversal:

flowchart LR
    Q["Which engineers worked on<br/>projects that used Azure?"] --> P[Parse question]
    P --> S1["Find entities matching 'Azure'"]
    S1 --> S2[Walk USED relations backward<br/>to Projects]
    S2 --> S3[Walk WORKED_ON relations backward<br/>to Engineers]
    S3 --> A[Return engineer list with citations]
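On the Postgres schema above, those two backward hops collapse into a pair of joins. A sketch — column names come from the schema; the Azure entity id is assumed to be resolved already (e.g. by the fuzzy-match step):

```csharp
using Npgsql;

// Two backward hops: Tech <--USED-- Project <--WORKED_ON-- Engineer.
// azureId is the resolved entity id for "Azure".
async Task<List<(string Engineer, string ChunkId)>> EngineersOnProjectsUsingAsync(
    NpgsqlDataSource db, string azureId)
{
    await using var cmd = db.CreateCommand(
        """
        SELECT e.name, worked.source_chunk_id
        FROM relations used
        JOIN relations worked ON worked.to_id = used.from_id  -- same project
                             AND worked.type = 'WORKED_ON'
        JOIN entities e       ON e.id = worked.from_id        -- the engineer
        WHERE used.to_id = $1 AND used.type = 'USED'
        """);
    cmd.Parameters.AddWithValue(azureId);

    var results = new List<(string, string)>();
    await using var reader = await cmd.ExecuteReaderAsync();
    while (await reader.ReadAsync())
        results.Add((reader.GetString(0), reader.GetString(1)));
    return results;
}
```

Note the source_chunk_id comes back with every row — that's what lets the agent cite the chunk each edge was extracted from.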

Compared to vector RAG on a relationship-heavy query:

Approach              Result on "Which engineers worked on Azure projects in Q3?"
Vector RAG (top 10)   4 unrelated project mentions; the one right answer buried at rank 8
GraphRAG              5 engineers, each with citing chunk IDs

Two things to know before you commit:

Ingest is slow. Every chunk goes through an LLM-as-extractor. A 200-page corpus that took 2 minutes to embed will take 20-40 minutes to extract entities for. Plan for batch processing, not real-time ingestion.

Schema drift is real. The LLM will sometimes invent relationship types ("CONTRIBUTED_TO" when you meant "WORKED_ON"). Either constrain the extractor with a fixed enum in the prompt, or normalise after the fact. Don't let the graph schema be whatever the LLM felt like that day.
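Normalising after the fact can be as simple as a synonym map with a reject bucket. The mappings here are illustrative — build yours from the drift you actually observe:

```csharp
// Canonical relation types plus known LLM synonyms. Anything unmapped
// returns null, so unknown types get dropped (or logged) instead of
// silently becoming new schema.
static readonly Dictionary<string, string> RelationSynonyms =
    new(StringComparer.OrdinalIgnoreCase)
{
    ["WORKED_ON"]      = "WORKED_ON",
    ["CONTRIBUTED_TO"] = "WORKED_ON",
    ["BUILT"]          = "WORKED_ON",
    ["USED"]           = "USED",
    ["USES"]           = "USED",
    ["SHIPPED_IN"]     = "SHIPPED_IN",
    ["RELEASED_IN"]    = "SHIPPED_IN",
};

static string? NormaliseRelation(string raw) =>
    RelationSynonyms.TryGetValue(raw.Trim(), out var canonical) ? canonical : null;
```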

When to use GraphRAG: knowledge bases where the relationships are the value. Project lineage, org charts, supply-chain data, contract networks. Not customer support docs. Not product documentation. Those don't need graphs; vector search handles them.

The honest summary: GraphRAG is real and powerful for the right corpus. It's also significantly more work than vector RAG. Don't reach for it because it sounds smarter. Reach for it when your evals show relationships are the gap.