Last updated January 14, 2026
Graph RAG represents a fundamental evolution in retrieval-augmented generation by replacing flat vector similarity with structured knowledge traversal. Instead of treating documents as isolated chunks, Graph RAG extracts entities and relationships to build a queryable knowledge graph that enables multi-hop reasoning across interconnected concepts.
Traditional RAG hits fundamental limitations when reasoning requires connecting multiple facts
Standard RAG architectures retrieve document chunks based on semantic similarity, then pass them directly to a language model. This approach works well for single-fact queries where the answer exists verbatim in one passage. But real enterprise questions often require synthesizing information scattered across many documents.
Consider asking a RAG system: “Which suppliers have contracts expiring within 90 days that also had quality issues last quarter?” A vector search might retrieve some contract documents and some quality reports, but connecting the dots between specific suppliers, their contracts, and their quality incidents requires explicit relationship tracking.
The original RAG paper by Lewis et al. (2020) demonstrated strong results on knowledge-intensive NLP tasks, but retrieving independent passages leaves the model to stitch multi-step reasoning chains together on its own. Subsequent research from Microsoft targeted this gap with the GraphRAG approach published in 2024, which applies hierarchical community detection to an LLM-extracted entity graph and summarizes each community to answer corpus-wide questions (Microsoft Research, 2024).
Graph RAG addresses these limitations by constructing a knowledge graph during indexing. Entities extracted from documents become nodes. Relationships between entities become edges. At query time, the system traverses this graph to gather contextually relevant information that might never appear in the same chunk.
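To make the node-and-edge picture concrete, here is a minimal sketch that uses plain Python dictionaries instead of a graph database. The supplier, contract, and incident names are invented for illustration, and `build_graph` and `neighbors_within` are hypothetical helpers, not part of any library.

```python
from collections import defaultdict, deque

# Hypothetical triplets, as an extraction pipeline might emit them.
TRIPLETS = [
    ("Supplier A", "HAS_CONTRACT", "Contract 17"),
    ("Supplier A", "HAD_INCIDENT", "Incident 42"),
    ("Supplier B", "HAS_CONTRACT", "Contract 18"),
    ("Contract 17", "EXPIRES_IN", "Q1"),
]

def build_graph(triplets):
    """Entities become nodes; each relation is stored as an undirected adjacency entry."""
    graph = defaultdict(set)
    for subject, predicate, obj in triplets:
        graph[subject].add((predicate, obj))
        graph[obj].add((predicate, subject))
    return graph

def neighbors_within(graph, start, max_hops):
    """Breadth-first traversal: map each reachable node to its hop distance."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for _predicate, other in graph[node]:
            if other not in seen:
                seen[other] = seen[node] + 1
                queue.append(other)
    return seen

graph = build_graph(TRIPLETS)
reachable = neighbors_within(graph, "Incident 42", max_hops=3)
print(reachable)
```

Starting from the incident, the traversal reaches the supplier, its contract, and the expiry quarter, even though those facts came from separate triplets and may never share a chunk.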
Entity extraction transforms unstructured text into graph-ready knowledge
The first stage of Graph RAG pipelines involves identifying entities within documents. Named Entity Recognition (NER) models detect people, organizations, locations, dates, products, and domain-specific entities. Modern approaches use transformer-based models fine-tuned on domain corpora.
from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline

# Load a pre-trained NER model.
# Note: dslim/bert-base-NER is trained on CoNLL-2003 and emits only PER, ORG, LOC, and MISC,
# so dates, amounts, and domain-specific entities need a different or fine-tuned model.
tokenizer = AutoTokenizer.from_pretrained("dslim/bert-base-NER")
model = AutoModelForTokenClassification.from_pretrained("dslim/bert-base-NER")
ner_pipeline = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")

def extract_entities(text: str) -> list[dict]:
    """
    Extract named entities from text using transformer-based NER.
    Returns list of entities with type and position.
    """
    raw_entities = ner_pipeline(text)
    entities = []
    for ent in raw_entities:
        entities.append({
            "text": ent["word"],
            "type": ent["entity_group"],
            "confidence": float(ent["score"]),  # plain float instead of a NumPy scalar
            "start": ent["start"],
            "end": ent["end"]
        })
    return entities

# Example extraction
document = """
Acme Corporation signed a $2.5M contract with GlobalTech on January 15, 2026.
The agreement covers cloud infrastructure services for their Paris datacenter.
CEO John Smith announced the partnership at the Tech Summit in Berlin.
"""
entities = extract_entities(document)
for ent in entities:
    print(f"{ent['type']}: {ent['text']} (confidence: {ent['confidence']:.2f})")
Beyond basic NER, production Graph RAG systems implement coreference resolution to link pronouns and references back to their antecedents. When a document mentions “the company” or “they,” the system must determine which entity is being referenced. The neuralcoref extension for spaCy (which supports only spaCy 2.x) or more recent transformer-based coreference models handle this disambiguation.
Entity linking then maps extracted mentions to canonical entities in the knowledge graph. “Acme Corp,” “Acme Corporation,” and “ACME” should all resolve to the same graph node. This normalization prevents graph fragmentation and enables accurate traversal.
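A minimal sketch of this normalization step follows, assuming that simple string canonicalization is enough to merge aliases. The suffix list, the `canonical_key` function, and the `EntityRegistry` class are all hypothetical; production systems typically add fuzzy matching or embedding similarity on top.

```python
import re

# Hypothetical suffix list; a real system would tune this per domain.
LEGAL_SUFFIXES = {"corp", "corporation", "inc", "incorporated", "ltd", "llc", "co"}

def canonical_key(mention: str) -> str:
    """Lowercase, drop punctuation, and remove trailing legal suffixes."""
    tokens = re.sub(r"[^\w\s]", " ", mention.lower()).split()
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)

class EntityRegistry:
    """Maps surface mentions to a single canonical node ID."""

    def __init__(self):
        self._ids = {}      # canonical key -> node ID
        self.aliases = {}   # node ID -> set of surface forms seen so far

    def resolve(self, mention: str) -> str:
        key = canonical_key(mention)
        # Reuse the existing node ID for a known key, otherwise mint a new one.
        node_id = self._ids.setdefault(key, f"ent_{len(self._ids)}")
        self.aliases.setdefault(node_id, set()).add(mention)
        return node_id

registry = EntityRegistry()
ids = [registry.resolve(m) for m in ["Acme Corp", "Acme Corporation", "ACME", "GlobalTech"]]
print(ids)
```

All three Acme variants resolve to one node ID, while GlobalTech receives its own. Keeping the observed aliases on the node also helps later query-time linking.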
Relation extraction identifies connections between entities
Once entities are identified, relation extraction determines how they connect. This transforms text like “Acme Corporation signed a contract with GlobalTech” into a structured triplet: (Acme Corporation, SIGNED_CONTRACT_WITH, GlobalTech).
Modern relation extraction approaches fall into two categories. Pipeline approaches first extract entities, then classify the relationship between each entity pair. Joint extraction models simultaneously identify entities and their relationships, often achieving better accuracy through shared representations.
from typing import NamedTuple
from openai import OpenAI

class Triplet(NamedTuple):
    subject: str
    predicate: str
    object: str
    confidence: float

def extract_relations_llm(text: str, entities: list[dict]) -> list[Triplet]:
    """
    Extract relationships between entities using an LLM.
    This approach leverages the model's reasoning for complex relations.
    """
    client = OpenAI()
    entity_list = ", ".join([e["text"] for e in entities])
    prompt = f"""Extract all relationships between the following entities found in the text.
Entities: {entity_list}
Text: {text}
For each relationship, output in format:
SUBJECT | PREDICATE | OBJECT | CONFIDENCE
Use clear, normalized predicates like:
- WORKS_FOR
- LOCATED_IN
- SIGNED_CONTRACT_WITH
- ANNOUNCED
- PARTNERED_WITH
Output only the relationships, one per line."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.1
    )
    triplets = []
    for line in response.choices[0].message.content.strip().split("\n"):
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 4:
            try:
                confidence = float(parts[3])
            except ValueError:
                continue  # skip header rows or malformed confidence values
            triplets.append(Triplet(
                subject=parts[0],
                predicate=parts[1],
                object=parts[2],
                confidence=confidence
            ))
    return triplets
# Alternative: Use dedicated RE models for better control
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load the model once at import time rather than on every call
rebel_tokenizer = AutoTokenizer.from_pretrained("Babelscape/rebel-large")
rebel_model = AutoModelForSeq2SeqLM.from_pretrained("Babelscape/rebel-large")

def extract_relations_rebel(text: str) -> list[Triplet]:
    """
    Extract relations using the REBEL model (Relation Extraction By End-to-end Language generation).
    REBEL was trained on Wikipedia and Wikidata relations.
    Reference: Huguet Cabot and Navigli, 2021
    """
    inputs = rebel_tokenizer(text, return_tensors="pt", max_length=512, truncation=True)
    outputs = rebel_model.generate(**inputs, max_length=256, num_beams=5)
    # Keep special tokens: the <triplet>, <subj>, and <obj> markers delimit each relation
    decoded = rebel_tokenizer.decode(outputs[0], skip_special_tokens=False)
    # Parse REBEL's output format. parse_rebel_output is a helper that must be
    # supplied separately; it turns the marker-delimited string into Triplet objects.
    triplets = parse_rebel_output(decoded)
    return triplets
The REBEL model from Babelscape (Huguet Cabot and Navigli, 2021) provides a competitive open-source option for relation extraction. Trained on Wikipedia with Wikidata supervision, it generates triplets directly in a seq2seq fashion. For domain-specific relations, fine-tuning on annotated examples from your corpus improves extraction quality.
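The `parse_rebel_output` helper called in the REBEL example is not defined in this article. A minimal sketch follows, assuming the linearized format described on the REBEL model card, in which each relation is emitted as `<triplet> head <subj> tail <obj> relation`. The `Triplet` type is redefined here only to keep the block self-contained, and the fixed confidence of 1.0 is a placeholder, since the decoded string carries no score.

```python
from typing import NamedTuple

class Triplet(NamedTuple):
    subject: str
    predicate: str
    object: str
    confidence: float

def parse_rebel_output(decoded: str) -> list[Triplet]:
    """Parse '<triplet> head <subj> tail <obj> relation' sequences into Triplets."""
    for special in ("<s>", "</s>", "<pad>"):
        decoded = decoded.replace(special, " ")
    triplets = []
    head, tail, relation = [], [], []
    target = None  # the list that incoming tokens are appended to

    def flush():
        # Emit a triplet only when all three parts have been collected.
        if head and tail and relation:
            triplets.append(Triplet(" ".join(head), " ".join(relation), " ".join(tail), 1.0))

    for token in decoded.split():
        if token == "<triplet>":      # a new head entity starts
            flush()
            head, tail, relation = [], [], []
            target = head
        elif token == "<subj>":       # a tail entity for the current head
            flush()
            tail, relation = [], []
            target = tail
        elif token == "<obj>":        # the relation label follows
            relation = []
            target = relation
        elif target is not None:
            target.append(token)
    flush()
    return triplets

sample = ("<s><triplet> Acme Corporation <subj> GlobalTech <obj> partner "
          "<subj> Paris <obj> location <triplet> John Smith <subj> "
          "Acme Corporation <obj> employer</s>")
parsed = parse_rebel_output(sample)
for t in parsed:
    print(t)
```

The sample string is hand-written to exercise the parser, including a head entity that carries two relations; it is not actual model output.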
Graph databases store and query extracted knowledge efficiently
The extracted entities and relations need persistent storage that supports efficient traversal queries. Graph databases excel at this task because they represent relationships as first-class citizens rather than foreign key joins.
Neo4j remains the most widely deployed graph database, offering the Cypher query language optimized for path traversal. For teams already running PostgreSQL, Apache AGE provides graph capabilities as an extension. Amazon Neptune and Azure Cosmos DB offer managed graph services in cloud environments.
import re
from typing import Optional

from neo4j import GraphDatabase

# Labels and relationship types cannot be passed as Cypher parameters,
# so they are validated before being interpolated into a query string.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def _safe_identifier(name: str) -> str:
    if not _IDENTIFIER.match(name):
        raise ValueError(f"Unsafe Cypher identifier: {name!r}")
    return name

class KnowledgeGraph:
    """
    Knowledge graph interface for Graph RAG using Neo4j.
    Handles entity storage, relation creation, and subgraph retrieval.
    """

    def __init__(self, uri: str, user: str, password: str):
        self.driver = GraphDatabase.driver(uri, auth=(user, password))

    def close(self):
        """Release the driver's connection pool."""
        self.driver.close()

    def add_entity(self, entity_id: str, entity_type: str, properties: dict):
        """Add or update an entity node in the graph."""
        label = _safe_identifier(entity_type)
        with self.driver.session() as session:
            session.run(
                f"""
                MERGE (e:{label} {{id: $id}})
                SET e += $properties
                """,
                id=entity_id,
                properties=properties
            )

    def add_relation(self, subject_id: str, predicate: str, object_id: str,
                     properties: Optional[dict] = None):
        """Create a relationship between two entities."""
        rel_type = _safe_identifier(predicate)
        with self.driver.session() as session:
            session.run(
                f"""
                MATCH (s {{id: $subject_id}})
                MATCH (o {{id: $object_id}})
                MERGE (s)-[r:{rel_type}]->(o)
                SET r += $properties
                """,
                subject_id=subject_id,
                object_id=object_id,
                properties=properties or {}
            )

    def get_entity_subgraph(self, entity_id: str, max_hops: int = 2) -> dict:
        """
        Retrieve the local subgraph around an entity up to max_hops away.
        Returns nodes and relationships for context assembly.
        """
        # Variable-length bounds cannot be parameterized in Cypher, so the
        # hop limit is coerced to an int and interpolated into the pattern.
        hops = int(max_hops)
        with self.driver.session() as session:
            result = session.run(
                f"""
                MATCH path = (start {{id: $entity_id}})-[*1..{hops}]-(connected)
                RETURN path
                """,
                entity_id=entity_id
            )
            nodes = {}
            relations = {}
            for record in result:
                path = record["path"]
                for node in path.nodes:
                    nodes[node["id"]] = dict(node)
                for rel in path.relationships:
                    source, target = rel.start_node["id"], rel.end_node["id"]
                    # Key on (source, type, target) so a relationship shared by
                    # several paths is reported only once
                    relations[(source, rel.type, target)] = {
                        "source": source,
                        "target": target,
                        "type": rel.type,
                        "properties": dict(rel)
                    }
        return {"nodes": nodes, "relations": list(relations.values())}

    def find_paths(self, start_id: str, end_id: str, max_length: int = 4) -> list:
        """
        Find the shortest paths (including ties) between two entities,
        up to a maximum length. Essential for multi-hop reasoning queries.
        """
        hops = int(max_length)
        with self.driver.session() as session:
            # allShortestPaths returns every path of minimal length;
            # shortestPath would return at most one.
            result = session.run(
                f"""
                MATCH path = allShortestPaths((src {{id: $start_id}})-[*1..{hops}]-(dst {{id: $end_id}}))
                RETURN path
                LIMIT 10
                """,
                start_id=start_id,
                end_id=end_id
            )
            paths = []
            for record in result:
                path_nodes = [dict(n) for n in record["path"].nodes]
                path_rels = [{"type": r.type} for r in record["path"].relationships]
                paths.append({"nodes": path_nodes, "relations": path_rels})
        return paths
Graph indexing strategies matter for query performance. Creating indexes on entity IDs and commonly queried properties accelerates MATCH operations. For very large graphs, partitioning strategies based on entity type or document source can maintain sub-second query times.
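As a rough illustration of the indexing advice, the sketch below generates one `CREATE INDEX` statement per node label, using the `IF NOT EXISTS` form accepted by recent Neo4j releases. The label names and the `index_statements` helper are assumptions for the example; the resulting strings would be executed through a driver session.

```python
import re

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def index_statements(labels: list[str], prop: str = "id") -> list[str]:
    """Build one CREATE INDEX statement per node label."""
    statements = []
    for label in labels:
        # Labels and property names cannot be passed as query parameters,
        # so validate them before interpolating into the statement.
        if not (_IDENTIFIER.match(label) and _IDENTIFIER.match(prop)):
            raise ValueError(f"Unsafe identifier: {label!r} / {prop!r}")
        name = f"{label.lower()}_{prop}_idx"
        statements.append(
            f"CREATE INDEX {name} IF NOT EXISTS FOR (n:{label}) ON (n.{prop})"
        )
    return statements

statements = index_statements(["Organization", "Person", "Contract"])
for stmt in statements:
    print(stmt)
```

Indexes like these are scoped to a label, so a lookup benefits only when its `MATCH` clause names that label, for example `MATCH (s:Organization {id: $id})` rather than `MATCH (s {id: $id})`.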
Query processing links user questions to graph entities
When a user submits a query, Graph RAG systems must identify which entities in the knowledge graph are relevant. This entity linking at query time mirrors the extraction process but operates on shorter text with less context.
The query “What contracts does Acme have expiring soon?” should link to the Acme Corporation entity and the concept of contract expiration. Dense retrieval can help here: embedding the query and comparing against entity embeddings identifies candidate entities even when exact name matches fail.
import numpy as np
from sentence_transformers import SentenceTransformer

class QueryProcessor:
    """
    Process user queries to identify relevant graph entities and formulate traversal strategies.
    """

    def __init__(self, kg: KnowledgeGraph, embedding_model: str = "BAAI/bge-large-en-v1.5"):
        self.kg = kg
        self.encoder = SentenceTransformer(embedding_model)
        self.entity_embeddings = {}
        self.entity_metadata = {}

    def index_entities(self, entities: list[dict]):
        """Pre-compute embeddings for all entities in the graph."""
        texts = []
        ids = []
        for ent in entities:
            # Create rich text representation of entity
            text = f"{ent['type']}: {ent['name']}. {ent.get('description', '')}"
            texts.append(text)
            ids.append(ent['id'])
            self.entity_metadata[ent['id']] = ent
        embeddings = self.encoder.encode(texts, normalize_embeddings=True)
        for i, ent_id in enumerate(ids):
            self.entity_embeddings[ent_id] = embeddings[i]

    def link_query_entities(self, query: str, top_k: int = 5) -> list[dict]:
        """
        Find graph entities most relevant to the query.
        Uses dense retrieval over entity embeddings.
        """
        query_embedding = self.encoder.encode(query, normalize_embeddings=True)
        similarities = []
        for ent_id, ent_embedding in self.entity_embeddings.items():
            sim = np.dot(query_embedding, ent_embedding)
            similarities.append((ent_id, sim))
        similarities.sort(key=lambda x: x[1], reverse=True)
        results = []
        for ent_id, score in similarities[:top_k]:
            results.append({
                **self.entity_metadata[ent_id],
                "relevance_score": float(score)
            })
        return results

    def expand_context(self, seed_entities: list[str], query: str) -> dict:
        """
        Expand from seed entities through the graph to gather relevant context.
        Uses relationship types and entity properties to guide expansion.
        """
        context = {
            "entities": {},
            "relations": [],
            "paths": []
        }
        # Gather local neighborhoods of seed entities
        for ent_id in seed_entities:
            subgraph = self.kg.get_entity_subgraph(ent_id, max_hops=2)
            context["entities"].update(subgraph["nodes"])
            context["relations"].extend(subgraph["relations"])
        # Find connecting paths between seed entities
        if len(seed_entities) > 1:
            for i, ent1 in enumerate(seed_entities):
                for ent2 in seed_entities[i+1:]:
                    paths = self.kg.find_paths(ent1, ent2, max_length=3)
                    context["paths"].extend(paths)
        return context
Query decomposition helps complex questions that require multiple traversals. “Which executives at companies with expiring contracts also attended last month’s board meeting?” decomposes into: (1) find companies with expiring contracts, (2) find executives at those companies, (3) filter by board meeting attendance. Each sub-query can traverse different parts of the graph.
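The three-step decomposition can be sketched as a chain of functions, each consuming the previous step's output. The in-memory sets and dictionaries below are hypothetical stand-ins for graph lookups, and the step functions are invented for illustration; in a real system each would issue its own traversal.

```python
# Hypothetical in-memory facts standing in for graph lookups.
EXPIRING_CONTRACTS = {"Acme Corporation", "GlobalTech"}
EXECUTIVES = {
    "Acme Corporation": ["John Smith", "Dana Lee"],
    "GlobalTech": ["Priya Rao"],
    "Initech": ["Sam Park"],
}
BOARD_ATTENDEES = {"John Smith", "Priya Rao", "Sam Park"}

def companies_with_expiring_contracts(_previous):
    """Step 1: seed the chain; ignores its input."""
    return sorted(EXPIRING_CONTRACTS)

def executives_at(companies):
    """Step 2: expand each company to its executives."""
    return [person for c in companies for person in EXECUTIVES.get(c, [])]

def attended_board_meeting(people):
    """Step 3: keep only people who attended the meeting."""
    return [p for p in people if p in BOARD_ATTENDEES]

# Each step consumes the previous step's output, mirroring (1) -> (2) -> (3).
PLAN = [companies_with_expiring_contracts, executives_at, attended_board_meeting]

def run_plan(plan):
    result = None
    for step in plan:
        result = step(result)
    return result

answer = run_plan(PLAN)
print(answer)
```

Sam Park attended the meeting but is filtered out because Initech has no expiring contract, which is exactly the kind of join that a single similarity search tends to miss.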
Context assembly formats graph knowledge for LLM consumption
The subgraph retrieved during query processing must be formatted for the language model. Raw triplets lack the fluency needed for good generation. Graph-to-text approaches convert structured knowledge into natural language passages.
Microsoft’s GraphRAG paper introduced community summaries as an abstraction layer. The Leiden algorithm partitions the graph into communities of densely connected entities. Each community gets a summary generated by an LLM, creating hierarchical context that captures both local and global structure.
from dataclasses import dataclass
from typing import List

@dataclass
class GraphContext:
    """Formatted context from graph traversal for LLM prompt."""
    entity_descriptions: List[str]
    relationship_statements: List[str]
    supporting_passages: List[str]

def format_triplets_as_text(triplets: list[dict]) -> list[str]:
    """
    Convert graph triplets into natural language statements.
    More readable for LLM consumption than raw structured data.
    """
    statements = []
    predicate_templates = {
        "WORKS_FOR": "{subject} works for {object}",
        "LOCATED_IN": "{subject} is located in {object}",
        "SIGNED_CONTRACT_WITH": "{subject} signed a contract with {object}",
        "PARTNERED_WITH": "{subject} has a partnership with {object}",
        "REPORTED_TO": "{subject} reports to {object}",
        "SUBSIDIARY_OF": "{subject} is a subsidiary of {object}",
        "ACQUIRED": "{subject} acquired {object}",
    }
    for triplet in triplets:
        predicate = triplet.get("type", triplet.get("predicate"))
        subject = triplet.get("source", triplet.get("subject"))
        obj = triplet.get("target", triplet.get("object"))
        if predicate in predicate_templates:
            statement = predicate_templates[predicate].format(
                subject=subject, object=obj
            )
        else:
            # Fallback for unknown predicates
            readable_pred = predicate.replace("_", " ").lower()
            statement = f"{subject} {readable_pred} {obj}"
        # Add temporal/property context if available
        if props := triplet.get("properties", {}):
            if date := props.get("date"):
                statement += f" (as of {date})"
            if amount := props.get("amount"):
                statement += f" for {amount}"
        statements.append(statement)
    return statements

def assemble_prompt(query: str, context: GraphContext) -> str:
    """
    Assemble the final prompt combining query and graph context.
    Structured to guide the LLM toward graph-grounded responses.
    """
    prompt_parts = [
        "Answer the following question using the provided knowledge graph context.",
        "Base your response only on the information given. If the context doesn't contain",
        "enough information to fully answer, acknowledge what's missing.",
        "",
        "## Question",
        query,
        "",
        "## Entity Information",
    ]
    for desc in context.entity_descriptions:
        prompt_parts.append(f"- {desc}")
    prompt_parts.extend([
        "",
        "## Known Relationships",
    ])
    for stmt in context.relationship_statements:
        prompt_parts.append(f"- {stmt}")
    if context.supporting_passages:
        prompt_parts.extend([
            "",
            "## Supporting Document Excerpts",
        ])
        for passage in context.supporting_passages:
            prompt_parts.append(f"> {passage}")
    prompt_parts.extend([
        "",
        "## Answer",
    ])
    return "\n".join(prompt_parts)
Hybrid approaches combine graph context with traditional vector-retrieved chunks. The graph provides structured relationships while chunks offer verbatim supporting text. This combination grounds generation in both explicit knowledge (graph) and implicit context (passages).
LLM generation produces graph-grounded responses
The language model receives the formatted context and generates a response. Graph grounding reduces hallucination because the model’s output can be traced back to specific entities and relationships in the retrieved subgraph.
Post-generation verification compares mentioned facts against the graph. Claims about entity properties or relationships get validated. If the model asserts something not present in the provided context, the system can flag or filter that content.
from openai import OpenAI

class GraphRAGGenerator:
    """
    Generate responses grounded in knowledge graph context.
    Includes citation tracking and hallucination detection.
    """

    def __init__(self, model: str = "gpt-4o"):
        self.client = OpenAI()
        self.model = model

    def generate(self, query: str, context: GraphContext) -> dict:
        """Generate a response with citation tracking."""
        prompt = assemble_prompt(query, context)
        system_prompt = """You are a precise assistant that answers questions based on
knowledge graph data. Follow these rules:
1. Only use information from the provided context
2. Cite entities when making claims about them
3. If information is missing, explicitly state what you cannot determine
4. Structure your response clearly with the reasoning chain visible"""
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": prompt}
            ],
            temperature=0.1
        )
        answer = response.choices[0].message.content
        # Track which entities were mentioned in the response
        cited_entities = self.extract_citations(answer, context)
        return {
            "answer": answer,
            "cited_entities": cited_entities,
            "context_used": {
                "entities": len(context.entity_descriptions),
                "relationships": len(context.relationship_statements)
            }
        }

    def extract_citations(self, answer: str, context: GraphContext) -> list[str]:
        """Identify which context entities appear in the generated answer."""
        cited = []
        answer_lower = answer.lower()
        for desc in context.entity_descriptions:
            # Extract entity name from description
            entity_name = desc.split(":")[0] if ":" in desc else desc.split()[0]
            if entity_name.lower() in answer_lower:
                cited.append(entity_name)
        return list(set(cited))

    def verify_claims(self, answer: str, context: GraphContext) -> dict:
        """
        Verify that claims in the answer are supported by the context.
        Returns verification status and unsupported claims.
        """
        verification_prompt = f"""Analyze this answer and identify any claims made:
Answer: {answer}
For each claim, check if it's supported by this context:
{chr(10).join(context.relationship_statements)}
Output each claim with SUPPORTED or UNSUPPORTED status."""
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": verification_prompt}],
            temperature=0
        )
        return {"verification": response.choices[0].message.content}
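As a cheaper complement to the LLM-based `verify_claims` method, structured claims can be checked against the retrieved triplets directly. This sketch assumes the claims have already been pulled out of the answer as (subject, predicate, object) tuples, for example by running a relation extractor over the generated text. That extraction step, the `check_claims` helper, and the sample data are hypothetical.

```python
def normalize(triplet):
    """Case- and whitespace-insensitive form of a (subject, predicate, object) tuple."""
    return tuple(" ".join(part.lower().split()) for part in triplet)

def check_claims(claims, context_triplets):
    """Split claims into those found in the retrieved subgraph and those not found."""
    known = {normalize(t) for t in context_triplets}
    report = {"supported": [], "unsupported": []}
    for claim in claims:
        bucket = "supported" if normalize(claim) in known else "unsupported"
        report[bucket].append(claim)
    return report

context_triplets = [
    ("Acme Corporation", "SIGNED_CONTRACT_WITH", "GlobalTech"),
    ("John Smith", "WORKS_FOR", "Acme Corporation"),
]
claims = [
    ("acme corporation", "signed_contract_with", "GlobalTech"),
    ("John Smith", "WORKS_FOR", "GlobalTech"),   # not in the retrieved context
]
report = check_claims(claims, context_triplets)
print(report)
```

Exact matching after normalization is deliberately strict: it will miss paraphrased predicates, so it works best as a first-pass filter before a more expensive check.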
Performance considerations shape production Graph RAG deployments
Building and querying knowledge graphs at scale requires careful engineering. Entity extraction and relation mining add significant indexing time compared to simple chunking. A corpus that takes hours to index for vector RAG might take days for full Graph RAG processing.
Incremental updates present challenges. When new documents arrive, the system must extract their entities, link them to existing graph nodes, and add new relationships. Duplicate detection and entity resolution become continuous processes rather than one-time indexing steps.
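One way to keep incremental ingestion cheap is to skip content that was already processed and merge only relations that are new. The sketch below assumes that a content hash is an adequate document identity and that relations can be compared as plain tuples; `IncrementalIndexer` and `fake_extract` are hypothetical names, and the extractor is a stand-in for the NER and relation models.

```python
import hashlib

class IncrementalIndexer:
    """Tracks processed documents by content hash and merges new triplets."""

    def __init__(self):
        self.seen_hashes = set()
        self.edges = set()   # (subject, predicate, object) tuples

    def ingest(self, text, extract):
        """Run `extract` only on unseen content; return the number of new edges."""
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in self.seen_hashes:
            return 0  # identical document already indexed
        self.seen_hashes.add(digest)
        new_edges = set(extract(text)) - self.edges
        self.edges |= new_edges
        return len(new_edges)

# Stand-in extractor: a real pipeline would call the NER and relation models here.
def fake_extract(text):
    if "contract" in text:
        return [("Acme Corporation", "SIGNED_CONTRACT_WITH", "GlobalTech")]
    return []

indexer = IncrementalIndexer()
first = indexer.ingest("Acme signed a contract with GlobalTech.", fake_extract)
repeat = indexer.ingest("Acme signed a contract with GlobalTech.", fake_extract)
overlap = indexer.ingest("The contract between Acme and GlobalTech was renewed.", fake_extract)
print(first, repeat, overlap)
```

The second call is skipped outright because the text is identical, while the third runs extraction but adds nothing because the relation already exists in the graph.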
Query latency depends on graph traversal depth. Single-hop queries complete in milliseconds. Multi-hop reasoning with path finding can take seconds on large graphs without proper indexing. Caching frequently traversed subgraphs helps for common query patterns.
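Subgraph caching can be sketched with the standard library's `functools.lru_cache`. The `fetch_subgraph_from_db` function below is a hypothetical placeholder for an expensive graph query, and the call counter exists only to show when the cache is bypassed.

```python
from functools import lru_cache

CALLS = {"count": 0}

def fetch_subgraph_from_db(entity_id: str, max_hops: int) -> tuple:
    """Placeholder for an expensive graph query; returns an immutable result."""
    CALLS["count"] += 1
    return ((entity_id, "RELATED_TO", f"{entity_id}-neighbor"),)

@lru_cache(maxsize=1024)
def cached_subgraph(entity_id: str, max_hops: int = 2) -> tuple:
    # Arguments must be hashable, and the result should be immutable so
    # callers cannot corrupt the shared cached value.
    return fetch_subgraph_from_db(entity_id, max_hops)

cached_subgraph("acme")
cached_subgraph("acme")        # served from the cache, no database call
cached_subgraph("globaltech")
info = cached_subgraph.cache_info()
print(info, CALLS["count"])
```

Because the graph changes as documents are ingested, the cache needs invalidation: calling `cached_subgraph.cache_clear()` after each indexing batch is the bluntest option, at the cost of a cold cache afterwards.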
The Microsoft GraphRAG paper reports that indexing its podcast-transcript dataset, roughly 1,700 text chunks totaling about one million tokens, took on the order of four to five hours with a GPT-4-class model handling entity extraction and summarization. Cost scales with LLM usage during indexing, making open-source extraction models attractive for large corpora.
LangChain and LlamaIndex provide Graph RAG abstractions
Both major LLM frameworks now include Graph RAG components. LlamaIndex’s KnowledgeGraphIndex (superseded in recent releases by PropertyGraphIndex) builds graphs during indexing and supports Cypher query generation. LangChain’s graph integrations connect to Neo4j, NebulaGraph, and other stores.
# Import paths follow the llama-index 0.10+ package layout; the Neo4j store
# ships separately (pip install llama-index-graph-stores-neo4j).
from llama_index.core import KnowledgeGraphIndex, SimpleDirectoryReader, StorageContext
from llama_index.graph_stores.neo4j import Neo4jGraphStore
from llama_index.llms.openai import OpenAI

# LlamaIndex Graph RAG setup
def create_llamaindex_graph_rag(documents_path: str, neo4j_config: dict):
    """
    Create a Graph RAG index using LlamaIndex.
    Handles entity extraction and graph construction automatically.
    """
    # Load documents
    documents = SimpleDirectoryReader(documents_path).load_data()
    # Configure graph store
    graph_store = Neo4jGraphStore(
        username=neo4j_config["user"],
        password=neo4j_config["password"],
        url=neo4j_config["uri"],
        database=neo4j_config.get("database", "neo4j")
    )
    # The graph store is handed to the index through a storage context
    storage_context = StorageContext.from_defaults(graph_store=graph_store)
    # Create index with entity extraction
    llm = OpenAI(model="gpt-4o", temperature=0)
    index = KnowledgeGraphIndex.from_documents(
        documents,
        storage_context=storage_context,
        llm=llm,
        max_triplets_per_chunk=10,
        include_embeddings=True
    )
    return index

def query_graph_rag(index: KnowledgeGraphIndex, query: str):
    """Query the Graph RAG index with graph-based retrieval."""
    query_engine = index.as_query_engine(
        include_text=True,  # Include source text alongside graph
        response_mode="tree_summarize",
        embedding_mode="hybrid"  # Combine graph and vector retrieval
    )
    response = query_engine.query(query)
    return {
        "answer": str(response),
        "source_nodes": [node.text for node in response.source_nodes]
    }
These frameworks abstract significant complexity but may not expose all tuning options. Production deployments often customize the extraction pipeline, entity resolution logic, and traversal strategies beyond what framework defaults provide.
Graph RAG excels at specific use cases
Not every RAG application benefits from graph structure. Graph RAG adds value when queries require connecting information across documents, when entities and relationships are central to the domain, or when multi-hop reasoning chains answer important questions.
Legal document analysis benefits from tracking parties, obligations, and relationships across contracts. Medical literature review connects drugs, conditions, treatments, and outcomes. Financial analysis links companies, executives, transactions, and events. These domains have inherent graph structure that Graph RAG can exploit.
Conversely, if most queries seek verbatim passages with answers contained in single chunks, traditional vector RAG may suffice. The added complexity of graph construction and maintenance only pays off when queries actually require traversal.
Hybrid architectures combine the best of both approaches
Many production systems implement both vector and graph retrieval, selecting the approach based on query characteristics. Simple factual questions route to vector RAG for speed. Complex analytical questions activate graph traversal for thoroughness.
Query classification determines the routing. A lightweight model predicts whether a query requires single-hop lookup or multi-hop reasoning. Alternatively, both retrievers run in parallel with results merged before generation.
from enum import Enum
from dataclasses import dataclass

class QueryComplexity(Enum):
    SIMPLE = "simple"      # Single fact lookup
    MODERATE = "moderate"  # Multiple facts, same entity
    COMPLEX = "complex"    # Multi-hop reasoning required

@dataclass
class HybridRAGConfig:
    vector_weight: float = 0.5
    graph_weight: float = 0.5
    complexity_threshold: float = 0.7

class HybridRAG:
    """
    Hybrid RAG system combining vector and graph retrieval.
    Routes queries based on complexity analysis.
    """

    def __init__(self, vector_retriever, graph_retriever, config: HybridRAGConfig):
        self.vector = vector_retriever
        self.graph = graph_retriever
        self.config = config

    def classify_query(self, query: str) -> QueryComplexity:
        """
        Analyze query to determine required retrieval strategy.
        Uses heuristics and optional ML classification.
        """
        # Simple heuristics
        multi_entity_keywords = ["relationship", "connected", "between", "compare"]
        reasoning_keywords = ["why", "how does", "what causes", "impact"]
        query_lower = query.lower()
        if any(kw in query_lower for kw in multi_entity_keywords):
            return QueryComplexity.COMPLEX
        if any(kw in query_lower for kw in reasoning_keywords):
            return QueryComplexity.MODERATE
        # Count potential entity references (capitalized words), skipping the
        # first word, which is capitalized only because it starts the sentence
        words = [w.strip(".,;:!?\"'()") for w in query.split()[1:]]
        entity_candidates = len([w for w in words if w and w[0].isupper()])
        if entity_candidates > 2:
            return QueryComplexity.COMPLEX
        return QueryComplexity.SIMPLE

    def retrieve(self, query: str) -> dict:
        """
        Retrieve context using appropriate strategy based on query complexity.
        """
        complexity = self.classify_query(query)
        if complexity == QueryComplexity.SIMPLE:
            # Vector-only retrieval for simple queries
            return {
                "strategy": "vector",
                "context": self.vector.retrieve(query)
            }
        elif complexity == QueryComplexity.COMPLEX:
            # Graph-primary retrieval with vector augmentation
            graph_context = self.graph.retrieve(query)
            vector_context = self.vector.retrieve(query, limit=3)
            return {
                "strategy": "graph_primary",
                "graph_context": graph_context,
                "supporting_chunks": vector_context
            }
        else:
            # Parallel retrieval with weighted merge
            graph_context = self.graph.retrieve(query)
            vector_context = self.vector.retrieve(query)
            return {
                "strategy": "hybrid",
                "graph_context": graph_context,
                "vector_context": vector_context,
                "weights": {
                    "graph": self.config.graph_weight,
                    "vector": self.config.vector_weight
                }
            }
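The hybrid branch returns both result sets along with their weights but leaves the merge itself open. One common option, not prescribed by the design above, is weighted reciprocal rank fusion, sketched here under the assumption that each retriever returns a ranked list of item IDs. The `weighted_rrf` function and the sample IDs are hypothetical, and `k=60` is simply a widely used default for the smoothing constant.

```python
def weighted_rrf(graph_ranked, vector_ranked, graph_weight=0.5, vector_weight=0.5, k=60):
    """Fuse two ranked lists of item IDs with weighted reciprocal rank fusion."""
    scores = {}
    for weight, ranked in ((graph_weight, graph_ranked), (vector_weight, vector_ranked)):
        for rank, item in enumerate(ranked, start=1):
            # Items ranked highly by either retriever accumulate more score.
            scores[item] = scores.get(item, 0.0) + weight / (k + rank)
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))

merged = weighted_rrf(
    graph_ranked=["contract-17", "incident-42", "supplier-a"],
    vector_ranked=["incident-42", "chunk-9"],
)
print(merged)
```

Rank fusion avoids comparing raw scores from two retrievers whose scales differ, which is why it tends to be a safer starting point than averaging similarity values.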
Moving forward with Graph RAG implementation
Starting a Graph RAG project requires entity schema design before any code. What entity types matter for your domain? What relationships connect them? This ontology shapes extraction pipelines and query capabilities.
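A schema can start as a small table of allowed relations and the entity types they connect, which then doubles as a validator for extracted triplets. The relation names and the `validate_triplet` helper below are hypothetical examples, not a recommended ontology.

```python
# Hypothetical ontology: relation name -> (allowed subject type, allowed object type).
SCHEMA = {
    "WORKS_FOR": ("Person", "Organization"),
    "SIGNED_CONTRACT_WITH": ("Organization", "Organization"),
    "LOCATED_IN": ("Organization", "Location"),
}

def validate_triplet(subject_type, predicate, object_type, schema=SCHEMA):
    """Return None if the triplet fits the schema, else a short error message."""
    if predicate not in schema:
        return f"unknown relation {predicate}"
    expected = schema[predicate]
    if (subject_type, object_type) != expected:
        return f"{predicate} expects {expected[0]} -> {expected[1]}"
    return None

ok = validate_triplet("Person", "WORKS_FOR", "Organization")
wrong_types = validate_triplet("Organization", "WORKS_FOR", "Person")
unknown = validate_triplet("Person", "MENTORS", "Person")
print(ok, wrong_types, unknown, sep="\n")
```

Rejected triplets are worth logging rather than discarding silently, since a recurring unknown relation often signals a gap in the schema rather than an extraction error.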
Begin with a subset of your corpus to validate the approach. Extract entities and relations from representative documents. Build a test graph. Run sample queries manually to verify the structure captures the knowledge you need.
Evaluation remains challenging. Standard RAG benchmarks test single-hop retrieval. Graph RAG’s multi-hop capabilities require custom evaluation sets with questions that genuinely need relationship traversal. Without appropriate evaluation, you cannot measure whether the added complexity provides value.
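A custom evaluation set can be as simple as questions annotated with the entities a correct answer must connect. The sketch below scores retrieval by recall over those required entities; the questions, entity names, and the `multi_hop_recall` and `evaluate` functions are hypothetical, and recall is only one of several metrics that could be used.

```python
def multi_hop_recall(required_entities, retrieved_entities):
    """Fraction of the entities needed for an answer that retrieval actually returned."""
    required = set(required_entities)
    if not required:
        return 1.0
    return len(required & set(retrieved_entities)) / len(required)

# Hypothetical evaluation items: each question lists the entities a correct
# answer must connect, so retrieving a single relevant chunk is not enough.
EVAL_SET = [
    {"question": "Which suppliers with expiring contracts had quality issues?",
     "required": ["Supplier A", "Contract 17", "Incident 42"]},
    {"question": "Who leads the company that partnered with GlobalTech?",
     "required": ["GlobalTech", "Acme Corporation", "John Smith"]},
]

def evaluate(retrieve, eval_set=EVAL_SET):
    """Average recall of required entities across the evaluation set."""
    scores = [multi_hop_recall(item["required"], retrieve(item["question"]))
              for item in eval_set]
    return sum(scores) / len(scores)

# Stand-in retriever that returns the same entities for every question.
score = evaluate(lambda question: ["Supplier A", "Incident 42", "GlobalTech"])
print(f"mean required-entity recall: {score:.2f}")
```

Running the same set against a vector-only baseline and the graph pipeline gives a direct, if coarse, measure of whether traversal is retrieving connections that similarity search misses.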
The Graph RAG landscape continues evolving. Research explores neural graph construction that learns entity and relation schemas from data. Knowledge graph embeddings enable soft matching beyond exact entity linking. Temporal reasoning tracks how relationships change over time. These advances will make Graph RAG more powerful and easier to deploy in the years ahead.