
Query Modes Deep-Dive

Understanding EdgeQuake’s Multi-Strategy Retrieval System

EdgeQuake provides six distinct query modes, each optimized for a different type of question. This guide explains when and why to use each mode, with practical examples and tuning recommendations.



Different questions require fundamentally different retrieval strategies. Consider these queries against a single climate-science document:

  • "What is the greenhouse effect?" → Vector search (find semantically similar chunks)
  • "How does Sarah Chen's work relate to atmospheric modeling?" → Graph traversal (follow entity relationships)
  • "What are the main themes in this document?" → Community detection (analyze topic clusters)
  • "Explain Sarah Chen's contributions to climate research" → Both (entity + broader context)

A single retrieval strategy cannot optimally serve all these query types. EdgeQuake’s multi-mode system allows you to match the strategy to your question.

                 ┌─────────────────┐
                 │    PRECISION    │
                 │                 │
                 │   (Specific,    │
                 │    Accurate)    │
                 └────────┬────────┘
           Naive ─────────┼─────────
     ┌────────────────────┼────────────────────┐
     │                    │                    │
     │                 Hybrid                  │
     │                    │                    │
┌────┴────┐          ┌────┴────┐          ┌────┴────┐
│  SPEED  │          │         │          │ COVERAGE│
│         │──────────│   Mix   │──────────│         │
│ (Fast,  │  Local   │         │  Global  │ (Broad, │
│ Cheap)  │          │         │          │Complete)│
└─────────┘          └─────────┘          └─────────┘

No mode is universally “best” - each makes different trade-offs.


Mode     Vector Search   Graph Traversal   Best For
Naive    ✓               –                 Factual queries, keyword lookup
Local    ✓               ✓                 Entity-specific questions
Global   –               ✓                 Theme/topic analysis
Hybrid   ✓               ✓                 Complex, multi-faceted queries
Mix      ✓ (weighted)    ✓ (weighted)      Custom weighted retrieval
Bypass   –               –                 Direct LLM, testing
┌─────────────────────────────────────────────────────────────────┐
│                     QUERY MODE QUICK GUIDE                      │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  "What is X?"                       → Naive  (fast, direct)     │
│  "How does A relate to B?"          → Local  (entity graph)     │
│  "What are the main themes?"        → Global (topic clusters)   │
│  "Tell me about X and its impact"   → Hybrid (comprehensive)    │
│  "I need custom weights"            → Mix    (tunable)          │
│  "Skip RAG, just ask LLM"           → Bypass (testing)          │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Use this decision tree to select the optimal mode:

                ┌─────────────────────────┐
                │  Is RAG needed at all?  │
                └───────────┬─────────────┘
                ┌───────────┴───────────────┐
                │                           │
               YES                          NO
                │                           │
                ▼                           ▼
    ┌───────────────────────┐       ┌───────────────┐
    │  Does query mention   │       │    BYPASS     │
    │  specific entities?   │       │   (no RAG)    │
    └───────────┬───────────┘       └───────────────┘
                │
        ┌───────┴───────────────┐
        │                       │
       YES                      NO
        │                       │
        ▼                       ▼
┌───────────────────────┐   ┌───────────────────────┐
│  Also asking about    │   │  Asking about themes  │
│  broader context?     │   │ or overarching topics?│
└───────────┬───────────┘   └───────────┬───────────┘
            │                           │
    ┌───────┴───────┐           ┌───────┴───────┐
    │               │           │               │
   YES              NO         YES              NO
    │               │           │               │
    ▼               ▼           ▼               ▼
┌───────┐       ┌───────┐   ┌───────┐       ┌───────┐
│HYBRID │       │ LOCAL │   │GLOBAL │       │ NAIVE │
└───────┘       └───────┘   └───────┘       └───────┘
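The tree above can be sketched as a small routing function. This is an illustrative heuristic, not EdgeQuake's actual router: the entity set and keyword lists below are invented for the example, and a real implementation would consult the indexed entity store.

```python
# Hypothetical data standing in for the indexed entity list.
KNOWN_ENTITIES = {"sarah chen", "ipcc"}
THEME_WORDS = {"themes", "topics", "overview", "summary"}
CONTEXT_WORDS = {"impact", "context", "relate", "relationship"}

def pick_mode(query: str, use_rag: bool = True) -> str:
    """Walk the decision tree: bypass → entity? → context/themes → mode."""
    if not use_rag:
        return "bypass"
    q = query.lower()
    mentions_entity = any(e in q for e in KNOWN_ENTITIES)
    wants_themes = any(w in q for w in THEME_WORDS)
    wants_context = any(w in q for w in CONTEXT_WORDS)
    if mentions_entity:
        return "hybrid" if wants_context else "local"
    return "global" if wants_themes else "naive"
```

Running the example queries from this guide through `pick_mode` reproduces the leaves of the tree.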

Naive Mode (FEAT0101): Vector similarity search only

Naive mode performs pure vector similarity search on document chunks, without graph traversal. It’s the fastest mode and works well for simple factual queries.

NAIVE MODE FLOW

Query: "What is machine learning?"
         │
         ▼
┌─────────────────┐
│   Embed Query   │ → [0.23, -0.45, 0.87, ...]
└────────┬────────┘
         │
         ▼
┌─────────────────────────────────────────┐
│       Vector Database (pgvector)        │
│  ┌────────┐ ┌────────┐ ┌────────┐       │
│  │chunk_1 │ │chunk_2 │ │chunk_3 │  ...  │
│  │sim:0.92│ │sim:0.85│ │sim:0.78│       │
│  └────────┘ └────────┘ └────────┘       │
└────────┬────────────────────────────────┘
         │
         ▼
┌─────────────────┐
│  Top-K Chunks   │ → ["ML is a subset of AI...",
│    (scored)     │    "Training neural networks..."]
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ LLM Generation  │ → "Machine learning is..."
└─────────────────┘

Good for:

  • Simple factual questions (“What is X?”)
  • Keyword-based lookup
  • Fast response requirements
  • When graph data is sparse

Avoid when:

  • Asking about relationships
  • Need comprehensive coverage
  • Entities are important
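A minimal sketch of what Naive mode does internally: embed the query, score chunks by cosine similarity, filter by min_score, and keep the top k. The embeddings and chunk texts below are toy values; EdgeQuake uses pgvector and a real embedding model.

```python
import math

# Toy chunk embeddings standing in for pgvector rows.
CHUNKS = {
    "ML is a subset of AI...":            [0.9, 0.1, 0.0],
    "Training neural networks...":        [0.7, 0.3, 0.1],
    "CO2 levels rose sharply in 2020...": [0.0, 0.2, 0.9],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def naive_retrieve(query_vec, top_k=2, min_score=0.1):
    # Score every chunk, drop anything under the similarity threshold,
    # then return the top_k chunk texts by descending similarity.
    scored = [(cosine(query_vec, v), text) for text, v in CHUNKS.items()]
    scored = [(s, t) for s, t in scored if s >= min_score]
    return [t for _, t in sorted(scored, reverse=True)[:top_k]]
```

The min_score and top_k parameters correspond to the min_score and max_chunks settings described in the configuration section below.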
curl -X POST http://localhost:8080/api/v1/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is the greenhouse effect?",
    "mode": "naive"
  }'

Metric           Typical Value
Latency          100-300ms
Context tokens   500-2000
LLM calls        1

Local Mode (FEAT0102): Entity-centric graph traversal

Local mode combines vector search with graph traversal from identified entities. It excels at questions about specific entities and their relationships.

LOCAL MODE FLOW

Query: "How does Sarah Chen work with the IPCC?"
         │
         ├────────────────────────┐
         ▼                        ▼
┌─────────────────┐      ┌─────────────────┐
│   Embed Query   │      │ Extract Entities│
└────────┬────────┘      └────────┬────────┘
         │                        │
         ▼                        ▼
┌─────────────────┐      ┌─────────────────────┐
│  Vector Search  │      │    Entity Lookup    │
│    (chunks)     │      │  SARAH_CHEN, IPCC   │
└────────┬────────┘      └────────┬────────────┘
         │                        │
         │                        ▼
         │     ┌───────────────────────────────────┐
         │     │          Graph Traversal          │
         │     │                                   │
         │     │  SARAH_CHEN ──WORKS_WITH──▶ IPCC  │
         │     │      │                            │
         │     │      └──AUTHORED──▶ PAPER_1       │
         │     │                                   │
         │     └────────────────┬──────────────────┘
         │                      │
         └───────────┬──────────┘
                     ▼
            ┌─────────────────┐
            │  Merge Context  │
            │   (chunks +     │
            │   entities +    │
            │ relationships)  │
            └────────┬────────┘
                     │
                     ▼
            ┌─────────────────┐
            │ LLM Generation  │
            └─────────────────┘

Good for:

  • Questions about specific people, places, organizations
  • Relationship queries (“How does X relate to Y?”)
  • When entity context enriches the answer
  • Named entity questions

Avoid when:

  • Entities not well-extracted
  • Asking about abstract concepts
  • Need speed over comprehensiveness
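The graph-traversal step can be sketched as a bounded breadth-first walk outward from the entities found in the query. The graph below is a hypothetical fragment matching the diagram above; the real store is EdgeQuake's indexed knowledge graph, and the hop limit corresponds to the graph_depth setting.

```python
# Hypothetical adjacency list: entity -> [(relation, target), ...]
GRAPH = {
    "SARAH_CHEN": [("WORKS_WITH", "IPCC"), ("AUTHORED", "PAPER_1")],
    "IPCC": [("PUBLISHED", "AR6_REPORT")],
}

def neighborhood(entities, depth=2):
    """Collect (src, relation, dst) edges up to `depth` hops from the seeds."""
    seen, frontier, edges = set(entities), list(entities), []
    for _ in range(depth):
        nxt = []
        for src in frontier:
            for rel, dst in GRAPH.get(src, []):
                edges.append((src, rel, dst))
                if dst not in seen:       # avoid revisiting nodes
                    seen.add(dst)
                    nxt.append(dst)
        frontier = nxt
    return edges
```

The edges collected here are what gets merged with the vector-search chunks in the Merge Context step.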
curl -X POST http://localhost:8080/api/v1/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is Sarah Chen'\''s research focus?",
    "mode": "local"
  }'

Metric           Typical Value
Latency          200-500ms
Context tokens   1000-3000
Graph queries    3-10

Global Mode (FEAT0103): Community-based summarization

Global mode focuses on high-level topic clusters identified during indexing. It’s ideal for theme analysis and summary questions.

GLOBAL MODE FLOW

Query: "What are the main themes in this document?"
         │
         ▼
┌─────────────────────────────────────────────────┐
│ Community Detection (pre-computed during index) │
│                                                 │
│   ┌─────────────────┐     ┌─────────────────┐   │
│   │   Community 1   │     │   Community 2   │   │
│   │    "Climate"    │     │  "Technology"   │   │
│   │                 │     │                 │   │
│   │  • IPCC         │     │  • MACHINE_     │   │
│   │  • SARAH_CHEN   │     │    LEARNING     │   │
│   │  • CO2_LEVELS   │     │  • NEURAL_NET   │   │
│   │  • WARMING      │     │  • PREDICTION   │   │
│   └────────┬────────┘     └────────┬────────┘   │
│            │                       │            │
│            ▼                       ▼            │
│   ┌─────────────────────────────────────────┐   │
│   │           Community Summaries           │   │
│   │   "Climate: Research focuses on..."     │   │
│   │   "Technology: ML applications..."      │   │
│   └─────────────────────────────────────────┘   │
└────────┬────────────────────────────────────────┘
         │
         ▼
┌───────────────────┐
│  LLM Generation   │
│ (theme synthesis) │
└───────────────────┘

Good for:

  • “What are the main themes/topics?”
  • Summary questions
  • Overview requests
  • When breadth matters more than depth

Avoid when:

  • Asking about specific entities
  • Need precise factual answers
  • Speed is critical
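Conceptually, Global mode matches the query against pre-computed community summaries rather than raw chunks. Below is a toy sketch with invented summaries that ranks communities by term overlap; a real system would score summaries with embeddings, not word sets.

```python
import re

# Invented community summaries; EdgeQuake builds these during indexing.
COMMUNITIES = {
    "Climate":    "Research focuses on IPCC assessments, CO2 levels and warming.",
    "Technology": "ML applications: neural networks used for prediction.",
}

def tokens(text):
    """Lowercase alphanumeric word set, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def rank_communities(query, top_n=1):
    q = tokens(query)
    # Score each community by how many query terms its summary shares.
    return sorted(
        COMMUNITIES,
        key=lambda name: len(q & tokens(COMMUNITIES[name])),
        reverse=True,
    )[:top_n]
```

The top-ranked community summaries become the LLM context for theme synthesis.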
curl -X POST http://localhost:8080/api/v1/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What topics does this document cover?",
    "mode": "global"
  }'

Metric           Typical Value
Latency          300-800ms
Context tokens   2000-4000
Communities      5-20

Hybrid Mode (FEAT0104): Combines Local and Global (default)

Hybrid mode uses both vector search and full graph traversal, combining the precision of Local with the coverage of Global. It’s the default mode because it handles the widest variety of queries.

HYBRID MODE FLOW

Query: "Explain Sarah Chen's impact on climate modeling"
         │
         ├────────────────────────────────────┐
         ▼                                    ▼
┌──────────────────┐                ┌──────────────────┐
│    LOCAL PATH    │                │   GLOBAL PATH    │
│                  │                │                  │
│ • Vector search  │                │ • Community      │
│ • Entity lookup  │                │   summaries      │
│ • Neighborhood   │                │ • Topic context  │
│   traversal      │                │                  │
└────────┬─────────┘                └────────┬─────────┘
         │                                   │
         │   ┌───────────────────────────┐   │
         └──▶│      CONTEXT FUSION       │◀──┘
             │                           │
             │ 1. Deduplicate entities   │
             │ 2. Merge relationships    │
             │ 3. Combine chunks         │
             │ 4. Apply token budget     │
             └─────────────┬─────────────┘
                           │
                           ▼
              ┌─────────────────────────┐
              │     LLM Generation      │
              │ (comprehensive answer)  │
              └─────────────────────────┘

Good for:

  • Complex, multi-faceted questions
  • When you’re unsure which mode to use
  • Production default
  • Comprehensive answers needed

Avoid when:

  • Speed is critical
  • Token budget is tight
  • Simple factual queries
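The CONTEXT FUSION step can be sketched as deduplicate-then-trim under a token budget. Whitespace word counts stand in for real tokenizer counts here; the budget plays the role of max_context_tokens.

```python
def fuse_context(local_results, global_results, max_tokens=4000):
    """Merge two result lists: drop duplicates, stop at the token budget."""
    merged, seen, used = [], set(), 0
    for text in local_results + global_results:  # local results take priority
        if text in seen:
            continue                             # deduplicate
        cost = len(text.split())                 # crude token estimate
        if used + cost > max_tokens:
            break                                # budget exhausted
        seen.add(text)
        merged.append(text)
        used += cost
    return merged
```

Because local results are consumed first, entity-specific context survives a tight budget at the expense of broader community summaries, which matches Hybrid's precision-first ordering.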
curl -X POST http://localhost:8080/api/v1/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Explain the relationship between ML and climate research",
    "mode": "hybrid"
  }'

Metric           Typical Value
Latency          400-1000ms
Context tokens   3000-4000
LLM calls        1

Mix Mode (FEAT0105): Weighted combination with tunable parameters

Mix mode allows explicit weighting between vector and graph retrieval. Use it when you need fine-grained control over the retrieval strategy.

{
  "query": "Your question here",
  "mode": "mix",
  "params": {
    "vector_weight": 0.7,
    "graph_weight": 0.3
  }
}

Good for:

  • A/B testing retrieval strategies
  • Domain-specific tuning
  • When default weights don’t work well
  • Research and experimentation
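Under the hood, Mix mode amounts to a weighted combination of per-candidate scores. A sketch assuming scores are already normalized to [0, 1]; the vec/graph field names are illustrative, while vector_weight/graph_weight mirror the API params above.

```python
def mix_score(vector_score, graph_score, vector_weight=0.7, graph_weight=0.3):
    """Weighted average of a candidate's vector and graph scores."""
    total = vector_weight + graph_weight
    return (vector_weight * vector_score + graph_weight * graph_score) / total

def rank(candidates, **weights):
    # Sort candidates by their combined score, best first.
    return sorted(
        candidates,
        key=lambda c: mix_score(c["vec"], c["graph"], **weights),
        reverse=True,
    )
```

Shifting weight toward graph_weight is a quick way to test whether relationship context matters more than raw chunk similarity for your domain.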

Bypass Mode (FEAT0106): Direct LLM, no retrieval

Bypass mode skips RAG entirely and sends the query directly to the LLM. Useful for testing or when external knowledge isn’t needed.

curl -X POST http://localhost:8080/api/v1/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is 2 + 2?",
    "mode": "bypass"
  }'

Mode     Latency                 Accuracy        Context   Cost
Naive    ⚡ Fast (100-300ms)      ⭐⭐⭐ Good       Small     💵 Low
Local    🚀 Medium (200-500ms)   ⭐⭐⭐⭐ High      Medium    💵💵 Medium
Global   🐢 Slow (300-800ms)     ⭐⭐⭐⭐ High      Large     💵💵 Medium
Hybrid   🐢 Slow (400-1000ms)    ⭐⭐⭐⭐⭐ Best     Large     💵💵💵 High
Mix      Variable                Tunable         Tunable   Variable
Bypass   ⚡ Fastest               ⭐ LLM only      None      💵 Low
┌─────────────────────────────────────────────────────────────────┐
│                     RESOURCE USAGE BY MODE                      │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Naive   ████░░░░░░░░░░░░░░░░  (Vector only)                    │
│                                                                 │
│  Local   ████████░░░░░░░░░░░░  (Vector + Graph node)            │
│                                                                 │
│  Global  ██████████░░░░░░░░░░  (Graph communities)              │
│                                                                 │
│  Hybrid  ████████████████░░░░  (All sources)                    │
│                                                                 │
│  Mix     ████████████░░░░░░░░  (Weighted blend)                 │
│                                                                 │
│  ───────────────────────────────────────────►                   │
│  Low                                     High                   │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

QueryEngineConfig {
    default_mode: QueryMode::Hybrid,
    max_chunks: 10,
    max_entities: 20,
    max_context_tokens: 4000,
    graph_depth: 2,
    min_score: 0.1,
    include_sources: true,
}
Parameter            Default   Effect
max_chunks           10        More chunks = more context, higher cost
max_entities         20        More entities = richer graph context
max_context_tokens   4000      Token budget for LLM context
graph_depth          2         How many hops in graph traversal
min_score            0.1       Similarity threshold for inclusion

For Naive mode:

  • Increase max_chunks for better coverage
  • Lower min_score for more permissive matching

For Local mode:

  • Increase graph_depth for deeper relationships
  • Balance max_entities vs max_chunks

For Global mode:

  • Ensure communities are well-formed
  • Consider community detection parameters

For Hybrid mode:

  • Use max_context_tokens to balance cost
  • Enable reranking for better precision

Scope a query to a specific workspace with the X-Workspace-ID header:

curl -X POST http://localhost:8080/api/v1/query \
  -H "Content-Type: application/json" \
  -H "X-Workspace-ID: your-workspace" \
  -d '{
    "query": "What is the main finding?",
    "mode": "naive"
  }'
Enable reranking for better precision on complex queries:

curl -X POST http://localhost:8080/api/v1/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Explain the climate research methodology",
    "mode": "hybrid",
    "enable_rerank": true,
    "rerank_top_k": 5
  }'
curl -X POST http://localhost:8080/api/v1/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Your question",
    "mode": "local",
    "context_only": true
  }'

This returns only the retrieved context without LLM generation, useful for debugging retrieval quality.
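For scripted debugging, the same context-only call can be issued from Python's standard library. A sketch assuming the /api/v1/query endpoint shown above; actually sending the request needs a running EdgeQuake server, so the snippet only builds it.

```python
import json
from urllib import request

def build_context_request(query, mode="local", base="http://localhost:8080"):
    """Build a POST request asking for retrieved context only (no LLM call)."""
    payload = {"query": query, "mode": mode, "context_only": True}
    return request.Request(
        f"{base}/api/v1/query",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# With a server running:
# resp = request.urlopen(build_context_request("Your question"))
```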

curl -X POST http://localhost:8080/api/v1/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Your question",
    "mode": "hybrid",
    "prompt_only": true
  }'

Returns the formatted prompt that would be sent to the LLM.