
Tutorial: Building Your First RAG App

End-to-End Guide: From Documents to Intelligent Q&A

In this tutorial, you’ll build a complete RAG application that can answer questions about your documents using EdgeQuake’s graph-enhanced retrieval.

Time: ~30 minutes
Level: Beginner
Prerequisites: EdgeQuake running (Quick Start)


┌─────────────────────────────────────────────────────────────────┐
│                      YOUR RAG APPLICATION                       │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   ┌─────────────┐    ┌─────────────┐    ┌─────────────┐         │
│   │   Upload    │───▶│   Index     │───▶│   Query     │         │
│   │  Documents  │    │  & Extract  │    │  & Answer   │         │
│   └─────────────┘    └─────────────┘    └─────────────┘         │
│                                                                 │
│   Features:                                                     │
│   • Multi-document ingestion                                    │
│   • Knowledge graph extraction                                  │
│   • Multiple query modes                                        │
│   • Source citations                                            │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

First, make sure EdgeQuake is running:

# Option A: With Ollama (free, local)
make dev

# Option B: With OpenAI (requires API key)
export OPENAI_API_KEY="sk-your-key"
make dev

Verify it’s running:

curl http://localhost:8080/health

Expected response:

{ "status": "ok", "version": "0.1.0", "storage_mode": "postgresql" }
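In a script, you may want to block until this endpoint reports healthy before continuing. A minimal sketch, assuming only the response shape shown above; `get_health` is a caller-supplied function (for example a wrapper around `requests.get(...).json()`), not part of EdgeQuake:

```python
import time

def is_healthy(payload):
    """True when a health payload reports an OK status."""
    return isinstance(payload, dict) and payload.get("status") == "ok"

def wait_until_healthy(get_health, attempts=30, delay=1.0):
    """Poll get_health() until it reports healthy or attempts run out."""
    for _ in range(attempts):
        try:
            if is_healthy(get_health()):
                return True
        except Exception:
            pass  # server not reachable yet
        time.sleep(delay)
    return False
```

Usage: `wait_until_healthy(lambda: requests.get("http://localhost:8080/health").json())`.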

Workspaces organize your documents and provide isolation:

curl -X POST http://localhost:8080/api/v1/workspaces \
  -H "Content-Type: application/json" \
  -d '{
    "name": "My First RAG App",
    "description": "Tutorial workspace for learning EdgeQuake"
  }'

Response:

{
  "id": "ws_abc123",
  "name": "My First RAG App",
  "description": "Tutorial workspace for learning EdgeQuake",
  "created_at": "2024-01-15T10:00:00Z"
}

Save the workspace ID for later:

export WORKSPACE_ID="ws_abc123"

Let’s create some sample documents about a fictional company:

doc1.txt - Company Overview

TechCorp Innovation Labs was founded in 2020 by Sarah Chen and Marcus Williams.
The company is headquartered in San Francisco, with research offices in Boston and Seattle.
Sarah Chen serves as CEO and leads the company's AI research initiatives.
She previously worked at Google DeepMind where she led the language model team.
Marcus Williams is the CTO and oversees all engineering operations.
He has a PhD in Computer Science from MIT and previously founded two startups.
TechCorp's flagship product is NeuralSearch, an enterprise search platform
that uses advanced AI to help companies find information in their documents.

doc2.txt - Recent News

TechCorp Announces $50M Series B Funding
SAN FRANCISCO, January 2024 - TechCorp Innovation Labs announced today that
it has raised $50 million in Series B funding led by Venture Partners Capital.
"This funding will accelerate our mission to make enterprise knowledge
accessible to everyone," said Sarah Chen, CEO of TechCorp.
The company plans to use the funds to expand its engineering team and
open a new research office in London. NeuralSearch now serves over 200
enterprise customers including Fortune 500 companies.
Existing investors including Startup Capital and AI Ventures also
participated in the round.

doc3.txt - Product Features

NeuralSearch Features and Capabilities
NeuralSearch is TechCorp's enterprise search platform that combines
traditional keyword search with AI-powered semantic understanding.
Key Features:
- Semantic Search: Understands the meaning behind queries, not just keywords
- Knowledge Graph: Automatically extracts entities and relationships from documents
- Multi-modal: Supports text, PDFs, images, and spreadsheets
- Enterprise Security: SOC 2 Type II certified with role-based access control
- Integrations: Works with Slack, Microsoft Teams, Google Workspace, and Salesforce
NeuralSearch was developed by Marcus Williams and his engineering team of 50+
engineers. The platform processes over 1 billion queries per month across
all customer deployments.

Upload each document to your workspace:

# Upload doc1.txt
# Upload doc1.txt
curl -X POST "http://localhost:8080/api/v1/documents?workspace_id=$WORKSPACE_ID" \
  -F "file=@doc1.txt" \
  -F "title=Company Overview"

# Upload doc2.txt
curl -X POST "http://localhost:8080/api/v1/documents?workspace_id=$WORKSPACE_ID" \
  -F "file=@doc2.txt" \
  -F "title=Series B Announcement"

# Upload doc3.txt
curl -X POST "http://localhost:8080/api/v1/documents?workspace_id=$WORKSPACE_ID" \
  -F "file=@doc3.txt" \
  -F "title=Product Features"

Each upload returns a document ID and triggers background processing:

{
  "id": "doc_xyz789",
  "title": "Company Overview",
  "status": "processing",
  "workspace_id": "ws_abc123"
}

Check document processing status:

curl "http://localhost:8080/api/v1/documents?workspace_id=$WORKSPACE_ID"

Response:

{
  "documents": [
    {
      "id": "doc_xyz789",
      "title": "Company Overview",
      "status": "completed",
      "chunk_count": 3,
      "entity_count": 8,
      "created_at": "2024-01-15T10:05:00Z"
    },
    ...
  ]
}

Wait until all documents show status: "completed".
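The wait step can be automated with a small poll loop. A sketch against the listing shape above; `fetch_docs` is a caller-supplied callable (for example a wrapper around the GET request), not an EdgeQuake API:

```python
import time

def all_completed(listing):
    """True when every document in a listing response has completed."""
    docs = listing.get("documents", [])
    return bool(docs) and all(d.get("status") == "completed" for d in docs)

def wait_for_processing(fetch_docs, timeout=300, interval=2.0):
    """Poll fetch_docs() until all documents complete or timeout (seconds)."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if all_completed(fetch_docs()):
            return True
        time.sleep(interval)
    return False
```

Injecting the fetcher keeps the loop testable without a running server.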


See what entities were extracted:

curl "http://localhost:8080/api/v1/graph/entities?workspace_id=$WORKSPACE_ID"

Response:

{
  "entities": [
    {
      "name": "SARAH_CHEN",
      "entity_type": "PERSON",
      "description": "CEO of TechCorp Innovation Labs, previously led language model team at Google DeepMind",
      "source_count": 3
    },
    {
      "name": "MARCUS_WILLIAMS",
      "entity_type": "PERSON",
      "description": "CTO of TechCorp, PhD from MIT, founded two startups",
      "source_count": 2
    },
    {
      "name": "TECHCORP_INNOVATION_LABS",
      "entity_type": "ORGANIZATION",
      "description": "AI company founded in 2020, headquartered in San Francisco",
      "source_count": 3
    },
    {
      "name": "NEURALSEARCH",
      "entity_type": "PRODUCT",
      "description": "Enterprise search platform with AI-powered semantic understanding",
      "source_count": 2
    }
  ]
}
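For larger workspaces, it helps to summarize the entity list by type rather than reading it raw. A small sketch over the response shape above (the helper name is illustrative, not an EdgeQuake API):

```python
from collections import defaultdict

def entities_by_type(response):
    """Group extracted entity names by their entity_type."""
    grouped = defaultdict(list)
    for entity in response.get("entities", []):
        grouped[entity["entity_type"]].append(entity["name"])
    return dict(grouped)
```

For the response above this yields groups like `{"PERSON": ["SARAH_CHEN", "MARCUS_WILLIAMS"], ...}`.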

See relationships between entities:

curl "http://localhost:8080/api/v1/graph/relationships?workspace_id=$WORKSPACE_ID"

Response:

{
  "relationships": [
    {
      "source": "SARAH_CHEN",
      "target": "TECHCORP_INNOVATION_LABS",
      "relationship_type": "FOUNDED",
      "description": "Sarah Chen co-founded TechCorp Innovation Labs in 2020"
    },
    {
      "source": "SARAH_CHEN",
      "target": "TECHCORP_INNOVATION_LABS",
      "relationship_type": "LEADS",
      "description": "Sarah Chen serves as CEO"
    },
    {
      "source": "MARCUS_WILLIAMS",
      "target": "NEURALSEARCH",
      "relationship_type": "DEVELOPED",
      "description": "Marcus Williams and his engineering team developed NeuralSearch"
    }
  ]
}
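You can explore the graph from one entity outward by collecting every edge that touches it. A sketch over the response shape above (the helper is illustrative, not part of EdgeQuake):

```python
def neighbors(response, entity):
    """Return (other_entity, relationship_type) pairs touching an entity."""
    out = []
    for rel in response.get("relationships", []):
        if rel["source"] == entity:
            out.append((rel["target"], rel["relationship_type"]))
        elif rel["target"] == entity:
            out.append((rel["source"], rel["relationship_type"]))
    return out
```

For the response above, `neighbors(resp, "SARAH_CHEN")` would list both the FOUNDED and LEADS edges to TECHCORP_INNOVATION_LABS.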

Now the fun part! Ask questions about your documents:

curl -X POST "http://localhost:8080/api/v1/query?workspace_id=$WORKSPACE_ID" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Who founded TechCorp?",
    "mode": "hybrid"
  }'

Response:

{
  "answer": "TechCorp Innovation Labs was founded in 2020 by Sarah Chen and Marcus Williams. Sarah Chen serves as CEO and leads the company's AI research initiatives, while Marcus Williams is the CTO overseeing all engineering operations.",
  "sources": [
    {
      "document_id": "doc_xyz789",
      "title": "Company Overview",
      "chunk": "TechCorp Innovation Labs was founded in 2020 by Sarah Chen and Marcus Williams..."
    }
  ],
  "entities_used": [
    "SARAH_CHEN",
    "MARCUS_WILLIAMS",
    "TECHCORP_INNOVATION_LABS"
  ],
  "mode": "hybrid"
}
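A common next step is rendering the answer with numbered citations for display. A sketch over the response shape above (the formatting is illustrative; only the field names come from the response):

```python
def format_answer(response):
    """Render an answer followed by numbered source citations."""
    lines = [response["answer"], ""]
    for i, src in enumerate(response.get("sources", []), start=1):
        lines.append(f"[{i}] {src['title']} ({src['document_id']})")
    return "\n".join(lines)
```

For the response above, this appends a line like `[1] Company Overview (doc_xyz789)` under the answer.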

Local mode focuses retrieval on specific entities and their relationships:

curl -X POST "http://localhost:8080/api/v1/query?workspace_id=$WORKSPACE_ID" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is the relationship between Sarah Chen and Google?",
    "mode": "local"
  }'

Response:

{
  "answer": "Sarah Chen previously worked at Google DeepMind where she led the language model team before co-founding TechCorp Innovation Labs and becoming its CEO.",
  "sources": [...],
  "entities_used": ["SARAH_CHEN", "GOOGLE_DEEPMIND"]
}

Global mode answers broad questions using community-level summaries of the graph:

curl -X POST "http://localhost:8080/api/v1/query?workspace_id=$WORKSPACE_ID" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are the main themes across these documents?",
    "mode": "global"
  }'

Response:

{
  "answer": "The main themes across these documents are:\n\n1. **Company Leadership**: The documents describe TechCorp's founding team - Sarah Chen (CEO) and Marcus Williams (CTO) - their backgrounds and roles.\n\n2. **Product Innovation**: NeuralSearch is the company's flagship product, an AI-powered enterprise search platform.\n\n3. **Growth and Funding**: TechCorp recently raised $50M in Series B funding and is expanding internationally.\n\n4. **AI and Enterprise**: The company focuses on making enterprise knowledge accessible through AI technology.",
  "sources": [...],
  "communities_used": 2
}

Try the same question with different modes:

# Naive mode (vector search only)
# Naive mode (vector search only)
curl -X POST "http://localhost:8080/api/v1/query?workspace_id=$WORKSPACE_ID" \
  -H "Content-Type: application/json" \
  -d '{"query": "Tell me about NeuralSearch", "mode": "naive"}'

# Local mode (entity-focused)
curl -X POST "http://localhost:8080/api/v1/query?workspace_id=$WORKSPACE_ID" \
  -H "Content-Type: application/json" \
  -d '{"query": "Tell me about NeuralSearch", "mode": "local"}'

# Global mode (community summaries)
curl -X POST "http://localhost:8080/api/v1/query?workspace_id=$WORKSPACE_ID" \
  -H "Content-Type: application/json" \
  -d '{"query": "Tell me about NeuralSearch", "mode": "global"}'

# Hybrid mode (combined - default)
curl -X POST "http://localhost:8080/api/v1/query?workspace_id=$WORKSPACE_ID" \
  -H "Content-Type: application/json" \
  -d '{"query": "Tell me about NeuralSearch", "mode": "hybrid"}'

Notice how each mode provides slightly different perspectives based on its retrieval strategy.
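The side-by-side comparison above is easy to script. A sketch where `ask` is a caller-supplied function wrapping the POST request (the helper name is illustrative, not an EdgeQuake API):

```python
MODES = ("naive", "local", "global", "hybrid")

def compare_modes(ask, question):
    """Run one question through every query mode.

    ask(question, mode) should return the answer string, e.g. a wrapper
    around POST /api/v1/query that extracts response["answer"].
    """
    return {mode: ask(question, mode) for mode in MODES}
```

Usage: `compare_modes(lambda q, m: requests.post(url, json={"query": q, "mode": m}).json()["answer"], "Tell me about NeuralSearch")`.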


Open the EdgeQuake Web UI for a visual experience:

  1. Open http://localhost:3000 in your browser
  2. Select your workspace “My First RAG App”
  3. Navigate to the Documents tab to see your uploads
  4. Navigate to the Graph tab to visualize the knowledge graph
  5. Navigate to the Query tab to ask questions interactively
┌─────────────────────────────────────────────────────────────────┐
│                        EDGEQUAKE WEB UI                         │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐            │
│   │Documents│  │  Graph  │  │  Query  │  │Settings │            │
│   └────┬────┘  └────┬────┘  └────┬────┘  └─────────┘            │
│        │            │            │                              │
│        ▼            ▼            ▼                              │
│   ┌─────────┐  ┌─────────┐  ┌─────────┐                         │
│   │ Upload  │  │ Visual  │  │  Chat   │                         │
│   │ List    │  │ Graph   │  │Interface│                         │
│   │ Status  │  │Explorer │  │ Modes   │                         │
│   └─────────┘  └─────────┘  └─────────┘                         │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Delete the workspace and all its data:

curl -X DELETE "http://localhost:8080/api/v1/workspaces/$WORKSPACE_ID"

✅ Created a workspace for document organization
✅ Uploaded multiple documents for processing
✅ Monitored document indexing status
✅ Explored the extracted knowledge graph
✅ Queried with different modes (naive, local, global, hybrid)
✅ Compared retrieval strategies
✅ Used both API and Web UI


Tutorial                          Description
Document Ingestion Deep-Dive      Custom chunking and processing
Query Optimization                Choosing the right mode
Multi-Tenant Setup                Building a SaaS app
Custom Entity Types               Domain-specific extraction

# Check worker status
curl "http://localhost:8080/api/v1/tasks?status=pending"

# View backend logs
docker compose logs -f edgequake

If queries return no results:
  1. Verify documents completed processing
  2. Check that workspace_id is correct
  3. Try naive mode to verify basic retrieval works

If the LLM provider returns errors:
  1. Check your API key: echo $OPENAI_API_KEY
  2. Verify Ollama is running: curl http://localhost:11434/api/tags
  3. Check logs for rate limit errors

Here’s a Python script that does everything above:

import requests
import time

BASE_URL = "http://localhost:8080/api/v1"

# Step 1: Create workspace
resp = requests.post(f"{BASE_URL}/workspaces", json={
    "name": "Python Tutorial",
    "description": "Created from Python script"
})
workspace = resp.json()
workspace_id = workspace["id"]
print(f"Created workspace: {workspace_id}")

# Step 2: Upload documents
documents = [
    ("Company Overview", "doc1.txt"),
    ("Series B Announcement", "doc2.txt"),
    ("Product Features", "doc3.txt"),
]
for title, filename in documents:
    with open(filename, "rb") as f:
        resp = requests.post(
            f"{BASE_URL}/documents?workspace_id={workspace_id}",
            files={"file": f},
            data={"title": title},
        )
    print(f"Uploaded: {title} -> {resp.json()['id']}")

# Step 3: Wait for processing
print("Waiting for processing...")
while True:
    resp = requests.get(f"{BASE_URL}/documents?workspace_id={workspace_id}")
    docs = resp.json()["documents"]
    if all(d["status"] == "completed" for d in docs):
        break
    time.sleep(2)
print("All documents processed!")

# Step 4: Query
questions = [
    "Who founded TechCorp?",
    "What is NeuralSearch?",
    "How much funding did they raise?",
]
for question in questions:
    resp = requests.post(
        f"{BASE_URL}/query?workspace_id={workspace_id}",
        json={"query": question, "mode": "hybrid"},
    )
    answer = resp.json()["answer"]
    print(f"\nQ: {question}")
    print(f"A: {answer[:200]}...")