Tutorial: Building Your First RAG App

End-to-End Guide: From Documents to Intelligent Q&A
In this tutorial, you’ll build a complete RAG application that can answer questions about your documents using EdgeQuake’s graph-enhanced retrieval.
Time: ~30 minutes
Level: Beginner
Prerequisites: EdgeQuake running (Quick Start)
What You’ll Build
```
┌─────────────────────────────────────────────────────────────────┐
│                      YOUR RAG APPLICATION                       │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   ┌─────────────┐    ┌─────────────┐    ┌─────────────┐         │
│   │   Upload    │───▶│    Index    │───▶│    Query    │         │
│   │  Documents  │    │  & Extract  │    │  & Answer   │         │
│   └─────────────┘    └─────────────┘    └─────────────┘         │
│                                                                 │
│   Features:                                                     │
│   • Multi-document ingestion                                    │
│   • Knowledge graph extraction                                  │
│   • Multiple query modes                                        │
│   • Source citations                                            │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

Step 1: Start EdgeQuake
First, make sure EdgeQuake is running:

```shell
# Option A: With Ollama (free, local)
make dev

# Option B: With OpenAI (requires API key)
export OPENAI_API_KEY="sk-your-key"
make dev
```

Verify it's running:

```shell
curl http://localhost:8080/health
```

Expected response:

```json
{ "status": "ok", "version": "0.1.0", "storage_mode": "postgresql" }
```

Step 2: Create a Workspace
Workspaces organize your documents and provide isolation:

```shell
curl -X POST http://localhost:8080/api/v1/workspaces \
  -H "Content-Type: application/json" \
  -d '{
    "name": "My First RAG App",
    "description": "Tutorial workspace for learning EdgeQuake"
  }'
```

Response:

```json
{
  "id": "ws_abc123",
  "name": "My First RAG App",
  "description": "Tutorial workspace for learning EdgeQuake",
  "created_at": "2024-01-15T10:00:00Z"
}
```

Save the workspace ID for later:

```shell
export WORKSPACE_ID="ws_abc123"
```

Step 3: Prepare Sample Documents
Let's create some sample documents about a fictional company:

doc1.txt - Company Overview

```
TechCorp Innovation Labs was founded in 2020 by Sarah Chen and Marcus Williams.
The company is headquartered in San Francisco, with research offices in Boston and Seattle.

Sarah Chen serves as CEO and leads the company's AI research initiatives.
She previously worked at Google DeepMind where she led the language model team.

Marcus Williams is the CTO and oversees all engineering operations.
He has a PhD in Computer Science from MIT and previously founded two startups.

TechCorp's flagship product is NeuralSearch, an enterprise search platform
that uses advanced AI to help companies find information in their documents.
```

doc2.txt - Recent News
```
TechCorp Announces $50M Series B Funding

SAN FRANCISCO, January 2024 - TechCorp Innovation Labs announced today that
it has raised $50 million in Series B funding led by Venture Partners Capital.

"This funding will accelerate our mission to make enterprise knowledge
accessible to everyone," said Sarah Chen, CEO of TechCorp.

The company plans to use the funds to expand its engineering team and
open a new research office in London. NeuralSearch now serves over 200
enterprise customers including Fortune 500 companies.

Existing investors including Startup Capital and AI Ventures also
participated in the round.
```

doc3.txt - Product Features
```
NeuralSearch Features and Capabilities

NeuralSearch is TechCorp's enterprise search platform that combines
traditional keyword search with AI-powered semantic understanding.

Key Features:
- Semantic Search: Understands the meaning behind queries, not just keywords
- Knowledge Graph: Automatically extracts entities and relationships from documents
- Multi-modal: Supports text, PDFs, images, and spreadsheets
- Enterprise Security: SOC 2 Type II certified with role-based access control
- Integrations: Works with Slack, Microsoft Teams, Google Workspace, and Salesforce

NeuralSearch was developed by Marcus Williams and his engineering team of 50+
engineers. The platform processes over 1 billion queries per month across
all customer deployments.
```

Step 4: Upload Documents
Upload each document to your workspace:

```shell
# Upload doc1.txt
curl -X POST "http://localhost:8080/api/v1/documents?workspace_id=$WORKSPACE_ID" \
  -F "file=@doc1.txt" \
  -F "title=Company Overview"

# Upload doc2.txt
curl -X POST "http://localhost:8080/api/v1/documents?workspace_id=$WORKSPACE_ID" \
  -F "file=@doc2.txt" \
  -F "title=Series B Announcement"

# Upload doc3.txt
curl -X POST "http://localhost:8080/api/v1/documents?workspace_id=$WORKSPACE_ID" \
  -F "file=@doc3.txt" \
  -F "title=Product Features"
```

Each upload returns a document ID and triggers background processing:

```json
{
  "id": "doc_xyz789",
  "title": "Company Overview",
  "status": "processing",
  "workspace_id": "ws_abc123"
}
```

Step 5: Monitor Processing
Check document processing status:

```shell
curl "http://localhost:8080/api/v1/documents?workspace_id=$WORKSPACE_ID"
```

Response:

```json
{
  "documents": [
    {
      "id": "doc_xyz789",
      "title": "Company Overview",
      "status": "completed",
      "chunk_count": 3,
      "entity_count": 8,
      "created_at": "2024-01-15T10:05:00Z"
    },
    ...
  ]
}
```

Wait until all documents show `status: "completed"`.
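If you have `jq` installed (an assumption, not a tutorial requirement), a one-liner counts the documents that are still indexing; 0 means you are ready to query:

```shell
# Prints the number of documents whose status is not yet "completed"
curl -s "http://localhost:8080/api/v1/documents?workspace_id=$WORKSPACE_ID" \
  | jq '[.documents[] | select(.status != "completed")] | length'
```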
Step 6: Explore the Knowledge Graph
See what entities were extracted:

```shell
curl "http://localhost:8080/api/v1/graph/entities?workspace_id=$WORKSPACE_ID"
```

Response:

```json
{
  "entities": [
    {
      "name": "SARAH_CHEN",
      "entity_type": "PERSON",
      "description": "CEO of TechCorp Innovation Labs, previously led language model team at Google DeepMind",
      "source_count": 3
    },
    {
      "name": "MARCUS_WILLIAMS",
      "entity_type": "PERSON",
      "description": "CTO of TechCorp, PhD from MIT, founded two startups",
      "source_count": 2
    },
    {
      "name": "TECHCORP_INNOVATION_LABS",
      "entity_type": "ORGANIZATION",
      "description": "AI company founded in 2020, headquartered in San Francisco",
      "source_count": 3
    },
    {
      "name": "NEURALSEARCH",
      "entity_type": "PRODUCT",
      "description": "Enterprise search platform with AI-powered semantic understanding",
      "source_count": 2
    }
  ]
}
```

See relationships between entities:

```shell
curl "http://localhost:8080/api/v1/graph/relationships?workspace_id=$WORKSPACE_ID"
```

Response:

```json
{
  "relationships": [
    {
      "source": "SARAH_CHEN",
      "target": "TECHCORP_INNOVATION_LABS",
      "relationship_type": "FOUNDED",
      "description": "Sarah Chen co-founded TechCorp Innovation Labs in 2020"
    },
    {
      "source": "SARAH_CHEN",
      "target": "TECHCORP_INNOVATION_LABS",
      "relationship_type": "LEADS",
      "description": "Sarah Chen serves as CEO"
    },
    {
      "source": "MARCUS_WILLIAMS",
      "target": "NEURALSEARCH",
      "relationship_type": "DEVELOPED",
      "description": "Marcus Williams and his engineering team developed NeuralSearch"
    }
  ]
}
```

Step 7: Query Your Documents
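Before asking questions, it can be handy to sanity-check the extracted graph by grouping each entity's outgoing edges. This sketch assumes `jq` is available; the grouping is done client-side, it is not an EdgeQuake endpoint:

```shell
# Turn the relationships payload into an adjacency list:
# { "SOURCE": ["TYPE -> TARGET", ...], ... }
curl -s "http://localhost:8080/api/v1/graph/relationships?workspace_id=$WORKSPACE_ID" \
  | jq 'reduce .relationships[] as $r
        ({}; .[$r.source] += ["\($r.relationship_type) -> \($r.target)"])'
```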
Now the fun part! Ask questions about your documents:
Simple Question
```shell
curl -X POST "http://localhost:8080/api/v1/query?workspace_id=$WORKSPACE_ID" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Who founded TechCorp?",
    "mode": "hybrid"
  }'
```

Response:

```json
{
  "answer": "TechCorp Innovation Labs was founded in 2020 by Sarah Chen and Marcus Williams. Sarah Chen serves as CEO and leads the company's AI research initiatives, while Marcus Williams is the CTO overseeing all engineering operations.",
  "sources": [
    {
      "document_id": "doc_xyz789",
      "title": "Company Overview",
      "chunk": "TechCorp Innovation Labs was founded in 2020 by Sarah Chen and Marcus Williams..."
    }
  ],
  "entities_used": ["SARAH_CHEN", "MARCUS_WILLIAMS", "TECHCORP_INNOVATION_LABS"],
  "mode": "hybrid"
}
```

Relationship Question
```shell
curl -X POST "http://localhost:8080/api/v1/query?workspace_id=$WORKSPACE_ID" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is the relationship between Sarah Chen and Google?",
    "mode": "local"
  }'
```

Response:

```json
{
  "answer": "Sarah Chen previously worked at Google DeepMind where she led the language model team before co-founding TechCorp Innovation Labs and becoming its CEO.",
  "sources": [...],
  "entities_used": ["SARAH_CHEN", "GOOGLE_DEEPMIND"]
}
```

Overview Question
```shell
curl -X POST "http://localhost:8080/api/v1/query?workspace_id=$WORKSPACE_ID" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are the main themes across these documents?",
    "mode": "global"
  }'
```

Response:

```json
{
  "answer": "The main themes across these documents are:\n\n1. **Company Leadership**: The documents describe TechCorp's founding team - Sarah Chen (CEO) and Marcus Williams (CTO) - their backgrounds and roles.\n\n2. **Product Innovation**: NeuralSearch is the company's flagship product, an AI-powered enterprise search platform.\n\n3. **Growth and Funding**: TechCorp recently raised $50M in Series B funding and is expanding internationally.\n\n4. **AI and Enterprise**: The company focuses on making enterprise knowledge accessible through AI technology.",
  "sources": [...],
  "communities_used": 2
}
```

Step 8: Compare Query Modes
Try the same question with different modes:

```shell
# Naive mode (vector search only)
curl -X POST "http://localhost:8080/api/v1/query?workspace_id=$WORKSPACE_ID" \
  -H "Content-Type: application/json" \
  -d '{"query": "Tell me about NeuralSearch", "mode": "naive"}'

# Local mode (entity-focused)
curl -X POST "http://localhost:8080/api/v1/query?workspace_id=$WORKSPACE_ID" \
  -H "Content-Type: application/json" \
  -d '{"query": "Tell me about NeuralSearch", "mode": "local"}'

# Global mode (community summaries)
curl -X POST "http://localhost:8080/api/v1/query?workspace_id=$WORKSPACE_ID" \
  -H "Content-Type: application/json" \
  -d '{"query": "Tell me about NeuralSearch", "mode": "global"}'

# Hybrid mode (combined - default)
curl -X POST "http://localhost:8080/api/v1/query?workspace_id=$WORKSPACE_ID" \
  -H "Content-Type: application/json" \
  -d '{"query": "Tell me about NeuralSearch", "mode": "hybrid"}'
```

Notice how each mode provides a slightly different perspective based on its retrieval strategy.
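The four calls above can be collapsed into one loop. This sketch assumes `jq` is available to pull out just the answer field:

```shell
# Run the same query through every mode and print each answer
for mode in naive local global hybrid; do
  echo "=== $mode ==="
  curl -s -X POST "http://localhost:8080/api/v1/query?workspace_id=$WORKSPACE_ID" \
    -H "Content-Type: application/json" \
    -d "{\"query\": \"Tell me about NeuralSearch\", \"mode\": \"$mode\"}" \
    | jq -r '.answer'
done
```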
Step 9: Use the Web UI
Open the EdgeQuake Web UI for a visual experience:
- Open http://localhost:3000 in your browser
- Select your workspace “My First RAG App”
- Navigate to the Documents tab to see your uploads
- Navigate to the Graph tab to visualize the knowledge graph
- Navigate to the Query tab to ask questions interactively
```
┌─────────────────────────────────────────────────────────────────┐
│                        EDGEQUAKE WEB UI                         │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   ┌─────────┐   ┌─────────┐   ┌─────────┐   ┌─────────┐         │
│   │Documents│   │  Graph  │   │  Query  │   │Settings │         │
│   └────┬────┘   └────┬────┘   └────┬────┘   └─────────┘         │
│        │             │             │                            │
│        ▼             ▼             ▼                            │
│   ┌─────────┐   ┌─────────┐   ┌─────────┐                       │
│   │ Upload  │   │ Visual  │   │  Chat   │                       │
│   │  List   │   │  Graph  │   │Interface│                       │
│   │ Status  │   │Explorer │   │  Modes  │                       │
│   └─────────┘   └─────────┘   └─────────┘                       │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

Step 10: Clean Up (Optional)
Delete the workspace and all its data:

```shell
curl -X DELETE "http://localhost:8080/api/v1/workspaces/$WORKSPACE_ID"
```

What You Learned
Section titled “What You Learned”✅ Created a workspace for document organization
✅ Uploaded multiple documents for processing
✅ Monitored document indexing status
✅ Explored the extracted knowledge graph
✅ Queried with different modes (naive, local, global, hybrid)
✅ Compared retrieval strategies
✅ Used both API and Web UI
Next Steps
| Tutorial | Description |
|---|---|
| Document Ingestion Deep-Dive | Custom chunking and processing |
| Query Optimization | Choosing the right mode |
| Multi-Tenant Setup | Building a SaaS app |
| Custom Entity Types | Domain-specific extraction |
Troubleshooting
Section titled “Troubleshooting”Documents stuck in “processing”
```shell
# Check worker status
curl "http://localhost:8080/api/v1/tasks?status=pending"

# View backend logs
docker compose logs -f edgequake
```

Empty responses
Section titled “Empty responses”- Verify documents completed processing
- Check workspace_id is correct
- Try
naivemode to verify basic retrieval works
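For the last point, the same query endpoint from Step 7 can be forced into naive mode. Since naive mode uses vector search only, an empty result here points at indexing or embeddings rather than graph construction (`jq` assumed for extracting the answer):

```shell
# Sanity check: naive mode bypasses the graph entirely
curl -s -X POST "http://localhost:8080/api/v1/query?workspace_id=$WORKSPACE_ID" \
  -H "Content-Type: application/json" \
  -d '{"query": "Who founded TechCorp?", "mode": "naive"}' \
  | jq '.answer'
```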
LLM errors
Section titled “LLM errors”- Check API key:
echo $OPENAI_API_KEY - Verify Ollama is running:
curl http://localhost:11434/api/tags - Check logs for rate limit errors
Complete Code Example
Here's a Python script that does everything above:
```python
import requests
import time

BASE_URL = "http://localhost:8080/api/v1"

# Step 1: Create workspace
resp = requests.post(f"{BASE_URL}/workspaces", json={
    "name": "Python Tutorial",
    "description": "Created from Python script"
})
workspace = resp.json()
workspace_id = workspace["id"]
print(f"Created workspace: {workspace_id}")

# Step 2: Upload documents
documents = [
    ("Company Overview", "doc1.txt"),
    ("Series B Announcement", "doc2.txt"),
    ("Product Features", "doc3.txt"),
]

for title, filename in documents:
    with open(filename, "rb") as f:
        resp = requests.post(
            f"{BASE_URL}/documents?workspace_id={workspace_id}",
            files={"file": f},
            data={"title": title}
        )
    print(f"Uploaded: {title} -> {resp.json()['id']}")

# Step 3: Wait for processing
print("Waiting for processing...")
while True:
    resp = requests.get(f"{BASE_URL}/documents?workspace_id={workspace_id}")
    docs = resp.json()["documents"]
    if all(d["status"] == "completed" for d in docs):
        break
    time.sleep(2)
print("All documents processed!")

# Step 4: Query
questions = [
    "Who founded TechCorp?",
    "What is NeuralSearch?",
    "How much funding did they raise?",
]

for question in questions:
    resp = requests.post(
        f"{BASE_URL}/query?workspace_id={workspace_id}",
        json={"query": question, "mode": "hybrid"}
    )
    answer = resp.json()["answer"]
    print(f"\nQ: {question}")
    print(f"A: {answer[:200]}...")
```

See Also
- Quick Start - Minimal setup guide
- Query Modes - Understanding retrieval strategies
- REST API - Complete API reference
- Architecture - System design