Skip to content

Troubleshooting Guide

Diagnosing and Resolving Common EdgeQuake Issues

This guide helps you identify and fix common problems when running EdgeQuake.


Terminal window
# Check basic health
curl http://localhost:8080/health
# Check readiness with dependencies
curl http://localhost:8080/health/ready
# Check if backend is responding
curl -I http://localhost:8080/api/v1/workspaces
Terminal window
# Check all services (if using make)
make status
# Check PostgreSQL
docker exec edgequake-postgres pg_isready -U edgequake
# Check Ollama
curl http://localhost:11434/api/tags

Symptom: “Expected request with Content-Type: application/json

Section titled “Symptom: “Expected request with Content-Type: application/json””

Cause: Using multipart form data (-F "file=@...") with the wrong endpoint.

Solution: EdgeQuake has two different endpoints for document upload:

Option A - Upload Text as JSON (/api/v1/documents):

Terminal window
curl -X POST http://localhost:8080/api/v1/documents \
-H "Content-Type: application/json" \
-d '{
"content": "Your text content here...",
"title": "Document Title"
}'

Option B - Upload Files (PDF, TXT, MD, etc.) (/api/v1/documents/upload):

Terminal window
curl -X POST http://localhost:8080/api/v1/documents/upload \
-F "file=@your-document.pdf" \
-F "title=My Document"

Symptom: “Failed to parse the request body as JSON”

Section titled “Symptom: “Failed to parse the request body as JSON””

Cause: Using -F flag (multipart) with a JSON endpoint, or mixing content types.

Solution: Choose the correct endpoint and format:

Upload TypeEndpointContent-TypeFormat
Text/JSON/api/v1/documentsapplication/json-d '{...}'
Files/api/v1/documents/uploadmultipart/form-data-F "file=@"
Batch Files/api/v1/documents/upload/batchmultipart/form-data-F "files=@"

Examples:

Terminal window
# ❌ WRONG - multipart to JSON endpoint
curl -X POST http://localhost:8080/api/v1/documents \
-F "file=@doc.pdf"
# ✅ CORRECT - multipart to upload endpoint
curl -X POST http://localhost:8080/api/v1/documents/upload \
-F "file=@doc.pdf"
# ✅ CORRECT - JSON to documents endpoint
curl -X POST http://localhost:8080/api/v1/documents \
-H "Content-Type: application/json" \
-d '{"content": "Text here", "title": "My Doc"}'

Cause: Port 8080 is already in use.

Solution:

Terminal window
# Find what's using port 8080
lsof -i :8080
# Kill the process
kill -9 <PID>
# Or use different port
PORT=9090 cargo run

Cause: Invalid PostgreSQL connection string.

Solution:

Terminal window
# Check format
DATABASE_URL="postgresql://user:password@host:port/database"
# Test connection
psql "$DATABASE_URL" -c "SELECT 1"

Symptom: “Extension ‘vector’ not found”

Section titled “Symptom: “Extension ‘vector’ not found””

Cause: pgvector extension not installed.

Solution:

-- As superuser in PostgreSQL
CREATE EXTENSION IF NOT EXISTS vector;

Or rebuild Docker container:

Terminal window
docker compose down -v
docker compose up -d

Symptom: Documents stay in “processing” status

Section titled “Symptom: Documents stay in “processing” status”

Diagnosis:

Terminal window
# Check pending tasks
curl http://localhost:8080/api/v1/tasks?status=pending
# Check backend logs
docker compose logs -f edgequake
# Or if running locally
tail -f /tmp/edgequake-backend.log

Common Causes:

CauseSolution
LLM rate limitWait and retry
Invalid API keyCheck OPENAI_API_KEY
Ollama not runningStart Ollama: ollama serve
Worker crashRestart backend

Solution:

Terminal window
# Restart workers
make stop
make dev
# Or manually retry document
curl -X POST "http://localhost:8080/api/v1/documents/$DOC_ID/reprocess"

PDF extraction can fail or produce poor quality results due to PDF structure, encoding, or layout complexity. This section covers the most common PDF-specific problems.

Issue 3.1: No Text Extracted (chunk_count = 0)

Section titled “Issue 3.1: No Text Extracted (chunk_count = 0)”

Symptom: After PDF upload, chunk_count = 0 or chunks are empty

Diagnosis:

Terminal window
# Check document details
curl http://localhost:8080/api/v1/documents/doc-uuid
# Response shows:
{
"chunk_count": 0,
"metadata": {"pages": 50, "extraction_mode": "Text"}
}

Cause: PDF is image-based (scanned document, no embedded text layer)

Solution 1 - Enable Vision Mode:

Terminal window
# Re-upload with vision mode
curl -X POST http://localhost:8080/api/v1/documents/upload \
-F "file=@scanned_book.pdf" \
-F "title=Scanned Book" \
-F 'config={"mode": "Vision", "vision_dpi": 150}'

Solution 2 - Try Hybrid Mode (automatic detection):

Terminal window
# Hybrid mode automatically detects low-quality pages
curl -X POST http://localhost:8080/api/v1/documents/upload \
-F "file=@mixed_quality.pdf" \
-F 'config={"mode": "Hybrid", "quality_threshold": 0.7}'

Cost Warning: Vision mode costs ~$0.001-0.01 per page with OpenAI GPT-4o-mini.

Verification:

  • Check chunk_count > 0 in response
  • Check extraction_mode shows "Vision" or "Hybrid"
  • Download chunks to verify content extracted

Related: See PDF Ingestion Tutorial for more details on vision mode.


Issue 3.2: Tables Not Detected or Malformed

Section titled “Issue 3.2: Tables Not Detected or Malformed”

Symptom: Tables appear as scrambled text or not detected at all

Before (text mode):

Header1 Header2 Header3 Data1a Data1b
Data1c Data2a Data2b Data2c Data3a

Diagnosis:

Terminal window
# Check table detection
curl http://localhost:8080/api/v1/documents/doc-uuid
# Response:
{
"metadata": {
"tables_detected": 0 // Should be > 0 if tables exist
}
}

Cause: Complex table layout (merged cells, nested structures, no clear borders)

Solution 1 - Enable Table Enhancement:

Terminal window
# LLM-based table refinement
curl -X POST http://localhost:8080/api/v1/documents \
-F "file=@financial_report.pdf" \
-F "title=Financial Report" \
-F 'config={"enhance_tables": true, "ai_temperature": 0.1}'

Solution 2 - Combine with Multi-Column Detection:

Terminal window
# For academic papers with tables in columns
curl -X POST http://localhost:8080/api/v1/documents \
-F "file=@research_paper.pdf" \
-F 'config={
"enhance_tables": true,
"layout": {"detect_columns": true}
}'

After (enhanced):

| Header 1 | Header 2 | Header 3 |
| -------- | -------- | -------- |
| Data 1a | Data 1b | Data 1c |
| Data 2a | Data 2b | Data 2c |
| Data 3a | Data 3b | Data 3c |

Trade-offs:

  • Processing time: 2x slower
  • Cost: ~$0.0001 per table
  • Accuracy: Significantly improved for complex tables

Limitations:

  • Very complex tables (5+ levels of merged cells) may still fail
  • Tables without any borders are harder to detect

Verification:

  • Check tables_detected > 0 in response
  • Inspect chunk content for proper markdown table format
  • Verify cell alignments correct

Related: See PDF Processing Deep Dive for algorithm details.


Issue 3.3: Wrong Text Order (Multi-Column Layout)

Section titled “Issue 3.3: Wrong Text Order (Multi-Column Layout)”

Symptom: Text from different columns interleaved incorrectly

Example Problem:

# PDF has 2 columns:
Column 1: "The experiment showed that X leads to Y..."
Column 2: "In conclusion, we recommend Z..."
# Extracted (wrong order):
"The experiment In showed conclusion, that we X recommend leads Z... to Y..."

Diagnosis:

Terminal window
# Download chunks and check text order
curl http://localhost:8080/api/v1/documents/doc-uuid/chunks
# Look for interleaved text from different sections

Cause: PDF has multi-column layout (academic papers, newspapers, magazines)

Solution - Enable Column Detection:

Terminal window
# Detect and respect column boundaries
curl -X POST http://localhost:8080/api/v1/documents \
-F "file=@research_paper.pdf" \
-F "title=Research Paper" \
-F 'config={
"layout": {
"detect_columns": true,
"column_gap_threshold": 20.0
}
}'

Configuration Options:

  • detect_columns: Enable multi-column detection (default: true)
  • column_gap_threshold: Minimum gap in points between columns (default: 20.0)
    • Increase for wider column gaps
    • Decrease for narrower column gaps

Verification:

  • Read first few chunks - text should flow naturally
  • Check column boundaries respected
  • Verify reading order left-to-right within each column

Tip: Academic papers almost always need column detection enabled.

Related: See PDF Processing Deep Dive for XY-Cut algorithm.


Issue 3.4: Encoding Errors (Special Characters)

Section titled “Issue 3.4: Encoding Errors (Special Characters)”

Symptom: or ? characters appear instead of actual text

Examples:

  • "Caf�" instead of "Café"
  • "Na�ve" instead of "Naïve"
  • "????" instead of Chinese/Arabic text

Diagnosis:

Terminal window
# Check extracted content for garbled characters
curl http://localhost:8080/api/v1/documents/doc-uuid/chunks | jq -r '.chunks[0].content'
# Look for � or ? characters

Cause: PDF uses custom fonts or non-standard encoding not supported by text extraction

Solution 1 - Enable Vision Mode:

Terminal window
# LLM vision reads the actual glyphs
curl -X POST http://localhost:8080/api/v1/documents \
-F "file=@custom_fonts.pdf" \
-F 'config={"mode": "Vision"}'

Solution 2 - Check PDF Font Embedding:

Terminal window
# Use pdffonts to check font embedding (requires poppler-utils)
pdffonts document.pdf
# Look for fonts marked "no" in "emb" column
# These fonts may cause encoding issues

Workaround: If vision mode too expensive, consider:

  1. Re-generate PDF with embedded fonts
  2. Convert PDF to another format (DOCX) then back to PDF
  3. Use OCR tool (Tesseract) to create new PDF with text layer

Verification:

  • Check special characters render correctly
  • Verify non-English text (if applicable)
  • Compare extracted text with PDF visual

Related: Vision mode uses LLM to read actual rendered text, avoiding encoding issues entirely.


Issue 3.5: Low Chunk Quality or Empty Chunks

Section titled “Issue 3.5: Low Chunk Quality or Empty Chunks”

Symptom: Some chunks are very short, empty, or contain garbage

Example:

{
"chunks": [
{ "content": "Page 1", "token_count": 2 }, // Too short
{ "content": "", "token_count": 0 }, // Empty
{ "content": "||||||||", "token_count": 8 } // Garbage
]
}

Diagnosis:

Terminal window
# Count empty or short chunks
curl http://localhost:8080/api/v1/documents/doc-uuid/chunks | \
jq '[.chunks[] | select(.token_count < 10)] | length'
# If many short chunks → extraction quality issue

Causes:

  1. PDF has headers/footers (page numbers, logos)
  2. Complex layout confuses chunking
  3. Embedded images without captions
  4. Poor quality scan

Solution 1 - Enable Readability Enhancement:

Terminal window
# LLM cleans up extracted text
curl -X POST http://localhost:8080/api/v1/documents \
-F "file=@complex_layout.pdf" \
-F 'config={"enhance_readability": true}'

Solution 2 - Normalize Spacing:

Terminal window
# Fix concatenated words and spacing issues
curl -X POST http://localhost:8080/api/v1/documents \
-F "file=@poor_spacing.pdf" \
-F 'config={"normalize_spacing": true, "consolidate_headers": true}'

Solution 3 - Adjust Chunking:

Terminal window
# Increase chunk size to merge small fragments
curl -X POST http://localhost:8080/api/v1/documents \
-F "file=@fragmented.pdf" \
-F "chunk_size=1024" \
-F "chunk_overlap=100"

Verification:

  • Check average chunk token count (should be 100-500 tokens)
  • Verify no empty chunks
  • Inspect chunks for coherent content

Symptom: PDF upload returns 500 error or times out

Common Errors:

{
"error": "Request timeout",
"status": 408
}
{
"error": "File too large",
"status": 413
}

Diagnosis:

Terminal window
# Check file size
ls -lh document.pdf
# Check backend logs
docker compose logs -f edgequake
# Or local logs
tail -f /tmp/edgequake-backend.log

Common Causes:

ErrorCauseSolution
413 Payload Too LargeFile > 50MBSplit PDF or increase limit
408 TimeoutProcessing > 60sUse vision mode or increase timeout
500 Internal ErrorCorrupted PDFRepair with pdftk or ghostscript
500 Memory ErrorLarge PDF + VisionProcess in batches with max_pages

Solution 1 - Test with Page Limit:

Terminal window
# Process first 10 pages to verify config
curl -X POST http://localhost:8080/api/v1/documents \
-F "file=@large_report.pdf" \
-F 'config={"max_pages": 10}'
# If successful, process full document

Solution 2 - Increase Server Timeouts:

Terminal window
# Increase timeout in config (if you control server)
REQUEST_TIMEOUT=300 # 5 minutes
# Or split PDF into smaller files
pdftk large.pdf cat 1-50 output part1.pdf
pdftk large.pdf cat 51-100 output part2.pdf

Solution 3 - Repair Corrupted PDF:

Terminal window
# Using ghostscript to repair
gs -o repaired.pdf -sDEVICE=pdfwrite -dPDFSETTINGS=/prepress original.pdf
# Using pdftk
pdftk original.pdf output repaired.pdf

Verification:

  • Check upload returns 200 OK
  • Verify status: "completed" in response
  • Check no error logs in backend

Use this flowchart to quickly diagnose PDF issues:

PDF Upload Issue?
├─ chunk_count = 0
│ ├─ Try Vision mode → {"mode": "Vision"}
│ ├─ Still 0? → Check if PDF encrypted/protected
│ └─ Still 0? → File GitHub issue with sample
├─ Tables not detected / malformed
│ ├─ Enable table enhancement → {"enhance_tables": true}
│ ├─ Still bad? → Try Vision + enhance → {"mode": "Vision", "enhance_tables": true}
│ └─ Complex table? → Known limitation (file issue)
├─ Text order wrong
│ ├─ Enable column detection → {"layout": {"detect_columns": true}}
│ └─ Still wrong? → Adjust column_gap_threshold
├─ Encoding errors (�, ?)
│ ├─ Try Vision mode → {"mode": "Vision"}
│ └─ Still bad? → Check PDF font embedding (pdffonts)
├─ Upload fails / timeout
│ ├─ File > 50MB? → Split PDF or increase limit
│ ├─ Timeout? → Test with max_pages: 10
│ └─ Error 500? → Repair PDF (ghostscript, pdftk)
└─ Poor quality chunks
├─ Enable readability → {"enhance_readability": true}
├─ Normalize spacing → {"normalize_spacing": true}
└─ Adjust chunk_size → chunk_size=1024

Common configurations for different PDF types:

Digital PDF (good quality):

Terminal window
# Default settings - no config needed
curl -X POST http://localhost:8080/api/v1/documents/upload -F "file=@digital.pdf" http://localhost:8080/api/v1/documents/upload

Scanned Document:

Terminal window
curl -X POST http://localhost:8080/api/v1/documents/upload -F "file=@scanned.pdf" \
-F 'config={"mode": "Vision", "vision_dpi": 150}' \
http://localhost:8080/api/v1/documents/upload

Academic Paper (multi-column):

Terminal window
curl -X POST http://localhost:8080/api/v1/documents/upload -F "file=@paper.pdf" \
-F 'config={"layout": {"detect_columns": true}}' \
http://localhost:8080/api/v1/documents/upload

Financial Report (complex tables):

Terminal window
curl -X POST http://localhost:8080/api/v1/documents/upload -F "file=@financials.pdf" \
-F 'config={"enhance_tables": true}' \
http://localhost:8080/api/v1/documents/upload

Unknown Quality (auto-detect):

Terminal window
curl -X POST http://localhost:8080/api/v1/documents/upload -F "file=@unknown.pdf" \
-F 'config={"mode": "Hybrid", "quality_threshold": 0.7}' \
http://localhost:8080/api/v1/documents/upload

Critical Document (maximum accuracy):

Terminal window
curl -X POST http://localhost:8080/api/v1/documents/upload -F "file=@critical.pdf" \
-F 'config={
"mode": "Vision",
"enhance_tables": true,
"enhance_readability": true,
"vision_dpi": 200
}' \
http://localhost:8080/api/v1/documents/upload

If PDF extraction still fails after trying these solutions:

  1. Read Full Documentation:

  2. Check GitHub Issues:

    • Search existing issues: https://github.com/org/edgequake/issues
    • Look for similar PDF problems
    • Check if limitation is already known
  3. File New Issue:

    • Include PDF metadata (pages, file size, type)
    • Include error messages / logs
    • Attach problematic PDF (if not confidential)
    • Describe expected vs actual behavior
  4. Community Support:

    • Discord: #pdf-extraction channel
    • Stack Overflow: Tag edgequake pdf

Diagnosis:

Terminal window
# Check document count
curl "http://localhost:8080/api/v1/documents?workspace_id=$WORKSPACE_ID"
# Check entity count
curl "http://localhost:8080/api/v1/graph/entities?workspace_id=$WORKSPACE_ID"
# Check chunk count
curl "http://localhost:8080/api/v1/workspaces/$WORKSPACE_ID/stats"

Common Causes:

SymptomCauseSolution
0 documentsNo uploadsUpload documents
Documents but 0 entitiesProcessing failedReprocess documents
Entities but no resultsQuery not matchingTry different mode

Debug Query:

Terminal window
# Try naive mode (vector only) to verify basic retrieval
curl -X POST "http://localhost:8080/api/v1/query?workspace_id=$WORKSPACE_ID" \
-H "Content-Type: application/json" \
-d '{"query": "test", "mode": "naive", "max_chunks": 20}'

Symptom: “OpenAI API error: Rate limit exceeded”

Section titled “Symptom: “OpenAI API error: Rate limit exceeded””

Solution:

Terminal window
# Wait and retry
sleep 60
# Or switch to different provider
export EDGEQUAKE_LLM_PROVIDER=ollama

Symptom: “OpenAI API error: Invalid API key”

Section titled “Symptom: “OpenAI API error: Invalid API key””

Solution:

Terminal window
# Check key is set
echo $OPENAI_API_KEY
# Check key starts with sk-
# Keys should look like: sk-proj-abc123...
# Test key directly
curl https://api.openai.com/v1/models \
-H "Authorization: Bearer $OPENAI_API_KEY"

Symptom: “Connection refused to Ollama”

Section titled “Symptom: “Connection refused to Ollama””

Solution:

Terminal window
# Start Ollama
ollama serve
# Check it's running
curl http://localhost:11434/api/tags
# Pull required model
ollama pull gemma3:12b
ollama pull nomic-embed-text

Diagnosis:

Terminal window
# Enable debug logging
RUST_LOG="edgequake=debug" cargo run
# Check query timing in logs
# Look for: "query completed in Xms"

Common Causes:

CauseDiagnosisSolution
Cold LLMFirst query slowWarm up with test query
Large contextToo many chunksReduce max_chunks
Slow embeddingOllama on CPUUse GPU or OpenAI
DB connectionPool exhaustedCheck connections

Quick Fixes:

Terminal window
# Use faster model
curl -X POST "http://localhost:8080/api/v1/query" \
-d '{"query": "test", "llm_model": "gpt-4o-mini"}'
# Reduce context size
curl -X POST "http://localhost:8080/api/v1/query" \
-d '{"query": "test", "max_chunks": 5, "max_entities": 5}'

Solution:

Terminal window
# Check active connections
psql $DATABASE_URL -c "SELECT count(*) FROM pg_stat_activity WHERE datname='edgequake'"
# Increase pool size in DATABASE_URL
DATABASE_URL="postgresql://user:pass@host:5432/db?max_connections=50"
# Or use PgBouncer for pooling

Symptom: “relation ‘documents’ does not exist”

Section titled “Symptom: “relation ‘documents’ does not exist””

Cause: Migrations haven’t run.

Solution:

Terminal window
# Migrations run automatically on startup
# Restart the backend
cargo run
# Or run manually if needed
psql $DATABASE_URL -f migrations/001_initial.sql

Solution:

Terminal window
# Check disk usage
df -h
# Check table sizes
psql $DATABASE_URL -c "
SELECT schemaname, relname,
pg_size_pretty(pg_total_relation_size(relid))
FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC LIMIT 10;
"
# Vacuum database
psql $DATABASE_URL -c "VACUUM ANALYZE"

Solution:

-- Load AGE extension
LOAD 'age';
SET search_path = ag_catalog, "$user", public;
-- Create graph if not exists
SELECT create_graph('edgequake_graph');

Diagnosis:

Terminal window
# Check relationship count
curl "http://localhost:8080/api/v1/graph/relationships?workspace_id=$WORKSPACE_ID"
# If 0 relationships, check extraction logs

Solution:

Terminal window
# Reprocess with verbose logging
RUST_LOG="edgequake_pipeline=debug" cargo run
# Then reprocess document
curl -X POST "http://localhost:8080/api/v1/documents/$DOC_ID/reprocess"

Symptom: Frontend can’t connect to backend

Section titled “Symptom: Frontend can’t connect to backend”

Check:

Terminal window
# Is backend running?
curl http://localhost:8080/health
# CORS enabled?
# Backend should return Access-Control-Allow-Origin header
curl -I http://localhost:8080/health

Solution:

Terminal window
# Start backend first, then frontend
make backend-dev &
sleep 5
make frontend-dev

Causes:

  1. No entities extracted
  2. WebSocket connection failed
  3. Sigma.js not loading

Solution:

Terminal window
# Check entities exist via API
curl "http://localhost:8080/api/v1/graph/entities?workspace_id=$WORKSPACE_ID"
# Check browser console for errors
# Open DevTools → Console

Terminal window
# Backend logs
tail -f /tmp/edgequake-backend.log
# Frontend logs
tail -f /tmp/edgequake-frontend.log
# Docker logs
docker compose logs -f
# Specific component
docker compose logs -f edgequake
docker compose logs -f postgres
-- Check document status
SELECT status, count(*) FROM documents GROUP BY status;
-- Find failed documents
SELECT id, title, error_message FROM documents WHERE status = 'failed';
-- Check entity counts by workspace
SELECT workspace_id, count(*) FROM entities GROUP BY workspace_id;
-- Check embedding dimensions
SELECT embedding_dimension, count(*) FROM embeddings GROUP BY embedding_dimension;
Terminal window
# Verbose curl output
curl -v http://localhost:8080/api/v1/workspaces
# Pretty print JSON
curl http://localhost:8080/api/v1/workspaces | jq
# Check response headers
curl -I http://localhost:8080/health

Error CodeMeaningSolution
400Bad requestCheck request format
401UnauthorizedAdd API key
404Not foundCheck workspace_id exists
422Validation errorCheck required fields
429Rate limitedWait and retry
500Server errorCheck logs
503Service unavailableCheck DB/LLM connection

  1. Check this troubleshooting guide
  2. Check logs for specific error messages
  3. Verify environment variables are set
  4. Try with minimal configuration (in-memory, mock LLM)

Start with maximum logging:

Terminal window
RUST_LOG="edgequake=trace,sqlx=debug,tower_http=debug" cargo run

Include in your report:

  1. EdgeQuake version (cargo run --version)
  2. Storage mode (PostgreSQL or Memory)
  3. LLM provider and model
  4. Steps to reproduce
  5. Relevant log output
  6. Expected vs actual behavior