Deep Dive: Graph Storage
How EdgeQuake Stores and Queries Knowledge Graphs
Graph storage is the foundation of EdgeQuake’s knowledge management. This document explains how entities and relationships are stored, the property graph model, and available storage backends.
Overview
EdgeQuake uses a property graph model to store extracted knowledge:

```text
┌─────────────────────────────────────────────────────────────────┐
│                      PROPERTY GRAPH MODEL                       │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                     NODES (Entities)                      │  │
│  │                                                           │  │
│  │      ┌───────────────────┐     ┌───────────────────┐      │  │
│  │      │ SARAH_CHEN        │     │ MIT               │      │  │
│  │      ├───────────────────┤     ├───────────────────┤      │  │
│  │      │ type: PERSON      │     │ type: ORGANIZATION│      │  │
│  │      │ description: ...  │     │ description: ...  │      │  │
│  │      │ source_id: chunk1 │     │ source_id: chunk1 │      │  │
│  │      │ importance: 0.9   │     │ importance: 0.8   │      │  │
│  │      └─────────┬─────────┘     └─────────┬─────────┘      │  │
│  │                │                         │                │  │
│  └────────────────│─────────────────────────│────────────────┘  │
│                   │                         │                   │
│  ┌────────────────│─────────────────────────│────────────────┐  │
│  │                │  EDGES (Relationships)  │                │  │
│  │                │                         │                │  │
│  │                └────────────┬────────────┘                │  │
│  │                             │                             │  │
│  │                             ▼                             │  │
│  │  ┌─────────────────────────────────────────────────────┐  │  │
│  │  │ SARAH_CHEN ──[works_at]──▶ MIT                      │  │  │
│  │  ├─────────────────────────────────────────────────────┤  │  │
│  │  │ relation_type: works_at                             │  │  │
│  │  │ description: "Dr. Chen is a researcher at MIT"      │  │  │
│  │  │ weight: 0.9                                         │  │  │
│  │  │ keywords: ["researcher", "faculty", "AI"]           │  │  │
│  │  │ source_chunk_id: chunk1                             │  │  │
│  │  └─────────────────────────────────────────────────────┘  │  │
│  │                                                           │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

Why Property Graphs?

| Feature | Benefit |
|---|---|
| Arbitrary Properties | Each node/edge can have different attributes |
| Rich Metadata | Store descriptions, weights, timestamps, sources |
| Flexible Schema | Adapt to different domains without migration |
| Graph Traversal | Efficient neighbor and path queries |
| Compatibility | Works with Apache AGE, Neo4j, SurrealDB |
Core Data Structures
GraphNode
Represents an entity in the knowledge graph:

```rust
use std::collections::HashMap;

use serde::{Deserialize, Serialize};

/// A node in the knowledge graph.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct GraphNode {
    /// Node identifier (typically the normalized entity name)
    pub id: String,

    /// Node properties (arbitrary key-value pairs)
    pub properties: HashMap<String, serde_json::Value>,
}
```

Standard Properties:
| Property | Type | Description |
|---|---|---|
| `entity_type` | String | PERSON, ORGANIZATION, CONCEPT, etc. |
| `description` | String | LLM-generated description |
| `source_chunk_id` | String | Origin chunk for lineage |
| `source_document_id` | String | Origin document |
| `importance` | f32 | Relevance score (0.0-1.0) |
| `created_at` | String | ISO timestamp |
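
The usage examples later in this document call `GraphNode::new` and `set_property`, which the struct listing above does not show. The following is a minimal sketch of how such helpers could look. It uses a small stand-in `Value` enum instead of `serde_json::Value` so that it compiles with the standard library alone. The method bodies, the `get_str` accessor, and `sample_node` are assumptions for illustration, not EdgeQuake's actual implementation.

```rust
use std::collections::HashMap;

/// Stand-in for `serde_json::Value` (assumption: only the variants needed here).
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Str(String),
    Num(f64),
}

/// Simplified node mirroring the `id` + `properties` layout described above.
#[derive(Debug, Clone)]
pub struct GraphNode {
    pub id: String,
    pub properties: HashMap<String, Value>,
}

impl GraphNode {
    /// Create a node with no properties.
    pub fn new(id: impl Into<String>) -> Self {
        Self { id: id.into(), properties: HashMap::new() }
    }

    /// Insert or overwrite a single property.
    pub fn set_property(&mut self, key: &str, value: Value) {
        self.properties.insert(key.to_string(), value);
    }

    /// Hypothetical accessor: read a string property, if present.
    pub fn get_str(&self, key: &str) -> Option<&str> {
        match self.properties.get(key) {
            Some(Value::Str(s)) => Some(s.as_str()),
            _ => None,
        }
    }
}

/// Build the SARAH_CHEN node from the overview with a few standard properties.
pub fn sample_node() -> GraphNode {
    let mut node = GraphNode::new("SARAH_CHEN");
    node.set_property("entity_type", Value::Str("PERSON".into()));
    node.set_property("source_chunk_id", Value::Str("chunk1".into()));
    node.set_property("importance", Value::Num(0.9));
    node
}
```

Because properties live in a map rather than in fixed struct fields, a node from another domain can carry different keys without any schema change.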
GraphEdge
Represents a relationship between entities:

```rust
/// An edge in the knowledge graph.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct GraphEdge {
    /// Source node identifier
    pub source: String,

    /// Target node identifier
    pub target: String,

    /// Edge properties
    pub properties: HashMap<String, serde_json::Value>,
}
```

Standard Properties:
| Property | Type | Description |
|---|---|---|
| `relation_type` | String | works_at, developed, uses, etc. |
| `description` | String | LLM-generated relationship description |
| `weight` | f32 | Relationship strength (0.0-1.0) |
| `keywords` | `Vec<String>` | Up to 5 keywords (BR0004) |
| `source_chunk_id` | String | Origin chunk |
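
The table above implies two constraints: `weight` stays within 0.0-1.0, and `keywords` holds at most five entries (BR0004). How EdgeQuake enforces them is not shown here. The sketch below is one way a caller could normalize values before an upsert; `clamp_weight`, `cap_keywords`, and `MAX_KEYWORDS` are hypothetical names, and the de-duplication step is an added assumption.

```rust
/// Upper bound on keywords per edge, taken from the table above (BR0004).
pub const MAX_KEYWORDS: usize = 5;

/// Clamp a relationship weight into 0.0..=1.0; NaN falls back to 0.0.
pub fn clamp_weight(weight: f32) -> f32 {
    if weight.is_nan() {
        0.0
    } else {
        weight.clamp(0.0, 1.0)
    }
}

/// Trim keywords, drop empty ones, de-duplicate (keeping the first
/// occurrence), and stop once MAX_KEYWORDS have been collected.
pub fn cap_keywords(keywords: &[&str]) -> Vec<String> {
    let mut out: Vec<String> = Vec::new();
    for kw in keywords {
        let kw = kw.trim();
        if kw.is_empty() || out.iter().any(|k| k.as_str() == kw) {
            continue;
        }
        out.push(kw.to_string());
        if out.len() == MAX_KEYWORDS {
            break;
        }
    }
    out
}
```

Keeping the first occurrences preserves whatever ordering the extraction step produced, which matters if earlier keywords are considered more relevant.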
KnowledgeGraph
A subgraph returned by a query:

```rust
/// A subgraph extracted from the knowledge graph.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct KnowledgeGraph {
    /// Nodes in the subgraph
    pub nodes: Vec<GraphNode>,

    /// Edges in the subgraph
    pub edges: Vec<GraphEdge>,

    /// Whether result was truncated
    pub is_truncated: bool,
}
```

The GraphStorage Trait
All graph backends implement this trait:
```rust
use async_trait::async_trait;

#[async_trait]
pub trait GraphStorage: Send + Sync {
    /// Get the storage namespace (for multi-tenancy).
    fn namespace(&self) -> &str;

    /// Initialize the storage.
    async fn initialize(&self) -> Result<()>;

    /// Flush pending changes.
    async fn finalize(&self) -> Result<()>;

    // ========== Node Operations ==========

    /// Check if a node exists.
    async fn has_node(&self, node_id: &str) -> Result<bool>;

    /// Get a node by ID.
    async fn get_node(&self, node_id: &str) -> Result<Option<GraphNode>>;

    /// Insert or update a node.
    async fn upsert_node(&self, node: &GraphNode) -> Result<()>;

    /// Delete a node.
    async fn delete_node(&self, node_id: &str) -> Result<()>;

    /// Get all nodes (with optional limit).
    async fn get_all_nodes(&self, limit: Option<usize>) -> Result<Vec<GraphNode>>;

    // ========== Edge Operations ==========

    /// Check if an edge exists.
    async fn has_edge(&self, source: &str, target: &str) -> Result<bool>;

    /// Get edges from a node.
    async fn get_node_edges(&self, node_id: &str) -> Result<Vec<GraphEdge>>;

    /// Insert or update an edge.
    async fn upsert_edge(&self, edge: &GraphEdge) -> Result<()>;

    /// Delete an edge.
    async fn delete_edge(&self, source: &str, target: &str) -> Result<()>;

    // ========== Traversal Operations ==========

    /// Get neighbors of a node.
    async fn get_neighbors(&self, node_id: &str, depth: usize) -> Result<Vec<GraphNode>>;

    /// Find path between two nodes.
    async fn find_path(&self, from: &str, to: &str) -> Result<Option<Vec<GraphNode>>>;

    // ========== Analytics ==========

    /// Get total node count.
    async fn node_count(&self) -> Result<usize>;

    /// Get total edge count.
    async fn edge_count(&self) -> Result<usize>;

    /// Get degree of a node (number of edges).
    async fn node_degree(&self, node_id: &str) -> Result<usize>;

    // ========== Bulk Operations ==========

    /// Clear all data.
    async fn clear(&self) -> Result<()>;

    /// Get full graph.
    async fn get_graph(&self, limit: Option<usize>) -> Result<KnowledgeGraph>;
}
```

Storage Backends
MemoryGraphStorage
In-memory implementation for development and testing:

```rust
/// In-memory graph storage using DashMap.
pub struct MemoryGraphStorage {
    namespace: String,
    nodes: DashMap<String, GraphNode>,
    edges: DashMap<(String, String), GraphEdge>,
}
```

Characteristics:
| Attribute | Value |
|---|---|
| Persistence | ❌ None (data lost on restart) |
| Speed | ⚡ Very fast (O(1) lookups) |
| Scalability | Limited by memory |
| Use Case | Development, testing, small datasets |
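
To make the storage model concrete, here is a minimal, synchronous sketch of an in-memory store keyed the same way as the struct above: nodes by ID, edges by `(source, target)`. It is not EdgeQuake's code. `TinyGraph` is a hypothetical name, it uses the standard library's `HashMap` rather than `DashMap`, and it assumes that traversal and degree treat edges as undirected, which this document does not specify.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Simplified stand-in for an in-memory graph store.
/// Assumption: a node stores only a description and an edge only a weight.
#[derive(Default)]
pub struct TinyGraph {
    nodes: HashMap<String, String>,
    edges: HashMap<(String, String), f32>,
}

impl TinyGraph {
    /// Insert or overwrite a node (upsert semantics).
    pub fn upsert_node(&mut self, id: &str, description: &str) {
        self.nodes.insert(id.to_string(), description.to_string());
    }

    /// Check whether a node exists.
    pub fn has_node(&self, id: &str) -> bool {
        self.nodes.contains_key(id)
    }

    /// Insert or overwrite the edge keyed by (source, target).
    pub fn upsert_edge(&mut self, source: &str, target: &str, weight: f32) {
        self.edges.insert((source.to_string(), target.to_string()), weight);
    }

    /// Number of edges touching the node, in either direction.
    pub fn node_degree(&self, id: &str) -> usize {
        self.edges
            .keys()
            .filter(|(s, t)| s.as_str() == id || t.as_str() == id)
            .count()
    }

    /// Breadth-first search: sorted IDs of all nodes within `depth` hops,
    /// excluding the start node itself.
    pub fn get_neighbors(&self, start: &str, depth: usize) -> Vec<String> {
        let mut seen: HashSet<String> = HashSet::from([start.to_string()]);
        let mut queue = VecDeque::from([(start.to_string(), 0usize)]);
        while let Some((current, dist)) = queue.pop_front() {
            if dist == depth {
                continue; // do not expand past the requested depth
            }
            for (s, t) in self.edges.keys() {
                // Follow the edge from whichever endpoint matches (undirected).
                let next = if *s == current {
                    t
                } else if *t == current {
                    s
                } else {
                    continue;
                };
                if seen.insert(next.clone()) {
                    queue.push_back((next.clone(), dist + 1));
                }
            }
        }
        seen.remove(start);
        let mut out: Vec<String> = seen.into_iter().collect();
        out.sort();
        out
    }
}
```

Scanning every edge per visited node keeps the sketch short but costs O(E) per step; a real backend would more likely maintain adjacency lists or rely on database indexes.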
Usage:
```rust
let storage = MemoryGraphStorage::new("my_workspace");
storage.initialize().await?;
```

PostgresAGEStorage
Production-grade storage using PostgreSQL with the Apache AGE extension:

```rust
/// PostgreSQL Apache AGE graph storage.
pub struct PostgresAGEStorage {
    pool: PgPool,
    namespace: String,
    graph_name: String,
}
```

Characteristics:
| Attribute | Value |
|---|---|
| Persistence | ✅ Full durability |
| Speed | Good (optimized queries) |
| Scalability | Millions of nodes/edges |
| Use Case | Production deployments |
Features:
- Native graph queries via Cypher
- Automatic index creation
- Transaction support
- Connection pooling
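
In Apache AGE, Cypher is issued through the SQL `cypher()` function, with each returned column typed as `agtype`; the SQL examples in this document follow that shape. The sketch below shows how a backend might assemble such a statement for a given graph name. `wrap_cypher` is a hypothetical helper, not part of EdgeQuake or AGE, and it assumes the graph name is a trusted identifier and the Cypher body contains no `$$` sequence.

```rust
/// Wrap a Cypher body in AGE's SQL `cypher()` call.
/// `columns` names the returned values; each one is typed as `agtype`.
/// Assumption: `graph` is a trusted identifier and `body` contains no `$$`.
pub fn wrap_cypher(graph: &str, body: &str, columns: &[&str]) -> String {
    let cols: Vec<String> = columns.iter().map(|c| format!("{c} agtype")).collect();
    format!(
        "SELECT * FROM cypher('{graph}', $$ {body} $$) AS ({});",
        cols.join(", ")
    )
}
```

String assembly like this is only safe for trusted input; values that originate from documents or users should be passed as query parameters or escaped before they reach the Cypher body.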
Schema:
```sql
-- Assumes the AGE extension is loaded in this session and that
-- ag_catalog is on the search_path.

-- Apache AGE graph structure
SELECT * FROM cypher('edgequake', $$
    CREATE (n:Entity {
        id: 'SARAH_CHEN',
        entity_type: 'PERSON',
        description: 'Researcher at MIT'
    })
    RETURN n
$$) AS (n agtype);

-- Create relationship
SELECT * FROM cypher('edgequake', $$
    MATCH (a:Entity {id: 'SARAH_CHEN'})
    MATCH (b:Entity {id: 'MIT'})
    CREATE (a)-[r:WORKS_AT {
        relation_type: 'works_at',
        weight: 0.9
    }]->(b)
    RETURN r
$$) AS (r agtype);
```

Usage:

```rust
// PgPoolOptions is the sqlx connection-pool builder.
let pool = PgPoolOptions::new()
    .max_connections(10)
    .connect(&database_url)
    .await?;

let storage = PostgresAGEStorage::new(pool, "my_workspace").await?;
storage.initialize().await?;
```

Storage Operations
Node Operations

```rust
use serde_json::json;

// Create or update a node
let mut node = GraphNode::new("SARAH_CHEN");
node.set_property("entity_type", json!("PERSON"));
node.set_property("description", json!("Researcher at MIT"));
node.set_property("importance", json!(0.9));

storage.upsert_node(&node).await?;

// Get a node
if let Some(node) = storage.get_node("SARAH_CHEN").await? {
    println!("Found: {}", node.id);
}

// Delete a node
storage.delete_node("SARAH_CHEN").await?;
```

Edge Operations
```rust
use serde_json::json;

// Create or update an edge
let mut edge = GraphEdge::new("SARAH_CHEN", "MIT");
edge.set_property("relation_type", json!("works_at"));
edge.set_property("description", json!("Research position"));
edge.set_property("weight", json!(0.9));
edge.set_property("keywords", json!(["researcher", "faculty"]));

storage.upsert_edge(&edge).await?;

// Get edges from a node
let edges = storage.get_node_edges("SARAH_CHEN").await?;
for edge in edges {
    println!("{} -> {}", edge.source, edge.target);
}

// Delete an edge
storage.delete_edge("SARAH_CHEN", "MIT").await?;
```

Traversal Operations
```rust
// Get 1-hop neighbors
let neighbors = storage.get_neighbors("SARAH_CHEN", 1).await?;

// Get 2-hop neighbors
let extended = storage.get_neighbors("SARAH_CHEN", 2).await?;

// Find path between entities
if let Some(path) = storage.find_path("SARAH_CHEN", "GOOGLE").await? {
    println!("Path: {:?}", path.iter().map(|n| &n.id).collect::<Vec<_>>());
}
```

Analytics
```rust
// Get counts
let node_count = storage.node_count().await?;
let edge_count = storage.edge_count().await?;

// Get node degree
let degree = storage.node_degree("SARAH_CHEN").await?;
println!("SARAH_CHEN has {} connections", degree);
```

Multi-Tenancy
Graph storage supports namespace-based tenant isolation:

```text
┌─────────────────────────────────────────────────────────────────┐
│                   MULTI-TENANT GRAPH STORAGE                    │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                    PostgreSQL Database                    │  │
│  │                                                           │  │
│  │  ┌───────────────┐  ┌───────────────┐  ┌───────────────┐  │  │
│  │  │   tenant_a    │  │   tenant_b    │  │   tenant_c    │  │  │
│  │  │    (Graph)    │  │    (Graph)    │  │    (Graph)    │  │  │
│  │  │               │  │               │  │               │  │  │
│  │  │ • 1000 nodes  │  │ • 500 nodes   │  │ • 2000 nodes  │  │  │
│  │  │ • 3000 edges  │  │ • 1500 edges  │  │ • 6000 edges  │  │  │
│  │  └───────────────┘  └───────────────┘  └───────────────┘  │  │
│  │                                                           │  │
│  │  Each namespace = separate AGE graph                      │  │
│  │  Complete isolation, independent schema                   │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

Implementation:
```rust
// Each workspace gets its own graph
let tenant_a = PostgresAGEStorage::new(pool.clone(), "tenant_a").await?;
let tenant_b = PostgresAGEStorage::new(pool.clone(), "tenant_b").await?;

// Data is completely isolated
tenant_a.upsert_node(&node).await?;
assert!(tenant_b.get_node(&node.id).await?.is_none());
```

Performance Considerations
Indexing
PostgreSQL AGE automatically creates indexes on:
- Node ID (primary key)
- Entity type (for type filtering)
- Edge source/target (for traversal)
Query Optimization
```sql
-- Efficient: Index-based lookup
SELECT * FROM cypher('graph', $$
    MATCH (n:Entity {id: 'SARAH_CHEN'})
    RETURN n
$$) AS (n agtype);

-- Efficient: Limited traversal
SELECT * FROM cypher('graph', $$
    MATCH (n:Entity {id: 'SARAH_CHEN'})-[r]->(m)
    RETURN n, r, m
    LIMIT 100
$$) AS (n agtype, r agtype, m agtype);

-- Less efficient: Full scan
SELECT * FROM cypher('graph', $$
    MATCH (n:Entity)
    WHERE n.importance > 0.8
    RETURN n
$$) AS (n agtype);
```

Connection Pooling
```rust
use std::time::Duration;

let pool = PgPoolOptions::new()
    .max_connections(20) // Upper bound on concurrent connections
    .min_connections(5) // Connections kept open while idle
    .acquire_timeout(Duration::from_secs(30)) // Max wait for a free connection
    .idle_timeout(Duration::from_secs(600)) // Close connections idle this long
    .connect(&database_url)
    .await?;
```

Best Practices
- Normalize Entity IDs - Use UPPERCASE_UNDERSCORE format (BR0008)
- Limit Properties - Don’t store large text in properties
- Use Embeddings Separately - Store embeddings in vector storage, not graph
- Batch Operations - Use bulk insert for large imports
- Monitor Size - Track node/edge counts for capacity planning
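
Two of the practices above, ID normalization and batching, can be sketched in a few lines. The exact rules behind BR0008 are not spelled out in this document, so `normalize_entity_id` assumes that runs of non-alphanumeric characters collapse into a single underscore. Both function names are hypothetical.

```rust
/// Normalize a raw entity name to UPPERCASE_UNDERSCORE form.
/// Assumption: runs of non-alphanumeric characters collapse into one underscore.
pub fn normalize_entity_id(raw: &str) -> String {
    let mut out = String::with_capacity(raw.len());
    for ch in raw.trim().chars() {
        if ch.is_alphanumeric() {
            out.extend(ch.to_uppercase());
        } else if !out.is_empty() && !out.ends_with('_') {
            out.push('_');
        }
    }
    // Drop a trailing separator left behind by trailing punctuation.
    while out.ends_with('_') {
        out.pop();
    }
    out
}

/// Split a large import into batches of at most `batch_size` items.
/// A batch size of 0 is treated as 1 to avoid a panic in `chunks`.
pub fn batches<T>(items: &[T], batch_size: usize) -> Vec<&[T]> {
    items.chunks(batch_size.max(1)).collect()
}
```

Normalizing before every upsert means that "Sarah Chen" and "sarah chen" resolve to the same node ID instead of creating duplicates.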
Benchmarks
Performance on typical workloads (PostgreSQL AGE, 10K nodes, 30K edges):
| Operation | Latency |
|---|---|
| `get_node` | ~1ms |
| `upsert_node` | ~2ms |
| `get_node_edges` | ~3ms |
| `get_neighbors(1)` | ~5ms |
| `get_neighbors(2)` | ~15ms |
| `node_count` | ~50ms |
| `get_graph(100)` | ~10ms |
See Also
- Entity Extraction - How entities are created
- Query Modes - How the graph is queried
- Architecture: Crates - Storage crate details
- Performance Tuning - Optimization guide