
Deep Dive: Graph Storage

How EdgeQuake Stores and Queries Knowledge Graphs

Graph storage is the foundation of EdgeQuake’s knowledge management. This document explains the property graph model, how entities and relationships are stored, and which storage backends are available.


EdgeQuake uses a property graph model to store extracted knowledge:

┌─────────────────────────────────────────────────────────────────┐
│ PROPERTY GRAPH MODEL │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────┐│
│ │ NODES (Entities) ││
│ │ ││
│ │ ┌───────────────────┐ ┌───────────────────┐ ││
│ │ │ SARAH_CHEN │ │ MIT │ ││
│ │ ├───────────────────┤ ├───────────────────┤ ││
│ │ │ type: PERSON │ │ type: ORGANIZATION│ ││
│ │ │ description: ... │ │ description: ... │ ││
│ │ │ source_id: chunk1 │ │ source_id: chunk1 │ ││
│ │ │ importance: 0.9 │ │ importance: 0.8 │ ││
│ │ └─────────┬─────────┘ └─────────┬─────────┘ ││
│ │ │ │ ││
│ └────────────│─────────────────────────│──────────────────────┘│
│ │ │ │
│ ┌────────────│─────────────────────────│──────────────────────┐│
│ │ │ EDGES (Relationships)│ ││
│ │ │ │ ││
│ │ └─────────────────────────┘ ││
│ │ │ ││
│ │ ▼ ││
│ │ ┌─────────────────────────────────────────────────────────┐││
│ │ │ SARAH_CHEN ──[works_at]──▶ MIT │││
│ │ ├─────────────────────────────────────────────────────────┤││
│ │ │ relation_type: works_at │││
│ │ │ description: "Dr. Chen is a researcher at MIT" │││
│ │ │ weight: 0.9 │││
│ │ │ keywords: ["researcher", "faculty", "AI"] │││
│ │ │ source_chunk_id: chunk1 │││
│ │ └─────────────────────────────────────────────────────────┘││
│ │ ││
│ └─────────────────────────────────────────────────────────────┘│
│ │
└─────────────────────────────────────────────────────────────────┘

| Feature | Benefit |
| --- | --- |
| Arbitrary Properties | Each node/edge can have different attributes |
| Rich Metadata | Store descriptions, weights, timestamps, sources |
| Flexible Schema | Adapt to different domains without migration |
| Graph Traversal | Efficient neighbor and path queries |
| Compatibility | Works with Apache AGE, Neo4j, SurrealDB |

GraphNode represents an entity in the knowledge graph:

/// A node in the knowledge graph.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct GraphNode {
    /// Node identifier (typically the normalized entity name)
    pub id: String,
    /// Node properties (arbitrary key-value pairs)
    pub properties: HashMap<String, serde_json::Value>,
}

Standard Properties:

| Property | Type | Description |
| --- | --- | --- |
| entity_type | String | PERSON, ORGANIZATION, CONCEPT, etc. |
| description | String | LLM-generated description |
| source_chunk_id | String | Origin chunk for lineage |
| source_document_id | String | Origin document |
| importance | f32 | Relevance score (0.0-1.0) |
| created_at | String | ISO timestamp |
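
Later examples in this document call `GraphNode::new` and `set_property`, which are not defined in the struct listing above. The following is a minimal sketch of how such helpers might look, not EdgeQuake's actual code. To stay dependency-free it swaps `serde_json::Value` for a small stand-in enum, `PropValue`; that enum, the `entity_type` accessor, and `sample_node` are assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Stand-in for `serde_json::Value` (assumption: keeps the sketch std-only).
#[derive(Debug, Clone, PartialEq)]
pub enum PropValue {
    Text(String),
    Number(f64),
}

/// Same shape as the documented `GraphNode`, with `PropValue` as the value type.
#[derive(Debug, Clone)]
pub struct GraphNode {
    pub id: String,
    pub properties: HashMap<String, PropValue>,
}

impl GraphNode {
    /// Create a node with no properties.
    pub fn new(id: impl Into<String>) -> Self {
        Self { id: id.into(), properties: HashMap::new() }
    }

    /// Insert or overwrite a single property.
    pub fn set_property(&mut self, key: impl Into<String>, value: PropValue) {
        self.properties.insert(key.into(), value);
    }

    /// Read `entity_type` if it is present and textual (hypothetical accessor).
    pub fn entity_type(&self) -> Option<&str> {
        match self.properties.get("entity_type") {
            Some(PropValue::Text(s)) => Some(s.as_str()),
            _ => None,
        }
    }
}

/// Build the SARAH_CHEN node from the diagram with two standard properties.
pub fn sample_node() -> GraphNode {
    let mut node = GraphNode::new("SARAH_CHEN");
    node.set_property("entity_type", PropValue::Text("PERSON".to_string()));
    node.set_property("importance", PropValue::Number(0.9));
    node
}
```

Because properties live in a map rather than in fixed fields, a node from another domain can carry extra keys without any change to the struct.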

GraphEdge represents a relationship between entities:

/// An edge in the knowledge graph.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct GraphEdge {
    /// Source node identifier
    pub source: String,
    /// Target node identifier
    pub target: String,
    /// Edge properties
    pub properties: HashMap<String, serde_json::Value>,
}

Standard Properties:

| Property | Type | Description |
| --- | --- | --- |
| relation_type | String | works_at, developed, uses, etc. |
| description | String | LLM-generated relationship description |
| weight | f32 | Relationship strength (0.0-1.0) |
| keywords | Vec<String> | Up to 5 keywords (BR0004) |
| source_chunk_id | String | Origin chunk |
KnowledgeGraph is the subgraph result returned by queries:

/// A subgraph extracted from the knowledge graph.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct KnowledgeGraph {
    /// Nodes in the subgraph
    pub nodes: Vec<GraphNode>,
    /// Edges in the subgraph
    pub edges: Vec<GraphEdge>,
    /// Whether result was truncated
    pub is_truncated: bool,
}
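
The document does not spell out when `is_truncated` is set. A natural reading is that it becomes true when a `limit` cut off part of the result. The helper below sketches that interpretation; `apply_limit` is a hypothetical name, not part of the EdgeQuake API.

```rust
/// Apply an optional limit and report whether anything was cut off.
/// Returns the kept items plus a flag that could feed `is_truncated`.
pub fn apply_limit<T>(mut items: Vec<T>, limit: Option<usize>) -> (Vec<T>, bool) {
    match limit {
        // More items than allowed: keep the first `max` and flag truncation.
        Some(max) if items.len() > max => {
            items.truncate(max);
            (items, true)
        }
        // No limit, or the result already fits.
        _ => (items, false),
    }
}
```

A caller can use the flag to decide whether to re-query with a larger limit or to tell the user that the view is partial.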

All graph backends implement the GraphStorage trait:

#[async_trait]
pub trait GraphStorage: Send + Sync {
    /// Get the storage namespace (for multi-tenancy).
    fn namespace(&self) -> &str;
    /// Initialize the storage.
    async fn initialize(&self) -> Result<()>;
    /// Flush pending changes.
    async fn finalize(&self) -> Result<()>;

    // ========== Node Operations ==========
    /// Check if a node exists.
    async fn has_node(&self, node_id: &str) -> Result<bool>;
    /// Get a node by ID.
    async fn get_node(&self, node_id: &str) -> Result<Option<GraphNode>>;
    /// Insert or update a node.
    async fn upsert_node(&self, node: &GraphNode) -> Result<()>;
    /// Delete a node.
    async fn delete_node(&self, node_id: &str) -> Result<()>;
    /// Get all nodes (with optional limit).
    async fn get_all_nodes(&self, limit: Option<usize>) -> Result<Vec<GraphNode>>;

    // ========== Edge Operations ==========
    /// Check if an edge exists.
    async fn has_edge(&self, source: &str, target: &str) -> Result<bool>;
    /// Get edges from a node.
    async fn get_node_edges(&self, node_id: &str) -> Result<Vec<GraphEdge>>;
    /// Insert or update an edge.
    async fn upsert_edge(&self, edge: &GraphEdge) -> Result<()>;
    /// Delete an edge.
    async fn delete_edge(&self, source: &str, target: &str) -> Result<()>;

    // ========== Traversal Operations ==========
    /// Get neighbors of a node.
    async fn get_neighbors(&self, node_id: &str, depth: usize) -> Result<Vec<GraphNode>>;
    /// Find path between two nodes.
    async fn find_path(&self, from: &str, to: &str) -> Result<Option<Vec<GraphNode>>>;

    // ========== Analytics ==========
    /// Get total node count.
    async fn node_count(&self) -> Result<usize>;
    /// Get total edge count.
    async fn edge_count(&self) -> Result<usize>;
    /// Get degree of a node (number of edges).
    async fn node_degree(&self, node_id: &str) -> Result<usize>;

    // ========== Bulk Operations ==========
    /// Clear all data.
    async fn clear(&self) -> Result<()>;
    /// Get full graph.
    async fn get_graph(&self, limit: Option<usize>) -> Result<KnowledgeGraph>;
}
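
To make the `depth` parameter of `get_neighbors` concrete, here is a synchronous, std-only sketch of a depth-limited breadth-first search over an adjacency map. It is an illustration of the idea, not the trait's actual implementation. The `Adjacency` alias, `neighbors_within`, and the `CAMBRIDGE` fixture node are hypothetical, and the sketch assumes edges are followed only in their stored direction.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Adjacency list keyed by node ID (assumption: outgoing edges only).
pub type Adjacency = HashMap<String, Vec<String>>;

/// Return the IDs of all nodes reachable from `start` within `depth` hops,
/// excluding `start` itself, in breadth-first order.
pub fn neighbors_within(adj: &Adjacency, start: &str, depth: usize) -> Vec<String> {
    let mut seen: HashSet<String> = HashSet::new();
    seen.insert(start.to_string());
    let mut queue: VecDeque<(String, usize)> = VecDeque::new();
    queue.push_back((start.to_string(), 0));
    let mut found = Vec::new();

    while let Some((current, hops)) = queue.pop_front() {
        if hops == depth {
            continue; // do not expand past the requested depth
        }
        for next in adj.get(&current).into_iter().flatten() {
            // `insert` returns false for already-visited nodes.
            if seen.insert(next.clone()) {
                found.push(next.clone());
                queue.push_back((next.clone(), hops + 1));
            }
        }
    }
    found
}

/// Small fixture: SARAH_CHEN -> MIT -> CAMBRIDGE (illustrative data).
pub fn sample_adjacency() -> Adjacency {
    let mut adj = Adjacency::new();
    adj.insert("SARAH_CHEN".to_string(), vec!["MIT".to_string()]);
    adj.insert("MIT".to_string(), vec!["CAMBRIDGE".to_string()]);
    adj
}
```

With the fixture, depth 1 yields only `MIT`, while depth 2 also reaches `CAMBRIDGE`.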

MemoryGraphStorage is an in-memory implementation for development and testing:

/// In-memory graph storage using DashMap.
pub struct MemoryGraphStorage {
    namespace: String,
    nodes: DashMap<String, GraphNode>,
    edges: DashMap<(String, String), GraphEdge>,
}

Characteristics:

| Attribute | Value |
| --- | --- |
| Persistence | ❌ None (data lost on restart) |
| Speed | ⚡ Very fast (O(1) lookups) |
| Scalability | Limited by memory |
| Use Case | Development, testing, small datasets |

Usage:

let storage = MemoryGraphStorage::new("my_workspace");
storage.initialize().await?;
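
The struct above keys edges by a `(source, target)` tuple, which is what makes edge lookups O(1). The sketch below reproduces that layout in a simplified, single-threaded form. It is an assumption-laden illustration: `TinyGraph` is a hypothetical type, std `HashMap` stands in for `DashMap`, node payloads are reduced to an entity type string, and edge payloads to a weight.

```rust
use std::collections::HashMap;

/// Simplified stand-in for the in-memory store (not the real implementation).
#[derive(Default)]
pub struct TinyGraph {
    nodes: HashMap<String, String>,        // id -> entity_type
    edges: HashMap<(String, String), f32>, // (source, target) -> weight
}

impl TinyGraph {
    /// Insert or overwrite a node.
    pub fn upsert_node(&mut self, id: &str, entity_type: &str) {
        self.nodes.insert(id.to_string(), entity_type.to_string());
    }

    /// Insert or overwrite the edge for this (source, target) pair.
    pub fn upsert_edge(&mut self, source: &str, target: &str, weight: f32) {
        self.edges.insert((source.to_string(), target.to_string()), weight);
    }

    /// Hash lookup on the tuple key; direction matters.
    pub fn has_edge(&self, source: &str, target: &str) -> bool {
        self.edges.contains_key(&(source.to_string(), target.to_string()))
    }

    /// Count edges touching the node in either direction (scans all edges).
    pub fn node_degree(&self, id: &str) -> usize {
        self.edges
            .keys()
            .filter(|(s, t)| s.as_str() == id || t.as_str() == id)
            .count()
    }

    pub fn node_count(&self) -> usize {
        self.nodes.len()
    }

    pub fn edge_count(&self) -> usize {
        self.edges.len()
    }
}
```

One consequence of the tuple key is that a second upsert for the same pair replaces the first edge instead of adding a parallel one.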

PostgresAGEStorage is production-grade storage built on PostgreSQL with the Apache AGE extension:

/// PostgreSQL Apache AGE graph storage.
pub struct PostgresAGEStorage {
    pool: PgPool,
    namespace: String,
    graph_name: String,
}

Characteristics:

| Attribute | Value |
| --- | --- |
| Persistence | ✅ Full durability |
| Speed | Good (optimized queries) |
| Scalability | Millions of nodes/edges |
| Use Case | Production deployments |

Features:

  • Native graph queries via Cypher
  • Automatic index creation
  • Transaction support
  • Connection pooling

Schema:

-- Apache AGE graph structure
SELECT * FROM cypher('edgequake', $$
    CREATE (n:Entity {
        id: 'SARAH_CHEN',
        entity_type: 'PERSON',
        description: 'Researcher at MIT'
    })
    RETURN n
$$) AS (n agtype);

-- Create relationship
SELECT * FROM cypher('edgequake', $$
    MATCH (a:Entity {id: 'SARAH_CHEN'})
    MATCH (b:Entity {id: 'MIT'})
    CREATE (a)-[r:WORKS_AT {
        relation_type: 'works_at',
        weight: 0.9
    }]->(b)
    RETURN r
$$) AS (r agtype);
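
Each statement above wraps a Cypher body in the same `SELECT * FROM cypher(...) AS (... agtype)` envelope. The sketch below shows how a backend might assemble that envelope as a string. It is a hypothetical illustration: the helper names are invented, the document does not show how EdgeQuake builds its queries, and the escaping covers only backslashes and single quotes (it does not guard against `$$` inside a value). Where the driver supports it, passing values as query parameters is generally the safer choice.

```rust
/// Escape a value for use inside a single-quoted Cypher string literal.
/// Assumption: backslash escaping, as in openCypher string literals.
pub fn escape_cypher_string(value: &str) -> String {
    value.replace('\\', "\\\\").replace('\'', "\\'")
}

/// Wrap a Cypher body in the `cypher(...)` envelope, declaring one
/// `agtype` output column per entry in `columns`.
pub fn age_query(graph: &str, cypher_body: &str, columns: &[&str]) -> String {
    let cols: Vec<String> = columns.iter().map(|c| format!("{} agtype", c)).collect();
    format!(
        "SELECT * FROM cypher('{}', $$ {} $$) AS ({});",
        graph,
        cypher_body,
        cols.join(", ")
    )
}

/// Build a lookup of an `Entity` node by its `id` property.
pub fn match_node_by_id(graph: &str, id: &str) -> String {
    let body = format!(
        "MATCH (n:Entity {{id: '{}'}}) RETURN n",
        escape_cypher_string(id)
    );
    age_query(graph, &body, &["n"])
}
```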

Usage:

let pool = PgPoolOptions::new()
    .max_connections(10)
    .connect(&database_url)
    .await?;

let storage = PostgresAGEStorage::new(pool, "my_workspace").await?;
storage.initialize().await?;

Node operations:

// Create or update a node
let mut node = GraphNode::new("SARAH_CHEN");
node.set_property("entity_type", json!("PERSON"));
node.set_property("description", json!("Researcher at MIT"));
node.set_property("importance", json!(0.9));
storage.upsert_node(&node).await?;

// Get a node
if let Some(node) = storage.get_node("SARAH_CHEN").await? {
    println!("Found: {}", node.id);
}

// Delete a node
storage.delete_node("SARAH_CHEN").await?;

Edge operations:

// Create or update an edge
let mut edge = GraphEdge::new("SARAH_CHEN", "MIT");
edge.set_property("relation_type", json!("works_at"));
edge.set_property("description", json!("Research position"));
edge.set_property("weight", json!(0.9));
edge.set_property("keywords", json!(["researcher", "faculty"]));
storage.upsert_edge(&edge).await?;

// Get edges from a node
let edges = storage.get_node_edges("SARAH_CHEN").await?;
for edge in edges {
    println!("{} -> {}", edge.source, edge.target);
}

// Delete an edge
storage.delete_edge("SARAH_CHEN", "MIT").await?;

Traversal operations:

// Get 1-hop neighbors
let neighbors = storage.get_neighbors("SARAH_CHEN", 1).await?;

// Get 2-hop neighbors
let extended = storage.get_neighbors("SARAH_CHEN", 2).await?;

// Find path between entities
if let Some(path) = storage.find_path("SARAH_CHEN", "GOOGLE").await? {
    println!("Path: {:?}", path.iter().map(|n| &n.id).collect::<Vec<_>>());
}

Analytics:

// Get counts
let node_count = storage.node_count().await?;
let edge_count = storage.edge_count().await?;

// Get node degree
let degree = storage.node_degree("SARAH_CHEN").await?;
println!("SARAH_CHEN has {} connections", degree);
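
`find_path` returns `None` when no route exists and an ordered list of nodes otherwise. The document does not say which algorithm the backends use. A breadth-first search that records each node's predecessor is one common way to get a path with the fewest hops, sketched here over a plain adjacency map. `shortest_path`, `sample_graph`, and the `STANFORD` node are hypothetical, and the sketch follows edges only in their stored direction.

```rust
use std::collections::{HashMap, VecDeque};

/// Breadth-first search returning a fewest-hops path of node IDs, or `None`
/// if `to` is unreachable. A `from == to` query returns a single-node path
/// without checking that the node exists.
pub fn shortest_path(
    adj: &HashMap<String, Vec<String>>,
    from: &str,
    to: &str,
) -> Option<Vec<String>> {
    if from == to {
        return Some(vec![from.to_string()]);
    }
    // parent[child] = node we arrived from; doubles as the visited set.
    let mut parent: HashMap<String, String> = HashMap::new();
    let mut queue: VecDeque<String> = VecDeque::new();
    queue.push_back(from.to_string());

    while let Some(current) = queue.pop_front() {
        for next in adj.get(&current).into_iter().flatten() {
            if next.as_str() == from || parent.contains_key(next) {
                continue; // already visited
            }
            parent.insert(next.clone(), current.clone());
            if next.as_str() == to {
                // Walk predecessors back to `from`, then reverse.
                let mut path = vec![next.clone()];
                let mut cursor = next.as_str();
                while let Some(prev) = parent.get(cursor) {
                    path.push(prev.clone());
                    cursor = prev.as_str();
                }
                path.reverse();
                return Some(path);
            }
            queue.push_back(next.clone());
        }
    }
    None
}

/// Fixture with two equally short routes from SARAH_CHEN to GOOGLE.
pub fn sample_graph() -> HashMap<String, Vec<String>> {
    let mut adj = HashMap::new();
    adj.insert("SARAH_CHEN".to_string(), vec!["MIT".to_string(), "STANFORD".to_string()]);
    adj.insert("MIT".to_string(), vec!["GOOGLE".to_string()]);
    adj.insert("STANFORD".to_string(), vec!["GOOGLE".to_string()]);
    adj
}
```

When several routes tie on length, the one found first depends on the order of each adjacency list.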

Graph storage supports namespace-based tenant isolation:

┌─────────────────────────────────────────────────────────────────┐
│ MULTI-TENANT GRAPH STORAGE │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────┐│
│ │ PostgreSQL Database ││
│ │ ││
│ │ ┌───────────────┐ ┌───────────────┐ ┌───────────────┐ ││
│ │ │ tenant_a │ │ tenant_b │ │ tenant_c │ ││
│ │ │ (Graph) │ │ (Graph) │ │ (Graph) │ ││
│ │ │ │ │ │ │ │ ││
│ │ │ • 1000 nodes │ │ • 500 nodes │ │ • 2000 nodes │ ││
│ │ │ • 3000 edges │ │ • 1500 edges │ │ • 6000 edges │ ││
│ │ └───────────────┘ └───────────────┘ └───────────────┘ ││
│ │ ││
│ │ Each namespace = separate AGE graph ││
│ │ Complete isolation, independent schema ││
│ └─────────────────────────────────────────────────────────────┘│
│ │
└─────────────────────────────────────────────────────────────────┘

Implementation:

// Each workspace gets its own graph
let tenant_a = PostgresAGEStorage::new(pool.clone(), "tenant_a").await?;
let tenant_b = PostgresAGEStorage::new(pool.clone(), "tenant_b").await?;
// Data is completely isolated
tenant_a.upsert_node(&node).await?;
assert!(tenant_b.get_node(&node.id).await?.is_none());
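
`PostgresAGEStorage` holds both a `namespace` and a `graph_name`, but the document does not show how one is derived from the other. The sketch below is one hypothetical scheme: it accepts only lowercase ASCII letters, digits, and underscores, and adds a fixed prefix. Rejecting other input, instead of rewriting it, keeps the mapping one-to-one, so two distinct namespaces can never collapse onto the same graph. The `eq_` prefix and the function name are assumptions.

```rust
/// Derive a graph name from a namespace (hypothetical scheme, see lead-in).
/// Invalid namespaces are rejected so distinct tenants never share a graph.
pub fn graph_name_for(namespace: &str) -> Result<String, String> {
    let valid = !namespace.is_empty()
        && namespace
            .chars()
            .all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '_');
    if valid {
        Ok(format!("eq_{}", namespace))
    } else {
        Err(format!("invalid namespace: {:?}", namespace))
    }
}
```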

The PostgreSQL AGE backend automatically creates indexes on:

  • Node ID (primary key)
  • Entity type (for type filtering)
  • Edge source/target (for traversal)

Query patterns:

-- Efficient: Index-based lookup
SELECT * FROM cypher('graph', $$
    MATCH (n:Entity {id: 'SARAH_CHEN'})
    RETURN n
$$) AS (n agtype);

-- Efficient: Limited traversal
SELECT * FROM cypher('graph', $$
    MATCH (n:Entity {id: 'SARAH_CHEN'})-[r]->(m)
    RETURN n, r, m
    LIMIT 100
$$) AS (n agtype, r agtype, m agtype);

-- Less efficient: Full scan
SELECT * FROM cypher('graph', $$
    MATCH (n:Entity)
    WHERE n.importance > 0.8
    RETURN n
$$) AS (n agtype);
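
Connection pooling:

use std::time::Duration;
use sqlx::postgres::PgPoolOptions; // assumes the sqlx crate provides the pool

let pool = PgPoolOptions::new()
    .max_connections(20) // Upper bound on concurrent connections
    .min_connections(5) // Connections kept open while idle
    .acquire_timeout(Duration::from_secs(30))
    .idle_timeout(Duration::from_secs(600))
    .connect(&database_url)
    .await?;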

Best practices:

  1. Normalize Entity IDs - Use UPPERCASE_UNDERSCORE format (BR0008)
  2. Limit Properties - Don’t store large text in properties
  3. Use Embeddings Separately - Store embeddings in vector storage, not graph
  4. Batch Operations - Use bulk insert for large imports
  5. Monitor Size - Track node/edge counts for capacity planning
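
The first practice above, normalizing entity IDs to UPPERCASE_UNDERSCORE, can be sketched as a small pure function. The exact rules of BR0008 are not given in this document, so the behavior here is an assumption: letters and digits are uppercased, every run of other characters becomes a single underscore, and leading or trailing separators are dropped. `normalize_entity_id` is a hypothetical name.

```rust
/// Normalize a display name into UPPERCASE_UNDERSCORE form
/// (assumed interpretation of BR0008, see lead-in).
pub fn normalize_entity_id(name: &str) -> String {
    let mut out = String::new();
    let mut pending_separator = false;
    for c in name.chars() {
        if c.is_alphanumeric() {
            // Emit at most one underscore between runs of letters/digits,
            // and never at the very start.
            if pending_separator && !out.is_empty() {
                out.push('_');
            }
            pending_separator = false;
            out.extend(c.to_uppercase());
        } else {
            pending_separator = true;
        }
    }
    out
}
```

Normalizing before every upsert means that "Sarah Chen" and "sarah_chen" resolve to the same node ID and do not create duplicates.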

Approximate latencies on a typical workload (PostgreSQL AGE backend, 10K nodes, 30K edges):

| Operation | Latency |
| --- | --- |
| get_node | ~1ms |
| upsert_node | ~2ms |
| get_node_edges | ~3ms |
| get_neighbors(1) | ~5ms |
| get_neighbors(2) | ~15ms |
| node_count | ~50ms |
| get_graph(100) | ~10ms |
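
The document does not describe how the figures above were collected. To gather comparable numbers on your own deployment, a timing helper along these lines may be a reasonable starting point. `time_it` and `median` are hypothetical helpers; for async storage calls the same idea applies by taking `Instant::now()` before the call and reading `elapsed()` after the `.await`.

```rust
use std::time::{Duration, Instant};

/// Run a closure once and return its result with the elapsed wall-clock time.
pub fn time_it<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let value = f();
    (value, start.elapsed())
}

/// Median of the collected samples (upper middle for even counts).
/// The median is less sensitive than the mean to a few slow outliers.
pub fn median(mut samples: Vec<Duration>) -> Option<Duration> {
    if samples.is_empty() {
        return None;
    }
    samples.sort();
    Some(samples[samples.len() / 2])
}
```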