Deep Dive: Vector Storage
Deep Dive: Vector Storage
Section titled “Deep Dive: Vector Storage”How EdgeQuake Stores and Searches Vector Embeddings
Vector storage powers EdgeQuake’s semantic search capabilities. This document explains how embeddings are stored, indexed, and queried for similarity.
Overview
Section titled “Overview”EdgeQuake uses a trait-based vector storage abstraction to support multiple backends:
┌─────────────────────────────────────────────────────────────────┐│ VECTOR STORAGE ARCHITECTURE │├─────────────────────────────────────────────────────────────────┤│ ││ ┌─────────────────────────────────────────────────────────────┐││ │ LLM PROVIDER │││ │ │││ │ ┌───────────────┐ ┌───────────────────────────────────┐│││ │ │ Text Chunk │────▶│ Embedding Model ││││ │ │ "Dr. Sarah │ │ (text-embedding-3-small) ││││ │ │ Chen..." │ │ ││││ │ └───────────────┘ └─────────────┬─────────────────────┘│││ │ │ │││ │ [1536-dim vector] │││ │ │ │││ └──────────────────────────────────────│──────────────────────┘││ │ ││ ▼ ││ ┌────────────────────────────────────────────────────────────┐ ││ │ VECTOR STORAGE │ ││ │ │ ││ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ ││ │ │ IVFFlat │ │ HNSW │ │ Memory │ │ ││ │ │ (lists) │ │ (graph) │ │ (brute) │ │ ││ │ └────┬─────┘ └────┬─────┘ └────┬─────┘ │ ││ │ │ │ │ │ ││ │ └───────────────┼───────────────┘ │ ││ │ │ │ ││ │ ▼ │ ││ │ ┌───────────────────────────────────────┐ │ ││ │ │ pgvector / memory │ │ ││ │ │ [id, embedding, metadata] │ │ ││ │ └───────────────────────────────────────┘ │ ││ │ │ ││ └────────────────────────────────────────────────────────────┘ ││ │└─────────────────────────────────────────────────────────────────┘Why Separate Vector Storage?
Section titled “Why Separate Vector Storage?”| Reason | Benefit |
|---|---|
| Specialized Indices | HNSW, IVFFlat optimized for nearest-neighbor |
| GPU Acceleration | Backends like Faiss can use GPU |
| Different Scaling | Vectors scale differently than graph data |
| Backend Flexibility | Can use Pinecone, Weaviate, Qdrant, or pgvector |
Core Data Structures
Section titled “Core Data Structures”VectorSearchResult
Section titled “VectorSearchResult”Results from similarity queries:
/// Vector similarity search result.#[derive(Debug, Clone, Serialize, Deserialize)]pub struct VectorSearchResult { /// Record identifier (chunk ID, entity name, etc.) pub id: String,
/// Similarity score (higher = more similar) /// Range: -1.0 to 1.0 for cosine similarity pub score: f32,
/// Associated metadata (source, timestamps, etc.) pub metadata: serde_json::Value,}Metadata Fields:
| Field | Type | Description |
|---|---|---|
source_id | String | Origin document/chunk |
entity_type | String | PERSON, CONCEPT, etc. |
created_at | String | ISO timestamp |
workspace_id | UUID | Tenant isolation |
The VectorStorage Trait
Section titled “The VectorStorage Trait”All vector backends implement this interface:
#[async_trait]pub trait VectorStorage: Send + Sync { /// Get the storage namespace (for multi-tenancy) fn namespace(&self) -> &str;
/// Get the expected embedding dimension fn dimension(&self) -> usize;
/// Initialize storage (create tables/indices) async fn initialize(&self) -> Result<()>;
/// Flush pending changes async fn finalize(&self) -> Result<()>;
// ========== Search Operations ==========
/// Perform similarity search async fn query( &self, query_embedding: &[f32], top_k: usize, filter_ids: Option<&[String]>, ) -> Result<Vec<VectorSearchResult>>;
// ========== CRUD Operations ==========
/// Insert or update vectors async fn upsert( &self, data: &[(String, Vec<f32>, serde_json::Value)] ) -> Result<()>;
/// Delete vectors by IDs async fn delete(&self, ids: &[String]) -> Result<()>;
/// Delete all vectors for an entity async fn delete_entity(&self, entity_name: &str) -> Result<()>;
/// Delete relationship vectors for an entity async fn delete_entity_relations(&self, entity_name: &str) -> Result<()>;
// ========== Retrieval ==========
/// Get single vector by ID async fn get_by_id(&self, id: &str) -> Result<Option<Vec<f32>>>;
/// Get multiple vectors by IDs async fn get_by_ids(&self, ids: &[String]) -> Result<Vec<(String, Vec<f32>)>>;
// ========== Utility ==========
async fn is_empty(&self) -> Result<bool>; async fn count(&self) -> Result<usize>; async fn clear(&self) -> Result<()>; async fn clear_workspace(&self, workspace_id: &Uuid) -> Result<usize>;
/// Query with metadata pre-filter (SPEC-007 Tier 2+) async fn query_filtered( &self, query_embedding: &[f32], top_k: usize, filter_ids: Option<&[String]>, metadata_filter: Option<&MetadataFilter>, ) -> Result<Vec<VectorSearchResult>>;}SQL Pre-Filtering (SPEC-007)
Section titled “SQL Pre-Filtering (SPEC-007)”Added in v0.7.0 — Pushes metadata filtering to the SQL layer for dramatic performance improvements at scale.
MetadataFilter
Section titled “MetadataFilter”pub struct MetadataFilter { pub document_ids: Option<Vec<String>>, // Filter by document(s) pub tenant_id: Option<String>, // Filter by tenant pub workspace_id: Option<String>, // Filter by workspace}All fields are optional; only non-None fields participate in AND-combined filtering.
Three-Tier Architecture
Section titled “Three-Tier Architecture”| Tier | Strategy | Index Type | Performance |
|---|---|---|---|
| 1 | Post-retrieval filter | None | Baseline (scans all vectors) |
| 2 | JSONB WHERE + GIN | GIN | ~30-60% reduction in scans |
| 3 | Materialized columns | B-tree | ~60-90% reduction in scans |
How it works in PostgreSQL:
-- Tier 2+3 combined: column-first with JSONB fallbackSELECT id, metadata, 1 - (embedding <=> $1::vector) AS scoreFROM eq_workspace_vectorsWHERE (document_id = ANY($2) OR metadata->>'document_id' = ANY($2)) AND (tenant_id = $3 OR metadata->>'tenant_id' = $3)ORDER BY embedding <=> $1::vectorLIMIT $4The query planner uses B-tree indexes on materialized columns when available, falling back to GIN-indexed JSONB extraction otherwise.
Dual-Write on Upsert
Section titled “Dual-Write on Upsert”When vectors are inserted, metadata is written to both:
- The
metadataJSONB column (backward compatibility) - Materialized columns:
document_id,tenant_id,workspace_id
This ensures existing code reading the JSONB blob continues to work while new queries benefit from indexed column lookups.
Migration Safety
Section titled “Migration Safety”Migrations 027-029 use dynamic table discovery (pg_tables WHERE tablename LIKE 'eq_%_vectors') to safely apply indexes to all workspace-scoped vector tables, including those created at runtime.
Storage Backends
Section titled “Storage Backends”MemoryVectorStorage
Section titled “MemoryVectorStorage”In-memory implementation using brute-force cosine similarity:
pub struct MemoryVectorStorage { namespace: String, dimension: usize, vectors: RwLock<HashMap<String, Vec<f32>>>, metadata: RwLock<HashMap<String, serde_json::Value>>,}Characteristics:
| Attribute | Value |
|---|---|
| Index Type | None (brute-force) |
| Search Complexity | O(n) per query |
| Memory Usage | ~4KB per 1024-dim vector |
| Best For | Testing, <10K vectors |
Cosine Similarity:
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 { let dot: f32 = a.iter().zip(b.iter()) .map(|(x, y)| x * y).sum(); let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt(); let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
if norm_a == 0.0 || norm_b == 0.0 { 0.0 } else { dot / (norm_a * norm_b) }}Usage:
let storage = MemoryVectorStorage::new("my_workspace", 1536);storage.initialize().await?;PgVectorStorage
Section titled “PgVectorStorage”Production-grade storage using PostgreSQL with pgvector extension:
pub struct PgVectorStorage { pool: PostgresPool, table_name: String, namespace: String, dimension: usize, index_type: VectorIndexType, ivfflat_lists: u32, hnsw_m: u32, hnsw_ef_construction: u32,}Characteristics:
| Attribute | Value |
|---|---|
| Persistence | ✅ Full durability |
| Index Types | IVFFlat, HNSW |
| Distance Metrics | Cosine, L2, Inner Product |
| Best For | Production, >10K vectors |
Schema:
CREATE TABLE vectors ( id TEXT PRIMARY KEY, embedding vector(1536) NOT NULL, metadata JSONB DEFAULT '{}', created_at TIMESTAMPTZ NOT NULL DEFAULT NOW());Index Types
Section titled “Index Types”IVFFlat (Inverted File with Flat Quantization)
Section titled “IVFFlat (Inverted File with Flat Quantization)”Partitions vector space into clusters:
┌─────────────────────────────────────────────────────────────────┐│ IVFFlat INDEX │├─────────────────────────────────────────────────────────────────┤│ ││ Step 1: Cluster vectors into lists (Voronoi cells) ││ ││ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ││ │ List 1 │ │ List 2 │ │ List 3 │ │ List N │ ││ │ │ │ │ │ │ │ │ ││ │ ● ● ● │ │ ● ● ● │ │ ● ● │ │ ● ● ● │ ││ │ ● │ │ ● ● │ │ ● ● ● │ │ ● │ ││ └─────────┘ └─────────┘ └─────────┘ └─────────┘ ││ ││ Step 2: Query finds nearest centroid(s) ││ Step 3: Search only those lists (probes=1-5) ││ │└─────────────────────────────────────────────────────────────────┘Configuration:
-- Create IVFFlat indexCREATE INDEX ON vectorsUSING ivfflat (embedding vector_cosine_ops)WITH (lists = 100); -- ~sqrt(n) lists recommended| Parameter | Recommendation |
|---|---|
lists | sqrt(n) to 4*sqrt(n) |
probes | 1-10 (higher = better recall) |
HNSW (Hierarchical Navigable Small World)
Section titled “HNSW (Hierarchical Navigable Small World)”Multi-layer graph for efficient nearest-neighbor:
┌─────────────────────────────────────────────────────────────────┐│ HNSW INDEX │├─────────────────────────────────────────────────────────────────┤│ ││ Layer 2 (sparse) ●─────────────────────────● ││ │ │ ││ │ │ ││ Layer 1 (medium) ●───●───●─────────●───●───● ││ │ │ │ │ │ │ ││ │ │ │ │ │ │ ││ Layer 0 (dense) ●─●─●─●─●─●───────●─●─●─●─●─● ││ ││ • Entry point at top layer ││ • Greedy descent through layers ││ • Local search at layer 0 ││ │└─────────────────────────────────────────────────────────────────┘Configuration:
-- Create HNSW indexCREATE INDEX ON vectorsUSING hnsw (embedding vector_cosine_ops)WITH (m = 16, ef_construction = 64);| Parameter | Meaning | Default |
|---|---|---|
m | Max connections per layer | 16 |
ef_construction | Build-time beam width | 64 |
ef_search | Query-time beam width | 40 |
Index Selection Guide
Section titled “Index Selection Guide”┌─────────────────────────────────────────────────────────────────┐│ WHEN TO USE EACH INDEX │├─────────────────────────────────────────────────────────────────┤│ ││ Dataset Size: ││ ││ < 1K vectors → None (brute-force is fine) ││ 1K - 100K → IVFFlat (good balance) ││ > 100K → HNSW (faster queries) ││ ││ Query Pattern: ││ ││ Many inserts → IVFFlat (faster builds) ││ Many queries → HNSW (faster search) ││ Both balanced → HNSW (better latency) ││ ││ Resource Constraints: ││ ││ Limited memory → IVFFlat (lower overhead) ││ Abundant memory → HNSW (better performance) ││ │└─────────────────────────────────────────────────────────────────┘Embedding Dimensions
Section titled “Embedding Dimensions”EdgeQuake supports multiple embedding models:
| Model | Dimensions | Provider |
|---|---|---|
text-embedding-3-small | 1536 | OpenAI |
text-embedding-3-large | 3072 | OpenAI |
text-embedding-ada-002 | 1536 | OpenAI |
nomic-embed-text | 768 | Ollama |
mxbai-embed-large | 1024 | Ollama |
all-MiniLM-L6-v2 | 384 | Local |
Dimension Mismatch Handling:
EdgeQuake automatically detects when stored vectors have a different dimension than the current provider:
// Check stored dimension from pg_attributepub async fn get_stored_dimension(&self) -> Result<Option<usize>> { // Query atttypmod from pg_attribute (works for empty tables!) let sql = r#" SELECT a.atttypmod FROM pg_attribute a JOIN pg_class c ON a.attrelid = c.oid WHERE c.relname = $1 AND a.attname = 'embedding' "#; // ...}Vector Operations
Section titled “Vector Operations”Upsert Vectors
Section titled “Upsert Vectors”// Prepare embeddingslet data = vec![ ( "chunk_001".to_string(), vec![0.1, 0.2, 0.3, ...], // 1536 dimensions json!({ "source_id": "doc_123", "content_preview": "Dr. Sarah Chen..." }) ), // ...more vectors];
// Bulk insertstorage.upsert(&data).await?;Similarity Search
Section titled “Similarity Search”// Query embedding from LLMlet query_embedding = llm.embed("Who is Sarah Chen?").await?;
// Search top 10 most similarlet results = storage.query( &query_embedding, 10, // top_k None, // no filter).await?;
for result in results { println!("ID: {}, Score: {:.4}", result.id, result.score);}Filtered Search
Section titled “Filtered Search”// Search within specific chunks onlylet filter = vec![ "chunk_001".to_string(), "chunk_002".to_string(), "chunk_003".to_string(),];
let results = storage.query( &query_embedding, 10, Some(&filter), // restrict to these IDs).await?;Performance Tuning
Section titled “Performance Tuning”pgvector Settings
Section titled “pgvector Settings”-- Set HNSW search beam width (higher = better recall)SET hnsw.ef_search = 100;
-- Set IVF probes (higher = better recall)SET ivfflat.probes = 10;
-- Enable parallel queriesSET max_parallel_workers_per_gather = 4;Connection Pooling
Section titled “Connection Pooling”let pool = PgPoolOptions::new() .max_connections(20) // Concurrent queries .min_connections(5) // Keep-alive .acquire_timeout(Duration::from_secs(30)) .connect(&database_url) .await?;Batch Operations
Section titled “Batch Operations”// Bad: Many small insertsfor (id, vec, meta) in data { storage.upsert(&[(id, vec, meta)]).await?;}
// Good: Single batch insertstorage.upsert(&data).await?;Benchmarks
Section titled “Benchmarks”Performance on typical workloads (pgvector with HNSW, 100K vectors, 1536 dimensions):
| Operation | Latency | Notes |
|---|---|---|
query(k=10) | ~5ms | HNSW ef=100 |
query(k=100) | ~15ms | HNSW ef=100 |
upsert(1) | ~3ms | Single vector |
upsert(100) | ~50ms | Batch insert |
| Index build | ~30s | 100K vectors |
Memory Usage:
- 1536-dim vector: ~6KB (with overhead)
- 100K vectors: ~600MB
- HNSW index: ~200MB additional
Multi-Tenancy
Section titled “Multi-Tenancy”Vector storage supports workspace-based isolation:
// Each workspace gets own prefixlet storage_a = PgVectorStorage::new(config_a); // eq_ws_a_vectorslet storage_b = PgVectorStorage::new(config_b); // eq_ws_b_vectors
// Clear specific workspacestorage_a.clear_workspace(&workspace_id).await?;Best Practices
Section titled “Best Practices”- Match Dimensions - Always use same embedding model for indexing and querying
- Batch Inserts - Use bulk upsert for multiple vectors
- Tune Indices - Adjust HNSW m/ef or IVFFlat lists for your dataset
- Monitor Size - Track vector counts for capacity planning
- Normalize Vectors - Cosine similarity assumes unit vectors
Common Issues
Section titled “Common Issues”Dimension Mismatch
Section titled “Dimension Mismatch”Error: Vector dimension 768 doesn't match expected 1536Solution: Either:
- Rebuild embeddings with correct model
- Drop and recreate table with
drop_table()
Slow Queries
Section titled “Slow Queries”Symptoms: Query latency >100ms
Solutions:
- Create index if missing
- Increase
ef_searchfor HNSW - Increase
probesfor IVFFlat - Check connection pool exhaustion
Index Build Timeout
Section titled “Index Build Timeout”Symptoms: Index creation hangs
Solution: For large datasets, create index with reduced parameters:
CREATE INDEX CONCURRENTLY ON vectorsUSING hnsw (embedding vector_cosine_ops)WITH (m = 8, ef_construction = 32);See Also
Section titled “See Also”- Graph Storage - Knowledge graph storage
- Entity Extraction - How entities get embeddings
- Query Modes - How vector search is used
- Performance Tuning - Optimization guide