A high-performance bi-temporal graph database in Rust, designed for LLM integration and temporal reasoning.
GallifreyDB tracks both valid time (when facts were true in reality) and transaction time (when facts were recorded in the database). This enables powerful time-traveling queries and historical analysis, making it ideal for LLM applications that need to understand how knowledge evolves over time.
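The distinction can be made concrete with a minimal, self-contained sketch (hypothetical types, not the actual GallifreyDB API): valid time says when a fact held in reality, while transaction time says when the database believed it.

```rust
// Minimal bi-temporal fact store. Each fact carries a valid-time interval
// (when it was true in reality) and a transaction-time interval (when the
// database recorded/believed it). Hypothetical types for illustration only.

#[derive(Debug, Clone)]
struct Fact {
    value: &'static str,
    valid_from: u64,
    valid_to: u64, // exclusive; u64::MAX = still valid
    tx_from: u64,
    tx_to: u64, // exclusive; u64::MAX = current belief
}

/// "What did we believe at tx_time about the world at valid_time?"
fn as_of(facts: &[Fact], valid_time: u64, tx_time: u64) -> Option<&Fact> {
    facts.iter().find(|f| {
        f.valid_from <= valid_time
            && valid_time < f.valid_to
            && f.tx_from <= tx_time
            && tx_time < f.tx_to
    })
}

fn main() {
    // At tx=10 we recorded "age=30" (valid since 5).
    // At tx=20 we corrected the record to "age=31" (same valid time).
    let facts = vec![
        Fact { value: "age=30", valid_from: 5, valid_to: u64::MAX, tx_from: 10, tx_to: 20 },
        Fact { value: "age=31", valid_from: 5, valid_to: u64::MAX, tx_from: 20, tx_to: u64::MAX },
    ];
    // Time travel over transaction time: what did we *believe* at tx=15?
    assert_eq!(as_of(&facts, 100, 15).unwrap().value, "age=30");
    // Current belief:
    assert_eq!(as_of(&facts, 100, 25).unwrap().value, "age=31");
}
```

The key point the sketch shows: a correction never overwrites history. The old record's transaction interval is closed, so "what did we know at time T?" remains answerable forever.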
- Bi-Temporal Model: Track both valid time and transaction time for full temporal reasoning
- Hybrid Storage: Separate current state (fast path) from historical data (temporal path)
- Anchor+Delta Compression: 5-6X storage reduction while maintaining query performance
- ACID Transactions: Full snapshot isolation with write conflict detection
- Write-Ahead Log (WAL): Crash recovery with versioned binary format (v2)
- Vector Search: HNSW indexing for k-NN semantic search with temporal versioning
- Production Observability: Distributed tracing, metrics, and profiling (optional)
- High Performance: Sub-microsecond traversals (~22ns node lookup, ~23ns edge traversal)
- LLM-Friendly API: Natural query patterns for reasoning about temporal knowledge
- Rust 1.92+ (edition 2024)
- just - Command runner (optional but recommended)
- cargo-llvm-cov - For coverage reports
- Tracy Profiler - For performance profiling (optional)
```sh
# Clone the repository
git clone https://github.com/madmax983/GallifreyDB
cd GallifreyDB

# Install development tools
cargo install just cargo-llvm-cov

# Build the project
cargo build

# Run tests
cargo test

# Or use just
just test
```

```sh
# Run tests
just test

# Check code coverage (must meet 85% threshold)
just coverage-check

# Generate coverage report (HTML)
just coverage

# Run linter
just lint

# Format code
just fmt

# Run all pre-commit checks
just pre-commit

# Full quality check (format, lint, test, coverage)
just check-all

# Run benchmarks
just bench

# Run benchmarks and generate HTML tables
just bench-tables
```

See the justfile for all available commands.
GallifreyDB uses Cargo feature flags for optional functionality:
```toml
[dependencies]
gallifreydb = { version = "0.1", features = ["observability"] }
```

| Feature | Description | Dependencies |
|---|---|---|
| `observability` | Core observability (tracing + metrics) | `tracing`, `tracing-subscriber` |
| `observability-tracy` | Tracy CPU profiling integration | `tracing-tracy`, `tracy-client` |
| `observability-honeycomb` | Honeycomb distributed tracing | `tracing-honeycomb`, `libhoney-rust` |
| `observability-prometheus` | Prometheus metrics HTTP server | `metrics`, `metrics-exporter-prometheus` |
```toml
[dependencies]
gallifreydb = { version = "0.1", features = ["embedding-openai"] }
```

| Feature | Description | Dependencies |
|---|---|---|
| `embeddings` | Core embedding types and service | `tokio`, `async-trait`, `serde` |
| `embedding-openai` | OpenAI embedding provider | `embeddings`, `reqwest` |
| `embedding-huggingface` | HuggingFace embedding provider | `embeddings`, `reqwest` |
| `embedding-ollama` | Ollama local embedding provider | `embeddings`, `reqwest` |
| `embedding-onnx` | ONNX local inference | `embeddings`, `ort`, `tokenizers` |
| `embedding-all` | Enable all embedding providers | All of the above |
Note: Embedding features are completely optional and add zero overhead when disabled. The database core has no embedding dependencies.
GallifreyDB is designed for high performance with minimal temporal overhead. View live benchmark results:
- 📊 Latest Benchmarks - Comprehensive tables with all metrics
- 📈 Historical Trends - Performance over time with regression tracking
| Metric | Target | Actual |
|---|---|---|
| Current-state node lookup | <1µs | ~22ns ✅ |
| Current-state edge traversal | <1µs | ~23ns ✅ |
| 3-hop traversal | <100µs | ~20ns per hop ✅ |
Note: Time-travel query benchmarks are being improved to measure realistic historical reconstruction scenarios.
Benchmarks are automatically run on every push to trunk and published to GitHub Pages. See docs/BENCHMARKING.md for detailed benchmarking guide.
Current Phase: Core Complete, Vector Search (Phase 1-2) Complete, Observability Active
- Core ID types (NodeId, EdgeId, VersionId)
- Temporal primitives (BiTemporalInterval, TimeRange)
- Property system with Arc-based deduplication
- String interning for memory efficiency
- Error types and Result handling
- Test coverage infrastructure (85%+ threshold enforced)
- Current storage layer with CSR adjacency indexes
- Historical storage with anchor+delta compression
- ACID transactions with snapshot isolation
- Write conflict detection
- Write-Ahead Log (WAL) v2 with versioned binary format
- Persistence layer with recovery and migration
- Time-travel queries (as_of, get_node_at_time)
- Public API with read/write transactions
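The anchor+delta idea behind the historical store can be sketched in a few lines: keep a full snapshot (the anchor) plus per-version deltas, and reconstruct any version by replaying deltas onto the anchor. The types below are illustrative only, not the real storage format:

```rust
use std::collections::BTreeMap;

// Illustrative anchor+delta reconstruction: storing small deltas instead of
// full snapshots per version is where the ~5-6X storage savings come from.

#[derive(Debug, Clone)]
enum Delta {
    Set(&'static str, i64),
    Remove(&'static str),
}

/// Reconstruct version `version` by replaying the first `version` delta
/// batches onto the anchor snapshot. Version 0 is the anchor itself.
fn reconstruct(
    anchor: &BTreeMap<&'static str, i64>,
    deltas: &[Vec<Delta>],
    version: usize,
) -> BTreeMap<&'static str, i64> {
    let mut state = anchor.clone();
    for batch in deltas.iter().take(version) {
        for op in batch {
            match op {
                Delta::Set(k, v) => {
                    state.insert(*k, *v);
                }
                Delta::Remove(k) => {
                    state.remove(*k);
                }
            }
        }
    }
    state
}

fn main() {
    let anchor = BTreeMap::from([("age", 30), ("score", 7)]);
    let deltas = vec![
        vec![Delta::Set("age", 31)],
        vec![Delta::Remove("score"), Delta::Set("level", 2)],
    ];
    let v2 = reconstruct(&anchor, &deltas, 2);
    assert_eq!(v2.get("age"), Some(&31));
    assert_eq!(v2.get("score"), None);
    // Version 0 is just the anchor:
    assert_eq!(reconstruct(&anchor, &deltas, 0), anchor);
}
```

The trade-off is reconstruction cost growing with delta-chain length, which is why real systems (GallifreyDB included, per the design above) periodically write fresh anchors.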
- Vector type with validation (VS-001 to VS-010)
- Similarity functions: cosine, Euclidean, dot product
- Vector normalization utilities
- Distance metric abstraction
- Property-attached vector embeddings
- Historical vector versioning (temporal vectors)
- HNSW indexing for k-NN search
- Auto-indexing on create/update with rollback
- Vector similarity search API
- Optional embedding providers (OpenAI, HuggingFace, Ollama, ONNX)
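The three similarity functions have standard definitions; here is a plain-Rust sketch that mirrors the math (not GallifreyDB's internal, optimized implementation):

```rust
// Standard vector similarity measures over f32 slices.

/// Dot product: sum of pairwise products.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Euclidean (L2) distance: lower means more similar.
fn euclidean(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f32>().sqrt()
}

/// Cosine similarity: dot product of the directions, in [-1, 1].
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let norm = |v: &[f32]| dot(v, v).sqrt();
    dot(a, b) / (norm(a) * norm(b))
}

fn main() {
    let a = [1.0, 0.0];
    let b = [0.0, 1.0];
    assert_eq!(dot(&a, &b), 0.0);
    assert!((euclidean(&a, &b) - 2f32.sqrt()).abs() < 1e-6);
    assert!((cosine(&a, &a) - 1.0).abs() < 1e-6); // same direction => 1
    assert!(cosine(&a, &b).abs() < 1e-6); // orthogonal => 0
}
```

Note the common optimization this sketch hints at: for pre-normalized vectors, cosine similarity reduces to a plain dot product, which is why normalization utilities sit alongside the metrics.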
- Structured logging with `tracing`
- Tracy profiler integration for CPU profiling
- Honeycomb distributed tracing (via git dependency - see #271)
- Prometheus metrics HTTP server (stub - see #272)
- Critical error detection (lock poisons, timestamp violations, WAL checksum failures)
- Error categorization metrics
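The `wal_checksum_failures` metric above implies each WAL record carries a checksum that is verified during recovery. A minimal sketch of that check, using FNV-1a purely for brevity (the actual WAL v2 record layout and checksum algorithm may differ):

```rust
// Checksum-guarded log records: the kind of verification behind a
// wal_checksum_failures counter. FNV-1a is used here only to keep the
// sketch dependency-free; it is not a claim about the real format.

fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in bytes {
        hash ^= b as u64;
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

struct WalRecord {
    payload: Vec<u8>,
    checksum: u64,
}

impl WalRecord {
    fn new(payload: Vec<u8>) -> Self {
        let checksum = fnv1a(&payload);
        Self { payload, checksum }
    }

    /// On recovery, a mismatch means the record is corrupt; replay must
    /// stop (and the failure metric is incremented) rather than apply
    /// garbage to the store.
    fn verify(&self) -> bool {
        fnv1a(&self.payload) == self.checksum
    }
}

fn main() {
    let mut rec = WalRecord::new(b"create_node:42".to_vec());
    assert!(rec.verify());
    rec.payload[0] ^= 0xFF; // simulate on-disk corruption
    assert!(!rec.verify());
}
```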
- Vector Search Phase 3: Temporal vector queries (semantic time-travel)
- Vector Search Phase 4: Hybrid graph+vector queries
- Vector Search Phase 5: Streaming and incremental updates
- Custom Honeycomb client wrapper (#271)
- Comprehensive Prometheus metrics suite (#272)
- MCP Server for Claude integration
- GraphQL/REST API layer
Test Coverage: 671+ tests passing, 86%+ line coverage (enforced: 85% minimum)
GallifreyDB uses a hybrid storage architecture:
```
┌─────────────────────────────────────────────┐
│                Query Engine                 │
│  - Temporal Query Planner                   │
│  - Graph Traversal Engine                   │
└─────────────────────────────────────────────┘
                      │
      ┌───────────────┴───────────────┐
      │                               │
┌─────▼───────────┐         ┌─────────▼──────────┐
│ Current Storage │         │ Historical Storage │
│ - Live Graph    │         │ - Anchor+Delta     │
│ - Hot Indexes   │         │ - Compressed       │
│ - Fast Path     │         │ - Time Indexes     │
└─────────────────┘         └────────────────────┘
```
Key Design Decisions:
- Current state separated for zero-overhead queries
- Anchor+delta compression for 5-6X storage savings
- Copy-on-write properties with Arc for deduplication
- String interning for memory efficiency
- Lock-free concurrent access (DashMap)
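The copy-on-write property design can be demonstrated with plain `Arc`s: cloning a version shares allocations (just a reference-count bump), and an update swaps a single `Arc` without touching older versions. Illustrative only, not the real `PropertyMap`:

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Arc-based copy-on-write sketch: many historical versions can share one
// allocation for an unchanged value, which is also the idea behind
// string interning for repeated labels and keys.

type Props = HashMap<String, Arc<String>>;

fn main() {
    let shared = Arc::new(String::from("Alice"));

    let v1: Props = HashMap::from([("name".to_string(), Arc::clone(&shared))]);
    let mut v2 = v1.clone(); // cheap: clones Arcs, not the strings

    // Both versions point at the same allocation.
    assert!(Arc::ptr_eq(&v1["name"], &v2["name"]));
    assert_eq!(Arc::strong_count(&shared), 3); // shared + v1 + v2

    // Copy-on-write: updating v2 replaces one Arc, leaving v1 untouched.
    v2.insert("name".to_string(), Arc::new(String::from("Alicia")));
    assert_eq!(*v1["name"], "Alice");
    assert_eq!(*v2["name"], "Alicia");
}
```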
See CLAUDE.md for complete architecture and coding guidelines.
```rust
use gallifreydb::{GallifreyDB, PropertyMap};

// Create a new database
let db = GallifreyDB::new();

// Create nodes using write transactions
let alice_id = db.write(|tx| {
    tx.create_node("Person", PropertyMap::from_iter([
        ("name".into(), "Alice".into()),
        ("age".into(), 30.into()),
    ]))
})?;

let bob_id = db.write(|tx| {
    tx.create_node("Person", PropertyMap::from_iter([
        ("name".into(), "Bob".into()),
    ]))
})?;

// Create relationships
db.write(|tx| {
    tx.create_edge(alice_id, bob_id, "KNOWS", PropertyMap::new())
})?;

// Read current state
let alice = db.get_node(alice_id)?;
```

```rust
use gallifreydb::core::temporal::Timestamp;

// Get node at a specific point in time
let historical_alice = db.get_node_at_time(
    alice_id,
    Timestamp::from(past_time), // valid time
    Timestamp::from(past_time), // transaction time
)?;

// Track how properties changed
if let Some(old_alice) = historical_alice {
    println!("Alice's age was: {:?}", old_alice.properties.get("age"));
}
```

```rust
// Explicit read transaction
let result = db.read(|tx| {
    let node = tx.get_node(alice_id)?;
    Ok(node.label.clone())
})?;

// Explicit write transaction with multiple operations
db.write(|tx| {
    let node1 = tx.create_node("Event", PropertyMap::new())?;
    let node2 = tx.create_node("Event", PropertyMap::new())?;
    tx.create_edge(node1, node2, "FOLLOWS", PropertyMap::new())?;
    Ok(())
})?;
```

GallifreyDB includes an optional embedding generation system for semantic search:
```rust
use gallifreydb::{GallifreyDB, PropertyMapBuilder};
use gallifreydb::embeddings::{EmbeddingService, providers::openai::*};
use std::sync::Arc;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Enable in Cargo.toml: features = ["embedding-openai"]

    // 1. Create embedding service
    let config = OpenAIConfig::from_env(OpenAIModel::TextEmbedding3Small)?;
    let provider = Arc::new(OpenAIProvider::new(config)?);
    let service = EmbeddingService::new(provider);

    // 2. Generate embeddings
    let documents = vec![
        "GallifreyDB is a bi-temporal graph database",
        "It tracks both valid time and transaction time",
    ];
    let embeddings = service.embed_batch(&documents).await?;

    // 3. Store with vectors
    let db = GallifreyDB::new();
    for (text, embedding) in documents.iter().zip(embeddings.iter()) {
        db.create_node(
            "Document",
            PropertyMapBuilder::new()
                .insert("content", *text)
                .insert_vector("embedding", embedding)
                .build(),
        )?;
    }

    Ok(())
}
```

Available Providers:
- OpenAI: Best quality, API-based (~100-200ms)
- HuggingFace: Open-source models, free tier (~200-500ms)
- Ollama: Local inference, privacy-focused (~20-50ms)
- ONNX: Ultra-fast local, requires setup (~1-10ms)
See docs/EMBEDDINGS.md for complete documentation.
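At its core, a pluggable provider system reduces to a common trait with interchangeable backends. The trait and `MockProvider` below are hypothetical (the real crate exposes an async `EmbeddingService` with the providers listed above), but they show the shape of the abstraction:

```rust
// Hypothetical provider abstraction: callers depend on the trait, so a
// local mock, Ollama, or OpenAI backend can be swapped without code changes.

trait EmbeddingProvider {
    fn embed(&self, text: &str) -> Vec<f32>;
    fn dimensions(&self) -> usize;
}

/// A deterministic stand-in backend, useful for tests: it hashes bytes
/// into a fixed-size vector instead of calling a model.
struct MockProvider {
    dims: usize,
}

impl EmbeddingProvider for MockProvider {
    fn embed(&self, text: &str) -> Vec<f32> {
        let mut v = vec![0.0f32; self.dims];
        for (i, b) in text.bytes().enumerate() {
            v[i % self.dims] += b as f32 / 255.0;
        }
        v
    }

    fn dimensions(&self) -> usize {
        self.dims
    }
}

fn main() {
    let provider: Box<dyn EmbeddingProvider> = Box::new(MockProvider { dims: 8 });
    let e = provider.embed("GallifreyDB");
    assert_eq!(e.len(), provider.dimensions());
    // Same text, same vector: the mock backend is deterministic.
    assert_eq!(e, provider.embed("GallifreyDB"));
}
```

Designing against a trait like this is also what makes "zero overhead when disabled" possible: the database core never names a concrete provider.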
GallifreyDB includes comprehensive observability features for production deployments:
```toml
# Enable in Cargo.toml:
features = [
    "observability",            # Core: structured logging + metrics
    "observability-tracy",      # Tracy CPU profiling
    "observability-honeycomb",  # Honeycomb distributed tracing
    "observability-prometheus", # Prometheus metrics HTTP server
]
```

Basic usage:
```rust
use gallifreydb::observability;

fn main() {
    // Initialize observability (call once at startup)
    let config = observability::Config::from_env();
    observability::init(config);

    let db = gallifreydb::GallifreyDB::new();
    // Metrics automatically collected

    // Check for critical errors
    let metrics = observability::metrics();
    if metrics.has_critical_errors() {
        panic!("Data corruption detected!");
    }
}
```

Environment Variables:
- `RUST_LOG`: Control log level (e.g., `gallifreydb=debug`)
- `HONEYCOMB_API_KEY`: Enable Honeycomb tracing
- `HONEYCOMB_DATASET`: Dataset name (default: "gallifreydb")
- `PROMETHEUS_BIND_ADDR`: Prometheus HTTP endpoint (e.g., "127.0.0.1:9090")
Critical Metrics (should NEVER be >0):
- `lock_poison_count`: A thread panicked while holding a lock
- `timestamp_violations`: Transaction time not monotonic
- `wal_checksum_failures`: WAL corruption detected
Backends:
- Stdout: Structured JSON logging (always available)
- Tracy: CPU profiling with flamegraphs and zone tracking
- Honeycomb: Distributed tracing for span analysis (⚠️ uses git dependency, see #271)
- Prometheus: `/metrics` HTTP endpoint (⚠️ stub implementation, see #272)
Run the demo:
```sh
export HONEYCOMB_API_KEY="your-key"
export PROMETHEUS_BIND_ADDR="127.0.0.1:9090"
cargo run --example observability_demo --all-features
```

| Operation | Target | Achieved |
|---|---|---|
| Current-state node lookup | <1µs | ~22ns |
| Current-state edge traversal | <1µs | ~23ns |
| Time-travel reconstruction | <10ms | ~20ns |
| Storage overhead | <2X | On target |
| Write throughput | >100k edges/s | 7-12µs per write |
Run benchmarks with `just bench` to verify these numbers on your hardware.
- CLAUDE.md - Architecture principles and development guidelines
- TESTING.md - Testing, coverage, and profiling guide
- WORKTREE_WORKFLOW.md - Parallel development workflow with git worktrees
- justfile - Available development commands
- docs/VECTOR_SEARCH_DESIGN.md - Vector search architecture (Phases 1-5)
- docs/EMBEDDINGS.md - Embedding generation guide (optional providers)
- docs/WAL.md - Write-Ahead Log format and migration guide
- docs/CODING_STANDARDS.md - Rust coding standards and best practices
- docs/adr/0016-embedding-providers.md - Embedding provider architecture
- examples/observability_demo.rs - Production observability features
- examples/doctor_who_demo.rs - Temporal graph modeling example
Enable LLMs to:
- Query "What did we know about X at time T?"
- Track how relationships evolved over time
- Detect contradictions through provenance
- Reason about causality and change
Track how your knowledge graph changes:
- Audit trails for compliance
- Historical analysis and trend detection
- Rollback capabilities
- Provenance tracking
- Fork the repository
- Create a feature branch (use worktrees: `just worktree-new feature/name`)
- Run tests: `just test`
- Check coverage: `just coverage-check`
- Run pre-commit checks: `just pre-commit`
- Submit a pull request
All contributions must:
- Pass all tests
- Maintain ≥85% code coverage (line, function, and region)
- Follow coding guidelines in CLAUDE.md
- Include appropriate documentation
- Never commit directly to trunk (use worktrees and PRs)
```sh
# Run all tests
just test

# Generate coverage report
just coverage

# Profile with Tracy
just profile-tracy

# Run benchmarks
just bench
```

See TESTING.md for detailed testing guidelines.
Licensed under the MIT License. See LICENSE for details.