Strengthen libravdb so it remains correct, performant, and predictable under high concurrent write and read load without relying on application-side wrappers as the primary safety mechanism.
The design should:
- preserve correctness
- avoid uncontrolled CPU and memory spikes
- maintain strong throughput for bulk ingestion
- keep search latency stable under write pressure
- work well in sharded/load-balanced production deployments
- Correctness comes first: scheduling must not change write semantics.
- Concurrency must be bounded, not caller-amplified.
- Backpressure is better than uncontrolled saturation.
- Defaults should be safe for production, with opt-in aggressive modes for bulk loading.
- Search and ingestion should not interfere more than necessary.
- The library should expose operational signals for tuning and autoscaling.
- Do not serialize all writes behind a global lock.
- Do not permanently reduce all batch operations to single-threaded execution.
- Do not require users to build external admission-control wrappers just to use the library safely.
- Do not change public CRUD semantics or HNSW correctness guarantees.
Today, batch execution can multiply concurrency in two ways:
- multiple callers may issue batch writes concurrently
- each batch may also use internal worker parallelism
This can lead to:
- excessive CPU saturation
- memory spikes
- noisy-neighbor behavior between reads and writes
- unstable desktop or production node behavior under pressure
The system currently optimizes for throughput, but not yet for global fairness, bounded resource usage, or adaptive backpressure.
Introduce an internal write scheduling and resource-control layer inside libravdb.
- Write admission controller
- Batch scheduler
- Adaptive concurrency policy
- Memory budget manager for batch execution
- Reader/writer isolation controls
- Operational metrics and pressure signals
- Explicit execution modes
Ensure the number of concurrent mutation-heavy operations is bounded at the collection or database level.
Add an internal scheduler that all mutation paths pass through:
- Insert
- InsertBatch
- Update
- Delete
- batch update/delete variants
- streaming ingestion paths
Each write operation must acquire execution permission before entering heavy work.
Introduce internal limits such as:
- max concurrent write jobs
- max concurrent batch jobs
- max concurrent index mutation jobs
- Keep this internal initially.
- Default limits should be conservative and production-safe.
- This scheduler should govern all write paths consistently.
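The permit step described above can be sketched with a buffered channel used as a counting semaphore. This is a minimal illustration, not libravdb's actual implementation: `writeGate`, `newWriteGate`, and `guardedWrite` are hypothetical names introduced here.

```go
package main

import "context"

// writeGate is a hypothetical admission gate. The buffered channel acts as a
// counting semaphore: its capacity is the maximum number of concurrent writes.
type writeGate struct {
	permits chan struct{}
}

func newWriteGate(maxConcurrentWrites int) *writeGate {
	return &writeGate{permits: make(chan struct{}, maxConcurrentWrites)}
}

// Acquire blocks until a permit is free or ctx is cancelled.
func (g *writeGate) Acquire(ctx context.Context) error {
	select {
	case g.permits <- struct{}{}:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// TryAcquire never blocks; it reports whether a permit was obtained.
func (g *writeGate) TryAcquire() bool {
	select {
	case g.permits <- struct{}{}:
		return true
	default:
		return false
	}
}

// Release returns a permit. Call it exactly once per successful acquire.
func (g *writeGate) Release() { <-g.permits }

// guardedWrite shows how a mutation path could wrap its heavy work so that
// the permit is always released, even when the work fails.
func guardedWrite(ctx context.Context, g *writeGate, work func() error) error {
	if err := g.Acquire(ctx); err != nil {
		return err
	}
	defer g.Release()
	return work()
}
```

Routing every mutation path through one helper such as `guardedWrite` is one way to keep the limit consistent across Insert, batch, and streaming paths.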
Prevent concurrency multiplication from callers plus batch worker pools.
Define two distinct control layers:
- External concurrency: how many write jobs can execute simultaneously
- Internal batch concurrency: how many workers a single batch may use
A batch job should not be allowed to use full internal concurrency when the system is already under write load.
For example:
- if only one batch is running, it may use more workers
- if multiple batches are active, each batch should automatically scale down worker count
This preserves throughput while preventing oversubscription.
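One simple way to express this scale-down is to divide a shared worker budget across the batches that are currently active. The sketch below assumes a single global budget; `workersForBatch` and its parameters are hypothetical names, not part of the existing API.

```go
package main

// workersForBatch divides a shared worker budget across active batch jobs.
// activeBatches counts the batch being planned; requested is the caller's
// upper bound (0 means "no preference").
func workersForBatch(workerBudget, activeBatches, requested int) int {
	if activeBatches < 1 {
		activeBatches = 1
	}
	share := workerBudget / activeBatches
	if share < 1 {
		share = 1 // every admitted batch keeps at least one worker
	}
	if requested > 0 && requested < share {
		return requested
	}
	return share
}
```

With a budget of 8, a lone batch may use all 8 workers, while four concurrent batches get 2 each, so the total stays at or below the budget.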
Make batch worker counts dynamic instead of fixed.
Replace static “use caller-requested max concurrency directly” behavior with adaptive selection based on:
- current active write jobs
- batch size
- chunk size
- estimated memory cost
- system mode
- collection/index type
- optional runtime pressure indicators
- small batches: run serially or with low concurrency
- medium batches: 2 workers
- large offline ingestion: allow higher concurrency
- if write queue is saturated: reduce worker count automatically
- if memory estimate is high: reduce worker count and/or chunk size
Keep BatchOptions.MaxConcurrency, but treat it as an upper bound, not a guaranteed execution level.
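The policy above could be written as a small pure function. The sketch below is illustrative only: the `batchInputs` type, the size thresholds, and the worker counts for small and large batches are assumptions that would need to come from benchmarks.

```go
package main

// batchInputs collects the policy inputs used by this sketch (hypothetical type).
type batchInputs struct {
	BatchSize      int  // number of vectors in the batch
	QueueSaturated bool // write queue is at or near its depth limit
	HighMemory     bool // estimated cost is close to the memory budget
	BulkIngest     bool // offline ingestion mode was explicitly requested
	MaxConcurrency int  // caller's upper bound; 0 means unset
}

// Illustrative thresholds only; real values would come from benchmarks.
const (
	smallBatch = 256
	largeBatch = 10000
)

// chooseWorkers picks a worker count for one batch.
func chooseWorkers(in batchInputs) int {
	workers := 2 // medium batches
	switch {
	case in.BatchSize <= smallBatch:
		workers = 1 // small batches run serially
	case in.BatchSize >= largeBatch && in.BulkIngest:
		workers = 8 // large offline ingestion may use more workers
	}
	if in.QueueSaturated || in.HighMemory {
		workers = (workers + 1) / 2 // halve under pressure, rounding up
	}
	// The caller's value is an upper bound, never a guaranteed level.
	if in.MaxConcurrency > 0 && workers > in.MaxConcurrency {
		workers = in.MaxConcurrency
	}
	return workers
}
```

Keeping the policy free of side effects makes it straightforward to unit-test each rule in isolation.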
Prevent batch execution from causing uncontrolled memory spikes.
Extend batch planning to estimate:
- vector payload memory
- temporary batch buffers
- index mutation working memory
- storage write-path overhead
- quantization overhead if applicable
Before launching a batch:
- compute an estimated cost
- compare it against a configurable memory budget
- if above budget:
  - reduce chunk size
  - reduce worker count
  - queue the job until budget is available
During execution:
- track in-flight reserved memory
- release reservations on completion
- allow waiting or rejection when budget is exhausted
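A rough sketch of the estimate-then-reserve flow follows. It assumes float32 vectors (4 bytes per component) and folds buffers, index working memory, and write-path overhead into a single multiplier; `estimateBatchBytes` and `memoryBudget` are hypothetical names, and the multiplier is a placeholder.

```go
package main

import "sync"

// estimateBatchBytes approximates the cost of a batch as payload bytes
// (vectors * dim * 4 for float32) scaled by an overhead factor that stands
// in for temporary buffers, index working memory, and write-path overhead.
func estimateBatchBytes(vectors, dim int, overhead float64) int64 {
	payload := int64(vectors) * int64(dim) * 4
	return int64(float64(payload) * overhead)
}

// memoryBudget tracks in-flight reserved bytes against a fixed limit.
type memoryBudget struct {
	mu       sync.Mutex
	limit    int64
	reserved int64
}

func newMemoryBudget(limit int64) *memoryBudget {
	return &memoryBudget{limit: limit}
}

// TryReserve reserves n bytes if they fit in the remaining budget.
// It never blocks; a caller may queue, shrink the batch, or reject on false.
func (b *memoryBudget) TryReserve(n int64) bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.reserved+n > b.limit {
		return false
	}
	b.reserved += n
	return true
}

// Release returns n previously reserved bytes to the budget.
func (b *memoryBudget) Release(n int64) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.reserved -= n
	if b.reserved < 0 {
		b.reserved = 0
	}
}

// Reserved reports the bytes currently reserved.
func (b *memoryBudget) Reserved() int64 {
	b.mu.Lock()
	defer b.mu.Unlock()
	return b.reserved
}
```

A batch would typically call `TryReserve` with its estimate before launching workers and `Release` with the same value in a deferred call.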
Protect search latency from ingestion spikes.
Implement one or more of the following:
- lower write concurrency when search pressure is high
- reserve CPU budget for reads
- limit concurrent HNSW mutation work per collection
- isolate read and write execution lanes
- prefer reads when system is in latency-sensitive mode
Add a mode where write throughput is intentionally limited to preserve search responsiveness.
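As one possible shape for the first option, the write limit can be derived from the number of in-flight searches. The sketch below is deliberately simple and assumes a single high-water threshold; `lanes` and its fields are hypothetical names.

```go
package main

import "sync/atomic"

// lanes tracks in-flight searches and derives a write concurrency limit.
type lanes struct {
	activeSearches  int64 // accessed atomically
	baseWriteLimit  int   // limit used when search pressure is low
	searchHighWater int64 // searches at or above this count as "high pressure"
}

// BeginSearch and EndSearch bracket each search call.
func (l *lanes) BeginSearch() { atomic.AddInt64(&l.activeSearches, 1) }
func (l *lanes) EndSearch()   { atomic.AddInt64(&l.activeSearches, -1) }

// WriteLimit returns how many write jobs may run right now. Under high
// search pressure it drops to one so that writes still make progress.
func (l *lanes) WriteLimit() int {
	if atomic.LoadInt64(&l.activeSearches) >= l.searchHighWater {
		return 1
	}
	return l.baseWriteLimit
}
```

A production version would likely smooth this signal over a time window rather than react to an instantaneous count.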
Provide safe but flexible operating profiles.
Latency mode:
- prioritize search responsiveness
- low write concurrency
- smaller chunks
- conservative memory usage
Balanced mode:
- default mode
- moderate write concurrency
- adaptive chunking
- bounded resource usage
Bulk ingest mode:
- higher write parallelism
- larger chunk sizes
- optimized for offline loading
- acceptable to trade some search latency for throughput
Add a configuration option at DB or collection level:
- ExecutionModeLatency
- ExecutionModeBalanced
- ExecutionModeBulkIngest
Use mode as a policy input for:
- worker count
- queue depth
- memory budget
- read/write fairness
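Using the mode as a policy input could look like a lookup from mode to a small policy struct. In the sketch below the `modePolicy` type and every numeric value are placeholders chosen for illustration; only the three mode names come from this document.

```go
package main

// ExecutionMode mirrors the mode names proposed in this document.
type ExecutionMode string

const (
	ExecutionModeLatency    ExecutionMode = "latency"
	ExecutionModeBalanced   ExecutionMode = "balanced"
	ExecutionModeBulkIngest ExecutionMode = "bulk_ingest"
)

// modePolicy is a hypothetical bundle of the knobs a mode controls.
type modePolicy struct {
	MaxWorkersPerBatch int
	MaxQueueDepth      int
	MemoryBudgetBytes  int64
	ReadPriority       bool
}

// policyFor maps a mode to its policy. Unknown modes fall back to balanced.
// All numbers are placeholders, not tuned recommendations.
func policyFor(mode ExecutionMode) modePolicy {
	switch mode {
	case ExecutionModeLatency:
		return modePolicy{MaxWorkersPerBatch: 1, MaxQueueDepth: 16, MemoryBudgetBytes: 64 << 20, ReadPriority: true}
	case ExecutionModeBulkIngest:
		return modePolicy{MaxWorkersPerBatch: 8, MaxQueueDepth: 256, MemoryBudgetBytes: 1 << 30, ReadPriority: false}
	default:
		return modePolicy{MaxWorkersPerBatch: 2, MaxQueueDepth: 64, MemoryBudgetBytes: 256 << 20, ReadPriority: false}
	}
}
```

Falling back to the balanced policy for unrecognized values keeps a misconfigured mode on the safe default.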
Replace uncontrolled saturation with explicit pressure handling.
When write capacity is exhausted:
- queue operations up to a bounded depth
- or reject immediately with a retryable error
- or block with context cancellation support
Support:
- bounded queue size
- context-aware waiting
- retryable “write pressure” errors
- visibility into queue depth and wait times
Introduce retryable operational errors such as:
- write queue full
- memory budget exceeded
- ingestion throttled
- concurrency limit reached
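The three pressure responses (bounded queueing, immediate rejection, context-aware blocking) and the retryable error set could be sketched as follows. The error variable names, `IsRetryable`, and `writeQueue` are hypothetical; they only mirror the error categories listed above.

```go
package main

import (
	"context"
	"errors"
)

// Hypothetical sentinel errors matching the categories listed above.
var (
	ErrWriteQueueFull       = errors.New("libravdb: write queue full")
	ErrMemoryBudgetExceeded = errors.New("libravdb: memory budget exceeded")
	ErrIngestionThrottled   = errors.New("libravdb: ingestion throttled")
	ErrConcurrencyLimit     = errors.New("libravdb: concurrency limit reached")
)

// IsRetryable reports whether err signals temporary write pressure.
func IsRetryable(err error) bool {
	return errors.Is(err, ErrWriteQueueFull) ||
		errors.Is(err, ErrMemoryBudgetExceeded) ||
		errors.Is(err, ErrIngestionThrottled) ||
		errors.Is(err, ErrConcurrencyLimit)
}

// writeQueue holds pending jobs up to a fixed depth.
type writeQueue struct {
	jobs chan func()
}

func newWriteQueue(maxDepth int) *writeQueue {
	return &writeQueue{jobs: make(chan func(), maxDepth)}
}

// Submit enqueues a job or rejects immediately when the queue is full.
func (q *writeQueue) Submit(job func()) error {
	select {
	case q.jobs <- job:
		return nil
	default:
		return ErrWriteQueueFull
	}
}

// SubmitWait blocks until there is room in the queue or ctx is cancelled.
func (q *writeQueue) SubmitWait(ctx context.Context, job func()) error {
	select {
	case q.jobs <- job:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// Depth reports how many jobs are currently queued.
func (q *writeQueue) Depth() int { return len(q.jobs) }
```

Callers can then branch on `IsRetryable(err)` to decide between backing off and surfacing the failure.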
Make production tuning and autoscaling possible.
- active write jobs
- active batch workers
- queued write jobs
- average queue wait time
- rejected/throttled writes
- estimated reserved write memory
- adaptive chosen concurrency per batch
- chunk size actually used
- read latency during write load
- write throughput by mode
Expose internal state such as:
- scheduler status
- current mode
- active budget usage
- per-collection pressure state
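A few of these signals can be kept as atomic counters and read through a point-in-time snapshot. The sketch below covers only a subset of the metrics listed above; `schedulerMetrics` and `MetricsSnapshot` are hypothetical names, and no particular metrics backend is assumed.

```go
package main

import (
	"sync/atomic"
	"time"
)

// schedulerMetrics holds counters that are updated atomically.
type schedulerMetrics struct {
	activeWriteJobs int64
	rejectedWrites  int64
	queueWaitNanos  int64
	queueWaitCount  int64
}

// MetricsSnapshot is a read-only copy suitable for export or logging.
type MetricsSnapshot struct {
	ActiveWriteJobs int64
	RejectedWrites  int64
	AvgQueueWait    time.Duration
}

func (m *schedulerMetrics) jobStarted()     { atomic.AddInt64(&m.activeWriteJobs, 1) }
func (m *schedulerMetrics) jobFinished()    { atomic.AddInt64(&m.activeWriteJobs, -1) }
func (m *schedulerMetrics) recordRejected() { atomic.AddInt64(&m.rejectedWrites, 1) }

// recordQueueWait accumulates the time one job spent waiting for admission.
func (m *schedulerMetrics) recordQueueWait(d time.Duration) {
	atomic.AddInt64(&m.queueWaitNanos, int64(d))
	atomic.AddInt64(&m.queueWaitCount, 1)
}

// Snapshot reads each counter atomically. Counters are read one at a time,
// so the snapshot is approximate under concurrent updates.
func (m *schedulerMetrics) Snapshot() MetricsSnapshot {
	s := MetricsSnapshot{
		ActiveWriteJobs: atomic.LoadInt64(&m.activeWriteJobs),
		RejectedWrites:  atomic.LoadInt64(&m.rejectedWrites),
	}
	if n := atomic.LoadInt64(&m.queueWaitCount); n > 0 {
		s.AvgQueueWait = time.Duration(atomic.LoadInt64(&m.queueWaitNanos) / n)
	}
	return s
}
```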
Ensure scheduling changes do not alter behavior.
- ordinal assignment semantics
- storage-first persistence behavior
- HNSW/provider consistency
- delete/update semantics
- rollback behavior on partial failures
- public API semantics for insert/update/delete/search
Add tests for:
- queued batch execution
- cancellation while queued
- adaptive worker reduction
- memory-budget throttling
- concurrent writers plus readers
- no lost writes
- no duplicate ordinals
- stable reopen/rebuild behavior under throttled ingestion
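The "no lost writes" and "no duplicate ordinals" checks can be prototyped against a stand-in store before wiring them to a real collection. In the sketch below `fakeStore` is a toy that only assigns ordinals; it is not libravdb's storage layer, and `checkConcurrentInserts` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"sync"
)

// fakeStore is a stand-in for a collection: it only assigns ordinals.
type fakeStore struct {
	mu   sync.Mutex
	next uint64
}

func (s *fakeStore) insert() uint64 {
	s.mu.Lock()
	defer s.mu.Unlock()
	o := s.next
	s.next++
	return o
}

// checkConcurrentInserts runs several writers through a bounded permit pool
// and verifies that every insert produced a distinct ordinal.
func checkConcurrentInserts(writers, perWriter, maxConcurrent int) error {
	store := &fakeStore{}
	permits := make(chan struct{}, maxConcurrent)
	ordinals := make(chan uint64, writers*perWriter)

	var wg sync.WaitGroup
	for w := 0; w < writers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perWriter; i++ {
				permits <- struct{}{} // acquire a write permit
				ordinals <- store.insert()
				<-permits // release the permit
			}
		}()
	}
	wg.Wait()
	close(ordinals)

	seen := make(map[uint64]bool)
	for o := range ordinals {
		if seen[o] {
			return fmt.Errorf("duplicate ordinal %d", o)
		}
		seen[o] = true
	}
	if len(seen) != writers*perWriter {
		return fmt.Errorf("lost writes: got %d, want %d", len(seen), writers*perWriter)
	}
	return nil
}
```

In a real test suite the same check would run with the race detector enabled and with the scheduler throttled to its lowest limits.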
Ensure streaming ingestion uses the same safety model.
Route StreamingBatchInsert through the same scheduler and memory budget system.
Streaming workers should:
- respect write permits
- respect memory reservations
- adapt flush behavior under pressure
- downgrade concurrency instead of amplifying pressure
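Adapting flush behavior under pressure could follow an additive-increase, multiplicative-decrease rule: halve the flush size when the scheduler reports pressure, and grow it by a fixed step otherwise. This is one possible policy, not the document's prescribed one; `nextFlushSize` and its parameters are hypothetical.

```go
package main

// nextFlushSize returns the flush size to use for the next streaming flush.
// Under pressure the size is halved; otherwise it grows by minSize.
// The result is always clamped to [minSize, maxSize].
func nextFlushSize(current, minSize, maxSize int, underPressure bool) int {
	if underPressure {
		current /= 2
	} else {
		current += minSize
	}
	if current < minSize {
		current = minSize
	}
	if current > maxSize {
		current = maxSize
	}
	return current
}
```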
Add optional configuration structures such as:
type SchedulerConfig struct {
    MaxConcurrentWrites    int
    MaxConcurrentBatches   int
    MaxWriteQueueDepth     int
    MaxWriteMemoryBytes    int64
    DefaultExecutionMode   ExecutionMode
    EnableAdaptiveBatching bool
    EnableReadPriority     bool
}

type ExecutionMode string

const (
    ExecutionModeLatency    ExecutionMode = "latency"
    ExecutionModeBalanced   ExecutionMode = "balanced"
    ExecutionModeBulkIngest ExecutionMode = "bulk_ingest"
)

Retain:
- ChunkSize
- MaxConcurrency
But redefine behavior:
- MaxConcurrency is an upper bound
- actual concurrency is chosen by the scheduler
- ChunkSize may be reduced internally when needed for memory safety
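The redefined behavior amounts to resolving the caller's options into an effective plan. The sketch below uses a local stand-in, `batchOptions`, that mirrors only the two fields named here; `batchPlan` and `resolvePlan` are hypothetical, and the memory model (each worker holds one chunk in flight) is an assumption.

```go
package main

// batchOptions is a local stand-in with only the two fields discussed here.
type batchOptions struct {
	ChunkSize      int
	MaxConcurrency int
}

// batchPlan is the effective configuration the scheduler would execute.
type batchPlan struct {
	ChunkSize int
	Workers   int
}

// resolvePlan clamps the scheduler's worker choice to the caller's upper
// bound and shrinks the chunk size so that workers * chunk * bytesPerVector
// fits in availableBytes.
func resolvePlan(opts batchOptions, schedulerWorkers int, bytesPerVector, availableBytes int64) batchPlan {
	workers := schedulerWorkers
	if opts.MaxConcurrency > 0 && opts.MaxConcurrency < workers {
		workers = opts.MaxConcurrency
	}
	if workers < 1 {
		workers = 1
	}

	chunk := opts.ChunkSize
	if chunk < 1 {
		chunk = 1
	}
	if bytesPerVector > 0 {
		maxChunk := availableBytes / (bytesPerVector * int64(workers))
		if maxChunk < 1 {
			maxChunk = 1 // always allow progress, one vector at a time
		}
		if int64(chunk) > maxChunk {
			chunk = int(maxChunk)
		}
	}
	return batchPlan{ChunkSize: chunk, Workers: workers}
}
```

Note that the caller's values are never exceeded: the plan can only lower them.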
Add scheduler infrastructure behind current APIs.
Route batch insert through scheduler with conservative defaults.
Add adaptive concurrency and memory budgeting.
Integrate streaming and other write paths.
Add read-priority mode.
Expose metrics and operational introspection.
Tune defaults based on benchmark and production-style load tests.
- concurrent inserts preserve correctness
- queued writes remain ordered and durable where required
- updates and deletes behave correctly under scheduler pressure
- reopen/rebuild remains correct after throttled ingestion
- single-writer throughput remains strong
- balanced mode prevents uncontrolled CPU saturation
- latency mode protects read performance under write load
- bulk_ingest mode still scales well for offline load jobs
- many concurrent batch writers
- mixed read/write workloads
- forced memory budget exhaustion
- queue overflow and cancellation
- shard-style parallel deployments
The implementation is successful when:
- libravdb remains correct under concurrent write pressure
- CPU and memory usage are bounded and predictable
- search latency is protected in latency-sensitive mode
- bulk ingestion still performs well in high-throughput mode
- multiple callers cannot accidentally multiply concurrency into instability
- production operators can observe and tune the system without wrapping the library externally
For first rollout, use:
- execution mode: balanced
- low-to-moderate write concurrency
- adaptive batch concurrency enabled
- bounded write queue
- memory-budget enforcement enabled
- BatchOptions.MaxConcurrency treated as an advisory upper bound
This gives a safer core immediately while preserving room for higher-throughput bulk modes when explicitly requested.