Maybe it's there, maybe it's not!
This toy project is a learning tool for the people I mentor at my company, designed to show how databases work under the hood.
SchrödingerDB is a simple key-value store using a Hash Index.
Like most client-server databases, SchrödingerDB is made of the following components:
block-beta
columns 1
block
columns 1
Transport
space
Transport --> QueryProcessor
space
QueryProcessor --> ExecutionEngine
space
ExecutionEngine --> StorageEngine
end
The DB must listen on a TCP port to handle Redis client requests.
No cluster communication is planned for now.
The query processor is responsible for parsing, validating and interpreting queries. No RBAC will be implemented, so no access checks need to be performed afterward.
The DB will use RESP (Redis serialization protocol) because this protocol is well documented, and we will be able to use the standard Redis CLI (redis-cli) to test the DB.
Regarding the version, RESP2 is enough for our limited use case.
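To make the wire format concrete, here is a minimal sketch of how a RESP2 command could be decoded and a reply encoded. In RESP2, clients send each command as an array of bulk strings, and a missing value is returned as the null bulk string `$-1\r\n`. The class name `Resp2` and its methods are hypothetical, chosen for illustration; they are not part of the project.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of the project): handles a small slice of RESP2.
final class Resp2 {

    // Reads one client command, i.e. an array of bulk strings.
    static List<String> readCommand(InputStream in) throws IOException {
        String header = readLine(in);
        if (!header.startsWith("*")) throw new IOException("expected array, got: " + header);
        int count = Integer.parseInt(header.substring(1));
        if (count < 0) throw new IOException("a null array is not a valid command");
        List<String> parts = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            String bulk = readLine(in);
            if (!bulk.startsWith("$")) throw new IOException("expected bulk string, got: " + bulk);
            int length = Integer.parseInt(bulk.substring(1));
            if (length < 0) throw new IOException("null bulk string inside a command");
            byte[] payload = in.readNBytes(length); // the length prefix counts bytes, not chars
            if (payload.length != length || in.read() != '\r' || in.read() != '\n') {
                throw new IOException("truncated or malformed bulk string");
            }
            parts.add(new String(payload, StandardCharsets.UTF_8));
        }
        return parts;
    }

    // Encodes a reply value as a bulk string; null becomes the RESP2 null bulk string.
    static byte[] bulkString(byte[] value) {
        if (value == null) return "$-1\r\n".getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] head = ("$" + value.length + "\r\n").getBytes(StandardCharsets.US_ASCII);
        out.write(head, 0, head.length);
        out.write(value, 0, value.length);
        out.write('\r');
        out.write('\n');
        return out.toByteArray();
    }

    // Reads bytes up to the next CRLF and returns them without the terminator.
    private static String readLine(InputStream in) throws IOException {
        StringBuilder line = new StringBuilder();
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\r') {
                if (in.read() != '\n') throw new IOException("CR not followed by LF");
                return line.toString();
            }
            line.append((char) b);
        }
        throw new IOException("unexpected end of stream");
    }

    public static void main(String[] args) throws IOException {
        byte[] wire = "*2\r\n$3\r\nGET\r\n$3\r\nfoo\r\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(readCommand(new ByteArrayInputStream(wire)));
    }
}
```

The sketch reads from a plain `InputStream`, so the same code could be fed from a socket or from a byte array in a unit test.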
The DB should accept a limited subset of Redis/Valkey queries. A query planner is not required as the DB only supports point queries based on a single index.
No query optimizer is required as queries will be basic and straightforward.
No remote execution is planned for now.
Constraint: The storage engine must be pluggable. Even if there is a single canonical implementation, a pluggable architecture allows developers to mock the storage engine during tests (spy, fault injection, ...).
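One possible shape for this constraint is sketched below: the execution engine depends only on an interface, and tests swap in an in-memory implementation or a wrapper that injects faults. All names here (`StorageEngine`, `InMemoryStorageEngine`, `FaultInjectingStorageEngine`, `failNextPut`) are assumptions made for illustration, not the project's actual API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical contract the execution engine would depend on.
interface StorageEngine {
    void put(String key, byte[] value);
    Optional<byte[]> get(String key);
}

// Trivial implementation backed by a map, handy as a test double.
final class InMemoryStorageEngine implements StorageEngine {
    private final Map<String, byte[]> data = new HashMap<>();

    @Override
    public void put(String key, byte[] value) {
        data.put(key, value);
    }

    @Override
    public Optional<byte[]> get(String key) {
        return Optional.ofNullable(data.get(key));
    }
}

// Wraps any engine and fails the next write on demand (fault injection).
final class FaultInjectingStorageEngine implements StorageEngine {
    private final StorageEngine delegate;
    private boolean failNextPut;

    FaultInjectingStorageEngine(StorageEngine delegate) {
        this.delegate = delegate;
    }

    // Arms a single simulated I/O failure for the next put call.
    void failNextPut() {
        failNextPut = true;
    }

    @Override
    public void put(String key, byte[] value) {
        if (failNextPut) {
            failNextPut = false;
            throw new UncheckedIOException(new IOException("injected write failure"));
        }
        delegate.put(key, value);
    }

    @Override
    public Optional<byte[]> get(String key) {
        return delegate.get(key);
    }
}
```

Because the wrapper takes any `StorageEngine`, the same fault injection could also be applied to the real on-disk implementation.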
Not required for now. If we want to implement a transaction-capable database, then using FoundationDB seems to be the most interesting option.
We may want to implement a limited concurrency control mechanism.
This is the component we will focus on. It organizes data on disk and manages data retrieval.
This component should maintain a cache of data pages.
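As an illustration, a small least-recently-used (LRU) page cache can be built on `LinkedHashMap` in access order. LRU is an assumption here, since no eviction policy has been chosen; the class name `PageCache` and the use of a `Long` page number as key are also hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical LRU cache mapping a page number to the page bytes.
final class PageCache extends LinkedHashMap<Long, byte[]> {
    private final int capacity;

    PageCache(int capacity) {
        super(16, 0.75f, true); // true = iterate in access order, oldest access first
        this.capacity = capacity;
    }

    // Called by LinkedHashMap after each insertion; returning true evicts the eldest entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
        return size() > capacity;
    }
}
```

This sketch is not thread-safe; a shared cache would need external synchronization.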
A basic recovery manager should be implemented; not all edge cases will be covered.
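A basic recovery step could be to rebuild the in-memory hash index by scanning the data file at startup. The sketch below assumes a record layout of `[int keyLength][int valueLength][key bytes][value bytes]` with big-endian integers; this layout and the class name `IndexRecovery` are assumptions made for the example, not the project's on-disk format. A record cut short by a crash is simply ignored.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical recovery helper: replays the log to map each key to its latest record offset.
final class IndexRecovery {

    static Map<String, Long> rebuildIndex(Path logFile) throws IOException {
        Map<String, Long> index = new HashMap<>();
        long offset = 0;
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(logFile)))) {
            while (true) {
                int keyLength = in.readInt();
                int valueLength = in.readInt();
                if (keyLength < 0 || valueLength < 0) break; // corrupt header: stop scanning
                byte[] key = new byte[keyLength];
                in.readFully(key);
                in.readFully(new byte[valueLength]); // value is read only to validate the record
                // Later records overwrite earlier ones, so the newest offset wins.
                index.put(new String(key, StandardCharsets.UTF_8), offset);
                offset += 8L + keyLength + valueLength;
            }
        } catch (EOFException endOfLog) {
            // Clean end of file or a torn final record: keep what was indexed so far.
        }
        return index;
    }
}
```

Edge cases such as corrupted bytes in the middle of the file are deliberately not handled; a checksum per record would be one way to detect them.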
The database uses append-only data files, meaning that once a key-value pair is written to the database, the data file is never modified. Hence, challenges like update and deletion arise.
flowchart TD
A([Client]) -->|APPEND key value| B[(Database)]
B --> C{UPSERT}
C --> |O_APPEND| D[fa:fa-file LogFile]
C --> |ADD key offset| E[HashTable]
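The write path in the flowchart above can be sketched as follows: each upsert appends a record to the log file and stores the record's offset in a hash table. The class name `AppendOnlyLog`, the record layout `[int keyLength][int valueLength][key][value]`, and the single-writer assumption are all illustrative choices, not the project's actual implementation. The read path is left out on purpose.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;

// Hypothetical write path: append-only log file plus an in-memory hash index.
final class AppendOnlyLog implements AutoCloseable {
    private final FileChannel channel;
    private final Map<String, Long> index = new HashMap<>();

    AppendOnlyLog(Path file) throws IOException {
        // APPEND makes every write go to the end of the file (O_APPEND semantics).
        this.channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND);
    }

    // Appends one record and points the index at it; assumes a single writer thread.
    long upsert(String key, byte[] value) throws IOException {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        ByteBuffer record = ByteBuffer.allocate(8 + keyBytes.length + value.length);
        record.putInt(keyBytes.length).putInt(value.length).put(keyBytes).put(value).flip();
        long offset = channel.size(); // the new record starts at the current end of file
        while (record.hasRemaining()) {
            channel.write(record);
        }
        index.put(key, offset); // an update only moves the pointer; old bytes stay on disk
        return offset;
    }

    // Returns the offset of the latest record for the key, or null if the key is unknown.
    Long offsetOf(String key) {
        return index.get(key);
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("segment", ".log");
        try (AppendOnlyLog log = new AppendOnlyLog(file)) {
            log.upsert("cat", "alive".getBytes(StandardCharsets.UTF_8));
            log.upsert("cat", "dead".getBytes(StandardCharsets.UTF_8));
            System.out.println("latest offset for cat: " + log.offsetOf("cat"));
            System.out.println("file size in bytes: " + Files.size(file));
        }
    }
}
```

Updating a key twice leaves both records in the file, which is why the file keeps growing and why deletion needs its own mechanism. The index in this sketch starts empty even when the file already contains data.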
Now, what if we use multiple files (segments) instead of one?
- What are the challenges?
- How can we solve them?
flowchart TD
A([Client]) -->|GET key| B[(Database)]
Exercise 1: complete the GET flowchart
- Append-only data files
- Indexing
- Log file for durability
- Query language
- Partitioning
- Sharding
- Transactions
- Replication
- Clustering
To run SchrödingerDB, you need Maven and a Java SDK (JDK) installed:
mvn clean compile
mvn test
mvn package
java -jar target/schrodingerdb-<Version>.jar