Commit 488e2ab

Architecture Documentation (#42)
1 parent 4758944 commit 488e2ab

533 files changed, +15242 -938 lines changed

docs/.gitignore

-13
This file was deleted.
File renamed without changes.

docs/archetypes/default.md

-1
@@ -2,5 +2,4 @@
 date = '{{ .Date }}'
 draft = true
 title = '{{ replace .File.ContentBaseName "-" " " | title }}'
-disableTOC = false
 +++

docs/content/_index.md

+64
@@ -0,0 +1,64 @@
---
title: HonuDB Documentation
weight: 0
geekdocNav: false
geekdocAlign: center
geekdocAnchor: false
geekdocBreadcrumb: true
---

<!-- markdownlint-capture -->
<!-- markdownlint-disable MD033 -->

<span class="badge-placeholder">[![Build Status](https://github.com/rotationalio/honu/actions/workflows/tests.yaml/badge.svg)](https://github.com/rotationalio/honu/actions/workflows/tests.yaml)</span>
<span class="badge-placeholder">[![GitHub Release](https://img.shields.io/github/v/release/rotationalio/honu)](https://github.com/rotationalio/honu/releases/latest)</span>
<span class="badge-placeholder">[![GitHub Contributors](https://img.shields.io/github/contributors/rotationalio/honu)](https://github.com/rotationalio/honu/graphs/contributors)</span>
<span class="badge-placeholder">[![License: BSD3](https://img.shields.io/github/license/rotationalio/honu)](https://github.com/rotationalio/honu/blob/main/LICENSE)</span>

<!-- markdownlint-restore -->

HonuDB is the first AI-native distributed database, intended for AI developers who need to manage multi-modal datasets with snapshots that map to models and model training. A replicated document database, HonuDB provides rapid data ingestion and collection management for different MIME types including JSON, Parquet, images, video, and more. With privacy in mind from the start, HonuDB has data governance features such as provenance and lineage tracking (including by geographic location) and fine-grained access controls. Data scientists and machine learning engineers can rely on Honu to manage small to extremely large datasets replicated over multiple geographic areas.

{{< button size="large" relref="quickstart/" >}}Get Started Now!{{< /button >}}

## Feature overview

{{< columns >}}

### Collections & Datasets

Collections allow you to manage related data together; datasets are snapshots of collections that indicate exactly what data was used to train a model.

<--->

### Full Versioning

All objects in the database are fully versioned, so an update cannot change a dataset as seen from a model's perspective.

<--->

### Provenance Awareness

Regions and unique writers are tracked across all updates so you can monitor how data is changing in your system and implement privacy controls.

{{< /columns >}}
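
To make the relationship between full versioning and datasets concrete, here is a minimal Python sketch. It is not HonuDB's implementation or API: the `VersionedStore` class and its `put`, `get`, and `snapshot` methods are hypothetical names invented for illustration. The sketch assumes only what the overview states, namely that every write creates a new version and that a dataset pins the exact versions it was created from.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class VersionedStore:
    """Toy in-memory store where every put appends a new version (hypothetical API)."""

    # key -> list of values; list index i holds version i + 1
    _versions: dict[str, list[bytes]] = field(default_factory=dict)

    def put(self, key: str, value: bytes) -> int:
        """Append a new version of key and return its version number."""
        history = self._versions.setdefault(key, [])
        history.append(value)
        return len(history)

    def get(self, key: str, version: int | None = None) -> bytes:
        """Return the requested version of key, or the latest if version is None."""
        history = self._versions[key]
        if version is None:
            return history[-1]
        return history[version - 1]

    def snapshot(self, keys: list[str]) -> dict[str, int]:
        """Pin the current version of each key; the mapping plays the role of a dataset."""
        return {key: len(self._versions[key]) for key in keys}


store = VersionedStore()
store.put("images/cat.png", b"v1 bytes")

# The dataset records which version of each object it contains.
dataset = store.snapshot(["images/cat.png"])

# A later update creates version 2 but does not change what the dataset sees.
store.put("images/cat.png", b"v2 bytes")
training_view = {key: store.get(key, version) for key, version in dataset.items()}
```

Because the snapshot stores version numbers rather than copies of the data, reading through the dataset returns the original bytes even after the object has been updated.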

{{< columns >}}

### Smart Replication

Honu uses reinforcement learning anti-entropy to maximize consistency and scale replication to hundreds of nodes without increasing your cloud costs.

<--->

### Fine-Grain Access Control

Collections, objects, and datasets have a hierarchical permission model designed specifically for AI workloads, including training and inferencing permissions.

<--->

### Model Context Protocol

Honu supports the [Model Context Protocol](https://modelcontextprotocol.io/introduction) so that you can directly add data to your LLM contexts using semantic similarity indexes.

{{< /columns >}}
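
The hierarchical permission model can be pictured with a small Python sketch. Everything here is an assumption made for illustration: the `Permission` flags, the `AccessPolicy` class, the slash-separated resource paths, and the rule that a grant on a collection is inherited by the objects beneath it. None of it is taken from HonuDB's code; the overview only states that permissions are hierarchical and include training and inferencing.

```python
from __future__ import annotations

from enum import Flag, auto


class Permission(Flag):
    """Hypothetical permission bits, including AI-specific ones."""

    NONE = 0
    READ = auto()
    WRITE = auto()
    TRAIN = auto()
    INFER = auto()


class AccessPolicy:
    """Toy policy: a grant on a parent path applies to everything beneath it."""

    def __init__(self) -> None:
        self._grants: dict[tuple[str, str], Permission] = {}

    def grant(self, user: str, resource: str, perms: Permission) -> None:
        key = (user, resource.strip("/"))
        self._grants[key] = self._grants.get(key, Permission.NONE) | perms

    def allowed(self, user: str, resource: str, perm: Permission) -> bool:
        # Accumulate grants from the top-level collection down to the resource.
        parts = resource.strip("/").split("/")
        effective = Permission.NONE
        for depth in range(1, len(parts) + 1):
            ancestor = "/".join(parts[:depth])
            effective |= self._grants.get((user, ancestor), Permission.NONE)
        return (effective & perm) == perm


policy = AccessPolicy()
policy.grant("alice", "reviews", Permission.READ | Permission.TRAIN)

can_train = policy.allowed("alice", "reviews/doc-1", Permission.TRAIN)
can_infer = policy.allowed("alice", "reviews/doc-1", Permission.INFER)
```

Splitting the path on `/` rather than matching string prefixes keeps a grant on `reviews` from leaking to an unrelated collection such as `reviews-archive`.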

docs/content/architecture/_index.md

+24
@@ -0,0 +1,24 @@
---
title: Architecture
weight: 500
---

This section is primarily for contributors and developers of the HonuDB replicated database. In it, we describe the design principles of the database system, how features are implemented and integrated, and the path towards creating complex systems built from many simple components.

**Design Goal**<br />
The goal of the database is to provide scalable data retrieval both in terms of the number of nodes (hundreds of nodes) and the amount of data (hundreds of terabytes). In addition to scale, this database provides data access controls, privacy and provenance, and other security-related features. Finally, HonuDB provides artifacts and features to support the training and inferencing model lifecycle. In short, HonuDB is a distributed data governance database for machine learning and artificial intelligence workloads.

**On This Page**

{{< toc >}}

## System Diagram



## Key Terms

<dl>
<dt>Engine</dt>
<dd>A database engine is a component that manages how data is stored and cached on disk.</dd>
</dl>
@@ -0,0 +1,14 @@
---
title: Data Model
weight: 10
resources:
  - name: keylayout
    src: figures/keylayout.svg
    title: Byte Layout of Keys
---

The database engine manages the data as key/value pairs on disk such that the keys are ordered in byte-sort order for fast iteration. Generally speaking, the engine uses an LSM-tree (log-structured merge tree) or a similar structure for fast appends to the database with routine compaction and merging.

The layout of the `keys` and of the data objects is important to understand.

{{< img name="keylayout" size="large" lazy=false >}}
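
The following Python sketch illustrates why byte-sorted keys make iteration fast. It is not HonuDB's key layout, which is defined by the figure above: the `make_key` layout (collection, object ID, and a big-endian version separated by zero bytes) is a hypothetical stand-in, and `SortedKV` is a plain sorted list rather than an LSM-tree. The point is only that when keys are encoded so that byte order matches logical order, all versions of an object sit next to each other and can be read with one prefix scan.

```python
import bisect
import struct


def make_key(collection: bytes, object_id: bytes, version: int) -> bytes:
    """Hypothetical layout: collection | 0x00 | object_id | 0x00 | 8-byte big-endian version.

    Assumes collection and object_id contain no zero bytes. Big-endian encoding
    makes the byte order of the version suffix match its numeric order.
    """
    return collection + b"\x00" + object_id + b"\x00" + struct.pack(">Q", version)


class SortedKV:
    """Toy ordered key/value store backed by a sorted list of keys."""

    def __init__(self) -> None:
        self._keys: list[bytes] = []
        self._values: dict[bytes, bytes] = {}

    def put(self, key: bytes, value: bytes) -> None:
        if key not in self._values:
            bisect.insort(self._keys, key)  # keep keys in byte-sort order
        self._values[key] = value

    def scan(self, prefix: bytes):
        """Yield (key, value) pairs whose key starts with prefix, in key order."""
        start = bisect.bisect_left(self._keys, prefix)
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            yield key, self._values[key]


kv = SortedKV()
# Insert versions out of order, plus an unrelated object.
for version in (256, 1, 2):
    kv.put(make_key(b"docs", b"a", version), b"payload-%d" % version)
kv.put(make_key(b"docs", b"ab", 1), b"other object")

prefix = b"docs\x00a\x00"
versions = [struct.unpack(">Q", key[-8:])[0] for key, _ in kv.scan(prefix)]
```

With a little-endian version suffix, version 256 would sort before version 1, so the scan would no longer return versions in order; that is the design reason for the big-endian choice in this sketch.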

docs/content/architecture/data-model/figures/keylayout.svg

+2

docs/content/contributing/_index.md

+20
@@ -0,0 +1,20 @@
---
title: Contributing
weight: 450
---

HonuDB is an open source project that is supported by a community who will gratefully accept any contributions you might make to the project. Large or small, any contribution makes a difference! While Rotational Labs funds the development of HonuDB, we have committed to ensuring that HonuDB always remains open source; please rest assured that your contributions build the community and are not simply free work.

Database development is notoriously difficult -- but hopefully contributing to database code is not. For example, the implementation of an algorithm or component can often be accomplished without affecting the rest of the database code. If you're not sure where to start, look for `TODO` comments in the code or [get in touch with us](https://rotational.io/contact)!

Beyond the code, there are many ways to contribute:

- Submit a feature request or bug report on our [GitHub Issues](https://github.com/rotationalio/honu/issues).
- Add to the documentation or help us with our website, [honudb.dev](https://honudb.dev).
- Write a blog post, tweet, or share our project with others.
- Star our [repository](https://github.com/rotationalio/honu) on GitHub!
- Translate our documentation into another language.
- Write unit or integration tests for Honu.
- Tell us about how you're using HonuDB!

There are lots of ways to get involved, and we'd love to have you be a part of our community.

docs/content/en/_index.md

-9
This file was deleted.
@@ -0,0 +1,20 @@
---
title: Introducing HonuDB
type: posts
date: 2025-02-27
tags:
  - Updates
  - Informational
---

Why does the world need yet another database? In short, because no one database can support all use cases all the time. HonuDB is for machine learning engineers, a group that needs data management more than most but is often overlooked as users of database management systems.

<!--more-->

Instead of creating a single-purpose tool like a vector database, or augmenting an existing database with vector capabilities like Elastic, HonuDB is focused on the workflow of AI and model development **from training datasets to inferencing context**.

The AI/ML workflow has specialized features that are not generally found together in a single system. To support reproducibility, datasets must be versioned and snapshotted so that training datasets can be mapped to their models, and datasets are a first-class access pattern in HonuDB. We also understand how important _privacy_ and _data governance_ are, especially when it comes to AI -- so HonuDB is built to support provenance-based investigations and geographic access controls. ML datasets range from the very small to the very large, so HonuDB can operate as a single node or scale to replicate to hundreds of nodes across multiple geographic regions. Finally, vector queries and model context protocols are needed for inferencing, and Honu is ready to support these protocols for RAG workflows.

HonuDB not only supports AI/ML workflows but uses ML under the hood to improve its performance. Based on the academic papers [Anti-Entropy Bandits for Geo-Replicated Consistency](https://ieeexplore.ieee.org/document/8416408) and [Bilateral Anti-Entropy for Eventual Consistency](https://dl.acm.org/doi/10.1145/3517209.3524083), HonuDB uses reinforcement learning to optimize replication and consistency in the wide area!
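
One generic way to apply reinforcement learning to anti-entropy is to treat the choice of synchronization peer as a multi-armed bandit problem. The Python sketch below shows an epsilon-greedy selector under that assumption. It is not HonuDB's code and does not reproduce the reward functions from the papers: the `PeerSelector` class, the peer names, and the fixed per-peer rewards are all hypothetical and exist only to illustrate the idea.

```python
import random


class PeerSelector:
    """Epsilon-greedy bandit over replication peers (illustrative only)."""

    def __init__(self, peers, epsilon=0.1, seed=None):
        self.peers = list(peers)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {peer: 0 for peer in self.peers}
        self.values = {peer: 0.0 for peer in self.peers}  # running mean reward

    def select(self):
        """Explore a random peer with probability epsilon, otherwise exploit the best one."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.peers)
        return max(self.peers, key=lambda peer: self.values[peer])

    def update(self, peer, reward):
        """Fold the observed reward into the peer's running mean."""
        self.counts[peer] += 1
        n = self.counts[peer]
        self.values[peer] += (reward - self.values[peer]) / n


# Hypothetical rewards: how useful a synchronization with each peer turns out to be,
# for example the fraction of exchanged objects that were actually new.
simulated_reward = {"ap-south": 0.0, "eu-west": 0.2, "us-east": 1.0}

selector = PeerSelector(simulated_reward, epsilon=0.2, seed=42)
for _ in range(500):
    peer = selector.select()
    selector.update(peer, simulated_reward[peer])

best = max(selector.peers, key=lambda p: selector.values[p])
```

After enough rounds the selector spends most of its synchronizations on the peer that yields the highest reward, while the exploration rate keeps it checking the others in case conditions change.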

Always open-source, we hope HonuDB will accelerate your projects and that you'll enjoy using it as much as we do.

docs/content/posts/_index.md

+6
@@ -0,0 +1,6 @@
---
title: News
type: posts
weight: 10
geekdocHidden: true
---

docs/content/quickstart/_index.md

+9
@@ -0,0 +1,9 @@
---
title: Getting Started
weight: -20
---

{{< hint type="caution" icon="gdoc_fire" title="Coming Soon" >}}
**We're still in development**\
HonuDB is still in an Alpha development phase and is not ready for prime time use. We're excited that you're excited to try it out; if you're willing to be a Beta tester, please get in [contact with us](https://rotational.io/contact/)!
{{< /hint >}}

docs/data/.gitkeep

Whitespace-only changes.

docs/data/en/footer.yaml

-2
This file was deleted.

docs/data/menu/extra.yaml

+6
@@ -0,0 +1,6 @@
---
header:
  - name: GitHub
    ref: https://github.com/rotationalio/honu
    icon: gdoc_github
    external: true

docs/data/menu/more.yaml

+13
@@ -0,0 +1,13 @@
---
more:
  - name: News
    ref: "/posts"
    icon: "gdoc_gitea"
  - name: Releases
    ref: "https://github.com/rotationalio/honu/releases"
    external: true
    icon: "gdoc_download"
  - name: View Source
    ref: "https://github.com/rotationalio/honu"
    external: true
    icon: "gdoc_github"

docs/hugo.toml

+147
@@ -0,0 +1,147 @@
baseURL = 'https://honudb.dev/'
languageCode = 'en-us'
title = 'HonuDB'
theme = "hugo-geekdoc"

pluralizeListTitles = false

# Geekdoc required configuration
# Ensures well formatted code blocks
pygmentsUseClasses = true
pygmentsCodeFences = true
disablePathToLower = true
enableGitInfo = true

# Required if you want to render robots.txt template
enableRobotsTXT = true

# Needed for mermaid shortcodes
[markup]
[markup.goldmark.renderer]
# Needed for mermaid shortcode or when nesting shortcodes
unsafe = true
[markup.tableOfContents]
startLevel = 1
endLevel = 9

[taxonomies]
tag = "tags"

[params]
description = "HonuDB is the world's first AI native distributed database that provides data support for both training and inferencing workloads. Designed to scale to hundreds of nodes, HonuDB provides machine learning engineers practical dataset management, a model context protocol, vector search, and annotation references."
keywords = [
  "HonuDB",
  "database",
  "distributed database",
  "vector database",
  "AI native database",
  "dataset management",
  "model context protocol",
  "vector search",
  "data governance",
  "provenance",
  "lineage",
  "reinforcement learning",
  "anti-entropy replication"
]

images = [
  "socialmedia.png"
]

# (Optional, default 6) Set how many table of contents levels to be shown on page.
# Use false to hide ToC, note that 0 will default to 6 (https://gohugo.io/functions/default/)
# You can also specify this parameter per page in front matter.
geekdocToC = 3

# (Optional, default static/brand.svg) Set the path to a logo for the Geekdoc
# relative to your 'static/' folder.
geekdocLogo = "logo.png"

# (Optional, default false) Render menu from data file in 'data/menu/main.yaml'.
# See also https://geekdocs.de/usage/menus/#bundle-menu.
geekdocMenuBundle = false

# (Optional, default false) Collapse all menu entries, can not be overwritten
# per page if enabled. Can be enabled per page via 'geekdocCollapseSection'.
geekdocCollapseAllSections = true

# (Optional, default true) Show page navigation links at the bottom of each docs page.
geekdocNextPrev = true

# (Optional, default true) Show a breadcrumb navigation bar at the top of each docs page.
# You can also specify this parameter per page in front matter.
geekdocBreadcrumb = true

# (Optional, default none) Set source repository location. Used for 'Edit page' links.
# You can also specify this parameter per page in front matter.
geekdocRepo = "https://github.com/rotationalio/honu"

# (Optional, default none) Enable 'Edit page' links. Requires 'geekdocRepo' param
# and the path must point to the parent directory of the 'content' folder.
# You can also specify this parameter per page in front matter.
geekdocEditPath = "edit/main/docs"

# (Optional, default false) Show last modification date of the page in the header.
# Keep in mind that last modification date works best if `enableGitInfo` is set to true.
geekdocPageLastmod = false

# (Optional, default true) Enables search function with flexsearch.
# Index is built on the fly and might slow down your website.
geekdocSearch = true

# (Optional, default false) Display search results with the parent folder as prefix. This
# option allows you to distinguish between files with the same name in different folders.
# NOTE: This parameter only applies when 'geekdocSearch = true'.
geekdocSearchShowParent = true

# (Optional, default none) Add a link to your Legal Notice page to the site footer.
# It can be either a remote url or a local file path relative to your content directory.
geekdocLegalNotice = "https://rotational.io/terms"

# (Optional, default none) Add a link to your Privacy Policy page to the site footer.
# It can be either a remote url or a local file path relative to your content directory.
geekdocPrivacyPolicy = "https://rotational.io/privacy"

# (Optional, default true) Add an anchor link to headlines.
geekdocAnchor = true

# (Optional, default true) Copy anchor url to clipboard on click.
geekdocAnchorCopy = true

# (Optional, default true) Enable or disable image lazy loading for images rendered
# by the 'img' shortcode.
geekdocImageLazyLoading = true

# (Optional, default false) Set HTML <base> to .Site.Home.Permalink if enabled. It might be required
# if a subdirectory is used within Hugo's BaseURL.
# See https://developer.mozilla.org/de/docs/Web/HTML/Element/base.
geekdocOverwriteHTMLBase = false

# (Optional, default true) Enable or disable the JavaScript based color theme toggle switch. The CSS based
# user preference mode still works.
geekdocDarkModeToggle = true

# (Optional, default false) Auto-decrease brightness of images and add a slight grayscale to avoid
# bright spots while using the dark mode.
geekdocDarkModeDim = false

# (Optional, default false) Enforce code blocks to always use the dark color theme.
geekdocDarkModeCode = false

# (Optional, default true) Display a "Back to top" link in the site footer.
geekdocBackToTop = true

# (Optional, default false) Enable or disable adding tags for post pages automatically to the navigation sidebar.
geekdocTagsToMenu = true

# (Optional, default 'title') Configure how to sort file-tree menu entries. Possible options are 'title', 'linktitle',
# 'date', 'publishdate', 'expirydate' or 'lastmod'. Every option can be used with a reverse modifier as well
# e.g. 'title_reverse'.
geekdocFileTreeSortBy = "title"

# (Optional, default none) Adds a "Content licensed under <license>" line to the footer.
# Could be used if you want to define a default license for your content.
# [params.geekdocContentLicense]
#   name = "CC BY-SA 4.0"
#   link = "https://creativecommons.org/licenses/by-sa/4.0/"
