Add Data modelling section #270

bmunkholm · 2025-08-28T20:08:42Z

Summary of the changes / Why this is an improvement

Based on #233.
The current focus is adding the Data Modeling section and its pages, and adjusting the outline to be closer to GitBook.
All pages are still under "Getting Started".

Checklist

Preview

https://cratedb-guide--270.org.readthedocs.build/

juanpardo · 2025-09-10T10:28:24Z

docs/start/modelling/vector.md

+# Vector data
+
+CrateDB natively supports **vector embeddings** for efficient **similarity
+search** using **k-nearest neighbour (kNN)** algorithms. This makes it a


As far as I know, CrateDB supports two ways of doing similarity search: one is knn_match, which indeed uses a k-nearest neighbour search algorithm, and the other is vector_similarity, which computes the Euclidean distance. I would perhaps mention here that CrateDB supports similarity search with both the k-nearest neighbour (kNN) algorithm and Euclidean distance.

@juanpardo Can you please make suggested edits instead? Thanks!

Actually, I think that knn_match uses aNN instead of kNN.

@juanpardo Will you take the lead on resolving this, with an update to the page if needed?

I think I was wrong; I'm unsure. CrateDB ultimately uses KnnFloatVectorQuery, which has the approximateSearch method. The docstrings all say 'k nearest document' (kNN), but searchNearestVectors contains the note "The search is allowed to be approximate, meaning the results are not guaranteed to be the true k closest neighbors", which literally sounds like aNN.

So there are a few references to approximation. My confusion is that I don't know whether the current implementation allows swapping between aNN and kNN as needed (in a large vector store it would make sense to use aNN for performance reasons), or whether we always use kNN with some kind of optimisation that makes it more 'approximate' while still not being aNN, which is a different algorithm.

I've asked internally. I can take the lead on resolving any potential change in this section, if that's alright with y'all @bmunkholm @juanpardo.

Ok. We can correct that later when we figure it out.
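
To make the kNN versus aNN distinction raised in this thread concrete, here is a minimal Python sketch of an exact k-nearest-neighbour search using Euclidean distance. It is an illustration only and makes no claim about how CrateDB, knn_match, or KnnFloatVectorQuery work internally; the function names `euclidean_distance` and `exact_knn` and the toy data are hypothetical.

```python
import heapq
import math


def euclidean_distance(a, b):
    """Straight-line (L2) distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_knn(query, vectors, k):
    """Return the k (distance, id) pairs closest to `query`, nearest first.

    Brute force: every stored vector is compared against the query, so the
    result is the true set of k nearest neighbours.
    """
    scored = (
        (euclidean_distance(query, vec), doc_id)
        for doc_id, vec in vectors.items()
    )
    return heapq.nsmallest(k, scored)


# Hypothetical toy data, not taken from the CrateDB documentation.
embeddings = {
    "a": [0.0, 0.0],
    "b": [1.0, 0.0],
    "c": [3.0, 4.0],
    "d": [0.0, 2.0],
}

nearest = exact_knn([0.0, 0.0], embeddings, k=2)
print(nearest)
```

An exact search like this scans every vector, which guarantees the true k closest neighbours but scales linearly with the number of rows. An approximate (aNN) search typically consults an index and skips most comparisons, so it is faster on large stores but may miss some of the true neighbours.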

Added a simple table definition to allow the following join query to successfully run

surister

I changed a few bits and reviewed to the best of my ability; there are things that I'd wish to improve but it's not my call (have less marketing statements).

bmunkholm · 2025-09-11T12:53:59Z

docs/start/modelling/primary-key.md

 This option involves declaring a column using `DEFAULT gen_random_text_uuid()`.
 ```psql
-CREATE TABLE example2 (
+CREATE TABLE example (


@surister It's intentional that I enumerate the table names where we redefine or evolve table definitions. This makes it more practical to copy/paste and test examples, and makes it clear to readers which tables we refer to afterwards.

bmunkholm · 2025-09-11T13:12:19Z

@surister

there are things that I'd wish to improve but it's not my call (have less marketing statements).

Adjusting the AI marketing prose to a useful level is certainly within scope. This section is somewhat introductory, so it's OK that it explains and sells the features a bit, but it should do so in educational terms that explain what we are good at, rather than in overly "marketing" terms.
I suggest we do another round on that after the search pages are done. Perhaps there is more to adjust based on those.

bmunkholm · 2025-09-11T13:54:16Z

I created #275 to fix the 404 links that break the build. nyc.gov also returns a 404, but that seems to be temporary.

stephanec76 and others added 13 commits August 27, 2025 00:11

Data modelling: Add new section

dd62b9b

Data modelling: Fix page about "relational data"

2549006

Data modelling: Fix page about "json data"

73ce057

Data modelling: Fix page about "timeseries data"

75899f6

Data modelling: Fix page about "geospatial data"

f6d3bba

Data modelling: Fix SQL in page about "geospatial data"

d60feef

Data modelling: Fix page about "full-text data"

52c3008

Data modelling: Fix page about "vector data"

736652c

Layout: Improve responsiveness on pages using cards heavily

33ecc94

Data modelling: Populate index page

b5ad8b4

Data modelling: Relocate original page about primary keys and sequences

036ebcf

Move connect page to install

488e43c

Updated FTS in modelling

8e8be1c

bmunkholm mentioned this pull request Aug 28, 2025

Data modelling: Add new section (GenAI, unedited) #233

Closed

bmunkholm added 13 commits August 29, 2025 09:49

Updated datamodel json from Gitbook.

68591e9

Updated datamodel json from Gitbook.

bebb183

Merge branch 'data-modelling2' of https://github.com/crate/cratedb-guide into data-modelling2

27aaf16

reformat json to 80 chars, convert link to reference

855950c

Update Primary key strategies

a7afeda

Add link to data modelling

aee1c58

Move Going Further in index to match GitBook

ef495b9

Update timeseries with content from Gitbook

908e867

Added vector modeling content from Gitbook

198fa7f

relational.md updated with content from GitBook

8fc8021

Update Geospatial with Gitbook content

0d6a691

Updated global references and removed seperators

12c45c7

Merge branch 'main' into data-modelling2

5660a1f

bmunkholm marked this pull request as ready for review September 2, 2025 22:51

surister added 8 commits September 10, 2025 11:50

remove unnecessary line

3339e7c

Use event table consistently

d9e420b

minor tweak

5a3a849

remove 'schema explosion' its confusing

afb38cb

tweak comment on object fields

65010db

improve consistency

e64c022

improve consistency again

ab28efa

minor tweak

7243df9

juanpardo reviewed Sep 10, 2025

Add devices_info table definition to timeseries page

3a59299

Added a simple table definition to allow the following join query to successfully run

bmunkholm requested review from juanpardo, karynzv and surister September 10, 2025 12:05

Merge branch 'main' into data-modelling2

0caa1b4

bmunkholm added 3 commits September 10, 2025 18:25

fixed bug in reference

9ccf253

Minor fixes in json modelling

2a1e24e

wording in vector.md

7b5ec7b

Remove references to UUID4

d8e29e5

surister approved these changes Sep 11, 2025

bmunkholm commented Sep 11, 2025

enum example table name

d5cfb99

bmunkholm added 2 commits September 11, 2025 16:08

fix index reference to model-promary-key

9973fe5

revert fix.

bddacdb

bmunkholm merged commit abde7e6 into main Sep 11, 2025
2 of 3 checks passed

bmunkholm deleted the data-modelling2 branch September 11, 2025 14:14

coderabbitai bot mentioned this pull request Sep 16, 2025

Naming things: Use "time series" instead of "time-series" #315

Merged

8 participants
