Merged
4 changes: 2 additions & 2 deletions docs/_include/card/timeseries-intro.md
Original file line number Diff line number Diff line change
@@ -22,11 +22,11 @@ for fast aggregations.
Combine time series data with document data: CrateDB is all you need.
::::

::::{grid-item-card} {material-outlined}`lightbulb;2em` Time Series: Advanced SQL
::::{grid-item-card} {material-outlined}`lightbulb;2em` Time Series: Analyzing Weather Data
:link: guide:timeseries-analysis-weather
:link-type: ref
:class-footer: text-smaller
CrateDB provides enhanced features for querying time series data.
CrateDB provides advanced SQL features for querying time series data.

:::{rubric} What's Inside
:::
1 change: 1 addition & 0 deletions docs/_include/links.md
@@ -48,6 +48,7 @@
[langchain-rag-sql-binder]: https://mybinder.org/v2/gh/crate/cratedb-examples/main?labpath=topic%2Fmachine-learning%2Flangchain%2Fcratedb-vectorstore-rag-openai-sql.ipynb
[langchain-rag-sql-colab]: https://colab.research.google.com/github/crate/cratedb-examples/blob/main/topic/machine-learning/langchain/cratedb-vectorstore-rag-openai-sql.ipynb
[langchain-rag-sql-github]: https://github.com/crate/cratedb-examples/blob/main/topic/machine-learning/langchain/cratedb-vectorstore-rag-openai-sql.ipynb
[MLOps]: https://en.wikipedia.org/wiki/MLOps
[MongoDB]: https://www.mongodb.com/docs/manual/
[MongoDB Atlas]: https://www.mongodb.com/docs/atlas/
[MongoDB CDC Relay]: inv:ctk:*:label#mongodb-cdc-relay
3 changes: 3 additions & 0 deletions docs/feature/query/index.md
@@ -166,6 +166,9 @@ search, all based on vanilla SQL.
- {ref}`vector-search`
- {ref}`hybrid-search`

## Text-to-SQL
Natural language to SQL conversions using adapters and frameworks.
- {ref}`text-to-sql`

## Time Bucketing
https://community.cratedb.com/t/resampling-time-series-data-with-date-bin/1009
14 changes: 11 additions & 3 deletions docs/feature/search/vector/index.md
@@ -20,9 +20,13 @@
:::{rubric} Overview
:::
CrateDB can be used as a [vector database] (VDBMS) for storing and retrieving
vector embeddings based on the FLOAT_VECTOR data type and its accompanying
KNN_MATCH and VECTOR_SIMILARITY functions, effectively conducting HNSW
semantic similarity searches on them, also known as vector search.
vector embeddings.

CrateDB's FLOAT_VECTOR data type stores vector embeddings. Its accompanying
KNN_MATCH and VECTOR_SIMILARITY functions use k-nearest neighbor (kNN) search
to find vectors that are similar to a query vector. This HNSW-based semantic
similarity search is also known as vector search.
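
To illustrate what "finding vectors similar to a query vector" means, here is
a minimal sketch of an exact, brute-force kNN search in plain Python. It is not
how CrateDB executes a query: it does not build an HNSW index, and it calls none
of CrateDB's functions. The Euclidean distance metric, the sample embeddings,
and the names `euclidean_distance` and `knn` are assumptions made for
illustration.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn(query, vectors, k):
    """Return the k (key, distance) pairs closest to `query`, nearest first."""
    scored = [(key, euclidean_distance(query, vec)) for key, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Hypothetical 3-dimensional embeddings keyed by document id.
embeddings = {
    "doc-a": [0.1, 0.2, 0.3],
    "doc-b": [0.9, 0.8, 0.7],
    "doc-c": [0.1, 0.2, 0.4],
}

nearest = knn([0.1, 0.2, 0.3], embeddings, k=2)
print(nearest)
```

A brute-force scan compares the query against every stored vector, which is
exact but slow on large tables. Approximate index structures such as HNSW trade
a little accuracy for much faster lookups.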

:::{rubric} About
:::
@@ -35,6 +39,10 @@ search finds similar data using approximate nearest neighbor (ANN) algorithms.
Compared to traditional keyword search, vector search can yield more relevant
results for semantically related content, and can execute faster.

Feature vectors are computed from raw data via ML methods such as feature
extraction, word embeddings, or deep neural networks.


:::{rubric} Details
:::
CrateDB uses Lucene as a storage layer, so it inherits the implementation
14 changes: 14 additions & 0 deletions docs/solution/index.md
@@ -69,6 +69,19 @@ always have them ready for historical analysis.
:::


:::{grid-item-card} {material-outlined}`model_training;2em` Machine Learning
:link: machine-learning
:link-type: ref
:link-alt: About CrateDB for machine learning applications

Learn how to integrate CrateDB with machine learning frameworks and tools.
+++
**What's inside:**
Use CrateDB with LangChain, LlamaIndex, MLflow, PyCaret, scikit-learn,
or TensorFlow.
:::


::::


@@ -78,5 +91,6 @@ always have them ready for historical analysis.

analytics/index
industrial/index
Machine learning <machine-learning/index>
migrate/index
```
198 changes: 198 additions & 0 deletions docs/solution/machine-learning/index.md
@@ -0,0 +1,198 @@
(ml)=
(ml-tools)=
(machine-learning)=
# Machine learning with CrateDB

:::{include} /_include/links.md
:::

:::{div} sd-text-muted
CrateDB provides a native vector store and adapters for integrating
with machine learning frameworks.
:::

## Vector store

:::{div}
[Vector databases][Vector Database] can be used for similarity search,
multi-modal search, recommendation engines, large language models (LLMs),
and other applications.

These applications can answer questions about specific sources of information,
for example by using retrieval-augmented generation (RAG), a technique for
augmenting LLM knowledge with additional data that is often private or
real-time.
:::
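
The RAG flow described above can be sketched in a few lines: retrieve the
passages most relevant to a question, then place them in the prompt sent to the
LLM. This is a toy illustration only. The word-overlap `score` stands in for a
real vector search, no LLM is called, and the passages and the names `score`,
`retrieve`, and `build_prompt` are hypothetical.

```python
def score(question, passage):
    """Count distinct lowercase words shared by the question and a passage."""
    return len(set(question.lower().split()) & set(passage.lower().split()))


def retrieve(question, passages, k=2):
    """Return the k passages with the highest overlap score, best first."""
    ranked = sorted(passages, key=lambda p: score(question, p), reverse=True)
    return ranked[:k]


def build_prompt(question, context):
    """Assemble a prompt that asks the model to answer from the context only."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


# Hypothetical private knowledge the LLM was never trained on.
passages = [
    "The pump in hall 3 was serviced on Monday.",
    "Invoices are archived after ninety days.",
    "Hall 3 hosts the main cooling pump.",
]

question = "Which hall hosts the cooling pump?"
prompt = build_prompt(question, retrieve(question, passages, k=2))
print(prompt)
```

In a real application, the retrieval step would query a vector store with an
embedding of the question, and the resulting prompt would be passed to an LLM.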

::::{grid} 2
:gutter: 4

:::{grid-item-card} Documentation: Vector search
:link: vector-search
:link-type: ref
CrateDB's FLOAT_VECTOR data type implements a vector store and the k‑nearest
neighbors (k‑NN) search algorithm to find vectors that are similar to a query
vector.
+++
Vector search on machine learning embeddings: CrateDB is all you need.
:::

:::{grid-item-card} Documentation: Hybrid search
:link: hybrid-search
:link-type: ref
Hybrid search combines traditional full-text search with semantic search
to achieve better accuracy and relevancy than either algorithm would
individually.
+++
Combined BM25 term search and vector search based on Apache Lucene,
using SQL: CrateDB is all you need.
:::

:::{grid-item-card} Integration: LangChain
:link: langchain
:link-type: ref
LangChain is a Python framework for developing applications powered by
language models, with a strong focus on composability.
It supports retrieval-augmented generation (RAG).
+++
The LangChain adapter lets you use CrateDB as a vector store database, load
documents via document loaders, and use LangChain’s conversational memory.
:::

::::
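
Hybrid search needs a way to merge the ranked results of a term search and a
vector search into one list. The text above does not say which fusion method is
used, so the sketch below shows one common option, reciprocal rank fusion
(RRF), purely as an assumption. The document ids, the constant `k=60`, and the
function name `reciprocal_rank_fusion` are illustrative.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked id lists: each hit adds 1 / (k + rank), with rank starting at 1."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical result lists, best match first.
bm25_hits = ["doc-2", "doc-1", "doc-4"]
vector_hits = ["doc-1", "doc-3", "doc-2"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)
```

RRF only uses rank positions, so it does not require the BM25 and vector
similarity scores to be on a comparable scale.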


(text-to-sql)=
## Text-to-SQL

:::{div} sd-text-muted
Integrate CrateDB with Text-to-SQL solutions, the Model Context Protocol (MCP),
and enterprise AI data platforms.
:::

::::{grid} 2
:gutter: 4

:::{grid-item-card} Text-to-SQL with LlamaIndex
:link: llamaindex
:link-type: ref
Text-to-SQL is a technique that converts natural language queries into SQL
queries that can be executed by a database.
:::

:::{grid-item-card} All about MCP
:link: mcp
:link-type: ref
The Model Context Protocol (MCP) is an open protocol that enables seamless
integration between LLM applications and external data sources and tools.
:::

:::{grid-item-card} MindsDB
:link: mindsdb
:link-type: ref
MindsDB is the platform for customizing AI from enterprise data.
:::

::::
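
As a rough sketch of the Text-to-SQL flow, the example below passes a table
schema and a natural language question to a model, receives a SQL statement,
and executes it. Everything here is a stand-in: `fake_model` is a hypothetical
placeholder that returns a canned query instead of calling an LLM, and
Python's built-in `sqlite3` module replaces CrateDB so that the example runs
without a server. The table and its data are invented.

```python
import sqlite3


def build_prompt(schema, question):
    """Give the model the schema it may query and the question to answer."""
    return f"Schema:\n{schema}\n\nWrite one SQL query that answers: {question}"


def fake_model(prompt):
    """Hypothetical stand-in for an LLM call; always returns a canned query."""
    return "SELECT AVG(temperature) FROM readings WHERE city = 'Berlin'"


schema = "CREATE TABLE readings (city TEXT, temperature REAL)"
conn = sqlite3.connect(":memory:")
conn.execute(schema)
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("Berlin", 10.0), ("Berlin", 14.0), ("Oslo", 2.0)],
)

sql = fake_model(build_prompt(schema, "What is the average temperature in Berlin?"))

# Generated SQL is untrusted input: only allow read-only statements here.
if not sql.lstrip().upper().startswith("SELECT"):
    raise ValueError("refusing to run a non-SELECT statement")

average = conn.execute(sql).fetchone()[0]
print(average)
```

The guard before execution reflects a general design concern: model-generated
SQL should be validated or run with restricted privileges before it touches
production data.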


## Time series analysis

:::{div} sd-text-muted
Load and analyze data from database systems for
time series anomaly detection and forecasting.
:::
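
To give a sense of what time series anomaly detection involves, here is a
minimal sketch that flags values lying far from the mean of a series. The
z-score method, the threshold of `2.0`, the sample readings, and the name
`zscore_anomalies` are assumptions for illustration. The tutorials linked
below use dedicated libraries and more robust models.

```python
import statistics


def zscore_anomalies(values, threshold=2.0):
    """Return indexes of values more than `threshold` standard deviations from the mean."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        # A constant series has no outliers.
        return []
    return [i for i, v in enumerate(values) if abs(v - mean) / stdev > threshold]


# Hypothetical temperature readings with one obvious spike.
readings = [20.1, 19.8, 20.3, 20.0, 35.0, 20.2, 19.9, 20.1]

anomalies = zscore_anomalies(readings)
print(anomalies)
```

A global z-score ignores trend and seasonality, so it only suits roughly
stationary data.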

::::{grid} 2
:gutter: 4

:::{grid-item-card} Statistical analysis and visualization on huge datasets
:link: r-tutorial
:link-type: ref
Learn how to create a machine learning pipeline using R and CrateDB.
:::

:::{grid-item-card} Regression analysis with pandas and scikit-learn
:link: scikit-learn
:link-type: ref
Use pandas and scikit-learn to run a regression analysis within a Jupyter Notebook.
:::

:::{grid-item-card} Build a model for predictive maintenance with TensorFlow
:link: tensorflow-tutorial
:link-type: ref
Learn how to build a machine learning model that predicts whether
a machine will fail within a specified future time window.
:::

:::{grid-item-card} Advanced time series analysis with MLflow and PyCaret
:link: ml-timeseries
:link-type: ref
Learn how to conduct advanced data analysis on large time series datasets
with CrateDB, MLflow, and PyCaret: anomaly detection and forecasting,
time series decomposition, and exploratory data analysis (EDA).
:::

::::


## MLOps and model training

:::{div} sd-text-muted
CrateDB supports MLOps procedures through adapters to best-of-breed software
frameworks.
:::

:::{div}
Training a machine learning model, running it in production, and maintaining
it require a significant amount of data processing and bookkeeping.

Machine learning operations ([MLOps]) is a paradigm for deploying and
maintaining machine learning models in production reliably and efficiently.
It includes experiment tracking and follows the spirit of continuous
development and DevOps.
:::

::::{grid} 2
:gutter: 4

:::{grid-item-card} MLflow
:link: mlflow
:link-type: ref
MLflow is an open-source platform to manage the whole ML lifecycle,
including experimentation, reproducibility, deployment, and a central
model registry.
+++
CrateDB can be used as a storage database for the MLflow Tracking subsystem.
:::

:::{grid-item-card} PyCaret
:link: pycaret
:link-type: ref
PyCaret is an open-source, low-code machine learning library for Python
that automates machine learning workflows (AutoML).
+++
CrateDB can be used as a storage database for training and production datasets.
:::

:::{grid-item-card} Advanced time series analysis with MLflow and PyCaret
:link: ml-timeseries
:link-type: ref
:columns: 12
Learn how to conduct advanced data analysis on large time series datasets
with CrateDB, MLflow, and PyCaret.
+++
**What's inside:** Anomaly detection and forecasting, time series decomposition,
exploratory data analysis (EDA).
:::

::::


:::{toctree}
:maxdepth: 1
:hidden:
time-series
:::