Merged
1 change: 1 addition & 0 deletions docs/_include/links.md
@@ -19,6 +19,7 @@
[DynamoDB CDC Relay]: https://cratedb-toolkit.readthedocs.io/io/dynamodb/cdc.html
[DynamoDB CDC Relay with AWS Lambda]: https://cratedb-toolkit.readthedocs.io/io/dynamodb/cdc-lambda.html
[DynamoDB Table Loader]: https://cratedb-toolkit.readthedocs.io/io/dynamodb/loader.html
[Executable stack with Apache Kafka, Apache Flink, and CrateDB]: https://github.com/crate/cratedb-examples/tree/main/framework/flink/kafka-jdbcsink-java
[Geospatial Data Model]: https://cratedb.com/data-model/geospatial
[Geospatial Database]: https://cratedb.com/geospatial-spatial-database
[HNSW]: https://en.wikipedia.org/wiki/Hierarchical_navigable_small_world
2 changes: 2 additions & 0 deletions docs/conf.py
@@ -53,6 +53,8 @@
r"https://dzone.com/.*",
# 504 Client Error: Gateway Timeout for url
r"https://web.archive.org/.*",
# 403 Client Error: Forbidden for url
r"https://www.tableau.com/",
]

linkcheck_anchors_ignore_for_url += [
73 changes: 38 additions & 35 deletions docs/connect/df/index.md
@@ -1,40 +1,16 @@
(df)=
(dataframe)=
(dataframes)=
(dataframe-examples)=
# CrateDB and DataFrame libraries

Data frame libraries and frameworks which can
be used together with CrateDB.


:::::{grid} 1 2 2 2
:margin: 4 4 0 0
:padding: 0
:gutter: 2

::::{grid-item-card} {material-outlined}`lightbulb;2em` Tutorials
:link: guide:dataframes
:link-type: ref
Learn how to use CrateDB together with popular open-source data frame
libraries, on behalf of hands-on tutorials and code examples.
+++
{tag-info}`Dask` {tag-info}`pandas` {tag-info}`Polars`
::::

::::{grid-item-card} {material-outlined}`read_more;2em` SQLAlchemy
CrateDB's SQLAlchemy dialect implementation provides fundamental infrastructure
to integrations with Dask, pandas, and Polars.
+++
[ORM Guides](inv:guide#orm) •
{ref}`ORM Catalog <orm>`
::::

:::::

How to use CrateDB together with popular open-source DataFrame libraries.

(dask)=
## Dask

:::{rubric} About
:::
[Dask] is a parallel computing library for analytics with task scheduling.
It is built on top of the Python programming language, making it easy to scale
the Python libraries that you know and love, like NumPy, pandas, and scikit-learn.
@@ -56,10 +32,20 @@ the Python libraries that you know and love, like NumPy, pandas, and scikit-lear
:style: "clear: both"
```

:::{rubric} Learn
:::
- [Guide to efficient data ingestion to CrateDB with pandas and Dask]
- [Efficient batch/bulk INSERT operations with pandas, Dask, and SQLAlchemy]
- [Import weather data using Dask]
- [Dask code examples]


(pandas)=
## pandas

:::{rubric} About
:::

```{div}
:style: "float: right"
[![](https://pandas.pydata.org/static/img/pandas.svg){w=180px}](https://pandas.pydata.org/)
@@ -92,10 +78,21 @@ and operations for manipulating numerical tables and time series.
:style: "clear: both"
```

:::{rubric} Learn
:::
- [Guide to efficient data ingestion to CrateDB with pandas]
- [Importing Parquet files into CrateDB using Apache Arrow and SQLAlchemy]
- [pandas code examples]
- [From data storage to data analysis: Tutorial on CrateDB and pandas]



(polars)=
## Polars

:::{rubric} About
:::

```{div}
:style: "float: right; margin-left: 0.5em"
[![](https://github.com/pola-rs/polars-static/raw/master/logos/polars-logo-dark.svg){w=180px}](https://pola.rs/)
@@ -147,13 +144,9 @@ This allows you to easily integrate Polars into your existing data stack.
:style: "clear: both"
```


```{toctree}
:maxdepth: 1
:hidden:

learn
```
:::{rubric} Learn
:::
- [Polars code examples]


[Apache Arrow]: https://arrow.apache.org/
@@ -162,3 +155,13 @@ learn
[Dask Futures]: https://docs.dask.org/en/latest/futures.html
[pandas]: https://pandas.pydata.org/
[Polars]: https://pola.rs/

[Dask code examples]: https://github.com/crate/cratedb-examples/tree/main/by-dataframe/dask
[Efficient batch/bulk INSERT operations with pandas, Dask, and SQLAlchemy]: https://cratedb.com/docs/python/en/latest/by-example/sqlalchemy/dataframe.html
[From data storage to data analysis: Tutorial on CrateDB and pandas]: https://community.cratedb.com/t/from-data-storage-to-data-analysis-tutorial-on-cratedb-and-pandas/1440
[Guide to efficient data ingestion to CrateDB with pandas]: https://community.cratedb.com/t/guide-to-efficient-data-ingestion-to-cratedb-with-pandas/1541
[Guide to efficient data ingestion to CrateDB with pandas and Dask]: https://community.cratedb.com/t/guide-to-efficient-data-ingestion-to-cratedb-with-pandas-and-dask/1482
[Import weather data using Dask]: https://github.com/crate/cratedb-examples/blob/main/topic/timeseries/dask-weather-data-import.ipynb
[Importing Parquet files into CrateDB using Apache Arrow and SQLAlchemy]: https://community.cratedb.com/t/importing-parquet-files-into-cratedb-using-apache-arrow-and-sqlalchemy/1161
[pandas code examples]: https://github.com/crate/cratedb-examples/tree/main/by-dataframe/pandas
[Polars code examples]: https://github.com/crate/cratedb-examples/tree/main/by-dataframe/polars
29 changes: 0 additions & 29 deletions docs/connect/df/learn.md

This file was deleted.

6 changes: 3 additions & 3 deletions docs/connect/index.md
@@ -151,9 +151,9 @@ Python

configure
Drivers <drivers>
CLI <cli>
DataFrame <df/index>
ORM <orm>
CLI programs <cli>
DataFrame libraries <df/index>
ORM libraries <orm>
```

```{toctree}
2 changes: 1 addition & 1 deletion docs/domain/index.md
@@ -23,7 +23,7 @@ telemetry/index
analytics/index
industrial/index
timeseries/index
ml/index
Machine Learning <ml/index>
```


94 changes: 92 additions & 2 deletions docs/domain/ml/index.md
@@ -1,13 +1,16 @@
(ml)=
(ml-tools)=
(machine-learning)=

# Machine Learning
# Machine Learning with CrateDB

:::{include} /_include/links.md
:::
:::{include} /_include/styles.html
:::

Machine learning applications and frameworks
which can be used together with CrateDB.

Integrate CrateDB with machine learning frameworks and
tools, for MLOps and vector database operations.

@@ -60,6 +63,27 @@ generation (RAG), and other applications.
(mlflow)=
### MLflow

:::{rubric} About
:::
```{div}
:style: "float: right; margin-left: 1em"
[![](https://github.com/crate/crate-clients-tools/assets/453543/d1d4f4ac-1b44-46b8-ba6f-4a82607c57d3){w=180px}](https://mlflow.org/)
```

[MLflow] is an open source platform to manage the whole ML lifecycle, including
experimentation, reproducibility, deployment, and a central model registry.

The [MLflow adapter for CrateDB], available through the [mlflow-cratedb] package,
provides support to use CrateDB as a storage database for the [MLflow Tracking]
subsystem, which is about recording and querying experiments, across code, data,
config, and results.

```{div}
:style: "clear: both"
```

:::{rubric} Learn
:::
Tutorials and Notebooks about using [MLflow] together with CrateDB.

::::{info-card}
@@ -109,6 +133,28 @@ prediction using NumPy, Salesforce Merlion, and Matplotlib.
(pycaret)=
### PyCaret

:::{rubric} About
:::
```{div}
:style: "float: right; margin-left: 1em"
[![](https://github.com/crate/crate-clients-tools/assets/453543/b17a59e2-6801-4f53-892f-ff472491095f){w=180px}](https://pycaret.org/)
```

[PyCaret] is an open-source, low-code machine learning library for Python that
automates machine learning workflows.

It is a high-level interface and AutoML wrapper on top of your loved machine learning
libraries like scikit-learn, xgboost, ray, lightgbm, and many more. PyCaret provides a
universal interface to utilize these libraries without needing to know the details
of the underlying model architectures and parameters.

```{div}
:style: "clear: both"
```

:::{rubric} Learn
:::

Tutorials and Notebooks about using [PyCaret] together with CrateDB.

::::{info-card}
@@ -154,9 +200,50 @@ How to train time series forecasting models using PyCaret and CrateDB.
::::


(iris-r)=
### R

Use R with CrateDB.

:::::{info-card}
::::{grid-item}
:columns: 9
**Statistical analysis and visualization on huge datasets**

Details about how to create a machine learning pipeline
using R and CrateDB.

:::{toctree}
:maxdepth: 1

r
:::

::::
::::{grid-item}
:columns: 3
{tags-primary}`Fundamentals`
::::
:::::


(scikit-learn)=
### scikit-learn

:::{rubric} About
:::
```{div}
:style: "float: right; margin-left: 1em"
[![](https://upload.wikimedia.org/wikipedia/commons/thumb/0/05/Scikit_learn_logo_small.svg/240px-Scikit_learn_logo_small.svg.png){w=180px}](https://scikit-learn.org/)

[![](https://pandas.pydata.org/static/img/pandas.svg){w=180px}](https://pandas.pydata.org/)

[![](https://jupyter.org/assets/logos/rectanglelogo-greytext-orangebody-greymoons.svg){w=180px}](https://jupyter.org/)
```

:::{rubric} Learn
:::

Use [scikit-learn] with CrateDB.

::::{info-card}
@@ -283,7 +370,10 @@ tensorflow
[Machine Learning and CrateDB: Getting Started With Jupyter]: https://cratedb.com/blog/machine-learning-cratedb-jupyter
[Machine Learning and CrateDB: Experiment Design & Linear Regression]: https://cratedb.com/blog/machine-learning-and-cratedb-part-three-experiment-design-and-linear-regression
[MLflow]: https://mlflow.org/
[MLflow adapter for CrateDB]: https://github.com/crate/mlflow-cratedb
[MLflow and CrateDB]: https://github.com/crate/cratedb-examples/tree/main/topic/machine-learning/mlflow
[mlflow-cratedb]: https://pypi.org/project/mlflow-cratedb/
[MLflow Tracking]: https://mlflow.org/docs/latest/tracking.html
[MLOps]: https://en.wikipedia.org/wiki/MLOps
[pandas]: https://pandas.pydata.org/
[PyCaret]: https://www.pycaret.org
File renamed without changes.