1 change: 1 addition & 0 deletions docs/integrate/index.md
@@ -63,6 +63,7 @@ prometheus/index
pycaret/index
pyviz/index
queryzen/index
r/index
rill/index
risingwave/index
scikit-learn/index
35 changes: 35 additions & 0 deletions docs/integrate/r/index.md
@@ -0,0 +1,35 @@
(r)=
# R

```{div} .float-right
[![R logo](https://www.r-project.org/Rlogo.png){height=60px loading=lazy}][R]
```
```{div} .clearfix
```

:::{rubric} About
:::

[R] is a free software environment for statistical computing and graphics.
It compiles and runs on a wide variety of UNIX platforms, Windows and macOS.

:::{rubric} Learn
:::

::::{grid} 2

:::{grid-item-card} Statistical analysis and visualization on huge datasets
:link: r-tutorial
:link-type: ref
Learn how to create a machine learning pipeline using R and CrateDB.
:::

::::

:::{toctree}
:maxdepth: 1
:hidden:
Tutorial <tutorial>
:::

[R]: https://www.r-project.org/
31 changes: 15 additions & 16 deletions docs/topic/ml/r.rst → docs/integrate/r/tutorial.rst
@@ -1,4 +1,5 @@
.. _cratedb-r:
.. _r-tutorial:

==============
CrateDB with R
@@ -7,8 +8,7 @@ CrateDB with R
This integration document details how to create a Machine Learning pipeline
using R and CrateDB.

Abstract
========
.. rubric:: Introduction

Statistical analysis and visualization on huge datasets is a common task many
data scientists face in their day-to-day life. One common tool for doing this
@@ -22,12 +22,7 @@ statistical computations.

This can be accomplished with the `RPostgreSQL`_ library.


Implementation
==============

Set Up
------
.. rubric:: About

For this implementation, we will be using the classic `iris classification
problem`_.
@@ -51,6 +46,8 @@ Using R, we want to:
4. Retrieve our unclassified iris data, enrich the data with a prediction from
our model, and insert the result into our iris table.

Setup
=====

Prerequisites
-------------
@@ -68,8 +65,8 @@ To install these libraries within R or RStudio, we can run:
> install.packages("caret")


CrateDB
-------
Provision data
--------------

First, we need to create a table to hold our training data, as well as our
unclassified irises:
@@ -112,9 +109,11 @@ We can verify that the data has been successfully imported like so:
+----------+
SELECT 1 row in set (0.130 sec)

Usage
=====

Examining The Data
------------------
Explore data
------------

With our data in CrateDB, we can now load it into R or RStudio. Within
R, we should first import our data. We do this by loading the ``RPostgreSQL``
@@ -186,8 +185,8 @@ As we can see, the lengths and widths of sepals and petals are very good
indicators of iris species, with little overlap between them.


Training A Model
----------------
Train model
-----------

Now that we have loaded our data and can visualize it to get a better idea of
what it contains, we can create a machine learning model to predict a species
@@ -287,8 +286,8 @@ misclassified a *versicolor* as a *virginica* and vice versa. We could improve
this by trying out other models, by tweaking our model, or by training on a
larger dataset.

Enriching Data
..............
Enrich data
-----------

Now that we have a model we are happy with, we can use this model to enrich
unclassified iris flowers data.
24 changes: 2 additions & 22 deletions docs/topic/ml/index.md
@@ -71,31 +71,11 @@ See the dedicated page: {ref}`pycaret`.
:::


(iris-r)=
### R

Use R with CrateDB.

:::::{info-card}
::::{grid-item}
:columns: 9
**Statistical analysis and visualization on huge datasets**

Details about how to create a machine learning pipeline
using R and CrateDB.

:::{toctree}
:maxdepth: 1

r
:::{seealso}
Please navigate to the dedicated page about {ref}`r`.
:::

::::
::::{grid-item}
:columns: 3
{tags-primary}`Fundamentals`
::::
:::::


### scikit-learn