68 changes: 59 additions & 9 deletions .github/workflows/mkdocs-release.yml
@@ -3,6 +3,9 @@ name: mkdocs-release
on:
push:
branches: [branch-*\.*]
repository_dispatch:
types:
- trigger-rebuild

concurrency:
group: ${{ github.workflow }}
@@ -13,25 +16,72 @@ jobs:
runs-on: ubuntu-latest

steps:
- uses: actions/checkout@v4
- name: Extract branch name (push)
if: ${{ github.event_name == 'push' }}
run: echo "BRANCH=${GITHUB_REF#refs/heads/}" >> "$GITHUB_ENV"

- name: Extract branch name (repository_dispatch)
if: ${{ github.event_name == 'repository_dispatch' }}
run: echo "BRANCH=${{ github.event.client_payload.branch }}" >> "$GITHUB_ENV"

- name: Extract version from branch name
run: echo "HOPSWORKS_VERSION=${BRANCH#branch-}" >> "$GITHUB_ENV"

- name: Checkout main repo
uses: actions/checkout@v4
with:
fetch-depth: 0
ref: ${{ env.BRANCH }}

- name: Checkout the API repo
uses: actions/checkout@v4
with:
# TODO: replace aversey with logicalclocks
repository: aversey/hopsworks-api
ref: ${{ env.BRANCH }}
path: hopsworks-api

- name: Cache local Maven repository
uses: actions/cache@v4
with:
path: ~/.m2/repository
key: ${{ runner.os }}-maven-${{ hashFiles('java/pom.xml') }}
restore-keys: |
${{ runner.os }}-maven-

- name: Set up JDK 8
uses: actions/setup-java@v5
with:
java-version: "8"
distribution: "adopt"

- name: Build javadoc documentation
working-directory: hopsworks-api/java
run: mvn clean install javadoc:javadoc javadoc:aggregate -DskipTests && cp -r target/site/apidocs ../../docs/javadoc

- uses: actions/setup-python@v5
with:
python-version: "3.10"

- name: Install ubuntu dependencies
run: sudo apt update && sudo apt-get install -y libxml2-dev libxslt-dev
- name: Install uv
uses: astral-sh/setup-uv@v7
with:
activate-environment: true
working-directory: hopsworks-api/python

- name: install deps
run: pip3 install -r requirements-docs.txt
- name: Install Python API dependencies
run: uv sync --extra dev --group docs --project hopsworks-api/python

- name: Install Python dependencies
run: uv pip install -r requirements-docs.txt

- name: Install Ubuntu dependencies
run: sudo apt update && sudo apt-get install -y libxml2-dev libxslt-dev

- name: setup git
- name: Setup git for mike
run: |
git config --global user.name Mike
git config --global user.email [email protected]

# Put this back and increment version when cutting a new release branch
# - name: mike deploy docs
# run: mike deploy 3.0 latest -u --push
- name: Deploy the docs with mike
run: mike deploy ${HOPSWORKS_VERSION} latest -u --push
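For reference, the new `repository_dispatch` trigger lets another repository (or any client with a suitably scoped token) request a docs rebuild for a specific release branch. A minimal sketch in Python, assuming a repo-scoped token in `GITHUB_TOKEN`; the repository path and branch name are placeholders:

```python
# Sketch: fire the "trigger-rebuild" repository_dispatch event that this
# workflow listens for. The repo path and branch are hypothetical examples.
import os

import requests

resp = requests.post(
    "https://api.github.com/repos/logicalclocks/logicalclocks.github.io/dispatches",
    headers={
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
    },
    json={
        "event_type": "trigger-rebuild",
        # Read by the workflow as github.event.client_payload.branch
        "client_payload": {"branch": "branch-3.2"},
    },
)
resp.raise_for_status()  # GitHub replies 204 No Content on success
```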
62 changes: 51 additions & 11 deletions .github/workflows/mkdocs-test.yml
@@ -12,25 +12,60 @@ jobs:
with:
fetch-depth: 0

- name: Checkout the API repo
uses: actions/checkout@v4
with:
# TODO: replace aversey with logicalclocks
repository: aversey/hopsworks-api
ref: ${{ github.base_ref }}
path: hopsworks-api

- name: Markdownlint
uses: DavidAnson/markdownlint-cli2-action@v21
with:
globs: '**/*.md'

- name: Cache local Maven repository
uses: actions/cache@v4
with:
path: ~/.m2/repository
key: ${{ runner.os }}-maven-${{ hashFiles('java/pom.xml') }}
restore-keys: |
${{ runner.os }}-maven-

- name: Set up JDK 8
uses: actions/setup-java@v5
with:
java-version: "8"
distribution: "adopt"

- name: Build javadoc documentation
working-directory: hopsworks-api/java
run: mvn clean install javadoc:javadoc javadoc:aggregate -DskipTests && cp -r target/site/apidocs ../../docs/javadoc

- uses: actions/setup-python@v5
with:
python-version: "3.10"

- name: Install ubuntu dependencies
run: sudo apt update && sudo apt-get install -y libxml2-dev libxslt-dev
- name: Install uv
uses: astral-sh/setup-uv@v7
with:
activate-environment: true
working-directory: hopsworks-api/python

- name: install deps
run: pip3 install -r requirements-docs.txt
- name: Install Python API dependencies
run: uv sync --extra dev --group docs --project hopsworks-api/python

- name: setup git
run: |
git config --global user.name Mike
git config --global user.email [email protected]
- name: Install Python dependencies
run: uv pip install -r requirements-docs.txt

- name: Install Ubuntu dependencies
run: sudo apt update && sudo apt-get install -y libxml2-dev libxslt-dev

- name: test broken links
- name: Check for broken links
run: |
# run the server
mkdocs serve > /dev/null 2>&1 &
mkdocs serve > /dev/null 2>&1 &
SERVER_PID=$!
echo "mk server in PID $SERVER_PID"
# Give enough time for deployment
@@ -41,5 +76,10 @@ jobs:
# If ok just kill the server
kill -9 $SERVER_PID

- name: mike deploy docs
- name: Setup git for mike
run: |
git config --global user.name Mike
git config --global user.email [email protected]

- name: Generate the docs with mike
run: mike deploy 3.2-SNAPSHOT dev -u
7 changes: 7 additions & 0 deletions .markdownlint.yaml
@@ -0,0 +1,7 @@
MD041: false
MD013: false
MD033: false
MD045: false
MD046: false
MD004:
style: dash
8 changes: 4 additions & 4 deletions README.md
@@ -1,6 +1,6 @@
# Documentation landing page

This is the source of the landing page for https://docs.hopsworks.ai
This is the source of the landing page for <https://docs.hopsworks.ai>

## Build instructions

@@ -35,11 +35,11 @@ Use mkdocs to build the documentation and serve it locally
{PY_ENV}/bin/mkdocs serve
```

The documentation should now be available locally on the following URL: http://127.0.0.1:8000/
The documentation should now be available locally on the following URL: <http://127.0.0.1:8000/>

## Adding new pages

The `mkdocs.yml` file of this repository defines the pages to show in the navigation.
The `mkdocs.yml` file of this repository defines the pages to show in the navigation.
After adding your new page in the docs folder, you also need to add it to this file for it to show up in the navigation.

## Checking links
@@ -56,4 +56,4 @@ linkchecker http://127.0.0.1:8000/

# If ok just kill the server
kill -9 $SERVER_PID
```
```
42 changes: 27 additions & 15 deletions docs/concepts/dev/inside.md
@@ -1,34 +1,46 @@
Hopsworks provides a complete self-service development environment for feature engineering and model training. You can develop programs as Jupyter notebooks or jobs, customize the bundled FTI (feature, training and inference pipeline) python environments, you can manage your source code with Git, and you can orchestrate jobs with Airflow.

<img src="../../../assets/images/concepts/dev/dev-inside.svg">
Hopsworks provides a complete self-service development environment for feature engineering and model training.
You can develop programs as Jupyter notebooks or jobs, customize the bundled FTI (feature, training, and inference pipeline) Python environments, manage your source code with Git, and orchestrate jobs with Airflow.

<img src="../../../assets/images/concepts/dev/dev-inside.svg" alt="Hopsworks Development Environment" />

### Jupyter Notebooks

Hopsworks provides a Jupyter notebook development environment for programs written in Python, Spark, Flink, and SparkSQL. You can also develop in your IDE (PyCharm, IntelliJ, etc), test locally, and then run your programs as Jobs in Hopsworks. Jupyter notebooks can also be run as Jobs.
Hopsworks provides a Jupyter notebook development environment for programs written in Python, Spark, Flink, and SparkSQL.
You can also develop in your IDE (PyCharm, IntelliJ, etc.), test locally, and then run your programs as Jobs in Hopsworks.
Jupyter notebooks can also be run as Jobs.

### Source Code Control

Hopsworks provides source code control support using Git (GitHub, GitLab or BitBucket). You can securely checkout code into your project and commit and push updates to your code to your source code repository.
Hopsworks provides source code control support using Git (GitHub, GitLab or BitBucket).
You can securely check out code into your project, and commit and push updates to your source code repository.

### FTI Pipeline Environments

Hopsworks postulates that building ML systems following the FTI pipeline architecture is best practice. This architecture consists of three independently developed and operated ML pipelines:
Hopsworks postulates that building ML systems following the FTI pipeline architecture is best practice.
This architecture consists of three independently developed and operated ML pipelines:

* Feature pipeline: takes as input raw data that it transforms into features (and labels)
* Training pipeline: takes as input features (and labels) and outputs a trained model
* Inference pipeline: takes new feature data and a trained model and makes predictions
- Feature pipeline: takes as input raw data that it transforms into features (and labels)
- Training pipeline: takes as input features (and labels) and outputs a trained model
- Inference pipeline: takes new feature data and a trained model and makes predictions

In order to facilitate the development of these pipelines Hopsworks bundles several python environments containing necessary dependencies. Each of these environments may then also be customized further by cloning it and installing additional dependencies from PyPi, Conda channels, Wheel files, GitHub repos or a custom Dockerfile. Internal compute such as Jobs and Jupyter is run in one of these environments and changes are applied transparently when you install new libraries using our APIs. That is, there is no need to write a Dockerfile, users install libraries directly in one or more of the environments. You can setup custom development and production environments by creating separate projects or creating multiple clones of an environment within the same project.
In order to facilitate the development of these pipelines, Hopsworks bundles several Python environments containing the necessary dependencies.
Each of these environments can then be customized further by cloning it and installing additional dependencies from PyPI, Conda channels, Wheel files, GitHub repos, or a custom Dockerfile.
Internal compute such as Jobs and Jupyter runs in one of these environments, and changes are applied transparently when you install new libraries using our APIs.
That is, there is no need to write a Dockerfile; users install libraries directly in one or more of the environments.
You can set up custom development and production environments by creating separate projects or creating multiple clones of an environment within the same project.
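
As a rough sketch of how the three pipelines above divide the work (the feature group, feature view, and column names below are hypothetical, and the scikit-learn model is a stand-in for whatever framework you use):

```python
# Minimal FTI sketch with the Hopsworks Python SDK; all names are examples.
import hopsworks
from sklearn.linear_model import LogisticRegression

project = hopsworks.login()
fs = project.get_feature_store()

def feature_pipeline(raw_df):
    """Feature pipeline: raw data in, features (and labels) out."""
    features_df = raw_df.dropna()  # stand-in for real transformations
    fg = fs.get_or_create_feature_group(
        name="sales_fg", version=1, primary_key=["id"]
    )
    fg.insert(features_df)

def training_pipeline():
    """Training pipeline: features (and labels) in, trained model out."""
    fv = fs.get_feature_view("sales_fv", version=1)
    X_train, X_test, y_train, y_test = fv.train_test_split(test_size=0.2)
    return LogisticRegression().fit(X_train, y_train)

def inference_pipeline(model, keys):
    """Inference pipeline: new feature data plus a model in, predictions out."""
    fv = fs.get_feature_view("sales_fv", version=1)
    return model.predict(fv.get_feature_vectors(entry=keys))
```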

### Jobs

In Hopsworks, a Job is a schedulable program that is allocated compute and memory resources. You can run a Job in Hopsworks:
In Hopsworks, a Job is a schedulable program that is allocated compute and memory resources.
You can run a Job in Hopsworks:

* From the UI
* Programmatically with the Hopsworks SDK (Python, Java) or REST API
* From Airflow programs (either inside our outside Hopsworks)
* From your IDE using a plugin ([PyCharm/IntelliJ plugin](https://plugins.jetbrains.com/plugin/15537-hopsworks))
- From the UI
- Programmatically with the Hopsworks SDK (Python, Java) or REST API (see the sketch below)
- From Airflow programs (either inside or outside Hopsworks)
- From your IDE using a plugin ([PyCharm/IntelliJ plugin](https://plugins.jetbrains.com/plugin/15537-hopsworks))
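
A minimal sketch of the programmatic route with the Python SDK, assuming a job named `feature_pipeline_job` already exists in the project:

```python
# Sketch: run an existing Hopsworks Job and wait for it to finish.
# The job name is a hypothetical example.
import hopsworks

project = hopsworks.login()
jobs_api = project.get_jobs_api()
job = jobs_api.get_job("feature_pipeline_job")
execution = job.run(await_termination=True)  # block until the run completes
print(execution.success)
```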

### Orchestration

Airflow comes out-of-the box with Hopsworks, but you can also use an external Airflow cluster (with the Hopsworks Job operator) if you have one. Airflow can be used to schedule the execution of Jobs, individually or as part of Airflow DAGs.
Airflow comes out of the box with Hopsworks, but you can also use an external Airflow cluster (with the Hopsworks Job operator) if you have one.
Airflow can be used to schedule the execution of Jobs, individually or as part of Airflow DAGs.
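
For illustration, a minimal sketch of scheduling a Hopsworks Job from Airflow with a plain `PythonOperator`; Hopsworks ships its own Job operator, but this sketch deliberately avoids assuming its API, and the DAG and job names are hypothetical:

```python
# Sketch: schedule a Hopsworks Job daily from an Airflow DAG.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def run_hopsworks_job():
    import hopsworks
    project = hopsworks.login()
    job = project.get_jobs_api().get_job("feature_pipeline_job")  # example name
    job.run(await_termination=True)

with DAG(
    "feature_pipeline_dag",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    PythonOperator(
        task_id="run_feature_pipeline",
        python_callable=run_hopsworks_job,
    )
```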
7 changes: 5 additions & 2 deletions docs/concepts/dev/outside.md
@@ -1,5 +1,8 @@
You can write programs that use Hopsworks in any [Python, Spark, PySpark, or Flink environment](../../user_guides/integrations/index.md). Hopsworks also running SQL queries to compute features in external data warehouses. The Feature Store can also be queried with SQL.
You can write programs that use Hopsworks in any [Python, Spark, PySpark, or Flink environment](../../user_guides/integrations/index.md).
Hopsworks also supports running SQL queries to compute features in external data warehouses.
The Feature Store can also be queried with SQL.

There is REST API for Hopsworks that can be used with a valid API key, generated in Hopsworks. However, it is often easier to develop your programs against SDKs available in Python and Java/Scala for HSFS, in Python for HSML, and in Python for the Hopsworks API.
There is a REST API for Hopsworks that can be used with a valid API key, generated in Hopsworks.
However, it is often easier to develop your programs against SDKs available in Python and Java/Scala for HSFS, in Python for HSML, and in Python for the Hopsworks API.
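
For example, a minimal sketch of connecting from an external Python environment with an API key and querying the Feature Store with SQL; the host, project, and feature group names are assumptions:

```python
# Sketch: connect from outside Hopsworks and query the Feature Store with SQL.
# Host, project, API key, and table name are hypothetical examples.
import hopsworks

project = hopsworks.login(
    host="my-cluster.hopsworks.ai",
    project="my_project",
    api_key_value="<API_KEY_GENERATED_IN_HOPSWORKS>",
)
fs = project.get_feature_store()
# Offline feature group tables are exposed as <name>_<version>
df = fs.sql("SELECT id, amount FROM sales_fg_1")
print(df.head())
```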

<img src="../../../assets/images/concepts/dev/dev-outside.svg">
7 changes: 5 additions & 2 deletions docs/concepts/fs/feature_group/external_fg.md
@@ -1,6 +1,9 @@
External feature groups are offline feature groups where their data is stored in an external table. An external table requires a data source, defined with the Connector API (or more typically in the user interface), to enable HSFS to retrieve data from the external table. An external feature group doesn't allow for offline data ingestion or modification; instead, it includes a user-defined SQL string for retrieving data. You can also perform SQL operations, including projections, aggregations, and so on. The SQL query is executed on-demand when HSFS retrieves data from the external Feature Group, for example, when creating training data using features in the external table.
External feature groups are offline feature groups where their data is stored in an external table.
An external table requires a data source, defined with the Connector API (or more typically in the user interface), to enable HSFS to retrieve data from the external table.
An external feature group doesn't allow for offline data ingestion or modification; instead, it includes a user-defined SQL string for retrieving data.
You can also perform SQL operations, including projections, aggregations, and so on.
The SQL query is executed on demand when HSFS retrieves data from the external Feature Group, for example, when creating training data using features in the external table.
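
A minimal sketch of registering such an external feature group over a Snowflake data source; the connector name, query, and schema are assumptions:

```python
# Sketch: an external feature group whose data stays in Snowflake and is
# read on demand via the stored SQL. All names here are hypothetical.
import hopsworks

project = hopsworks.login()
fs = project.get_feature_store()
snowflake = fs.get_storage_connector("snowflake_sales")

external_fg = fs.create_external_feature_group(
    name="sales_external",
    version=1,
    query="SELECT id, amount, ts FROM TRANSACTIONS",
    storage_connector=snowflake,
    primary_key=["id"],
)
external_fg.save()  # registers metadata only; no data is ingested
```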

In the image below, we can see that HSFS currently supports a large number of data sources, including any JDBC-enabled source, Snowflake, Data Lake, Redshift, BigQuery, S3, ADLS, GCS, RDS, and Kafka.

<img src="../../../../assets/images/concepts/fs/fg-connector-api.svg">

5 changes: 2 additions & 3 deletions docs/concepts/fs/feature_group/feature_monitoring.md
@@ -8,13 +8,12 @@ HSFS supports monitoring features on your Feature Group by:

## Scheduled Statistics

After creating a Feature Group in HSFS, you can setup statistics monitoring to compute statistics over one or more features on a scheduled basis. Statistics are computed on the whole or a subset of feature data (i.e., detection window) already inserted into the Feature Group.
After creating a Feature Group in HSFS, you can set up statistics monitoring to compute statistics over one or more features on a scheduled basis.
Statistics are computed on the whole or a subset of feature data (i.e., detection window) already inserted into the Feature Group.

## Statistics Comparison

In addition to scheduled statistics, you can enable the comparison of statistics against a reference subset of feature data (i.e., reference window) and define the criteria for this comparison including the statistics metric to compare and a threshold to identify anomalous values.
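
As a rough sketch of what enabling such a comparison might look like in the Python API (the fluent configuration below follows the feature monitoring guide linked after this section, but every name, schedule, window, and threshold here is an assumption):

```python
# Sketch: scheduled statistics on one feature, compared against a
# reference window; all names and settings are hypothetical examples.
import hopsworks

project = hopsworks.login()
fs = project.get_feature_store()
fg = fs.get_feature_group("sales_fg", version=1)

fg.create_feature_monitoring(
    name="amount_mean_drift",
    feature_name="amount",
    cron_expression="0 0 12 ? * * *",  # compute statistics daily at noon
).with_detection_window(
    time_offset="1d",                  # statistics over the last day of data
).with_reference_window(
    time_offset="1w",                  # compare against the same day last week
    window_length="1d",
).compare_on(
    metric="mean",
    threshold=0.2,                     # flag deviations beyond the threshold
).save()
```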

!!! info "Feature Monitoring Guide"
More information can be found in the [Feature monitoring guide](../../../user_guides/fs/feature_monitoring/index.md).

