Merge branch 'release/v1.0.2'
RobertoChiosa committed Sep 17, 2024
2 parents 42854d7 + 70386b7 commit 848f03b
Showing 14 changed files with 719 additions and 88 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/release.yml
@@ -1,4 +1,4 @@
name: Release
name: Release package

on:
# should work
5 changes: 4 additions & 1 deletion .gitignore
@@ -111,4 +111,7 @@ GitHub.sublime-settings
*.png
.DS_Store
!data.csv
!example.png
!example.png
!cmp.html
!time_window_corrected.csv
!group_cluster.csv
10 changes: 7 additions & 3 deletions Dockerfile
@@ -4,7 +4,7 @@
# If you need more help, visit the Dockerfile reference guide at
# https://docs.docker.com/engine/reference/builder/

ARG PYTHON_VERSION=3.12.3
ARG PYTHON_VERSION=3.11
FROM python:${PYTHON_VERSION}-slim as base

# Prevents Python from writing pyc files.
@@ -36,8 +36,12 @@ RUN adduser \
# Leverage bind mounts to pyproject.toml and poetry.lock to avoid having to
# copy them into this layer.
RUN --mount=type=cache,target=/root/.cache/pip \
--mount=type=bind,source=requirements.txt,target=requirements.txt \
python -m pip install -r requirements.txt
--mount=type=bind,source=pyproject.toml,target=pyproject.toml \
--mount=type=bind,source=poetry.lock,target=poetry.lock \
python -m pip install poetry && \
poetry config virtualenvs.create false && \
poetry install --only main --no-interaction --no-ansi && \
rm -rf $POETRY_CACHE_DIR

# Switch to the non-privileged user to run the application.
USER appuser
2 changes: 2 additions & 0 deletions Makefile
@@ -36,13 +36,15 @@ docker-run:


.PHONY: rm-git-cache
rm-git-cache: ## Remove git cached files
rm-git-cache:
@echo "Removing git cached files"
git add .
git rm -r --cached .
git add .

.PHONY: setup
setup: ## Setup the project
setup:
@if [ ! -d "${VENV}" ]; then \
echo "Creating venv"; \
87 changes: 40 additions & 47 deletions README.md
@@ -1,9 +1,13 @@
# Contextual Matrix Profile Calculation Tool

The Matrix Profile has the potential to revolutionize time series data mining because of its generality, versatility,
simplicity and scalability. In particular it has implications for time series motif discovery, time series joins,
shapelet discovery (classification), density estimation, semantic segmentation, visualization, rule discovery,
clustering etc.
Matrix Profile is an algorithm capable of discovering motifs and discords in time series data. By computing the
(z-normalized) Euclidean distance between each subsequence of a time series and its nearest neighbor, it provides
insights into potential anomalies and repetitive patterns. In the field of building energy management it can be
employed to detect anomalies in electrical load time series.

This tool is a Python implementation of the Matrix Profile algorithm that uses contextual information (such as
external air temperature) to identify abnormal patterns in electrical load subsequences that start within predefined
sub-daily time windows, as shown in the following figure.

![](./docs/example.png)
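
For orientation, the following is a minimal, illustrative sketch of a plain (non-contextual) matrix profile computed
with the Stumpy package listed in the References below. It assumes the bundled sample `data.csv` layout (15-minute
observations, a load column named `column_1`) and is not this tool's own implementation:

```python
# Illustrative only (not this tool's code): plain matrix profile with stumpy.
# Assumes the sample data.csv layout: 15-minute observations, a "timestamp"
# column and an electrical load column named "column_1".
import numpy as np
import pandas as pd
import stumpy

df = pd.read_csv("src/cmp/data/data.csv", parse_dates=["timestamp"], index_col="timestamp")
m = 96  # subsequence length: one day of 15-minute observations

# Column 0 of stumpy.stump holds each subsequence's distance to its nearest
# neighbour; the largest value flags the most anomalous (discord) subsequence.
mp = stumpy.stump(df["column_1"].to_numpy(dtype=float), m)
discord = int(np.argmax(mp[:, 0].astype(float)))
print(f"Most anomalous daily subsequence starts at {df.index[discord]}")
```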

@@ -20,7 +24,7 @@ clustering etc.

## Usage

The tool comes with a cli that helps you to execute the script with the desired commands
The tool comes with a CLI that helps you execute the script with the desired commands:

```console
$ python -m src.cmp.main -h
@@ -39,20 +43,41 @@ options:
The arguments to pass to the script are the following:

* `input_file`: The input dataset via an HTTP URL. The tool should then download the dataset from that URL; since it's a
presigned URL, the tool would not need to deal with authentication—it can just download the dataset directly.
pre-signed URL, the tool would not need to deal with authentication—it can just download the dataset directly.
* `variable_name`: The variable name to be used for the analysis (i.e., the column of the csv that contains the
electrical load under analysis).
* `output_file`: The local path to the output HTML report. The platform would then get that HTML report and upload it to
  the object storage service for the user to review later.

You can run the main script through the console using either local files or data downloaded from an external URL. This
repository comes with a sample dataset (data.csv) that you can use to generate a report and you can pass the local path
repository comes with a sample dataset ([data.csv](./src/cmp/data/data.csv)) that you can use to generate a report and
you can pass the local path
as the `input_file` argument as follows:

### Data format

todo
The tool requires the user to provide a `csv` file containing the electrical power time series of a specific
building, meter, or energy system (e.g., the whole-building electrical power time series). The `csv` uses a wide table
format as follows:

```csv
timestamp,column_1,temp
2019-01-01 00:00:00,116.4,-0.6
2019-01-01 00:15:00,125.6,-0.9
2019-01-01 00:30:00,119.2,-1.2
```

The csv must have the following columns:

- `timestamp` [case sensitive]: The timestamp of the observation in the format `YYYY-MM-DD HH:MM:SS`. This column is
  expected to be a UTC timezone string; it is internally transformed by the tool into the index of the dataframe.
- `temp` [case sensitive]: The external air temperature in Celsius degrees. This column is required to perform
  thermal-sensitive analysis on the electrical load.
- `column_1`: The dataframe may then have `N` arbitrary columns that refer to electrical load time series. The user has
  to specify the name of the column containing the electrical load of interest in the `variable_name` argument.
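
As a minimal sketch (assuming pandas; this is not the tool's own loading code), a file in this format can be loaded
and checked like this:

```python
# Minimal sketch (not the tool's own loader): read a CSV in the format above,
# turning the UTC "timestamp" column into the dataframe index.
import pandas as pd

df = pd.read_csv("data.csv", parse_dates=["timestamp"], index_col="timestamp")

# "column_1" stands in for whatever name is passed as the variable_name argument.
missing = {"temp", "column_1"} - set(df.columns)
if missing:
    raise ValueError(f"Missing required column(s): {missing}")
```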

### Run locally

@@ -62,6 +87,7 @@ Create virtual environment and activate it and install dependencies:
```bash
make setup
```

- Linux:
```bash
python3 -m venv .venv
@@ -134,44 +160,6 @@ Run the docker image with the same arguments as before
At the end of the execution you can find the results in the [`results`](src/cmp/results) folder inside the Docker
container.

## Additional Information

```python
# 2) User Defined Context
# We want to find all the subsequences that start from 00:00 to 02:00 (2 hours)
# and cover the whole day. In order to avoid overlapping we define the window
# length as the whole day of observation minus the context length.
from distancematrix.consumer.contextmanager import GeneralStaticManager  # seriesdistancematrix package (see References); adjust the import path to your installed version

# `data` is the electrical load series at 15-minute resolution, so:
obs_per_hour = 4   # [observations/hour]
obs_per_day = 96   # [observations/day]

# - Beginning of the context 00:00 AM [hours]
context_start = 0
# - End of the context 02:00 AM [hours]
context_end = 2
# - Context time window length 2 [hours]
m_context = context_end - context_start  # 2
# - Time window length [observations]
#   m = 96 [observations] - 4 [observations/hour] * 2 [hours] = 88 [observations] = 22 [hours]
m = obs_per_day - obs_per_hour * m_context  # 88

# Context Definition:
# example FROM 00:00 to 02:00
# - m_context = 2 [hours]
# - obs_per_hour = 4 [observations/hour]
# - context_start = 0 [hours]
# - context_end = context_start + m_context = 0 [hours] + 2 [hours] = 2 [hours]
contexts = GeneralStaticManager([
    range(
        # FROM [observations] = x * 96 [observations] + 0 [hours] * 4 [observations/hour]
        (x * obs_per_day) + context_start * obs_per_hour,
        # TO [observations] = x * 96 [observations] + (0 [hours] + 2 [hours]) * 4 [observations/hour]
        (x * obs_per_day) + (context_start + m_context) * obs_per_hour)
    for x in range(len(data) // obs_per_day)
])
# `contexts` is later consumed by the ContextualMatrixProfile of the seriesdistancematrix package.
```

## Cite

You can cite this work either through [this Bibtex file](./docs/ref.bib) or the
@@ -183,7 +171,12 @@ following plain text citation
## Contributors

- [Roberto Chiosa](https://github.com/RobertoChiosa)
- Author: [Roberto Chiosa](https://github.com/RobertoChiosa)

## References

- Series Distance Matrix repository (https://github.com/predict-idlab/seriesdistancematrix)
- Stumpy Package (https://stumpy.readthedocs.io/en/latest/)

## License

4 changes: 3 additions & 1 deletion RELEASE.md
@@ -1,3 +1,5 @@
### Features:

- Added command line interface for the tool
- Fixed csv path issues
- Documentation updates
- Docker run support
