# PPG-DaLiA Data Processing Pipeline

See our Health Reference Design documentation for more information.

This repository is a reference design for an end-to-end machine learning workflow that uses Edge Impulse to process the PPG-DaLiA dataset. It assumes the data is available and the transformation blocks have been set up. It demonstrates how to:
- Process raw sensor data (accelerometer and PPG) from multiple subjects.
- Extract and attach metadata to each subject's data.
- Combine all processed data into a single dataset suitable for machine learning tasks like heart rate variability (HRV) analysis and activity classification.
This reference design includes:
- DataProcessor: Processes raw data files for each subject.
- MetadataGenerator: Extracts metadata from questionnaire files and attaches it to the data.
- DataCombiner: Combines all processed data into a single dataset.
- Edge Impulse Pipeline: Automates the data processing workflow by chaining the transformation blocks.
## Table of Contents

- Overview
- Prerequisites
- Repository Structure
- Setting Up the Repository
- Transformation Blocks
- Creating the Pipeline in Edge Impulse
- Running the Pipeline
- Using the Combined Dataset
- Contributing
- License
## Overview

The PPG-DaLiA dataset consists of data collected from 15 subjects performing various activities while wearing a wristband equipped with sensors. The dataset includes:
- Accelerometer data (ACC.csv)
- Photoplethysmography (PPG) data (BVP.csv)
- Heart rate data (HR.csv)
- Electrodermal activity (EDA.csv)
- Skin temperature (TEMP.csv)
- Activity labels (S*_activity.csv)
- Subject metadata (S*_quest.csv)
This repository provides a workflow to process this data using Edge Impulse transformation blocks, culminating in a combined dataset ready for machine learning projects.
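PPG-DaLiA was recorded with an Empatica E4 wristband, and the raw CSVs above follow the E4 export convention: the first row holds the session start timestamp (Unix time), the second row the sampling rate in Hz, and the remaining rows the samples (one column per channel). A minimal sketch of a parser for that layout, assuming this convention holds for your files — `read_e4_csv` is an illustrative name, not part of the dataset's tooling:

```python
import csv


def read_e4_csv(path):
    """Parse a CSV in the Empatica E4 export layout (assumed):
    row 1 = session start (Unix timestamp), row 2 = sampling rate (Hz),
    remaining rows = samples, one column per channel."""
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    start = float(rows[0][0])
    rate = float(rows[1][0])
    samples = [[float(v) for v in row] for row in rows[2:]]
    return start, rate, samples
```

For ACC.csv each sample row has three columns (x, y, z); BVP.csv, HR.csv, EDA.csv, and TEMP.csv are single-channel.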
## Prerequisites

- Edge Impulse Account: You need an Edge Impulse account with access to create custom transformation blocks and pipelines.
- Edge Impulse CLI: Install the Edge Impulse CLI (`edge-impulse-cli`) version 1.21.1 or higher.
- Python: Python 3.7 or higher.
- Docker: Required for building and pushing transformation blocks.
- Git: For version control and repository management.
## Repository Structure

```
health-reference-design-public-data/
├── DataProcessor/
│   ├── transform.py
│   ├── parameters.json
│   ├── requirements.txt
│   └── Dockerfile
├── MetadataGenerator/
│   ├── transform.py
│   ├── parameters.json
│   ├── requirements.txt
│   └── Dockerfile
├── DataCombiner/
│   ├── transform.py
│   ├── parameters.json
│   ├── requirements.txt
│   └── Dockerfile
├── README.md
└── LICENSE
```
- DataProcessor/: Contains the transformation block for processing raw data.
- MetadataGenerator/: Contains the transformation block for extracting and attaching metadata.
- DataCombiner/: Contains the transformation block for combining all processed data.
- README.md: Documentation and instructions.
- LICENSE: License information.
## Setting Up the Repository

Clone the repository, then change into its directory:

```bash
cd health-reference-design-public-data
```
## Transformation Blocks

The repository contains separate folders for each transformation block. You'll need to set up each one individually.

### DataProcessor

Processes raw sensor data for each subject.
Files:

- `transform.py`: Script to process raw data files.
- `parameters.json`: Defines parameters for the transformation block.
- `requirements.txt`: Python dependencies.
- `Dockerfile`: Docker configuration for the block.
Steps:

1. Navigate to the DataProcessor directory:

   ```bash
   cd DataProcessor
   ```

2. Initialize the transformation block:

   ```bash
   edge-impulse-blocks init --clean
   ```

   - Select Transformation block when prompted.
   - Provide a name and description.

3. Push the block to Edge Impulse:

   ```bash
   edge-impulse-blocks push
   ```

4. After pushing, return to the main directory:

   ```bash
   cd ..
   ```
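When a transformation job runs, Edge Impulse passes the block's parameters (such as `in-directory` from `parameters.json`) to `transform.py` as command-line flags. A sketch of the entry-point shape such a script typically takes — the `list_expected_files` helper is illustrative, not the repository's actual processing logic:

```python
import argparse
import os


def parse_args(argv):
    # Edge Impulse passes parameters from parameters.json as CLI flags,
    # e.g. the "in-directory" parameter arrives as --in-directory.
    parser = argparse.ArgumentParser(description="DataProcessor sketch")
    parser.add_argument("--in-directory", required=True)
    parser.add_argument("--out-directory", required=True)
    return parser.parse_args(argv)


def list_expected_files(in_dir):
    # Report which of the subject's raw sensor files are present,
    # a sanity check before processing them.
    expected = ["ACC.csv", "BVP.csv", "HR.csv", "EDA.csv", "TEMP.csv"]
    return [name for name in expected
            if os.path.exists(os.path.join(in_dir, name))]
```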
### MetadataGenerator

Extracts metadata from questionnaire files and attaches it to the data.
Files:

- `transform.py`
- `parameters.json`
- `requirements.txt`
- `Dockerfile`
Steps:

1. Navigate to the MetadataGenerator directory:

   ```bash
   cd MetadataGenerator
   ```

2. Initialize the transformation block:

   ```bash
   edge-impulse-blocks init --clean
   ```

   - Select Transformation block when prompted.
   - Provide a name and description.

3. Push the block to Edge Impulse:

   ```bash
   edge-impulse-blocks push
   ```

4. Return to the main directory:

   ```bash
   cd ..
   ```
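The questionnaire files (S*_quest.csv) hold per-subject attributes that this block turns into metadata. A sketch of how that could look — the `# KEY value` line layout and the key names are assumptions about the file format, and the `ei-metadata.json` payload shape follows the Edge Impulse convention for updating a data item's metadata from a transformation block:

```python
import json


def parse_quest(text):
    """Parse '# KEY value' lines into a metadata dict.
    The exact keys (SUBJECT_ID, AGE, ...) depend on the real
    S*_quest.csv files -- treat these as placeholders."""
    meta = {}
    for line in text.splitlines():
        line = line.strip().lstrip("#").strip()
        if not line:
            continue
        parts = line.split(None, 1)
        if len(parts) == 2:
            meta[parts[0]] = parts[1]
    return meta


def to_ei_metadata(meta):
    # A transformation block can write this JSON to ei-metadata.json in
    # the data item's directory to attach the metadata.
    return json.dumps({"version": 1, "action": "add", "metadata": meta})
```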
### DataCombiner

Combines all processed data into a single dataset.
Files:

- `transform.py`
- `parameters.json`
- `requirements.txt`
- `Dockerfile`
Steps:

1. Navigate to the DataCombiner directory:

   ```bash
   cd DataCombiner
   ```

2. Initialize the transformation block:

   ```bash
   edge-impulse-blocks init --clean
   ```

   - Select Transformation block when prompted.
   - Provide a name and description.

3. Push the block to Edge Impulse:

   ```bash
   edge-impulse-blocks push
   ```

4. Return to the main directory:

   ```bash
   cd ..
   ```
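The combining step itself is conceptually simple: concatenate every subject's processed rows into one table, keeping track of which subject each row came from. A stdlib sketch of that core logic — the real block would write the result out as a parquet file (e.g. with pandas' `DataFrame.to_parquet`), and the column names here are illustrative:

```python
def combine_subjects(tables):
    """Concatenate per-subject tables (subject id -> list of row dicts)
    into one flat table, tagging each row with its subject id."""
    combined = []
    for subject in sorted(tables):
        for row in tables[subject]:
            combined.append({"subject": subject, **row})
    return combined
```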
## Creating the Pipeline in Edge Impulse

Now that all transformation blocks are pushed to Edge Impulse, you can create a pipeline to chain them together.
Steps:

1. In Edge Impulse Studio, navigate to your organization.

2. Go to Data > Pipelines.

3. Click + Add a new pipeline and fill in the details:
   - Name: PPG-DaLiA Data Processing Pipeline
   - Description: Processes PPG-DaLiA data from raw files to a combined dataset

4. Add the first step:
   - Transformation Block: DataProcessor
   - Filter: `name LIKE '%S%_E4%'` (selects subjects S1_E4 to S15_E4)
   - Input Dataset: `raw-dataset` (replace with your dataset name)
   - Output Dataset: `processed-dataset`
   - Parameters:

     ```json
     { "in-directory": "." }
     ```

5. Add the second step:
   - Transformation Block: MetadataGenerator
   - Filter: same as the first step
   - Input Dataset: `processed-dataset`
   - Output Dataset: `processed-dataset` (update in place)
   - Parameters:

     ```json
     { "in-directory": "." }
     ```

6. Add the third step:
   - Transformation Block: DataCombiner
   - Filter: `name LIKE '%'` (selects all data items)
   - Input Dataset: `processed-dataset`
   - Output Dataset: `combined-dataset`
   - Parameters:

     ```json
     { "dataset-name": "ppg_dalia_combined.parquet" }
     ```

7. Save the pipeline.
## Running the Pipeline

- In the pipeline list, click on the ⋮ (ellipsis) next to your pipeline.
- Select Run pipeline now.
- Check the pipeline logs to ensure each step runs successfully.
- Address any errors that may occur.
- After completion, verify that the datasets (`processed-dataset` and `combined-dataset`) have been created and populated.
## Using the Combined Dataset

With the combined dataset (`ppg_dalia_combined.parquet`), you can:
- Create a new project in Edge Impulse.
- Use the Data Acquisition tab to upload the combined dataset.
- Ensure data is correctly labeled and metadata is intact.
- Build models for HRV analysis and activity classification.
- Utilize Edge Impulse's tools for data exploration, model training, and evaluation.
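As a quick first step before training, it helps to sanity-check the combined file. A minimal pandas sketch — it assumes the combined parquet contains an `activity` column, which may be named differently in the actual output:

```python
import pandas as pd


def summarize_by_activity(df):
    """Count rows per activity label -- a quick sanity check on the
    combined dataset before model training."""
    return df.groupby("activity").size()


# Typical usage once the pipeline has produced the file:
# df = pd.read_parquet("ppg_dalia_combined.parquet")
# print(summarize_by_activity(df))
```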