Issue 8: Define & Prepare Inputs for Simulation/Optimization
Description:
Corresponds to FR-2: Input data preparation & Feature definition.
FR-2.1: The system must retrieve historical database information that matches the simulation and optimization time windows.
FR-2.2: The system must define or calculate the key input features required by the simulation models or optimization objective functions.
Tasks:
Task 1: Implement logic to query and retrieve relevant time-windowed data (temp, humidity, CO2, etc.) from TimescaleDB (Ref: FR-2.1). (Basic retrieval is implemented in DataIngestion/simulation_data_prep using asyncpg with SELECT *. Note: column selection is not parameterized; see Extension Task 1.1.)
Task 2: Define the specific list of derived input features needed for the chosen simulation model (Issue 9) and objective functions (Issue 10) (e.g., PAR sums, energy cost estimates).
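Defined Features (Based on Discussion - Apr 16, 2025):
1. temp_delta_in_out: sensor_temperature_ins - sensor_temperature_outs
2. VPD_ins: SVP(sensor_temperature_ins) * (1 - (sensor_humidity_ins / 100)) (Requires SVP function)
3. humidity_delta_in_out: sensor_humidity_ins - sensor_humidity_outs
4. temp_rate_of_change: Rate of change of sensor_temperature_ins (e.g., difference from previous time step / time_step_duration)
5. humidity_rate_of_change: Rate of change of sensor_humidity_ins
6. temp_rolling_avg_Xmin: Rolling average of sensor_temperature_ins over X minutes (Need to define X)
7. humidity_rolling_avg_Xmin: Rolling average of sensor_humidity_ins over X minutes (Need to define X)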
For Objective Functions:
8. total_heating_energy: Sum/Integral of (actuator_heating_power * time_step_duration)
9. total_ventilation_energy: Calculation TBD (Depends on how actuator_ventilation_pos relates to energy use, if any)
10. total_lighting_energy: Sum/Integral of (actuator_lighting_power or equivalent * time_step_duration)
11. total_light_par: Sum/Integral of (sensor_light_par * time_step_duration)
12. mean_abs_deviation_temp_from_setpoint: mean(abs(sensor_temperature_ins - temp_setpoint))
13. mean_abs_deviation_humidity_from_setpoint: mean(abs(sensor_humidity_ins - humidity_setpoint))
14. mean_abs_deviation_light_from_setpoint: mean(abs(sensor_light_par - light_setpoint)) (If applicable - might be more about total light)
15. time_in_optimal_temp_range: Percentage of time sensor_temperature_ins is within a defined optimal band.
16. time_in_optimal_humidity_range: Percentage of time sensor_humidity_ins is within a defined optimal band.
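The objective-function features above reduce to three small aggregations: a sum of value * time_step_duration, a mean absolute deviation from a setpoint, and a percentage of samples inside a band. The following is a minimal sketch in plain Python, assuming a uniform time step and no missing samples. The function names and the sample values are hypothetical, and the project's own implementation in flow.py may look different.

```python
from typing import Sequence


def integrate_over_time(values: Sequence[float], step_seconds: float) -> float:
    """Rectangle-rule integral: sum(value * time_step_duration)."""
    return sum(value * step_seconds for value in values)


def mean_abs_deviation(values: Sequence[float], setpoint: float) -> float:
    """mean(abs(value - setpoint)) over the window."""
    return sum(abs(value - setpoint) for value in values) / len(values)


def percent_time_in_range(values: Sequence[float], low: float, high: float) -> float:
    """Percentage of samples inside the inclusive band [low, high]."""
    inside = sum(1 for value in values if low <= value <= high)
    return 100.0 * inside / len(values)


# Hypothetical 5-minute samples (300 s per step); power in W, temperature in deg C.
heating_power_w = [1000.0, 1500.0, 0.0, 500.0]
temperature_c = [19.0, 21.0, 23.0, 25.0]

# W * s = J, so the result is in joules.
total_heating_energy = integrate_over_time(heating_power_w, step_seconds=300.0)
mean_abs_deviation_temp_from_setpoint = mean_abs_deviation(temperature_c, setpoint=22.0)
time_in_optimal_temp_range = percent_time_in_range(temperature_c, low=20.0, high=24.0)
```

The same helpers would cover total_lighting_energy, total_light_par, and the humidity variants by swapping the input column. With irregular time steps, the fixed step_seconds would need to be replaced by per-sample durations.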
Notes:
Need to find/implement a standard function for SVP(temperature) (e.g., Magnus formula). (Done in flow.py using Buck's equation)
Need to decide on the time window X for rolling averages. (Configurable in plant_config.json, implementation for rolling std exists in flow.py)
Need to clarify calculation for total_ventilation_energy. (Pending)
Setpoints (temp_setpoint, humidity_setpoint, light_setpoint) need to be defined or provided. (Defined in plant_config.json for optimal ranges)
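For reference, the per-timestep features touched on in these notes (SVP, VPD_ins, rate of change, rolling average) can be sketched as below. This is an illustration, not the code in flow.py: the Buck constants are the commonly cited 1996 values for vapour pressure over water, which is an assumption about the variant used there, and all function names are hypothetical. The rolling window is expressed in samples, so X minutes translates to X * 60 / time_step_duration.

```python
import math
from typing import List, Optional, Sequence


def svp_buck_kpa(temp_c: float) -> float:
    """Saturation vapour pressure over water in kPa (Buck equation, 1996 constants)."""
    return 0.61121 * math.exp((18.678 - temp_c / 234.5) * (temp_c / (257.14 + temp_c)))


def vpd_kpa(temp_c: float, rel_humidity_pct: float) -> float:
    """VPD_ins = SVP(temperature) * (1 - humidity / 100), in kPa."""
    return svp_buck_kpa(temp_c) * (1.0 - rel_humidity_pct / 100.0)


def rate_of_change(values: Sequence[float], step_seconds: float) -> List[Optional[float]]:
    """Difference from the previous time step divided by the step duration."""
    out: List[Optional[float]] = [None]  # the first sample has no predecessor
    for previous, current in zip(values, values[1:]):
        out.append((current - previous) / step_seconds)
    return out


def rolling_mean(values: Sequence[float], window: int) -> List[Optional[float]]:
    """Trailing mean over `window` samples; None until the window is full."""
    out: List[Optional[float]] = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            chunk = values[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out


# Hypothetical example: a 10-minute window over 5-minute samples is 2 samples wide.
temp_rolling_avg = rolling_mean([20.0, 21.0, 22.0, 23.0], window=2)
temp_rate = rate_of_change([20.0, 21.0, 20.5], step_seconds=300.0)
```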
Task 3: Implement calculations for any derived features based on the raw ingested data or external data (e.g., price forecasts, if used). (Implemented in DataIngestion/simulation_data_prep/src/flow.py: VPD, DLI, GDD (daily only), DIF, CO2 Diff, Actuator Summaries, Rolling Std Dev, distance-from-optimal-midpoint, in-optimal-range-flag. Note: Cross-day cumulative GDD and the Kalanchoe night-stress flag are pending; see Issue #16 (feat(DataPrep): Implement Cross-Day Cumulative GDD Calculation) and Issue #15 (feat(DataPrep): Implement 'distance/in-range' and 'night-stress' Polars Features), respectively. Other features from the Task 2 list, such as rate-of-change, deltas, and energy integrals, are not explicitly implemented.)
Task 4: Document the defined features and their calculation methods. (Docstrings added to Python modules; the config file provides parameters. Further dedicated documentation in Doc-templates/ may be needed.)
Acceptance Criteria:
Database queries retrieve data for the specified time windows. (Verified, uses SELECT *, requires correct window in Prefect flow parameters)
Handles database connection errors or missing data segments gracefully during data retrieval (Ref: NFR-5.1). (Basic error handling implemented in flow.py)
Potential Extension Tasks (Data Retrieval & DataFrame Creation Focus)
These tasks aim to enhance the robustness, configurability, efficiency, and maintainability of the data retrieval phase implemented in DataIngestion/simulation_data_prep.
1. Configuration & Flexibility:
Task 1.1: Parameterize Column Selection: Configure required raw columns via environment variable or config file instead of hardcoding (SELECT * currently used in flow.py).
Task 1.2: Enhance Timezone Handling: Explicitly configure and handle timezones for start/end times (currently uses naive datetimes based on UTC date).
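One possible shape for Tasks 1.1 and 1.2 is a small query builder that takes the column list from configuration, checks it against a whitelist, and refuses naive datetimes. The sketch below is only an illustration: the table name, time column, whitelist contents, and function names are all hypothetical and would need to match the actual TimescaleDB schema. It uses asyncpg's $1/$2 positional placeholders but does not import asyncpg itself.

```python
from datetime import datetime, timezone
from typing import List, Sequence, Tuple

# Hypothetical names: the real table and columns come from the TimescaleDB schema.
TABLE = "sensor_data"
TIME_COLUMN = "time"
ALLOWED_COLUMNS = frozenset(
    {"time", "sensor_temperature_ins", "sensor_humidity_ins", "sensor_light_par"}
)


def parse_column_config(raw: str) -> List[str]:
    """Parse a comma-separated column list, e.g. read from an env var or config file."""
    return [name.strip() for name in raw.split(",") if name.strip()]


def build_window_query(
    columns: Sequence[str], start: datetime, end: datetime
) -> Tuple[str, Tuple[datetime, datetime]]:
    """Return (sql, params) for a half-open window [start, end) in UTC."""
    unknown = sorted(set(columns) - ALLOWED_COLUMNS)
    if not columns or unknown:
        raise ValueError(f"empty or unknown column selection: {unknown}")
    for bound in (start, end):
        if bound.utcoffset() is None:  # naive datetime: timezone is ambiguous
            raise ValueError("start and end must be timezone-aware datetimes")
    if start >= end:
        raise ValueError("start must be before end")
    # Identifiers are whitelisted above, so quoting them into the SQL is safe here.
    column_sql = ", ".join(f'"{name}"' for name in columns)
    sql = (
        f'SELECT {column_sql} FROM "{TABLE}" '
        f'WHERE "{TIME_COLUMN}" >= $1 AND "{TIME_COLUMN}" < $2 '
        f'ORDER BY "{TIME_COLUMN}"'
    )
    return sql, (start.astimezone(timezone.utc), end.astimezone(timezone.utc))


# With an open asyncpg connection this would be used roughly as:
#     rows = await conn.fetch(sql, *params)
```

Whitelisting matters because column names cannot be passed as query parameters; only the window bounds go through $1 and $2.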
2. Data Validation & Quality:
Task 2.1: Implement Post-Retrieval Schema Validation: Check retrieved DataFrame for expected columns and types (beyond basic column presence).
Task 2.2: Implement Basic Data Content Validation: Check for excessive nulls, timestamp monotonicity, etc., on the raw retrieved data.
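Tasks 2.1 and 2.2 could start from a single function that returns a list of problems instead of raising, so the flow can decide whether to warn or fail. The sketch below works on rows as plain dicts (asyncpg records can be converted with dict(record)). The expected schema, the 20% null threshold, and the function name are hypothetical placeholders.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, List, Mapping, Sequence

# Hypothetical expected schema; the real one depends on the configured columns.
EXPECTED_SCHEMA = {"time": datetime, "sensor_temperature_ins": float}


def validate_rows(
    rows: Sequence[Mapping[str, Any]],
    schema: Mapping[str, type] = EXPECTED_SCHEMA,
    time_col: str = "time",
    max_null_fraction: float = 0.2,
) -> List[str]:
    """Return human-readable problems; an empty list means the data passed."""
    if not rows:
        return ["no rows retrieved"]
    problems: List[str] = []
    for col, expected_type in schema.items():
        if any(col not in row for row in rows):
            problems.append(f"missing column: {col}")
            continue
        values = [row[col] for row in rows]
        null_count = sum(value is None for value in values)
        if null_count / len(values) > max_null_fraction:
            problems.append(f"too many nulls in {col}: {null_count}/{len(values)}")
        if any(value is not None and not isinstance(value, expected_type) for value in values):
            problems.append(f"unexpected type in {col}")
    # Monotonicity is only checked when every timestamp is a usable datetime.
    times = [row.get(time_col) for row in rows]
    if all(isinstance(t, datetime) for t in times):
        if any(later <= earlier for earlier, later in zip(times, times[1:])):
            problems.append("timestamps are not strictly increasing")
    return problems


# Hypothetical sample: four 5-minute readings.
t0 = datetime(2025, 4, 16, tzinfo=timezone.utc)
sample_rows = [
    {"time": t0 + timedelta(minutes=5 * i), "sensor_temperature_ins": 20.0 + i}
    for i in range(4)
]
sample_problems = validate_rows(sample_rows)
```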
3. Robustness & Error Handling:
Task 3.1: Refine Database Error Handling: Catch and handle specific asyncpg exceptions in flow.py.
Task 3.2: Handle Missing Configured Columns Gracefully: Define strategy (warn/error) if a configured column isn't found in DB. (Handled in transform_features task for required transformation columns)
4. Performance & Scalability:
Task 4.1: Implement Chunked Data Retrieval: Optionally fetch data in chunks using asyncpg cursors or other methods.
Task 4.2: Optimize DataFrame Memory Usage: Convert columns to more memory-efficient types (float32, int16, etc.) after retrieval or during processing.
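Besides asyncpg cursors, one of the "other methods" for Task 4.1 is to split the requested window into smaller half-open sub-windows and run the existing query once per sub-window. A minimal sketch of that splitting step follows; the function name and the one-day chunk size are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterator, Tuple


def iter_time_chunks(
    start: datetime, end: datetime, chunk: timedelta
) -> Iterator[Tuple[datetime, datetime]]:
    """Yield contiguous half-open windows [a, b) that together cover [start, end)."""
    if chunk <= timedelta(0):
        raise ValueError("chunk must be a positive duration")
    cursor = start
    while cursor < end:
        upper = min(cursor + chunk, end)  # the last chunk may be shorter
        yield cursor, upper
        cursor = upper


# Hypothetical example: a 2.5-day window fetched one day at a time.
window_start = datetime(2025, 4, 16, tzinfo=timezone.utc)
window_end = window_start + timedelta(days=2, hours=12)
chunks = list(iter_time_chunks(window_start, window_end, timedelta(days=1)))
```

Because each window excludes its upper bound, a row that falls exactly on a chunk boundary is fetched only once.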
5. Testing:
Task 5.1: Develop Unit Tests for Retrieval Logic: Mock DB connection/cursor to test SensorRepository.get_sensor_data.
Task 5.2: Develop Unit Tests for Validation Logic: Test schema/content validation functions (Tasks 2.1, 2.2) if implemented.
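For Task 5.1, the standard library's unittest.mock.AsyncMock is enough to stand in for an asyncpg connection. The repository class below is a hypothetical stand-in written only so the example is self-contained: the real SensorRepository.get_sensor_data may have a different signature and query, and the tests would need to be adapted to it.

```python
import asyncio
from datetime import datetime, timezone
from unittest.mock import AsyncMock


class SensorRepository:
    """Hypothetical stand-in for the project's repository class."""

    def __init__(self, conn):
        self._conn = conn

    async def get_sensor_data(self, start: datetime, end: datetime):
        rows = await self._conn.fetch(
            "SELECT * FROM sensor_data WHERE time >= $1 AND time < $2", start, end
        )
        return [dict(row) for row in rows]


START = datetime(2025, 4, 16, tzinfo=timezone.utc)
END = datetime(2025, 4, 17, tzinfo=timezone.utc)


def test_get_sensor_data_passes_window_to_fetch():
    conn = AsyncMock()  # conn.fetch becomes an awaitable mock automatically
    conn.fetch.return_value = [{"time": START, "sensor_temperature_ins": 20.5}]

    result = asyncio.run(SensorRepository(conn).get_sensor_data(START, END))

    assert result == [{"time": START, "sensor_temperature_ins": 20.5}]
    conn.fetch.assert_awaited_once()
    # Positional args after the SQL string must be the window bounds.
    assert conn.fetch.await_args.args[1:] == (START, END)


def test_get_sensor_data_returns_empty_list_for_empty_window():
    conn = AsyncMock()
    conn.fetch.return_value = []

    assert asyncio.run(SensorRepository(conn).get_sensor_data(START, END)) == []
```

Both functions follow the pytest naming convention, so they would be collected as-is if placed in a test module.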
6. Maintainability & Documentation:
Task 6.1: Refine Logging: Enhance logging messages for clarity and diagnostics. (Improved in flow.py and other modules)
Task 6.2: Add/Update Docstrings and Type Hints: Ensure comprehensive documentation within the code. (Completed for core modules)