Define & Prepare Inputs for Simulation/Optimization #8

@Fnux8890

Issue 8: Define & Prepare Inputs for Simulation/Optimization

Description:
Corresponds to FR-2: Input data preparation & Feature definition.

  • FR-2.1: The system must retrieve historical data from the database covering the simulation and optimization time windows.
  • FR-2.2: The system must define or calculate the key input features required by the simulation models and the optimization objective functions.

Tasks:

  • Task 1: Implement logic to query and retrieve relevant time-windowed data (temp, humidity, CO2, etc.) from TimescaleDB (Ref: FR-2.1). (Basic retrieval is implemented in DataIngestion/simulation_data_prep using asyncpg with SELECT *. Column selection is not yet parameterized; see Extension Task 1.1.)
  • Task 2: Define the specific list of derived input features needed for the chosen simulation model (Issue 9) and objective functions (Issue 10) (e.g., PAR sums, energy cost estimates).
    • Defined Features (Based on Discussion - Apr 16, 2025):
      • For Simulation Model:
        1. temp_delta_in_out: sensor_temperature_ins - sensor_temperature_outs
        2. VPD_ins: SVP(sensor_temperature_ins) * (1 - (sensor_humidity_ins / 100)) (Requires SVP function)
        3. humidity_delta_in_out: sensor_humidity_ins - sensor_humidity_outs
        4. temp_rate_of_change: Rate of change of sensor_temperature_ins (e.g., difference from previous time step / time_step_duration)
        5. humidity_rate_of_change: Rate of change of sensor_humidity_ins
        6. temp_rolling_avg_Xmin: Rolling average of sensor_temperature_ins over X minutes (Need to define X)
        7. humidity_rolling_avg_Xmin: Rolling average of sensor_humidity_ins over X minutes (Need to define X)
      • For Objective Functions:
        8. total_heating_energy: Sum/Integral of (actuator_heating_power * time_step_duration)
        9. total_ventilation_energy: Calculation TBD (Depends on how actuator_ventilation_pos relates to energy use, if any)
        10. total_lighting_energy: Sum/Integral of (actuator_lighting_power or equivalent * time_step_duration)
        11. total_light_par: Sum/Integral of (sensor_light_par * time_step_duration)
        12. mean_abs_deviation_temp_from_setpoint: mean(abs(sensor_temperature_ins - temp_setpoint))
        13. mean_abs_deviation_humidity_from_setpoint: mean(abs(sensor_humidity_ins - humidity_setpoint))
        14. mean_abs_deviation_light_from_setpoint: mean(abs(sensor_light_par - light_setpoint)) (If applicable - might be more about total light)
        15. time_in_optimal_temp_range: Percentage of time sensor_temperature_ins is within a defined optimal band.
        16. time_in_optimal_humidity_range: Percentage of time sensor_humidity_ins is within a defined optimal band.
      • Notes:
        • Need to find/implement a standard function for SVP(temperature) (e.g., Magnus formula). (Done in flow.py using the Buck equation.)
        • Need to decide on the time window X for rolling averages. (X is configurable in plant_config.json; an implementation for rolling standard deviation exists in flow.py.)
        • Need to clarify the calculation for total_ventilation_energy. (Pending)
        • Setpoints (temp_setpoint, humidity_setpoint, light_setpoint) need to be defined or provided. (Optimal ranges are defined in plant_config.json.)
  • Task 3: Implement calculations for any derived features based on the raw ingested data or external data (e.g., price forecasts, if used).
    • Implemented in DataIngestion/simulation_data_prep/src/flow.py: VPD, DLI, GDD (daily only), DIF, CO2 difference, actuator summaries, rolling standard deviation, distance from optimal midpoint, and an in-optimal-range flag.
    • Pending: cross-day cumulative GDD (see issue #16, "feat(DataPrep): Implement Cross-Day Cumulative GDD Calculation") and the Kalanchoe night-stress flag (see issue #15, "feat(DataPrep): Implement 'distance/in-range' and 'night-stress' Polars Features").
    • Other features from the Task 2 list, such as rate of change, deltas, and energy integrals, are not explicitly implemented.
  • Task 4: Document the defined features and their calculation methods. (Docstrings have been added to the Python modules, and the config file provides the parameters. Further dedicated documentation in Doc-templates/ may be needed.)
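
Several of the features defined under Task 2 reduce to a few lines of arithmetic. The sketch below is library-agnostic plain Python, not the Polars code in flow.py (which is not shown in this issue). It assumes temperatures in °C, relative humidity in percent, power in kW, and time-sorted datetime timestamps. The Buck coefficients are the commonly quoted ones for vapour pressure over liquid water, and every function name is hypothetical.

```python
import math
from datetime import datetime, timedelta


def svp_kpa(temp_c):
    """Saturation vapour pressure over liquid water in kPa (Buck equation)."""
    return 0.61121 * math.exp((18.678 - temp_c / 234.5) * (temp_c / (257.14 + temp_c)))


def vpd_kpa(temp_c, rh_pct):
    """VPD_ins = SVP(T) * (1 - RH / 100)."""
    return svp_kpa(temp_c) * (1.0 - rh_pct / 100.0)


def rate_of_change(times, values):
    """Change per second relative to the previous sample (None for the first)."""
    if not values:
        return []
    out = [None]
    for i in range(1, len(values)):
        dt = (times[i] - times[i - 1]).total_seconds()
        out.append((values[i] - values[i - 1]) / dt if dt > 0 else None)
    return out


def rolling_mean(times, values, window):
    """Trailing mean over the window (t - window, t]; window must be positive."""
    out, start = [], 0
    for i, t in enumerate(times):
        while times[start] <= t - window:
            start += 1
        chunk = values[start:i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def energy_kwh(times, power_kw):
    """Left-rectangle sum of power * time_step_duration, in kWh."""
    total = 0.0
    for i in range(len(times) - 1):
        hours = (times[i + 1] - times[i]).total_seconds() / 3600.0
        total += power_kw[i] * hours
    return total


def mean_abs_deviation(values, setpoint):
    """mean(abs(value - setpoint))."""
    return sum(abs(v - setpoint) for v in values) / len(values)


def pct_in_range(values, low, high):
    """Share of samples inside [low, high]; equals share of time only for uniform sampling."""
    return 100.0 * sum(low <= v <= high for v in values) / len(values)


# Illustrative data: four samples at 5-minute spacing.
t0 = datetime(2025, 4, 16, 12, 0)
times = [t0 + timedelta(minutes=5 * i) for i in range(4)]
temp_ins = [20.0, 21.0, 23.0, 22.0]
heating_kw = [12.0, 12.0, 0.0, 0.0]

print(vpd_kpa(25.0, 50.0))
print(rate_of_change(times, temp_ins))
print(rolling_mean(times, temp_ins, timedelta(minutes=10)))
print(energy_kwh(times, heating_kw))
```

The deltas (temp_delta_in_out, humidity_delta_in_out) are plain column subtractions and are omitted. total_ventilation_energy is left out on purpose, since its calculation is still pending.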

Acceptance Criteria:


Potential Extension Tasks (Data Retrieval & DataFrame Creation Focus)

These tasks aim to enhance the robustness, configurability, efficiency, and maintainability of the data retrieval phase implemented in DataIngestion/simulation_data_prep.

1. Configuration & Flexibility:

  • Task 1.1: Parameterize Column Selection: Configure the required raw columns via an environment variable or config file instead of relying on the SELECT * currently used in flow.py.
  • Task 1.2: Enhance Timezone Handling: Explicitly configure and handle timezones for start/end times (the current code uses naive datetimes based on the UTC date).
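
One possible shape for Tasks 1.1 and 1.2 is sketched below, using only the standard library. The environment variable SIM_PREP_COLUMNS, the table name sensor_data, and the time column name time are all assumptions made for illustration; none of them is taken from flow.py. The query uses $1 and $2 placeholders, which is the parameter style asyncpg passes through to PostgreSQL. Column names cannot be bound as parameters, so they are checked against an identifier pattern before being placed in the SQL text.

```python
import os
import re
from datetime import datetime, timedelta, timezone

_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def parse_columns(raw):
    """Split a comma-separated column list and reject anything that is not a plain identifier."""
    cols = [c.strip() for c in raw.split(",") if c.strip()]
    bad = [c for c in cols if not _IDENT.match(c)]
    if bad:
        raise ValueError(f"invalid column names: {bad}")
    return cols


def build_query(columns, table="sensor_data", time_col="time"):
    """Build a time-windowed SELECT with explicit columns (table and time_col are hypothetical)."""
    if not _IDENT.match(table) or not _IDENT.match(time_col):
        raise ValueError("invalid table or time column name")
    selected = [time_col] + [c for c in columns if c != time_col]
    col_sql = ", ".join(f'"{c}"' for c in selected)
    return (
        f'SELECT {col_sql} FROM "{table}" '
        f'WHERE "{time_col}" >= $1 AND "{time_col}" < $2 ORDER BY "{time_col}"'
    )


def to_utc(dt, assume_tz):
    """Attach assume_tz to a naive datetime, then convert to UTC."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=assume_tz)
    return dt.astimezone(timezone.utc)


# SIM_PREP_COLUMNS is a hypothetical variable name; the default lists two
# columns that appear in the feature definitions of this issue.
raw = os.environ.get("SIM_PREP_COLUMNS", "sensor_temperature_ins,sensor_humidity_ins")
query = build_query(parse_columns(raw))

# A fixed +02:00 offset stands in for the site's timezone here.
site_tz = timezone(timedelta(hours=2))
start_utc = to_utc(datetime(2025, 4, 16, 0, 0), site_tz)
print(query)
print(start_utc.isoformat())
```

A zoneinfo.ZoneInfo instance can be passed as assume_tz in place of the fixed offset when daylight saving transitions matter.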

2. Data Validation & Quality:

  • Task 2.1: Implement Post-Retrieval Schema Validation: Check retrieved DataFrame for expected columns and types (beyond basic column presence).
  • Task 2.2: Implement Basic Data Content Validation: Check for excessive nulls, timestamp monotonicity, etc., on the raw retrieved data.
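
The checks in Tasks 2.1 and 2.2 could look roughly like the following. This is a minimal sketch over a list of row dictionaries rather than a DataFrame, so that it runs without extra dependencies. The function names, the 20% null threshold, and the sample rows are placeholders, and the time column is assumed to contain no nulls.

```python
from datetime import datetime, timedelta


def check_schema(rows, expected_types):
    """Return a list of problems: missing columns or values of an unexpected type."""
    if not rows:
        return ["no rows retrieved"]
    problems = []
    for col, typ in expected_types.items():
        if col not in rows[0]:
            problems.append(f"missing column: {col}")
            continue
        bad = sum(1 for r in rows if r.get(col) is not None and not isinstance(r.get(col), typ))
        if bad:
            problems.append(f"{col}: {bad} value(s) with unexpected type")
    return problems


def check_content(rows, time_col, value_cols, max_null_fraction=0.2):
    """Return a list of problems: non-increasing timestamps or too many nulls."""
    problems = []
    times = [r[time_col] for r in rows]
    if any(later <= earlier for earlier, later in zip(times, times[1:])):
        problems.append(f"{time_col} is not strictly increasing")
    for col in value_cols:
        nulls = sum(1 for r in rows if r.get(col) is None)
        if rows and nulls / len(rows) > max_null_fraction:
            problems.append(f"{col}: {nulls}/{len(rows)} nulls")
    return problems


# Illustrative rows: one repeated timestamp and two missing humidity readings.
t0 = datetime(2025, 4, 16, 12, 0)
rows = [
    {"time": t0, "sensor_temperature_ins": 20.0, "sensor_humidity_ins": 60.0},
    {"time": t0 + timedelta(minutes=5), "sensor_temperature_ins": 21.0, "sensor_humidity_ins": None},
    {"time": t0 + timedelta(minutes=5), "sensor_temperature_ins": 21.5, "sensor_humidity_ins": None},
    {"time": t0 + timedelta(minutes=10), "sensor_temperature_ins": 22.0, "sensor_humidity_ins": 58.0},
]
expected = {"time": datetime, "sensor_temperature_ins": float, "sensor_light_par": float}

print(check_schema(rows, expected))
print(check_content(rows, "time", ["sensor_temperature_ins", "sensor_humidity_ins"]))
```

Returning a list of problems instead of raising on the first one lets the caller decide whether to warn or abort, and reports every issue in a single run.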

3. Robustness & Error Handling:

  • Task 3.1: Refine Database Error Handling: Catch and handle specific asyncpg exceptions in flow.py.
  • Task 3.2: Handle Missing Configured Columns Gracefully: Define a strategy (warn or error) for when a configured column isn't found in the DB. (Already handled in the transform_features task for the columns its transformations require.)
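
A possible pattern for both tasks is sketched below. Which asyncpg exception classes count as transient is not decided in this issue, so the retry helper takes the exception classes as an argument, and the built-in ConnectionError stands in for them in the example. The helper names, logger name, and retry defaults are hypothetical.

```python
import asyncio
import logging

log = logging.getLogger("simulation_data_prep")


def resolve_columns(configured, available, strict=False):
    """Warn about (or, if strict, raise on) configured columns that the DB does not have."""
    missing = [c for c in configured if c not in available]
    if missing and strict:
        raise KeyError(f"configured columns not found: {missing}")
    if missing:
        log.warning("skipping configured columns not found: %s", missing)
    return [c for c in configured if c in available]


async def with_retries(operation, retry_on, attempts=3, base_delay=0.5):
    """Await operation(), retrying with exponential backoff on the given exception classes."""
    for attempt in range(1, attempts + 1):
        try:
            return await operation()
        except retry_on as exc:
            if attempt == attempts:
                raise  # out of attempts: surface the original error
            log.warning("attempt %d/%d failed: %r", attempt, attempts, exc)
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))


async def demo():
    state = {"calls": 0}

    async def flaky_fetch():
        # Stand-in for a DB call that fails twice before succeeding.
        state["calls"] += 1
        if state["calls"] < 3:
            raise ConnectionError("database unavailable")
        return ["row"]

    rows = await with_retries(flaky_fetch, (ConnectionError,), base_delay=0.0)
    return rows, state["calls"]


print(asyncio.run(demo()))
print(resolve_columns(["sensor_temperature_ins", "sensor_light_par"], {"sensor_temperature_ins"}))
```

Errors outside retry_on propagate immediately, which keeps non-transient failures such as a misspelled column from being retried.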

4. Performance & Scalability:

  • Task 4.1: Implement Chunked Data Retrieval: Optionally fetch data in chunks using asyncpg cursors or other methods.
  • Task 4.2: Optimize DataFrame Memory Usage: Convert columns to more memory-efficient types (float32, int16, etc.) after retrieval or during processing.
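
Besides asyncpg cursors, one of the "other methods" for Task 4.1 is to split the requested time window into smaller sub-windows and run one query per sub-window. The generator below is a minimal sketch of that idea; the function name and the one-day step are illustrative. Each sub-window is half-open, so it pairs with a query of the form time >= start AND time < end without fetching any row twice.

```python
from datetime import datetime, timedelta, timezone


def iter_time_chunks(start, end, step):
    """Yield consecutive half-open (chunk_start, chunk_end) pairs covering [start, end)."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    current = start
    while current < end:
        chunk_end = min(current + step, end)  # the last chunk may be shorter
        yield current, chunk_end
        current = chunk_end


start = datetime(2025, 4, 14, tzinfo=timezone.utc)
end = datetime(2025, 4, 16, 12, tzinfo=timezone.utc)
for chunk_start, chunk_end in iter_time_chunks(start, end, timedelta(days=1)):
    print(chunk_start.isoformat(), "->", chunk_end.isoformat())
```

Processing each chunk before fetching the next bounds peak memory by the chunk size, at the cost of one database round trip per chunk.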

5. Testing:

  • Task 5.1: Develop Unit Tests for Retrieval Logic: Mock DB connection/cursor to test SensorRepository.get_sensor_data.
  • Task 5.2: Develop Unit Tests for Validation Logic: Test schema/content validation functions (Tasks 2.1, 2.2) if implemented.
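
Task 5.1 can be approached with unittest.mock.AsyncMock from the standard library. The real signature of SensorRepository.get_sensor_data is not shown in this issue, so the class below is a hypothetical stand-in that awaits conn.fetch(query, start, end) in the way an asyncpg connection would be called. The table name sensor_data and the column names in the canned row are also assumptions.

```python
import asyncio
from datetime import datetime, timezone
from unittest.mock import AsyncMock


class SensorRepository:
    """Hypothetical stand-in for the project's repository class."""

    def __init__(self, conn):
        self._conn = conn

    async def get_sensor_data(self, start, end):
        query = "SELECT * FROM sensor_data WHERE time >= $1 AND time < $2 ORDER BY time"
        records = await self._conn.fetch(query, start, end)
        return [dict(r) for r in records]


async def check_get_sensor_data_passes_window():
    start = datetime(2025, 4, 16, tzinfo=timezone.utc)
    end = datetime(2025, 4, 17, tzinfo=timezone.utc)

    # The mocked connection returns one canned row instead of hitting a database.
    conn = AsyncMock()
    conn.fetch.return_value = [{"time": start, "sensor_temperature_ins": 21.5}]

    rows = await SensorRepository(conn).get_sensor_data(start, end)

    # The repository should query exactly once and forward the window bounds.
    conn.fetch.assert_awaited_once()
    assert conn.fetch.await_args.args[1:] == (start, end)
    return rows


rows = asyncio.run(check_get_sensor_data_passes_window())
print(rows)
```

The same coroutine body can be moved into a unittest.IsolatedAsyncioTestCase method or an async pytest test, depending on the test runner the project uses.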

6. Maintainability & Documentation:

  • Task 6.1: Refine Logging: Enhance logging messages for clarity and diagnostics. (Improved in flow.py and other modules)
  • Task 6.2: Add/Update Docstrings and Type Hints: Ensure comprehensive documentation within the code. (Completed for core modules)
