@juaristi22 (Collaborator) commented Aug 20, 2025

Fix #22

@juaristi22 changed the title from "ix calculated variables being left out of SingleYearDataset objects" to "Fix calculated variables being left out of SingleYearDataset objects" on Aug 20, 2025
@juaristi22 requested a review from @baogorek on August 20, 2025 16:53

@baogorek (Collaborator) left a comment

My only comments relate to documentation (e.g., docstrings) and are non-blocking. I did have some trouble figuring out exactly what was going on, but I finally get it. I thought this Claude Code summary was really nice:

● The PR fixes a bug where pre-computed income values were being lost during dataset minimization.

  When creating smaller dataset subsets for calibration, the code was only copying "input variables" (raw data like age, state) but dropping "calculated variables" (pre-computed values like employment_income, self_employment_income stored in the original dataset).

  This resulted in minimized datasets with all zeros for income fields, making them useless for calibration.

  Maria's fix:
  1. Identifies which variables in the dataset are calculated (not inputs)
  2. Explicitly preserves these calculated variables and their values when creating subsets
  3. Ensures income data remains intact throughout the calibration pipeline

  The tests confirm that employment_income, self_employment_income, and weekly_hours_worked now retain their non-zero values after minimization.
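
The fix described in the summary can be sketched in Python as follows. This is a minimal toy illustration, not the PR's actual code: `ToyDataset`, `INPUT_VARIABLES`, `subset`, and `values` are hypothetical names, and missing variables are assumed to read back as zeros so that the reported symptom can be reproduced.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical set of variables the model declares as raw inputs.
INPUT_VARIABLES = {"age", "state_code"}


@dataclass
class ToyDataset:
    """Hypothetical stand-in for a SingleYearDataset: variable name -> per-person values."""

    data: dict[str, list]


def values(dataset: ToyDataset, name: str) -> list:
    """Read a variable; absent variables come back as zeros (assumed, to mimic the bug's symptom)."""
    n_rows = len(next(iter(dataset.data.values())))
    return dataset.data.get(name, [0.0] * n_rows)


def subset(dataset: ToyDataset, rows: list[int], keep_calculated: bool = True) -> ToyDataset:
    """Copy the selected rows into a new, smaller dataset.

    keep_calculated=False reproduces the bug: only declared inputs are copied.
    """
    stored = set(dataset.data)
    # Step 1: stored variables that are not declared inputs are "calculated".
    calculated = stored - INPUT_VARIABLES
    names = stored & INPUT_VARIABLES
    if keep_calculated:
        # Step 2: explicitly carry the calculated variables and their values along.
        names |= calculated
    return ToyDataset({name: [dataset.data[name][i] for i in rows] for name in names})


original = ToyDataset({
    "age": [30, 45, 60],
    "state_code": [1, 2, 3],
    "employment_income": [50_000.0, 0.0, 72_000.0],
    "weekly_hours_worked": [40.0, 20.0, 38.0],
})

fixed = subset(original, [0, 2])
buggy = subset(original, [0, 2], keep_calculated=False)
print(values(fixed, "employment_income"))  # [50000.0, 72000.0]
print(values(buggy, "employment_income"))  # [0.0, 0.0]
```

Deriving the calculated set as "stored minus declared inputs" means any extra variable saved in the source dataset survives minimization without a separately maintained list.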

If there's any risk that other users are going to hit this in the future and not know what happened, is there any way to warn them proactively?

@juaristi22 merged commit fa1d1c4 into main on Aug 22, 2025
6 of 7 checks passed
Development: merging this pull request may close the issue "Ensure variables that aren't defined as input_variables are stored when converting from dataset classes".
