-
Notifications
You must be signed in to change notification settings - Fork 0
Metrics matrix creation logic from database #17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #17 +/- ##
=======================================
Coverage ? 42.85%
=======================================
Files ? 10
Lines ? 679
Branches ? 0
=======================================
Hits ? 291
Misses ? 388
Partials ? 0 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
baogorek
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I won't call this a review yet, just some comments. The code feels very performant, which is great. I didn't fully go through all the logic. Some of my comments from the scaling PR will apply here too, so perhaps it would make sense to handle that one first and then return.
baogorek
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm really excited to be in this repo. Great stuff here, especially the functions for handling the database constraints.
As I'm trying to understand the code, I'm looking for one end-to-end example that incorporates the state stacking and uses it. I'm in the Jupyter notebook and perhaps getting close with:
sim = Microsimulation(dataset = national_level_calibrated_dataset)
np.unique(sim.calculate('ucgid_str').values, return_counts=True)
But I still see that California has more samples, and I thought that state stacking meant we'd sample the same amount from every state.
Might want to turn down the logger in the python notebook:
import logging
# Option 1: just lower that logger
logger = logging.getLogger("microcalibrate.calibration")
logger.setLevel(logging.ERROR)
Also, I think this was triggered by running the notebook code. Any idea what happened?
baogorek@T14-Gen6:~/devl/policyengine-data$ ls
Alabama_calibration.csv Iowa_calibration_sparse.csv 'North Dakota_calibration_sparse.csv'
Alabama_calibration_sparse.csv Kansas_calibration.csv Ohio_calibration.csv
Alaska_calibration.csv Kansas_calibration_sparse.csv Ohio_calibration_sparse.csv
Alaska_calibration_sparse.csv Kentucky_calibration.csv Oklahoma_calibration.csv
Arizona_calibration.csv Kentucky_calibration_sparse.csv Oklahoma_calibration_sparse.csv
Arizona_calibration_sparse.csv LICENSE Oregon_calibration.csv
Arkansas_calibration.csv Louisiana_calibration.csv Oregon_calibration_sparse.csv
Arkansas_calibration_sparse.csv Louisiana_calibration_sparse.csv Pennsylvania_calibration.csv
California_calibration.csv Maine_calibration.csv Pennsylvania_calibration_sparse.csv
California_calibration_sparse.csv Maine_calibration_sparse.csv pyproject.toml
changelog_entry.yaml Makefile README.md
CHANGELOG.md Maryland_calibration.csv 'Rhode Island_calibration.csv'
changelog.yaml Maryland_calibration_sparse.csv 'Rhode Island_calibration_sparse.csv'
Colorado_calibration.csv Massachusetts_calibration.csv 'South Carolina_calibration.csv'
Colorado_calibration_sparse.csv Massachusetts_calibration_sparse.csv 'South Carolina_calibration_sparse.csv'
Connecticut_calibration.csv Michigan_calibration.csv 'South Dakota_calibration.csv'
Connecticut_calibration_sparse.csv Michigan_calibration_sparse.csv 'South Dakota_calibration_sparse.csv'
Dataset_stacked.h5 Minnesota_calibration.csv src
Dataset_state_level.h5 Minnesota_calibration_sparse.csv Tennessee_calibration.csv
Delaware_calibration.csv Mississippi_calibration.csv Tennessee_calibration_sparse.csv
Delaware_calibration_sparse.csv Mississippi_calibration_sparse.csv tests
'District of Columbia_calibration.csv' Missouri_calibration.csv Texas_calibration.csv
'District of Columbia_calibration_sparse.csv' Missouri_calibration_sparse.csv Texas_calibration_sparse.csv
docs Montana_calibration.csv 'United States_calibration.csv'
download Montana_calibration_sparse.csv Untitled.ipynb
Florida_calibration.csv Nebraska_calibration.csv Utah_calibration.csv
Florida_calibration_sparse.csv Nebraska_calibration_sparse.csv Utah_calibration_sparse.csv
full_calibration.csv Nevada_calibration.csv uv.lock
full_calibration_sparse.csv Nevada_calibration_sparse.csv Vermont_calibration.csv
Georgia_calibration.csv 'New Hampshire_calibration.csv' Vermont_calibration_sparse.csv
Georgia_calibration_sparse.csv 'New Hampshire_calibration_sparse.csv' Virginia_calibration.csv
Hawaii_calibration.csv 'New Jersey_calibration.csv' Virginia_calibration_sparse.csv
Hawaii_calibration_sparse.csv 'New Jersey_calibration_sparse.csv' Washington_calibration.csv
Idaho_calibration.csv 'New Mexico_calibration.csv' Washington_calibration_sparse.csv
Idaho_calibration_sparse.csv 'New Mexico_calibration_sparse.csv' 'West Virginia_calibration.csv'
Illinois_calibration.csv 'New York_calibration.csv' 'West Virginia_calibration_sparse.csv'
Illinois_calibration_sparse.csv 'New York_calibration_sparse.csv' Wisconsin_calibration.csv
Indiana_calibration.csv 'North Carolina_calibration.csv' Wisconsin_calibration_sparse.csv
Indiana_calibration_sparse.csv 'North Carolina_calibration_sparse.csv' Wyoming_calibration.csv
Iowa_calibration.csv 'North Dakota_calibration.csv' Wyoming_calibration_sparse.csv
| ucgid_values = sim.calculate("ucgid").values | ||
| # The system returns enum names as strings, so compare with the name | ||
| assert all(val == california_ucgid.name for val in ucgid_values) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It just feels wierd to calculate ucgid and have the value be "CA" onscreen.
In [77]: sim.calculate('ucgid')
Out[77]:
value weight
0 CA 4709.080566
1 CA 4709.080566
2 CA 4709.080566
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed! I thought making ucgid an Enum was a great idea but maybe it simply introduces unnecessary complexity... maybe the alternative geography identifying variables should be simple strings instead of Enums?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's run this by Max tomorrow. I know he wanted us to move away from in and I imagine just a string will suffice. However, I'm currently using the Enum to build the initial hierarchy for the strata (which I thought was a nice source of truth), so I'll have to get it somewhere else if it's no longer an Enum.
Yep, microcalibrate's Calibration function has a parameter called |
baogorek
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Jupyter notebook is working really nicely now. There are still a few outstanding issues that we need to figure out, but I see no reason to block the merge. Nice work!
| ucgid_values = sim.calculate("ucgid").values | ||
| # The system returns enum names as strings, so compare with the name | ||
| assert all(val == california_ucgid.name for val in ucgid_values) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's run this by Max tomorrow. I know he wanted us to move away from in and I imagine just a string will suffice. However, I'm currently using the Enum to build the initial hierarchy for the strata (which I thought was a nice source of truth), so I'll have to get it somewhere else if it's no longer an Enum.
Fix #20
Fix #13
Fix #14
Fix #18
Fix #1
This matrix creation will now integrate geography hierarchical information as long as the database contains
ucgid_stras its variable and "in" as the constraint operation @baogorek