Merged
34 commits
26a9e5e
matrix creation logic
juaristi22 Jul 29, 2025
16a0f46
add sqlalchemy dependency
juaristi22 Jul 29, 2025
c030803
fix initialization
juaristi22 Jul 29, 2025
f725fba
update "in" operation to check for string matches
juaristi22 Jul 29, 2025
fff2e60
download database from huggingface
juaristi22 Jul 30, 2025
db5798e
update test to also use db downloading logic
juaristi22 Jul 30, 2025
047c8fd
add stratum constraint filtering option
juaristi22 Aug 4, 2025
0fdf959
Merge branch 'main' of https://github.com/PolicyEngine/policyengine-d…
juaristi22 Aug 6, 2025
ff23f25
adding note
juaristi22 Aug 6, 2025
8c3bf86
update import path
juaristi22 Aug 6, 2025
aabee8a
lint
juaristi22 Aug 6, 2025
e8f37d2
Merge branch 'main' of https://github.com/PolicyEngine/policyengine-d…
juaristi22 Aug 7, 2025
b183c6f
conversion between dataset classes
juaristi22 Aug 7, 2025
2506ee9
initial stab at state-level calibration logic
juaristi22 Aug 7, 2025
5b8ab95
update key normalisation to take more than one start_index
juaristi22 Aug 7, 2025
24ba55e
adding calibration function for all areas in a geography level (pendi…
juaristi22 Aug 7, 2025
00b38ac
debugged state-level calibration
juaristi22 Aug 8, 2025
75adba6
state level calibration works except age mapping
juaristi22 Aug 8, 2025
9cfce9d
handled age entity mapping in constraint application
juaristi22 Aug 8, 2025
f15c720
fixed state-level calibration
juaristi22 Aug 11, 2025
db10495
state and national calibration for age targets
juaristi22 Aug 12, 2025
803516f
fixing bug in microsims for when converting between dataset class types
juaristi22 Aug 13, 2025
709eda0
Fix database True/False to be handled as bool instead of str
juaristi22 Aug 14, 2025
b108de3
more testing coverage
juaristi22 Aug 14, 2025
5ab75d7
add function for calibrating all geo levels at once
juaristi22 Aug 14, 2025
f9c5cfa
create tests and document calibration
juaristi22 Aug 14, 2025
96a2c99
update database link to enable calibration in ci
juaristi22 Aug 15, 2025
2467e4f
update calibration test to use online database
juaristi22 Aug 15, 2025
1611052
add is_greater_than to be able to process snap
juaristi22 Aug 15, 2025
feaa022
update documentation
juaristi22 Aug 18, 2025
af9f536
remove -us Microsimulation dependencies
juaristi22 Aug 18, 2025
c03c4ca
update calibration docs with recent changes
juaristi22 Aug 18, 2025
b1426b0
included target uprating
juaristi22 Aug 18, 2025
5480f94
remove automatic saving of calibration log csvs
juaristi22 Aug 18, 2025
7 changes: 7 additions & 0 deletions changelog_entry.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
+- bump: minor
+  changes:
+    added:
+      - Logic to create estimate matrix for calibration from a database.
+      - Conversion functions between dataset classes to enable stacking datasets.
+      - Logic to calibrate for multiple geographic levels with two different routines.
+      - Calibration documentation.
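
The changelog mentions an "estimate matrix" for calibration without showing what one looks like. As a rough illustration only, the sketch below assumes the matrix has one row per household and one column per target, where each target is defined by a set of constraints on household attributes. The data, the column names, the target names, and the `build_matrix` helper are all hypothetical; only the operation name `is_greater_than` is borrowed from a commit message in this PR, and the PR's own `create_metrics_matrix` may work quite differently.

```python
import numpy as np
import pandas as pd

# Hypothetical household records; column names are illustrative only.
households = pd.DataFrame({
    "state": ["CA", "CA", "NY", "NY"],
    "age": [4, 30, 10, 70],
    "snap": [0.0, 1200.0, 0.0, 300.0],
})

# Hypothetical targets. "variable" is the column to sum (None means a count);
# "constraints" is a list of (column, operation, value) filters.
targets = {
    "ca_population": {"variable": None, "constraints": [("state", "equals", "CA")]},
    "ny_children": {
        "variable": None,
        "constraints": [("state", "equals", "NY"), ("age", "less_than", 18)],
    },
    "snap_recipients": {
        "variable": None,
        "constraints": [("snap", "is_greater_than", 0)],
    },
    "ny_snap_total": {"variable": "snap", "constraints": [("state", "equals", "NY")]},
}

# Assumed set of constraint operations (not taken from the PR's code).
OPERATIONS = {
    "equals": lambda column, value: column == value,
    "less_than": lambda column, value: column < value,
    "is_greater_than": lambda column, value: column > value,
}


def build_matrix(df, targets):
    """Return a DataFrame with one row per household and one column per target."""
    columns = {}
    for name, spec in targets.items():
        # Start with every household included, then apply each constraint.
        mask = pd.Series(True, index=df.index)
        for column, operation, value in spec["constraints"]:
            mask &= OPERATIONS[operation](df[column], value)
        values = 1.0 if spec["variable"] is None else df[spec["variable"]]
        columns[name] = mask.astype(float) * values
    return pd.DataFrame(columns)


matrix = build_matrix(households, targets)

# With household weights w, the weighted estimate of each target is w @ M.
weights = np.array([100.0, 200.0, 150.0, 50.0])
estimates = weights @ matrix.to_numpy()
print(dict(zip(matrix.columns, estimates)))
```

Under this framing, calibration would adjust `weights` so that `estimates` move closer to externally supplied target values; the matrix itself stays fixed.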
2 changes: 2 additions & 0 deletions docs/_toc.yml
@@ -2,3 +2,5 @@ format: jb-book
 root: intro
 chapters:
 - file: dataset.ipynb
+- file: normalise_keys.md
+- file: calibration.ipynb
394 changes: 394 additions & 0 deletions docs/calibration.ipynb

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion docs/dataset.ipynb
@@ -5,7 +5,7 @@
 "id": "gq2207gugn",
 "metadata": {},
 "source": [
-"# PolicyEngine Dataset classes documentation\n",
+"# PolicyEngine Dataset classes\n",
 "\n",
 "This notebook provides documentation for the `SingleYearDataset` and `MultiYearDataset` classes in PolicyEngine Data. These classes are designed to handle structured data for policy analysis and microsimulation.\n",
 "\n",
2 changes: 1 addition & 1 deletion docs/intro.md
@@ -1,3 +1,3 @@
-## PolicyEngine Data
+## PolicyEngine Data

 This is the documentation for PolicyEngine Data, the open-source Python package powering PolicyEngine's data processing and storing functionality. It is used by PolicyEngine UK Data and PolicyEngine US Data, which each define the custom logic specific to processing UK and US data sources.
32 changes: 16 additions & 16 deletions docs/normalise_keys.md
@@ -21,32 +21,32 @@ Normalises primary and foreign keys across multiple related tables.
 import pandas as pd
 from policyengine_data import normalise_table_keys

-users = pd.DataFrame({
-    'user_id': [101, 105, 103],
+person = pd.DataFrame({
+    'person_id': [101, 105, 103],
     'name': ['Alice', 'Bob', 'Carol']
 })

-orders = pd.DataFrame({
-    'order_id': [201, 205, 207],
-    'user_id': [105, 101, 105],
-    'amount': [25.99, 15.50, 42.00]
+household = pd.DataFrame({
+    'household_id': [201, 205, 207],
+    'person_id': [105, 101, 105],
+    'income': [25000, 15000, 42000]
 })

-tables = {'users': users, 'orders': orders}
-primary_keys = {'users': 'user_id', 'orders': 'order_id'}
+tables = {'person': person, 'household': household}
+primary_keys = {'person': 'person_id', 'household': 'household_id'}

 # Auto-detect foreign keys
 normalised = normalise_table_keys(tables, primary_keys)

 # Or specify foreign keys explicitly
-foreign_keys = {'orders': {'user_id': 'users'}}
+foreign_keys = {'household': {'person_id': 'person'}}
 normalised = normalise_table_keys(tables, primary_keys, foreign_keys)
 ```

 After normalisation:
-- User IDs become 0, 1, 2 (instead of 101, 105, 103)
-- Order IDs become 0, 1, 2 (instead of 201, 205, 207)
-- Foreign key relationships are preserved (Bob's orders still reference Bob's new ID)
+- Person IDs become 0, 1, 2 (instead of 101, 105, 103)
+- Household IDs become 0, 1, 2 (instead of 201, 205, 207)
+- Foreign key relationships are preserved (Bob's household still references Bob's new ID)

 ### `normalise_single_table_keys(df, key_column, start_index=0)`

@@ -66,12 +66,12 @@ import pandas as pd
 from policyengine_data import normalise_single_table_keys

 df = pd.DataFrame({
-    'id': [101, 105, 103],
-    'value': ['A', 'B', 'C']
+    'person_id': [101, 105, 103],
+    'age': [25, 30, 35]
 })

-normalised = normalise_single_table_keys(df, 'id')
-# Result: IDs become 0, 1, 2
+normalised = normalise_single_table_keys(df, 'person_id')
+# Result: person_ids become 0, 1, 2
 ```

 ## Key features
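
The `normalise_keys.md` changes above describe the outcome of key normalisation (IDs remapped to 0, 1, 2 with foreign keys kept consistent) but not how such a remapping can be done. The sketch below is a plain-pandas illustration of the idea, not the package's implementation: `normalise_keys_sketch` is a hypothetical helper, it requires foreign keys to be given explicitly, and it assumes new IDs are assigned in order of first appearance, which the documentation does not confirm.

```python
import pandas as pd


def normalise_keys_sketch(tables, primary_keys, foreign_keys, start_index=0):
    """Remap each table's primary key to consecutive integers and apply the
    same mapping to every foreign-key column that references that table.

    Hypothetical helper for illustration; not part of policyengine_data.
    """
    # Build one old-ID -> new-ID mapping per table, in order of first appearance.
    mappings = {}
    for name, df in tables.items():
        old_ids = df[primary_keys[name]].drop_duplicates()
        mappings[name] = {old: start_index + i for i, old in enumerate(old_ids)}

    result = {}
    for name, df in tables.items():
        out = df.copy()
        out[primary_keys[name]] = out[primary_keys[name]].map(mappings[name])
        # foreign_keys has the shape {table: {fk_column: referenced_table}}.
        for column, referenced in foreign_keys.get(name, {}).items():
            out[column] = out[column].map(mappings[referenced])
        result[name] = out
    return result


person = pd.DataFrame({
    "person_id": [101, 105, 103],
    "name": ["Alice", "Bob", "Carol"],
})
household = pd.DataFrame({
    "household_id": [201, 205, 207],
    "person_id": [105, 101, 105],
})

normalised = normalise_keys_sketch(
    tables={"person": person, "household": household},
    primary_keys={"person": "person_id", "household": "household_id"},
    foreign_keys={"household": {"person_id": "person"}},
)
print(normalised["household"])
```

Building every mapping before rewriting any column matters here: a foreign key must be translated with the referenced table's mapping, so that mapping has to exist before the referencing table is processed.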
5 changes: 4 additions & 1 deletion pyproject.toml
@@ -14,9 +14,11 @@ dependencies = [
     "huggingface_hub>=0.25.1",
     "tables",
     "policyengine-core>=3.6.4",
-    "policyengine-us", # remove as soon as we fix UCGID
     "microdf-python",
     "microcalibrate",
+    "sqlalchemy",
+    "huggingface_hub",
+    "torch",
 ]

 [project.optional-dependencies]
@@ -30,6 +32,7 @@ dev = [
     "build",
     "linecheck",
     "yaml-changelog>=0.1.7",
+    "policyengine-us>=1.366.0",
 ]

 docs = [
11 changes: 11 additions & 0 deletions src/policyengine_data/calibration/__init__.py
@@ -1 +1,12 @@
from .calibrate import calibrate_all_levels, calibrate_single_geography_level
from .dataset_duplication import (
load_dataset_for_geography_legacy,
minimize_calibrated_dataset_legacy,
)
from .metrics_matrix_creation import (
create_metrics_matrix,
validate_metrics_matrix,
)
from .target_rescaling import download_database, rescale_calibration_targets
from .target_uprating import uprate_calibration_targets
from .utils import create_geographic_normalization_factor