 # TODO: create normalization factor to pass into Calibrator balancing targets at different levels
 def calibrate_all_levels(
     database_stacking_areas: Dict[str, str],
     dataset: str,
     dataset_subsample_size: Optional[int] = None,
     geo_sim_filter_variable: Optional[str] = "ucgid",
+    geo_hierarchy: Optional[List[str]] = None,
     year: Optional[int] = 2023,
     db_uri: Optional[str] = None,
     noise_level: Optional[float] = 10.0,
     regularize_with_l0: Optional[bool] = False,
     raise_error: Optional[bool] = True,
-):
+) -> "SingleYearDataset":
     """
     This function will calibrate the dataset for all geography levels in the database, defaulting to stacking the base dataset per area within the specified level (it is recommended to use the lowest in the hierarchy for stacking). (Eg. when calibrating for district, state and national levels in the US, this function will stack the CPS dataset for each district and calibrate the stacked dataset for the three levels' targets.)
     It will handle conversion between dataset classes to enable:
@@ -318,6 +323,7 @@ def calibrate_all_levels(
         dataset (str): Path to the base dataset to stack.
         dataset_subsample_size (Optional[int]): The size of the subsample to use for calibration.
         geo_sim_filter_variable (Optional[str]): The variable to use for geographic similarity filtering. Default in the US: "ucgid".
+        geo_hierarchy (Optional[List[str]]): The geographic hierarchy to use for calibration.
         year (Optional[int]): The year to use for calibration. Default: 2023.
         db_uri (Optional[str]): The database URI to use for calibration. If None, it will download the database from the default URI.
         noise_level (Optional[float]): The noise level to use for calibration. Default: 10.0.
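The docstring above describes stacking the base dataset once per area before calibrating against every level's targets. A minimal sketch of that stacking idea follows, using plain dictionaries instead of the package's dataset classes. The helper `stack_by_area`, the record fields, the area names and codes, and the reading of `database_stacking_areas` as a name-to-code mapping are all assumptions made for illustration, not the PR's implementation.

```python
from typing import Dict, List


def stack_by_area(
    base_records: List[dict],
    stacking_areas: Dict[str, str],
    filter_variable: str = "ucgid",
) -> List[dict]:
    """Copy every base record once per area and tag the copy with the area's code.

    Hypothetical helper: assumes stacking_areas maps an area name to its code.
    """
    stacked = []
    for area_code in stacking_areas.values():
        for record in base_records:
            tagged = dict(record)  # shallow copy so the base records stay untouched
            tagged[filter_variable] = area_code
            stacked.append(tagged)
    return stacked


# Illustrative inputs only; none of these values come from the PR.
base = [{"household_id": 1, "weight": 100.0}, {"household_id": 2, "weight": 250.0}]
areas = {"Area A": "0400000US01", "Area B": "0400000US02"}
stacked = stack_by_area(base, areas)
```

With two base records and two areas, the stacked list holds four records, each carrying the code of the area it was copied for. Calibration would then adjust the weights of those copies against the targets of every level.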
src/policyengine_data/calibration/metrics_matrix_creation.py
2 additions & 34 deletions
@@ -6,41 +6,9 @@
 from policyengine_us import Microsimulation
 from sqlalchemy import create_engine

-logger = logging.getLogger(__name__)
-
-
-def download_database(
-    filename: Optional[str] = "policy_data.db",
-    repo_id: Optional[str] = "policyengine/test",
-) -> create_engine:
-    """
-    Download the SQLite database from Hugging Face Hub and return the connection string.
-
-    Args:
-        filename: Optional name of the database file to download
-        repo_id: Optional Hugging Face repository ID where the database is stored
-
-    Returns:
-        Connection string for the SQLite database
-    """
-    import os
+from .target_rescaling import download_database

-    from huggingface_hub import hf_hub_download
-
-    # Download the file to the current working directory
-    try:
-        downloaded_path = hf_hub_download(
-            repo_id=repo_id,
-            filename=filename,
-            local_dir=".",  # Use "." for the current working directory
-            local_dir_use_symlinks=False,  # Recommended to avoid symlinks
-        )
-        path = os.path.abspath(downloaded_path)
-        logger.info(f"File downloaded successfully to: {path}")
-        return f"sqlite:///{path}"
-
-    except Exception as e:
-        raise ValueError(f"An error occurred: {e}")
+logger = logging.getLogger(__name__)


 # NOTE (juaristi22): This could fail if trying to filter by more than one stratum constraint if there are mismatches between the filtering variable, value and operation.
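The `download_database` helper removed in this hunk (and now imported from `.target_rescaling`) ends by turning the downloaded file's path into a SQLite connection string. The sketch below isolates only that last step, so it runs without network access or `huggingface_hub`. The function name `sqlite_uri` is hypothetical, and the temporary directory stands in for the real download location.

```python
import os
import tempfile


def sqlite_uri(path: str) -> str:
    """Build a SQLite connection string from a (possibly relative) file path.

    Hypothetical helper mirroring the final step of the removed function.
    """
    return f"sqlite:///{os.path.abspath(path)}"


# Stand-in for the downloaded file; nothing is actually fetched here.
with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "policy_data.db")
    uri = sqlite_uri(db_path)
```

The returned value is a string, which matches the removed docstring ("Connection string for the SQLite database") rather than the `create_engine` return annotation on the removed function.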
Create a normalization factor for the calibration process to balance targets that belong to different geographic areas or concepts.
+
+    Args:
+        geo_hierarchy (List[str]): Geographic hierarchy levels' codes (e.g., ["0100000US", "0400000US", "0500000US"]). Make sure to pass the part of the code general to all areas within a given level.
+        target_info (Dict[int, Dict[str, any]]): A dictionary containing information about each target, including its name which denotes geographic area and its active status.
+
+    Returns:
+        normalization_factor (torch.Tensor): Normalization factor for each active target.
+    """
+    is_active = []
+    geo_codes = []
+    geo_level_sum = {}
+
+    for code in geo_hierarchy:
+        geo_level_sum[code] = 0
+
+    # First pass: collect active status and geo codes for all targets
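The diff is cut off after the first-pass comment, so the rest of the function is not visible here. As a rough sketch of how such a factor could be built, the example below counts active targets per geographic level and weights each active target by the inverse of its level's count, so that every level contributes equally overall. That weighting rule, the `"name"` and `"active"` keys, the target-name format, and the plain-list return type (instead of `torch.Tensor`) are all assumptions made for illustration.

```python
from typing import Any, Dict, List


def normalization_factor_sketch(
    geo_hierarchy: List[str],
    target_info: Dict[int, Dict[str, Any]],
) -> List[float]:
    """Return one factor per active target: 1 / (active targets at its level)."""
    is_active = []
    geo_codes = []
    geo_level_sum = {code: 0 for code in geo_hierarchy}

    # First pass: collect active status and level code, and count active targets per level.
    for target_id in sorted(target_info):
        info = target_info[target_id]
        active = bool(info["active"])  # assumed key
        level = next((c for c in geo_hierarchy if c in info["name"]), None)  # assumed key
        is_active.append(active)
        geo_codes.append(level)
        if active and level is not None:
            geo_level_sum[level] += 1

    # Second pass (assumed rule): inverse of the level's active-target count.
    factors = []
    for active, level in zip(is_active, geo_codes):
        if not active:
            continue
        factors.append(1.0 if level is None else 1.0 / geo_level_sum[level])
    return factors


# Illustrative targets only; the names are made up.
target_info = {
    0: {"name": "0100000US/income", "active": True},
    1: {"name": "0400000US06/income", "active": True},
    2: {"name": "0400000US36/income", "active": True},
    3: {"name": "0400000US48/income", "active": False},
}
factors = normalization_factor_sketch(
    ["0100000US", "0400000US", "0500000US"], target_info
)
```

Here the single national target keeps a factor of 1.0, while the two active state targets each get 0.5, so the state level as a whole weighs the same as the national level. The inactive target is skipped, matching the docstring's "for each active target".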
Test and example of the calibration routine involving calibrating one geographic level at a time from lowest to highest in the hierarchy and generating sparsity in all but the last levels.
Test and example of the calibration routine involving stacking datasets at a single (most often lowest) geographic level for increased data richness and then calibrating said stacked dataset for all geographic levels at once.