|
| 1 | +This repository contains the code used to generate the environmental data analyzed in [[PAPER TITLE HERE]]. We provide both the generated data and the source code in the hope that they will facilitate future analyses. Refer to the paper's methods section for a more detailed explanation of data preparation. |
| 2 | + |
| 3 | +The final data files used in our analysis are located in "final_data/". They are "final_data/ghi_matched_master_cleaned_plus_zcta.tsv" and "final_data/zcta_master_with_pollution.tsv". These files match those used in our analysis to within rounding error. |
| 4 | + |
| 5 | +- "final_data/zcta_master_with_pollution.tsv" contains each Zip Code Tabulation Area (ZCTA) internal point matched to its nearest-neighbor environmental metric in each category. Each ZCTA has one row in this dataset. We used this file to assign environmental exposures to each patient in our study, as patients could be approximately localized to a ZCTA. |
| 6 | +- "final_data/ghi_matched_master_cleaned_plus_zcta.tsv" is used to generate high-resolution maps of environmental variables and risk ratios. In this file, each point of measurement for GHI and DNI has been matched to its nearest neighbor for every other environmental variable. This permits plotting up to the resolution of GHI and DNI, our highest-resolution data. |
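The nearest-neighbor matching described above can be illustrated with a minimal Python sketch. This is not the repository's code: the great-circle (haversine) distance, the brute-force search, the function names, and the sample coordinates and values are all assumptions made for illustration. The actual pipeline may use a different distance metric or a spatial index.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in decimal degrees."""
    r = 6371.0088  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def match_nearest(targets, sources):
    """Map each target (id, lat, lon) to the value of its closest source (lat, lon, value)."""
    matched = {}
    for target_id, tlat, tlon in targets:
        nearest = min(sources, key=lambda s: haversine_km(tlat, tlon, s[0], s[1]))
        matched[target_id] = nearest[2]
    return matched


# Hypothetical ZCTA internal points and hypothetical measurement points.
zcta_points = [("10001", 40.75, -73.99), ("94103", 37.77, -122.41)]
ghi_points = [(40.7, -74.0, 4.1), (37.8, -122.4, 5.3), (39.0, -98.0, 4.9)]

print(match_nearest(zcta_points, ghi_points))
```

A brute-force search like this is quadratic in the number of points, so for grids as dense as GHI and DNI a spatial index (for example a k-d tree) would likely be needed in practice.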
| 7 | + |
| 8 | +To generate these data files from scratch, run "./code/sh_run_all.sh". |
| 9 | + |
| 10 | +Notes: |
| 11 | +- The zcta column in final_data/ghi_matched_master_cleaned_plus_zcta.tsv refers to the nearest ZCTA internal point, not necessarily the ZCTA within which the GHI and DNI latitude and longitude point resides. |
| 12 | +- Data generated in this repo matches our analysis data to 5 decimal places. |
| 13 | +- An improvement to the mapping code would be to map each environmental variable at its native resolution, rather than at GHI resolution. This would actually produce crisper maps, because the Voronoi cells would be larger, with straight edges. |
| 14 | + |
| 15 | + |
| 16 | +Citations for Data Sources: |
| 17 | +- ZCTA information (coordinates of internal points) obtained from R's tigris package. |
| 18 | +- Elevation information from USGS Lidar Explorer: "https://prd-tnm.s3.amazonaws.com/LidarExplorer/index.html#/" |
| 19 | + - Select "DEM", "Show where DEMs exist?", "more info", and click to download 1 arc-second data. |
| 20 | +- GHI and DNI information from the NSRDB Viewer: "https://maps.nrel.gov/nsrdb-viewer" |
| 21 | + - Select GOES PSM v3 dropdown, and download "Multi Year PSM Direct Normal Irradiance" and "Multi Year PSM Global Horizontal Irradiance" |
| 22 | +- Weather data from NOAA: "https://www.ncei.noaa.gov/pub/data/normals/1981-2010/" |
| 23 | + - Our project used 1981-2010 30 year Climate Normals, but newer data has become available. |
| 24 | + - Download "allstations.txt" from "https://www.ncei.noaa.gov/pub/data/normals/1981-2010/station-inventories/" |
| 25 | + - Download the following from "https://www.ncei.noaa.gov/pub/data/normals/1981-2010/products/precipitation/": |
| 26 | + - ann-prcp-normal.txt |
| 27 | + - ann-snow-normal.txt |
| 28 | + - djf-prcp-normal.txt |
| 29 | + - djf-snow-normal.txt |
| 30 | + - jja-prcp-normal.txt |
| 31 | + - jja-snow-normal.txt |
| 32 | + - Download the following from "https://www.ncei.noaa.gov/pub/data/normals/1981-2010/products/temperature/": |
| 33 | + - ann-dutr-normal.txt |
| 34 | + - ann-tavg-normal.txt |
| 35 | + - ann-tmax-normal.txt |
| 36 | + - ann-tmin-normal.txt |
| 37 | + - djf-tavg-normal.txt |
| 38 | + - jja-tavg-normal.txt |
| 39 | + |
| 40 | + |
| 41 | +The raw weather data is provided in a fixed-width text format that is not self-explanatory. |
| 42 | +The following key to the data format is taken from |
| 43 | +https://www1.ncdc.noaa.gov/pub/data/normals/1981-2010/readme.txt |
| 44 | +""" |
| 45 | + A. FORMAT OF ANNUAL/SEASONAL FILES |
| 46 | + (ann-*.txt, djf-*.txt, mam-*.txt, jja-*.txt, son-*.txt) |
| 47 | + |
| 48 | + Each file contains the annual/seasonal values of one parameter at all |
| 49 | + qualifying stations. There is one record (line) per station. |
| 50 | + |
| 51 | + The variables in each record include the following: |
| 52 | + |
| 53 | + Variable Columns Type |
| 54 | + ---------------------------- |
| 55 | + STNID 1- 11 Character |
| 56 | + VALUE 19- 23 Integer |
| 57 | + FLAG 24- 24 Character |
| 58 | + ---------------------------- |
| 59 | + |
| 60 | + These variables have the following definitions: |
| 61 | + |
| 62 | + STNID is the GHCN-Daily station identification code. See the lists in the |
| 63 | + station-inventories directory. |
| 64 | + VALUE1 is the annual/seasonal value. |
| 65 | + FLAG1 is the completeness flag for the annual/seasonal value. See Flags |
| 66 | + section below. |
| 67 | + |
| 68 | + E. FORMAT OF STATION INVENTORIES |
| 69 | + (*-inventory.txt, allstations.txt) |
| 70 | + |
| 71 | + Each file contains one station per line. |
| 72 | + |
| 73 | + The variables in each record include the following: |
| 74 | + ------------------------------ |
| 75 | + Variable Columns Type |
| 76 | + ------------------------------ |
| 77 | + ID 1-11 Character |
| 78 | + LATITUDE 13-20 Real |
| 79 | + LONGITUDE 22-30 Real |
| 80 | + ELEVATION 32-37 Real |
| 81 | + STATE 39-40 Character |
| 82 | + NAME 42-71 Character |
| 83 | + GSNFLAG 73-75 Character |
| 84 | + HCNFLAG 77-79 Character |
| 85 | + WMOID 81-85 Character |
| 86 | + METHOD* 87-99 Character |
| 87 | + ------------------------------ |
| 88 | + |
| 89 | + UNITS: |
| 90 | + hundredths of inches for average monthly/seasonal/annual precipitation, |
| 91 | + month-to-date/year-to-date precipitation, and percentiles of precipitation. |
| 92 | + e.g., "1" is 0.01" and "1486" is 14.86" |
| 93 | + |
| 94 | + tenths of inches for average monthly/seasonal/annual snowfall, |
| 95 | + month-to-date/year-to-date snowfall, and percentiles of snowfall. |
| 96 | + e.g. "39" is 3.9" |
| 97 | +""" |
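As a rough illustration of how the key above can be applied, the following Python sketch slices the fixed-width records by the listed column ranges and converts precipitation and snowfall values to inches. It is not the repository's parsing code: the function names and the sample records are hypothetical, and the sketch does not handle any special or missing-value codes that the full NOAA readme may define.

```python
def parse_normal_record(line):
    """Parse one record of an ann-*/djf-*/jja-* file (1-based columns from the key above)."""
    return {
        "stnid": line[0:11].strip(),   # columns 1-11
        "value": int(line[18:23]),     # columns 19-23
        "flag": line[23:24].strip(),   # column 24
    }


def parse_station_record(line):
    """Parse the leading fields of one station inventory record (e.g. allstations.txt)."""
    return {
        "id": line[0:11].strip(),          # columns 1-11
        "latitude": float(line[12:20]),    # columns 13-20
        "longitude": float(line[21:30]),   # columns 22-30
        "elevation": float(line[31:37]),   # columns 32-37
        "state": line[38:40].strip(),      # columns 39-40
        "name": line[41:71].strip(),       # columns 42-71
    }


def to_inches(value, kind):
    """Convert a raw value to inches: precipitation is in hundredths, snowfall in tenths."""
    divisor = {"prcp": 100.0, "snow": 10.0}[kind]
    return value / divisor


# Hypothetical records laid out to match the documented column positions.
normal_line = "USC00000001".ljust(18) + "1486".rjust(5) + "C"
station_line = (
    f"{'USC00000001':<11} {40.5:8.4f} {-105.25:9.4f} {1500.0:6.1f} "
    f"{'CO':<2} {'EXAMPLE STATION':<30}"
)

record = parse_normal_record(normal_line)
station = parse_station_record(station_line)
print(record["stnid"], to_inches(record["value"], "prcp"), station["latitude"], station["longitude"])
```

Joining the two parsed tables on the station ID gives each annual or seasonal value a latitude and longitude, which is what a nearest-neighbor match against ZCTA internal points would require.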
| 98 | + |