We have had several questions recently about how to "split" or "disaggregate" data reported for a country in the global assessment into sub-country regions (e.g., states or provinces) in an OHI+ assessment. While this is possible, using local data is preferred: not only are there conceptual challenges with how well disaggregating represents reality, but depending on the dataset it can take a lot of computational power even after you have decided it is conceptually appropriate for the layer.
There is no short answer, but hopefully this issue will get you started on how to think about this moving forward. A lot of this will be in the context of "intermediate" data: we consider anything between the "raw data" that you download (or are given) and the "data layer" (the final 3-column table of regions and summarized values ready for OHI calculations) to be intermediate data. For more information, read our recent blog post about intermediate data.
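To make that concrete, here is a minimal sketch in R of what a finished data layer looks like; the column names, values, and file path are illustrative, not from an actual layer:

```r
# A finished "data layer": one summarized value per region, in the
# 3-column format the OHI Toolbox reads for calculations.
# Column names, values, and the file path are illustrative only.
layer <- data.frame(
  rgn_id = c(1, 2, 3),         # your assessment's region IDs
  year   = c(2017, 2017, 2017),
  value  = c(0.42, 0.58, 0.36) # the summarized metric for each region
)

write.csv(layer, "layers/example_layer.csv", row.names = FALSE)
```

Everything between the raw download and a table like this is what we call intermediate data.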
Where are data prepared for the global assessment?
Global assessments have separate repositories for data preparation (they do not have a "prep" folder within the ohi-global repository). Instead, data preparation occurs in the ohiprep repository, and recently we have begun saving each assessment year as a separate repository.
Here, you will find the complete workflow to prepare data layers. They are organized in a reasonable way, usually based on goal or data set. We are constantly trying to improve our documentation, through READMEs and through doing all data prep in RMarkdown.
For example, to find how species (SPP) data are prepared, you would navigate to the spp_ico folder and scroll down to read the README. Clicking on the v2017 folder will show all the scripts involved in processing, and scrolling down to its README will take you to the single RMarkdown document that presents the whole preparation as a workflow: spp_data_prep.html.
Splitting or disaggregating
Splitting or disaggregating data from the global assessment (i.e., what is provided in your OHI+ repository) into your smaller-scale regions takes some thought. It's best to go back to the raw or intermediate data and see where it would be best to intervene. If the raw data were spatial (a raster or polygons), see the next topic below. With tabular data, there is a series of decisions you will have to make because the data were reported at the country level, but you want them split/disaggregated to your coastal states or provinces, for example. To disaggregate, you will need to think of a proportion that makes sense for disaggregating each specific data layer, and sometimes you do not want to disaggregate at all. We tend to use either proportion of area or proportion of coastal population. Some examples (with a code sketch after the list):
to disaggregate GDP data, we would use the proportion of coastal population for each region
to disaggregate habitat data, we would use the proportion of area for each region (unless we could process the original spatial files!)
for wages data, we wouldn't disaggregate; without further information, we would assume that wages were equal across all regions.
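Here is a minimal sketch in R of the coastal-population approach for a layer like GDP. All object names, column names, and numbers are hypothetical:

```r
library(dplyr)

# National total reported in the global data (hypothetical value)
country_gdp <- data.frame(year = 2017, gdp = 100e9)

# Coastal population per sub-country region (hypothetical values)
coastal_pop <- data.frame(
  rgn_id = 1:3,
  pop    = c(200000, 500000, 300000)
)

# Split the national total by each region's share of coastal population
gdp_by_region <- coastal_pop %>%
  mutate(pop_prop = pop / sum(pop)) %>%
  transmute(rgn_id,
            year = country_gdp$year,
            gdp  = country_gdp$gdp * pop_prop)

gdp_by_region
#>   rgn_id year   gdp
#> 1      1 2017 2e+10
#> 2      2 2017 5e+10
#> 3      3 2017 3e+10
```

For a layer disaggregated by area instead, you would swap the population column for each region's area; for a layer like wages, you would simply assign the country value to every region.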
Big files, especially with spatial data
Right now, we only have the capacity to populate these tailored repositories with the final "data layers": tabular data that have already been processed from raster files during our data preparation phase. We could greatly improve the data these OHI+ groups have access to, and therefore where they start their assessments, if we had a system for extracting raster data at their spatial scale. OHI+ assessments most often focus on province or state spatial scales, and can involve multiple coastlines, as in Colombia and Mexico. We all lose a lot of information because those original rasters are so large and we don't have a good way to process them for these requests.
We have written a blog post to teach you how to extract raw data at the scale of your assessment, but it is very computationally intensive (there are ~15 global raster datasets, and sometimes the downloading alone takes too much bandwidth). We are working to add functionality on our end to extract these data; stay tuned...
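As a rough illustration of what that extraction involves, here is a sketch using the raster package. The file names and the choice of a mean summary are assumptions, and a real run would also need to handle matching projections and the full size of the global rasters:

```r
library(raster)

# A global raster (e.g., one of the ~15 pressure layers) and your
# assessment's region polygons -- both file names are hypothetical,
# and both must share the same projection.
global_rast <- raster("global_pressure.tif")
regions     <- shapefile("my_ohi_regions.shp")

# Crop to your assessment's extent first; working with the full
# global raster is what makes this so memory- and bandwidth-intensive.
local_rast <- crop(global_rast, regions)

# Summarize the raster within each region polygon (here, a mean),
# yielding one value per region to feed into a data layer.
vals <- extract(local_rast, regions, fun = mean, na.rm = TRUE, df = TRUE)
```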