-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathREADME.Rmd
183 lines (130 loc) · 6.04 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
---
title: "CEFI"
output: github_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
[NOAA's Physical Science Laboratory (PSL)](https://psl.noaa.gov/) [Climate Ecosystems and Fisheries Initiative Portal](https://psl.noaa.gov/cefi_portal/) serves historical and forecast data useful in ecological studies. Data is served using a [THREDDS](https://psl.noaa.gov/thredds/catalog/Projects/CEFI/regional_mom6/catalog.html) catalog, but PSL also makes [tabular catalogs](https://psl.noaa.gov/cefi_portal/var_list_northwest_atlantic_hist_run.html) available, too.
# Requirements
+ [R v4.1+](https://www.r-project.org/)
+ [rlang](https://CRAN.R-project.org/package=rlang)
+ [dplyr](https://CRAN.R-project.org/package=dplyr)
+ [sf](https://CRAN.R-project.org/package=sf)
+ [stars](https://CRAN.R-project.org/package=stars)
+ [jsonlite](https://CRAN.R-project.org/package=jsonlite)
+ [tidync](https://CRAN.R-project.org/package=tidync)
# Installation
```
remotes::install_github("BigelowLab/cefi")
```
# Usage
Load the libraries needed.
```{r}
suppressPackageStartupMessages({
library(rnaturalearth)
library(cefi)
library(tidync)
library(stars)
library(dplyr)
})
```
## Catalogs
CEFI offers THREDDS catalogs which are great for mining programmatically, but they also provide simple tabular catalogs: one for historical runs and one for forecasts. (Yes, the first variable is "Varible_Name".) These are easy for users to navigate.
```{r}
uri = catalog_uri(region = "Northwest Atlantic", period = "history")
hist = read_catalog(uri) |>
dplyr::glimpse()
```
```{r}
uri = catalog_uri(region = "Northwest Atlantic", period = "forecast")
fcst = read_catalog(uri) |>
dplyr::glimpse()
```
Assuming these catalogs will remain and stay up-to-date, we can leverage them in lieu of coding up software to navigate the THREDDS catalogs.
# Getting data
To get data select one row from either catalog, and open that resource which we see in R as a [tidync](https://docs.ropensci.org/tidync/) object useful for navigating and extracting netcdf files.
### Historical data
Let's start with historical data.
```{r}
nc = hist |>
dplyr::filter(Variable_Name == "btm_o2") |>
cefi_open()
nc
```
Note that the spatial dimensions of the "active" grid are defined by `xh` nd `yh` not `x` and `y` or `lon` and `lat`. That is a choice of the creator of the NetCDF resource, and so we use those names for filtering.
Time requires some background understanding of the CEFI architecture coupled with the `tidync` approach to navigating the NetCDF object. To ease that for the user we have created a function called `cefi_time()` which is a wrapper around `tidync::hyper_transforms()`, but adds a column called `time_` which can be either 'Date' of 'POSIXct' class.
```{r}
cefi_time(nc)
```
Next we filter the array with `cefi_filter()`, so that we can collect a subset of the data. This function is a wrapper around the `tidync::hyper_filter()` function; we wrap because the time coordinate in the NetCDF files is not in user-friendly format. Note that it is important that `time` comes before any other filtering element.
```{r}
nc = cefi_filter(nc,
time = as.Date(c("1995-12-16", "1995-12-20")),
xh = xh > -75 & xh < -60,
yh = yh > 40 & yh < 50)
```
It is important to understand that the actual data is not loaded into R (yet), think of this a prefiltering step. To actually get the data we can call `cefi_var`. You can explore more about prefiltering at the [tidync website](https://docs.ropensci.org/tidync/).
Now we can load the data into R, using `cefi_var()`.
```{r}
s = cefi_var(nc)
s
```
And finally we can plot the result.
```{r}
coast = rnaturalearth::ne_coastline(scale = "medium", returnclass = "sf") |>
sf::st_geometry() |>
sf::st_crop(sf::st_bbox(s))
plot_coast = function(){
plot(sf::st_geometry(coast), add = TRUE, col = "darkorange")
}
plot(s, hook = plot_coast, key.pos = NULL)
```
### Forecast data
Now let's look at forecast data.
```{r}
nc = fcst |>
dplyr::filter(Variable_Name == "tob" & Time_of_Initialization == "2022-12") |>
cefi_open()
nc
```
Here time is measured in `lead` time (in months) relative to an initalization date. But note that the actually time varying dimension is called `lead` even though we address it in the more familiar `time`. Once again we can use `cefi_time()` to expose the time-varying dimension relative to the starting date.
```{r}
cefi_time(nc)
```
```{r}
nc = cefi_filter(nc,
time = as.Date(c("1994-12-01", "1995-02-01")),
xh = xh > -75 & xh < -60,
yh = yh > 40 & yh < 50)
```
Unlike the historical runs, the forecast results include one or more ensemble member results for each time period. These are identified as `member` which you might think of as replicates.
Here we show retrieving all of the members.
```{r}
s = cefi_var(nc, collapse_fun = NULL)
s
```
To see the ensemble results for one date, we can slice-and-dice using indexing.In this case, there are two variables, `tob` (temperature of bottom) and `tob_anom` temperature of bottom anomaly. Below we show only `tob`.
```{r}
times = st_get_dimension_values(s, "time")
plot(s['tob'][,,,,1], hook = plot_coast)
```
Requesting all members for a given time is possible as shown above, but a more common use case is to extract a summary (mean or median) or a measure of variability (standard deviation or variance).
```{r}
s = cefi_var(nc, collapse_fun = mean)
s
```
Note that the dimensionality is reduced because we computed the mean of the ensembles at each time.
```{r collaped_mean_plot}
plot(s['tob'], hook = plot_coast, main = "Mean Temp of Bottom")
```
### Granularity
Here we highlight the granularity of the data source by zooming in on the New England coastline. You can see that large embayments such as Penobscot and Narragansett bays are not included.
```{r granularity}
plot(dplyr::slice(s['tob'], "time", 1),
xlim = c(-72 , -63),
ylim = c(39, 45),
reset = FALSE,
axes = TRUE)
plot(coast, add = TRUE, col = "orange")
```