-
Notifications
You must be signed in to change notification settings - Fork 16
/
Copy path01_z04_railroad_acres.rmd
204 lines (157 loc) · 6.49 KB
/
01_z04_railroad_acres.rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
### Project Overview
---
**Objective**
+ Understand the impact of railroad on corn planted acreage in Illinois
---
**Datasets**
+ USDA corn planted acreage for Illinois downloaded from the USDA NationalAgricultural Statistics Service (NASS) QuickStats service using `tidyUSDA` package
+ US railroads (line data) downloaded from [here](https://catalog.data.gov/dataset/tiger-line-shapefile-2015-nation-u-s-rails-national-shapefile)
---
**Econometric Model**
We will estimate the following model:
$$
y_i = \beta_0 + \beta_1 RL_i + v_i
$$
where $y_i$ is corn planted acreage in county $i$ in Illinois, $RL_i$ is the total length of railroad, and $v_i$ is the error term.
---
**GIS tasks**
+ Download USDA corn planted acreage by county as a spatial dataset (`sf` object)
* use `tidyUSDA::getQuickStat()`
+ Import US railroad shape file as a spatial dataset (`sf` object)
* use `sf:st_read()`
+ Spatially subset (crop) the railroad data to the geographic boundary of Illinois
* use `sf_1[sf_2, ]`
+ Find railroads for each county (cross-county railroad will be chopped into pieces for them to fit within a single county)
* use `sf::st_intersection()`
+ Calculate the travel distance of each railroad piece
* use `sf::st_length()`
* create maps using the `ggplot2` package
- use `ggplot2::geom_sf()`
---
**Preparation for replication**
+ Run the following code to install or load (if already installed) the `pacman` package, and then install or load (if already installed) the listed package inside the `pacman::p_load()` function.
```{r demo4_packages}
if (!require("pacman")) install.packages("pacman")
pacman::p_load(
tidyUSDA, # access USDA NASS data
sf, # vector data operations
dplyr, # data wrangling
ggplot2, # for map creation
modelsummary, # regression table generation
keyring # API management
)
```
+ Run the following code to define the theme for map:
```{r define_theme_2, eval = F}
theme_for_map <-
theme(
axis.ticks = element_blank(),
axis.text = element_blank(),
axis.line = element_blank(),
panel.border = element_blank(),
panel.grid.major = element_line(color = "transparent"),
panel.grid.minor = element_line(color = "transparent"),
panel.background = element_blank(),
plot.background = element_rect(fill = "transparent", color = "transparent")
)
```
### Project Demonstration
We first download corn planted acreage data for 2018 from USDA NASS QuickStat service using `tidyUSDA` package^[In order to actually download the data, you need to obtain the API key [here](https://quickstats.nass.usda.gov/api). Once the API key was obtained, I stored it using `set_key()` from the `keyring` package, which was named "usda_nass_qs_api". In the code to the left, I retrieve the API key using `key_get("usda_nass_qs_api")` in the code. For your replication, replace `key_get("usda_nass_qs_api")` with your own API key.].
```{r get_quicknass, results = "hide", cache = TRUE}
(
IL_corn_planted <-
getQuickstat(
#--- use your own API key here fore replication ---#
key = key_get("usda_nass_qs_api"),
program = "SURVEY",
data_item = "CORN - ACRES PLANTED",
geographic_level = "COUNTY",
state = "ILLINOIS",
year = "2018",
geometry = TRUE
) %>%
#--- keep only some of the variables ---#
dplyr::select(year, NAME, county_code, short_desc, Value)
)
```
```{r IL_corn_planted, echo = F}
IL_corn_planted
```
A nice thing about this function is that the data is downloaded as an `sf` object with county geometry with `geometry = TRUE`. So, you can immediately plot it (Figure \@ref(fig:map-il-corn-acreage)) and use it for later spatial interactions without having to merge the downloaded data to an independent county boundary data.
```{r map-il-corn-acreage, fig.cap = "Map of Con Planted Acreage in Illinois in 2018"}
ggplot(IL_corn_planted) +
geom_sf(aes(fill = Value / 1000)) +
scale_fill_distiller(name = "Planted Acreage (1000 acres)", palette = "YlOrRd", trans = "reverse") +
theme(legend.position = "bottom") +
theme_for_map
```
---
Let's import the U.S. railroad data and reproject to the CRS of `IL_corn_planted`:
```{r Demo5_rail, dependson = "get_quicknass", cache = FALSE}
rail_roads <-
st_read("Data/tl_2015_us_rails.shp") %>%
#--- reproject to the CRS of IL_corn_planted ---#
st_transform(st_crs(IL_corn_planted))
```
Here is what it looks like:
```{r Demo5-rail-plot, fig.cap = "Map of Railroads", cache = TRUE}
ggplot(rail_roads) +
geom_sf() +
theme_for_map
```
We now crop it to the Illinois state border (Figure \@ref(fig:Demo5-rail-IL-plot)):
```{r crop_to_IL_run, echo = FALSE, dependson = "get_quicknass", cache = TRUE}
rail_roads_IL <- rail_roads[IL_corn_planted, ]
```
```{r crop_to_IL, eval = FALSE, dependson = "get_quicknass"}
rail_roads_IL <- rail_roads[IL_corn_planted, ]
```
```{r Demo5-rail-IL-plot, fig.cap = "Map of railroads in Illinois", cache = TRUE}
ggplot() +
geom_sf(data = rail_roads_IL) +
theme_for_map
```
Let's now find railroads for each county, where cross-county railroads will be chopped into pieces so each piece fits completely within a single county, using `st_intersection()`.
```{r intersect_rails, dependson = "get_quicknass", cache = TRUE}
rails_IL_segmented <- st_intersection(rail_roads_IL, IL_corn_planted)
```
Here are the railroads for Richland County:
```{r map_seg_rail, dependson = "intersect_rails", fig.height = 6}
ggplot() +
geom_sf(data = dplyr::filter(IL_corn_planted, NAME == "Richland")) +
geom_sf(
data = dplyr::filter(rails_IL_segmented, NAME == "Richland"),
aes(color = LINEARID)
) +
theme(legend.position = "bottom") +
theme_for_map
```
We now calculate the travel distance (Great-circle distance) of each railroad piece using `st_length()` and then sum them up by county to find total railroad length by county.
```{r rail_total_county}
(
rail_length_county <-
mutate(
rails_IL_segmented,
length_in_m = as.numeric(st_length(rails_IL_segmented))
) %>%
#--- geometry no longer needed ---#
st_drop_geometry() %>%
#--- group by county ID ---#
group_by(county_code) %>%
#--- sum rail length by county ---#
summarize(length_in_m = sum(length_in_m))
)
```
---
We merge the railroad length data to the corn planted acreage data and estimate the model.
```{r merge_data}
reg_data <- left_join(IL_corn_planted, rail_length_county, by = "county_code")
```
```{r rail-reg-table}
lm(Value ~ length_in_m, data = reg_data) %>%
modelsummary(
stars = TRUE,
gof_omit = "IC|Log|Adj|Within|Pseudo"
)
```
---