-
Notifications
You must be signed in to change notification settings - Fork 3
Expand file tree
/
Copy pathPart1-AOT.Rmd
More file actions
193 lines (140 loc) · 7.75 KB
/
Part1-AOT.Rmd
File metadata and controls
193 lines (140 loc) · 7.75 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
---
title: "Part 1 - AOT Workshop"
author: "Center for Spatial Data Science"
date: "8/28/2018"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo =TRUE)
```

# Sensor Data Access and Mapping Basics
AoT Workshop, Spatial Analysis, Part 1. August 30th, 2018. Argonne National Laboratory
This tutorial is brought to you by Anais Ladoy, Isaac Kamber, Marynia Kolak, Julia Koschinsky, and Luc Anselin at the Center for Spatial Data Science at the University of Chicago (see <a href>spatial.uchicago.edu</a> for more info).
***
## Environment Setup
This tutorial was generated with the R 3.3 kernel installed on a jupyter notebook. See <a href>http://jupyter.org/install</a> for installation of jupyter notebooks on your system, and <a href>https://github.com/IRkernel/IRkernel</a> for installation instructions of the native R kernel.
Spatial analysis in R requires multiple libraries. Package installation is done with the following syntax: install.packages("sp"). Some of these take additional time for installation, depending on your system. The following list is comprehensive for this tutorial, as well as much spatial analysis.
```{r}
library(sp) #spatial data wrangling & analysis
library(rgdal) #spatial data wrangling
library(tmap) #modern data visualizations
library(leaflet) #modern data visualizations
library(rgeos)
library(tibble)
```
## Download Daily AoT Data
Former versions of uncalibrarted AoT data were found on the ANL Waggle datasite. This data was bundled daily to include all previous data, making the size unmanageable and not ideal for daily updates. (However, this data is still available and may be useful for one-off analysis or historical investigation of sensor data.)
In this tutorial, we focus on the new AoT data product, provided as complete daily data dumps on Plenar.io via <a href>
https://aot-file-browser.plenar.io/data-sets/chicago-complete</a>.
We use files from August 25, 2018, which are archived in `./Data`. The code below should not be run, unless you want to update the files in `./Data`.
```{r, eval=FALSE}
## Note: This code chunk has been set to *not* evaluate using "eval=FALSE"
## option in the markdown file
yesterday <- Sys.Date() - 1
remote.file <- sprintf(c("https://s3.amazonaws.com/aot-tarballs/chicago-public.daily.%s.tar"),
yesterday)
local.file <- basename(remote.file)
download.file(remote.file, local.file)
## untar the file.
untar(local.file)
```
## Read and Inspect Data
Read in the files that contain node data, and inspect
```{r}
nodes <- read.csv("Data/nodes.csv")
head(nodes)
sensor.info <- read.csv("Data/sensors.csv")
head(sensor.info)
provenance <- read.csv("Data/provenance.csv")
head(provenance)
```
Read in the files that contain sensor data, and inspect
```{r}
sensor.data <- read.csv("Data/data.csv.gz")
head(sensor.data)
```
Let's look at the underlying data structures of the data.
```{r}
glimpse(sensor.data)
```
We will return to this data more closely in Part 2, where we will plot and interpolate temperature, but first let's better understand the spatial distribution of the nodes and sensors.
## Convert data to spatial formats
From our data inspection completed prior, we know that latitutde and longitude information can be found in the nodes dataset. Data with latitute and longitude in columns is not explicitly spatial data until it's spatial features have been projected and enabled. Thus, we need to convert the CSV format to a spatial format. In R, when using the sp package, we use the Spatial Points Data Frame.
```{r}
nodes.spt <- SpatialPointsDataFrame(nodes[,c('lon','lat')],nodes)
proj4string(nodes.spt) <- CRS("+init=epsg:4326")
```
Let's plot the nodes in a basic format to ensure they're plotting correctly:
```{r}
plot(nodes.spt)
```
We can vaguely see the shape of Chicago, plus a monitor to the Southwest (likely at Argonne).
How many sensors are in the dataset? Checking the length of the dataset is one way to get the total number of sensors. You could alternatively count the number of unique node id's, just to make sure that no node is counted twice for any reason. There are 90 unique nodes.
```{r}
length(nodes.spt)
#unique(nodes.spt$node_id)
```
## Mapping Nodes
Let's plot the nodes again, but in a more modern cartography. Here we'll add a categorical component to distinguish the type of node using the "description" parameter.
```{r}
tmap_mode('plot')
tm_shape(nodes.spt) + tm_dots("description",style="cat",size=0.3)
```
Let's add a basemap for context, and make the map interactive.
```{r}
tmap_mode('view')
tm_shape(nodes.spt) + tm_dots("description",style="cat",size=0.1)
```
## Mapping Nodes with Community Areas
For additional context, we'll add Chicago community areas as a layer. First, you will need to download a spatial file of Chicago communities in a format like geojson or shapefile. Here, we've downloaded and added a shapefile (composed of 4 data files) to the working directory for easy access.
```{r}
chiCA <- readOGR("./Data","ChiComArea")
```
We could inspect the structure of the spatial data file using the str() function, we'll get a long overview returned detailing the complex spatial structures of the dataset. Spatial data formats are more complex than their non-spatial counterparts.
To inspect the data attributes of the shapefile, we'll look at the non-spatial data dimension only; with the sp package, this is done using @data.
```{r}
head(chiCA@data)
```
We've confirmed that our community area file has all 77 Chicago areas. Let's plot it quickly for final inspection:
```{r}
plot(chiCA)
```
Now we'll add the Community Areas to our interactive map.
```{r}
tm_shape(chiCA) + tm_borders(alpha = 0.7) + tm_shape(nodes.spt) + tm_dots("description",style="cat",size=0.1)
```
To finalize the map, let's add labels of each Community Area. The labels will be overlapping, with the current default, until the map is zoomed in further.
```{r}
tm_shape(chiCA) + tm_borders(alpha = 0.7) + tm_text("community", size=0.5) + tm_shape(nodes.spt) + tm_dots("description",style="cat",size=0.1)
```
## Mapping Density of AoT Sensors
Let's further inspect the distribution of sensors across the city. Our next goal will be to generate a buffer of 1 kilometer around each sensor. We will then visualize the buffers, and finally calculate the density of AoT sensor areas per Community Area.
First, we need to convert to a spatial project that preserves distance. We'll use EPSG:32616, which uses a unit of meters. See more about this project at: <a href>https://epsg.io/32616</a>.
```{r}
nodes.spt.32616 <- spTransform(nodes.spt, CRS("+init=epsg:32616"))
```
Next, we'll calculate 1 kilometer buffers (= 1,000 meters) for each node.
```{r}
buffers <- gBuffer(nodes.spt.32616, width = 1000, byid = TRUE)
```
Convert back to a projection that all layers have in common. We'll use the standard, EPSG.4326. We already used this at the beginning of the tutorial by default, but will denote in dataset name to avoid confusion. This extra step helps troubleshooting immensely!
```{r}
nodes.spt.4326 <- spTransform(nodes.spt, CRS("+init=epsg:4326"))
chiCA.4326 <- spTransform(chiCA, CRS("+init=epsg:4326"))
buffer.4326 <- spTransform(buffers, CRS("+init=epsg:4326"))
```
Map the buffers for inspection.
```{r}
tm_shape(chiCA.4326) + tm_borders() +
tm_shape(buffer.4326) + tm_borders(col = "blue") +
tm_shape(nodes.spt.4326) + tm_dots(col = "red")
```
Let's count all of the buffers per Community Area.
```{r}
chiCA.4326$count_buffers = rowSums(gIntersects(buffer.4326, chiCA.4326, byid = TRUE))
```
Finally, let's view the buffer density, overlaying the buffers for final confirmation and exploration.
```{r}
tm_shape(chiCA.4326) + tm_fill(col = "count_buffers", palette = "BuGn", style = "quantile",title = "AoT Density") + tm_shape(buffer.4326) + tm_borders(col = "blue")
```