---
title: "R Notebook for Satellite Data Analysis"
output:
  html_document:
    df_print: paged
---
Welcome to Team From Above's repository of work done during the JournalismAI 2021 Collab Challenge in the Americas.
Background
For this year's challenge, participants from various newsrooms across the Americas came together to work with the Knight Lab team at Northwestern University exploring how we might use AI technologies to innovate newsgathering and investigative reporting techniques.
Team members
Gibran Mena (DataCritica)
Shreya Vaidyanathan (Bloomberg News)
David Ingold (Bloomberg News)
Flor Coehlo (LaNacion)
María Teresa Ronderos (CLIP)
First, we start by loading the libraries (packages) we need for this task.
```{r, include = F}
rm(list = ls(all.names = TRUE)) # will clear all objects, including hidden objects
gc() # free up memory and report memory usage
```
```{r include=T, message=FALSE}
# load required libraries (Note: if these packages are not installed, then install them first and then load)
library(sf)
library(stars)
library(ggplot2)
library(dplyr)
library(tidyr)
```
Then we read in and explore the satellite data matching our Area of Interest, previously downloaded from services such as Planet's NICFI program or Sentinel. A satellite image such as the ones from the Planet program encodes reflectance information in the blue, green, red and near-infrared parts of the spectrum. We load the satellite image, in this case Planet's analytic tif file with four bands: blue, green, red and near-infrared (NIR).
```{r include=T, message=FALSE}
satellite_data <- read_stars("colombia2_analytic.tif", proxy = T, resample = "cubic_spline")
plot(satellite_data)
```
As you can see, there are five available bands for this image, which are already loaded into a stack (several images with exactly matching area and resolution, but each carrying different information). These layers of information are the data that will feed the learning algorithm once we match them with our labelling data.
We could instead read in several layers one by one and then stack them together, as in the first example of this tutorial. To get to know our data better, we can inspect the coordinate reference system, the resolution, and the number and names of the layers.
```{r include=T, message=TRUE}
st_crs(satellite_data)
st_dimensions(satellite_data)
```
But first, we can generate additional information, such as the Normalized Difference Vegetation Index (NDVI), which is commonly used in land classification tasks. To find out which spectral band of an analytic image (as opposed to the visual images downloaded from Planet) each layer represents, we can consult, for this image, https://www.planet.com/products/satellite-imagery/files/1610.06_Spec%20Sheet_Combined_Imagery_Product_Letter_ENGv1.pdf. We learn that the bands for our image are, in order, blue, green, red and NIR.
```{r include=T, message=FALSE}
#Plotting true color composite
#par(col.axis="white",col.lab="white",tck=0)
#plot(satellite_data, r = 3, g = 2, b = 1, axes = TRUE,
#stretch = "lin", main = "Colombia's area of interest")
#box(col="white")
```
The false color composite uses NIR (4) for red, red (3) for green, and green (2) for blue. This combination is good for detecting vegetation.
```{r include=T, message=FALSE}
#par(col.axis="white",col.lab="white",tck=0)
#plotRGB(satellite_data, r = 4, g = 3, b = 2, axes = TRUE, stretch = "lin", main = "False Color Composite of AOI")
#box(col="white")
```
Next we calculate Normalized Difference Vegetation Index (NDVI), per https://urbanspatial.github.io/classifying_satellite_imagery_in_R/
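NDVI provides another way to identify vegetation and is calculated on a scale of -1 to 1, where values closer to 1 indicate more vegetative cover. The calculation is based on how pigments in vegetation absorb sunlight compared to other ground cover. We calculate NDVI using the following equation: (NIR - Red)/(NIR + Red). Inside the function, the [ ] notation picks the bands from each pixel's vector of band values, in this case band 4 for NIR and band 3 for Red; st_apply then applies the function to every pixel (x, y) of the multi-band raster.
```{r include=T, message=FALSE}
# NDVI for one pixel, given its vector of band values (band 3 = Red, band 4 = NIR)
calc_ndvi <- function(x) (x[4] - x[3]) / (x[4] + x[3])
# apply the function over the x and y dimensions, collapsing the band dimension
ndvi <- st_apply(satellite_data, c("x", "y"), calc_ndvi)
```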
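To make the formula concrete, here is a minimal sketch on made-up reflectance values. The numbers and the names `toy_red`, `toy_nir` and `ndvi_from_bands` are illustrative assumptions, not values or functions taken from the Colombia image or from any package.

```r
# Hypothetical reflectance values for three pixels (not taken from the image)
toy_red <- c(0.10, 0.30, 0.25)
toy_nir <- c(0.50, 0.30, 0.05)

# NDVI = (NIR - Red) / (NIR + Red)
ndvi_from_bands <- function(nir, red) (nir - red) / (nir + red)

toy_ndvi <- ndvi_from_bands(toy_nir, toy_red)
print(round(toy_ndvi, 2))
```

The first pixel reflects much more NIR than red, so its NDVI is clearly positive, as we would expect for dense vegetation. The second reflects both equally and lands at 0, and the third reflects more red than NIR and comes out negative.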
Now we need to label this data for what is called a supervised modelling approach, where we teach the machine how to relate the encoded information to "ground truth" information: the labels for the phenomena on the ground that we need to understand better. We produced these labels using a tool such as GroundWork (image of GroundWork).
We read in the output data of the collaborative exercise from GroundWork by pulling in a file in GeoJSON format.
```{r include=T, message=FALSE}
labels <- st_read("export_data/labels/data/cd5e80ac-7548-490f-becd-06cc894b1e2f.geojson")
```
Now we create the file the ML algorithm will learn from by matching the raster (the satellite image) with the labelling data. We use the aggregate function from the R package stars; the same kind of operation can also be done within GIS programs like QGIS.
```{r include=T, message=FALSE}
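colombia_raster <- st_as_stars(satellite_data) # assumption: the image read above, loaded into memory
labels_crs <- st_transform(labels, st_crs(colombia_raster)) # assumption: the labels reprojected to the raster's CRS
data_labels <- as.data.frame(aggregate(colombia_raster, labels_crs, FUN = mean)) # mean of each band per labelled polygon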
```
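Conceptually, this step averages the pixel values that fall inside each labelled polygon. A minimal base-R sketch of that idea follows, using a made-up table of pixels that have already been assigned to labels. The `toy_pixels` data frame, its column names and its values are all hypothetical, and base R's `aggregate` stands in for the spatial version used on the real raster.

```r
# Hypothetical pixels, each already assigned to the label polygon it falls in
toy_pixels <- data.frame(
  label = c("forest", "forest", "pasture", "pasture", "pasture"),
  red   = c(0.10, 0.12, 0.30, 0.28, 0.32),
  nir   = c(0.50, 0.54, 0.30, 0.35, 0.31)
)

# Mean of each band per label: one row per label, one column per band
toy_means <- aggregate(cbind(red, nir) ~ label, data = toy_pixels, FUN = mean)
print(toy_means)
```

The spatial call on the real data does the same kind of grouping, except that it works out which pixels belong to which polygon from the geometries instead of from a ready-made `label` column.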