Commit

First commit
rcorrero committed Jul 4, 2020
0 parents commit dfba246
Showing 12 changed files with 429 additions and 0 deletions.
13 changes: 13 additions & 0 deletions .gitignore
@@ -0,0 +1,13 @@
# Text editor backups #
#######################
*~
*.pyc
*.pyo

# Irrelevant background files #
###############################
/meta/

# Personal files #
##################
/poisson/private/
11 changes: 11 additions & 0 deletions LICENSE
@@ -0,0 +1,11 @@
Copyright 2020 Richard Correro

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
32 changes: 32 additions & 0 deletions README.md
@@ -0,0 +1,32 @@
poisson — Richard Correro
==============================

poisson is a Python module for small vessel detection in optical satellite imagery. This module provides a framework for training ship detection models and for using them to identify vessels in satellite imagery from [Planet](https://www.planet.com/).

This repository contains the module itself, a trained model and its associated files, working notes from the module's creation, and several papers relevant to vessel detection methods.

Repository Structure
------------
```
.
├── LICENSE
├── README.md
├── notes
│   ├── Panoptis\ ǀ\ Imagery\ Processing\ Pipeline.md
│   └── Poisson\ ǀ\ Development.md
├── papers
│   ├── remotesensing-10-00511.pdf
│   └── vessel_detect_survey.pdf
├── poisson
│   ├── panoptis
│   └── stropheus
└── setup.py
```

Support
-----------
Poisson was developed with the support of a research grant from the Stanford University Department of Statistics.

------------
Created by Richard Correro in 2020. Contact me at rcorrero at stanford dot edu
46 changes: 46 additions & 0 deletions notes/Panoptis ǀ Imagery Processing Pipeline.md
@@ -0,0 +1,46 @@
---
title: Panoptis | Imagery Processing Pipeline
created: '2020-06-25T20:23:26.524Z'
modified: '2020-07-02T21:08:49.188Z'
---

# Panoptis | Imagery Processing Pipeline

This is a working paper recording the development of _panoptis_, a satellite image processing pipeline designed for [poisson](https://github.com/rcorrero/poisson). πᾰνόπτηϛ means "all-seeing" in Attic Greek ([Woodhouse](http://artflsrv02.uchicago.edu/cgi-bin/efts/dicos/woodhouse_test.pl?keyword=^All-seeing,%20adj.)).

## Architecture
The image processing pipeline itself is written entirely in Python using several third-party packages. For now my intention is to build panoptis for use on a single machine until satisfactory performance is attained, at which point I will refactor the code, containerize it, and run it at scale using a container orchestration framework. Satellite image processing is [embarrassingly parallel](https://en.wikipedia.org/wiki/Embarrassingly_parallel) in that the task may be separated by area of interest (AOI), image type, band, etc. When searching for objects in near-shore open ocean, large areas must be analyzed, so any interesting application requires large-scale image processing. To run on several replicates, the code may be structured so that each instance processes images listed in a shared database, and writes the statistics of interest to a second database collecting the results from all replicates. __Update:__ Initial calculations suggest that a trained model running on a single machine should be able to handle the largest AOIs we will need to label. Containerization is likely unnecessary, but this is not certain.
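
A minimal sketch of that structure, assuming a hypothetical SQLite work-queue table (`images`) and results table (`results`), with `process_image` standing in for the pipeline described below:

```python
import json
import sqlite3


def run_worker(db_path):
    conn = sqlite3.connect(db_path)
    while True:
        # Claim one pending image; each replicate runs this same loop.
        # (A real multi-worker deployment needs an atomic claim here.)
        row = conn.execute(
            "SELECT id, uri FROM images WHERE state = 'pending' LIMIT 1"
        ).fetchone()
        if row is None:
            break  # Work queue is empty.
        img_id, uri = row
        conn.execute("UPDATE images SET state = 'claimed' WHERE id = ?",
                     (img_id,))
        conn.commit()
        stats = process_image(uri)  # Hypothetical: download, clip, detect.
        # Write this replicate's results to the shared results table.
        conn.execute("INSERT INTO results (image_id, stats) VALUES (?, ?)",
                     (img_id, json.dumps(stats)))
        conn.execute("UPDATE images SET state = 'done' WHERE id = ?",
                     (img_id,))
        conn.commit()
    conn.close()
```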


The pipeline consists of the following operations, in order (a code sketch follows the list):
1. Accessing imagery from storage (Google Cloud Storage in my case)
2. Clipping the imagery to the AOI
3. Identifying land and other noise in the imagery (e.g. cloud cover)
4. Object detection using a set of techniques (so that objects of different sizes may be detected)
5. Object classification by size, shape, and other factors of interest
6. Writing detected object data to a database or dataframe
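
Concretely, the stages chain as in the sketch below; every function named is a placeholder for the corresponding stage, not an implemented API:

```python
def run_pipeline(object_name, aoi_geojson):
    raw = fetch_from_storage(object_name)      # 1. Access imagery from storage.
    clipped = clip_to_aoi(raw, aoi_geojson)    # 2. Clip the imagery to the AOI.
    cleaned = mask_land_and_clouds(clipped)    # 3. Mask land and cloud noise.
    candidates = detect_objects(cleaned)       # 4. Multi-scale object detection.
    detections = classify_objects(candidates)  # 5. Classify by size, shape, etc.
    write_results(detections)                  # 6. Persist detected-object data.
    return detections
```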

The basic statistics we need for each detected object (see the record sketch after this list) are its
- Location (Lat/Long or other coordinate system)
- Time (timestamp of image in which the object is detected)
- Size
- Other classification data based on the above
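
As a sketch, these fields map onto a small record type; the field names, types, and units here are illustrative, not a fixed schema:

```python
from dataclasses import dataclass


@dataclass
class DetectedObject:
    lat: float             # Location: latitude (WGS84).
    lon: float             # Location: longitude (WGS84).
    timestamp: str         # Time: acquisition timestamp of the source image.
    length_m: float        # Size: estimated length in meters.
    width_m: float         # Size: estimated width in meters.
    label: str = "vessel"  # Classification derived from the fields above.
```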


## The Learning Problem
Achieving acceptable performance and generalizability requires framing vessel detection as a machine learning problem.

As described by [this survey](https://doi.org/10.1016/j.rse.2017.12.033), the learning workflow is:
1. Mask land using a coastline shapefile (see the sketch after this list)
2. Correct environmental distortions in the images and mask any areas covered by thick clouds
3. Using image processing techniques, identify potential vessels
4. Using a trained discriminator, label candidate vessels as `vessel` or `not vessel`
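
A minimal sketch of step one, assuming rasterio and geopandas and a shapefile whose polygons cover land; paths and names are illustrative:

```python
import geopandas as gpd
import rasterio
import rasterio.mask


def mask_land(image_path, coastline_shp):
    land = gpd.read_file(coastline_shp)  # Polygons covering land.
    with rasterio.open(image_path) as src:
        # Reproject the land polygons into the image's CRS before masking.
        land = land.to_crs(src.crs)
        # invert=True masks the pixels *inside* the land polygons,
        # leaving only open water for the detector.
        masked, transform = rasterio.mask.mask(
            src, land.geometry, invert=True, nodata=0)
    return masked, transform
```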

Steps three and four may be combined into a single discriminator which takes corrected images as input. The decision of whether to combine these steps will be made based on the structure of previous vessel detection algorithms. If it seems possible to train discriminators with acceptable performance that do not require a separate candidate-detection step, then I'll do that.

The logical avenue for development is toward deeper models, but there is likely wisdom in beginning with a shallow model and developing the infrastructure necessary to train it – the training socket. Once this is built, model refinement and, importantly, the development of deeper models may proceed easily and at the same level of abstraction: model design and implementation. By abstracting away the finicky details of image preprocessing, I/O, etc., I can focus on designing models which yield better performance.

### Signal Source

[This dataset](https://www.iuii.ua.es/datasets/masati/) contains images of land and sea with seaborne vessels labeled with bounding boxes. [This dataset](https://www.kaggle.com/c/airbus-ship-detection/overview) contains roughly a quarter-million land and sea images with similar labels, of various resolutions and clearly gathered from several different imaging platforms. The latter lacks the clear categorization which the former sports, but its much larger size makes it more attractive as a first training set.
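
If the Airbus set is used, a practical first step is decoding its labels, which (as I understand the competition format) are run-length-encoded masks over 768×768 images, column-major and 1-indexed. A sketch of a decoder, to be verified against the actual CSV:

```python
import numpy as np


def rle_decode(encoded, shape=(768, 768)):
    # Pairs of (1-indexed start pixel, run length) over a flattened,
    # column-major (Fortran-order) image.
    mask = np.zeros(shape[0] * shape[1], dtype=np.uint8)
    nums = [int(x) for x in encoded.split()]
    for start, length in zip(nums[0::2], nums[1::2]):
        mask[start - 1:start - 1 + length] = 1
    return mask.reshape(shape, order='F')
```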

21 changes: 21 additions & 0 deletions notes/Poisson ǀ Development.md
@@ -0,0 +1,21 @@
---
title: Poisson | Development
created: '2020-06-26T17:02:47.180Z'
modified: '2020-07-04T17:43:16.524Z'
---

# Poisson | Development

At the highest level this project encompasses the design, development, and implementation of a satellite imagery processing pipeline. This pipeline takes images of near-shore open seas and extracts statistics relevant to the study of the behavior of small- to medium-sized vessels and other objects. The focus is on small vessels because they are unlikely to use active reporting systems such as [VMS](https://en.wikipedia.org/wiki/Vessel_monitoring_system) or [AIS](https://en.wikipedia.org/wiki/Automatic_identification_system). Consequently, much of the illegal, unreported, and unregulated ([IUU](https://en.wikipedia.org/wiki/Illegal,_unreported_and_unregulated_fishing)) fishing activity globally is done by [smaller vessels](http://biblioimarpe.imarpe.gob.pe/bitstream/123456789/2328/1/THESIS%20final%20-%20post%20defense.pdf#page=159).

## Architecture
The first milestone is to develop the satellite image processing pipeline, _panoptis_. The inputs to panoptis are raw satellite images (GeoTIFF files). Panoptis processes these images, identifies vessels on the water, and creates a dataset containing vessel locations, sizes (length, width, area, bounding boxes, etc.), and timestamps associated with the time at which each image was captured. Panoptis can be thought of as three separate components strung together sequentially (sketched in code after the list):

1. Image preprocessor
2. Vessel detector (the "model")
3. Data postprocessor
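
A sketch of that composition; each function is a placeholder for the component above, not an implemented API:

```python
def panoptis(raw_geotiffs):
    preprocessed = preprocess(raw_geotiffs)  # 1. Image preprocessor.
    detections = model(preprocessed)         # 2. Vessel detector (the "model").
    return postprocess(detections)           # 3. Data postprocessor.
```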

Components one and three are scaffolding which supports the main development: the model. The model itself is by far the most computationally complex part of poisson because it must identify vessels in raw satellite imagery.

To create and train a model with acceptable performance, we need a training socket, called _stropheus_. This handles the I/O for the model, as well as hyperparameter selection, performance analysis, and report generation (describing the performance of the model).
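
A sketch of the training-socket interface this implies; every name here is illustrative, not stropheus's actual API:

```python
def train_and_report(model, hyperparams, data_dir):
    # All helpers here are hypothetical placeholders for socket duties.
    train_set, val_set = load_datasets(data_dir)   # I/O for the model.
    configure(model, hyperparams)                  # Hyperparameter selection.
    model.fit(train_set.images, train_set.labels)  # Model training.
    metrics = evaluate(model, val_set)             # Performance analysis.
    write_report(metrics, hyperparams)             # Report generation.
    return metrics
```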
Binary file added papers/remotesensing-10-00511.pdf
Binary file not shown.
Binary file added papers/vessel_detect_survey.pdf
Binary file not shown.
99 changes: 99 additions & 0 deletions poisson/panoptis/clip_imgs.py
@@ -0,0 +1,99 @@
import errno
import os

# Create the Cloud Storage JSON API service client.
from googleapiclient.discovery import build
from googleapiclient.http import MediaFileUpload, MediaIoBaseDownload


GOOGLE_APPLICATION_CREDENTIALS = os.getenv('APPLICATION_CREDENTIALS')
BUCKET_NAME = os.getenv('BUCKET_NAME')
GEO_FILTER_PATH = os.getenv('GEO_FILTER_PATH')
PATH_PREFIX = os.getenv('PATH_PREFIX')
ORDER_ID = os.getenv('ORDER_ID')
ITEM_TYPE = os.getenv('ITEM_TYPE')
ITEM_ID_PATH = os.getenv('ITEM_ID_PATH')
DL_IMAGE_PATH = os.getenv('DL_IMAGE_PATH')
BAND_ID = os.getenv('BAND_ID')


def download_img(dl_path, id_num):
    gcs_service = build('storage', 'v1')
    if not os.path.exists(os.path.dirname(dl_path)):
        try:
            os.makedirs(os.path.dirname(dl_path))
        except OSError as exc:  # Guard against race condition
            if exc.errno != errno.EEXIST:
                raise
    with open(dl_path, 'wb') as f:
        # Download the file from the Google Cloud Storage bucket.
        # The object name in the bucket mirrors the local download path.
        request = gcs_service.objects().get_media(bucket=BUCKET_NAME,
                                                  object=dl_path)
        media = MediaIoBaseDownload(f, request)
        print('Downloading image ', id_num, '...')
        print('Download Progress: ')
        done = False
        while not done:
            prog, done = media.next_chunk()
            print(prog.progress())

    print('Image ', id_num, ' downloaded.')
    return dl_path


def clip_img(img, id_num):
    img_cropped = img[:-4] + '_cropped.tif'
    if not os.path.exists(os.path.dirname(img_cropped)):
        try:
            os.makedirs(os.path.dirname(img_cropped))
        except OSError as exc:  # Guard against race condition
            if exc.errno != errno.EEXIST:
                raise
    print('Clipping image ', id_num, '...')
    # Shell out to gdalwarp to crop the GeoTIFF to the AOI cutline.
    cmd = 'gdalwarp -of GTiff -cutline ' + GEO_FILTER_PATH + ' -crop_to_cutline ' \
        + DL_IMAGE_PATH + img + ' ' + DL_IMAGE_PATH + img_cropped
    response = os.system(cmd)
    if response != 0:
        raise RuntimeError('Clip command exited with nonzero status. Status: '
                           + str(response))
    return img_cropped


def upload_img(img_clipped, item_id, ul_path, bucket_name):
    gcs_service = build('storage', 'v1')
    media = MediaFileUpload(img_clipped,
                            mimetype='image/tiff',
                            resumable=True)

    request = gcs_service.objects().insert(bucket=bucket_name,
                                           name=ul_path,
                                           media_body=media)

    print('Uploading image ', item_id, '...')
    response = None
    while response is None:
        # _ is a placeholder for a progress object that we ignore.
        # (Our file is small, so we skip reporting progress.)
        _, response = request.next_chunk()
    print('Upload complete')
    return response


if __name__ == '__main__':
    inpath = PATH_PREFIX + ORDER_ID + '/' + ITEM_TYPE + '/'
    with open(ITEM_ID_PATH) as f:
        item_ids = f.read().splitlines()
    for id_num, item_id in enumerate(item_ids):
        dl_path = inpath + item_id + BAND_ID + '.tif'
        ul_path = PATH_PREFIX + ORDER_ID + '/clipped/' \
            + ITEM_TYPE + '/' + item_id + BAND_ID + '.tif'
        img = download_img(dl_path, id_num)
        img_clipped = clip_img(img, id_num)
        response = upload_img(img_clipped, item_id, ul_path, BUCKET_NAME)
        #print(response)
    print('Done.')
100 changes: 100 additions & 0 deletions poisson/panoptis/download_items.py
@@ -0,0 +1,100 @@
import json
import os
import time

import requests
from requests.auth import HTTPBasicAuth


PLANET_API_KEY = os.getenv('PL_API_KEY')
PLANET_USER = os.getenv('PL_USER')
PLANET_PASSWORD = os.getenv('PL_PASSWORD')
ORDER_NAME = os.getenv('ORDER_NAME')
ITEM_ID_PATH = os.getenv('ITEM_ID_PATH')
PATH_PREFIX = os.getenv('PATH_PREFIX')

GOOGLE_CREDENTIALS = os.getenv('APPLICATION_CREDENTIALS')
BUCKET_NAME = os.getenv('BUCKET_NAME')

orders_url = 'https://api.planet.com/compute/ops/orders/v2'
auth = HTTPBasicAuth(PLANET_API_KEY, '')
headers = {'content-type': 'application/json'}
user = PLANET_USER
password = PLANET_PASSWORD
name = ORDER_NAME
subscription_id = 0
item_type = "PSScene4Band" # Make env var
product_bundle = "analytic"
single_archive = False
archive_filename = "test_01"
bucket = BUCKET_NAME
path_prefix = PATH_PREFIX
email = True


def create_request(user, password, name, subscription_id, item_ids, item_type,
                   product_bundle, single_archive, archive_filename,
                   bucket, credentials, path_prefix, email):
    # Note: user and password are currently unused; requests authenticate
    # with the API key via HTTP basic auth.
    request = {
        "name": name,
        "subscription_id": subscription_id,
        "products": [
            {
                "item_ids": item_ids,
                "item_type": item_type,
                "product_bundle": product_bundle
            }
        ],
        "delivery": {
            "single_archive": single_archive,
            #"archive_filename": archive_filename,
            "google_cloud_storage": {
                "bucket": bucket,
                "credentials": credentials,
                "path_prefix": path_prefix
            }
        },
        "notifications": {
            "email": email
        },
        "order_type": "full"
    }
    return request


def place_order(request, auth):
    response = requests.post(orders_url, data=json.dumps(request),
                             auth=auth, headers=headers)
    print("Response ok? ", response.ok)
    order_id = response.json()['id']
    print("Order id: ", order_id)
    order_url = orders_url + '/' + order_id
    return order_url


def poll_for_success(order_url, auth, num_loops=100):
    print('Order status: ')
    count = 0
    while count < num_loops:
        count += 1
        r = requests.get(order_url, auth=auth)
        response = r.json()
        state = response['state']
        print(state)
        end_states = ['success', 'failed', 'partial']
        if state in end_states:
            break
        time.sleep(10)


if __name__ == '__main__':
    with open(ITEM_ID_PATH) as f:
        item_ids = f.read().splitlines()
    with open(GOOGLE_CREDENTIALS) as f:
        credentials = f.read()
    request = create_request(user, password, name, subscription_id,
                             item_ids, item_type, product_bundle, single_archive,
                             archive_filename, bucket, credentials,
                             path_prefix, email)
    order_url = place_order(request, auth)
    poll_for_success(order_url, auth)
1 change: 1 addition & 0 deletions poisson/panoptis/make_cred_str.sh
@@ -0,0 +1 @@
cat google_creds.json | base64 | tr -d '\n' > google_creds_str.txt