1. Introduction

rebecamoreno edited this page Aug 18, 2019 · 1 revision

Challenge

Project Jetson started in mid-2017 when two operations, UNHCR Somalia and the UNHCR Ethiopia Melkadida sub-office (Dollo Ado), grew concerned about ongoing drought conditions in Somalia exacerbating forced displacement. Alongside a history of protracted conflict, Somalia was experiencing drought conditions similar to those of 2011, when the operational emergency response was outpaced by the number of people displaced.

In particular, the Somalia operation reached out to the UNHCR Innovation Service to see if there was a way to predict forced displacement. Originally, the challenge posed by the operation was to help it build scenarios with different displacement figures, regardless of the displacement reasons of UNHCR's persons of concern (PoCs). The Innovation Service explored a creative way to solve this operational challenge, away from conventional methods: the use of artificial intelligence (AI), particularly machine learning (ML), for predictive analytics.

Problems related to the challenge

Some of the problems related to the challenge were:

  • Access to historical humanitarian data, at least 7 years of it
  • Access to open data (machine-readable, time-series formatted)
  • Lack of humanitarian access makes statistically significant data on forced displacement difficult to obtain (read: Limitations)
  • Lack of research literature and/or academic work on predictive analytics applied to forced displacement with ML. Much of the ML research concerns migration, which is quite different from forced displacement in terms of assumptions and influential variables.

Rationale

The implementation of Project Jetson aimed to support the following goals, which would later become the main metrics of success for the project:

  • Operational Response: Help two (2) UNHCR Operations make evidence-based decisions and adequately plan/prepare for contingencies.
  • Innovation in Humanitarian Data Research: Set a precedent for this type of predictive analytics research work in the humanitarian sector, particularly in forced displacement and population flow.
  • Humanitarian Knowledge-sharing and Coordination: Highlight the relevance of open data, including the relevance of coordination in the compilation of data from different partners, as well as cross-knowledge sharing.

Research questions

We divide our research questions according to the teams that posed them. These questions guided the design of the experiments.

From UNHCR Somalia Operation (IDPs)

  1. Where are they moving? = categorical variable
  2. When are they arriving? = numerical variable
  3. How many people are moving? = numerical variable

From UNHCR Ethiopia - Dollo Ado, Melkadida sub-office (refugees)

Given 2017 conditions (drought/conflict) and with prior institutional memory of displacement from the year 2011 with similar conditions:

  1. Are we going to receive the same number of refugees we received in 2011? = numerical variable

UNHCR Innovation Service + Partners

  1. How far in the future can we reliably predict?
  2. To what extent does information from other regions help predict the focal region?
  3. How does having data on climate, conflict, etc. improve prediction, relative to having historical arrivals data?
  4. What is the right structure for incorporating historical lags?
  5. What are key drivers of displacement?
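Question 4 above, on the right structure for incorporating historical lags, can be made concrete with a small sketch: each month of an arrivals series becomes a training row whose features are the preceding months' values. The figures below are invented for illustration, not Jetson data.

```python
# Build lagged feature rows from a monthly arrivals series
# (illustrative numbers, not actual Jetson figures).
def make_lagged_rows(series, n_lags):
    """Return (features, target) pairs: features are the n_lags
    previous values, target is the current value."""
    rows = []
    for t in range(n_lags, len(series)):
        features = series[t - n_lags:t]   # e.g. arrivals at t-3, t-2, t-1
        target = series[t]                # arrivals at month t
        rows.append((features, target))
    return rows

arrivals = [120, 95, 140, 210, 180, 160]  # hypothetical monthly arrivals
rows = make_lagged_rows(arrivals, n_lags=3)
print(rows[0])  # ([120, 95, 140], 210)
```

Choosing `n_lags` (how many past months feed each prediction) is itself part of the research question: too few lags discard seasonal signal, too many shrink the usable training set.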

Assumptions

Challenge assumptions

  • “PoCs in Somalia are fleeing from conflict areas.” Ways to initially test it: a) Desktop research with literature review on historical conflict in Somalia and b) Data on conflict (ACLED) aggregated graph trend.
  • “PoCs' movement is also affected by external factors (e.g. drought/floods).” Ways to test it: a) Focus groups with PoCs and b) Interviews with different operational partners and UNHCR operations (both the Melkadida sub-office and Somalia).
  • “PoCs in Somalia are going to places where humanitarian assistance is being provided.” Ways to initially test it: a) Desktop research on data on humanitarian assistance and b) Semi-structured interviews (phone calls) with partners and UNHCR operations on their activities in certain regions of Somalia.

Ideation phase assumptions (solution building)

  • “A machine/computer program could help us predict how many people are going to be arriving/moving to a particular region” Ways to initially test it: a) Desktop research: academic papers on machine learning and migration (see additional resources section)

Open Data

The central component of a predictive analytics project is open data. Open data, as defined by the Open Data Handbook, is data that is both legally open and technically open.

In the case of Jetson, legally open data refers to the datasets that partners share either:

  • Publicly on their websites with a well-documented API (e.g. the ACLED API) or on a humanitarian data broker website (e.g. OCHA HDX)
  • Bilaterally shared, derived from the principle of international cooperation (e.g. operational data shared on a joint-service or research between UN System agencies)
  • Via formal/legal data sharing agreement (e.g. Memorandum of Understanding, MoU or similar data sharing agreements).

Technically open data, for the project, is data in a machine-readable format (ideally a tabular format such as CSV, JSON, or XLS) and/or data with a public API from which it can be extracted automatically at a certain periodicity, which in the case of Project Jetson is every month.
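As a minimal sketch of what "machine-readable, tabular, time-series formatted" means in practice, the snippet below parses a small made-up CSV of monthly arrivals (the columns and values are invented for illustration, not an actual partner dataset):

```python
import csv
import io

# A tiny, made-up example of technically open data: tabular,
# machine-readable, with a monthly date column.
raw_csv = """month,region,arrivals
2017-06,Bay,1200
2017-07,Bay,950
2017-08,Bay,1400
"""

reader = csv.DictReader(io.StringIO(raw_csv))
records = [(row["month"], row["region"], int(row["arrivals"]))
           for row in reader]
print(records[0])  # ('2017-06', 'Bay', 1200)
```

Data in this shape can be pulled on a monthly schedule and appended to a time series without manual cleaning, which is the property the project depends on.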

Partnerships Relevance

Having these two main components is difficult in areas where humanitarian access is blocked by violent conflict or other reasons, or where the time and resources to collect data are limited or completely non-existent. Ground truth data is therefore difficult to gather, as is timely and systematic data sharing. These are among the greatest data challenges the humanitarian sector faces, vis-à-vis other sectors, in having datasets ready for predictive analytics research work.

This is why it is important to rely on partnerships, particularly for datasets that are non-traditional for the humanitarian sector, such as those coming from the development sector (e.g. climate-related and market-related data). Our data providers and partners have been key to the development of this experiment, either by providing us with timely, open data or by publishing their data on their respective websites. They have also advised us on improving the methodology and techniques used to conduct our experiment.

Data Protection

Additionally, and given the nature of our organizational mandate, all datasets used in this experiment have been anonymized and aggregated in order to comply with the UNHCR Data Protection Policy. For this reason, the scripts and API calls refer only to the anonymized, aggregated copies of the original datasets.
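To illustrate what aggregation looks like in this context, the sketch below collapses hypothetical individual-level records into region/month counts, discarding all identifying fields. The record structure is invented for illustration and is not UNHCR's actual data pipeline.

```python
from collections import Counter

# Hypothetical individual-level records (invented for illustration):
# each carries identifying detail that must not leave the source system.
raw_records = [
    {"name": "A", "origin": "Bay",  "month": "2017-06"},
    {"name": "B", "origin": "Bay",  "month": "2017-06"},
    {"name": "C", "origin": "Gedo", "month": "2017-06"},
]

# Aggregate to (origin, month) counts, dropping every identifier,
# so only region-level totals are ever shared or scripted against.
aggregated = Counter((r["origin"], r["month"]) for r in raw_records)
print(dict(aggregated))  # {('Bay', '2017-06'): 2, ('Gedo', '2017-06'): 1}
```

Only a table like `aggregated` would appear in the project's scripts and API calls; the identifying records stay in the source repository.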

The original datasets - as provided by the partners with their respective data protection clauses - remain intact and are stored in UNHCR internal corporate repositories. They can be deleted at any time at the request of the respective data providers. More on Jetson's terms of use is here (link to terms of use).