
Commit effe525

Expand readme (#5)
Adds an extended readme.
2 parents ec34af1 + 54e6b4b commit effe525

Some content is hidden


45 files changed (+332058, -9 lines)

README.md

Lines changed: 67 additions & 8 deletions
# Flowbot
A Cookiecutter template for Flowbot repos, used to instantiate and maintain reporting data pipelines.

## Quick start: instantiating a new active crisis
Make sure you have a `.geojson` of your affected areas before you begin; it's helpful to have its path in your clipboard.
To instantiate:
1. Clone this repo
   ```bash
   git clone https://github.com/Flowminder/flowbot-pipeline-root.git
   cd flowbot-pipeline-root
   ```
1. Install deps with `pipenv install`
1. Run Cookiecutter with `bash run.sh . -o ..`
1. Choose 'active crisis' for the first option.
1. Follow the prompts onscreen, pasting the path to the geojson when asked. Some prompts are not needed for the active crisis pipeline.
1. Copy the result to the relevant server's `dags` or `flowbot` folder; you should then see the new pipeline in Airflow's interface.
1. Copy the replay file from `~/.cookiecutter_replays/{your_project_slug}` to this folder; this will support merging changes from the original root repo into your rendered repo.
1. Optional but strongly advised: run `git init` to create a git repo for tracking changes to the new pipeline.

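Before starting, it can be worth checking that the `.geojson` really contains polygons in longitude/latitude, since the active crisis pipeline currently expects a polygon in EPSG:4326. The following is a minimal, stdlib-only sketch; `check_affected_areas` and `load_and_check` are hypothetical helpers, not part of this repo, and a coordinate-range check only catches obviously projected data rather than proving the CRS.

```python
import json
from typing import Any, Iterator


def _rings(geometry: dict[str, Any]) -> Iterator[list]:
    """Yield every linear ring of a Polygon or MultiPolygon geometry."""
    kind = geometry.get("type")
    if kind == "Polygon":
        yield from geometry["coordinates"]
    elif kind == "MultiPolygon":
        for polygon in geometry["coordinates"]:
            yield from polygon
    else:
        raise ValueError(f"expected Polygon or MultiPolygon, got {kind!r}")


def check_affected_areas(geojson: dict[str, Any]) -> int:
    """Hypothetical pre-flight check; returns how many features were checked.

    Raises ValueError for non-polygon geometries, unclosed rings, or
    coordinates outside longitude/latitude bounds.
    """
    if geojson.get("type") == "FeatureCollection":
        features = geojson["features"]
    elif geojson.get("type") == "Feature":
        features = [geojson]
    else:
        raise ValueError("expected a Feature or FeatureCollection")
    for feature in features:
        for ring in _rings(feature["geometry"]):
            if ring[0] != ring[-1]:
                raise ValueError("polygon ring is not closed")
            # GeoJSON positions are [longitude, latitude, optional altitude]
            for lon, lat, *_ in ring:
                if not (-180 <= lon <= 180 and -90 <= lat <= 90):
                    raise ValueError(f"({lon}, {lat}) is not a valid lon/lat pair")
    return len(features)


def load_and_check(path: str) -> int:
    """Read a .geojson file from disk and run the check on it."""
    with open(path, encoding="utf-8") as f:
        return check_affected_areas(json.load(f))
```

For example, `load_and_check("areas.geojson")` (a placeholder path) returns the feature count, or raises on the first problem it finds.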
## Available pipelines
A pipeline is a combination of a country and a report configuration, used to render a new repository from the contents of `{{cookiecutter.__project_slug}}`. The rendering process populates the relevant `*_config.py` file and removes any irrelevant notebooks and DAGs.

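The pruning half of the rendering step can be pictured as a pure function: given every path in the template and the patterns a pipeline keeps, return what should be deleted. This is a hedged sketch; the file names, the wildcard patterns and the `paths_to_remove` helper are illustrative assumptions, and the actual rules live in `flowbot_dataclasses` and the `post_gen_project` hook.

```python
from fnmatch import fnmatchcase


def paths_to_remove(all_paths: list[str], keep_patterns: list[str]) -> list[str]:
    """Return the template paths that match none of the keep patterns.

    Patterns are shell-style wildcards; note that fnmatch's `*` also matches '/'.
    """
    return sorted(
        path
        for path in all_paths
        if not any(fnmatchcase(path, pattern) for pattern in keep_patterns)
    )


# Hypothetical template contents and keep rules for an active crisis render
template_paths = [
    "active_crisis_config.py",
    "mobility_insights_config.py",
    "dags/active_crisis_dag.py",
    "dags/mobility_insights_dag.py",
    "notebooks/crisis_report.ipynb",
    "notebooks/monthly_report.ipynb",
]
keep = ["active_crisis_config.py", "dags/active_crisis*", "notebooks/crisis_*"]
print(paths_to_remove(template_paths, keep))
# ['dags/mobility_insights_dag.py', 'mobility_insights_config.py', 'notebooks/monthly_report.ipynb']
```

Keeping the selection logic separate from the file deletion makes it easy to check what a render would drop before anything touches disk.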
### Active crisis
A pipeline for monitoring an upcoming or ongoing crisis. To define it, you need:
- The geographic area of the crisis; this is currently a polygon in EPSG:4326
- A key start date from which to begin monitoring
- An optional end date

This pipeline produces a daily active crisis report showing total and newly displaced subscribers from affected areas to host locations. For an example report, see <https://www.flowminder.org/resources/publications-reports/haiti-gang-violence-in-downtown-port-au-prince-mobility-situation-report-29-february-12-march-2024>
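The three inputs above can be captured as a small validated record. A minimal sketch, assuming a hypothetical `ActiveCrisisDefinition` dataclass (the template's own prompts and validation may differ); the area is kept as a GeoJSON geometry dict and only its type is checked.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any, Optional


@dataclass(frozen=True)
class ActiveCrisisDefinition:
    """Hypothetical container for the inputs an active crisis pipeline needs."""

    area: dict[str, Any]  # GeoJSON geometry in lon/lat (EPSG:4326)
    start_date: date  # key date from which monitoring begins
    end_date: Optional[date] = None  # None means monitoring is open-ended

    def __post_init__(self) -> None:
        if self.area.get("type") not in {"Polygon", "MultiPolygon"}:
            raise ValueError("area must be a GeoJSON Polygon or MultiPolygon")
        if self.end_date is not None and self.end_date < self.start_date:
            raise ValueError("end_date must not be before start_date")

    def is_monitoring(self, day: date) -> bool:
        """True if a daily report would be due for `day`."""
        if day < self.start_date:
            return False
        return self.end_date is None or day <= self.end_date
```

Making the end date optional, rather than a far-future sentinel, keeps "still ongoing" explicit.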
### Mobility insights
A pipeline for producing monthly reports on national mobility trends.
To define this pipeline, you need:
- A reference date range: a period used to build up a baseline for the subsequent reports to refer to
- A start date

For more information, see <https://haiti.mobility-dashboard.org/files/HTI%20platform%20release%20documentation%20Nov%202024.pdf>
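With the reference range fixed as the baseline, each monthly report then covers one period from the start date onwards. A stdlib sketch of enumerating those report months follows; treating each report as a complete calendar month, and the helper names `next_month` and `monthly_periods`, are assumptions for illustration rather than how the pipeline actually schedules itself.

```python
from datetime import date, timedelta


def next_month(day: date) -> date:
    """First day of the calendar month after `day`."""
    # Day 28 plus 4 days always lands in the following month.
    return (day.replace(day=28) + timedelta(days=4)).replace(day=1)


def monthly_periods(start: date, until: date) -> list[tuple[date, date]]:
    """(first_day, last_day) of each complete month from `start`'s month
    up to, but not including, the month that contains `until`."""
    periods = []
    current = start.replace(day=1)
    last_open_month = until.replace(day=1)
    while next_month(current) <= last_open_month:
        following = next_month(current)
        periods.append((current, following - timedelta(days=1)))
        current = following
    return periods
```

For example, `monthly_periods(date(2024, 1, 15), date(2024, 4, 2))` yields January, February (ending on the 29th) and March 2024; April is excluded because it is still in progress.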
## Other contents
Other key parts of the repo:
### flowbot_dataclasses
A set of modules containing the specific configurations for each country and report type. They are part of the Pipenv environment used when rendering the pipelines.
- `CountryStaticData` defines a dataclass for country- and server-specific configuration; for example, a query that fetches a country's administrative geometry from `flowdb` would live here. If you are implementing existing pipelines on a new in-country server, start here.
- `ProjectStructure` specifies which parts of `{{cookiecutter.__project_slug}}` are kept for a given pipeline, and which `*_config.py` is rendered. If you are implementing a new pipeline, start here.
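A rough picture of what these two dataclasses hold. The field names (`country_code`, `admin_geometry_query`, `config_file`, `keep`), the `keeps` method and the example values are hypothetical, chosen only to illustrate the split between country data and pipeline structure; they are not the definitions in `flowbot_dataclasses`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CountryStaticData:
    """Hypothetical per-country, per-server settings."""

    country_code: str  # e.g. an ISO 3166-1 alpha-3 code such as "HTI"
    admin_geometry_query: str  # SQL run against flowdb for admin boundaries


@dataclass(frozen=True)
class ProjectStructure:
    """Hypothetical description of what one pipeline keeps from the template."""

    config_file: str  # the *_config.py rendered for this pipeline
    keep: tuple[str, ...] = ()  # other template paths kept in the render

    def keeps(self, path: str) -> bool:
        return path == self.config_file or path in self.keep


ACTIVE_CRISIS = ProjectStructure(
    config_file="active_crisis_config.py",
    keep=("dags/active_crisis_dag.py", "notebooks/crisis_report.ipynb"),
)
```

Frozen dataclasses suit this kind of static configuration: instances can be shared safely and accidental mutation raises an error.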
### hooks
The Cookiecutter hooks used to populate the repo:
- `pre_gen_project` applies some simple validation rules
- `post_gen_project` is the machinery that implements the logic defined in `flowbot_dataclasses`
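Cookiecutter renders `hooks/pre_gen_project.py` with Jinja, runs it, and stops generation if the script exits with a non-zero status. A minimal sketch of that validation pattern follows; the slug rule, the date arguments and the names `validate` and `run_hook` are hypothetical examples, not the checks this template actually performs.

```python
import re
import sys
from datetime import date

# Example rule only: lowercase letters, digits and underscores, starting with a letter
SLUG_PATTERN = re.compile(r"[a-z][a-z0-9_]*")


def _parse(label: str, value: str) -> date:
    try:
        return date.fromisoformat(value)
    except ValueError:
        raise ValueError(f"{label} {value!r} is not in YYYY-MM-DD format") from None


def validate(project_slug: str, start_date: str, end_date: str = "") -> list[str]:
    """Return human-readable problems; an empty list means the inputs look valid."""
    errors = []
    if not SLUG_PATTERN.fullmatch(project_slug):
        errors.append(f"{project_slug!r} is not a valid project slug")
    try:
        start = _parse("start date", start_date)
        if end_date and _parse("end date", end_date) < start:
            errors.append("end date is before start date")
    except ValueError as exc:
        errors.append(str(exc))
    return errors


def run_hook(**values: str) -> None:
    """Print every problem and exit non-zero, which makes Cookiecutter abort."""
    problems = validate(**values)
    for problem in problems:
        print(f"ERROR: {problem}", file=sys.stderr)
    if problems:
        sys.exit(1)


# In a real hooks/pre_gen_project.py the arguments would be Jinja placeholders,
# e.g. run_hook(project_slug="{{ cookiecutter.__project_slug }}", start_date=...)
```

Collecting all problems before exiting means a user sees every mistake in one run instead of fixing them one at a time.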
### local_extensions
Defines a set of custom Jinja filters for accessing `flowbot_dataclasses`.
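A Jinja filter is just a Python callable registered in the environment's `filters` mapping. In the Cookiecutter local-extensions pattern, a `jinja2.ext.Extension` subclass does that registration in its `__init__`. The sketch below keeps the registration in a plain function so it runs without Jinja installed; the `COUNTRY_NAMES` table (standing in for `flowbot_dataclasses`) and the filter name `country_name` are hypothetical.

```python
from typing import Any

# Hypothetical stand-in for data that would come from flowbot_dataclasses
COUNTRY_NAMES = {"HTI": "Haiti", "GHA": "Ghana", "COD": "Democratic Republic of the Congo"}


def country_name(code: str) -> str:
    """Filter: map an ISO 3166-1 alpha-3 code to a display name."""
    try:
        return COUNTRY_NAMES[code.upper()]
    except KeyError:
        raise ValueError(f"unknown country code {code!r}") from None


def register_filters(environment: Any) -> None:
    """Attach the filters to a Jinja environment (anything with a `filters` dict)."""
    environment.filters["country_name"] = country_name
```

Inside a template this would then be used as `{{ cookiecutter.country | country_name }}`, where `cookiecutter.country` is again a hypothetical variable. Raising on unknown codes makes a typo fail the render instead of silently producing an empty name.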
## Populated repo contents
### `*_config.py`
This is the only Python file populated by the Cookiecutter template (the only other populated file is the README). It is imported into the DAG; if you need to make changes to an existing pipeline, start looking here.

### dags
Contains the DAGs associated with this pipeline. Although all pipelines currently consist of a single DAG, this may change in the future, so do not rely on it. If you are extending or modifying a DAG, it is recommended that you add any configuration variables (dates, static file names, region lists, etc.) to the pipeline's `*_config.py`.

### notebooks
Contains the parameterised notebooks and associated helper libraries that do the bulk of the pipeline processing via `flowpyter-task`. These should be as generic as possible and take any date ranges, static data sources or similar from the `*_config.py` file via a `params` cell. For further information, see <https://github.com/Flowminder/flowpyter-task>
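A sketch of that split between config and notebooks: constants live in the rendered config module, and a small helper turns them into the values a notebook's `params` cell would receive. Every name and value below (`START_DATE`, the file paths, `notebook_params`, the seven-day window) is hypothetical, and how `flowpyter-task` actually injects the parameters is not shown.

```python
from datetime import date, timedelta
from typing import Any

# --- Hypothetical excerpt of a rendered *_config.py ---
START_DATE = date(2024, 1, 1)
AFFECTED_AREAS_FILE = "static/data/affected_areas.geojson"
POP_ESTIMATES_FILE = "static/data/ghana_pop_estimates.csv"


# --- Hypothetical DAG-side helper ---
def notebook_params(run_date: date, lookback_days: int = 7) -> dict[str, Any]:
    """Build the values a parameterised notebook's params cell would receive.

    Dates are passed as ISO strings so the values stay JSON-friendly.
    """
    # Never look back further than the configured start date.
    window_start = max(START_DATE, run_date - timedelta(days=lookback_days))
    return {
        "start_date": START_DATE.isoformat(),
        "window_start": window_start.isoformat(),
        "run_date": run_date.isoformat(),
        "affected_areas_file": AFFECTED_AREAS_FILE,
        "pop_estimates_file": POP_ESTIMATES_FILE,
    }
```

Because the notebook only sees these values, the same notebook can serve any country or date range without edits.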
### static
Contains artefacts that do not change between DAG runs but may be changed by users. There are currently two use cases: static data files containing population estimates, and the Jinja templates used for reports.

### manual
Holds information that must be added manually for each DAG run. The main use is the key observations, but it also provides references to previous reports for the back matter.

### data and executed_notebooks
These folders are populated by `flowpyter-task` tasks; they exist in the repo but should be left empty.

## Development
There is a simple approval test in the `.circleci` directory. If you change the structure of the repo, run

```
pipenv run cookiecutter . --replay-file .circleci/test-replay.json -o .circleci/approved -f
```
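If you want to see what would change before re-approving, a directory comparison is enough. Below is a stdlib-only sketch using `filecmp.dircmp`, assuming you have first rendered a fresh copy into a scratch directory; the function name `render_diff` and the scratch location are hypothetical, and the CircleCI job itself may check differently.

```python
import filecmp
import os


def render_diff(approved: str, fresh: str) -> list[str]:
    """List differences between an approved render and a fresh one.

    Entries are prefixed '-' (only in approved), '+' (only in fresh) or
    '~' (in both, contents differ). dircmp skips filecmp.DEFAULT_IGNORES
    such as .git and __pycache__.
    """
    changes: list[str] = []

    def walk(cmp: filecmp.dircmp, prefix: str) -> None:
        changes.extend(f"- {os.path.join(prefix, n)}" for n in cmp.left_only)
        changes.extend(f"+ {os.path.join(prefix, n)}" for n in cmp.right_only)
        changes.extend(f"~ {os.path.join(prefix, n)}" for n in cmp.diff_files)
        for name, sub in cmp.subdirs.items():
            walk(sub, os.path.join(prefix, name))

    walk(filecmp.dircmp(approved, fresh), "")
    return sorted(changes)
```

For example, after rendering with `-o /tmp/flowbot-render` (a placeholder path) instead of `-o .circleci/approved`, `render_diff(".circleci/approved", "/tmp/flowbot-render")` lists what an approval would change.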
Lines changed: 5 additions & 1 deletion
*
!.gitignore
!templates
!templates/**
!data
!data/**

{{ cookiecutter.__project_slug }}/static/data/drc_pop_estimates.csv

Whitespace-only changes.

{{ cookiecutter.__project_slug }}/static/data/ghana_pop_estimates.csv

Whitespace-only changes.
