1 | | -# Flowbot pipeline template |
| 1 | +# Flowbot |
| 2 | +A Cookiecutter template for Flowbot repos, used to instantiate and maintain reporting data pipelines.
2 | 3 |
3 | | -A Cookiecutter template for Flowbot repos. |
| 4 | +## Quick start: instantiating a new active crisis |
4 | 5 | Make sure you have a `.geojson` of your affected areas before you begin - it's helpful to have the path to it in your clipboard. |
5 | 6 | To instantiate: |
6 | | -- Clone this repo |
7 | | -- Install deps with `pipenv install` |
8 | | -- Run Cookiecutter with `pipenv run cookiecutter .` |
9 | | -- Answer the questions |
| 7 | +1. Clone this repo |
| 8 | + ```bash |
| 9 | + git clone https://github.com/Flowminder/flowbot-pipeline-root.git |
| 10 | + cd flowbot-pipeline-root |
| 11 | +1. Install deps with `pipenv install` |
| 12 | +1. Run Cookiecutter with `bash run.sh . -o ..` |
| 13 | +1. Choose 'active crisis' for the first option |
| 14 | +1. Follow the on-screen prompts, pasting the path to the geojson when asked. Some prompts are not needed for the active crisis pipeline.
| 15 | +1. Copy the result to the relevant server's `dags` or `flowbot` folder; you should see the new pipeline in Airflow's interface |
| 16 | +1. Copy the replay file from `~/.cookiecutter_replays/{your_project_slug}` to this folder; this will support merging changes from the original root repo into your rendered repo. |
| 17 | +1. Optional but strongly advised: run `git init` to create a git repo for tracking changes to the new pipeline |
10 | 18 |
11 | | -It's recommended that you copy the replay file from `~/.cookiecutter_replays/{your_project_slug}` to this folder; this will support merging changes from the original root repo into your rendered repo. |
| 19 | +## Available pipelines |
| 20 | +A pipeline is a combination of a country and a report configuration, used to render a new repository from the contents of `{{cookiecutter.__project_slug}}`. The rendering process populates the relevant `*_config.py` file and removes any irrelevant notebooks and dags.
12 | 21 |
13 | | -## Dev note; appovaltest |
| 22 | +### Active crisis |
| 23 | +A pipeline for monitoring an upcoming or ongoing crisis. To define it, you need:
| 24 | +- The geographic area of the crisis; this is currently a polygon in EPSG:4326 |
| 25 | +- A key start date to begin monitoring |
| 26 | +- An optional end date |
| 27 | + |
| 28 | +This pipeline produces a daily active crisis report showing total and newly displaced subscribers from affected areas to host locations. For an example report, see <https://www.flowminder.org/resources/publications-reports/haiti-gang-violence-in-downtown-port-au-prince-mobility-situation-report-29-february-12-march-2024> |
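As a rough pre-flight check before running the template, the sketch below inspects a GeoJSON object for the input described above: polygon geometry with longitude/latitude coordinates. The function names `iter_geometries` and `check_affected_area` are hypothetical and not part of this repo. A bounds check can only flag coordinates that cannot be EPSG:4326; it cannot prove the file uses that CRS.

```python
import json


def iter_geometries(obj: dict):
    """Yield geometry dicts from a FeatureCollection, Feature or bare geometry."""
    kind = obj.get("type")
    if kind == "FeatureCollection":
        for feature in obj.get("features", []):
            yield from iter_geometries(feature)
    elif kind == "Feature":
        yield obj.get("geometry") or {}
    else:
        yield obj


def check_affected_area(geojson: dict) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    geometries = list(iter_geometries(geojson))
    if not geometries:
        return ["no geometries found"]
    for i, geometry in enumerate(geometries):
        kind = geometry.get("type")
        if kind == "Polygon":
            polygons = [geometry["coordinates"]]
        elif kind == "MultiPolygon":
            polygons = geometry["coordinates"]
        else:
            problems.append(f"geometry {i}: {kind} is not a polygon")
            continue
        for rings in polygons:
            for ring in rings:
                for lon, lat, *_ in ring:
                    # Values outside these ranges cannot be lon/lat degrees.
                    if not (-180 <= lon <= 180 and -90 <= lat <= 90):
                        problems.append(
                            f"geometry {i}: ({lon}, {lat}) is outside lon/lat bounds"
                        )
    return problems


# Illustrative polygon only; in practice use json.load() on your own file.
example = json.loads("""
{
  "type": "Feature",
  "properties": {},
  "geometry": {
    "type": "Polygon",
    "coordinates": [[[-72.35, 18.53], [-72.33, 18.53], [-72.33, 18.55],
                     [-72.35, 18.55], [-72.35, 18.53]]]
  }
}
""")
print(check_affected_area(example))  # prints []
```

A non-empty result is a hint that the file may be in a projected CRS or contain non-polygon features, and is worth fixing before answering the template prompts.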
| 29 | + |
| 30 | +### Mobility insights |
| 31 | +A pipeline for producing monthly reports on national mobility trends. |
| 32 | +To define this pipeline, you need: |
| 33 | +- A reference date range: the period used to build the baseline that the later reports are compared against.
| 34 | +- A start date |
| 35 | + |
| 36 | + |
| 37 | +For more information, see <https://haiti.mobility-dashboard.org/files/HTI%20platform%20release%20documentation%20Nov%202024.pdf> |
| 38 | + |
| 39 | +## Other contents |
| 40 | +Other key parts of the repo:
| 41 | +### flowbot_dataclasses |
| 42 | +A set of modules that contain the specific configurations for a country and a report type. They are part of the pipenv used in rendering the pipelines.
| 43 | +- CountryStaticData defines a dataclass for country-server specific configurations; for example, a query to fetch the administrative geometry of a country from `flowdb` would live here. If you are implementing existing pipelines on a new in-country server, start here.
| 44 | +- ProjectStructure specifies which parts of `{{cookiecutter.__project_slug}}` are kept for a given pipeline, and which `*_config.py` is to be rendered. If you are implementing a new pipeline, start here.
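To make the idea of a per-pipeline structure concrete, here is a minimal sketch of how a dataclass could record what a pipeline keeps and how the leftover paths could be worked out. It is not the repo's actual `ProjectStructure`: the fields `config_file` and `keep`, the helper `paths_to_remove`, and every file name below are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProjectStructure:
    """Hypothetical stand-in: what one pipeline keeps from the template."""

    config_file: str  # the *_config.py to render (assumed field)
    keep: frozenset   # notebooks and dags this pipeline needs (assumed field)


def paths_to_remove(rendered_paths, structure):
    """List rendered paths that the given pipeline does not need."""
    wanted = structure.keep | {structure.config_file}
    return sorted(p for p in rendered_paths if p not in wanted)


# All file names here are invented for the example.
active_crisis = ProjectStructure(
    config_file="active_crisis_config.py",
    keep=frozenset({"dags/active_crisis.py", "notebooks/displacement.ipynb"}),
)
rendered = [
    "active_crisis_config.py",
    "mobility_config.py",
    "dags/active_crisis.py",
    "dags/mobility.py",
    "notebooks/displacement.ipynb",
]
print(paths_to_remove(rendered, active_crisis))
# prints ['dags/mobility.py', 'mobility_config.py']
```

Keeping the "what belongs to this pipeline" decision in plain data like this means adding a pipeline is mostly a matter of declaring a new instance, which matches the advice to start from `ProjectStructure` when implementing one.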
| 45 | +### hooks |
| 46 | +The `cookiecutter` hooks for populating the repo. |
| 47 | +- `pre_gen_project` contains some simple validation rules
| 48 | +- `post_gen_project` is the machinery that implements the logic defined in `flowbot_dataclasses`
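As an illustration of the kind of rule a validation hook might hold, the sketch below checks a start date and an optional end date. It is not the repo's hook: `validate_dates` and its messages are hypothetical. In a real Cookiecutter hook the values would arrive as Jinja-rendered strings (for example `"{{ cookiecutter.some_date }}"`, where the variable name is also an assumption), and exiting with a non-zero status stops project generation.

```python
from datetime import date


def validate_dates(start: str, end: str = "") -> list[str]:
    """Return validation errors for a start date and an optional end date."""
    try:
        start_date = date.fromisoformat(start)
    except ValueError:
        return [f"start date {start!r} is not a valid ISO date"]
    if not end:
        # The end date is optional, so an empty answer is acceptable.
        return []
    try:
        end_date = date.fromisoformat(end)
    except ValueError:
        return [f"end date {end!r} is not a valid ISO date"]
    if end_date < start_date:
        return ["end date is before start date"]
    return []


print(validate_dates("2024-02-29", "2024-03-12"))  # prints []
print(validate_dates("2024-02-29", "2024-02-01"))
# prints ['end date is before start date']
```

A hook built this way would print the returned errors and call `sys.exit(1)` when the list is non-empty, so a bad answer is caught before any files are rendered.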
| 49 | +### local_extensions |
| 50 | +Defines a set of custom Jinja filters for accessing `flowbot_dataclasses`.
| 51 | + |
| 52 | +## Populated repo contents |
| 53 | + |
| 54 | +### *_config.py |
| 55 | +This is the only Python file populated by the Cookiecutter template (the only other file populated is the README). This is imported into the DAG - if you need to make changes to an existing pipeline, this is where you should start looking. |
| 56 | + |
| 57 | +### dags |
| 58 | +Contains the DAGs associated with this pipeline. Although all pipelines currently consist of a single DAG, this may change in the future and is not an assumption you can rely on. If you are extending or modifying a DAG, it is recommended that you add any configuration variables (dates, static file names, region lists, etc.) to the pipeline's `config.py`.
| 59 | +
| 60 | +### notebooks |
| 61 | +Contains the parameterised notebooks and associated helper libraries that form the bulk of the pipeline processing via `flowpyter-task`. These should be as generic as possible and take any date ranges, static data sources or similar from the `config.py` file via a `params` cell. For further information, see <https://github.com/Flowminder/flowpyter-task> |
| 62 | +
| 63 | +### static |
| 64 | +Contains artefacts that do not change between dagruns, but may be changed by users. The two current use cases are static data files that contain population estimates and the Jinja templates used for reports.
| 65 | +
| 66 | +### manual |
| 67 | +Room for information that must be added manually per dagrun. The main case for this is the key observations, but it is also used to provide references to previous reports for the back matter. |
| 68 | +
| 69 | +### data and executed_notebooks |
| 70 | +These folders are populated by `flowpyter-task` tasks - they exist, but should be left empty. |
| 71 | +
| 72 | +## Development |
14 | 73 | There is a simple approvaltest in the .circleci dir. If you make a change to the structure of the repo, run |
15 | 74 | ``` |
16 | 75 | pipenv run cookiecutter . --replay-file .circleci/test-replay.json -o .circleci/approved -f |