This is a repository of the code produced during the project "Migration Forecast EU" by the Bertelsmann Foundation. The goal of the project was to use data from Google Trends to predict the number of registrations of EU nationals in Germany. The repository consists of two components:
- A Python backend for data ingestion and transformation, particularly for Google Trends data obtained via the official API
- A selection of Jupyter notebooks for both exploratory analysis and training/evaluation of various ML regression models
Requirements:
- A valid Python installation (3.9 or newer)
- Access to the official Google Trends API with a valid developer key (available to research institutions on request from Google)
Setup and usage:
- Clone the repository and open the main folder in a terminal (Windows users may use Git Bash or WSL)
- Create a new conda environment (https://conda.io/projects/conda/en/latest/user-guide/getting-started.html#managing-environments) or venv (https://docs.python.org/3/library/venv.html) and activate it
- Install the required packages: `pip install -r requirements.txt`
- Create a `.env` file with your Google API key: `echo "GOOGLE_DEVELOPER_KEY=insert_your_developer_key_here" > .env`
- Run `python get_data.py` to obtain the data from the Google Trends API (`python get_data.py -h` for more options)
- Run `python process_data.py` and `python process_registrations.py` to process and transform the raw data for both Google Trends and the official registration statistics. The results are stored in the folder `data/processed`. (Processed data as of mid-2021 are already included in the repo.)
- You can now play around with the Jupyter notebooks in the `notebooks` folder.
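The scripts presumably load the developer key from the `.env` file into the environment (e.g. via a dotenv-style loader). As an illustration of that pattern, here is a minimal, dependency-free sketch; `load_dotenv_minimal` is a hypothetical helper for this README, not part of the repo's actual code:

```python
import os


def load_dotenv_minimal(path: str = ".env") -> None:
    """Parse KEY=VALUE lines from a .env file into os.environ.

    Blank lines and '#' comments are skipped; existing environment
    variables are not overwritten.
    """
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip().strip('"'))


if __name__ == "__main__":
    if os.path.exists(".env"):
        load_dotenv_minimal()
        print(os.environ.get("GOOGLE_DEVELOPER_KEY"))
```

In practice a library such as python-dotenv does the same job with more edge-case handling.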
Folder structure:
- `/`: the main folder contains all the Python executables for loading and processing the data
- `/data`: all the data (raw and processed) as well as config files with metadata
- `/data/config`: config files describing the metadata, used for processing
- `/data/keywords`: Excel files containing the keywords for which the Google Trends API is queried
- `/data/processed`: contains the processed data after the processing scripts have been run
- `/data/raw`: raw data
- `/data/raw/eurostat`: macroeconomic indicators for EU countries, obtained from EUROSTAT (Excel)
- `/data/raw/registrations`: monthly registrations of EU nationals in Germany, obtained from DESTATIS (Excel)
- `/data/raw/trends`: raw data from the Google Trends API, generated by running `get_data.py`
- `/modules`: various Python modules containing utilities
- `/modules/eumf_custom_models.py`: custom ML models (right now only a linear dummy model)
- `/modules/eumf_data.py`: utility functions for loading and transforming data
- `/modules/eumf_eval.py`: utility functions for evaluating ML performance
- `/modules/eumf_google_trends.py`: higher-level functions for generating the Google Trends API queries from the keywords and storing the data
- `/notebooks`: Jupyter notebooks for analysis, ML training, and evaluation
- `/notebooks/analysis`: descriptive and diagnostic analysis
- `/notebooks/experiments`: experiments and analysis with various forecasting/regression algorithms (probably the most important folder)
- `/notebooks/presentations`: plots generated for workshop presentations
- `/notebooks/prototypes`: playground for trying out things
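Since the notebooks live one level below the repo root, they need `modules` to be importable. A common pattern is to prepend the repo root to `sys.path` in the first notebook cell; this is a sketch of that pattern, not necessarily the project's actual approach:

```python
import os
import sys

# Assumes the notebook runs inside notebooks/<subfolder> or notebooks/,
# so the repo root is one (or two) levels up from the working directory.
repo_root = os.path.abspath(os.path.join(os.getcwd(), ".."))
if repo_root not in sys.path:
    sys.path.insert(0, repo_root)

# Afterwards the utility modules listed above can be imported, e.g.:
# from modules import eumf_data, eumf_eval
```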
Limitations and possible extensions:
- The forecast algorithms are currently trained only for academic analysis; there is no code yet for deploying them in production (e.g. in a dashboard)
- It would be nice to use the DESTATIS API instead of Excel files (same for EUROSTAT)
- In the branch `dev_db`, there are alternative scripts to store the Google Trends data in a Postgres database instead of CSV files. This has been tested rudimentarily but not yet integrated into the main branch and the notebooks.
- Right now, access to the official Google Trends API is needed, which is only granted to certain institutions, especially in research. It would be nice if the unofficial, publicly accessible pytrends package (https://pypi.org/project/pytrends/) could also be used as a backend
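A pytrends-based backend could look roughly like the sketch below. `batch_keywords` and `fetch_trends` are hypothetical helpers, and the keywords, geo, and timeframe are illustrative; note that Google Trends (and thus pytrends) accepts at most five terms per comparison payload:

```python
from typing import Iterator, List


def batch_keywords(keywords: List[str], size: int = 5) -> Iterator[List[str]]:
    """Yield keyword batches; Google Trends allows at most 5 terms per query."""
    for i in range(0, len(keywords), size):
        yield keywords[i:i + size]


def fetch_trends(keywords: List[str], geo: str = "PL",
                 timeframe: str = "2015-01-01 2021-06-30"):
    """Fetch interest-over-time for each keyword batch (requires network)."""
    from pytrends.request import TrendReq  # pip install pytrends

    pytrends = TrendReq(hl="en-US", tz=0)
    frames = []
    for batch in batch_keywords(keywords):
        pytrends.build_payload(batch, timeframe=timeframe, geo=geo)
        frames.append(pytrends.interest_over_time())
    return frames


if __name__ == "__main__":
    # Illustrative keywords; the project's real lists live in data/keywords/*.xlsx
    print(list(batch_keywords(["arbeit", "wohnung", "jobs germany"], size=2)))
```

One caveat: pytrends returns relative search interest normalized per request, so batches queried separately are not directly comparable without a shared anchor term, which the official API handles differently.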
Happy forecasting!