Training a model to forecast EU migration to Germany using online search behavior based on Google Trends

bertelsmannstift/eu-migration-forecast

Migration Forecast EU

This repository contains the code produced during the project "Migration Forecast EU" by the Bertelsmann Foundation. The goal of the project was to use Google Trends data to predict the number of registrations of EU nationals in Germany. The repository has two main components:

  • A Python backend for data ingestion and transformation, particularly for Google Trends data obtained through the official API
  • A selection of Jupyter notebooks for both exploratory analysis and training/evaluation of various ML regression models

Prerequisites

  • A valid Python installation (3.9 or newer)
  • Access to the official Google Trends API with a valid developer key (available for research institutions via request at Google)

First steps

Loading and preparing the data

  • Run python get_data.py to obtain the data from the Google Trends API (run python get_data.py -h for more options).
  • Run python process_data.py and python process_registrations.py to process and transform the raw Google Trends data and the official registration statistics. The results are stored in the folder data/processed. (Processed data as of mid-2021 is already included in the repo.)
  • You can now explore the Jupyter notebooks in the notebooks folder.
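
The steps above can also be chained from one small script. The following is a minimal sketch, assuming all three scripts are run from the repository root with their default options; the helpers pipeline_commands and run_pipeline are hypothetical and not part of the repo.

```python
import subprocess
import sys

# Script names as listed in the steps above. get_data.py comes first
# because the processing steps work on the raw data it downloads.
SCRIPTS = ["get_data.py", "process_data.py", "process_registrations.py"]


def pipeline_commands(python=sys.executable, skip_download=False):
    """Return the commands to run, in order (hypothetical helper)."""
    scripts = SCRIPTS[1:] if skip_download else SCRIPTS
    return [[python, script] for script in scripts]


def run_pipeline(skip_download=False, dry_run=False):
    """Run each step in turn and stop at the first failure."""
    for cmd in pipeline_commands(skip_download=skip_download):
        print("running:", " ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if a script exits non-zero
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: only print the commands. skip_download=True leaves out
    # get_data.py, which assumes raw data is already in data/raw/trends.
    run_pipeline(skip_download=True, dry_run=True)
```

Setting skip_download=True is meant for cases where no Google Trends API key is available; set dry_run=False to actually execute the scripts.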

Some more info about the structure of the repo

  • /: the main folder contains all the Python scripts for loading and processing the data
  • /data: all the data (raw and processed) as well as config files with metadata
  • /data/config: config files describing the metadata, used for processing
  • /data/keywords: Excel files containing the keywords for which the Google Trends API is queried
  • /data/processed: contains the processed data after the processing scripts have been run
  • /data/raw: raw data
  • /data/raw/eurostat: macroeconomic indicators for EU countries, obtained from EUROSTAT (Excel)
  • /data/raw/registrations: monthly registrations of EU nationals in Germany, obtained from DESTATIS (Excel)
  • /data/raw/trends: raw data from the Google Trends API, generated by running get_data.py
  • /modules: various Python modules containing utilities
  • /modules/eumf_custom_models.py: custom ML models (right now only a linear dummy model)
  • /modules/eumf_data.py: utility functions for loading and transforming data
  • /modules/eumf_eval.py: utility functions for evaluating ML performance
  • /modules/eumf_google_trends.py: higher level functions for generating the Google Trends API queries from the keywords and storing the data
  • /notebooks: Jupyter notebooks for analysis, ML training and evaluation
  • /notebooks/analysis: descriptive and diagnostic analysis
  • /notebooks/experiments: experiments and analysis with various forecasting/regression algorithms (probably the most important folder)
  • /notebooks/presentations: plots generated for workshop presentations
  • /notebooks/prototypes: playground for trying out things
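
Before running the scripts or notebooks, it can help to confirm that a checkout matches the layout listed above. This is a minimal sketch using only the standard library; the helper missing_dirs and the choice of which folders to check are assumptions for illustration, not something the repo provides.

```python
from pathlib import Path

# Folders taken from the structure list above (paths relative to the repo root).
EXPECTED_DIRS = [
    "data/config",
    "data/keywords",
    "data/processed",
    "data/raw/eurostat",
    "data/raw/registrations",
    "data/raw/trends",
    "modules",
    "notebooks",
]


def missing_dirs(repo_root="."):
    """Return the expected folders that do not exist under repo_root."""
    root = Path(repo_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs()
    if missing:
        print("missing folders:", ", ".join(missing))
    else:
        print("repository layout looks complete")
```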

Loose ends

  • The forecast algorithms are currently trained only for academic analysis; there is no code yet for deploying them in production (e.g., in a dashboard)
  • It would be nice to use the DESTATIS API instead of Excel files (same for EUROSTAT)
  • In the branch dev_db, there are alternative scripts that store the Google Trends data in a Postgres database instead of CSV files. This has only been tested rudimentarily and is not yet integrated into the main branch or the notebooks.
  • Right now, access to the official Google Trends API is required, which is only granted to certain institutions, mainly in research. It would be nice if the unofficial, publicly accessible pytrends (https://pypi.org/project/pytrends/) could also be used as a backend
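
As a starting point for the pytrends idea, the sketch below shows roughly what querying it could look like. It is not part of this repo and has not been integrated with get_data.py; the helpers chunk_keywords and fetch_interest are hypothetical, and pytrends has to be installed separately. Keywords are sent in groups of five because Google Trends compares at most five terms per request.

```python
def chunk_keywords(keywords, size=5):
    """Split a keyword list into groups of at most `size` terms."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [keywords[i:i + size] for i in range(0, len(keywords), size)]


def fetch_interest(keywords, geo="", timeframe="today 5-y"):
    """Return one interest-over-time DataFrame per keyword group.

    Needs network access and the third-party pytrends package.
    """
    # Imported here so chunk_keywords works without pytrends installed.
    from pytrends.request import TrendReq

    client = TrendReq(hl="en-US", tz=0)
    frames = []
    for group in chunk_keywords(keywords):
        client.build_payload(group, timeframe=timeframe, geo=geo)
        frames.append(client.interest_over_time())
    return frames


if __name__ == "__main__":
    # Pure helper only; no request is sent here.
    print(chunk_keywords(["a", "b", "c", "d", "e", "f", "g"]))
```

Note that Google Trends scales values to 0-100 within each request, so results from different keyword groups are not directly comparable without further normalization.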

Happy forecasting!
