- Development of a method to automatically extract funding amounts from funding programs.
- The problem can be framed as a Named Entity Recognition (NER) or information extraction task.
- The dataset consists of funding programs listed in the German federal government's "Förderdatenbank" (funding database). These have been scraped and published here.
- The background is an attempt to quantify how much the German government spends on promoting democracy. As a first step, a classifier has already been developed to identify democracy funding programs. The next step is the extraction of funding amounts. You can find an article on the project (in German!) here.
- The data originates from the website: www.foerderdatenbank.de
- A description of the scraped dataset, as well as the link to the data, can be found here.
- An example of how the data can be read using Python is available here.
- NER using the Python package spaCy.
- Fine-tuning language models like BERT.
- In-context learning with generative LLMs.
- The method should be evaluated using suitable metrics such as the F1 score or accuracy.
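As a rough illustration of the evaluation step, the sketch below computes precision, recall, and F1 over extracted funding amounts. It assumes, purely for illustration, that gold labels and predictions are available as `(program_id, amount)` pairs and that a prediction counts as correct only on an exact match; the function name `precision_recall_f1` and the sample values are hypothetical and not part of the project.

```python
def precision_recall_f1(gold, predicted):
    """Compute precision, recall and F1 over sets of (program_id, amount) pairs.

    Assumption: an extracted amount is correct only if the exact same
    (program_id, amount) pair appears in the gold annotations.
    """
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up example data, not taken from the Förderdatenbank.
gold = [("prog-1", 50_000), ("prog-1", 200_000), ("prog-2", 10_000)]
predicted = [("prog-1", 50_000), ("prog-2", 10_000), ("prog-2", 99)]

precision, recall, f1 = precision_recall_f1(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Exact matching is a strict choice; depending on how amounts are annotated, a more lenient criterion (for example, matching after normalizing currency formats) may be more appropriate.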
- Install uv
- Run `uv sync`
- [PDF list of organizations funded in 2023, from a Bundestag Drucksache (printed paper)](https://dserver.bundestag.de/btd/20/102/2010233.pdf), p. 80 (Anlage 1, i.e. Annex 1)
- [CSV of all NGOs dedicated to the goal of strengthening democracy in Germany](<https://github.com/CorrelAid/h4sg25_cdl_challenge/data/ZER-Förderung des demokratischen Staatswesens-20250404.csv>)
- Extract the list of NGOs and the amount of money each received from the government in 2023.
- Compare it to the list of NGOs dedicated to strengthening democracy.
- Calculate what percentage of the total amount went to organizations whose goal is strengthening democracy.
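The three steps above could be sketched roughly as follows. This is a minimal illustration under stated assumptions: the funded organizations have already been extracted from the PDF into `(name, amount)` pairs, and names are matched exactly after lowercasing and collapsing whitespace. The helpers `normalize` and `democracy_share` and all sample values are hypothetical; real data will likely need fuzzier name matching.

```python
def normalize(name: str) -> str:
    """Lowercase a name and collapse whitespace so trivial variants match."""
    return " ".join(name.casefold().split())


def democracy_share(funded, democracy_orgs):
    """Return the percentage of total funding that went to democracy NGOs.

    funded: iterable of (organization_name, amount) pairs from the PDF.
    democracy_orgs: iterable of organization names from the CSV.
    Assumption: names match exactly after normalization.
    """
    funded = list(funded)
    wanted = {normalize(name) for name in democracy_orgs}
    total = sum(amount for _, amount in funded)
    matched = sum(amount for name, amount in funded if normalize(name) in wanted)
    return 100.0 * matched / total if total else 0.0


# Made-up example data, not taken from the Drucksache or the CSV.
funded = [
    ("Verein A e.V.", 100_000.0),
    ("Stiftung B", 300_000.0),
    ("verein  c", 100_000.0),
]
democracy_orgs = ["verein a e.v.", "Verein C"]

share = democracy_share(funded, democracy_orgs)
print(f"{share:.1f}% of the total went to democracy-focused organizations")
```

Exact matching after normalization is the simplest option and tends to miss organizations whose names are abbreviated or spelled differently in the two sources, so the result should be read as a lower bound.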