The problem I worked on was to nowcast and forecast the CO2 emissions of various companies in order to build a sustainable, green investment portfolio. RAM Active Investments (https://ram-ai.com/fr/) provided a large dataset covering ~10,000 companies. Its main entries are the CO2 emissions normalized by sales at two times, t_0 and t_1, and the companies' 10-K reports, the annual textual reports that companies must file with the U.S. Securities and Exchange Commission (SEC).
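One simple way to turn the two sales-normalized readings into a learning target is the relative change in emission intensity between t_0 and t_1. The sketch below is illustrative only: the `CompanyRecord` layout, its field names, and the binary "decrease"/"increase" label are hypothetical choices, not the actual format of the RAM Active Investments data.

```python
from dataclasses import dataclass


@dataclass
class CompanyRecord:
    """Hypothetical record layout: one company observed at t_0 and t_1."""
    company_id: str
    intensity_t0: float  # CO2 emissions / sales at t_0
    intensity_t1: float  # CO2 emissions / sales at t_1
    report_text: str     # text of the company's 10-K report


def intensity_change(record: CompanyRecord) -> float:
    """Relative change in emission intensity from t_0 to t_1."""
    if record.intensity_t0 <= 0:
        raise ValueError("intensity at t_0 must be positive")
    return (record.intensity_t1 - record.intensity_t0) / record.intensity_t0


def direction_label(record: CompanyRecord) -> str:
    """Collapse the change into a binary target (an illustrative choice)."""
    return "decrease" if intensity_change(record) < 0 else "increase"


rec = CompanyRecord("ACME", intensity_t0=120.0, intensity_t1=90.0, report_text="...")
print(intensity_change(rec))  # -0.25
print(direction_label(rec))   # decrease
```

A continuous target like `intensity_change` suits a regression model, while the binary label suits a classifier; which framing fits better depends on how the portfolio uses the predictions.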
My task was to infer the change in CO2 emissions from these textual reports, building NLP (Natural Language Processing) algorithms and extracting insights with Machine Learning. The problem is very difficult because the reports rarely contain explicit information about CO2 emissions.
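To make the text-to-target idea concrete, here is a minimal sketch of one possible baseline: a multinomial Naive Bayes classifier that predicts from report text whether a company's emission intensity decreased or increased. This is a swapped-in, illustrative technique, not the method used in the project. It assumes each report has already been labeled "decrease" or "increase", and the training sentences are invented toy examples, not excerpts from real 10-K filings. Real reports are long and only indirectly related to emissions, so a serious pipeline would need much richer text representations.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase and keep alphabetic tokens only (a deliberately crude tokenizer).
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha
        self.class_counts = Counter()            # documents per label
        self.word_counts = defaultdict(Counter)  # token counts per label
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def log_scores(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        # Words never seen during training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * vocab_size
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokens:
                score += math.log((counts[tok] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Invented toy data, for illustration only.
train_texts = [
    "we installed solar panels and improved energy efficiency",
    "renewable energy sourcing reduced our fuel consumption",
    "we expanded coal fired production capacity",
    "new plants increased fuel consumption and production volume",
]
train_labels = ["decrease", "decrease", "increase", "increase"]

model = NaiveBayesTextClassifier().fit(train_texts, train_labels)
print(model.predict("energy efficiency program and solar"))  # decrease
print(model.predict("expanded coal production"))             # increase
```

Naive Bayes is used here only because it is short and transparent; it treats words independently and cannot capture the indirect, context-dependent signals that make this problem hard.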