
This analytics challenge is to mine a dataset containing the text of nearly one million news articles, social media posts and comments with keywords related to climate risk and sustainability. The theme is Sustainable and Green Finance in Hong Kong.

janecww/nlp-analytics

Sustainability-related NLP Analytics Project

Summary

Climate change has been a worrying trend for years, with an ever-worsening impact on ecosystems, sea levels and the frequency of natural disasters. Our social lives and lifestyles have been forced to change to tackle this great challenge. We began our research with big data analytics, combining statistics, data mining, natural language processing and machine learning approaches. We aim to generate insights about ESG risks and opportunities and to investigate the attitudes of Hong Kong people towards sustainability. Dealing with climate change is a long battle. Advancing the use of big data analytics can help push forward sustainable investment in businesses, so that we can deal with climate change more successfully and make ethical green financing decisions when facing climate risks in the future.

To begin, we divided the project into two levels, a city/climate level and an investment level, with four tasks in total. At the city/climate level, we set out to complete Task 1 (Manage Physical Climate Risk) and Task 2 (Facilitate Green Finance). At the investment level, we set out to complete Task 3 (Stock Cluster Analysis) and Task 4 (Stock Anomaly Analysis).

In conclusion, we hope to see the government and corporations make good use of big data analytics tools to turn climate risks into business opportunities. We believe that big data analytics can advance sustainable and green finance and help investors make responsible investment decisions in the face of one of the greatest global challenges: climate change.

Task Descriptions (Code)

  • Task 1: Manage Physical Climate Risk - we used the GSF DATA SOURCE REPOSITORY dataset in Task 1. There are four steps, namely counting, selecting, creating and correlating. We aim to conduct a correlation analysis between the frequency of sustainability-related keywords and physical climate risk metrics, and to use data-mining visualizations to draw insights about ESG risks and opportunities.

  • Task 2: Facilitate Green Finance - we used the SOCIAL BIG DATA - TEXT DATA RELATED TO CLIMATE RISK AND SUSTAINABILITY IN HONG KONG dataset in Task 2. We aim to determine Hong Kong people's attitude towards sustainability (positive, negative or neutral) using natural language processing and unsupervised machine learning classification.

  • Task 3: Stock Cluster Analysis - we used the SOCIAL BIG DATA - TEXT DATA RELATED TO CLIMATE RISK AND SUSTAINABILITY IN HONG KONG dataset and an external Yahoo Finance dataset in Task 3. We performed data pre-processing, hyperparameter tuning and model fitting. We aim to group stocks in the S&P 500 Index and the Hang Seng Index according to their ESG and financial performance by building an unsupervised k-means clustering model.

  • Task 4: Stock Anomaly Analysis - we used an external Hang Seng Index dataset and considered both text and market performance over time in Task 4. We performed text pre-processing, time series pre-processing, anomaly detection using an ARIMA model, and analysis of the anomalies and their associated events using topic modelling (Latent Dirichlet Allocation). We aim to identify the events associated with index anomalies to help investors devise smart investment strategies and choose good market timing.
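
Task 1's counting and correlation steps could look roughly like the minimal sketch below. It makes several assumptions that are not from the project:

- The posts have already been grouped by month and aligned with one physical climate risk metric.
- Pearson correlation is the correlation measure.
- The keyword set, the sample posts and the `rainfall_mm` figures are hypothetical placeholders, not the project's data.

```python
import math
import re

# Hypothetical keyword list; the project's actual sustainability keywords are not given here.
KEYWORDS = {"flood", "typhoon", "heatwave", "sustainability"}


def count_keywords(texts):
    """Counting step: total keyword occurrences across a list of posts."""
    total = 0
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        total += sum(1 for token in tokens if token in KEYWORDS)
    return total


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (norm_x * norm_y)


# Hypothetical posts grouped by month, aligned with a hypothetical monthly rainfall metric (mm).
posts_by_month = [
    ["Typhoon warning issued", "Sunny day"],
    ["Flood in the New Territories", "Typhoon and flood risk rising"],
    ["Heatwave hits Hong Kong", "Sustainability report released", "Flood drill"],
]
rainfall_mm = [120.0, 480.0, 300.0]

counts = [count_keywords(month) for month in posts_by_month]
r = pearson(counts, rainfall_mm)
print(counts, round(r, 3))  # [1, 3, 3] 0.866
```

With real data, each position in `counts` and `rainfall_mm` would be one period. A scatter plot of the two series would then support the visualization part of the task.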
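
For Task 2, a very small stand-in for sentiment classification is a lexicon-based scorer. Like an unsupervised method, it needs no labelled training data. This is a swapped-in simplification, not the project's machine learning classifier. The word lists `POSITIVE`, `NEGATIVE` and `NEGATORS` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical mini-lexicons; a real analysis would use a full sentiment lexicon.
POSITIVE = {"support", "good", "progress", "clean", "opportunity", "welcome"}
NEGATIVE = {"bad", "risk", "pollution", "waste", "costly", "fail"}
NEGATORS = {"not", "no", "never"}


def polarity(text):
    """Sum of +1/-1 word scores, flipping a word's sign after a negator."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, token in enumerate(tokens):
        value = 1 if token in POSITIVE else -1 if token in NEGATIVE else 0
        if value and i > 0 and tokens[i - 1] in NEGATORS:
            value = -value  # e.g. "not good" counts as negative
        score += value
    return score


def label(text):
    """Map a polarity score to positive / negative / neutral."""
    score = polarity(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def attitude_summary(texts):
    """Count how many texts fall into each attitude class."""
    return Counter(label(text) for text in texts)


posts = [
    "Green bonds are a good opportunity",
    "Plastic waste is not good",
    "The meeting is on Monday",
]
print(attitude_summary(posts))
```

Aggregating the labels over the whole corpus, for example as a percentage per class, gives the kind of overall attitude the task aims to report.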
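
Task 3's tuning and fitting steps can be illustrated with a plain k-means sketch (Lloyd's algorithm). It does three things:

- standardizes the features;
- compares inertia across candidate k values (the elbow method);
- keeps the best of several random restarts.

The two features (an ESG score and a 1-year return) and the tickers `AAA` to `FFF` are hypothetical. The project's actual features and tuning criterion are not specified here.

```python
import math
import random


def standardize(rows):
    """Z-score each feature column so ESG scores and returns share one scale."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]


def kmeans_once(points, k, rng, max_iter=100):
    """One run of Lloyd's algorithm from k randomly chosen starting points."""
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old centroid if a cluster ends up empty.
            new.append([sum(c) / len(members) for c in zip(*members)]
                       if members else centroids[j])
        if new == centroids:
            break
        centroids = new
    labels = assign(points, centroids)
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, inertia


def kmeans(points, k, n_init=10, seed=0):
    """Best (lowest-inertia) of n_init random restarts."""
    rng = random.Random(seed)
    runs = [kmeans_once(points, k, rng) for _ in range(n_init)]
    return min(runs, key=lambda run: run[1])


# Hypothetical (ESG score, 1-year return %) pairs; real inputs would come from the project's data.
stocks = {
    "AAA": (80, 12.0), "BBB": (82, 10.0), "CCC": (78, 11.0),
    "DDD": (30, -5.0), "EEE": (28, -3.0), "FFF": (35, -6.0),
}
points = standardize(list(stocks.values()))

# Hyperparameter tuning via the elbow method: inertia for each candidate k.
inertias = {k: kmeans(points, k)[1] for k in range(1, 5)}
labels, _ = kmeans(points, k=2)
groups = dict(zip(stocks, labels))
print(inertias)
print(groups)
```

Inertia typically falls as k grows. The elbow method therefore looks for the k after which the drop flattens out. With this sample data, most of the drop happens between k = 1 and k = 2.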
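
Task 4's anomaly detection step can be sketched with the simplest ARIMA model that has both differencing and an autoregressive term, ARIMA(1,1,0). Here it is fitted by ordinary least squares rather than a statistics library's maximum-likelihood routine. A day is flagged when its one-step-ahead residual is far from typical, measured with the median absolute deviation. The order (1,1,0), the threshold `z = 3.0` and the sample closing prices are all assumptions for illustration.

```python
import statistics


def difference(series):
    """First difference: the 'I' (d = 1) part of ARIMA(1,1,0)."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar1(x):
    """Least-squares estimate of c and phi in x[t] = c + phi * x[t-1] + e[t]."""
    prev, curr = x[:-1], x[1:]
    mean_prev = sum(prev) / len(prev)
    mean_curr = sum(curr) / len(curr)
    sxx = sum((a - mean_prev) ** 2 for a in prev)
    sxy = sum((a - mean_prev) * (b - mean_curr) for a, b in zip(prev, curr))
    phi = sxy / sxx if sxx else 0.0
    c = mean_curr - phi * mean_prev
    return c, phi


def arima110_anomalies(series, z=3.0):
    """Indices of `series` whose move is poorly explained by an ARIMA(1,1,0) fit."""
    d = difference(series)
    c, phi = fit_ar1(d)
    # One-step-ahead residuals; residuals[i] belongs to d[i + 1], the move into series[i + 2].
    residuals = [d[t] - (c + phi * d[t - 1]) for t in range(1, len(d))]
    center = statistics.median(residuals)
    # Scaled median absolute deviation: a spread estimate the anomaly itself barely inflates.
    scale = 1.4826 * statistics.median([abs(r - center) for r in residuals])
    return [i + 2 for i, r in enumerate(residuals) if abs(r - center) > z * scale]


# Hypothetical daily closes: steady gains with one sudden 300-point drop into index 11.
steps = [10, 12, 8, 11, 9] * 2 + [-300] + [10, 12, 8, 11, 9, 10, 12, 8, 11]
closes = [20000.0]
for step in steps:
    closes.append(closes[-1] + step)

flagged = arima110_anomalies(closes)
print(flagged)
```

The sample series contains a single 300-point drop, which lands at index 11 and is flagged. The following day may also be flagged, because the fitted AR term expects part of the move to reverse.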
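
The topic-modelling step of Task 4 could then be applied to the posts published around each flagged date. The minimal sketch below fits Latent Dirichlet Allocation with collapsed Gibbs sampling in plain Python. The stopword list, the sample headlines and the hyperparameters (`alpha`, `beta`, `iters`) are hypothetical, and the project may well have used a library implementation instead.

```python
import random
import re
from collections import defaultdict

# Hypothetical stopword list for this illustration.
STOPWORDS = {"a", "and", "as", "by", "in", "of", "on", "the"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling; returns doc-topic and topic-word counts."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [defaultdict(int) for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assignments = []
    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_d)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word


def top_words(topic_word, n=3):
    """Most frequent words per topic (ties broken alphabetically)."""
    return [sorted(counts, key=lambda w: (-counts[w], w))[:n] for counts in topic_word]


# Hypothetical posts collected on the dates flagged as index anomalies.
texts = [
    "Index falls as rate hike fears hit banks",
    "Rate hike fears drag banks and property stocks",
    "Typhoon signal disrupts trading as flood warnings issued",
    "Flood and typhoon damage weigh on insurers",
]
docs = [tokenize(t) for t in texts]
doc_topic, topic_word = lda_gibbs(docs, n_topics=2, seed=1)
for k, words in enumerate(top_words(topic_word)):
    print(f"topic {k}: {words}")
```

Reading each topic's top words alongside the flagged dates is one way to connect an anomaly to a candidate event.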
