Unveiling Climate Change Discourse: Unsupervised Sentiment Analysis of Global News Media

🎯 Overview

This project quantifies latent media bias in global climate coverage with an unsupervised NLP pipeline. Sentiment is scored without pre-labeled data by combining a lexicon-based model (VADER) with zero-shot transformer classifiers. Articles are clustered by semantic meaning using transformer-based topic modeling (BERTopic), which surfaces the themes running through the coverage. Bias is then calculated as the sentiment deviation from a dynamic regional-topical baseline, so that outlets are compared against their peers. Finally, statistical changepoint detection identifies significant shifts in reporting tone, which are correlated with major world events.
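To illustrate the bias calculation, the sketch below scores each article against the mean sentiment of its region and topic. It is a minimal sketch, not the project's code: the column names (`source`, `region`, `topic`, `sentiment`) and the function `add_bias_scores` are assumptions, and the baseline is a plain group mean rather than a dynamic, time-aware one.

```python
import pandas as pd


def add_bias_scores(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with `baseline` and `bias` columns.

    Assumed (hypothetical) input columns: region, topic, sentiment.
    """
    out = df.copy()
    # Baseline: mean sentiment of all articles sharing a region and topic.
    out["baseline"] = out.groupby(["region", "topic"])["sentiment"].transform("mean")
    # Bias: how far each article sits from its peer-group baseline.
    out["bias"] = out["sentiment"] - out["baseline"]
    return out


# Toy data, invented purely for illustration.
articles = pd.DataFrame(
    {
        "source": ["A", "A", "B", "B", "C", "D"],
        "region": ["Europe", "Europe", "Europe", "Europe", "Asia", "Asia"],
        "topic": [0, 0, 0, 0, 0, 0],
        "sentiment": [0.5, 0.25, -0.25, -0.5, -0.5, -0.25],
    }
)

scored = add_bias_scores(articles)
# One average deviation per outlet, relative to its regional-topical peers.
source_bias = scored.groupby("source")["bias"].mean()
print(source_bias)
```

Because each article is compared only with others from the same region and topic, a source in a generally pessimistic region is not penalized for that region's overall tone.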

✨ Key Highlights

Unsupervised NLP Pipeline: Reveals media bias in global climate reporting without manual labeling.
Hybrid Sentiment Analysis: Combines Zero-Shot Transformer inference with VADER sentiment scoring.
Thematic Discovery: Uses BERTopic to identify nuanced topics like "Renewable Energy" vs. "Natural Disasters."
Bias Normalization: Measures sentiment against regional-topical baseline scores for fair cross-regional comparisons.
Temporal Analysis: Identifies event-driven shifts in news tone using PELT changepoint detection.
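One way the hybrid sentiment score could be formed is sketched below. This README does not specify how the two signals are combined, so the weighted average, the `weight` parameter, the label names (`positive`, `negative`, `neutral`), and all function names are assumptions made for illustration. The inputs are expected to come from VADER's `polarity_scores(text)["compound"]` and from the result dictionary of a Hugging Face zero-shot classification pipeline, which holds parallel `labels` and `scores` lists.

```python
from typing import Any, Mapping


def probs_from_pipeline(result: Mapping[str, Any]) -> dict:
    """Turn a zero-shot pipeline result into a {label: probability} dict."""
    return dict(zip(result["labels"], result["scores"]))


def zero_shot_to_signed(probs: Mapping[str, float]) -> float:
    """Collapse label probabilities into a single score in [-1, 1]."""
    return probs.get("positive", 0.0) - probs.get("negative", 0.0)


def hybrid_sentiment(
    vader_compound: float, probs: Mapping[str, float], weight: float = 0.5
) -> float:
    """Weighted average of the VADER compound score and the zero-shot score.

    `weight` is the share given to VADER (an assumed, tunable parameter).
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return weight * vader_compound + (1.0 - weight) * zero_shot_to_signed(probs)


# Stand-in values shaped like real library outputs (invented for illustration).
zero_shot_result = {
    "labels": ["positive", "negative", "neutral"],
    "scores": [0.75, 0.125, 0.125],
}
score = hybrid_sentiment(0.5, probs_from_pipeline(zero_shot_result))
print(score)
```

Both inputs share the range [-1, 1], so the combined score stays on the same scale as VADER's compound score.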

🛠️ Technology Stack

Sentiment: VADER, Hugging Face Zero-Shot Classification (BART/BERT).
Topic Modeling: BERTopic (Transformer-based embeddings).
Analysis: ruptures (Statistical Changepoint Detection), pandas, scikit-learn.
Visualization: matplotlib, seaborn, plotly.

📂 Project Structure

The project is organized into a modular pipeline where main.py orchestrates the flow from raw data to final visualizations.

climate-news-analysis/
├── data/                       # Raw news articles (aljazeera.jsonl, bbc.jsonl, etc.)
├── src/                        # Source code modules
│   ├── ingest.py               # Loads all articles from data folder
│   ├── preprocess.py           # Cleans text, parses dates, and handles deduplication
│   ├── sentiment.py            # Applies VADER sentiment scoring
│   ├── topics.py               # Implements BERTopic modeling and info extraction
│   ├── aggregate.py            # Groups data by region, time, source, and bias
│   ├── visualize.py            # Generates all PNG plots and timelines
│   ├── reports.py              # Logic for generating text-based analysis reports
│   └── utils.py                # Helper functions for saving/loading data
├── outputs/                    # Processed datasets and visual reports
│   ├── reports/                # Final visual and text outputs
│   │   ├── regional_comparisons/ # Plots comparing climate narratives by region
│   │   ├── source_timelines/     # Sentiment trends for individual news outlets
│   │   ├── bias_report.txt       # Quantified media bias analysis
│   │   └── topic_info.csv        # Metadata for discovered themes
│   ├── final_data.parquet      # Merged dataset with all scores and topics
│   └── processed.parquet       # Intermediate cleaned dataset
├── main.py                     # Entry point to run the entire pipeline
├── requirements.txt            # Project dependencies
└── README.md                   # Documentation

🚀 Getting Started

Clone the repo:

git clone https://github.com/YOUR_USERNAME/climate-discourse-analysis.git

Install dependencies:

pip install -r requirements.txt

Run the pipeline:

The entire research workflow is automated. Run the following command to execute ingestion, sentiment analysis, topic modeling, and visualization in one go:

python main.py

👥 Authors

Aditya Vasudev K
Ananya Vinay
