ProblemExplorer: Interactive Visual Analysis of Reddit Problem Posts

Overview

ProblemExplorer is a tool for analyzing and visualizing problem posts from Reddit. It represents posts with word embeddings, reduces their dimensionality with UMAP, groups the posts into clusters, and annotates each cluster with a word cloud. This makes recurring issues easy to identify.

Features

Data Retrieval from Reddit: Fetch problem posts from Reddit using asynchronous web scraping with rotating proxies.
Word Embeddings: Use pre-trained word embeddings (e.g., GloVe) to represent posts.
Clustering: Group similar problem posts to identify common themes.
Dimensionality Reduction: Apply UMAP to reduce dimensions for visualization.
Word Cloud Annotation: Generate word clouds to visually represent frequent terms within clusters.
Customizable Visualization: Adjust the visualization and the list of top posts with filters.
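
To illustrate the embedding and word cloud features above, here is a minimal sketch in plain Python. It assumes a post is represented by the average of its word vectors and that word cloud weights are simple term counts; this README does not confirm either choice. `TOY_VECTORS`, `tokenize`, `embed_post`, and `top_terms` are hypothetical names, and the tiny 3-dimensional vectors stand in for real GloVe embeddings.

```python
from collections import Counter
import re

# Toy 3-dimensional vectors standing in for real GloVe embeddings (assumption).
TOY_VECTORS = {
    "battery": [0.9, 0.1, 0.0],
    "drains": [0.8, 0.2, 0.1],
    "screen": [0.1, 0.9, 0.0],
    "flickers": [0.0, 0.8, 0.2],
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def embed_post(text, vectors):
    """Average the vectors of all known words; return None if no word is known."""
    known = [vectors[t] for t in tokenize(text) if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def top_terms(posts, stopwords=frozenset(), n=3):
    """Return the n most frequent non-stopword terms across a cluster's posts."""
    counts = Counter(
        t for post in posts for t in tokenize(post) if t not in stopwords
    )
    return counts.most_common(n)


if __name__ == "__main__":
    print(embed_post("Battery drains", TOY_VECTORS))
    print(top_terms(["battery drains fast", "battery dead"], n=2))
```

Vectors produced this way could then be passed to UMAP and a clustering step, and the term counts could serve as word cloud weights.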

Installation

To run the project locally, follow these steps:

Clone the repository:

git clone https://github.com/louis.zk/ProblemExplorer.git
cd ProblemExplorer

Install the dependencies:
```
pip install -r requirements.txt
```
Download GloVe 6B from https://nlp.stanford.edu/data/glove.6B.zip and save the 50-dimensional vectors as glove.6B/glove.6B.50d.txt
Set OpenAI API credentials (only needed if you want summaries of the problems and/or ideas to solve them): adjust api_key.json
Start the application:
```
py Problemexplorer.py
```
After a while, the application is available locally at http://127.0.0.1:8050/ in your web browser.
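
To check that the GloVe file from the installation steps is in place and readable, a small loader like the following may help. It is a sketch that relies only on the GloVe text format (one word per line, followed by space-separated floats); `parse_glove_line` and `load_glove` are hypothetical helpers and not necessarily how Problemexplorer.py reads the file.

```python
from pathlib import Path

# Path taken from the installation steps above.
GLOVE_PATH = Path("glove.6B") / "glove.6B.50d.txt"


def parse_glove_line(line):
    """Split one GloVe text line into (word, list of floats)."""
    word, *values = line.rstrip().split(" ")
    return word, [float(v) for v in values]


def load_glove(lines, limit=None):
    """Build a word -> vector dict from an iterable of GloVe text lines."""
    vectors = {}
    for i, line in enumerate(lines):
        if limit is not None and i >= limit:
            break
        word, vector = parse_glove_line(line)
        vectors[word] = vector
    return vectors


if __name__ == "__main__":
    if GLOVE_PATH.exists():
        with open(GLOVE_PATH, encoding="utf-8") as fh:
            # Read only the first 1000 entries as a quick sanity check.
            sample = load_glove(fh, limit=1000)
        dims = {len(v) for v in sample.values()}
        print(f"Loaded {len(sample)} vectors with dimensions {dims}")
    else:
        print(f"{GLOVE_PATH} not found; download and unzip glove.6B.zip first")
```

If the file is in the expected location, every vector in the sample should have 50 entries.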

Usage

Run crawling_reddit_async.py to fetch more Reddit posts from categories of your choice.
Choose these categories in the sunburst chart.
Choose the subreddits you want to analyse further.
Explore the problems.
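
The crawling step relies on asynchronous requests with rotating proxies. The sketch below shows that general pattern with only the standard library; it is not the code in crawling_reddit_async.py. The proxy addresses, `fake_fetch`, `fetch_with_rotation`, and `crawl` are all hypothetical, and `fake_fetch` simulates a request instead of touching the network.

```python
import asyncio
import itertools


async def fake_fetch(url, proxy):
    """Stand-in for a real HTTP request (hypothetical); one proxy always fails."""
    await asyncio.sleep(0)
    if proxy == "http://proxy-b:8080":
        raise ConnectionError(f"{proxy} is unreachable")
    return f"{url} via {proxy}"


async def fetch_with_rotation(url, proxy_pool, fetch, max_attempts=3):
    """Retry the URL through successive proxies until one attempt succeeds."""
    last_error = None
    for _ in range(max_attempts):
        proxy = next(proxy_pool)  # rotate to the next proxy in the cycle
        try:
            return await fetch(url, proxy)
        except ConnectionError as exc:
            last_error = exc
    raise RuntimeError(f"all attempts failed for {url}") from last_error


async def crawl(urls, proxies, fetch=fake_fetch, concurrency=2):
    """Fetch all URLs concurrently, sharing one rotating proxy pool."""
    proxy_pool = itertools.cycle(proxies)
    semaphore = asyncio.Semaphore(concurrency)  # cap requests in flight

    async def worker(url):
        async with semaphore:
            return await fetch_with_rotation(url, proxy_pool, fetch)

    # gather returns results in the same order as the input URLs.
    return await asyncio.gather(*(worker(u) for u in urls))


if __name__ == "__main__":
    pages = asyncio.run(
        crawl(["u1", "u2", "u3"], ["http://proxy-a:8080", "http://proxy-b:8080"])
    )
    for page in pages:
        print(page)
```

In a real crawler, `fake_fetch` would be replaced by a call to an asynchronous HTTP client, and the semaphore limit would be tuned to what the proxies and the target site tolerate.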

