Topic Tracker is a backend system that automatically tracks, summarizes, and delivers recent updates from YouTube and Reddit for any given topic. It helps users avoid manually scouring these platforms by providing compressed insights, summaries, and alerts.

kunal534/Topic_Tracker

Topic Tracker

A backend system that periodically fetches and summarizes topic-related content from Reddit and YouTube.


✅ Features Implemented

🔍 Topic Data Collection

  • Reddit:

    • Fetches top post titles related to a given topic.
  • YouTube:

    • Uses RapidAPI (youtube138, youtube-v2) to fetch video metadata, subtitles, and top comments.
    • Stores video_id, title, channel, subtitles, and top_comments.

🧠 Summarization

  • Reddit:

    • Summarized using OpenAI GPT-4 (via OpenRouter).
    • Long input is split into token-bounded chunks manually using tiktoken.
  • YouTube:

    • Uses youtube-video-summarizer-gpt-ai via RapidAPI to save OpenAI credits.
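A minimal sketch of the manual chunking step for Reddit text, assuming a tokenizer object that exposes encode and decode (as tiktoken's Encoding objects do). The names chunk_by_tokens and WhitespaceTokenizer are hypothetical and not taken from the repository; the whitespace tokenizer is only a stand-in so the example runs without extra dependencies.

```python
from typing import Dict, Iterator, List, Protocol


class Tokenizer(Protocol):
    """Anything with tiktoken-style encode/decode methods."""

    def encode(self, text: str) -> List[int]: ...
    def decode(self, tokens: List[int]) -> str: ...


def chunk_by_tokens(text: str, tokenizer: Tokenizer, chunk_size: int) -> Iterator[str]:
    """Yield pieces of `text` that each contain at most `chunk_size` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")
    tokens = tokenizer.encode(text)
    for start in range(0, len(tokens), chunk_size):
        yield tokenizer.decode(tokens[start:start + chunk_size])


class WhitespaceTokenizer:
    """Illustrative stand-in only: one token per whitespace-separated word."""

    def __init__(self) -> None:
        self._vocab: List[str] = []
        self._ids: Dict[str, int] = {}

    def encode(self, text: str) -> List[int]:
        ids = []
        for word in text.split():
            if word not in self._ids:
                self._ids[word] = len(self._vocab)
                self._vocab.append(word)
            ids.append(self._ids[word])
        return ids

    def decode(self, tokens: List[int]) -> str:
        return " ".join(self._vocab[t] for t in tokens)
```

With tiktoken installed, an encoding such as tiktoken.get_encoding("cl100k_base") could be passed in place of WhitespaceTokenizer(); which encoding the project actually uses is not stated here.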

🗃️ MongoDB Storage

  • Collections: reddit_posts, youtube_videos
  • Fields: topic, source, video_id, subtitles, top_comments, summary, created_at, summarized_at
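As a rough illustration of the fields listed above, a youtube_videos document could be assembled like this. The helper name build_video_document, the "youtube" value for source, and the use of None for a missing summary are assumptions made for the sketch, not details confirmed by the repository.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional


def build_video_document(
    topic: str,
    video_id: str,
    subtitles: str,
    top_comments: List[str],
    summary: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a dict with the fields listed for the youtube_videos collection."""
    now = datetime.now(timezone.utc)
    return {
        "topic": topic,
        "source": "youtube",  # assumed value; the README only names the field
        "video_id": video_id,
        "subtitles": subtitles,
        "top_comments": top_comments,
        "summary": summary,
        "created_at": now,
        # Only set a summarization timestamp when a summary is present.
        "summarized_at": now if summary is not None else None,
    }
```

A dict like this can be passed to a pymongo collection's insert_one method; the summarization task would then fill in summary and summarized_at later.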

⚙️ Celery Tasks

  • fetch_topic_data(topic):

    • Fetches and stores raw Reddit and YouTube content for a given topic.
  • summarize_topic_data(topic):

    • Reddit content summarized using OpenAI.
    • YouTube content summarized using RapidAPI.

🔄 Switching Summarization Source

  • OpenAI is used only for Reddit, to keep token costs down.
  • YouTube summarization was switched to youtube-video-summarizer-gpt-ai (RapidAPI).
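One way to express this per-source split is a small dispatch table. This is a sketch under stated assumptions: the function names and the placeholder bodies are hypothetical, and the real calls to OpenRouter and RapidAPI are deliberately left out.

```python
from typing import Callable, Dict


def summarize_reddit(text: str) -> str:
    # Placeholder: the real implementation would call GPT-4 via OpenRouter.
    return f"[openrouter] {text[:40]}"


def summarize_youtube(text: str) -> str:
    # Placeholder: the real implementation would call the RapidAPI summarizer.
    return f"[rapidapi] {text[:40]}"


# Maps the stored `source` value to the summarizer used for it.
SUMMARIZERS: Dict[str, Callable[[str], str]] = {
    "reddit": summarize_reddit,
    "youtube": summarize_youtube,
}


def summarize(source: str, text: str) -> str:
    """Route `text` to the summarizer configured for `source`."""
    try:
        summarizer = SUMMARIZERS[source]
    except KeyError:
        raise ValueError(f"no summarizer configured for source {source!r}") from None
    return summarizer(text)
```

Keeping the mapping in one place means switching a source to a different provider is a one-line change.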

🐞 Latest Issue

May 24, 2025

Task summarize_topic_data_chunks raised:
TypeError: unsupported operand type(s) for +: 'int' and 'str'
Cause: chunk_size was passed as a string instead of an integer.
Fix: Pass chunk_size as an integer, or cast it with int() before it is used in arithmetic.
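A hedged sketch of such a guard follows. The helpers normalize_chunk_size and chunk_bounds are hypothetical; the repository's actual chunking code is not shown here, so this only illustrates validating the value once at the task boundary.

```python
from typing import List, Tuple


def normalize_chunk_size(value) -> int:
    """Return `value` as a positive int, accepting ints or integer strings."""
    if isinstance(value, str):
        try:
            size = int(value.strip())
        except ValueError:
            raise TypeError(f"chunk_size must be an integer, got {value!r}") from None
    elif isinstance(value, int) and not isinstance(value, bool):
        size = value
    else:
        raise TypeError(f"chunk_size must be an integer, got {value!r}")
    if size <= 0:
        raise ValueError("chunk_size must be positive")
    return size


def chunk_bounds(total: int, chunk_size) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs covering range(total) in chunks."""
    size = normalize_chunk_size(chunk_size)
    bounds = []
    start = 0
    while start < total:
        # `start + size` is the kind of expression that raises
        # "unsupported operand type(s) for +: 'int' and 'str'" when size is a str.
        end = min(start + size, total)
        bounds.append((start, end))
        start = end
    return bounds
```

Task arguments that arrive from JSON payloads, query strings, or environment variables are a common source of numeric values turning into strings, so normalizing once at the entry point avoids repeating casts further down.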

🧪 Running the App

1. Start MongoDB and Redis

2. Run Celery Worker

celery -A app.tasks worker --loglevel=info

3. Trigger Tasks

from app.tasks import fetch_topic_data, summarize_topic_data
fetch_topic_data.delay("amazon SDE interview")
summarize_topic_data.delay("amazon SDE interview")

📌 Environment Variables Required

  • OPENROUTER_API_KEY
  • RAPIDAPI_KEY
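A small sketch of failing fast when either variable is missing, so that workers do not start with incomplete credentials. The helper name load_api_keys is hypothetical; only the two variable names come from the list above.

```python
import os
from typing import Dict, Mapping

REQUIRED_ENV_VARS = ("OPENROUTER_API_KEY", "RAPIDAPI_KEY")


def load_api_keys(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the required keys, or raise if any is unset or empty."""
    missing = [name for name in REQUIRED_ENV_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_ENV_VARS}
```

Calling load_api_keys() once at worker startup surfaces a missing key immediately rather than at the first API request.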

⏭️ Next Steps

  • Move YouTube summarization API call to summarize_raw_data
  • Add deduplication for fetched videos and posts
  • Expose summaries via REST API
  • Add user-specific topic registration and history
  • Add retry/fallback logic for failed summarizations

🧠 Tech Stack

  • Python 3.13
  • MongoDB
  • Redis
  • Celery
  • OpenAI API (via OpenRouter)
  • RapidAPI endpoints

Last updated: May 24, 2025
