Skip to content

njt/Forcible

Repository files navigation

Forcible News Aggregator

A personalized New Zealand news aggregator with intelligent content curation.

Overview

Forcible is a command-line tool that collects news from New Zealand sources, stores them in a local SQLite database, and provides intelligent analysis using LLMs to extract key facts, detect PR content, and personalize the news feed.

Features

  • Multi-source aggregation: Fetches news from Radio New Zealand RSS feeds
  • LLM-powered analysis: Uses OpenAI's structured outputs to analyze articles
    • Extracts key facts and statistics with importance scoring
    • Scores article relevance for NZ news interests
    • Detects potential PR-planted stories
    • Classifies content as "headline-only" or "clickthrough"
    • Generates concise summaries
  • Flexible configuration: Supports both INI and JSON config formats
  • SQLite storage: Local database for all articles and analysis
  • CLI interface: Easy-to-use command-line tools

Installation

  1. Install Python dependencies:
pip install -r requirements.txt
  1. Initialize the configuration:
python forcible.py init

This creates a config.ini file from the example template.

Alternatively, you can use JSON configuration:

python forcible.py --config config.json init
  1. Edit config.ini (or config.json) and add your OpenAI API key:

For INI format:

[openai]
api_key = your-actual-api-key-here

For JSON format:

{
  "openai": {
    "api_key": "your-actual-api-key-here"
  }
}

Usage

Fetch News Articles

Fetch articles from all configured sources:

python forcible.py fetch

Fetch only from Radio New Zealand:

python forcible.py fetch --source rnz

List Articles

List recent articles:

python forcible.py list

List articles from a specific source:

python forcible.py list --source rnz_national

Limit the number of articles shown:

python forcible.py list --limit 10

View Statistics

Show database statistics:

python forcible.py stats

Process Articles with LLM

Process unprocessed articles with LLM analysis:

python forcible.py process

Process a limited number of articles:

python forcible.py process --limit 10

Process a specific article by ID:

python forcible.py process --article-id 42

Show detailed analysis results:

python forcible.py process --verbose

View Article Analysis

View a specific article with its LLM analysis:

python forcible.py view 42

This displays the article headline, content, and structured LLM analysis including:

  • Key facts and statistics
  • Relevance score
  • PR probability assessment
  • Content classification
  • Summary

Architecture

Components

  • config.py: Configuration management (supports both INI and JSON formats)
  • database.py: SQLite database interface for storing articles
  • rnz_ingester.py: Radio New Zealand RSS feed ingester
  • llm_processor.py: LLM-based article analysis with structured outputs
  • forcible.py: Command-line interface

Database Schema

articles table:

  • id: Primary key
  • url: Unique article URL
  • source: Source identifier (e.g., 'rnz_national')
  • headline: Article headline
  • published_date: Publication date (ISO format)
  • fetched_date: Date fetched from source
  • content: Article content/summary
  • data: JSON field for LLM analysis results (facts, relevance, PR probability, etc.)
  • created_at: Record creation timestamp
  • updated_at: Last update timestamp

The data field contains structured LLM analysis:

{
  "key_facts": [
    {"fact": "...", "importance": 8}
  ],
  "relevance_score": 7,
  "pr_probability": 25,
  "content_classification": "headline-only",
  "summary": "...",
  "reasoning": "...",
  "processed_at": "2024-01-01T12:00:00"
}

source_tracking table:

  • source_name: Source identifier (primary key)
  • last_scraped: Last scrape timestamp
  • last_article_date: Most recent article date seen

Configuration

The configuration file (INI or JSON format) supports:

  • OpenAI API key: For LLM processing
  • LLM model: Model to use (default: gpt-4o-mini)
  • Prompts: Configurable prompts for different analysis tasks (legacy, not used with structured outputs)
  • Sources: RSS/Atom feed URLs for different news sources
  • Database: Database file path

You can use either config.ini (INI format) or config.json (JSON format). The JSON format provides better structure for complex configurations.

News Sources

Currently supported:

Radio New Zealand (RNZ)

  • National news
  • World news
  • Business news
  • Political news

All via RSS feeds.

Future Enhancements

  • Additional sources (Stuff, NZ Herald)
  • Web scraping for sources without RSS feeds
  • archive.is integration for paywall bypass
  • User preference management
  • Web interface for viewing personalized feed
  • Batch processing optimizations
  • Caching and rate limiting for LLM API calls

About

New Zealand news hackery

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •