DEENUU1/ludus

Ludus

High-performance ESPN Soccer Data Scraper

CI · Python 3.13+ · License: MIT · Ruff · uv

Fetch comprehensive soccer data from ESPN's API with intelligent caching, resume capability, and incremental exports.


Installation · Quick Start · Features · Documentation


What is Ludus?

Ludus scrapes soccer data from ESPN's public API and exports it to structured JSON and CSV files. Perfect for data analysis, machine learning projects, or building your own soccer database.

ESPN Soccer

ESPN provides comprehensive soccer data including leagues, teams, matches, lineups, statistics, and play-by-play events.

Why Ludus?

| Feature | Description |
| --- | --- |
| Resumable | Interrupt anytime with Ctrl+C, resume where you left off |
| Incremental | Data saved immediately as fetched, not cached in memory |
| Smart Caching | HTTP responses cached to disk, skip redundant requests |
| Rate Limited | Built-in delays and exponential backoff to respect API limits |
| Flexible Depth | From metadata-only to complete play-by-play data |
| CSV Export | Automatic export to analysis-ready CSV files |

Installation

Requires Python 3.13+ and uv.

git clone https://github.com/vstorm-co/ludus.git
cd ludus
uv sync

Quick Start

# Interactive mode — select leagues and seasons from menu
uv run ludus

# Scrape Premier League 2024 season
uv run ludus --leagues eng.1 --seasons 2024

# Scrape multiple leagues with full match details
uv run ludus --variant FULL --leagues eng.1 esp.1 --seasons 2024 2023

# Check progress of interrupted scrape
uv run ludus --progress

Scraping Variants

Control how much data to fetch with --variant:

| Variant | Data Included | Requests/League | Use Case |
| --- | --- | --- | --- |
| MINIMAL | Leagues, seasons | ~10 | Quick metadata fetch |
| BASIC | + Teams, standings | ~50 | League tables |
| STANDARD | + All matches | ~400 | Match results (default) |
| FULL | + Rosters, statistics, odds | ~2,000 | Detailed match data |
| COMPLETE | + Play-by-play | ~10,000 | Every action in matches |
| COMPLETE_ATHLETES | + All athletes | ~15,000 | Full player database |
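
To put these request counts in wall-clock terms, the sketch below combines them with the pacing listed under Rate Limiting (0.5s between requests, a 10s pause every 100 requests). It is a lower bound only: network latency, retries, and cache hits are ignored, and it assumes the 10s pause comes on top of the regular delay. The function name is illustrative and not part of Ludus.

```python
def min_scrape_seconds(
    requests: int,
    delay: float = 0.5,
    pause: float = 10.0,
    pause_every: int = 100,
) -> float:
    """Lower bound on scrape time from pacing alone (no latency, no retries)."""
    pauses = requests // pause_every  # one long pause per full batch of requests
    return requests * delay + pauses * pause


# Approximate requests per league, copied from the variants table.
VARIANT_REQUESTS = {
    "MINIMAL": 10,
    "BASIC": 50,
    "STANDARD": 400,
    "FULL": 2_000,
    "COMPLETE": 10_000,
    "COMPLETE_ATHLETES": 15_000,
}

for name, count in VARIANT_REQUESTS.items():
    print(f"{name:<18} ~{min_scrape_seconds(count) / 60:.1f} min")
```

Under these assumptions STANDARD needs at least about 4 minutes per league, FULL about 20 minutes, and COMPLETE about 100 minutes.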

Supported Leagues

| League | Slug | Country |
| --- | --- | --- |
| Premier League | eng.1 | England |
| La Liga | esp.1 | Spain |
| Bundesliga | ger.1 | Germany |
| Serie A | ita.1 | Italy |
| Ligue 1 | fra.1 | France |
| Eredivisie | ned.1 | Netherlands |
| MLS | usa.1 | USA |
| Champions League | uefa.champions | Europe |
| Europa League | uefa.europa | Europe |
| World Cup | fifa.world | International |

Use interactive mode (uv run ludus) to discover the 50+ available leagues, including lower divisions and cup competitions.

Output Structure

data/output/
├── .progress.json          # Resume state
├── leagues/
│   └── {slug}.json
├── seasons/
│   └── {league}/{year}.json
├── teams/
│   └── {league}/{year}/{id}.json
├── standings/
│   └── {league}/{year}.json
├── events/
│   └── {league}/{year}/{id}.json
└── csv/
    ├── leagues.csv
    ├── teams.csv
    ├── standings.csv
    └── events.csv
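
Because the layout is predictable, the exported files can be read back with the standard library alone. The sketch below walks events/{league}/{year}/ and parses every file it finds. The helper name is hypothetical, and it assumes only that each file holds a single JSON object; the fields inside are not documented here, so the code does not touch them.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_events(output_dir: Path, league: str, year: int) -> Iterator[dict]:
    """Yield each parsed event file under events/{league}/{year}/."""
    event_dir = output_dir / "events" / league / str(year)
    # sorted() gives a stable order; a missing directory simply yields nothing.
    for path in sorted(event_dir.glob("*.json")):
        with path.open(encoding="utf-8") as fh:
            yield json.load(fh)
```

For example, `list(iter_events(Path("data/output"), "eng.1", 2024))` would load every Premier League 2024 event that has been scraped so far.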

CLI Reference

Usage: ludus [OPTIONS]

Options:
  -i, --interactive     Interactive mode (default if no options)
  --variant VARIANT     Scraping depth level
  --leagues SLUGS       League slugs (e.g., eng.1 esp.1 ger.1)
  --seasons YEARS       Season years (e.g., 2024 2023)
  --output-dir DIR      Output directory [default: data/output]
  --cache-dir DIR       Cache directory [default: data/cache]
  --no-cache            Disable HTTP caching
  --progress            Show scraping progress
  --reset               Reset progress and start fresh
  --export-csv          Export existing data to CSV
  -v, --verbose         Enable verbose logging
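
When Ludus is driven from another Python script, the options above can be assembled into an argument list. This is a small sketch that uses only flags from the reference; the helper name and its parameters are made up for illustration.

```python
from typing import Optional, Sequence


def ludus_command(
    leagues: Sequence[str],
    seasons: Sequence[int],
    variant: Optional[str] = None,
    no_cache: bool = False,
) -> list[str]:
    """Build an argv list for `uv run ludus` from the documented options."""
    cmd = ["uv", "run", "ludus"]
    if variant:
        cmd += ["--variant", variant]
    cmd += ["--leagues", *leagues]
    cmd += ["--seasons", *(str(year) for year in seasons)]
    if no_cache:
        cmd.append("--no-cache")
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root.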

Features

Resumable Scraping

# Start scraping (Ctrl+C to interrupt)
uv run ludus --leagues eng.1 --seasons 2024

# Resume from where you left off
uv run ludus --leagues eng.1 --seasons 2024

# Check what's been completed
uv run ludus --progress

# Start fresh
uv run ludus --reset
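
The general idea behind resuming can be sketched as a set of completed task keys that is rewritten to disk after every step. The schema of .progress.json is not documented here, so the class below is hypothetical: the key format, the JSON layout, and the class name are all assumptions rather than a description of ProgressTracker.

```python
import json
import os
from pathlib import Path


class ProgressSketch:
    """Hypothetical resume tracker: completed task keys persisted as a JSON list."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.done: set[str] = set()
        if path.exists():
            # Reload keys recorded by an earlier, interrupted run.
            self.done = set(json.loads(path.read_text(encoding="utf-8")))

    def is_done(self, key: str) -> bool:
        return key in self.done

    def mark_done(self, key: str) -> None:
        self.done.add(key)
        # Write to a temp file and rename it, so an interrupt mid-write
        # does not corrupt the existing progress file.
        tmp = self.path.with_suffix(".tmp")
        tmp.write_text(json.dumps(sorted(self.done)), encoding="utf-8")
        os.replace(tmp, self.path)
```

A scraper built this way checks `is_done(key)` before each task and calls `mark_done(key)` after saving its output, which is why rerunning the same command picks up where the last run stopped.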

Smart Caching

  • HTTP responses cached with MD5-hashed URLs
  • Hierarchical cache directory structure
  • Disable with --no-cache for fresh data
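
One way to picture MD5-hashed URLs in a hierarchical directory: hash the URL, then shard files by the first characters of the digest. This is a sketch only. The two-character fan-out, the .json suffix, and the function name are assumptions, not the layout Ludus actually uses.

```python
import hashlib
from pathlib import Path


def cache_path(cache_dir: Path, url: str) -> Path:
    """Map a URL to a cache file path (assumed layout: <dir>/<2 hex chars>/<md5>.json)."""
    # MD5 is used here only as a filename key, not for security.
    digest = hashlib.md5(url.encode("utf-8"), usedforsecurity=False).hexdigest()
    # Sharding by the first two hex characters keeps each directory small.
    return cache_dir / digest[:2] / f"{digest}.json"
```

The same URL always maps to the same path, so a lookup is a single file-existence check before any request is sent.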

Rate Limiting

  • 0.5s delay between requests
  • 10s pause every 100 requests
  • Exponential backoff on server errors (5xx)
  • Graceful handling of missing data (404)
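
The pacing rules above can be sketched as a small helper that is called after every request, plus a function that produces retry delays for 5xx responses. The 0.5s, 10s, and 100-request values come from the list above; the backoff base and factor, the choice to add the long pause on top of the regular delay, and all names are assumptions.

```python
import time
from typing import Callable


class Pacer:
    """Sleep after each request, with a longer pause after every Nth request."""

    def __init__(
        self,
        delay: float = 0.5,
        pause: float = 10.0,
        pause_every: int = 100,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.delay = delay
        self.pause = pause
        self.pause_every = pause_every
        self.sleep = sleep  # injectable so tests do not really sleep
        self.count = 0

    def wait(self) -> None:
        self.count += 1
        self.sleep(self.delay)
        if self.count % self.pause_every == 0:
            # Assumed: the long pause is added to the regular delay.
            self.sleep(self.pause)


def backoff_delays(retries: int, base: float = 1.0, factor: float = 2.0) -> list[float]:
    """Delays for successive retries: base, base*factor, base*factor**2, ..."""
    return [base * factor**attempt for attempt in range(retries)]
```

A 404 would bypass the retry path entirely: the caller records the resource as missing and moves on, which matches the graceful handling described above.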

Development

# Install dev dependencies
uv sync --dev

# Run tests
uv run pytest

# Run tests with coverage
uv run pytest --cov

# Lint and format
uv run ruff check src tests
uv run ruff format src tests

# Type check
uv run mypy src

Architecture

CLI (cli.py)
    │
    ▼
SoccerScraper (scrapers/soccer.py)
    ├── ESPNScraper (scrapers/base.py) ─── HTTP client + caching
    ├── JSONStore (storage/json_store.py) ─── Incremental persistence
    ├── ProgressTracker (storage/tracker.py) ─── Resume state
    └── Pydantic Models (models/) ─── API schemas
    │
    ▼
CSVExporter (export/csv_export.py)

License

MIT © VStorm
