High-performance ESPN Soccer Data Scraper
Fetch comprehensive soccer data from ESPN's API with intelligent caching, resume capability, and incremental exports.
Ludus scrapes soccer data from ESPN's public API and exports it to structured JSON and CSV files. Perfect for data analysis, machine learning projects, or building your own soccer database.
ESPN provides comprehensive soccer data including leagues, teams, matches, lineups, statistics, and play-by-play events.
| Feature | Description |
|---|---|
| Resumable | Interrupt anytime with Ctrl+C, resume where you left off |
| Incremental | Data saved immediately as fetched, not cached in memory |
| Smart Caching | HTTP responses cached to disk, skip redundant requests |
| Rate Limited | Built-in delays and exponential backoff to respect API limits |
| Flexible Depth | From metadata-only to complete play-by-play data |
| CSV Export | Automatic export to analysis-ready CSV files |
Requires Python 3.13+ and uv.
git clone https://github.com/vstorm-co/ludus.git
cd ludus
uv sync
# Interactive mode: select leagues and seasons from menu
uv run ludus
# Scrape Premier League 2024 season
uv run ludus --leagues eng.1 --seasons 2024
# Scrape multiple leagues with full match details
uv run ludus --variant FULL --leagues eng.1 esp.1 --seasons 2024 2023
# Check progress of interrupted scrape
uv run ludus --progress
Control how much data to fetch with --variant:
| Variant | Data Included | Requests/League | Use Case |
|---|---|---|---|
| MINIMAL | Leagues, seasons | ~10 | Quick metadata fetch |
| BASIC | + Teams, standings | ~50 | League tables |
| STANDARD | + All matches | ~400 | Match results (default) |
| FULL | + Rosters, statistics, odds | ~2,000 | Detailed match data |
| COMPLETE | + Play-by-play | ~10,000 | Every action in matches |
| COMPLETE_ATHLETES | + All athletes | ~15,000 | Full player database |
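
To get a feel for what a variant costs before starting a run, the request counts in the table can be turned into a rough estimate. The sketch below is not part of Ludus: `VARIANT_REQUESTS`, `estimate_requests`, and `estimate_seconds` are hypothetical names. It assumes the table's figures apply per league (whether they also scale with the number of seasons is not stated here) and reuses the pacing this README lists under rate limiting (0.5 s per request, plus a 10 s pause every 100 requests).

```python
# Approximate requests per league, copied from the variant table above.
VARIANT_REQUESTS: dict[str, int] = {
    "MINIMAL": 10,
    "BASIC": 50,
    "STANDARD": 400,
    "FULL": 2_000,
    "COMPLETE": 10_000,
    "COMPLETE_ATHLETES": 15_000,
}


def estimate_requests(variant: str, n_leagues: int) -> int:
    """Rough request count for a run covering n_leagues leagues."""
    return VARIANT_REQUESTS[variant] * n_leagues


def estimate_seconds(
    requests: int,
    delay: float = 0.5,    # per-request delay
    pause: float = 10.0,   # extra pause, assumed to be added on top of the delay
    every: int = 100,      # pause interval in requests
) -> float:
    """Time spent waiting on pacing alone; ignores network time and cache hits."""
    return requests * delay + (requests // every) * pause


requests = estimate_requests("FULL", n_leagues=2)
print(f"~{requests} requests, at least {estimate_seconds(requests) / 60:.0f} min of pacing")
```

Cached responses skip the network, so a resumed or repeated run will likely finish well under this figure.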
| League | Slug | Country |
|---|---|---|
| Premier League | eng.1 | England |
| La Liga | esp.1 | Spain |
| Bundesliga | ger.1 | Germany |
| Serie A | ita.1 | Italy |
| Ligue 1 | fra.1 | France |
| Eredivisie | ned.1 | Netherlands |
| MLS | usa.1 | USA |
| Champions League | uefa.champions | Europe |
| Europa League | uefa.europa | Europe |
| World Cup | fifa.world | International |
Use interactive mode (uv run ludus) to discover 50+ available leagues, including lower divisions and cup competitions.
data/output/
├── .progress.json          # Resume state
├── leagues/
│   └── {slug}.json
├── seasons/
│   └── {league}/{year}.json
├── teams/
│   └── {league}/{year}/{id}.json
├── standings/
│   └── {league}/{year}.json
├── events/
│   └── {league}/{year}/{id}.json
└── csv/
    ├── leagues.csv
    ├── teams.csv
    ├── standings.csv
    └── events.csv
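
Once a scrape has produced the `csv/` folder shown above, a quick sanity check is to list each file's columns and row count. This is a minimal sketch using only the standard library. The helper names `summarize_csv` and `summarize_export` are hypothetical, and no particular column names are assumed, since the CSV schemas are not documented here.

```python
import csv
from pathlib import Path


def summarize_csv(path: Path) -> tuple[list[str], int]:
    """Return (column names, number of data rows) for one exported CSV."""
    with path.open(newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh)
        n_rows = sum(1 for _ in reader)
        # fieldnames is None for an empty file, so fall back to an empty list.
        return list(reader.fieldnames or []), n_rows


def summarize_export(output_dir: Path = Path("data/output")) -> dict[str, tuple[list[str], int]]:
    """Summarize every CSV under <output_dir>/csv, keyed by file name."""
    csv_dir = output_dir / "csv"
    return {p.name: summarize_csv(p) for p in sorted(csv_dir.glob("*.csv"))}


if __name__ == "__main__":
    for name, (columns, n_rows) in summarize_export().items():
        print(f"{name}: {n_rows} rows, columns={columns}")
```

The same files load directly into pandas or similar tools; the standard-library version is shown only to avoid extra dependencies.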
Usage: ludus [OPTIONS]

Options:
  -i, --interactive    Interactive mode (default if no options)
  --variant VARIANT    Scraping depth level
  --leagues SLUGS      League slugs (e.g., eng.1 esp.1 ger.1)
  --seasons YEARS      Season years (e.g., 2024 2023)
  --output-dir DIR     Output directory [default: data/output]
  --cache-dir DIR      Cache directory [default: data/cache]
  --no-cache           Disable HTTP caching
  --progress           Show scraping progress
  --reset              Reset progress and start fresh
  --export-csv         Export existing data to CSV
  -v, --verbose        Enable verbose logging
# Start scraping (Ctrl+C to interrupt)
uv run ludus --leagues eng.1 --seasons 2024
# Resume from where you left off
uv run ludus --leagues eng.1 --seasons 2024
# Check what's been completed
uv run ludus --progress
# Start fresh
uv run ludus --reset
- HTTP responses cached with MD5-hashed URLs
- Hierarchical cache directory structure
- Disable with --no-cache for fresh data
- 0.5s delay between requests
- 10s pause every 100 requests
- Exponential backoff on server errors (5xx)
- Graceful handling of missing data (404)
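
The caching and rate-limiting behavior listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not Ludus's actual code: the function names, the two-character subdirectory layout, the `.json` suffix, and the backoff base and cap are all hypothetical. Only the MD5 hashing, the 0.5 s delay, and the 10 s pause every 100 requests come from the lists above.

```python
import hashlib
from pathlib import Path


def cache_path(url: str, cache_dir: Path = Path("data/cache")) -> Path:
    """Map a URL to a cache file named after the MD5 hash of the URL."""
    digest = hashlib.md5(url.encode("utf-8"), usedforsecurity=False).hexdigest()
    # Assumed hierarchy: the first two hex characters pick a subdirectory,
    # which keeps any single directory from growing too large.
    return cache_dir / digest[:2] / f"{digest}.json"


def pause_after(completed: int, delay: float = 0.5, long_pause: float = 10.0, every: int = 100) -> float:
    """Seconds to sleep after `completed` requests have been made so far."""
    # Assumption: the long pause is added on top of the regular delay.
    extra = long_pause if completed % every == 0 else 0.0
    return delay + extra


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential backoff for the retry after a 5xx: base * 2**attempt, capped."""
    return min(cap, base * 2**attempt)


print(cache_path("https://example.com/some/endpoint"))
print([backoff_delay(n) for n in range(4)])
```

Because the cache key depends only on the URL, a repeated request for the same URL maps to the same file, which is what lets an interrupted run skip requests it has already made.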
# Install dev dependencies
uv sync --dev
# Run tests
uv run pytest
# Run tests with coverage
uv run pytest --cov
# Lint and format
uv run ruff check src tests
uv run ruff format src tests
# Type check
uv run mypy src
CLI (cli.py)
    │
    ▼
SoccerScraper (scrapers/soccer.py)
    ├── ESPNScraper (scrapers/base.py)          ─── HTTP client + caching
    ├── JSONStore (storage/json_store.py)       ─── Incremental persistence
    ├── ProgressTracker (storage/tracker.py)    ─── Resume state
    └── Pydantic Models (models/)               ─── API schemas
    │
    ▼
CSVExporter (export/csv_export.py)
MIT © VStorm
