dcaribou/transfermarkt-datasets

transfermarkt-datasets

Clean, structured and automatically updated football (soccer) dataset built from Transfermarkt data -- 68,000+ games, 30,000+ players, 1,500,000+ appearances and more, refreshed weekly.

What's in it

The dataset is composed of 10 tables covering competitions, games, clubs, players, appearances, player valuations, club games, game events, game lineups and transfers. Each table contains the attributes of the entity and IDs that can be used to join them together.

Table              Description                                 Scale
competitions       Leagues and tournaments                     40+
clubs              Club details, squad size, market value      400+
players            Player profiles, positions, market values   30,000+
games              Match results, lineups, attendance          68,000+
appearances        One row per player per game played          1,500,000+
player_valuations  Historical market value records             450,000+
club_games         Per-club view of each game                  136,000+
game_events        Goals, cards, substitutions                 950,000+
game_lineups       Starting and bench lineups                  81,000+
transfers          Player transfers between clubs              --
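
To illustrate how the shared IDs join tables together, here is a minimal sketch that counts appearances per player using Python's built-in sqlite3 module. The rows are toy values invented for illustration and are not taken from the dataset. The column types are assumptions too, and only column names that appear elsewhere in this README are used.

```python
import sqlite3

# Toy rows invented for illustration only; they are NOT real dataset records.
PLAYERS = [  # (player_id, name, current_club_id)
    (1, "Player A", 10),
    (2, "Player B", 10),
    (3, "Player C", 20),
]
APPEARANCES = [  # (appearance_id, player_id, game_id)
    ("100_1", 1, 100),
    ("101_1", 1, 101),
    ("100_2", 2, 100),
]


def appearances_per_player(players, appearances):
    """Count appearances per player by joining the two tables on player_id."""
    con = sqlite3.connect(":memory:")
    try:
        # Column types here are assumptions made for the sketch.
        con.execute(
            "CREATE TABLE players "
            "(player_id INTEGER PRIMARY KEY, name TEXT, current_club_id INTEGER)"
        )
        con.execute(
            "CREATE TABLE appearances "
            "(appearance_id TEXT PRIMARY KEY, player_id INTEGER, game_id INTEGER)"
        )
        con.executemany("INSERT INTO players VALUES (?, ?, ?)", players)
        con.executemany("INSERT INTO appearances VALUES (?, ?, ?)", appearances)
        # LEFT JOIN keeps players without any appearance (their count is 0).
        return con.execute(
            """
            SELECT p.name, COUNT(a.appearance_id) AS games_played
            FROM players AS p
            LEFT JOIN appearances AS a ON a.player_id = p.player_id
            GROUP BY p.player_id, p.name
            ORDER BY games_played DESC, p.name
            """
        ).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for name, games_played in appearances_per_player(PLAYERS, APPEARANCES):
        print(name, games_played)
```

The same pattern should carry over to the other ID pairs, for example games to appearances via game_id.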

Download Dataset | Open in GitHub Codespaces | Kaggle | data.world

ER diagram
classDiagram
direction LR
competitions --|> games : competition_id
competitions --|> clubs : domestic_competition_id
clubs --|> players : current_club_id
clubs --|> club_games : opponent/club_id
clubs --|> game_events : club_id
players --|> appearances : player_id
players --|> game_events : player_id
players --|> player_valuations : player_id
games --|> appearances : game_id
games --|> game_events : game_id
games --|> clubs : home/away_club_id
games --|> club_games : game_id
class competitions {
    competition_id
}
class games {
    game_id
    home/away_club_id
    competition_id
}
class game_events {
    game_id
    player_id
}
class clubs {
    club_id
    domestic_competition_id
}
class club_games {
    club_id
    opponent_club_id
    game_id
}
class players {
    player_id
    current_club_id
}
class player_valuations {
    player_id
}
class appearances {
    appearance_id
    player_id
    game_id
}
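
The relationships in the diagram can also be written down as data and used for a quick sanity check after loading tables. The sketch below is one possible approach, not part of the project: it lists a subset of the diagram's edges (the `home/away_club_id` shorthand is left out because the exact column names are not spelled out here) and reports child values that have no matching parent row. `find_orphans` and the toy tables are hypothetical names and values made up for illustration.

```python
# (child_table, child_column, parent_table, parent_key), read off the ER diagram.
FOREIGN_KEYS = [
    ("games", "competition_id", "competitions", "competition_id"),
    ("clubs", "domestic_competition_id", "competitions", "competition_id"),
    ("players", "current_club_id", "clubs", "club_id"),
    ("club_games", "club_id", "clubs", "club_id"),
    ("club_games", "opponent_club_id", "clubs", "club_id"),
    ("club_games", "game_id", "games", "game_id"),
    ("appearances", "player_id", "players", "player_id"),
    ("appearances", "game_id", "games", "game_id"),
    ("game_events", "player_id", "players", "player_id"),
    ("game_events", "game_id", "games", "game_id"),
    ("player_valuations", "player_id", "players", "player_id"),
]


def find_orphans(tables, foreign_keys=FOREIGN_KEYS):
    """Return {(child_table, column): sorted values with no matching parent row}.

    `tables` maps a table name to a list of row dicts. Edges whose child or
    parent table was not loaded are skipped, and missing (None) values are ignored.
    """
    orphans = {}
    for child, column, parent, parent_key in foreign_keys:
        if child not in tables or parent not in tables:
            continue
        known = {row[parent_key] for row in tables[parent]}
        referenced = {
            row[column] for row in tables[child] if row.get(column) is not None
        }
        missing = referenced - known
        if missing:
            orphans[(child, column)] = sorted(missing)
    return orphans


# Toy tables invented for illustration; player_id 99 deliberately has no parent.
TOY_TABLES = {
    "competitions": [{"competition_id": "L1"}],
    "clubs": [{"club_id": 10, "domestic_competition_id": "L1"}],
    "players": [{"player_id": 1, "current_club_id": 10}],
    "games": [{"game_id": 100, "competition_id": "L1"}],
    "appearances": [
        {"appearance_id": "a", "player_id": 1, "game_id": 100},
        {"appearance_id": "b", "player_id": 99, "game_id": 100},
    ],
}

if __name__ == "__main__":
    print(find_orphans(TOY_TABLES))
```

Whether such orphan references occur in the published tables is not stated here. The check only shows how the diagram's keys line up.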

Querying the data

Use any of the options above to get the data -- download the zip, grab it from Kaggle or data.world, or load individual tables into your tool of choice.

Pro-tip: You can also query any table remotely with DuckDB -- no download required!

INSTALL httpfs; LOAD httpfs;

SELECT player_id, name, position, market_value_in_eur
FROM read_csv_auto('https://pub-e682421888d945d684bcae8890b0ec20.r2.dev/data/players.csv.gz')
WHERE position = 'Attack'
ORDER BY market_value_in_eur DESC
LIMIT 10;

-- player_id | name             | position | market_value_in_eur
-- 581678    | Florian Wirtz    | Attack   | 200000000
-- 342229    | Kylian Mbappe    | Attack   | 180000000
-- 418560    | Erling Haaland   | Attack   | 180000000
-- 401923    | Lamine Yamal     | Attack   | 150000000
-- ...
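
For readers who prefer to work outside DuckDB, the following sketch approximates the query above using only the Python standard library. It assumes the file is a gzip-compressed CSV with a header row containing the columns used in the SQL query. The helper names `fetch`, `read_gzipped_csv` and `top_by_market_value` are made up for this example, and the demo runs on a tiny invented sample rather than on real data.

```python
import csv
import gzip
import io
from urllib.request import urlopen


def fetch(url):
    """Download the raw bytes at `url` (not called by the demo below)."""
    with urlopen(url) as response:
        return response.read()


def read_gzipped_csv(raw):
    """Decompress gzip bytes and parse them as CSV rows keyed by header name."""
    text = gzip.decompress(raw).decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))


def top_by_market_value(rows, position, limit=10):
    """Filter rows by position and sort by market value, highest first.

    Rows with an empty market value are skipped, which may differ from how
    the SQL query orders NULLs.
    """
    matching = [
        r for r in rows
        if r["position"] == position and r["market_value_in_eur"]
    ]
    matching.sort(key=lambda r: float(r["market_value_in_eur"]), reverse=True)
    return matching[:limit]


# Invented sample standing in for players.csv.gz; NOT real dataset values.
SAMPLE_CSV = (
    "player_id,name,position,market_value_in_eur\n"
    "1,Player A,Attack,5000000\n"
    "2,Player B,Defender,9000000\n"
    "3,Player C,Attack,7000000\n"
    "4,Player D,Attack,\n"
)
SAMPLE_GZ = gzip.compress(SAMPLE_CSV.encode("utf-8"))

if __name__ == "__main__":
    rows = read_gzipped_csv(SAMPLE_GZ)
    for row in top_by_market_value(rows, "Attack", limit=2):
        print(row["player_id"], row["name"], row["market_value_in_eur"])
```

To try it against the hosted file, pass the result of `fetch(...)` with the players URL from the query above to `read_gzipped_csv`. Note that this downloads the whole file, unlike the remote DuckDB query.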

Community

Getting in touch

To keep things tidy, there are two simple guidelines:

  • Keep the conversation centralised and public by getting in touch via the Discussions tab.
  • Avoid topic duplication by having a quick look at the FAQs first.

Sponsoring

Maintenance of this project is made possible by sponsors. If you'd like to sponsor this project you can use the Sponsor button at the top.

Contributing

Contributions to transfermarkt-datasets are most welcome. If you want to contribute new fields or assets to this dataset, the instructions are quite simple:

  1. Fork the repo
  2. Set up your local environment
  3. Populate the data directory
  4. Start modifying assets or creating new ones in the dbt project
  5. If it's all looking good, create a pull request with your changes 🚀

If you run into any issues following the steps above, please get in touch.

For full setup and workflow details, see the Developer guide.