Clean, structured and automatically updated football (soccer) dataset built from Transfermarkt data -- 68,000+ games, 30,000+ players, 1,500,000+ appearances and more, refreshed weekly.
The dataset is composed of 10 tables covering competitions, games, clubs, players, appearances, player valuations, club games, game events, game lineups and transfers. Each table contains the attributes of the entity and IDs that can be used to join them together.
| Table | Description | Scale |
|---|---|---|
| competitions | Leagues and tournaments | 40+ |
| clubs | Club details, squad size, market value | 400+ |
| players | Player profiles, positions, market values | 30,000+ |
| games | Match results, lineups, attendance | 68,000+ |
| appearances | One row per player per game played | 1,500,000+ |
| player_valuations | Historical market value records | 450,000+ |
| club_games | Per-club view of each game | 136,000+ |
| game_events | Goals, cards, substitutions | 950,000+ |
| game_lineups | Starting and bench lineups | 81,000+ |
| transfers | Player transfers between clubs | -- |
ER diagram
classDiagram
direction LR
competitions --|> games : competition_id
competitions --|> clubs : domestic_competition_id
clubs --|> players : current_club_id
clubs --|> club_games : opponent/club_id
clubs --|> game_events : club_id
players --|> appearances : player_id
players --|> game_events : player_id
players --|> player_valuations : player_id
games --|> appearances : game_id
games --|> game_events : game_id
games --|> clubs : home/away_club_id
games --|> club_games : game_id
class competitions {
  competition_id
}
class games {
  game_id
  home/away_club_id
  competition_id
}
class game_events {
  game_id
  player_id
}
class clubs {
  club_id
  domestic_competition_id
}
class club_games {
  club_id
  opponent_club_id
  game_id
}
class players {
  player_id
  current_club_id
}
class player_valuations {
  player_id
}
class appearances {
  appearance_id
  player_id
  game_id
}
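
The ID columns in the diagram are what tie the tables together. As a rough illustration, here is a minimal Python sketch that joins appearances to players and games through player_id and game_id. It uses the standard-library sqlite3 module with a handful of made-up rows so that it runs offline. The table and key column names come from the diagram above; the row values, the column types and the `appearances_per_competition` helper are assumptions made for the example and are not part of the dataset.

```python
import sqlite3

# Toy rows only: they mirror the key columns from the diagram
# (player_id, game_id, competition_id, appearance_id) and are NOT real data.
# Column types and ID formats are assumptions made for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE players (player_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE games (game_id INTEGER PRIMARY KEY, competition_id TEXT);
    CREATE TABLE appearances (
        appearance_id TEXT PRIMARY KEY,
        player_id INTEGER REFERENCES players(player_id),
        game_id INTEGER REFERENCES games(game_id)
    );
    INSERT INTO players VALUES (1, 'Player A'), (2, 'Player B');
    INSERT INTO games VALUES (10, 'COMP1'), (11, 'COMP1'), (12, 'COMP2');
    INSERT INTO appearances VALUES
        ('10_1', 1, 10), ('11_1', 1, 11), ('12_1', 1, 12), ('10_2', 2, 10);
""")


def appearances_per_competition(connection):
    """Count appearances per player and competition (hypothetical helper)."""
    query = """
        SELECT p.name, g.competition_id, COUNT(*) AS n_appearances
        FROM appearances AS a
        JOIN players AS p ON p.player_id = a.player_id  -- player key
        JOIN games AS g ON g.game_id = a.game_id        -- game key
        GROUP BY p.name, g.competition_id
        ORDER BY p.name, g.competition_id
    """
    return connection.execute(query).fetchall()


rows = appearances_per_competition(conn)
for name, competition_id, n_appearances in rows:
    print(name, competition_id, n_appearances)
```

The same join pattern should carry over to the real tables once they are loaded, with any engine that speaks SQL.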
Use any of the options above to get the data -- download the zip, grab it from Kaggle or data.world, or load individual tables into your tool of choice.
Pro-tip: You can also query any table remotely with DuckDB -- no download required!
INSTALL httpfs; LOAD httpfs;
SELECT player_id, name, position, market_value_in_eur
FROM read_csv_auto('https://pub-e682421888d945d684bcae8890b0ec20.r2.dev/data/players.csv.gz')
WHERE position = 'Attack'
ORDER BY market_value_in_eur DESC
LIMIT 10;
-- player_id | name | position | market_value_in_eur
-- 581678 | Florian Wirtz | Attack | 200000000
-- 342229 | Kylian Mbappe | Attack | 180000000
-- 418560 | Erling Haaland | Attack | 180000000
-- 401923 | Lamine Yamal | Attack | 150000000
-- ...

In order to keep things tidy, there are two simple guidelines:
- Keep the conversation centralised and public by getting in touch via the Discussions tab.
- Avoid topic duplication by having a quick look at the FAQs.
Maintenance of this project is made possible by sponsors. If you'd like to sponsor this project you can use the Sponsor button at the top.
Contributions to transfermarkt-datasets are most welcome. If you want to contribute new fields or assets to this dataset, the instructions are quite simple:
- Fork the repo
- Set up your local environment
- Populate the `data` directory
- Start modifying assets or creating new ones in the dbt project
- If it's all looking good, create a pull request with your changes 🚀
In case you face any issue following the instructions above, please get in touch.
For full setup and workflow details, see the Developer guide.