This repository contains a Python script that collects every athlete who has participated in the Olympics from olympedia.org, and a CSV file that stores the collected data.
-
First, we retrieve the list of all countries that have participated in the modern Olympics.
-
Next, we build a Python dictionary so that the host city can be looked up later from the host year and season. Each key is the Olympic year concatenated with the season (e.g., 2018Winter), and each value is the host city.
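
As a minimal sketch of that lookup table: the sample editions below are typed in by hand for illustration rather than scraped, and the names `editions`, `make_key`, and `host_city` are hypothetical, not taken from the repository's script.

```python
# Hypothetical sample rows: (year, season, host city).
# In the real script these values would come from olympedia.org.
editions = [
    (2016, "Summer", "Rio de Janeiro"),
    (2018, "Winter", "PyeongChang"),
    (2020, "Summer", "Tokyo"),
]


def make_key(year, season):
    """Concatenate year and season into a key such as '2018Winter'."""
    return f"{year}{season}"


# Map each 'YearSeason' key to its host city.
host_city = {make_key(year, season): city for year, season, city in editions}

# Look up the host city once the year and season of a result are known.
city = host_city[make_key(2018, "Winter")]
```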
-
Now we iterate over the countries and collect athlete information.
- Get the 'href' of every 'a' tag in the last column (Results) of the 'Olympic Games' table under 'Participations by edition'.
- Following a 'Results' link shows the athletes who participated in that edition of the Olympics. An athlete's name appears more than once if the athlete competed in multiple events.
- To remove these duplicates, we build an 'athlete_id' set from all records (every 'Results' page) of the Olympics in which a country has competed.
- Next, we collect the detailed information for each athlete.
- Create an 'athlete_url' list from the 'athlete_id' set to access each athlete page.
- Access each athlete page and first collect the athlete's biographical information.
- Then, collect the games, sport, and specific event in which the athlete participated from the 'Results' table.
- Finally, collect the medal information from the 'Results' table.
- Assemble the athlete record from the information collected above and write it to the CSV file.
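
The deduplication, URL-building, and CSV-writing steps above can be sketched roughly as follows. This is an illustration under assumptions, not the repository's actual code: it assumes athlete links have the form `/athletes/<number>`, the function names and CSV columns are hypothetical, and the sample hrefs and row are made up so that the snippet runs without network access.

```python
import csv
import io
import re

BASE_URL = "https://www.olympedia.org"  # site named in this README

# Assumption: athlete links look like '/athletes/<number>'.
ATHLETE_HREF = re.compile(r"^/athletes/(\d+)$")


def collect_athlete_ids(hrefs):
    """Return the set of athlete ids found in hrefs, dropping duplicates."""
    ids = set()
    for href in hrefs:
        match = ATHLETE_HREF.match(href)
        if match:
            ids.add(int(match.group(1)))
    return ids


def build_athlete_urls(athlete_ids):
    """Turn the id set into a sorted list of athlete page URLs."""
    return [f"{BASE_URL}/athletes/{i}" for i in sorted(athlete_ids)]


def write_rows(rows, fieldnames, out):
    """Write a header line and the athlete rows to a file-like object."""
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical hrefs; the same athlete appears twice (two events).
sample_hrefs = ["/athletes/101", "/athletes/202", "/athletes/101", "/editions/1"]
athlete_ids = collect_athlete_ids(sample_hrefs)
athlete_urls = build_athlete_urls(athlete_ids)

# Hypothetical columns and row; an in-memory buffer stands in for the CSV file.
fieldnames = ["athlete_id", "name", "games", "event", "medal"]
rows = [{"athlete_id": 101, "name": "Example Athlete",
         "games": "2018Winter", "event": "Example Event", "medal": ""}]
buffer = io.StringIO()
write_rows(rows, fieldnames, buffer)
```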
We are building Kolympic, a website that uses the Olympic data collected by the method above to visualize the Republic of Korea's Olympic records and to implement various high-level user scenarios related to the Olympics.