This week we're looking at album rankings from Rolling Stone (h/t Data Is Plural). A visual essay from The Pudding looks at what makes an album the greatest of all time and shares the data its authors put together for the essay.
A new visual essay from The Pudding compares Rolling Stone’s “500 Greatest Albums of All Time” lists from 2003, 2012, and 2020. A methodology note says the project began with a spreadsheet by Chris Eckert and eventually led the authors to develop a dataset of their own. Theirs lists every album in the rankings — its name, genre, release year, 2003/2012/2020 rank, the artist’s name, birth year, gender, and more — plus each year’s voters. [h/t Jason Kottke]
What are the characteristics of artists and genres popular at different times?
```r
# Option 1: tidytuesdayR package
## install.packages("tidytuesdayR")
tuesdata <- tidytuesdayR::tt_load('2024-05-07')
## OR
tuesdata <- tidytuesdayR::tt_load(2024, week = 19)
rolling_stone <- tuesdata$rolling_stone
# Option 2: Read directly from GitHub
rolling_stone <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2024/2024-05-07/rolling_stone.csv')
```
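
One possible starting point for the question above is to count how many albums of each genre appear on each year's list. The sketch below uses base R only and assumes the column names documented in the data dictionary (`genre`, `rank_2003`, `rank_2012`, `rank_2020`). The function name `genre_counts_by_list` and its output columns are made up for illustration and are not part of the dataset or any package.

```r
# Count albums per genre on each list (2003, 2012, 2020).
# An album counts as "on the list" when its rank for that year is not NA.
# `genre_counts_by_list` is a hypothetical helper name.
genre_counts_by_list <- function(df) {
  rank_cols <- c("rank_2003", "rank_2012", "rank_2020")
  counts <- lapply(rank_cols, function(col) {
    on_list <- df[!is.na(df[[col]]), , drop = FALSE]
    # Keep albums with a missing genre as their own NA group
    tab <- table(on_list$genre, useNA = "ifany")
    data.frame(
      list_year = rep(as.integer(sub("rank_", "", col)), length(tab)),
      genre = as.character(names(tab)),
      n_albums = as.integer(tab),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, counts)
}
```

Calling `genre_counts_by_list(rolling_stone)` after loading the data should return one row per genre and list year, which can then be plotted or compared across years.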
- Explore the data, watching out for interesting relationships. We would like to emphasize that you should not draw conclusions about causation in the data. There are various moderating variables that affect all data, many of which might not have been captured in these datasets. As such, our suggestion is to use the data provided to practice your data tidying and plotting techniques, and to consider for yourself what nuances might underlie these relationships.
- Create a visualization, a model, a shiny app, or some other piece of data-science-related output, using R or another programming language.
- Share your output and the code used to generate it on social media with the #TidyTuesday hashtag.
variable | class | description |
---|---|---|
sort_name | character | Name used for sorting |
clean_name | character | Clean name |
album | character | Album name |
rank_2003 | double | Rank in 2003. NA if album not released yet or not in top 500. |
rank_2012 | double | Rank in 2012. NA if album not released yet or not in top 500. |
rank_2020 | double | Rank in 2020. NA if not in top 500. |
differential | double | Change in rank between the 2003 and 2020 lists. Negative value if the album went down in the chart. Positive value if it went up. |
release_year | double | Release Year |
genre | character | Album Genre |
type | character | Album Type |
weeks_on_billboard | double | Weeks on Billboard |
peak_billboard_position | double | Peak Billboard Position |
spotify_popularity | double | Spotify Popularity. NA if not on Spotify. |
spotify_url | character | Spotify URL. NA if not on Spotify. |
artist_member_count | double | Number of artists in the group |
artist_gender | character | Gender of the artist(s). Male/Female if it's a mixed-gender group. |
artist_birth_year_sum | double | Sum of the artists' birth years, e.g. for a two-member group with one person born in 1945 and another in 1950, the value is 3895. |
debut_album_release_year | double | Debut Album Release Year |
ave_age_at_top_500 | double | Average age at top 500 Album |
years_between | double | Years Between Debut and Top 500 Album |
album_id | character | Album ID. NOS at the beginning of the ID if not on Spotify. |
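
Because `artist_birth_year_sum` is a sum rather than an average, it is usually easier to work with once divided by `artist_member_count`. The sketch below adds that average, plus a rank change computed directly from the two rank columns. The helper name `add_derived_columns` and the new column names are hypothetical. Computing the rank change as `rank_2003 - rank_2020` is an assumption chosen to match the sign convention described for `differential` (positive means the album moved up), and it may not reproduce `differential` exactly for albums missing from one of the lists.

```r
# Add two derived columns to the documented dataset.
# `add_derived_columns`, `avg_birth_year`, and `rank_change_2003_2020`
# are hypothetical names used only for this illustration.
add_derived_columns <- function(df) {
  # Average birth year of the group: sum of birth years / number of members,
  # e.g. 3895 / 2 = 1947.5 for the two-member example in the table above.
  df$avg_birth_year <- df$artist_birth_year_sum / df$artist_member_count

  # Assumption: a smaller rank number is a better position, so
  # rank_2003 - rank_2020 is positive when the album moved up.
  # The result is NA when the album is missing from either list.
  df$rank_change_2003_2020 <- df$rank_2003 - df$rank_2020

  df
}
```

For example, `rolling_stone <- add_derived_columns(rolling_stone)` would append both columns to the loaded data.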
Downloaded from Rolling Stone 500 (public).
Changed column names, replacing white space with underscores, and making all letters lowercase.
Removed Chartmetric Link and Album ID Quoted columns.
Replaced "N/A", "Not on Spotify", and "-" values with empty cells.
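
The cleaning steps above could be sketched in base R roughly as follows. This is not the script actually used to prepare the data. The function name `clean_rolling_stone` is hypothetical, and the dropped column names assume that "Chartmetric Link" and "Album ID Quoted" become `chartmetric_link` and `album_id_quoted` after renaming. Placeholder values are set to `NA`, which is written out as an empty cell if the result is saved with `write.csv(..., na = "")`.

```r
# A minimal sketch of the cleaning steps, assuming `raw` is a data frame
# read from the original spreadsheet with its original column names.
# `clean_rolling_stone` is a hypothetical helper name.
clean_rolling_stone <- function(raw) {
  # Lowercase all column names and replace white space with underscores
  names(raw) <- tolower(gsub("\\s+", "_", trimws(names(raw))))

  # Drop the two removed columns (names after renaming are assumed)
  drop_cols <- c("chartmetric_link", "album_id_quoted")
  raw <- raw[, !(names(raw) %in% drop_cols), drop = FALSE]

  # Turn placeholder strings into missing values in character columns
  placeholders <- c("N/A", "Not on Spotify", "-")
  for (col in names(raw)) {
    if (is.character(raw[[col]])) {
      raw[[col]][raw[[col]] %in% placeholders] <- NA_character_
    }
  }

  raw
}
```

Columns that contained placeholders are still character after this step, so numeric columns would need an extra conversion (for example with `as.numeric()`) before analysis.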