The goal of this project is to analyze the data available from Spotify to answer questions about Spotify Audio Features by song year, the correlation between Audio Features and country metrics, and the effect of artist deaths on Spotify streams. The technologies used are Python, Jupyter Notebooks, Pandas, Requests, Beautiful Soup, and Matplotlib. Optionally, the Spotify API can be used, but its data will match the Kaggle data. For this we used the following data sources:
Our presentation slides can be found here.
In 2019, do Audio Features of charting songs correlate to a country's Happiness Score?
- Use the Happiness Score from World Metrics, scrape the 2019 weekly data for each region from Spotify Charts, and use the Audio Features from Spotify Audio Features.
- Merge the 3 data sets and aggregate each Audio Feature by the appropriate measure of central tendency.
- Show plots with regression lines and give the r value for each Audio Feature by Country Happiness Score.
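The r value in these steps is taken here to be Pearson's correlation coefficient, the usual meaning of r alongside a least-squares regression line. Let $x_i$ be a country metric and $y_i$ an aggregated audio feature across $n$ countries, with sample standard deviations $s_x$ and $s_y$. Then:

$$
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad
\hat{y} = a + b\,x, \quad b = r\,\frac{s_y}{s_x}, \quad a = \bar{y} - b\,\bar{x}
$$

The regression line's slope therefore has the same sign as r. The quantity $r^2$ is the fraction of the feature's variance across countries that the line explains.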
In 2019, do Audio Features of charting songs correlate to a country's Freedom to Make Life Choices Score?
- Use the Freedom to Make Life Choices Score from World Metrics, scrape the 2019 weekly data for each region from Spotify Charts, and use the Audio Features from Spotify Audio Features.
- Merge the 3 data sets and aggregate each Audio Feature by the appropriate measure of central tendency.
- Show plots with regression lines and give the r value for each Audio Feature by Freedom to Make Life Choices Score.
In 2019, do Audio Features of charting songs correlate to a country's GDP per Capita?
- Use the GDP per Capita from World Metrics, scrape the 2019 weekly data for each region from Spotify Charts, and use the Audio Features from Spotify Audio Features.
- Merge the 3 data sets and aggregate each Audio Feature by the appropriate measure of central tendency.
- Show plots with regression lines and give the r value for each Audio Feature by GDP per Capita.
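The three country-metric questions share the same merge, aggregate, and correlate steps, sketched in pandas below. This is a minimal sketch under assumptions, not the project's notebook code. The column names (`country`, `track_id`, and the feature and metric columns) and the helpers `features_by_country` and `correlate_with_metric` are hypothetical. Each chart entry counts once rather than being weighted by its streams.

```python
import numpy as np
import pandas as pd


def features_by_country(charts: pd.DataFrame, features: pd.DataFrame,
                        agg_map: dict) -> pd.DataFrame:
    """Aggregate each audio feature over the tracks that charted in each country."""
    # Attach audio features to every chart entry; tracks without features are dropped.
    merged = charts.merge(features, on="track_id", how="inner")
    # agg_map picks the measure of central tendency per feature,
    # e.g. {"valence": "mean", "duration_ms": "median"}.
    return merged.groupby("country").agg(agg_map).reset_index()


def correlate_with_metric(by_country: pd.DataFrame, metrics: pd.DataFrame,
                          metric: str, feature_cols: list) -> pd.DataFrame:
    """Pearson r and a least-squares line for each feature against one country metric."""
    # Keep only countries present in both tables.
    df = by_country.merge(metrics[["country", metric]], on="country", how="inner")
    rows = []
    for col in feature_cols:
        r = df[col].corr(df[metric])  # Series.corr is Pearson by default
        slope, intercept = np.polyfit(df[metric], df[col], deg=1)
        rows.append({"feature": col, "r": r, "slope": slope, "intercept": intercept})
    return pd.DataFrame(rows).set_index("feature")
```

To cover all three questions, pass the happiness, freedom, or GDP column as the `metric` argument. The expression `intercept + slope * x` gives the line to draw over each scatter plot.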
How have Spotify Audio Features changed over time?
- Use the Audio Features from Spotify Audio Features.
- Determine the appropriate measure of central tendency for each Audio Feature and give evidence.
- Show plots displaying the change in each Audio Feature over time.
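The sketch below shows one way to choose and justify a measure of central tendency for each feature, then aggregate by song year. It assumes a track table with a `year` column and numeric feature columns. The skewness cutoff of 1.0 is a common rule of thumb picked for illustration, not a value taken from the project. All function names are hypothetical.

```python
import pandas as pd


def central_tendency_evidence(tracks: pd.DataFrame, feature_cols: list) -> pd.DataFrame:
    """Mean, median, and skewness side by side for each feature."""
    return tracks[feature_cols].agg(["mean", "median", "skew"]).T


def choose_center(series: pd.Series, skew_threshold: float = 1.0) -> str:
    """Median for strongly skewed features, mean otherwise (assumed rule of thumb)."""
    return "median" if abs(series.skew()) > skew_threshold else "mean"


def features_over_time(tracks: pd.DataFrame, feature_cols: list, year_col: str = "year"):
    """Aggregate each feature by song year using its chosen measure of central tendency."""
    centers = {col: choose_center(tracks[col].dropna()) for col in feature_cols}
    yearly = tracks.groupby(year_col).agg(centers)
    return yearly, centers
```

With matplotlib installed, `yearly.plot(subplots=True)` draws each feature over time. Similarly, `yearly["valence"].idxmin()` returns the year with the lowest aggregated valence.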
How do an artist's Spotify streams change after their death?
- Scrape the 2017 US daily charts from Spotify Charts and cross-reference them with a list of artists who died in 2017.
- Narrow the data frame to artists who charted in 2017 and died in 2017.
- Compare the streams before death, on the day of death, and after death using line plots.
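The before / day-of / after comparison can be sketched as follows. It assumes the scraped charts form a DataFrame with one row per chart entry and hypothetical columns `date` (datetime), `artist`, and `streams`. Only Top 200 entries have stream counts, so days when the artist missed the chart are filled with 0 here. That is a simplification, not a true stream count. The 7-day window is an arbitrary illustrative default, and `stream_windows` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def stream_windows(charts: pd.DataFrame, artist: str, death_date, days: int = 7):
    """Daily charted streams around a death date, plus totals before, on, and after it."""
    death = pd.Timestamp(death_date)
    # Sum streams across all of the artist's charting tracks for each day.
    daily = charts.loc[charts["artist"] == artist].groupby("date")["streams"].sum()
    window = pd.date_range(death - pd.Timedelta(days=days),
                           death + pd.Timedelta(days=days), freq="D")
    # Days without a Top 200 entry have no row; record them as 0 charted streams.
    daily = daily.reindex(window, fill_value=0)
    period = np.where(daily.index < death, "before",
                      np.where(daily.index == death, "day_of", "after"))
    summary = daily.groupby(period).sum()
    return daily, summary
```

With matplotlib installed, calling `daily.plot()` on the returned series gives the line plot for one artist.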
- The scraping work is done in the Daily 2017 US Charts and Weekly 2019 Charts by Region notebooks.
- These notebooks use the Requests library and Beautiful Soup to fetch the HTML from Spotify Charts and write the data to a large CSV, which we then load into a DataFrame for analysis.
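Below is a minimal sketch of the scrape-to-CSV step with Requests and Beautiful Soup. The URL pattern and the page markup are assumptions about the retired spotifycharts.com pages: a `table.chart-table` with `chart-table-position`, `chart-table-track`, and `chart-table-streams` cells. The function names are hypothetical. Check both against the actual HTML before relying on them.

```python
import csv

import requests
from bs4 import BeautifulSoup

# Assumed URL pattern; freq is "daily" or "weekly" and date is the chart's date string.
CHART_URL = "https://spotifycharts.com/regional/{region}/{freq}/{date}"


def parse_chart(html: str) -> list:
    """Extract position, track, artist, and streams from one chart page."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # Assumed markup: one <tr> per chart entry inside <table class="chart-table">.
    for tr in soup.select("table.chart-table tbody tr"):
        track_cell = tr.find("td", class_="chart-table-track")
        streams = tr.find("td", class_="chart-table-streams").get_text(strip=True)
        rows.append({
            "position": int(tr.find("td", class_="chart-table-position").get_text(strip=True)),
            "track": track_cell.find("strong").get_text(strip=True),
            # The artist cell reads "by <Artist>"; drop the leading "by ".
            "artist": track_cell.find("span").get_text(strip=True).removeprefix("by "),
            "streams": int(streams.replace(",", "")),
        })
    return rows


def scrape(region: str, freq: str, dates: list) -> list:
    """Fetch and parse one chart page per date, tagging rows with region and date."""
    records = []
    for date in dates:
        url = CHART_URL.format(region=region, freq=freq, date=date)
        response = requests.get(url, timeout=30)
        response.raise_for_status()
        for row in parse_chart(response.text):
            records.append({"region": region, "date": date, **row})
    return records


def write_csv(records: list, path: str) -> None:
    """Write all scraped rows to one CSV, ready for pandas.read_csv."""
    fields = ["region", "date", "position", "track", "artist", "streams"]
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        writer.writerows(records)
```

Calling `write_csv(scrape("us", "daily", ["2017-01-01"]), "charts.csv")` and then `pd.read_csv("charts.csv", parse_dates=["date"])` mirrors the CSV-to-DataFrame step. The region code and date format in that call are assumptions.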
How have Spotify Audio Features changed over time?
- The analysis for this question is in the Features Over Time notebook.
- Valence – The average valence was lowest in 1946, meaning this was the saddest year of music in our dataset. This is likely due to WWII.
- Acousticness – We noticed a general drop in acousticness over the decades, which makes sense considering the rise of digital music production.
- Popularity – Spotify notes that “The popularity is calculated by algorithm and is based, in the most part, on the total number of plays the track has had and how recent those plays are.” Accordingly, this analysis confirmed our assumption that the more recently a song was released, the more popular it would be.
In 2019, do Audio Features of charting songs correlate to a country's Happiness Score, Freedom to Make Life Choices Score, or GDP per Capita?
- The analysis for this question is in the Audio Features vs Country Metrics notebook.
- We found no strong correlation between the audio features of songs streamed in a country and that country's happiness score, freedom to make life choices score, or GDP per capita. The strongest correlation we observed was between GDP per capita and song duration: countries with higher GDP per capita tended to stream shorter songs. The relationship is only moderate (r = -0.59, so r² ≈ 0.35), but it was the strongest we found. Below are the plots of audio features compared to a country's happiness score.
How do an artist's Spotify streams change after their death?
- The analysis for this question is in the 2017 Artist Deaths notebook.
- The deaths of Chester Bennington (Linkin Park) and Tom Petty had the largest initial effect on Spotify streams by a wide margin. Linkin Park and Tom Petty accumulated 10,647,809 and 9,080,227 streams, respectively, on the day following their deaths. They accounted for 14% and 11.5% of the songs on the chart on those days (28 and 23 of the Top 200). However, Linkin Park's increase in streams lasted much longer: the band kept at least one song on the chart for three weeks, while Tom Petty's final appearance came one week after his death.
- Overall, the data supports our hypothesis that the number of Spotify streams would increase dramatically following an artist's death. It was interesting, however, to observe the variance in how long deceased artists held a position on the Top 200 daily chart.
- Spotify only lists the number of streams for songs in the Top 200, so our stream totals reflect only those tracks. It would be advantageous to gather data on all of an artist's streams. That would give an even more telling look at their pre-death numbers and at exactly how long after their deaths the increase persisted.
- Two artists provided unanticipated data that raised an additional question: what song trends emerge during certain seasons or around particular events? Chuck Berry reached the chart with just one song, on one day, following his death. However, he appeared 33 times during the holiday season with "Run Rudolph Run." Following the death of guitarist Malcolm Young, AC/DC made no appearances. The band did chart on New Year's Day with "You Shook Me All Night Long" and on Halloween with "Highway to Hell".
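The share-of-chart and staying-power figures in these results reduce to simple counts over the scraped daily charts. This sketch assumes the same hypothetical `charts` DataFrame, with one row per Top 200 entry plus a datetime `date` column and an `artist` column. It matches artist names exactly, which would miss collaborations credited under a different name. The function names are hypothetical.

```python
import pandas as pd


def chart_share(charts: pd.DataFrame, artist: str, date) -> float:
    """Fraction of one day's chart entries that belong to the artist."""
    day = charts.loc[charts["date"] == pd.Timestamp(date)]
    return (day["artist"] == artist).sum() / len(day)


def days_on_chart_after(charts: pd.DataFrame, artist: str, death_date) -> int:
    """Days between the death date and the artist's last chart appearance."""
    last = charts.loc[charts["artist"] == artist, "date"].max()
    return (last - pd.Timestamp(death_date)).days
```

Running both functions over each artist in the 2017 death list gives a comparable table of initial impact and staying power.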