On this page you can find the datasets and instructions for the 2025 Bucknell Data Challenge. This event is co-sponsored by the Dominguez Center for Data Science and the Digital Pedagogy & Scholarship Department of Library & Information Technology. Presentations of findings and awards will take place on Thursday, February 13th, 4:30 - 6:00pm in Taylor 210.

Datasets

You must use at least one of the following three datasets in this data challenge. You don't have to use all the rows (and definitely not all of the columns), and you are welcome to pull in additional data. If you can't figure out what a particular variable measures, make an educated guess based on the variable name, come by the Data Challenge Office Hours, or ask in the #challenge-s25 channel of the DCDS Slack Workspace.

Dataset Option 1: Billboard Hot 100

The Billboard Hot 100 represents the top songs in the US based on sales, online streaming, and radio airplay. The list started in 1958 and is updated weekly. UT Austin provides a CSV of the data on a public GitHub repository. We have stored their data here: https://github.com/Bucknell-Data-Science/data_challenge_s25/blob/main/hot_100_current.csv.

Consider using these data to explore trends in the Billboard Hot 100.
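The dataset links on this page point at GitHub's file viewer (`/blob/` URLs), which returns an HTML page rather than the CSV itself. The sketch below shows one way to turn such a link into its raw-file address and read the rows with Python's standard library. The helper names `to_raw_url` and `load_rows` are made up for illustration, and the UTF-8 encoding is an assumption about the file, not something this page states.

```python
import csv
import io
import urllib.request

# URL copied from this page.
HOT_100_URL = (
    "https://github.com/Bucknell-Data-Science/data_challenge_s25"
    "/blob/main/hot_100_current.csv"
)


def to_raw_url(blob_url: str) -> str:
    """Convert a github.com '/blob/' viewer URL into the raw-file URL."""
    prefix = "https://github.com/"
    if not blob_url.startswith(prefix) or "/blob/" not in blob_url:
        raise ValueError(f"not a GitHub blob URL: {blob_url}")
    owner_repo, path = blob_url[len(prefix):].split("/blob/", 1)
    return f"https://raw.githubusercontent.com/{owner_repo}/{path}"


def load_rows(blob_url: str) -> list[dict]:
    """Download the CSV and return one dict per row (needs network access).

    Assumption: the file is UTF-8 encoded, possibly with a byte-order mark.
    """
    with urllib.request.urlopen(to_raw_url(blob_url)) as response:
        text = response.read().decode("utf-8-sig")
    return list(csv.DictReader(io.StringIO(text)))


# Only the URL conversion runs here; call load_rows(HOT_100_URL) to download.
print(to_raw_url(HOT_100_URL))
```

Calling `load_rows(HOT_100_URL)` and then inspecting `rows[0].keys()` is a quick way to see which columns the file actually contains before planning an analysis. The same helper should work for the other two dataset links on this page.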

Dataset Option 2: Rolling Stone 500

Rolling Stone magazine periodically releases a list of the top 500 albums of all time. Their first list appeared in 2003 and was updated in 2012 and 2020. The digital publication The Pudding compiled a dataset of the rankings, which can be found here: https://github.com/Bucknell-Data-Science/data_challenge_s25/blob/main/rolling_stone.csv.

Use these data to tell a story about the greatest albums of all time.

Dataset Option 3: Love Songs

The Pudding decided to investigate the common sentiment that love songs are dying. They compiled a dataset of all the songs from the Billboard Top 10 and then categorized those songs into the following groups:

  • Not a love song
  • Serenade: "They’re all unmistakably about romantic love and devotion, sung from one person to another."
  • Pursuit: "How about when you love someone, and it might become something more? You just spotted someone, your heart is beating fast, and who KNOWS where this thing might lead?"
  • Heartache: "But what happens if you love them, but they just... don’t? Maybe you broke up, or maybe it’s just unrequited."
  • It's Complicated: "What about when a relationship isn’t clearly good or bad? Maybe you fight constantly. Maybe they’re unfaithful. But you still try to make it work."
  • Good Riddance: "What about those songs where the relationship is clearly over, but the songwriter’s heartbreak has resurrected into... righteous power?"
  • Sexual Confidence: "What about songs that get a little steamy? Think artists like Nicki Minaj and Drake, who dominate this category that's all about getting into bed with someone."
  • Love Song for the Self: "Far enough to find its reincarnation: songs where heartbreak turns love back onto the OG, yourself. 'You don't love me? That’s ok, because I do! In fact, I’ll buy myself flowers…'"

You can access the data here: https://github.com/Bucknell-Data-Science/data_challenge_s25/blob/main/Love%20song%20categories%20for%20Billboard%20Top%2010%20hits%2C%201958%20-%20September%202023%2C%20from%20The%20Pudding.csv

Use this dataset to explore trends in love songs.
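As one possible starting point, the sketch below computes the share of songs in each decade that fall into any love-song category. The rows are invented placeholders, not real chart data, and the column names `year` and `category` are assumptions; check the actual CSV header and adjust the `year_col` and `category_col` arguments accordingly.

```python
from collections import Counter, defaultdict

# Invented placeholder rows for illustration only. The real CSV's column
# names and values may differ from these assumed ones.
rows = [
    {"year": "1961", "category": "Serenade"},
    {"year": "1965", "category": "Heartache"},
    {"year": "1968", "category": "Not a love song"},
    {"year": "2019", "category": "Sexual Confidence"},
    {"year": "2021", "category": "Not a love song"},
    {"year": "2022", "category": "Love Song for the Self"},
    {"year": "2023", "category": "Not a love song"},
]


def love_song_share_by_decade(rows, year_col="year", category_col="category"):
    """Return {decade: fraction of songs that are in any love-song category}."""
    counts = defaultdict(Counter)
    for row in rows:
        decade = int(row[year_col]) // 10 * 10
        is_love_song = row[category_col] != "Not a love song"
        counts[decade][is_love_song] += 1
    return {
        decade: c[True] / (c[True] + c[False])
        for decade, c in sorted(counts.items())
    }


shares = love_song_share_by_decade(rows)
for decade, share in shares.items():
    print(f"{decade}s: {share:.0%} love songs")
```

Grouping by decade rather than by year smooths out year-to-year noise; swapping the boolean for the category name itself would give a per-category breakdown instead.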

Data Exploration Suggestions

  • Consider mixing the datasets in different ways.
  • Consider adding in supplemental data from other sources such as Spotify.
  • Try focusing on specific subsets of the data and making comparisons across groups.
  • Start asking questions of the data and see where that goes.
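Mixing the datasets usually means matching songs across files whose titles and artist names are not spelled identically. The sketch below shows one simple approach: normalize the text, then join on the normalized (title, artist) pair. All rows and column names here (`title`, `artist`, `song`, `performer`, `peak_position`, `category`) are hypothetical placeholders, and the functions `normalize` and `join_on_song` are illustrative helpers rather than part of any provided tooling.

```python
def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return " ".join(cleaned.split())


def join_on_song(left, right, left_cols=("title", "artist"),
                 right_cols=("title", "artist")):
    """Inner-join two lists of row dicts on normalized (title, artist) keys.

    If both rows share a column name, the value from `left` wins.
    """
    index = {}
    for row in right:
        key = tuple(normalize(row[col]) for col in right_cols)
        index[key] = row
    merged = []
    for row in left:
        key = tuple(normalize(row[col]) for col in left_cols)
        if key in index:
            merged.append({**index[key], **row})
    return merged


# Invented placeholder rows; real column names must be read from the CSVs.
chart_rows = [
    {"title": "Song A", "artist": "Artist One", "peak_position": "3"},
    {"title": "Song B!", "artist": "Artist Two", "peak_position": "1"},
]
label_rows = [
    {"song": "song b", "performer": "ARTIST TWO", "category": "Heartache"},
    {"song": "Song C", "performer": "Artist Three", "category": "Serenade"},
]

merged = join_on_song(chart_rows, label_rows, right_cols=("song", "performer"))
print(merged)
```

Exact matching on normalized text will still miss cases such as featured artists or alternate spellings, so it is worth counting how many rows fail to match before drawing conclusions from the joined data.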

Competition Structure and Rules

  • Between Friday, February 7th and 4:30pm on February 13th, you and your team should try to find some interesting insights in the data. Those insights could come from data visualizations, summaries, and/or models.
  • You and your team should create and share two Google slides: 1 title slide which includes the names of all group members and 1 content slide (or slide-like deliverable). Before 4:15pm on the 13th, share those slides with [email protected].
  • All students are welcome to participate, regardless of your data science background.
  • If no one from your team is able to attend the kick-off event, you can still participate.
  • You are allowed to compete solo or in teams of up to 4 people. All competitors must be Bucknell undergraduates.
  • Collaboration between teams is not only allowed but highly encouraged.
  • During the challenge, you are welcome to use Taylor 212 as a space to work whenever it isn't reserved for other events.
  • You can use whatever data science software (e.g., R, Python, Excel, Google Sheets, MATLAB, Stata, Voyant, Tableau, Highcharts...) you want.
  • While you are allowed (but not required!) to use generative AI tools to support your work, the core problem-solving and final deliverable must be human-driven.
  • Any updates or announcements will be posted to the #challenge-s25 channel of the DCDS Slack workspace. Please sign up for the DCDS Slack workspace if you haven't already and join this public channel.

Challenge Events and Help Sessions

  • The kick-off event will take place from noon - 12:50pm on Friday, February 7th in Taylor 210. At the kick-off, you will be introduced to the data and given the instructions for the challenge. Pizza will be provided.
  • We will hold drop-in help sessions in Taylor 212 at the following times: Sun, Feb 9th 1-3pm; Mon, Feb 10th 7-9pm; Wed, Feb 12th 12-1pm & 4-6pm; and Thur, Feb 13th noon - 1pm. This is a friendly competition, so feel free to use these optional sessions to ask questions and brainstorm ideas. We'll have some goodies at these sessions.
  • The closing session will take place from 4:30 - 6pm on Thursday, February 13th in Taylor 210. Each group will give a three-minute presentation. You must have 1 title slide which includes the names of all group members and 1 content slide (or slide-like deliverable). Prizes will be given for Best Insight, Best Visualization, and Best-in-Show. Not all team members need to participate in the presentation.