Music Popularity Prediction with Python

This project focuses on predicting the popularity of songs using various music features and metadata. It employs regression techniques to forecast a song's performance in terms of streams, downloads, and chart positions. The goal is to provide music producers, artists, and marketers with accurate predictions to make informed decisions.

Overview

The project uses a dataset of Spotify tracks to build a regression model that predicts song popularity. The analysis involves:

Exploratory Data Analysis (EDA): Visualizing the relationships between music features (like energy, danceability, and acousticness) and popularity.
Feature Selection: Identifying the most influential features for predicting popularity based on correlation analysis.
Model Training: Training a Random Forest Regressor model on the selected features.
Hyperparameter Tuning: Using GridSearchCV to find the optimal parameters for the model.
Evaluation: Assessing the model's performance by comparing its predictions to the actual popularity scores.

Dataset

The dataset used in this project is Spotify_data.csv, which contains various features for each track, including:

Track Name: The name of the song.
Artists: The artists of the song.
Album Name: The album the song belongs to.
Popularity: The popularity score of the song (the target variable).
Acoustic Features: Energy, Valence, Danceability, Loudness, Acousticness, Tempo, Speechiness, and Liveness.

Analysis and Model

The analysis begins with an exploratory data analysis (EDA) to understand the relationships between different musical features and popularity. The correlation matrix and visualizations show that features like Energy, Danceability, Loudness, and Acousticness have a significant correlation with popularity.

Based on this analysis, the following features were selected for model training:

Energy
Valence
Danceability
Loudness
Acousticness
Tempo
Speechiness
Liveness

A Random Forest Regressor was chosen as the prediction model after comparing its performance with other algorithms. The model was trained on the selected features, and GridSearchCV was used for hyperparameter tuning to find the best model parameters.

Results

The trained model shows a good correlation between the actual and predicted popularity scores, as seen in the scatter plot of the results. This indicates that the model can effectively predict music popularity based on the given features.

How to Run

To run this project, you can use the Jupyter Notebook ML_01_Music_Popularity_Prediction_with_Python.ipynb. Make sure you have the following libraries installed:

pandas
matplotlib
seaborn
scikit-learn

You can open the notebook in a Jupyter environment or Google Colab and run the cells sequentially to see the analysis and model training process.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
Crawl_data_twitter.ipynb		Crawl_data_twitter.ipynb
ML_01_Music_Popularity_Prediction_with_Python.ipynb		ML_01_Music_Popularity_Prediction_with_Python.ipynb
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Music Popularity Prediction with Python

Overview

Dataset

Analysis and Model

Results

How to Run

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Music Popularity Prediction with Python

Overview

Dataset

Analysis and Model

Results

How to Run

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages