Spotify-Popular-Songs-Predictive-Model

Basic approach to predict popular songs in spotify based on audio and lyrics features

Project purpose and goals

Most existing models analyze the popularity of songs primarily through audio features, offering useful insights but neglecting the significant impact of lyrical content. This gap creates a business necessity to explore the combined influence of audio and lyrical features on a song's popularity, addressing a more holistic understanding of success factors in the music industry.

This project aims to provide actionable insights by analyzing both audio and lyrical features, addressing key challenges like aligning creativity with listener preferences and supporting emerging artists. It equips stakeholders, including artists, record labels, and platforms like Spotify, with predictive tools to enhance decision-making and foster collaborations. By bridging gaps in creativity, market demands, and innovation, the project seeks to revolutionize music discovery and production through data-driven strategies.

Dataset

The dataset used for the model contains the processed audio features (loudness, energy, dancebility,..) and the raw lyrics from each song to be process.

Methodologies used

In this basic approach for preprocessing the lyrics have been used techniques to extract emotion, sentiments, lexical diversity and topics. For dealing with the imbalance data from the popular category, undersampling technique has been used. In the modelling, logistic regression has been used as baseline to compare with a Random Forest model.

Key findings

General Trends: Simpler lyrics (lower lexical diversity), loudness, and danceability are key predictors of popularity across genres, with newer songs generally being more popular except in rock.
Genre-Specific Insights: Popularity drivers vary by genre; for example, "Hip-Hop" subgenres dominate in rap, while older tracks are favored in rock, and reggaeton elements are critical in Latin music.
Model Performance: Random Forest models excel in capturing complex relationships, but imbalanced datasets pose challenges in achieving precision-recall balance. The model excels predicting not popular songs, but struggle with popular songs. However, capture around 72% of the popular songs.
Feature Divergence: Subgenres and specific attributes (e.g., "Neo Soul" in R&B or "Indie Poptimism" in pop) have unique impacts, while thematic diversity has minimal importance.

Limitations and improvements

Temporal Trends: Year-related features dominate predictions but might fail to generalize to older or timeless tracks.
Dataset Imbalance: The results are influenced by the dataset's composition, particularly its balance across genres and popularity classes.
Recommendations:
- Incorporate additional data like streaming metrics ( play counts or playlist placements) to refine predictions.
- Experiment with neural networks or hybrid models to capture deeper relationships between features.
- Explore more advanced NLP techniques to extract richer semantic meaning from lyrics.

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
Notebook-Code and Explanation.ipynb		Notebook-Code and Explanation.ipynb
README.md		README.md
spotify_songs.zip		spotify_songs.zip

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Spotify-Popular-Songs-Predictive-Model

Project purpose and goals

Dataset

Methodologies used

Key findings

Limitations and improvements

About

Releases

Packages

Languages

jeanpool1415/Spotify-Popular-Songs-Predictive-Model

Folders and files

Latest commit

History

Repository files navigation

Spotify-Popular-Songs-Predictive-Model

Project purpose and goals

Dataset

Methodologies used

Key findings

Limitations and improvements

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages