Songs Analysis w/ Spotify

Industry

Machine Learning

Course

Usable Artificial Intelligence

My Role

Data Scientist

Timeline

Jan 2025 - May 2025

View Github
Decoding the beats: a data-driven approach with Spotify.
During my journey to learn more about machine learning and artificial intelligence, I was drawn to exploring broader trends across music history to understand what makes a song popular.
In this project, I set out to answer two guiding questions: What patterns emerge across musical features over time and genre? Can we predict a song's popularity or recommend similar songs basedon these features?
I ended up building a Streamlit-powered music recommender system that compares songs using audio features from Spotify data. Users can select a favorite song to get smart, feature-based recommendations.
Dataset Overview & Analysis
The dataset I used was sourced from Kaggle, originally collected using Spotify's Web API. It includes 15,150 hit songs by 3,083 artists spanning over a century. Each song has audio features that are added by Spotify.

Here are some insights I found from using the following methods:
Data Prep & Exploration
  • Pop dominates in count and popularity, followed by Rock and Rap
  • Popular tracks tend to be louder, more danceable, less acoustic
  • Feature correlations with popularityare weak and nonlinear
Predictive Modeling
  • Ran Regression (Random Forest, Ridge, Lasso) → Best R² ≈ 0.22
  • Ran Classification (Random Forest, SVM, XGBoost) → Best F1 ≈ 0.39
  • Models perform best for very popular songs, less accurate for mid-tier
Clustering & PCA
  • Applied PCA to reduce dimensionality and visualize song space
  • K-Means Clustering (k=4) revealed hidden acoustic groupings like:
    • Mainstream pop: Loud, upbeat, high-valence
    • Modern rap/EDM: High speechiness, very popular
    • Acoustic/folk/jazz: Soft, niche, less popular
    • Rock/metal: Loud, fast, mixed success
 Image
Visualizing Data Clusters with a Hexbin Plots
Overall, I found that no single audio feature can predict a song's popularity — trends are contextual, not absolute. But when features are combined, they reveal meaningful patterns.
 Image
Reducing Dimensions to Reveal Patterns
Audio profiles are effective for clustering and recommendations, especially when analyzed across decades. The sound of success evolves over time.
 Image
Building an Interactive Music Recommendation App
After exploring clustering and recommendation techniques, I developed a Streamlit app that lets users select a song and discover others with similar audio features. The app uses K-Means clustering to narrow the search and cosine similarity to find the closest matches. Users can compare songs through radar charts and tables.

One cool example: Black Sabbath was matched with The Parting Glass — not because they are the same genre, but because their sound features are alike.
 Image
What I learned
This project was definitely a challenge for me. I realized just how vast the world of machine learning is and how much the quality of your data impacts everything.

Still, I learned a lot along the way, and it's made me more excited to get more into advanced machine learning methods for future projects.