This project classifies electronic music tracks using machine learning techniques. The data is sourced from BeatsDataset, and the model is built with popular Python libraries such as `pandas`, `sklearn`, and `matplotlib`.
You can find a detailed implementation of this project in my Kaggle notebook.
- Data Preprocessing:
  - The dataset is loaded using `pandas` and processed to clean and prepare the features for model training.
  - Categorical data is handled using the `OneHotEncoder` from `sklearn.preprocessing` and combined with numerical data using `ColumnTransformer` from `sklearn.compose`.
- Data Splitting:
  - The dataset is split into training and testing sets using `train_test_split` from `sklearn.model_selection`.
- Feature Scaling:
  - The features are scaled using the `StandardScaler` from `sklearn.preprocessing` so that features with larger numeric ranges do not dominate the distance calculations the classifier relies on.
- Model Training:
  - The classification model chosen for this task is the `KNeighborsClassifier` from `sklearn.neighbors`.
  - Model hyperparameters are tuned using cross-validation to select the best-performing settings.
- Model Evaluation:
  - The model's performance is evaluated using accuracy metrics and visualized using `matplotlib.pyplot`.
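
The steps above can be sketched end to end as follows. This is a minimal illustration, not the notebook's actual code: the data is synthetic, the column names (`bpm`, `energy`, `key`, `genre`) and the grid of `n_neighbors` values are assumptions made for the example, and the real BeatsDataset columns may differ.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the real data; all column names are hypothetical.
rng = np.random.default_rng(0)
n = 200
genre = rng.choice(["techno", "house"], size=n)
df = pd.DataFrame({
    "bpm": np.where(genre == "techno", 135.0, 122.0) + rng.normal(0, 2, n),
    "energy": rng.uniform(0, 1, n),
    "key": rng.choice(["A", "C", "F"], size=n),
    "genre": genre,
})

X = df.drop(columns="genre")
y = df["genre"]

# Hold out a test set before fitting any preprocessing step.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scale numeric columns and one-hot encode categorical columns.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["bpm", "energy"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["key"]),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("knn", KNeighborsClassifier()),
])

# Tune the number of neighbors with 5-fold cross-validation on the training set.
param_grid = {"knn__n_neighbors": [3, 5, 7, 9]}
search = GridSearchCV(model, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

best_k = search.best_params_["knn__n_neighbors"]
test_accuracy = search.score(X_test, y_test)
print("best n_neighbors:", best_k)
print("test accuracy:", round(test_accuracy, 3))
```

Wrapping the preprocessing and the classifier in a single `Pipeline` is a design choice of this sketch: the scaler and encoder are then fitted only on the training folds during cross-validation, so no information from the validation or test data leaks into the preprocessing.
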
- pandas: For data manipulation and analysis.
- KNeighborsClassifier: A simple yet effective machine learning algorithm used for classification tasks.
- OneHotEncoder: For encoding categorical features.
- matplotlib.pyplot: For plotting and visualizing data and results.
- train_test_split: For splitting the dataset into training and testing subsets.
- ColumnTransformer: To apply different preprocessing steps to different columns.
- sklearn.preprocessing: Provides preprocessing utilities like scaling and encoding.
- sklearn.compose: Helps in combining multiple feature transformations into a single pipeline.
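
As a rough sketch of how `KNeighborsClassifier` and `matplotlib.pyplot` from the list above can be combined for evaluation, the example below plots cross-validated accuracy against the number of neighbors. It uses synthetic data from `make_classification` rather than BeatsDataset, and the range of `k` values and the output file name are assumptions chosen for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic classification data standing in for the real features.
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=4, random_state=0
)

# Mean 5-fold cross-validated accuracy for each candidate k.
k_values = list(range(1, 16, 2))
mean_scores = []
for k in k_values:
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    mean_scores.append(scores.mean())

# Plot accuracy against k and save the figure to disk.
fig, ax = plt.subplots()
ax.plot(k_values, mean_scores, marker="o")
ax.set_xlabel("n_neighbors (k)")
ax.set_ylabel("Mean 5-fold CV accuracy")
ax.set_title("KNN accuracy vs. k (synthetic data)")
fig.savefig("knn_accuracy_vs_k.png")  # hypothetical output file name
```

A curve like this makes it easier to see whether accuracy is sensitive to `k` or roughly flat across the tested range.
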
The dataset used in this project is the BeatsDataset, which contains various features describing electronic music tracks.
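
Before preprocessing, it can help to check which columns are numeric and which are categorical, since that decides what goes to the scaler and what goes to the encoder. The sketch below assumes a CSV layout with made-up column names and rows; it is a stand-in only, not the real BeatsDataset schema.

```python
import io

import pandas as pd

# Hypothetical CSV content; the real dataset's columns and values may differ.
csv_text = """title,bpm,energy,key,genre
Track A,128,0.81,A,house
Track B,140,0.92,C,techno
Track C,174,0.88,F,drum and bass
"""
df = pd.read_csv(io.StringIO(csv_text))

# Separate columns by dtype to decide which need encoding and which need scaling.
numeric_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

print(numeric_cols)      # ['bpm', 'energy']
print(categorical_cols)  # ['title', 'key', 'genre']
```

In practice an identifier-like column such as `title` and the target column (`genre` here) would be dropped from the feature set before encoding.
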