A machine learning project for binary classification of skin lesions as malignant or benign, using XGBoost, LightGBM (LGBMClassifier), AdaBoost, SVM, and Logistic Regression. The pipeline covers preprocessing, visualization, modeling, and evaluation, and is intended as a diagnostic aid in dermatology.
- Class Distribution: Analyze malignant vs. benign counts (Benign: 400,666; Malignant: 393).
- Data Cleaning: Remove duplicates, handle missing values.
- Data Augmentation: Random rotations, flips, brightness/contrast variations.
- Normalization: Scale pixel intensities, normalize image dimensions.
- Resizing: Uniform image dimensions while preserving aspect ratio.
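
The augmentation, normalization, and aspect-ratio steps above can be illustrated with a minimal NumPy-only sketch. This is not the notebook's code: the function names (`pad_to_square`, `augment`, `normalize`), the jitter ranges, and the random stand-in image are all assumptions made for illustration. Rotations are limited to multiples of 90 degrees to keep the example dependency-free, and padding to a square is shown as one way to preserve aspect ratio before a fixed-size resize (for example with OpenCV's `cv2.resize`).

```python
import numpy as np


def pad_to_square(img, fill=0):
    """Pad an H x W x C image so height == width, keeping the content centered."""
    h, w = img.shape[:2]
    side = max(h, w)
    top, left = (side - h) // 2, (side - w) // 2
    out = np.full((side, side) + img.shape[2:], fill, dtype=img.dtype)
    out[top:top + h, left:left + w] = img
    return out


def augment(img, rng):
    """Random flips, a random 90-degree rotation, and brightness/contrast jitter."""
    if rng.random() < 0.5:
        img = img[:, ::-1]          # horizontal flip
    if rng.random() < 0.5:
        img = img[::-1, :]          # vertical flip
    img = np.rot90(img, k=int(rng.integers(0, 4)))
    contrast = rng.uniform(0.8, 1.2)    # assumed jitter range
    brightness = rng.uniform(-20, 20)   # assumed jitter range, in 0-255 units
    out = img.astype(np.float32) * contrast + brightness
    return np.clip(out, 0, 255).astype(np.uint8)


def normalize(img):
    """Scale 8-bit pixel intensities to the [0, 1] range."""
    return img.astype(np.float32) / 255.0


rng = np.random.default_rng(0)
# Random stand-in for a lesion photo (height 60, width 100, 3 channels).
image = rng.integers(0, 256, size=(60, 100, 3), dtype=np.uint8)

square = pad_to_square(image)
sample = normalize(augment(square, rng))
print(square.shape, sample.dtype)
```

Padding before rotating keeps the output shape stable, since rotating a non-square image by 90 degrees would swap its height and width.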
- XGBoost
- AdaBoost
- LightGBM (LGBMClassifier)
- Support Vector Machine (SVM)
- Logistic Regression
- (Optional) Neural Network (for a deep learning approach)
- Hyperparameter tuning using Grid/Random Search.
- K-Fold Cross-validation.
- Model-specific optimizations (e.g., SVM kernels, NN architecture).
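
The tuning and cross-validation steps above could look roughly like the following scikit-learn sketch. It is an illustration under stated assumptions rather than the notebook's actual setup: the data is synthetic, the parameter grid is hypothetical, and Logistic Regression is used only because it trains quickly. Stratified folds and `class_weight="balanced"` are choices made here because the class counts reported in this README are heavily skewed (roughly 0.1% malignant, about 1,020 benign cases per malignant one).

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic stand-in data (NOT the project's dataset): about 5% positives.
X, y = make_classification(
    n_samples=600,
    n_features=10,
    n_informative=5,
    weights=[0.95, 0.05],
    random_state=0,
)

# Stratified folds keep the rare positive class present in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Hypothetical search space; the notebook's real grid is not documented here.
param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}

search = GridSearchCV(
    estimator=LogisticRegression(class_weight="balanced", max_iter=1000),
    param_grid=param_grid,
    scoring="roc_auc",  # AUC is less misleading than accuracy on skewed classes
    cv=cv,
)
search.fit(X, y)

best_params = search.best_params_
best_auc = search.best_score_
print(best_params, round(best_auc, 3))
```

The same pattern should carry over to the other listed models by swapping the estimator and grid. For the boosted models, the negative-to-positive ratio is a common starting value for the `scale_pos_weight` parameter that XGBoost and LightGBM expose.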
- Metrics: Accuracy, Precision, Recall, F1-Score, AUC.
- Validation: Regular monitoring using a validation set.
- Confusion Matrix: For visualizing classification errors.
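
To make the relationship between the confusion matrix and the listed metrics concrete, here is a small dependency-free sketch. The helper `binary_metrics` and the toy labels are hypothetical and exist only for illustration; label 1 is assumed to mean malignant. AUC is omitted because it needs predicted scores rather than hard labels (scikit-learn's `roc_auc_score` covers that case).

```python
def binary_metrics(y_true, y_pred):
    """Confusion-matrix counts and derived metrics; label 1 = malignant."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    # Guard against division by zero when a class is never predicted/present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0

    return {
        "tp": tp, "tn": tn, "fp": fp, "fn": fn,
        "accuracy": accuracy, "precision": precision,
        "recall": recall, "f1": f1,
    }


# Toy labels, not project results: 3 malignant and 7 benign cases.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In a screening setting, the `fn` count (malignant cases predicted benign) is usually the costliest error, which is why recall is reported alongside accuracy.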
Category | Tool/Framework |
---|---|
Platform | Kaggle |
Notebook | Jupyter Notebook |
Language | Python |
Libraries | scikit-learn, XGBoost, LightGBM, matplotlib, seaborn, OpenCV, NumPy, pandas |
- Go to the Kaggle Notebook using the link below:
https://www.kaggle.com/code/masharjavid/final-skin-cancer-binary-classifier
- Open the notebook and run all cells.
- The dataset is already uploaded to the Kaggle environment and linked within the notebook.
- Best AUC Score: 0.963
- Highest Accuracy: 90.02%
- Top Performing Model: LightGBM (LGBMClassifier)
- Train loss: 0.1646
- Validation loss: 0.1245
- Recall: 0.876
- Time taken: 60.94 sec
- Web/Mobile app deployment with UI for diagnosis.
- Explore larger datasets for improved generalization.