A machine learning project that detects anomalies in network traffic data. It covers data preprocessing (handling missing values and scaling features) and trains a Decision Tree Classifier. Model performance is evaluated with the accuracy score, a classification report, and a confusion matrix plot.
This repository covers the full anomaly detection workflow for network traffic data. The key steps are:
- Data Loading and Exploration: Loading the dataset, displaying summary statistics, and checking for missing values.
- Data Cleaning: Handling missing values for both numerical and categorical columns.
- Data Preprocessing: Encoding categorical variables and scaling numerical features.
- Model Training: Splitting the dataset into training and testing sets, and training a Decision Tree Classifier.
- Model Evaluation: Evaluating the model using accuracy score, classification report, and plotting the confusion matrix.
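The steps above can be sketched end to end as follows. This is a minimal illustration, not the project's actual code: it builds a small synthetic dataset in place of the real one, and the column names (`duration`, `src_bytes`, `protocol`, `label`), the median/mode imputation, the one-hot encoding, and the 80/20 split are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real dataset (hypothetical columns and label rule).
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "duration": rng.exponential(2.0, n),
    "src_bytes": rng.integers(0, 10_000, n).astype(float),
    "protocol": rng.choice(["tcp", "udp", "icmp"], n),
})
df["label"] = np.where(df["src_bytes"] > 7000, "anomaly", "normal")
# Inject some missing values so the cleaning step has something to do.
df.loc[rng.choice(n, 20, replace=False), "duration"] = np.nan
df.loc[rng.choice(n, 20, replace=False), "protocol"] = np.nan

# 1. Exploration: summary statistics and missing-value counts.
print(df.describe())
print(df.isna().sum())

# 2. Cleaning: median for numerical columns, most frequent value for categorical ones.
num_cols = ["duration", "src_bytes"]
cat_cols = ["protocol"]
for col in num_cols:
    df[col] = df[col].fillna(df[col].median())
for col in cat_cols:
    df[col] = df[col].fillna(df[col].mode()[0])

# 3. Preprocessing: one-hot encode the categorical columns.
X = pd.get_dummies(df.drop(columns="label"), columns=cat_cols)
y = df["label"]

# 4. Training: split, scale numerical features, fit the tree.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
X_train, X_test = X_train.copy(), X_test.copy()
scaler = StandardScaler()
X_train[num_cols] = scaler.fit_transform(X_train[num_cols])
X_test[num_cols] = scaler.transform(X_test[num_cols])

model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

# 5. Evaluation: accuracy, per-class report, and confusion matrix.
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred, labels=["normal", "anomaly"])
print(f"Accuracy: {accuracy:.3f}")
print(classification_report(y_test, y_pred))
print(cm)
```

In this sketch the scaler is fitted on the training split only, so test-set statistics do not leak into training; decision trees are insensitive to feature scaling, so this step matters mainly if the model is later swapped for a scale-sensitive one. To plot the matrix, `cm` can be passed to `seaborn.heatmap(cm, annot=True, fmt="d")`.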
Ensure you have the necessary libraries installed:
pip install pandas numpy scikit-learn matplotlib seaborn
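To confirm the installation worked, a short check like the one below can report which packages are still missing. It is a convenience sketch and not part of the project; the `REQUIRED` mapping and the `missing_packages` helper are hypothetical names introduced here. Note that `scikit-learn` is imported under the module name `sklearn`.

```python
from importlib.util import find_spec

# pip package name -> import name (hypothetical helper mapping for this check).
REQUIRED = {
    "pandas": "pandas",
    "numpy": "numpy",
    "scikit-learn": "sklearn",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
}

def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose modules cannot be found."""
    return [pip_name for pip_name, module in required.items()
            if find_spec(module) is None]

missing = missing_packages()
print("Missing packages:", ", ".join(missing) if missing else "none")
```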