Spam Detector

This is a spam/ham email detector built on a Naive Bayes classifier implemented from scratch in Python 3.

Spam detection is a text classification problem. Naive Bayes makes two simplifying assumptions:

  • the bag-of-words assumption: the position of a word in the document does not matter, only how often it occurs.
  • the conditional independence assumption: given the class (e.g. spam or ham), the feature probabilities are independent of each other.
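The two assumptions can be illustrated with a short sketch. It is not taken from the notebook: the helper names `bag_of_words` and `log_likelihood` are hypothetical, and whitespace tokenization is assumed for simplicity.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Count tokens and discard their positions (bag-of-words assumption)."""
    return Counter(text.lower().split())

def log_likelihood(counts, word_log_probs):
    """log P(document | class) as a sum of per-word log-probabilities.

    Summing the terms independently is exactly the conditional
    independence assumption. `word_log_probs` maps word -> log P(word | class).
    """
    return sum(n * word_log_probs[word] for word, n in counts.items())

# Word order does not change the representation.
print(bag_of_words("win money now") == bag_of_words("now win money"))  # True
```

Working in log space avoids numeric underflow when many small probabilities are multiplied together.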

The following image shows the Naive Bayes Algorithm for training and testing text classification:
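In code, training and classification can be sketched roughly as follows. This is a minimal sketch of multinomial Naive Bayes, assuming add-one (Laplace) smoothing and pre-tokenized documents; the source does not confirm either choice, and the function names `train_nb` and `predict_nb` are hypothetical rather than the notebook's own.

```python
import math
from collections import Counter

def train_nb(docs_by_class):
    """Train on a dict mapping class label -> list of token lists."""
    total_docs = sum(len(docs) for docs in docs_by_class.values())
    vocab = {tok for docs in docs_by_class.values() for doc in docs for tok in doc}
    log_prior, log_likelihood = {}, {}
    for label, docs in docs_by_class.items():
        # Prior: fraction of training documents that belong to this class.
        log_prior[label] = math.log(len(docs) / total_docs)
        counts = Counter(tok for doc in docs for tok in doc)
        # Add-one smoothing (assumed) so unseen words never get probability 0.
        denom = sum(counts.values()) + len(vocab)
        log_likelihood[label] = {
            tok: math.log((counts[tok] + 1) / denom) for tok in vocab
        }
    return log_prior, log_likelihood, vocab

def predict_nb(tokens, log_prior, log_likelihood, vocab):
    """Return the class with the highest log-posterior score."""
    scores = {}
    for label in log_prior:
        score = log_prior[label]
        for tok in tokens:
            if tok in vocab:  # words never seen in training are skipped
                score += log_likelihood[label][tok]
        scores[label] = score
    return max(scores, key=scores.get)

# Toy data, invented for illustration only.
model = train_nb({
    "spam": [["win", "money", "now"], ["free", "money"]],
    "ham": [["meeting", "at", "noon"], ["see", "you", "at", "lunch"]],
})
print(predict_nb(["free", "money", "now"], *model))  # spam
```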


At the end, a classification performance report is generated showing the confusion matrix, accuracy, precision, recall and F1-score. The classifier is currently trained on the Enron dataset, but it can be trained on any other email dataset by changing the dataset paths.
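As a reference for how these metrics relate to each other, here is a minimal sketch that derives them from the confusion matrix counts. The function name `evaluate` and the choice of spam as the positive class are assumptions made for illustration; the notebook may compute its report differently.

```python
def evaluate(y_true, y_pred, positive="spam"):
    """Compute confusion-matrix counts and the metrics derived from them."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    return {
        "confusion_matrix": [[tp, fn], [fp, tn]],  # rows: actual, columns: predicted
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

report = evaluate(["spam", "spam", "ham", "ham"], ["spam", "ham", "ham", "spam"])
print(report)
```

Precision answers "how many emails flagged as spam really were spam", while recall answers "how many spam emails were caught"; F1 is their harmonic mean.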


Usage 🔧

The program requires paths to a train folder and a test folder. Each of these must contain a spam folder and a ham folder holding the corresponding email files, from which the datasets are built.

In Spam Ham Email Classification.ipynb, cell 5 contains the following call:

makeDatasets('train/spam', 'train/ham', 'test/spam', 'test/ham')

These are the paths to the dataset folders. Change them to train and test on any other dataset.
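To check that a new dataset follows the expected folder layout before running the notebook, something like the sketch below may help. The helper `read_folder` is hypothetical and is not part of the notebook, and it makes no claim about how `makeDatasets` works internally. Latin-1 decoding is an assumption chosen because it never fails on arbitrary bytes, which is convenient for raw email files.

```python
from pathlib import Path

def read_folder(folder):
    """Return the text of every regular file in `folder`, sorted by file name."""
    texts = []
    for path in sorted(Path(folder).iterdir()):
        if path.is_file():
            # latin-1 maps every byte to a character, so decoding cannot raise.
            texts.append(path.read_text(encoding="latin-1"))
    return texts

# Example: count the emails in each of the four folders, if they exist.
for folder in ("train/spam", "train/ham", "test/spam", "test/ham"):
    if Path(folder).is_dir():
        print(folder, len(read_folder(folder)))
```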


Author 👋

You can get in touch with me on my LinkedIn Profile:

Ahmad Shafique

LinkedIn Link

You can also follow my GitHub Profile to stay updated about my latest projects: GitHub Follow

If you like the repo, please support it by giving it a star ⭐!


Contributions Welcome ✨

If you find a bug in the code or have an improvement in mind, feel free to open a pull request.


License 📄

MIT

Copyright (c) 2020, Ahmad Shafique