Hate-Speech-Detection-in-Arabic

Hate Speech Detection in Arabic Using NLP

Overview:

This project detects hate speech in Arabic text using Natural Language Processing (NLP) techniques. The objective is to classify Arabic tweets or texts as hate speech or non-hate speech. Arabic's rich morphology and grammar make it challenging to process, so the project combines Arabic-specific preprocessing with a machine learning model.

Dataset:

We use a publicly available Arabic dataset for hate speech detection, which contains a collection of Arabic tweets labeled as "Hate Speech" or "Not Hate Speech".
Dataset Source: https://github.com/rewire-online/multilingual-hatecheck

Preprocessing:

Arabic text presents unique challenges in NLP due to its complex morphology and diacritics. This project uses the following preprocessing steps:

==> Text Normalization: Convert text to a standard form.
==> Tokenization: Use Farasa Segmenter for Arabic tokenization.
==> Stopwords Removal: Remove common Arabic stopwords that do not contribute to text meaning.
==> TF-IDF Vectorization: Convert the preprocessed text into numerical feature vectors using TF-IDF (Term Frequency-Inverse Document Frequency).
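
The steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the notebook's actual code. The normalization rules are common choices that are assumed here: stripping diacritics and tatweel, unifying alef variants, and mapping alef maqsura to ya. The stopword list is a tiny hypothetical one, and a simple regex tokenizer stands in for the Farasa Segmenter. The TF-IDF function mirrors the smoothed idf and L2 row normalization that scikit-learn's TfidfVectorizer uses by default.

```python
import math
import re
from collections import Counter

# Assumed normalization rules (common in Arabic NLP; the project's exact rules are not stated).
DIACRITICS = re.compile(r"[\u064B-\u0652]")          # tanween, short vowels, shadda, sukun
ALEF_VARIANTS = re.compile(r"[\u0622\u0623\u0625]")  # alef with madda / hamza above / hamza below
TATWEEL = "\u0640"                                   # kashida (elongation character)


def normalize(text: str) -> str:
    text = DIACRITICS.sub("", text)
    text = text.replace(TATWEEL, "")
    text = ALEF_VARIANTS.sub("\u0627", text)  # unify to bare alef
    text = text.replace("\u0649", "\u064A")   # alef maqsura -> ya (assumed convention)
    return text


# Hypothetical, deliberately tiny stopword list, normalized so it matches normalized text.
STOPWORDS = {normalize(w) for w in ["في", "من", "على", "عن", "إلى", "هذا", "هذه"]}


def tokenize(text: str) -> list[str]:
    # Simple regex split used only as a stand-in for the Farasa Segmenter.
    return re.findall(r"\w+", text)


def preprocess(text: str) -> list[str]:
    return [tok for tok in tokenize(normalize(text)) if tok not in STOPWORDS]


def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """TF-IDF with smoothed idf = ln((1 + n) / (1 + df)) + 1 and L2-normalized rows."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    idf = {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}
    vectors = []
    for doc in docs:
        weights = {term: count * idf[term] for term, count in Counter(doc).items()}
        norm = math.sqrt(sum(v * v for v in weights.values()))
        vectors.append({t: v / norm for t, v in weights.items()} if norm else weights)
    return vectors
```

For example, `preprocess("ذهب إلى البيت")` drops the stopword إلى and returns `["ذهب", "البيت"]`. Token lists produced this way can then be passed to `tfidf`. Terms that appear in fewer documents receive higher weights.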

Model Training:

We use a Support Vector Machine (SVM) to classify the Arabic texts as hate speech or non-hate speech.
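
The README does not say which SVM implementation, kernel, or hyperparameters the notebook uses. As a hedged sketch only, the code below trains a linear SVM from scratch with the Pegasos algorithm (stochastic sub-gradient descent on the L2-regularized hinge loss). It works over sparse feature dictionaries such as TF-IDF vectors. Several details are illustrative assumptions: the names `train_linear_svm`, `predict` and `dot`, the label encoding (+1 for hate speech, -1 otherwise), and the values of `lam` and `epochs`. The model also learns no intercept term.

```python
import random


def dot(w: dict[str, float], x: dict[str, float]) -> float:
    """Dot product of a weight dict with a sparse feature dict."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos-style training of a linear SVM (L2-regularized hinge loss).

    X: list of sparse feature dicts (e.g. TF-IDF vectors).
    y: labels in {+1, -1} (assumed: +1 = hate speech, -1 = not hate speech).
    No intercept is learned in this sketch.
    """
    rng = random.Random(seed)
    w: dict[str, float] = {}
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)                # Pegasos step size
            violated = y[i] * dot(w, X[i]) < 1   # hinge loss active for the current w
            # Regularization step: shrink every weight by (1 - eta * lam) = 1 - 1/t.
            for f in w:
                w[f] *= 1.0 - eta * lam
            # Sub-gradient step on the hinge loss, only for margin violations.
            if violated:
                for f, v in X[i].items():
                    w[f] = w.get(f, 0.0) + eta * y[i] * v
    return w


def predict(w: dict[str, float], x: dict[str, float]) -> int:
    return 1 if dot(w, x) >= 0 else -1
```

In practice, a library implementation such as scikit-learn's LinearSVC or SVC would be the more usual choice. The from-scratch version is only meant to make the training objective concrete. The shrink loop touches every stored weight on each step, which is fine for a sketch but slow on a large vocabulary.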

Repository: Samahmaamri/Hate-Speech-Detection-in-Arabic
