
mehdimt1980/Text-Clustering


Clustering Analysis of Hobbes's Leviathan

This project applies clustering analysis to the text of Hobbes's Leviathan, which is available on Project Gutenberg. The goal is to identify patterns in the text by grouping similar sections together.

Data

The data used in this project is the full text of Hobbes's Leviathan, which was downloaded from Project Gutenberg in plain text format.
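The README does not say how the downloaded file was cleaned or divided into sections. Here is a minimal sketch that makes two assumptions, neither of them the author's documented method. First, the file carries the standard Project Gutenberg start/end marker lines. Second, a "section" is a paragraph of at least 20 words. Under those assumptions, the license header and footer can be stripped and the remaining text split on blank lines. The function names are hypothetical.

```python
import re

# Hypothetical helpers for illustration; not taken from the repository.
# Assumed marker formats: Gutenberg files have used both "THE" and "THIS".
START_RE = re.compile(
    r"\*\*\* ?START OF (?:THE|THIS) PROJECT GUTENBERG EBOOK[^\n]*\n", re.IGNORECASE
)
END_RE = re.compile(r"\*\*\* ?END OF (?:THE|THIS) PROJECT GUTENBERG EBOOK", re.IGNORECASE)


def strip_gutenberg_boilerplate(raw: str) -> str:
    """Return the text between the Gutenberg start and end markers, if present."""
    start = START_RE.search(raw)
    end = END_RE.search(raw)
    begin = start.end() if start else 0
    finish = end.start() if end else len(raw)
    return raw[begin:finish]


def split_into_sections(text: str, min_words: int = 20) -> list[str]:
    """Split on blank lines; keep paragraphs with at least `min_words` words."""
    sections = []
    for para in re.split(r"\n\s*\n", text):
        joined = " ".join(para.split())  # collapse the hard line wraps
        if len(joined.split()) >= min_words:
            sections.append(joined)
    return sections
```

Usage would look like `split_into_sections(strip_gutenberg_boilerplate(open("leviathan.txt", encoding="utf-8").read()))`, where `leviathan.txt` is a hypothetical local filename. Splitting on chapter headings instead would give fewer, longer sections.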

Methods

The text was preprocessed by removing stop words and stemming the remaining words. A TF-IDF vectorizer was then applied to convert the preprocessed text into a matrix of features, with one row per text section and one column per term.
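The preprocessing and vectorization steps can be sketched with the standard library alone. This is an illustration under stated assumptions. The stop-word list is a small hypothetical sample. `crude_stem` is a toy suffix stripper standing in for a real stemmer such as Porter's. The IDF formula mirrors scikit-learn's default smoothed variant. The README does not specify which stemmer, stop-word list, or IDF variant the project actually used.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for illustration).
STOP_WORDS = {"the", "of", "and", "to", "a", "in", "that", "is", "it", "be",
              "as", "by", "for", "which", "or", "are", "they", "their", "his", "he"}


def crude_stem(word: str) -> str:
    """Toy suffix stripper standing in for a real stemmer (e.g. Porter)."""
    for suffix in ("ations", "ation", "ings", "ing", "ness", "ies", "es", "s", "ed", "ly"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(section: str) -> list[str]:
    """Lowercase, keep alphabetic tokens, drop stop words, then stem."""
    words = re.findall(r"[a-z]+", section.lower())
    return [crude_stem(w) for w in words if w not in STOP_WORDS]


def tfidf_matrix(sections: list[str]) -> tuple[list[str], list[list[float]]]:
    """Return (vocabulary, rows): one L2-normalised TF-IDF row per section."""
    docs = [Counter(tokenize(s)) for s in sections]
    vocab = sorted({term for doc in docs for term in doc})
    n = len(docs)
    # Document frequency, then smoothed IDF: ln((1 + n) / (1 + df)) + 1.
    df = Counter(term for doc in docs for term in doc)
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for doc in docs:
        row = [doc[t] * idf[t] for t in vocab]  # raw term count times IDF
        norm = math.sqrt(sum(v * v for v in row)) or 1.0  # avoid dividing by zero
        rows.append([v / norm for v in row])
    return vocab, rows
```

With scikit-learn installed, `TfidfVectorizer(stop_words="english")` produces a comparable L2-normalised matrix. It does not stem, though, so stemming would have to be applied beforehand or through a custom tokenizer.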

The number of clusters was chosen with the elbow method, and k-means clustering was then applied to group the text data into that many clusters. The top words for each cluster were identified as the terms with the highest TF-IDF weights in the cluster's centroid.
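The three steps are choosing k with the elbow method, running k-means, and reading off the top terms from each centroid. Below is a plain Lloyd's-algorithm sketch of them with a seeded random initialisation. It is not necessarily what the project ran, and the helper names are hypothetical. The inputs are assumed to be TF-IDF rows plus a matching vocabulary list.

```python
import random

# Hypothetical helpers for illustration; not taken from the repository.


def squared_distance(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows: list[list[float]], k: int, iters: int = 100, seed: int = 0):
    """Lloyd's k-means. Returns (labels, centroids, inertia)."""
    rng = random.Random(seed)
    centroids = [list(r) for r in rng.sample(rows, k)]  # k distinct rows as seeds
    labels = None
    for _ in range(iters):
        # Assignment step: each row joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(row, centroids[c]))
            for row in rows
        ]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    # Inertia: total squared distance from each row to its assigned centroid.
    inertia = sum(squared_distance(row, centroids[lab]) for row, lab in zip(rows, labels))
    return labels, centroids, inertia


def elbow_curve(rows, max_k, seed=0):
    """Inertia for k = 1..max_k; the 'elbow' is where the drop flattens out."""
    return {k: kmeans(rows, k, seed=seed)[2] for k in range(1, max_k + 1)}


def top_terms(centroid, vocab, n=10):
    """Terms with the largest TF-IDF weight in a cluster centroid."""
    order = sorted(range(len(vocab)), key=lambda i: centroid[i], reverse=True)
    return [vocab[i] for i in order[:n]]
```

To apply the elbow method, plot `elbow_curve(rows, 10)` and pick the k after which inertia stops dropping sharply. scikit-learn's `KMeans` exposes the same quantity as its `inertia_` attribute.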

A dendrogram was also generated to visualize the distances at which sections and clusters merge in a hierarchical clustering of the text.
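The README does not state which linkage criterion or distance the dendrogram was built with. One possibility, used here as an assumption, is naive average-linkage agglomerative clustering. It records each merge and the distance at which it happens, which is exactly the information a dendrogram draws. The function names are hypothetical.

```python
import math

# Hypothetical helpers for illustration; not taken from the repository.


def euclidean(a: list[float], b: list[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(rows: list[list[float]]) -> list[tuple[int, int, float, int]]:
    """Naive agglomerative clustering with average linkage.

    Returns merges as (cluster_a, cluster_b, distance, size) tuples, numbered
    like SciPy's linkage matrix: original points are 0..n-1 and the i-th merge
    creates cluster n + i.
    """
    n = len(rows)
    clusters = {i: [i] for i in range(n)}  # cluster id -> member row indices
    merges = []
    next_id = n
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for i, a in enumerate(ids):
            for b in ids[i + 1:]:
                # Average linkage: mean pairwise distance between the two clusters.
                d = sum(euclidean(rows[p], rows[q]) for p in clusters[a] for q in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[2]:
                    best = (a, b, d)
        a, b, d = best
        members = clusters.pop(a) + clusters.pop(b)
        clusters[next_id] = members
        merges.append((a, b, d, len(members)))
        next_id += 1
    return merges
```

For real data, `scipy.cluster.hierarchy.linkage(X, method="average")` returns the same four-column merge table, and `scipy.cluster.hierarchy.dendrogram` plots it with matplotlib. The naive loop above is roughly cubic in the number of sections, so it only suits small inputs.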

About

Text Clustering on Hobbes's Leviathan with K-Means
