mehdimt1980/Text-Clustering: Text Clustering on Hobbes's Leviathan with K-Means

Clustering Analysis of Hobbes's Leviathan

This project applies clustering analysis to the text of Hobbes's Leviathan, which is available on Project Gutenberg. The goal is to identify patterns in the text and group similar sections together.

Data

The data used in this project is the full text of Hobbes's Leviathan, which was downloaded from Project Gutenberg in plain text format.

Methods

The text data was preprocessed by removing stop words and stemming the remaining words. A TF-IDF vectorizer was then applied to convert the text data into a matrix of features.
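The preprocessing and vectorization steps can be sketched without any dependencies, as below. This is only an illustration under stated assumptions, not the notebook's code: the stop-word list is a tiny hypothetical one, `crude_stem` is a stand-in for a real stemmer such as Porter's, the smoothed IDF formula is one common variant, and the three sample strings are placeholders rather than passages from the book. In practice a library vectorizer such as scikit-learn's `TfidfVectorizer` would normally do this work.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are far longer).
STOP_WORDS = {"the", "of", "and", "a", "an", "is", "in", "to", "that", "it", "by"}


def crude_stem(word):
    """Strip a few common suffixes; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, tokenize, drop stop words, then stem what remains."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def tfidf_matrix(docs):
    """Return (vocab, rows): one L2-normalized TF-IDF row per document."""
    tokenized = [preprocess(d) for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(t for doc in tokenized for t in set(doc))
    # Smoothed inverse document frequency (one common variant).
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for doc in tokenized:
        counts = Counter(doc)
        row = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        rows.append([v / norm for v in row])
    return vocab, rows


# Placeholder section texts, not quotations from the book.
sections = [
    "liberty and the right of nature",
    "laws of nature and reason",
    "sovereign power and the commonwealth",
]
vocab, matrix = tfidf_matrix(sections)
```

Each row of `matrix` corresponds to one text section and each column to one stemmed term in `vocab`, which is the shape a clustering step expects.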

The number of clusters was chosen with the elbow method, and k-means clustering was then applied to group the text data into clusters. The top words for each cluster were identified by ranking terms according to their TF-IDF weight in the cluster centroid.
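A minimal, dependency-free sketch of these three steps (k-means, the elbow curve, and top terms per centroid) follows. It is not the notebook's implementation: the toy `rows` and `vocab` are made-up stand-ins for the TF-IDF matrix, and the deterministic farthest-point initialization in `init_centroids` is an assumption chosen so the example is reproducible. Libraries such as scikit-learn's `KMeans` use randomized initialization instead.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centroids(rows, k):
    """Deterministic farthest-point initialization (an illustrative choice)."""
    centroids = [list(rows[0])]
    while len(centroids) < k:
        farthest = max(rows, key=lambda r: min(sq_dist(r, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids


def kmeans(rows, k, max_iter=100):
    """Lloyd's algorithm; returns (labels, centroids, inertia)."""
    centroids = init_centroids(rows, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each row goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(r, centroids[c])) for r in rows
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    # Inertia: within-cluster sum of squared distances.
    inertia = sum(sq_dist(r, centroids[lab]) for r, lab in zip(rows, labels))
    return labels, centroids, inertia


def top_terms(centroid, vocab, n=2):
    """Terms with the largest weights in a cluster centroid."""
    ranked = sorted(range(len(vocab)), key=lambda i: centroid[i], reverse=True)
    return [vocab[i] for i in ranked[:n]]


# Toy stand-in for a TF-IDF matrix: six sections, four stemmed terms.
vocab = ["law", "natur", "power", "sovereign"]
rows = [
    [0.9, 0.4, 0.0, 0.0],
    [0.8, 0.6, 0.0, 0.0],
    [0.7, 0.7, 0.1, 0.0],
    [0.0, 0.1, 0.7, 0.7],
    [0.0, 0.0, 0.6, 0.8],
    [0.1, 0.0, 0.8, 0.6],
]

# Elbow method: compute inertia for several k and look for the bend.
inertias = {k: kmeans(rows, k)[2] for k in range(1, 5)}

labels, centroids, _ = kmeans(rows, 2)
cluster_terms = [top_terms(c, vocab) for c in centroids]
```

On this toy data the inertia drops sharply from k = 1 to k = 2 and only slightly afterwards, which is the "elbow" that suggests two clusters. Reading the elbow off a plot of `inertias` remains a judgment call on real data.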

A dendrogram was also generated to visualize the hierarchical distances between clusters.
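The information a dendrogram displays is the sequence of merges produced by agglomerative clustering, together with the distance at which each merge happens. The sketch below computes that sequence with single linkage on three hypothetical cluster centroids. The linkage criterion, the centroid values, and the names `C0` to `C2` are all assumptions for illustration, since the source does not state which settings the notebook used. A plotting routine such as SciPy's `scipy.cluster.hierarchy.dendrogram` would draw the resulting tree.

```python
import math


def euclid(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerate(points, names):
    """Single-linkage agglomerative clustering.

    Returns a list of (left, right, distance) merges in the order they occur.
    """
    clusters = {name: [p] for name, p in zip(names, points)}
    merges = []
    while len(clusters) > 1:
        keys = list(clusters)
        best = None
        # Find the pair of clusters whose closest members are nearest.
        for i in range(len(keys)):
            for j in range(i + 1, len(keys)):
                d = min(
                    euclid(p, q)
                    for p in clusters[keys[i]]
                    for q in clusters[keys[j]]
                )
                if best is None or d < best[0]:
                    best = (d, keys[i], keys[j])
        d, a, b = best
        merged = clusters.pop(a) + clusters.pop(b)
        clusters[f"({a}+{b})"] = merged
        merges.append((a, b, d))
    return merges


# Hypothetical cluster centroids in a four-term TF-IDF space.
centroid_names = ["C0", "C1", "C2"]
centroid_points = [
    [0.8, 0.5, 0.0, 0.0],
    [0.7, 0.6, 0.1, 0.0],
    [0.0, 0.0, 0.7, 0.7],
]

merges = agglomerate(centroid_points, centroid_names)
for left, right, dist in merges:
    print(f"merge {left} and {right} at distance {dist:.3f}")
```

The merge distances correspond to the heights of the branches in a dendrogram: clusters that join early and low in the tree are more similar than those that join last.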

