nandajavarma/document-similarity

Install the required modules

pip install -r requirements.txt

Create the (title, vector) pair for each PDF

python create_vector.py pdf1 pdf2
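
The README does not say how create_vector.py builds its vectors, so the following is only a minimal sketch of one way to turn a document into a (title, vector) pair. It assumes the text has already been extracted from the PDF, and it uses a hashed bag-of-words vector. The names `text_to_vector` and `make_pair` and the `dim` parameter are hypothetical and are not taken from the repository.

```python
import hashlib
import re


def text_to_vector(text, dim=64):
    """Hash each lowercase word into one of `dim` buckets and count occurrences.

    Assumption: a hashed bag-of-words vector; the repository may use
    a different representation.
    """
    vector = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # sha256 gives a hash that is stable across runs (unlike built-in hash()).
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % dim
        vector[bucket] += 1.0
    return vector


def make_pair(title, text, dim=64):
    """Return the (title, vector) pair for one document."""
    return title, text_to_vector(text, dim)
```

Every document is mapped to a vector of the same length, so any two pairs produced this way can be compared directly.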

Check the similarity of documents

python similarity.py pdf1 pdf2 pdf3 [pdf4 pdf5 ...]

The result shows how similar pdf1 is to each of the other PDFs.
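
As an illustration of comparing the first document against the rest with LSH, here is a minimal sketch using random-hyperplane signatures, which approximate cosine similarity. It is not taken from similarity.py and the repository may use a different LSH family. It assumes each document is already a (title, vector) pair with vectors of equal length. The function names and the `n_bits` and `seed` parameters are hypothetical.

```python
import math
import random


def make_planes(dim, n_bits=256, seed=0):
    """Draw `n_bits` random hyperplanes (Gaussian normal vectors) in `dim` dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(vector, planes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return [1 if sum(p * v for p, v in zip(plane, vector)) >= 0 else 0
            for plane in planes]


def estimated_cosine(sig_a, sig_b):
    """The fraction of differing bits approximates angle / pi between the vectors."""
    differing = sum(a != b for a, b in zip(sig_a, sig_b))
    return math.cos(math.pi * differing / len(sig_a))


def compare_first_to_rest(docs, n_bits=256, seed=0):
    """docs: list of (title, vector). Returns [(title, similarity), ...] for docs[1:]."""
    planes = make_planes(len(docs[0][1]), n_bits, seed)
    sigs = [(title, signature(vec, planes)) for title, vec in docs]
    first = sigs[0][1]
    return [(title, estimated_cosine(first, sig)) for title, sig in sigs[1:]]
```

More signature bits give a tighter estimate at the cost of more computation. The scores range from -1 (opposite direction) to 1 (same direction).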

About

Document similarity learning experiment with LSH
