Introduction and Motivation

In 2024, my partner and I challenged each other to a reading competition. It's a simple game: whoever reads the most in the year wins. Naturally, a question of scoring arises. How do you determine who read the most? Well, you can't use the number of books, otherwise I'd just read a ton of small books of no substance and boom, I've won the game. That's not fair, so that scoring metric won't do. What about page count? Well, the same book printed by two different publishers can have a different number of pages, depending on font size, the physical size of the book, and so on. So this metric doesn't work that well either. What does that leave us with? Word count.

Word count, we decided, was probably the most objective metric we could use to decide the competition. It avoids the problems that book count and page count bring to the table. Now, why the need for this application? Well, I figured it would be easy to look up the word counts of major books, but the problem was that every website said something different, sometimes differing by thousands of words. That is simply no good for reporting our progress, especially when it comes to lesser-known books that don't have a word count on the internet at all! That is the motivation for this project.

This application counts the number of words in a given PDF within a chosen interval of pages. That feature, choosing the pages, is important because we can't just upload the PDF of the entire book, or else we would count all the extraneous words in the front matter before the meat of the book starts. Restricting the page range keeps the count as accurate as possible. Now, there is going to be inherent error in this code, since word identification is a tricky process. It will not produce 100% accurate word counts for a given book, but the idea is that it will produce the most accurate count possible.
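To make the page-interval idea concrete, here is a minimal sketch of how such a count could work. It is not the code in this repository: the function names, the regular expression that defines a "word", and the use of the `pypdf` library for text extraction are all assumptions made for illustration.

```python
import re

# Assumption: a "word" is a run of letters or digits, optionally joined by
# apostrophes or hyphens, so "don't" and "well-known" each count as one word.
# This is a simplification; real word identification is messier.
WORD_PATTERN = re.compile(r"[^\W_]+(?:['’-][^\W_]+)*")


def count_words(text):
    """Count the words in a single string of text."""
    return len(WORD_PATTERN.findall(text))


def count_words_in_pages(page_texts, first_page, last_page):
    """Count words on pages first_page..last_page (1-indexed, inclusive).

    page_texts is a list holding the extracted text of each page, in order.
    """
    if not 1 <= first_page <= last_page <= len(page_texts):
        raise ValueError("page interval is outside the document")
    selected = page_texts[first_page - 1:last_page]
    return sum(count_words(text) for text in selected)


def extract_page_texts(pdf_path):
    """Return the text of every page in a PDF (hypothetical helper).

    Assumes the third-party pypdf package is installed; it is imported here
    so the counting functions above work without it.
    """
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    # extract_text() can return an empty result for image-only pages.
    return [page.extract_text() or "" for page in reader.pages]
```

With these pieces, a call such as `count_words_in_pages(extract_page_texts("book.pdf"), 15, 320)` would count only the main text, where `book.pdf` and the page numbers are placeholders. Keeping extraction separate from counting also makes the counting logic easy to check on plain strings.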


AndyJohnsonMath/PDF-Word-Counter

Folders and files

Latest commit

History

Repository files navigation

Introduction and Motivation

About

Topics

Resources

Stars

Watchers

Forks

Releases

Languages