In 2024 my partner and I challenged each other to a reading competition. A simple game: whoever reads the most in the year wins. Naturally, the question of scoring arises. How do you determine who read the most? Well, you can't use number of books, otherwise I'd just read a ton of small books of no substance and boom, I've won the game. That's not fair, so that scoring metric won't do. What about page count? Well, the same book printed by two different publishers can have a different number of pages, depending on font size, the physical size of the book, and so on. So this metric doesn't work that well either. What does that leave us with? Word count.
Word count, we decided, was probably the most objective metric we could use to settle the competition. It avoids the problems that book count and page count bring to the table. Now, why the need for this application? Well, I figured it would be easy to look up the word counts of major books, but the problem was that every website said something different, sometimes differing by thousands of words. That is simply no good for reporting our progress, especially when it comes to lesser-known books that don't have a word count on the internet at all! That is the motivation for this project.
This application counts the number of words in a given PDF within a chosen range of pages. That feature, choosing the pages, is important because we can't just upload the PDF for the entire book, or else we would count all the extraneous words in the front matter before the meat of the book starts. This keeps the count as accurate as possible. Now, there is going to be some inherent error in this code, since word identification is a tricky process. It will not produce 100% accurate word counts for the given books, but the idea is that it will produce the most accurate count possible.
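To make the page-range idea concrete, here is a minimal sketch in Python. It is not the project's actual code. It assumes pages are numbered from 1, that the range is inclusive on both ends, and that a "word" is a run of letters or digits, optionally joined by apostrophes or hyphens. The function names (`count_words`, `count_words_in_pages`, `extract_page_texts`) are made up for illustration, and the PDF step assumes the third-party `pypdf` library is installed.

```python
import re

# Assumption: a "word" is a run of letters/digits, optionally joined by
# apostrophes or hyphens, so "don't" and "well-known" each count once.
WORD_PATTERN = re.compile(r"[^\W_]+(?:['’-][^\W_]+)*")


def count_words(text):
    """Count the words in one chunk of text under the rule above."""
    return len(WORD_PATTERN.findall(text))


def count_words_in_pages(page_texts, first_page, last_page):
    """Count words on pages first_page..last_page (1-based, inclusive).

    page_texts is a list with one string of extracted text per page.
    """
    if not 1 <= first_page <= last_page <= len(page_texts):
        raise ValueError("page range is outside the document")
    selected = page_texts[first_page - 1:last_page]
    return sum(count_words(text) for text in selected)


def extract_page_texts(pdf_path):
    """Return the text of every page in a PDF (assumes pypdf is installed)."""
    from pypdf import PdfReader  # imported here so the rest runs without it

    reader = PdfReader(pdf_path)
    # extract_text() can come back empty for image-only pages.
    return [page.extract_text() or "" for page in reader.pages]


# Toy example: page 1 is front matter, pages 2-3 are the actual book.
sample_pages = [
    "Title page copyright notice",
    "It was a well-known fact.",
    "Don't stop reading!",
]
print(count_words_in_pages(sample_pages, 2, 3))  # prints 8
```

With a real book, you would pass the output of `extract_page_texts` in place of `sample_pages`. The word rule is where most of the inherent error lives: words split across line breaks, page headers, and footnote markers will all nudge the count one way or the other.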