Difference checking done by the program
The program uses a few functions to determine which words differ between the two inputs. The diff_checker folder contains the tools used by the main GUI folder: the DiffWord.py class, main.py, and a set of tests.
DiffWord is a class with a few important properties. Its __init__ method initializes the member variables:
- self.word stores the original word.
- self.isDifferent is a boolean that states whether the word differs between the two lists. It is False if the same string appears at the same position in both lists, and True otherwise.
- self.index is a list of the indexes where the word appears, first in the text input and then in the voice input. If the word does not appear in one of them, that index should be -1.
The __str__(self) method helps print DiffWords. It returns only the original word.
After that, there are several observers:
- getWord() returns self.word
- isDiff() tells if self.isDifferent is True
- getIndex() returns the index list.
- get_pos_in_original() returns the first entry of the index list, the position in the text input.
- get_pos_in_derived() returns the second entry of the index list, the position in the audio input.
All in all, a relatively simple class.
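The description above can be sketched as follows. This is a minimal reconstruction from the text, not the actual DiffWord.py: the constructor's parameter names and order are assumptions, while the attribute and method names are the ones listed above.

```python
class DiffWord:
    """Sketch of the class described above; the real DiffWord.py may differ."""

    def __init__(self, word, isDifferent, index):
        # Parameter names and order are assumed for illustration.
        self.word = word                # the original word
        self.isDifferent = isDifferent  # False if identical at the same spot in both inputs
        self.index = index              # [text position, voice position], -1 where absent

    def __str__(self):
        # Printing a DiffWord shows only the original word.
        return self.word

    def getWord(self):
        return self.word

    def isDiff(self):
        return self.isDifferent

    def getIndex(self):
        return self.index

    def get_pos_in_original(self):
        # First entry: position in the text input.
        return self.index[0]

    def get_pos_in_derived(self):
        # Second entry: position in the voice input.
        return self.index[1]


# A word found only in the text input, at position 3.
missing = DiffWord("quick", True, [3, -1])
print(missing, missing.isDiff(), missing.get_pos_in_derived())
```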
main.py is where the DiffWord class is used and where the main difference-checking algorithms live. They are later brought into the GUI folder to be used there. First, it imports the DiffWord class discussed above.
This function returns a list of words to work with. f is the file to parse. It is expected to be a .txt file containing a single string, although files with multiple lines also work. The function takes each line of the file and splits it into a list of words. It then strips punctuation and other stray characters from each word and lowercases it, which makes the words easier to compare later. Each cleaned word is appended to a master list of all the words.
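A minimal sketch of this parsing step is shown here. The function name get_words is hypothetical, and stripping string.punctuation from both ends of each word is an assumed cleaning rule; the real function may clean words differently.

```python
import io
import string


def get_words(f):
    """Return a list of cleaned, lowercased words from an open text file.

    Hypothetical name and cleaning rule; a sketch of the steps described above.
    """
    words = []
    for line in f:
        # Split each line on whitespace.
        for raw in line.split():
            # Strip leading/trailing punctuation, then lowercase.
            cleaned = raw.strip(string.punctuation).lower()
            if cleaned:  # skip tokens that were only punctuation
                words.append(cleaned)
    return words


# io.StringIO stands in for an opened .txt file.
sample = io.StringIO("Hello, world!\nThis is a TEST.\n")
print(get_words(sample))
```

With a real file, the same function could be called as `with open(path) as f: words = get_words(f)`.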
get_DiffWords is arguably one of the most important functions. As input, it takes two lists of strings. It computes the lengths of the two lists and compares them, storing the result in same_length, a boolean that is True if the lists are the same length. If they are the same length, the function calls same_length_list(words1, words2), which is described below. If they are not, it calls diff_length_list(arg1, arg2), where arg1 is the longer list and arg2 is the shorter one.
The function returns the final list of DiffWords to be parsed later.
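The dispatch logic can be sketched like this. The two helpers here are placeholders that only report which one was chosen and with which argument order; they are not the real implementations.

```python
def same_length_list(words1, words2):
    # Placeholder: only records that this helper was chosen.
    return ("same_length_list", words1, words2)


def diff_length_list(longer, shorter):
    # Placeholder: only records the call and the argument order.
    return ("diff_length_list", longer, shorter)


def get_DiffWords(words1, words2):
    """Pick the helper based on whether the two word lists have equal length."""
    same_length = len(words1) == len(words2)
    if same_length:
        return same_length_list(words1, words2)
    # The longer list always goes first.
    if len(words1) > len(words2):
        return diff_length_list(words1, words2)
    return diff_length_list(words2, words1)


print(get_DiffWords(["a", "b"], ["a", "c"])[0])
print(get_DiffWords(["a"], ["a", "b"])[0])
```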
same_length_list is the helper function called by get_DiffWords when the two lists are the same length. For every word in the first list, it checks whether it matches the word at the same position in the second list. If they are equal, it creates a single DiffWord and appends it to a master list of DiffWords. Otherwise, it creates two DiffWords, one for each word with the properties appropriate to its list, and appends them one after the other. Finally, it returns the list of all the DiffWords.
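A sketch of this helper under stated assumptions: a namedtuple stands in for the real DiffWord class, and the index values given to differing words ([i, -1] and [-1, i]) are an assumption based on the rule that a missing position is -1.

```python
from collections import namedtuple

# Lightweight stand-in for the real DiffWord class.
DiffWord = namedtuple("DiffWord", ["word", "isDifferent", "index"])


def same_length_list(words1, words2):
    """Compare two equal-length word lists position by position."""
    diff_words = []
    for i, (w1, w2) in enumerate(zip(words1, words2)):
        if w1 == w2:
            # Same word at the same spot in both lists.
            diff_words.append(DiffWord(w1, False, [i, i]))
        else:
            # Two entries, appended one after the other (assumed index layout).
            diff_words.append(DiffWord(w1, True, [i, -1]))
            diff_words.append(DiffWord(w2, True, [-1, i]))
    return diff_words


result = same_length_list(["the", "cat", "sat"], ["the", "dog", "sat"])
for dw in result:
    print(dw.word, dw.isDifferent, dw.index)
```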
diff_length_list uses an offset to keep track of words and indexes. For each position i in the shorter list, if the words are the same, the process is the same as in same_length_list. If they differ, one is added to the offset, and i plus the offset is used for the indices so that the right information is retained. Each DiffWord is added to a master list of DiffWords, and once every word in the shorter list has been processed, the list is returned.
This code is mainly there for testing purposes. Along with other stats, such as the number of words in each list and the words themselves, it calls get_DiffWords on the words from each file and prints the results. It was used with the .txt files found in the directory.
This code tests different aspects of the DiffWord class and just makes sure they work.
These .txt files were used by the QuoteR team to make sure the difference checking was correct. Feel free to add or remove files as you see fit.