Brief Description:
The project involves building a predictive typing system for an arbitrary language using two approaches.
In the first approach, a language model is built from a large corpus using different smoothing methods, such as Witten-Bell and Kneser-Ney. The resulting models are evaluated on a separate test corpus using perplexity and cross-entropy. This approach is suited to long text.
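As a rough illustration of the first approach, the sketch below trains a bigram model with interpolated Witten-Bell smoothing and computes cross-entropy and perplexity on held-out sentences. The function names, the bigram order, and the add-one smoothed unigram fallback are assumptions made for this example, not the project's actual implementation.

```python
import math
from collections import Counter, defaultdict

def train_bigram(sentences):
    """Collect counts from tokenized sentences (lists of words)."""
    unigrams = Counter()              # counts of predicted tokens (words and </s>)
    followers = defaultdict(Counter)  # followers[h][w] = count of bigram (h, w)
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        unigrams.update(tokens[1:])
        for h, w in zip(tokens, tokens[1:]):
            followers[h][w] += 1
    return unigrams, followers

def unigram_prob(w, unigrams):
    """Add-one smoothed unigram estimate; all unseen words share one extra slot."""
    total = sum(unigrams.values())
    vocab = len(unigrams) + 1
    return (unigrams.get(w, 0) + 1) / (total + vocab)

def witten_bell_prob(w, h, unigrams, followers):
    """P(w | h) = (c(h, w) + T(h) * P_uni(w)) / (c(h) + T(h)),
    where T(h) is the number of distinct word types seen after h."""
    p_uni = unigram_prob(w, unigrams)
    seen = followers.get(h)
    if not seen:                      # history never observed: back off fully
        return p_uni
    c_h = sum(seen.values())
    t_h = len(seen)
    return (seen.get(w, 0) + t_h * p_uni) / (c_h + t_h)

def cross_entropy(sentences, unigrams, followers):
    """Average negative log2 probability per predicted token (bits per token)."""
    log_sum, n = 0.0, 0
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        for h, w in zip(tokens, tokens[1:]):
            log_sum += math.log2(witten_bell_prob(w, h, unigrams, followers))
            n += 1
    return -log_sum / n

def perplexity(sentences, unigrams, followers):
    """Perplexity is 2 raised to the cross-entropy."""
    return 2 ** cross_entropy(sentences, unigrams, followers)

if __name__ == "__main__":
    train = [["the", "cat", "sat"], ["the", "dog", "sat"]]
    test = [["the", "cat", "sat"]]
    u, f = train_bigram(train)
    print("cross-entropy:", cross_entropy(test, u, f))
    print("perplexity:", perplexity(test, u, f))
```

Lower perplexity on the same test corpus indicates a better fit, which gives a common yardstick for comparing smoothing methods. A Kneser-Ney variant would replace the unigram fallback with a continuation probability and subtract a fixed discount from the bigram counts.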
The second approach involves finding the semantic similarity between two texts and using it to predict the next word. This approach is suitable for short text.
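One possible reading of the second approach is sketched below: the text typed so far is compared with every context stored in a small corpus, and each candidate next word is scored by how similar its preceding context is to the typed text. Bag-of-words cosine similarity is used here only as a simple stand-in for a semantic similarity measure, and the names `cosine_similarity` and `predict_next` are hypothetical.

```python
import math
from collections import Counter

def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between the bag-of-words vectors of two token lists."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def predict_next(prefix, corpus, k=3):
    """Return up to k candidate next words for the typed prefix.

    Each word in the corpus is scored by the similarity between the prefix
    and the words that preceded it in its own sentence; scores for the same
    word are summed across occurrences.
    """
    scores = Counter()
    for sent in corpus:
        for i in range(1, len(sent)):
            sim = cosine_similarity(prefix, sent[:i])
            if sim > 0:
                scores[sent[i]] += sim
    return [word for word, _ in scores.most_common(k)]

if __name__ == "__main__":
    corpus = [
        ["good", "morning", "everyone"],
        ["good", "night", "moon"],
    ]
    print(predict_next(["good", "morning"], corpus, k=2))
```

Because the score depends on overlap with the whole typed text rather than on a fixed-length history, this kind of scheme can still produce suggestions when the input is only a few words long.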
These two approaches are then compared for their prediction abilities.
Finally, a text editor with predictive typing capabilities is developed. A suitable GUI will be provided for using the text editor.
Modules:
- Module 1
  - Corpus data preprocessing (tokenization, stopword removal, etc.) and implementation of a few smoothing techniques
  - Implementation of the remaining smoothing techniques and their comparison
- Module 2
  - Preparation of a dataset for semantic similarity and computation of the semantic similarity between two texts
  - Use of semantic similarity for prediction, text editor implementation, and comparison of both approaches
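The preprocessing step in Module 1 could look roughly like the sketch below. The regular expression, the tiny English stopword set, and the function names are illustrative assumptions; a real system would use a stopword list for the target language. Stopword removal is optional here because it may suit the similarity-based approach better than the n-gram model, where function words are themselves words to be predicted.

```python
import re

# Illustrative subset only (assumption); replace with a list for the target language.
STOPWORDS = {"a", "an", "the", "of", "and", "to", "in"}

def tokenize(text):
    """Lowercase the text and split it into word tokens, keeping apostrophe suffixes."""
    return re.findall(r"\w+(?:'\w+)?", text.lower())

def preprocess(text, remove_stopwords=False):
    """Tokenize the text and optionally drop stopwords."""
    tokens = tokenize(text)
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return tokens

if __name__ == "__main__":
    print(preprocess("The cat's toy, in the box!"))
    print(preprocess("The cat's toy, in the box!", remove_stopwords=True))
```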
Team Members:
1. Adarsh Sanghai
2. Amitojdeep Singh
3. Anirudh Kumar Bansal
4. Lakshit Bhutani