CS F469 Information Retrieval

Predictive Typing System and Semantic Similarity Analyzer

Brief Description:
This project builds a predictive typing system for an arbitrary language using two approaches. The first approach constructs a language model from a large corpus and applies different smoothing methods, such as Witten-Bell and Kneser-Ney. The resulting models are evaluated on a separate test corpus using perplexity and cross-entropy. This approach is suitable for long text. The second approach computes the semantic similarity between two texts and uses it to predict the next word. This approach is suitable for short text. The two approaches are then compared on their prediction ability. Finally, a text editor with predictive typing capabilities is developed, with a suitable GUI provided to the user.
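
The first approach can be illustrated with a minimal sketch. It is not the project's implementation: it assumes a bigram model, interpolated Witten-Bell smoothing, and an add-one unigram model as the lower-order estimate. The names `WittenBellBigram`, `cross_entropy`, and `perplexity` are hypothetical.

```python
import math
from collections import Counter, defaultdict


class WittenBellBigram:
    """Bigram model with interpolated Witten-Bell smoothing (illustrative only)."""

    def __init__(self, tokens):
        self.unigrams = Counter(tokens)
        self.total = len(tokens)
        # One extra slot so that unseen words share some probability mass.
        self.vocab_size = len(self.unigrams) + 1
        self.followers = defaultdict(Counter)
        for prev, word in zip(tokens, tokens[1:]):
            self.followers[prev][word] += 1

    def unigram_prob(self, word):
        # Assumption: add-one smoothing for the lower-order model.
        return (self.unigrams[word] + 1) / (self.total + self.vocab_size)

    def prob(self, prev, word):
        follow = self.followers.get(prev)
        if not follow:
            # Unseen history: fall back to the unigram estimate.
            return self.unigram_prob(word)
        n = sum(follow.values())  # tokens observed after prev
        t = len(follow)           # distinct word types observed after prev
        return (follow[word] + t * self.unigram_prob(word)) / (n + t)

    def predict(self, prev, k=3):
        # Rank candidates by raw bigram count; use unigram counts if prev is unseen.
        candidates = self.followers.get(prev) or self.unigrams
        return [w for w, _ in candidates.most_common(k)]


def cross_entropy(model, tokens):
    """Average negative log2 probability per bigram of the test tokens."""
    pairs = list(zip(tokens, tokens[1:]))
    return -sum(math.log2(model.prob(p, w)) for p, w in pairs) / len(pairs)


def perplexity(model, tokens):
    return 2 ** cross_entropy(model, tokens)


if __name__ == "__main__":
    train = "the cat sat on the mat the cat ran".split()
    test = "the cat sat on the mat".split()
    model = WittenBellBigram(train)
    print(model.predict("the"))
    print(perplexity(model, test))
```

Perplexity is two raised to the cross-entropy, so the two measures rank smoothing methods identically. Lower values on the test corpus indicate a better fit.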

Modules

Module - 1
- Corpus data preprocessing (tokenization, stopword removal, etc.) and implementation of a few smoothing techniques
- Implementation of remaining smoothing techniques and their comparison
Module - 2
- Preparation of a dataset for semantic similarity and finding the semantic similarity between two texts
- Using semantic similarity for prediction, text editor implementation, and comparison of both approaches
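
The Module 2 steps can be sketched as follows. This README does not say how the project measures semantic similarity, so the sketch assumes cosine similarity over bag-of-words counts after stopword removal. The stopword list and the names `tokenize`, `content_vector`, `cosine_similarity`, and `predict_next` are hypothetical.

```python
import math
import re
from collections import Counter

# Assumption: a tiny illustrative stopword list, not the project's actual list.
STOPWORDS = {"a", "an", "the", "is", "of", "to", "and"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def content_vector(tokens):
    """Bag-of-words counts with stopwords removed."""
    return Counter(t for t in tokens if t not in STOPWORDS)


def cosine_similarity(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def predict_next(context, sentences):
    """Return the word that follows the stored prefix most similar to context."""
    ctx = content_vector(tokenize(context))
    best_score, best_word = 0.0, None
    for sentence in sentences:
        tokens = tokenize(sentence)
        for i in range(1, len(tokens)):
            score = cosine_similarity(ctx, content_vector(tokens[:i]))
            # ">=" lets a longer prefix win ties, so the guess continues the match.
            if score > 0 and score >= best_score:
                best_score, best_word = score, tokens[i]
    return best_word


if __name__ == "__main__":
    stored = ["the weather is sunny today", "the stock market fell sharply"]
    print(predict_next("weather is", stored))
```

Because it matches on shared content words rather than on the exact preceding word, a scheme like this can still make a guess from a very short context, which fits the short-text use case described above.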

Team Members:
1. Adarsh Sanghai
2. Amitojdeep Singh
3. Anirudh Kumar Bansal
4. Lakshit Bhutani
