RyvynYoung/nlp_harry_potter

This project used my assigned NLP project as a starting point to determine whether I could improve the model. Resampling provided the greatest improvement in prediction: accuracy increased by only 3%, but the average F1 score increased by 14%.
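To illustrate why accuracy and average F1 can move by such different amounts on an imbalanced dataset, here is a minimal pure-Python sketch. The labels and predictions are toy values invented for illustration, not the project's data, and the helper names `accuracy` and `macro_f1` are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy imbalanced labels (assumption): 8 JavaScript repos, 2 Python repos.
y_true = ["JavaScript"] * 8 + ["Python"] * 2

# A model that always predicts the majority class.
majority_only = ["JavaScript"] * 10

# A model that finds both Python repos at the cost of one JavaScript mistake.
more_balanced = ["JavaScript"] * 7 + ["Python"] * 3

for name, y_pred in [("majority only", majority_only), ("more balanced", more_balanced)]:
    print(f"{name}: accuracy={accuracy(y_true, y_pred):.2f}, macro F1={macro_f1(y_true, y_pred):.2f}")
```

In this toy case accuracy rises from 0.80 to 0.90 while macro F1 rises from about 0.44 to about 0.87, because the macro average gives the minority class the same weight as the majority class.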

  • Scraped 1K GitHub repositories returned by a search for "Harry Potter"
  • Filtered out those without a README file or a listed programming language
  • Kept only the 4 most represented programming languages: JavaScript, Java, HTML, Python
  • Cleaned and prepared data using regex and ToktokTokenizer
  • Removed common English stop words and additional words found to create noise in the dataset
  • Vectorized data using TF-IDF
  • Initial attempts to improve the model by adding data, reducing noise, and testing multiple algorithms did not significantly increase accuracy
  • Determined that the imbalanced dataset might be a contributing factor and resampled the data with SMOTE
  • This improved the average model accuracy by only 3%, but improved the average F1 score by 14%
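The resampling step can be sketched as follows. This is a simplified pure-Python illustration of the core SMOTE idea (interpolating between a minority-class sample and one of its nearest minority-class neighbors), not the code used in this project; in practice a library implementation would normally be used. The function name `smote_oversample`, its parameters, and the toy vectors are all assumptions made for illustration.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points from minority-class feature vectors.

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority neighbors.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbors of the base point, excluding the point itself
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Toy stand-ins for TF-IDF vectors of an under-represented class (assumption).
minority_vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

new_vectors = smote_oversample(minority_vectors, n_new=6)
balanced_minority = minority_vectors + new_vectors
print(len(balanced_minority))  # 4 original + 6 synthetic samples
```

Resampling like this should be applied to the training split only, so that the test set still reflects the original class distribution.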

Background vector created by macrovector - www.freepik.com
