This repository accompanies the book Getting Started with Natural Language Processing, available here. You can use the coupon code "slkochmar" to get a 42% discount.
- Chapter 1 – Introduction
- Chapter 2 – Your first NLP example
- Chapter 3 – Introduction to Information Search
- Chapter 4 – Information Extraction
- Chapters 5 & 6 – Author Attribution and User Profiling
- Chapters 7 & 8 – Sentiment Analysis
- Chapter 9 – Topic Analysis
- Chapter 10 – Topic Modeling
- Chapter 11 – Named Entity Recognition
To run the notebooks on your machine, check that Python 3 is installed (all code was written and tested with Python 3.7). In addition, you will need the following libraries (the notebooks were tested with the versions indicated in brackets):
- NLTK (v 3.5): check installation instructions for the toolkit at https://www.nltk.org/install.html and for the accompanying data at https://www.nltk.org/data.html
- spaCy (v 3.1.3): check installation instructions at https://spacy.io/usage. You will also need to install models (e.g., en_core_web_sm, en_core_web_md, and en_core_web_lg) using the instructions on the website.
- Gensim (v 3.8.0): check installation instructions at https://radimrehurek.com/gensim/
- Matplotlib (v 3.1.3): check installation instructions at https://matplotlib.org/users/installing.html
- Scikit-learn (v 0.22.1): check installation instructions at http://scikit-learn.org/stable/install.html
- NumPy (v 1.18.1): check installation instructions at https://www.scipy.org/install.html
- Pandas (v 1.0.1): check installation instructions at https://pandas.pydata.org/pandas-docs/stable/getting_started/install.html
Alternatively, a number of these libraries can be installed in one go through the Anaconda distribution.
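To compare your environment with the versions listed above, you could use a small script along the lines of the one below. It is a hypothetical helper written for illustration, not something shipped with the notebooks. It assumes the usual import names (Scikit-learn is imported as sklearn), assumes each library exposes a __version__ attribute, and detects the spaCy models by looking for installed Python packages of the same name rather than by loading them (loading en_core_web_lg just to check for it would be slow).

```python
import importlib
import importlib.util
import sys

# Versions the notebooks were tested with, as listed above.
# Keys are import names: Scikit-learn is imported as "sklearn".
TESTED_VERSIONS = {
    "nltk": "3.5",
    "spacy": "3.1.3",
    "gensim": "3.8.0",
    "matplotlib": "3.1.3",
    "sklearn": "0.22.1",
    "numpy": "1.18.1",
    "pandas": "1.0.1",
}

# spaCy models mentioned above (assumption: a given notebook may need only some of them).
SPACY_MODELS = ["en_core_web_sm", "en_core_web_md", "en_core_web_lg"]


def installed_versions(modules):
    """Map each import name to its __version__, "unknown", or None if not importable."""
    found = {}
    for name in modules:
        try:
            module = importlib.import_module(name)
        except ImportError:
            found[name] = None  # library is not installed
        else:
            found[name] = getattr(module, "__version__", "unknown")
    return found


def missing_spacy_models(models):
    """Return the model names that are not installed as Python packages."""
    return [m for m in models if importlib.util.find_spec(m) is None]


def report():
    """Print the Python version, each library's status, and any missing spaCy models."""
    print("Python", sys.version.split()[0], "(notebooks tested with 3.7)")
    for name, version in installed_versions(TESTED_VERSIONS).items():
        tested = TESTED_VERSIONS[name]
        if version is None:
            status = "NOT INSTALLED"
        elif version == tested:
            status = "matches tested version"
        else:
            status = f"differs from tested version {tested}"
        print(f"{name:<12} {version or '-':<10} {status}")
    missing = missing_spacy_models(SPACY_MODELS)
    print("Missing spaCy models:", ", ".join(missing) if missing else "none")


if __name__ == "__main__":
    report()
```

A version that differs from the tested one will not necessarily break a notebook, but it is a reasonable first thing to check if one of them fails to run.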
For more information on Jupyter notebooks, check https://jupyter.org.