Amazon Food Reviews - NLP

Data Source : Data

Machine-Learning Objective

Natural language processing (NLP) is a field of machine learning concerned with a computer's ability to understand, analyze, manipulate, and potentially generate human language.

This project focuses on sentiment analysis of Amazon food reviews. It uses methods ranging from word vectors to more complex deep learning models.

The Amazon Fine Food Reviews dataset consists of reviews of fine foods from Amazon.

Number of reviews: 568,454
Number of users: 256,059
Number of products: 74,258
Timespan: Oct 1999 - Oct 2012
Number of Attributes/Columns in data: 10

Attribute Information:

Id
ProductId - unique identifier for the product
UserId - unique identifier for the user
ProfileName
HelpfulnessNumerator - number of users who found the review helpful
HelpfulnessDenominator - number of users who indicated whether they found the review helpful or not
Score - rating between 1 and 5
Time - timestamp for the review
Summary - brief summary of the review
Text - text of the review

Approach

Preprocessing

Data cleaning: removing duplicate reviews.
Removing HTML tags, punctuation, and special characters such as , . or #.
Keeping only words made up of English letters (dropping alphanumeric tokens).
Converting to lowercase.
Removing stopwords.
Stemming words.
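
The steps above can be sketched in plain Python. This is only an illustration, not the code from the notebooks: the stopword list is a tiny hypothetical one, `simple_stem` is a crude stand-in for a real stemmer such as Porter or Snowball, and the deduplication key (`UserId`, `Time`, `Text`) is an assumption about what counts as a duplicate.

```python
import re

# Hypothetical, deliberately tiny stopword list; a real run would use a fuller one.
STOPWORDS = {"a", "an", "the", "is", "it", "this", "and", "i", "of", "to", "in", "was"}


def deduplicate(reviews):
    """Keep the first review for each (UserId, Time, Text) triple (assumed key)."""
    seen, unique = set(), []
    for review in reviews:
        key = (review["UserId"], review["Time"], review["Text"])
        if key not in seen:
            seen.add(key)
            unique.append(review)
    return unique


def simple_stem(word):
    """Crude suffix stripper standing in for a real stemmer (e.g. Porter)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def clean_review(text):
    """Apply the cleaning steps in order and return a list of stemmed tokens."""
    text = re.sub(r"<[^>]+>", " ", text)         # remove HTML tags
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text)  # remove punctuation and special characters
    tokens = []
    for token in text.split():
        if not token.isalpha():                  # drop tokens containing digits
            continue
        token = token.lower()
        if token in STOPWORDS:
            continue
        tokens.append(simple_stem(token))
    return tokens


rows = [
    {"UserId": "U1", "Time": 1, "Text": "<br />This is the BEST coffee, tasted in 2012!!! Loved it."},
    {"UserId": "U1", "Time": 1, "Text": "<br />This is the BEST coffee, tasted in 2012!!! Loved it."},
]
unique_rows = deduplicate(rows)
cleaned = [clean_review(row["Text"]) for row in unique_rows]
```

With a real stemmer the stems would differ (the stand-in simply chops common suffixes), but the order of the steps is the same.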

Featurization

NLP models need numeric input, so the words are first converted to vectors. The following methods were used for this conversion.

Bag of Words (BoW)
Bi-Grams and N-Grams
TF-IDF (Term Frequency–Inverse Document Frequency)
Average Word2Vec
TF-IDF weighted Word2vec
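
As a rough illustration of the count-based methods in this list, the sketch below builds bag-of-words counts (optionally with bi-grams) and TF-IDF weights using only the standard library. The function names and the toy documents are hypothetical, and the TF-IDF variant shown (tf = count / document length, idf = log(N / df)) is one common textbook form; libraries such as scikit-learn use smoothed variants, so their numbers will differ.

```python
import math
from collections import Counter


def ngrams(tokens, n_max=2):
    """Return all 1..n_max-grams of a token list as space-joined strings."""
    grams = []
    for n in range(1, n_max + 1):
        grams += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return grams


def bag_of_words(docs, n_max=1):
    """Count n-gram occurrences separately for each document."""
    return [Counter(ngrams(doc, n_max)) for doc in docs]


def tfidf(docs):
    """Unsmoothed TF-IDF: (count / doc length) * log(N / document frequency)."""
    counts = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc_counts in counts for term in doc_counts)
    return [
        {t: (k / len(doc)) * math.log(n_docs / df[t]) for t, k in doc_counts.items()}
        for doc, doc_counts in zip(docs, counts)
    ]


# Toy, already-cleaned documents (hypothetical).
docs = [["tasty", "coffee"], ["bad", "coffee"], ["tasty", "tasty", "tea"]]
bow = bag_of_words(docs, n_max=2)   # unigrams and bi-grams
weights = tfidf(docs)               # unigram TF-IDF
```

Here "bad" appears in only one document, so it gets a higher weight than "coffee", which appears in two.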

Word2Vec is one of the most popular techniques for learning word embeddings. Embeddings are vector representations of words that can capture a word's context in a document, semantic and syntactic similarity, relations with other words, etc.
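
To turn word embeddings into one vector per review, Average Word2Vec takes the mean of the word vectors, and the TF-IDF weighted version takes a weighted mean. The sketch below shows both, assuming a hypothetical 3-dimensional `EMBEDDINGS` table and made-up TF-IDF weights; in practice the vectors would come from a trained Word2Vec model.

```python
# Hypothetical 3-dimensional embeddings; real ones would come from a trained model.
EMBEDDINGS = {
    "tasty": [1.0, 0.0, 0.0],
    "coffee": [0.0, 1.0, 0.0],
    "bad": [-1.0, 0.0, 0.0],
}


def review_vector(tokens, embeddings, weights=None):
    """Average the vectors of known tokens, optionally weighted (e.g. by TF-IDF)."""
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for token in tokens:
        if token not in embeddings:
            continue  # out-of-vocabulary words are skipped
        w = 1.0 if weights is None else weights.get(token, 0.0)
        total = [t + w * x for t, x in zip(total, embeddings[token])]
        weight_sum += w
    if weight_sum == 0.0:
        return total  # no known words: return the zero vector
    return [t / weight_sum for t in total]


avg_w2v = review_vector(["tasty", "coffee", "unknown"], EMBEDDINGS)
tfidf_w2v = review_vector(["tasty", "coffee"], EMBEDDINGS, {"tasty": 3.0, "coffee": 1.0})
```

In the weighted case the review vector is pulled toward "tasty", the word with the larger weight.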

Implementations

Dimensionality Reduction using t-SNE and PCA
Naive Bayes
Logistic Regression
SGD for Linear Regression
Support Vector Machines
Decision Trees
Random Forests and GBDT
K-Means and Agglomerative (Hierarchical) Clustering methods (Unsupervised)
TruncatedSVD (Co-occurrence matrix) - Matrix Factorization method (Used in recommendation systems)
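
The last item can be sketched as follows: build a word co-occurrence matrix, keep only the top-k singular directions, and compare the resulting word vectors by cosine similarity. This is a minimal illustration using NumPy's full SVD and a toy corpus, not the notebook's implementation; the function names, the window size, and the documents are assumptions.

```python
import numpy as np


def cooccurrence_matrix(docs, window=2):
    """Count how often each pair of words appears within `window` positions."""
    vocab = sorted({token for doc in docs for token in doc})
    index = {token: i for i, token in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(vocab)))
    for doc in docs:
        for i, token in enumerate(doc):
            lo, hi = max(0, i - window), min(len(doc), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    matrix[index[token], index[doc[j]]] += 1
    return vocab, matrix


def truncated_svd(matrix, k):
    """Keep the top-k singular directions; each row is a k-dimensional word vector."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] * s[:k]


def most_similar(word, vocab, vectors, top_n=3):
    """Rank the other words by cosine similarity to `word`."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.where(norms == 0, 1.0, norms)  # avoid division by zero
    scores = unit @ unit[vocab.index(word)]
    ranked = [i for i in np.argsort(-scores) if vocab[i] != word]
    return [vocab[i] for i in ranked[:top_n]]


# Toy, already-cleaned documents (hypothetical).
docs = [
    ["good", "coffee", "great", "taste"],
    ["great", "coffee", "good", "price"],
    ["bad", "tea", "awful", "taste"],
]
vocab, matrix = cooccurrence_matrix(docs, window=2)
vectors = truncated_svd(matrix, k=2)
neighbours = most_similar("coffee", vocab, vectors)
```

For a large vocabulary, a sparse solver such as scikit-learn's `TruncatedSVD` avoids computing the full decomposition.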

Examples

Truncated SVD (similar words, based on co-occurrence):

Dimensionality reduction using UMAP on Average W2V to visualize the separation of positive and negative reviews:

Top 20 most important words for classifying reviews:
