Recipe Reviews Sentiment Analysis

Project Overview

This project analyzes user feedback and reviews on recipes to uncover sentiment trends and user engagement patterns. By leveraging Apache Spark, natural language processing (NLP), and machine learning models, the study aims to improve recipe recommendations and enhance customer satisfaction.

Dataset Overview

Source: UCI Machine Learning Repository
File Size: 5.8 MB
Format: CSV
Instances: 18,182
Features: 15
Missing Values: Present in the text column

Key Attributes

Recipe Details: Name, ranking, unique ID
User Information: ID, reputation score
Review Data: Comment ID, timestamp, replies, thumbs up/down, star rating (1-5 scale, 0 if not rated)
Text Data: Full review comment for NLP-based sentiment analysis

Business Context

The dataset, sourced from an online platform, represents real-world user feedback on recipes. This data is useful for sentiment analysis, user behavior modeling, and recommendation systems. By understanding sentiment and engagement patterns, platforms can improve user experience and boost interaction.

Data Preparation and Preprocessing

Steps Taken

Initialize Spark Session – Set up a distributed environment for large-scale processing.
Load Dataset – Read and explore the CSV file using pyspark.sql.
Data Type Casting – Convert numeric fields to appropriate data types (IntegerType, TimestampType).
Data Cleaning – Handle missing values in text fields and remove redundant data.
Correlation Analysis – Analyze relationships between user reputation, thumbs up, and ratings.
Feature Engineering – Extract additional time-based features (year, month, day) for trend analysis.

Data Visualizations

Top 20 Recipes by Rating
Top 20 Recipes by Thumbs Up
Review Length vs. Rating
Sentiment Trends Over Time

Sentiment Analysis

Approach

Tool Used: TextBlob for sentiment polarity scoring.
Processing Steps:
- Tokenization, stopword removal, and lemmatization applied to review text.
- Polarity scores assigned to each review to classify sentiment as positive, neutral, or negative.

Key Findings

Positive reviews correlate with higher engagement (thumbs up).
Long reviews tend to show stronger sentiment (either highly positive or negative).

NLP-Based Feature Extraction

Noun and Adjective Analysis: Identified frequent descriptive words in reviews.
Word Cloud Visualization: Generated insights into common themes in positive and negative reviews.

Machine Learning Models

Goal

Develop sentiment classification models using textual and engagement data.

Models Implemented

Model	Precision	Recall	F1 Score
Random Forest	73.5%	85.7%	79.1%
Logistic Regression	73.5%	85.7%	79.1%
Decision Tree	81.8%	85.9%	80.0%

Comparative Analysis

Random Forest provided robust generalization but had slightly lower precision.
Logistic Regression acted as a simple baseline model.
Decision Tree outperformed others, achieving the highest precision and recall.

Model Implementation Steps

Text Feature Extraction: TF-IDF vectorization of review text.
User Engagement Features: Integration of thumbs up, thumbs down, and star ratings.
Model Training and Evaluation: Used SparkML pipelines for training, testing, and hyperparameter tuning.
Cross-Validation: Optimized model parameters for improved accuracy.

Best Performing Model: Decision Tree

Achieved the highest precision (81.8%), meaning fewer false positive classifications.
Maintained strong recall and F1-score, balancing accuracy and interpretability.

Business Implications

Improving Recipe Recommendations

Promote High-Rated Recipes: Recipes with high ratings and engagement should be prioritized.
Flag Low-Rated Recipes: Negative reviews can be analyzed for potential improvements.

Trend Analysis

Seasonal Rating Trends: Identifying fluctuations in user sentiment over time can inform content strategies.
Keyword Analysis in Negative Reviews: Helps in identifying common complaints and areas for improvement.

Future Applications

Advanced NLP Models: Implement BERT or LSTM for better contextual understanding.
Real-Time Sentiment Analysis: Deploy models for live content moderation and recommendation systems.
User Segmentation: Tailor recipe recommendations based on user preferences and past interactions.

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
README.md		README.md
Recipe Reviews and User Feedback Dataset.csv		Recipe Reviews and User Feedback Dataset.csv
Recipe_Reviews_Sentiment_Analysis_Report.pdf		Recipe_Reviews_Sentiment_Analysis_Report.pdf
review_sentiment_analysis.ipynb		review_sentiment_analysis.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Recipe Reviews Sentiment Analysis

Project Overview

Dataset Overview

Key Attributes

Business Context

Data Preparation and Preprocessing

Steps Taken

Data Visualizations

Sentiment Analysis

Approach

Key Findings

NLP-Based Feature Extraction

Machine Learning Models

Goal

Models Implemented

Comparative Analysis

Model Implementation Steps

Best Performing Model: Decision Tree

Business Implications

Improving Recipe Recommendations

Trend Analysis

Future Applications

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Recipe Reviews Sentiment Analysis

Project Overview

Dataset Overview

Key Attributes

Business Context

Data Preparation and Preprocessing

Steps Taken

Data Visualizations

Sentiment Analysis

Approach

Key Findings

NLP-Based Feature Extraction

Machine Learning Models

Goal

Models Implemented

Comparative Analysis

Model Implementation Steps

Best Performing Model: Decision Tree

Business Implications

Improving Recipe Recommendations

Trend Analysis

Future Applications

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages