ktakattack/Social-Graph-Analysis-Group-Project

Grandma’s Cookbook: An Analysis of Ingredients Associated with Cuisines

This is a group project completed for Graph and Social Network Analysis at Kent State in Fall 2020.

Project Overview

The goal was to provide a network analysis of a dataset. We chose a recipe and ingredient dataset found on Kaggle (see Sources), which shows how ingredients correlate by cuisine. This led us to ask whether a network built on this dataset could help expand a user's palate by showing connections to new cuisines based on their existing favorite foods.

There were two objectives:

  1. Identify the essential ingredients for a favorite cuisine
  2. Build an ingredient recommendation system: for example, if you like 3 types of cuisine, find the top ingredients they have in common
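Both objectives can be illustrated with simple frequency counts before any modelling. The sketch below is only an illustration and is not taken from the project notebook: the sample records, the function names, and the assumption that each record has a `cuisine` and an `ingredients` field are all hypothetical.

```python
from collections import Counter

# Hypothetical records, assumed to mirror the shape of the Kaggle file.
recipes = [
    {"cuisine": "mexican", "ingredients": ["tomatoes", "jalapeno", "onion"]},
    {"cuisine": "mexican", "ingredients": ["tomatoes", "cilantro", "onion"]},
    {"cuisine": "italian", "ingredients": ["tomatoes", "basil", "olive oil"]},
    {"cuisine": "italian", "ingredients": ["garlic", "basil", "olive oil"]},
    {"cuisine": "greek", "ingredients": ["olive oil", "feta", "tomatoes"]},
]

def ingredient_counts(recipes):
    """Map each cuisine to a Counter of how many recipes use each ingredient."""
    counts = {}
    for recipe in recipes:
        counter = counts.setdefault(recipe["cuisine"], Counter())
        counter.update(set(recipe["ingredients"]))  # count once per recipe
    return counts

def essential_ingredients(counts, cuisine, top_n=3):
    """Objective 1: the most frequently used ingredients of one cuisine."""
    return [name for name, _ in counts[cuisine].most_common(top_n)]

def shared_ingredients(counts, cuisines, top_n=3):
    """Objective 2: ingredients used by all chosen cuisines, ranked by total use."""
    common = set.intersection(*(set(counts[c]) for c in cuisines))
    totals = Counter({name: sum(counts[c][name] for c in cuisines) for name in common})
    return [name for name, _ in totals.most_common(top_n)]

counts = ingredient_counts(recipes)
print(essential_ingredients(counts, "italian", top_n=2))
print(shared_ingredients(counts, ["mexican", "italian", "greek"]))
```

Counting each ingredient once per recipe keeps a single recipe with repeated entries from dominating a cuisine's ranking.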

The analysis was done in Python using a Jupyter notebook. See our Jupyter Notebook.

Data Cleansing

The Kaggle dataset we used provided cleaned data in JSON files. However, because we were doing some natural language processing, we needed to refine the data further. For example, the dataset had both "diced tomatoes" and "canned tomatoes." For the purposes of our project, we wanted to strip such modifiers, so both became "tomatoes."
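A minimal sketch of this kind of modifier stripping is shown below. The modifier list and the `normalize_ingredient` helper are hypothetical; the project's actual cleaning rules may have differed.

```python
import re

# Hypothetical modifier list; a real one would be derived from the dataset.
MODIFIERS = {"diced", "canned", "chopped", "crushed", "sliced"}

def normalize_ingredient(name):
    """Lowercase the name, drop punctuation and digits, and remove modifier words."""
    words = re.findall(r"[^\W\d_]+", name.lower())
    kept = [word for word in words if word not in MODIFIERS]
    # Fall back to the original words if every word was a modifier.
    return " ".join(kept or words)

print(normalize_ingredient("diced tomatoes"))   # tomatoes
print(normalize_ingredient("Canned Tomatoes"))  # tomatoes
```

A fixed word list is easy to audit, but it only removes the modifiers someone thought to include, so it needs reviewing against the ingredient names that actually occur.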

Machine Learning

With the data prepared, we were able to begin creating predictions about the relationships between ingredients in the dataset.

We trained a Word2vec model on the data. Once the model was built, we tested a recipe, "jalapeno chili", to see which ingredients it predicted. The image below shows the corresponding ingredients, each predicted with a score above 90%.
image

Then we tested the inverse: we entered two ingredients we expected to be unrelated to see whether the model would agree. We thought "fresh pineapple" and "milk" would be a good test. The screenshot below shows that there was only about 6% correlation between those ingredients.
image
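Word2vec tools such as Gensim typically report the cosine similarity between two embedding vectors as this kind of score. The sketch below shows that calculation on made-up three-dimensional vectors; the numbers are hypothetical and are not values from our model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)

# Hypothetical embeddings pointing in nearly orthogonal directions.
pineapple = [0.9, 0.1, 0.0]
milk = [0.0, 0.2, 0.9]

print(round(cosine_similarity(pineapple, milk), 3))
```

A value near 1 means the two ingredients appear in similar contexts, and a value near 0 means they rarely do.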

Since we were confident in our model, we turned it into a network. We plotted the network for the ingredient "romaine lettuce", and it returned an accurate network of correlated ingredients.
image
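One way to build such a network is to start at an ingredient and follow its strongest links for a few hops. The sketch below assumes a hypothetical `SIMILAR` lookup table in place of real model output, and the threshold and depth are arbitrary choices.

```python
# Hypothetical similarity lookup; in practice these scores would come from the model.
SIMILAR = {
    "romaine lettuce": [("croutons", 0.93), ("caesar dressing", 0.91), ("cucumber", 0.88)],
    "croutons": [("caesar dressing", 0.92), ("parmesan", 0.90)],
    "cucumber": [("feta", 0.89)],
}

def build_network(center, depth=2, threshold=0.9):
    """Collect undirected weighted edges reachable from `center` within `depth` hops."""
    edges = {}
    seen = {center}
    frontier = [center]
    for _ in range(depth):
        next_frontier = []
        for node in frontier:
            for neighbour, score in SIMILAR.get(node, []):
                if score < threshold:
                    continue  # skip weak links
                edges[frozenset((node, neighbour))] = score
                if neighbour not in seen:
                    seen.add(neighbour)
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return edges

edges = build_network("romaine lettuce")
for pair, score in edges.items():
    print(" -- ".join(sorted(pair)), score)
```

The resulting edge dictionary can be handed to any plotting or graph library. Keying edges by `frozenset` stores each undirected link only once.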

Web App

To present our project to our instructor and class, we created a web app using Flask as our framework. When the user types an ingredient in the search bar, the app returns the most correlated ingredients, ordered from most to least correlated. The user could also select one of the cuisine buttons at the top to see which ingredients were most associated with that cuisine.
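The search behaviour described above could be served by a single Flask route, roughly as sketched below. The `/similar` route, the `ingredient` query parameter, and the `SIMILAR` table are hypothetical stand-ins; the real app queried the trained model and rendered HTML instead of returning JSON.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical precomputed results standing in for model queries.
SIMILAR = {
    "romaine lettuce": [("caesar dressing", 0.91), ("croutons", 0.93)],
}

@app.route("/similar")
def similar():
    """Return the ingredients most similar to ?ingredient=..., best first."""
    ingredient = request.args.get("ingredient", "").strip().lower()
    matches = sorted(SIMILAR.get(ingredient, []), key=lambda pair: pair[1], reverse=True)
    return jsonify(
        ingredient=ingredient,
        results=[{"name": name, "score": score} for name, score in matches],
    )

# Exercise the route without starting a server.
with app.test_client() as client:
    response = client.get("/similar", query_string={"ingredient": "Romaine Lettuce"})
    print(response.get_json())
```

Normalizing the query to lowercase before the lookup keeps the search bar forgiving about capitalization.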

Libraries used:

  • Pandas
  • Seaborn
  • Matplotlib
  • NumPy
  • NLTK
  • Gensim
  • Pickle
  • scikit-learn

Sources:

Kaggle ingredient dataset - https://www.kaggle.com/kaggle/recipe-ingredients-dataset?select=train.json
