
Find the best training and testing model, with an accurate algorithm, for a dataset of white (🤍) or red (❤) wine, or both. 🥂


rusgar/White-or-red-wine


White-or-red-wine-app

Project Overview

Wine is an alcoholic beverage made from fermented grapes. Yeast consumes the sugar in the grapes and converts it to ethanol, carbon dioxide, and heat. It is a pleasant-tasting drink, widely loved and celebrated. It is interesting to analyze the physicochemical attributes of wine and understand how they relate to wine quality and wine type. To do this, we follow a standard machine learning and data workflow, use the TPOT API to search for a model, and develop an app.

Data Overview

The datasets are from Kaggle. They are related to the red and white variants of the Portuguese "Vinho Verde" wine. Vinho Verde is a unique product from the Minho (northwest) region of Portugal. Medium in alcohol, it is particularly appreciated for its freshness (especially in the summer). The classes are ordered and not balanced (e.g. there are many more normal wines than excellent or poor ones).

Kaggle the White-Wine

Kaggle the Red-Wine

After cleaning the two datasets with the appropriate tools, we join them and use the two models to compare the results on cleaned and uncleaned data.

The combined dataset contains 12 variables recorded for 5037 observations. These data allow us to build different regression models to determine how the independent variables help predict our dependent variable, quality.
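As a rough illustration of joining the two tables, here is a minimal sketch using pandas. The tiny frames stand in for the real Kaggle CSV files, and the `color` column is a hypothetical addition for telling the two wine types apart after the join; neither is taken from the project code.

```python
import pandas as pd

# Tiny stand-in frames; the real project would load the Kaggle CSV files instead.
red = pd.DataFrame({"alcohol": [9.4, 9.8, 9.8], "quality": [5, 5, 5]})
white = pd.DataFrame({"alcohol": [8.8, 9.5], "quality": [6, 6]})

# Tag each row with its wine type before joining (hypothetical "color" column).
red["color"] = "red"
white["color"] = "white"

# Stack the two tables, then drop exact duplicate rows.
wines = (
    pd.concat([red, white], ignore_index=True)
    .drop_duplicates()
    .reset_index(drop=True)
)

print(wines.shape)
```

Tagging the rows before concatenating keeps the wine type available later, for example when plotting by color.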

Dependencies

These are the libraries used to develop a solution for this problem.

numpy|pandas: Help us load and handle the data.

matplotlib|seaborn: Help us plot the data so we can visualize it in different ways and understand it better.

Interquartile range (IQR): A measure of dispersion that we use to detect and remove outliers and out-of-range values.

sklearn: Provides the tools needed to train our models and test them afterwards.

math: Provides some functions we might want to use when testing our models (sqrt).

streamlit: A library that makes it easy to create web applications that display the results of a data analysis.
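The list above mentions `sqrt` from `math` for testing models. One common use, assumed here rather than confirmed by the project, is the root mean squared error (RMSE) of a regression model. The helper name `rmse` and the sample scores are made up for illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equally long sequences of numbers."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up quality scores and predictions, for illustration only.
print(rmse([5, 6, 7], [5, 5, 8]))
```

A lower RMSE means the predicted quality scores sit closer to the recorded ones, in the same units as the score itself.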

Attributes Information

Input variables (based on physicochemical tests):

  1. fixed acidity
  2. volatile acidity
  3. citric acid
  4. residual sugar
  5. chlorides
  6. free sulfur dioxide
  7. total sulfur dioxide
  8. density
  9. pH
  10. sulphates
  11. alcohol

Output variable (based on sensory data):

  12. quality (score between 4 and 7)

The Data Science

The data science workflow is a non-linear, iterative process that requires many skills and tools, from framing the business problem to generating actionable insights. It includes the following steps:

  1. Business Understanding
  2. Data Understanding
  3. Data Preparation
  4. Exploratory Data Analysis
  5. Data Modeling
  6. Model Evaluation
  7. Model Deployment

Documentation

Here I build a simple machine learning model and, with the help of Streamlit, a web application that predicts the quality of the wine (as recorded in the dataset). The guiding question: how can machine learning determine which physicochemical properties make a wine "good", "medium" or "bad"?
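To make the "good", "medium" and "bad" labels concrete, here is a minimal sketch that maps a numeric quality score to one of the three classes. The function name `quality_label` and both cut-offs are hypothetical; the project may draw the boundaries differently.

```python
def quality_label(score, bad_max=4, good_min=7):
    """Map a numeric quality score to "bad", "medium" or "good".

    The cut-offs are hypothetical defaults, not values taken from the project.
    """
    if score <= bad_max:
        return "bad"
    if score >= good_min:
        return "good"
    return "medium"


labels = [quality_label(s) for s in (4, 5, 6, 7)]
print(labels)
```

Keeping the cut-offs as parameters makes it easy to try other boundaries and see how the class balance changes.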

Data understanding, preparation and exploratory analysis

Once the data is loaded, we check its cleanliness and structure, analyze the quality classes and plot them, reduce the outliers using the IQR, inspect the correlation between features, and save the result to another .csv file.
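The IQR step could look roughly like the sketch below. It uses the conventional rule of keeping values within 1.5 × IQR of the quartiles; the multiplier, the helper name `remove_outliers_iqr` and the sample values are assumptions, not details from the project.

```python
import pandas as pd


def remove_outliers_iqr(df, columns, k=1.5):
    """Keep rows whose values lie in [Q1 - k*IQR, Q3 + k*IQR] for every listed column."""
    mask = pd.Series(True, index=df.index)
    for col in columns:
        q1 = df[col].quantile(0.25)
        q3 = df[col].quantile(0.75)
        iqr = q3 - q1
        mask &= df[col].between(q1 - k * iqr, q3 + k * iqr)
    return df[mask]


# Made-up sample: the last value is far outside the range of the others.
sample = pd.DataFrame({"residual sugar": [1.9, 2.0, 2.1, 2.2, 2.3, 15.5]})
cleaned = remove_outliers_iqr(sample, ["residual sugar"])
print(len(sample), len(cleaned))
```

The cleaned frame can then be written out with `DataFrame.to_csv` for the modeling stage.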

Data Modeling and evaluation

We join the white and red wine datasets, remove duplicates, and check how each feature correlates with quality. We then train and test the model using decision tree algorithms, which reach an accuracy of 1.0. We also break the chart down to distinguish where each wine falls, and use two models to report results on the dataset and the wine quality: low, medium, good.
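A train/test run with a decision tree might be sketched as follows using scikit-learn. The data is synthetic and the labelling rule is invented so the block runs on its own; the features, split ratio and tree depth are assumptions, and the resulting accuracy says nothing about the real wine data.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for two physicochemical features (alcohol, volatile acidity).
X = rng.uniform(low=[8.0, 0.1], high=[14.0, 1.2], size=(200, 2))

# Invented labelling rule, for illustration only: 0 = low, 1 = medium, 2 = good.
y = np.digitize(X[:, 0], bins=[10.0, 12.0])

# Hold out a quarter of the rows for testing, keeping class proportions similar.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = DecisionTreeClassifier(max_depth=3, random_state=0)
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"test accuracy: {accuracy:.2f}")
```

Scoring on held-out rows rather than on the training rows is what makes the accuracy figure meaningful.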

Figure: color charts (three plots).

As we continue to investigate, we build two train/test models: one on the original data and one on the data from the resulting model.

Figure: pairing plots.

Model Deployment

We deploy with Streamlit. Before that, to get a more precise model, we use TPOT, an AutoML (automated machine learning) tool, and joblib to save the model and load it again in our final app.
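Saving a trained model with joblib and loading it again, as an app would at startup, can be sketched like this. The toy training data, the file name `wine_model.joblib` and the use of a plain decision tree instead of a TPOT-selected pipeline are all assumptions made to keep the example small.

```python
import os
import tempfile

import joblib
from sklearn.tree import DecisionTreeClassifier

# Toy training data (alcohol, volatile acidity) with invented labels.
X = [[9.0, 0.7], [9.5, 0.6], [12.5, 0.3], [13.0, 0.2]]
y = ["low", "low", "good", "good"]
model = DecisionTreeClassifier(random_state=0).fit(X, y)

# Persist the fitted model to disk (hypothetical file name, temporary folder).
model_path = os.path.join(tempfile.mkdtemp(), "wine_model.joblib")
joblib.dump(model, model_path)

# Later, for example when the app starts, load it back and predict.
restored = joblib.load(model_path)
prediction = restored.predict([[12.8, 0.25]])[0]
print(prediction)
```

Loading a saved model avoids retraining every time the app starts, which matters on hosted platforms with limited resources.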

Streamlit

TPOT

Joblib

AutoML

Required Files

Setup.sh

Files with the .sh extension are shell scripts. We can run them from the command line or from the desktop interface of our distribution. In this case we place the file in the project root.

mkdir -p ~/.streamlit/

echo "\
[server]\n\
headless = true\n\
port = $PORT\n\
enableCORS = false\n\
\n\
" > ~/.streamlit/config.toml

Procfile

We place the Procfile in the project root. It is a mechanism for declaring which commands the application runs on the platform.

web: sh setup.sh && streamlit run app.py

Requirements.txt

  1. Open a command terminal
  2. Navigate to the root of the project where you want to list the dependencies
  3. Execute
pip freeze > requirements.txt
  4. Open the generated requirements.txt file. It lists all the package dependencies, with their versions, that the project requires to work.

To install this list of dependencies in any other Python installation you can run

pip install -r requirements.txt

App Deployment

  1. Open a command terminal
  2. Navigate to the project folder where we have the app.py
  3. Execute
streamlit run app.py

The Streamlit app then opens in the browser, served from localhost.

Production

Streamlit

From this link we can deploy our app to production without difficulty. It asks us to log in; then we follow these steps:

1- New App

2- Deploy an app

   2.1 Repository

   2.2 Branch

   2.3 Main file path

3- Deploy

4- After a few seconds, the panel on the right shows the commands being executed, and the app then becomes available at its own URL.

Stack and pip list

Python 3.9.2, Streamlit 1.1.0, Jupyter 6.3.0, NumPy, Pandas, scikit-learn, Plotly, Kaggle

Matplotlib 3.4.3, TPOT 0.11.7, Joblib 1.0.1, Pillow

ML Status

APP to enjoy

White-or-red-wine

    “Wine makes daily living easier, less hurried, with fewer tensions and more tolerance.”
                                                                                      --- Benjamin Franklin

Core Code

Thanks to my teachers at Code-Code-School:
