esmahoney/eda-client-project

eda client project

This is an exploratory learning project that focuses on data analysis, visualization, and presentation.

It uses the King County Housing dataset, which contains information about home sales in King County (USA). This is a popular public dataset; you can find more information about it here: https://www.kaggle.com/datasets/harlfoxem/housesalesprediction/code. You can find descriptions of the column names here [link to column_names.md].

Project Scope

You are a real estate agent in King County, and your client is looking for a property to buy and has specific needs. The client is looking to you for insights and recommendations that will help them decide on a property to purchase. These should take into account location, timing, pricing, etc.

The presentation to the client can be found here. A detailed notebook showing how each of the calculations was performed can be found here.

Client Profile

Larry Sanders - 45 years old, married, 3 children

  • Occupation -
  • Property requirements - waterfront with a view, isolated and wooded with minimal neighbors (or older neighbors)
  • Neighborhood - nice, central
  • Schools - not a requirement
  • Budget - limited (need range)
  • Additional details - the kids are homeschooled or attend school virtually to avoid germs. The family is close-knit and spends time together, so land plot size is important. The family is fine with a home that may need to be renovated.

Requirements and Setup

The following sections walk you through the requirements needed to run the project and the step-by-step setup.

Requirements

The following packages are required for this project. Included are descriptions of each one to better understand what they do, how you'll use them and why they are helpful.

These packages will be automatically installed when you run pip install -r requirements.txt as described in the Setup section below.

| Package | Version | Description |
| --- | --- | --- |
| Altair | 5.3.0 | A declarative, beginner-friendly library for creating clean and interactive charts. Great for quick visualizations directly from pandas DataFrames. |
| Pandas | 2.2.2 | The essential library for working with structured data in Python. Makes it easy to clean, filter, and analyze data stored in tables or CSV files. |
| NumPy | 1.26.4 | Adds fast, efficient mathematical tools for handling large numerical arrays. It’s the backbone for most data and machine-learning libraries. |
| Matplotlib | 3.9.1 | The classic Python plotting library for creating static charts such as line, bar, or scatter plots. Highly customizable for data presentation. |
| Seaborn | 0.13.2 | Builds on Matplotlib to make beautiful, easy-to-read statistical graphics (like heatmaps, violin plots, and distributions) with minimal code. |
| Plotly | 5.24.1 | Used for creating dynamic, interactive, and zoomable visualizations that work in notebooks or dashboards. Great for exploratory data analysis. |
| Scikit-Learn | 1.5.1 | A robust library for machine learning. Includes ready-made algorithms for prediction, classification, and clustering, plus tools for data preparation. |
| GeoPandas | 1.0.1 | Extends pandas to handle geographic data (coordinates, shapes, and maps), making spatial analysis simple and visual. |
| SQLAlchemy | 2.0.15 | A Python toolkit that simplifies connecting to and querying SQL databases, allowing you to use Pythonic commands instead of raw SQL. |
| psycopg2-binary | 2.9.7 | A PostgreSQL database adapter that lets Python applications (like SQLAlchemy or pandas) talk directly to a PostgreSQL database. |
| python-dotenv | 1.0.0 | Loads environment variables (like passwords or API keys) from a .env file into your project safely, so sensitive info isn’t hard-coded. |
| pytest | 8.3.3 | A simple but powerful testing framework for writing and running unit tests. Helps ensure your code works as expected and stays reliable over time. |
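
Once the packages are installed, it can be useful to confirm that the versions in your environment match the ones listed above. The following is a minimal sketch, not part of this repository: the helper names `installed_version` and `find_mismatches` are hypothetical, and the distribution names in `PINS` are assumed to match how the packages are published on PyPI. It uses only the standard library's `importlib.metadata`.

```python
from importlib import metadata

# Versions copied from the package list above.
# The distribution names are assumed to match their PyPI names.
PINS = {
    "altair": "5.3.0",
    "pandas": "2.2.2",
    "numpy": "1.26.4",
    "matplotlib": "3.9.1",
    "seaborn": "0.13.2",
    "plotly": "5.24.1",
    "scikit-learn": "1.5.1",
    "geopandas": "1.0.1",
    "SQLAlchemy": "2.0.15",
    "psycopg2-binary": "2.9.7",
    "python-dotenv": "1.0.0",
    "pytest": "8.3.3",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Map each package whose installed version differs from its pin
    to a (pinned, installed) tuple; installed is None when missing."""
    mismatches = {}
    for name, pinned in pins.items():
        found = lookup(name)
        if found != pinned:
            mismatches[name] = (pinned, found)
    return mismatches


if __name__ == "__main__":
    for name, (pinned, found) in find_mismatches(PINS).items():
        print(f"{name}: expected {pinned}, found {found or 'not installed'}")
```

Running the script inside the activated environment prints one line per package that is missing or has a different version, and prints nothing when everything matches.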

Setup

The first step is to clone this repository. This can be done from the green Code button above. For more information on git, check out some of the step-by-step cheat sheets here [shiny-octo].

One of the first steps when starting any data science project is to create a virtual environment. For this project you have to create this environment from scratch yourself. However, you should already be familiar with the commands you will need to do so. The general workflow consists of...

  • setting the python version locally to 3.11.3
  • creating a virtual environment using the venv module
  • activating your newly created environment
  • upgrading pip (This step is not absolutely necessary, but will save you trouble when installing some packages.)
  • installing the required packages via pip
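
After working through the steps above, a quick check from Python can confirm that the interpreter version and the virtual environment are what you expect. This is a minimal sketch under stated assumptions: the function names `python_matches` and `in_virtualenv` are hypothetical and not part of this repository, and the venv check relies on the fact that an environment created with the `venv` module makes `sys.prefix` differ from `sys.base_prefix`.

```python
import sys

EXPECTED_PYTHON = (3, 11, 3)  # version named in the workflow above


def python_matches(expected=EXPECTED_PYTHON, actual=None):
    """True when the interpreter's major.minor.micro equals `expected`."""
    if actual is None:
        actual = sys.version_info
    return tuple(actual[:3]) == tuple(expected)


def in_virtualenv():
    """True when running inside an environment created with venv.

    Inside a venv, sys.prefix points at the environment while
    sys.base_prefix still points at the base installation.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Python version OK:", python_matches())
    print("Virtual environment active:", in_virtualenv())
```

If either line reports `False`, re-run the corresponding step (setting the Python version or activating the environment) before installing packages.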

Set up notes

This repo contains a requirements.txt file with a list of all the packages and dependencies you will need.
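
To see which packages and versions the file pins before installing anything, you can read it with a few lines of Python. The sketch below assumes the file uses plain `name==version` lines, which has not been verified against this repository's actual `requirements.txt`; the function name `parse_pins` is hypothetical. Lines without an exact `==` pin are skipped rather than interpreted.

```python
from pathlib import Path


def parse_pins(text):
    """Return {package: version} for every `name==version` line.

    Blank lines, comments, and lines without an exact `==` pin are skipped.
    """
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0]   # drop comments
        line = line.split(";", 1)[0]  # drop environment markers
        line = line.strip()
        if "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


if __name__ == "__main__":
    path = Path("requirements.txt")
    if path.exists():
        for name, version in parse_pins(path.read_text()).items():
            print(f"{name} {version}")
```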

Before you can use Plotly in JupyterLab, you have to install Node.js (if you haven't already).

  • Check your Node version by running the following command:
    node -v
    If Node is not installed yet, begin at Step_1. Otherwise, proceed to Step_2.
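
The same check can be done from Python, for example inside a notebook. This is a minimal sketch, assuming `node` would be on your `PATH` if it is installed; the function name `node_version` is hypothetical.

```python
import shutil
import subprocess


def node_version():
    """Return the output of `node -v`, or None when no `node`
    executable is found on PATH or the command fails."""
    node = shutil.which("node")
    if node is None:
        return None
    result = subprocess.run(
        [node, "-v"], capture_output=True, text=True, check=False
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip()


if __name__ == "__main__":
    version = node_version()
    if version is None:
        print("Node is not installed: begin at Step_1.")
    else:
        print(f"Node {version} found: proceed to Step_2.")
```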

On macOS, type the following commands:

  • Step_1: Update Homebrew and install Node with the following commands:

    brew update
    brew install node
  • Step_2: Create the virtual environment and install the required packages with the following commands:

    pyenv local 3.11.3
    python -m venv .venv
    source .venv/bin/activate
    pip install --upgrade pip
    pip install -r requirements.txt

On Windows, type the following commands:

  • Step_1: Update Chocolatey and install Node with the following commands:

    choco upgrade chocolatey
    choco install nodejs
  • Step_2: Create the virtual environment and install the required packages with the following commands.

    For the PowerShell CLI:

    pyenv local 3.11.3
    python -m venv .venv
    .venv\Scripts\Activate.ps1
    python -m pip install --upgrade pip
    pip install -r requirements.txt

    For the Git Bash CLI:

    pyenv local 3.11.3
    python -m venv .venv
    source .venv/Scripts/activate
    python -m pip install --upgrade pip
    pip install -r requirements.txt

About

Exploration into Seattle Housing Data and presentation to client
