This is an exploratory learning project focusing on data analysis, visualization, and presentation.
It uses the King County Housing dataset, which contains information about home sales in King County (USA). This is a popular public dataset; you can find more information about it here: https://www.kaggle.com/datasets/harlfoxem/housesalesprediction/code. You can find descriptions of the column names here [link to column_names.md].
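As a first orientation, the data can be loaded and inspected with pandas. The sketch below is self-contained: a tiny inline sample stands in for the real CSV, and the file name in the comment is an assumption, not a guaranteed path.

```python
import io
import pandas as pd

# In the project you would point read_csv at the downloaded dataset,
# e.g. pd.read_csv("data/kc_house_data.csv") -- the path is an assumption.
# A tiny inline sample stands in here so the snippet runs on its own.
sample = io.StringIO(
    "id,date,price,bedrooms,sqft_living,waterfront\n"
    "7129300520,20141013T000000,221900,3,1180,0\n"
    "6414100192,20141209T000000,538000,3,2570,0\n"
)
df = pd.read_csv(sample)

print(df.shape)            # (2, 6)
print(df["price"].mean())  # 379950.0
```

With the real file you would follow the same pattern, then use `df.info()` and `df.describe()` to get a feel for the columns before any analysis.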
You are a real estate agent in King County, and your client is looking for a property to buy with specific needs. They are relying on you to provide insights and recommendations that will help them decide on a property to purchase, taking into account location, timing, pricing, etc. The presentation to the client can be found here. A detailed notebook showing how each calculation was performed can be found here.
Larry Sanders, 45 years old
- Family: married, 3 children
- Occupation:
- Property requirements: waterfront with a view; isolated and wooded with minimal neighbors (or older neighbors)
- Neighborhood: nice, central
- Schools: not a requirement
- Budget: limited (needs a range)
- Additional details: the kids are homeschooled or attend school virtually to avoid germs. The family is close-knit and spends time together, so lot size is important. The family is fine with a home that may need renovation.
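These requirements translate naturally into a pandas filter. The sketch below uses a toy DataFrame so it runs on its own; the column names (`waterfront`, `view`, `sqft_lot`, `price`) follow the King County schema, while the budget and lot-size thresholds are purely illustrative assumptions, not the client's actual numbers.

```python
import pandas as pd

# Toy stand-in for the King County data; in the project you would load
# the real CSV instead.
homes = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "price": [450_000, 820_000, 390_000, 1_200_000],
    "waterfront": [1, 1, 0, 1],
    "view": [3, 4, 0, 4],
    "sqft_lot": [12_000, 25_000, 5_000, 40_000],
})

# Illustrative thresholds for Larry's profile: waterfront with a view,
# a large lot, a limited budget. No condition floor, since the family
# is fine with a renovation project.
BUDGET_MAX = 900_000   # assumption -- the actual range comes from the client
MIN_LOT_SQFT = 10_000  # assumption -- "lot size is important"

candidates = homes[
    (homes["waterfront"] == 1)
    & (homes["view"] >= 3)
    & (homes["sqft_lot"] >= MIN_LOT_SQFT)
    & (homes["price"] <= BUDGET_MAX)
]
print(candidates["id"].tolist())  # [1, 2]
```

The same boolean-mask pattern extends to location and timing criteria once those are quantified (e.g. by zipcode or sale month).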
The following sections walk you through the requirements needed to run the project and get you set up step by step.
The following packages are required for this project. Included are descriptions of each one to help you understand what they do, how you'll use them, and why they are helpful.
These packages will be automatically installed when you run `pip install -r requirements.txt` as described in the Setup section below.
| Package | Description |
|---|---|
| Altair (5.3.0) | A declarative, beginner-friendly library for creating clean and interactive charts. Great for quick visualizations directly from pandas DataFrames. |
| Pandas (2.2.2) | The essential library for working with structured data in Python. Makes it easy to clean, filter, and analyze data stored in tables or CSV files. |
| NumPy (1.26.4) | Adds fast, efficient mathematical tools for handling large numerical arrays. It’s the backbone for most data and machine-learning libraries. |
| Matplotlib (3.9.1) | The classic Python plotting library for creating static charts such as line, bar, or scatter plots. Highly customizable for data presentation. |
| Seaborn (0.13.2) | Builds on Matplotlib to make beautiful, easy-to-read statistical graphics (like heatmaps, violin plots, and distributions) with minimal code. |
| Plotly (5.24.1) | Used for creating dynamic, interactive, and zoomable visualizations that work in notebooks or dashboards. Great for exploratory data analysis. |
| Scikit-Learn (1.5.1) | A robust library for machine learning. Includes ready-made algorithms for prediction, classification, and clustering, plus tools for data preparation. |
| GeoPandas (1.0.1) | Extends pandas to handle geographic data — like coordinates, shapes, and maps — making spatial analysis simple and visual. |
| SQLAlchemy (2.0.15) | A Python toolkit that simplifies connecting to and querying SQL databases, allowing you to use Pythonic commands instead of raw SQL. |
| psycopg2-binary (2.9.7) | A PostgreSQL database adapter that lets Python applications (like SQLAlchemy or pandas) talk directly to a PostgreSQL database. |
| python-dotenv (1.0.0) | Loads environment variables (like passwords or API keys) from a .env file into your project safely, so sensitive info isn’t hard-coded. |
| pytest (8.3.3) | A simple but powerful testing framework for writing and running unit tests. Helps ensure your code works as expected and stays reliable over time. |
The first step is to clone this repository, which can be done from the green Code button above. For more information on git, check out the step-by-step cheat sheets here [shiny-octo]. One of the first steps when starting any data science project is to create a virtual environment. For this project you have to create the environment from scratch yourself; however, you should already be familiar with the commands you will need. The general workflow consists of:
- setting the python version locally to 3.11.3
- creating a virtual environment using the `venv` module
- activating your newly created environment
- upgrading `pip` (this step is not absolutely necessary, but will save you trouble when installing some packages)
- installing the required packages via `pip`
This repo contains a `requirements.txt` file with a list of all the packages and dependencies you will need.
Before you can use plotly in Jupyter Lab, you have to install Node.js (if you haven't already).
- Check your Node version by running the following command:

  ```
  node -v
  ```

  If you haven't installed it yet, begin at step 1. Otherwise, proceed to step 2.
**macOS**

- Step 1: Update Homebrew and install Node with the following commands:

  ```
  brew update
  brew install node
  ```

- Step 2: Create the virtual environment and install the required packages with the following commands:

  ```
  pyenv local 3.11.3
  python -m venv .venv
  source .venv/bin/activate
  pip install --upgrade pip
  pip install -r requirements.txt
  ```
**Windows**

- Step 1: Update Chocolatey and install Node with the following commands:

  ```
  choco upgrade chocolatey
  choco install nodejs
  ```

- Step 2: Create the virtual environment and install the required packages.

  For the PowerShell CLI:

  ```
  pyenv local 3.11.3
  python -m venv .venv
  .venv\Scripts\Activate.ps1
  python -m pip install --upgrade pip
  pip install -r requirements.txt
  ```

  For the Git Bash CLI:

  ```
  pyenv local 3.11.3
  python -m venv .venv
  source .venv/Scripts/activate
  python -m pip install --upgrade pip
  pip install -r requirements.txt
  ```