Os3m3/DS_Project

🏠 Real Estate Price Intelligence – Oman Edition

🎯 Objective

The goal of this project is to build a mini data pipeline for real estate price prediction in Oman. It starts by scraping property listings from two local websites, cleaning and integrating the data, engineering useful features, and framing a predictive modeling problem based on property prices.

🌐 Data Sources

Two real estate platforms were used:

  • Dubizzle Oman – scraped using BeautifulSoup
  • Tibiaan – scraped using Selenium

This project was a hands-on learning exercise in web scraping. It used two different techniques: BeautifulSoup for static pages and Selenium for dynamically rendered content.
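As a rough illustration of the static-page approach, the sketch below parses listing cards from an inline HTML string with BeautifulSoup. The markup, the CSS class names, and the `parse_listings` helper are assumptions made up for this example; they do not reflect the real structure of either site or the project's actual scraper.

```python
from bs4 import BeautifulSoup

# Hypothetical listing markup; the real class names on either site will differ.
SAMPLE_HTML = """
<div class="listing">
  <h2 class="title"> Villa in Muscat </h2>
  <span class="price">85,000 OMR</span>
</div>
<div class="listing">
  <h2 class="title">Apartment in Salalah</h2>
  <span class="price">32,500 OMR</span>
</div>
"""


def parse_listings(html):
    """Extract the title and numeric price from each listing card."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.select("div.listing"):
        title = card.select_one("h2.title").get_text(strip=True)
        price_text = card.select_one("span.price").get_text(strip=True)
        # Keep digits only, so "85,000 OMR" becomes 85000.
        price = int("".join(ch for ch in price_text if ch.isdigit()))
        rows.append({"title": title, "price_omr": price})
    return rows


listings = parse_listings(SAMPLE_HTML)
```

For a dynamically rendered page, one option would be to let Selenium load the page and then hand `driver.page_source` to the same parser.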

📦 Steps in Data Collection & Cleaning

  1. Explore: Analyzed both websites to understand their structure and potential fields to extract.
  2. Plan: Created an Excel sheet listing possible data fields and identified the overlap between both platforms.
  3. Scrape: Fetched data using Python scripts, then cleaned:
    • Removed duplicates
    • Filled missing values using mean, median, or mode depending on context
    • Trimmed extra spaces to improve matching and consistency
  4. Merge: Cleaned the datasets individually, then integrated them for modeling.
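The cleaning and merging steps above could look roughly like the following pandas sketch. The two small frames, the column names, and the `clean` helper are hypothetical stand-ins for the scraped data. The choice of median for a numeric column and mode for a categorical one is one plausible reading of "depending on context", not a record of what the project did.

```python
import pandas as pd

# Hypothetical raw frames standing in for the two scraped datasets.
site_a = pd.DataFrame({
    "title": [" Villa in Muscat ", " Villa in Muscat ", "Flat in Sohar"],
    "city": ["Muscat", "Muscat", None],
    "bedrooms": [4.0, 4.0, None],
    "price_omr": [85000, 85000, 30000],
})
site_b = pd.DataFrame({
    "title": ["Apartment in Salalah ", "Studio in Muscat"],
    "city": [" Salalah", "Muscat"],
    "bedrooms": [2.0, 1.0],
    "price_omr": [32500, 18000],
})

TEXT_COLUMNS = ["title", "city"]  # assumed text fields


def clean(df):
    """Trim text, drop duplicate rows, and fill missing values."""
    df = df.copy()
    # Trim first so rows that differ only by stray spaces count as duplicates.
    for col in TEXT_COLUMNS:
        df[col] = df[col].str.strip()
    df = df.drop_duplicates().reset_index(drop=True)
    # Numeric gap: median. Categorical gap: most frequent value (mode).
    df["bedrooms"] = df["bedrooms"].fillna(df["bedrooms"].median())
    df["city"] = df["city"].fillna(df["city"].mode().iloc[0])
    return df


# Clean each source on its own, then stack them into one dataset.
merged = pd.concat([clean(site_a), clean(site_b)], ignore_index=True)
```

Trimming before deduplication is a deliberate ordering in this sketch: otherwise `"Muscat"` and `" Muscat"` would be treated as different values.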

🧠 Feature Engineering

  1. Understanding the data: Identified important columns and assessed their value.
  2. New features: Created new columns and converted types where needed.
  3. Scaling: Applied a Box-Cox transformation to numerical features to reduce skew and bring their distributions closer to normal (Box-Cox requires strictly positive values).
  4. Encoding: Applied OneHotEncoder to handle categorical features for modeling.
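The transformation and encoding steps can be sketched with scikit-learn as below. The source does not say which Box-Cox implementation was used, so `PowerTransformer(method="box-cox")` is an assumption here, as are the toy data and the column names `area_sqm` and `city`.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, PowerTransformer

# Hypothetical feature table; Box-Cox needs strictly positive numeric values.
df = pd.DataFrame({
    "area_sqm": [80.0, 120.0, 150.0, 300.0, 95.0, 650.0],
    "city": ["Muscat", "Sohar", "Muscat", "Salalah", "Sohar", "Muscat"],
})

preprocess = ColumnTransformer(
    transformers=[
        # Box-Cox power transform; by default the output is also standardized.
        ("boxcox", PowerTransformer(method="box-cox"), ["area_sqm"]),
        # One indicator column per city; unseen cities encode as all zeros.
        ("onehot", OneHotEncoder(handle_unknown="ignore"), ["city"]),
    ],
    sparse_threshold=0,  # always return a dense NumPy array
)

# Column 0 is the transformed area; the remaining columns are city indicators.
X = preprocess.fit_transform(df)
```

Wrapping both steps in a single `ColumnTransformer` keeps the fitted Box-Cox parameter and the learned category list together, so the same preprocessing can be reapplied to new listings with `preprocess.transform`.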

📂 Technologies Used

  • Python
  • Pandas & NumPy
  • BeautifulSoup
  • Selenium
  • Scikit-learn

This project highlights my ability to go from raw web data to a clean, structured dataset ready for modeling, combining web scraping, data preprocessing, and feature engineering in one pipeline.
