# Data Science Project

A small end-to-end data science project demonstrating:
- Data Collection via web scraping (BeautifulSoup, Scrapy, Selenium).
- Data Storage of raw and processed files in `data/`.
- Exploratory Data Analysis and visualization in a Jupyter notebook.

## Project Structure
```
Data_Science_Project/
├── data/                 # Raw and processed datasets (CSV, JSON, etc.)
├── scraping/             # Standalone Python scripts (BeautifulSoup, Selenium)
├── spider/               # Scrapy project and spider definitions
├── data_analysis.ipynb   # Jupyter notebook for EDA & visualization
└── README.md             # Project overview and instructions
```

## Prerequisites
- Python 3.8 or higher
- Git (to clone this repository)
## Installation

- Clone the repository

  ```bash
  git clone https://github.com/soupond/Data_Science_Project.git
  cd Data_Science_Project
  ```

- Create and activate a virtual environment (recommended)

  ```bash
  python3 -m venv venv
  source venv/bin/activate  # Windows: venv\Scripts\activate
  ```

- Install dependencies

  ```bash
  pip install -r requirements.txt
  ```

  If a `requirements.txt` is not present, install manually:

  ```bash
  pip install pandas numpy matplotlib jupyter scrapy beautifulsoup4 requests selenium
  ```
## Web Scraping

Standalone scripts using BeautifulSoup or Selenium are located in `scraping/`. To run one:
```bash
python scraping/bs4_scraper.py
```

The script will save output files under `data/` (e.g., `data/raw_listings.csv`).
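If you want to add a script of your own to `scraping/`, the usual requests + BeautifulSoup pattern is sketched below. The URL, CSS selectors, and output filename are illustrative placeholders, not the actual values used by `bs4_scraper.py`:

```python
import csv

import requests
from bs4 import BeautifulSoup

URL = "https://example.com/listings"  # placeholder target page


def scrape(url):
    # Fetch the page and fail loudly on HTTP errors.
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")

    rows = []
    # ".listing", ".title", and ".price" are hypothetical selectors.
    for item in soup.select(".listing"):
        rows.append({
            "title": item.select_one(".title").get_text(strip=True),
            "price": item.select_one(".price").get_text(strip=True),
        })
    return rows


if __name__ == "__main__":
    rows = scrape(URL)
    # Save raw results under data/, matching the project's convention.
    with open("data/raw_listings.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["title", "price"])
        writer.writeheader()
        writer.writerows(rows)
```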
## Scrapy Spiders

The Scrapy project lives in the `spider/` directory. To crawl and export data:
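The exact command depends on the spider names defined there; assuming `spider/` is the Scrapy project root (the directory containing `scrapy.cfg`), the generic form is:

```bash
cd spider
scrapy crawl <spider_name> -o ../data/output.json
```

For orientation, a Scrapy spider is a class like the following minimal sketch; the spider name, start URL, and CSS selectors here are hypothetical, not taken from this repository:

```python
import scrapy


class ListingsSpider(scrapy.Spider):
    # "listings" is a hypothetical name; it would be run as: scrapy crawl listings
    name = "listings"
    start_urls = ["https://example.com/listings"]

    def parse(self, response):
        # ".listing", ".title", and ".price" are placeholder selectors.
        for item in response.css(".listing"):
            yield {
                "title": item.css(".title::text").get(),
                "price": item.css(".price::text").get(),
            }
```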
## Data Analysis

Launch Jupyter Notebook and open the analysis notebook:
```bash
jupyter notebook data_analysis.ipynb
```

Inside, you'll find:
- Data loading and cleaning steps
- Descriptive statistics and data summaries
- Visualizations (histograms, scatter plots, heatmaps)
- Key insights and recommendations
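As a sketch of the loading and summary steps, a typical opening cell looks like the following; the CSV filename and the `price` column are placeholders, not the notebook's actual inputs:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Load a raw dataset produced by the scrapers (filename is illustrative).
df = pd.read_csv("data/raw_listings.csv")

# Basic cleaning: drop exact duplicates and coerce scraped strings to numbers.
df = df.drop_duplicates()
df["price"] = pd.to_numeric(df["price"], errors="coerce")
df = df.dropna(subset=["price"])

# Descriptive statistics for the numeric columns.
print(df.describe())

# A quick histogram of one numeric column.
df["price"].hist(bins=30)
plt.xlabel("price")
plt.ylabel("count")
plt.show()
```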
## Dependencies

A `requirements.txt` file lists all Python package dependencies for this project. Install them with:
```bash
pip install -r requirements.txt
```

## Contributing

Contributions are welcome! To contribute:
- Fork the repository.
- Create a new branch (`git checkout -b feature/YourFeature`).
- Make your changes and commit with a clear message.
- Push to your fork and open a Pull Request.
Please ensure:
- Code follows PEP8 style guidelines.
- Dependencies are updated in `requirements.txt`.
- README is kept up to date with any new scripts or features.