A project where I scraped data from an online book website and analyzed it using Python, SQL, and data visualization tools to uncover key trends and insights.
This project demonstrates the complete data pipeline, from web scraping raw book data to extracting insights using SQL and Python. It includes:
- Scraping book titles, prices, availability, ratings, and categories
- Cleaning and structuring data using Python (Pandas)
- Loading data into a SQLite/PostgreSQL database
- Performing SQL queries to extract meaningful insights
- Visualizing data patterns with Matplotlib/Seaborn/Plotly
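The cleaning and loading steps above could look roughly like the sketch below. It is not the project's actual code: it uses plain Python and the standard-library `sqlite3` module instead of Pandas so that it runs without extra dependencies. The raw field formats (a price string with a currency symbol, a rating written as a word), the `books` table layout, and the helper names `clean_record` and `load_books` are all assumptions made for illustration.

```python
import sqlite3

# Assumed mapping from rating words to numbers; the real site's
# rating format may differ.
RATING_WORDS = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}


def clean_record(raw):
    """Turn one raw scraped record (all strings) into typed values."""
    price = float(raw["price"].strip().lstrip("£$€"))
    rating = RATING_WORDS.get(raw["rating"].strip().title())
    in_stock = 1 if "in stock" in raw["availability"].lower() else 0
    return (raw["title"].strip(), price, in_stock, rating, raw["category"].strip())


def load_books(conn, raw_records):
    """Create the (hypothetical) books table if needed and insert cleaned rows."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS books (
               title TEXT, price REAL, in_stock INTEGER,
               rating INTEGER, category TEXT)"""
    )
    rows = [clean_record(r) for r in raw_records]
    conn.executemany("INSERT INTO books VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


# Made-up records, only to show the assumed input shape.
sample = [
    {"title": " Example Book A ", "price": "£51.77",
     "availability": "In stock (22 available)", "rating": "Three",
     "category": "Poetry"},
    {"title": "Example Book B", "price": "£13.99",
     "availability": "Out of stock", "rating": "five",
     "category": "Fiction"},
]

conn = sqlite3.connect(":memory:")
inserted = load_books(conn, sample)
cleaned = conn.execute(
    "SELECT title, price, in_stock, rating, category FROM books ORDER BY title"
).fetchall()
print(inserted, cleaned)
```

Storing availability as a 0/1 integer and the rating as a number keeps the later SQL aggregations simple. For PostgreSQL the same idea applies, but the connection and placeholder syntax would differ.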
| Tool | Purpose |
|---|---|
| Python | Core scripting and data analysis |
| BeautifulSoup / Requests | Web scraping |
| Pandas | Data cleaning & manipulation |
| SQLite or PostgreSQL | Data storage & SQL queries |
| Matplotlib / Seaborn | Data visualization |
| Jupyter Notebook | Project documentation |
Here are a few insights extracted:
- Average book price across all categories
- Most common book categories
- Distribution of ratings
- Out-of-stock vs in-stock books
- Category-wise pricing trends
(More insights are available in the analysis notebook.)
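As a rough illustration of how insights like these can be pulled out with SQL, here is a minimal sketch using an in-memory SQLite database. The `books` table, its column names, and the four rows are invented for the example; they are not the project's real schema or data, and the notebook's actual queries may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: in_stock is 1/0, rating is 1-5.
conn.execute(
    "CREATE TABLE books (title TEXT, price REAL, in_stock INTEGER, "
    "rating INTEGER, category TEXT)"
)
# Made-up rows, only so the queries have something to run against.
conn.executemany(
    "INSERT INTO books VALUES (?, ?, ?, ?, ?)",
    [
        ("Example Book A", 10.0, 1, 3, "Poetry"),
        ("Example Book B", 20.0, 0, 5, "Fiction"),
        ("Example Book C", 30.0, 1, 5, "Fiction"),
        ("Example Book D", 40.0, 1, 1, "Fiction"),
    ],
)

# One query per insight listed above.
QUERIES = {
    "average_price": "SELECT ROUND(AVG(price), 2) FROM books",
    "top_categories": (
        "SELECT category, COUNT(*) AS n FROM books "
        "GROUP BY category ORDER BY n DESC, category"
    ),
    "rating_distribution": (
        "SELECT rating, COUNT(*) FROM books GROUP BY rating ORDER BY rating"
    ),
    "stock_status": (
        "SELECT in_stock, COUNT(*) FROM books GROUP BY in_stock ORDER BY in_stock"
    ),
    "price_by_category": (
        "SELECT category, ROUND(AVG(price), 2) FROM books "
        "GROUP BY category ORDER BY category"
    ),
}

results = {name: conn.execute(sql).fetchall() for name, sql in QUERIES.items()}
for name, rows in results.items():
    print(name, rows)
```

Each result is a list of tuples, which can be passed to a plotting library or loaded into a Pandas DataFrame for charts.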