zanngujjar/Amazon-Item-Scrapper

Amazon Web Scraper Desktop Application

A Python desktop application for scraping Amazon product data, with a graphical interface, Excel-style data management, and export to Excel, CSV, and JSON.

Features

  • Embedded Browser Interface: Navigate Amazon with a simple browser widget
  • One-Click Scraping: Extract product data from Amazon pages with a single click
  • Excel-Style Data Management: View and edit scraped data in a spreadsheet-like interface
  • Multiple Table Support: Create, manage, and switch between multiple data tables
  • Comprehensive Data Export: Export to Excel (.xlsx) with professional formatting
  • Data Validation: Built-in validation and deduplication
  • Search and Filter: Find specific products in your scraped data
  • Configurable Settings: Customize scraping behavior and export options
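
As an illustration of the deduplication feature, here is a minimal sketch that keeps the first row seen for each ASIN. The function name `deduplicate_by_asin` and the `asin` dictionary key are assumptions made for this example; the project's own logic in `data/manager.py` may work differently.

```python
from typing import Dict, Iterable, List


def deduplicate_by_asin(rows: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep the first row seen for each ASIN; drop rows without an ASIN.

    Hypothetical helper: the "asin" key is an assumed field name.
    """
    seen = set()
    unique = []
    for row in rows:
        # Normalize so "b000000001" and "B000000001" count as the same item.
        asin = (row.get("asin") or "").strip().upper()
        if not asin or asin in seen:
            continue  # missing identifier or duplicate
        seen.add(asin)
        unique.append(row)
    return unique
```

Keying on the ASIN rather than the title avoids treating two listings with the same name as one product.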

Installation

  1. Clone or download this repository to your local machine

  2. Install Python dependencies:

    pip install -r requirements.txt

  3. Run the application:

    python main.py

Usage

Getting Started

  1. Launch the Application: Run python main.py
  2. Open External Browser: Click "Open in External Browser" to navigate Amazon
  3. Find Products: Browse to any Amazon product page
  4. Copy URL: Copy the product page URL from your browser
  5. Paste URL: Paste the URL in the application's URL field and click "Go"
  6. Scrape Data: Click "Scrape Data" to extract product information
  7. View Results: Scraped data appears in the Excel-style grid
  8. Export Data: Use "Export" to save data as .xlsx files

Main Interface

Browser Panel

  • URL Bar: Enter Amazon product page URLs
  • Navigation: Back button and quick Amazon homepage access
  • External Browser: Open URLs in your default browser for full navigation

Data Grid

  • Excel-Style View: Sortable columns with row numbers
  • Search: Real-time filtering of displayed data
  • Cell Editing: Double-click cells to edit values
  • Row Management: Delete selected rows or clear entire tables
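
The search box can be thought of as a case-insensitive substring match across every cell of a row. The sketch below shows that idea on plain dictionaries; `filter_rows` is a hypothetical name, and the actual filtering in `gui/data_grid.py` may be implemented differently.

```python
from typing import Dict, List


def filter_rows(rows: List[Dict[str, object]], query: str) -> List[Dict[str, object]]:
    """Return the rows in which any cell contains the query, ignoring case."""
    needle = query.strip().lower()
    if not needle:
        return list(rows)  # an empty search shows everything
    return [
        row
        for row in rows
        if any(needle in str(value).lower() for value in row.values())
    ]
```

Calling this on every keystroke is what makes the filtering feel real-time for tables of modest size.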

Table Management

  • Multiple Tables: Create and switch between different data collections
  • Table List: Left panel shows all available tables
  • Quick Actions: New, delete, and export functions in toolbar

Scraped Data Fields

The application extracts the following information from Amazon product pages:

  • Basic Info: Title, Brand, ASIN
  • Pricing: Current price and availability status
  • Reviews: Average rating and review count
  • Details: Product description and key features
  • Media: Main product image URL
  • Metadata: Category, scraping timestamp, source URL
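
One way to picture a scraped row is as a small record type with one attribute per field listed above. The class and attribute names below are assumptions for illustration, not the names used in the application.

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


def _utc_now() -> str:
    """ISO 8601 timestamp for the moment the record was created."""
    return datetime.now(timezone.utc).isoformat()


@dataclass
class ProductRecord:
    # Hypothetical record layout mirroring the field list in this README.
    title: str
    asin: str
    source_url: str
    brand: Optional[str] = None
    price: Optional[str] = None          # kept as text, e.g. with currency symbol
    availability: Optional[str] = None
    rating: Optional[float] = None
    review_count: Optional[int] = None
    description: Optional[str] = None
    features: List[str] = field(default_factory=list)
    image_url: Optional[str] = None
    category: Optional[str] = None
    scraped_at: str = field(default_factory=_utc_now)


record = ProductRecord(
    title="Example Widget",
    asin="B000000001",
    source_url="https://www.amazon.com/dp/B000000001",
)
row = asdict(record)  # plain dict, ready for a grid row or an export
```

Fields that a page does not provide stay `None`, so a missing rating is distinguishable from a rating of zero.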

Export Options

  • Excel (.xlsx): Professional formatting with tables and styling
  • Multiple Formats: CSV and JSON support
  • Custom Formatting: Configurable column widths and styles
  • Batch Export: Export multiple tables to separate worksheets
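
For the CSV and JSON formats, a minimal sketch using only the standard library could look like the following. The function names are hypothetical, and the application's own exporter (which relies on pandas and openpyxl for Excel output) is not reproduced here.

```python
import csv
import json
from typing import Dict, List


def export_csv(rows: List[Dict[str, object]], path: str) -> None:
    """Write rows to CSV; the header is the union of keys in first-seen order."""
    fieldnames: List[str] = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    # newline="" lets the csv module control line endings on every platform.
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)  # keys missing from a row are written as empty cells


def export_json(rows: List[Dict[str, object]], path: str) -> None:
    """Write rows to a JSON array, keeping non-ASCII characters readable."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(rows, fh, ensure_ascii=False, indent=2)
```

Falling back to one of these formats is also a quick way to check whether an export failure is specific to Excel output.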

Configuration

The application creates a configuration file at ~/.amazon_scraper/config.json with settings for:

  • Scraping: Request delays, timeouts, retry attempts
  • Browser: Default URLs, user agent strings
  • Export: File formats, formatting options
  • UI: Theme, window settings, confirmation dialogs
  • Data: Validation rules, deduplication settings
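
The sketch below shows one way such a file could be read and merged over defaults. The key names (`request_delay`, `timeout`, `max_retries`, `default_format`) and their values are assumptions for illustration; check the generated `config.json` for the real ones.

```python
import json
from pathlib import Path
from typing import Any, Dict

# Hypothetical defaults; the real keys live in the generated config.json.
DEFAULTS: Dict[str, Dict[str, Any]] = {
    "scraping": {"request_delay": 2.0, "timeout": 10, "max_retries": 3},
    "export": {"default_format": "xlsx"},
}

CONFIG_PATH = Path.home() / ".amazon_scraper" / "config.json"


def load_config(path: Path = CONFIG_PATH) -> Dict[str, Any]:
    """Return the defaults, overridden section by section by the user's file."""
    config: Dict[str, Any] = {name: dict(values) for name, values in DEFAULTS.items()}
    if path.exists():
        with open(path, encoding="utf-8") as fh:
            user = json.load(fh)
        for section, values in user.items():
            if isinstance(values, dict):
                config.setdefault(section, {}).update(values)
            else:
                config[section] = values
    return config
```

Merging per section means a user file that sets only one value still inherits the remaining defaults.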

Project Structure

amazon_scraper/
├── main.py                 # Application entry point
├── gui/
│   ├── main_window.py     # Main GUI window
│   ├── browser_widget.py  # Browser interface component
│   └── data_grid.py       # Excel-style data view
├── scraper/
│   └── amazon_scraper.py  # Core scraping logic
├── data/
│   ├── manager.py         # Data management operations
│   └── exporter.py        # Excel export functionality
├── utils/
│   ├── config.py          # Configuration management
│   └── helpers.py         # Utility functions
├── requirements.txt       # Python dependencies
└── README.md             # This file

Important Notes

Legal and Ethical Usage

  • Respect Amazon's Terms of Service: Only scrape data for personal research and analysis
  • Rate Limiting: Built-in delays prevent overwhelming Amazon's servers
  • No Commercial Use: This tool is for educational and personal use only
  • Data Privacy: Be mindful of any personal information in scraped data
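
To make the rate-limiting point concrete, here is a minimal sketch of a limiter that enforces a minimum gap between requests. The `RateLimiter` class and the two-second interval are assumptions for this example, not the application's actual implementation or default.

```python
import time
from typing import Callable, Optional


class RateLimiter:
    """Block until at least `min_interval` seconds have passed since the last call."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock    # injectable so the limiter can be tested without waiting
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> None:
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


limiter = RateLimiter(min_interval=2.0)  # assumed delay, in seconds
# Call limiter.wait() immediately before each HTTP request.
```

Using a monotonic clock keeps the interval correct even if the system time changes while the application is running.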

Technical Limitations

  • JavaScript Content: Some dynamic content may not be captured
  • Anti-Bot Measures: Amazon may block excessive requests
  • Page Structure Changes: Amazon updates may affect scraping accuracy
  • Network Dependency: Requires a stable internet connection

Browser Requirements

  • The embedded browser is simplified; use an external browser for full Amazon navigation
  • Copy and paste URLs from an external browser for best results
  • JavaScript-heavy pages may not render completely in the embedded view

Troubleshooting

Common Issues

"No data found" when scraping:

  • Ensure you're on a valid Amazon product page (URL contains /dp/ or /gp/product/)
  • Check your internet connection
  • Try accessing the page in an external browser first
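
A quick way to check the first point is to test the URL before scraping. The sketch below looks for `/dp/` or `/gp/product/` followed by an identifier; it assumes the ASIN is ten uppercase letters or digits, and `extract_asin` is a hypothetical helper rather than a function from this project.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Assumption: an ASIN is ten uppercase letters or digits.
_ASIN_IN_PATH = re.compile(r"/(?:dp|gp/product)/([A-Z0-9]{10})(?:/|$)")


def extract_asin(url: str) -> Optional[str]:
    """Return the ASIN if the URL looks like an Amazon product page, else None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if "amazon" not in host.split("."):
        return None  # not an Amazon domain
    match = _ASIN_IN_PATH.search(parsed.path)
    return match.group(1) if match else None
```

A `None` result for a URL copied from the browser usually means the page is a search or category listing rather than a product page.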

Application won't start:

  • Verify Python 3.7+ is installed
  • Install all requirements: pip install -r requirements.txt
  • Check for error messages in the console

Export fails:

  • Ensure you have write permissions to the target directory
  • Check that the file isn't open in another application
  • Try a different file format (CSV instead of Excel)

Slow scraping:

  • The built-in delays are intentional; they limit the load placed on Amazon's servers
  • Reduce delay settings in configuration if needed (not recommended)

Development

Dependencies

  • requests: HTTP requests and session management
  • beautifulsoup4: HTML parsing and data extraction
  • pandas: Data manipulation and analysis
  • openpyxl: Excel file generation and formatting
  • tkinter: GUI framework (included with most Python installations)

Extending the Application

  • New Data Fields: Modify amazon_scraper.py to extract additional product information
  • Export Formats: Add new export formats in exporter.py
  • UI Enhancements: Extend GUI components in the gui/ directory
  • Custom Scrapers: Create scrapers for other e-commerce sites
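
As a starting point for a new data field, the sketch below pulls the text of an element by its `id` attribute. It uses the standard library's `html.parser` so that it runs without extra packages; the project itself lists beautifulsoup4 for parsing. The helper name `extract_text_by_id` and the `productTitle` id in the usage line are assumptions for illustration, and Amazon's markup may differ or change.

```python
from html.parser import HTMLParser
from typing import List, Optional

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "img", "input", "hr", "meta", "link"}


class _TextById(HTMLParser):
    """Collect the text found inside the element with the given id."""

    def __init__(self, element_id: str) -> None:
        super().__init__()
        self.element_id = element_id
        self._depth = 0              # > 0 while inside the target element
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._depth:
            self._depth += 1
        elif dict(attrs).get("id") == self.element_id:
            self._depth = 1

    def handle_endtag(self, tag):
        if self._depth and tag not in VOID_TAGS:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.chunks.append(data)


def extract_text_by_id(html: str, element_id: str) -> Optional[str]:
    """Return the whitespace-normalized text of the element, or None if absent."""
    parser = _TextById(element_id)
    parser.feed(html)
    parser.close()
    text = " ".join("".join(parser.chunks).split())
    return text or None


sample = '<div><span id="productTitle">  Example <b>Widget</b><br> 2000 </span></div>'
title = extract_text_by_id(sample, "productTitle")
```

Returning `None` for a missing element lets the caller leave the field empty instead of failing the whole scrape.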

License

This project is for educational purposes only. Users are responsible for complying with Amazon's Terms of Service and applicable laws regarding web scraping.

Support

For issues, feature requests, or questions:

  1. Check the troubleshooting section above
  2. Review the configuration options
  3. Ensure you're using the latest version of the application

About

Client project for scraping Amazon items and exporting data as a .xlsx file
