Amazon Web Scraper Desktop Application

A Python desktop application for scraping Amazon product data, with an intuitive graphical interface, Excel-style data management, and export to several file formats.

Features

Embedded Browser Interface: Navigate Amazon with a simple browser widget
One-Click Scraping: Extract product data from Amazon pages with a single click
Excel-Style Data Management: View and edit scraped data in a spreadsheet-like interface
Multiple Table Support: Create, manage, and switch between multiple data tables
Comprehensive Data Export: Export to Excel (.xlsx) with professional formatting, or to CSV and JSON
Data Validation: Built-in validation and deduplication
Search and Filter: Find specific products in your scraped data
Configurable Settings: Customize scraping behavior and export options

Installation

Clone or download this repository to your local machine
Install Python dependencies:
```
pip install -r requirements.txt
```
Run the application:
```
python main.py
```

Usage

Getting Started

Launch the Application: Run python main.py
Open External Browser: Click "Open in External Browser" to navigate Amazon
Find Products: Browse to any Amazon product page
Copy URL: Copy the product page URL from your browser
Paste URL: Paste the URL in the application's URL field and click "Go"
Scrape Data: Click "Scrape Data" to extract product information
View Results: Scraped data appears in the Excel-style grid
Export Data: Use "Export" to save data as .xlsx files

Main Interface

Browser Panel

URL Bar: Enter Amazon product page URLs
Navigation: Back button and quick Amazon homepage access
External Browser: Open URLs in your default browser for full navigation

Data Grid

Excel-Style View: Sortable columns with row numbers
Search: Real-time filtering of displayed data
Cell Editing: Double-click cells to edit values
Row Management: Delete selected rows or clear entire tables
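
The search behavior described above can be pictured as a case-insensitive substring match over every cell in a row. This is only a minimal sketch under that assumption; `filter_rows` and the sample rows are hypothetical and are not taken from data_grid.py.

```python
from typing import Dict, List


def filter_rows(rows: List[Dict[str, object]], query: str) -> List[Dict[str, object]]:
    """Return the rows in which any cell contains the query (case-insensitive)."""
    needle = query.strip().lower()
    if not needle:
        # An empty search box shows every row.
        return list(rows)
    return [
        row
        for row in rows
        if any(needle in str(value).lower() for value in row.values())
    ]


# Hypothetical sample data, for illustration only.
rows = [
    {"title": "USB-C Cable", "brand": "Acme", "price": "9.99"},
    {"title": "Wireless Mouse", "brand": "Globex", "price": "24.50"},
]
matches = filter_rows(rows, "acme")
```

Re-running a function like this on every keystroke is one simple way to get real-time filtering for tables of modest size.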

Table Management

Multiple Tables: Create and switch between different data collections
Table List: Left panel shows all available tables
Quick Actions: New, delete, and export functions in toolbar

Scraped Data Fields

The application extracts the following information from Amazon product pages:

Basic Info: Title, Brand, ASIN
Pricing: Current price and availability status
Reviews: Average rating and review count
Details: Product description and key features
Media: Main product image URL
Metadata: Category, scraping timestamp, source URL
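
As an illustration, one scraped row could be modeled with a dataclass like the one below. The class name `ProductRecord`, the attribute names, and the types are assumptions made for this sketch; the actual structure used in amazon_scraper.py may differ.

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class ProductRecord:
    """Hypothetical container for one scraped product."""

    # Basic info
    title: str
    asin: str
    brand: Optional[str] = None
    # Pricing
    price: Optional[str] = None
    availability: Optional[str] = None
    # Reviews
    rating: Optional[float] = None
    review_count: Optional[int] = None
    # Details
    description: Optional[str] = None
    features: List[str] = field(default_factory=list)
    # Media
    image_url: Optional[str] = None
    # Metadata
    category: Optional[str] = None
    source_url: Optional[str] = None
    scraped_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


# Placeholder values, not real product data.
record = ProductRecord(title="Example Product", asin="B000000000", brand="Acme")
row = asdict(record)  # plain dict, convenient for a grid view or an export step
```

Optional fields default to `None` because a given product page may not expose every value.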

Export Options

Excel (.xlsx): Professional formatting with tables and styling
Multiple Formats: CSV and JSON support
Custom Formatting: Configurable column widths and styles
Batch Export: Export multiple tables to separate worksheets
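
The CSV and JSON formats can be sketched with the standard library alone. The helper names `rows_to_csv` and `rows_to_json` are hypothetical and do not come from exporter.py; the Excel path, which the project handles with openpyxl, is not shown here.

```python
import csv
import io
import json
from typing import Dict, List


def rows_to_csv(rows: List[Dict[str, object]]) -> str:
    """Serialize rows to CSV text, using the union of keys as the header."""
    if not rows:
        return ""
    # Collect column names in first-seen order across all rows.
    fieldnames = list(dict.fromkeys(key for row in rows for key in row))
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # missing keys are written as empty cells
    return buffer.getvalue()


def rows_to_json(rows: List[Dict[str, object]]) -> str:
    """Serialize rows to pretty-printed JSON text."""
    return json.dumps(rows, indent=2, ensure_ascii=False)


# Hypothetical sample data, for illustration only.
sample = [
    {"title": "USB-C Cable", "price": "9.99"},
    {"title": "Mouse, wireless", "price": "24.50"},
]
csv_text = rows_to_csv(sample)
json_text = rows_to_json(sample)
```

Returning text rather than writing to disk keeps the sketch easy to test; a real exporter would write the result to the path chosen in the save dialog.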

Configuration

The application creates a configuration file at ~/.amazon_scraper/config.json with settings for:

Scraping: Request delays, timeouts, retry attempts
Browser: Default URLs, user agent strings
Export: File formats, formatting options
UI: Theme, window settings, confirmation dialogs
Data: Validation rules, deduplication settings
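
A common way to handle a file like this is to merge the user's settings over built-in defaults. The sketch below assumes that approach; the key names in `DEFAULTS` (for example `request_delay_seconds`) are invented for illustration and may not match the keys the application actually writes.

```python
import copy
import json
from pathlib import Path

CONFIG_PATH = Path.home() / ".amazon_scraper" / "config.json"

# Hypothetical defaults; the real config.json may use different keys.
DEFAULTS = {
    "scraping": {"request_delay_seconds": 2.0, "timeout_seconds": 10, "max_retries": 3},
    "export": {"default_format": "xlsx"},
}


def merge(defaults: dict, overrides: dict) -> dict:
    """Recursively overlay user settings on top of the defaults."""
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def load_config(path=CONFIG_PATH) -> dict:
    """Load settings, falling back to defaults if the file is missing or invalid."""
    try:
        user = json.loads(Path(path).read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        user = {}
    if not isinstance(user, dict):
        user = {}
    return merge(copy.deepcopy(DEFAULTS), user)
```

With this pattern, a config file that sets only one value still yields a complete settings dictionary.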

Project Structure

amazon_scraper/
├── main.py                 # Application entry point
├── gui/
│   ├── main_window.py     # Main GUI window
│   ├── browser_widget.py  # Browser interface component
│   └── data_grid.py       # Excel-style data view
├── scraper/
│   └── amazon_scraper.py  # Core scraping logic
├── data/
│   ├── manager.py         # Data management operations
│   └── exporter.py        # Excel export functionality
├── utils/
│   ├── config.py          # Configuration management
│   └── helpers.py         # Utility functions
├── requirements.txt       # Python dependencies
└── README.md             # This file

Important Notes

Legal and Ethical Usage

Respect Amazon's Terms of Service: Only scrape data for personal research and analysis
Rate Limiting: Built-in delays prevent overwhelming Amazon's servers
No Commercial Use: This tool is for educational and personal use only
Data Privacy: Be mindful of any personal information in scraped data

Technical Limitations

JavaScript Content: Some dynamic content may not be captured
Anti-Bot Measures: Amazon may block excessive requests
Page Structure Changes: Amazon updates may affect scraping accuracy
Network Dependency: Requires stable internet connection

Browser Requirements

The embedded browser is simplified; use an external browser for full Amazon navigation
For best results, copy and paste URLs from an external browser
JavaScript-heavy pages may not render completely in the embedded view

Troubleshooting

Common Issues

"No data found" when scraping:

Ensure you're on a valid Amazon product page (URL contains /dp/ or /gp/product/)
Check your internet connection
Try accessing the page in an external browser first
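
To check a URL before scraping, a small helper along these lines can be used. It is a rough sketch, not the application's own validation: `extract_asin` is a hypothetical name, and the pattern assumes the product identifier after /dp/ or /gp/product/ is a 10-character alphanumeric ASIN.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Assumption: the ASIN is 10 uppercase letters or digits after /dp/ or /gp/product/.
_ASIN_PATTERN = re.compile(r"/(?:dp|gp/product)/([A-Z0-9]{10})(?:/|$)")


def extract_asin(url: str) -> Optional[str]:
    """Return the ASIN if the URL looks like an Amazon product page, else None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    # Loose host check: accepts www.amazon.com, amazon.co.uk, and similar.
    if "amazon" not in host.split("."):
        return None
    match = _ASIN_PATTERN.search(parsed.path)
    return match.group(1) if match else None
```

A search results URL or a non-Amazon URL returns `None`, which corresponds to the "No data found" case above.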

Application won't start:

Verify Python 3.7+ is installed
Install all requirements: pip install -r requirements.txt
Check for error messages in the console

Export fails:

Ensure you have write permissions to the target directory
Check that the file isn't open in another application
Try a different file format (CSV instead of Excel)

Slow scraping:

Built-in delays are intentional; they keep the request rate respectful of Amazon's servers
Reduce delay settings in configuration if needed (not recommended)
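
To show why delays and retries add up, here is a minimal sketch of a delayed retry loop. It is not the application's actual logic: `fetch_with_retries`, its parameters, and the linearly growing wait are all assumptions made for illustration.

```python
import time
from typing import Callable, Optional


def fetch_with_retries(
    fetch: Callable[[], str],
    delay_seconds: float = 2.0,
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Call fetch() up to max_retries times, pausing before every attempt.

    The pause grows linearly (delay, 2 * delay, 3 * delay, ...), so repeated
    failures slow the client down instead of hammering the server.
    """
    for attempt in range(1, max_retries + 1):
        sleep(delay_seconds * attempt)
        try:
            return fetch()
        except OSError:
            # Network-level errors (ConnectionError, TimeoutError, ...) are
            # OSError subclasses; move on to the next attempt.
            continue
    return None
```

With the defaults in this sketch, a page that succeeds only on the third attempt costs 2 + 4 + 6 = 12 seconds of waiting, which is the kind of slowdown the delay setting controls.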

Development

Dependencies

requests: HTTP requests and session management
beautifulsoup4: HTML parsing and data extraction
pandas: Data manipulation and analysis
openpyxl: Excel file generation and formatting
tkinter: GUI framework (included with most Python installations; some Linux distributions package it separately, for example as python3-tk)

Extending the Application

New Data Fields: Modify amazon_scraper.py to extract additional product information
Export Formats: Add new export formats in exporter.py
UI Enhancements: Extend GUI components in the gui/ directory
Custom Scrapers: Create scrapers for other e-commerce sites

License

This project is for educational purposes only. Users are responsible for complying with Amazon's Terms of Service and applicable laws regarding web scraping.

Support

For issues, feature requests, or questions:

Check the troubleshooting section above
Review the configuration options
Ensure you're using the latest version of the application
