A Python desktop application for scraping Amazon product data, with a graphical interface, Excel-style data management, and export to Excel, CSV, and JSON.
- Embedded Browser Interface: Navigate Amazon with a simple browser widget
- One-Click Scraping: Extract product data from Amazon pages with a single click
- Excel-Style Data Management: View and edit scraped data in a spreadsheet-like interface
- Multiple Table Support: Create, manage, and switch between multiple data tables
- Comprehensive Data Export: Export to Excel (.xlsx) with professional formatting
- Data Validation: Built-in validation and deduplication
- Search and Filter: Find specific products in your scraped data
- Configurable Settings: Customize scraping behavior and export options
- Clone or download this repository to your local machine
- Install Python dependencies: `pip install -r requirements.txt`
- Run the application: `python main.py`
- Launch the Application: Run `python main.py`
- Open External Browser: Click "Open in External Browser" to navigate Amazon
- Find Products: Browse to any Amazon product page
- Copy URL: Copy the product page URL from your browser
- Paste URL: Paste the URL in the application's URL field and click "Go"
- Scrape Data: Click "Scrape Data" to extract product information
- View Results: Scraped data appears in the Excel-style grid
- Export Data: Use "Export" to save data as .xlsx files
- URL Bar: Enter Amazon product page URLs
- Navigation: Back button and quick Amazon homepage access
- External Browser: Open URLs in your default browser for full navigation
- Excel-Style View: Sortable columns with row numbers
- Search: Real-time filtering of displayed data
- Cell Editing: Double-click cells to edit values
- Row Management: Delete selected rows or clear entire tables
- Multiple Tables: Create and switch between different data collections
- Table List: Left panel shows all available tables
- Quick Actions: New, delete, and export functions in toolbar
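
The real-time search described above can be pictured as a case-insensitive substring match across every cell of a row. The following is a minimal sketch under that assumption: `filter_rows` is a hypothetical helper, not a function taken from the application's source, and the sample rows are made up for illustration.

```python
from typing import Any, Dict, List


def filter_rows(rows: List[Dict[str, Any]], query: str) -> List[Dict[str, Any]]:
    """Return the rows in which any cell contains the query (case-insensitive)."""
    needle = query.strip().lower()
    if not needle:
        # An empty search box shows every row.
        return list(rows)
    return [
        row for row in rows
        if any(needle in str(value).lower() for value in row.values())
    ]


# Made-up sample data, only to show the call shape.
rows = [
    {"title": "USB-C Cable", "brand": "Acme"},
    {"title": "Wireless Mouse", "brand": "Globex"},
]
matches = filter_rows(rows, "acme")
```

Re-running a filter like this on every keystroke is usually fast enough for tables of a few thousand rows.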
The application extracts the following information from Amazon product pages:
- Basic Info: Title, Brand, ASIN
- Pricing: Current price and availability status
- Reviews: Average rating and review count
- Details: Product description and key features
- Media: Main product image URL
- Metadata: Category, scraping timestamp, source URL
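
One way to picture a scraped row is as a small record type with one attribute per field listed above. This is only a sketch: the class name `ProductRecord`, the attribute names, and the types are assumptions chosen for illustration, and the application may store its data differently.

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class ProductRecord:
    # Basic info
    title: str
    asin: str
    brand: Optional[str] = None
    # Pricing
    price: Optional[str] = None
    availability: Optional[str] = None
    # Reviews
    rating: Optional[float] = None
    review_count: Optional[int] = None
    # Details
    description: Optional[str] = None
    features: List[str] = field(default_factory=list)
    # Media
    image_url: Optional[str] = None
    # Metadata
    category: Optional[str] = None
    source_url: Optional[str] = None
    scraped_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


# Placeholder values; fields that were not found on the page stay None.
record = ProductRecord(title="Example product", asin="B000000000")
row = asdict(record)  # plain dict, convenient for a grid or an export step
```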
- Excel (.xlsx): Professional formatting with tables and styling
- Multiple Formats: CSV and JSON support
- Custom Formatting: Configurable column widths and styles
- Batch Export: Export multiple tables to separate worksheets
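
For the CSV and JSON formats, the export step can be sketched with the standard library alone. The helpers `rows_to_csv` and `rows_to_json` are hypothetical names, not the functions in `exporter.py`, and the sample rows are placeholders. The CSV header is built from the union of keys so rows with missing fields still line up.

```python
import csv
import io
import json
from typing import Any, Dict, List


def rows_to_csv(rows: List[Dict[str, Any]]) -> str:
    """Serialize rows to CSV text, using the union of all keys as the header."""
    fieldnames: List[str] = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    # restval fills in cells for rows that lack a given column.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def rows_to_json(rows: List[Dict[str, Any]]) -> str:
    """Serialize rows to indented JSON text, keeping non-ASCII characters."""
    return json.dumps(rows, indent=2, ensure_ascii=False)


# Placeholder rows for illustration only.
sample = [
    {"title": "Example product", "asin": "B000000000"},
    {"title": "Another product", "asin": "B000000001", "price": "9.99"},
]
csv_text = rows_to_csv(sample)
json_text = rows_to_json(sample)
```

Writing the result to disk is then a matter of `Path(...).write_text(csv_text, encoding="utf-8", newline="")` or the equivalent for JSON.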
The application creates a configuration file at ~/.amazon_scraper/config.json with settings for:
- Scraping: Request delays, timeouts, retry attempts
- Browser: Default URLs, user agent strings
- Export: File formats, formatting options
- UI: Theme, window settings, confirmation dialogs
- Data: Validation rules, deduplication settings
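
A common pattern for a file like this is to merge whatever the user has saved over a set of built-in defaults. The sketch below assumes that pattern; the key names (`request_delay`, `timeout`, `retry_attempts`, `default_format`) and the helper names are hypothetical and may not match what `config.py` actually writes.

```python
import copy
import json
from pathlib import Path
from typing import Any, Dict, Optional

# Hypothetical defaults; the real key names and values may differ.
DEFAULT_CONFIG: Dict[str, Any] = {
    "scraping": {"request_delay": 2.0, "timeout": 10, "retry_attempts": 3},
    "export": {"default_format": "xlsx"},
}


def merge(defaults: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively overlay overrides on a copy of defaults."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def load_config(path: Optional[Path] = None) -> Dict[str, Any]:
    """Load the config file if it exists, otherwise fall back to the defaults."""
    if path is None:
        path = Path.home() / ".amazon_scraper" / "config.json"
    if not path.exists():
        return copy.deepcopy(DEFAULT_CONFIG)
    with path.open(encoding="utf-8") as handle:
        return merge(DEFAULT_CONFIG, json.load(handle))
```

Merging section by section means a config file that only overrides one setting still picks up defaults for everything else.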
amazon_scraper/
├── main.py # Application entry point
├── gui/
│ ├── main_window.py # Main GUI window
│ ├── browser_widget.py # Browser interface component
│ └── data_grid.py # Excel-style data view
├── scraper/
│ └── amazon_scraper.py # Core scraping logic
├── data/
│ ├── manager.py # Data management operations
│ └── exporter.py # Excel export functionality
├── utils/
│ ├── config.py # Configuration management
│ └── helpers.py # Utility functions
├── requirements.txt # Python dependencies
└── README.md # This file
- Respect Amazon's Terms of Service: Only scrape data for personal research and analysis
- Rate Limiting: Built-in delays prevent overwhelming Amazon's servers
- No Commercial Use: This tool is for educational and personal use only
- Data Privacy: Be mindful of any personal information in scraped data
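
The rate limiting mentioned above can be sketched as a small throttle that enforces a minimum gap between consecutive requests. This is an illustrative sketch, not the application's implementation: `RequestThrottle` is a hypothetical class, and the clock and sleep functions are injectable only so the behavior can be checked without real waiting.

```python
import time
from typing import Callable, Optional


class RequestThrottle:
    """Enforce at least min_delay seconds between consecutive requests."""

    def __init__(
        self,
        min_delay: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_delay = min_delay
        self._clock = clock
        self._sleep = sleep
        self._last_request: Optional[float] = None

    def wait(self) -> float:
        """Pause if the previous request was too recent; return seconds slept."""
        now = self._clock()
        pause = 0.0
        if self._last_request is not None:
            pause = max(0.0, self.min_delay - (now - self._last_request))
            if pause > 0:
                self._sleep(pause)
        # Record when the upcoming request is allowed to start.
        self._last_request = now + pause
        return pause
```

A scraper would call `throttle.wait()` immediately before each HTTP request, so the first request goes out at once and later ones are spaced apart.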
- JavaScript Content: Some dynamic content may not be captured
- Anti-Bot Measures: Amazon may block excessive requests
- Page Structure Changes: Amazon updates may affect scraping accuracy
- Network Dependency: Requires stable internet connection
- The embedded browser is simplified; use an external browser for full Amazon navigation
- Copy and paste URLs from an external browser for best results
- JavaScript-heavy pages may not render completely in the embedded view
"No data found" when scraping:
- Ensure you're on a valid Amazon product page (URL contains `/dp/` or `/gp/product/`)
- Check your internet connection
- Try accessing the page in an external browser first
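
The URL check in the first item can be tested before scraping. The sketch below assumes that a product URL carries a 10-character alphanumeric ASIN right after `/dp/` or `/gp/product/`, and uses a deliberately simple host check; `extract_asin` is a hypothetical helper and the URLs are placeholders.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Assumption: the ASIN is the 10-character segment after /dp/ or /gp/product/.
_ASIN_PATTERN = re.compile(r"/(?:dp|gp/product)/([A-Z0-9]{10})(?:/|$)")
# Simple host check: "amazon.<tld>" with an optional subdomain such as "www".
_HOST_PATTERN = re.compile(r"(?:^|\.)amazon\.[a-z.]+$")


def extract_asin(url: str) -> Optional[str]:
    """Return the ASIN if the URL looks like an Amazon product page, else None."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    if not _HOST_PATTERN.search(host):
        return None
    match = _ASIN_PATTERN.search(parsed.path)
    return match.group(1) if match else None


# Placeholder URL; the ASIN is made up.
asin = extract_asin("https://www.amazon.com/Some-Title/dp/B0EXAMPLE1/ref=sr_1_1")
```

A `None` result is a reasonable trigger for a "not a product page" message before any request is sent.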
Application won't start:
- Verify Python 3.7+ is installed
- Install all requirements: `pip install -r requirements.txt`
- Check for error messages in the console
Export fails:
- Ensure you have write permissions to the target directory
- Check that the file isn't open in another application
- Try a different file format (CSV instead of Excel)
Slow scraping:
- The built-in delays are intentional; they keep request volume respectful of Amazon's servers
- Reduce the delay settings in the configuration file if needed (not recommended)
- requests: HTTP requests and session management
- beautifulsoup4: HTML parsing and data extraction
- pandas: Data manipulation and analysis
- openpyxl: Excel file generation and formatting
- tkinter: GUI framework (bundled with most Python installations; some Linux distributions package it separately, for example as python3-tk)
- New Data Fields: Modify `amazon_scraper.py` to extract additional product information
- Export Formats: Add new export formats in `exporter.py`
- UI Enhancements: Extend GUI components in the `gui/` directory
- Custom Scrapers: Create scrapers for other e-commerce sites
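
For custom scrapers, one possible structure is a shared base class that each site-specific scraper implements. This is a sketch of that idea only: `BaseScraper`, `ExampleShopScraper`, and the `parse` method are hypothetical names and do not describe how `amazon_scraper.py` is organized. The toy parser reads just the page title with string searches to stay dependency-free; a real scraper would more likely use beautifulsoup4.

```python
from abc import ABC, abstractmethod
from typing import Dict


class BaseScraper(ABC):
    """Hypothetical common interface for site-specific scrapers."""

    domain: str = ""

    @abstractmethod
    def parse(self, html: str) -> Dict[str, str]:
        """Turn a page's HTML into a flat dict of product fields."""


class ExampleShopScraper(BaseScraper):
    """Toy scraper for a made-up site; extracts only the <title> text."""

    domain = "shop.example"

    def parse(self, html: str) -> Dict[str, str]:
        start = html.find("<title>")
        end = html.find("</title>")
        if start == -1 or end == -1 or end < start:
            return {"title": ""}
        return {"title": html[start + len("<title>"):end].strip()}


result = ExampleShopScraper().parse("<html><title> Demo item </title></html>")
```

Keeping the return value a flat dict means new scrapers can feed the same grid and export code without changes.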
This project is for educational purposes only. Users are responsible for complying with Amazon's Terms of Service and applicable laws regarding web scraping.
For issues, feature requests, or questions:
- Check the troubleshooting section above
- Review the configuration options
- Ensure you're using the latest version of the application