
SiddharthChaberia/amazon-scrapper-selenium

Amazon Catalog Scraping Using Selenium

This is a web scraping script that uses the Selenium module to search Amazon for a specific query and collect product information. It stores the name, original price, discounted price, and link of each product in a CSV file using the pandas module.

This is a fairly straightforward way to scrape product information, but before running the script, make sure you have a rotating proxy server set up. It helps protect your real IP address from being banned by Amazon.
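As a rough illustration of the rotating-proxy idea, the sketch below cycles through a list of proxy addresses and builds the `--proxy-server` argument that Chrome accepts. The proxy addresses and the names `PROXIES` and `proxy_arguments` are hypothetical and not taken from this repository; how scrapper.py actually configures its driver may differ.

```python
from itertools import cycle, islice

# Hypothetical proxy endpoints; replace them with the ones your provider gives you.
PROXIES = [
    "http://10.0.0.1:8080",
    "http://10.0.0.2:8080",
    "http://10.0.0.3:8080",
]


def proxy_arguments(proxies):
    """Yield Chrome command-line arguments, rotating through the proxies forever."""
    for proxy in cycle(proxies):
        yield f"--proxy-server={proxy}"


# Arguments for four browser sessions; the fourth wraps around to the first proxy.
arguments = list(islice(proxy_arguments(PROXIES), 4))
print(arguments[0])

# With Selenium installed, each argument could be applied roughly like this:
#   options = webdriver.ChromeOptions()
#   options.add_argument(arguments[0])
#   driver = webdriver.Chrome(options=options)
```

Starting a fresh browser session per proxy is one simple way to rotate; a managed proxy service that rotates addresses behind a single endpoint would need only one such argument.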

First, install all the required dependencies with the following command:

pip install -r requirements.txt

Now run the main scraping script, which collects the data. Feel free to change the value of the element variable in the code to something of your own choice.

python scrapper.py

The link in the script might need to be updated, as it changes dynamically for different users.
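The note about the link can be made concrete with a small sketch that builds a search URL per query and result page, and decides where each page's HTML would be saved. The `amazon.com` domain, the `k` and `page` query parameters, and the names `build_search_url`, `output_path` and `save_pages` are assumptions for illustration; the URL hard-coded in scrapper.py may look different, and the repository may store individual product elements rather than whole pages.

```python
from pathlib import Path
from urllib.parse import urlencode

# Assumed base URL; swap in the regional domain you actually use.
BASE_URL = "https://www.amazon.com/s"


def build_search_url(query, page=1):
    """Return a search URL for the given query and result page."""
    return f"{BASE_URL}?{urlencode({'k': query, 'page': page})}"


def output_path(query, page, folder="data"):
    """Return the file a fetched result page would be written to."""
    safe_query = "_".join(query.lower().split())
    return Path(folder) / f"{safe_query}_{page}.html"


def save_pages(query, pages):
    """Fetch each result page with Selenium and store its HTML (requires selenium)."""
    from selenium import webdriver  # imported lazily so the helpers above work without it

    driver = webdriver.Chrome()
    try:
        for page in range(1, pages + 1):
            driver.get(build_search_url(query, page))
            path = output_path(query, page)
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(driver.page_source, encoding="utf-8")
    finally:
        driver.quit()


print(build_search_url("gaming laptop", page=2))
```

Building the URL from the query in one place means only `BASE_URL` has to change if the link stops working for your region or account.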

After the above script runs without errors, the data/ folder will be populated with HTML files, each containing product information for the searched query.

Now it's time to parse the necessary information from these files.

python collector.py

After the above script finishes successfully, data.csv is populated with the required entries.
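The collection step can be sketched as follows: parse one saved product snippet and write its name, original price, discounted price and link as a CSV row. To stay dependency-free, this sketch swaps in Python's built-in `html.parser` and `csv` modules instead of the pandas module the repository uses. The class names `price-discounted` and `price-original`, the sample markup, and the names `ProductParser` and `parse_product` are hypothetical; inspect the files in data/ to find the real selectors.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical class names; the real Amazon markup will differ.
FIELD_BY_CLASS = {
    "price-discounted": "discounted_price",
    "price-original": "original_price",
}
COLUMNS = ["name", "original_price", "discounted_price", "link"]


class ProductParser(HTMLParser):
    """Collect the name, prices and link from one saved product snippet."""

    def __init__(self):
        super().__init__()
        self.row = dict.fromkeys(COLUMNS, "")
        self._field = None  # column that the next text node belongs to

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and not self.row["link"]:
            self.row["link"] = attrs.get("href") or ""
        elif tag == "h2":
            self._field = "name"
        elif tag == "span":
            self._field = FIELD_BY_CLASS.get(attrs.get("class"))

    def handle_data(self, data):
        if self._field and data.strip():
            self.row[self._field] = data.strip()
            self._field = None


def parse_product(html):
    """Return one CSV row (as a dict) for the given HTML snippet."""
    parser = ProductParser()
    parser.feed(html)
    return parser.row


# Made-up sample standing in for one file from the data/ folder.
SAMPLE = (
    '<div><a href="/dp/EXAMPLE"><h2>Example Laptop</h2></a>'
    '<span class="price-discounted">499</span>'
    '<span class="price-original">599</span></div>'
)

buffer = io.StringIO()  # stands in for data.csv
writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
writer.writeheader()
writer.writerow(parse_product(SAMPLE))
print(buffer.getvalue())
```

In a real run, the loop would read every file in data/ and collect the rows in a list; with pandas, `pandas.DataFrame(rows).to_csv("data.csv", index=False)` would then write the same table.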

Although I used Selenium for scraping here, it is mainly used by testers to check how robust a site is against edge-case inputs.

Feel free to submit issues or pull requests to improve this project. I was just learning about Flask and Dockerfiles.

