medss19/web-scraping-with-beautiful-soup

web-scraping-with-beautiful-soup

https://www.linkedin.com/posts/medha-agarwal-01b33725a_internship-pythonprogramming-webscraping-activity-7214991432367976448-8iL1?utm_source=share&utm_medium=member_desktop

𝗗𝗲𝘀𝗰𝗿𝗶𝗽𝘁𝗶𝗼𝗻:

  • 𝗪𝗲𝗯𝘀𝗶𝘁𝗲 𝗦𝗲𝗹𝗲𝗰𝘁𝗶𝗼𝗻: Chose BigBasket, a website with publicly accessible product listings.
  • 𝗗𝗮𝘁𝗮 𝗘𝘅𝘁𝗿𝗮𝗰𝘁𝗶𝗼𝗻: Used the Beautiful Soup library to parse the page HTML and extract product titles, prices, quantities, and discounts.
  • 𝗗𝗮𝘁𝗮 𝗦𝘁𝗼𝗿𝗮𝗴𝗲: Stored the extracted data in a CSV file for further analysis.
  • 𝗧𝗲𝗰𝗵𝗻𝗶𝗰𝗮𝗹 𝗖𝗵𝗮𝗹𝗹𝗲𝗻𝗴𝗲𝘀: Handled dynamic content loading so that the extracted data is accurate and complete.
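The storage step described above could look roughly like the following. This is a minimal sketch rather than the repository's actual code: the column names, the `write_products` helper, and the sample rows are assumptions chosen to match the fields listed in the description.

```python
import csv
import io

# Column names are assumptions based on the fields named in the description.
FIELDNAMES = ["title", "price", "quantity", "discount"]


def write_products(rows, out):
    """Write product dicts to an open text file object as CSV.

    Missing keys are written as empty cells (restval) and unexpected
    keys are skipped (extrasaction="ignore").
    """
    writer = csv.DictWriter(
        out, fieldnames=FIELDNAMES, restval="", extrasaction="ignore"
    )
    writer.writeheader()
    writer.writerows(rows)


# Made-up rows for illustration only; the second one has no discount.
sample = [
    {"title": "Example Apples", "price": "120", "quantity": "1 kg", "discount": "10% OFF"},
    {"title": "Example Rice", "price": "90", "quantity": "1 kg"},
]

buffer = io.StringIO()
write_products(sample, buffer)
print(buffer.getvalue())
```

To write to disk instead of an in-memory buffer, pass a file opened with `open("products.csv", "w", newline="", encoding="utf-8")`; the file name here is only a placeholder.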

𝗜𝗺𝗽𝗹𝗲𝗺𝗲𝗻𝘁𝗮𝘁𝗶𝗼𝗻 𝗛𝗶𝗴𝗵𝗹𝗶𝗴𝗵𝘁𝘀:

  • Used Selenium to navigate and interact with the dynamic website.
  • Used Beautiful Soup to parse the HTML content and extract product details.
  • Implemented a scrolling mechanism to handle infinite scrolling so that all products are loaded before extraction.
  • Handled missing or unavailable fields gracefully so that incomplete product entries do not break the extraction.
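The parsing and missing-data handling in the highlights above can be sketched as follows. The HTML structure and the CSS selectors (`div.product-card`, `.title`, `.price`, `.quantity`, `.discount`) are hypothetical stand-ins, not BigBasket's real markup, and the helper names are illustrative. The idea is that a missing element yields `None` instead of raising an error.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration; the real page structure will differ.
SAMPLE_HTML = """
<div class="product-card">
  <h3 class="title">Example Apples</h3>
  <span class="price">120</span>
  <span class="quantity">1 kg</span>
  <span class="discount">10% OFF</span>
</div>
<div class="product-card">
  <h3 class="title">Example Rice</h3>
  <span class="price">90</span>
</div>
"""


def text_or_none(card, selector):
    """Return the stripped text of the first match, or None if absent."""
    node = card.select_one(selector)
    return node.get_text(strip=True) if node else None


def parse_products(html):
    """Extract one dict per product card; missing fields become None."""
    soup = BeautifulSoup(html, "html.parser")
    products = []
    for card in soup.select("div.product-card"):  # selector is an assumption
        products.append({
            "title": text_or_none(card, ".title"),
            "price": text_or_none(card, ".price"),
            "quantity": text_or_none(card, ".quantity"),
            "discount": text_or_none(card, ".discount"),
        })
    return products


for product in parse_products(SAMPLE_HTML):
    print(product)
```

With Selenium, the HTML passed to `parse_products` would typically come from `driver.page_source` once the page has finished loading its products.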

𝗖𝗵𝗮𝗹𝗹𝗲𝗻𝗴𝗲𝘀 𝗙𝗮𝗰𝗲𝗱:

  • Managing dynamic content loading and ensuring the scraper captures all products as the page scrolls.
  • Handling website structure changes and ensuring the scraper adapts accordingly.
  • Optimizing the scraper to efficiently process and store large amounts of data.
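One common way to approach the first challenge is to keep scrolling until the page height stops growing. The sketch below assumes a Selenium WebDriver object is passed in as `driver`; the function name, the pause length, and the `max_rounds` safety cap are illustrative choices, not taken from the repository.

```python
import time


def scroll_to_bottom(driver, pause=2.0, max_rounds=50):
    """Scroll down until the document height stops increasing.

    driver: an object with Selenium's execute_script(script) method.
    pause: seconds to wait after each scroll for new content to load.
    max_rounds: upper bound on scrolls, to avoid looping forever.
    Returns the number of scrolls performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    rounds = 0
    while rounds < max_rounds:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # fixed wait; tune for the site's loading speed
        rounds += 1
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # no new content appeared, assume the end was reached
        last_height = new_height
    return rounds
```

A typical call would follow `driver = webdriver.Chrome()` and `driver.get(url)`, after which `driver.page_source` can be handed to Beautiful Soup. A fixed pause is simple but can miss slowly loading items, so the value may need adjusting.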
