A Selenium-based scraper for checking if Instagram posts contain sensitive content.
This project is a Selenium-based scraper that checks whether Instagram posts contain sensitive content. It uses headless Chrome to open each Instagram post URL and inspects the page source for signs of sensitive content.
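The approach described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual implementation: the phrases in `ASSUMED_MARKERS` are hypothetical placeholders (this README does not say which text the scraper looks for), and the function names are made up for the example.

```python
from typing import Iterable

# Hypothetical placeholder phrases; the real markers the project looks for
# are not documented in this README.
ASSUMED_MARKERS = ("sensitive content",)


def page_has_marker(page_source: str, markers: Iterable[str] = ASSUMED_MARKERS) -> bool:
    """Return True if any marker occurs in the page source (case-insensitive)."""
    lowered = page_source.lower()
    return any(marker.lower() in lowered for marker in markers)


def fetch_page_source(url: str) -> str:
    """Load a URL in headless Chrome and return the rendered page source."""
    # Imported here so the pure text check above works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless=new")  # run Chrome without a visible window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()  # always release the browser process


def is_sensitive(url: str) -> bool:
    """Fetch the post and test its source against the assumed markers."""
    return page_has_marker(fetch_page_source(url))
```

Keeping the text check separate from the browser code means the matching logic can be tried on saved HTML without launching Chrome.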
To get a local copy up and running, follow these simple steps.
- Clone the repo
    git clone https://github.com/avivbenami/instagram-post-scraper.git
- Navigate to the project directory
    cd instagram-post-scraper
- Create a virtual environment
    python -m venv .venv
- Activate the virtual environment
    .\.venv\Scripts\activate
- Install dependencies
    pip install -r requirements.txt
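After the steps above, a short script can confirm that the environment is set up as expected. This is an optional sketch, not part of the project; it assumes that `selenium` is among the packages listed in `requirements.txt`, and the helper names are hypothetical.

```python
import importlib.util
import sys


def in_virtualenv() -> bool:
    """True when the interpreter runs inside a venv (prefix differs from base)."""
    return sys.prefix != sys.base_prefix


def is_installed(package: str) -> bool:
    """True when the named top-level package can be found on the import path."""
    return importlib.util.find_spec(package) is not None


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("selenium importable:", is_installed("selenium"))
```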
    import time

    # InstagramPostScraper is this project's scraper class; import it from the
    # module in the repository where it is defined.


    def get_sens(url_list: list) -> list:
        """Return the URLs from url_list whose posts are flagged as sensitive."""
        sensitive_urls = []
        for url in url_list:
            # Create an instance of InstagramPostScraper for the current URL
            scraper = InstagramPostScraper(url)
            # Sleep for 2 seconds between requests
            time.sleep(2)
            # Check if the post is sensitive
            if scraper.is_sensitive():
                sensitive_urls.append(url)
        return sensitive_urls


    if __name__ == "__main__":
        urls = ["https://www.instagram.com/cristiano/reel/C09VyjZtoyx/",
                "https://www.instagram.com/reel/C0tWyIktI1Y"]
        print(get_sens(urls))

- Improve error handling
- Proxy management
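
The two roadmap items above could take roughly the following shape. This is only a sketch of one possible direction, not existing or planned project code: `chrome_arguments`, `check_urls`, and the `is_sensitive` callable are hypothetical names, and any proxy address passed in is up to the caller. `--proxy-server` is a standard Chrome command-line switch.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


def chrome_arguments(proxy: Optional[str] = None) -> List[str]:
    """Build Chrome command-line arguments, optionally routing through a proxy."""
    args = ["--headless=new"]
    if proxy:
        # Chrome's own switch for sending traffic through a proxy server.
        args.append(f"--proxy-server={proxy}")
    return args


def check_urls(
    urls: List[str],
    is_sensitive: Callable[[str], bool],
    delay: float = 2.0,
) -> Tuple[List[str], Dict[str, str]]:
    """Check each URL, collecting failures instead of aborting the whole run.

    Returns (sensitive_urls, errors), where errors maps a URL to the message
    of the exception raised while checking it.
    """
    sensitive: List[str] = []
    errors: Dict[str, str] = {}
    for index, url in enumerate(urls):
        if index > 0:
            time.sleep(delay)  # pause between consecutive requests
        try:
            if is_sensitive(url):
                sensitive.append(url)
        except Exception as exc:  # broad on purpose: one bad URL should not stop the batch
            errors[url] = str(exc)
    return sensitive, errors
```

With the project's class, the callable could be `lambda url: InstagramPostScraper(url).is_sensitive()`, and each string returned by `chrome_arguments` could be passed to Selenium's `Options.add_argument`.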
Project Link: https://github.com/avivbenami/instagram-post-scraper