Web Scraper
A simple web scraper built with Python that extracts data from websites. This project serves as a demonstration of web scraping techniques using the requests
and BeautifulSoup
libraries.
Features
- Extracts text and HTML elements from specified web pages
- Handles pagination to scrape multiple pages
- Saves scraped data in various formats (CSV, JSON)
- Simple and easy-to-understand code structure
Installation
To get started, clone the repository:
gh repo clone G-ojies/simplewebscraper
cd simplewebscraper
Next, install the required dependencies:
bash
Copy code
pip install -r requirements.txt
Make sure you have Python 3.x installed on your machine.
Usage
To run the web scraper, execute the following command:
bash
Copy code
python scraper.py
You can customize the scraper.py file to specify the target URL, the elements you want to scrape, and the output format.
Examples
Here’s a simple example of how to use the scraper:
python
Copy code
import requests
from bs4 import BeautifulSoup
Specify the URL
url = 'http://example.com'
Send a request to the website
response = requests.get(url)
Parse the content
soup = BeautifulSoup(response.text, 'html.parser')
Find and print the title
title = soup.title.string
print(f'Title: {title}')
Contributing
Contributions are welcome! If you have suggestions for improvements or new features, please open an issue or submit a pull request.
License
This project is licensed under the MIT License. See the LICENSE file for details.
markdown
Copy code