UnPolinomio / news-scraper Public


📰 News site python-based web scraper 🐍


News Web scraper

This is a Python-based web scraper for news sites.

Use

Clone this repo and move into the repo folder.
Create a virtual environment: python -m venv .env
Activate it: source .env/bin/activate
Install the dependencies: pip install -r requirements.txt
Run python main.py and wait a few minutes.

Settings file

The settings file is scraper_config.yaml.

It contains a list of news sites. To add a site, follow the structure of the existing entries.

Each site entry has the following settings:

General

sitename: Folder name where scraped news will be saved.

news_list

site: URL of the page where the news site lists its articles.
links: XPath for the article URLs on that page.

news_detail

title: XPath for the article title text.
summary: XPath for the article summary text.
body_paragraphs: XPath for the text of the article body paragraphs.
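
The fields above can be pictured as one nested entry per site. The following is a minimal sketch, not code from this repository: the exact nesting is inferred from the section names, the example URL and XPath values are made up, and the helpers missing_keys and output_dir are hypothetical. It shows how an entry could be checked for completeness before running the scraper.

```python
from pathlib import Path

# Hypothetical in-memory form of one site entry from scraper_config.yaml.
# The nesting and all values are assumptions based on the field
# descriptions above, not copied from the repository.
EXAMPLE_SITE = {
    "sitename": "example_news",
    "news_list": {
        "site": "https://news.example.com/latest",
        "links": "//article//h2/a/@href",
    },
    "news_detail": {
        "title": "//h1/text()",
        "summary": "//p[@class='summary']/text()",
        "body_paragraphs": "//div[@class='body']/p/text()",
    },
}

# Required keys grouped by section; "" stands for the top level.
REQUIRED_KEYS = {
    "": ("sitename",),
    "news_list": ("site", "links"),
    "news_detail": ("title", "summary", "body_paragraphs"),
}


def missing_keys(site_config):
    """Return dotted names of required keys that are absent or empty."""
    missing = []
    for section, keys in REQUIRED_KEYS.items():
        scope = site_config if section == "" else site_config.get(section)
        if not isinstance(scope, dict):
            scope = {}  # a missing section means all of its keys are missing
        for key in keys:
            if not scope.get(key):
                missing.append(f"{section}.{key}" if section else key)
    return missing


def output_dir(site_config, root="."):
    """Folder for this site's scraped news, named after sitename."""
    return Path(root) / site_config["sitename"]


if __name__ == "__main__":
    print(missing_keys(EXAMPLE_SITE))
    print(output_dir(EXAMPLE_SITE))
```

Running a check like this on every entry before scraping would surface a forgotten XPath early instead of partway through a run that takes several minutes.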
