
📰 News site python-based web scraper 🐍


UnPolinomio/news-scraper


News web scraper

This is a Python-based web scraper for news sites.

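Usage

  1. Clone this repo and move into the repo folder.
  2. Create a virtual environment: python -m venv .env
  3. Activate it: source .env/bin/activate
  4. Install the dependencies: pip install -r requirements.txt
  5. Run python main.py and wait a few minutes for the scrape to finish.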

Settings file

The settings file is scraper_config.yaml.

It contains a list of news sites. To add a site, follow the structure of the existing entries.

Each site has the following settings:

General

  • sitename: Name of the folder where scraped news will be saved.

news_list

  • site: URL of the page where the news site lists its articles.
  • links: XPath for the article URLs.

news_detail

  • title: XPath for the article title text.
  • summary: XPath for the article summary text.
  • body_paragraphs: XPath for the text of the body paragraphs.
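
The settings above can be checked before a long scrape starts. The sketch below is not part of the repo. It assumes each site entry parses (for example via yaml.safe_load) into a dict with news_list and news_detail as nested mappings. The helper name missing_keys, the example URL, and the XPath expressions are all hypothetical, so check scraper_config.yaml for the real layout.

```python
# Hypothetical check for one site entry from scraper_config.yaml.
# Assumption: each entry is a dict with "sitename" at the top level and
# "news_list" / "news_detail" as nested mappings. The real file may differ.
REQUIRED_KEYS = {
    "news_list": ("site", "links"),
    "news_detail": ("title", "summary", "body_paragraphs"),
}


def missing_keys(site_config):
    """Return dotted names of required settings that are absent or empty."""
    missing = []
    if not site_config.get("sitename"):
        missing.append("sitename")
    for section, keys in REQUIRED_KEYS.items():
        values = site_config.get(section) or {}
        for key in keys:
            if not values.get(key):
                missing.append(f"{section}.{key}")
    return missing


# Illustrative entry; the URL and XPath expressions are made up.
example_site = {
    "sitename": "example_news",
    "news_list": {
        "site": "https://news.example.com",
        "links": "//article//a/@href",
    },
    "news_detail": {
        "title": "//h1/text()",
        "summary": "//h2/text()",
        "body_paragraphs": "//div[@class='body']//p/text()",
    },
}

print(missing_keys(example_site))
```

An empty list means every setting described above is present. Otherwise each returned name, such as news_detail.summary, points to the setting to fill in.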
