GitHub - peterhxk/ScrapeNYT: Scrapes 2500+ payload articles per hour from the New York Times using Newspaper4k and records them in a JSONL file for further ML purposes.

Hi,

For answers to the "Machine Learning Media Bias" article by Tegmark et al., head over to MachineLearningMediaBias.txt.

To run the NYT scraper, follow these steps:

1. Run: pip install -r requirements.txt
2. Head over to https://developer.nytimes.com/ and get your own NYT API key (make sure you enable the Article Search API).
3. Add the API key to the .env file.
4. Run: python ScrapeNYT.py
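
As a rough illustration of the API key steps, here is a minimal sketch of reading a key from .env and building an Article Search request URL. It is not taken from ScrapeNYT.py: the variable name `NYT_API_KEY` and the helper names `load_env`, `build_search_url`, and `search_articles` are assumptions made for this example, and the real script may load the key differently (for instance with python-dotenv).

```python
import json
import urllib.parse
import urllib.request
from pathlib import Path

# Public endpoint of the NYT Article Search API.
ARTICLE_SEARCH_URL = "https://api.nytimes.com/svc/search/v2/articlesearch.json"


def load_env(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file into a dict."""
    values = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def build_search_url(api_key, query, page=0):
    """Return the request URL for one page of Article Search results."""
    params = {"q": query, "page": page, "api-key": api_key}
    return f"{ARTICLE_SEARCH_URL}?{urllib.parse.urlencode(params)}"


def search_articles(api_key, query, page=0, timeout=30):
    """Fetch one page of results and return the decoded JSON response."""
    url = build_search_url(api_key, query, page)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

With a .env file containing a line such as `NYT_API_KEY=your-key-here` (hypothetical name), `search_articles(load_env()["NYT_API_KEY"], "media bias")` would request the first page of results. Keeping URL construction separate from the network call makes the request easy to inspect before spending API quota.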

The program saves the scraped information to nyt_articles.jsonl. You may then use Read_jsonl.py to extract records from it. Data format:

[Image: data format]
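
For readers who want to load the output without Read_jsonl.py, the following is a minimal sketch of reading a JSONL file one record at a time. It is not the repository's Read_jsonl.py, and the helper name `read_jsonl` is an assumption. Because the field names are not listed in the text above, the sketch makes no assumption about them and returns each line as a plain dictionary.

```python
import json


def read_jsonl(path="nyt_articles.jsonl"):
    """Yield one parsed record per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                # Report the offending line so a partial write is easy to find.
                raise ValueError(f"{path}:{line_number}: invalid JSON") from exc
```

A loop such as `for article in read_jsonl(): print(sorted(article))` would print the keys present in each record, which is a quick way to confirm the actual data format. Reading lazily keeps memory use flat even when the file holds many articles.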

Unsuccessful fetches are appended to failed_urls.txt and failed_requests.txt.
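
The failure log can be sketched as follows, assuming one URL per line in failed_urls.txt. The layout of both failure files is an assumption, since the text above does not describe it, and the helper names `record_failure` and `load_failures` are hypothetical rather than taken from ScrapeNYT.py.

```python
import os


def record_failure(url, path="failed_urls.txt"):
    """Append one failed URL to the log, one URL per line (assumed layout)."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(url + "\n")


def load_failures(path="failed_urls.txt"):
    """Return logged URLs in first-seen order without duplicates."""
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as handle:
        urls = [line.strip() for line in handle if line.strip()]
    # dict.fromkeys removes duplicates while preserving insertion order.
    return list(dict.fromkeys(urls))
```

Opening the file in append mode means earlier failures survive across runs, and `load_failures()` gives a deduplicated list that could be fed back into a retry pass.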
