University of Chicago's Maroon Newspaper Scraper

The Python notebook contains code that extracts the text of op-eds and editorials written by people affiliated with UChicago. The final output is a dataframe with the author's name, the author's byline, the article text, the article link, and the article type (op-ed, editorial, etc.).
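As a rough illustration of the output described above, one row per article could be modeled as follows. This is a minimal sketch, not the notebook's actual code: the `Article` class, its field names, and the sample values are all hypothetical placeholders chosen to mirror the five columns listed.

```python
from dataclasses import dataclass, asdict


@dataclass
class Article:
    """One scraped article; field names are assumed, not taken from the notebook."""
    author: str
    byline: str
    text: str
    link: str
    article_type: str  # e.g. "op-ed" or "editorial"


def to_rows(articles):
    """Convert Article objects into plain dicts, one per dataframe row."""
    return [asdict(article) for article in articles]


# Placeholder values for illustration only, not real Maroon data.
sample = [
    Article(
        author="Jane Doe",
        byline="Jane Doe is a hypothetical fourth-year in the College.",
        text="Placeholder article text.",
        link="https://example.com/viewpoints/placeholder",
        article_type="op-ed",
    )
]

rows = to_rows(sample)
print(rows[0]["author"], "|", rows[0]["article_type"])
```

If pandas is installed, passing `rows` to `pandas.DataFrame(rows)` would give a dataframe with these five columns.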

The texts of the Maroon Viewpoints can be used in a number of ways:

  1. Topic modeling (LDA, NMF, etc.) to get a sense of the topics people commonly write about.
  2. Sentiment analysis using a pretrained sentiment analyzer.
  3. Counting word frequencies.
  4. Part-of-speech tagging using a pretrained POS tagger.
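
As one example of the uses listed above, word frequencies can be counted with the Python standard library alone. This is a minimal sketch under stated assumptions: the `word_frequencies` helper, the small stopword set, and the two sample sentences are illustrative and not part of the notebook. In practice the input would be the article text column of the dataframe.

```python
import re
from collections import Counter

# A tiny illustrative stopword set; a real analysis would use a fuller list.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "on", "for"}


def word_frequencies(texts, stopwords=STOPWORDS):
    """Count lowercase word tokens across all texts, skipping stopwords."""
    counts = Counter()
    for text in texts:
        # Keep runs of letters and apostrophes; drop digits and punctuation.
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(token for token in tokens if token not in stopwords)
    return counts


# Placeholder sentences standing in for scraped article text.
texts = [
    "The campus debate on housing continues.",
    "Housing costs shape the campus debate.",
]

freqs = word_frequencies(texts)
print(freqs.most_common(3))
```

The resulting `Counter` can be queried per word (`freqs["housing"]`) or ranked with `most_common`.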