Identify partisan phrase corpus #28

Closed
2 tasks done
slifty opened this issue Jul 17, 2018 · 2 comments
@slifty
Member

slifty commented Jul 17, 2018

Issue #17 outlines the first experiment we're going to take on: modifying language so that information can be presented without triggering defensiveness, by adjusting the specific words (or phrases) used.

This involves a research task: exploring the impact that words and tone can have on meaning-making. It also involves collecting known charged words or phrases, along with potential replacements.

This issue involves both of these things:

  • Identify and collect research related to this goal (storing key insights in the wiki)
  • Identify and collect existing databases (or lists) of words that fall on a spectrum that research (or intuition) indicates might be useful for our goal; for example, words that indicate partisanship or words that are highly associated with certain media sources.
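To make the "charged phrases plus potential replacements" idea concrete, here is a minimal sketch of how such a list could be applied to text. The `REPLACEMENTS` entries are hypothetical placeholders for illustration, not results of this research, and the functions `build_pattern` and `neutralize` are made-up names. It matches whole phrases case-insensitively, tries longer phrases first, and always picks the first listed replacement (which is emitted in lowercase).

```python
import re

# Hypothetical, hand-written entries for illustration only; a real list would
# come from the research and databases collected under this issue.
REPLACEMENTS: dict[str, list[str]] = {
    "death tax": ["estate tax"],
    "tax relief": ["tax cuts"],
}


def build_pattern(phrases):
    # Longest phrases first so multi-word entries win over shorter overlaps.
    ordered = sorted(phrases, key=len, reverse=True)
    alternation = "|".join(re.escape(p) for p in ordered)
    # \b anchors keep us from matching inside longer words.
    return re.compile(r"\b(" + alternation + r")\b", re.IGNORECASE)


def neutralize(text, replacements=REPLACEMENTS):
    """Replace each charged phrase with its first listed alternative."""
    pattern = build_pattern(replacements)
    lookup = {k.lower(): v for k, v in replacements.items()}

    def substitute(match):
        return lookup[match.group(0).lower()][0]

    return pattern.sub(substitute, text)


print(neutralize("Repeal the death tax now."))  # Repeal the estate tax now.
```

Keeping a list of candidates per phrase (rather than a single string) leaves room to choose among replacements later, e.g. by context.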

NOTE: we may decide it is best to generate these word lists ourselves through TF-IDF style analysis on clusters of sites (e.g. which words are most associated with Fox News vs. MSNBC). Let's start with the less engineered approach, however.
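If we do go the TF-IDF route, a rough sketch could pool each cluster's articles into one pseudo-document and score terms by term frequency times inverse cluster frequency. The function name `top_terms_by_cluster`, the tokenizer, and the cluster names below are hypothetical choices for illustration, not a decided design.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Very simple lowercase word tokenizer; a real pipeline would do more.
    return re.findall(r"[a-z']+", text.lower())


def top_terms_by_cluster(clusters, k=5):
    """clusters maps a cluster name (e.g. a news site) to a list of article texts."""
    # Treat each cluster as one pseudo-document by pooling its term counts.
    counts = {
        name: Counter(t for doc in docs for t in tokenize(doc))
        for name, docs in clusters.items()
    }
    n = len(counts)
    # Cluster frequency: in how many clusters does each term appear?
    df = Counter(term for c in counts.values() for term in c)
    result = {}
    for name, c in counts.items():
        total = sum(c.values())
        scores = {
            term: (freq / total) * math.log(n / df[term])
            for term, freq in c.items()
        }
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        result[name] = [term for term, score in ranked[:k] if score > 0]
    return result


toy = {
    "site_a": ["the death tax hurts families"],
    "site_b": ["the estate tax funds programs"],
}
print(top_terms_by_cluster(toy))
# {'site_a': ['death', 'families', 'hurts'], 'site_b': ['estate', 'funds', 'programs']}
```

With only two clusters, any term used by both gets an IDF of zero, so this mostly surfaces exclusive vocabulary; a smoothed IDF or a log-odds score would give softer rankings.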

slifty assigned slifty and crupar and unassigned slifty Jul 17, 2018
slifty added this to the Phase 1: Collection and Black Boxes milestone Jul 17, 2018
slifty assigned slifty and unassigned crupar Jan 1, 2019
@slifty
Member Author

slifty commented Jan 1, 2019

We are exploring this and logging the papers and notes in Zotero.

Most of the papers we have found that identify partisan language use the Congressional Record to find which phrases are most "telling", i.e. which best predict the speaker's party (the Congressional Record makes it possible to know whether a Democrat or a Republican is speaking at a given time).
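As a rough illustration of that kind of scoring, here is a sketch that ranks bigrams by a Pearson chi-squared statistic on a 2×2 table (phrase vs. all other phrases, Republican vs. Democrat). This is an assumption about the general approach, not a reproduction of any specific paper's method; we haven't confirmed which statistic each of the logged papers uses. The input format and the name `partisan_scores` are hypothetical.

```python
from collections import Counter


def bigrams(text):
    words = text.lower().split()
    return [" ".join(pair) for pair in zip(words, words[1:])]


def partisan_scores(speeches):
    """speeches: list of (party, text) pairs with party in {"D", "R"}.

    Returns (phrase, chi2, leaning) tuples, most partisan first.
    """
    counts = {"D": Counter(), "R": Counter()}
    for party, text in speeches:
        counts[party].update(bigrams(text))
    total_d = sum(counts["D"].values())
    total_r = sum(counts["R"].values())
    n = total_d + total_r
    results = []
    for phrase in set(counts["D"]) | set(counts["R"]):
        a = counts["R"][phrase]  # this phrase, Republican speakers
        b = counts["D"][phrase]  # this phrase, Democratic speakers
        c = total_r - a          # all other phrases, Republican speakers
        d = total_d - b          # all other phrases, Democratic speakers
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        if denom == 0:
            continue  # degenerate table; statistic undefined
        chi2 = n * (a * d - b * c) ** 2 / denom
        # ad > bc means the phrase is relatively more frequent among Republicans.
        leaning = "R" if a * d > b * c else "D" if a * d < b * c else "-"
        results.append((phrase, chi2, leaning))
    results.sort(key=lambda r: (-r[1], r[0]))
    return results


toy = [("R", "death tax repeal now"), ("D", "estate tax repeal now")]
for row in partisan_scores(toy):
    print(row)
```

On the toy input, "death tax" and "estate tax" come out on top with opposite leanings, while the shared bigrams score zero.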

There are two paths forward from this task (as alluded to in the original description). The first is to reach out to the authors of the previous papers and obtain their lists of post-processed partisan phrases (or check whether those lists are already published; initial scans have turned up only the raw data used to create them). The second is to build a system that generates those phrases ourselves.

For the sake of a demo, a static list is good enough. However, partisanship changes over time, so for the mid-term vision it is important to have more control over (and more recent versions of) our partisanship databases.

slifty changed the title Collecting words Identify partisan phrase corpus Jan 1, 2019
slifty modified the milestones: Phase 1: Collection and Black Boxes, January: Design and Data Jan 1, 2019
@slifty
Member Author

slifty commented Feb 6, 2019

We have a corpus!

We'll be using the data from this paper for our March demo, and are in conversations with the researchers behind this paper about getting access to their list of 10k phrases.

slifty closed this as completed Feb 6, 2019
No branches or pull requests

2 participants