May 6–9, 2019
This short course introduces students to modern quantitative text analysis techniques. The goal is to provide an orientation for those wishing to go further with text analysis in their own research. We will discuss preprocessing texts into data (covering n-grams, stop words, stemming, and document-term matrices), comparing texts with discriminating words, and sentiment analysis using dictionary methods. Time permitting, we will introduce more advanced supervised and unsupervised machine learning methods, including topic models. We will demonstrate these techniques using the open-source programming language R.
Prerequisites: Participants must have basic computer skills and be familiar with their computer’s file system (e.g., paths). Basic knowledge of R programming is helpful but not required. Participants with no prior experience with R may wish to complete this brief tutorial (requiring 2-3 hours) to learn the basics of R before the course.
This course is geared towards social scientists who work with unstructured text data, including (but not limited to) news and media, open-ended surveys, and social media posts. By the end of the course, participants will:
- Be familiar with the main methods and techniques involved in modern computational text analysis.
- Be able to load, preprocess, and conduct simple analysis on text data.
- Know where to go next in their pursuit of more advanced computational text methods.
Rochelle Terman is currently a Provost Postdoctoral Fellow in the Department of Political Science at the University of Chicago, where she will begin as Assistant Professor in 2020. Her research examines international norms, gender, and advocacy, with a focus on the Muslim world, using a mix of quantitative, qualitative, and computational methods. She also teaches computational social science in a variety of capacities.
This course adapts materials from the following organizations and individuals. Thank you!
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.