Inspired by work done at the World Well-Being Project, we investigate the correlation between Twitter language usage and excessive alcohol consumption rates. We replicate a study by Curtis et al. using updated county-level tweet data. We successfully replicate the published baseline results as well as extensions to this baseline. We show that the Twitter language features correlate strongly with county excessive drinking rates, more strongly even than the socioeconomic and demographic features typically used to predict those rates.
The easiest way to use this code is to run the notebook in Google Colab and upload the data folder from this repository to your Google Drive account. The notebook then walks through the process step by step.
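If you want to locate the data folder yourself (for example, in a first notebook cell), the sketch below mounts Google Drive with Colab's `google.colab.drive.mount` and returns the path to the uploaded folder. It is a minimal illustration, not the notebook's own code. It assumes the folder was uploaded to the top level of "My Drive" under the name `data`; the helper names `in_colab` and `resolve_data_dir` are hypothetical. Outside Colab it falls back to a local `data` directory, so the same cell also works when running the notebook locally.

```python
import os

# Assumption: the repository's data folder sits at the top level of "My Drive"
# under the name "data". Change DATA_SUBDIR if you uploaded it elsewhere.
DATA_SUBDIR = "data"
DRIVE_MOUNT_POINT = "/content/drive"


def in_colab() -> bool:
    """Return True when running inside a Google Colab runtime (hypothetical helper)."""
    try:
        import google.colab  # noqa: F401  (only importable inside Colab)
        return True
    except ImportError:
        return False


def resolve_data_dir(local_fallback: str = DATA_SUBDIR) -> str:
    """Mount Google Drive when in Colab and return the data folder path.

    Outside Colab, return the absolute path of a local copy instead
    (for example, the data/ folder in a clone of this repository).
    """
    if in_colab():
        from google.colab import drive
        # Prompts for authorization the first time it runs in a session.
        drive.mount(DRIVE_MOUNT_POINT)
        return os.path.join(DRIVE_MOUNT_POINT, "MyDrive", DATA_SUBDIR)
    return os.path.abspath(local_fallback)


data_dir = resolve_data_dir()
print("Reading data from:", data_dir)
```

Keeping the Drive mount behind a runtime check means no cell has to be edited when switching between Colab and a local Jupyter session.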