Author: Zixiao Tan
This project analyzes street pricing data for Diazepam (a benzodiazepine) using hierarchical modeling techniques. The analysis explores factors associated with price per milligram (ppm) and investigates geographic heterogeneity in pricing across different U.S. states.
- Which variables are associated with pricing per milligram of Diazepam?
- Is there heterogeneity in pricing of Diazepam by location?
The analysis uses the StreetRx dataset, which contains crowdsourced street drug pricing information. Key variables include:
- ppm: Price per milligram (outcome variable)
- mgstr: Dosage strength (2mg, 4mg, 5mg, 10mg)
- bulk_purchase: Indicator for purchases of 10+ units
- source: Source of information (Heard of, Internet, Personal)
- state: U.S. state where purchased
- year: Year of purchase
- Model Type: Hierarchical linear mixed model with random intercepts by state
- Response Transformation: Log transformation of ppm to meet normality assumptions
- Fixed Effects: Dosage strength (mgstr), bulk purchase indicator, source of information
- Random Effects: State-level intercepts to capture geographic variation
- Model Selection: Exhaustive search with BIC criterion across 1,024 candidate models
- Frequentist Approach: Maximum likelihood estimation using lme4
- Bayesian Approach: MCMC sampling using brms with non-informative priors
- Model Diagnostics: Residual analysis, influential group detection, assumption checks
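The model described above can be sketched roughly as follows. This is a minimal illustration on simulated data, not the code in case_study.Rmd: the data frame `sim`, the simulated effect sizes, and the state labels are all made up for the example, and only the variable names (ppm, mgstr, bulk_purchase, source, state) follow the dataset description.

```r
library(lme4)

set.seed(610)

# Simulated stand-in for the StreetRx data (assumption: values are invented).
n_states <- 20
n <- 1000
states <- paste0("S", seq_len(n_states))
state_effect <- setNames(rnorm(n_states, sd = 0.3), states)

sim <- data.frame(
  state = factor(sample(states, n, replace = TRUE), levels = states),
  mgstr = factor(sample(c(2, 4, 5, 10), n, replace = TRUE)),
  bulk_purchase = rbinom(n, 1, 0.3),
  source = factor(sample(c("Heard of", "Internet", "Personal"), n, replace = TRUE))
)

# Log price = intercept + state-level shift + bulk effect + noise.
log_ppm <- -0.5 +
  unname(state_effect[as.character(sim$state)]) -
  0.13 * sim$bulk_purchase +
  rnorm(n, sd = 0.8)
sim$ppm <- exp(log_ppm)

# Random-intercept model on the log scale; REML = FALSE requests
# maximum likelihood rather than restricted maximum likelihood.
fit <- lmer(
  log(ppm) ~ mgstr + bulk_purchase + source + (1 | state),
  data = sim,
  REML = FALSE
)

print(fixef(fit))    # fixed-effect estimates on the log(ppm) scale
print(VarCorr(fit))  # between-state and residual standard deviations
```

The `(1 | state)` term is what gives each state its own intercept. A Bayesian counterpart would pass a comparable formula to `brms::brm()`; that step is omitted here because it requires a working Stan toolchain.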
- Dosage Effects: Pricing per mg varies significantly across dosage strengths
- Bulk Discounts: Bulk purchases are approximately 12.4% cheaper per mg
- Source Differences: Prices reported from Internet sources are 33% lower than word-of-mouth ("Heard of") prices, while personal reports are 10% lower
- Geographic Variation: Significant heterogeneity in baseline pricing across states
- Model Convergence: Frequentist and Bayesian estimates are nearly identical
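Because the response is log(ppm), percentage statements like those above come from exponentiating coefficients. The sketch below shows the conversion in both directions. The coefficient values are back-calculated from the percentages quoted in the findings, not copied from the model output, and the element names are hypothetical labels.

```r
# Convert a coefficient on the log(ppm) scale to a percent change in ppm.
pct_change <- function(beta) 100 * (exp(beta) - 1)

# Inverse: the log-scale coefficient implied by a given percent change.
beta_from_pct <- function(pct) log(1 + pct / 100)

# Percent changes quoted in the findings (negative means cheaper).
reported_pct <- c(
  bulk_purchase   = -12.4,
  source_internet = -33,
  source_personal = -10
)

# Implied log-scale coefficients (back-calculated, not model output).
implied_beta <- beta_from_pct(reported_pct)
print(round(implied_beta, 3))

# Round trip: converting back recovers the quoted percentages.
print(pct_change(implied_beta))
```

For small coefficients the percent change is close to 100 times the coefficient, but the approximation degrades for larger effects such as the Internet source difference, so the exact `exp()` form is the safer habit.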
.
├── case_study.Rmd # Main analysis R Markdown file
├── case_study_report.Rmd # Report version
├── streetrx.RData # Dataset
├── presentation/ # Presentation materials
└── README.md # This file
tidyverse, lme4, rstan, brms, knitr, kableExtra,
patchwork, lubridate, gridExtra, influence.ME
To reproduce the analysis:
- Load the required R packages
- Open case_study.Rmd in RStudio
- Knit the document to generate the PDF report
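Before knitting, it can help to check that every required package is available. The snippet below is one possible way to do that. The package names come from the list above, while the variable names are illustrative, and the install call is left commented out so nothing is downloaded without an explicit choice.

```r
# Packages listed as requirements for this project.
required_pkgs <- c(
  "tidyverse", "lme4", "rstan", "brms", "knitr", "kableExtra",
  "patchwork", "lubridate", "gridExtra", "influence.ME"
)

# Compare against what is already installed in the current library paths.
missing_pkgs <- setdiff(required_pkgs, rownames(installed.packages()))

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install from CRAN
} else {
  message("All required packages are installed.")
}
```

Note that rstan and brms also need a working C++ toolchain, which `install.packages()` alone does not set up.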
# In R console
rmarkdown::render("case_study.Rmd")
Watch the project presentation on YouTube:
STA 610 Case Study Presentation
- High within-state variance remains unexplained
- Normality assumption shows deviations in residual tails
- Some influential groups (e.g., Texas) detected but retained due to large sample sizes
- Missing data in primary_reason variable excluded from analysis
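The first limitation can be made concrete with the intraclass correlation (ICC), the share of total variance on the log scale that lies between states. The sketch below uses hypothetical variance components chosen only to illustrate the calculation. They are not estimates from this analysis.

```r
# Intraclass correlation for a random-intercept model:
# between-state variance divided by total variance.
icc <- function(var_state, var_resid) {
  stopifnot(var_state >= 0, var_resid > 0)
  var_state / (var_state + var_resid)
}

# Hypothetical variance components, NOT taken from the fitted model.
example_icc <- icc(var_state = 0.05, var_resid = 0.95)
print(example_icc)
```

A small ICC corresponds to the pattern described above: most variation sits within states rather than between them. With a fitted lme4 model, the two components can be read from the `vcov` column of `as.data.frame(VarCorr(fit))`.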
This project was completed as part of Duke University's STA 610 course.