Graduate-level machine learning, analytics, and quantitative modeling coursework spanning ten domains, from an externally published NLP model to blockchain systems and stochastic optimization in R.
- Published externally: a RoBERTa-based tweet sentiment model, written up in The Startup on Medium (one of the largest tech publications on the platform); the full write-up PDF is in the repo
- Breadth across domains: NLP, predictive ML, Spark data engineering, blockchain development, stochastic optimization, and FinTech, completed over two semesters
- Real data, real questions: Spotify audio features, Airbnb NYC pricing, WallStreetBets community graphs, Amazon supply chain modeling, and Twitter influencer detection
| Domain | Project | Methods | Stack |
|---|---|---|---|
| NLP / Deep Learning | RoBERTa Tweet Sentiment Model | RoBERTa, stratified 5-fold CV, mean Jaccard ~0.596 | Python, Hugging Face, TensorFlow |
| Text Analytics | Xbox vs PS5 Launch NLP | LDA Topic Modeling, sentiment analysis | Python, NLTK, spaCy |
| Text Analytics | Beer Recommendation Engine | Word2Vec, VADER, cosine similarity | Python, spaCy, scikit-learn |
| Data Analytics | Spotify Song Popularity | EDA on 603 songs (2010–2019), genre clustering | Python, pandas, matplotlib |
| Predictive Modeling | Airbnb NYC Price Prediction | End-to-end regression, feature engineering | Python, scikit-learn |
| Social Media Analytics | Twitter Influencer Classifier | Random Forest, Logistic Regression, RFE; 66% accuracy on 5,500 test samples | Python, scikit-learn |
| Social Media Analytics | WallStreetBets Network Analysis | Community detection, graph centrality | Python, NetworkX |
| Data Management | Amazon Data Lake | Spark pipeline, entity modeling | Apache Spark, SQL |
| Stochastic Optimization | Portfolio Optimization Suite | LP, IP, nonlinear programming, DP, stochastic programming | R |
| FinTech | Applied Finance Suite | Robo-advising, deep learning, cryptography | Python |
| Blockchain | Bitcoin + Ethereum + Hyperledger | Smart contracts, transaction analysis | Solidity, Hyperledger |
| Time Series | Learning Structures & Forecasting | PCA, clustering, factor models | SAS |
MSBA-UT-Austin/
├── APM(Advanced Predictive Modeling)/ ← RoBERTa tweet sentiment model + HW assignments
├── Data Analytics - Summer/ ← Spotify Top 100 popularity analysis
├── Data Management/ ← Amazon Data Lake in Apache Spark
├── TextAnalysis/ ← Xbox vs PS5 NLP + Beer recommendation engine
├── Predictive Modeling -Summer/ ← Airbnb NYC price prediction pipeline
├── Social Media Analytics/ ← Twitter influencer detection + WSB network graph
├── Stochastic Controls & Optimization/ ← Portfolio optimization in R (5 projects)
├── FinTech/ ← Applied finance: robo-advising, crypto, ML
├── Blockchain Solutions and Dev/ ← Bitcoin CLI, Ethereum Solidity, Hyperledger
└── Time_Series/ ← SAS-based learning structures & forecasting
Advanced Predictive Modeling: Fine-tuned RoBERTa (A Robustly Optimized BERT Pretraining Approach) for tweet sentiment extraction, using 5-fold stratified cross-validation on ~27,981 samples and reaching an average Jaccard score of 0.596. The write-up was published in The Startup on Medium.
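For readers unfamiliar with the metric, here is a minimal sketch of word-level Jaccard similarity between a predicted span and the reference span, plus a simple stratified fold assignment. It assumes lowercasing and whitespace tokenization; the project's exact tokenization isn't stated. `jaccard`, `mean_jaccard`, and `stratified_folds` are hypothetical helper names, not code from the repo.

```python
from collections import defaultdict


def jaccard(pred: str, truth: str) -> float:
    """Word-level Jaccard similarity (assumed tokenization: lowercase + whitespace split)."""
    a = set(pred.lower().split())
    b = set(truth.lower().split())
    if not a and not b:
        return 1.0  # two empty spans are treated as a perfect match
    overlap = a & b
    return len(overlap) / (len(a) + len(b) - len(overlap))


def mean_jaccard(preds, truths) -> float:
    """Average Jaccard over aligned prediction/reference pairs."""
    scores = [jaccard(p, t) for p, t in zip(preds, truths)]
    return sum(scores) / len(scores)


def stratified_folds(labels, k: int = 5):
    """Assign each sample index to one of k folds so every label is spread evenly across folds."""
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)
    fold_of = [0] * len(labels)
    offset = 0
    for label in sorted(by_label):
        # Round-robin within each label; the running offset staggers labels across folds.
        for j, idx in enumerate(by_label[label]):
            fold_of[idx] = (offset + j) % k
        offset += len(by_label[label])
    return fold_of


print(jaccard("my day was good", "good"))  # 0.25
```

Stratifying by sentiment label means each of the five validation folds has roughly the same positive/negative/neutral mix as the full dataset. That keeps the per-fold Jaccard scores comparable before they are averaged.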
Data Management: Modeled Amazon's call center, warehouse, and inventory management systems as a unified data lake architecture. Implemented in Apache Spark, with accompanying SQL mini-projects and MapReduce exercises.
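MapReduce exercises follow a map, shuffle, and reduce pattern. The pure-Python sketch below totals inventory per warehouse to show the three phases. The `(warehouse, sku, quantity)` record layout and all function names are hypothetical, not the project's actual schema.

```python
from collections import defaultdict
from functools import reduce
from itertools import chain


def map_phase(record):
    """Emit (key, value) pairs: here, (warehouse, quantity) for each inventory row."""
    warehouse, _sku, qty = record
    yield (warehouse, qty)


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key; here, sum the quantities."""
    return key, reduce(lambda a, b: a + b, values)


def run_job(records):
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


inventory = [("WH1", "sku-1", 10), ("WH2", "sku-1", 5), ("WH1", "sku-2", 7)]
print(run_job(inventory))  # {'WH1': 17, 'WH2': 5}
```

In Spark, the shuffle and reduce steps would typically collapse into a single `reduceByKey` call on an RDD of key/value pairs.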
Text Analytics: Two projects. (1) Launch-day NLP on Xbox Series X vs. PlayStation 5 social discourse using LDA topic modeling; (2) an attribute-based beer recommendation engine using Word2Vec embeddings and VADER sentiment analysis across 6,205 reviews of 250 beers.
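Here is a rough sketch of how a recommender can combine the two signals. Beers are ranked by cosine similarity between a query vector (for example, averaged Word2Vec vectors of the requested attributes) and each beer's review vector, with the beer's mean VADER compound score blended in. The equal 50/50 weighting, the toy three-dimensional vectors, and the beer names are hypothetical stand-ins. The project's actual scoring rule is not described here.

```python
import math


def cosine(u, v) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(query_vec, beers, top_n=3, sim_weight=0.5):
    """Rank beers by a blend of attribute similarity and mean review sentiment.

    beers maps name -> (review_vector, mean_vader_compound); sim_weight is an assumed blend weight.
    """
    scored = []
    for name, (vec, sentiment) in beers.items():
        score = sim_weight * cosine(query_vec, vec) + (1 - sim_weight) * sentiment
        scored.append((score, name))
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_n]]


# Toy 3-d vectors standing in for Word2Vec embeddings; sentiments stand in for VADER compound scores.
BEERS = {
    "Beer A": ([1.0, 0.0, 0.0], 0.8),
    "Beer B": ([0.0, 1.0, 0.0], 0.9),
    "Beer C": ([0.9, 0.1, 0.0], 0.2),
}
print(recommend([1.0, 0.0, 0.0], BEERS, top_n=2))  # ['Beer A', 'Beer C']
```

Beer B has the best sentiment but is unrelated to the requested attribute, so the similarity term keeps it out of the top two.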
Data Analytics: Exploratory analysis of Spotify's Top 100 songs from 2010–2019 (603 songs in total). Analyzed audio features (BPM, energy, danceability, valence) against popularity scores and identified dance pop as the dominant genre (327 of 603 songs).
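Both kinds of summary in this analysis can be expressed as simple statistics: a correlation for feature vs. popularity, and a frequency count for genre share. Below is a dependency-free sketch, assuming one list of values per audio feature aligned with a list of popularity scores. The function names are hypothetical, and Pearson correlation is one plausible way to relate a feature to popularity, not necessarily the one used in the project.

```python
import math
from collections import Counter


def pearson(xs, ys) -> float:
    """Pearson correlation between an audio feature and popularity scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / (sx * sy)


def genre_share(genres, genre: str) -> float:
    """Fraction of songs whose top genre equals `genre`."""
    counts = Counter(genres)
    return counts[genre] / len(genres)


# Reproduces the reported share: 327 of 603 songs are dance pop.
genres = ["dance pop"] * 327 + ["other"] * 276
print(round(genre_share(genres, "dance pop"), 3))  # 0.542
```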
Predictive Modeling: An end-to-end regression pipeline for Airbnb pricing in New York City, with feature engineering on listing attributes, neighborhood data, and host history.
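Below is a minimal sketch of the three feature families named above, turned into numeric inputs for a regression model. The field names (`neighbourhood_group`, `host_since`, `room_type`), the borough list, the snapshot date, and the log-price target are all assumptions loosely modeled on public NYC Airbnb listing data. They are not taken from the project code.

```python
import math
from datetime import date

# Hypothetical category list for the one-hot encoding.
BOROUGHS = ["Bronx", "Brooklyn", "Manhattan", "Queens", "Staten Island"]


def engineer_features(listing: dict, as_of: date = date(2020, 1, 1)) -> dict:
    """Turn one raw listing record into numeric model features."""
    features = {}
    # Neighborhood data: one-hot encode the borough.
    for borough in BOROUGHS:
        features[f"borough_{borough}"] = 1.0 if listing["neighbourhood_group"] == borough else 0.0
    # Host history: years since the host joined, relative to an assumed snapshot date.
    host_since = date.fromisoformat(listing["host_since"])
    features["host_tenure_years"] = (as_of - host_since).days / 365.25
    # Listing attribute: binary flag for entire-home listings.
    features["is_entire_home"] = 1.0 if listing["room_type"] == "Entire home/apt" else 0.0
    return features


def log_price_target(price: float) -> float:
    """Log-transform the nightly price (an assumed choice) to reduce right skew before regression."""
    return math.log1p(price)


listing = {"neighbourhood_group": "Manhattan", "host_since": "2019-01-01",
           "room_type": "Entire home/apt", "price": 150}
print(engineer_features(listing), log_price_target(listing["price"]))
```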
Social Media Analytics: Two projects. (1) A Twitter influencer classifier using Random Forest and Logistic Regression with recursive feature elimination (RFE), reaching 66% accuracy on 5,500 test samples and paired with a cost/revenue model for affiliate targeting; (2) a WallStreetBets community network analysis identifying the most connected sub-communities.
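A classifier's accuracy and its business value can diverge, and a cost/revenue model captures that gap. Below is a minimal sketch under a simple assumed model: every user predicted to be an influencer is contacted at a fixed cost, and only true influencers generate a fixed revenue. The per-contact cost, the per-influencer revenue, and the function names are hypothetical, not the figures or code used in the project.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 = influencer."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == 1 and t == 1:
            tp += 1
        elif p == 1 and t == 0:
            fp += 1
        elif p == 0 and t == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(y_true, y_pred) -> float:
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def targeting_profit(y_true, y_pred, revenue_per_influencer: float, cost_per_contact: float) -> float:
    """Profit from contacting every predicted influencer: true hits earn revenue, every contact costs."""
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return revenue_per_influencer * tp - cost_per_contact * (tp + fp)


y_true = [1, 0, 1, 0]
y_pred = [1, 1, 0, 0]
print(accuracy(y_true, y_pred), targeting_profit(y_true, y_pred, 10.0, 3.0))  # 0.5 4.0
```

Comparing `targeting_profit` across candidate models or decision thresholds shows which one earns the most, even when their accuracies are similar.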
Learning Structures & Time Series: SAS-based coursework covering principal component analysis (PCA), cluster analysis, and factor modeling for uncovering structure in multivariate data.
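The coursework itself is in SAS, where PCA is typically run with `PROC PRINCOMP`. As a language-neutral illustration of what PCA computes, the sketch below finds the first principal component by power iteration on the sample covariance matrix. Function names are hypothetical. Standardizing each column first would give the correlation-matrix variant instead.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix (n - 1 denominator) of a list of equal-length rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for r in rows:
        for i in range(d):
            for j in range(d):
                cov[i][j] += (r[i] - means[i]) * (r[j] - means[j])
    return [[c / (n - 1) for c in row] for row in cov]


def first_component(cov, iters=200):
    """Power iteration: repeatedly apply cov and renormalize to converge on the top eigenvector."""
    d = len(cov)
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            raise ValueError("zero covariance: no principal direction")
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v gives the eigenvalue (variance along the component).
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    return v, eigenvalue


def explained_ratio(cov, eigenvalue):
    """Share of total variance (trace of cov) captured by one component."""
    return eigenvalue / sum(cov[i][i] for i in range(len(cov)))


cov = covariance_matrix([(1, 2), (2, 4), (3, 6)])
v, lam = first_component(cov)
print(v, lam, explained_ratio(cov, lam))
```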
Stochastic Controls & Optimization: Five-project R suite covering linear programming, integer programming, nonlinear programming, dynamic programming, and stochastic programming. Projects include portfolio optimization on 2019–2020 stock data.
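The R projects formulate these problems as mathematical programs. As an illustration of the scenario-based idea only, the sketch below brute-forces a two-asset allocation that maximizes expected return across equally likely return scenarios, subject to a variance cap. It is a grid search, not an LP or stochastic-programming solver, and the scenario returns, cap, and grid resolution are hypothetical.

```python
def portfolio_returns(scenarios, w):
    """Per-scenario return with weight w on asset A and 1 - w on asset B."""
    return [w * a + (1 - w) * b for a, b in scenarios]


def mean(xs):
    return sum(xs) / len(xs)


def variance(xs):
    """Population variance across equally likely scenarios."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def best_weight(scenarios, max_variance, steps=10):
    """Grid-search w in {0, 1/steps, ..., 1}: maximize expected return subject to a variance cap.

    Returns (w, expected_return, variance), or None if no grid point satisfies the cap.
    """
    best = None
    for i in range(steps + 1):
        w = i / steps
        r = portfolio_returns(scenarios, w)
        m, v = mean(r), variance(r)
        if v <= max_variance and (best is None or m > best[1]):
            best = (w, m, v)
    return best


# Hypothetical equally likely (asset A, asset B) return scenarios.
SCENARIOS = [(0.04, 0.01), (-0.02, 0.01), (0.03, 0.00), (0.01, 0.02)]
print(best_weight(SCENARIOS, max_variance=1e-4))
```

Tightening `max_variance` pushes the chosen weight away from the higher-return but more volatile asset A. This is the same risk/return trade-off a formal program encodes as a constraint.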
Blockchain Solutions & Dev: Three deliverables. (1) Bitcoin command-line transaction analysis; (2) ViCo, a YouTube clip ownership platform built in Solidity on Ethereum; (3) a Hyperledger-based checking account system.
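Two calculations come up constantly when inspecting Bitcoin transactions. The fee is implicit (total inputs minus total outputs), and the transaction ID is a double SHA-256 of the serialized transaction, displayed byte-reversed. Below is a minimal sketch with hypothetical helper names. The input and output values would come from decoded transactions (for example, gathered with `bitcoin-cli`), which report amounts in BTC.

```python
import hashlib

SATOSHIS_PER_BTC = 100_000_000


def btc_to_sats(value_btc: float) -> int:
    """Convert a BTC amount to integer satoshis, rounding away float error."""
    return round(value_btc * SATOSHIS_PER_BTC)


def tx_fee_sats(input_values_sats, output_values_sats) -> int:
    """Fee = total input value - total output value; the difference goes to the miner."""
    fee = sum(input_values_sats) - sum(output_values_sats)
    if fee < 0:
        raise ValueError("outputs exceed inputs; not a valid transaction")
    return fee


def double_sha256(data: bytes) -> bytes:
    """SHA-256 applied twice, the hash Bitcoin uses for transaction IDs."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def txid_hex(raw_tx: bytes) -> str:
    """Txids are conventionally displayed with the hash bytes reversed.

    For SegWit transactions, the txid is computed over the serialization without witness data.
    """
    return double_sha256(raw_tx)[::-1].hex()


print(tx_fee_sats([btc_to_sats(0.0005), 30_000], [70_000, 9_000]))  # 1000
```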
FinTech: Applied finance modules covering robo-advising, cryptography in Python, deep learning for financial modeling, InsureTech, quantitative investing, and marketplace lending.
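The specific cryptography exercises aren't described here. As one representative example of cryptography in Python applied to finance, the sketch below authenticates a payment message with HMAC-SHA256 from the standard library, so that tampering with the amount is detected. The message format and key are hypothetical.

```python
import hashlib
import hmac


def sign(message: bytes, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag binding the message to the shared key."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(message: bytes, key: bytes, tag: str) -> bool:
    """Recompute the tag and compare in constant time to avoid timing leaks."""
    return hmac.compare_digest(sign(message, key), tag)


key = b"shared-secret"            # hypothetical shared key
msg = b"transfer 100 USD to alice"  # hypothetical payment message
tag = sign(msg, key)
print(verify(msg, key, tag), verify(b"transfer 900 USD to alice", key, tag))  # True False
```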
→ See full project write-ups, methodology breakdowns, and outcome summaries in CASE_STUDY.md