MS Business Analytics — UT Austin McCombs (2020–2021)

Graduate-level machine learning, analytics, and quantitative modeling across six domains — from NLP pipelines published externally to blockchain systems and stochastic optimization in R.

Python · R · TensorFlow · Jupyter · Apache Spark · scikit-learn


Highlights

  • Published externally — BERT-based tweet sentiment model written up in The Startup on Medium (one of the largest tech publications on the platform) · Full write-up PDF in repo
  • Full-domain breadth — NLP, predictive ML, Spark data engineering, blockchain dev, stochastic optimization, and FinTech across two semesters
  • Real data, real questions — Spotify audio features, Airbnb NYC pricing, WallStreetBets community graphs, Amazon supply chain modeling, Twitter influencer detection

Projects at a Glance

| Domain | Project | Methods | Stack |
| --- | --- | --- | --- |
| NLP / Deep Learning | BERT Sentiment Model | RoBERTa, 5-fold CV, Jaccard ~0.596 | Python, HuggingFace, TensorFlow |
| Text Analytics | Xbox vs PS5 Launch NLP | LDA topic modeling, sentiment analysis | Python, NLTK, spaCy |
| Text Analytics | Beer Recommendation Engine | Word2Vec, VADER, cosine similarity | Python, spaCy, scikit-learn |
| Data Analytics | Spotify Song Popularity | EDA on 603 songs (2010–2019), genre clustering | Python, pandas, matplotlib |
| Predictive Modeling | Airbnb NYC Price Prediction | End-to-end regression, feature engineering | Python, scikit-learn |
| Social Media Analytics | Twitter Influencer Classifier | Random Forest, RFE, 66% accuracy on 5,500 samples | Python, scikit-learn |
| Social Media Analytics | WallStreetBets Network Analysis | Community detection, graph centrality | Python, NetworkX |
| Data Management | Amazon Data Lake | Spark pipeline, entity modeling | Apache Spark, SQL |
| Stochastic Optimization | Portfolio Optimization Suite | LP, IP, NLP, DP, stochastic programming | R |
| FinTech | Applied Finance Suite | Robo-advising, deep learning, cryptography | Python |
| Blockchain | Bitcoin + Ethereum + Hyperledger | Smart contracts, transaction analysis | Solidity, Hyperledger |
| Time Series | Learning Structures & Forecasting | PCA, clustering, factor models | SAS |

Repository Structure

MSBA-UT-Austin/
├── APM(Advanced Predictive Modeling)/    ← RoBERTa tweet sentiment model + HW assignments
├── Data Analytics - Summer/              ← Spotify Top 100 popularity analysis
├── Data Management/                      ← Amazon Data Lake in Apache Spark
├── TextAnalysis/                         ← Xbox vs PS5 NLP + Beer recommendation engine
├── Predictive Modeling -Summer/          ← Airbnb NYC price prediction pipeline
├── Social Media Analytics/               ← Twitter influencer detection + WSB network graph
├── Stochastic Controls & Optimization/   ← Portfolio optimization in R (5 projects)
├── FinTech/                              ← Applied finance: robo-advising, crypto, ML
├── Blockchain Solutions and Dev/         ← Bitcoin CLI, Ethereum Solidity, Hyperledger
└── Time_Series/                          ← SAS-based learning structures & forecasting

Semester Breakdown

Fall 2020

Advanced Predictive Modeling: Fine-tuned RoBERTa (Robustly Optimized BERT Pretraining Approach) for tweet sentiment extraction, evaluated with 5-fold stratified cross-validation on ~27,981 samples. Average Jaccard score: 0.596. Write-up published in The Startup on Medium.
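The Jaccard score reported above compares the predicted text span with the labeled span. The repo's exact scoring code is not shown here, so the following is only a minimal sketch assuming a word-level Jaccard (intersection over union of lowercase, whitespace-split tokens). The function names `jaccard` and `mean_jaccard` are illustrative, not taken from the project.

```python
def jaccard(predicted: str, target: str) -> float:
    """Word-level Jaccard similarity between two text spans."""
    a = set(predicted.lower().split())
    b = set(target.lower().split())
    if not a and not b:
        # Assumption: two empty spans are treated as a perfect match.
        return 1.0
    return len(a & b) / len(a | b)


def mean_jaccard(pairs):
    """Average Jaccard over an iterable of (predicted, target) pairs."""
    scores = [jaccard(p, t) for p, t in pairs]
    return sum(scores) / len(scores)
```

For example, `jaccard("so happy today", "happy today")` shares two of three distinct words and evaluates to 2/3.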

Data Management Modeled Amazon's call center, warehouse, and inventory management systems as a unified data lake architecture. Implemented in Apache Spark with SQL mini-projects and MapReduce exercises.

Text Analytics: Two projects: (1) launch-day NLP on Xbox Series X vs PlayStation 5 social discourse using LDA topic modeling; (2) an attribute-based beer recommendation engine using Word2Vec embeddings and VADER sentiment analysis across 6,205 reviews of 250 beers.
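As a rough illustration of how an attribute-based recommender can combine embedding similarity with sentiment, the sketch below ranks items by cosine similarity to a query vector, scaled by a sentiment score in [-1, 1] (the range of VADER's compound score). The scaling rule, the `recommend` function, and the item layout are assumptions for illustration; the project's actual scoring may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a zero vector carries no direction to compare
    return dot / (norm_u * norm_v)


def recommend(query_vec, items, top_k=3):
    """Rank items by similarity to the query, weighted by review sentiment.

    items maps a name to (attribute_vector, sentiment), sentiment in [-1, 1].
    Hypothetical rule: rescale sentiment to [0, 1] and multiply.
    """
    scored = []
    for name, (vec, sentiment) in items.items():
        score = cosine(query_vec, vec) * (1 + sentiment) / 2
        scored.append((score, name))
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]
```

In practice the vectors would come from averaged Word2Vec embeddings of review text rather than hand-written lists.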

Data Analytics: Exploratory analysis of Spotify's Top 100 songs from 2010–2019 (603 songs). Analyzed audio features (BPM, energy, danceability, valence) against popularity scores. Identified dance pop as the dominant genre (327/603 songs).
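One simple way to relate an audio feature to popularity is a Pearson correlation. The sketch below is a plain-Python version for illustration only; it is not the notebook's code, and the `pearson` helper is a name introduced here. It also works out the dance pop share quoted above.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


# Share of dance pop among the 603 songs, from the counts stated above.
dance_pop_share = 327 / 603  # about 54.2%
```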

Predictive Modeling: End-to-end regression pipeline for Airbnb pricing in New York City. Feature engineering on listing attributes, neighborhood data, and host history.


Spring 2021

Social Media Analytics: Two projects: (1) a Twitter influencer classifier using Random Forest and Logistic Regression with RFE feature selection, reaching 66% accuracy on 5,500 test samples, with a cost/revenue model for affiliate targeting; (2) WallStreetBets community network analysis to identify the most connected sub-communities.
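Graph centrality is one of the measures behind "most connected". The sketch below computes degree centrality from an edge list in plain Python, using the same degree / (n - 1) normalization as NetworkX's `degree_centrality`. The project itself uses NetworkX; this standalone version exists only to show the calculation, and it skips self-loops as a simplifying assumption.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Degree centrality for an undirected graph given as (u, v) pairs."""
    neighbors = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # assumption: ignore self-loops
        neighbors[u].add(v)
        neighbors[v].add(u)
    n = len(neighbors)
    if n <= 1:
        return {node: 0.0 for node in neighbors}
    # Fraction of the other n - 1 nodes that each node is connected to.
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}
```

On a toy edge list such as `[("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]`, node `a` touches all three other nodes and gets a centrality of 1.0.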

Learning Structures & Time Series: SAS-based coursework covering principal component analysis (PCA), cluster analysis, and factor modeling for structured data patterns.

Stochastic Controls & Optimization: Five-project R suite covering linear programming, integer programming, nonlinear programming, dynamic programming, and stochastic programming. Projects include portfolio optimization on 2019–2020 stock data.
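The course projects are written in R and their formulations are not reproduced here. As a small, self-contained illustration of portfolio optimization as a nonlinear program, the Python sketch below uses the textbook closed-form solution for the minimum-variance mix of two assets, with short positions allowed. The function names and any inputs are hypothetical and not drawn from the project's 2019–2020 stock data.

```python
def min_variance_weights(var_a, var_b, cov_ab):
    """Weights (w_a, w_b) minimizing variance of a two-asset portfolio.

    Solves: minimize w_a^2 var_a + w_b^2 var_b + 2 w_a w_b cov_ab
    subject to w_a + w_b = 1 (no sign constraint on the weights).
    """
    denom = var_a + var_b - 2 * cov_ab
    if denom <= 0:
        raise ValueError("assets are perfectly correlated with equal variance")
    w_a = (var_b - cov_ab) / denom
    return w_a, 1 - w_a


def portfolio_variance(w_a, w_b, var_a, var_b, cov_ab):
    """Variance of a two-asset portfolio with the given weights."""
    return w_a ** 2 * var_a + w_b ** 2 * var_b + 2 * w_a * w_b * cov_ab
```

With more than two assets or with constraints such as no short selling, there is no closed form this simple, and a numerical solver becomes the natural tool.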

Blockchain Solutions & Dev: Three deliverables: Bitcoin command-line transaction analysis; ViCo, a YouTube clip ownership platform built in Solidity/Ethereum; and a Hyperledger-based checking account system.

FinTech: Applied finance modules covering robo-advising, cryptography in Python, deep learning for financial modeling, InsureTech, quantitative investing, and marketplace lending.


→ See full project write-ups, methodology breakdowns, and outcome summaries in CASE_STUDY.md