Skip to content

aslanshi/Fraud-Analytics-Projects

Repository files navigation

This repository is a collection of jupyter notebooks in Python that I developed for recent projects.

Project 1: Unsupervised ML on NY Properties Anomaly Detection

  • Created a business-level data quality report on NY property data with over 1 million records.
  • Leveraged google API and web scraping to fill in missing location data using Python; created and normalized expert variables by dividing averages at different granularities; optimized speed performance by applying vectorization.
  • Conduct principal component analysis to reduce dimensionality; visualized component variance explained; standardized principle components and calculated Euclidean distances of z-scaled components; built an autoencoder model and calculated Euclidean distances of the difference between inputs and decoded outputs; ranked by each score to unify scales and combined two rankings.
  • Detected potential anomalies, traced down possible causes and created a high-level business report.

Project 2: Supervised Real Time Identity Fraud Model on Product Applications

  • Produced exploratory data analysis report on 1 million product application records with fraud labels; cleaned frivolous fields.
  • Applied smooth transition function to create a risk table of fraud probabilities at each day of the week at proper granular levels.
  • Created over 300 candidate time series variables; improved execution speed of professor’s script by 110x; achieved the best execution performance in the class.
  • Applied feature selection methods such as Kolmogorov–Smirnov, fraud detection rate, stepwise logistic regression, and Lasso regularization to reduce dimensionality; split dataset into training/testing and out of time validation sets; built supervised KNN/tree-based/support vector machines/neural networks classification models and conduct grid search/random search for hyperparameter tuning; calculated/plotted goodness measures such as ROC, AUC, as well as bias-variance tradeoff.
  • Picked model with the highest fraud detection rate at 3% on the out of time validation set; produced a high-level business report.

About

This repository is a collection of jupyter notebooks in Python that I developed for recent projects.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors