jasonfeiwang/Anomaly-Detectors


AnomalyDetectors

Project Background

Team Name: Anomaly Detectors
Team Members: Fei Wang, Yumeng Ding, Gautam Moogimane

In recent years, natural resource consumption and conservation have become major topics in academic, industry, and political debates. Energy usage in commercial buildings is a large component of natural resource consumption. Monitoring energy usage trends and detecting abnormal activity in these buildings is therefore essential to using these resources more efficiently. Energy usage anomalies can come from many sources, for example errors in manual data entry, broken infrastructure, and seasonality of energy consumption.

Our capstone project sponsor, Jones Lang LaSalle Americas, Inc. (JLL), is in charge of collecting energy usage data for the properties under its management, to ensure that clients' energy usage is compliant with local energy disclosure laws and to measure progress towards sustainability goals. The success of this task depends on the accuracy of the monthly reported utility data. More specifically, we need to distinguish truly abnormal energy usage from seemingly abnormal usage caused by data quality issues or other factors such as broken infrastructure. This will help increase the productivity of JLL's analysts (time is saved by auditing only the sites with anomalies), lead to real cost savings (e.g. fixing building operation issues that cause energy or water waste), and increase confidence in greenhouse gas sustainability reporting data. Our team's objective is to provide rules and algorithms based on data analysis, and potentially to suggest how to build the detection process into a pipeline.

Guide to AnomalyDetector Project

There are four major folders in this repository, namely: data, doc, output, and src. (Please also refer to Folder Structure below for detailed structure.)

  • The data folder contains the original data we downloaded from the New York City Housing Authority: monthly electricity consumption data from 2010 to 2018 for buildings in New York.

  • The doc folder contains the artifacts we generated during the capstone project, from the Data Pipeline, Project Proposal, and Interim Presentation to the Final Poster and Paper.

  • The output folder houses the intermediate outputs we generated during the project, mainly after cleaning the original dataset and prorating/imputing the target metric at the account level.

  • The src folder holds all the code in this project. Users can use the Data_Cleaning notebooks to prepare a given dataset (e.g. detecting billing gaps, prorating bills to calendar months, and imputing missing values). The Demo notebooks show the differences among the three methods and walk step by step through how each method works on an example account. Finally, the method notebooks provide a clean loop for running a given CSV file through a method and outputting a dataframe that identifies the anomalous points detected.
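As a rough illustration of two of the cleaning steps named above (detecting billing gaps and prorating bills to calendar months), here is a minimal sketch in plain Python. It is not the code from the Data_Cleaning notebooks: the function names `find_billing_gaps` and `prorate_bill`, the inclusive start/end date convention, and the day-weighted split are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta


def find_billing_gaps(periods):
    """Return uncovered (gap_start, gap_end) date ranges between bills.

    `periods` is a list of (start, end) dates, both inclusive (assumed convention).
    """
    gaps = []
    ordered = sorted(periods)
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        # More than one day between the end of one bill and the start of
        # the next means at least one day is not covered by any bill.
        if (next_start - prev_end).days > 1:
            gaps.append((prev_end + timedelta(days=1),
                         next_start - timedelta(days=1)))
    return gaps


def prorate_bill(start, end, usage):
    """Split one bill's usage across calendar months, weighted by days.

    Returns a dict mapping (year, month) to that month's share of `usage`.
    """
    total_days = (end - start).days + 1
    if total_days <= 0:
        raise ValueError("end must not be before start")
    days_per_month = defaultdict(int)
    day = start
    while day <= end:
        days_per_month[(day.year, day.month)] += 1
        day += timedelta(days=1)
    return {month: usage * n / total_days
            for month, n in days_per_month.items()}


# A bill covering 16 January to 14 February spans 16 days of January
# and 14 days of February, so usage is split 16/30 and 14/30.
shares = prorate_bill(date(2018, 1, 16), date(2018, 2, 14), 300.0)
```

Counting days per month first and dividing once keeps the monthly shares summing to the original bill. A real pipeline would likely do this with pandas across all accounts at once.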

Project Framework

We followed the framework below throughout the project. Solid boxes denote the methods we successfully implemented; dashed boxes denote the methods we tried that did not work for our particular use case.

[Figure: project framework diagram]
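To give a feel for what a decomposition-style detector does with a prorated monthly series, the sketch below removes a seasonal baseline and flags large residuals. It is a deliberately simplified stand-in, not the repository's Decomposition notebook and not STL: the seasonal component is just the median of observations at the same position in the yearly cycle, and the function name `flag_anomalies`, the 12-month period, and the 3.5 cutoff on the robust z-score are assumptions chosen for illustration.

```python
from statistics import median


def flag_anomalies(values, period=12, threshold=3.5):
    """Return indices whose deseasonalized residual is a robust outlier.

    Seasonal baseline: median of all values sharing the same phase of the
    cycle. Outlier test: modified z-score based on the median absolute
    deviation (MAD) of the residuals.
    """
    seasonal = [median(values[i % period::period]) for i in range(len(values))]
    residuals = [v - s for v, s in zip(values, seasonal)]
    center = median(residuals)
    mad = median([abs(r - center) for r in residuals])
    if mad == 0:
        # Residual spread is zero, so the score is undefined; flag nothing.
        return []
    return [i for i, r in enumerate(residuals)
            if 0.6745 * abs(r - center) / mad > threshold]


# Three years of a synthetic monthly pattern with a small yearly offset,
# plus one injected spike in the sixth month of the second year (index 17).
pattern = [100, 90, 80, 70, 60, 50, 50, 60, 70, 80, 90, 100]
series = [p + offset for offset in (-2, 0, 2) for p in pattern]
series[17] += 100
anomalies = flag_anomalies(series)
```

Using medians rather than means keeps a single large spike from distorting either the seasonal baseline or the spread estimate, which matters with only a few years of monthly data per account.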

Folder Structure

Anomaly-Detectors
├── README.md
├── data
│   ├── Client\ 1\ -\ Data\ for\ UW\ team.xlsx
│   ├── Client\ 2\ -\ Data\ for\ UW\ team.xlsx
│   └── NYC\ Open\ Data\ -\ Electric_Consumption_And_Cost__2010_-__June_2018_.csv
├── doc
│   ├── AnomalyDetection_Poster.pdf
│   ├── Data_Pipeline.md
│   ├── Final_Report.pdf
│   ├── Interim_Presentation.pdf
│   ├── Problem_Statement.pdf
│   └── Project_Proposal.pdf
├── environment.yml
├── output
│   ├── client1
│   │   ├── anomaly_detection_decomposition_client1_electricity_charge.csv
│   │   ├── anomaly_detection_decomposition_client1_electricity_consumption.csv
│   │   ├── anomaly_detection_prophet_client1_electricity_charge.csv
│   │   ├── anomaly_detection_prophet_client1_electricity_consumption.csv
│   │   ├── df_cleaned
│   │   ├── df_mapping
│   │   ├── df_orig
│   │   ├── df_prorated
│   │   ├── electricity_prorated_ts
│   │   └── electricity_prorated_ts.csv
│   ├── client2
│   │   ├── electricity
│   │   │   ├── anomaly_detection_decomposition_client2_electricity_charge.csv
│   │   │   ├── anomaly_detection_decomposition_client2_electricity_consumption.csv
│   │   │   ├── anomaly_detection_prophet_client2_electricity_charge.csv
│   │   │   ├── anomaly_detection_prophet_client2_electricity_consumption.csv
│   │   │   ├── df_cleaned
│   │   │   ├── df_mapping
│   │   │   ├── df_orig
│   │   │   ├── df_prorated
│   │   │   ├── electricity_prorated_ts
│   │   │   └── electricity_prorated_ts.csv
│   │   └── natural_gas
│   │       ├── anomaly_detection_decomposition_client2_natural_gas_charge.csv
│   │       ├── anomaly_detection_decomposition_client2_natural_gas_consumption.csv
│   │       ├── anomaly_detection_prophet_client2_natural_gas_charge.csv
│   │       ├── anomaly_detection_prophet_client2_natural_gas_consumption.csv
│   │       ├── df_cleaned
│   │       ├── df_mapping
│   │       ├── df_orig
│   │       ├── df_prorated
│   │       ├── natural_gas_prorated_ts
│   │       └── natural_gas_prorated_ts.csv
│   └── nycha
│       ├── NYCHA_Prorated_KWH
│       ├── NYCHA_Prorated_KWH.csv
│       ├── NYCHA_TS.csv
│       └── result_summary_plots
│           ├── Clustering.png
│           ├── Prophet.png
│           └── STL.png
└── src
    ├── Clustering.ipynb
    ├── Clustering_Demo.ipynb
    ├── Data_Cleaning_Client1.ipynb
    ├── Data_Cleaning_Client2_Electricity.ipynb
    ├── Data_Cleaning_Client2_Natural_Gas.ipynb
    ├── Data_Cleaning_NYCHA.ipynb
    ├── Decomposition.ipynb
    ├── Decomposition_Demo.ipynb
    ├── Prophet.ipynb
    └── Prophet_Demo.ipynb
