Gas Sensor Array Drift Project

danny314 edited this page May 12, 2014 · 7 revisions

Data Pre-Processing

  1. The dataset has 129 features. The first feature holds the gas class and the concentration, joined by a semicolon.
  2. Each of the other 128 features has its feature number joined to the actual value by a colon.

Pre-processing is required to address points 1 and 2. The first feature was split into two features, GAS and CONC (concentration). For all other features, the feature number was discarded and a descriptive column name was assigned. For example, the S11I_001 column is the increasing current reading for sensor 11 when alpha = 0.001.
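The splitting described above can be sketched in a few lines of Python. This is only an illustration, not the project's actual script: the function name `parse_line`, the whitespace-separated token layout, and the placeholder column names `F1`, `F2`, ... are assumptions (the real column names follow the sensor-based scheme, such as S11I_001), and the sample record uses made-up values.

```python
def parse_line(line, n_features=128):
    """Parse one raw record into a flat dict (hypothetical helper).

    Assumed layout: '<gas>;<conc> 1:<v1> 2:<v2> ... <n>:<vn>',
    with tokens separated by whitespace.
    """
    tokens = line.split()

    # Point 1: the first token holds gas class and concentration, joined by ';'.
    gas, conc = tokens[0].split(";")
    row = {"GAS": gas, "CONC": float(conc)}

    # Point 2: every other token is '<feature number>:<value>'.
    for token in tokens[1:]:
        index, value = token.split(":")
        # The feature number only selects the column; it is not stored as data.
        # 'F<n>' is a placeholder name, not the project's naming scheme.
        row[f"F{int(index)}"] = float(value)

    if len(row) != n_features + 2:
        raise ValueError(f"expected {n_features} features, got {len(row) - 2}")
    return row


# Made-up three-feature record, for illustration only.
sample = "2;50.0 1:1.5 2:-0.25 3:3.0"
print(parse_line(sample, n_features=3))
```

The length check at the end is a cheap guard against truncated or malformed lines; with the default `n_features=128`, every parsed row should hold 130 entries before the batch column is added.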

All ten batches were combined into a single data set with an additional column, BATCH, denoting the batch to which each observation belongs. The final clean data set contains 131 variables (128 current reading features + gas + concentration + batch) and 13910 rows.
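A minimal sketch of the combining step, assuming each batch has already been parsed into a list of row dictionaries. The function name `combine_batches` and the input shape (a dict mapping batch number to rows) are hypothetical, and the two small batches below contain made-up values.

```python
def combine_batches(batches):
    """Concatenate per-batch rows, tagging each row with its batch number.

    `batches` maps a batch number to a list of row dicts,
    one dict per observation (hypothetical input shape).
    """
    combined = []
    for batch_id in sorted(batches):
        for row in batches[batch_id]:
            tagged = dict(row)          # copy so the caller's rows stay unchanged
            tagged["BATCH"] = batch_id  # extra column identifying the source batch
            combined.append(tagged)
    return combined


# Made-up example: two batches with a single feature column each.
batches = {
    2: [{"GAS": "1", "CONC": 10.0, "S11I_001": 0.5}],
    1: [
        {"GAS": "3", "CONC": 25.0, "S11I_001": 1.5},
        {"GAS": "2", "CONC": 50.0, "S11I_001": 2.5},
    ],
}
combined = combine_batches(batches)
print(len(combined), [row["BATCH"] for row in combined])
```

Applied to all ten batches, the result should have 13910 rows with 131 keys each, which makes a convenient sanity check on the combined data set.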

Current Task

Data exploration and dimensionality reduction using PCA are currently in progress.
