Binary file added results/finetune1.PNG
Binary file added results/finetune2.PNG
Binary file added results/finetune2stream.PNG
118 changes: 92 additions & 26 deletions results/index.md
@@ -1,47 +1,113 @@
# Your Name 楊皓鈞 <span style="color:red">(105061523)</span>

# Project 1: Deep Classification


<center>
<img src="./overview.png" alt="overview" style="float:middle;">
</center>

## Overview
The project is related to the following work:
> Recently, the technological advance of wearable devices has led to significant interest in recognizing human behaviors in daily life (i.e., uninstrumented environments). Among many devices, egocentric camera systems have drawn significant attention: since the camera is aligned with the field-of-view of the wearer, it naturally captures what a person sees. These systems have shown great potential in recognizing daily activities (e.g., making meals, watching TV, etc.), estimating hand poses, generating how-to videos, etc.
> Despite the many advantages of egocentric camera systems, there exist two main issues which are much less discussed. Firstly, hand localization is not solved, especially for passive camera systems. Even for active camera systems like Kinect, hand localization is challenging when two hands are interacting or a hand is interacting with an object. Secondly, the limited field-of-view of an egocentric camera implies that hands will inevitably move outside the images at times.
> We propose HandCam (Fig. 1), a novel wearable camera capturing the activities of hands, for recognizing human behaviors. HandCam has two main advantages over egocentric systems: (1) it avoids the need to detect hands and manipulation regions; (2) it observes the activities of hands almost all the time.



## Implementation
1. Single Stream CNN
In this work, the base network we fine-tune is [ResNet50](https://github.com/fchollet/keras/blob/master/keras/applications/resnet50.py). Due to the limitations of time and computation resources, we fine-tune only two variants (a minimal sketch follows):
* the last fully connected layer only <br>
<img src="./finetune1.PNG" >
* the last convolutional layer + the fully connected layer
<img src="./finetune2.PNG"> <br>
<br>
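A minimal Keras sketch of these two fine-tuning variants (the number of classes and the unfreezing depth are assumptions, not the exact training setup):
```
from keras.applications.resnet50 import ResNet50
from keras.layers import Dense
from keras.models import Model

NUM_CLASSES = 24  # assumption: number of hand-state classes

# Load ResNet50 pre-trained on ImageNet, without the original classifier.
base = ResNet50(weights='imagenet', include_top=False, pooling='avg')
pred = Dense(NUM_CLASSES, activation='softmax')(base.output)
model = Model(inputs=base.input, outputs=pred)

# Variant 1: train only the new fully connected layer.
for layer in base.layers:
    layer.trainable = False

# Variant 2: additionally unfreeze roughly the last residual block.
# for layer in base.layers[-10:]:
#     layer.trainable = True

model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
```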

2. Two-Stream CNN
To achieve better performance, we implement a two-stream CNN following the architecture described in this [paper](https://arxiv.org/abs/1512.01881) (a minimal sketch of such a model follows).
<img src="./finetune2stream.PNG"> <br>
<br>
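A minimal sketch of how such a two-stream model can be assembled in Keras. For brevity the two streams share one ResNet50 feature extractor here, which is an assumption; the paper's streams have separate weights:
```
from keras.applications.resnet50 import ResNet50
from keras.layers import Input, Dense, concatenate
from keras.models import Model

NUM_CLASSES = 24  # assumption

hand_input = Input(shape=(224, 224, 3))
head_input = Input(shape=(224, 224, 3))

# Shared ResNet50 feature extractor applied to both views.
base = ResNet50(weights='imagenet', include_top=False, pooling='avg')
hand_feat = base(hand_input)
head_feat = base(head_input)

# Fuse the two streams and classify.
merged = concatenate([hand_feat, head_feat])
pred = Dense(NUM_CLASSES, activation='softmax')(merged)
model = Model(inputs=[hand_input, head_input], outputs=pred)
```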

3. Class Weight
From the histogram below, we observe a severe class imbalance in the given dataset (most samples belong to the empty class, i.e., no object). Therefore, we compensate for it by assigning a different weight to each class; a usage sketch follows the function below.
<img src="./labelhist.png"> <br>

```
import numpy as np
from sklearn.preprocessing import LabelEncoder

def compute_class_weight(labelPath):
    # The third whitespace-separated token of each line is the integer
    # class label (two-stream list format: hand path, head path, label).
    with open(labelPath, 'r') as text_file:
        content = text_file.readlines()
    content = np.asarray(content)
    y = np.asarray([int(sample.split(' ')[2].strip('\n')) for sample in content])

    # Weight each class by the reciprocal of its relative frequency.
    classes = np.asarray(list(set(y)))
    le = LabelEncoder()
    y_ind = le.fit_transform(y)
    recip_freq = len(y) / (len(le.classes_) * np.bincount(y_ind).astype(np.float64))
    weight = recip_freq[le.transform(classes)]
    return weight
```
<br>
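The returned weights can then be handed to Keras during training. A minimal usage sketch, assuming the labels are consecutive integers starting from 0 (the file name and training tensors are placeholders):
```
weight = compute_class_weight('hand_head_train.txt')  # placeholder label file
class_weight = dict(enumerate(weight))
model.fit(x_train, y_train, batch_size=16, epochs=10,
          class_weight=class_weight)
```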


## Installation
* Python library for building deep neural networks: [keras](https://keras.io/)
* Python library for image augmentation: [imgaug](https://github.com/aleju/imgaug)


## Results

* Comparison: weighted vs. unweighted training
<table border=1>
<tr>
<td> Weighted by class </td>
<td> Not weighted by class </td>
</tr>
<tr>
<td> <img src="./resnet_16_weighted_fc.png" alt="weighted" style="float:middle;"> </td>
<td> <img src="./resnet_16_noweighted_fc.png" alt="unweighted" style="float:middle;"> </td>
</tr>
</table>



* Comparison: single stream vs. two stream
<table border=1>
<tr>
<td> Single Stream </td>
<td> Two Stream </td>
</tr>
<tr>
<td> <img src="./resnet_32_weighted_fc.png" alt="single stream" style="float:middle;"> </td>
<td> <img src="./resnet_32_twostring_weighted_fc.png" alt="two stream" style="float:middle;"> </td>
</tr>
</table>


* Comparison: fine-tuning the fully connected layer only vs. conv + FC layers
<table border=1>
<tr>
<td> FC only </td>
<td> FC + 1 conv layer </td>
</tr>
<tr>
<td> <img src="./resnet_32_weighted_fc.png" alt="fc only" style="float:middle;"> </td>
<td> <img src="./resnet_32_weighted_1conv.png" alt="fc + 1 conv" style="float:middle;"> </td>
</tr>
</table>

* Comparison: with vs. without left-right flipping of the input (a minimal augmentation sketch follows the table)
<table border=1>
<tr>
<td> Fliplr </td>
<td> No fliplr </td>
</tr>
<tr>
<td> <img src="./resnet_twostring_16_noweighted_fc_fliplr.png" alt="fliplr" style="float:middle;"> </td>
<td> <img src="./resnet_twostring_16_noweighted_1conv.png" alt="no fliplr" style="float:middle;"> </td>
</tr>
</table>
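For reference, a minimal sketch of the left-right flip augmentation with imgaug (the batch shape is an assumption):
```
import numpy as np
from imgaug import augmenters as iaa

# Flip each image horizontally with probability 0.5 during training.
seq = iaa.Sequential([iaa.Fliplr(0.5)])

images = np.random.randint(0, 255, size=(16, 224, 224, 3), dtype=np.uint8)
images_aug = seq.augment_images(images)
```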


## Discussion
After working exhaustively, the best test accuracy I can achieve remains around <span style="color:red">0.51</span>. The class-weighting method does not seem to help, and neither the single-stream nor the two-stream network trains stably. I suspect the reason is that, due to GPU memory constraints, I could not try a larger batch size during training, which leads to considerable label imbalance within each batch and eventually makes the whole network collapse. This is a really interesting task, and I am really curious how other classmates conquered it. T_T

Binary file added results/labelhist.png
Binary file added results/overview.png
Binary file added results/resnet_16_noweighted_fc.png
Binary file added results/resnet_16_weighted_fc.png
Binary file added results/resnet_32_twostring_weighted_fc.png
Binary file added results/resnet_32_weighted_1conv.png
Binary file added results/resnet_32_weighted_fc.png
Binary file added results/resnet_twostring_16_noweighted_1conv.png
77 changes: 77 additions & 0 deletions scripts/create_traintestTXT.py
@@ -0,0 +1,77 @@
import os
import numpy as np
import pandas as pd

os.chdir('D:/Lab/CEDL/hw1/scripts/')
main_dir = '../data/'
envs = ['house', 'lab', 'office']

label_name = 'labels/'
label_device = ['obj_left', 'obj_right']
# label_device = ['obj_left']
TWOSTRING = True

data_name = 'test/'
data_part = ['1', '2', '3', '4']
data_device = ['Lhand', 'Rhand']
# data_device = ['head']

## note on the pre-processing of the head data:
'''
For the head 'image' and 'label', I arbitrarily select the left-hand label as the
corresponding head label. Since the head label is not used in the parallel
(two-stream) structure, it does not matter.
'''

#%%
total_val_num = 0
total_train_num = 0
for env in envs:
    for idx, device in enumerate(label_device):
        for part in data_part:
            # Only the 'lab' environment has a 4th part.
            if env != 'lab' and part == '4':
                continue
            if data_name == 'test/' and env == 'lab':
                label_f_dir = main_dir + label_name + env + '/' + device + str(int(part) + 4) + '.npy'  # label test
            elif data_name == 'test/':
                label_f_dir = main_dir + label_name + env + '/' + device + str(int(part) + 3) + '.npy'  # label test
            else:
                label_f_dir = main_dir + label_name + env + '/' + device + part + '.npy'  # label train
            label_array = np.load(label_f_dir)
            print('now reading %s' % label_f_dir)
            # img_num = len(label_array)

            for i, label in enumerate(label_array):
                train_num = int(len(label_array) * 0.7)
                val_num = len(label_array) - train_num

                '''
                if i < train_num:
                    with open("head_train.txt", "a") as text_file:
                        f_dir = main_dir+data_name+env+'/'+part+'/'+data_device[idx]+'/'+'Image'+str(i+1)+'.png'
                        cores_label = str(int(label))
                        text_file.write(f_dir+' '+cores_label+'\n')
                    total_train_num += 1
                else:
                    with open("head_val.txt", "a") as text_file:
                        f_dir = main_dir+data_name+env+'/'+part+'/'+data_device[idx]+'/'+'Image'+str(i+1)+'.png'
                        cores_label = str(int(label))
                        text_file.write(f_dir+' '+cores_label+'\n')
                    total_val_num += 1
                '''
                # Each output line: "<hand_path> [<head_path>] <label>"
                with open("hand_head_test.txt", "a") as text_file:
                    coresLabel = str(int(label))
                    handDir = main_dir + data_name + env + '/' + part + '/' + data_device[idx] + '/' + 'Image' + str(i + 1) + '.png'
                    if TWOSTRING:
                        headDir = main_dir + data_name + env + '/' + part + '/' + 'head' + '/' + 'Image' + str(i + 1) + '.png'
                        line = ' '.join([handDir, headDir, coresLabel])
                    else:
                        line = ' '.join([handDir, coresLabel])
                    text_file.write(line + '\n')
                    total_train_num += 1


# print('total_val_num = ', total_val_num)
# print('total_train_num = ', total_train_num)

## Shuffle the txt-file
# data = pd.read_csv('output_list.txt', sep=" ", header=None)
# data.columns = ["a", "b", "c", "etc."]
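## A minimal sketch completing the commented-out shuffle step above (the file
## names are assumptions; pandas is already imported at the top of this script):
# data = pd.read_csv('hand_head_test.txt', sep=' ', header=None)
# data = data.sample(frac=1).reset_index(drop=True)  # shuffle rows
# data.to_csv('hand_head_test_shuffled.txt', sep=' ', header=False, index=False)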