Binary file added results/finetune1.PNG
Binary file added results/finetune2.PNG
Binary file added results/finetune2stream.PNG
118 changes: 92 additions & 26 deletions results/index.md
@@ -1,47 +1,113 @@
# Your Name 楊皓鈞 <span style="color:red">(105061523)</span>

# Project 1: Deep Classification


<center>
<img src="./overview.png" alt="overview" style="float:middle;">
</center>

## Overview
The project is related to the following work:
> Recently, the technological advance of wearable devices has led to significant interest in recognizing human behaviors in daily life (i.e., uninstrumented environments). Among many devices, egocentric camera systems have drawn significant attention: since the camera is aligned with the field-of-view of the wearer, it naturally captures what a person sees. These systems have shown great potential in recognizing daily activities (e.g., making meals, watching TV, etc.), estimating hand poses, generating how-to videos, etc.
> Despite the many advantages of egocentric camera systems, there exist two main issues which are much less discussed. Firstly, hand localization is not solved, especially for passive camera systems. Even for active camera systems like Kinect, hand localization is challenging when two hands are interacting or a hand is interacting with an object. Secondly, the limited field-of-view of an egocentric camera implies that hands will inevitably move outside the images at times.
> We propose HandCam (Fig. 1), a novel wearable camera capturing the activities of hands, for recognizing human behaviors. HandCam has two main advantages over egocentric systems: (1) it avoids the need to detect hands and manipulation regions; (2) it observes the activities of hands almost all the time.



## Implementation
1. Single Stream CNN
In this work, the base network we fine-tune is [ResNet50](https://github.com/fchollet/keras/blob/master/keras/applications/resnet50.py). Due to the limitations of time and computation resources, we fine-tune only two variants (a minimal sketch follows):
* the last fully connected layer only <br>
<img src="./finetune1.PNG" >
* the last convolutional layer + the fully connected layer
<img src="./finetune2.PNG"> <br>
<br>
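A minimal Keras sketch of these two fine-tuning variants (the number of classes and the unfreezing depth are assumptions, not the exact training setup):
```
from keras.applications.resnet50 import ResNet50
from keras.layers import Dense
from keras.models import Model

NUM_CLASSES = 24  # assumption: number of hand-state classes

# Load ResNet50 pre-trained on ImageNet, without the original classifier.
base = ResNet50(weights='imagenet', include_top=False, pooling='avg')
pred = Dense(NUM_CLASSES, activation='softmax')(base.output)
model = Model(inputs=base.input, outputs=pred)

# Variant 1: train only the new fully connected layer.
for layer in base.layers:
    layer.trainable = False

# Variant 2: additionally unfreeze roughly the last residual block.
# for layer in base.layers[-10:]:
#     layer.trainable = True

model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
```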

2. Two-Stream CNN
To achieve better performance, we implement a two-stream CNN following the architecture described in this [paper](https://arxiv.org/abs/1512.01881) (a minimal sketch of such a model follows).
<img src="./finetune2stream.PNG"> <br>
<br>
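A minimal sketch of how such a two-stream model can be assembled in Keras. For brevity the two streams share one ResNet50 feature extractor here, which is an assumption; the paper's streams have separate weights:
```
from keras.applications.resnet50 import ResNet50
from keras.layers import Input, Dense, concatenate
from keras.models import Model

NUM_CLASSES = 24  # assumption

hand_input = Input(shape=(224, 224, 3))
head_input = Input(shape=(224, 224, 3))

# Shared ResNet50 feature extractor applied to both views.
base = ResNet50(weights='imagenet', include_top=False, pooling='avg')
hand_feat = base(hand_input)
head_feat = base(head_input)

# Fuse the two streams and classify.
merged = concatenate([hand_feat, head_feat])
pred = Dense(NUM_CLASSES, activation='softmax')(merged)
model = Model(inputs=[hand_input, head_input], outputs=pred)
```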

3. Class Weight
From the histogram below, we observe a severe class imbalance in the given dataset (most samples belong to the empty class, i.e., no object). Therefore, we compensate for it by assigning a different weight to each class; a usage sketch follows the function below.
<img src="./labelhist.png"> <br>

```
import numpy as np
from sklearn.preprocessing import LabelEncoder

def compute_class_weight(labelPath):
    # The third whitespace-separated token of each line is the integer
    # class label (two-stream list format: hand path, head path, label).
    with open(labelPath, 'r') as text_file:
        content = text_file.readlines()
    content = np.asarray(content)
    y = np.asarray([int(sample.split(' ')[2].strip('\n')) for sample in content])

    # Weight each class by the reciprocal of its relative frequency.
    classes = np.asarray(list(set(y)))
    le = LabelEncoder()
    y_ind = le.fit_transform(y)
    recip_freq = len(y) / (len(le.classes_) * np.bincount(y_ind).astype(np.float64))
    weight = recip_freq[le.transform(classes)]
    return weight
```
<br>
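The returned weights can then be handed to Keras during training. A minimal usage sketch, assuming the labels are consecutive integers starting from 0 (the file name and training tensors are placeholders):
```
weight = compute_class_weight('hand_head_train.txt')  # placeholder label file
class_weight = dict(enumerate(weight))
model.fit(x_train, y_train, batch_size=16, epochs=10,
          class_weight=class_weight)
```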


## Installation
* Python library for building deep neural networks: [keras](https://keras.io/)
* Python library for image augmentation: [imgaug](https://github.com/aleju/imgaug)


## Results

* Comparison: weighted vs. unweighted training
<table border=1>
<tr>
<td> Weighted by class </td>
<td> Not weighted by class </td>
</tr>
<tr>
<td> <img src="./resnet_16_weighted_fc.png" alt="weighted" style="float:middle;"> </td>
<td> <img src="./resnet_16_noweighted_fc.png" alt="unweighted" style="float:middle;"> </td>
</tr>
</table>



* Comparison: single stream vs. two stream
<table border=1>
<tr>
<td> Single Stream </td>
<td> Two Stream </td>
</tr>
<tr>
<td> <img src="./resnet_32_weighted_fc.png" alt="single stream" style="float:middle;"> </td>
<td> <img src="./resnet_32_twostring_weighted_fc.png" alt="two stream" style="float:middle;"> </td>
</tr>
</table>


* Comparison: fine-tuning the fully connected layer only vs. conv + FC layers
<table border=1>
<tr>
<td> FC only </td>
<td> FC + 1 conv layer </td>
</tr>
<tr>
<td> <img src="./resnet_32_weighted_fc.png" alt="fc only" style="float:middle;"> </td>
<td> <img src="./resnet_32_weighted_1conv.png" alt="fc + 1 conv" style="float:middle;"> </td>
</tr>
</table>

* Comparison: with vs. without left-right flipping of the input (a minimal augmentation sketch follows the table)
<table border=1>
<tr>
<td> Fliplr </td>
<td> No fliplr </td>
</tr>
<tr>
<td> <img src="./resnet_twostring_16_noweighted_fc_fliplr.png" alt="fliplr" style="float:middle;"> </td>
<td> <img src="./resnet_twostring_16_noweighted_1conv.png" alt="no fliplr" style="float:middle;"> </td>
</tr>
</table>
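For reference, a minimal sketch of the left-right flip augmentation with imgaug (the batch shape is an assumption):
```
import numpy as np
from imgaug import augmenters as iaa

# Flip each image horizontally with probability 0.5 during training.
seq = iaa.Sequential([iaa.Fliplr(0.5)])

images = np.random.randint(0, 255, size=(16, 224, 224, 3), dtype=np.uint8)
images_aug = seq.augment_images(images)
```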


## Discussion
After working exhaustively, the best test accuracy I can achieve remains around <span style="color:red">0.51</span>. The class-weighting method does not seem to help, and neither the single-stream nor the two-stream network trains stably. I suspect the reason is that, due to GPU memory constraints, I could not try a larger batch size during training, which leads to considerable label imbalance within each batch and eventually makes the whole network collapse. This is a really interesting task, and I am really curious how other classmates conquered it. T_T

Binary file added results/labelhist.png
Binary file added results/overview.png
Binary file added results/resnet_16_noweighted_fc.png
Binary file added results/resnet_16_weighted_fc.png
Binary file added results/resnet_32_twostring_weighted_fc.png
Binary file added results/resnet_32_weighted_1conv.png
Binary file added results/resnet_32_weighted_fc.png
Binary file added results/resnet_twostring_16_noweighted_1conv.png
77 changes: 77 additions & 0 deletions scripts/create_traintestTXT.py
@@ -0,0 +1,77 @@
import os
import numpy as np
import pandas as pd

os.chdir('D:/Lab/CEDL/hw1/scripts/')
main_dir = '../data/'
envs = ['house', 'lab', 'office']

label_name = 'labels/'
label_device = ['obj_left', 'obj_right']
# label_device = ['obj_left']
TWOSTRING = True

data_name = 'test/'
data_part = ['1', '2', '3', '4']
data_device = ['Lhand', 'Rhand']
# data_device = ['head']

## note on the pre-processing of the head data:
'''
For the head 'image' and 'label', I arbitrarily select the left-hand label as the
corresponding head label. Since the head label is not used in the parallel
(two-stream) structure, it does not matter.
'''

#%%
total_val_num = 0
total_train_num = 0
for env in envs:
    for idx, device in enumerate(label_device):
        for part in data_part:
            # Only the 'lab' environment has a 4th part.
            if env != 'lab' and part == '4':
                continue
            if data_name == 'test/' and env == 'lab':
                label_f_dir = main_dir + label_name + env + '/' + device + str(int(part) + 4) + '.npy'  # label test
            elif data_name == 'test/':
                label_f_dir = main_dir + label_name + env + '/' + device + str(int(part) + 3) + '.npy'  # label test
            else:
                label_f_dir = main_dir + label_name + env + '/' + device + part + '.npy'  # label train
            label_array = np.load(label_f_dir)
            print('now reading %s' % label_f_dir)
            # img_num = len(label_array)

            for i, label in enumerate(label_array):
                train_num = int(len(label_array) * 0.7)
                val_num = len(label_array) - train_num

                '''
                if i < train_num:
                    with open("head_train.txt", "a") as text_file:
                        f_dir = main_dir+data_name+env+'/'+part+'/'+data_device[idx]+'/'+'Image'+str(i+1)+'.png'
                        cores_label = str(int(label))
                        text_file.write(f_dir+' '+cores_label+'\n')
                    total_train_num += 1
                else:
                    with open("head_val.txt", "a") as text_file:
                        f_dir = main_dir+data_name+env+'/'+part+'/'+data_device[idx]+'/'+'Image'+str(i+1)+'.png'
                        cores_label = str(int(label))
                        text_file.write(f_dir+' '+cores_label+'\n')
                    total_val_num += 1
                '''
                # Each output line: "<hand_path> [<head_path>] <label>"
                with open("hand_head_test.txt", "a") as text_file:
                    coresLabel = str(int(label))
                    handDir = main_dir + data_name + env + '/' + part + '/' + data_device[idx] + '/' + 'Image' + str(i + 1) + '.png'
                    if TWOSTRING:
                        headDir = main_dir + data_name + env + '/' + part + '/' + 'head' + '/' + 'Image' + str(i + 1) + '.png'
                        line = ' '.join([handDir, headDir, coresLabel])
                    else:
                        line = ' '.join([handDir, coresLabel])
                    text_file.write(line + '\n')
                    total_train_num += 1


# print('total_val_num = ', total_val_num)
# print('total_train_num = ', total_train_num)

## Shuffle the txt-file
# data = pd.read_csv('output_list.txt', sep=" ", header=None)
# data.columns = ["a", "b", "c", "etc."]
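## A minimal sketch completing the commented-out shuffle step above (the file
## names are assumptions; pandas is already imported at the top of this script):
# data = pd.read_csv('hand_head_test.txt', sep=' ', header=None)
# data = data.sample(frac=1).reset_index(drop=True)  # shuffle rows
# data.to_csv('hand_head_test_shuffled.txt', sep=' ', header=False, index=False)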