diff --git a/README.md b/README.md index b95f966..2aa30be 100644 --- a/README.md +++ b/README.md @@ -1,139 +1,39 @@ -
-overview -
+# 丘鈞岳 106065522 +> #### Homework 1 : Deep Classification +## Introduction +In this work, I fine-tune a ResNet18 model, pretrained on ImageNet, on the given dataset. I then evaluate it on the test set, where it reaches 66.39% accuracy after three epochs. Why ResNet? It is one of the most popular network architectures today, so I expected it to perform considerably better than the AlexNet baseline. -# Deep Classification +## Environment -## updates -- 9/26/2017: provide [subset of dataset](https://drive.google.com/drive/folders/0B3fKFm-j0RqeWGdXZUNRUkpybU0?usp=sharing), separated into train/test set -- 9/27/2017: in this homework, we only evaluat the performance of object classification. You can use other label for multi-task learning, etc. +* Framework : PyTorch (Python) +* OS : Ubuntu (Linux) +* Virtual Environment : Miniconda2 +* Network Architecture : ResNet18 -## Brief -* ***+2 extra credit of the whole semester*** -* Due: Oct. 5, 11:59pm. -* Required files: results/index.md, and code/ -* [Project reference](http://aliensunmin.github.io/project/handcam/) +## Implementation + +1. I modified [main.py](https://github.com/pytorch/examples/blob/master/imagenet/main.py) from the PyTorch examples. +2. With the --pretrained flag, it loads the ImageNet-pretrained ResNet18 automatically. +3. I wrote a DataLoader (load_Dataset.py) that feeds the corresponding frames and labels for training and testing. +![](https://i.imgur.com/70KI1Qd.png) + +4. Training uses --epochs 3, --batch-size 128, and --workers 4. +![](https://i.imgur.com/4HglBjf.png) + + + + +## Result + +Testing accuracy (%) + +| 1st epoch | 2nd epoch | 3rd epoch | +| -------- | -------- | -------- | +| 64.152% | 64.480% | **66.390%** | -## Overview -Recently, the technological advance of wearable devices has led to significant interests in recognizing human behaviors in daily life (i.e., uninstrumented environment). 
Among many devices, egocentric camera systems have drawn significant attention, since the camera is aligned with the field-of-view of wearer, it naturally captures what a person sees. These systems have shown great potential in recognizing daily activities(e.g., making meals, watching TV, etc.), estimating hand poses, generating howto videos, etc. -Despite many advantages of egocentric camera systems, there exists two main issues which are much less discussed. Firstly, hand localization is not solved especially for passive camera systems. Even for active camera systems like Kinect, hand localization is challenging when two hands are interacting or a hand is interacting with an object. Secondly, the limited field-of-view of an egocentric camera implies that hands will inevitably move outside the images sometimes. - -HandCam (Fig. 1), a novel wearable camera capturing activities of hands, for recognizing human behaviors. HandCam has two main advantages over egocentric systems : (1) it avoids the need to detect hands and manipulation regions; (2) it observes the activities of hands almost at all time. - -## Requirement -- Python -- [TensorFlow](https://github.com/tensorflow/tensorflow) -## Data - -### Introduction - -This is a [dataset](https://drive.google.com/drive/folders/0BwCy2boZhfdBdXdFWnEtNWJYRzQ) recorded by hand camera system. - -The camera system consist of three wide-angle cameras, two mounted on the left and right wrists to -capture hands (referred to as HandCam) and one mounted on the head (referred to as HeadCam). - -The dataset consists of 20 sets of video sequences (i.e., each set includes two HandCams and one -HeadCam synchronized videos) captured in three scenes: a small office, a mid-size lab, and a large home.) - -We want to classify some kinds of hand states including free v.s. active (i.e., hands holding objects or not), -object categories, and hand gestures. 
At the same time, a synchronized video has two sequence need to be labeled, -the left hand states and right hand states. - -For each classification task (i.e., free vs. active, object categories, or hand gesture), there are forty -sequences of data. We split the dataset into two parts, half for training, half for testing. The object instance is totally separated into training and testing. - -### Zip files - -`frames.zip` contains all the frames sample from the original videos by 6fps. - -`labels.zip` conatins the labels for all frames. - -FA : free vs. active (only 0/1) - -obj: object categories (24 classes, including free) - -ges: hand gesture (13 gestures, including free) - - -### Details of obj. and ges. - -``` -Obj = { 'free':0, - 'computer':1, - 'cellphone':2, - 'coin':3, - 'ruler':4, - 'thermos-bottle':5, - 'whiteboard-pen':6, - 'whiteboard-eraser':7, - 'pen':8, - 'cup':9, - 'remote-control-TV':10, - 'remote-control-AC':11, - 'switch':12, - 'windows':13, - 'fridge':14, - 'cupboard':15, - 'water-tap':16, - 'toy':17, - 'kettle':18, - 'bottle':19, - 'cookie':20, - 'book':21, - 'magnet':22, - 'lamp-switch':23} - -Ges= { 'free':0, - 'press'1, - 'large-diameter':2, - 'lateral-tripod':3, - 'parallel-extension':4, - 'thumb-2-finger':5, - 'thumb-4-finger':6, - 'thumb-index-finger':7, - 'precision-disk':8, - 'lateral-pinch':9, - 'tripod':10, - 'medium-wrap':11, - 'light-tool':12} -``` - -## Writeup - -You are required to implement a **deep-learning-based method** to recognize hand states (free vs. active hands, hand gestures, object categories). Moreover, You might need to further take advantage of both HandCam and HeadCam. You will have to compete the performance with your classmates, so try to use as many techniques as possible to improve. **Your score will based on the performance ranking.** - -For this project, and all other projects, you must do a project report in results folder using [Markdown](https://help.github.com/articles/markdown-basics). 
We provide you with a placeholder [index.md](./results/index.md) document which you can edit. In the report you will describe your algorithm and any decisions you made to write your algorithm a particular way. Then, you will describe how to run your code and if your code depended on other packages. You also need to show and discuss the results of your algorithm. Discuss any extra credit you did, and clearly show what contribution it had on the results (e.g. performance with and without each extra credit component). - -You should also include the precision-recall curve of your final classifier and any interesting variants of your algorithm. - -## Rubric - - -## Get start & hand in -* Publicly fork version (+2 extra points) - - [Fork the homework](https://education.github.com/guide/forks) to obtain a copy of the homework in your github account - - [Clone the homework](http://gitref.org/creating/#clone) to your local space and work on the code locally - - Commit and push your local code to your github repo - - Once you are done, submit your homework by [creating a pull request](https://help.github.com/articles/creating-a-pull-request) - -* [Privately duplicated version](https://help.github.com/articles/duplicating-a-repository) - - Make a bare clone - - mirror-push to new repo - - [make new repo private](https://help.github.com/articles/making-a-private-repository-public) - - [add aliensunmin as collaborator](https://help.github.com/articles/adding-collaborators-to-a-personal-repository) - - [Clone the homework](http://gitref.org/creating/#clone) to your local space and work on the code locally - - Commit and push your local code to your github repo - - I will clone your repo after the due date - -## Credits -Assignment designed by Cheng-Sheng Chan. Contents in this handout are from Chan et al.. 
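
The data loader described in the Implementation section pairs each frame folder with an object-label file. The following is a minimal sketch of that mapping, assuming the numbering used in the test branch of load_Dataset.py: test label files are numbered after the training ones, with an offset of 4 for `lab` and 3 for the other scenes. `label_file` and `LABEL_ROOT` are hypothetical names introduced for illustration and are not part of the repository.

```python
LABEL_ROOT = 'data/labels'  # assumed: same relative root the loader uses


def label_file(scene, video, handview, stage):
    """Return the object-label path for one (scene, video, hand) sequence.

    Assumption: test label files continue the numbering of the training
    files, so the index is shifted by 4 for 'lab' and by 3 otherwise.
    """
    index = int(video)
    if stage == 'test':
        index += 4 if scene == 'lab' else 3
    side = 'left' if handview == 'Lhand' else 'right'
    return '/'.join([LABEL_ROOT, scene, 'obj_%s%d.npy' % (side, index)])


if __name__ == '__main__':
    for stage in ('train', 'test'):
        print(stage, label_file('lab', '1', 'Lhand', stage))
```

Keeping the mapping in a small pure function makes it easy to check the index arithmetic without loading any frames.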
diff --git a/load_Dataset.py b/load_Dataset.py new file mode 100644 index 0000000..7bb9371 --- /dev/null +++ b/load_Dataset.py @@ -0,0 +1,105 @@ +import numpy as np + +import pickle +import os +import collections +import random +import matplotlib.pyplot as plt +import pdb +import glob + +import torch +from torch.utils.data.dataset import Dataset +from torchvision import datasets, transforms, utils +from torch.utils.data import TensorDataset, DataLoader +from PIL import Image + +class HandCamDataset(Dataset): + + def __init__(self, stage): + + self.image_path = 'data/frames/'+stage+'/' + self.label_path = 'data/labels/' + + self.scene_list = ['house','lab','office'] + self.video_list = ['1','2','3','4'] + self.handview_list = ['Lhand','Rhand'] + + self.image_list = [] + + for scene in self.scene_list: + for video in self.video_list: + if video == '4' and scene != 'lab': + break + for handview in self.handview_list: + print('scene', scene, 'video', video, 'view', handview) + frame_list = glob.glob(self.image_path +scene+'/'+video+'/'+handview+'/*') + frame_list.sort(key=lambda x:int(x.split('Image')[1].split('.')[0])) + self.image_list.extend(frame_list) + self.labels = np.array([]) + + + + + for scene in self.scene_list: + for video in self.video_list: + for handview in self.handview_list: + if video == '4' and scene != 'lab': + break + label_index = video + if stage == 'test': + if scene == 'lab': + label_index = str(int(label_index) + 4) + else: + label_index = str(int(label_index) + 3) + handview = 'left' if handview == 'Lhand' else 'right' + + print(self.label_path + scene + '/obj_' + handview + label_index + '.npy') + self.labels = np.append(self.labels, np.load(self.label_path + scene + '/obj_' + handview + label_index + '.npy')) + + normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], + std=[0.229, 0.224, 0.225]) + if stage == 'train': + + self.transform = transforms.Compose([ + transforms.Scale(224), + #transforms.RandomCrop(224), + 
transforms.RandomHorizontalFlip(), + transforms.ToTensor(), + normalize, + ]) + + elif stage == 'test': + self.transform = transforms.Compose([ + transforms.Scale(224), + #transforms.CenterCrop(224), + transforms.ToTensor(), + normalize, + ]) + + + def __len__(self): + return len(self.image_list) + + def __getitem__(self, index): + image = self.transform(Image.open(self.image_list[index]).convert('RGB')) + label = torch.LongTensor([int(self.labels[index])]) + + return image, label + +def image_data_loader(args): + kwargs = {'num_workers': args.workers, 'pin_memory': True} + train_dataset = HandCamDataset('train') + train_loader = DataLoader(train_dataset, batch_size=args.batch_size, + shuffle=True, **kwargs) + + val_dataset = HandCamDataset('test') + val_loader = DataLoader(val_dataset, batch_size=args.batch_size, + shuffle=False, **kwargs) + + return train_loader, val_loader + +if __name__ == '__main__': + train_data = HandCamDataset('train') + test_data = HandCamDataset('test') + diff --git a/main.py b/main.py new file mode 100644 index 0000000..e8a33a9 --- /dev/null +++ b/main.py @@ -0,0 +1,330 @@ +import argparse +import os +import shutil +import time +import pdb + +import torch +import torch.nn as nn +import torch.nn.parallel +import torch.backends.cudnn as cudnn +import torch.distributed as dist +import torch.optim +import torch.utils.data +import torch.utils.data.distributed +import torchvision.transforms as transforms +import torchvision.datasets as datasets +import torchvision.models as models + + +# load_Dataset.py defines image_data_loader; alias it to the name used in main() +from load_Dataset import image_data_loader as HandCam_Dataloader +model_names = sorted(name for name in models.__dict__ + if name.islower() and not name.startswith("__") + and callable(models.__dict__[name])) + +parser = argparse.ArgumentParser(description='PyTorch ImageNet Training') +parser.add_argument('--model_name', default='', type=str, metavar='PATH', + help='model_name (default: none)') +parser.add_argument('--arch', '-a', metavar='ARCH', default='resnet18', + 
choices=model_names, + help='model architecture: ' + + ' | '.join(model_names) + + ' (default: resnet18)') +parser.add_argument('-j', '--workers', default=4, type=int, metavar='N', + help='number of data loading workers (default: 4)') +parser.add_argument('--epochs', default=90, type=int, metavar='N', + help='number of total epochs to run') +parser.add_argument('--start-epoch', default=0, type=int, metavar='N', + help='manual epoch number (useful on restarts)') +parser.add_argument('-b', '--batch-size', default=256, type=int, + metavar='N', help='mini-batch size (default: 256)') +parser.add_argument('--lr', '--learning-rate', default=0.01, type=float, + metavar='LR', help='initial learning rate') +parser.add_argument('--momentum', default=0.9, type=float, metavar='M', + help='momentum') +parser.add_argument('--weight-decay', '--wd', default=1e-4, type=float, + metavar='W', help='weight decay (default: 1e-4)') +parser.add_argument('--print-freq', '-p', default=5, type=int, + metavar='N', help='print frequency (default: 5)') +parser.add_argument('--resume', default='', type=str, metavar='PATH', + help='path to latest checkpoint (default: none)') +parser.add_argument('-e', '--evaluate', dest='evaluate', action='store_true', + help='evaluate model on validation set') +parser.add_argument('--pretrained', dest='pretrained', action='store_true', + help='use pre-trained model') +parser.add_argument('--world-size', default=1, type=int, + help='number of distributed processes') +parser.add_argument('--dist-url', default='tcp://224.66.41.62:23456', type=str, + help='url used to set up distributed training') +parser.add_argument('--dist-backend', default='gloo', type=str, + help='distributed backend') + +best_prec1 = 0 + + +def main(): + global args, best_prec1 + args = parser.parse_args() + + args.distributed = args.world_size > 1 + + if args.distributed: + dist.init_process_group(backend=args.dist_backend, init_method=args.dist_url, + world_size=args.world_size) + + # 
create model + if args.pretrained: + print("=> using pre-trained model '{}'".format(args.arch)) + model = models.__dict__[args.arch](pretrained=True) + if args.arch == 'resnet18': + model.fc = nn.Linear(512, 24) + elif args.arch == 'resnet50': + model.fc = nn.Linear(2048, 24) + else: + print("=> creating model '{}'".format(args.arch)) + model = models.__dict__[args.arch]() + if args.arch == 'resnet18': + model.fc = nn.Linear(512, 24) + elif args.arch == 'resnet50': + model.fc = nn.Linear(2048, 24) + + + if not args.distributed: + if args.arch.startswith('alexnet') or args.arch.startswith('vgg'): + model.features = torch.nn.DataParallel(model.features) + model.cuda() + else: + model = torch.nn.DataParallel(model).cuda() + else: + model.cuda() + model = torch.nn.parallel.DistributedDataParallel(model) + + # define loss function (criterion) and optimizer + criterion = nn.CrossEntropyLoss().cuda() + + optimizer = torch.optim.SGD(model.parameters(), args.lr, + momentum=args.momentum, + weight_decay=args.weight_decay) + + # optionally resume from a checkpoint + if args.resume: + if os.path.isfile(args.resume): + print("=> loading checkpoint '{}'".format(args.resume)) + checkpoint = torch.load(args.resume) + args.start_epoch = checkpoint['epoch'] + best_prec1 = checkpoint['best_prec1'] + model.load_state_dict(checkpoint['state_dict']) + optimizer.load_state_dict(checkpoint['optimizer']) + print("=> loaded checkpoint '{}' (epoch {})" + .format(args.resume, checkpoint['epoch'])) + else: + print("=> no checkpoint found at '{}'".format(args.resume)) + + cudnn.benchmark = True + + # Data loading code + train_loader, val_loader = HandCam_Dataloader(args) + ''' + traindir = os.path.join(args.data, 'train') + valdir = os.path.join(args.data, 'val') + normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], + std=[0.229, 0.224, 0.225]) + + train_dataset = datasets.ImageFolder( + traindir, + transforms.Compose([ + transforms.RandomSizedCrop(224), + 
transforms.RandomHorizontalFlip(), + transforms.ToTensor(), + normalize, + ])) + + if args.distributed: + train_sampler = torch.utils.data.distributed.DistributedSampler(train_dataset) + else: + train_sampler = None + + train_loader = torch.utils.data.DataLoader( + train_dataset, batch_size=args.batch_size, shuffle=(train_sampler is None), + num_workers=args.workers, pin_memory=True, sampler=train_sampler) + + val_loader = torch.utils.data.DataLoader( + datasets.ImageFolder(valdir, transforms.Compose([ + transforms.Scale(256), + transforms.CenterCrop(224), + transforms.ToTensor(), + normalize, + ])), + batch_size=args.batch_size, shuffle=False, + num_workers=args.workers, pin_memory=True) + ''' + if args.evaluate: + validate(val_loader, model, criterion) + return + + for epoch in range(args.start_epoch, args.epochs): + if args.distributed: + train_sampler.set_epoch(epoch) + adjust_learning_rate(optimizer, epoch) + + # train for one epoch + train(train_loader, model, criterion, optimizer, epoch) + + # evaluate on validation set + prec1 = validate(val_loader, model, criterion) + + # remember best prec@1 and save checkpoint + is_best = prec1 > best_prec1 + best_prec1 = max(prec1, best_prec1) + save_checkpoint({ + 'epoch': epoch + 1, + 'arch': args.arch, + 'state_dict': model.state_dict(), + 'best_prec1': best_prec1, + 'optimizer' : optimizer.state_dict(), + }, is_best, 'checkpoint/'+args.model_name+'/'+args.model_name+'_'+str(args.lr)+'_'+str(epoch)+'.pth.tar') + + +def train(train_loader, model, criterion, optimizer, epoch): + batch_time = AverageMeter() + data_time = AverageMeter() + losses = AverageMeter() + top1 = AverageMeter() + top5 = AverageMeter() + + # switch to train mode + model.train() + + end = time.time() + for i, (input, target) in enumerate(train_loader): + # measure data loading time + data_time.update(time.time() - end) + + target = target.squeeze().cuda(async=True) + input_var = torch.autograd.Variable(input) + target_var = 
torch.autograd.Variable(target) + + # compute output + output = model(input_var) + + loss = criterion(output, target_var) + + # measure accuracy and record loss + prec1, prec5 = accuracy(output.data, target, topk=(1, 5)) + losses.update(loss.data[0], input.size(0)) + top1.update(prec1[0], input.size(0)) + top5.update(prec5[0], input.size(0)) + + # compute gradient and do SGD step + optimizer.zero_grad() + loss.backward() + optimizer.step() + + # measure elapsed time + batch_time.update(time.time() - end) + end = time.time() + + if i % args.print_freq == 0: + print('Epoch: [{0}][{1}/{2}]\t' + 'Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t' + 'Data {data_time.val:.3f} ({data_time.avg:.3f})\t' + 'Loss {loss.val:.4f} ({loss.avg:.4f})\t' + 'Prec@1 {top1.val:.3f} ({top1.avg:.3f})\t' + 'Prec@5 {top5.val:.3f} ({top5.avg:.3f})'.format( + epoch, i, len(train_loader), batch_time=batch_time, + data_time=data_time, loss=losses, top1=top1, top5=top5)) + + +def validate(val_loader, model, criterion): + batch_time = AverageMeter() + losses = AverageMeter() + top1 = AverageMeter() + top5 = AverageMeter() + + # switch to evaluate mode + model.eval() + + end = time.time() + for i, (input, target) in enumerate(val_loader): + target = target.squeeze().cuda(async=True) + input_var = torch.autograd.Variable(input, volatile=True) + target_var = torch.autograd.Variable(target, volatile=True) + + # compute output + output = model(input_var) + loss = criterion(output, target_var) + + # measure accuracy and record loss + prec1, prec5 = accuracy(output.data, target, topk=(1, 5)) + losses.update(loss.data[0], input.size(0)) + top1.update(prec1[0], input.size(0)) + top5.update(prec5[0], input.size(0)) + + # measure elapsed time + batch_time.update(time.time() - end) + end = time.time() + + if i % args.print_freq == 0: + print('Test: [{0}/{1}]\t' + 'Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t' + 'Loss {loss.val:.4f} ({loss.avg:.4f})\t' + 'Prec@1 {top1.val:.3f} ({top1.avg:.3f})\t' + 
'Prec@5 {top5.val:.3f} ({top5.avg:.3f})'.format( + i, len(val_loader), batch_time=batch_time, loss=losses, + top1=top1, top5=top5)) + + print(' * Prec@1 {top1.avg:.3f} Prec@5 {top5.avg:.3f}' + .format(top1=top1, top5=top5)) + + return top1.avg + + +def save_checkpoint(state, is_best, filename): + torch.save(state, filename) + if is_best: + shutil.copyfile(filename, 'checkpoint/'+args.model_name+'/'+args.model_name+'_best.pth.tar') + + +class AverageMeter(object): + """Computes and stores the average and current value""" + def __init__(self): + self.reset() + + def reset(self): + self.val = 0 + self.avg = 0 + self.sum = 0 + self.count = 0 + + def update(self, val, n=1): + self.val = val + self.sum += val * n + self.count += n + self.avg = self.sum / self.count + + +def adjust_learning_rate(optimizer, epoch): + """Sets the learning rate to the initial LR decayed by 10 every 30 epochs""" + lr = args.lr * (0.1 ** (epoch // 30)) + for param_group in optimizer.param_groups: + param_group['lr'] = lr + + +def accuracy(output, target, topk=(1,)): + """Computes the precision@k for the specified values of k""" + maxk = max(topk) + batch_size = target.size(0) + + _, pred = output.topk(maxk, 1, True, True) + pred = pred.t() + correct = pred.eq(target.view(1, -1).expand_as(pred)) + + res = [] + for k in topk: + correct_k = correct[:k].view(-1).float().sum(0, keepdim=True) + res.append(correct_k.mul_(100.0 / batch_size)) + return res + + +if __name__ == '__main__': + main()
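
The `accuracy` and `adjust_learning_rate` helpers in main.py can be illustrated without PyTorch. The following is a minimal pure-Python sketch, not the code used for the reported results: `topk_accuracy` and `step_decay_lr` are hypothetical names, and scores are assumed to be plain lists of per-class values rather than tensors.

```python
def topk_accuracy(scores, targets, topk=(1,)):
    """Return, for each k, the percentage of samples whose true class
    is among the k highest-scoring classes (precision@k)."""
    maxk = max(topk)
    hits = dict((k, 0) for k in topk)
    for row, target in zip(scores, targets):
        # Class indices ordered from highest to lowest score
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:maxk]
        for k in topk:
            if target in ranked[:k]:
                hits[k] += 1
    n = float(len(targets))
    return [100.0 * hits[k] / n for k in topk]


def step_decay_lr(base_lr, epoch, step=30, gamma=0.1):
    """Learning rate decayed by `gamma` once every `step` epochs."""
    return base_lr * (gamma ** (epoch // step))


if __name__ == '__main__':
    scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
    targets = [1, 1, 2]
    print(topk_accuracy(scores, targets, topk=(1, 2)))
    print([step_decay_lr(0.01, epoch) for epoch in range(3)])
```

Note that with a 30-epoch decay step, a three-epoch run such as the one in the README never reaches the first decay, so the learning rate stays at whatever `--lr` was passed (0.01 by default).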