ProxEmo: Gait-based Emotion Learning and Multi-view Proxemic Fusion for Socially-Aware Robot Navigation
ProxEmo is a novel end-to-end emotion prediction algorithm for socially aware robot navigation among pedestrians. The approach predicts the perceived emotions of a pedestrian from walking gaits, which are then used for emotion-guided navigation that takes social and proxemic constraints into account. A multi-view skeleton graph convolution-based model classifies emotions using a commodity camera mounted on a moving robot. Our emotion recognition is integrated into a mapless navigation scheme and makes no assumptions about the environment of pedestrian motion.
We first capture an RGB video from an onboard camera, then extract pedestrian poses and track them at each frame. The tracked poses over a predefined time period are embedded into an image, which is passed into our ProxEmo model to classify emotions into four classes. The predicted emotions then undergo proxemic fusion with the LIDAR data and are finally passed into the navigation stack.
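The step of embedding tracked poses into an image can be illustrated with a minimal sketch. It assumes each pose is a list of (x, y, z) joint coordinates, that rows index frames and columns index joints, and that coordinates are min-max scaled into 0..255 color channels. The function name `embed_gait_as_image` and this layout are illustrative assumptions, not the repository's actual embedding code.

```python
from typing import List, Tuple

Joint = Tuple[float, float, float]  # (x, y, z) of one tracked joint
Pose = List[Joint]                  # all joints of one pedestrian in one frame


def embed_gait_as_image(poses: List[Pose]) -> List[List[Tuple[int, ...]]]:
    """Map T poses of J joints to a T x J image (assumed layout).

    Each pixel holds one joint; its three channels are the joint's
    x, y, z coordinates, min-max scaled over the whole sequence to 0..255.
    """
    flat = [c for pose in poses for joint in pose for c in joint]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant sequence
    return [
        [tuple(int(round(255 * (c - lo) / span)) for c in joint) for joint in pose]
        for pose in poses
    ]


# Two frames, two joints each (made-up coordinates for illustration only).
sequence = [
    [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
    [(0.2, 0.4, 0.6), (1.0, 0.0, 0.8)],
]
image = embed_gait_as_image(sequence)
```

The resulting image would still need to be resized to the network's input resolution before classification.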
The network is trained on image embeddings of the 5D gait set G, which are scaled up to 244×244. The architecture consists of four group convolution (GC) layers. Each GC layer consists of four groups that have been stacked together, representing the four group convolution outcomes for each of the four emotion labels. The group convolutions are stacked in two stages, Stage 1 and Stage 2. After passing through a softmax layer, the output of the network has a dimension of 4×4. The final predicted emotion is given by the maximum of this 4×4 output.
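Taking the maximum of the 4×4 output can be sketched as follows. This is a minimal illustration, assuming that rows index the four emotion classes, that columns index the four groups, and that the softmax is taken over all 16 entries; the function names and the row/column convention are assumptions, not taken from the repository.

```python
import math
from typing import List, Tuple


def softmax(values: List[float]) -> List[float]:
    """Numerically stable softmax over a flat list of scores."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def decode_prediction(scores: List[List[float]]) -> Tuple[int, int, float]:
    """Return (row, column, probability) of the largest entry.

    Assumed convention: row = emotion class index, column = group index.
    """
    n_cols = len(scores[0])
    probs = softmax([v for row in scores for v in row])
    best = max(range(len(probs)), key=probs.__getitem__)
    return best // n_cols, best % n_cols, probs[best]


# Made-up raw scores with a single dominant entry at row 2, column 1.
raw = [[0.0] * 4 for _ in range(4)]
raw[2][1] = 5.0
emotion_idx, group_idx, confidence = decode_prediction(raw)
```

Under the assumed convention, `emotion_idx` would then be mapped to a class label using whatever label order the training data uses.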
The code is implemented in Python and has the following dependencies:
- Python3
- PyTorch >= 1.4
- torchlight
- OpenCV 3+
To run the demo with an Intel RealSense D435 camera, the following libraries are required:
- OpenCV 3+
- pyrealsense2
- Cubemos SDK (works with Ubuntu 18.04)
A sample dataset can be downloaded from EWalk: Emotion Walk. Sample H5 files can be found in the GitHub repository under [proxemo folder]/emotion_classification/sample_data. For the full dataset, please contact the authors.
The VS-GCNN model trained on the above dataset can be loaded from the trained_models folder: [proxemo folder]/emotion_classification/trained_models
Run [proxemo folder]/emotion_classification/utils/dataGenerator.py to augment the original dataset to different view angles. Check the source and destination folder paths in the main loop before running the Python script. The default settings generate augmented data for 4 view angles.
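The idea of augmenting a gait to several view angles can be sketched as a rotation of the skeleton about the vertical axis. This is only an illustration under stated assumptions: joints are (x, y, z) tuples with y pointing up, and the four angles 0, 90, 180 and 270 degrees are chosen for the example. It does not reproduce dataGenerator.py, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Joint = Tuple[float, float, float]  # (x, y, z), y assumed to be the vertical axis


def rotate_about_vertical(joints: List[Joint], angle_deg: float) -> List[Joint]:
    """Rotate every joint about the y axis by angle_deg degrees."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in joints]


def augment_views(
    joints: List[Joint], angles_deg: Sequence[float] = (0, 90, 180, 270)
) -> Dict[float, List[Joint]]:
    """Return one rotated copy of the skeleton per view angle (assumed angles)."""
    return {angle: rotate_about_vertical(joints, angle) for angle in angles_deg}


# One-joint "skeleton" with made-up coordinates, rotated to four views.
views = augment_views([(1.0, 2.0, 0.0)])
```

A rotation about the vertical axis keeps joint heights unchanged, so only the apparent viewing direction of the walker changes between the augmented copies.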
Below are the basic changes to make in the config file. Open the config file from [proxemo folder]/emotion_classification/modeling/config and make the following changes.
- Set the mode
GENERAL : MODE : ['train' | 'test' ]
- Specify the pretrained model path if running in inference or test mode, or when warm-starting training
MODEL : PRETRAIN_PATH : <path to model dir>
MODEL : PRETRAIN_NAME : <model file name>
- Specify features and labels H5 files.
DATA : FEATURES_FILE : <path to augmented features file>
DATA : LABELS_FILE : <path to augmented labels file>
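The settings listed above can be sanity-checked before launching a run. The sketch below assumes the YAML file has already been loaded into a nested dictionary with the section and key names shown above; the helper `check_settings` and the example paths are hypothetical and are not part of the repository.

```python
from typing import Any, Dict, List


def check_settings(cfg: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems: List[str] = []

    mode = cfg.get("GENERAL", {}).get("MODE")
    if mode not in ("train", "test"):
        problems.append("GENERAL:MODE must be 'train' or 'test'")

    # A pretrained model is needed for testing; for training it is optional
    # (only used when warm-starting), so it is not enforced there.
    model = cfg.get("MODEL", {})
    if mode == "test":
        for key in ("PRETRAIN_PATH", "PRETRAIN_NAME"):
            if not model.get(key):
                problems.append(f"MODEL:{key} is required in test mode")

    data = cfg.get("DATA", {})
    for key in ("FEATURES_FILE", "LABELS_FILE"):
        if not data.get(key):
            problems.append(f"DATA:{key} is not set")

    return problems


# Placeholder paths for illustration only.
example_cfg = {
    "GENERAL": {"MODE": "test"},
    "MODEL": {"PRETRAIN_PATH": "path/to/model_dir", "PRETRAIN_NAME": "model_file"},
    "DATA": {"FEATURES_FILE": "path/to/features.h5", "LABELS_FILE": ""},
}
issues = check_settings(example_cfg)
```

Here `issues` would report the empty labels path, which is the kind of mistake that otherwise only surfaces once data loading starts.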
Clone the repo.
git clone https://github.com/vijay4313/proxemo.git
cd <proxemo directory>
Find the latest release tag from the released versions and check out that release.
git checkout tags/<latest_tag_name>
Example:
git fetch --all --tags
git checkout tags/v1.0
All settings are configured as YAML files in [proxemo folder]/emotion_classification/modeling/config. We provide two settings files: one for inference and one for training the model.
To run the code with a specific settings file, run the command below:
python3 main.py --settings infer
To run the demo, connect an Intel RealSense D435 camera, install the prerequisites listed above, and execute the command below:
python3 demo.py --model ./emotion_classification/modeling/config/infer.yaml
We use the emotions detected by ProxEmo along with the LIDAR data to perform Proxemic Fusion. This gives us a comfort distance around a pedestrian for emotionally-guided navigation. The green arrows represent the path after accounting for the comfort distance, while the violet arrows indicate the path without considering this distance. Observe the significant change in the path taken in the sad case. Note that the overhead image is representational, and ProxEmo works entirely from an egocentric camera on a robot.
Comparison of ProxEmo with other state-of-the-art emotion classification algorithms.
Here we present the performance metrics of our ProxEmo network compared to state-of-the-art arbitrary-view action recognition models. We perform a comprehensive comparison of models across multiple distances of skeletal gaits from the camera and across multiple view-groups. Our ProxEmo network outperforms the other state-of-the-art networks by 50% on average in terms of prediction accuracy.
Confusion Matrix
@INPROCEEDINGS{narayanan2020proxemo,
author={Narayanan, Venkatraman and Manoghar, Bala Murali and Dorbala, Vishnu Sashank and Manocha, Dinesh and Bera, Aniket},
title={ProxEmo: Gait-based Emotion Learning and Multi-view Proxemic Fusion for Socially-Aware Robot Navigation},
booktitle={2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
year={2020},
volume={},
number={}}
Venkatraman Narayanan
Bala Murali Manoghar Sai Sudhakar
Vishnu Sashank Dorbala
Aniket Bera