
Source code for Deep Multimodal Sequence Fusion by Regularized Expressive Representation Distillation

Requires Python 3.7.

RERD

PyTorch implementation of RERD

Paper

Deep Multimodal Sequence Fusion by Regularized Expressive Representation Distillation
Xiaobao Guo, Adams Wai-Kin Kong*, and Alex Kot
IEEE Transactions on Multimedia, 2022.

Please cite our paper if you find our work useful for your research:

@article{guo2022deep,
  title={Deep Multimodal Sequence Fusion by Regularized Expressive Representation Distillation},
  author={Guo, Xiaobao and Kong, Wai-Kin Adams and Kot, Alex C},
  journal={IEEE Transactions on Multimedia},
  year={2022},
  publisher={IEEE}
}

Overall Architecture for RERD

RERD comprises two major components built on an intermediate-fusion pipeline: (1) a multi-head distillation encoder that enhances unimodal representations from unaligned multimodal sequences, where distillation attention layers dynamically capture and extract the most expressive unimodal features; and (2) a novel multimodal Sinkhorn distance regularizer that aids joint optimization during training.
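The Sinkhorn distance regularizer builds on the entropic-regularized optimal-transport (Sinkhorn) distance between two sets of feature vectors. The NumPy sketch below is illustrative only, not the paper's implementation: it assumes uniform marginals, a fixed iteration count, and hypothetical names (`sinkhorn_distance`, `eps`, `n_iters`).

```python
import numpy as np

def sinkhorn_distance(x, y, eps=0.1, n_iters=50):
    """Entropic-regularized OT distance between rows of x and y
    via Sinkhorn-Knopp scaling iterations (illustrative sketch)."""
    # Pairwise squared-Euclidean cost between feature vectors.
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    K = np.exp(-cost / eps)                    # Gibbs kernel
    a = np.full(x.shape[0], 1.0 / x.shape[0])  # uniform source marginal
    b = np.full(y.shape[0], 1.0 / y.shape[0])  # uniform target marginal
    u = np.ones_like(a)
    for _ in range(n_iters):                   # alternating row/col scaling
        u = a / (K @ (b / (K.T @ u)))
    v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]            # approximate transport plan
    return (P * cost).sum()                    # <plan, cost> = OT distance
```

In training, such a distance could be added to the task loss with a weight like the repository's `--reg_lambda` flag; identical feature sets yield a near-zero distance, while mismatched distributions are penalized.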

Prerequisites

Data preparation:

The processed MOSI and MOSEI datasets can be downloaded from here.

The SIMS dataset can be downloaded from here.

The pretrained BERT model can be found here.

Run the Code

  1. Create folders for data and models:
mkdir data all_models
mkdir data/pretrained_bert

and put or link the data under 'data/'.

  2. Training:
python main.py [--params]
e.g.,
CUDA_VISIBLE_DEVICES=4,5 python main.py \
--model=RERD --lonly --aonly --vonly \
--name='RERD-01' \
--dataset='mosei' --data_path='./data/MOSEI' \
--batch_size=16 --use_bert=True \
--bert_path='./data/pretrained_bert/' \
--dis_d_mode=64 --dis_n_heads=4 --dis_e_layers=2 \
--optim='Adam' --reg_lambda=0.1 \
--schedule='c' --lr=0.001 --nlevels=2

Acknowledgement

Some portions of the code were adapted from the fairseq, MMSA, and Informer repositories.
