Audio Explanation Synthesis with Generative Foundation Models
audio_explanation_generation --> Inference script for generating audio explanations using AudioXgen.
fidelity_performance --> Script to measure the fidelity score of explanations on the Speech Commands and TESS datasets.
models --> Folder containing classification models that make predictions on datasets encoded with EnCodec.
sample_explanations --> Sample audio explanations generated by AudioXgen.
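As a rough illustration of what a fidelity measurement can look like, the sketch below computes label-agreement fidelity: the fraction of samples for which a classifier predicts the same class on the explanation as on the original audio. This is one common definition and is an assumption here; the metric actually used by the fidelity_performance script is defined in the paper and the repository code. The function name `label_agreement_fidelity` and its arguments are hypothetical.

```python
from typing import Sequence


def label_agreement_fidelity(
    original_preds: Sequence[int],
    explanation_preds: Sequence[int],
) -> float:
    """Fraction of samples where both predictions agree (hypothetical helper).

    original_preds:    class ids predicted on the original audio
    explanation_preds: class ids predicted on the generated explanations
    """
    if len(original_preds) != len(explanation_preds):
        raise ValueError("both prediction lists must have the same length")
    if len(original_preds) == 0:
        raise ValueError("at least one prediction is required")

    # Count the positions where the two predicted labels are identical.
    matches = sum(o == e for o, e in zip(original_preds, explanation_preds))
    return matches / len(original_preds)
```

For example, `label_agreement_fidelity([0, 1, 2, 2], [0, 1, 1, 2])` agrees on three of four samples and returns 0.75.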
EnCodec Installation
EnCodec is now included in Transformers. For more information, please refer to the Transformers EnCodec docs.
You can find both the 24 kHz and 48 kHz checkpoints on the 🤗 Hub.
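A minimal sketch of encoding audio with the Transformers EnCodec model follows, assuming `transformers`, `torch`, and `numpy` are installed (for example via `pip install transformers torch numpy`). It uses the `facebook/encodec_24khz` checkpoint on a synthetic test tone; the helper names `checkpoint_for` and `encode_sine_example` are hypothetical and are not part of this repository.

```python
# Hub checkpoint ids for the two published EnCodec models.
CHECKPOINTS = {24_000: "facebook/encodec_24khz", 48_000: "facebook/encodec_48khz"}


def checkpoint_for(sampling_rate: int) -> str:
    """Return the Hub checkpoint id for a supported sampling rate (hypothetical helper)."""
    try:
        return CHECKPOINTS[sampling_rate]
    except KeyError:
        raise ValueError(f"unsupported sampling rate: {sampling_rate}") from None


def encode_sine_example():
    """Encode one second of a 440 Hz tone into discrete EnCodec codes."""
    # Heavy dependencies are imported here so the helper above works without them.
    import numpy as np
    import torch
    from transformers import AutoProcessor, EncodecModel

    name = checkpoint_for(24_000)  # the 24 kHz model expects mono audio
    model = EncodecModel.from_pretrained(name)
    processor = AutoProcessor.from_pretrained(name)

    # Synthetic mono test signal at the sampling rate the model expects.
    sr = processor.sampling_rate
    t = np.arange(sr) / sr
    audio = 0.5 * np.sin(2 * np.pi * 440.0 * t).astype(np.float32)

    inputs = processor(raw_audio=audio, sampling_rate=sr, return_tensors="pt")
    with torch.no_grad():
        encoded = model.encode(inputs["input_values"], inputs["padding_mask"])
    return encoded.audio_codes


if __name__ == "__main__":
    codes = encode_sine_example()
    print(codes.shape)
```

Running the script downloads the checkpoint from the Hub on first use. The 48 kHz checkpoint is a stereo model, so it would need two-channel input rather than the mono tone used here.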
If you use this work, please cite our paper:
@misc{akman2024audioexplanationsynthesisgenerative,
  title={Audio Explanation Synthesis with Generative Foundation Models},
  author={Alican Akman and Qiyang Sun and Björn W. Schuller},
  year={2024},
  eprint={2410.07530},
  archivePrefix={arXiv},
  primaryClass={cs.SD},
  url={https://arxiv.org/abs/2410.07530},
}