Hi, and thanks a lot for this really cool repo!
I am trying to train PESTO on my own data and had to change fmin and fmax: my audio files are relatively short, and with the default range the CQT kernels would not fit.
As a result, my CQT frames have a different number of bins than expected (157 before cropping and 125 after, instead of 216).
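To illustrate why the frame size follows from the frequency range, here is a minimal sketch. It assumes the common convention that the number of CQT bins is ceil(bins_per_octave * log2(fmax / fmin)); I have not checked that the repo computes it exactly this way. The helper name `cqt_n_bins` and all numeric values are hypothetical, not my actual settings or the repo defaults.

```python
import math


def cqt_n_bins(fmin: float, fmax: float, bins_per_octave: int) -> int:
    """Hypothetical helper: number of CQT bins needed to cover [fmin, fmax)."""
    if not 0 < fmin < fmax:
        raise ValueError("expected 0 < fmin < fmax")
    return math.ceil(bins_per_octave * math.log2(fmax / fmin))


# Illustrative values only (assumptions, not taken from the repo config).
wide = cqt_n_bins(fmin=55.0, fmax=7040.0, bins_per_octave=36)     # 7 octaves
narrow = cqt_n_bins(fmin=110.0, fmax=3520.0, bins_per_octave=36)  # 5 octaves

print(wide, narrow)
```

Narrowing the range by two octaves removes 72 bins in this toy setup, so any layer sized for the wider range no longer matches.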
This size mismatch causes a crash at the LayerNorm of the ResNet encoder, because its input dimension seems to be fixed.
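Here is a minimal sketch of what I believe is happening, using a standalone `torch.nn.LayerNorm` as a stand-in for the encoder's first layer. The `[1, n_bins]` normalized shape and the batch size are assumptions on my side, not copied from the repo.

```python
import torch
import torch.nn as nn

expected_bins, actual_bins = 216, 125  # sizes reported above

# Stand-in for the encoder's LayerNorm, built for the expected number of bins.
# The [1, n_bins] shape is an assumption about how the layer is configured.
layernorm = nn.LayerNorm(normalized_shape=[1, expected_bins])

frames = torch.randn(8, 1, actual_bins)  # (batch, channels, bins)

try:
    layernorm(frames)
    crashed = False
except RuntimeError as err:
    crashed = True
    print(f"LayerNorm rejected the input: {err}")

# Building the layer from the actual number of bins avoids the mismatch.
fixed = nn.LayerNorm(normalized_shape=[1, actual_bins])
out = fixed(frames)
print(out.shape)
```

If the number of bins were derived from the CQT settings when the model is built, the layer would always be created with the matching size.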
I could bypass this by manually setting model.encoder.n_bins_in=125, but I guess it would be nice to have it adapt automatically.
Really sorry if I misunderstood something and this is not a bug.