Question about Audio Preprocessing #46

Open
xjf-303 opened this issue Oct 24, 2024 · 1 comment

Comments

@xjf-303

xjf-303 commented Oct 24, 2024

Hello @jishengpeng, thank you for the amazing work. May I ask several questions?

I was going through the paper and noticed that during data preprocessing, audio is first cropped to a fixed length of 10 seconds and then randomly cropped again to obtain 3-second segments. I have a few questions about this process:

1. Could you explain the rationale behind first cropping the audio to 10 seconds and then performing another random crop to 3-second segments? How does this affect the model's performance or training?

2. Are there any overlaps between the cropped segments, or are they entirely distinct?

3. If possible, could you please share the code for this part of the data preprocessing pipeline?

Thank you for your time and consideration! Looking forward to your response.

@jishengpeng
Owner

Thank you for your attention.

  1. The audio was segmented into 10-second clips at random positions. The 10-second length was chosen empirically as a value that seemed reasonable.
  2. Some recordings are as long as 5 minutes, so we split such a recording into 30 distinct 10-second clips (30 × 10 s = 300 s).
  3. The corresponding code is short, about 30 lines. It computes the number of samples per clip from the 10-second length and the sampling rate, then slices the audio into segments of that length.
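
For readers who want something concrete, here is a minimal sketch of the two steps discussed in this thread. It is not the repository's actual preprocessing code. The function names `split_into_clips` and `random_crop`, the 16 kHz sampling rate, and the choice to drop a trailing remainder shorter than 10 seconds are all assumptions made for illustration. The functions accept any 1-D sliceable sequence, such as a list or a NumPy array.

```python
import random


def split_into_clips(waveform, sample_rate, clip_seconds=10.0):
    """Split a 1-D waveform into consecutive, non-overlapping clips.

    Assumption: a trailing remainder shorter than one clip is dropped.
    """
    # Clip length in samples = duration in seconds * samples per second.
    clip_len = int(round(clip_seconds * sample_rate))
    n_clips = len(waveform) // clip_len
    return [waveform[i * clip_len:(i + 1) * clip_len] for i in range(n_clips)]


def random_crop(clip, sample_rate, crop_seconds=3.0, rng=None):
    """Return one randomly positioned crop of `crop_seconds` from `clip`."""
    rng = rng if rng is not None else random.Random()
    crop_len = int(round(crop_seconds * sample_rate))
    if len(clip) < crop_len:
        raise ValueError("clip is shorter than the requested crop")
    # randint is inclusive on both ends, so the crop can end exactly at the clip end.
    start = rng.randint(0, len(clip) - crop_len)
    return clip[start:start + crop_len]


if __name__ == "__main__":
    sample_rate = 16000  # assumed sampling rate, not taken from the thread
    audio = [0.0] * (5 * 60 * sample_rate)  # stand-in for a 5-minute recording
    clips = split_into_clips(audio, sample_rate)
    segment = random_crop(clips[0], sample_rate, rng=random.Random(0))
    print(len(clips), len(clips[0]), len(segment))  # 30 160000 48000
```

In this sketch the 10-second clips never overlap each other, while 3-second crops drawn from the same clip on different training steps may overlap, since each crop start is sampled independently.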
