@BirgerMoell

Here is a first attempt at adding data augmentations to the Whisper training script. A nice improvement would be to gate the augmentations behind a flag instead of applying them to all data.
There is a run.sh that seems to work for trying it out.
Since the data is loaded in streaming mode, it will probably take a while to load everything and try it out.
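
For reference, the flag idea could look roughly like the sketch below. It is not the code in this PR: the `--do_augment` flag name, the `maybe_augment` helper, and the plain-list audio format are all assumptions made to keep the example self-contained, and the noise augmentation is a stand-in for the real noise, pitch, and time-stretch transforms.

```python
import argparse
import random


def add_noise(samples, rng, scale=0.005):
    """Return a copy of `samples` with Gaussian noise added to each value."""
    return [s + rng.gauss(0.0, scale) for s in samples]


def maybe_augment(examples, do_augment, seed=0):
    """Yield every example; when `do_augment` is set, also yield a noisy copy.

    Works on any iterable of dicts, so it can wrap a streamed dataset lazily
    without loading everything into memory first.
    """
    rng = random.Random(seed)
    for example in examples:
        yield example
        if do_augment:
            # Copy the example and replace only the audio samples.
            yield {**example, "audio": add_noise(example["audio"], rng)}


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # Hypothetical flag name; the script in this PR does not define it yet.
    parser.add_argument("--do_augment", action="store_true",
                        help="Add augmented copies of the training examples.")
    return parser.parse_args(argv)
```

With `args = parse_args()`, the training stream would then be wrapped as `maybe_augment(train_stream, args.do_augment)`, so running without the flag leaves the data untouched.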

@BirgerMoell
Author

I got an error running the code, so this is currently NOT WORKING.

  File "/home/bmoell/community-events/whisper-fine-tuning-event/stream_with_augmentations.py", line 661, in <module>
    main()
  File "/home/bmoell/community-events/whisper-fine-tuning-event/stream_with_augmentations.py", line 398, in main
    raw_datasets["train"] = augment_dataset(raw_datasets["train"])
  File "/home/bmoell/community-events/whisper-fine-tuning-event/stream_with_augmentations.py", line 299, in augment_dataset
    dataset_name = interleave_datasets([dataset_name, augmented_noise, augmented_pitch, augmented_time_stretch])
  File "/home/bmoell/miniconda3/envs/fine-tune/lib/python3.9/site-packages/datasets/combine.py", line 128, in interleave_datasets
    return _interleave_iterable_datasets(
  File "/home/bmoell/miniconda3/envs/fine-tune/lib/python3.9/site-packages/datasets/iterable_dataset.py", line 1478, in _interleave_iterable_datasets
    _check_if_features_can_be_aligned([dset.features for dset in datasets])
  File "/home/bmoell/miniconda3/envs/fine-tune/lib/python3.9/site-packages/datasets/features/features.py", line 2000, in _check_if_features_can_be_aligned
    raise ValueError(
ValueError: The features can't be aligned because the key audio of features {'client_id': Value(dtype='string', id=None), 'path': Value(dtype='string', id=None), 'audio': {'array': Sequence(feature=Value(dtype='float32', id=None), length=-1, id=None), 'path': Value(dtype='string', id=None), 'sampling_rate': Value(dtype='int64', id=None)}, 'sentence': Value(dtype='string', id=None), 'up_votes': Value(dtype='int64', id=None), 'down_votes': Value(dtype='int64', id=None), 'age': Value(dtype='string', id=None), 'gender': Value(dtype='string', id=None), 'accent': Value(dtype='string', id=None), 'locale': Value(dtype='string', id=None), 'segment': Value(dtype='string', id=None)} has unexpected type - {'array': Sequence(feature=Value(dtype='float32', id=None), length=-1, id=None), 'path': Value(dtype='string', id=None), 'sampling_rate': Value(dtype='int64', id=None)} (expected either Audio(sampling_rate=48000, mono=True, decode=True, id=None) or Value("null").
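
The error says the `audio` column has a different declared type in one of the datasets passed to `interleave_datasets` (a plain dict of `array`, `path`, and `sampling_rate` where `Audio(sampling_rate=48000, ...)` was expected). A small diagnostic along these lines could confirm which columns disagree before interleaving. This is only a sketch: `mismatched_feature_keys` is a hypothetical helper, not part of `datasets`, and it compares feature types by their `repr`.

```python
def mismatched_feature_keys(feature_dicts):
    """Return the sorted column names whose declared type differs.

    `feature_dicts` is a list of mappings from column name to feature type,
    e.g. `[dset.features for dset in datasets]`. A `None` entry (no declared
    features) is treated as an empty mapping.
    """
    reference = feature_dicts[0] or {}
    mismatched = set()
    for features in feature_dicts[1:]:
        features = features or {}
        for key in set(reference) | set(features):
            # Compare by repr so plain dicts and feature objects both work.
            if repr(reference.get(key)) != repr(features.get(key)):
                mismatched.add(key)
    return sorted(mismatched)
```

Calling it on the `features` of the original and the three augmented datasets should report `audio` if the augmentation step is what changes the column type.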

@sanchit-gandhi
Contributor

Hey @BirgerMoell! Super cool PR! Would love to see how data aug impacts Whisper training. Could you try updating datasets to main and seeing if that fixes the issue?

pip install git+https://github.com/huggingface/datasets

@Vaibhavs10
Member

Hi @BirgerMoell - This is a really wonderful PR. Just wondering if you have had a chance to try @sanchit-gandhi's suggestion? We'd love to merge this!
