Latent Sound Transfer with Music2Latent

This repository demonstrates a workflow inspired by the paper "Latent Granular Resynthesis using Neural Audio Codecs" by Nao Tokui et al. The goal is to transfer characteristics of ambient sounds (e.g., ocean waves) onto a source sound (e.g., a guitar) using latent embeddings from Music2Latent.

Concept Overview

Compute embeddings of reference sounds: Start with a set of target audio samples, such as ocean waves, and calculate their latent embeddings using Music2Latent.
Compute embedding of the source sound: Take your source audio and compute its embedding.
Compare embeddings: Compare the source embedding with the embeddings of the reference sounds to find the closest matches. Different similarity metrics (cosine similarity, Euclidean distance, etc.) can be used.
Reconstruct with decoder: Use the Music2Latent decoder to reconstruct the source sound, now influenced by the characteristics of the closest reference sounds, effectively “translating” the timbre.
Result: The resulting audio preserves the original source content while adopting sonic qualities of the target sound.
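
The comparison step above can be sketched as a nearest-neighbour lookup over latent frames. This is a minimal sketch and not the notebook's actual code: it assumes each embedding is a NumPy array of shape (frames, dims), it uses random arrays in place of real Music2Latent latents, and the name `match_frames` is hypothetical. In a real run, the matched frames would then be passed to the codec's decoder.

```python
import numpy as np


def match_frames(source, reference, eps=1e-8):
    """For each source frame, pick the most cosine-similar reference frame.

    source:    (n_src, dims) array of source latent frames (assumed layout)
    reference: (n_ref, dims) array of reference latent frames
    Returns (indices, matched), where matched[i] == reference[indices[i]].
    """
    # Normalise rows so that a dot product equals cosine similarity.
    src_unit = source / (np.linalg.norm(source, axis=1, keepdims=True) + eps)
    ref_unit = reference / (np.linalg.norm(reference, axis=1, keepdims=True) + eps)
    similarity = src_unit @ ref_unit.T      # shape (n_src, n_ref)
    indices = similarity.argmax(axis=1)     # best reference per source frame
    return indices, reference[indices]


# Stand-in data: random arrays instead of real embeddings (sizes are arbitrary).
rng = np.random.default_rng(0)
reference_latents = rng.normal(size=(500, 64))
source_latents = rng.normal(size=(120, 64))

indices, matched_latents = match_frames(source_latents, reference_latents)
print(matched_latents.shape)  # (120, 64): one reference frame per source frame
```

Because every output frame is drawn from the reference set, the size and variety of that set bounds what the result can sound like.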

Note: The codec and the embedding comparison method can be modified to influence the output.
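
To illustrate why the comparison method matters, the toy example below (hypothetical data, not taken from the notebook) shows cosine similarity and Euclidean distance selecting different reference frames for the same query. Cosine similarity ignores magnitude, while Euclidean distance does not.

```python
import numpy as np

# Toy 2-D "embeddings"; real latents have many more dimensions.
query = np.array([1.0, 0.0])
reference = np.array([
    [10.0, 0.0],   # same direction as the query, much larger magnitude
    [0.8, 0.6],    # different direction, similar magnitude
])

# Cosine similarity: higher means more similar.
cosine = (reference @ query) / (
    np.linalg.norm(reference, axis=1) * np.linalg.norm(query)
)
# Euclidean distance: lower means more similar.
euclidean = np.linalg.norm(reference - query, axis=1)

best_by_cosine = int(cosine.argmax())
best_by_euclidean = int(euclidean.argmin())
print(best_by_cosine, best_by_euclidean)  # 0 1
```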

Diagram

Workflow diagram: scheme.png

Results

Source: source.wav

Translated: translated.wav

Combined (dry 20%): combined.wav
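
One plausible reading of "dry 20%" is a linear blend of 20% original source and 80% translated signal. The sketch below shows that blend under this assumption; the function name `mix_dry_wet` is hypothetical, and sine waves stand in for the actual audio files.

```python
import numpy as np


def mix_dry_wet(dry, wet, dry_ratio=0.2):
    """Linearly blend two mono signals; dry_ratio is the share of `dry`."""
    n = min(len(dry), len(wet))   # trim to the shorter signal
    return dry_ratio * dry[:n] + (1.0 - dry_ratio) * wet[:n]


# Stand-in signals (1 second at 44.1 kHz) instead of source.wav / translated.wav.
t = np.linspace(0.0, 1.0, 44100, endpoint=False)
dry = np.sin(2 * np.pi * 220.0 * t)
wet = np.sin(2 * np.pi * 330.0 * t)

combined = mix_dry_wet(dry, wet, dry_ratio=0.2)
```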

How to Use

Replace source.wav with your own instrument or voice recording.
Replace or add target sounds to experiment with different timbres. Ideally, use enough recordings to represent the target's characteristics well. Data augmentation can also help, but avoid transforming the target sound too much; otherwise, overly diverse embeddings may be matched, leading to unwanted results.
Try changing the similarity metric or the Music2Latent codec to creatively achieve different results.
Run the workflow to generate new audio files.
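
If you experiment with data augmentation, a conservative starting point might look like the sketch below. It is an illustration only and not part of the notebook: the function name `augment_gently` and its parameter values are assumptions, and random noise stands in for a target recording. Each variant would then be encoded like any other target sound.

```python
import numpy as np


def augment_gently(wave, rng, max_gain_db=3.0, max_shift=2048):
    """Return a lightly modified copy: small random gain and circular time shift."""
    gain_db = rng.uniform(-max_gain_db, max_gain_db)
    shift = int(rng.integers(-max_shift, max_shift + 1))
    return np.roll(wave, shift) * (10.0 ** (gain_db / 20.0))


rng = np.random.default_rng(0)
target = rng.normal(scale=0.1, size=44100)   # stand-in for a target recording
variants = [augment_gently(target, rng) for _ in range(4)]
```

Keeping the gain and shift ranges small follows the advice above: stronger transformations would spread the reference embeddings further from the original target sound.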

Notes:

File paths in the notebook are intentionally generic, so you may need to adjust them to match your setup.
Without a GPU, producing an acceptable result may take hours of computation.
Precomputed sea-wave embeddings (seawaves_embeddings.pt) are included as a toy example; they are neither extensive nor fully representative, but they are enough to reproduce the results above.
