Music-Description-Generator

Example Outputs

Ex 1 (audio only; the .mp4 file contains no video)

example1.mp4

'This is a classical music piece. It could also be playing in the background at a coffee shop.'

Ex 2 (audio only; the .mp4 file contains no video)

example2.mp4

'The low quality recording features a live performance of a folk song and it consists of groovy bass, shimmering hi hats, soft kick and harmonizing vocals, harmonizing vocals. It sounds energetic.'

Model Architecture

Audio Encoder

Uses the pre-trained facebook/encodec_32khz model from Hugging Face.

The input is 10 seconds of raw audio at a sample rate of 32,000 Hz.

The audio encoder converts the raw audio into a discrete sequence of codebook indices, such as [100, 321, 210, 124, ... , 213].

This sequence of codebook indices is the input to the text decoder.
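The encoding step can be sketched roughly as follows. This is a minimal illustration, not the code from the notebooks: the helper names `fit_to_length` and `encode_clip` are hypothetical, zero-padding short clips is an assumption, and how the resulting codes are flattened into a single sequence is not shown here (see trainer.ipynb and Infer.ipynb for the actual pipeline). The `transformers` and `numpy` imports are deferred so the padding helper works on its own.

```python
from typing import List, Sequence

SAMPLE_RATE = 32_000  # Hz, as stated above
CLIP_SECONDS = 10     # clip length, as stated above


def fit_to_length(samples: Sequence[float],
                  sample_rate: int = SAMPLE_RATE,
                  seconds: int = CLIP_SECONDS) -> List[float]:
    """Trim or zero-pad a mono waveform to exactly `seconds` of audio.

    Zero-padding is an assumption; the repository may handle short clips differently.
    """
    target = sample_rate * seconds
    clipped = list(samples[:target])
    return clipped + [0.0] * (target - len(clipped))


def encode_clip(samples: Sequence[float]):
    """Return EnCodec codebook indices for one clip (downloads the model on first use)."""
    # Imported lazily so fit_to_length can be used without these packages.
    import numpy as np
    from transformers import AutoProcessor, EncodecModel

    processor = AutoProcessor.from_pretrained("facebook/encodec_32khz")
    model = EncodecModel.from_pretrained("facebook/encodec_32khz")

    audio = np.asarray(fit_to_length(samples), dtype=np.float32)
    inputs = processor(raw_audio=audio, sampling_rate=SAMPLE_RATE, return_tensors="pt")
    encoded = model.encode(inputs["input_values"], inputs["padding_mask"])
    return encoded.audio_codes  # integer tensor of codebook indices
```

A 10-second clip at 32,000 Hz is 320,000 samples, so `fit_to_length` always returns a list of that length.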

Text Decoder

Uses a Transformer base architecture and the T5 tokenizer.

More details (nLayers, hidden dim, nHeads, etc.) are in trainer.ipynb.

The input is a sequence of codebook indices; the output is a text description of the audio.
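To make the input/output relationship concrete, here is a sketch of autoregressive generation using greedy decoding. The README does not state which decoding strategy the notebooks use, so greedy decoding is an assumption, as is starting the decoder from the pad token. `greedy_decode`, `ScoreFn`, and `toy_scores` are hypothetical names; `toy_scores` is only a stand-in for the trained decoder.

```python
from typing import Callable, List, Sequence

PAD_ID = 0  # T5 tokenizer pad token id; assumed here as the decoder start token
EOS_ID = 1  # T5 tokenizer end-of-sequence token id

# Hypothetical signature: (audio_codes, tokens_so_far) -> one score per vocabulary id
ScoreFn = Callable[[Sequence[int], Sequence[int]], Sequence[float]]


def greedy_decode(audio_codes: Sequence[int],
                  next_token_scores: ScoreFn,
                  max_len: int = 64) -> List[int]:
    """Pick the highest-scoring token at each step until EOS or max_len."""
    tokens = [PAD_ID]
    for _ in range(max_len):
        scores = next_token_scores(audio_codes, tokens)
        best = max(range(len(scores)), key=scores.__getitem__)
        if best == EOS_ID:
            break
        tokens.append(best)
    return tokens[1:]  # drop the start token


def toy_scores(audio_codes: Sequence[int], tokens: Sequence[int]) -> List[float]:
    """Stand-in for the trained decoder: emits three fixed ids, then EOS."""
    script = [5, 7, 9, EOS_ID]
    step = len(tokens) - 1
    target = script[min(step, len(script) - 1)]
    return [1.0 if i == target else 0.0 for i in range(10)]


print(greedy_decode([100, 321, 210, 124], toy_scores))  # [5, 7, 9]
```

With a real model, the returned ids would be turned back into a sentence with the T5 tokenizer's `decode` method.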

Training & Test Loss Graph
