Sharing analysis results enabling analysis to focus only on what's unique in a collection #262

audiomuze · 2025-12-31T10:45:01Z

Using tools like this requires users to run fairly lengthy audio analysis. Every user or installed instance has to perform the same analysis.

If the MD5 of the audio stream were linked to the analysis results, analysis could be skipped by checking the MD5 against a database and, when a match is found, pulling the relevant results from the table. This would avoid re-running analysis when moving files around, renaming folders, etc.
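A minimal sketch of the lookup described above, assuming a hypothetical SQLite table `analysis_cache` keyed by the audio MD5. The table name, column names, and the `analyse` callback are illustrative only and are not AudioMuse-AI's actual schema or API.

```python
import json
import sqlite3
from typing import Callable


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the hypothetical analysis cache."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS analysis_cache ("
        "audio_md5 TEXT PRIMARY KEY, result_json TEXT NOT NULL)"
    )
    return conn


def get_or_analyse(
    conn: sqlite3.Connection, audio_md5: str, analyse: Callable[[], dict]
) -> dict:
    """Return cached results for audio_md5, calling analyse() only on a miss."""
    row = conn.execute(
        "SELECT result_json FROM analysis_cache WHERE audio_md5 = ?",
        (audio_md5,),
    ).fetchone()
    if row is not None:
        return json.loads(row[0])  # cache hit: no analysis needed
    result = analyse()  # cache miss: run the expensive step once
    conn.execute(
        "INSERT INTO analysis_cache (audio_md5, result_json) VALUES (?, ?)",
        (audio_md5, json.dumps(result)),
    )
    conn.commit()
    return result
```

Because the key is the audio MD5 rather than the file path, a renamed or moved file would hit the cache on the next run.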

Tech-savvy users could share their analysis results with one another: a script could populate AudioMuse-AI's tables, and analysis would only need to be done for tracks not previously analysed. A community database could be established if there were sufficient demand. Using MD5 also means there is no need to recompute the hash for FLAC files, because it can be read from the FLAC header; it is written as part of the spec.
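As a rough illustration of reading that value without decoding the audio, here is a sketch that parses the STREAMINFO block directly. It assumes the file starts with the `fLaC` marker (so a file with a tag prepended in front of it is not handled), and the function name `read_flac_audio_md5` is made up for this example. The FLAC format allows the signature to be all zeros when the encoder did not compute it, so that case is reported as unknown.

```python
from typing import Optional

STREAMINFO_LENGTH = 34  # fixed size of the STREAMINFO block body, in bytes


def read_flac_audio_md5(path: str) -> Optional[str]:
    """Return the MD5 stored in a FLAC file's STREAMINFO block as hex.

    Returns None if the file does not start with the FLAC marker, if the
    first metadata block is not a well-formed STREAMINFO, or if the encoder
    left the signature unset (all zero bytes).
    """
    with open(path, "rb") as fh:
        if fh.read(4) != b"fLaC":
            return None  # not a bare FLAC stream
        header = fh.read(4)  # 1 byte last-flag/type + 3 bytes big-endian length
        if len(header) < 4:
            return None
        block_type = header[0] & 0x7F  # low 7 bits hold the block type
        length = int.from_bytes(header[1:4], "big")
        if block_type != 0 or length != STREAMINFO_LENGTH:
            return None  # STREAMINFO (type 0) must be the first block
        body = fh.read(STREAMINFO_LENGTH)
    if len(body) < STREAMINFO_LENGTH:
        return None
    md5 = body[-16:]  # last 16 bytes: MD5 of the unencoded audio data
    return None if md5 == bytes(16) else md5.hex()
```

Other formats have no equivalent header field, so for those the hash would have to be computed from the decoded audio.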

To leverage something like this, AudioMuse-AI should be able to export the MD5 of the audio of every track it has analysed to a table. Equally, when being deployed, it should read from this table first if present, using the MD5 as the matching key. Given that a lot of the codebase is Python-based, it should be trivial to leverage Polars to handle this at scale.

NeptuneHub · 2025-12-31T11:09:42Z

Maintainer

The real challenge of a community database is not the technology (I ran a simple experiment matching only on artist and title, and it already saved a lot of analysis time) but copyright law.

Embeddings are not reversible to the original music, so in some jurisdictions this is already OK. But other jurisdictions say that to do machine learning on a song at all you need to hold the rights. You could then argue that we don’t use this information to “create new music” but only to search it better, like very well-made metadata.

But honestly, I don’t want to deal with legislation in my free time, because doing so takes money for a lawyer who can advise up front and help if some other lawyer writes to you.

TL;DR: the idea has value, in several ways. But it has legal implications that I’m not able to deal with. We would need a contributor willing to provide free legal help who can handle international copyright law.

2 replies

audiomuze Dec 31, 2025
Author

Understood. How about doing everything except deploying a community database? That way users can rename or move their files without having them re-analysed. AudioMuse-AI already stores the analysis for its own purposes; this just removes the need to re-analyse when a user moves or renames their files. AudioMuse-AI could add a scan-and-refresh capability that reads from this table and updates accordingly. If users choose to do something else with the table, that's their responsibility.
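A sketch of what such a scan-and-refresh pass might do, assuming a hypothetical `tracks` table that maps each audio MD5 to one file path. The table, its columns, and `refresh_paths` are invented for illustration. Keying on the MD5 alone means two files with identical audio would share one row, which a real implementation would need to handle.

```python
import sqlite3
from typing import Dict, List


def refresh_paths(conn: sqlite3.Connection, scanned: Dict[str, str]) -> List[str]:
    """Re-point known tracks at their new paths; return paths needing analysis.

    scanned maps each file path found on disk to its audio MD5.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tracks ("
        "audio_md5 TEXT PRIMARY KEY, path TEXT NOT NULL)"
    )
    to_analyse: List[str] = []
    for path, audio_md5 in scanned.items():
        row = conn.execute(
            "SELECT path FROM tracks WHERE audio_md5 = ?", (audio_md5,)
        ).fetchone()
        if row is None:
            to_analyse.append(path)  # audio never seen before
        elif row[0] != path:
            # Same audio at a new location: update the path, keep the analysis.
            conn.execute(
                "UPDATE tracks SET path = ? WHERE audio_md5 = ?",
                (path, audio_md5),
            )
    conn.commit()
    return to_analyse
```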

NeptuneHub Dec 31, 2025
Maintainer

This could be a good point. I need to think about it.
