The real challenge of a community database is not the technology (I ran a simple experiment that matched only on artist and title, and it already saved a lot of analysis time) but copyright law. Embeddings cannot be reversed into the original music, so in some jurisdictions sharing them is already fine. Other jurisdictions, however, say that you need the rights to a song just to run machine learning on it. You could then argue that we don't use this information to "create new music" but only to search it better, like very well-made metadata. But honestly, I don't want to deal with legal questions in my free time, because doing so takes money for a lawyer who can advise up front and help if another lawyer writes to you. TL;DR: the idea has value, in more than one way, but it has legal implications that I'm not able to handle. We would need a contributor willing to provide legal help for free and able to deal with international copyright law.
-
Using tools like this requires users to run fairly lengthy audio analysis. Every user or installed instance has to perform the same analysis.
If the MD5 of the audio stream were linked to the analysis results, analysis could be skipped by checking the MD5 against a database and, if a match is found, pulling the relevant results from the table. This would avoid re-running analysis when moving files around, renaming folders, etc.
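A minimal sketch of that lookup, assuming a hypothetical SQLite table named `analysis_cache` keyed by the audio MD5. The table, the column names and the `analyse` callback are made up for illustration and are not AudioMuse-AI's actual schema or API:

```python
import json
import sqlite3


def open_cache(path=":memory:"):
    """Open (or create) the hypothetical cache table keyed by audio MD5."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS analysis_cache ("
        "audio_md5 TEXT PRIMARY KEY, result_json TEXT NOT NULL)"
    )
    return conn


def get_or_analyse(conn, audio_md5, analyse):
    """Return cached results for audio_md5; call analyse() only on a miss."""
    row = conn.execute(
        "SELECT result_json FROM analysis_cache WHERE audio_md5 = ?",
        (audio_md5,),
    ).fetchone()
    if row is not None:
        return json.loads(row[0])  # cache hit: no analysis needed
    result = analyse()  # cache miss: run the expensive step once
    conn.execute(
        "INSERT INTO analysis_cache (audio_md5, result_json) VALUES (?, ?)",
        (audio_md5, json.dumps(result)),
    )
    conn.commit()
    return result
```

Because the key is derived from the audio stream rather than the file path, a renamed or moved file would still hit the cache.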
Tech-savvy users could share their analysis results with one another: a script could populate AudioMuse-AI's tables, and analysis would only be needed for tracks not previously analysed. A community database could be established if there was sufficient demand. Using MD5 also means there's no need to recompute the checksum for FLAC, because it can be read from the FLAC header: the spec stores an MD5 of the unencoded audio data in the STREAMINFO block.
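For the FLAC case, reading the stored MD5 could look roughly like this. It is a standard-library sketch that assumes the file starts directly with the `fLaC` marker (a file with an ID3 tag prepended would need extra handling). The function name is made up for illustration:

```python
def read_flac_audio_md5(path):
    """Return the audio MD5 stored in a FLAC STREAMINFO block as a hex string.

    Returns None when the encoder left the field as all zeros (MD5 not set).
    """
    with open(path, "rb") as f:
        if f.read(4) != b"fLaC":
            raise ValueError("file does not start with the fLaC marker")
        header = f.read(4)
        if len(header) != 4:
            raise ValueError("truncated metadata block header")
        block_type = header[0] & 0x7F  # low 7 bits; top bit is the last-block flag
        length = int.from_bytes(header[1:4], "big")
        if block_type != 0 or length < 34:
            raise ValueError("first metadata block is not a valid STREAMINFO")
        streaminfo = f.read(34)
        if len(streaminfo) != 34:
            raise ValueError("truncated STREAMINFO block")
    md5 = streaminfo[18:34]  # the last 16 bytes of STREAMINFO hold the MD5
    if md5 == bytes(16):
        return None
    return md5.hex()
```

Other formats have no equivalent field, so for those the audio would have to be decoded and hashed, which is still far cheaper than a full analysis.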
To leverage something like this, AudioMuse-AI should be able to export the audio MD5 of every track it has analysed to a table. Equally, when deployed, it should first check this table if present, using the MD5 as the matching key. Given that a lot of the codebase is Python, it should be straightforward to use Polars to handle this at scale.