Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support EBML (Matroska, WebM) files #218

Draft
wants to merge 64 commits into
base: main
Choose a base branch
from
Draft

Support EBML (Matroska, WebM) files #218

wants to merge 64 commits into from

Conversation

Serial-ATA
Copy link
Owner

closes #141

@milesegan
Copy link

Would you be interested in any help getting this one wrapped up?

This is useful because it looks like webm is actually now a recommended cross platform audio format:
https://github.com/goldfire/howler.js?tab=readme-ov-file#format-recommendations

@Serial-ATA
Copy link
Owner Author

@milesegan I'll put out 0.21.0 today and pick this back up for 0.22.0. It is something I'd like to start working on again.

@Serial-ATA Serial-ATA added this to the 0.22.0 milestone Jul 29, 2024
@milesegan
Copy link

@Serial-ATA That would be fantastic. My Rust isn't so strong but happy to help out if I can.

@Serial-ATA
Copy link
Owner Author

@milesegan I finally got it to a state where it can read and convert Matroska files into TaggedFile. Still need to handle the generic -> concrete conversion as well as writing. If you have any Matroska files, could you try them out on this branch and let me know how it goes?

@hasezoey
Copy link

currently trying e37fe441f6cafcd66dce1b1618106348deab3c21, instantly with a error:

   Compiling lofty v0.21.1 (https://github.com/Serial-ATA/lofty-rs.git?branch=matroska#e37fe441)
error[E0308]: mismatched types
   --> /home/hasezoey/.local/cargo/git/checkouts/lofty-rs-f5e48f8219b271cf/e37fe44/lofty/src/ebml/read/segment_tags.rs:218:3
    |
218 |         language,
    |         ^^^^^^^^ expected `Language`, found `Option<Language>`
    |
    = note: expected enum `ebml::tag::simple_tag::Language`
               found enum `Option<ebml::tag::simple_tag::Language>`
help: consider using `Option::expect` to unwrap the `Option<ebml::tag::simple_tag::Language>` value, panicking if the value is an `Option::None`
    |
218 |         language: language.expect("REASON"),
    |                 +++++++++++++++++++++++++++

For more information about this error, try `rustc --explain E0308`.

@Serial-ATA
Copy link
Owner Author

Serial-ATA commented Oct 28, 2024

Thanks for trying it out, forgot to include a file in my last commit. Fixed.

@hasezoey
Copy link

hasezoey commented Oct 28, 2024

aside from that error, when manually fixing it (using defaults), the current implementation only seems to be able to parse AlbumArtist:

TYPE: Matroska
TAGS: [
    TagItem {
        lang: [
            117,
            110,
            100,
        ],
        description: "",
        item_key: AlbumArtist,
        item_value: Text(
            "Alan Walker",
        ),
    },
]

ffmpeg reports a lot more:

Input #0, matroska,webm, from '01 Alan Walker - Jump Start.mka':
  Metadata:
    title           : Jump Start
    DATE            : 2021-09-09
    ARTIST          : Alan Walker
    track           : 1/6
    ALBUM           : Walker Racing League
    DISC            : 1/1
    PUBLISHER       : MER
    MUSICBRAINZ_RELEASE_GROUP_ID: 21b7c435-a015-4a0f-b89d-b9429cb8a39c
    TORY            : 2021
    MUSICBRAINZ_RELEASE_TRACK_ID: 044c0619-c3e4-4e06-b87f-5a2503571d57
    ALBUM_ARTIST    : Alan Walker
    TSO2            : Walker, Alan
    ARTIST-SORT     : Walker, Alan
    TSRC            : NOG842114010
    SCRIPT          : Latn
    TMED            : Digital Media
    ORIGINALYEAR    : 2021
    ARTISTS         : Alan Walker
    BARCODE         : 886449545476
    MUSICBRAINZ_ALBUM_TYPE: ep
    MUSICBRAINZ_ALBUM_STATUS: official
    PURL            : https://www.youtube.com/watch?v=3wIxNyH8NUA
    COMMENT         : https://www.youtube.com/watch?v=3wIxNyH8NUA
    MUSICBRAINZ_ALBUM_ID: 9e65a340-778b-4e82-92d3-8dbc9d823d5b
    MUSICBRAINZ_ARTIST_ID: b0e4bc50-3062-4d31-afad-def6a6b7a8e9
    MUSICBRAINZ_ALBUM_ARTIST_ID: b0e4bc50-3062-4d31-afad-def6a6b7a8e9
    ENCODER         : Lavf60.16.100
  Duration: 00:01:29.60, start: 0.000000, bitrate: 108 kb/s
  Stream #0:0: Audio: vorbis, 48000 Hz, stereo, fltp
      Metadata:
        ENCODER         : Lavc60.31.102 libvorbis
        DURATION        : 00:01:29.599000000

code used:

if let Ok(mut tagged_file) = probe.read() {
    debug!("FILE: {:#?}", path);
    for tag in tagged_file.tags() {
        debug!("TYPE: {:#?}", tag.tag_type());
        debug!("TAGS: {:#?}", tag.items().collect::<Vec<_>>());
    }
}

@Serial-ATA
Copy link
Owner Author

Nice, thanks! Looks like I got the targets for some of these items wrong. Could you email the file to serial@[DOMAIN ON MY PROFILE]?

@hasezoey
Copy link

Could you email the file to

send, hopefully i got the correct address, it should have been fine to be posted here (i modified it to not have any audio), but just to be safe i emailed it.

@Serial-ATA
Copy link
Owner Author

Yep, I got it.

it should have been fine to be posted here (i modified it to not have any audio)

Yeah, that'd be fine to send here. Normally people don't remove the audio though, so email is the safe default.

Was this file made with FFmpeg converting an MP3 to MKA? It looks like it's not using tag targets correctly. The ISRC and MusicBrainz track ID are on the album level, for example.

@hasezoey
Copy link

hasezoey commented Oct 28, 2024

Was this file made with FFmpeg converting an MP3 to MKA? It looks like it's not using tag targets correctly. The ISRC and MusicBrainz track ID are on the album level, for example.

yes it was made from converting a mp3 to a mka via ffmpeg (-map_metadata), no other specific options, so could be a ffmpeg issue

@hasezoey
Copy link

hasezoey commented Oct 29, 2024

I read a bit more about how matroska want tags to be handled, like it wants TargetType = ALBUM; TITLE = Album title and TargetType = Track; TITLE = Track title instead of things like TargetType = ALBUM; ALBUM = Album Title, which ffmpeg currently outputs.
This seems to be a ffmpeg problem which currently does not have fix, and the only related patch series i could find seems to not be merged.
FFmpeg has its internal tag names like album (similar to loftys AlbumTitle), which it currently always translates to the custom matroska tag ALBUM, see ffmpeg -f lavfi -i anullsrc -t 1 -metadata album="test album" test.mka (ffmepg n7.0.2).

Note: there seems to be a related discussion (the same issue as with the file i provided) in the mp3tag project
Edit: maybe related ffmpeg issue

So maybe consider adding those custom tags as a "relaxed" rule and when writing do it like matroska actually wants?


aside from that, i also just noticed that lofty did not parse the Segment information's title yet:

$ mkvinfo out.mka
...
+ Segment: size 5730
|+ Seek head (subentries will be skipped)
|+ EBML void: size 83
|+ Segment information
| + Timestamp scale: 1000000
| + Title: Jump Start
...

PS: ffmpeg will currently always translate a TITLE tag to the segment's title and not output a TITLE tag when muxing to matroska

@Serial-ATA
Copy link
Owner Author

Thanks a ton for looking into this.

So maybe consider adding those custom tags as a "relaxed" rule and when writing do it like matroska actually wants?

Hm, that does really complicate things.

In the case of your file:

  • It has both Target = Album; ARTIST = Alan Walker and Target = Album; ALBUM_ARTIST = Alan Walker. There's no way to safely know that ARTIST can be the track artist.
  • There's a DATE tag, but what kind of date?
  • The disc and track numbers are stored as current/total, which isn't right.
  • It has a bunch of untranslated ID3v2 tags.

The only safe assumption to make in that case is that ALBUM_ARTIST should just be replaced with ARTIST at the album level.

PS: ffmpeg will currently always translate a TITLE tag to the segment's title and not output a TITLE tag when muxing to matroska

Odd. mp3tag just deletes the segment title and puts it in the tags, which is an option I guess. FFmpeg seems to be able to pick up that title. It does run the risk of deleting someone's segment name unexpectedly, though.

This is a pretty unfortunate situation since I imagine FFmpeg isn't the only software with these kinds of issues (I don't blame them, this format is super over-engineered). Trying to support even some of these weird cases will really complicate things. I wonder if it's even worth it, or if Lofty should just expect closer-to-spec-compliant inputs, and ignoring all these junk tags.

@hasezoey
Copy link

Hm, that does really complicate things.

I dont know how much logic you want to put into this, but maybe if Album/ALBUM_ARTIST exists and Track/Artist does not exist, assume Album/Artist is the track artist? The same for title?
As for the other tags like DATE or disc and tracks, i dont know what the best option would be.

In any case, if this really is this complicated, just focus on spec-compliancy with the initial PR and lets put in issues / PRs in the future to add workarounds.

Regardless of workarounds applied, shouldnt those tags show up as custom tags (similar to id3's TXX) in Lofty?

@Serial-ATA
Copy link
Owner Author

Regardless of workarounds applied, shouldnt those tags show up as custom tags (similar to id3's TXX) in Lofty?

Yep, Matroska lets you put whatever tags you want outside of their official ones. Any software outside of FFmpeg won't be able to understand it, though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Webm support
3 participants