Error - "files: mimetype text/plain; charset=utf-8 not supported in file ..." #69

thx1111 · 2024-07-16T00:54:22Z

Arch Linux
go 2:1.22.4-1

[INFO] files: found 243 input files in /home/james/Maildir/.dmarc/cur/
[ERROR] files: mimetype text/plain; charset=utf-8 not supported in file /home/james/Maildir/.dmarc/cur/..., skip
...
[ERROR] processFiles: reports list is empty

These errors apply to every single dmarc report received, including from various different sources.

Are each of the dmarc report files required to be manually extracted from the dmarc email, and moved to another directory, before processing? Or, is this a bug?

https://pkg.go.dev/unicode/utf8 says:

Package utf8 implements functions and constants to support text encoded in UTF-8.

https://pkg.go.dev/mime says:

Package mime implements parts of the MIME spec.

Am I missing this utf8 package? Or the mime package? Or something else?


moorereason · 2024-07-16T16:33:43Z

Can you provide a sample message file? Maybe in a Github gist or something?

Are each of the dmarc report files required to be manually extracted from the dmarc email, and moved to another directory, before processing?

No. I'm processing individual messages (in EML format) and the tool extracts the attached reports properly (usually).

Am I missing this utf8 package? Or the mime package?

You don't need to worry about these dependencies, since the tool is written in Go. All package requirements are resolved at build time, so nothing extra has to be installed in your runtime environment.

tierpod · 2024-07-17T16:03:42Z

That's interesting. I have no mime packages installed on my system and can still process eml files. If you cannot provide a sample file, you can post the output of these commands here:

file <path_to_file>
head -n 2 <path_to_file>

tierpod · 2024-07-17T16:09:17Z

But according to https://cs.opensource.google/go/go/+/refs/tags/go1.22.5:src/mime/type_unix.go it's possible that some packages are needed to detect the MIME type. I have the "/usr/share/mime/globs2" file on my Fedora system, and it belongs to the shared-mime-info package.

thx1111 · 2024-07-17T17:06:48Z

Here's a recent random sample dmarc file from Google with Content-Type: application/zip;:
sampledmarcfile.txt

Here's another from Comcast with Content-Type: multipart/mixed; and Content-Type: text/plain; charset="utf-8"
sampledmarcfileutf8.txt

Both files include the header MIME-Version: 1.0.

I also have the "/usr/share/mime/globs2" file, generated by update-mime-database /usr/share/mime from the Arch Linux "shared-mime-info 2.4-1" package.

It may or may not be important that these files are NOT specifically .eml files; they are Courier Maildir++ format files, delivered through postfix. There may be some subtle but important difference between the file formats. For instance, running the file command against some random .eml file gives:

 Birthday.eml: Unicode text, UTF-8 text, with very long lines (835), with CRLF line terminators

while running file against the above sample files gives instead, for instance:

sampledmarcfileutf8.txt: SMTP mail, ASCII text

moorereason · 2024-07-17T18:16:50Z

If you copy one of the files out of the Maildir and rename it as a .eml file, does this tool process it?

thx1111 · 2024-07-18T01:13:00Z

Ha! Yes.

I moved the sample files to a directory "dmarctest" and changed the extensions to .eml. Here is the output:

$ dmarc-report-converter
[INFO] files: found 2 eml file(s), extract attachments to /home/james/dmarctest/
[INFO] extractAttachment: found attachment: google.com!nurealm.net!1720828800!1720915199.zip
[INFO] extractAttachment: save attachment to: /home/james/dmarctest/google.com!nurealm.net!1720828800!1720915199.zip
[WARN] extractAttachment: inline MIME headers are not currently supported, skip
[INFO] extractAttachment: found attachment: comcast.net!nurealm.net!1710460800!1710547200.xml.gz
[INFO] extractAttachment: save attachment to: /home/james/dmarctest/comcast.net!nurealm.net!1710460800!1710547200.xml.gz
[INFO] files: found 2 input files in /home/james/dmarctest/
[INFO] ReadParseZIP: read file google.com!nurealm.net!1720828800!1720915199.xml from zip
[INFO] merge: 1 report(s), grouped by key '[email protected]!nurealm.net'
[INFO] merge: 1 report(s), grouped by key '[email protected]!nurealm.net'
[INFO] output: write to file /var/lib/dmarc-report-converter/outputs/2024-03-14-nurealm.net/[email protected]
[INFO] output: write to file /var/lib/dmarc-report-converter/outputs/2024-07-12-nurealm.net/[email protected]

Of course, now I'm curious - what happened there?

For reference and for instance, the sample files have the original names "1720949687.V804I914868M641386.topaz:2,S" and "1710608371.V804I910597M983359.topaz:2,". These are typical Maildir file names. Dovecot has some explanation of the filename format at https://doc.dovecot.org/admin_manual/mailbox_formats/maildir/ .

thx1111 · 2024-07-18T02:17:27Z

As an additional check, I went back and renamed those sample files to "sampledmarcfile" and "sampledmarcfileutf8", without the explicit .eml suffix - to see if the Maildir name format itself might have been the cause of the naming problem - and re-ran dmarc-report-converter. But, same error messages:

$ dmarc-report-converter
[INFO] files: found 2 input files in /home/james/dmarctest/
[ERROR] files: mimetype text/plain; charset=utf-8 not supported in file /home/james/dmarctest/sampledmarcfile, skip
[ERROR] files: mimetype text/plain; charset=utf-8 not supported in file /home/james/dmarctest/sampledmarcfileutf8, skip
[ERROR] processFiles: reports list is empty

The naming problem, then, really does appear to be connected to the .eml filename extension.
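
One plausible reading of the error text: Go's net/http.DetectContentType returns exactly "text/plain; charset=utf-8" for any input that looks like plain text, and a raw email message from a Maildir is plain text. The sketch below only demonstrates that standard-library behavior. That dmarc-report-converter classifies non-.eml files this way is an assumption, not something confirmed in this thread, and the header bytes are made up for illustration.

```go
package main

import (
	"fmt"
	"net/http"
)

// sniff reports the content type that Go's standard library infers from
// the leading bytes of a file (at most the first 512 bytes are examined).
// Hypothetical helper for illustration; not taken from the tool's source.
func sniff(head []byte) string {
	return http.DetectContentType(head)
}

func main() {
	// Made-up start of a Maildir message; real reports will differ.
	msg := []byte("Return-Path: <noreply@example.com>\r\n" +
		"MIME-Version: 1.0\r\n" +
		"Content-Type: multipart/mixed; boundary=\"b\"\r\n\r\n")
	fmt.Println(sniff(msg))

	// A zip archive starts with the signature "PK\x03\x04".
	fmt.Println(sniff([]byte("PK\x03\x04rest-of-archive")))
}
```

Run as written, this prints "text/plain; charset=utf-8" for the message and "application/zip" for the archive signature. Content sniffing alone therefore cannot tell an email apart from any other text file, which would explain why the file name ends up mattering.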

tierpod · 2024-07-18T07:15:47Z

Thank you for debugging. It looks like a bug.

moorereason · 2024-07-18T10:30:34Z

I'd consider this an enhancement request instead of a bug, but it's your project. 😉

I ran into this issue but worked around it. I use AWS SES to receive DMARC reports which are placed in an S3 bucket with a hashed filename with no file extension. I have a script that downloads the files from S3 and renames them to have an eml file extension.

thx1111 · 2024-07-18T13:58:23Z

@moorereason While I can appreciate your high pain threshold, I'm inclined toward "bug", and not "enhancement". Still, we might agree that this .eml suffix issue is not a "feature". Either way, thanks very much for your suggestion to test the file name extension! I certainly wouldn't have expected something like that.

tierpod · 2024-07-22T15:14:04Z

I use AWS SES to receive DMARC reports which are placed in an S3 bucket with a hashed filename with no file extension

Interesting way. I use input.imap to download and process reports directly from email server :)

tierpod · 2024-07-22T15:15:29Z

dmarc-report-converter/cmd/dmarc-report-converter/files.go

Line 58 in 50e8df5

emlFiles, err := filepath.Glob(filepath.Join(c.cfg.Input.Dir, "*.eml"))

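The reason for this unexpected behavior is simple: only files whose names match *.eml are treated as email messages and have their attachments extracted. Judging by the logs above, any other file is handled as a report file directly, which is where the mimetype error comes from.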

tierpod added the "bug" label (Something isn't working) on Jul 18, 2024
