7bit encoded email with line break #151

Open
htulipe opened this issue Nov 22, 2022 · 8 comments

Comments

@htulipe

htulipe commented Nov 22, 2022

Hello. Maybe a dumb question, but I can't see how the lib can successfully parse a 7bit encoded email that contains line breaks. By successfully I mean without losing the line breaks.

Version

mail 0.2.3
Erlang/OTP 24 [erts-12.3.2.6] [source] [64-bit] [smp:5:5] [ds:5:5:10] [async-threads:1]
Elixir 1.14.0 (compiled with Erlang/OTP 24)

Test Case

Using the parse_email function defined in parser test:

parse_email("""
    To: [email protected]
    From: [email protected]
    Subject: Test Email
    Content-Transfer-Encoding: 7bit

    This is the body!
    It has more than one line
    """)

Steps to reproduce

Run the above code

Expected Behavior

The returned body should have some sort of line breaks:
This is the body!\nIt has more than one line

Actual Behavior

The returned body no longer has line breaks:
This is the body!It has more than one line

Reading the code, I see that the lib joins the body lines using \r\n, but the SevenBit parser called just after drops them. Am I missing something?

Joining the body lines with \n instead of \r\n seems to fix the issue.
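
The behaviour described above can be reproduced in isolation with a minimal sketch. The module SevenBitSketch and both of its functions are hypothetical names made up for illustration; they are not the library's code and only assume the join-then-strip sequence reported here. RFC 2045 treats 7bit as an identity encoding, which is why the second function leaves the CRLF separators alone.

```elixir
defmodule SevenBitSketch do
  # Hypothetical helpers for illustration only; not part of the mail library.

  # Mimics the reported behaviour: lines are joined with CRLF,
  # then every CRLF is stripped again, so the line breaks are lost.
  def lossy_decode(lines) do
    lines
    |> Enum.join("\r\n")
    |> String.replace("\r\n", "")
  end

  # Keeps the CRLF separators, treating 7bit as an identity encoding.
  def preserving_decode(lines) do
    Enum.join(lines, "\r\n")
  end
end

lines = ["This is the body!", "It has more than one line"]

IO.inspect(SevenBitSketch.lossy_decode(lines))
# prints "This is the body!It has more than one line"

IO.inspect(SevenBitSketch.preserving_decode(lines))
# prints "This is the body!\r\nIt has more than one line"
```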

Thanks in advance

PS: I saw the previous issue on the matter but could not find an answer there so I allowed myself to repost a new issue.

@bcardarella
Member

Is @andrewtimberlake 's answer in #138 not sufficient?

@htulipe
Author

htulipe commented Nov 22, 2022

I agree with Andrew's RFC understanding but the direct conclusion is that we can't send multi-line emails with this encoding. That can't be possible, I must be missing something.

May I add that Python's email module parses the same email without losing line breaks.

@bcardarella
Member

@htulipe so your issue is not with the parsing but the compilation from the data structure into an email?

@htulipe
Author

htulipe commented Nov 23, 2022

My goal is to read an EML file and transform it into a data structure that my frontend can then display.

@SergeyMosin

Is @andrewtimberlake 's answer in #138 not sufficient? ( #138 (comment) )

First of all, thank you for your work on this module. However, I have the following question...

What should be the expected parsed message body for the following code according to RFC 2045 §2.7 ?

IO.inspect(
  Mail.parse([
    "From: [email protected]",
    "To: [email protected]",
    "Subject: test",
    "Content-Transfer-Encoding: 7bit", # or 8bit
    "",
    "line1",
    "line2"
  ])
)

Option A: line1\r\nline2

  1. ✔️ Data that is all represented as relatively short lines with 998 octets or less between CRLF line separation sequences.
  2. ✔️ No octets with decimal values greater than 127 are allowed and neither are NULs (octets with decimal value 0).
  3. ✔️ CR (decimal value 13) and LF (decimal value 10) octets only occur as part of CRLF line separation sequences.

Option B: line1line2

  1. ❌ Data that is all represented as relatively short lines with 998 octets or less between CRLF line separation sequences.
  2. ✔️ No octets with decimal values greater than 127 are allowed and neither are NULs (octets with decimal value 0).
  3. ❌ CR (decimal value 13) and LF (decimal value 10) octets only occur as part of CRLF line separation sequences.

I personally lean towards Option A, but the Mail.parse function currently outputs Option B, which seems to diverge from the RFC in points 1 and 3 because the "CRLF line separation sequence" is missing.

@andrewtimberlake
Collaborator

I subsequently found out that 7bit decoding was removing line breaks indiscriminately, when it should only remove those used to wrap lines exceeding the maximum length of 1000 chars.
I have merged a fix in #164.

@SergeyMosin

Thank you for the quick fix. I think the same problem affects the 8bit encoding as well.
Example:

IO.inspect(
  Mail.parse([
    "From: [email protected]",
    "To: [email protected]",
    "Subject: test",
    "Content-Type: text/plain; charset=UTF-8",
    "Content-Transfer-Encoding: 8bit",
    "",
    "lïne1",
    "lïne2"
  ])
)

outputs this:

%Mail.Message{
  headers: %{
    "content-transfer-encoding" => "8bit",
    "content-type" => ["text/plain", {"charset", "UTF-8"}],
    "from" => "[email protected]",
    "subject" => "test",
    "to" => ["[email protected]"]
  },
  body: "lïne1lïne2",
  parts: [],
  multipart: false
}

no \r\n in the body

@andrewtimberlake
Collaborator

Thanks, great catch. Fixed in #166
