Message ID | 20210421013404.17383-1-congdanhqx@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | mailinfo: strip CR from base64/quoted-printable email | expand |
Đoàn Trần Công Danh <congdanhqx@gmail.com> writes: > When an SMTP server receives an 8-bit email message, possibly with only > LF as line ending, some of those servers decide to change said LF to > CRLF. > > Some other SMTP servers, when receives an 8-bit email message, decide to > encoding such message in base64 and/or quoted-printable instead. encoding -> encode > > If an email is transfered through those 2 email servers in order, the > final recipients will receive an email contains a patch mungled with > CRLF encoded inside another encoding. Thus, such CR couldn't be dropped > by mailsplit. Such accidents have been observed in the wild [1]. > > Let's guess if such CR was added automatically and strip them in > mailinfo. > > [1]: https://nmbug.notmuchmail.org/nmweb/show/m2lf9ejegj.fsf%40guru.guru-group.fi > > Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com> > --- > > I'm not sure if guessing the heuristic to strip CR is a good approach. > I think it's better to pass --keep-cr down from git-am. > Let's say --keep-cr=<yes|no|auto> It matches my instinct to tie this with the existing --keep-cr option, even though I admit that I haven't thought things through. .
On 2021-04-21 at 01:34:04, Đoàn Trần Công Danh wrote: > When an SMTP server receives an 8-bit email message, possibly with only > LF as line ending, some of those servers decide to change said LF to > CRLF. > > Some other SMTP servers, when receives an 8-bit email message, decide to > encoding such message in base64 and/or quoted-printable instead. This really isn't an SMTP server. It's mailing list software, namely mailman, and I would argue it's a bug, even though we may want to work around it. For example, re-encoding the message breaks DKIM signatures, which means that mailman is likely to cause mail to be needlessly rejected. 8BITMIME is now so common with SMTP that I'd argue that we should just write off servers that don't support it (especially in the context of SMTPUTF8 existing), but this isn't the case of an SMTP server being stuck in the last century. Can we say more accurately that this is mailing list software (or just call it out by name)? > If an email is transfered through those 2 email servers in order, the > final recipients will receive an email contains a patch mungled with > CRLF encoded inside another encoding. Thus, such CR couldn't be dropped > by mailsplit. Such accidents have been observed in the wild [1]. > > Let's guess if such CR was added automatically and strip them in > mailinfo. > > [1]: https://nmbug.notmuchmail.org/nmweb/show/m2lf9ejegj.fsf%40guru.guru-group.fi > > Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com> > --- > > I'm not sure if guessing the heuristic to strip CR is a good approach. > I think it's better to pass --keep-cr down from git-am. > Let's say --keep-cr=<yes|no|auto> I think we may want a separate option here. When I send a 7bit or 8bit body, I expect text canonicalization on the line endings. However, when I send a base64 or quoted-printable body, I don't expect my data to be modified at all, and absent a compelling reason, doing so is incorrect. In most cases, using base64 or quoted-printable is going to mean that the sender knew that the body shouldn't be modified, not that mailman modified it, so we should make line munging in this case opt-in.
On 2021-04-21 03:32:07+0000, "brian m. carlson" <sandals@crustytoothpaste.net> wrote: > On 2021-04-21 at 01:34:04, Đoàn Trần Công Danh wrote: > > When an SMTP server receives an 8-bit email message, possibly with only > > LF as line ending, some of those servers decide to change said LF to > > CRLF. > > > > Some other SMTP servers, when receives an 8-bit email message, decide to > > encoding such message in base64 and/or quoted-printable instead. > > This really isn't an SMTP server. It's mailing list software, namely > mailman, and I would argue it's a bug, even though we may want to work > around it. For example, re-encoding the message breaks DKIM signatures, > which means that mailman is likely to cause mail to be needlessly > rejected. > > 8BITMIME is now so common with SMTP that I'd argue that we should just > write off servers that don't support it (especially in the context of > SMTPUTF8 existing), but this isn't the case of an SMTP server being > stuck in the last century. Can we say more accurately that this is > mailing list software (or just call it out by name)? I think replace "SMTP servers" with "mailing list managers" is correct. I don't feel comfortable to call it out, since I don't know if other managers do it that way or not. > > > If an email is transfered through those 2 email servers in order, the > > final recipients will receive an email contains a patch mungled with > > CRLF encoded inside another encoding. Thus, such CR couldn't be dropped > > by mailsplit. Such accidents have been observed in the wild [1]. > > > > Let's guess if such CR was added automatically and strip them in > > mailinfo. > > > > [1]: https://nmbug.notmuchmail.org/nmweb/show/m2lf9ejegj.fsf%40guru.guru-group.fi > > > > Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com> > > --- > > > > I'm not sure if guessing the heuristic to strip CR is a good approach. > > I think it's better to pass --keep-cr down from git-am. > > Let's say --keep-cr=<yes|no|auto> > > I think we may want a separate option here. When I send a 7bit or 8bit > body, I expect text canonicalization on the line endings. However, when > I send a base64 or quoted-printable body, I don't expect my data to be > modified at all, and absent a compelling reason, doing so is incorrect. > In most cases, using base64 or quoted-printable is going to mean that > the sender knew that the body shouldn't be modified, not that mailman > modified it, so we should make line munging in this case opt-in. Make sense, this patch was sent mostly for some discussion first. Would you mind suggest something for the option. I'm thinking about --quoted-cr=<nowarn|warn|fix>, mimicking the --whitespace option.
On 2021-04-21 at 12:07:42, Đoàn Trần Công Danh wrote: > I think replace "SMTP servers" with "mailing list managers" is > correct. I don't feel comfortable to call it out, since I don't know > if other managers do it that way or not. I think that's fair. I would hope not for the reasons I mentioned, but it also would not be surprising to me if others did nevertheless. > Make sense, this patch was sent mostly for some discussion first. > Would you mind suggest something for the option. > > I'm thinking about --quoted-cr=<nowarn|warn|fix>, mimicking the > --whitespace option. I think that sounds like a great name.
diff --git a/mailinfo.c b/mailinfo.c index 5681d9130d..dbff867f42 100644 --- a/mailinfo.c +++ b/mailinfo.c @@ -988,16 +988,27 @@ static int handle_boundary(struct mailinfo *mi, struct strbuf *line) } static void handle_filter_flowed(struct mailinfo *mi, struct strbuf *line, - struct strbuf *prev) + struct strbuf *prev, int *keep_cr) { size_t len = line->len; const char *rest; if (!mi->format_flowed) { + if (*keep_cr == -1 && len >= 2) + *keep_cr = !(line->buf[len - 2] == '\r' && + line->buf[len - 1] == '\n'); + if (!*keep_cr && len >= 2 && + line->buf[len - 2] == '\r' && + line->buf[len - 1] == '\n') { + strbuf_setlen(line, len - 2); + strbuf_addch(line, '\n'); + len--; + } handle_filter(mi, line); return; } + *keep_cr = 1; if (line->buf[len - 1] == '\n') { len--; if (len && line->buf[len - 1] == '\r') @@ -1036,6 +1047,7 @@ static void handle_filter_flowed(struct mailinfo *mi, struct strbuf *line, static void handle_body(struct mailinfo *mi, struct strbuf *line) { struct strbuf prev = STRBUF_INIT; + int keep_cr = -1; /* Skip up to the first boundary */ if (*(mi->content_top)) { @@ -1081,7 +1093,7 @@ static void handle_body(struct mailinfo *mi, struct strbuf *line) strbuf_addbuf(&prev, sb); break; } - handle_filter_flowed(mi, sb, &prev); + handle_filter_flowed(mi, sb, &prev, &keep_cr); } /* * The partial chunk is saved in "prev" and will be @@ -1091,7 +1103,9 @@ static void handle_body(struct mailinfo *mi, struct strbuf *line) break; } default: - handle_filter_flowed(mi, line, &prev); + /* CR in plain message was processed in mailsplit */ + keep_cr = 1; + handle_filter_flowed(mi, line, &prev, &keep_cr); } if (mi->input_error) diff --git a/t/t5100-mailinfo.sh b/t/t5100-mailinfo.sh index 147e616533..9ccc11d16a 100755 --- a/t/t5100-mailinfo.sh +++ b/t/t5100-mailinfo.sh @@ -228,4 +228,9 @@ test_expect_success 'mailinfo handles unusual header whitespace' ' test_cmp expect actual ' +test_expect_success 'mailinfo strip CR after decode base64' ' + cp $DATA/cr-base64.mbox 1000 && + check_mailinfo 1000 "" +' + test_done diff --git a/t/t5100/cr-base64.mbox b/t/t5100/cr-base64.mbox new file mode 100644 index 0000000000..6ea9806a6b --- /dev/null +++ b/t/t5100/cr-base64.mbox @@ -0,0 +1,22 @@ +From: A U Thor <mail@example.com> +To: list@example.org +Subject: [PATCH v2] sample +Date: Mon, 3 Aug 2020 22:40:55 +0700 +Message-Id: <msg-id@example.com> +Content-Type: text/plain; charset="utf-8" +Content-Transfer-Encoding: base64 + +T24gZGlmZmVyZW50IGRpc3RybywgcHl0ZXN0IGlzIHN1ZmZpeGVkIHdpdGggZGlmZmVyZW50IHBh +dHRlcm5zLg0KDQotLS0NCiBjb25maWd1cmUgfCAyICstDQogMSBmaWxlIGNoYW5nZWQsIDEgaW5z +ZXJ0aW9uKCspLCAxIGRlbGV0aW9uKC0pDQoNCmRpZmYgLS1naXQgYS9jb25maWd1cmUgYi9jb25m +aWd1cmUNCmluZGV4IGRiMzUzOGIzLi5mN2MxYzA5NSAxMDA3NTUNCi0tLSBhL2NvbmZpZ3VyZQ0K +KysrIGIvY29uZmlndXJlDQpAQCAtODE0LDcgKzgxNCw3IEBAIGlmIFsgJGhhdmVfcHl0aG9uMyAt +ZXEgMSBdOyB0aGVuDQogICAgIHByaW50ZiAiQ2hlY2tpbmcgZm9yIHB5dGhvbjMgcHl0ZXN0ICg+ +PSAzLjApLi4uICINCiAgICAgY29uZj0kKG1rdGVtcCkNCiAgICAgcHJpbnRmICJbcHl0ZXN0XVxu +bWludmVyc2lvbj0zLjBcbiIgPiAkY29uZg0KLSAgICBpZiBweXRlc3QtMyAtYyAkY29uZiAtLXZl +cnNpb24gPi9kZXYvbnVsbCAyPiYxOyB0aGVuDQorICAgIGlmICIkcHl0aG9uIiAtbSBweXRlc3Qg +LWMgJGNvbmYgLS12ZXJzaW9uID4vZGV2L251bGwgMj4mMTsgdGhlbg0KICAgICAgICAgcHJpbnRm +ICJZZXMuXG4iDQogICAgICAgICBoYXZlX3B5dGhvbjNfcHl0ZXN0PTENCiAgICAgZWxzZQ0KLS0g +DQoyLjI4LjANCl9fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19f +CmV4YW1wbGUgbWFpbGluZyBsaXN0IC0tIGxpc3RAZXhhbXBsZS5vcmcKVG8gdW5zdWJzY3JpYmUg +c2VuZCBhbiBlbWFpbCB0byBsaXN0LWxlYXZlQGV4YW1wbGUub3JnCg== diff --git a/t/t5100/info1000 b/t/t5100/info1000 new file mode 100644 index 0000000000..dab2228b70 --- /dev/null +++ b/t/t5100/info1000 @@ -0,0 +1,5 @@ +Author: A U Thor +Email: mail@example.com +Subject: sample +Date: Mon, 3 Aug 2020 22:40:55 +0700 + diff --git a/t/t5100/msg1000 b/t/t5100/msg1000 new file mode 100644 index 0000000000..5e8e860aae --- /dev/null +++ b/t/t5100/msg1000 @@ -0,0 +1,2 @@ +On different distro, pytest is suffixed with different patterns. + diff --git a/t/t5100/patch1000 b/t/t5100/patch1000 new file mode 100644 index 0000000000..51c4fb4cb5 --- /dev/null +++ b/t/t5100/patch1000 @@ -0,0 +1,22 @@ +--- + configure | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +diff --git a/configure b/configure +index db3538b3..f7c1c095 100755 +--- a/configure ++++ b/configure +@@ -814,7 +814,7 @@ if [ $have_python3 -eq 1 ]; then + printf "Checking for python3 pytest (>= 3.0)... " + conf=$(mktemp) + printf "[pytest]\nminversion=3.0\n" > $conf +- if pytest-3 -c $conf --version >/dev/null 2>&1; then ++ if "$python" -m pytest -c $conf --version >/dev/null 2>&1; then + printf "Yes.\n" + have_python3_pytest=1 + else +-- +2.28.0 +_______________________________________________ +example mailing list -- list@example.org +To unsubscribe send an email to list-leave@example.org
When an SMTP server receives an 8-bit email message, possibly with only LF as line ending, some of those servers decide to change said LF to CRLF. Some other SMTP servers, when receives an 8-bit email message, decide to encoding such message in base64 and/or quoted-printable instead. If an email is transfered through those 2 email servers in order, the final recipients will receive an email contains a patch mungled with CRLF encoded inside another encoding. Thus, such CR couldn't be dropped by mailsplit. Such accidents have been observed in the wild [1]. Let's guess if such CR was added automatically and strip them in mailinfo. [1]: https://nmbug.notmuchmail.org/nmweb/show/m2lf9ejegj.fsf%40guru.guru-group.fi Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com> --- I'm not sure if guessing the heuristic to strip CR is a good approach. I think it's better to pass --keep-cr down from git-am. Let's say --keep-cr=<yes|no|auto> mailinfo.c | 20 +++++++++++++++++--- t/t5100-mailinfo.sh | 5 +++++ t/t5100/cr-base64.mbox | 22 ++++++++++++++++++++++ t/t5100/info1000 | 5 +++++ t/t5100/msg1000 | 2 ++ t/t5100/patch1000 | 22 ++++++++++++++++++++++ 6 files changed, 73 insertions(+), 3 deletions(-) create mode 100644 t/t5100/cr-base64.mbox create mode 100644 t/t5100/info1000 create mode 100644 t/t5100/msg1000 create mode 100644 t/t5100/patch1000