Message ID | e3e542d38818b762f8d6d6b50bc42e01b070772b.1587289680.git.congdanhqx@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | mailinfo: disallow and complains about NUL character | expand |
On Sun, 19 Apr 2020 at 13:03, Đoàn Trần Công Danh <congdanhqx@gmail.com> wrote: > > We're passing buffer from strbuf to reencode_string, > which will call strlen(3) on that buffer, > and discard the length of newly created buffer. > Then, we compute the length of the return buffer to attach to strbuf. > > During this process, we introduce a discrimination between mail > originally written in utf-8 and other encoding. > > * if the email was written in utf-8, we leave it as is. If there is > a NUL character in that line, we complains loudly: > > error: a NUL byte in commit log message not allowed. > > * if the email was written in other encoding, we truncate the data as > the NUL character in that line, then we used the truncated line for > the metadata. > > We can do better by reusing all the available information, > and call the underlying lower level function that will be called > indirectly by reencode_string. By doing this, we will also postpone > the NUL character processing to the commit step, which will > complains about the faulty metadata. > > Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com> This makes sense to me. I think the subject could be adapted though? Now it's not about "reusing info" anymore, it's about using *other*, *better* info. Maybe mailinfo.c: avoid strlen on strings that might contain NUL ? Could probably be improved further.. > --- a/mailinfo.c > +++ b/mailinfo.c > @@ -447,19 +447,21 @@ static int convert_to_utf8(struct mailinfo *mi, > struct strbuf *line, const char *charset) > { > char *out; > + size_t out_len; > > if (!mi->metainfo_charset || !charset || !*charset) > return 0; > > if (same_encoding(mi->metainfo_charset, charset)) > return 0; > - out = reencode_string(line->buf, mi->metainfo_charset, charset); > + out = reencode_string_len(line->buf, line->len, > + mi->metainfo_charset, charset, &out_len); > if (!out) { > mi->input_error = -1; > return error("cannot convert from %s to %s", > charset, mi->metainfo_charset); > } > - strbuf_attach(line, out, strlen(out), strlen(out)); > + strbuf_attach(line, out, out_len, out_len); > return 0; > } Same diff as before, ok. > +write_nul_patch() { > + space=' ' > + qNUL= > + case "$1" in > + subject) qNUL='=00' ;; > + esac Here it's case/esac... > + cat <<-EOF > + From ec7364544f690c560304f5a5de9428ea3b978b26 Mon Sep 17 00:00:00 2001 > + From: A U Thor <author@example.com> > + Date: Sun, 19 Apr 2020 13:42:07 +0700 > + Subject: [PATCH] =?ISO-8859-1?q?=C4=CB${qNUL}=D1=CF=D6?= > + MIME-Version: 1.0 > + Content-Type: text/plain; charset=ISO-8859-1 > + Content-Transfer-Encoding: 8bit > + > + EOF > + if test "$1" = body > + then > + printf "%s\0%s\n" abc def > + fi Here it's if/fi. Looks a bit inconsistent. I suppose you tried to have a case for "body" above but couldn't get it to work? Somehow, it would seem more consistent to have a qSubject and qBody and handle them the same way, but maybe that's not possible? Maybe using q_to_nul to create qBody containing a NUL? > + cat <<-\EOF > + --- > + diff --git a/afile b/afile > + new file mode 100644 > + index 0000000000..e69de29bb2 > + --$space > + 2.26.1 > + EOF > +} I think this helper function should use &&-chaining. > + > test_expect_success setup ' > # Note the missing "+++" line: > cat >bad-patch.diff <<-\EOF && > @@ -32,4 +62,18 @@ test_expect_success 'try to apply corrupted patch' ' > test_i18ncmp expected actual > ' > > +test_expect_success "NUL in commit message's body" ' > + test_when_finished "git am --abort" && > + write_nul_patch body >body.patch && > + test_must_fail git am body.patch 2>err && > + grep "a NUL byte in commit log message not allowed" err > +' Makes sense. > + > +test_expect_failure "NUL in commit message's header" ' > + test_when_finished "git am --abort" && > + write_nul_patch subject >subject.patch && > + test_must_fail git am subject.patch 2>err && > + grep "a NUL byte in Subject is not allowed" err > +' Interesting. :-) Martin
On 2020-04-19 14:29:06+0200, Martin Ågren <martin.agren@gmail.com> wrote: > I think the subject could be adapted though? Now it's not about "reusing > info" anymore, it's about using *other*, *better* info. Maybe > > mailinfo.c: avoid strlen on strings that might contain NUL This is better than mine. I couldn't think about any other alternative. > > > --- a/mailinfo.c > > +++ b/mailinfo.c > > @@ -447,19 +447,21 @@ static int convert_to_utf8(struct mailinfo *mi, > > struct strbuf *line, const char *charset) > > { > > char *out; > > + size_t out_len; > > > > if (!mi->metainfo_charset || !charset || !*charset) > > return 0; > > > > if (same_encoding(mi->metainfo_charset, charset)) > > return 0; > > - out = reencode_string(line->buf, mi->metainfo_charset, charset); > > + out = reencode_string_len(line->buf, line->len, > > + mi->metainfo_charset, charset, &out_len); > > if (!out) { > > mi->input_error = -1; > > return error("cannot convert from %s to %s", > > charset, mi->metainfo_charset); > > } > > - strbuf_attach(line, out, strlen(out), strlen(out)); > > + strbuf_attach(line, out, out_len, out_len); > > return 0; > > } > > Same diff as before, ok. > > > +write_nul_patch() { > > + space=' ' > > + qNUL= > > + case "$1" in > > + subject) qNUL='=00' ;; > > + esac > > Here it's case/esac... > > > + cat <<-EOF > > + From ec7364544f690c560304f5a5de9428ea3b978b26 Mon Sep 17 00:00:00 2001 > > + From: A U Thor <author@example.com> > > + Date: Sun, 19 Apr 2020 13:42:07 +0700 > > + Subject: [PATCH] =?ISO-8859-1?q?=C4=CB${qNUL}=D1=CF=D6?= > > + MIME-Version: 1.0 > > + Content-Type: text/plain; charset=ISO-8859-1 > > + Content-Transfer-Encoding: 8bit > > + > > + EOF > > + if test "$1" = body > > + then > > + printf "%s\0%s\n" abc def > > + fi > > Here it's if/fi. Looks a bit inconsistent. > > I suppose you tried to have a case for "body" above but couldn't get it > to work? Somehow, it would seem more consistent to have a qSubject and > qBody and handle them the same way, but maybe that's not possible? > Maybe using q_to_nul to create qBody containing a NUL? Those patch was written in different time, with different thought in mind. I need to send a re-roll to update the subject and moving oneline in test code to later patch. I'll update this hunk with that re-roll. Your new series _won't_ be affected, though. > > + cat <<-\EOF > > + --- > > + diff --git a/afile b/afile > > + new file mode 100644 > > + index 0000000000..e69de29bb2 > > + --$space > > + 2.26.1 > > + EOF > > +} > > I think this helper function should use &&-chaining. Correct. > > + > > test_expect_success setup ' > > # Note the missing "+++" line: > > cat >bad-patch.diff <<-\EOF && > > @@ -32,4 +62,18 @@ test_expect_success 'try to apply corrupted patch' ' > > test_i18ncmp expected actual > > ' > > > > +test_expect_success "NUL in commit message's body" ' > > + test_when_finished "git am --abort" && > > + write_nul_patch body >body.patch && > > + test_must_fail git am body.patch 2>err && > > + grep "a NUL byte in commit log message not allowed" err > > +' > > Makes sense. > > > + > > +test_expect_failure "NUL in commit message's header" ' > > + test_when_finished "git am --abort" && > > + write_nul_patch subject >subject.patch && > > + test_must_fail git am subject.patch 2>err && > > + grep "a NUL byte in Subject is not allowed" err > > +' > > Interesting. :-) Going through the mail now make me think about moving the last grep to next patch in order to clearly indicate that we expect "git-am" is failling but it's passed instead. And that message isn't introduced until the very next change.
Đoàn Trần Công Danh <congdanhqx@gmail.com> writes: > diff --git a/t/t4254-am-corrupt.sh b/t/t4254-am-corrupt.sh > index ddd35498db..98cda32d0a 100755 > --- a/t/t4254-am-corrupt.sh > +++ b/t/t4254-am-corrupt.sh > @@ -3,6 +3,36 @@ > test_description='git am with corrupt input' > . ./test-lib.sh > > +write_nul_patch() { Style: SP on both sides of (), i.e. write_nul_patch () { But isn't this misnamed? You are interested in injecting '\0' byte in the e-mail headers and bodies, not necessarily part of the patch, but "nul-patch" somehow hints readers that we are writing out a Null Patch (something that does not do anything, perhaps?). sample_mbox_with_nul is the best alternative I can come up with offhand, which is not great either, but at least it does not say patch. > + space=' ' > + qNUL= > + case "$1" in > + subject) qNUL='=00' ;; > + esac Style: case/esac aligns with case arms, i.e. case "$1" in subject) qNUL='=00' ;; esac > + cat <<-EOF > + From ec7364544f690c560304f5a5de9428ea3b978b26 Mon Sep 17 00:00:00 2001 > + From: A U Thor <author@example.com> > + Date: Sun, 19 Apr 2020 13:42:07 +0700 > + Subject: [PATCH] =?ISO-8859-1?q?=C4=CB${qNUL}=D1=CF=D6?= > + MIME-Version: 1.0 > + Content-Type: text/plain; charset=ISO-8859-1 > + Content-Transfer-Encoding: 8bit > + > + EOF Since the above does have ${qNUL} interpolated, not quoting <<-EOF is correct. Good. > + if test "$1" = body > + then > + printf "%s\0%s\n" abc def > + fi OK. So we won't be able to inject NUL byte in both header and body at the same time. If you wanted to allow it, you could write case ",$1," in *,subject,*) qNUL="=00" ;; esac in the early part, and then rewrite this one like so: case ",$1," in *,body,*) printf "..." ;; esac Then those callers who want to ask for both can say sample_mbox_with_nul subject,body > + cat <<-\EOF > + --- > + diff --git a/afile b/afile > + new file mode 100644 > + index 0000000000..e69de29bb2 > + --$space > + 2.26.1 > + EOF Doesn't this want to interpolate $space in the output? I think you want to say <<-EOF, without quoting. cd t && sh t4254-am-corrupt.sh -d && cat trash*.t4254-*/body.patch tells me that "--$space" is left in the output, not "-- ". > +} > + > test_expect_success setup ' > # Note the missing "+++" line: > cat >bad-patch.diff <<-\EOF && > @@ -32,4 +62,18 @@ test_expect_success 'try to apply corrupted patch' ' > test_i18ncmp expected actual > ' > > +test_expect_success "NUL in commit message's body" ' > + test_when_finished "git am --abort" && > + write_nul_patch body >body.patch && > + test_must_fail git am body.patch 2>err && > + grep "a NUL byte in commit log message not allowed" err > +' > + > +test_expect_failure "NUL in commit message's header" ' > + test_when_finished "git am --abort" && > + write_nul_patch subject >subject.patch && > + test_must_fail git am subject.patch 2>err && > + grep "a NUL byte in Subject is not allowed" err > +' > + > test_done
On 2020-04-20 12:59:37-0700, Junio C Hamano <gitster@pobox.com> wrote: > Đoàn Trần Công Danh <congdanhqx@gmail.com> writes: > > > diff --git a/t/t4254-am-corrupt.sh b/t/t4254-am-corrupt.sh > > index ddd35498db..98cda32d0a 100755 > > --- a/t/t4254-am-corrupt.sh > > +++ b/t/t4254-am-corrupt.sh > > @@ -3,6 +3,36 @@ > > test_description='git am with corrupt input' > > . ./test-lib.sh > > > > +write_nul_patch() { > > Style: SP on both sides of (), i.e. > > write_nul_patch () { > > But isn't this misnamed? You are interested in injecting '\0' byte Originally, this function was written to create a file named "nul.patch", but it's prohibited in Windows land. It's still misnamed, though. > in the e-mail headers and bodies, not necessarily part of the patch, > but "nul-patch" somehow hints readers that we are writing out a Null > Patch (something that does not do anything, perhaps?). > > sample_mbox_with_nul is the best alternative I can come up with > offhand, which is not great either, but at least it does not say > patch. I prefer having a verb, but make_sample_mbox_with_nul is too long. I'll take make_mbox_with_nul. Naming is hard. > > + cat <<-\EOF > > + --- > > + diff --git a/afile b/afile > > + new file mode 100644 > > + index 0000000000..e69de29bb2 > > + --$space > > + 2.26.1 > > + EOF > > Doesn't this want to interpolate $space in the output? I think you > want to say <<-EOF, without quoting. > > cd t && sh t4254-am-corrupt.sh -d && cat trash*.t4254-*/body.patch > > tells me that "--$space" is left in the output, not "-- ". I recalled it now. Originially, I wrote "-- " in that line, When I try git-am(1) the mail, I saw a warning about trailing space. I want to get rid of it but forget to change "-\EOF" Those last 2 lines isn't strictly required, I wanted to mimic a real patch created by git-format-patch, though.
diff --git a/mailinfo.c b/mailinfo.c index 742fa376ab..0e9911df6d 100644 --- a/mailinfo.c +++ b/mailinfo.c @@ -447,19 +447,21 @@ static int convert_to_utf8(struct mailinfo *mi, struct strbuf *line, const char *charset) { char *out; + size_t out_len; if (!mi->metainfo_charset || !charset || !*charset) return 0; if (same_encoding(mi->metainfo_charset, charset)) return 0; - out = reencode_string(line->buf, mi->metainfo_charset, charset); + out = reencode_string_len(line->buf, line->len, + mi->metainfo_charset, charset, &out_len); if (!out) { mi->input_error = -1; return error("cannot convert from %s to %s", charset, mi->metainfo_charset); } - strbuf_attach(line, out, strlen(out), strlen(out)); + strbuf_attach(line, out, out_len, out_len); return 0; } diff --git a/t/t4254-am-corrupt.sh b/t/t4254-am-corrupt.sh index ddd35498db..98cda32d0a 100755 --- a/t/t4254-am-corrupt.sh +++ b/t/t4254-am-corrupt.sh @@ -3,6 +3,36 @@ test_description='git am with corrupt input' . ./test-lib.sh +write_nul_patch() { + space=' ' + qNUL= + case "$1" in + subject) qNUL='=00' ;; + esac + cat <<-EOF + From ec7364544f690c560304f5a5de9428ea3b978b26 Mon Sep 17 00:00:00 2001 + From: A U Thor <author@example.com> + Date: Sun, 19 Apr 2020 13:42:07 +0700 + Subject: [PATCH] =?ISO-8859-1?q?=C4=CB${qNUL}=D1=CF=D6?= + MIME-Version: 1.0 + Content-Type: text/plain; charset=ISO-8859-1 + Content-Transfer-Encoding: 8bit + + EOF + if test "$1" = body + then + printf "%s\0%s\n" abc def + fi + cat <<-\EOF + --- + diff --git a/afile b/afile + new file mode 100644 + index 0000000000..e69de29bb2 + --$space + 2.26.1 + EOF +} + test_expect_success setup ' # Note the missing "+++" line: cat >bad-patch.diff <<-\EOF && @@ -32,4 +62,18 @@ test_expect_success 'try to apply corrupted patch' ' test_i18ncmp expected actual ' +test_expect_success "NUL in commit message's body" ' + test_when_finished "git am --abort" && + write_nul_patch body >body.patch && + test_must_fail git am body.patch 2>err && + grep "a NUL byte in commit log message not allowed" err +' + +test_expect_failure "NUL in commit message's header" ' + test_when_finished "git am --abort" && + write_nul_patch subject >subject.patch && + test_must_fail git am subject.patch 2>err && + grep "a NUL byte in Subject is not allowed" err +' + test_done
We're passing buffer from strbuf to reencode_string, which will call strlen(3) on that buffer, and discard the length of newly created buffer. Then, we compute the length of the return buffer to attach to strbuf. During this process, we introduce a discrimination between mail originally written in utf-8 and other encoding. * if the email was written in utf-8, we leave it as is. If there is a NUL character in that line, we complains loudly: error: a NUL byte in commit log message not allowed. * if the email was written in other encoding, we truncate the data as the NUL character in that line, then we used the truncated line for the metadata. We can do better by reusing all the available information, and call the underlying lower level function that will be called indirectly by reencode_string. By doing this, we will also postpone the NUL character processing to the commit step, which will complains about the faulty metadata. Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com> --- mailinfo.c | 6 ++++-- t/t4254-am-corrupt.sh | 44 +++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 48 insertions(+), 2 deletions(-)