diff mbox series

[v2,2/3] mailinfo.c::convert_to_utf8: reuse strlen info

Message ID e3e542d38818b762f8d6d6b50bc42e01b070772b.1587289680.git.congdanhqx@gmail.com (mailing list archive)
State New, archived
Headers show
Series mailinfo: disallow and complains about NUL character | expand

Commit Message

Đoàn Trần Công Danh April 19, 2020, 11 a.m. UTC
We're passing buffer from strbuf to reencode_string,
which will call strlen(3) on that buffer,
and discard the length of newly created buffer.
Then, we compute the length of the return buffer to attach to strbuf.

During this process, we introduce a discrimination between mail
originally written in utf-8 and other encoding.

* if the email was written in utf-8, we leave it as is. If there is
  a NUL character in that line, we complains loudly:

  	error: a NUL byte in commit log message not allowed.

* if the email was written in other encoding, we truncate the data as
  the NUL character in that line, then we used the truncated line for
  the metadata.

We can do better by reusing all the available information,
and call the underlying lower level function that will be called
indirectly by reencode_string. By doing this, we will also postpone
the NUL character processing to the commit step, which will
complains about the faulty metadata.

Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
---
 mailinfo.c            |  6 ++++--
 t/t4254-am-corrupt.sh | 44 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 48 insertions(+), 2 deletions(-)

Comments

Martin Ågren April 19, 2020, 12:29 p.m. UTC | #1
On Sun, 19 Apr 2020 at 13:03, Đoàn Trần Công Danh <congdanhqx@gmail.com> wrote:
>
> We're passing buffer from strbuf to reencode_string,
> which will call strlen(3) on that buffer,
> and discard the length of newly created buffer.
> Then, we compute the length of the return buffer to attach to strbuf.
>
> During this process, we introduce a discrimination between mail
> originally written in utf-8 and other encoding.
>
> * if the email was written in utf-8, we leave it as is. If there is
>   a NUL character in that line, we complains loudly:
>
>         error: a NUL byte in commit log message not allowed.
>
> * if the email was written in other encoding, we truncate the data as
>   the NUL character in that line, then we used the truncated line for
>   the metadata.
>
> We can do better by reusing all the available information,
> and call the underlying lower level function that will be called
> indirectly by reencode_string. By doing this, we will also postpone
> the NUL character processing to the commit step, which will
> complains about the faulty metadata.
>
> Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>

This makes sense to me.

I think the subject could be adapted though? Now it's not about "reusing
info" anymore, it's about using *other*, *better* info. Maybe

  mailinfo.c: avoid strlen on strings that might contain NUL

? Could probably be improved further..

> --- a/mailinfo.c
> +++ b/mailinfo.c
> @@ -447,19 +447,21 @@ static int convert_to_utf8(struct mailinfo *mi,
>                            struct strbuf *line, const char *charset)
>  {
>         char *out;
> +       size_t out_len;
>
>         if (!mi->metainfo_charset || !charset || !*charset)
>                 return 0;
>
>         if (same_encoding(mi->metainfo_charset, charset))
>                 return 0;
> -       out = reencode_string(line->buf, mi->metainfo_charset, charset);
> +       out = reencode_string_len(line->buf, line->len,
> +                                 mi->metainfo_charset, charset, &out_len);
>         if (!out) {
>                 mi->input_error = -1;
>                 return error("cannot convert from %s to %s",
>                              charset, mi->metainfo_charset);
>         }
> -       strbuf_attach(line, out, strlen(out), strlen(out));
> +       strbuf_attach(line, out, out_len, out_len);
>         return 0;
>  }

Same diff as before, ok.

> +write_nul_patch() {
> +       space=' '
> +       qNUL=
> +       case "$1" in
> +               subject) qNUL='=00' ;;
> +       esac

Here it's case/esac...

> +       cat <<-EOF
> +       From ec7364544f690c560304f5a5de9428ea3b978b26 Mon Sep 17 00:00:00 2001
> +       From: A U Thor <author@example.com>
> +       Date: Sun, 19 Apr 2020 13:42:07 +0700
> +       Subject: [PATCH] =?ISO-8859-1?q?=C4=CB${qNUL}=D1=CF=D6?=
> +       MIME-Version: 1.0
> +       Content-Type: text/plain; charset=ISO-8859-1
> +       Content-Transfer-Encoding: 8bit
> +
> +       EOF
> +       if test "$1" = body
> +       then
> +               printf "%s\0%s\n" abc def
> +       fi

Here it's if/fi. Looks a bit inconsistent.

I suppose you tried to have a case for "body" above but couldn't get it
to work? Somehow, it would seem more consistent to have a qSubject and
qBody and handle them the same way, but maybe that's not possible?
Maybe using q_to_nul to create qBody containing a NUL?

> +       cat <<-\EOF
> +       ---
> +       diff --git a/afile b/afile
> +       new file mode 100644
> +       index 0000000000..e69de29bb2
> +       --$space
> +       2.26.1
> +       EOF
> +}

I think this helper function should use &&-chaining.

> +
>  test_expect_success setup '
>         # Note the missing "+++" line:
>         cat >bad-patch.diff <<-\EOF &&
> @@ -32,4 +62,18 @@ test_expect_success 'try to apply corrupted patch' '
>         test_i18ncmp expected actual
>  '
>
> +test_expect_success "NUL in commit message's body" '
> +       test_when_finished "git am --abort" &&
> +       write_nul_patch body >body.patch &&
> +       test_must_fail git am body.patch 2>err &&
> +       grep "a NUL byte in commit log message not allowed" err
> +'

Makes sense.

> +
> +test_expect_failure "NUL in commit message's header" '
> +       test_when_finished "git am --abort" &&
> +       write_nul_patch subject >subject.patch &&
> +       test_must_fail git am subject.patch 2>err &&
> +       grep "a NUL byte in Subject is not allowed" err
> +'

Interesting. :-)


Martin
Đoàn Trần Công Danh April 19, 2020, 2:16 p.m. UTC | #2
On 2020-04-19 14:29:06+0200, Martin Ågren <martin.agren@gmail.com> wrote:
> I think the subject could be adapted though? Now it's not about "reusing
> info" anymore, it's about using *other*, *better* info. Maybe
> 
>   mailinfo.c: avoid strlen on strings that might contain NUL

This is better than mine.
I couldn't think about any other alternative.

> 
> > --- a/mailinfo.c
> > +++ b/mailinfo.c
> > @@ -447,19 +447,21 @@ static int convert_to_utf8(struct mailinfo *mi,
> >                            struct strbuf *line, const char *charset)
> >  {
> >         char *out;
> > +       size_t out_len;
> >
> >         if (!mi->metainfo_charset || !charset || !*charset)
> >                 return 0;
> >
> >         if (same_encoding(mi->metainfo_charset, charset))
> >                 return 0;
> > -       out = reencode_string(line->buf, mi->metainfo_charset, charset);
> > +       out = reencode_string_len(line->buf, line->len,
> > +                                 mi->metainfo_charset, charset, &out_len);
> >         if (!out) {
> >                 mi->input_error = -1;
> >                 return error("cannot convert from %s to %s",
> >                              charset, mi->metainfo_charset);
> >         }
> > -       strbuf_attach(line, out, strlen(out), strlen(out));
> > +       strbuf_attach(line, out, out_len, out_len);
> >         return 0;
> >  }
> 
> Same diff as before, ok.
> 
> > +write_nul_patch() {
> > +       space=' '
> > +       qNUL=
> > +       case "$1" in
> > +               subject) qNUL='=00' ;;
> > +       esac
> 
> Here it's case/esac...
> 
> > +       cat <<-EOF
> > +       From ec7364544f690c560304f5a5de9428ea3b978b26 Mon Sep 17 00:00:00 2001
> > +       From: A U Thor <author@example.com>
> > +       Date: Sun, 19 Apr 2020 13:42:07 +0700
> > +       Subject: [PATCH] =?ISO-8859-1?q?=C4=CB${qNUL}=D1=CF=D6?=
> > +       MIME-Version: 1.0
> > +       Content-Type: text/plain; charset=ISO-8859-1
> > +       Content-Transfer-Encoding: 8bit
> > +
> > +       EOF
> > +       if test "$1" = body
> > +       then
> > +               printf "%s\0%s\n" abc def
> > +       fi
> 
> Here it's if/fi. Looks a bit inconsistent.
> 
> I suppose you tried to have a case for "body" above but couldn't get it
> to work? Somehow, it would seem more consistent to have a qSubject and
> qBody and handle them the same way, but maybe that's not possible?
> Maybe using q_to_nul to create qBody containing a NUL?

Those patch was written in different time, with different thought in
mind.

I need to send a re-roll to update the subject and moving oneline in
test code to later patch.

I'll update this hunk with that re-roll.
Your new series _won't_ be affected, though.

> > +       cat <<-\EOF
> > +       ---
> > +       diff --git a/afile b/afile
> > +       new file mode 100644
> > +       index 0000000000..e69de29bb2
> > +       --$space
> > +       2.26.1
> > +       EOF
> > +}
> 
> I think this helper function should use &&-chaining.

Correct.

> > +
> >  test_expect_success setup '
> >         # Note the missing "+++" line:
> >         cat >bad-patch.diff <<-\EOF &&
> > @@ -32,4 +62,18 @@ test_expect_success 'try to apply corrupted patch' '
> >         test_i18ncmp expected actual
> >  '
> >
> > +test_expect_success "NUL in commit message's body" '
> > +       test_when_finished "git am --abort" &&
> > +       write_nul_patch body >body.patch &&
> > +       test_must_fail git am body.patch 2>err &&
> > +       grep "a NUL byte in commit log message not allowed" err
> > +'
> 
> Makes sense.
> 
> > +
> > +test_expect_failure "NUL in commit message's header" '
> > +       test_when_finished "git am --abort" &&
> > +       write_nul_patch subject >subject.patch &&
> > +       test_must_fail git am subject.patch 2>err &&
> > +       grep "a NUL byte in Subject is not allowed" err
> > +'
> 
> Interesting. :-)

Going through the mail now make me think about moving the last grep to
next patch in order to clearly indicate that we expect "git-am" is
failling but it's passed instead.

And that message isn't introduced until the very next change.
Junio C Hamano April 20, 2020, 7:59 p.m. UTC | #3
Đoàn Trần Công Danh  <congdanhqx@gmail.com> writes:

> diff --git a/t/t4254-am-corrupt.sh b/t/t4254-am-corrupt.sh
> index ddd35498db..98cda32d0a 100755
> --- a/t/t4254-am-corrupt.sh
> +++ b/t/t4254-am-corrupt.sh
> @@ -3,6 +3,36 @@
>  test_description='git am with corrupt input'
>  . ./test-lib.sh
>  
> +write_nul_patch() {

Style: SP on both sides of (), i.e.

	write_nul_patch () {

But isn't this misnamed?  You are interested in injecting '\0' byte
in the e-mail headers and bodies, not necessarily part of the patch,
but "nul-patch" somehow hints readers that we are writing out a Null
Patch (something that does not do anything, perhaps?).

sample_mbox_with_nul is the best alternative I can come up with
offhand, which is not great either, but at least it does not say
patch.

> +	space=' '
> +	qNUL=
> +	case "$1" in
> +		subject) qNUL='=00' ;;
> +	esac

Style: case/esac aligns with case arms, i.e.

	case "$1" in
	subject) qNUL='=00' ;;
	esac

> +	cat <<-EOF
> +	From ec7364544f690c560304f5a5de9428ea3b978b26 Mon Sep 17 00:00:00 2001
> +	From: A U Thor <author@example.com>
> +	Date: Sun, 19 Apr 2020 13:42:07 +0700
> +	Subject: [PATCH] =?ISO-8859-1?q?=C4=CB${qNUL}=D1=CF=D6?=
> +	MIME-Version: 1.0
> +	Content-Type: text/plain; charset=ISO-8859-1
> +	Content-Transfer-Encoding: 8bit
> +
> +	EOF

Since the above does have ${qNUL} interpolated, not quoting <<-EOF
is correct.  Good.

> +	if test "$1" = body
> +	then
> +		printf "%s\0%s\n" abc def
> +	fi

OK.  So we won't be able to inject NUL byte in both header and body
at the same time.  If you wanted to allow it, you could write

	case ",$1," in
	*,subject,*)	qNUL="=00" ;;
	esac

in the early part, and then rewrite this one like so:

	case ",$1," in
	*,body,*)	printf "..." ;;
	esac

Then those callers who want to ask for both can say

	sample_mbox_with_nul subject,body

> +	cat <<-\EOF
> +	---
> +	diff --git a/afile b/afile
> +	new file mode 100644
> +	index 0000000000..e69de29bb2
> +	--$space
> +	2.26.1
> +	EOF

Doesn't this want to interpolate $space in the output?  I think you
want to say <<-EOF, without quoting.

    cd t && sh t4254-am-corrupt.sh -d && cat trash*.t4254-*/body.patch

tells me that "--$space" is left in the output, not "-- ".

> +}
> +
>  test_expect_success setup '
>  	# Note the missing "+++" line:
>  	cat >bad-patch.diff <<-\EOF &&
> @@ -32,4 +62,18 @@ test_expect_success 'try to apply corrupted patch' '
>  	test_i18ncmp expected actual
>  '
>  
> +test_expect_success "NUL in commit message's body" '
> +	test_when_finished "git am --abort" &&
> +	write_nul_patch body >body.patch &&
> +	test_must_fail git am body.patch 2>err &&
> +	grep "a NUL byte in commit log message not allowed" err
> +'
> +
> +test_expect_failure "NUL in commit message's header" '
> +	test_when_finished "git am --abort" &&
> +	write_nul_patch subject >subject.patch &&
> +	test_must_fail git am subject.patch 2>err &&
> +	grep "a NUL byte in Subject is not allowed" err
> +'
> +
>  test_done
Đoàn Trần Công Danh April 20, 2020, 11:53 p.m. UTC | #4
On 2020-04-20 12:59:37-0700, Junio C Hamano <gitster@pobox.com> wrote:
> Đoàn Trần Công Danh  <congdanhqx@gmail.com> writes:
> 
> > diff --git a/t/t4254-am-corrupt.sh b/t/t4254-am-corrupt.sh
> > index ddd35498db..98cda32d0a 100755
> > --- a/t/t4254-am-corrupt.sh
> > +++ b/t/t4254-am-corrupt.sh
> > @@ -3,6 +3,36 @@
> >  test_description='git am with corrupt input'
> >  . ./test-lib.sh
> >  
> > +write_nul_patch() {
> 
> Style: SP on both sides of (), i.e.
> 
> 	write_nul_patch () {
> 
> But isn't this misnamed?  You are interested in injecting '\0' byte

Originally, this function was written to create a file named
"nul.patch", but it's prohibited in Windows land.

It's still misnamed, though.

> in the e-mail headers and bodies, not necessarily part of the patch,
> but "nul-patch" somehow hints readers that we are writing out a Null
> Patch (something that does not do anything, perhaps?).
> 
> sample_mbox_with_nul is the best alternative I can come up with
> offhand, which is not great either, but at least it does not say
> patch.

I prefer having a verb, but make_sample_mbox_with_nul is too long.
I'll take make_mbox_with_nul. Naming is hard.

> > +	cat <<-\EOF
> > +	---
> > +	diff --git a/afile b/afile
> > +	new file mode 100644
> > +	index 0000000000..e69de29bb2
> > +	--$space
> > +	2.26.1
> > +	EOF
> 
> Doesn't this want to interpolate $space in the output?  I think you
> want to say <<-EOF, without quoting.
> 
>     cd t && sh t4254-am-corrupt.sh -d && cat trash*.t4254-*/body.patch
> 
> tells me that "--$space" is left in the output, not "-- ".

I recalled it now. Originially, I wrote "-- " in that line,
When I try git-am(1) the mail, I saw a warning about trailing space.
I want to get rid of it but forget to change "-\EOF"

Those last 2 lines isn't strictly required, I wanted to mimic a real
patch created by git-format-patch, though.
diff mbox series

Patch

diff --git a/mailinfo.c b/mailinfo.c
index 742fa376ab..0e9911df6d 100644
--- a/mailinfo.c
+++ b/mailinfo.c
@@ -447,19 +447,21 @@  static int convert_to_utf8(struct mailinfo *mi,
 			   struct strbuf *line, const char *charset)
 {
 	char *out;
+	size_t out_len;
 
 	if (!mi->metainfo_charset || !charset || !*charset)
 		return 0;
 
 	if (same_encoding(mi->metainfo_charset, charset))
 		return 0;
-	out = reencode_string(line->buf, mi->metainfo_charset, charset);
+	out = reencode_string_len(line->buf, line->len,
+				  mi->metainfo_charset, charset, &out_len);
 	if (!out) {
 		mi->input_error = -1;
 		return error("cannot convert from %s to %s",
 			     charset, mi->metainfo_charset);
 	}
-	strbuf_attach(line, out, strlen(out), strlen(out));
+	strbuf_attach(line, out, out_len, out_len);
 	return 0;
 }
 
diff --git a/t/t4254-am-corrupt.sh b/t/t4254-am-corrupt.sh
index ddd35498db..98cda32d0a 100755
--- a/t/t4254-am-corrupt.sh
+++ b/t/t4254-am-corrupt.sh
@@ -3,6 +3,36 @@ 
 test_description='git am with corrupt input'
 . ./test-lib.sh
 
+write_nul_patch() {
+	space=' '
+	qNUL=
+	case "$1" in
+		subject) qNUL='=00' ;;
+	esac
+	cat <<-EOF
+	From ec7364544f690c560304f5a5de9428ea3b978b26 Mon Sep 17 00:00:00 2001
+	From: A U Thor <author@example.com>
+	Date: Sun, 19 Apr 2020 13:42:07 +0700
+	Subject: [PATCH] =?ISO-8859-1?q?=C4=CB${qNUL}=D1=CF=D6?=
+	MIME-Version: 1.0
+	Content-Type: text/plain; charset=ISO-8859-1
+	Content-Transfer-Encoding: 8bit
+
+	EOF
+	if test "$1" = body
+	then
+		printf "%s\0%s\n" abc def
+	fi
+	cat <<-\EOF
+	---
+	diff --git a/afile b/afile
+	new file mode 100644
+	index 0000000000..e69de29bb2
+	--$space
+	2.26.1
+	EOF
+}
+
 test_expect_success setup '
 	# Note the missing "+++" line:
 	cat >bad-patch.diff <<-\EOF &&
@@ -32,4 +62,18 @@  test_expect_success 'try to apply corrupted patch' '
 	test_i18ncmp expected actual
 '
 
+test_expect_success "NUL in commit message's body" '
+	test_when_finished "git am --abort" &&
+	write_nul_patch body >body.patch &&
+	test_must_fail git am body.patch 2>err &&
+	grep "a NUL byte in commit log message not allowed" err
+'
+
+test_expect_failure "NUL in commit message's header" '
+	test_when_finished "git am --abort" &&
+	write_nul_patch subject >subject.patch &&
+	test_must_fail git am subject.patch 2>err &&
+	grep "a NUL byte in Subject is not allowed" err
+'
+
 test_done