diff mbox series

[v4] git-send-email: Use sanitized address when reading mbox body

Message ID 20240701090115.56957-1-csokas.bence@prolan.hu (mailing list archive)
State Accepted
Commit c852531f451331eb9d5ba57d154b0e150246a438

Commit Message

Csókás Bence July 1, 2024, 9:01 a.m. UTC
Addresses that are mentioned on the trailers in the commit log
('Signed-off-by: ' etc.) are added to @cc (unless suppressed),
passed to the SMTP server. However, these hand-written
addresses may be malformed (e.g. having unquoted commas and
other punctuation marks in the display-name part).

The code was already calling `sanitize_address()` for suppression
purposes, so we just have to use the result ($sc) for adding to @cc.

Note that sanitization is only done for the message body, as
`git format-patch` already RFC 2047-encodes mbox headers, so
those are generally trusted to be sane. Also note that
`sanitize_address()` does not process the mailbox addresses,
so it is up to `sendmail` to handle special characters there
(e.g. there are mailboxes in regular use with '+'-es in them).

Signed-off-by: Csókás, Bence <csokas.bence@prolan.hu>
---

Notes:
    Changes in v4:
    * t9001: use ${SQ} instead of double quotes
    * re-worded message again
    Changes in v3:
    * more testcases
    * clarified wording in message
    Changes in v2:
    * added testcase to t9001
    * added rationale behind trusting mbox headers and the address-parts

 git-send-email.perl   |  4 ++--
 t/t9001-send-email.sh | 43 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 45 insertions(+), 2 deletions(-)
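
As a rough illustration of what using the sanitized string means for the
Cc: list, the POSIX shell sketch below mimics only the display-name
quoting step. `quote_display_name` is a hypothetical helper written for
this note, not part of Git. The real `sanitize_address()` in
git-send-email.perl also RFC 2047-encodes non-ASCII names and handles
backslash escapes, both of which are left out here.

```shell
#!/bin/sh
# Hypothetical, simplified stand-in for the display-name quoting done by
# sanitize_address(). Assumptions: no RFC 2047 encoding, no backslash
# handling, and every double quote in the name is treated as removable.
# Variables are global because POSIX sh has no "local".
quote_display_name () {
	recipient=$1

	# Leave strings without a " <address>" part untouched.
	case $recipient in
	*' <'*'>'*) ;;
	*) printf '%s\n' "$recipient"; return 0 ;;
	esac

	# Drop trailing text after the last '>' such as "[from Jira profile]".
	recipient=${recipient%'>'*}'>'

	# Split at the first " <" into display name and address.
	name=${recipient%%' <'*}
	addr=${recipient#"$name"}

	# With no display name there is nothing to quote.
	if test -z "$name"
	then
		printf '%s\n' "$recipient"
		return 0
	fi

	# Strip existing double quotes so the name is not quoted twice.
	name=$(printf '%s' "$name" | tr -d '"')

	# Wrap the name in double quotes if it contains a "special" character.
	specials='[][()<>@,;:.]'
	case $name in
	*$specials*) name="\"$name\"" ;;
	esac

	printf '%s%s\n' "$name" "$addr"
}

quote_display_name 'Person, One <one@example.com>'
quote_display_name 'A. U. Thor <thor.au@example.com>'
```

Under these assumptions the two calls print `"Person, One" <one@example.com>`
and `"A. U. Thor" <thor.au@example.com>`, which is the form the new
t9001 test looks for in the "(body) Adding cc:" lines.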

Comments

Junio C Hamano July 1, 2024, 6:47 p.m. UTC | #1
"Csókás, Bence" <csokas.bence@prolan.hu> writes:

> Addresses that are mentioned on the trailers in the commit log
> ('Signed-off-by: ' etc.) are added to @cc (unless suppressed),
> passed to the SMTP server. However, these hand-written
> addresses may be malformed (e.g. having unquoted commas and
> other punctuation marks in the display-name part).
>
> The code was already calling `sanitize_address()` for suppression
> purposes, so we just have to use the result ($sc) for adding to @cc.

There is a leap between the description of the status quo and your
conclusion.  "we just have to" becomes valid only after explaining
that sanitize_address turns the address-looking string into valid
addresses (and we do not want to send to malformed addresses--but
that goes without saying).

> Also note that
> `sanitize_address()` does not process the mailbox addresses,
> so it is up to `sendmail` to handle special characters there
> (e.g. there are mailboxes in regular use with '+'-es in them).

I do not quite see the point of this final note.  mailboxes with
'q'es in them are also in regular use, and singling out '+' does not
make much sense in the context of explaining this change.

>     Changes in v4:
>     * t9001: use ${SQ} instead of double quotes
>     * re-worded message again

OK.  The additional recipient address 

> +	Co-developed-by: "C. O. Developer" <codev@example.com>

to contrast with

> +	Signed-off-by: A. U. Thor <thor.au@example.com>

is a nice touch.  We make sure that, with or without necessary
quoting in the original, we produce the correct result ;-).

Let's mark it for 'next' soonish, with proposed log message
rewritten somewhat.

Thanks.

----- >8 -----
git-send-email: use sanitized address when reading mbox body

Addresses that are mentioned on the trailers in the commit log
messages (e.g., "Reviewed-by") are added to the "Cc:" list by "git
send-email".  These hand-written addresses, however, may be
malformed (e.g., having unquoted "." and other punctuation marks in
the display-name part) and can upset the MTA.

The code does use the sanitize_address() helper on these
address-looking strings to turn them into valid addresses, but it is
used only to see if the address should be suppressed.  The original
string taken from the message is added to the @cc list if the code
decides the address is not suppressed.

Because the addresses on trailer lines are hand-written and more
likely to contain malformed addresses, when adding to the @cc list,
use the result from sanitize_address, not the original.  Note that
we do not modify the behaviour for addresses taken from the e-mail
headers, as they are more likely to be machine generated and
well-formed.

Signed-off-by: Csókás, Bence <csokas.bence@prolan.hu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Csókás Bence July 3, 2024, 6:24 a.m. UTC | #2
On 7/1/24 20:47, Junio C Hamano wrote:
> Let's mark it for 'next' soonish, with proposed log message
> rewritten somewhat.
> 
> Thanks.
> 
> ----- >8 -----

Alright. Should I re-send it with this message then, or will you amend it?

Bence

Junio C Hamano July 3, 2024, 3:44 p.m. UTC | #3
Csókás Bence <csokas.bence@prolan.hu> writes:

> On 7/1/24 20:47, Junio C Hamano wrote:
>> Let's mark it for 'next' soonish, with proposed log message
>> rewritten somewhat.
>> Thanks.
>> ----- >8 -----
>
> Alright. Should I re-send it with this message then, or will you amend it?

If you are 100% happy with what you saw there, then just saying so
would be sufficient.  If not, you can update your commit with
further improvements and send a v5 iteration.

Thanks.

Patch

diff --git a/git-send-email.perl b/git-send-email.perl
index f0be4b4560..72044e5ef3 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -1847,9 +1847,9 @@  sub pre_process_file {
 					$what, $_) unless $quiet;
 				next;
 			}
-			push @cc, $c;
+			push @cc, $sc;
 			printf(__("(body) Adding cc: %s from line '%s'\n"),
-				$c, $_) unless $quiet;
+				$sc, $_) unless $quiet;
 		}
 	}
 	close $fh;
diff --git a/t/t9001-send-email.sh b/t/t9001-send-email.sh
index 58699f8e4e..64a4ab3736 100755
--- a/t/t9001-send-email.sh
+++ b/t/t9001-send-email.sh
@@ -1299,6 +1299,49 @@  test_expect_success $PREREQ 'utf8 sender is not duplicated' '
 	test_line_count = 1 msgfrom
 '
 
+test_expect_success $PREREQ 'setup expect for cc list' "
+cat >expected-cc <<\EOF
+!recipient@example.com!
+!author@example.com!
+!one@example.com!
+!os@example.com!
+!odd_?=mail@example.com!
+!doug@example.com!
+!codev@example.com!
+!thor.au@example.com!
+EOF
+"
+
+test_expect_success $PREREQ 'cc list is sanitized' '
+	clean_fake_sendmail &&
+	test_commit weird_cc_body &&
+	test_when_finished "git reset --hard HEAD^" &&
+	git commit --amend -F - <<-EOF &&
+	Test Cc: sanitization.
+
+	Cc: Person, One <one@example.com>
+	Cc: Ronnie O${SQ}Sullivan <os@example.com>
+	Reviewed-by: Füñný Nâmé <odd_?=mail@example.com>
+	Reported-by: bugger on Jira
+	Reported-by: Douglas Reporter <doug@example.com> [from Jira profile]
+	BugID: 12345
+	Co-developed-by: "C. O. Developer" <codev@example.com>
+	Signed-off-by: A. U. Thor <thor.au@example.com>
+	EOF
+	git send-email -1 --to=recipient@example.com \
+		--smtp-server="$(pwd)/fake.sendmail" >actual-show-all-headers &&
+	test_cmp expected-cc commandline1 &&
+	test_grep "^(body) Adding cc: \"Person, One\" <one@example.com>" actual-show-all-headers &&
+	test_grep "^(body) Adding cc: Ronnie O${SQ}Sullivan <os@example.com>" actual-show-all-headers &&
+	test_grep "^(body) Adding cc: =?UTF-8?q?F=C3=BC=C3=B1n=C3=BD=20N=C3=A2m=C3=A9?="\
+" <odd_?=mail@example.com>" actual-show-all-headers &&
+	test_grep "^(body) Ignoring Reported-by .* bugger on Jira" actual-show-all-headers &&
+	test_grep "^(body) Adding cc: Douglas Reporter <doug@example.com>" actual-show-all-headers &&
+	test_grep ! "12345" actual-show-all-headers &&
+	test_grep "^(body) Adding cc: \"C. O. Developer\" <codev@example.com>" actual-show-all-headers &&
+	test_grep "^(body) Adding cc: \"A. U. Thor\" <thor.au@example.com>" actual-show-all-headers
+'
+
 test_expect_success $PREREQ 'sendemail.composeencoding works' '
 	clean_fake_sendmail &&
 	git config sendemail.composeencoding iso-8859-1 &&