diff mbox series

[v2,1/3] check-mailmap: accept "user@host" contacts

Message ID 20240819-jk-send-email-mailmap-support-v2-1-d212c3f9e505@gmail.com (mailing list archive)
State Superseded
Headers show
Series send-email: add --mailmap support | expand

Commit Message

Jacob Keller Aug. 20, 2024, 12:07 a.m. UTC
From: Jacob Keller <jacob.keller@gmail.com>

git check-mailmap splits each provided contact using split_ident_line.
This function requires that the contact either be of the form "Name
<user@host>" or of the form "<user@host>". In particular, if the mail
portion of the contact is not surrounded by angle brackets,
split_ident_line will reject it.

This results in git check-mailmap rejecting attempts to translate simple
email addresses:

  $ git check-mailmap user@host
  fatal: unable to parse contact: user@host

This limits the usability of check-mailmap as it requires placing angle
brackets around plain email addresses.

In particular, attempting to use git check-mailmap to support mapping
addresses in git send-email is not straight forward. The sanitization
and validation functions in git send-email strip angle brackets from
plain email addresses. It is not trivial to add brackets prior to
invoking git check-mailmap.

Instead, modify check_mailmap() to allow such strings as contacts. In
particular, treat any line which cannot be split by split_ident_line as
a simple email address.

No attempt is made to actually parse the address line to validate that
it is actually an address. Doing so is non-trivial, and provides little
value. Either the provided input will correctly map via the map_user
call, or it will fail and be printed out, surrounded by angle brackets:

  $ git check-mailmap user@host
  <user@host>

Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
---
 builtin/check-mailmap.c             | 18 +++++++++++-------
 Documentation/git-check-mailmap.txt |  8 ++++----
 t/t4203-mailmap.sh                  | 33 +++++++++++++++++++++++++++++++--
 3 files changed, 46 insertions(+), 13 deletions(-)

Comments

Josh Steadmon Aug. 21, 2024, 5:50 p.m. UTC | #1
On 2024.08.19 17:07, Jacob Keller wrote:
> From: Jacob Keller <jacob.keller@gmail.com>
> 
> git check-mailmap splits each provided contact using split_ident_line.
> This function requires that the contact either be of the form "Name
> <user@host>" or of the form "<user@host>". In particular, if the mail
> portion of the contact is not surrounded by angle brackets,
> split_ident_line will reject it.
> 
> This results in git check-mailmap rejecting attempts to translate simple
> email addresses:
> 
>   $ git check-mailmap user@host
>   fatal: unable to parse contact: user@host
> 
> This limits the usability of check-mailmap as it requires placing angle
> brackets around plain email addresses.
> 
> In particular, attempting to use git check-mailmap to support mapping
> addresses in git send-email is not straight forward. The sanitization
> and validation functions in git send-email strip angle brackets from
> plain email addresses. It is not trivial to add brackets prior to
> invoking git check-mailmap.
> 
> Instead, modify check_mailmap() to allow such strings as contacts. In
> particular, treat any line which cannot be split by split_ident_line as
> a simple email address.
> 
> No attempt is made to actually parse the address line to validate that
> it is actually an address. Doing so is non-trivial, and provides little
> value. Either the provided input will correctly map via the map_user
> call, or it will fail and be printed out, surrounded by angle brackets:
> 
>   $ git check-mailmap user@host
>   <user@host>
> 
> Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
> ---
>  builtin/check-mailmap.c             | 18 +++++++++++-------
>  Documentation/git-check-mailmap.txt |  8 ++++----
>  t/t4203-mailmap.sh                  | 33 +++++++++++++++++++++++++++++++--
>  3 files changed, 46 insertions(+), 13 deletions(-)
> 
> diff --git a/builtin/check-mailmap.c b/builtin/check-mailmap.c
> index b8a05b8e07b5..6b7fb53494f0 100644
> --- a/builtin/check-mailmap.c
> +++ b/builtin/check-mailmap.c
> @@ -25,13 +25,17 @@ static void check_mailmap(struct string_list *mailmap, const char *contact)
>  	size_t namelen, maillen;
>  	struct ident_split ident;
>  
> -	if (split_ident_line(&ident, contact, strlen(contact)))
> -		die(_("unable to parse contact: %s"), contact);
> -
> -	name = ident.name_begin;
> -	namelen = ident.name_end - ident.name_begin;
> -	mail = ident.mail_begin;
> -	maillen = ident.mail_end - ident.mail_begin;
> +	if (!split_ident_line(&ident, contact, strlen(contact))) {
> +		name = ident.name_begin;
> +		namelen = ident.name_end - ident.name_begin;
> +		mail = ident.mail_begin;
> +		maillen = ident.mail_end - ident.mail_begin;
> +	} else {
> +		name = NULL;
> +		namelen = 0;
> +		mail = contact;
> +		maillen = strlen(contact);
> +	}
>  
>  	map_user(mailmap, &mail, &maillen, &name, &namelen);
>  
> diff --git a/Documentation/git-check-mailmap.txt b/Documentation/git-check-mailmap.txt
> index 02f441832321..7747e38e25e3 100644
> --- a/Documentation/git-check-mailmap.txt
> +++ b/Documentation/git-check-mailmap.txt
> @@ -15,10 +15,10 @@ SYNOPSIS
>  DESCRIPTION
>  -----------
>  
> -For each ``Name $$<user@host>$$'' or ``$$<user@host>$$'' from the command-line
> -or standard input (when using `--stdin`), look up the person's canonical name
> -and email address (see "Mapping Authors" below). If found, print them;
> -otherwise print the input as-is.
> +For each ``Name $$<user@host>$$'', ``$$<user@host>$$'', or ``$$user@host$$''
> +from the command-line or standard input (when using `--stdin`), look up the
> +person's canonical name and email address (see "Mapping Authors" below). If
> +found, print them; otherwise print the input as-is.
>  
>  
>  OPTIONS
> diff --git a/t/t4203-mailmap.sh b/t/t4203-mailmap.sh
> index 79e5f42760d9..0c1efe0b2e17 100755
> --- a/t/t4203-mailmap.sh
> +++ b/t/t4203-mailmap.sh
> @@ -73,11 +73,40 @@ test_expect_success 'check-mailmap --stdin arguments: mapping' '
>  '
>  
>  test_expect_success 'check-mailmap bogus contact' '
> -	test_must_fail git check-mailmap bogus
> +	cat >expect <<-EOF &&
> +	<bogus>
> +	EOF
> +	git check-mailmap bogus >actual &&
> +	test_cmp expect actual
>  '

I think I'd just remove this test case altogether, IIUC it' doesn't
provide any additional value vs. the "check-mailmap simple address: no
mapping" test below.


>  test_expect_success 'check-mailmap bogus contact --stdin' '
> -	test_must_fail git check-mailmap --stdin bogus </dev/null
> +	cat >expect <<-EOF &&
> +	<bogus>
> +	EOF
> +	cat >stdin <<-EOF &&
> +	bogus
> +	EOF
> +	git check-mailmap --stdin <stdin >actual &&
> +	test_cmp expect actual
> +'

Similarly, I might change this to use a real address instead of "bogus",
as we're no longer checking for invalid input.


> +test_expect_success 'check-mailmap simple address: mapping' '
> +	test_when_finished "rm .mailmap" &&
> +	cat >.mailmap <<-EOF &&
> +	New Name <$GIT_AUTHOR_EMAIL>
> +	EOF
> +	cat .mailmap >expect &&
> +	git check-mailmap "$GIT_AUTHOR_EMAIL" >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'check-mailmap simple address: no mapping' '
> +	cat >expect <<-EOF &&
> +	<bugs@company.xx>
> +	EOF
> +	git check-mailmap "bugs@company.xx" >actual &&
> +	test_cmp expect actual
>  '
>  
>  test_expect_success 'No mailmap' '
> 
> -- 
> 2.46.0.124.g2dc1a81c8933
>
Junio C Hamano Aug. 21, 2024, 6:26 p.m. UTC | #2
Josh Steadmon <steadmon@google.com> writes:

>>  test_expect_success 'check-mailmap bogus contact' '
>> -	test_must_fail git check-mailmap bogus
>> +	cat >expect <<-EOF &&
>> +	<bogus>
>> +	EOF
>> +	git check-mailmap bogus >actual &&
>> +	test_cmp expect actual
>>  '
>
> I think I'd just remove this test case altogether, IIUC it' doesn't
> provide any additional value vs. the "check-mailmap simple address: no
> mapping" test below.

Sorry, but I do not follow.  The other one is <bogus@company.xx>
that looks more globally routable address than a local-only <bogus>
mailbox.  Isn't it worth ensuring that we will keep treating them
the same way?

Having said that ...

>> -For each ``Name $$<user@host>$$'' or ``$$<user@host>$$'' from the command-line
>> -or standard input (when using `--stdin`), look up the person's canonical name
>> -and email address (see "Mapping Authors" below). If found, print them;
>> -otherwise print the input as-is.
>> +For each ``Name $$<user@host>$$'', ``$$<user@host>$$'', or ``$$user@host$$''
>> +from the command-line or standard input (when using `--stdin`), look up the
>> +person's canonical name and email address (see "Mapping Authors" below). If
>> +found, print them; otherwise print the input as-is.

... it seems that <user> without <@host> is a supported format.
Should we update the document, too? 

If the @host-less name is meant to trigger a random unspecified
behaviour, whatever the code happens to do, that is perfectly fine,
but then we probably should not be etching it in the stone by
writing a test for it.  So because of a  reason that is completely
different from yours, I'd support removal of the "bogus" test, if
that is the case.

Thanks.
Eric Sunshine Aug. 21, 2024, 6:27 p.m. UTC | #3
On Wed, Aug 21, 2024 at 1:50 PM Josh Steadmon <steadmon@google.com> wrote:
> On 2024.08.19 17:07, Jacob Keller wrote:
> > diff --git a/t/t4203-mailmap.sh b/t/t4203-mailmap.sh
> > @@ -73,11 +73,40 @@ test_expect_success 'check-mailmap --stdin arguments: mapping' '
> >  test_expect_success 'check-mailmap bogus contact' '
> > -     test_must_fail git check-mailmap bogus
> > +     cat >expect <<-EOF &&
> > +     <bogus>
> > +     EOF
> > +     git check-mailmap bogus >actual &&
> > +     test_cmp expect actual
> >  '
>
> I think I'd just remove this test case altogether, IIUC it' doesn't
> provide any additional value vs. the "check-mailmap simple address: no
> mapping" test below.

I had the same thought upon reading this.

> >  test_expect_success 'check-mailmap bogus contact --stdin' '
> > -     test_must_fail git check-mailmap --stdin bogus </dev/null
> > +     cat >expect <<-EOF &&
> > +     <bogus>
> > +     EOF
> > +     cat >stdin <<-EOF &&
> > +     bogus
> > +     EOF
> > +     git check-mailmap --stdin <stdin >actual &&
> > +     test_cmp expect actual
> > +'
>
> Similarly, I might change this to use a real address instead of "bogus",
> as we're no longer checking for invalid input.

Ditto for this change, but even more so because this is a fairly
significant semantic change. In particular, the documented and
intended behavior of the command when --stdin is specified is that it
will consume email addresses from *both* the command-line and from
standard input, and I think the point of the original test was to
verify that it still correctly recognized a bogus email address
specified as an argument even when --stdin is requested. Given that
understanding (assuming it's correct), then the original test was
already perhaps somewhat iffy anyhow, but after this change, it is
even less meaningful.
Jacob Keller Aug. 21, 2024, 7:07 p.m. UTC | #4
On 8/21/2024 11:26 AM, Junio C Hamano wrote:
> Josh Steadmon <steadmon@google.com> writes:
> 
>>>  test_expect_success 'check-mailmap bogus contact' '
>>> -	test_must_fail git check-mailmap bogus
>>> +	cat >expect <<-EOF &&
>>> +	<bogus>
>>> +	EOF
>>> +	git check-mailmap bogus >actual &&
>>> +	test_cmp expect actual
>>>  '
>>
>> I think I'd just remove this test case altogether, IIUC it' doesn't
>> provide any additional value vs. the "check-mailmap simple address: no
>> mapping" test below.
> 
> Sorry, but I do not follow.  The other one is <bogus@company.xx>
> that looks more globally routable address than a local-only <bogus>
> mailbox.  Isn't it worth ensuring that we will keep treating them
> the same way?
> 
> Having said that ...
> 
>>> -For each ``Name $$<user@host>$$'' or ``$$<user@host>$$'' from the command-line
>>> -or standard input (when using `--stdin`), look up the person's canonical name
>>> -and email address (see "Mapping Authors" below). If found, print them;
>>> -otherwise print the input as-is.
>>> +For each ``Name $$<user@host>$$'', ``$$<user@host>$$'', or ``$$user@host$$''
>>> +from the command-line or standard input (when using `--stdin`), look up the
>>> +person's canonical name and email address (see "Mapping Authors" below). If
>>> +found, print them; otherwise print the input as-is.
> 
> ... it seems that <user> without <@host> is a supported format.
> Should we update the document, too? 
> 

Its supported by happenstance, but i didn't want to encourage it.

> If the @host-less name is meant to trigger a random unspecified
> behaviour, whatever the code happens to do, that is perfectly fine,
> but then we probably should not be etching it in the stone by
> writing a test for it.  So because of a  reason that is completely
> different from yours, I'd support removal of the "bogus" test, if
> that is the case.
> 

I prefer removing the test. In an ideal world, I think we would probably
only accept actual <user@host> or user@host, but I did not think I would
create sufficiently correct parsing for addresses, so I left it out.

I did try looking up what the rules for addresses are, but it looks like
it got pretty complicated. I guess we could do very basic "it must have
an @ symbol, but anything else goes?


> Thanks.
>
Jacob Keller Aug. 21, 2024, 7:09 p.m. UTC | #5
On 8/21/2024 11:27 AM, Eric Sunshine wrote:
> On Wed, Aug 21, 2024 at 1:50 PM Josh Steadmon <steadmon@google.com> wrote:
>> On 2024.08.19 17:07, Jacob Keller wrote:
>>> diff --git a/t/t4203-mailmap.sh b/t/t4203-mailmap.sh
>>> @@ -73,11 +73,40 @@ test_expect_success 'check-mailmap --stdin arguments: mapping' '
>>>  test_expect_success 'check-mailmap bogus contact' '
>>> -     test_must_fail git check-mailmap bogus
>>> +     cat >expect <<-EOF &&
>>> +     <bogus>
>>> +     EOF
>>> +     git check-mailmap bogus >actual &&
>>> +     test_cmp expect actual
>>>  '
>>
>> I think I'd just remove this test case altogether, IIUC it' doesn't
>> provide any additional value vs. the "check-mailmap simple address: no
>> mapping" test below.
> 
> I had the same thought upon reading this.
> 

Yea, I think I've been convinced. I'll remove these tests.

>>>  test_expect_success 'check-mailmap bogus contact --stdin' '
>>> -     test_must_fail git check-mailmap --stdin bogus </dev/null
>>> +     cat >expect <<-EOF &&
>>> +     <bogus>
>>> +     EOF
>>> +     cat >stdin <<-EOF &&
>>> +     bogus
>>> +     EOF
>>> +     git check-mailmap --stdin <stdin >actual &&
>>> +     test_cmp expect actual
>>> +'
>>
>> Similarly, I might change this to use a real address instead of "bogus",
>> as we're no longer checking for invalid input.>
> Ditto for this change, but even more so because this is a fairly
> significant semantic change. In particular, the documented and
> intended behavior of the command when --stdin is specified is that it
> will consume email addresses from *both* the command-line and from
> standard input, and I think the point of the original test was to
> verify that it still correctly recognized a bogus email address
> specified as an argument even when --stdin is requested. Given that
> understanding (assuming it's correct), then the original test was
> already perhaps somewhat iffy anyhow, but after this change, it is
> even less meaningful.
> 

I tried to ensure we have test cases covering both --stdin and a
combination. I'll double check this in a v3 and ensure test cases
covering the behavior.

Thanks for the feedback!
diff mbox series

Patch

diff --git a/builtin/check-mailmap.c b/builtin/check-mailmap.c
index b8a05b8e07b5..6b7fb53494f0 100644
--- a/builtin/check-mailmap.c
+++ b/builtin/check-mailmap.c
@@ -25,13 +25,17 @@  static void check_mailmap(struct string_list *mailmap, const char *contact)
 	size_t namelen, maillen;
 	struct ident_split ident;
 
-	if (split_ident_line(&ident, contact, strlen(contact)))
-		die(_("unable to parse contact: %s"), contact);
-
-	name = ident.name_begin;
-	namelen = ident.name_end - ident.name_begin;
-	mail = ident.mail_begin;
-	maillen = ident.mail_end - ident.mail_begin;
+	if (!split_ident_line(&ident, contact, strlen(contact))) {
+		name = ident.name_begin;
+		namelen = ident.name_end - ident.name_begin;
+		mail = ident.mail_begin;
+		maillen = ident.mail_end - ident.mail_begin;
+	} else {
+		name = NULL;
+		namelen = 0;
+		mail = contact;
+		maillen = strlen(contact);
+	}
 
 	map_user(mailmap, &mail, &maillen, &name, &namelen);
 
diff --git a/Documentation/git-check-mailmap.txt b/Documentation/git-check-mailmap.txt
index 02f441832321..7747e38e25e3 100644
--- a/Documentation/git-check-mailmap.txt
+++ b/Documentation/git-check-mailmap.txt
@@ -15,10 +15,10 @@  SYNOPSIS
 DESCRIPTION
 -----------
 
-For each ``Name $$<user@host>$$'' or ``$$<user@host>$$'' from the command-line
-or standard input (when using `--stdin`), look up the person's canonical name
-and email address (see "Mapping Authors" below). If found, print them;
-otherwise print the input as-is.
+For each ``Name $$<user@host>$$'', ``$$<user@host>$$'', or ``$$user@host$$''
+from the command-line or standard input (when using `--stdin`), look up the
+person's canonical name and email address (see "Mapping Authors" below). If
+found, print them; otherwise print the input as-is.
 
 
 OPTIONS
diff --git a/t/t4203-mailmap.sh b/t/t4203-mailmap.sh
index 79e5f42760d9..0c1efe0b2e17 100755
--- a/t/t4203-mailmap.sh
+++ b/t/t4203-mailmap.sh
@@ -73,11 +73,40 @@  test_expect_success 'check-mailmap --stdin arguments: mapping' '
 '
 
 test_expect_success 'check-mailmap bogus contact' '
-	test_must_fail git check-mailmap bogus
+	cat >expect <<-EOF &&
+	<bogus>
+	EOF
+	git check-mailmap bogus >actual &&
+	test_cmp expect actual
 '
 
 test_expect_success 'check-mailmap bogus contact --stdin' '
-	test_must_fail git check-mailmap --stdin bogus </dev/null
+	cat >expect <<-EOF &&
+	<bogus>
+	EOF
+	cat >stdin <<-EOF &&
+	bogus
+	EOF
+	git check-mailmap --stdin <stdin >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'check-mailmap simple address: mapping' '
+	test_when_finished "rm .mailmap" &&
+	cat >.mailmap <<-EOF &&
+	New Name <$GIT_AUTHOR_EMAIL>
+	EOF
+	cat .mailmap >expect &&
+	git check-mailmap "$GIT_AUTHOR_EMAIL" >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'check-mailmap simple address: no mapping' '
+	cat >expect <<-EOF &&
+	<bugs@company.xx>
+	EOF
+	git check-mailmap "bugs@company.xx" >actual &&
+	test_cmp expect actual
 '
 
 test_expect_success 'No mailmap' '