Message ID | 20201213010539.544101-2-sandals@crustytoothpaste.net (mailing list archive) |
---|---|
State | New, archived |
Series | Hashed mailmap support |
Am 13.12.20 um 02:05 schrieb brian m. carlson: > Many people, through the course of their lives, will change either a > name or an email address. For this reason, we have the mailmap, to map > from a user's former name or email address to their current, canonical > forms. Normally, this works well as it is. > > However, sometimes people change a name or an email address and wish to > wholly disassociate themselves from that former name or email address. > For example, a person may have left a company which engaged in a deeply > unethical act with which the person does not want to be associated, or > they may have changed their name to disassociate themselves from an > abusive family or partner. In such a case, using the former name or > address in any way may be undesirable and the person may wish to replace > it as completely as possible. > > For projects which wish to support this, introduce hashed forms into the > mailmap. These forms, which start with "@sha256:" followed by a SHA-256 > hash of the entry, can be used in place of the form used in the commit > field. This form is intentionally designed to be unlikely to conflict > with legitimate use cases. For example, this is not a valid email > address according to RFC 5322. In the unlikely event that a user has > put such a form into the actual commit as their name, we will accept it. > > While the form of the data is designed to accept multiple hash > algorithms, we intentionally do not support SHA-1. There is little > reason to support such a weak algorithm in new use cases and no > backwards compatibility to consider. Moreover, SHA-256 is faster than > the SHA1DC implementation we use, so this not only improves performance, > but simplifies the current implementation somewhat as well. > > Note that it is, of course, possible to perform a lookup on all commit > objects to determine the actual entry which matches the hashed form of > the data. 
However, a project for which this feature is valuable may > simply insert entries for many contributors in order to make discovery > of "interesting" entries significantly less convenient. > > Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> > --- ... > diff --git a/t/t4203-mailmap.sh b/t/t4203-mailmap.sh > index 586c3a86b1..794133ba5d 100755 > --- a/t/t4203-mailmap.sh > +++ b/t/t4203-mailmap.sh > @@ -62,6 +62,41 @@ test_expect_success 'check-mailmap --stdin arguments' ' > test_cmp expect actual > ' > > +test_expect_success 'hashed mailmap' ' > + test_config mailmap.file ./hashed && > + hashed_author_name="@sha256:$(printf "$GIT_AUTHOR_NAME" | test-tool sha256)" && > + hashed_author_email="@sha256:$(printf "$GIT_AUTHOR_EMAIL" | test-tool sha256)" && > + cat >expect <<-EOF && > + $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> > + EOF ... > + cat >hashed <<-EOF && > + $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> <$hashed_author_email> > + EOF > + git check-mailmap "$GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL>" >actual && > + test_cmp expect actual I don't understand the concept. A mailmap entry of the form A <a@b> <x@y> tells that the former address <x@y>, which is recorded in old project history, should be replaced by A <a@b> when a commit is displayed. I am assuming that the idea is that old <x@y> should be the "banned" address. How does a hashed entry help when the hashed value appears at the right side of a mailmap entry and that literal string never appears anywhere in the history? -- Hannes
Am 13.12.20 um 10:34 schrieb Johannes Sixt: > I don't understand the concept. A mailmap entry of the form > > A <a@b> <x@y> > > tells that the former address <x@y>, which is recorded in old project > history, should be replaced by A <a@b> when a commit is displayed. I am > assuming that the idea is that old <x@y> should be the "banned" address. > How does a hashed entry help when the hashed value appears at the right > side of a mailmap entry and that literal string never appears anywhere > in the history? Never mind, I got it: A wants to be disassociated from <x@y>, but not from their contributions whose authorship was recorded as <x@y>. Therefore, Git must always compute the hash of all of <x@y>, <a@b>, etc, just in case that the hashed form appears anywhere in the mailmap file. -- Hannes
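Computing the hashed form itself takes only a few lines. The patch's test pipes `printf` output into `test-tool sha256`, so the digest is taken over the raw bytes of the name or address with no trailing newline. That means `printf '%s'` is the right tool and `echo`, which appends a newline, is not. What follows is a minimal sketch. It assumes GNU coreutils' `sha256sum` (`shasum -a 256` is a common substitute), and `mailmap_hash` is a hypothetical helper name, not part of Git:

```shell
#!/bin/sh
# Sketch: compute the "@sha256:<hex>" form of a name or email address, as the
# proposed hashed mailmap entries expect. Assumes GNU coreutils' sha256sum.
# mailmap_hash is a hypothetical helper, not a Git command.

mailmap_hash () {
	# printf '%s' emits the value without a trailing newline, so only the
	# bytes of the name or address themselves are hashed.
	hex=$(printf '%s' "$1" | sha256sum | cut -d' ' -f1)
	printf '@sha256:%s\n' "$hex"
}

mailmap_hash 'doe@example.com'

# By contrast, echo appends "\n", which yields an unrelated digest.
echo 'doe@example.com' | sha256sum | cut -d' ' -f1
```

In a Git build, the result can be cross-checked against `test-tool sha256`, as the test script does.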
On 2020-12-13 at 09:45:58, Johannes Sixt wrote: > Am 13.12.20 um 10:34 schrieb Johannes Sixt: > > I don't understand the concept. A mailmap entry of the form > > > > A <a@b> <x@y> > > > > tells that the former address <x@y>, which is recorded in old project > > history, should be replaced by A <a@b> when a commit is displayed. I am > > assuming that the idea is that old <x@y> should be the "banned" address. > > How does a hashed entry help when the hashed value appears at the right > > side of a mailmap entry and that literal string never appears anywhere > > in the history? > > Never mind, I got it: A wants to be disassociated from <x@y>, but not > from their contributions whose authorship was recorded as <x@y>. > Therefore, Git must always compute the hash of all of <x@y>, <a@b>, etc, > just in case that the hashed form appears anywhere in the mailmap file. Yup, exactly. You can't specify the hashed one on the new side because it has to map to it, but you can on the old side. Sorry if that wasn't clear. Come to think of it, this probably needs documentation as well, so I'll wait for any other feedback and then reroll with that in there. Hopefully that will clear up any potential confusion.
"brian m. carlson" <sandals@crustytoothpaste.net> writes: > Come to think of it, this probably needs documentation as well, so I'll > wait for any other feedback and then reroll with that in there. > Hopefully that will clear up any potential confusion. Not just "where the hashed entry can appear in the file", but "how exactly it gets computed" needs to be described. If it is sufficient to do something like set x $(echo doe@example.com | sha256sum) && echo "@sha256sum:$2" that exact procedure must be described to the users in the documentation (note: I know the above is not correct as I looked at the tests---it is a demonstration of the need for a procedure using commonly available tools). I wonder if somebody may want to write a dedicated tool that lets you (1) given an e-mail and/or a name, look up existing entries and show what <name, e-mail> pair it maps to; (2) take a new <name, e-mail> pair and add a mapping from it to some other <name, e-mail> pair; (3) take an existing mailmap file, and obfuscate all the existing entries. The first one is covered by "check-mailmap", so the other two could be new features added to the command to be triggered with a command line option. > + cat >hashed <<-EOF && > + $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $hashed_author_name <$GIT_AUTHOR_EMAIL> > + EOF > + git check-mailmap "$GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL>" >actual && The two spaces after "check-mailmap" are not significant but drew my attention. Let's not do so.
On Sun, Dec 13 2020, brian m. carlson wrote: > Many people, through the course of their lives, will change either a > name or an email address. For this reason, we have the mailmap, to map > from a user's former name or email address to their current, canonical > forms. Normally, this works well as it is. > > However, sometimes people change a name or an email address and wish to > wholly disassociate themselves from that former name or email address. > For example, a person may have left a company which engaged in a deeply > unethical act with which the person does not want to be associated, or > they may have changed their name to disassociate themselves from an > abusive family or partner. In such a case, using the former name or > address in any way may be undesirable and the person may wish to replace > it as completely as possible. > > [...] > > Note that it is, of course, possible to perform a lookup on all commit > objects to determine the actual entry which matches the hashed form of > the data. The commit message & cover letter are subtly different in a way that I didn't even notice at first glance. E.g. I assume based on the cover letter that one part of this is a proposed solution to the whole "deadname" problem. It would be nice if v2 were more explicit and summarized the use-cases in the commit message. But for now I'll attempt to read between the lines from having read both. I don't understand why either the problem of "I don't want to see my old name again" or "I want to hide from other abusive people" (as an aside: but not so much that you'd still take the risk of submitting a patch to .mailmap?) requires a hashing solution, as opposed to just some encoding in the .mailmap file such as base64. You can still trivially get the same information in the end; on git.git, running --pretty=format:"%aN %aE %an %ae" takes under a second.
A part of your commit message seems to address this: > However, a project for which this feature is valuable may > simply insert entries for many contributors in order to make discovery > of "interesting" entries significantly less convenient. But I don't get how that's helped at all by a sha256 hash. Since you can trivially re-expand these again using log/check-mailmap, the hashing offers no extra protection beyond a trivial layer of obscurity in those cases. You'd get the same safety in numbers by having everything in a large un-hashed .mailmap file, would you not? I think the underlying use-case is legitimate, but I read it as primarily a social signaling feature, implemented via a trivial addition of obscurity. Someone called X would like not to be called Y anymore, or not be found in a search engine or "git grep" when searching for "Y". So I think that, purely so that the feature's appearance to users matches its underlying security, we'd be better served by support for some sort of encoding. E.g. URL encoding, Base64, or even just string_reverse() (ROT13 is out, since it doesn't work for non-ASCII names). The encoding versions of this have the added bonus of expanding the use-case beyond what you're suggesting. If you're trying to map e.g. a non-UTF-8 E-Mail address (present in your project due to some encoding error), you'd be able to put it into .mailmap without making the project maintainers deal with invalid non-UTF-8 encoding in the file (the existing support is sufficient to map names in most such cases). Another reason I'd prefer some encoding solution is that .mailmap isn't just used by git itself. Since the format was added, it's become how a lot of downstream systems do this mapping. E.g. I worked once on a change management system that mapped lots of user actions across different systems, and piggy-backed on .mailmap files in git to resolve E-Mail addresses even in cases where the originating data wasn't within git.
Now, because the format is so trivial, it's easy to e.g. import it into a DB table and do a JOIN against it (or the same after converting it from some trivial encoding). Use-cases like that would become a full history walk for each project to extract the real E-Mails (or a reimplementation of the SHA256 trick in some sub-SELECT in the database). Those are all solvable problems that are rather trivial in the end. I just wonder if we're not making things needlessly hard to achieve the stated aims. And to be fair, I inferred most of those aims (and might have inferred them incorrectly), since, as noted above, the patch itself doesn't discuss the tradeoffs of potential alternate solutions. > Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> > --- > mailmap.c | 39 +++++++++++++++++++++++++++++++++++++-- > t/t4203-mailmap.sh | 35 +++++++++++++++++++++++++++++++++++ > 2 files changed, 72 insertions(+), 2 deletions(-) > > [...] > > int map_user(struct string_list *map, > const char **email, size_t *emaillen, > const char **name, size_t *namelen) > @@ -324,7 +359,7 @@ int map_user(struct string_list *map, > (int)*namelen, debug_str(*name), > (int)*emaillen, debug_str(*email)); > > - item = lookup_prefix(map, *email, *emaillen); > + item = lookup_one(map, *email, *emaillen); > if (item != NULL) { > me = (struct mailmap_entry *)item->util; > if (me->namemap.nr) { > @@ -334,7 +369,7 @@ int map_user(struct string_list *map, > * simple entry. > */ > struct string_list_item *subitem; > - subitem = lookup_prefix(&me->namemap, *name, *namelen); > + subitem = lookup_one(&me->namemap, *name, *namelen); > if (subitem) > item = subitem; > } If you turn on DEBUG_MAILMAP=1 at the top of the file and run e.g. an unbounded --pretty=format:%aE, you can see that we'll call map_user() in a loop for each commit shown. What I'm suggesting above can be read as "can't we have some solution that achieves the same aims, but which we can handle purely in add_mapping()?".
Both for our case, and for external parsers/re-implementations. In any case, it would be interesting if v2 amended t/perf/p4205-log-pretty-formats.sh to test, e.g., the impact on linux.git of a .mailmap with all-sha256 entries, to see what the cost in the tight loop could be.
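The point made here, and conceded in the commit message itself, is that a digest only hides a value from someone who has no list of candidates, and the project history is such a list. Below is a minimal sketch of that recovery. It assumes GNU coreutils' `sha256sum` and the same hashing rule as the entries (raw bytes, no trailing newline). `unhash_mailmap_entry` is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: find which candidate a "@sha256:<hex>" mailmap entry stands for by
# hashing every candidate, read one per line on stdin. Candidates could come
# from history, e.g.:
#   git log --format='%ae%n%ce' | sort -u | unhash_mailmap_entry '@sha256:...'
# unhash_mailmap_entry is hypothetical; assumes GNU coreutils' sha256sum.

unhash_mailmap_entry () {
	target=$1
	while IFS= read -r candidate; do
		# Same rule as the entries: hash the raw bytes, no trailing newline.
		hex=$(printf '%s' "$candidate" | sha256sum | cut -d' ' -f1)
		if [ "@sha256:$hex" = "$target" ]; then
			printf '%s\n' "$candidate"
		fi
	done
}
```

The cost is one hash per distinct identity. With a single candidate, this is exactly the check someone who already knows the old address would run.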
Hi Brian, On 13/12/2020 01:05, brian m. carlson wrote: > Many people, through the course of their lives, will change either a > name or an email address. For this reason, we have the mailmap, to map > from a user's former name or email address to their current, canonical > forms. Normally, this works well as it is. > > However, sometimes people change a name or an email address and wish to > wholly disassociate themselves from that former name or email address. > For example, a person may have left a company which engaged in a deeply > unethical act with which the person does not want to be associated, or > they may have changed their name to disassociate themselves from an > abusive family or partner. I think we should be clear in the documentation that by adding a hashed .mailmap entry, people are still publicly associating their old identity with their new identity; it's just that the association is obscured. They should not rely on it for their safety. An abusive partner knows the old identity, so all they have to do to find the new identity is hash the old identity and see if it is in the .mailmap file. Having said that, I think this is a useful step forward in many cases. Best Wishes Phillip > In such a case, using the former name or > address in any way may be undesirable and the person may wish to replace > it as completely as possible. > > For projects which wish to support this, introduce hashed forms into the > mailmap. These forms, which start with "@sha256:" followed by a SHA-256 > hash of the entry, can be used in place of the form used in the commit > field. This form is intentionally designed to be unlikely to conflict > with legitimate use cases. For example, this is not a valid email > address according to RFC 5322. In the unlikely event that a user has > put such a form into the actual commit as their name, we will accept it. > > While the form of the data is designed to accept multiple hash > algorithms, we intentionally do not support SHA-1.
There is little > reason to support such a weak algorithm in new use cases and no > backwards compatibility to consider. Moreover, SHA-256 is faster than > the SHA1DC implementation we use, so this not only improves performance, > but simplifies the current implementation somewhat as well. > > Note that it is, of course, possible to perform a lookup on all commit > objects to determine the actual entry which matches the hashed form of > the data. However, a project for which this feature is valuable may > simply insert entries for many contributors in order to make discovery > of "interesting" entries significantly less convenient. > > Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> > --- > mailmap.c | 39 +++++++++++++++++++++++++++++++++++++-- > t/t4203-mailmap.sh | 35 +++++++++++++++++++++++++++++++++++ > 2 files changed, 72 insertions(+), 2 deletions(-) > > diff --git a/mailmap.c b/mailmap.c > index 962fd86d6d..09d0ad7ca4 100644 > --- a/mailmap.c > +++ b/mailmap.c > @@ -313,6 +313,41 @@ static struct string_list_item *lookup_prefix(struct string_list *map, > return NULL; > } > > +/* > + * Convert an email or name into a hashed form for comparison. The hashed form > + * will be created in the form > + * @sha256:c68b7a430ac8dee9676ec77a387194e23f234d024e03d844050cf6c01775c8f6, > + * which would be the hashed form for "doe@example.com". 
> + */ > +static char *hashed_form(struct strbuf *buf, const struct git_hash_algo *algop, const char *key, size_t keylen) > +{ > + git_hash_ctx ctx; > + unsigned char hashbuf[GIT_MAX_RAWSZ]; > + char hexbuf[GIT_MAX_HEXSZ + 1]; > + > + algop->init_fn(&ctx); > + algop->update_fn(&ctx, key, keylen); > + algop->final_fn(hashbuf, &ctx); > + hash_to_hex_algop_r(hexbuf, hashbuf, algop); > + > + strbuf_addf(buf, "@%s:%s", algop->name, hexbuf); > + return buf->buf; > +} > + > +static struct string_list_item *lookup_one(struct string_list *map, > + const char *string, size_t len) > +{ > + struct strbuf buf = STRBUF_INIT; > + struct string_list_item *item = lookup_prefix(map, string, len); > + if (item) > + return item; > + > + hashed_form(&buf, &hash_algos[GIT_HASH_SHA256], string, len); > + item = lookup_prefix(map, buf.buf, buf.len); > + strbuf_release(&buf); > + return item; > +} > + > int map_user(struct string_list *map, > const char **email, size_t *emaillen, > const char **name, size_t *namelen) > @@ -324,7 +359,7 @@ int map_user(struct string_list *map, > (int)*namelen, debug_str(*name), > (int)*emaillen, debug_str(*email)); > > - item = lookup_prefix(map, *email, *emaillen); > + item = lookup_one(map, *email, *emaillen); > if (item != NULL) { > me = (struct mailmap_entry *)item->util; > if (me->namemap.nr) { > @@ -334,7 +369,7 @@ int map_user(struct string_list *map, > * simple entry. 
> */ > struct string_list_item *subitem; > - subitem = lookup_prefix(&me->namemap, *name, *namelen); > + subitem = lookup_one(&me->namemap, *name, *namelen); > if (subitem) > item = subitem; > } > diff --git a/t/t4203-mailmap.sh b/t/t4203-mailmap.sh > index 586c3a86b1..794133ba5d 100755 > --- a/t/t4203-mailmap.sh > +++ b/t/t4203-mailmap.sh > @@ -62,6 +62,41 @@ test_expect_success 'check-mailmap --stdin arguments' ' > test_cmp expect actual > ' > > +test_expect_success 'hashed mailmap' ' > + test_config mailmap.file ./hashed && > + hashed_author_name="@sha256:$(printf "$GIT_AUTHOR_NAME" | test-tool sha256)" && > + hashed_author_email="@sha256:$(printf "$GIT_AUTHOR_EMAIL" | test-tool sha256)" && > + cat >expect <<-EOF && > + $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> > + EOF > + > + cat >hashed <<-EOF && > + $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $hashed_author_name <$GIT_AUTHOR_EMAIL> > + EOF > + git check-mailmap "$GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL>" >actual && > + test_cmp expect actual && > + > + cat >hashed <<-EOF && > + Wrong <wrong@example.org> $GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL> > + $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $hashed_author_name <$GIT_AUTHOR_EMAIL> > + EOF > + # Check that we prefer literal matches over hashed names. > + git check-mailmap "$hashed_author_name <$GIT_AUTHOR_EMAIL>" >actual && > + test_cmp expect actual && > + > + cat >hashed <<-EOF && > + $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $hashed_author_name <$hashed_author_email> > + EOF > + git check-mailmap "$GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL>" >actual && > + test_cmp expect actual && > + > + cat >hashed <<-EOF && > + $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> <$hashed_author_email> > + EOF > + git check-mailmap "$GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL>" >actual && > + test_cmp expect actual > +' > + > test_expect_success 'check-mailmap bogus contact' ' > test_must_fail git check-mailmap bogus > ' >
On 2020-12-14 at 00:09:19, Junio C Hamano wrote: > "brian m. carlson" <sandals@crustytoothpaste.net> writes: > > > Come to think of it, this probably needs documentation as well, so I'll > > wait for any other feedback and then reroll with that in there. > > Hopefully that will clear up any potential confusion. > > Not just "where does the hashed entry can appear in the file", but > "how exactly does it gets computed" needs to be described. If it is > sufficient to do something like > > set x $(echo doe@example.com | sha256sum) && > echo "@sha256sum:$2" > > that exact procedure must be described to the users in the > documentation (note: I know the above is not correct as I looked at > the tests---it is a demonstration of the need for a procedure using > commonly available tools). I believe the difference is that "echo" adds a newline and you probably wanted "printf" here. But I get your point: we need documentation to explain how to do this that's simple and straightforward, and as we've both pointed out, there isn't any at all. I'll add some. > I wonder if somebody may want to do a dedicated tool that lets you > > (1) given an e-mail and/or a name, look-up existing entries and > show what <name, e-mail> pair it maps to; > > (2) take a new <name, e-mail> pair and add mapping from it to some > other <name, e-mail> pair. > > (3) take an existing mailmap file, and obfuscate all the existing > entries. > > The first one is covered by "check-mailmap", so the other two could > be new features added to the command to be triggered with a command > line option. That could be a useful tool. > > + cat >hashed <<-EOF && > > + $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $hashed_author_name <$GIT_AUTHOR_EMAIL> > > + EOF > > + git check-mailmap "$GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL>" >actual && > > The two spaces after "check-mailmap" is not significant but drew my > attention. Let's not do so. That wasn't intentional. Will fix.
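Item (3) of the tool Junio describes could start out as a small filter. This sketch is hypothetical: `obfuscate_mailmap` and `mailmap_hash` are illustrative names, not Git commands. It assumes GNU coreutils' `sha256sum` and the hashing rule used by the patch's test (raw bytes, no trailing newline, prefixed with `@sha256:`). It only rewrites the commit address on lines that carry two `<...>` groups. In the "Proper Name <proper@email> Commit Name <commit@email>" form, the commit name is left in the clear. Going by the patch's `lookup_one()`, the digest is taken over the address exactly as recorded in the commit, so the mailmap must spell it the same way, including case.

```shell
#!/bin/sh
# Hypothetical sketch: replace the commit address (the last <...> on a line
# with two <...> groups) by its "@sha256:<hex>" form. Comment lines, blank
# lines, single-address lines and already-hashed entries pass through.
# Assumes GNU coreutils' sha256sum.

mailmap_hash () {
	hex=$(printf '%s' "$1" | sha256sum | cut -d' ' -f1)
	printf '@sha256:%s\n' "$hex"
}

obfuscate_mailmap () {
	while IFS= read -r line || [ -n "$line" ]; do
		case $line in
		'#'*|'')
			printf '%s\n' "$line" ;;
		*'<'*'>'*'<'*'>')
			prefix=${line%<*}   # everything before the last "<"
			old=${line##*<}     # "commit@email>"
			old=${old%>}        # "commit@email"
			case $old in
			@sha256:*) printf '%s\n' "$line" ;;   # already hashed
			*) printf '%s<%s>\n' "$prefix" "$(mailmap_hash "$old")" ;;
			esac ;;
		*)
			printf '%s\n' "$line" ;;
		esac
	done
}
```

Usage would be along the lines of `obfuscate_mailmap <.mailmap >.mailmap.new`. Because hashed entries pass through unchanged, running it twice gives the same result.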
diff --git a/mailmap.c b/mailmap.c
index 962fd86d6d..09d0ad7ca4 100644
--- a/mailmap.c
+++ b/mailmap.c
@@ -313,6 +313,41 @@ static struct string_list_item *lookup_prefix(struct string_list *map,
 	return NULL;
 }
 
+/*
+ * Convert an email or name into a hashed form for comparison. The hashed form
+ * will be created in the form
+ * @sha256:c68b7a430ac8dee9676ec77a387194e23f234d024e03d844050cf6c01775c8f6,
+ * which would be the hashed form for "doe@example.com".
+ */
+static char *hashed_form(struct strbuf *buf, const struct git_hash_algo *algop, const char *key, size_t keylen)
+{
+	git_hash_ctx ctx;
+	unsigned char hashbuf[GIT_MAX_RAWSZ];
+	char hexbuf[GIT_MAX_HEXSZ + 1];
+
+	algop->init_fn(&ctx);
+	algop->update_fn(&ctx, key, keylen);
+	algop->final_fn(hashbuf, &ctx);
+	hash_to_hex_algop_r(hexbuf, hashbuf, algop);
+
+	strbuf_addf(buf, "@%s:%s", algop->name, hexbuf);
+	return buf->buf;
+}
+
+static struct string_list_item *lookup_one(struct string_list *map,
+					   const char *string, size_t len)
+{
+	struct strbuf buf = STRBUF_INIT;
+	struct string_list_item *item = lookup_prefix(map, string, len);
+	if (item)
+		return item;
+
+	hashed_form(&buf, &hash_algos[GIT_HASH_SHA256], string, len);
+	item = lookup_prefix(map, buf.buf, buf.len);
+	strbuf_release(&buf);
+	return item;
+}
+
 int map_user(struct string_list *map,
 	     const char **email, size_t *emaillen,
 	     const char **name, size_t *namelen)
@@ -324,7 +359,7 @@ int map_user(struct string_list *map,
 		 (int)*namelen, debug_str(*name),
 		 (int)*emaillen, debug_str(*email));
 
-	item = lookup_prefix(map, *email, *emaillen);
+	item = lookup_one(map, *email, *emaillen);
 	if (item != NULL) {
 		me = (struct mailmap_entry *)item->util;
 		if (me->namemap.nr) {
@@ -334,7 +369,7 @@ int map_user(struct string_list *map,
 			 * simple entry.
 			 */
 			struct string_list_item *subitem;
-			subitem = lookup_prefix(&me->namemap, *name, *namelen);
+			subitem = lookup_one(&me->namemap, *name, *namelen);
 			if (subitem)
 				item = subitem;
 		}
diff --git a/t/t4203-mailmap.sh b/t/t4203-mailmap.sh
index 586c3a86b1..794133ba5d 100755
--- a/t/t4203-mailmap.sh
+++ b/t/t4203-mailmap.sh
@@ -62,6 +62,41 @@ test_expect_success 'check-mailmap --stdin arguments' '
 	test_cmp expect actual
 '
 
+test_expect_success 'hashed mailmap' '
+	test_config mailmap.file ./hashed &&
+	hashed_author_name="@sha256:$(printf "$GIT_AUTHOR_NAME" | test-tool sha256)" &&
+	hashed_author_email="@sha256:$(printf "$GIT_AUTHOR_EMAIL" | test-tool sha256)" &&
+	cat >expect <<-EOF &&
+	$GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL>
+	EOF
+
+	cat >hashed <<-EOF &&
+	$GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $hashed_author_name <$GIT_AUTHOR_EMAIL>
+	EOF
+	git check-mailmap "$GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL>" >actual &&
+	test_cmp expect actual &&
+
+	cat >hashed <<-EOF &&
+	Wrong <wrong@example.org> $GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL>
+	$GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $hashed_author_name <$GIT_AUTHOR_EMAIL>
+	EOF
+	# Check that we prefer literal matches over hashed names.
+	git check-mailmap "$hashed_author_name <$GIT_AUTHOR_EMAIL>" >actual &&
+	test_cmp expect actual &&
+
+	cat >hashed <<-EOF &&
+	$GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $hashed_author_name <$hashed_author_email>
+	EOF
+	git check-mailmap "$GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL>" >actual &&
+	test_cmp expect actual &&
+
+	cat >hashed <<-EOF &&
+	$GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> <$hashed_author_email>
+	EOF
+	git check-mailmap "$GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL>" >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success 'check-mailmap bogus contact' '
 	test_must_fail git check-mailmap bogus
 '
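The order in `lookup_one()` matters: the literal key is tried first, and the hashed key only as a fallback. That order is why a literal entry wins in the "prefer literal matches" case of the test. It also means the hash is computed for every lookup that has no literal match, once per commit shown, which is the per-commit cost the review suggests measuring. The following is a simplified model of that order, for illustration only. It uses a hypothetical two-column `key<TAB>replacement` file rather than real .mailmap syntax. It compares keys case-sensitively, whereas Git's own mailmap matching ignores case. It assumes GNU coreutils' `sha256sum` and a POSIX awk. `model_lookup_one` is a made-up name.

```shell
#!/bin/sh
# Simplified model of lookup_one(): literal key first, then "@sha256:<hex>".
# The map is a hypothetical "key<TAB>replacement" file, not .mailmap syntax.
# Assumes GNU coreutils' sha256sum and a POSIX awk.

model_lookup_one () {
	map=$1
	key=$2
	# 1. Literal match first, so a literal entry always wins over a hashed one.
	#    (awk -v expands backslash escapes; keys containing "\" need care.)
	hit=$(awk -F '\t' -v k="$key" '$1 == k { print $2; exit }' "$map")
	if [ -z "$hit" ]; then
		# 2. Only on a miss, hash the key and look that form up instead.
		hashed="@sha256:$(printf '%s' "$key" | sha256sum | cut -d' ' -f1)"
		hit=$(awk -F '\t' -v k="$hashed" '$1 == k { print $2; exit }' "$map")
	fi
	if [ -n "$hit" ]; then
		printf '%s\n' "$hit"
	fi
}
```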
Many people, through the course of their lives, will change either a name or an email address. For this reason, we have the mailmap, to map from a user's former name or email address to their current, canonical forms. Normally, this works well as it is.

However, sometimes people change a name or an email address and wish to wholly disassociate themselves from that former name or email address. For example, a person may have left a company which engaged in a deeply unethical act with which the person does not want to be associated, or they may have changed their name to disassociate themselves from an abusive family or partner. In such a case, using the former name or address in any way may be undesirable and the person may wish to replace it as completely as possible.

For projects which wish to support this, introduce hashed forms into the mailmap. These forms, which start with "@sha256:" followed by a SHA-256 hash of the entry, can be used in place of the form used in the commit field. This form is intentionally designed to be unlikely to conflict with legitimate use cases. For example, this is not a valid email address according to RFC 5322. In the unlikely event that a user has put such a form into the actual commit as their name, we will accept it.

While the form of the data is designed to accept multiple hash algorithms, we intentionally do not support SHA-1. There is little reason to support such a weak algorithm in new use cases and no backwards compatibility to consider. Moreover, SHA-256 is faster than the SHA1DC implementation we use, so this not only improves performance, but simplifies the current implementation somewhat as well.

Note that it is, of course, possible to perform a lookup on all commit objects to determine the actual entry which matches the hashed form of the data. However, a project for which this feature is valuable may simply insert entries for many contributors in order to make discovery of "interesting" entries significantly less convenient.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 mailmap.c          | 39 +++++++++++++++++++++++++++++++++++++--
 t/t4203-mailmap.sh | 35 +++++++++++++++++++++++++++++++++++
 2 files changed, 72 insertions(+), 2 deletions(-)