Message ID | 87blk0rjob.fsf@0x63.nu (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Questions about trailer configuration semantics | expand |
[Redirecting it to the resident expert of the trailers] Anders Waldenborg <anders@0x63.nu> writes: > I noticed some undocumented and (at least to me) surprising behavior in > trailers.c. > > When configuring a value in trailer.<token>.key it causes the trailer to > be normalized to that in "git interpret-trailers --parse". > E.g: > $ printf '\naCKed: Zz\n' | \ > git -c 'trailer.Acked.key=Acked' interpret-trailers --parse > will emit: "Acked: Zz" > > but only if "key" is used, other config options doesn't cause it to be > normalized. > E.g: > $ printf '\naCKed: Zz\n' | \ > git -c 'trailer.Acked.ifmissing=doNothing' interpret-trailers --parse > will emit: "aCKed: Zz" (still lowercase a and uppercase CK) > > > Then there is the replacement by config "trailer.fix.key=Fixes" which > expands "fix" to "Fixes". This happens when using "--trailer 'fix = 123'" > which seems to be expected and useful behavior (albeit a bit unclear in > documentation). But it also happens when parsing incoming trailers, e.g > with that config > $ printf "\nFix: 1\n" | git interpret-trailers --parse > will emit: "Fixes: 1" > > (token_from_item prefers order .key, incoming token, .name) > > > The most surprising thing is that it uses prefix matching when finding > they key in configuration. If I have "trailer.reviewed.key=Reviewed-By" > it is possible to just '--trailer r=XYZ' and it will find the > reviewed-by trailer as "r" is a prefix of reviewedby. This also applies > to the "--parse". This in makes it impossible to have trailer keys that > are prefix of each other (e.g: "Acked", "Acked-Tests", "Acked-Docs") if > there is multiple matching in configuration it will just pick the one > that happens to come first. > > (token_matches_item uses strncasecmp with token length) > > > I guess these are the questions for the above observations: > > * Should normalization of spelling happen at all? > > * If so should it only happen when there is a .key config? > > * Should replacement to what is in .key happen also in --parse mode, or > only for "--trailer" > > * The prefix matching gotta be a bug, right? > > > > Here is a patch to the tests showing these things. > > > > From 49a4bb64a7ebf1f2d50897a024deb86b4f8056b1 Mon Sep 17 00:00:00 2001 > From: Anders Waldenborg <anders@0x63.nu> > Date: Mon, 27 Jul 2020 18:34:37 +0200 > Subject: [PATCH] trailers: add tests for unclear/undocumented behavior > > --- > t/t7513-interpret-trailers.sh | 70 +++++++++++++++++++++++++++++++++++ > 1 file changed, 70 insertions(+) > > diff --git a/t/t7513-interpret-trailers.sh b/t/t7513-interpret-trailers.sh > index 2e6d406edf..d5d19cf89b 100755 > --- a/t/t7513-interpret-trailers.sh > +++ b/t/t7513-interpret-trailers.sh > @@ -99,6 +99,64 @@ test_expect_success 'with config option on the command line' ' > test_cmp expected actual > ' > > +test_expect_success 'parse normalizes spelling and separators from configs with key' ' > + cat >patch <<-\EOF && > + non-trailer-line > + > + ReviEweD-bY :abc > + ReviEwEd-bY) rst > + ReviEweD-BY ; xyz > + aCked-bY: not normalized > + EOF > + cat >expected <<-\EOF && > + Reviewed-By: abc > + Reviewed-By: rst > + Reviewed-By: xyz > + aCked-bY: not normalized > + EOF > + git \ > + -c "trailer.separators=:);" \ > + -c "trailer.rb.key=Reviewed-By" \ > + -c "trailer.Acked-By.ifmissing=doNothing" \ > + interpret-trailers --parse patch >actual && > + test_cmp expected actual > +' > + > +# Matching currently is prefix matching, causing "This-trailer" to be normalized too > +test_expect_failure 'config option matches exact only' ' > + cat >patch <<-\EOF && > + > + This-trailer: a > + b > + This-trailer-exact: b > + c > + This-trailer-exact-plus-some: c > + d > + EOF > + cat >expected <<-\EOF && > + This-trailer: a b > + THIS-TRAILER-EXACT: b c > + This-trailer-exact-plus-some: c d > + EOF > + git -c "trailer.tte.key=THIS-TRAILER-EXACT" interpret-trailers --parse patch >actual && > + test_cmp expected actual > +' > + > +# Matching currently uses the config key even if key value is different > +test_expect_failure 'config option matches exact only' ' > + cat >patch <<-\EOF && > + > + Ticket: 1234 > + Reference-ticket: 99 > + EOF > + cat >expected <<-\EOF && > + Ticket: 1234 > + Reference-Ticket: 99 > + EOF > + git -c "trailer.ticket.key=Reference-Ticket" interpret-trailers --parse patch >actual && > + test_cmp expected actual > +' > + > test_expect_success 'with only a title in the message' ' > cat >expected <<-\EOF && > area: change > @@ -473,6 +531,18 @@ test_expect_success 'with config setup' ' > test_cmp expected actual > ' > > +# currently this matches the "Acked-by: " value in ack key set by previous test > +test_expect_failure 'with config setup matches key exactly' ' > + cat >expected <<-\EOF && > + > + A: B > + EOF > + git interpret-trailers --trailer "A=10" empty >actual && > + test_cmp expected actual > +' > + > + > + > test_expect_success 'with config setup and ":=" as separators' ' > git config trailer.separators ":=" && > git config trailer.ack.key "Acked-by= " && > -- > 2.25.1
[Adding Peff and Jonathan in Cc as they know also about this area of the code] On Mon, Jul 27, 2020 at 7:18 PM Junio C Hamano <gitster@pobox.com> wrote: > > [Redirecting it to the resident expert of the trailers] Thanks! > Anders Waldenborg <anders@0x63.nu> writes: > > > I noticed some undocumented and (at least to me) surprising behavior in > > trailers.c. > > > > When configuring a value in trailer.<token>.key it causes the trailer to > > be normalized to that in "git interpret-trailers --parse". > > E.g: > > $ printf '\naCKed: Zz\n' | \ > > git -c 'trailer.Acked.key=Acked' interpret-trailers --parse > > will emit: "Acked: Zz" Yeah, I think that's nice, as it can make sure that the key appears in the same way. It's true that it would be better if it would be documented. > > but only if "key" is used, other config options doesn't cause it to be > > normalized. > > E.g: > > $ printf '\naCKed: Zz\n' | \ > > git -c 'trailer.Acked.ifmissing=doNothing' interpret-trailers --parse > > will emit: "aCKed: Zz" (still lowercase a and uppercase CK) Yeah, in this case we are not sure if "Acked" or "aCKed" is the right way to spell it. > > Then there is the replacement by config "trailer.fix.key=Fixes" which > > expands "fix" to "Fixes". This happens when using "--trailer 'fix = 123'" > > which seems to be expected and useful behavior (albeit a bit unclear in > > documentation). But it also happens when parsing incoming trailers, e.g > > with that config > > $ printf "\nFix: 1\n" | git interpret-trailers --parse > > will emit: "Fixes: 1" Yeah, I think it allows for shortcuts and can help with standardizing the keys in commit messages. > > (token_from_item prefers order .key, incoming token, .name) > > > > > > The most surprising thing is that it uses prefix matching when finding > > they key in configuration. If I have "trailer.reviewed.key=Reviewed-By" > > it is possible to just '--trailer r=XYZ' and it will find the > > reviewed-by trailer as "r" is a prefix of reviewedby. This also applies > > to the "--parse". Yeah, that's also for shortcuts and standardization. > > This in makes it impossible to have trailer keys that > > are prefix of each other (e.g: "Acked", "Acked-Tests", "Acked-Docs") if > > there is multiple matching in configuration it will just pick the one > > that happens to come first. That's a downside of the above. I agree that it might seem strange or bad. Perhaps an option could be added to implement a strict matching, if people really want it. Also if you configure trailers in the "Acked", "Acked-Tests", "Acked-Docs" order, then any common prefix will pick "Acked" which could be considered ok in my opinion. > > (token_matches_item uses strncasecmp with token length) > > > > > > I guess these are the questions for the above observations: > > > > * Should normalization of spelling happen at all? Yes, I think it can help. > > * If so should it only happen when there is a .key config? Yes, it can help too if that only happens when there is a .key config. > > * Should replacement to what is in .key happen also in --parse mode, or > > only for "--trailer" I think it's more consistent if it happens in both --parse and --trailer mode. I didn't implement --parse though. > > * The prefix matching gotta be a bug, right? No, it's a feature ;-) Seriously I agree that this could be seen as a downside, but I think it can be understood that the convenience is worth it. And in case someone is really annoyed by this, then adding an option for strict matching should not be very difficult. > > Here is a patch to the tests showing these things. Thanks for the patch! I would be ok to add such a patch to the test suite if it was sent like a regular patch (so with a commit message, a Signed-off-by: and so on) to the mailing list. While at it some documentation of the related behavior would also be very nice.
On Mon, Jul 27, 2020 at 08:37:26PM +0200, Christian Couder wrote: > > > I noticed some undocumented and (at least to me) surprising behavior in > > > trailers.c. > > > > > > When configuring a value in trailer.<token>.key it causes the trailer to > > > be normalized to that in "git interpret-trailers --parse". > > > E.g: > > > $ printf '\naCKed: Zz\n' | \ > > > git -c 'trailer.Acked.key=Acked' interpret-trailers --parse > > > will emit: "Acked: Zz" > > Yeah, I think that's nice, as it can make sure that the key appears in > the same way. It's true that it would be better if it would be > documented. I'd note that this also happens without --parse. > > > Then there is the replacement by config "trailer.fix.key=Fixes" which > > > expands "fix" to "Fixes". This happens when using "--trailer 'fix = 123'" > > > which seems to be expected and useful behavior (albeit a bit unclear in > > > documentation). But it also happens when parsing incoming trailers, e.g > > > with that config > > > $ printf "\nFix: 1\n" | git interpret-trailers --parse > > > will emit: "Fixes: 1" > [...] > > > * Should replacement to what is in .key happen also in --parse mode, or > > > only for "--trailer" > > I think it's more consistent if it happens in both --parse and > --trailer mode. I didn't implement --parse though. I don't recall being aware of this prefix matching until this thread, so I doubt that the current behavior of --parse was something I tried for intentionally. It's mostly just using the existing code, plus a few extra options (listed in the docs). I'm not opposed to adding an option to do strict matching and/or avoid rewriting, and then possibly adding that into --parse by default. I don't have much of an opinion on which behavior would be preferred. I've never actually had a use case for configuring trailer.*.key, as I usually am only looking at reading existing trailers to collect stats, etc. -Peff
Christian Couder <christian.couder@gmail.com> writes: >> > $ printf '\naCKed: Zz\n' | \ >> > git -c 'trailer.Acked.key=Acked' interpret-trailers --parse >> > will emit: "Acked: Zz" > > Yeah, I think that's nice, as it can make sure that the key appears in > the same way. It's true that it would be better if it would be > documented. > >> > but only if "key" is used, other config options doesn't cause it to be >> > normalized. >> > E.g: >> > $ printf '\naCKed: Zz\n' | \ >> > git -c 'trailer.Acked.ifmissing=doNothing' interpret-trailers --parse >> > will emit: "aCKed: Zz" (still lowercase a and uppercase CK) > > Yeah, in this case we are not sure if "Acked" or "aCKed" is the right > way to spell it. OK, so in short, the trailer subsystem matches the second level of the configuration variable name (e.g. "Acked") in a case insensitive way, and it does *not* normalize the case in the output. The .key request is a mechanism to replace the matched key with the specified string, so there is *NO* case normalization in what Anders observed. In other words, $ printf '\naCKed: Zz\n' | \ git -c 'trailer.Acked.key=Rejected' interpret-trailers --parse would have emitted "Rejected: Zz".
Christian Couder writes: >> > When configuring a value in trailer.<token>.key it causes the trailer to >> > be normalized to that in "git interpret-trailers --parse". > > Yeah, I think that's nice, as it can make sure that the key appears in > the same way. It's true that it would be better if it would be > documented. > >> > but only if "key" is used, other config options doesn't cause it to be >> > normalized. > > Yeah, in this case we are not sure if "Acked" or "aCKed" is the right > way to spell it. Makes sense. However I guess one alternative implementation would be that if trailer.X.something is configured but not trailer.X.key it would work as if there was an implicit "trailer.X.key=X" configured. The name of the configuration value would specify the correct spelling. >> > Then there is the replacement by config "trailer.fix.key=Fixes" which >> > expands "fix" to "Fixes". This happens when using "--trailer 'fix = 123'" >> > which seems to be expected and useful behavior (albeit a bit unclear in >> > documentation). But it also happens when parsing incoming trailers, e.g >> > with that config >> > $ printf "\nFix: 1\n" | git interpret-trailers --parse >> > will emit: "Fixes: 1" > > Yeah, I think it allows for shortcuts and can help with standardizing > the keys in commit messages. Maybe what I'm missing is a clear picture of the different cases that "git interpret-trailers" is being used in. The "--trailer x=10" option seems clearly designed for human input trying to be as helpful as possible (e.g always allowing '=' regardless of separators configured). But when reading a message with trailers, is same helpfulness useful? Or is it always only reading proper trailers previously added by --trailer? The standardization of trailers is interesting. If I want to standardize "ReviewedBy", "Reviewer", "Code-Reviewer" trailers to "Reviewed-By" I would add these configs: trailer.reviewer.key = Reviewed-By trailer.ReviewedBy.key = Reviewed-By trailer.Code-Reviewer.key = Reviewed-By Seems useful (and works out of the box as a msg-filter to filter-branch). But the configuration seems a bit backwards, it is configured a 3 different trailer entries, rather that a single trailer entry with three aliases (or something like that). >> > (token_from_item prefers order .key, incoming token, .name) >> > >> > >> > The most surprising thing is that it uses prefix matching when finding >> > they key in configuration. If I have "trailer.reviewed.key=Reviewed-By" >> > it is possible to just '--trailer r=XYZ' and it will find the >> > reviewed-by trailer as "r" is a prefix of reviewedby. This also applies >> > to the "--parse". > > Yeah, that's also for shortcuts and standardization. > >> > This in makes it impossible to have trailer keys that >> > are prefix of each other (e.g: "Acked", "Acked-Tests", "Acked-Docs") if >> > there is multiple matching in configuration it will just pick the one >> > that happens to come first. > > That's a downside of the above. I agree that it might seem strange or > bad. Perhaps an option could be added to implement a strict matching, > if people really want it. > > Also if you configure trailers in the "Acked", "Acked-Tests", > "Acked-Docs" order, then any common prefix will pick "Acked" which > could be considered ok in my opinion. Yeah, that works. But that dependency on order of configuration is quite subtle imho. E.g consider: $ cat >msg <<EOF Acked: acked Acked-Test: test Acked-Docs: docs EOF $ git -c 'trailer.Acked.key=Acked' \ -c 'trailer.Acked-Tests.key=Acked-Tests' \ -c 'trailer.Acked-Docs.key=Acked-Docs' \ interpret-trailers --parse msg Acked: acked Acked-Tests: test Acked-Docs: docs $ git -c 'trailer.Acked-Docs.key=Acked-Docs' \ -c 'trailer.Acked-Tests.key=Acked-Tests' \ -c 'trailer.Acked.key=Acked' \ interpret-trailers --parse msg Acked-Docs: acked Acked-Tests: test Acked-Docs: docs $ git -c 'trailer.Acked-Tests.key=Acked-Tests' \ -c 'trailer.Acked-Docs.key=Acked-Docs' \ -c 'trailer.Acked.key=Acked' \ interpret-trailers --parse msg Acked-Tests: acked Acked-Tests: test Acked-Docs: docs > >> > (token_matches_item uses strncasecmp with token length) >> > >> > >> > I guess these are the questions for the above observations: >> > >> > * Should normalization of spelling happen at all? > > Yes, I think it can help. > >> > * If so should it only happen when there is a .key config? > > Yes, it can help too if that only happens when there is a .key config. > >> > * Should replacement to what is in .key happen also in --parse mode, or >> > only for "--trailer" > > I think it's more consistent if it happens in both --parse and > --trailer mode. I didn't implement --parse though. Keep in mind that the machinery used by interpret-trailers is also used by pretty '%(trailers)' so whatever normalization and shortcuts happening also shows up there. Is shortcuts useful in that case too? There the input is commit history, not some user input. E.g: $ git -c 'trailer.signed-off-by.key=Attest' \ show --pretty='format:%(trailers:key=Attest)' --no-patch \ 0172f7834a31ae7cb9873feaaaa63939352fa3ae Attest: Christian Couder <chriscool@tuxfamily.org> Attest: Junio C Hamano <gitster@pobox.com> There is also some inconsistency here. If one use '%(trailers) the normalization doesn't happen. Only if using '%(trailers:only)' or some other option. (because optimization in format_trailer_info: /* If we want the whole block untouched, we can take the fast path. */) I guess the fact that expansion happens also in these cases can get confusing if I have configured a trailer "Bug-Introduced-In" locally and the commit message contains "Bug: 123". 'git log --pretty=format:"%h% (trailers:key=Bug-Introduced-In,valueonly,separator=%x20)"' would pick up data from the wrong trailer just because I added configuration for Bug-Introduced-In trailer. >> > * The prefix matching gotta be a bug, right? > > No, it's a feature ;-) Seriously I agree that this could be seen as a > downside, but I think it can be understood that the convenience is > worth it. And in case someone is really annoyed by this, then adding > an option for strict matching should not be very difficult. > >> > Here is a patch to the tests showing these things. > > Thanks for the patch! I would be ok to add such a patch to the test > suite if it was sent like a regular patch (so with a commit message, a > Signed-off-by: and so on) to the mailing list. While at it some > documentation of the related behavior would also be very nice. Should I keep the "test_expect_failure" tests, or change into expecting current behavior and switch them over to "test_except_success"? I'll see if I can do something about documentation.
Junio C Hamano writes: > Christian Couder <christian.couder@gmail.com> writes: >> Yeah, in this case we are not sure if "Acked" or "aCKed" is the right >> way to spell it. > > OK, so in short, the trailer subsystem matches the second level of > the configuration variable name (e.g. "Acked") in a case insensitive > way From what I can understand it tries to match *both* on the second level AND the value of .key (trailers.c:token_matches_item) $ printf '\na: 1\nb: 2\nc: 3\n' | \ git -c 'trailer.A.key=B' interpret-trailers B: 1 B: 2 c: 3 and then uses the .key value when outputting the result (by calling trailer.c:token_from_item) I.e: it gets "a: 1", tries to find configuration for that, and finds trailer.A because "a" (case insenitively) matches conf.name, therefore it outputs value of trailer.A.key + separator + "1" then it gets "b: 1", and again finds trailer.A because "b" matches conf.key. > , and it does *not* normalize the case in the output. The .key > request is a mechanism to replace the matched key with the specified > string, so there is *NO* case normalization in what Anders observed. Hmm. Maybe the "matching" vs "outputting" makes it clearer. Given configuration trailer.<NAME>.key=<KEY> When printing a parsed trailer, e.g from pretty format "%(trailers)", "git interpret-trailers" passthrough of existing trailers or addition of a new trailer with --trailer: <KEY> is used in output. If <KEY> is not configured the trailer token is output the same way as it was in input. When finding a trailer, e.g for '--trailer x=y' or trailer.<NAME>.where=before/after: matching is done against both <NAME> and <KEY>. When showing a single trailer with pretty format '%(trailers:key=X)' it is matched against <KEY> only. (I guess one can see this as matching on the formatted output). > In other words, > > $ printf '\naCKed: Zz\n' | \ > git -c 'trailer.Acked.key=Rejected' interpret-trailers --parse > > would have emitted "Rejected: Zz". Indeed.
Anders Waldenborg <anders@0x63.nu> writes: > However I guess one alternative implementation would be that if > trailer.X.something is configured but not trailer.X.key it would work as > if there was an implicit "trailer.X.key=X" configured. The name of the > configuration value would specify the correct spelling. I had the same thought but then a question struck and stopped me: what should happen if "trailer.X.something" and "trailer.x.somethingelse" are both defined?
Jeff King writes: > On Mon, Jul 27, 2020 at 08:37:26PM +0200, Christian Couder wrote: > >> > > I noticed some undocumented and (at least to me) surprising behavior in >> > > trailers.c. >> > > >> > > When configuring a value in trailer.<token>.key it causes the trailer to >> > > be normalized to that in "git interpret-trailers --parse". >> > > E.g: >> > > $ printf '\naCKed: Zz\n' | \ >> > > git -c 'trailer.Acked.key=Acked' interpret-trailers --parse >> > > will emit: "Acked: Zz" >> >> Yeah, I think that's nice, as it can make sure that the key appears in >> the same way. It's true that it would be better if it would be >> documented. > > I'd note that this also happens without --parse. Right, and it also happens with "--only-input" (part of "--parse") "--only-input" is documented as follows: Output only trailers that exist in the input; do not add any from the command-line or by following configured trailer.* rules. [] > I don't recall being aware of this prefix matching until this thread, so > I doubt that the current behavior of --parse was something I tried for > intentionally. It's mostly just using the existing code, plus a few > extra options (listed in the docs). I'm not opposed to adding an option > to do strict matching and/or avoid rewriting, and then possibly adding > that into --parse by default. Would that option also be set for the parsing done by "%(trailers)" pretty format specifier? > I don't have much of an opinion on which behavior would be preferred. > I've never actually had a use case for configuring trailer.*.key, as I > usually am only looking at reading existing trailers to collect stats, > etc. I'm also mainly using in reading trailers (mostly with pretty format "%(trailers:key=x)") and then these convenience shortcuts doesn't really seem helpful, they rather add a small risk of mangling my data. Not that this has caused any problems for me in practice.
Anders Waldenborg <anders@0x63.nu> writes: > From what I can understand it tries to match *both* on the second level > AND the value of .key (trailers.c:token_matches_item) Yuck, I do not know what were we thinking to design the behaviour like *that*. Or it may be simply buggy. > $ printf '\na: 1\nb: 2\nc: 3\n' | \ > git -c 'trailer.A.key=B' interpret-trailers > B: 1 > B: 2 > c: 3 I can understand the first one (i.e. "trailer.$name.$var" try to match $name as case insensitively) but not the second one. There is not an single rule for "b" trailer, and we should be getting the same behaviour as the third line, i.e. the key not involved in rewriting is passed as-is.
Junio C Hamano writes: > I had the same thought but then a question struck and stopped me: > what should happen if "trailer.X.something" and > "trailer.x.somethingelse" are both defined? Good point. If we follow the same reasoning as with what happens with prefix matching (config order matters) the one that happens to be mentioned first in configuration wins. The same thing actually happens today with .key: trailer.foo.key=Hello trailer.bar.key=HELLO $ printf "\nHELLo: hi\n" | \ git -c "trailer.foo.key=Hello" -c "trailer.bar.key=HELLO" interpret-trailers --parse Hello: hi but if we swap order of config: $ printf "\nHELLo: hi\n" | \ git -c "trailer.bar.key=HELLO" -c "trailer.foo.key=Hello" interpret-trailers --parse HELLO: hi
On Tue, Jul 28, 2020 at 12:57:51AM +0200, Anders Waldenborg wrote: > >> > > When configuring a value in trailer.<token>.key it causes the trailer to > >> > > be normalized to that in "git interpret-trailers --parse". > >> > > E.g: > >> > > $ printf '\naCKed: Zz\n' | \ > >> > > git -c 'trailer.Acked.key=Acked' interpret-trailers --parse > >> > > will emit: "Acked: Zz" > >> > >> Yeah, I think that's nice, as it can make sure that the key appears in > >> the same way. It's true that it would be better if it would be > >> documented. > > > > I'd note that this also happens without --parse. > > Right, and it also happens with "--only-input" (part of "--parse") > > "--only-input" is documented as follows: > > Output only trailers that exist in the input; do not add any from the > command-line or by following configured trailer.* rules. I think I meant there only that we wouldn't add new trailers (e.g., from "trailers.*.ifMissing"). But I do agree that it might be simpler if we can just say "we don't look at trailer.<token>.* config at all in --only-input mode. I _think_ that's true (we probably do look at trailer.separators, but the rest of the token-specific ones look like they're all about writing or modifying, not reading). > > I don't recall being aware of this prefix matching until this thread, so > > I doubt that the current behavior of --parse was something I tried for > > intentionally. It's mostly just using the existing code, plus a few > > extra options (listed in the docs). I'm not opposed to adding an option > > to do strict matching and/or avoid rewriting, and then possibly adding > > that into --parse by default. > > Would that option also be set for the parsing done by "%(trailers)" > pretty format specifier? I thnk %(trailers) isn't quite the same as "--parse", because you have to say "only" or "unfold" yourself. But yeah, that option should certainly be available there, if not the default. > > I don't have much of an opinion on which behavior would be preferred. > > I've never actually had a use case for configuring trailer.*.key, as I > > usually am only looking at reading existing trailers to collect stats, > > etc. > > I'm also mainly using in reading trailers (mostly with pretty format > "%(trailers:key=x)") and then these convenience shortcuts doesn't really > seem helpful, they rather add a small risk of mangling my data. Not that > this has caused any problems for me in practice. Yeah, pondering it a bit more, I think trailer.<token>.* doesn't really make any sense for any reading operation (including %(trailers) or --parse). I guess it _could_ be useful to normalize names in some instances, but it's as likely to confuse or break somebody as to help. -Peff
Junio C Hamano writes: > Anders Waldenborg <anders@0x63.nu> writes: > >> From what I can understand it tries to match *both* on the second level >> AND the value of .key (trailers.c:token_matches_item) > > Yuck, I do not know what were we thinking to design the behaviour > like *that*. Or it may be simply buggy. > >> $ printf '\na: 1\nb: 2\nc: 3\n' | \ >> git -c 'trailer.A.key=B' interpret-trailers >> B: 1 >> B: 2 >> c: 3 > > I can understand the first one (i.e. "trailer.$name.$var" try to > match $name as case insensitively) but not the second one. There is > not an single rule for "b" trailer, and we should be getting the > same behaviour as the third line, i.e. the key not involved in > rewriting is passed as-is. I'm not so sure about that. Matching sometimes needs to happen on .key too. If this configuration is supposed to be useful (and as "shortcuts" has been mentioned before and is what tests does, I think it should be): trailer.rb.key=Reviewed-By trailer.rb.ifexists=addifdifferent then matching must look at key, not name. As the value of .key is what would have been written previously into the message. E.g: $ printf "\nReviewed-By: hi\n" | \ git -c "trailer.rb.key=Reviewed-By" \ -c "trailer.rb.ifexists=addifdifferent" \ interpret-trailers --trailer "rb=hi" Reviewed-By: hi The opposite case, matching on name in input message I'm not sure where it is useful. $ printf "\nrb: hi\n" | \ git -c "trailer.rb.key=Reviewed-By" \ -c "trailer.rb.ifexists=addifdifferent" \ interpret-trailers --trailer "rb=hi" Reviewed-By: hi The way I have understood it ".key" can be used for some different things: * Freeing up name to be used as a shortcut alias. * Defining the canonical capitalization when passing through trailers * Allowing specifying non default separator for that key. I wonder if those uses could be split up. So instead of this configuration: [trailer "rb"] key="Reviewed-By> " that does all three of those. The configuration would be: [trailer "Reviewed-By"] separator="> " canonicalize=true alias=rb that way the "name" part of the config always is the correct spelling and capitalization. "alias" could easily be a list of multiple alias if that is wanted. "alias" could be split into "inputalias" (matched against when reading trailers from stdin or a commit/tag message) and "optalias" (matched against when reading --trailer cmdline option, and %(trailers:key))
On Mon, Jul 27, 2020 at 11:41 PM Anders Waldenborg <anders@0x63.nu> wrote: > Christian Couder writes: [...] > >> > Then there is the replacement by config "trailer.fix.key=Fixes" which > >> > expands "fix" to "Fixes". This happens when using "--trailer 'fix = 123'" > >> > which seems to be expected and useful behavior (albeit a bit unclear in > >> > documentation). But it also happens when parsing incoming trailers, e.g > >> > with that config > >> > $ printf "\nFix: 1\n" | git interpret-trailers --parse > >> > will emit: "Fixes: 1" > > > > Yeah, I think it allows for shortcuts and can help with standardizing > > the keys in commit messages. > > Maybe what I'm missing is a clear picture of the different cases that > "git interpret-trailers" is being used in. The "--trailer x=10" option > seems clearly designed for human input trying to be as helpful as > possible (e.g always allowing '=' regardless of separators > configured). But when reading a message with trailers, is same > helpfulness useful? Or is it always only reading proper trailers > previously added by --trailer? I guess it depends on the purpose of reading a message with trailers. If you want to do stats for example, do you really want to consider "Reviewed", "Reviewer" and "Reviewed-by" as different trailers in your stats? > The standardization of trailers is interesting. If I want to standardize > "ReviewedBy", "Reviewer", "Code-Reviewer" trailers to "Reviewed-By" I > would add these configs: > > trailer.reviewer.key = Reviewed-By > trailer.ReviewedBy.key = Reviewed-By > trailer.Code-Reviewer.key = Reviewed-By > > Seems useful (and works out of the box as a msg-filter to > filter-branch). But the configuration seems a bit backwards, it is > configured a 3 different trailer entries, rather that a single trailer > entry with three aliases (or something like that). Maybe taking advantage of the shortcuts and using only the following could work: trailer.review.key = Reviewed-By trailer.code-review.key = Reviewed-By > >> > This in makes it impossible to have trailer keys that > >> > are prefix of each other (e.g: "Acked", "Acked-Tests", "Acked-Docs") if > >> > there is multiple matching in configuration it will just pick the one > >> > that happens to come first. > > > > That's a downside of the above. I agree that it might seem strange or > > bad. Perhaps an option could be added to implement a strict matching, > > if people really want it. > > > > Also if you configure trailers in the "Acked", "Acked-Tests", > > "Acked-Docs" order, then any common prefix will pick "Acked" which > > could be considered ok in my opinion. > > Yeah, that works. But that dependency on order of configuration is quite > subtle imho. [...] Yeah, I agree that documenting this would be nice. > >> > * Should replacement to what is in .key happen also in --parse mode, or > >> > only for "--trailer" > > > > I think it's more consistent if it happens in both --parse and > > --trailer mode. I didn't implement --parse though. > > Keep in mind that the machinery used by interpret-trailers is also used > by pretty '%(trailers)' so whatever normalization and shortcuts > happening also shows up there. Is shortcuts useful in that case too? > There the input is commit history, not some user input. Pretty %(trailer) was added later by someone else, but I guess it also depends on the use case. Is it going to be used for stats? Is it interesting to distinguish between very similar trailers in these stats? And again if people are interested in very strict processing, then adding an option for that could be the right thing to do. [...] > There is also some inconsistency here. If one use '%(trailers) the > normalization doesn't happen. Only if using '%(trailers:only)' or some > other option. > > (because optimization in format_trailer_info: > /* If we want the whole block untouched, we can take the fast path. */) Maybe that's a bug. Peff, it looks like you added the above comment. Do you think it's a bug? > I guess the fact that expansion happens also in these cases can get > confusing if I have configured a trailer "Bug-Introduced-In" locally and > the commit message contains "Bug: 123". > > 'git log --pretty=format:"%h% (trailers:key=Bug-Introduced-In,valueonly,separator=%x20)"' > would pick up data from the wrong trailer just because I added > configuration for Bug-Introduced-In trailer. Yeah, I understand that it could be an issue. > >> > Here is a patch to the tests showing these things. > > > > Thanks for the patch! I would be ok to add such a patch to the test > > suite if it was sent like a regular patch (so with a commit message, a > > Signed-off-by: and so on) to the mailing list. While at it some > > documentation of the related behavior would also be very nice. > > Should I keep the "test_expect_failure" tests, or change into expecting > current behavior and switch them over to "test_except_success"? I think you should switch them over to "test_except_success". > I'll see if I can do something about documentation. Thanks, Christian.
On Tue, Jul 28, 2020 at 09:07:18AM +0200, Christian Couder wrote: > > Maybe what I'm missing is a clear picture of the different cases that > > "git interpret-trailers" is being used in. The "--trailer x=10" option > > seems clearly designed for human input trying to be as helpful as > > possible (e.g always allowing '=' regardless of separators > > configured). But when reading a message with trailers, is same > > helpfulness useful? Or is it always only reading proper trailers > > previously added by --trailer? > > I guess it depends on the purpose of reading a message with trailers. > If you want to do stats for example, do you really want to consider > "Reviewed", "Reviewer" and "Reviewed-by" as different trailers in your > stats? My guess would be that yes, you probably would want them to be different. They _might_ be typos of each other, in which case normalizing is helpful. But they might well have totally different meanings. It will really depend on the project's use of the trailers, and I'd expect any stats-gathering to do that kind of normalization much later. I.e., use Git to reliably get "foo: bar" output from all of the commits, and then count them up in a perl script or whatever, lumping together like fields at that stage. It's inevitable that you'll have to do some data cleanup like that anyway. Lumping together prefixes isn't flexible enough to coverall cases. Using trailer.*.key to manually map names covers more, but again not all (e.g., if the project used to use "foo" but switched to "bar", but the syntax of the value fields also changed at the same time, you'd need to normalize those, too). So to me it boils down to: - returning the data as verbatim as possible when reading existing trailers would give the least surprise to most people - real data collection is going to involve a separate post-processing step which is better done in a full programming language > > There is also some inconsistency here. If one use '%(trailers) the > > normalization doesn't happen. Only if using '%(trailers:only)' or some > > other option. > > > > (because optimization in format_trailer_info: > > /* If we want the whole block untouched, we can take the fast path. */) > > Maybe that's a bug. Peff, it looks like you added the above comment. > Do you think it's a bug? The original %(trailers) was "dump the trailers block verbatim". But that's not super useful for parsing individual trailers. So "only" actually starts parsing the individual trailers. I'd argue that the %(trailers) behavior is correct, but that %(trailers:only) should probably be doing less (i.e., just parsing and reporting what it finds, but not changing any names). -Peff
Jeff King <peff@peff.net> writes: > It's inevitable that you'll have to do some data cleanup like that > anyway. Lumping together prefixes isn't flexible enough to coverall > cases. Using trailer.*.key to manually map names covers more, but again > not all (e.g., if the project used to use "foo" but switched to "bar", > but the syntax of the value fields also changed at the same time, you'd > need to normalize those, too). > > So to me it boils down to: > > - returning the data as verbatim as possible when reading existing > trailers would give the least surprise to most people > > - real data collection is going to involve a separate post-processing > step which is better done in a full programming language Yeah, I agree that these are good guidelines to use when we think about the feature. > The original %(trailers) was "dump the trailers block verbatim". But > that's not super useful for parsing individual trailers. So "only" > actually starts parsing the individual trailers. I'd argue that the > %(trailers) behavior is correct, but that %(trailers:only) should > probably be doing less (i.e., just parsing and reporting what it finds, > but not changing any names). Yes, sounds sensible.
diff --git a/t/t7513-interpret-trailers.sh b/t/t7513-interpret-trailers.sh index 2e6d406edf..d5d19cf89b 100755 --- a/t/t7513-interpret-trailers.sh +++ b/t/t7513-interpret-trailers.sh @@ -99,6 +99,64 @@ test_expect_success 'with config option on the command line' ' test_cmp expected actual ' +test_expect_success 'parse normalizes spelling and separators from configs with key' ' + cat >patch <<-\EOF && + non-trailer-line + + ReviEweD-bY :abc + ReviEwEd-bY) rst + ReviEweD-BY ; xyz + aCked-bY: not normalized + EOF + cat >expected <<-\EOF && + Reviewed-By: abc + Reviewed-By: rst + Reviewed-By: xyz + aCked-bY: not normalized + EOF + git \ + -c "trailer.separators=:);" \ + -c "trailer.rb.key=Reviewed-By" \ + -c "trailer.Acked-By.ifmissing=doNothing" \ + interpret-trailers --parse patch >actual && + test_cmp expected actual +' + +# Matching currently is prefix matching, causing "This-trailer" to be normalized too +test_expect_failure 'config option matches exact only' ' + cat >patch <<-\EOF && + + This-trailer: a + b + This-trailer-exact: b + c + This-trailer-exact-plus-some: c + d + EOF + cat >expected <<-\EOF && + This-trailer: a b + THIS-TRAILER-EXACT: b c + This-trailer-exact-plus-some: c d + EOF + git -c "trailer.tte.key=THIS-TRAILER-EXACT" interpret-trailers --parse patch >actual && + test_cmp expected actual +' + +# Matching currently uses the config key even if key value is different +test_expect_failure 'config option matches exact only' ' + cat >patch <<-\EOF && + + Ticket: 1234 + Reference-ticket: 99 + EOF + cat >expected <<-\EOF && + Ticket: 1234 + Reference-Ticket: 99 + EOF + git -c "trailer.ticket.key=Reference-Ticket" interpret-trailers --parse patch >actual && + test_cmp expected actual +' + test_expect_success 'with only a title in the message' ' cat >expected <<-\EOF && area: change @@ -473,6 +531,18 @@ test_expect_success 'with config setup' ' test_cmp expected actual ' +# currently this matches the "Acked-by: " value in ack key set by previous test +test_expect_failure 'with config setup matches key exactly' ' + cat >expected <<-\EOF && + + A: B + EOF + git interpret-trailers --trailer "A=10" empty >actual && + test_cmp expected actual +' + + + test_expect_success 'with config setup and ":=" as separators' ' git config trailer.separators ":=" && git config trailer.ack.key "Acked-by= " &&
I noticed some undocumented and (at least to me) surprising behavior in trailers.c. When configuring a value in trailer.<token>.key it causes the trailer to be normalized to that in "git interpret-trailers --parse". E.g: $ printf '\naCKed: Zz\n' | \ git -c 'trailer.Acked.key=Acked' interpret-trailers --parse will emit: "Acked: Zz" but only if "key" is used, other config options doesn't cause it to be normalized. E.g: $ printf '\naCKed: Zz\n' | \ git -c 'trailer.Acked.ifmissing=doNothing' interpret-trailers --parse will emit: "aCKed: Zz" (still lowercase a and uppercase CK) Then there is the replacement by config "trailer.fix.key=Fixes" which expands "fix" to "Fixes". This happens when using "--trailer 'fix = 123'" which seems to be expected and useful behavior (albeit a bit unclear in documentation). But it also happens when parsing incoming trailers, e.g with that config $ printf "\nFix: 1\n" | git interpret-trailers --parse will emit: "Fixes: 1" (token_from_item prefers order .key, incoming token, .name) The most surprising thing is that it uses prefix matching when finding they key in configuration. If I have "trailer.reviewed.key=Reviewed-By" it is possible to just '--trailer r=XYZ' and it will find the reviewed-by trailer as "r" is a prefix of reviewedby. This also applies to the "--parse". This in makes it impossible to have trailer keys that are prefix of each other (e.g: "Acked", "Acked-Tests", "Acked-Docs") if there is multiple matching in configuration it will just pick the one that happens to come first. (token_matches_item uses strncasecmp with token length) I guess these are the questions for the above observations: * Should normalization of spelling happen at all? * If so should it only happen when there is a .key config? * Should replacement to what is in .key happen also in --parse mode, or only for "--trailer" * The prefix matching gotta be a bug, right? Here is a patch to the tests showing these things. From 49a4bb64a7ebf1f2d50897a024deb86b4f8056b1 Mon Sep 17 00:00:00 2001 From: Anders Waldenborg <anders@0x63.nu> Date: Mon, 27 Jul 2020 18:34:37 +0200 Subject: [PATCH] trailers: add tests for unclear/undocumented behavior --- t/t7513-interpret-trailers.sh | 70 +++++++++++++++++++++++++++++++++++ 1 file changed, 70 insertions(+) -- 2.25.1