Message ID | 20240324214316.917513-1-brianmlyles@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | pretty: find pretty formats case-insensitively | expand |
On Sun, Mar 24, 2024 at 04:43:09PM -0500, Brian Lyles wrote: > User-defined pretty formats are stored in config, which is meant to use > case-insensitive matching for names as noted in config.txt's 'Syntax' > section: > > All the other lines [...] are recognized as setting variables, in > the form 'name = value' [...]. The variable names are > case-insensitive, [...]. > > When a user specifies one of their format aliases with an uppercase in > it, however, it is not found. > > $ git config pretty.testAlias %h > $ git config --list | grep pretty > pretty.testalias=%h > $ git log --format=testAlias -1 > fatal: invalid --pretty format: testAlias > $ git log --format=testalias -1 > 3c2a3fdc38 Yeah, I agree that case-insensitive matching makes more sense here due to the nature of config keys, especially given this: > This is true whether the name in the config file uses any uppercase > characters or not. I.e., the config code is going to normalize the variable names already, so we must match (even if the user consistently specifies camelCase). But... > static struct cmt_fmt_map *find_commit_format(const char *sought) > { > + struct cmt_fmt_map *result; > + char *sought_lower; > + > if (!commit_formats) > setup_commit_formats(); > > - return find_commit_format_recursive(sought, sought, 0); > + /* > + * The sought name will be compared to config names that have already > + * been normalized to lowercase. > + */ > + sought_lower = xstrdup_tolower(sought); > + result = find_commit_format_recursive(sought_lower, sought_lower, 0); > + free(sought_lower); > + return result; > } The mention of "recursive" in the function we call made me what wonder if we'd need more normalization. And I think we do. Try this modification to your test: diff --git a/t/t4205-log-pretty-formats.sh b/t/t4205-log-pretty-formats.sh index 321e305979..be549b1d4b 100755 --- a/t/t4205-log-pretty-formats.sh +++ b/t/t4205-log-pretty-formats.sh @@ -61,8 +61,9 @@ test_expect_success 'alias user-defined format' ' test_expect_success 'alias user-defined format is matched case-insensitively' ' git log --pretty="format:%h" >expected && - git config pretty.testalias "format:%h" && - git log --pretty=testAlias >actual && + git config pretty.testone "format:%h" && + git config pretty.testtwo testOne && + git log --pretty=testTwo >actual && test_cmp expected actual ' which fails because looking up "testOne" in the recursion won't work. So I think we'd want to simply match case-insensitively inside the function, like: diff --git a/pretty.c b/pretty.c index 50825c9d25..10f71ee004 100644 --- a/pretty.c +++ b/pretty.c @@ -147,7 +147,7 @@ static struct cmt_fmt_map *find_commit_format_recursive(const char *sought, for (i = 0; i < commit_formats_len; i++) { size_t match_len; - if (!starts_with(commit_formats[i].name, sought)) + if (!istarts_with(commit_formats[i].name, sought)) continue; match_len = strlen(commit_formats[i].name); And then you would not even need to normalize it in find_commit_format(). > +test_expect_success 'alias user-defined format is matched case-insensitively' ' > + git log --pretty="format:%h" >expected && > + git config pretty.testalias "format:%h" && > + git log --pretty=testAlias >actual && > + test_cmp expected actual > +' Modern style would be to use "test_config" here (or just "git -c"), but I see the surrounding tests are too old to do so. So I'd be OK with matching them (but cleaning up all of the surrounding ones would be nice, too). -Peff PS The matching rules in find_commit_format_recursive() seem weird to me. We do a prefix match, and then return the entry whose name is the shortest? And break ties based on which came first? So: git -c pretty.abcd=format:one \ -c pretty.abc=format:two \ -c pretty.abd=format:three \ log -1 --format=ab quietly chooses "two". I guess the "shortest wins" is meant to allow "foo" to be chosen over "foobar" if you specify the whole name. But the fact that we don't flag an ambiguity between "abc" and "abd" seems strange. That is all orthogonal to your patch, of course, but just a head-scratcher I noticed while looking at the code.
Hi Peff Thanks for the review. On Mon, Mar 25, 2024 at 1:14 AM Jeff King <peff@peff.net> wrote: > The mention of "recursive" in the function we call made me what wonder > if we'd need more normalization. And I think we do. Try this > modification to your test: > > diff --git a/t/t4205-log-pretty-formats.sh b/t/t4205-log-pretty-formats.sh > index 321e305979..be549b1d4b 100755 > --- a/t/t4205-log-pretty-formats.sh > +++ b/t/t4205-log-pretty-formats.sh > @@ -61,8 +61,9 @@ test_expect_success 'alias user-defined format' ' > > test_expect_success 'alias user-defined format is matched case-insensitively' ' > git log --pretty="format:%h" >expected && > - git config pretty.testalias "format:%h" && > - git log --pretty=testAlias >actual && > + git config pretty.testone "format:%h" && > + git config pretty.testtwo testOne && > + git log --pretty=testTwo >actual && > test_cmp expected actual > ' > > > which fails because looking up "testOne" in the recursion won't work. So > I think we'd want to simply match case-insensitively inside the > function, like: > > diff --git a/pretty.c b/pretty.c > index 50825c9d25..10f71ee004 100644 > --- a/pretty.c > +++ b/pretty.c > @@ -147,7 +147,7 @@ static struct cmt_fmt_map *find_commit_format_recursive(const char *sought, > for (i = 0; i < commit_formats_len; i++) { > size_t match_len; > > - if (!starts_with(commit_formats[i].name, sought)) > + if (!istarts_with(commit_formats[i].name, sought)) > continue; > > match_len = strlen(commit_formats[i].name); > > And then you would not even need to normalize it in > find_commit_format(). Good catch -- you're absolutely right, and simply switching to `istarts_with` is a more elegant solution than my initial patch. I'll switch to this approach in a v2 re-roll. >> +test_expect_success 'alias user-defined format is matched case-insensitively' ' >> + git log --pretty="format:%h" >expected && >> + git config pretty.testalias "format:%h" && >> + git log --pretty=testAlias >actual && >> + test_cmp expected actual >> +' > > Modern style would be to use "test_config" here (or just "git -c"), but > I see the surrounding tests are too old to do so. So I'd be OK with > matching them (but cleaning up all of the surrounding ones would be > nice, too). Thanks for the tip. Updating the existing tests in this file to use `test_config` looks to be fairly trivial, so I will start v2 with a patch that does that as well. I'm opting for `test_config` over `git -c` for no real reason other than they seem roughly equivalent, but `test_config` still ends up calling `git config` which seems slightly more realistic to how pretty formats would be defined normally. > PS The matching rules in find_commit_format_recursive() seem weird > to me. We do a prefix match, and then return the entry whose name is > the shortest? And break ties based on which came first? So: > > git -c pretty.abcd=format:one \ > -c pretty.abc=format:two \ > -c pretty.abd=format:three \ > log -1 --format=ab > > quietly chooses "two". I guess the "shortest wins" is meant to allow > "foo" to be chosen over "foobar" if you specify the whole name. But > the fact that we don't flag an ambiguity between "abc" and "abd" > seems strange. > > That is all orthogonal to your patch, of course, but just a > head-scratcher I noticed while looking at the code. I agree that this behavior is somewhat odd. I'm not sure what we would want to do about it at this point -- any change would technically be breaking, I assume. Regardless, not something I'd scope into this patch, but good observation.
Jeff King <peff@peff.net> writes: > The mention of "recursive" in the function we call made me what wonder > if we'd need more normalization. And I think we do. Try this > modification to your test: > > diff --git a/t/t4205-log-pretty-formats.sh b/t/t4205-log-pretty-formats.sh > index 321e305979..be549b1d4b 100755 > --- a/t/t4205-log-pretty-formats.sh > +++ b/t/t4205-log-pretty-formats.sh > @@ -61,8 +61,9 @@ test_expect_success 'alias user-defined format' ' > > test_expect_success 'alias user-defined format is matched case-insensitively' ' > git log --pretty="format:%h" >expected && > - git config pretty.testalias "format:%h" && > - git log --pretty=testAlias >actual && > + git config pretty.testone "format:%h" && > + git config pretty.testtwo testOne && > + git log --pretty=testTwo >actual && > test_cmp expected actual > ' Very good thinking. I totally missed the short-cut to another short-cut while reading the patch. >> +test_expect_success 'alias user-defined format is matched case-insensitively' ' >> + git log --pretty="format:%h" >expected && >> + git config pretty.testalias "format:%h" && >> + git log --pretty=testAlias >actual && >> + test_cmp expected actual >> +' > > Modern style would be to use "test_config" here (or just "git -c"), but > I see the surrounding tests are too old to do so. So I'd be OK with > matching them (but cleaning up all of the surrounding ones would be > nice, too). Yup. I do not mind seeing it done either way, as a preliminary clean-up before the main fix, just a fix with more modern style while leaving the clean-up as #leftoverbits to be done after the dust settles. > PS The matching rules in find_commit_format_recursive() seem weird > to me. We do a prefix match, and then return the entry whose name is > the shortest? And break ties based on which came first? So: > > git -c pretty.abcd=format:one \ > -c pretty.abc=format:two \ > -c pretty.abd=format:three \ > log -1 --format=ab > > quietly chooses "two". I guess the "shortest wins" is meant to allow > "foo" to be chosen over "foobar" if you specify the whole name. But > the fact that we don't flag an ambiguity between "abc" and "abd" > seems strange. > That is all orthogonal to your patch, of course, but just a > head-scratcher I noticed while looking at the code. I think it is not just strange but outright wrong. I agree that it is orthogonal to this fix. Thanks, both.
diff --git a/pretty.c b/pretty.c index cf964b060c..78ec7a75ff 100644 --- a/pretty.c +++ b/pretty.c @@ -168,10 +168,20 @@ static struct cmt_fmt_map *find_commit_format_recursive(const char *sought, static struct cmt_fmt_map *find_commit_format(const char *sought) { + struct cmt_fmt_map *result; + char *sought_lower; + if (!commit_formats) setup_commit_formats(); - return find_commit_format_recursive(sought, sought, 0); + /* + * The sought name will be compared to config names that have already + * been normalized to lowercase. + */ + sought_lower = xstrdup_tolower(sought); + result = find_commit_format_recursive(sought_lower, sought_lower, 0); + free(sought_lower); + return result; } void get_commit_format(const char *arg, struct rev_info *rev) diff --git a/t/t4205-log-pretty-formats.sh b/t/t4205-log-pretty-formats.sh index e3d655e6b8..321e305979 100755 --- a/t/t4205-log-pretty-formats.sh +++ b/t/t4205-log-pretty-formats.sh @@ -59,6 +59,13 @@ test_expect_success 'alias user-defined format' ' test_cmp expected actual ' +test_expect_success 'alias user-defined format is matched case-insensitively' ' + git log --pretty="format:%h" >expected && + git config pretty.testalias "format:%h" && + git log --pretty=testAlias >actual && + test_cmp expected actual +' + test_expect_success 'alias user-defined tformat with %s (ISO8859-1 encoding)' ' git config i18n.logOutputEncoding $test_encoding && git log --oneline >expected-s &&
User-defined pretty formats are stored in config, which is meant to use case-insensitive matching for names as noted in config.txt's 'Syntax' section: All the other lines [...] are recognized as setting variables, in the form 'name = value' [...]. The variable names are case-insensitive, [...]. When a user specifies one of their format aliases with an uppercase in it, however, it is not found. $ git config pretty.testAlias %h $ git config --list | grep pretty pretty.testalias=%h $ git log --format=testAlias -1 fatal: invalid --pretty format: testAlias $ git log --format=testalias -1 3c2a3fdc38 This is true whether the name in the config file uses any uppercase characters or not. Normalize the format name specified via `--format` to lowercase so that format aliases are found case-insensitively. The format aliases loaded from config against which this name is compared are already normalized to lowercase since they are loaded through `git_config()`. `xstrdup_tolower` is used instead of modifying the string in-place to ensure that the error shown to the user when the format is not found has the same casing that the user entered. Otherwise, the mismatch may be confusing to the user. Signed-off-by: Brian Lyles <brianmlyles@gmail.com> --- pretty.c | 12 +++++++++++- t/t4205-log-pretty-formats.sh | 7 +++++++ 2 files changed, 18 insertions(+), 1 deletion(-)