Message ID | pull.1322.git.git.1661992197.gitgitgadget@gmail.com (mailing list archive) |
---|---|
Headers | show |
Series | make test "linting" more comprehensive | expand |
On Thu, Sep 01, 2022 at 12:29:38AM +0000, Eric Sunshine via GitGitGadget wrote: > A while back, Peff successfully nerd-sniped[1] me into tackling a > long-brewing idea I had about (possibly) improving "chainlint" performance Oops, sorry. :) I gave this a read-through, and it looks sensible overall. I have to admit that I did not carefully check all of your regexes. Given the relatively low stakes of the code (as an internal build-time tool only) and the set of tests accompanying it, I'm willing to assume it's good enough until we see counter-examples. I posted some timings and thoughts on the use of threads elsewhere. But in the end the timings are close enough that I don't care that much either way. I'd also note that I got some first-hand experience with the script as I merged it with all of my other long-brewing topics, and it found a half dozen spots, mostly LOOP annotations. At least one was a real "oops, we'd miss a bug in Git here" spot. Several were "we'd probably notice the problem because the loop output wouldn't be as expected". One was a "we're on the left-hand of a pipe, so the exit code doesn't matter anyway" case, but I am more than happy to fix those if it lets us be linter-clean. The output took me a minute to adjust to, just because it feels pretty jumbled when there are several cases. Mostly this is because the script eats indentation. So it's hard to see the "# chainlint:" comment starts, let alone the ?! annotations. Here's an example: -- >8 -- # chainlint: t4070-diff-pairs.sh # chainlint: split input across multiple diff-pairs write_script split-raw-diff "$PERL_PATH" <<-EOF && git diff-tree -p -M -C -C base new > expect && git diff-tree -r -z -M -C -C base new | ./split-raw-diff && for i in diff* ; do git diff-pairs -p < $i ?!LOOP?! done > actual && test_cmp expect actual # chainlint: perf/p5305-pack-limits.sh # chainlint: set up delta islands head=$(git rev-parse HEAD) && git for-each-ref --format="delete %(refname)" | git update-ref --no-deref --stdin && n=0 && fork=0 && git rev-list --first-parent $head | while read commit ; do n=$((n+1)) ?!AMP?! if test "$n" = 100 ; then echo "create refs/forks/$fork/master $commit" ?!AMP?! fork=$((fork+1)) ?!AMP?! n=0 fi ?!LOOP?! done | git update-ref --stdin && git config pack.island "refs/forks/([0-9]*)/" -- 8< -- It wasn't too bad once I got the hang of it, but I wonder if a user writing a single test for the first time may get a bit overwhelmed. I assume that the indentation is removed as part of the normalization (I notice extra whitespace around "<", too). That might be hard to address. I wonder if color output for "# chainlint" and "?!" annotations would help, too. It looks like that may be tricky, though, because the annotations re-parsed internally in some cases. -Peff
On Sun, Sep 11, 2022 at 1:28 AM Jeff King <peff@peff.net> wrote: > On Thu, Sep 01, 2022 at 12:29:38AM +0000, Eric Sunshine via GitGitGadget wrote: > > A while back, Peff successfully nerd-sniped[1] me into tackling a > > long-brewing idea I had about (possibly) improving "chainlint" performance > > I gave this a read-through, and it looks sensible overall. I have to > admit that I did not carefully check all of your regexes. Given the > relatively low stakes of the code (as an internal build-time tool only) > and the set of tests accompanying it, I'm willing to assume it's good > enough until we see counter-examples. Thanks for the feedback. > I posted some timings and thoughts on the use of threads elsewhere. But > in the end the timings are close enough that I don't care that much > either way. I ran my eye over that message quickly and have been meaning to dig into it and give it a proper response but haven't yet found the time. > I'd also note that I got some first-hand experience with the script as I > merged it with all of my other long-brewing topics, and it found a half > dozen spots, mostly LOOP annotations. At least one was a real "oops, > we'd miss a bug in Git here" spot. Several were "we'd probably notice > the problem because the loop output wouldn't be as expected". One was a > "we're on the left-hand of a pipe, so the exit code doesn't matter > anyway" case, but I am more than happy to fix those if it lets us be > linter-clean. Indeed, I'm not super happy about the linter complaining about cases which obviously can't have an impact on the test's outcome, but (as mentioned elsewhere in the thread), finally convinced myself that the relatively low number of these was outweighed by the quite large number of cases caught by the linter which could have let real problems slip though. Perhaps some day the linter can be made smarter about these cases. > The output took me a minute to adjust to, just because it feels pretty > jumbled when there are several cases. Mostly this is because the > script eats indentation. So it's hard to see the "# chainlint:" comment > starts, let alone the ?! annotations. Here's an example: > [...snip...] > It wasn't too bad once I got the hang of it, but I wonder if a user > writing a single test for the first time may get a bit overwhelmed. I > assume that the indentation is removed as part of the normalization (I > notice extra whitespace around "<", too). That might be hard to address. The script implements a proper parser and lexer, and the lexer is tokenizing the input (throwing away whitespace in the process), thus by the time the parser notices something to complain about with a "?!FOO?!" annotation, the original whitespace is long gone, and it just emits the token stream with "?!FOO?!" inserted at the correct place. In retrospect, the way this perhaps should have been done would have been for the parser to instruct the lexer to emit a "?!FOO?!" annotation at the appropriate point in the input stream. But even that might get a bit hairy since there are cases in which the parser back-patches by removing some "?!AMP?!" annotations when it has decided that it doesn't need to complain about &&-chain breakage. I'm sure it's fixable, but don't know how important it is at this point. > I wonder if color output for "# chainlint" and "?!" annotations would > help, too. It looks like that may be tricky, though, because the > annotations re-parsed internally in some cases. I had the exact same thought about coloring the "# chainlint:" lines and "?!FOO?!" annotations, and how helpful that could be to anyone (not just newcomers). Aside from not having much free time these days, a big reason I didn't tackle it was because doing so properly probably means relying upon some third-party Perl module, and I intentionally wanted to keep the linter independent of add-on modules. Even without a "coloring" module of some sort, if Perl had a standard `curses` module (which it doesn't), then it would have been easy enough to ask `curses` for the proper color codes and apply them as needed. I'm old-school, so it doesn't appeal to me, but an alternative would be to assume it's safe to use ANSI color codes, but even that may have to be done carefully (i.e. checking TERM and accepting only some whitelisted entries, and worrying about about Windows consoles).
On Sun, Sep 11, 2022 at 03:01:41AM -0400, Eric Sunshine wrote: > > I wonder if color output for "# chainlint" and "?!" annotations would > > help, too. It looks like that may be tricky, though, because the > > annotations re-parsed internally in some cases. > > I had the exact same thought about coloring the "# chainlint:" lines > and "?!FOO?!" annotations, and how helpful that could be to anyone > (not just newcomers). Aside from not having much free time these days, > a big reason I didn't tackle it was because doing so properly probably > means relying upon some third-party Perl module, and I intentionally > wanted to keep the linter independent of add-on modules. Even without > a "coloring" module of some sort, if Perl had a standard `curses` > module (which it doesn't), then it would have been easy enough to ask > `curses` for the proper color codes and apply them as needed. I'm > old-school, so it doesn't appeal to me, but an alternative would be to > assume it's safe to use ANSI color codes, but even that may have to be > done carefully (i.e. checking TERM and accepting only some whitelisted > entries, and worrying about about Windows consoles). We're pretty happy to just use ANSI in the rest of Git, but there is a complication on Windows. See compat/winansi.c where we decode those internally into SetConsoleTextAttribute() calls. I think we can live with it as-is for now and see how people react. If lots of people are getting confused by the output, then that motivates finding a solution. If not, then it's probably not worth the time. -Peff
On Sun, Sep 11, 2022 at 2:31 PM Jeff King <peff@peff.net> wrote: > On Sun, Sep 11, 2022 at 03:01:41AM -0400, Eric Sunshine wrote: > > > I wonder if color output for "# chainlint" and "?!" annotations would > > > help, too. It looks like that may be tricky, though, because the > > > annotations re-parsed internally in some cases. > > > > I had the exact same thought about coloring the "# chainlint:" lines > > and "?!FOO?!" annotations, and how helpful that could be to anyone > > (not just newcomers). Aside from not having much free time these days, > > a big reason I didn't tackle it was because doing so properly probably > > means relying upon some third-party Perl module, and I intentionally > > wanted to keep the linter independent of add-on modules. Even without > > a "coloring" module of some sort, if Perl had a standard `curses` > > module (which it doesn't), then it would have been easy enough to ask > > `curses` for the proper color codes and apply them as needed. I'm > > old-school, so it doesn't appeal to me, but an alternative would be to > > assume it's safe to use ANSI color codes, but even that may have to be > > done carefully (i.e. checking TERM and accepting only some whitelisted > > entries, and worrying about about Windows consoles). > > We're pretty happy to just use ANSI in the rest of Git, but there is a > complication on Windows. See compat/winansi.c where we decode those > internally into SetConsoleTextAttribute() calls. > > I think we can live with it as-is for now and see how people react. If > lots of people are getting confused by the output, then that motivates > finding a solution. If not, then it's probably not worth the time. Well, you nerd-sniped me anyhow. The result is at [1]. Following the example of t/test-lib.sh, it uses `tput` if available to avoid hardcoding color codes, and `tput` is invoked lazily, only if it detects problems in the tests, so a normal (non-problematic) run doesn't incur the overhead of shelling out to `tput`. My first attempt just assumed ANSI color codes, but then I discovered the precedence set by t/test-lib.sh of using `tput`, so I went with that (since I'm old-school). The ANSI-only version was, of course, much simpler. [1]: https://lore.kernel.org/git/pull.1324.git.git.1663023888412.gitgitgadget@gmail.com/
On Mon, Sep 12, 2022 at 07:17:12PM -0400, Eric Sunshine wrote: > > I think we can live with it as-is for now and see how people react. If > > lots of people are getting confused by the output, then that motivates > > finding a solution. If not, then it's probably not worth the time. > > Well, you nerd-sniped me anyhow. The result is at [1]. Following the It seems we've discovered my true talent. :) > example of t/test-lib.sh, it uses `tput` if available to avoid > hardcoding color codes, and `tput` is invoked lazily, only if it > detects problems in the tests, so a normal (non-problematic) run > doesn't incur the overhead of shelling out to `tput`. Ah, of course. I didn't think about the fact that the regular tests already had to deal with this problem. Following that lead makes perfect sense. -Peff