mbox series

[00/18] make test "linting" more comprehensive

Message ID pull.1322.git.git.1661992197.gitgitgadget@gmail.com (mailing list archive)
Headers show
Series make test "linting" more comprehensive | expand

Message

Jean-Noël Avila via GitGitGadget Sept. 1, 2022, 12:29 a.m. UTC
A while back, Peff successfully nerd-sniped[1] me into tackling a
long-brewing idea I had about (possibly) improving "chainlint" performance
by linting all tests in all scripts with a single command invocation instead
of running "sed" 26800+ times (once for each test). The new linter
introduced by this series can check all test definitions in the entire
project in a single invocation, and each test definition is checked only
once no matter how many times the test is actually run (unlike chainlint.sed
which will check a test repeatedly if, for instance, the test is run in a
loop). Moreover, all test definitions in the project are "linted" even if
some of those tests would not run on a particular platform or under a
certain configuration (unlike chainlint.sed which only lints tests which
actually run).

The new linter is a good deal smarter than chainlint.sed and understands not
just shell syntax but also some semantics of test construction, unlike
chainlint.sed which is merely heuristics-based. For instance, the new linter
recognizes cases when a broken &&-chain is legitimate, such as when "$?" is
handled explicitly or when a failure is signaled directly with "false", in
which case the &&-chain leading up to the "false" is immaterial, as well as
other cases. Unlike chainlint.sed, it recognizes that a semicolon after the
last command in a compound statement is harmless, thus won't interpret the
semicolon as breaking the &&-chain.

The new linter also provides considerably better coverage for broken
&&-chains. The "magic exit code 117" &&-chain checker built into test-lib.sh
only works for top-level command invocations; it doesn't work within "{...}"
groups, "(...)" subshells, "$(...)" substitutions, or within bodies of
compound statements, such as "if", "for", "while", "case", etc.
chainlint.sed partly fills the gap by catching broken &&-chains in "(...)"
subshells one level deep, but bugs can still lurk behind broken &&-chains in
the other cases. The new linter catches broken &&-chains within all those
constructs to any depth.

Another important improvement is that the new linter understands that shell
loops do not terminate automatically when a command in the loop body fails,
and that the condition needs to be handled explicitly by the test author by
using "|| return 1" (or "|| exit 1" in a subshell) to signal failure.
Consequently, the new linter will complain when a loop is lacking "|| return
1" (or "|| exit 1").

Finally, unlike chainlint.sed which (not surprisingly) is implemented in
"sed", the new linter is written in Perl, thus should be more accessible to
a wider audience, and is structured as a traditional top-down parser which
makes it much easier to reason about.

The new linter could eventually subsume other linting tasks such as
check-nonportable-shell.pl (which itself takes a couple seconds to run on my
machine), though it probably should be renamed to something other than
"chainlint" since it is no longer targeted only at spotting &&-chain breaks,
but that can wait for another day.

Ævar offered some sensible comments[2,3] about optimizing the Makefile rules
related to chainlint, but those optimizations are not tackled here for a few
reasons: (1) this series is already quite long, (2) I'd like to keep the
series focused on its primary goal of installing a new and improved linter,
(3) these patches do not make the Makefile situation any worse[4], and (4)
those optimizations can easily be done atop this series[5].

Junio: This series is nominally atop es/t4301-sed-portability-fix which is
in "next", and es/fix-chained-tests, es/test-chain-lint, and es/chainlint,
all of which are already in "master".

Dscho: This series conflicts with some patches carried only by the Git for
Windows project; the resolutions are obvious and simple. The new linter also
identifies some problems in tests carried only by the Git for Windows
project.

[1] https://lore.kernel.org/git/YJzGcZpZ+E9R0gYd@coredump.intra.peff.net/
[2]
https://lore.kernel.org/git/RFC-patch-1.1-bb3f1577829-20211213T095456Z-avarab@gmail.com/
[3] https://lore.kernel.org/git/211213.86tufc8oop.gmgdl@evledraar.gmail.com/
[4]
https://lore.kernel.org/git/CAPig+cSFtpt6ExbVDbcx3tZodrKFuM-r2GMW4TQ2tJmLvHBFtQ@mail.gmail.com/
[5] https://lore.kernel.org/git/211214.86tufbbbu3.gmgdl@evledraar.gmail.com/

Eric Sunshine (18):
  t: add skeleton chainlint.pl
  chainlint.pl: add POSIX shell lexical analyzer
  chainlint.pl: add POSIX shell parser
  chainlint.pl: add parser to validate tests
  chainlint.pl: add parser to identify test definitions
  chainlint.pl: validate test scripts in parallel
  chainlint.pl: don't require `return|exit|continue` to end with `&&`
  t/Makefile: apply chainlint.pl to existing self-tests
  chainlint.pl: don't require `&` background command to end with `&&`
  chainlint.pl: don't flag broken &&-chain if `$?` handled explicitly
  chainlint.pl: don't flag broken &&-chain if failure indicated
    explicitly
  chainlint.pl: complain about loops lacking explicit failure handling
  chainlint.pl: allow `|| echo` to signal failure upstream of a pipe
  t/chainlint: add more chainlint.pl self-tests
  test-lib: retire "lint harder" optimization hack
  test-lib: replace chainlint.sed with chainlint.pl
  t/Makefile: teach `make test` and `make prove` to run chainlint.pl
  t: retire unused chainlint.sed

 contrib/buildsystems/CMakeLists.txt           |   2 +-
 t/Makefile                                    |  49 +-
 t/README                                      |   5 -
 t/chainlint.pl                                | 730 ++++++++++++++++++
 t/chainlint.sed                               | 399 ----------
 t/chainlint/blank-line-before-esac.expect     |  18 +
 t/chainlint/blank-line-before-esac.test       |  19 +
 t/chainlint/block.expect                      |  15 +-
 t/chainlint/block.test                        |  15 +-
 t/chainlint/chain-break-background.expect     |   9 +
 t/chainlint/chain-break-background.test       |  10 +
 t/chainlint/chain-break-continue.expect       |  12 +
 t/chainlint/chain-break-continue.test         |  13 +
 t/chainlint/chain-break-false.expect          |   9 +
 t/chainlint/chain-break-false.test            |  10 +
 t/chainlint/chain-break-return-exit.expect    |  19 +
 t/chainlint/chain-break-return-exit.test      |  23 +
 t/chainlint/chain-break-status.expect         |   9 +
 t/chainlint/chain-break-status.test           |  11 +
 t/chainlint/chained-block.expect              |   9 +
 t/chainlint/chained-block.test                |  11 +
 t/chainlint/chained-subshell.expect           |  10 +
 t/chainlint/chained-subshell.test             |  13 +
 .../command-substitution-subsubshell.expect   |   2 +
 .../command-substitution-subsubshell.test     |   3 +
 t/chainlint/complex-if-in-cuddled-loop.expect |   2 +-
 t/chainlint/double-here-doc.expect            |   2 +
 t/chainlint/double-here-doc.test              |  12 +
 t/chainlint/dqstring-line-splice.expect       |   3 +
 t/chainlint/dqstring-line-splice.test         |   7 +
 t/chainlint/dqstring-no-interpolate.expect    |  11 +
 t/chainlint/dqstring-no-interpolate.test      |  15 +
 t/chainlint/empty-here-doc.expect             |   3 +
 t/chainlint/empty-here-doc.test               |   5 +
 t/chainlint/exclamation.expect                |   4 +
 t/chainlint/exclamation.test                  |   8 +
 t/chainlint/for-loop-abbreviated.expect       |   5 +
 t/chainlint/for-loop-abbreviated.test         |   6 +
 t/chainlint/for-loop.expect                   |   4 +-
 t/chainlint/function.expect                   |  11 +
 t/chainlint/function.test                     |  13 +
 t/chainlint/here-doc-indent-operator.expect   |   5 +
 t/chainlint/here-doc-indent-operator.test     |  13 +
 t/chainlint/here-doc-multi-line-string.expect |   3 +-
 t/chainlint/if-condition-split.expect         |   7 +
 t/chainlint/if-condition-split.test           |   8 +
 t/chainlint/if-in-loop.expect                 |   2 +-
 t/chainlint/if-in-loop.test                   |   2 +-
 t/chainlint/loop-detect-failure.expect        |  15 +
 t/chainlint/loop-detect-failure.test          |  17 +
 t/chainlint/loop-detect-status.expect         |  18 +
 t/chainlint/loop-detect-status.test           |  19 +
 t/chainlint/loop-in-if.expect                 |   2 +-
 t/chainlint/loop-upstream-pipe.expect         |  10 +
 t/chainlint/loop-upstream-pipe.test           |  11 +
 t/chainlint/multi-line-string.expect          |  11 +-
 t/chainlint/nested-loop-detect-failure.expect |  31 +
 t/chainlint/nested-loop-detect-failure.test   |  35 +
 t/chainlint/nested-subshell.expect            |   2 +-
 t/chainlint/one-liner-for-loop.expect         |   9 +
 t/chainlint/one-liner-for-loop.test           |  10 +
 t/chainlint/return-loop.expect                |   5 +
 t/chainlint/return-loop.test                  |   6 +
 t/chainlint/semicolon.expect                  |   2 +-
 t/chainlint/sqstring-in-sqstring.expect       |   4 +
 t/chainlint/sqstring-in-sqstring.test         |   5 +
 t/chainlint/t7900-subtree.expect              |  13 +-
 t/chainlint/token-pasting.expect              |  27 +
 t/chainlint/token-pasting.test                |  32 +
 t/chainlint/while-loop.expect                 |   4 +-
 t/t0027-auto-crlf.sh                          |   7 +-
 t/t3070-wildmatch.sh                          |   5 -
 t/test-lib.sh                                 |  12 +-
 73 files changed, 1439 insertions(+), 449 deletions(-)
 create mode 100755 t/chainlint.pl
 delete mode 100644 t/chainlint.sed
 create mode 100644 t/chainlint/blank-line-before-esac.expect
 create mode 100644 t/chainlint/blank-line-before-esac.test
 create mode 100644 t/chainlint/chain-break-background.expect
 create mode 100644 t/chainlint/chain-break-background.test
 create mode 100644 t/chainlint/chain-break-continue.expect
 create mode 100644 t/chainlint/chain-break-continue.test
 create mode 100644 t/chainlint/chain-break-false.expect
 create mode 100644 t/chainlint/chain-break-false.test
 create mode 100644 t/chainlint/chain-break-return-exit.expect
 create mode 100644 t/chainlint/chain-break-return-exit.test
 create mode 100644 t/chainlint/chain-break-status.expect
 create mode 100644 t/chainlint/chain-break-status.test
 create mode 100644 t/chainlint/chained-block.expect
 create mode 100644 t/chainlint/chained-block.test
 create mode 100644 t/chainlint/chained-subshell.expect
 create mode 100644 t/chainlint/chained-subshell.test
 create mode 100644 t/chainlint/command-substitution-subsubshell.expect
 create mode 100644 t/chainlint/command-substitution-subsubshell.test
 create mode 100644 t/chainlint/double-here-doc.expect
 create mode 100644 t/chainlint/double-here-doc.test
 create mode 100644 t/chainlint/dqstring-line-splice.expect
 create mode 100644 t/chainlint/dqstring-line-splice.test
 create mode 100644 t/chainlint/dqstring-no-interpolate.expect
 create mode 100644 t/chainlint/dqstring-no-interpolate.test
 create mode 100644 t/chainlint/empty-here-doc.expect
 create mode 100644 t/chainlint/empty-here-doc.test
 create mode 100644 t/chainlint/exclamation.expect
 create mode 100644 t/chainlint/exclamation.test
 create mode 100644 t/chainlint/for-loop-abbreviated.expect
 create mode 100644 t/chainlint/for-loop-abbreviated.test
 create mode 100644 t/chainlint/function.expect
 create mode 100644 t/chainlint/function.test
 create mode 100644 t/chainlint/here-doc-indent-operator.expect
 create mode 100644 t/chainlint/here-doc-indent-operator.test
 create mode 100644 t/chainlint/if-condition-split.expect
 create mode 100644 t/chainlint/if-condition-split.test
 create mode 100644 t/chainlint/loop-detect-failure.expect
 create mode 100644 t/chainlint/loop-detect-failure.test
 create mode 100644 t/chainlint/loop-detect-status.expect
 create mode 100644 t/chainlint/loop-detect-status.test
 create mode 100644 t/chainlint/loop-upstream-pipe.expect
 create mode 100644 t/chainlint/loop-upstream-pipe.test
 create mode 100644 t/chainlint/nested-loop-detect-failure.expect
 create mode 100644 t/chainlint/nested-loop-detect-failure.test
 create mode 100644 t/chainlint/one-liner-for-loop.expect
 create mode 100644 t/chainlint/one-liner-for-loop.test
 create mode 100644 t/chainlint/return-loop.expect
 create mode 100644 t/chainlint/return-loop.test
 create mode 100644 t/chainlint/sqstring-in-sqstring.expect
 create mode 100644 t/chainlint/sqstring-in-sqstring.test
 create mode 100644 t/chainlint/token-pasting.expect
 create mode 100644 t/chainlint/token-pasting.test


base-commit: d42b38dfb5edf1a7fddd9542d722f91038407819
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1322%2Fsunshineco%2Fchainlintperl-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1322/sunshineco/chainlintperl-v1
Pull-Request: https://github.com/git/git/pull/1322

Comments

Jeff King Sept. 11, 2022, 5:28 a.m. UTC | #1
On Thu, Sep 01, 2022 at 12:29:38AM +0000, Eric Sunshine via GitGitGadget wrote:

> A while back, Peff successfully nerd-sniped[1] me into tackling a
> long-brewing idea I had about (possibly) improving "chainlint" performance

Oops, sorry. :)

I gave this a read-through, and it looks sensible overall. I have to
admit that I did not carefully check all of your regexes. Given the
relatively low stakes of the code (as an internal build-time tool only)
and the set of tests accompanying it, I'm willing to assume it's good
enough until we see counter-examples.

I posted some timings and thoughts on the use of threads elsewhere. But
in the end the timings are close enough that I don't care that much
either way.

I'd also note that I got some first-hand experience with the script as I
merged it with all of my other long-brewing topics, and it found a half
dozen spots, mostly LOOP annotations. At least one was a real "oops,
we'd miss a bug in Git here" spot. Several were "we'd probably notice
the problem because the loop output wouldn't be as expected". One was a
"we're on the left-hand of a pipe, so the exit code doesn't matter
anyway" case, but I am more than happy to fix those if it lets us be
linter-clean.

The output took me a minute to adjust to, just because it feels pretty
jumbled when there are several cases. Mostly this is because the
script eats indentation. So it's hard to see the "# chainlint:" comment
starts, let alone the ?! annotations. Here's an example:

-- >8 --
# chainlint: t4070-diff-pairs.sh
# chainlint: split input across multiple diff-pairs
write_script split-raw-diff "$PERL_PATH" <<-EOF &&

git diff-tree -p -M -C -C base new > expect &&

git diff-tree -r -z -M -C -C base new |
./split-raw-diff &&
for i in diff* ; do
git diff-pairs -p < $i ?!LOOP?!
done > actual &&
test_cmp expect actual
# chainlint: perf/p5305-pack-limits.sh
# chainlint: set up delta islands
head=$(git rev-parse HEAD) &&
git for-each-ref --format="delete %(refname)" |
git update-ref --no-deref --stdin &&

n=0 &&
fork=0 &&
git rev-list --first-parent $head |
while read commit ; do
n=$((n+1)) ?!AMP?!
if test "$n" = 100 ; then
echo "create refs/forks/$fork/master $commit" ?!AMP?!
fork=$((fork+1)) ?!AMP?!
n=0
fi ?!LOOP?!
done |
git update-ref --stdin &&

git config pack.island "refs/forks/([0-9]*)/"
-- 8< --

It wasn't too bad once I got the hang of it, but I wonder if a user
writing a single test for the first time may get a bit overwhelmed.  I
assume that the indentation is removed as part of the normalization (I
notice extra whitespace around "<", too). That might be hard to address.

I wonder if color output for "# chainlint" and "?!" annotations would
help, too. It looks like that may be tricky, though, because the
annotations re-parsed internally in some cases.

-Peff
Eric Sunshine Sept. 11, 2022, 7:01 a.m. UTC | #2
On Sun, Sep 11, 2022 at 1:28 AM Jeff King <peff@peff.net> wrote:
> On Thu, Sep 01, 2022 at 12:29:38AM +0000, Eric Sunshine via GitGitGadget wrote:
> > A while back, Peff successfully nerd-sniped[1] me into tackling a
> > long-brewing idea I had about (possibly) improving "chainlint" performance
>
> I gave this a read-through, and it looks sensible overall. I have to
> admit that I did not carefully check all of your regexes. Given the
> relatively low stakes of the code (as an internal build-time tool only)
> and the set of tests accompanying it, I'm willing to assume it's good
> enough until we see counter-examples.

Thanks for the feedback.

> I posted some timings and thoughts on the use of threads elsewhere. But
> in the end the timings are close enough that I don't care that much
> either way.

I ran my eye over that message quickly and have been meaning to dig
into it and give it a proper response but haven't yet found the time.

> I'd also note that I got some first-hand experience with the script as I
> merged it with all of my other long-brewing topics, and it found a half
> dozen spots, mostly LOOP annotations. At least one was a real "oops,
> we'd miss a bug in Git here" spot. Several were "we'd probably notice
> the problem because the loop output wouldn't be as expected". One was a
> "we're on the left-hand of a pipe, so the exit code doesn't matter
> anyway" case, but I am more than happy to fix those if it lets us be
> linter-clean.

Indeed, I'm not super happy about the linter complaining about cases
which obviously can't have an impact on the test's outcome, but (as
mentioned elsewhere in the thread), finally convinced myself that the
relatively low number of these was outweighed by the quite large
number of cases caught by the linter which could have let real
problems slip though. Perhaps some day the linter can be made smarter
about these cases.

> The output took me a minute to adjust to, just because it feels pretty
> jumbled when there are several cases. Mostly this is because the
> script eats indentation. So it's hard to see the "# chainlint:" comment
> starts, let alone the ?! annotations. Here's an example:
> [...snip...]
> It wasn't too bad once I got the hang of it, but I wonder if a user
> writing a single test for the first time may get a bit overwhelmed.  I
> assume that the indentation is removed as part of the normalization (I
> notice extra whitespace around "<", too). That might be hard to address.

The script implements a proper parser and lexer, and the lexer is
tokenizing the input (throwing away whitespace in the process), thus
by the time the parser notices something to complain about with a
"?!FOO?!" annotation, the original whitespace is long gone, and it
just emits the token stream with "?!FOO?!" inserted at the correct
place. In retrospect, the way this perhaps should have been done would
have been for the parser to instruct the lexer to emit a "?!FOO?!"
annotation at the appropriate point in the input stream. But even that
might get a bit hairy since there are cases in which the parser
back-patches by removing some "?!AMP?!" annotations when it has
decided that it doesn't need to complain about &&-chain breakage. I'm
sure it's fixable, but don't know how important it is at this point.

> I wonder if color output for "# chainlint" and "?!" annotations would
> help, too. It looks like that may be tricky, though, because the
> annotations re-parsed internally in some cases.

I had the exact same thought about coloring the "# chainlint:" lines
and "?!FOO?!" annotations, and how helpful that could be to anyone
(not just newcomers). Aside from not having much free time these days,
a big reason I didn't tackle it was because doing so properly probably
means relying upon some third-party Perl module, and I intentionally
wanted to keep the linter independent of add-on modules. Even without
a "coloring" module of some sort, if Perl had a standard `curses`
module (which it doesn't), then it would have been easy enough to ask
`curses` for the proper color codes and apply them as needed. I'm
old-school, so it doesn't appeal to me, but an alternative would be to
assume it's safe to use ANSI color codes, but even that may have to be
done carefully (i.e. checking TERM and accepting only some whitelisted
entries, and worrying about about Windows consoles).
Jeff King Sept. 11, 2022, 6:31 p.m. UTC | #3
On Sun, Sep 11, 2022 at 03:01:41AM -0400, Eric Sunshine wrote:

> > I wonder if color output for "# chainlint" and "?!" annotations would
> > help, too. It looks like that may be tricky, though, because the
> > annotations re-parsed internally in some cases.
> 
> I had the exact same thought about coloring the "# chainlint:" lines
> and "?!FOO?!" annotations, and how helpful that could be to anyone
> (not just newcomers). Aside from not having much free time these days,
> a big reason I didn't tackle it was because doing so properly probably
> means relying upon some third-party Perl module, and I intentionally
> wanted to keep the linter independent of add-on modules. Even without
> a "coloring" module of some sort, if Perl had a standard `curses`
> module (which it doesn't), then it would have been easy enough to ask
> `curses` for the proper color codes and apply them as needed. I'm
> old-school, so it doesn't appeal to me, but an alternative would be to
> assume it's safe to use ANSI color codes, but even that may have to be
> done carefully (i.e. checking TERM and accepting only some whitelisted
> entries, and worrying about about Windows consoles).

We're pretty happy to just use ANSI in the rest of Git, but there is a
complication on Windows. See compat/winansi.c where we decode those
internally into SetConsoleTextAttribute() calls.

I think we can live with it as-is for now and see how people react. If
lots of people are getting confused by the output, then that motivates
finding a solution. If not, then it's probably not worth the time.

-Peff
Eric Sunshine Sept. 12, 2022, 11:17 p.m. UTC | #4
On Sun, Sep 11, 2022 at 2:31 PM Jeff King <peff@peff.net> wrote:
> On Sun, Sep 11, 2022 at 03:01:41AM -0400, Eric Sunshine wrote:
> > > I wonder if color output for "# chainlint" and "?!" annotations would
> > > help, too. It looks like that may be tricky, though, because the
> > > annotations re-parsed internally in some cases.
> >
> > I had the exact same thought about coloring the "# chainlint:" lines
> > and "?!FOO?!" annotations, and how helpful that could be to anyone
> > (not just newcomers). Aside from not having much free time these days,
> > a big reason I didn't tackle it was because doing so properly probably
> > means relying upon some third-party Perl module, and I intentionally
> > wanted to keep the linter independent of add-on modules. Even without
> > a "coloring" module of some sort, if Perl had a standard `curses`
> > module (which it doesn't), then it would have been easy enough to ask
> > `curses` for the proper color codes and apply them as needed. I'm
> > old-school, so it doesn't appeal to me, but an alternative would be to
> > assume it's safe to use ANSI color codes, but even that may have to be
> > done carefully (i.e. checking TERM and accepting only some whitelisted
> > entries, and worrying about about Windows consoles).
>
> We're pretty happy to just use ANSI in the rest of Git, but there is a
> complication on Windows. See compat/winansi.c where we decode those
> internally into SetConsoleTextAttribute() calls.
>
> I think we can live with it as-is for now and see how people react. If
> lots of people are getting confused by the output, then that motivates
> finding a solution. If not, then it's probably not worth the time.

Well, you nerd-sniped me anyhow. The result is at [1]. Following the
example of t/test-lib.sh, it uses `tput` if available to avoid
hardcoding color codes, and `tput` is invoked lazily, only if it
detects problems in the tests, so a normal (non-problematic) run
doesn't incur the overhead of shelling out to `tput`.

My first attempt just assumed ANSI color codes, but then I discovered
the precedence set by t/test-lib.sh of using `tput`, so I went with
that (since I'm old-school). The ANSI-only version was, of course,
much simpler.

[1]: https://lore.kernel.org/git/pull.1324.git.git.1663023888412.gitgitgadget@gmail.com/
Jeff King Sept. 13, 2022, 12:04 a.m. UTC | #5
On Mon, Sep 12, 2022 at 07:17:12PM -0400, Eric Sunshine wrote:

> > I think we can live with it as-is for now and see how people react. If
> > lots of people are getting confused by the output, then that motivates
> > finding a solution. If not, then it's probably not worth the time.
> 
> Well, you nerd-sniped me anyhow. The result is at [1]. Following the

It seems we've discovered my true talent. :)

> example of t/test-lib.sh, it uses `tput` if available to avoid
> hardcoding color codes, and `tput` is invoked lazily, only if it
> detects problems in the tests, so a normal (non-problematic) run
> doesn't incur the overhead of shelling out to `tput`.

Ah, of course. I didn't think about the fact that the regular tests
already had to deal with this problem. Following that lead makes perfect
sense.

-Peff