[0/4] chainlint: improve annotated output

Message ID	pull.1375.git.git.1667934510.gitgitgadget@gmail.com (mailing list archive)
Headers	show Return-Path: <git-owner@kernel.org> Message-Id: <pull.1375.git.git.1667934510.gitgitgadget@gmail.com> From: "Eric Sunshine via GitGitGadget" <gitgitgadget@gmail.com> Date: Tue, 08 Nov 2022 19:08:26 +0000 Subject: [PATCH 0/4] chainlint: improve annotated output Fcc: Sent Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit MIME-Version: 1.0 To: git@vger.kernel.org Cc: Jeff King <peff@peff.net>, =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsA==?= Bjarmason <avarab@gmail.com>, Eric Sunshine <sunshine@sunshineco.com> Precedence: bulk
Series	chainlint: improve annotated output \| expand [0/4] chainlint: improve annotated output [1/4] chainlint: add explanatory comments [2/4] chainlint: tighten accuracy when consuming input stream [3/4] chainlint: latch start/end position of each token [4/4] chainlint: annotate original test definition rather than token stream

Philippe Blain via GitGitGadget Nov. 8, 2022, 7:08 p.m. UTC

When chainlint detects problems in a test, such as a broken &&-chain, it
prints out the test with "?!FOO?!" annotations inserted at each problem
location. However, rather than annotating the original test definition, it
instead dumps out a parsed token representation of the test. Since it lacks
comments, indentation, here-doc bodies, and so forth, this tokenized
representation can be difficult for the test author to digest and relate
back to the original test definition.

An earlier patch series[1] improved the output somewhat by colorizing the
"?!FOO?!" annotations and the "# chainlint:" lines, but the output can still
be difficult to digest.

This patch series further improves the output by instead making chainlint.pl
annotate the original test definition rather than the parsed token stream,
thus preserving indentation (and whitespace, in general), here-doc bodies,
etc., which should make it easier for a test author to relate each problem
back to the source.

This series was inspired by usability comments from Peff[2] and Ævar[3] and
a bit of discussion which followed[4][5].

(Note to self: Add Ævar to nerd-snipe blacklist alongside Peff.)

FOOTNOTES

[1]
https://lore.kernel.org/git/pull.1324.v2.git.git.1663041707260.gitgitgadget@gmail.com/
[2] https://lore.kernel.org/git/Yx1x5lme2SGBjfia@coredump.intra.peff.net/
[3] https://lore.kernel.org/git/221024.865yg9ecsx.gmgdl@evledraar.gmail.com/
[4]
https://lore.kernel.org/git/CAPig+cRJVn-mbA6-jOmNfDJtK_nX4ZTw+OcNShvvz8zcQYbCHQ@mail.gmail.com/
[5]
https://lore.kernel.org/git/CAPig+cT=cWYT6kicNWT+6RxfiKKMyVz72H3_9kwkF-f4Vuoe1w@mail.gmail.com/

Eric Sunshine (4):
  chainlint: add explanatory comments
  chainlint: tighten accuracy when consuming input stream
  chainlint: latch start/end position of each token
  chainlint: annotate original test definition rather than token stream

 t/chainlint.pl                                | 107 +++++++++++-------
 t/chainlint/block-comment.expect              |   2 +
 t/chainlint/case-comment.expect               |   3 +
 t/chainlint/close-subshell.expect             |   3 +-
 t/chainlint/comment.expect                    |   4 +
 t/chainlint/double-here-doc.expect            |  14 ++-
 t/chainlint/empty-here-doc.expect             |   3 +-
 t/chainlint/for-loop.expect                   |   4 +-
 t/chainlint/here-doc-close-subshell.expect    |   4 +-
 t/chainlint/here-doc-indent-operator.expect   |  10 +-
 .../here-doc-multi-line-command-subst.expect  |   5 +-
 t/chainlint/here-doc-multi-line-string.expect |   4 +-
 t/chainlint/here-doc.expect                   |  24 +++-
 t/chainlint/if-then-else.expect               |   4 +-
 t/chainlint/incomplete-line.expect            |  10 +-
 t/chainlint/inline-comment.expect             |   4 +-
 t/chainlint/loop-detect-status.expect         |   2 +-
 t/chainlint/nested-here-doc.expect            |  27 ++++-
 t/chainlint/nested-subshell-comment.expect    |   2 +
 t/chainlint/subshell-here-doc.expect          |  28 ++++-
 t/chainlint/t7900-subtree.expect              |   4 +
 t/chainlint/while-loop.expect                 |   4 +-
 22 files changed, 206 insertions(+), 66 deletions(-)


base-commit: 63bba4fdd86d80ef061c449daa97a981a9be0792
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1375%2Fsunshineco%2Fchainlintpreserve-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1375/sunshineco/chainlintpreserve-v1
Pull-Request: https://github.com/git/git/pull/1375

Taylor Blau Nov. 8, 2022, 8:28 p.m. UTC | #1

On Tue, Nov 08, 2022 at 07:08:26PM +0000, Eric Sunshine via GitGitGadget wrote:
> This patch series further improves the output by instead making chainlint.pl
> annotate the original test definition rather than the parsed token stream,
> thus preserving indentation (and whitespace, in general), here-doc bodies,
> etc., which should make it easier for a test author to relate each problem
> back to the source.

Very nicely done. The changes all seemed reasonable to me (and, in fact,
the approach is pretty straightforward -- the diffstat is misleading
since many of changes are to chainlint's expected output).

So I'm happy with it, but let's hear from some other folks who are more
familiar with this area before we start merging it down.

Thanks,
Taylor

Ævar Arnfjörð Bjarmason Nov. 8, 2022, 10:17 p.m. UTC | #2

On Tue, Nov 08 2022, Eric Sunshine via GitGitGadget wrote:

> When chainlint detects problems in a test, such as a broken &&-chain, it
> prints out the test with "?!FOO?!" annotations inserted at each problem
> location. However, rather than annotating the original test definition, it
> instead dumps out a parsed token representation of the test. Since it lacks
> comments, indentation, here-doc bodies, and so forth, this tokenized
> representation can be difficult for the test author to digest and relate
> back to the original test definition.
>
> An earlier patch series[1] improved the output somewhat by colorizing the
> "?!FOO?!" annotations and the "# chainlint:" lines, but the output can still
> be difficult to digest.
>
> This patch series further improves the output by instead making chainlint.pl
> annotate the original test definition rather than the parsed token stream,
> thus preserving indentation (and whitespace, in general), here-doc bodies,
> etc., which should make it easier for a test author to relate each problem
> back to the source.
>
> This series was inspired by usability comments from Peff[2] and Ævar[3] and
> a bit of discussion which followed[4][5].
>
> (Note to self: Add Ævar to nerd-snipe blacklist alongside Peff.)

Heh! It's great to see a follow-up to our discussion the other day, and
having the output verbatim & annotated looks much better, especially for
complex tests.

E.g. (taking one at random, after some grepping/skimming), ruining this one:
	
	diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
	index dcaab7265f5..c27539a773d 100755
	--- a/t/t6300-for-each-ref.sh
	+++ b/t/t6300-for-each-ref.sh
	@@ -1365,8 +1365,7 @@ test_expect_success 'for-each-ref --ignore-case works on multiple sort keys' '
	                do
	                        GIT_COMMITTER_EMAIL="$email@example.com" \
	                        git tag -m "tag $subject" icase-$(printf %02d $nr) &&
	-                       nr=$((nr+1))||
	-                       return 1
	+                       nr=$((nr+1))
	                done
	        done &&
	        git for-each-ref --ignore-case \

Would, before emit (correct, but a bit of a token-soup):

	
	$ ./t6300-for-each-ref.sh 
	# chainlint: ./t6300-for-each-ref.sh
	# chainlint: for-each-ref --ignore-case works on multiple sort keys
	
	nr=0 &&
	for email in a A b B
	do
	for subject in a A b B
	do
	GIT_COMMITTER_EMAIL="$email@example.com" git tag -m "tag $subject" icase-$(printf %02d $nr) &&
	nr=$((nr+1)) ?!LOOP?!
	done ?!LOOP?!
	done &&
	git for-each-ref --ignore-case --format="%(taggeremail) %(subject) %(refname)" --sort=refname --sort=subject --sort=taggeremail refs/tags/icase-* > actual &&
	cat > expect <<-EOF &&
	test_cmp expect actual
	error: bug in the test script: lint error (see '?!...!? annotations above)

But now it'll instead emit:
	
	$ ./t6300-for-each-ref.sh
	# chainlint: ./t6300-for-each-ref.sh
	# chainlint: for-each-ref --ignore-case works on multiple sort keys
	        # name refs numerically to avoid case-insensitive filesystem conflicts
	        nr=0 &&
	        for email in a A b B
	        do
	                for subject in a A b B
	                do
	                        GIT_COMMITTER_EMAIL="$email@example.com" \
	                        git tag -m "tag $subject" icase-$(printf %02d $nr) &&
	                        nr=$((nr+1)) ?!LOOP?!
	                done ?!LOOP?!
	        done &&
	        git for-each-ref --ignore-case \
	                --format="%(taggeremail) %(subject) %(refname)" \
	                --sort=refname \
	                --sort=subject \
	                --sort=taggeremail \
	                refs/tags/icase-* >actual &&
	        cat >expect <<-\EOF &&
	        <a@example.com> tag a refs/tags/icase-00
	        <a@example.com> tag A refs/tags/icase-01
	        <A@example.com> tag a refs/tags/icase-04
	        <A@example.com> tag A refs/tags/icase-05
	        <a@example.com> tag b refs/tags/icase-02
	        <a@example.com> tag B refs/tags/icase-03
	        <A@example.com> tag b refs/tags/icase-06
	        <A@example.com> tag B refs/tags/icase-07
	        <b@example.com> tag a refs/tags/icase-08
	        <b@example.com> tag A refs/tags/icase-09
	        <B@example.com> tag a refs/tags/icase-12
	        <B@example.com> tag A refs/tags/icase-13
	        <b@example.com> tag b refs/tags/icase-10
	        <b@example.com> tag B refs/tags/icase-11
	        <B@example.com> tag b refs/tags/icase-14
	        <B@example.com> tag B refs/tags/icase-15
	        EOF
	        test_cmp expect actual
	error: bug in the test script: lint error (see '?!...!? annotations above)

Which is so much better, i.e. as you're preserving the whitespace &
comments, and the "?!LOOP?!" is of course much easier to see with the
colored output.

I hadn't noticed before that the contents of here-docs was pruned, but
that made sense in the previous parser, but having the content.

Also, and I guess this is an attempt to evade your blacklist. I *did*
notice when playing around with this that if I now expand the "1 while"
loop here:

	my $s = do { local $/; <$fh> };
	close($fh);
	my $parser = ScriptParser->new(\$s);
	1 while $parser->parse_cmd();

To something that "follows along" with the parser it shouldn't be too
hard in the future to add line number annotations now. E.g. for
"#!/bin/sh\n" you'll emit a token like "\n", but the positions will be
0, 10.

But that's all for some hypothetical future, this is already much better
:)

Eric Sunshine Nov. 8, 2022, 10:43 p.m. UTC | #3

On Tue, Nov 8, 2022 at 5:29 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
> On Tue, Nov 08 2022, Eric Sunshine via GitGitGadget wrote:
> > (Note to self: Add Ævar to nerd-snipe blacklist alongside Peff.)
>
> Also, and I guess this is an attempt to evade your blacklist. I *did*
> notice when playing around with this that if I now expand the "1 while"
> loop here:
>    [...]
> To something that "follows along" with the parser it shouldn't be too
> hard in the future to add line number annotations now. E.g. for
> "#!/bin/sh\n" you'll emit a token like "\n", but the positions will be
> 0, 10.
>
> But that's all for some hypothetical future, this is already much better

My nerd-snipe blacklist hasn't fully solidified yet, unfortunately (for me).

Eric Sunshine Nov. 8, 2022, 10:52 p.m. UTC | #4

On Tue, Nov 8, 2022 at 5:43 PM Eric Sunshine <sunshine@sunshineco.com> wrote:
> On Tue, Nov 8, 2022 at 5:29 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
> > On Tue, Nov 08 2022, Eric Sunshine via GitGitGadget wrote:
> > > (Note to self: Add Ævar to nerd-snipe blacklist alongside Peff.)
> >
> > Also, and I guess this is an attempt to evade your blacklist. I *did*
> > notice when playing around with this that if I now expand the "1 while"
> > loop here:
> >    [...]
> > To something that "follows along" with the parser it shouldn't be too
> > hard in the future to add line number annotations now. E.g. for
> > "#!/bin/sh\n" you'll emit a token like "\n", but the positions will be
> > 0, 10.
> >
> > But that's all for some hypothetical future, this is already much better
>
> My nerd-snipe blacklist hasn't fully solidified yet, unfortunately (for me).

I forgot to add that if you do manage to penetrate my nerd-snipe
blacklist, such a feature would be built atop the current series (i.e.
no reason to hold up this series for the "hypothetical future", as you
say).

Jeff King Nov. 9, 2022, 1:11 p.m. UTC | #5

On Tue, Nov 08, 2022 at 03:28:34PM -0500, Taylor Blau wrote:

> On Tue, Nov 08, 2022 at 07:08:26PM +0000, Eric Sunshine via GitGitGadget wrote:
> > This patch series further improves the output by instead making chainlint.pl
> > annotate the original test definition rather than the parsed token stream,
> > thus preserving indentation (and whitespace, in general), here-doc bodies,
> > etc., which should make it easier for a test author to relate each problem
> > back to the source.
> 
> Very nicely done. The changes all seemed reasonable to me (and, in fact,
> the approach is pretty straightforward -- the diffstat is misleading
> since many of changes are to chainlint's expected output).
> 
> So I'm happy with it, but let's hear from some other folks who are more
> familiar with this area before we start merging it down.

I don't claim to be _that_ familiar with the code itself, but all of the
patches look reasonable to me. And most importantly, I dug out the state
of my tree from early September (via the reflog) before I fixed all of
the chainlint problems on my local topics. The improvement in the output
with this series is night and day.

I was a little surprised that using a class in patch 3 would cause such
a slowdown. But it's not that hard to believe that the workload is so
heavy on string comparison and manipulation that the overloaded string
and comparison functions introduce significant overhead. It has been a
long time since I've optimized any perl, but I remember the rule of
thumb being to minimize the number of lines of perl (because all of the
builtin stuff is blazingly fast C, and all of the perl is byte-code).

At any rate, the result you came up with doesn't look too bad. The only
risk is that you forgot to s/$token/$token->[0]/ somewhere, and I
suspect we'd have found that in running the tests.

So it all seems like a step forward to me.

-Peff

Taylor Blau Nov. 10, 2022, 2:42 a.m. UTC | #6

On Wed, Nov 09, 2022 at 08:11:45AM -0500, Jeff King wrote:
> So it all seems like a step forward to me.

Thanks, all. Let's start merging this topic down :-).


Thanks,
Taylor

[0/4] chainlint: improve annotated output

Message

Comments