diff mbox series

cmd-list.perl: fix identifying man sections

Message ID 20220923070334.1970213-1-martin.agren@gmail.com (mailing list archive)
State Superseded
Headers show
Series cmd-list.perl: fix identifying man sections | expand

Commit Message

Martin Ågren Sept. 23, 2022, 7:03 a.m. UTC
We attribute each documentation text file to a man section by finding a
line in the file that looks like "gitfoo(<digit>)". Commit cc75e556a9
("scalar: add to 'git help -a' command list", 2022-09-02) updated this
logic to look not only for "gitfoo" but also "scalarfoo". In doing so,
it forgot to account for the fact that after the updated regex has found
a match, the man section is no longer to be found in `$1` but now lives
in `$2`.

This makes our git(1) manpage look as follows:

  Main porcelain commands
       git-add(git)
           Add file contents to the index.

  [...]

       gitk(git)
           The Git repository browser.

       scalar(scalar)
           A tool for managing large Git repositories.

Restore the man sections by grabbing the correct value out of the regex
match.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
 This is a v2.38.0-rc1 regression.

 Documentation/cmd-list.perl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Ævar Arnfjörð Bjarmason Sept. 23, 2022, 7:38 a.m. UTC | #1
On Fri, Sep 23 2022, Martin Ågren wrote:

> We attribute each documentation text file to a man section by finding a
> line in the file that looks like "gitfoo(<digit>)". Commit cc75e556a9
> ("scalar: add to 'git help -a' command list", 2022-09-02) updated this
> logic to look not only for "gitfoo" but also "scalarfoo". In doing so,
> it forgot to account for the fact that after the updated regex has found
> a match, the man section is no longer to be found in `$1` but now lives
> in `$2`.
>
> This makes our git(1) manpage look as follows:
>
>   Main porcelain commands
>        git-add(git)
>            Add file contents to the index.
>
>   [...]
>
>        gitk(git)
>            The Git repository browser.
>
>        scalar(scalar)
>            A tool for managing large Git repositories.
>
> Restore the man sections by grabbing the correct value out of the regex
> match.
>
> Signed-off-by: Martin Ågren <martin.agren@gmail.com>
> ---
>  This is a v2.38.0-rc1 regression.
>
>  Documentation/cmd-list.perl | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/Documentation/cmd-list.perl b/Documentation/cmd-list.perl
> index 9515a499a3..16451d81b8 100755
> --- a/Documentation/cmd-list.perl
> +++ b/Documentation/cmd-list.perl
> @@ -11,7 +11,7 @@ sub format_one {
>  	open I, '<', "$name.txt" or die "No such file $name.txt";
>  	while (<I>) {
>  		if (/^(git|scalar)[a-z0-9-]*\(([0-9])\)$/) {
> -			$mansection = $1;
> +			$mansection = $2;
>  			next;
>  		}
>  		if (/^NAME$/) {

Ouch, good catch!

In the v1 review for scalar[1] I pointed out that I had a local patch
where this is instead:

	if (/^[a-z0-9-]*\(([0-9])\)$/) {
		$mansection = $1;

The v2 at [2] which I didn't get around to reviewing (sorry!) then
introduced this logic error.

Victoria: Did you find some reason to not just take the version I had?
The doc-diff with Martin's above is empty, i.e. it's the same in
practice, but if we don't need to hard-code our top-level commands for
no reason I don't see why we'd maintain this list of them here.

On the proposed rc patch: FWIW we can also just use (?:) groupings
instead of a capture:

	if (/^(?:git|scalar)[a-z0-9-]*\(([0-9])\)$/) {

It has the same effect, but arguably makes more sense. I.e. if we don't
care about $1 let's not capture the thing we don't care about into $1 in
the first place.

1. https://lore.kernel.org/git/220831.86y1v48h2x.gmgdl@evledraar.gmail.com/
2. https://lore.kernel.org/git/070f195f027e5601b88ca6a0d4c9991b472ad9ab.1662134210.git.gitgitgadget@gmail.com/
Jeff King Sept. 23, 2022, 10:07 p.m. UTC | #2
On Fri, Sep 23, 2022 at 09:03:34AM +0200, Martin Ågren wrote:

> This makes our git(1) manpage look as follows:
> 
>   Main porcelain commands
>        git-add(git)
>            Add file contents to the index.
> 
>   [...]
> 
>        gitk(git)
>            The Git repository browser.
> 
>        scalar(scalar)
>            A tool for managing large Git repositories.

Good catch. The patch looks good (I was going to suggest ?:, but I see
Ævar beat me to it).

I wondered if we might have caught this in a more automatic way. The
output of:

  cd Documentation
  ./doc-diff cc75e556a9^ cc75e556a9

makes the problem apparent, but I don't fault reviewers for not running
it. I rarely remember to do so. And in general you need a human looking
at doc-diff output to know if it was the intended change or not.

I wondered if it might be worth running

  ./doc-diff v2.37.0 v2.38.0-rc1

near a release to scan over all of the changes. But the diff is over
8000 lines, and I admit my eyes glazed over before I got to the
problematic hunks (even though I knew I was looking for them!). You can
limit it a bit with --diff-filter=a, which drops new entries (which
can't have regressed!), but it's still over 4000 lines.

So I dunno. I think doc-diff is a potentially useful tool, but I'm not
sure how to point the human attention at the right spot to find a bug.
Maybe "given enough eyeballs, all bugs are shallow" is our best bet
here. After all, it did find this bug before the release. :)

-Peff
Martin Ågren Sept. 26, 2022, 7:16 a.m. UTC | #3
On Sat, 24 Sept 2022 at 00:07, Jeff King <peff@peff.net> wrote:
>
> I wondered if we might have caught this in a more automatic way. The
> output of:
>
>   cd Documentation
>   ./doc-diff cc75e556a9^ cc75e556a9
>
> makes the problem apparent, but I don't fault reviewers for not running
> it. I rarely remember to do so. And in general you need a human looking
> at doc-diff output to know if it was the intended change or not.
>
> I wondered if it might be worth running
>
>   ./doc-diff v2.37.0 v2.38.0-rc1
>
> near a release to scan over all of the changes. But the diff is over
> 8000 lines, and I admit my eyes glazed over before I got to the
> problematic hunks (even though I knew I was looking for them!).

I know how you felt when you looked at that doc-diff... I was lucky
enough to first look at

  ./doc-diff v2.37.0 v2.38.0-rc0

so that when I then looked at

  ./doc-diff v2.38.0-rc0 v2.38.0-rc1

this stood out quite well.

> So I dunno. I think doc-diff is a potentially useful tool, but I'm not
> sure how to point the human attention at the right spot to find a bug.

I try to do some such doc-diffing every now and then, and in particular
around rc time. It has caught a few buglets, usually nothing major.

I've also done

  ./doc-diff origin/next origin/seen

at times to catch such things a lot earlier, but it's not often that I
find the time to do so / think about doing it.

> Maybe "given enough eyeballs, all bugs are shallow" is our best bet
> here. After all, it did find this bug before the release. :)

Yeah, and luck never hurts -- I think it's fair to assume that I would
have missed this bug if it hadn't come in after rc0.

Martin
Ævar Arnfjörð Bjarmason Sept. 26, 2022, 1:35 p.m. UTC | #4
On Fri, Sep 23 2022, Jeff King wrote:

> On Fri, Sep 23, 2022 at 09:03:34AM +0200, Martin Ågren wrote:
>
>> This makes our git(1) manpage look as follows:
>> 
>>   Main porcelain commands
>>        git-add(git)
>>            Add file contents to the index.
>> 
>>   [...]
>> 
>>        gitk(git)
>>            The Git repository browser.
>> 
>>        scalar(scalar)
>>            A tool for managing large Git repositories.
>
> Good catch. The patch looks good (I was going to suggest ?:, but I see
> Ævar beat me to it).
>
> I wondered if we might have caught this in a more automatic way. The
> output of:
>
>   cd Documentation
>   ./doc-diff cc75e556a9^ cc75e556a9

FWIW I asked before on-list[1] why the CI output of
https://github.com/gitster/git wasn't visible, I don't know if it's a
setting Junio can adjust, or something else.

I wanted to set something up to scrape the WC E-Mails, and see what the
CI status of those topics was individually.

And, if others found it useful eventually, perhaps send that as a
follow-up to the list. I.e. if CI for the topic passes, what (if any)
the output differences are. We could then just run a "doc-diff" in the
CI, and report (as part of some matrix) a checkmark next to those that
have doc changes.

Of course I could re-push all of the topics myself, but that would take
a long time to complete, and compete with any other use I have for CI
myself. So piggy-backing on gitster's would be much better, if they're
even there in the first place...

1. https://lore.kernel.org/git/220114.86fspqrxbw.gmgdl@evledraar.gmail.com/
Junio C Hamano Sept. 26, 2022, 5:06 p.m. UTC | #5
Martin Ågren <martin.agren@gmail.com> writes:

> I've also done
>
>   ./doc-diff origin/next origin/seen
>
> at times to catch such things a lot earlier, but it's not often that I
> find the time to do so / think about doing it.

Ah, that probably is a good thing to do on my end after picking
which topics to merge to 'next' and before merging them, i.e.

 * Find ones that are marked as "Merge to 'next'";
 * "git checkout --detach next" and then merge them;
 * Do the doc-diff between 'next' and HEAD.

in addition to my usual set of tests, SANITIZE=<stuff>, etc.

Thanks for an idea.
diff mbox series

Patch

diff --git a/Documentation/cmd-list.perl b/Documentation/cmd-list.perl
index 9515a499a3..16451d81b8 100755
--- a/Documentation/cmd-list.perl
+++ b/Documentation/cmd-list.perl
@@ -11,7 +11,7 @@  sub format_one {
 	open I, '<', "$name.txt" or die "No such file $name.txt";
 	while (<I>) {
 		if (/^(git|scalar)[a-z0-9-]*\(([0-9])\)$/) {
-			$mansection = $1;
+			$mansection = $2;
 			next;
 		}
 		if (/^NAME$/) {