Message ID | 5088e93d76e44de9d079b7b2296b8c810828a2f5.1615856156.git.liu.denton@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Sort lists and add static-analysis | expand |
On Mon, Mar 15, 2021 at 8:57 PM Denton Liu <liu.denton@gmail.com> wrote: > In the previous few commits, we sorted many lists into ASCII-order. In > order to ensure that they remain that way, add the 'check-sort' target. > [...] > Signed-off-by: Denton Liu <liu.denton@gmail.com> > --- > +my @regexes = map { qr/^$_/ } @ARGV; > +my $last_regex = 0; > +my $last_line = ''; > +while (<STDIN>) { > + my $matched = 0; > + chomp; > + for my $regex (@regexes) { > + next unless $_ =~ $regex; > + if ($last_regex == $regex) { > + die "duplicate lines: '$_'\n" unless $last_line ne $_; > + die "unsorted lines: '$last_line' before '$_'\n" unless $last_line lt $_; > + } > + $matched = 1; > + $last_regex = $regex; > + $last_line = $_; > + } > + unless ($matched) { > + $last_regex = 0; > + $last_line = ''; > + } > +} This is, of course, endlessly bikesheddable. Here is a shorter -- and, at least for me, easier to understand -- way to do it: my $rc = 0; chomp(my @all = <STDIN>); foreach my $needle (@ARGV) { my @lines = grep(/^$needle/, @all); if (join("\n", @lines) ne join("\n", sort @lines)) { print "'$needle' lines not sorted\n"; $rc = 1; } } exit $rc; By the way, it might be a good idea to also print the filename in which the problem occurred. Such context can be important for the person trying to track down the complaint. To do so, you'd probably want to pass the filename as an argument, and open and read the file rather than sending it only as standard-input.
Hi Eric, On Tue, Mar 16, 2021 at 02:37:16AM -0400, Eric Sunshine wrote: > On Mon, Mar 15, 2021 at 8:57 PM Denton Liu <liu.denton@gmail.com> wrote: > > In the previous few commits, we sorted many lists into ASCII-order. In > > order to ensure that they remain that way, add the 'check-sort' target. > > [...] > > Signed-off-by: Denton Liu <liu.denton@gmail.com> > > --- > > +my @regexes = map { qr/^$_/ } @ARGV; > > +my $last_regex = 0; > > +my $last_line = ''; > > +while (<STDIN>) { > > + my $matched = 0; > > + chomp; > > + for my $regex (@regexes) { > > + next unless $_ =~ $regex; > > + if ($last_regex == $regex) { > > + die "duplicate lines: '$_'\n" unless $last_line ne $_; > > + die "unsorted lines: '$last_line' before '$_'\n" unless $last_line lt $_; > > + } > > + $matched = 1; > > + $last_regex = $regex; > > + $last_line = $_; > > + } > > + unless ($matched) { > > + $last_regex = 0; > > + $last_line = ''; > > + } > > +} > > This is, of course, endlessly bikesheddable. Here is a shorter -- and, > at least for me, easier to understand -- way to do it: > > my $rc = 0; > chomp(my @all = <STDIN>); > foreach my $needle (@ARGV) { > my @lines = grep(/^$needle/, @all); > if (join("\n", @lines) ne join("\n", sort @lines)) { > print "'$needle' lines not sorted\n"; > $rc = 1; > } > } > exit $rc; That's pretty clever, thanks for showing me how it's done :) However, the reason I wrote it out the way that I did is because my code ensures that consecutive lines matching the regex are sorted but if there are any breaks between matching regex lines, it will consider them separate blocks. Just taking all the lines fails in the case of `LIB_OBJS \+=` in Makefile since we have LIB_OBJS += zlib.o [... many intervening lines ...] LIB_OBJS += $(COMPAT_OBJS) and that is technically unsorted. That being said, I don't really like my current approach that much. I think I have two better options: 1. Tighten up the regexes so that it excludes the $(COMPAT_OBJS). I don't want to be too strict, though, because if we end up not matching a line it might end up unsorted. 2. Consider blank lines to be block separators and only consider it to be sorted if the text matching regexes within a block are sorted. Now that I've written that all out, I think I like option 1 more, although I could definitely be convinced to go either way. > By the way, it might be a good idea to also print the filename in > which the problem occurred. Such context can be important for the > person trying to track down the complaint. To do so, you'd probably > want to pass the filename as an argument, and open and read the file > rather than sending it only as standard-input. Agreed. Thanks, Denton
On Tue, Mar 16 2021, Denton Liu wrote: > In the previous few commits, we sorted many lists into ASCII-order. In > order to ensure that they remain that way, add the 'check-sort' target. > > The check-sort.perl program ensures that consecutive lines that match > the same regex are sorted in ASCII-order. The 'check-sort' target runs > the check-sort.perl program on some files which are known to contain > sorted lists. > > Signed-off-by: Denton Liu <liu.denton@gmail.com> > --- > > Notes: > Full disclaimer: this is the first time I've written anything in Perl. > Please let me know if I'm doing anything unconventional :) > > Makefile | 25 +++++++++++++++++++++++++ > check-sort.perl | 31 +++++++++++++++++++++++++++++++ > 2 files changed, 56 insertions(+) > create mode 100755 check-sort.perl > > diff --git a/Makefile b/Makefile > index 5832aa33da..b23dff384d 100644 > --- a/Makefile > +++ b/Makefile > @@ -3240,6 +3240,31 @@ check-docs:: > check-builtins:: > ./check-builtins.sh > > +.PHONY: check-sort > +check-sort:: > + ./check-sort.perl \ > + 'ALL_COMMANDS \+=' \ > + 'ALL_COMMANDS_TO_INSTALL \+=' \ > + 'BINDIR_PROGRAMS_NEED_X \+=' \ > + 'BINDIR_PROGRAMS_NO_X \+=' \ > + 'BUILTIN_OBJS \+=' \ > + 'BUILT_INS \+=' \ > + 'FUZZ_OBJS \+=' \ > + 'GENERATED_H \+=' \ > + 'LIB_OBJS \+=' \ > + 'SCRIPT_LIB \+=' \ > + 'SCRIPT_PERL \+=' \ > + 'SCRIPT_PYTHON \+=' \ > + 'SCRIPT_SH \+=' \ > + 'TEST_BUILTINS_OBJS \+=' \ > + 'TEST_PROGRAMS_NEED_X \+=' \ > + 'THIRD_PARTY_SOURCES \+=' \ > + 'XDIFF_OBJS \+=' \ > + <Makefile Why does this part need to be a Perl script at all? We can check this in the makefile itself. Make has a sort function and string comparisons, e.g.: LIB_OBJS_SORTED = LIB_OBJS_SORTED += $(sort $(LIB_OBJS)) ifneq ("$(LIB_OBJS)", "$(LIB_OBJS_SORTED)") $(error "please sort and de-duplicate LIB_OBJS!") endif This will fail/pass before/after your patches. Note that make's sort isn't just a sort, it also de-deplicates (not that we're likely to have that issue). > [...] > + ./check-sort.perl '\t\{ "[^"]*",' <git.c This last one you can IMO be done better as (or if we want to be more anal, we could make git die on startup if it's not true): diff --git a/t/t0012-help.sh b/t/t0012-help.sh index 5679e29c62..5bd2ebceca 100755 --- a/t/t0012-help.sh +++ b/t/t0012-help.sh @@ -77,6 +77,11 @@ test_expect_success 'generate builtin list' ' git --list-cmds=builtins >builtins ' +test_expect_success 'list of builtins in git.c should be sorted' ' + sort builtins >sorted && + test_cmp sorted builtins +' + while read builtin do test_expect_success "$builtin can handle -h" ' Which just leaves: > + ./check-sort.perl 'int cmd_[^(]*\(' <builtin.h > + ./check-sort.perl 'int cmd__[^(]*\(' <t/helper/test-tool.h As something that can't be done in the Makefile itself or that we're already extracting from the tests. Both of those IMO would be better handled down the line by making the relevant part of these files generated from the data in the Makefile, at which point we'd have no need for the external perl script.
On Wed, Mar 17, 2021 at 01:47:15PM +0100, Ævar Arnfjörð Bjarmason wrote: > Why does this part need to be a Perl script at all? We can check this in > the makefile itself. Make has a sort function and string comparisons, > e.g.: > > LIB_OBJS_SORTED = > LIB_OBJS_SORTED += $(sort $(LIB_OBJS)) > ifneq ("$(LIB_OBJS)", "$(LIB_OBJS_SORTED)") > $(error "please sort and de-duplicate LIB_OBJS!") > endif > > This will fail/pass before/after your patches. Note that make's sort > isn't just a sort, it also de-deplicates (not that we're likely to have > that issue). I like that much better, if only because it runs quickly and automatically (as opposed to much later as part of some CI process). Reducing the length of the feedback loop makes automated checks much less annoying. It doesn't indicate which lines are out of order or duplicated. We could probably add some $(shell) magic to dump both sides to diff inside the ifneq. That's getting unwieldy to use in each site, but I wonder if something like this solves that: diff --git a/Makefile b/Makefile index dfb0f1000f..3fa7893c8b 100644 --- a/Makefile +++ b/Makefile @@ -596,6 +596,16 @@ THIRD_PARTY_SOURCES = # interactive shell sessions without exporting it. unexport CDPATH +# usage: $(call check-sort,VAR) +# If this complains, then make sure the contents of VAR are ASCII-sorted. +define check-sort-template +SORTED_$1 = $$(sort $$($1)) +ifneq ($$($1),$$(SORTED_$1)) +$$(error "please sort and de-duplicate $1!") +endif +endef +check-sort = $(eval $(call check-sort-template,$1)) + SCRIPT_SH += git-bisect.sh SCRIPT_SH += git-difftool--helper.sh SCRIPT_SH += git-filter-branch.sh @@ -1037,6 +1047,7 @@ LIB_OBJS += ws.o LIB_OBJS += wt-status.o LIB_OBJS += xdiff-interface.o LIB_OBJS += zlib.o +$(call check-sort,LIB_OBJS) BUILTIN_OBJS += builtin/add.o BUILTIN_OBJS += builtin/am.o And then it's just a single-liner for each block that should be checked. We haven't used $(call) or $(eval) yet in our Makefile, but past discussions have reached the conclusion that they should be safe (they're both in GNU make 3.80, which is the oldest version worth caring about). TBH, I'm a little on the fence on whether automatically checking this is even worth the hassle. Doing the make function above was a fun diversion, but already I think this discussion has taken more time than people actually spend resolving conflicts on unsorted Makefile lists. > > [...] > > + ./check-sort.perl '\t\{ "[^"]*",' <git.c > > This last one you can IMO be done better as (or if we want to be more > anal, we could make git die on startup if it's not true): > > diff --git a/t/t0012-help.sh b/t/t0012-help.sh > index 5679e29c62..5bd2ebceca 100755 > --- a/t/t0012-help.sh > +++ b/t/t0012-help.sh > @@ -77,6 +77,11 @@ test_expect_success 'generate builtin list' ' > git --list-cmds=builtins >builtins > ' > > +test_expect_success 'list of builtins in git.c should be sorted' ' > + sort builtins >sorted && > + test_cmp sorted builtins > +' Again, much nicer, because it just happens as part of the test suite (which I hope everyone is running after making changes). If we care about the order in the source code, then this check implies that we are not sorting the list internally before outputting it, but that is probably reasonable. > > + ./check-sort.perl 'int cmd_[^(]*\(' <builtin.h > > + ./check-sort.perl 'int cmd__[^(]*\(' <t/helper/test-tool.h > > As something that can't be done in the Makefile itself or that we're > already extracting from the tests. > > Both of those IMO would be better handled down the line by making the > relevant part of these files generated from the data in the Makefile, at > which point we'd have no need for the external perl script. Agreed. Automating away a tedious task is always better than automatically checking that somebody did it right. :) -Peff
On Wed, Mar 17 2021, Jeff King wrote: > SCRIPT_SH += git-bisect.sh > SCRIPT_SH += git-difftool--helper.sh > SCRIPT_SH += git-filter-branch.sh > @@ -1037,6 +1047,7 @@ LIB_OBJS += ws.o > LIB_OBJS += wt-status.o > LIB_OBJS += xdiff-interface.o > LIB_OBJS += zlib.o > +$(call check-sort,LIB_OBJS) > > BUILTIN_OBJS += builtin/add.o > BUILTIN_OBJS += builtin/am.o > > And then it's just a single-liner for each block that should be checked. > We haven't used $(call) or $(eval) yet in our Makefile, but past > discussions have reached the conclusion that they should be safe > (they're both in GNU make 3.80, which is the oldest version worth caring > about). ...also this sort of thing can be guarded by "ifdef DEVELOPER" or something, which AFAICT (from trying to introduce syntax errors etc.) will get parsed first, before "make" even tries to parse what's within the ifdef. So if we have bells & whistles in that sort of setup it can be less portable than the Makefile in general..
Denton Liu <liu.denton@gmail.com> writes: > + ./check-sort.perl 'int cmd_[^(]*\(' <builtin.h > + ./check-sort.perl 'int cmd__[^(]*\(' <t/helper/test-tool.h These two are trivial to see. > + ./check-sort.perl '\t\{ "[^"]*",' <git.c This is too brittle to be acceptable. It FORBIDS us from introducing initialization for another table to the file. I won't participate in the bikeshedding of how the Perl script would be best written ;-)
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: >> + ./check-sort.perl '\t\{ "[^"]*",' <git.c > > This last one you can IMO be done better as (or if we want to be more > anal, we could make git die on startup if it's not true): > > diff --git a/t/t0012-help.sh b/t/t0012-help.sh > index 5679e29c62..5bd2ebceca 100755 > --- a/t/t0012-help.sh > +++ b/t/t0012-help.sh > @@ -77,6 +77,11 @@ test_expect_success 'generate builtin list' ' > git --list-cmds=builtins >builtins > ' > > +test_expect_success 'list of builtins in git.c should be sorted' ' > + sort builtins >sorted && > + test_cmp sorted builtins > +' "LANG=C LC_ALL=C sort ..." I like this 100% better than the original ;-)
On Wed, Mar 17 2021, Junio C Hamano wrote: > Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > >>> + ./check-sort.perl '\t\{ "[^"]*",' <git.c >> >> This last one you can IMO be done better as (or if we want to be more >> anal, we could make git die on startup if it's not true): >> >> diff --git a/t/t0012-help.sh b/t/t0012-help.sh >> index 5679e29c62..5bd2ebceca 100755 >> --- a/t/t0012-help.sh >> +++ b/t/t0012-help.sh >> @@ -77,6 +77,11 @@ test_expect_success 'generate builtin list' ' >> git --list-cmds=builtins >builtins >> ' >> >> +test_expect_success 'list of builtins in git.c should be sorted' ' >> + sort builtins >sorted && >> + test_cmp sorted builtins >> +' > > "LANG=C LC_ALL=C sort ..." > > I like this 100% better than the original ;-) We don't need to use "LANG=C LC_ALL=C sort", the test-lib.sh sets that already, so just "sort" works consistently. It's also why with GETTEXT_POISON gone we can just "grep" output, instead of worrying that it may be in the user's locale.
On Wed, Mar 17, 2021 at 1:34 PM Jeff King <peff@peff.net> wrote: > TBH, I'm a little on the fence on whether automatically checking this is > even worth the hassle. Doing the make function above was a fun > diversion, but already I think this discussion has taken more time than > people actually spend resolving conflicts on unsorted Makefile lists. I had the same reaction. Like you, I jumped in for the fun diversion. It allowed me to flex my Perl muscle a bit which has atrophied, but an out-of-order item here and there is such a minor concern, especially since they don't impact correctness, that I worry that such a CI job would be more hassle than it's worth. Making the feedback loop tighter, as discussed elsewhere in this thread, makes the idea of the automated check a bit more palatable.
On Wed, Mar 17, 2021 at 05:48:18PM -0400, Eric Sunshine wrote: > On Wed, Mar 17, 2021 at 1:34 PM Jeff King <peff@peff.net> wrote: > > TBH, I'm a little on the fence on whether automatically checking this is > > even worth the hassle. Doing the make function above was a fun > > diversion, but already I think this discussion has taken more time than > > people actually spend resolving conflicts on unsorted Makefile lists. > > I had the same reaction. Like you, I jumped in for the fun diversion. > It allowed me to flex my Perl muscle a bit which has atrophied, but an > out-of-order item here and there is such a minor concern, especially > since they don't impact correctness, that I worry that such a CI job > would be more hassle than it's worth. Making the feedback loop > tighter, as discussed elsewhere in this thread, makes the idea of the > automated check a bit more palatable. There's an implication in what both of us said that I think is worth calling out explicitly: I would not feel the same about a problem that impacts the correctness of the resulting code. E.g., if the list of builtins were used with a binary search, then an unsorted list would produce the wrong result. And that would be worth testing. It just seems that the stakes here are much lower. -Peff
diff --git a/Makefile b/Makefile index 5832aa33da..b23dff384d 100644 --- a/Makefile +++ b/Makefile @@ -3240,6 +3240,31 @@ check-docs:: check-builtins:: ./check-builtins.sh +.PHONY: check-sort +check-sort:: + ./check-sort.perl \ + 'ALL_COMMANDS \+=' \ + 'ALL_COMMANDS_TO_INSTALL \+=' \ + 'BINDIR_PROGRAMS_NEED_X \+=' \ + 'BINDIR_PROGRAMS_NO_X \+=' \ + 'BUILTIN_OBJS \+=' \ + 'BUILT_INS \+=' \ + 'FUZZ_OBJS \+=' \ + 'GENERATED_H \+=' \ + 'LIB_OBJS \+=' \ + 'SCRIPT_LIB \+=' \ + 'SCRIPT_PERL \+=' \ + 'SCRIPT_PYTHON \+=' \ + 'SCRIPT_SH \+=' \ + 'TEST_BUILTINS_OBJS \+=' \ + 'TEST_PROGRAMS_NEED_X \+=' \ + 'THIRD_PARTY_SOURCES \+=' \ + 'XDIFF_OBJS \+=' \ + <Makefile + ./check-sort.perl 'int cmd_[^(]*\(' <builtin.h + ./check-sort.perl 'int cmd__[^(]*\(' <t/helper/test-tool.h + ./check-sort.perl '\t\{ "[^"]*",' <git.c + ### Test suite coverage testing # .PHONY: coverage coverage-clean coverage-compile coverage-test coverage-report diff --git a/check-sort.perl b/check-sort.perl new file mode 100755 index 0000000000..cd723db14d --- /dev/null +++ b/check-sort.perl @@ -0,0 +1,31 @@ +#!/usr/bin/perl + +use strict; +use warnings; + +my @regexes = map { qr/^$_/ } @ARGV; +my $last_regex = 0; +my $last_line = ''; + +while (<STDIN>) { + my $matched = 0; + chomp; + + for my $regex (@regexes) { + next unless $_ =~ $regex; + + if ($last_regex == $regex) { + die "duplicate lines: '$_'\n" unless $last_line ne $_; + die "unsorted lines: '$last_line' before '$_'\n" unless $last_line lt $_; + } + + $matched = 1; + $last_regex = $regex; + $last_line = $_; + } + + unless ($matched) { + $last_regex = 0; + $last_line = ''; + } +}
In the previous few commits, we sorted many lists into ASCII-order. In order to ensure that they remain that way, add the 'check-sort' target. The check-sort.perl program ensures that consecutive lines that match the same regex are sorted in ASCII-order. The 'check-sort' target runs the check-sort.perl program on some files which are known to contain sorted lists. Signed-off-by: Denton Liu <liu.denton@gmail.com> --- Notes: Full disclaimer: this is the first time I've written anything in Perl. Please let me know if I'm doing anything unconventional :) Makefile | 25 +++++++++++++++++++++++++ check-sort.perl | 31 +++++++++++++++++++++++++++++++ 2 files changed, 56 insertions(+) create mode 100755 check-sort.perl