Message ID | 62fc652eb47a4df83d88a197e376f28dbbab3b52.1661992197.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 29fb2ec384a867ca577335a12f4b45c184e7b642 |
Series | make test "linting" more comprehensive |
On Thu, Sep 01 2022, Eric Sunshine via GitGitGadget wrote: > From: Eric Sunshine <sunshine@sunshineco.com> > > Although chainlint.pl has undergone a good deal of optimization during > its development -- increasing in speed significantly -- parsing and > validating 1050+ scripts and 16500+ tests via Perl is not exactly > instantaneous. However, perceived performance can be improved by taking > advantage of the fact that there is no interdependence between test > scripts or test definitions, thus parsing and validating can be done in > parallel. The number of available cores is determined automatically but > can be overridden via the --jobs option. Per your CL: Ævar offered some sensible comments[2,3] about optimizing the Makefile rules related to chainlint, but those optimizations are not tackled here for a few reasons: (1) this series is already quite long, (2) I'd like to keep the series focused on its primary goal of installing a new and improved linter, (3) these patches do not make the Makefile situation any worse[4], and (4) those optimizations can easily be done atop this series[5]. I have been running with those t/Makefile changes locally, but didn't submit them. FWIW that's here: https://github.com/git/git/compare/master...avar:git:avar/t-Makefile-use-dependency-graph-for-check-chainlint Which I'm not entirely sure I'm happy about, and it's just about the chainlint tests, but... > +sub ncores { > + # Windows > + return $ENV{NUMBER_OF_PROCESSORS} if exists($ENV{NUMBER_OF_PROCESSORS}); > + # Linux / MSYS2 / Cygwin / WSL > + do { local @ARGV='/proc/cpuinfo'; return scalar(grep(/^processor\s*:/, <>)); } if -r '/proc/cpuinfo'; > + # macOS & BSD > + return qx/sysctl -n hw.ncpu/ if $^O =~ /(?:^darwin$|bsd)/; > + return 1; > +} > + > sub show_stats { > my ($start_time, $stats) = @_; > my $walltime = $interval->($start_time); > @@ -621,7 +633,9 @@ sub exit_code { > Getopt::Long::Configure(qw{bundling}); > GetOptions( > "emit-all!" 
=> \$emit_all, > + "jobs|j=i" => \$jobs, > "stats|show-stats!" => \$show_stats) or die("option error\n"); > +$jobs = ncores() if $jobs < 1; > > my $start_time = $getnow->(); > my @stats; > @@ -633,6 +647,40 @@ unless (@scripts) { > exit; > } > > -push(@stats, check_script(1, sub { shift(@scripts); }, sub { print(@_); })); > +unless ($Config{useithreads} && eval { > + require threads; threads->import(); > + require Thread::Queue; Thread::Queue->import(); > + 1; > + }) { > + push(@stats, check_script(1, sub { shift(@scripts); }, sub { print(@_); })); > + show_stats($start_time, \@stats) if $show_stats; > + exit(exit_code(\@stats)); > +} > + > +my $script_queue = Thread::Queue->new(); > +my $output_queue = Thread::Queue->new(); > + > +sub next_script { return $script_queue->dequeue(); } > +sub emit { $output_queue->enqueue(@_); } > + > +sub monitor { > + while (my $s = $output_queue->dequeue()) { > + print($s); > + } > +} > + > +my $mon = threads->create({'context' => 'void'}, \&monitor); > +threads->create({'context' => 'list'}, \&check_script, $_, \&next_script, \&emit) for 1..$jobs; > + > +$script_queue->enqueue(@scripts); > +$script_queue->end(); > + > +for (threads->list()) { > + push(@stats, $_->join()) unless $_ == $mon; > +} > + > +$output_queue->end(); > +$mon->join(); Maybe I'm misunderstanding this whole thing, but this really seems like the wrong direction in an otherwise fantastic direction of a series. I.e. it's *great* that we can do chain-lint without needing to actually execute the *.sh file, this series adds a lint parser that can parse those *.sh "at rest". But in your 16/18 you then do: +if test "${GIT_TEST_CHAIN_LINT:-1}" != 0 +then + "$PERL_PATH" "$TEST_DIRECTORY/chainlint.pl" "$0" || + BUG "lint error (see '?!...!? annotations above)" +fi I may just be missing something here, but why not instead just borrow what I did for "lint-docs" in 8650c6298c1 (doc lint: make "lint-docs" non-.PHONY, 2021-10-15)? I.e. 
if we can run against t0001-init.sh or whatever *once* to see if it chain-lints OK then surely we could have a rule like: t0001-init.sh.chainlint-ok: t0001-init.sh perl chainlint.pl $< >$@ Then whenever you change t0001-init.sh we refresh that t0001-init.sh.chainlint-ok, if the chainlint.pl exits non-zero we'll fail to make it, and will unlink that t0001-init.sh.chainlint-ok. That way you wouldn't need any parallelism in the Perl script, because you'd have "make" take care of it, and the common case of re-testing where the speed matters would be that we wouldn't need to run this at all, or would only re-run it for the test scripts that changed. (Obviously a "real" implementation would want to create that ".ok" file in t/.build/chainlint" or whatever) A drawback is that you'd probably be slower on the initial run, as you'd spawn N chainlint.pl. You could use $? instead of $< to get around that, but that requires some re-structuring, and I've found it to generally not be worth it. It would also have the drawback that a: ./t0001-init.sh wouldn't run the chain-lint, but this would: make T=t0001-init.sh But if we want the former to work we could carry some "GIT_TEST_VIA_MAKEFILE" variable or whatever, and only run the test-via-test-lib.sh if it isn't set.
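(The `ncores()` helper quoted above has a direct shell analogue. Below is a hedged sketch using the same fallback order as the quoted Perl; the function name and probe order mirror the patch, but this is an illustration, not code from the series.)

```shell
# Sketch: detect the CPU count with the same fallback order as the
# quoted ncores() Perl helper. Illustrative only.
ncores() {
	if [ -n "$NUMBER_OF_PROCESSORS" ]
	then
		# Windows
		echo "$NUMBER_OF_PROCESSORS"
	elif [ -r /proc/cpuinfo ]
	then
		# Linux / MSYS2 / Cygwin / WSL
		grep -c '^processor[[:space:]]*:' /proc/cpuinfo
	elif command -v sysctl >/dev/null 2>&1
	then
		# macOS & BSD
		sysctl -n hw.ncpu
	else
		# conservative fallback
		echo 1
	fi
}

ncores
```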
On Thu, Sep 1, 2022 at 8:47 AM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote: > On Thu, Sep 01 2022, Eric Sunshine via GitGitGadget wrote: > > Although chainlint.pl has undergone a good deal of optimization during > > its development -- increasing in speed significantly -- parsing and > > validating 1050+ scripts and 16500+ tests via Perl is not exactly > > instantaneous. However, perceived performance can be improved by taking > > advantage of the fact that there is no interdependence between test > > scripts or test definitions, thus parsing and validating can be done in > > parallel. The number of available cores is determined automatically but > > can be overridden via the --jobs option. > > Per your CL: > > Ævar offered some sensible comments[2,3] about optimizing the Makefile rules > related to chainlint, but those optimizations are not tackled here for a few > reasons: (1) this series is already quite long, (2) I'd like to keep the > series focused on its primary goal of installing a new and improved linter, > (3) these patches do not make the Makefile situation any worse[4], and (4) > those optimizations can easily be done atop this series[5]. > > I have been running with those t/Makefile changes locally, but didn't > submit them. FWIW that's here: > > https://github.com/git/git/compare/master...avar:git:avar/t-Makefile-use-dependency-graph-for-check-chainlint Thanks for the link. It's nice to see an actual implementation. I think most of what you wrote in the commit message and the patch itself are still meaningful following this series. > > +my $script_queue = Thread::Queue->new(); > > +my $output_queue = Thread::Queue->new(); > > + > > +my $mon = threads->create({'context' => 'void'}, \&monitor); > > +threads->create({'context' => 'list'}, \&check_script, $_, \&next_script, \&emit) for 1..$jobs; > > Maybe I'm misunderstanding this whole thing, but this really seems like > the wrong direction in an otherwise fantastic direction of a series. > > I.e. 
it's *great* that we can do chain-lint without needing to actually > execute the *.sh file, this series adds a lint parser that can parse > those *.sh "at rest". > > But in your 16/18 you then do: > > +if test "${GIT_TEST_CHAIN_LINT:-1}" != 0 > +then > + "$PERL_PATH" "$TEST_DIRECTORY/chainlint.pl" "$0" || > + BUG "lint error (see '?!...!? annotations above)" > +fi > > I may just be missing something here, but why not instead just borrow > what I did for "lint-docs" in 8650c6298c1 (doc lint: make "lint-docs" > non-.PHONY, 2021-10-15)? I may be misunderstanding, but regarding patch [16/18], I think you answered your own question at the end of your response when you pointed out the drawback that you wouldn't get linting when running the test script manually (i.e. `./t1234-test-stuff.sh`). Ensuring that the linter is invoked when running a test script manually is important (at least to me) since it's a frequent step when developing a new test or modifying an existing test. [16/18] is present to ensure that we still get that behavior. > I.e. if we can run against t0001-init.sh or whatever *once* to see if it > chain-lints OK then surely we could have a rule like: > > t0001-init.sh.chainlint-ok: t0001-init.sh > perl chainlint.pl $< >$@ > > Then whenever you change t0001-init.sh we refresh that > t0001-init.sh.chainlint-ok, if the chainlint.pl exits non-zero we'll > fail to make it, and will unlink that t0001-init.sh.chainlint-ok. > > That way you wouldn't need any parallelism in the Perl script, because > you'd have "make" take care of it, and the common case of re-testing > where the speed matters would be that we wouldn't need to run this at > all, or would only re-run it for the test scripts that changed. A couple comments regarding parallelism: (1) as mentioned in another response, when developing the script, I had in mind that it might be useful for other projects (i.e. 
`sharness`), thus should be able to stand on its own without advanced Makefile support, and (2) process creation on Microsoft Windows is _very_ expensive and slow, so on that platform, being able to lint all tests in all scripts with a single invocation is a big win over running the linter 1050+ times, once for each test script. That's not to discredit any of your points... I'm just conveying some of my thought process. > (Obviously a "real" implementation would want to create that ".ok" file > in t/.build/chainlint" or whatever) > > A drawback is that you'd probably be slower on the initial run, as you'd > spawn N chainlint.pl. You could use $? instead of $< to get around that, > but that requires some re-structuring, and I've found it to generally > not be worth it. The $? trick might be something Windows folk would appreciate, and even those of us in macOS land (at least those of us with old hardware and OS). > It would also have the drawback that a: > > ./t0001-init.sh > > wouldn't run the chain-lint, but this would: > > make T=t0001-init.sh > > But if we want the former to work we could carry some > "GIT_TEST_VIA_MAKEFILE" variable or whatever, and only run the > test-via-test-lib.sh if it isn't set. I may be misunderstanding, but isn't the GIT_TEST_CHAIN_LINT variable useful for this already, as in [16/18]? Regarding your observations as a whole, I think the extract from the cover letter which you cited above is relevant to my response. I don't disagree with your points about using the Makefile to optimize away unnecessary invocations of the linter, or that doing so can be a useful future direction. 
As mentioned in the cover letter, though, I think that such optimizations are outside the scope of this series which -- aside from installing an improved linter -- aims to maintain the status quo; in particular, this series ensures that (1) tests get linted as they are being written/modified when the developer runs the script manually `./t1234-test-stuff.sh`, and (2) all tests get linted upon `make test`. (The other reason why I'd prefer to see such optimizations applied atop this series is that I simply don't have the time these days to devote to major changes of direction in this series, which I think meets its stated goals without making the situation any worse or making it any more difficult to apply the optimizations you describe. And the new linter has been languishing on my computer for far too long; the implementation has been complete for well over a year, but it took me this long to finish polishing the patch series. I'd like to see the new linter make it into the toolchest of other developers since it can be beneficial; it has already found scores or hundreds[1] of possible hiding places for bugs due to broken &&-chain or missing `|| return`, and has sniffed out some actual broken tests[2,3].) [1]: https://lore.kernel.org/git/20211209051115.52629-1-sunshine@sunshineco.com/ [2]: https://lore.kernel.org/git/20211209051115.52629-3-sunshine@sunshineco.com/ [3]: https://lore.kernel.org/git/7b0784056f3cc0c96e9543ae44d0f5a7b0bf85fa.1661192802.git.gitgitgadget@gmail.com/
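(The make-driven caching being discussed can be pictured with a small shell sketch of the stamp-file pattern. Everything here is illustrative: `sh -n` stands in for chainlint.pl, and the `.build` layout is a guess at what a real t/Makefile rule might use.)

```shell
# Sketch: re-lint a test script only when it is newer than its
# ".chainlint-ok" stamp. "sh -n" is a stand-in for chainlint.pl;
# paths are illustrative, not the real t/Makefile layout.
dir=$(mktemp -d) && cd "$dir" || exit 1
mkdir -p .build
printf 'echo hello\n' >t0001-init.sh
stamp=.build/t0001-init.sh.chainlint-ok

relint() {
	if [ ! -e "$stamp" ] || [ t0001-init.sh -nt "$stamp" ]
	then
		if sh -n t0001-init.sh
		then
			touch "$stamp"	# record a clean lint
		else
			rm -f "$stamp"	# force a retry next time
			return 1
		fi
	fi
}

relint && echo "lint ok (cached for next run)"
```

On the second invocation the mtime check makes `relint` a no-op, which is the "don't re-run this at all" common case described in the thread.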
Eric Sunshine via GitGitGadget <gitgitgadget@gmail.com> wrote: > +unless ($Config{useithreads} && eval { > + require threads; threads->import(); Fwiw, the threads(3perl) manpage has this since 2014: The use of interpreter-based threads in perl is officially discouraged. I was bummed, too :< but I've decided it wasn't worth the effort to deal with the problems threads could cause down the line in future Perl versions. For example, common libraries like File::Temp will chdir behind-the-scenes which is thread-unsafe. (of course I only care about *BSD and Linux on MMU hardware, so I use SOCK_SEQPACKET and fork() freely :>)
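(The fork()-based style alluded to above is easy to picture in shell, where `&` backgrounds a forked child. A hedged sketch, with `sh -n` standing in for the real linter and the file names invented for the demo:)

```shell
# Sketch: one forked worker per script, exit statuses collected with
# wait. "sh -n" is a stand-in for the real linter.
dir=$(mktemp -d)
printf 'echo one\n' >"$dir/t0001-one.sh"
printf 'echo two\n' >"$dir/t0002-two.sh"

rc=0 pids=
for s in "$dir"/t[0-9]*.sh
do
	sh -n "$s" &		# fork a child to lint this script
	pids="$pids $!"
done
for p in $pids
do
	wait "$p" || rc=1	# any failing child fails the run
done
echo "overall status: $rc"
```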
On Tue, Sep 6, 2022 at 6:35 PM Eric Wong <e@80x24.org> wrote: > Eric Sunshine via GitGitGadget <gitgitgadget@gmail.com> wrote: > > +unless ($Config{useithreads} && eval { > > + require threads; threads->import(); > > Fwiw, the threads(3perl) manpage has this since 2014: > > The use of interpreter-based threads in perl is officially discouraged. Thanks for pointing this out. I did see that, but as no better alternative was offered, and since I did want this to work on Windows, I went with it. > I was bummed, too :< but I've decided it wasn't worth the > effort to deal with the problems threads could cause down the > line in future Perl versions. For example, common libraries > like File::Temp will chdir behind-the-scenes which is > thread-unsafe. > > (of course I only care about *BSD and Linux on MMU hardware, > so I use SOCK_SEQPACKET and fork() freely :>) I'm not overly worried about the deprecation at the moment since (1) chainlint.pl isn't a widely used script -- its audience is very narrow; (2) the `$Config{useithreads}` conditional can be seen as an automatic escape-hatch, and (if need be) I can even make `--jobs=1` be an explicit escape hatch, and there's already --no-chain-lint for an extreme escape-hatch; (3) the script is pretty much standalone -- it doesn't rely upon any libraries like File::Temp or others; (4) Ævar has ideas for using the Makefile for parallelism instead; (5) we can cross the deprecation-bridge when/if it actually does become a problem, either by dropping parallelism from chainlint.pl or by dropping chainlint.pl itself.
On Tue, Sep 06, 2022 at 06:52:26PM -0400, Eric Sunshine wrote: > On Tue, Sep 6, 2022 at 6:35 PM Eric Wong <e@80x24.org> wrote: > > Eric Sunshine via GitGitGadget <gitgitgadget@gmail.com> wrote: > > > +unless ($Config{useithreads} && eval { > > > + require threads; threads->import(); > > > > Fwiw, the threads(3perl) manpage has this since 2014: > > > > The use of interpreter-based threads in perl is officially discouraged. > > Thanks for pointing this out. I did see that, but as no better > alternative was offered, and since I did want this to work on Windows, > I went with it. I did some timings the other night, and I found something quite curious with the thread stuff. Here's a hyperfine run of "make" in the t/ directory before any of your patches. It uses "prove" to do parallelism under the hood: Benchmark 1: make Time (mean ± σ): 68.895 s ± 0.840 s [User: 620.914 s, System: 428.498 s] Range (min … max): 67.943 s … 69.531 s 3 runs So that gives us a baseline. Now the first thing I wondered is how bad it would be to just run chainlint.pl once per script. So I applied up to that patch: Benchmark 1: make Time (mean ± σ): 71.289 s ± 1.302 s [User: 673.300 s, System: 417.912 s] Range (min … max): 69.788 s … 72.120 s 3 runs I was quite surprised that it made things slower! It's nice that we're only calling it once per script instead of once per test, but it seems the startup overhead of the script is really high. And since in this mode we're only feeding it one script at a time, I tried reverting the "chainlint.pl: validate test scripts in parallel" commit. 
And indeed, now things are much faster: Benchmark 1: make Time (mean ± σ): 61.544 s ± 3.364 s [User: 556.486 s, System: 384.001 s] Range (min … max): 57.660 s … 63.490 s 3 runs And you can see the same thing just running chainlint by itself: $ time perl chainlint.pl /dev/null real 0m0.069s user 0m0.042s sys 0m0.020s $ git revert HEAD^{/validate.test.scripts.in.parallel} $ time perl chainlint.pl /dev/null real 0m0.014s user 0m0.010s sys 0m0.004s I didn't track down the source of the slowness. Maybe it's loading extra modules, or maybe it's opening /proc/cpuinfo, or maybe it's the thread setup. But it's a surprising slowdown. Now of course your intent is to do a single repo-wide invocation. And that is indeed a bit faster. Here it is without the parallel code: Benchmark 1: make Time (mean ± σ): 61.727 s ± 2.140 s [User: 507.712 s, System: 377.753 s] Range (min … max): 59.259 s … 63.074 s 3 runs The wall-clock time didn't improve much, but the CPU time did. Restoring the parallel code does improve the wall-clock time a bit, but at the cost of some extra CPU: Benchmark 1: make Time (mean ± σ): 59.029 s ± 2.851 s [User: 515.690 s, System: 380.369 s] Range (min … max): 55.736 s … 60.693 s 3 runs which makes sense. If I do a with/without of just "make test-chainlint", the parallelism is buying a few seconds of wall-clock: Benchmark 1: make test-chainlint Time (mean ± σ): 900.1 ms ± 102.9 ms [User: 12049.8 ms, System: 79.7 ms] Range (min … max): 704.2 ms … 994.4 ms 10 runs Benchmark 1: make test-chainlint Time (mean ± σ): 3.778 s ± 0.042 s [User: 3.756 s, System: 0.023 s] Range (min … max): 3.706 s … 3.833 s 10 runs I'm not sure what it all means. For Linux, I think I'd be just as happy with a single non-parallelized test-chainlint run for each file. But maybe on Windows the startup overhead is worse? OTOH, the whole test run is so much worse there. One process per script is not going to be that much in relative terms either way. 
And if we did cache the results and avoid extra invocations via "make", then we'd want all the parallelism to move to there anyway. Maybe that gives you more food for thought about whether perl's "use threads" is worth having. -Peff
On Tue, Sep 6, 2022 at 7:27 PM Jeff King <peff@peff.net> wrote: > I did some timings the other night, and I found something quite curious > with the thread stuff. > > I was quite surprised that it made things slower! It's nice that we're > only calling it once per script instead of once per test, but it seems > the startup overhead of the script is really high. > > And since in this mode we're only feeding it one script at a time, I > tried reverting the "chainlint.pl: validate test scripts in parallel" > commit. And indeed, now things are much faster: > > Benchmark 1: make > Time (mean ± σ): 61.544 s ± 3.364 s [User: 556.486 s, System: 384.001 s] > Range (min … max): 57.660 s … 63.490 s 3 runs > > And you can see the same thing just running chainlint by itself: > > $ time perl chainlint.pl /dev/null > real 0m0.069s > user 0m0.042s > sys 0m0.020s > > $ git revert HEAD^{/validate.test.scripts.in.parallel} > $ time perl chainlint.pl /dev/null > real 0m0.014s > user 0m0.010s > sys 0m0.004s > > I didn't track down the source of the slowness. Maybe it's loading extra > modules, or maybe it's opening /proc/cpuinfo, or maybe it's the thread > setup. But it's a surprising slowdown. It is surprising, and unfortunate. Ditching "ithreads" would probably be a good idea. (more on that below) > Now of course your intent is to do a single repo-wide invocation. And > that is indeed a bit faster. Here it is without the parallel code: > > Benchmark 1: make > Time (mean ± σ): 61.727 s ± 2.140 s [User: 507.712 s, System: 377.753 s] > Range (min … max): 59.259 s … 63.074 s 3 runs > > The wall-clock time didn't improve much, but the CPU time did. Restoring > the parallel code does improve the wall-clock time a bit, but at the > cost of some extra CPU: > > Benchmark 1: make > Time (mean ± σ): 59.029 s ± 2.851 s [User: 515.690 s, System: 380.369 s] > Range (min … max): 55.736 s … 60.693 s 3 runs > > which makes sense. 
If I do a with/without of just "make test-chainlint", > the parallelism is buying a few seconds of wall-clock: > > Benchmark 1: make test-chainlint > Time (mean ± σ): 900.1 ms ± 102.9 ms [User: 12049.8 ms, System: 79.7 ms] > Range (min … max): 704.2 ms … 994.4 ms 10 runs > > Benchmark 1: make test-chainlint > Time (mean ± σ): 3.778 s ± 0.042 s [User: 3.756 s, System: 0.023 s] > Range (min … max): 3.706 s … 3.833 s 10 runs > > I'm not sure what it all means. For Linux, I think I'd be just as happy > with a single non-parallelized test-chainlint run for each file. But > maybe on Windows the startup overhead is worse? OTOH, the whole test run > is so much worse there. One process per script is not going to be that > much in relative terms either way. Somehow Windows manages to be unbelievably slow no matter what. I mentioned elsewhere (after you sent this) that I tested on a five or six year old 8-core dual-boot machine. Booted to Linux, running a single chainlint.pl invocation using all 8 cores to check all scripts in the project took under 1 second walltime. The same machine booted to Windows using all 8 cores took just under two minutes(!) walltime for the single Perl invocation to check all scripts in the project. So, at this point, I have no hope for making linting fast on Windows; it seems to be a lost cause. > And if we did cache the results and avoid extra invocations via "make", > then we'd want all the parallelism to move to there anyway. > > Maybe that gives you more food for thought about whether perl's "use > threads" is worth having. I'm not especially happy about the significant overhead of "ithreads"; on my (old) machine, although it does improve perceived time significantly, it eats up quite a bit of additional user-time. As such, I would not be unhappy to see "ithreads" go away, especially since fast linting on Windows seems unattainable (at least with Perl). Overall, I think Ævar's plan to parallelize linting via "make" is probably the way to go.
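(Moving the parallelism out of the Perl script entirely, as discussed, can be approximated today with `xargs -P`, a GNU/BSD extension, much as a `make -jN` rule would fan the work out. A hedged sketch, again with `sh -n` standing in for chainlint.pl and invented file names:)

```shell
# Sketch: drive parallelism from outside the linter. xargs -P fans the
# scripts out over 2 single-threaded linter runs; "sh -n" is a
# stand-in for chainlint.pl. Temp paths contain no whitespace, so the
# ls | xargs pipeline is safe here.
dir=$(mktemp -d)
for i in 1 2 3 4
do
	printf 'echo %s\n' "$i" >"$dir/t000$i-demo.sh"
done

# xargs exits non-zero if any linter invocation fails
ls "$dir"/t[0-9]*.sh | xargs -P 2 -n 1 sh -n &&
echo "all scripts lint clean"
```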
On Sun, Nov 20 2022, Eric Sunshine wrote: > On Tue, Sep 6, 2022 at 7:27 PM Jeff King <peff@peff.net> wrote: >> I did some timings the other night, and I found something quite curious >> with the thread stuff. >> >> I was quite surprised that it made things slower! It's nice that we're >> only calling it once per script instead of once per test, but it seems >> the startup overhead of the script is really high. >> >> And since in this mode we're only feeding it one script at a time, I >> tried reverting the "chainlint.pl: validate test scripts in parallel" >> commit. And indeed, now things are much faster: >> >> Benchmark 1: make >> Time (mean ± σ): 61.544 s ± 3.364 s [User: 556.486 s, System: 384.001 s] >> Range (min … max): 57.660 s … 63.490 s 3 runs >> >> And you can see the same thing just running chainlint by itself: >> >> $ time perl chainlint.pl /dev/null >> real 0m0.069s >> user 0m0.042s >> sys 0m0.020s >> >> $ git revert HEAD^{/validate.test.scripts.in.parallel} >> $ time perl chainlint.pl /dev/null >> real 0m0.014s >> user 0m0.010s >> sys 0m0.004s >> >> I didn't track down the source of the slowness. Maybe it's loading extra >> modules, or maybe it's opening /proc/cpuinfo, or maybe it's the thread >> setup. But it's a surprising slowdown. > > It is surprising, and unfortunate. Ditching "ithreads" would probably > be a good idea. (more on that below) > >> Now of course your intent is to do a single repo-wide invocation. And >> that is indeed a bit faster. Here it is without the parallel code: >> >> Benchmark 1: make >> Time (mean ± σ): 61.727 s ± 2.140 s [User: 507.712 s, System: 377.753 s] >> Range (min … max): 59.259 s … 63.074 s 3 runs >> >> The wall-clock time didn't improve much, but the CPU time did. 
Restoring >> the parallel code does improve the wall-clock time a bit, but at the >> cost of some extra CPU: >> >> Benchmark 1: make >> Time (mean ± σ): 59.029 s ± 2.851 s [User: 515.690 s, System: 380.369 s] >> Range (min … max): 55.736 s … 60.693 s 3 runs >> >> which makes sense. If I do a with/without of just "make test-chainlint", >> the parallelism is buying a few seconds of wall-clock: >> >> Benchmark 1: make test-chainlint >> Time (mean ± σ): 900.1 ms ± 102.9 ms [User: 12049.8 ms, System: 79.7 ms] >> Range (min … max): 704.2 ms … 994.4 ms 10 runs >> >> Benchmark 1: make test-chainlint >> Time (mean ± σ): 3.778 s ± 0.042 s [User: 3.756 s, System: 0.023 s] >> Range (min … max): 3.706 s … 3.833 s 10 runs >> >> I'm not sure what it all means. For Linux, I think I'd be just as happy >> with a single non-parallelized test-chainlint run for each file. But >> maybe on Windows the startup overhead is worse? OTOH, the whole test run >> is so much worse there. One process per script is not going to be that >> much in relative terms either way. > > Somehow Windows manages to be unbelievably slow no matter what. I > mentioned elsewhere (after you sent this) that I tested on a five or > six year old 8-core dual-boot machine. Booted to Linux, running a > single chainlint.pl invocation using all 8 cores to check all scripts > in the project took under 1 second walltime. The same machine booted > to Windows using all 8 cores took just under two minutes(!) walltime > for the single Perl invocation to check all scripts in the project. > > So, at this point, I have no hope for making linting fast on Windows; > it seems to be a lost cause. I'd be really interested in seeing e.g. the NYTProf output for that run, compared with that on *nix (if you could upload the HTML versions of both somewhere, even better). Maybe "chainlint.pl" is doing something odd, but this goes against the usual wisdom about what is and isn't slow in Perl on windows, as I understand it. I.e. 
process start-up etc. is slow there, and I/O's a bit slower, but once you're started up and e.g. slurping up all of those files & parsing them you're just running "perl-native" code. Which shouldn't be much slower at all. A perl compiled with ithreads is (last I checked) around 10-20% slower, and the Windows version is always compiled with that (it's needed for "fork" emulation). But most *nix versions are compiled with that too, and certainly the one you're using with "threads", so that's not the difference. So I suspect something odd's going on... >> And if we did cache the results and avoid extra invocations via "make", >> then we'd want all the parallelism to move to there anyway. >> >> Maybe that gives you more food for thought about whether perl's "use >> threads" is worth having. > > I'm not especially happy about the significant overhead of "ithreads"; > on my (old) machine, although it does improve perceived time > significantly, it eats up quite a bit of additional user-time. As > such, I would not be unhappy to see "ithreads" go away, especially > since fast linting on Windows seems unattainable (at least with Perl). > > Overall, I think Ævar's plan to parallelize linting via "make" is > probably the way to go. Yeah, but that seems to me to be orthogonal to why it's this slow on Windows, and if it is that wouldn't help much, except for incremental re-runs.
On Mon, Nov 21, 2022 at 8:32 AM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote: > On Sun, Nov 20 2022, Eric Sunshine wrote: > > Somehow Windows manages to be unbelievably slow no matter what. I > > mentioned elsewhere (after you sent this) that I tested on a five or > > six year old 8-core dual-boot machine. Booted to Linux, running a > > single chainlint.pl invocation using all 8 cores to check all scripts > > in the project took under 1 second walltime. The same machine booted > > to Windows using all 8 cores took just under two minutes(!) walltime > > for the single Perl invocation to check all scripts in the project. > > I'd be really interested in seeing e.g. the NYTProf output for that run, > compared with that on *nix (if you could upload the HTML versions of > both somewhere, even better). Unfortunately, I no longer have access to that machine, or usable Windows in general. Of course, someone else with access to a dual-boot machine could generate such a report, but whether anyone will offer to do so is a different matter. > Maybe "chainlint.pl" is doing something odd, but this goes against the > usual wisdom about what is and isn't slow in Perl on windows, as I > understand it. > > I.e. process start-up etc. is slow there, and I/O's a bit slower, but > once you're started up and e.g. slurping up all of those files & parsing > them you're just running "perl-native" code. > > Which shouldn't be much slower at all. A perl compiled with ithreads is > (last I checked) around 10-20% slower, and the Windows version is always > compiled with that (it's needed for "fork" emulation). > > But most *nix versions are compiled with that too, and certainly the one > you're using with "threads", so that's not the difference. > > So I suspect something odd's going on... This is all my understanding, as well, which is why I was so surprised by the difference in speed. 
Aside from suspecting Windows I/O as the culprit, another obvious possible culprit would be whatever mechanism/primitives "ithreads" is using on Windows for locking/synchronizing and passing messages between threads. I wouldn't be surprised to learn that those mechanisms/primitives have very high overhead on that platform. > > Overall, I think Ævar's plan to parallelize linting via "make" is > > probably the way to go. > > Yeah, but that seems to me to be orthagonal to why it's this slow on > Windows, and if it is that wouldn't help much, except for incremental > re-runs. Oh, I didn't at all mean that `make` parallelism would be helpful on Windows; I can't imagine that it ever would be (though I could once again be wrong). What I meant was that `make` parallelism would be a nice improvement and simplification (of sorts), in general, considering that I've given up hope of ever seeing linting be speedy on Windows.
On Mon, Nov 21 2022, Eric Sunshine wrote: > On Mon, Nov 21, 2022 at 8:32 AM Ævar Arnfjörð Bjarmason > <avarab@gmail.com> wrote: >> On Sun, Nov 20 2022, Eric Sunshine wrote: >> > Somehow Windows manages to be unbelievably slow no matter what. I >> > mentioned elsewhere (after you sent this) that I tested on a five or >> > six year old 8-core dual-boot machine. Booted to Linux, running a >> > single chainlint.pl invocation using all 8 cores to check all scripts >> > in the project took under 1 second walltime. The same machine booted >> > to Windows using all 8 cores took just under two minutes(!) walltime >> > for the single Perl invocation to check all scripts in the project. >> >> I'd be really interested in seeing e.g. the NYTProf output for that run, >> compared with that on *nix (if you could upload the HTML versions of >> both somewhere, even better). > > Unfortunately, I no longer have access to that machine, or usable > Windows in general. Of course, someone else with access to a dual-boot > machine could generate such a report, but whether anyone will offer to > do so is a different matter. :( >> Maybe "chainlint.pl" is doing something odd, but this goes against the >> usual wisdom about what is and isn't slow in Perl on windows, as I >> understand it. >> >> I.e. process start-up etc. is slow there, and I/O's a bit slower, but >> once you're started up and e.g. slurping up all of those files & parsing >> them you're just running "perl-native" code. >> >> Which shouldn't be much slower at all. A perl compiled with ithreads is >> (last I checked) around 10-20% slower, and the Windows version is always >> compiled with that (it's needed for "fork" emulation). >> >> But most *nix versions are compiled with that too, and certainly the one >> you're using with "threads", so that's not the difference. >> >> So I suspect something odd's going on... > > This is all my understanding, as well, which is why I was so surprised > by the difference in speed. 
> Aside from suspecting Windows I/O as the culprit, another obvious
> possible culprit would be whatever mechanism/primitives "ithreads" is
> using on Windows for locking/synchronizing and passing messages between
> threads. I wouldn't be surprised to learn that those
> mechanisms/primitives have very high overhead on that platform.

Yeah, that could be, but then...

>> > Overall, I think Ævar's plan to parallelize linting via "make" is
>> > probably the way to go.
>>
>> Yeah, but that seems to me to be orthogonal to why it's this slow on
>> Windows, and if it is that wouldn't help much, except for incremental
>> re-runs.
>
> Oh, I didn't at all mean that `make` parallelism would be helpful on
> Windows; I can't imagine that it ever would be (though I could once
> again be wrong). What I meant was that `make` parallelism would be a
> nice improvement and simplification (of sorts), in general, considering
> that I've given up hope of ever seeing linting be speedy on Windows.

...that parallelism probably wouldn't be helpful, as it'll run into
another thing that's slow.

But just ditching the "ithreads" commit from chainlint.pl should make it
much faster, as sequentially parsing all the files isn't that slow, and
a run that doesn't use threads avoids the ithreads overhead entirely.
On Mon, Nov 21, 2022 at 9:20 AM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote: > On Mon, Nov 21 2022, Eric Sunshine wrote: > > Oh, I didn't at all mean that `make` parallelism would be helpful on > > Windows; I can't imagine that it ever would be (though I could once > > again be wrong). What I meant was that `make` parallelism would be a > > nice improvement and simplification (of sorts), in general, > > considering that I've given up hope of ever seeing linting be speedy > > on Windows. > > But just ditching the "ithreads" commit from chainlint.pl should make it > much faster, as sequentially parsing all the files isn't that slow, and > as that won't use threads should be much faster then. On my (old) machine (with spinning hard drive), `make test-chainlint` with "ithreads" and warm filesystem cache takes about 3.8 seconds walltime. Without "ithreads", it takes about 11.3 seconds. So, the improvement in perceived time is significant. As such, I'm somewhat hesitant to see "ithreads" dropped from chainlint.pl before `make` parallelism is implemented. (I can easily see "drop ithreads" as the final patch of a series which adds `make` parallelism.) But perhaps I'm focussing too much on my own experience with my old machine. Maybe linting without "ithreads" and without `make` parallelism would be "fast enough" for developers using beefier modern machines... (genuine question/thought since I don't have access to any beefy modern hardware).
On Sun, Nov 20, 2022 at 11:02:54PM -0500, Eric Sunshine wrote: > > And if we did cache the results and avoid extra invocations via "make", > > then we'd want all the parallelism to move to there anyway. > > > > Maybe that gives you more food for thought about whether perl's "use > > threads" is worth having. > > I'm not especially happy about the significant overhead of "ithreads"; > on my (old) machine, although it does improve perceived time > significantly, it eats up quite a bit of additional user-time. As > such, I would not be unhappy to see "ithreads" go away, especially > since fast linting on Windows seems unattainable (at least with Perl). > > Overall, I think Ævar's plan to parallelize linting via "make" is > probably the way to go. TBH, I think just running the linter once per test script when the script is run would be sufficient. That is one extra process per script, but they are already shell scripts running a bunch of processes. You get parallelism for free because you're already running the tests in parallel. You lose out on "don't bother linting because the file hasn't changed", but I'm not sure that's really worth the extra complexity overall. -Peff
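[Editor's illustration: the "one extra lint process per script, parallelized by however many scripts happen to be running" idea can be mimicked outside the test harness with GNU/BSD `xargs -P`. This is a self-contained stand-in demo with placeholder file names, not git's actual test-lib wiring:]

```shell
# Run one "lint" process per file, up to four at a time, as a stand-in
# for per-script linting parallelized by the test harness itself.
# The t000N.sh names and the echo are placeholders, not real tooling.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch t0001.sh t0002.sh t0003.sh t0004.sh
# -P 4: up to four concurrent processes; sort fixes the racy output order
ls t*.sh | xargs -P 4 -I{} sh -c 'echo "linted {}"' | sort
```

Each file gets its own short-lived process, which is exactly why the overhead disappears into the noise of an already process-heavy test run.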
On Mon, Nov 21, 2022 at 1:04 PM Jeff King <peff@peff.net> wrote: > On Sun, Nov 20, 2022 at 11:02:54PM -0500, Eric Sunshine wrote: > > Overall, I think Ævar's plan to parallelize linting via "make" is > > probably the way to go. > > TBH, I think just running the linter once per test script when the > script is run would be sufficient. That is one extra process per script, > but they are already shell scripts running a bunch of processes. You get > parallelism for free because you're already running the tests in > parallel. You lose out on "don't bother linting because the file hasn't > changed", but I'm not sure that's really worth the extra complexity > overall. Hmm, yes, that's appealing (especially since I've essentially given up on making linting fast on Windows), and it wouldn't be hard to implement. In fact, it's already implemented by 23a14f3016 (test-lib: replace chainlint.sed with chainlint.pl, 2022-09-01); making it work the way you describe would just involve dropping 69b9924b87 (t/Makefile: teach `make test` and `make prove` to run chainlint.pl, 2022-09-01) and 29fb2ec384 (chainlint.pl: validate test scripts in parallel, 2022-09-01). I think Ævar's use-case for `make` parallelization was to speed up git-bisect runs. But thinking about it now, the likelihood of "lint" problems cropping up during a git-bisect run is effectively nil, in which case setting GIT_TEST_CHAIN_LINT=1 should be a perfectly appropriate way to take linting out of the equation when bisecting.
On Mon, Nov 21, 2022 at 1:47 PM Eric Sunshine <sunshine@sunshineco.com> wrote: > I think Ævar's use-case for `make` parallelization was to speed up > git-bisect runs. But thinking about it now, the likelihood of "lint" > problems cropping up during a git-bisect run is effectively nil, in > which case setting GIT_TEST_CHAIN_LINT=1 should be a perfectly > appropriate way to take linting out of the equation when bisecting. I mean "GIT_TEST_CHAIN_LINT=0", of course.
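[Editor's illustration: the `GIT_TEST_CHAIN_LINT=0` toggle amounts to an environment-variable gate. Here is a minimal self-contained sketch of that convention; the `maybe_lint` helper and its messages are hypothetical names for illustration — the real gating lives in t/test-lib.sh:]

```shell
# Hypothetical sketch of the GIT_TEST_CHAIN_LINT convention: unset or
# any value other than 0 means "run the lint check"; 0 disables it.
maybe_lint () {
	if test "${GIT_TEST_CHAIN_LINT:-1}" != 0
	then
		echo "linting $1"
	else
		echo "skipping lint of $1"
	fi
}

GIT_TEST_CHAIN_LINT=0
maybe_lint t1234-example.sh  # prints "skipping lint of t1234-example.sh"
```

When bisecting a single script, the equivalent would be something like `cd t && GIT_TEST_CHAIN_LINT=0 ./t1234-example.sh` (t1234-example.sh being a stand-in name).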
On Mon, Nov 21, 2022 at 01:47:42PM -0500, Eric Sunshine wrote: > On Mon, Nov 21, 2022 at 1:04 PM Jeff King <peff@peff.net> wrote: > > On Sun, Nov 20, 2022 at 11:02:54PM -0500, Eric Sunshine wrote: > > > Overall, I think Ævar's plan to parallelize linting via "make" is > > > probably the way to go. > > > > TBH, I think just running the linter once per test script when the > > script is run would be sufficient. That is one extra process per script, > > but they are already shell scripts running a bunch of processes. You get > > parallelism for free because you're already running the tests in > > parallel. You lose out on "don't bother linting because the file hasn't > > changed", but I'm not sure that's really worth the extra complexity > > overall. > > Hmm, yes, that's appealing (especially since I've essentially given up > on making linting fast on Windows), and it wouldn't be hard to > implement. In fact, it's already implemented by 23a14f3016 (test-lib: > replace chainlint.sed with chainlint.pl, 2022-09-01); making it work > the way you describe would just involve dropping 69b9924b87 > (t/Makefile: teach `make test` and `make prove` to run chainlint.pl, > 2022-09-01) and 29fb2ec384 (chainlint.pl: validate test scripts in > parallel, 2022-09-01). Yes, that was one of the modes I timed in my original email. :) > I think Ævar's use-case for `make` parallelization was to speed up > git-bisect runs. But thinking about it now, the likelihood of "lint" > problems cropping up during a git-bisect run is effectively nil, in > which case setting GIT_TEST_CHAIN_LINT=1 should be a perfectly > appropriate way to take linting out of the equation when bisecting. Yes. It's also dumb to run a straight "make test" while bisecting in the first place, because you are going to run a zillion tests that aren't relevant to your bisection. 
Bisecting on "cd t && ./test-that-fails" is faster, at which point you're only running the one lint process (and if it really bothers you, you can disable chain lint as you suggest). -Peff
On Mon, Nov 21, 2022 at 1:52 PM Jeff King <peff@peff.net> wrote: > On Mon, Nov 21, 2022 at 01:47:42PM -0500, Eric Sunshine wrote: > > I think Ævar's use-case for `make` parallelization was to speed up > > git-bisect runs. But thinking about it now, the likelihood of "lint" > > problems cropping up during a git-bisect run is effectively nil, in > > which case setting GIT_TEST_CHAIN_LINT=1 should be a perfectly > > appropriate way to take linting out of the equation when bisecting. > > Yes. It's also dumb to run a straight "make test" while bisecting in the > first place, because you are going to run a zillion tests that aren't > relevant to your bisection. Bisecting on "cd t && ./test-that-fails" is > faster, at which point you're only running the one lint process (and if > it really bothers you, you can disable chain lint as you suggest). I think I misspoke. Dredging up old memories, I think Ævar's use-case is that he now runs: git rebase -i --exec 'make test' ... in order to ensure that the entire test suite passes for _every_ patch in a series. (This is due to him having missed a runtime breakage by only running "make test" after the final patch in a series was applied, when the breakage was only temporary -- added by one patch, but resolved by some other later patch.) Even so, GIT_TEST_CHAIN_LINT=0 should be appropriate here too.
On Mon, Nov 21, 2022 at 02:00:41PM -0500, Eric Sunshine wrote: > On Mon, Nov 21, 2022 at 1:52 PM Jeff King <peff@peff.net> wrote: > > On Mon, Nov 21, 2022 at 01:47:42PM -0500, Eric Sunshine wrote: > > > I think Ævar's use-case for `make` parallelization was to speed up > > > git-bisect runs. But thinking about it now, the likelihood of "lint" > > > problems cropping up during a git-bisect run is effectively nil, in > > > which case setting GIT_TEST_CHAIN_LINT=1 should be a perfectly > > > appropriate way to take linting out of the equation when bisecting. > > > > Yes. It's also dumb to run a straight "make test" while bisecting in the > > first place, because you are going to run a zillion tests that aren't > > relevant to your bisection. Bisecting on "cd t && ./test-that-fails" is > > faster, at which point you're only running the one lint process (and if > > it really bothers you, you can disable chain lint as you suggest). > > I think I misspoke. Dredging up old memories, I think Ævar's use-case > is that he now runs: > > git rebase -i --exec 'make test' ... > > in order to ensure that the entire test suite passes for _every_ patch > in a series. (This is due to him having missed a runtime breakage by > only running "make test" after the final patch in a series was > applied, when the breakage was only temporary -- added by one patch, > but resolved by some other later patch.) Yeah, I do that sometimes, too, especially when heavy refactoring is involved. > Even so, GIT_TEST_CHAIN_LINT=0 should be appropriate here too. Agreed. But also, my original point stands. If you are running 10 CPU minutes of tests, then a few CPU seconds of linting is not really that important. -Peff
On Mon, Nov 21 2022, Eric Sunshine wrote:

> On Mon, Nov 21, 2022 at 1:52 PM Jeff King <peff@peff.net> wrote:
>> On Mon, Nov 21, 2022 at 01:47:42PM -0500, Eric Sunshine wrote:
>> > I think Ævar's use-case for `make` parallelization was to speed up
>> > git-bisect runs. But thinking about it now, the likelihood of "lint"
>> > problems cropping up during a git-bisect run is effectively nil, in
>> > which case setting GIT_TEST_CHAIN_LINT=1 should be a perfectly
>> > appropriate way to take linting out of the equation when bisecting.
>>
>> Yes. It's also dumb to run a straight "make test" while bisecting in the
>> first place, because you are going to run a zillion tests that aren't
>> relevant to your bisection. Bisecting on "cd t && ./test-that-fails" is
>> faster, at which point you're only running the one lint process (and if
>> it really bothers you, you can disable chain lint as you suggest).
>
> I think I misspoke. Dredging up old memories, I think Ævar's use-case
> is that he now runs:
>
>     git rebase -i --exec 'make test' ...
>
> in order to ensure that the entire test suite passes for _every_ patch
> in a series. (This is due to him having missed a runtime breakage by
> only running "make test" after the final patch in a series was
> applied, when the breakage was only temporary -- added by one patch,
> but resolved by some other later patch.)
>
> Even so, GIT_TEST_CHAIN_LINT=0 should be appropriate here too.

I'd like to make "make" fast in terms of avoiding its own overhead
before it gets to actual work, mainly because of that use-case, but it
helps in general. E.g. if you switch branches we don't compile a file we
don't need to; we shouldn't re-run test checks we don't need either.

For t/ this is:

 - Running chainlint.pl on the file, even if it didn't change
 - Ditto check-non-portable-shell.pl
 - Ditto the "non-portable file name(s)" check
 - Ditto "test -x" on all test files

I have a branch where these are all checked using dependencies instead, e.g.
we run a "test -x" on t0071-sort.sh and create a
".build/check-executable/t0071-sort.sh.ok" if that passed, so we don't
need to shell out in the common case.

The results of that are (and this is a best case, picking a test which
is itself cheap):

	$ git hyperfine -L rev @{u},HEAD~,HEAD -s 'make CFLAGS=-O3' 'make test T=t0071-sort.sh' -w 1
	Benchmark 1: make test T=t0071-sort.sh' in '@{u}
	  Time (mean ± σ):      1.168 s ±  0.074 s    [User: 1.534 s, System: 0.082 s]
	  Range (min … max):    1.096 s …  1.316 s    10 runs

	Benchmark 2: make test T=t0071-sort.sh' in 'HEAD~
	  Time (mean ± σ):     719.1 ms ±  46.1 ms    [User: 910.6 ms, System: 79.7 ms]
	  Range (min … max):   682.0 ms … 828.2 ms    10 runs

	Benchmark 3: make test T=t0071-sort.sh' in 'HEAD
	  Time (mean ± σ):     685.0 ms ±  34.2 ms    [User: 645.0 ms, System: 56.8 ms]
	  Range (min … max):   657.6 ms … 773.6 ms    10 runs

	Summary
	  'make test T=t0071-sort.sh' in 'HEAD' ran
	    1.05 ± 0.09 times faster than 'make test T=t0071-sort.sh' in 'HEAD~'
	    1.71 ± 0.14 times faster than 'make test T=t0071-sort.sh' in '@{u}'

The @{u} being "master", HEAD~ is "increment without chainlint.pl", and
"HEAD" is where it's all incremental.

It's very WIP-quality, but I pushed the chainlint.pl part of it as a POC
just now; I did the others a while ago:
https://github.com/avar/git/tree/avar/t-Makefile-break-T-to-file-association
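[Editor's illustration: the stamp-file approach described above can be sketched with a tiny self-contained demo. The `.build/check-executable/<script>.ok` naming mirrors the branch; the dummy script and the bare `test -x` check are simplifications of the real Makefile rules:]

```shell
# Stamp-file sketch: re-run a check only when the script is newer than
# its ".ok" stamp, so unchanged files cost nothing on incremental runs.
set -e
dir=$(mktemp -d)
cd "$dir"
printf '#!/bin/sh\n' >t0071-sort.sh   # dummy stand-in test script
chmod +x t0071-sort.sh
mkdir -p .build/check-executable
for t in t[0-9]*.sh
do
	stamp=".build/check-executable/$t.ok"
	# "$t" -nt "$stamp": true if the script changed after the stamp
	if test ! -e "$stamp" || test "$t" -nt "$stamp"
	then
		test -x "$t"      # the actual check (executability here)
		touch "$stamp"    # record success so the next run can skip it
	fi
done
echo "stamps: $(ls .build/check-executable)"  # prints "stamps: t0071-sort.sh.ok"
```

A second run of the loop does no work at all, which is where the incremental speedup in the hyperfine numbers above comes from.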
diff --git a/t/chainlint.pl b/t/chainlint.pl
index d526723ac00..898573a9100 100755
--- a/t/chainlint.pl
+++ b/t/chainlint.pl
@@ -15,9 +15,11 @@
 use warnings;
 use strict;
 
+use Config;
 use File::Glob;
 use Getopt::Long;
 
+my $jobs = -1;
 my $show_stats;
 my $emit_all;
@@ -569,6 +571,16 @@ if (eval {require Time::HiRes; Time::HiRes->import(); 1;}) {
 	$interval = sub { return Time::HiRes::tv_interval(shift); };
 }
 
+sub ncores {
+	# Windows
+	return $ENV{NUMBER_OF_PROCESSORS} if exists($ENV{NUMBER_OF_PROCESSORS});
+	# Linux / MSYS2 / Cygwin / WSL
+	do { local @ARGV='/proc/cpuinfo'; return scalar(grep(/^processor\s*:/, <>)); } if -r '/proc/cpuinfo';
+	# macOS & BSD
+	return qx/sysctl -n hw.ncpu/ if $^O =~ /(?:^darwin$|bsd)/;
+	return 1;
+}
+
 sub show_stats {
 	my ($start_time, $stats) = @_;
 	my $walltime = $interval->($start_time);
@@ -621,7 +633,9 @@ sub exit_code {
 Getopt::Long::Configure(qw{bundling});
 GetOptions(
 	"emit-all!" => \$emit_all,
+	"jobs|j=i" => \$jobs,
 	"stats|show-stats!" => \$show_stats) or die("option error\n");
+$jobs = ncores() if $jobs < 1;
 my $start_time = $getnow->();
 my @stats;
@@ -633,6 +647,40 @@
 unless (@scripts) {
 	exit;
 }
-push(@stats, check_script(1, sub { shift(@scripts); }, sub { print(@_); }));
+unless ($Config{useithreads} && eval {
+	require threads; threads->import();
+	require Thread::Queue; Thread::Queue->import();
+	1;
+	}) {
+	push(@stats, check_script(1, sub { shift(@scripts); }, sub { print(@_); }));
+	show_stats($start_time, \@stats) if $show_stats;
+	exit(exit_code(\@stats));
+}
+
+my $script_queue = Thread::Queue->new();
+my $output_queue = Thread::Queue->new();
+
+sub next_script { return $script_queue->dequeue(); }
+sub emit { $output_queue->enqueue(@_); }
+
+sub monitor {
+	while (my $s = $output_queue->dequeue()) {
+		print($s);
+	}
+}
+
+my $mon = threads->create({'context' => 'void'}, \&monitor);
+threads->create({'context' => 'list'}, \&check_script, $_, \&next_script, \&emit) for 1..$jobs;
+
+$script_queue->enqueue(@scripts);
+$script_queue->end();
+
+for (threads->list()) {
+	push(@stats, $_->join()) unless $_ == $mon;
+}
+
+$output_queue->end();
+$mon->join();
+
 show_stats($start_time, \@stats) if $show_stats;
 exit(exit_code(\@stats));
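[Editor's illustration: the fallback chain in the Perl `ncores` sub above maps directly onto shell. This is a hedged sketch of the same detection order — Windows environment variable, then /proc/cpuinfo, then sysctl, then a fallback of 1 — not code from the patch:]

```shell
# Same detection order as the Perl ncores() above, sketched in shell.
ncores () {
	if test -n "${NUMBER_OF_PROCESSORS:-}"
	then
		echo "$NUMBER_OF_PROCESSORS"          # Windows
	elif test -r /proc/cpuinfo
	then
		grep -c '^processor' /proc/cpuinfo    # Linux / MSYS2 / Cygwin / WSL
	elif command -v sysctl >/dev/null 2>&1
	then
		sysctl -n hw.ncpu                     # macOS & BSD
	else
		echo 1                                # conservative fallback
	fi
}
echo "jobs: $(ncores)"
```

One could feed this into the linter as `perl chainlint.pl --jobs=$(ncores) ...`, though the patch makes that the default when --jobs is not given.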