Message ID | 64b41d537e68a45f2bb0a0c3078f2cd314b5a57d.1600814153.git.matheus.bernardino@usp.br (mailing list archive) |
---|---|
State | Accepted |
Commit | 682569c6fd08b8f3f2edef85e0424a0997a70c68 |
Headers | show |
Series | Parallel Checkout (part I) | expand |
Hi, Matheus Tavares wrote: > Add tests to populate the working tree during clone and checkout using > the sequential and parallel modes, to confirm that they produce > identical results. Also test basic checkout mechanics, such as checking > for symlinks in the leading directories and the abidance to --force. Thanks for implementing parallel checkout! I'm excited about the feature. And thanks for including these tests. [...] > --- /dev/null > +++ b/t/lib-parallel-checkout.sh > @@ -0,0 +1,39 @@ [...] > +# Runs `git -c checkout.workers=$1 -c checkout.thesholdForParallelism=$2 ${@:4}` > +# and checks that the number of workers spawned is equal to $3. > +git_pc() nit: what does git_pc mean? Can this spell it out more verbosely, or could callers take on more of the burden? (Perhaps it would make sense to use a helper that uses test_config to set the relevant configuration, and then the caller can use plain "git clone"?) [...] > + GIT_TRACE2="$(pwd)/trace" git \ > + -c checkout.workers=$workers \ > + -c checkout.thresholdForParallelism=$threshold \ > + -c advice.detachedHead=0 \ > + $@ && $@ needs to be quoted, or else it will act like $* (and in particular it won't handle parameters with embedded spaces). > + > + # Check that the expected number of workers has been used. Note that it > + # can be different than the requested number in two cases: when the > + # quantity of entries to be checked out is less than the number of > + # workers; and when the threshold has not been reached. > + # > + local workers_in_trace=$(grep "child_start\[.\+\] git checkout--helper" trace | wc -l) && Do we use grep's \+ operator in other tests? I thought we preferred to use the more portable *, but it may be that I'm out of date. [...] > +# Verify that both the working tree and the index were created correctly > +verify_checkout() > +{ > + git -C $1 diff-index --quiet HEAD -- && > + git -C $1 diff-index --quiet --cached HEAD -- && > + git -C $1 status --porcelain >$1.status && > + test_must_be_empty $1.status > +} Like git_pc, this is not easy to take in at a glance. "$1" needs to be quoted if we are to handle paths with spaces. [...] > --- /dev/null > +++ b/t/t2080-parallel-checkout-basics.sh > @@ -0,0 +1,197 @@ > +#!/bin/sh > + > +test_description='parallel-checkout basics > + > +Ensure that parallel-checkout basically works on clone and checkout, spawning > +the required number of workers and correctly populating both the index and > +working tree. > +' > + > +TEST_NO_CREATE_REPO=1 > +. ./test-lib.sh > +. "$TEST_DIRECTORY/lib-parallel-checkout.sh" > + > +# NEEDSWORK: cloning a SHA1 repo with GIT_TEST_DEFAULT_HASH set to "sha256" > +# currently produces a wrong result (See > +# https://lore.kernel.org/git/20200911151717.43475-1-matheus.bernardino@usp.br/). > +# So we skip the "parallel-checkout during clone" tests when this test flag is > +# set to "sha256". Remove this when the bug is fixed. > +# > +if test "$GIT_TEST_DEFAULT_HASH" = "sha256" > +then > + skip_all="t2080 currently don't work with GIT_TEST_DEFAULT_HASH=sha256" > + test_done > +fi > + > +R_BASE=$GIT_BUILD_DIR > + > +test_expect_success 'sequential clone' ' > + git_pc 1 0 0 clone --quiet -- $R_BASE r_sequential && This fails when I run it when building from a tarball, which is presenting me from releasing this patch series to Debian experimental. Can we use an artificial repo instead of git.git? Using git.git as test data seems like a recipe for hard-to-reproduce test failures. Thanks and hope that helps, Jonathan
On Mon, Oct 19, 2020 at 06:35:58PM -0700, Jonathan Nieder wrote: > > + > > + # Check that the expected number of workers has been used. Note that it > > + # can be different than the requested number in two cases: when the > > + # quantity of entries to be checked out is less than the number of > > + # workers; and when the threshold has not been reached. > > + # > > + local workers_in_trace=$(grep "child_start\[.\+\] git checkout--helper" trace | wc -l) && > > Do we use grep's \+ operator in other tests? I thought we preferred to > use the more portable *, but it may be that I'm out of date. You're not out-of-date; I looked at this myself a couple of months ago: https://lore.kernel.org/git/20200812140352.GC74542@syl.lan/ Thanks, Taylor
Hi, Jonathan On Mon, Oct 19, 2020 at 10:36 PM Jonathan Nieder <jrnieder@gmail.com> wrote: > > Hi, > > Matheus Tavares wrote: > > > Add tests to populate the working tree during clone and checkout using > > the sequential and parallel modes, to confirm that they produce > > identical results. Also test basic checkout mechanics, such as checking > > for symlinks in the leading directories and the abidance to --force. > > Thanks for implementing parallel checkout! I'm excited about the > feature. And thanks for including these tests. Thanks for the comments and feedback :) > [...] > > --- /dev/null > > +++ b/t/lib-parallel-checkout.sh > > @@ -0,0 +1,39 @@ > [...] > > +# Runs `git -c checkout.workers=$1 -c checkout.thesholdForParallelism=$2 ${@:4}` > > +# and checks that the number of workers spawned is equal to $3. > > +git_pc() > > nit: what does git_pc mean? The idea was "git w/ parallel-checkout". But I realize it may have gotten too abbreviated... > Can this spell it out more verbosely, or > could callers take on more of the burden? (Perhaps it would make sense > to use a helper that uses test_config to set the relevant configuration, > and then the caller can use plain "git clone"?) Hmm, it's possible, but I think we might end up with quite a lot of repetition (to always check that checkout spawned the right number of workers). > [...] > > + GIT_TRACE2="$(pwd)/trace" git \ > > + -c checkout.workers=$workers \ > > + -c checkout.thresholdForParallelism=$threshold \ > > + -c advice.detachedHead=0 \ > > + $@ && > > $@ needs to be quoted, or else it will act like $* (and in particular it > won't handle parameters with embedded spaces). Nice catch, thanks! I will send a patch for this tomorrow. > > + > > + # Check that the expected number of workers has been used. Note that it > > + # can be different than the requested number in two cases: when the > > + # quantity of entries to be checked out is less than the number of > > + # workers; and when the threshold has not been reached. > > + # > > + local workers_in_trace=$(grep "child_start\[.\+\] git checkout--helper" trace | wc -l) && > > Do we use grep's \+ operator in other tests? I thought we preferred to > use the more portable *, but it may be that I'm out of date. Oh, I didn't know about the portability issue with \+. This is already in `next`, but I guess it's worth sending a follow-up patch to fix it, right? (I see we have a second \+ occurrence in t7508, which could be changed in the same patch.) > [...] > > +# Verify that both the working tree and the index were created correctly > > +verify_checkout() > > +{ > > + git -C $1 diff-index --quiet HEAD -- && > > + git -C $1 diff-index --quiet --cached HEAD -- && > > + git -C $1 status --porcelain >$1.status && > > + test_must_be_empty $1.status > > +} > > Like git_pc, this is not easy to take in at a glance. > > "$1" needs to be quoted if we are to handle paths with spaces. Thanks, again :) Currently, this function doesn't get paths with spaces, but I agree that it's better to be cautious here. > [...] > > --- /dev/null > > +++ b/t/t2080-parallel-checkout-basics.sh > > @@ -0,0 +1,197 @@ > > +#!/bin/sh > > + > > +test_description='parallel-checkout basics > > + > > +Ensure that parallel-checkout basically works on clone and checkout, spawning > > +the required number of workers and correctly populating both the index and > > +working tree. > > +' > > + > > +TEST_NO_CREATE_REPO=1 > > +. ./test-lib.sh > > +. "$TEST_DIRECTORY/lib-parallel-checkout.sh" > > + > > +# NEEDSWORK: cloning a SHA1 repo with GIT_TEST_DEFAULT_HASH set to "sha256" > > +# currently produces a wrong result (See > > +# https://lore.kernel.org/git/20200911151717.43475-1-matheus.bernardino@usp.br/). > > +# So we skip the "parallel-checkout during clone" tests when this test flag is > > +# set to "sha256". Remove this when the bug is fixed. > > +# > > +if test "$GIT_TEST_DEFAULT_HASH" = "sha256" > > +then > > + skip_all="t2080 currently don't work with GIT_TEST_DEFAULT_HASH=sha256" > > + test_done > > +fi > > + > > +R_BASE=$GIT_BUILD_DIR > > + > > +test_expect_success 'sequential clone' ' > > + git_pc 1 0 0 clone --quiet -- $R_BASE r_sequential && > > This fails when I run it when building from a tarball, which is > presenting me from releasing this patch series to Debian experimental. Sorry for the trouble :( It didn't occur to me, while writing the test, that it could also be run from the tarball. > Can we use an artificial repo instead of git.git? Using git.git as > test data seems like a recipe for hard-to-reproduce test failures. I think we could maybe drop these tests. There are already some similar tests below these, which use an artificial repository. The goal of using git.git in this section was to test parallel-checkout with a real-world repo, and hopefully catch errors that we might not see with small artificial ones. But you have a very valid concern, as well. Hmm, I'm not sure what is the best solution to this case. What do you think?
Matheus Tavares Bernardino wrote: > On Mon, Oct 19, 2020 at 10:36 PM Jonathan Nieder <jrnieder@gmail.com> wrote: >> Can we use an artificial repo instead of git.git? Using git.git as >> test data seems like a recipe for hard-to-reproduce test failures. > > I think we could maybe drop these tests. There are already some > similar tests below these, which use an artificial repository. The > goal of using git.git in this section was to test parallel-checkout > with a real-world repo, and hopefully catch errors that we might not > see with small artificial ones. But you have a very valid concern, as > well. Hmm, I'm not sure what is the best solution to this case. What > do you think? I see. I suppose my preference would be to have a real-world example in t/perf/ (see t/perf/README for how it allows an arbitrary repo to be passed in) instead of in the regression tests. In the regression testsuite I'd focus more on particular behaviors I want to test (e.g., a file being replaced by a directory, that kind of thing). Behaviors exercised by git.git are in some sense the *least* important thing to test here, since developers in the Git project know to advocate for those and exercise them day-to-day. Where the testsuite shines is in being able to advocate for use cases that are exercised by other populations --- a testsuite failure can be a reminder to not forget about the features other people need that are not part of our own daily lives. Thanks, Jonathan
On Mon, Oct 19, 2020 at 11:55 PM Taylor Blau <me@ttaylorr.com> wrote: > > On Mon, Oct 19, 2020 at 06:35:58PM -0700, Jonathan Nieder wrote: > > > + > > > + # Check that the expected number of workers has been used. Note that it > > > + # can be different than the requested number in two cases: when the > > > + # quantity of entries to be checked out is less than the number of > > > + # workers; and when the threshold has not been reached. > > > + # > > > + local workers_in_trace=$(grep "child_start\[.\+\] git checkout--helper" trace | wc -l) && > > > > Do we use grep's \+ operator in other tests? I thought we preferred to > > use the more portable *, but it may be that I'm out of date. > > You're not out-of-date; I looked at this myself a couple of months ago: > > https://lore.kernel.org/git/20200812140352.GC74542@syl.lan/ Thanks for the pointer, I'll replace .\+ by ..*, then. I noticed we also have some uses of + and ? in tests, with `grep -E` (or egrep). Are we OK with ERE or did these maybe just slip in by accident?
Matheus Tavares Bernardino <matheus.bernardino@usp.br> writes: > I noticed we also have some uses of + and ? in tests, with `grep -E` > (or egrep). Are we OK with ERE or did these maybe just slip in by > accident? We are OK with 'grep -E' and 'egrep' and write '+' and '?' as valid ERE elements. What we are not OK with is to invoke ERE elements in an expression that is supposed to be a BRE by prefixing a backslash, e.g. '\+'. Perhaps it is a GNU extension? We need to remove '\+' in t/perf/bisect_regression used with sed. What is sad is that this trick and "sed -E" are both GNUisms, and there is no portable way to use ERE with sed X-<. But we could resort to Perl in truly tricky cases ;-).
Matheus Tavares Bernardino <matheus.bernardino@usp.br> writes: > Oh, I didn't know about the portability issue with \+. This is already > in `next`, but I guess it's worth sending a follow-up patch to fix it, > right? (I see we have a second \+ occurrence in t7508, which could be > changed in the same patch.) Note that soon, typically a week, after a release, the tip of the next branch is rewound and all the topics that did not graduate to master has a chance to get a clean start. This may be a good use case for that chance.
diff --git a/t/lib-parallel-checkout.sh b/t/lib-parallel-checkout.sh new file mode 100644 index 0000000000..c95ca27711 --- /dev/null +++ b/t/lib-parallel-checkout.sh @@ -0,0 +1,39 @@ +# Helpers for t208* tests + +# Runs `git -c checkout.workers=$1 -c checkout.thesholdForParallelism=$2 ${@:4}` +# and checks that the number of workers spawned is equal to $3. +git_pc() +{ + if test $# -lt 4 + then + BUG "too few arguments to git_pc()" + fi + + workers=$1 threshold=$2 expected_workers=$3 && + shift && shift && shift && + + rm -f trace && + GIT_TRACE2="$(pwd)/trace" git \ + -c checkout.workers=$workers \ + -c checkout.thresholdForParallelism=$threshold \ + -c advice.detachedHead=0 \ + $@ && + + # Check that the expected number of workers has been used. Note that it + # can be different than the requested number in two cases: when the + # quantity of entries to be checked out is less than the number of + # workers; and when the threshold has not been reached. + # + local workers_in_trace=$(grep "child_start\[.\+\] git checkout--helper" trace | wc -l) && + test $workers_in_trace -eq $expected_workers && + rm -f trace +} + +# Verify that both the working tree and the index were created correctly +verify_checkout() +{ + git -C $1 diff-index --quiet HEAD -- && + git -C $1 diff-index --quiet --cached HEAD -- && + git -C $1 status --porcelain >$1.status && + test_must_be_empty $1.status +} diff --git a/t/t2080-parallel-checkout-basics.sh b/t/t2080-parallel-checkout-basics.sh new file mode 100755 index 0000000000..c088a06ecc --- /dev/null +++ b/t/t2080-parallel-checkout-basics.sh @@ -0,0 +1,197 @@ +#!/bin/sh + +test_description='parallel-checkout basics + +Ensure that parallel-checkout basically works on clone and checkout, spawning +the required number of workers and correctly populating both the index and +working tree. +' + +TEST_NO_CREATE_REPO=1 +. ./test-lib.sh +. "$TEST_DIRECTORY/lib-parallel-checkout.sh" + +# NEEDSWORK: cloning a SHA1 repo with GIT_TEST_DEFAULT_HASH set to "sha256" +# currently produces a wrong result (See +# https://lore.kernel.org/git/20200911151717.43475-1-matheus.bernardino@usp.br/). +# So we skip the "parallel-checkout during clone" tests when this test flag is +# set to "sha256". Remove this when the bug is fixed. +# +if test "$GIT_TEST_DEFAULT_HASH" = "sha256" +then + skip_all="t2080 currently don't work with GIT_TEST_DEFAULT_HASH=sha256" + test_done +fi + +R_BASE=$GIT_BUILD_DIR + +test_expect_success 'sequential clone' ' + git_pc 1 0 0 clone --quiet -- $R_BASE r_sequential && + verify_checkout r_sequential +' + +test_expect_success 'parallel clone' ' + git_pc 2 0 2 clone --quiet -- $R_BASE r_parallel && + verify_checkout r_parallel +' + +test_expect_success 'fallback to sequential clone (threshold)' ' + git -C $R_BASE ls-files >files && + nr_files=$(wc -l <files) && + threshold=$(($nr_files + 1)) && + + git_pc 2 $threshold 0 clone --quiet -- $R_BASE r_sequential_fallback && + verify_checkout r_sequential_fallback +' + +# Just to be paranoid, actually compare the contents of the worktrees directly. +test_expect_success 'compare working trees from clones' ' + rm -rf r_sequential/.git && + rm -rf r_parallel/.git && + rm -rf r_sequential_fallback/.git && + diff -qr r_sequential r_parallel && + diff -qr r_sequential r_sequential_fallback +' + +# Test parallel-checkout with different operations (creation, deletion, +# modification) and entry types. A branch switch from B1 to B2 will contain: +# +# - a (file): modified +# - e/x (file): deleted +# - b (symlink): deleted +# - b/f (file): created +# - e (symlink): created +# - d (submodule): created +# +test_expect_success SYMLINKS 'setup repo for checkout with various operations' ' + git init various && + ( + cd various && + git checkout -b B1 && + echo a>a && + mkdir e && + echo e/x >e/x && + ln -s e b && + git add -A && + git commit -m B1 && + + git checkout -b B2 && + echo modified >a && + rm -rf e && + rm b && + mkdir b && + echo b/f >b/f && + ln -s b e && + git init d && + test_commit -C d f && + git submodule add ./d && + git add -A && + git commit -m B2 && + + git checkout --recurse-submodules B1 + ) +' + +test_expect_success SYMLINKS 'sequential checkout' ' + cp -R various various_sequential && + git_pc 1 0 0 -C various_sequential checkout --recurse-submodules B2 && + verify_checkout various_sequential +' + +test_expect_success SYMLINKS 'parallel checkout' ' + cp -R various various_parallel && + git_pc 2 0 2 -C various_parallel checkout --recurse-submodules B2 && + verify_checkout various_parallel +' + +test_expect_success SYMLINKS 'fallback to sequential checkout (threshold)' ' + cp -R various various_sequential_fallback && + git_pc 2 100 0 -C various_sequential_fallback checkout --recurse-submodules B2 && + verify_checkout various_sequential_fallback +' + +test_expect_success SYMLINKS 'compare working trees from checkouts' ' + rm -rf various_sequential/.git && + rm -rf various_parallel/.git && + rm -rf various_sequential_fallback/.git && + diff -qr various_sequential various_parallel && + diff -qr various_sequential various_sequential_fallback +' + +test_cmp_str() +{ + echo "$1" >tmp && + test_cmp tmp "$2" +} + +test_expect_success 'parallel checkout respects --[no]-force' ' + git init dirty && + ( + cd dirty && + mkdir D && + test_commit D/F && + test_commit F && + + echo changed >F.t && + rm -rf D && + echo changed >D && + + # We expect 0 workers because there is nothing to be updated + git_pc 2 0 0 checkout HEAD && + test_path_is_file D && + test_cmp_str changed D && + test_cmp_str changed F.t && + + git_pc 2 0 2 checkout --force HEAD && + test_path_is_dir D && + test_cmp_str D/F D/F.t && + test_cmp_str F F.t + ) +' + +test_expect_success SYMLINKS 'parallel checkout checks for symlinks in leading dirs' ' + git init symlinks && + ( + cd symlinks && + mkdir D E && + + # Create two entries in D to have enough work for 2 parallel + # workers + test_commit D/A && + test_commit D/B && + test_commit E/C && + rm -rf D && + ln -s E D && + + git_pc 2 0 2 checkout --force HEAD && + ! test -L D && + test_cmp_str D/A D/A.t && + test_cmp_str D/B D/B.t + ) +' + +test_expect_success SYMLINKS,CASE_INSENSITIVE_FS 'symlink colliding with leading dir' ' + git init colliding-symlink && + ( + cd colliding-symlink && + file_hex=$(git hash-object -w --stdin </dev/null) && + file_oct=$(echo $file_hex | hex2oct) && + + sym_hex=$(echo "./D" | git hash-object -w --stdin) && + sym_oct=$(echo $sym_hex | hex2oct) && + + printf "100644 D/A\0${file_oct}" >tree && + printf "100644 E/B\0${file_oct}" >>tree && + printf "120000 e\0${sym_oct}" >>tree && + + tree_hex=$(git hash-object -w -t tree --stdin <tree) && + commit_hex=$(git commit-tree -m collisions $tree_hex) && + git update-ref refs/heads/colliding-symlink $commit_hex && + + git_pc 2 0 2 checkout colliding-symlink && + test_path_is_dir D && + test_path_is_missing D/B + ) +' + +test_done