[4/7] parallel-checkout: add tests for basic operations

Message ID	6379b8df6a59361dd44733e379880a11c6cd977c.1619104091.git.matheus.bernardino@usp.br (mailing list archive)
State	Superseded
Headers	show Return-Path: <git-owner@kernel.org> From: Matheus Tavares <matheus.bernardino@usp.br> To: git@vger.kernel.org Cc: christian.couder@gmail.com, git@jeffhostetler.com, Jonathan Nieder <jrnieder@gmail.com> Subject: [PATCH 4/7] parallel-checkout: add tests for basic operations Date: Thu, 22 Apr 2021 12:17:50 -0300 Message-Id: <6379b8df6a59361dd44733e379880a11c6cd977c.1619104091.git.matheus.bernardino@usp.br> In-Reply-To: <cover.1619104091.git.matheus.bernardino@usp.br> References: <cover.1619104091.git.matheus.bernardino@usp.br> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	Parallel Checkout (part 3) \| expand [0/7] Parallel Checkout (part 3) [1/7] make_transient_cache_entry(): optionally alloc from mem_pool [2/7] builtin/checkout.c: complete parallel checkout support [3/7] checkout-index: add parallel checkout support [4/7] parallel-checkout: add tests for basic operations [5/7] parallel-checkout: add tests related to path collisions [6/7] parallel-checkout: add tests related to .gitattributes [7/7] ci: run test round with parallel-checkout enabled

Message ID

6379b8df6a59361dd44733e379880a11c6cd977c.1619104091.git.matheus.bernardino@usp.br (mailing list archive)

State

Superseded

Headers

From: Matheus Tavares <matheus.bernardino@usp.br>
To: git@vger.kernel.org
Cc: christian.couder@gmail.com, git@jeffhostetler.com,
        Jonathan Nieder <jrnieder@gmail.com>
Subject: [PATCH 4/7] parallel-checkout: add tests for basic operations
Date: Thu, 22 Apr 2021 12:17:50 -0300
Message-Id: 
 <6379b8df6a59361dd44733e379880a11c6cd977c.1619104091.git.matheus.bernardino@usp.br>
In-Reply-To: <cover.1619104091.git.matheus.bernardino@usp.br>
References: <cover.1619104091.git.matheus.bernardino@usp.br>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: bulk

Series

Parallel Checkout (part 3) | expand

Commit Message

Matheus Tavares April 22, 2021, 3:17 p.m. UTC

Add tests to populate the working tree during clone and checkout using
sequential and parallel mode, to confirm that they produce identical
results. Also test basic checkout mechanics, such as checking for
symlinks in the leading directories and the abidance to --force.

Note: some helper functions are added to a common lib file which is only
included by t2080 for now. But they will also be used by other
parallel-checkout tests in the following patches.

Original-patch-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 t/lib-parallel-checkout.sh          |  37 +++++++
 t/t2080-parallel-checkout-basics.sh | 150 ++++++++++++++++++++++++++++
 2 files changed, 187 insertions(+)
 create mode 100644 t/lib-parallel-checkout.sh
 create mode 100755 t/t2080-parallel-checkout-basics.sh

Comments

Derrick Stolee April 23, 2021, 7:18 p.m. UTC | #1

On 4/22/2021 11:17 AM, Matheus Tavares wrote:
> Add tests to populate the working tree during clone and checkout using
> sequential and parallel mode, to confirm that they produce identical
> results. Also test basic checkout mechanics, such as checking for
> symlinks in the leading directories and the abidance to --force.
> 
> Note: some helper functions are added to a common lib file which is only
> included by t2080 for now. But they will also be used by other
> parallel-checkout tests in the following patches.
> 
> Original-patch-by: Jeff Hostetler <jeffhost@microsoft.com>

Is this a standard thing? Or, did you change the patch significantly
enough that "Co-authored-by:" is no longer appropriate?

> +++ b/t/lib-parallel-checkout.sh
> @@ -0,0 +1,37 @@
> +# Helpers for t208* tests

I could see tests outside of the t208* range possibly having
value in interacting with parallel checkout in the future.

Perhaps:

	# Helpers for tests invoking parallel-checkout

> +
> +set_checkout_config () {
> +	if test $# -ne 2
> +	then
> +		BUG "set_checkout_config() requires two arguments"
> +	fi &&

A usage comment is typically helpful for these helpers:

# set_checkout_config <workers> <threshold>
# sets global config values to use the given number of
# workers when the given threshold is met.

> +
> +	test_config_global checkout.workers $1 &&
> +	test_config_global checkout.thresholdForParallelism $2
> +}
> +
> +# Run "${@:2}" and check that $1 checkout workers were used
> +test_checkout_workers () {

This simpler doc style works, too.

> +	if test $# -lt 2
> +	then
> +		BUG "too few arguments to test_checkout_workers()"
> +	fi &&
> +
> +	expected_workers=$1 &&
> +	shift &&
> +
> +	rm -f trace &&
> +	GIT_TRACE2="$(pwd)/trace" "$@" &&
> +
> +	workers=$(grep "child_start\[..*\] git checkout--worker" trace | wc -l) &&
> +	test $workers -eq $expected_workers &&
> +	rm -f trace

I wonder if this should be a "test_when_finished rm -f trace" being
recorded earlier in the step. It would also benefit from using a more
specific name than "trace". Something like "trace-test-checkout-workers"
would be unlikely to collide with someone else's trace.

> +# Verify that both the working tree and the index were created correctly
> +verify_checkout () {

Add usage of a repo in $1?

> +	git -C "$1" diff-index --quiet HEAD -- &&
> +	git -C "$1" diff-index --quiet --cached HEAD -- &&
> +	git -C "$1" status --porcelain >"$1".status &&
> +	test_must_be_empty "$1".status
> +}


> +TEST_NO_CREATE_REPO=1
> +. ./test-lib.sh
> +. "$TEST_DIRECTORY/lib-parallel-checkout.sh"
> +
> +# Test parallel-checkout with a branch switch containing file creations,
> +# deletions, and modification; with different entry types. Switching from B1 to
> +# B2 will have the following changes:
> +#
> +# - a (file):      modified
> +# - e/x (file):    deleted
> +# - b (symlink):   deleted
> +# - b/f (file):    created
> +# - e (symlink):   created
> +# - d (submodule): created
> +#

An interesting set of changes. What about a directory/file conflict?

Something like this might be useful:

 # - f/t (file):   deleted
 # - f (file):     created

in fact, it could be interesting to have a file conflict with each
of these types, such as the symlink 'e' and the submodule 'd'.

While we are at it, what about a symlink/submodule conflict? I know
it makes the test bigger, but doing everything simultaneously through
a carefully designed repository helps prevent test case bloat.

> +test_expect_success SYMLINKS 'setup repo for checkout with various types of changes' '

Could we split this setup step into two parts? Once could
set up everything except the symlinks and would not require
the SYMLINKS prereq. We could then have another test with
the SYMLINKS prereq that extends B1 and B2 to have symlinks
and their conflicts. The remaining tests would work on any
platform without needing the SYMLINKS prereq.

> +test_expect_success SYMLINKS 'sequential checkout' '
> +	cp -R various various_sequential &&
> +	set_checkout_config 1 0 &&
> +	test_checkout_workers 0 \
> +		git -C various_sequential checkout --recurse-submodules B2 &&
> +	verify_checkout various_sequential
> +'

I see all these tests are very similar. Perhaps group
them to demonstrate their differences?

parallel_test_case () {
	test_expect_success "$1" "
		cp -R various $2 &&
		set_checkout_config $3 $4 &&
		test_checkout_workers $5 \
			git -C $2 checkout --recurse-submodules B2 &&
		verify_checkout $2
	"
}

parallel_test_case 'sequential checkout' \
	various_sequential 1 0 0
parallel_test_case 'parallel checkout' \
	various_parallel 2 0 2
parallel_test_case 'fallback to sequential checkout (threshold)' \
	various_sequential_fallback 2 100 0

> +test_expect_success SYMLINKS 'parallel checkout on clone' '
> +	git -C various checkout --recurse-submodules B2 &&
> +	set_checkout_config 2 0 &&
> +	test_checkout_workers 2 \
> +		git clone --recurse-submodules various various_parallel_clone &&
> +	verify_checkout various_parallel_clone
> +'
> +
> +test_expect_success SYMLINKS 'fallback to sequential checkout on clone (threshold)' '
> +	git -C various checkout --recurse-submodules B2 &&
> +	set_checkout_config 2 100 &&
> +	test_checkout_workers 0 \
> +		git clone --recurse-submodules various various_sequential_fallback_clone &&
> +	verify_checkout various_sequential_fallback_clone
> +'

Doing a similar grouping for the clone case might be interesting,
if only for the possible future where more customization might be
necessary.

Since the clone case is only caring about the contents at B2, it
is good that B2 contains one of each type of entry.

> +# Just to be paranoid, actually compare the working trees' contents directly.
> +test_expect_success SYMLINKS 'compare the working trees' '
> +	rm -rf various_*/.git &&
> +	rm -rf various_*/d/.git &&
> +
> +	diff -r various_sequential various_parallel &&
> +	diff -r various_sequential various_sequential_fallback &&
> +	diff -r various_sequential various_parallel_clone &&
> +	diff -r various_sequential various_sequential_fallback_clone
> +'
> +
> +test_expect_success 'parallel checkout respects --[no]-force' '
> +	set_checkout_config 2 0 &&
> +	git init dirty &&
> +	(
> +		cd dirty &&
> +		mkdir D &&
> +		test_commit D/F &&
> +		test_commit F &&
> +
> +		rm -rf D &&
> +		echo changed >D &&
> +		echo changed >F.t &&
> +
> +		# We expect 0 workers because there is nothing to be done
> +		test_checkout_workers 0 git checkout HEAD &&
> +		test_path_is_file D &&
> +		grep changed D &&
> +		grep changed F.t &&
> +
> +		test_checkout_workers 2 git checkout --force HEAD &&
> +		test_path_is_dir D &&
> +		grep D/F D/F.t &&
> +		grep F F.t
> +	)
> +'

I see SYMLINKS is not necessary here due to creating a new repo.
Still better to not have the prereq when not necessary.

> +test_expect_success SYMLINKS 'parallel checkout checks for symlinks in leading dirs' '
> +	set_checkout_config 2 0 &&
> +	git init symlinks &&
> +	(
> +		cd symlinks &&
> +		mkdir D untracked &&
> +		# Commit 2 files to have enough work for 2 parallel workers
> +		test_commit D/A &&
> +		test_commit D/B &&
> +		rm -rf D &&
> +		ln -s untracked D &&
> +
> +		test_checkout_workers 2 git checkout --force HEAD &&
> +		! test -h D &&
> +		grep D/A D/A.t &&
> +		grep D/B D/B.t
> +	)
> +'

This test can continue to require SYMLINKS.

Thanks,
-Stolee

Matheus Tavares April 27, 2021, 2:30 a.m. UTC | #2

On Fri, Apr 23, 2021 at 4:18 PM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 4/22/2021 11:17 AM, Matheus Tavares wrote:
> > Add tests to populate the working tree during clone and checkout using
> > sequential and parallel mode, to confirm that they produce identical
> > results. Also test basic checkout mechanics, such as checking for
> > symlinks in the leading directories and the abidance to --force.
> >
> > Note: some helper functions are added to a common lib file which is only
> > included by t2080 for now. But they will also be used by other
> > parallel-checkout tests in the following patches.
> >
> > Original-patch-by: Jeff Hostetler <jeffhost@microsoft.com>
>
> Is this a standard thing? Or, did you change the patch significantly
> enough that "Co-authored-by:" is no longer appropriate?

No, I think "Co-authored-by" is appropriate, let's change this trailer :)

> > +++ b/t/lib-parallel-checkout.sh
> > @@ -0,0 +1,37 @@
> > +# Helpers for t208* tests
>
> I could see tests outside of the t208* range possibly having
> value in interacting with parallel checkout in the future.
>
> Perhaps:
>
>         # Helpers for tests invoking parallel-checkout

Will do!

> > +
> > +set_checkout_config () {
> > +     if test $# -ne 2
> > +     then
> > +             BUG "set_checkout_config() requires two arguments"
> > +     fi &&
>
> A usage comment is typically helpful for these helpers:
>
> # set_checkout_config <workers> <threshold>
> # sets global config values to use the given number of
> # workers when the given threshold is met.

OK, I'll change that.

> > +
> > +     test_config_global checkout.workers $1 &&
> > +     test_config_global checkout.thresholdForParallelism $2
> > +}
> > +
> > +# Run "${@:2}" and check that $1 checkout workers were used
> > +test_checkout_workers () {
>
> This simpler doc style works, too.
>
> > +     if test $# -lt 2
> > +     then
> > +             BUG "too few arguments to test_checkout_workers()"
> > +     fi &&
> > +
> > +     expected_workers=$1 &&
> > +     shift &&
> > +
> > +     rm -f trace &&
> > +     GIT_TRACE2="$(pwd)/trace" "$@" &&
> > +
> > +     workers=$(grep "child_start\[..*\] git checkout--worker" trace | wc -l) &&
> > +     test $workers -eq $expected_workers &&
> > +     rm -f trace
>
> I wonder if this should be a "test_when_finished rm -f trace" being
> recorded earlier in the step.It would also benefit from using a more
> specific name than "trace". Something like "trace-test-checkout-workers"
> would be unlikely to collide with someone else's trace.

Good idea, I'll change the trace file name.

Unfortunately, I think we won't be able to use `test_when_finished`
here as this function is sometimes called inside subshells, and
`test_when_finished` doesn't work in this situation :(

> > +# Verify that both the working tree and the index were created correctly
> > +verify_checkout () {
>
> Add usage of a repo in $1?

Sure!

> > +     git -C "$1" diff-index --quiet HEAD -- &&
> > +     git -C "$1" diff-index --quiet --cached HEAD -- &&
> > +     git -C "$1" status --porcelain >"$1".status &&
> > +     test_must_be_empty "$1".status
> > +}
>
>
> > +TEST_NO_CREATE_REPO=1
> > +. ./test-lib.sh
> > +. "$TEST_DIRECTORY/lib-parallel-checkout.sh"
> > +
> > +# Test parallel-checkout with a branch switch containing file creations,
> > +# deletions, and modification; with different entry types. Switching from B1 to
> > +# B2 will have the following changes:
> > +#
> > +# - a (file):      modified
> > +# - e/x (file):    deleted
> > +# - b (symlink):   deleted
> > +# - b/f (file):    created
> > +# - e (symlink):   created
> > +# - d (submodule): created
> > +#
>
> An interesting set of changes. What about a directory/file conflict?
>
> Something like this might be useful:
>
>  # - f/t (file):   deleted
>  # - f (file):     created
>
> in fact, it could be interesting to have a file conflict with each
> of these types, such as the symlink 'e' and the submodule 'd'.

Sure, I'll add those. (But we already have the symlink case with 'e/x'
(file) and 'e' (symlink), no?)

> While we are at it, what about a symlink/submodule conflict? I know
> it makes the test bigger, but doing everything simultaneously through
> a carefully designed repository helps prevent test case bloat.
>
> > +test_expect_success SYMLINKS 'setup repo for checkout with various types of changes' '
>
> Could we split this setup step into two parts? Once could
> set up everything except the symlinks and would not require
> the SYMLINKS prereq. We could then have another test with
> the SYMLINKS prereq that extends B1 and B2 to have symlinks
> and their conflicts. The remaining tests would work on any
> platform without needing the SYMLINKS prereq.

Good point. I think we might be able to create the symlinks with
`test_ln_s_add` and completely get rid of the SYMLINKS prereq :)

> > +test_expect_success SYMLINKS 'sequential checkout' '
> > +     cp -R various various_sequential &&
> > +     set_checkout_config 1 0 &&
> > +     test_checkout_workers 0 \
> > +             git -C various_sequential checkout --recurse-submodules B2 &&
> > +     verify_checkout various_sequential
> > +'
>
> I see all these tests are very similar. Perhaps group
> them to demonstrate their differences?

Makes sense, I'll do that.

> parallel_test_case () {
>         test_expect_success "$1" "
>                 cp -R various $2 &&
>                 set_checkout_config $3 $4 &&
>                 test_checkout_workers $5 \
>                         git -C $2 checkout --recurse-submodules B2 &&
>                 verify_checkout $2
>         "
> }
>
> parallel_test_case 'sequential checkout' \
>         various_sequential 1 0 0
> parallel_test_case 'parallel checkout' \
>         various_parallel 2 0 2
> parallel_test_case 'fallback to sequential checkout (threshold)' \
>         various_sequential_fallback 2 100 0
>
> > +test_expect_success SYMLINKS 'parallel checkout on clone' '
> > +     git -C various checkout --recurse-submodules B2 &&
> > +     set_checkout_config 2 0 &&
> > +     test_checkout_workers 2 \
> > +             git clone --recurse-submodules various various_parallel_clone &&
> > +     verify_checkout various_parallel_clone
> > +'

diff --git a/t/lib-parallel-checkout.sh b/t/lib-parallel-checkout.sh
new file mode 100644
index 0000000000..39fd36fdf6
--- /dev/null
+++ b/t/lib-parallel-checkout.sh
@@ -0,0 +1,37 @@ 
+# Helpers for t208* tests
+
+set_checkout_config () {
+	if test $# -ne 2
+	then
+		BUG "set_checkout_config() requires two arguments"
+	fi &&
+
+	test_config_global checkout.workers $1 &&
+	test_config_global checkout.thresholdForParallelism $2
+}
+
+# Run "${@:2}" and check that $1 checkout workers were used
+test_checkout_workers () {
+	if test $# -lt 2
+	then
+		BUG "too few arguments to test_checkout_workers()"
+	fi &&
+
+	expected_workers=$1 &&
+	shift &&
+
+	rm -f trace &&
+	GIT_TRACE2="$(pwd)/trace" "$@" &&
+
+	workers=$(grep "child_start\[..*\] git checkout--worker" trace | wc -l) &&
+	test $workers -eq $expected_workers &&
+	rm -f trace
+}
+
+# Verify that both the working tree and the index were created correctly
+verify_checkout () {
+	git -C "$1" diff-index --quiet HEAD -- &&
+	git -C "$1" diff-index --quiet --cached HEAD -- &&
+	git -C "$1" status --porcelain >"$1".status &&
+	test_must_be_empty "$1".status
+}
diff --git a/t/t2080-parallel-checkout-basics.sh b/t/t2080-parallel-checkout-basics.sh
new file mode 100755
index 0000000000..0cb1493cdc
--- /dev/null
+++ b/t/t2080-parallel-checkout-basics.sh
@@ -0,0 +1,150 @@ 
+#!/bin/sh
+
+test_description='parallel-checkout basics
+
+Ensure that parallel-checkout basically works on clone and checkout, spawning
+the required number of workers and correctly populating both the index and the
+working tree.
+'
+
+TEST_NO_CREATE_REPO=1
+. ./test-lib.sh
+. "$TEST_DIRECTORY/lib-parallel-checkout.sh"
+
+# Test parallel-checkout with a branch switch containing file creations,
+# deletions, and modification; with different entry types. Switching from B1 to
+# B2 will have the following changes:
+#
+# - a (file):      modified
+# - e/x (file):    deleted
+# - b (symlink):   deleted
+# - b/f (file):    created
+# - e (symlink):   created
+# - d (submodule): created
+#
+test_expect_success SYMLINKS 'setup repo for checkout with various types of changes' '
+	git init various &&
+	(
+		cd various &&
+		git checkout -b B1 &&
+		echo a >a &&
+		mkdir e &&
+		echo e/x >e/x &&
+		ln -s e b &&
+		git add -A &&
+		git commit -m B1 &&
+
+		git checkout -b B2 &&
+		echo modified >a &&
+		rm -rf e &&
+		rm b &&
+		mkdir b &&
+		echo b/f >b/f &&
+		ln -s b e &&
+		git init d &&
+		test_commit -C d f &&
+		git submodule add ./d &&
+		git add -A &&
+		git commit -m B2 &&
+
+		git checkout --recurse-submodules B1
+	)
+'
+
+test_expect_success SYMLINKS 'sequential checkout' '
+	cp -R various various_sequential &&
+	set_checkout_config 1 0 &&
+	test_checkout_workers 0 \
+		git -C various_sequential checkout --recurse-submodules B2 &&
+	verify_checkout various_sequential
+'
+
+test_expect_success SYMLINKS 'parallel checkout' '
+	cp -R various various_parallel &&
+	set_checkout_config 2 0 &&
+	test_checkout_workers 2 \
+		git -C various_parallel checkout --recurse-submodules B2 &&
+	verify_checkout various_parallel
+'
+
+test_expect_success SYMLINKS 'fallback to sequential checkout (threshold)' '
+	cp -R various various_sequential_fallback &&
+	set_checkout_config 2 100 &&
+	test_checkout_workers 0 \
+		git -C various_sequential_fallback checkout --recurse-submodules B2 &&
+	verify_checkout various_sequential_fallback
+'
+
+test_expect_success SYMLINKS 'parallel checkout on clone' '
+	git -C various checkout --recurse-submodules B2 &&
+	set_checkout_config 2 0 &&
+	test_checkout_workers 2 \
+		git clone --recurse-submodules various various_parallel_clone &&
+	verify_checkout various_parallel_clone
+'
+
+test_expect_success SYMLINKS 'fallback to sequential checkout on clone (threshold)' '
+	git -C various checkout --recurse-submodules B2 &&
+	set_checkout_config 2 100 &&
+	test_checkout_workers 0 \
+		git clone --recurse-submodules various various_sequential_fallback_clone &&
+	verify_checkout various_sequential_fallback_clone
+'
+
+# Just to be paranoid, actually compare the working trees' contents directly.
+test_expect_success SYMLINKS 'compare the working trees' '
+	rm -rf various_*/.git &&
+	rm -rf various_*/d/.git &&
+
+	diff -r various_sequential various_parallel &&
+	diff -r various_sequential various_sequential_fallback &&
+	diff -r various_sequential various_parallel_clone &&
+	diff -r various_sequential various_sequential_fallback_clone
+'
+
+test_expect_success 'parallel checkout respects --[no]-force' '
+	set_checkout_config 2 0 &&
+	git init dirty &&
+	(
+		cd dirty &&
+		mkdir D &&
+		test_commit D/F &&
+		test_commit F &&
+
+		rm -rf D &&
+		echo changed >D &&
+		echo changed >F.t &&
+
+		# We expect 0 workers because there is nothing to be done
+		test_checkout_workers 0 git checkout HEAD &&
+		test_path_is_file D &&
+		grep changed D &&
+		grep changed F.t &&
+
+		test_checkout_workers 2 git checkout --force HEAD &&
+		test_path_is_dir D &&
+		grep D/F D/F.t &&
+		grep F F.t
+	)
+'
+
+test_expect_success SYMLINKS 'parallel checkout checks for symlinks in leading dirs' '
+	set_checkout_config 2 0 &&
+	git init symlinks &&
+	(
+		cd symlinks &&
+		mkdir D untracked &&
+		# Commit 2 files to have enough work for 2 parallel workers
+		test_commit D/A &&
+		test_commit D/B &&
+		rm -rf D &&
+		ln -s untracked D &&
+
+		test_checkout_workers 2 git checkout --force HEAD &&
+		! test -h D &&
+		grep D/A D/A.t &&
+		grep D/B D/B.t
+	)
+'
+
+test_done

[4/7] parallel-checkout: add tests for basic operations

Commit Message

Comments

Patch