[02/20] t/perf: add performance test for sparse operations

Message ID a8c6322a3dbe1130dd2026b600a896e86d54a95d.1614111270.git.gitgitgadget@gmail.com
State Superseded
Series Sparse Index: Design, Format, Tests

Commit Message

Derrick Stolee Feb. 23, 2021, 8:14 p.m. UTC
From: Derrick Stolee <dstolee@microsoft.com>

Create a test script that takes the default performance test (the Git
codebase) and multiplies it by 256 using four layers of duplicated
trees of width four. This results in nearly one million blob entries in
the index. Then, we can clone this repository with sparse-checkout
patterns that demonstrate four copies of the initial repository. Each
clone will use a different index format or mode so performance can be
tested across the different options.
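
As a rough check of the arithmetic above, here is a minimal sketch. The
base entry count of 3900 is a made-up placeholder, not a measured value
for the Git codebase. Each layer places four copies of the previous tree
under f1..f4 and adds one blob "a", so the entry count follows
T(k) = 4 * T(k-1) + 1.

```shell
#!/bin/sh
# Hypothetical number of blobs in the base tree (assumption, not measured).
base_entries=3900
width=4
layers=4

copies=1
entries=$base_entries
i=0
while [ "$i" -lt "$layers" ]
do
	# Each layer holds $width copies of the previous tree ...
	copies=$((copies * width))
	# ... plus the single blob "a" added next to f1..f4.
	entries=$((entries * width + 1))
	i=$((i + 1))
done

echo "copies of the base tree: $copies"
echo "blob entries in the index: $entries"
```

With a base of a few thousand files, the result lands just under one
million entries, which matches the description.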

Note that the initial repo is stripped of submodules before doing the
copies. This preserves the expected data shape of the sparse index,
because directories containing submodules are not collapsed to a sparse
directory entry.

Run a few Git commands on these clones, especially those that use the
index (status, add, commit).

Here are the results on my Linux machine:

Test
--------------------------------------------------------------
2000.2: git status (full-index-v3)             0.37(0.30+0.09)
2000.3: git status (full-index-v4)             0.39(0.32+0.10)
2000.4: git add -A (full-index-v3)             1.42(1.06+0.20)
2000.5: git add -A (full-index-v4)             1.26(0.98+0.16)
2000.6: git add . (full-index-v3)              1.40(1.04+0.18)
2000.7: git add . (full-index-v4)              1.26(0.98+0.17)
2000.8: git commit -a -m A (full-index-v3)     1.42(1.11+0.16)
2000.9: git commit -a -m A (full-index-v4)     1.33(1.08+0.16)

It is perhaps noteworthy that there is an improvement when using index
version 4. This is because the v3 index uses 108 MiB while the v4
index uses 80 MiB. Since the repeated portions of the directories are
very short (f3/f1/f2, for example) this ratio is less pronounced than in
similarly-sized real repositories.
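
To illustrate why short repeated directory names limit the v4 savings,
here is a rough sketch. Index v4 stores each path relative to the
previous entry's path, so the saving per entry grows with the length of
the shared prefix. The helper name prefix_savings is hypothetical, and
the count ignores the real on-disk encoding (length varint, NUL
terminator, no padding); it only counts shared leading bytes.

```shell
#!/bin/sh
# Reads sorted paths on stdin and prints "<shared prefix bytes> <total path bytes>".
prefix_savings () {
	awk '
	{
		n = 0
		max = (length(prev) < length($0)) ? length(prev) : length($0)
		# Count leading bytes shared with the previous path.
		while (n < max && substr(prev, n + 1, 1) == substr($0, n + 1, 1))
			n++
		saved += n
		total += length($0)
		prev = $0
	}
	END { printf "%d %d\n", saved, total }
	'
}

result=$(printf '%s\n' f3/f1/f2/a f3/f1/f2/b f3/f1/f3/a | prefix_savings)
echo "$result"
```

Paths in real repositories tend to share much longer directory prefixes
than f3/f1/f2, so the same count would cover a larger share of each path.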

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/perf/p2000-sparse-operations.sh | 87 +++++++++++++++++++++++++++++++
 1 file changed, 87 insertions(+)
 create mode 100755 t/perf/p2000-sparse-operations.sh

Comments

Elijah Newren Feb. 24, 2021, 2:30 a.m. UTC | #1
On Tue, Feb 23, 2021 at 12:14 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> Create a test script that takes the default performance test (the Git
> codebase) and multiplies it by 256 using four layers of duplicated
> trees of width four. This results in nearly one million blob entries in
> the index. Then, we can clone this repository with sparse-checkout
> patterns that demonstrate four copies of the initial repository. Each
> clone will use a different index format or mode so performance can be
> tested across the different options.
>
> Note that the initial repo is stripped of submodules before doing the
> copies. This preserves the expected data shape of the sparse index,
> because directories containing submodules are not collapsed to a sparse
> directory entry.
>
> Run a few Git commands on these clones, especially those that use the
> index (status, add, commit).
>
> Here are the results on my Linux machine:
>
> Test
> --------------------------------------------------------------
> 2000.2: git status (full-index-v3)             0.37(0.30+0.09)
> 2000.3: git status (full-index-v4)             0.39(0.32+0.10)
> 2000.4: git add -A (full-index-v3)             1.42(1.06+0.20)
> 2000.5: git add -A (full-index-v4)             1.26(0.98+0.16)
> 2000.6: git add . (full-index-v3)              1.40(1.04+0.18)
> 2000.7: git add . (full-index-v4)              1.26(0.98+0.17)
> 2000.8: git commit -a -m A (full-index-v3)     1.42(1.11+0.16)
> 2000.9: git commit -a -m A (full-index-v4)     1.33(1.08+0.16)
>
> It is perhaps noteworthy that there is an improvement when using index
> version 4. This is because the v3 index uses 108 MiB while the v4
> index uses 80 MiB. Since the repeated portions of the directories are
> very short (f3/f1/f2, for example) this ratio is less pronounced than in
> similarly-sized real repositories.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  t/perf/p2000-sparse-operations.sh | 87 +++++++++++++++++++++++++++++++
>  1 file changed, 87 insertions(+)
>  create mode 100755 t/perf/p2000-sparse-operations.sh
>
> diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
> new file mode 100755
> index 000000000000..52597683376e
> --- /dev/null
> +++ b/t/perf/p2000-sparse-operations.sh
> @@ -0,0 +1,87 @@
> +#!/bin/sh
> +
> +test_description="test performance of Git operations using the index"
> +
> +. ./perf-lib.sh
> +
> +test_perf_default_repo
> +
> +SPARSE_CONE=f2/f4/f1
> +
> +test_expect_success 'setup repo and indexes' '
> +       git reset --hard HEAD &&
> +       # Remove submodules from the example repo, because our
> +       # duplication of the entire repo creates an unlikely data shape.
> +       git config --file .gitmodules --get-regexp "submodule.*.path" >modules &&
> +       rm -f .gitmodules &&
> +       git add .gitmodules &&

Why not `git rm [-f] .gitmodules` instead of these two commands?  Is
there something special about .gitmodules that requires this special
handling?

> +       for module in $(awk "{print \$2}" modules)
> +       do
> +               git rm $module || return 1
> +       done &&
> +       git add . &&

What does the `git add .` do?  I don't see any changes there weren't
already git-add'ed or git-rm'ed.

> +       git commit -m "remove submodules" &&
> +
> +       echo bogus >a &&
> +       cp a b &&
> +       git add a b &&
> +       git commit -m "level 0" &&
> +       BLOB=$(git rev-parse HEAD:a) &&
> +       OLD_COMMIT=$(git rev-parse HEAD) &&
> +       OLD_TREE=$(git rev-parse HEAD^{tree}) &&
> +
> +       for i in $(test_seq 1 4)
> +       do
> +               cat >in <<-EOF &&
> +                       100755 blob $BLOB       a
> +                       040000 tree $OLD_TREE   f1
> +                       040000 tree $OLD_TREE   f2
> +                       040000 tree $OLD_TREE   f3
> +                       040000 tree $OLD_TREE   f4
> +               EOF
> +               NEW_TREE=$(git mktree <in) &&
> +               NEW_COMMIT=$(git commit-tree $NEW_TREE -p $OLD_COMMIT -m "level $i") &&
> +               OLD_TREE=$NEW_TREE &&
> +               OLD_COMMIT=$NEW_COMMIT || return 1
> +       done &&
> +
> +       git sparse-checkout init --cone &&
> +       git branch -f wide $OLD_COMMIT &&
> +       git -c core.sparseCheckoutCone=true clone --branch=wide --sparse . full-index-v3 &&
> +       (
> +               cd full-index-v3 &&
> +               git sparse-checkout init --cone &&
> +               git sparse-checkout set $SPARSE_CONE &&
> +               git config index.version 3 &&
> +               git update-index --index-version=3
> +       ) &&
> +       git -c core.sparseCheckoutCone=true clone --branch=wide --sparse . full-index-v4 &&
> +       (
> +               cd full-index-v4 &&
> +               git sparse-checkout init --cone &&
> +               git sparse-checkout set $SPARSE_CONE &&
> +               git config index.version 4 &&
> +               git update-index --index-version=4
> +       )
> +'
> +
> +test_perf_on_all () {
> +       command="$@"
> +       for repo in full-index-v3 full-index-v4
> +       do
> +               test_perf "$command ($repo)" "
> +                       (
> +                               cd $repo &&
> +                               echo >>$SPARSE_CONE/a &&
> +                               $command
> +                       )
> +               "
> +       done
> +}
> +
> +test_perf_on_all git status
> +test_perf_on_all git add -A
> +test_perf_on_all git add .
> +test_perf_on_all git commit -a -m A
> +
> +test_done
> --
> gitgitgadget

Other than the two minor questions, the rest looks good to me.

Derrick Stolee March 9, 2021, 8:03 p.m. UTC | #2
On 2/23/2021 9:30 PM, Elijah Newren wrote:
> On Tue, Feb 23, 2021 at 12:14 PM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>> +test_expect_success 'setup repo and indexes' '
>> +       git reset --hard HEAD &&
>> +       # Remove submodules from the example repo, because our
>> +       # duplication of the entire repo creates an unlikely data shape.
>> +       git config --file .gitmodules --get-regexp "submodule.*.path" >modules &&
>> +       rm -f .gitmodules &&
>> +       git add .gitmodules &&
> Why not `git rm [-f] .gitmodules` instead of these two commands?  Is
> there something special about .gitmodules that requires this special
> handling?

No, I'm just being sloppy. Will clean up.

>> +       for module in $(awk "{print \$2}" modules)
>> +       do
>> +               git rm $module || return 1
>> +       done &&
>> +       git add . &&
> What does the `git add .` do?  I don't see any changes there weren't
> already git-add'ed or git-rm'ed.

Same here. Thanks.

-Stolee
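
For reference, the simplification discussed in this thread could look
roughly like the sketch below. It is not the follow-up patch itself; it
only shows, in a throwaway repository with made-up .gitmodules contents,
that a single `git rm -f .gitmodules` stages the same deletion as the
`rm -f` plus `git add` pair.

```shell
#!/bin/sh
# Throwaway repository with a tracked .gitmodules file (contents are made up).
repo=$(mktemp -d)
git init -q "$repo"
printf '[submodule "demo"]\n\tpath = demo\n\turl = ./demo\n' >"$repo/.gitmodules"
git -C "$repo" add .gitmodules
git -C "$repo" -c user.name=Example -c user.email=example@example.com \
	commit -q -m "add .gitmodules"

# One command removes the file from the working tree and stages the deletion.
git -C "$repo" rm -q -f .gitmodules

status=$(git -C "$repo" status --porcelain)
echo "$status"
```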

Patch

diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
new file mode 100755
index 000000000000..52597683376e
--- /dev/null
+++ b/t/perf/p2000-sparse-operations.sh
@@ -0,0 +1,87 @@ 
+#!/bin/sh
+
+test_description="test performance of Git operations using the index"
+
+. ./perf-lib.sh
+
+test_perf_default_repo
+
+SPARSE_CONE=f2/f4/f1
+
+test_expect_success 'setup repo and indexes' '
+	git reset --hard HEAD &&
+	# Remove submodules from the example repo, because our
+	# duplication of the entire repo creates an unlikely data shape.
+	git config --file .gitmodules --get-regexp "submodule.*.path" >modules &&
+	rm -f .gitmodules &&
+	git add .gitmodules &&
+	for module in $(awk "{print \$2}" modules)
+	do
+		git rm $module || return 1
+	done &&
+	git add . &&
+	git commit -m "remove submodules" &&
+
+	echo bogus >a &&
+	cp a b &&
+	git add a b &&
+	git commit -m "level 0" &&
+	BLOB=$(git rev-parse HEAD:a) &&
+	OLD_COMMIT=$(git rev-parse HEAD) &&
+	OLD_TREE=$(git rev-parse HEAD^{tree}) &&
+
+	for i in $(test_seq 1 4)
+	do
+		cat >in <<-EOF &&
+			100755 blob $BLOB	a
+			040000 tree $OLD_TREE	f1
+			040000 tree $OLD_TREE	f2
+			040000 tree $OLD_TREE	f3
+			040000 tree $OLD_TREE	f4
+		EOF
+		NEW_TREE=$(git mktree <in) &&
+		NEW_COMMIT=$(git commit-tree $NEW_TREE -p $OLD_COMMIT -m "level $i") &&
+		OLD_TREE=$NEW_TREE &&
+		OLD_COMMIT=$NEW_COMMIT || return 1
+	done &&
+
+	git sparse-checkout init --cone &&
+	git branch -f wide $OLD_COMMIT &&
+	git -c core.sparseCheckoutCone=true clone --branch=wide --sparse . full-index-v3 &&
+	(
+		cd full-index-v3 &&
+		git sparse-checkout init --cone &&
+		git sparse-checkout set $SPARSE_CONE &&
+		git config index.version 3 &&
+		git update-index --index-version=3
+	) &&
+	git -c core.sparseCheckoutCone=true clone --branch=wide --sparse . full-index-v4 &&
+	(
+		cd full-index-v4 &&
+		git sparse-checkout init --cone &&
+		git sparse-checkout set $SPARSE_CONE &&
+		git config index.version 4 &&
+		git update-index --index-version=4
+	)
+'
+
+test_perf_on_all () {
+	command="$@"
+	for repo in full-index-v3 full-index-v4
+	do
+		test_perf "$command ($repo)" "
+			(
+				cd $repo &&
+				echo >>$SPARSE_CONE/a &&
+				$command
+			)
+		"
+	done
+}
+
+test_perf_on_all git status
+test_perf_on_all git add -A
+test_perf_on_all git add .
+test_perf_on_all git commit -a -m A
+
+test_done
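
The tree duplication in the setup step can be tried on a small scale.
The following is a sketch under stated assumptions, not part of the
patch: it uses a throwaway repository, width 2 and depth 3 instead of
width 4 and depth 4, and mode 100644 for the blobs.

```shell
#!/bin/sh
repo=$(mktemp -d)
git init -q "$repo"

# Base tree with two blobs, mirroring "a" and "b" from the setup step.
blob=$(echo bogus | git -C "$repo" hash-object -w --stdin)
tree=$(printf '100644 blob %s\ta\n100644 blob %s\tb\n' "$blob" "$blob" |
	git -C "$repo" mktree)

# Each layer reuses the previous tree object under several names, so
# no new blobs are written while the number of index entries multiplies.
for i in 1 2 3
do
	tree=$(printf '100644 blob %s\ta\n040000 tree %s\tf1\n040000 tree %s\tf2\n' \
		"$blob" "$tree" "$tree" | git -C "$repo" mktree)
done

# Count the blob entries reachable from the final tree.
entries=$(git -C "$repo" ls-tree -r "$tree" | wc -l | tr -d ' ')
echo "$entries"
```

Because every layer points at the same tree object several times, the
object database stays small even though the recursive listing grows as
T(k) = 2 * T(k-1) + 1 in this reduced example.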