[v3,01/15] diff --color-moved: add perf tests

Message ID	8fc8914a37b3c343cd92bb0255088f7b000ff7f7.1635336262.git.gitgitgadget@gmail.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <git-owner@kernel.org> Message-Id: <8fc8914a37b3c343cd92bb0255088f7b000ff7f7.1635336262.git.gitgitgadget@gmail.com> In-Reply-To: <pull.981.v3.git.1635336262.gitgitgadget@gmail.com> References: <pull.981.v2.git.1626777393.gitgitgadget@gmail.com> <pull.981.v3.git.1635336262.gitgitgadget@gmail.com> Date: Wed, 27 Oct 2021 12:04:08 +0000 Subject: [PATCH v3 01/15] diff --color-moved: add perf tests Fcc: Sent Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit MIME-Version: 1.0 To: git@vger.kernel.org Cc: Phillip Wood <phillip.wood@dunelm.org.uk>, =?utf-8?b?w4Z2YXIgQXJuZmo=?= =?utf-8?b?w7Zyw7A=?= Bjarmason <avarab@gmail.com>, Elijah Newren <newren@gmail.com>, Phillip Wood <phillip.wood123@gmail.com>, Phillip Wood <phillip.wood@dunelm.org.uk>, Phillip Wood <phillip.wood@dunelm.org.uk> Precedence: bulk From: Phillip Wood <phillip.wood@dunelm.org.uk>
Series	diff --color-moved[-ws] speedups \| expand [v3,00/15] diff --color-moved[-ws] speedups [v3,01/15] diff --color-moved: add perf tests [v3,02/15] diff --color-moved: clear all flags on blocks that are too short [v3,03/15] diff --color-moved: factor out function [v3,04/15] diff --color-moved: rewind when discarding pmb [v3,05/15] diff --color-moved=zebra: fix alternate coloring [v3,06/15] diff --color-moved: avoid false short line matches and bad zerba coloring [v3,07/15] diff: simplify allow-indentation-change delta calculation [v3,08/15] diff --color-moved-ws=allow-indentation-change: simplify and optimize [v3,09/15] diff --color-moved: call comparison function directly [v3,10/15] diff --color-moved: unify moved block growth functions [v3,11/15] diff --color-moved: shrink potential moved blocks as we go [v3,12/15] diff --color-moved: stop clearing potential moved blocks [v3,13/15] diff --color-moved-ws=allow-indentation-change: improve hash lookups [v3,14/15] diff: use designated initializers for emitted_diff_symbol [v3,15/15] diff --color-moved: intern strings

Commit Message

Phillip Wood Oct. 27, 2021, 12:04 p.m. UTC

From: Phillip Wood <phillip.wood@dunelm.org.uk>

Add some tests so we can monitor changes to the performance of the
move detection code. The tests record the performance of a single
large diff and a sequence of smaller diffs.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 t/perf/p4002-diff-color-moved.sh | 45 ++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)
 create mode 100755 t/perf/p4002-diff-color-moved.sh

Comments

Junio C Hamano Oct. 28, 2021, 9:32 p.m. UTC | #1

"Phillip Wood via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>
> Add some tests so we can monitor changes to the performance of the
> move detection code. The tests record the performance of a single
> large diff and a sequence of smaller diffs.

"A single large diff" meaning...?

> +if ! git rev-parse --verify v2.29.0^{commit} >/dev/null
> +then
> +	skip_all='skipping because tag v2.29.0 was not found'
> +	test_done
> +fi

Hmph.  So this is designed only to be run in a clone of git.git with
that tag (and a bit of history, at least to v2.28.0 and 1000 commits)?

I am asking primarily because this seems to be the first instance of
a test that hardcodes the dependency on our history, instead of
allowing the tester to use their favourite history by using the
GIT_PERF_LARGE_REPO and GIT_PERF_REPO environment variables.

The intention of the tests themselves looks quite clear.  Thanks.

> +GIT_PAGER_IN_USE=1
> +test_export GIT_PAGER_IN_USE
> +
> +test_perf 'diff --no-color-moved --no-color-moved-ws large change' '
> +	git diff --no-color-moved --no-color-moved-ws v2.28.0 v2.29.0
> +'
> +
> +test_perf 'diff --color-moved --no-color-moved-ws large change' '
> +	git diff --color-moved=zebra --no-color-moved-ws v2.28.0 v2.29.0
> +'
> +
> +test_perf 'diff --color-moved-ws=allow-indentation-change large change' '
> +	git diff --color-moved=zebra --color-moved-ws=allow-indentation-change \
> +		v2.28.0 v2.29.0
> +'
> +
> +test_perf 'log --no-color-moved --no-color-moved-ws' '
> +	git log --no-color-moved --no-color-moved-ws --no-merges --patch \
> +		-n1000 v2.29.0
> +'
> +
> +test_perf 'log --color-moved --no-color-moved-ws' '
> +	git log --color-moved=zebra --no-color-moved-ws --no-merges --patch \
> +		-n1000 v2.29.0
> +'
> +
> +test_perf 'log --color-moved-ws=allow-indentation-change' '
> +	git log --color-moved=zebra --color-moved-ws=allow-indentation-change \
> +		--no-merges --patch -n1000 v2.29.0
> +'
> +
> +test_done

Phillip Wood Oct. 29, 2021, 10:24 a.m. UTC | #2

Hi Junio

On 28/10/2021 22:32, Junio C Hamano wrote:
> "Phillip Wood via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>>
>> Add some tests so we can monitor changes to the performance of the
>> move detection code. The tests record the performance of a single
>> large diff and a sequence of smaller diffs.
> 
> "A single large diff" meaning...?

The diff of two commits that are far apart in the history so have lots 
of changes between them

>> +if ! git rev-parse --verify v2.29.0^{commit} >/dev/null
>> +then
>> +	skip_all='skipping because tag v2.29.0 was not found'
>> +	test_done
>> +fi
> 
> Hmph.  So this is designed only to be run in a clone of git.git with
> that tag (and a bit of history, at least to v2.28.0 and 1000 commits)?
> 
> I am asking primarily because this seems to be the first instance of
> a test that hardcodes the dependency on our history, instead of
> allowing the tester to use their favourite history by using the
> GIT_PERF_LARGE_REPO and GIT_PERF_REPO environment variables.

p3404-rebase-interactive does the same thing. The aim is to have a 
repeatable test rather than just using whatever commit HEAD happens to 
be pointing at when the test is run as the starting point, if you have 
any ideas for doing that another way I'm happy to change it.

> The intention of the tests themselves looks quite clear.  Thanks.

Thanks

Phillip

>> +GIT_PAGER_IN_USE=1
>> +test_export GIT_PAGER_IN_USE
>> +
>> +test_perf 'diff --no-color-moved --no-color-moved-ws large change' '
>> +	git diff --no-color-moved --no-color-moved-ws v2.28.0 v2.29.0
>> +'
>> +
>> +test_perf 'diff --color-moved --no-color-moved-ws large change' '
>> +	git diff --color-moved=zebra --no-color-moved-ws v2.28.0 v2.29.0
>> +'
>> +
>> +test_perf 'diff --color-moved-ws=allow-indentation-change large change' '
>> +	git diff --color-moved=zebra --color-moved-ws=allow-indentation-change \
>> +		v2.28.0 v2.29.0
>> +'
>> +
>> +test_perf 'log --no-color-moved --no-color-moved-ws' '
>> +	git log --no-color-moved --no-color-moved-ws --no-merges --patch \
>> +		-n1000 v2.29.0
>> +'
>> +
>> +test_perf 'log --color-moved --no-color-moved-ws' '
>> +	git log --color-moved=zebra --no-color-moved-ws --no-merges --patch \
>> +		-n1000 v2.29.0
>> +'
>> +
>> +test_perf 'log --color-moved-ws=allow-indentation-change' '
>> +	git log --color-moved=zebra --color-moved-ws=allow-indentation-change \
>> +		--no-merges --patch -n1000 v2.29.0
>> +'
>> +
>> +test_done

Ævar Arnfjörð Bjarmason Oct. 29, 2021, 11:06 a.m. UTC | #3

On Fri, Oct 29 2021, Phillip Wood wrote:

> Hi Junio
>
> On 28/10/2021 22:32, Junio C Hamano wrote:
>> "Phillip Wood via GitGitGadget" <gitgitgadget@gmail.com> writes:
>> 
>>> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>>>
>>> Add some tests so we can monitor changes to the performance of the
>>> move detection code. The tests record the performance of a single
>>> large diff and a sequence of smaller diffs.
>> "A single large diff" meaning...?
>
> The diff of two commits that are far apart in the history so have lots
> of changes between them
>
>>> +if ! git rev-parse --verify v2.29.0^{commit} >/dev/null
>>> +then
>>> +	skip_all='skipping because tag v2.29.0 was not found'
>>> +	test_done
>>> +fi
>> Hmph.  So this is designed only to be run in a clone of git.git with
>> that tag (and a bit of history, at least to v2.28.0 and 1000 commits)?
>> I am asking primarily because this seems to be the first instance of
>> a test that hardcodes the dependency on our history, instead of
>> allowing the tester to use their favourite history by using the
>> GIT_PERF_LARGE_REPO and GIT_PERF_REPO environment variables.
>
> p3404-rebase-interactive does the same thing. The aim is to have a
> repeatable test rather than just using whatever commit HEAD happens to 
> be pointing at when the test is run as the starting point, if you have
> any ideas for doing that another way I'm happy to change it.

I don't know if it's worth it here, but the following would work:

 1. List all tags in the repository, sorted in reverse order, so e.g.:

    git tag -l 'v*.0' --sort=-version:refname

    (The glob can be configurable as an env variable, or we could fall
    back)

 2. Go down that list and find the first pair that matches some limit, I
    think say the first "major" release with 500 commits would qualify

 3. Make it a GIT_PERF_LARGE_REPO test

We've got some perf tests that do similar things. I think you'd find
that with something like this you should be able to hand the perf test a
path to git.git, or linux.git, and probably any "major" repository as
long as it follows a common "we tag our releases at some interval"
pattern.
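
A minimal sketch of steps 1 and 2 above, written as POSIX shell to match
the perf scripts. The helper names count_commits and find_tag_pair are
illustrative assumptions, not part of perf-lib.sh, and the limit is
whatever the caller passes in. It reads tag names, newest first, and
prints the first adjacent pair separated by at least that many commits:

```shell
# count_commits OLD NEW: number of commits reachable from NEW but not OLD.
# (Hypothetical helper; kept separate so the search logic is easy to follow.)
count_commits () {
	git rev-list --count "$1..$2"
}

# find_tag_pair LIMIT: read tag names on stdin, newest first, and print
# "OLDER NEWER" for the first adjacent pair with at least LIMIT commits
# between them. Returns non-zero if no pair qualifies.
find_tag_pair () {
	limit=$1
	newer=
	while read -r tag
	do
		if test -n "$newer" &&
		   test "$(count_commits "$tag" "$newer")" -ge "$limit"
		then
			echo "$tag $newer"
			return 0
		fi
		newer=$tag
	done
	return 1
}
```

With the tags listed newest first (the leading "-" in the sort key
reverses the order), it could be called as:

    git tag -l 'v*.0' --sort=-version:refname | find_tag_pair 500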

Or perhaps more simply:

 1. Note the number of commits in the history, per "git rev-list HEAD |
    wc -l".

 2. Then round that down to the nearest 10^x, so for a 250k commit
    repository round down to 100k and diff say the 90k..100kth commits;
    for git.git, which has 60k, that would be 10k, and the diff is commits
    9k..10k.

It means you'll get a "bump" eventually when say git.git crosses 100k
commits, but it will probably be stable for any measurement anyone cares
to do, and means that you can get "realistic" measurements for diffing a
big chunk of history from anything from a tiny repository with >=10
commits, to something truly gargantuan where you'd end up diffing say
900k..1m.
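
The rounding in the second idea can be sketched as follows. This is only
an illustration under stated assumptions: round_down_pow10 and pick_range
are hypothetical names, and "the Nth commit" is taken to mean the Nth line
of "git rev-list --reverse HEAD", i.e. counting from the oldest commit.

```shell
# round_down_pow10 N: print the largest power of ten that is <= N (N >= 1).
round_down_pow10 () {
	n=$1
	p=1
	while test $((p * 10)) -le "$n"
	do
		p=$((p * 10))
	done
	echo "$p"
}

# pick_range: print "OLD NEW" commit IDs for the last tenth below the
# rounded-down commit count, e.g. commits 9000 and 10000 when HEAD has
# 60000 commits. Assumes the history has at least 10 commits.
pick_range () {
	total=$(git rev-list --count HEAD) || return
	new_n=$(round_down_pow10 "$total")
	old_n=$((new_n - new_n / 10))
	old=$(git rev-list --reverse HEAD | sed -n "${old_n}p")
	new=$(git rev-list --reverse HEAD | sed -n "${new_n}p")
	echo "$old $new"
}
```

For the figures quoted above, round_down_pow10 250000 prints 100000 and
round_down_pow10 60000 prints 10000. The two IDs printed by pick_range
could then be passed to "git diff" in place of the hardcoded tags.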

Phillip Wood Nov. 10, 2021, 11:05 a.m. UTC | #4

Hi Ævar

On 29/10/2021 12:06, Ævar Arnfjörð Bjarmason wrote:
> 
> On Fri, Oct 29 2021, Phillip Wood wrote:
> 
>> Hi Junio
>>
>> On 28/10/2021 22:32, Junio C Hamano wrote:
>>> "Phillip Wood via GitGitGadget" <gitgitgadget@gmail.com> writes:
>>>
>>>> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>>>>
>>>> Add some tests so we can monitor changes to the performance of the
>>>> move detection code. The tests record the performance of a single
>>>> large diff and a sequence of smaller diffs.
>>> "A single large diff" meaning...?
>>
>> The diff of two commits that are far apart in the history so have lots
>> of changes between them
>>
>>>> +if ! git rev-parse --verify v2.29.0^{commit} >/dev/null
>>>> +then
>>>> +	skip_all='skipping because tag v2.29.0 was not found'
>>>> +	test_done
>>>> +fi
>>> Hmph.  So this is designed only to be run in a clone of git.git with
>>> that tag (and a bit of history, at least to v2.28.0 and 1000 commits)?
>>> I am asking primarily because this seems to be the first instance of
>>> a test that hardcodes the dependency on our history, instead of
>>> allowing the tester to use their favourite history by using the
>>> GIT_PERF_LARGE_REPO and GIT_PERF_REPO environment variables.
>>
>> p3404-rebase-interactive does the same thing. The aim is to have a
>> repeatable test rather than just using whatever commit HEAD happens to
>> be pointing at when the test is run as the starting point, if you have
>> any ideas for doing that another way I'm happy to change it.
> 
> I don't know if it's worth it here, but the following would work:
> 
>   1. List all tags in the repository, sorted in reverse order, so e.g.:
> 
>      git tag -l 'v*.0' --sort=-version:refname
> 
>      (The glob can be configurable as an env variable, or we could fall
>      back)
> 
>   2. Go down that list and find the first pair that matches some limit, I
>      think say the first "major" release with 500 commits would qualify
> 
>   3. Make it a GIT_PERF_LARGE_REPO test
> 
> We've got some perf tests that do similar things. I think you'd find
> that with something like this you should be able to hand the perf test a
> path to git.git, or linux.git, and probably any "major" repository as
> long as it follows a common "we tag our releases at some interval"
> pattern.
> 
> Or perhaps more simply:
> 
>   1. Note the number of commits in the history, per "git rev-list HEAD |
>      wc -l".
> 
>   2. Then round that down to the nearest 10^x, so for a 250k commit
>      repository round down to 100k and diff say the 90k..100kth commits;
>      for git.git, which has 60k, that would be 10k, and the diff is commits
>      9k..10k.
> 
> It means you'll get a "bump" eventually when say git.git crosses 100k
> commits, but it will probably be stable for any measurement anyone cares
> to do, and means that you can get "realistic" measurements for diffing a
> big chunk of history from anything from a tiny repository with >=10
> commits, to something truly gargantuan where you'd end up diffing say
> 900k..1m.

Thanks for the suggestions. I was quite tempted by the second idea, but
in the end I couldn't face rerunning the perf tests and updating all the
commit messages again. I've added a couple of environment variables to
allow the revs in the diff commands to be customized.
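
One way such an override could look is sketched below. The variable names
PERF_DIFF_OLD and PERF_DIFF_NEW and the helper pick_rev are hypothetical;
the names used by the updated patch are not shown in this thread. The
defaults are the two tags the v3 script hardcodes.

```shell
# pick_rev VALUE DEFAULT: print VALUE, or DEFAULT when VALUE is empty.
pick_rev () {
	printf '%s\n' "${1:-$2}"
}

# Hypothetical variable names; the defaults match the tags in the v3 patch.
old_rev=$(pick_rev "${PERF_DIFF_OLD-}" v2.28.0)
new_rev=$(pick_rev "${PERF_DIFF_NEW-}" v2.29.0)

# The perf script would then use "$old_rev" and "$new_rev" wherever
# v2.28.0 and v2.29.0 appear, for example:
#	git rev-parse --verify "$new_rev^{commit}"
#	git diff --color-moved=zebra --no-color-moved-ws "$old_rev" "$new_rev"
# They would presumably also need passing to test_export, as the patch
# already does for GIT_PAGER_IN_USE, to be visible inside test_perf.
```

Leaving both variables unset keeps the measurements comparable with the
numbers already recorded against git.git.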

Best Wishes

Phillip

diff --git a/t/perf/p4002-diff-color-moved.sh b/t/perf/p4002-diff-color-moved.sh
new file mode 100755
index 00000000000..ad56bcb71e4
--- /dev/null
+++ b/t/perf/p4002-diff-color-moved.sh
@@ -0,0 +1,45 @@ 
+#!/bin/sh
+
+test_description='Tests diff --color-moved performance'
+. ./perf-lib.sh
+
+test_perf_default_repo
+
+if ! git rev-parse --verify v2.29.0^{commit} >/dev/null
+then
+	skip_all='skipping because tag v2.29.0 was not found'
+	test_done
+fi
+
+GIT_PAGER_IN_USE=1
+test_export GIT_PAGER_IN_USE
+
+test_perf 'diff --no-color-moved --no-color-moved-ws large change' '
+	git diff --no-color-moved --no-color-moved-ws v2.28.0 v2.29.0
+'
+
+test_perf 'diff --color-moved --no-color-moved-ws large change' '
+	git diff --color-moved=zebra --no-color-moved-ws v2.28.0 v2.29.0
+'
+
+test_perf 'diff --color-moved-ws=allow-indentation-change large change' '
+	git diff --color-moved=zebra --color-moved-ws=allow-indentation-change \
+		v2.28.0 v2.29.0
+'
+
+test_perf 'log --no-color-moved --no-color-moved-ws' '
+	git log --no-color-moved --no-color-moved-ws --no-merges --patch \
+		-n1000 v2.29.0
+'
+
+test_perf 'log --color-moved --no-color-moved-ws' '
+	git log --color-moved=zebra --no-color-moved-ws --no-merges --patch \
+		-n1000 v2.29.0
+'
+
+test_perf 'log --color-moved-ws=allow-indentation-change' '
+	git log --color-moved=zebra --color-moved-ws=allow-indentation-change \
+		--no-merges --patch -n1000 v2.29.0
+'
+
+test_done
