mbox series

[00/10] diff --color-moved[-ws] speedups

Message ID pull.981.git.1623675888.gitgitgadget@gmail.com (mailing list archive)
Headers show
Series diff --color-moved[-ws] speedups | expand

Message

Derrick Stolee via GitGitGadget June 14, 2021, 1:04 p.m. UTC
The current implementation of diff --color-moved-ws=allow-indentation-change
is considerably slower that the implementation of diff --color-moved which
is in turn slower than a regular diff. This patch series starts with a
couple of bug fixes and then reworks the implementation of diff
--color-moved and diff --color-moved-ws=allow-indentation-change to speed
them up on large diffs. The time to run git diff --color-moved
--no-color-moved-ws v2.28.0 v2.29.0 is reduced by 33% and the time to run
git diff --color-moved --color-moved-ws=allow-indentation-change v2.28.0
v2.29.0 is reduced by 88%. There is a small slowdown for commit sized diffs
with --color-moved - the time to run git log -p --color-moved
--no-color-moved-ws --no-merges -n1000 v2.29.0 is increased by 2% on recent
processors. On older processors these patches reduce the running time in all
cases that I've tested. In general the larger the diff the larger the speed
up. As an extreme example the time to run diff --color-moved
--color-moved-ws=allow-indentation-change v2.25.0 v2.30.0 goes down from 8
minutes to 6 seconds.

Phillip Wood (10):
  diff --color-moved=zerba: fix alternate coloring
  diff --color-moved: avoid false short line matches and bad zerba
    coloring
  diff: simplify allow-indentation-change delta calculation
  diff --color-moved-ws=allow-indentation-change: simplify and optimize
  diff --color-moved: call comparison function directly
  diff --color-moved: unify moved block growth functions
  diff --color-moved: shrink potential moved blocks as we go
  diff --color-moved: stop clearing potential moved blocks
  diff --color-moved-ws=allow-indentation-change: improve hash lookups
  diff --color-moved: intern strings

 diff.c                     | 375 ++++++++++++++-----------------------
 t/t4015-diff-whitespace.sh | 137 ++++++++++++++
 2 files changed, 276 insertions(+), 236 deletions(-)


base-commit: 211eca0895794362184da2be2a2d812d070719d3
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-981%2Fphillipwood%2Fwip%2Fdiff-color-moved-tweaks-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-981/phillipwood/wip/diff-color-moved-tweaks-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/981

Comments

Ævar Arnfjörð Bjarmason June 16, 2021, 2:24 p.m. UTC | #1
On Mon, Jun 14 2021, Phillip Wood via GitGitGadget wrote:

> The current implementation of diff --color-moved-ws=allow-indentation-change
> is considerably slower that the implementation of diff --color-moved which
> is in turn slower than a regular diff. This patch series starts with a
> couple of bug fixes and then reworks the implementation of diff
> --color-moved and diff --color-moved-ws=allow-indentation-change to speed
> them up on large diffs. The time to run git diff --color-moved
> --no-color-moved-ws v2.28.0 v2.29.0 is reduced by 33% and the time to run
> git diff --color-moved --color-moved-ws=allow-indentation-change v2.28.0
> v2.29.0 is reduced by 88%. There is a small slowdown for commit sized diffs
> with --color-moved - the time to run git log -p --color-moved
> --no-color-moved-ws --no-merges -n1000 v2.29.0 is increased by 2% on recent
> processors. On older processors these patches reduce the running time in all
> cases that I've tested. In general the larger the diff the larger the speed
> up. As an extreme example the time to run diff --color-moved
> --color-moved-ws=allow-indentation-change v2.25.0 v2.30.0 goes down from 8
> minutes to 6 seconds.
>
> Phillip Wood (10):
>   diff --color-moved=zerba: fix alternate coloring
>   diff --color-moved: avoid false short line matches and bad zerba
>     coloring
>   diff: simplify allow-indentation-change delta calculation
>   diff --color-moved-ws=allow-indentation-change: simplify and optimize
>   diff --color-moved: call comparison function directly
>   diff --color-moved: unify moved block growth functions
>   diff --color-moved: shrink potential moved blocks as we go
>   diff --color-moved: stop clearing potential moved blocks
>   diff --color-moved-ws=allow-indentation-change: improve hash lookups
>   diff --color-moved: intern strings

Nice to see these land after the earlier on-list reference to them.

I skimmed these mostly, and am not familiar with this code, but didn't
see any glaring things missing. There was one existing oddity with
assigning a 0 to an "enum diff_symbol", don't we want
DIFF_SYMBOL_BINARY_DIFF_HEADER? In any case, it's just a line you touch
in 02/10, and pre-dates these changes.

One thing I would very much like to see here is a conversion of the
existing ad-hoc benchmarks you note in commit messages to something that
lives in t/perf/, it really helps future maintenance of perf-sensitive
code to be able to re-run those, and I for one find the output much
easier to read than whatever tool you're using to produce your
benchmarks.
Phillip Wood June 21, 2021, 10:03 a.m. UTC | #2
On 16/06/2021 15:24, Ævar Arnfjörð Bjarmason wrote:
> 
> On Mon, Jun 14 2021, Phillip Wood via GitGitGadget wrote:
> 
>> The current implementation of diff --color-moved-ws=allow-indentation-change
>> is considerably slower that the implementation of diff --color-moved which
>> is in turn slower than a regular diff. This patch series starts with a
>> couple of bug fixes and then reworks the implementation of diff
>> --color-moved and diff --color-moved-ws=allow-indentation-change to speed
>> them up on large diffs. The time to run git diff --color-moved
>> --no-color-moved-ws v2.28.0 v2.29.0 is reduced by 33% and the time to run
>> git diff --color-moved --color-moved-ws=allow-indentation-change v2.28.0
>> v2.29.0 is reduced by 88%. There is a small slowdown for commit sized diffs
>> with --color-moved - the time to run git log -p --color-moved
>> --no-color-moved-ws --no-merges -n1000 v2.29.0 is increased by 2% on recent
>> processors. On older processors these patches reduce the running time in all
>> cases that I've tested. In general the larger the diff the larger the speed
>> up. As an extreme example the time to run diff --color-moved
>> --color-moved-ws=allow-indentation-change v2.25.0 v2.30.0 goes down from 8
>> minutes to 6 seconds.
>>
>> Phillip Wood (10):
>>    diff --color-moved=zerba: fix alternate coloring
>>    diff --color-moved: avoid false short line matches and bad zerba
>>      coloring
>>    diff: simplify allow-indentation-change delta calculation
>>    diff --color-moved-ws=allow-indentation-change: simplify and optimize
>>    diff --color-moved: call comparison function directly
>>    diff --color-moved: unify moved block growth functions
>>    diff --color-moved: shrink potential moved blocks as we go
>>    diff --color-moved: stop clearing potential moved blocks
>>    diff --color-moved-ws=allow-indentation-change: improve hash lookups
>>    diff --color-moved: intern strings
> 
> Nice to see these land after the earlier on-list reference to them.
> 
> I skimmed these mostly, and am not familiar with this code, but didn't
> see any glaring things missing. There was one existing oddity with
> assigning a 0 to an "enum diff_symbol", don't we want
> DIFF_SYMBOL_BINARY_DIFF_HEADER? In any case, it's just a line you touch
> in 02/10, and pre-dates these changes.

Thanks for taking a look at these patches. I take your point about the 
assignment, I don't think the actual value matters so long as it's not 
DIFF_SYMBOL_PLUS or DIFF_SYMBOL_MINUS.

> One thing I would very much like to see here is a conversion of the
> existing ad-hoc benchmarks you note in commit messages to something that
> lives in t/perf/, it really helps future maintenance of perf-sensitive
> code to be able to re-run those, and I for one find the output much
> easier to read than whatever tool you're using to produce your
> benchmarks.

Adding some perf tests is a good idea, I'll do that when I reroll which 
may take a couple of weeks as I'm going offline for a while at the end 
of the week. The tool I have been using is hyperfine[1], it has been 
used by a few other contributors (see `git log --grep σ` if you're 
interested)

[1] https://github.com/sharkdp/hyperfine

Best Wishes

Phillip