Message ID | pull.981.git.1623675888.gitgitgadget@gmail.com (mailing list archive) |
---|---|
Headers | show |
Series | diff --color-moved[-ws] speedups | expand |
On Mon, Jun 14 2021, Phillip Wood via GitGitGadget wrote: > The current implementation of diff --color-moved-ws=allow-indentation-change > is considerably slower that the implementation of diff --color-moved which > is in turn slower than a regular diff. This patch series starts with a > couple of bug fixes and then reworks the implementation of diff > --color-moved and diff --color-moved-ws=allow-indentation-change to speed > them up on large diffs. The time to run git diff --color-moved > --no-color-moved-ws v2.28.0 v2.29.0 is reduced by 33% and the time to run > git diff --color-moved --color-moved-ws=allow-indentation-change v2.28.0 > v2.29.0 is reduced by 88%. There is a small slowdown for commit sized diffs > with --color-moved - the time to run git log -p --color-moved > --no-color-moved-ws --no-merges -n1000 v2.29.0 is increased by 2% on recent > processors. On older processors these patches reduce the running time in all > cases that I've tested. In general the larger the diff the larger the speed > up. As an extreme example the time to run diff --color-moved > --color-moved-ws=allow-indentation-change v2.25.0 v2.30.0 goes down from 8 > minutes to 6 seconds. > > Phillip Wood (10): > diff --color-moved=zerba: fix alternate coloring > diff --color-moved: avoid false short line matches and bad zerba > coloring > diff: simplify allow-indentation-change delta calculation > diff --color-moved-ws=allow-indentation-change: simplify and optimize > diff --color-moved: call comparison function directly > diff --color-moved: unify moved block growth functions > diff --color-moved: shrink potential moved blocks as we go > diff --color-moved: stop clearing potential moved blocks > diff --color-moved-ws=allow-indentation-change: improve hash lookups > diff --color-moved: intern strings Nice to see these land after the earlier on-list reference to them. I skimmed these mostly, and am not familiar with this code, but didn't see any glaring things missing. There was one existing oddity with assigning a 0 to an "enum diff_symbol", don't we want DIFF_SYMBOL_BINARY_DIFF_HEADER? In any case, it's just a line you touch in 02/10, and pre-dates these changes. One thing I would very much like to see here is a conversion of the existing ad-hoc benchmarks you note in commit messages to something that lives in t/perf/, it really helps future maintenance of perf-sensitive code to be able to re-run those, and I for one find the output much easier to read than whatever tool you're using to produce your benchmarks.
On 16/06/2021 15:24, Ævar Arnfjörð Bjarmason wrote: > > On Mon, Jun 14 2021, Phillip Wood via GitGitGadget wrote: > >> The current implementation of diff --color-moved-ws=allow-indentation-change >> is considerably slower that the implementation of diff --color-moved which >> is in turn slower than a regular diff. This patch series starts with a >> couple of bug fixes and then reworks the implementation of diff >> --color-moved and diff --color-moved-ws=allow-indentation-change to speed >> them up on large diffs. The time to run git diff --color-moved >> --no-color-moved-ws v2.28.0 v2.29.0 is reduced by 33% and the time to run >> git diff --color-moved --color-moved-ws=allow-indentation-change v2.28.0 >> v2.29.0 is reduced by 88%. There is a small slowdown for commit sized diffs >> with --color-moved - the time to run git log -p --color-moved >> --no-color-moved-ws --no-merges -n1000 v2.29.0 is increased by 2% on recent >> processors. On older processors these patches reduce the running time in all >> cases that I've tested. In general the larger the diff the larger the speed >> up. As an extreme example the time to run diff --color-moved >> --color-moved-ws=allow-indentation-change v2.25.0 v2.30.0 goes down from 8 >> minutes to 6 seconds. >> >> Phillip Wood (10): >> diff --color-moved=zerba: fix alternate coloring >> diff --color-moved: avoid false short line matches and bad zerba >> coloring >> diff: simplify allow-indentation-change delta calculation >> diff --color-moved-ws=allow-indentation-change: simplify and optimize >> diff --color-moved: call comparison function directly >> diff --color-moved: unify moved block growth functions >> diff --color-moved: shrink potential moved blocks as we go >> diff --color-moved: stop clearing potential moved blocks >> diff --color-moved-ws=allow-indentation-change: improve hash lookups >> diff --color-moved: intern strings > > Nice to see these land after the earlier on-list reference to them. > > I skimmed these mostly, and am not familiar with this code, but didn't > see any glaring things missing. There was one existing oddity with > assigning a 0 to an "enum diff_symbol", don't we want > DIFF_SYMBOL_BINARY_DIFF_HEADER? In any case, it's just a line you touch > in 02/10, and pre-dates these changes. Thanks for taking a look at these patches. I take your point about the assignment, I don't think the actual value matters so long as it's not DIFF_SYMBOL_PLUS or DIFF_SYMBOL_MINUS. > One thing I would very much like to see here is a conversion of the > existing ad-hoc benchmarks you note in commit messages to something that > lives in t/perf/, it really helps future maintenance of perf-sensitive > code to be able to re-run those, and I for one find the output much > easier to read than whatever tool you're using to produce your > benchmarks. Adding some perf tests is a good idea, I'll do that when I reroll which may take a couple of weeks as I'm going offline for a while at the end of the week. The tool I have been using is hyperfine[1], it has been used by a few other contributors (see `git log --grep σ` if you're interested) [1] https://github.com/sharkdp/hyperfine Best Wishes Phillip