[0/7] Add a new --remerge-diff capability to show & log

Message ID pull.1080.git.git.1630376800.gitgitgadget@gmail.com (mailing list archive)

Message

Elijah Newren via GitGitGadget Aug. 31, 2021, 2:26 a.m. UTC
Here are some patches to add a --remerge-diff capability to show & log,
which works by comparing merge commits to an automatic remerge (note that
the automatic remerge tree can contain files with conflict markers).

Here are some example commits you can try this out on (with git show
--remerge-diff $COMMIT):

 * git.git conflicted merge: 07601b5b36
 * git.git non-conflicted change: bf04590ecd
 * linux.git conflicted merge: eab3540562fb
 * linux.git non-conflicted change: 223cea6a4f05

Many more can be found by just running git log --merges --remerge-diff in
your repository of choice and searching for diffs (most merges tend to be
clean and unmodified and thus produce no diff but a search of '^diff' in the
log output tends to find the examples nicely).
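The search described above can be scripted. The following is only a sketch: the helper name merges_with_diffs is made up for illustration, and it assumes the default git log format, where each entry starts with a "commit <hash>" line and commit message lines are indented.

```shell
#!/bin/sh
# merges_with_diffs (hypothetical helper): read `git log` output on stdin
# and print the hash of every commit whose entry contains at least one
# line starting with "diff ". Each hash is printed once.
merges_with_diffs() {
    awk '
        /^commit [0-9a-f]+/ { hash = $2; seen = 0; next }
        /^diff / && !seen   { print hash; seen = 1 }
    '
}

# Intended use (needs a git build that understands --remerge-diff):
#   git log --merges --remerge-diff | merges_with_diffs
```

Each printed hash can then be inspected on its own with git show --remerge-diff.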

Some basic high level details about this new option:

 * This option is most naturally compared to --cc, though the output seems
   to be much more understandable to most users than --cc output.
 * Since merges are often clean and unmodified, this new option results in
   an empty diff for most merges.
 * This new option shows things like the removal of conflict markers, which
   hunks users picked from the various conflicted sides to keep or remove,
   and shows changes made outside of conflict markers (which might reflect
   changes needed to resolve semantic conflicts or cleanups of e.g.
   compilation warnings or other additional changes an integrator felt
   belonged in the merged result).
 * This new option does not (currently) work for octopus merges, since
   merge-ort is specific to two-parent merges[1].
 * This option will not work on a read-only or full filesystem[2].
 * We discussed this capability at Git Merge 2020, and one of the
   suggestions was doing a periodic git gc --auto during the operation (due
   to potential new blobs and trees created during the operation). I found a
   way to avoid that; see [2].
 * This option is faster than you'd probably expect; it handles 33.5 merge
   commits per second in linux.git on my computer; see below.

In regards to the performance point above, the timing for running the
following command:

time git log --min-parents=2 --max-parents=2 $DIFF_FLAG | wc -l


in linux.git (with v5.4 checked out, since my copy of linux is very out of
date) is as follows:

DIFF_FLAG=--cc:            71m 31.536s
DIFF_FLAG=--remerge-diff:  31m  3.170s


Note that there are 62476 merges in this history. Also, output size is:

DIFF_FLAG=--cc:            2169111 lines
DIFF_FLAG=--remerge-diff:  2458020 lines


So roughly the same amount of output as --cc, as you'd expect.
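For anyone who wants to check the arithmetic behind the throughput figure, here is a small sketch using only the numbers quoted in this message. The helper names to_seconds, rate and ratio are made up for illustration.

```shell
#!/bin/sh
# to_seconds MIN SEC: convert a "31m 3.170s"-style timing to seconds.
to_seconds() {
    LC_ALL=C awk -v m="$1" -v s="$2" 'BEGIN { printf "%.3f\n", m * 60 + s }'
}

# rate COUNT SECONDS: items handled per second, to one decimal place.
rate() {
    LC_ALL=C awk -v n="$1" -v t="$2" 'BEGIN { printf "%.1f\n", n / t }'
}

# ratio A B: A divided by B, to two decimal places.
ratio() {
    LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

merges=62476
echo "--remerge-diff: $(rate "$merges" "$(to_seconds 31 3.170)") merges/s"  # 33.5
echo "--cc:           $(rate "$merges" "$(to_seconds 71 31.536)") merges/s" # 14.6
echo "output lines, remerge-diff vs cc: $(ratio 2458020 2169111)"           # 1.13
```

So --remerge-diff emitted about 13% more lines than --cc on this history while finishing in well under half the time.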

As a side note: git log --remerge-diff, when run in various repositories and
allowed to run all the way back to the beginning(s) of history, is a nice
stress test of sorts for merge-ort, especially when users run it for you on
the repositories they are working on, whether intentionally or via a bug in
a tool that triggers the command unexpectedly. Long story short, such a bug
in an internal tool existed last December; this command was run on an
internal repository and found a platform-specific bug in merge-ort on some
really old merge commit from that repo. I fixed that bug (a STABLE_QSORT
thing) while upstreaming all the merge-ort patches in the meantime, but it
was nice getting extra testing. Having more folks run this on their
repositories might be useful extra testing of the new merge strategy.

Also, I previously mentioned --remerge-diff-only (a flag to show how
cherry-picks or reverts differ from an automatic cherry-pick or revert, in
addition to showing how merges differ from an automatic merge). This series
does not include the patches to introduce that option; I'll submit them
later.

Two other things that might be interesting but are not included and which I
haven't investigated:

 * some mechanism for passing extra merge options through (e.g.
   -Xignore-space-change)
 * a capability to compare the automatic merge to a second automatic merge
   done with different merge options. (Not sure if this would be of interest
   to end users, but might be interesting while developing a new
   --strategy-option, or maybe checking how changing some default in the
   merge algorithm would affect historical merges in various repositories).

[1] I have nebulous ideas of how an Octopus-centric ORT strategy could be
written -- basically, just repeatedly invoking ort and trying to make sure
nested conflicts can be differentiated. For now, though, a simple warning is
printed that octopus merges are not handled and no diff will be shown.

[2] New blobs/trees can be written by the three-way merging step. These are
written to a temporary area (via tmp-objdir.c) under the git object store
that is cleaned up at the end of the operation, with the new loose objects
from the remerge being cleaned up after each individual merge.
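The temporary-area idea in [2] can be approximated from the shell. This is only an illustration, not the in-process tmp-objdir.c API the series adds: it relies on the GIT_OBJECT_DIRECTORY and GIT_ALTERNATE_OBJECT_DIRECTORIES environment variables, the helper name with_tmp_objdir and the directory name are made up, and it assumes a plain repository whose objects live in $GIT_DIR/objects.

```shell
#!/bin/sh
# with_tmp_objdir CMD... (hypothetical helper): run CMD in the current
# repository so that any objects it writes land in a throwaway directory
# under the object store, then delete that directory again.
with_tmp_objdir() {
    git_dir=$(git rev-parse --absolute-git-dir) || return 1
    real_objdir="$git_dir/objects"
    tmp_objdir=$(mktemp -d "$real_objdir/sketch-tmp-XXXXXX") || return 1

    # New objects go to the temporary directory; existing objects stay
    # readable because the real object store is listed as an alternate.
    GIT_OBJECT_DIRECTORY="$tmp_objdir" \
    GIT_ALTERNATE_OBJECT_DIRECTORIES="$real_objdir" \
        "$@"
    status=$?

    rm -rf "$tmp_objdir"
    return "$status"
}

# Example: the blob is written, its ID is printed, and it is gone again
# once the helper returns (unless identical content already existed):
#   echo "scratch content" | with_tmp_objdir git hash-object -w --stdin
```

Because every invocation creates its own directory, two concurrent runs would not see each other's temporary objects.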

Elijah Newren (7):
  merge-ort: mark a few more conflict messages as omittable
  merge-ort: add ability to record conflict messages in a file
  ll-merge: add API for capturing warnings in a strbuf instead of stderr
  merge-ort: capture and print ll-merge warnings in our preferred
    fashion
  tmp-objdir: new API for creating and removing primary object dirs
  show, log: provide a --remerge-diff capability
  doc/diff-options: explain the new --remerge-diff option

 Documentation/diff-options.txt |  8 +++
 builtin/log.c                  | 23 ++++++++
 diff-merges.c                  | 12 +++++
 ll-merge.c                     | 51 +++++++++++++-----
 ll-merge.h                     |  9 ++++
 log-tree.c                     | 69 ++++++++++++++++++++++++
 merge-ort.c                    | 96 +++++++++++++++++++++++++++++++---
 merge-recursive.c              |  3 ++
 merge-recursive.h              |  1 +
 revision.h                     |  6 ++-
 t/t6404-recursive-merge.sh     | 10 +++-
 t/t6406-merge-attr.sh          | 10 +++-
 tmp-objdir.c                   | 29 ++++++++++
 tmp-objdir.h                   | 16 ++++++
 14 files changed, 319 insertions(+), 24 deletions(-)


base-commit: c4203212e360b25a1c69467b5a8437d45a373cac
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1080%2Fnewren%2Fremerge-diff-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1080/newren/remerge-diff-v1
Pull-Request: https://github.com/git/git/pull/1080

Comments

Bagas Sanjaya Aug. 31, 2021, 11:05 a.m. UTC | #1
On 31/08/21 09.26, Elijah Newren via GitGitGadget wrote:
> Here are some patches to add a --remerge-diff capability to show & log,
> which works by comparing merge commits to an automatic remerge (note that
> the automatic remerge tree can contain files with conflict markers).
> 
> Here are some example commits you can try this out on (with git show
> --remerge-diff $COMMIT):
> 
>   * git.git conflicted merge: 07601b5b36
>   * git.git non-conflicted change: bf04590ecd
>   * linux.git conflicted merge: eab3540562fb
>   * linux.git non-conflicted change: 223cea6a4f05
> 
<snip>...
> In regards to the performance point above, the timing for running the
> following command:
> 
> time git log --min-parents=2 --max-parents=2 $DIFF_FLAG | wc -l
> 
> 
> in linux.git (with v5.4 checked out, since my copy of linux is very out of
> date) is as follows:
> 
> DIFF_FLAG=--cc:            71m 31.536s
> DIFF_FLAG=--remerge-diff:  31m  3.170s
> 
> 
> Note that there are 62476 merges in this history. Also, output size is:
> 
> DIFF_FLAG=--cc:            2169111 lines
> DIFF_FLAG=--remerge-diff:  2458020 lines
> 

Which repo did you mean by linux.git? Kernel developers often work
against Linus' mainline tree [1], while end-users (including myself)
prefer the stable tree (which is mainline + stable release branches and
tags) [2].

[1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
[2]: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
Elijah Newren Aug. 31, 2021, 4:16 p.m. UTC | #2
On Tue, Aug 31, 2021 at 4:05 AM Bagas Sanjaya <bagasdotme@gmail.com> wrote:
>
> On 31/08/21 09.26, Elijah Newren via GitGitGadget wrote:
> > Here are some patches to add a --remerge-diff capability to show & log,
> > which works by comparing merge commits to an automatic remerge (note that
> > the automatic remerge tree can contain files with conflict markers).
> >
> > Here are some example commits you can try this out on (with git show
> > --remerge-diff $COMMIT):
> >
> >   * git.git conflicted merge: 07601b5b36
> >   * git.git non-conflicted change: bf04590ecd
> >   * linux.git conflicted merge: eab3540562fb
> >   * linux.git non-conflicted change: 223cea6a4f05
> >
> <snip>...
> > In regards to the performance point above, the timing for running the
> > following command:
> >
> > time git log --min-parents=2 --max-parents=2 $DIFF_FLAG | wc -l
> >
> >
> > in linux.git (with v5.4 checked out, since my copy of linux is very out of
> > date) is as follows:
> >
> > DIFF_FLAG=--cc:            71m 31.536s
> > DIFF_FLAG=--remerge-diff:  31m  3.170s
> >
> >
> > Note that there are 62476 merges in this history. Also, output size is:
> >
> > DIFF_FLAG=--cc:            2169111 lines
> > DIFF_FLAG=--remerge-diff:  2458020 lines
> >
>
> Which repo did you mean by linux.git? Kernel developers often work
> against Linus' mainline tree [1], while end-users (including myself)
> prefer stable tree (which is mainline + stable release branches and
> tags) [2].
>
> [1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> [2]: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git

$ git remote -v
origin git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git (fetch)
origin git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git (push)
$ git log -1 origin/master
commit 11a48a5a18c63fd7621bb050228cebf13566e4d8 (tag: v5.6-rc2, origin/master, origin/HEAD, master)
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Sun Feb 16 13:16:59 2020 -0800

    Linux 5.6-rc2
$ git log -1 HEAD
commit 219d54332a09e8d8741c1e1982f5eae56099de85 (HEAD, tag: v5.4)
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Sun Nov 24 16:32:01 2019 -0800

    Linux 5.4
Junio C Hamano Aug. 31, 2021, 8:03 p.m. UTC | #3
"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:

> Here are some patches to add a --remerge-diff capability to show & log,
> which works by comparing merge commits to an automatic remerge (note that
> the automatic remerge tree can contain files with conflict markers).

Excited ;-)

>  * This new option does not (currently) work for octopus merges, since
>    merge-ort is specific to two-parent merges[1].

Unless you do so manually, the native "octopus" backend does not let
you create non-trivial merges anyway, so punting on them should not
be a big loss.  Falling back to --cc might be a usable alternative.

>  * This option will not work on a read-only or full filesystem[2].

OK.  I am not sure if it is worth doing the "temporary objects"
trick, though---would it risk repository corruption if somebody is
creating a new blob that happens to be identical to the one that is
involved in the remerge operation at the same time, or there is no
visibility of the temporary area to these "somebody" outside so
there is no risk?
Elijah Newren Aug. 31, 2021, 8:23 p.m. UTC | #4
On Tue, Aug 31, 2021 at 1:03 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > Here are some patches to add a --remerge-diff capability to show & log,
> > which works by comparing merge commits to an automatic remerge (note that
> > the automatic remerge tree can contain files with conflict markers).
>
> Excited ;-)
>
> >  * This new option does not (currently) work for octopus merges, since
> >    merge-ort is specific to two-parent merges[1].
>
> Unless you do so manually, the native "octopus" backend does not let
> you create non-trivial merges anyway, so punting on them should not
> be a big loss.  Falling back to --cc might be a usable alternative.
>
> >  * This option will not work on a read-only or full filesystem[2].
>
> OK.  I am not sure if it is worth doing the "temporary objects"
> trick, though---would it risk repository corruption if somebody is
> creating a new blob that happens to be identical to the one that is
> involved in the remerge operation at the same time, or there is no
> visibility of the temporary area to these "somebody" outside so
> there is no risk?

The temporary area is only used by the process running --remerge-diff,
so there's no risk of corruption.  If you have two `git log
--remerge-diff ...` processes running at the same time, they each have
their own temporary areas.
Junio C Hamano Sept. 1, 2021, 9:07 p.m. UTC | #5
"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:

> Here are some patches to add a --remerge-diff capability to show & log,

One sad omission from the maintainer usecase is that we do not seem
to know "git diff --remerge-diff" yet during a conflicted merge.

"git diff [-- <path>]" before recording the resolution for the path
with "git add <path>" shows combined patch to give a final sanity
check before committing it to the rerere database.  I am wondering
if viewing it in the --remerge-diff format instead would help this
step even more.
Elijah Newren Sept. 1, 2021, 9:42 p.m. UTC | #6
On Wed, Sep 1, 2021 at 2:08 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > Here are some patches to add a --remerge-diff capability to show & log,
>
> One sad omission from the maintainer usecase is that we do not seem
> to know "git diff --remerge-diff" yet during a conflicted merge.
>
> "git diff [-- <path>]" before recording the resolution for the path
> with "git add <path>" shows combined patch to give a final sanity
> check before committing it to the rerere database.  I am wondering
> if viewing it in the --remerge-diff format instead would help this
> step even more.

We do have `git diff AUTO_MERGE`, though.  It's not quite the same as
it doesn't include all the "CONFLICT" messages shown in the terminal
like --remerge-diff does with log/show, but otherwise it's the same.
Perhaps we could even alias `git diff --remerge-diff` to `git diff
AUTO_MERGE`?

See commit 5291828df838 (merge-ort: write $GIT_DIR/AUTO_MERGE whenever
we hit a conflict, 2021-03-20) for more details.
Junio C Hamano Sept. 1, 2021, 9:55 p.m. UTC | #7
Elijah Newren <newren@gmail.com> writes:

> We do have `git diff AUTO_MERGE`, though.  It's not quite the same as
> it doesn't include all the "CONFLICT" messages shown in the terminal
> like --remerge-diff does with log/show, but otherwise it's the same.

Ah, forgot about that one, so we are good.

> Perhaps we could even alias `git diff --remerge-diff` to `git diff
> AUTO_MERGE`?

I do not think it is a good idea to hide AUTO_MERGE behind an
option.  It is a feature that deserves more user awareness by
itself.

Thanks.