diff mbox series

[v3,04/15] merge-tree: implement real merges

Message ID 02c29f920d0d5fde6d85f7b86a69be92e3f0f34d.1643787281.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series In-core git merge-tree ("Server side merges") | expand

Commit Message

Elijah Newren Feb. 2, 2022, 7:34 a.m. UTC
From: Elijah Newren <newren@gmail.com>

This adds the ability to perform real merges rather than just trivial
merges (meaning handling three way content merges, recursive ancestor
consolidation, renames, proper directory/file conflict handling, and so
forth).  However, unlike `git merge`, the working tree and index are
left alone and no branch is updated.

The only output is:
  - the toplevel resulting tree printed on stdout
  - exit status of 0 (clean), 1 (conflicts present), anything else
    (merge could not be performed; unknown if clean or conflicted)

This output is meant to be used by some higher level script, perhaps in
a sequence of steps like this:

   NEWTREE=$(git merge-tree --write-tree $BRANCH1 $BRANCH2)
   test $? -eq 0 || die "There were conflicts..."
   NEWCOMMIT=$(git commit-tree $NEWTREE -p $BRANCH1 -p $BRANCH2)
   git update-ref $BRANCH1 $NEWCOMMIT

Note that higher level scripts may also want to access the
conflict/warning messages normally output during a merge, or have quick
access to a list of files with conflicts.  That is not available in this
preliminary implementation, but subsequent commits will add that
ability.

This also marks the traditional trivial merge of merge-tree as
deprecated.  The trivial merge not only had limited applicability, the
output format was also difficult to work with (and its format
undocumented), and will generally be less performant than real merges.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 Documentation/git-merge-tree.txt | 71 +++++++++++++++++++++-----
 builtin/merge-tree.c             | 44 +++++++++++++++-
 t/t4301-merge-tree-write-tree.sh | 88 ++++++++++++++++++++++++++++++++
 3 files changed, 190 insertions(+), 13 deletions(-)
 create mode 100755 t/t4301-merge-tree-write-tree.sh

Comments

Junio C Hamano Feb. 2, 2022, 9:22 p.m. UTC | #1
"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:

> @@ -392,7 +395,46 @@ struct merge_tree_options {
>  static int real_merge(struct merge_tree_options *o,
>  		      const char *branch1, const char *branch2)
>  {
> -	die(_("real merges are not yet implemented"));
> +	struct commit *parent1, *parent2;
> +	struct commit_list *common;
> +	struct commit_list *merge_bases = NULL;
> +	struct commit_list *j;
> +	struct merge_options opt;
> +	struct merge_result result = { 0 };
> +
> +	parent1 = get_merge_parent(branch1);
> +	if (!parent1)
> +		help_unknown_ref(branch1, "merge-tree",
> +				 _("not something we can merge"));
> +
> +	parent2 = get_merge_parent(branch2);
> +	if (!parent2)
> +		help_unknown_ref(branch2, "merge-tree",
> +				 _("not something we can merge"));
> +
> +	init_merge_options(&opt, the_repository);
> +
> +	opt.show_rename_progress = 0;
> +
> +	opt.branch1 = branch1;
> +	opt.branch2 = branch2;
> +
> +	/*
> +	 * Get the merge bases, in reverse order; see comment above
> +	 * merge_incore_recursive in merge-ort.h
> +	 */
> +	common = get_merge_bases(parent1, parent2);
> +	if (!common)
> +		die(_("refusing to merge unrelated histories"));

It appears to me that "merge-tree" in this mode, with the above
code, cannot be used as a workhorse to implement server-side
cherry-pick (or revert), which needs to allow the user to specify an
arbitrary "common ancestor", instead of computing on its own.

To replay the change made by commit A on top of commit X (i.e.
"cherry-pick A on X"), we have to be able to say "compute the
three-way merge between A and X, pretending as if A^ were their
common ancestor".  The story is the same for revert---we compute
three-way merge between A^ and X, pretending as if A were their
common ancestor.

The above interface into this function, sadly, does not seem to
allow such a request, unless I am missing something.

And if I am correct, it is a shame---after all, the point of the
merge-trees command is to take three trees and run a three-way
merge, and not being able to merge three "trees" and require
"commits" makes this mode much less useful than its potential.
Elijah Newren Feb. 2, 2022, 9:56 p.m. UTC | #2
On Wed, Feb 2, 2022 at 1:22 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > @@ -392,7 +395,46 @@ struct merge_tree_options {
> >  static int real_merge(struct merge_tree_options *o,
> >                     const char *branch1, const char *branch2)
> >  {
> > -     die(_("real merges are not yet implemented"));
> > +     struct commit *parent1, *parent2;
> > +     struct commit_list *common;
> > +     struct commit_list *merge_bases = NULL;
> > +     struct commit_list *j;
> > +     struct merge_options opt;
> > +     struct merge_result result = { 0 };
> > +
> > +     parent1 = get_merge_parent(branch1);
> > +     if (!parent1)
> > +             help_unknown_ref(branch1, "merge-tree",
> > +                              _("not something we can merge"));
> > +
> > +     parent2 = get_merge_parent(branch2);
> > +     if (!parent2)
> > +             help_unknown_ref(branch2, "merge-tree",
> > +                              _("not something we can merge"));
> > +
> > +     init_merge_options(&opt, the_repository);
> > +
> > +     opt.show_rename_progress = 0;
> > +
> > +     opt.branch1 = branch1;
> > +     opt.branch2 = branch2;
> > +
> > +     /*
> > +      * Get the merge bases, in reverse order; see comment above
> > +      * merge_incore_recursive in merge-ort.h
> > +      */
> > +     common = get_merge_bases(parent1, parent2);
> > +     if (!common)
> > +             die(_("refusing to merge unrelated histories"));
>
> It appears to me that "merge-tree" in this mode, with the above
> code, cannot be used as a workhorse to implement server-side
> cherry-pick (or revert), which needs to allow the user to specify an
> arbitrary "common ancestor", instead of computing on its own.
>
> To replay the change made by commit A on top of commit X (i.e.
> "cherry-pick A on X"), we have to be able to say "compute the
> three-way merge between A and X, pretending as if A^ were their
> common ancestor".  The story is the same for revert---we compute
> three-way merge between A^ and X, pretending as if A were their
> common ancestor.
>
> The above interface into this function, sadly, does not seem to
> allow such a request, unless I am missing something.
>
> And if I am correct, it is a shame---after all, the point of the
> merge-trees command is to take three trees and run a three-way
> merge, and not being able to merge three "trees" and require
> "commits" makes this mode much less useful than its potential.

Yes, you are reading right.  I think the cherry-pick/rebase
replacement actually deserves a separate command from what merges
should use; replaying a sequence of commits just has a number of UI
differences and abilities that I think pull it in a different
direction.  I didn't want to wait and submit everything all at once
(this series is long enough), and I figured that providing the in-core
equivalent to `git merge` was a simpler first step worth submitting
first before later providing the in-core equivalents to `git
rebase/git cherry-pick`.

(Also, I'm a bit wary of providing a command meant to just do a single
three-way merge with a defined merge-base, because that seems to be a
path towards returning to a scripted rebase.  Having a way to do only
a single such special merge is fine but we should avoid encouraging
people to go down that route.)
Junio C Hamano Feb. 2, 2022, 10:01 p.m. UTC | #3
Elijah Newren <newren@gmail.com> writes:

> Yes, you are reading right.  I think the cherry-pick/rebase
> replacement actually deserves a separate command from what merges
> should use; replaying a sequence of commits just has a number of UI
> differences and abilities that I think pull it in a different
> direction.

I completely disagree.  Each individual step in a sequence of
replaying commits in order (or in reverse order) should be
scriptable as a single merge-tree that takes "apply the change to go
from A^ to A on X".  Sequencing and placing UI around it is a job
for the script that drives merge-tree.
Elijah Newren Feb. 3, 2022, 12:18 a.m. UTC | #4
On Wed, Feb 2, 2022 at 2:01 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Elijah Newren <newren@gmail.com> writes:
>
> > Yes, you are reading right.  I think the cherry-pick/rebase
> > replacement actually deserves a separate command from what merges
> > should use; replaying a sequence of commits just has a number of UI
> > differences and abilities that I think pull it in a different
> > direction.
>
> I completely disagree.  Each individual step in a sequence of
> replaying commits in order (or in reverse order) should be
> scriptable as a single merge-tree that takes "apply the change to go
> from A^ to A on X".  Sequencing and placing UI around it is a job
> for the script that drives merge-tree.

Adding such an ability to merge-tree would be trivial -- it basically
involves just two things: (1) accepting one extra argument, and (2)
calling merge_incore_nonrecursive() instead of
merge_incore_recursive().

However, I think forking a subprocess for every merge of a series of
commits is a completely unreasonable overhead, so even if we provide
such an option to merge-tree, I still want a separate plumbing-ish
tool that does non-worktree/non-index replaying of commits which is
not written as a driver of merge-tree.  That other tool should just
call merge_incore_nonrecursive() directly.  And such a tool, since it
should handle an arbitrary number of commits, should certainly be able
to handle just one commit.  From that angle, it feels like adding
another mode to merge-tree would just be a partial duplication of the
other tool.

However, if the other tool doesn't obviate the need for this
additional mode (perhaps it ends up being forced to be too
porcelain-ish insteading of plumbing-ish?), or folks really just want
another merge-tree mode, I'm happy to add one along with the tool I
submit later.  Does that sound reasonable to you, or is there
something you're still objecting to that I've missed?
Johannes Altmanninger Feb. 3, 2022, 10:42 a.m. UTC | #5
On Wed, Feb 02, 2022 at 04:18:39PM -0800, Elijah Newren wrote:
> On Wed, Feb 2, 2022 at 2:01 PM Junio C Hamano <gitster@pobox.com> wrote:
> >
> > Elijah Newren <newren@gmail.com> writes:
> >
> > > Yes, you are reading right.  I think the cherry-pick/rebase
> > > replacement actually deserves a separate command from what merges
> > > should use; replaying a sequence of commits just has a number of UI
> > > differences and abilities that I think pull it in a different
> > > direction.
> >
> > I completely disagree.  Each individual step in a sequence of
> > replaying commits in order (or in reverse order) should be
> > scriptable as a single merge-tree that takes "apply the change to go
> > from A^ to A on X".  Sequencing and placing UI around it is a job
> > for the script that drives merge-tree.
> 
> Adding such an ability to merge-tree would be trivial -- it basically
> involves just two things: (1) accepting one extra argument, and (2)
> calling merge_incore_nonrecursive() instead of
> merge_incore_recursive().
> 
> However, I think forking a subprocess for every merge of a series of
> commits is a completely unreasonable overhead, so even if we provide
> such an option to merge-tree, I still want a separate plumbing-ish
> tool that does non-worktree/non-index replaying of commits which is
> not written as a driver of merge-tree.  That other tool should just
> call merge_incore_nonrecursive() directly.  And such a tool, since it
> should handle an arbitrary number of commits, should certainly be able
> to handle just one commit.  From that angle, it feels like adding
> another mode to merge-tree would just be a partial duplication of the
> other tool.

I wonder how the UI of a tool that does non-worktree/non-index cherry-picks
will look like.  I'd expect it to produce the same output as merge-tree,
except cherry-pick should probably output a commit OID, not a tree.

Maybe we want a unified command that produces commits from any sequence of
merge/cherry-pick/revert/reword steps. The obvious UI would use something
like the rebase-todo list as input.  For example:

	$ echo '
	pick commit1
	reword commit2	# edit commit message in $GIT_EDITOR
	merge commit3 -m "log message"
	' | git create-commit commit0
	<OID of final commit>

we start from commit0 and apply steps one-by-one. Obviously, one unsolved
problem is how to pass parameters like commit messages if no editor should
be invoked (my sketch uses -m).
If any of the steps fails when merging merge, then we get the tree with
conflicts

	$ echo '
	pick commit1
	pick commit2
	pick commit-that-does-not-apply
	' | git create-commit commit0
	<OID of commit after step 2>
	<OID of toplevel tree after failed merge>
	<Conflicted file info>
	<Informational messages>

Replaying a series of commits might look like this:

	$ echo 'pick commit1 ^commit0' | git create-commit new-base

I'm concluding that this is a difficult UI problem, and having a merge-tree
command that accepts a "common ancestor" parameter could make it easier
to experiment.  Of course that depends on who is experimenting.

> 
> However, if the other tool doesn't obviate the need for this
> additional mode (perhaps it ends up being forced to be too
> porcelain-ish insteading of plumbing-ish?), or folks really just want
> another merge-tree mode, I'm happy to add one along with the tool I
> submit later.  Does that sound reasonable to you, or is there
> something you're still objecting to that I've missed?
Elijah Newren Feb. 3, 2022, 4:54 p.m. UTC | #6
Hi,

On Thu, Feb 3, 2022 at 2:42 AM Johannes Altmanninger <aclopte@gmail.com> wrote:
>
> On Wed, Feb 02, 2022 at 04:18:39PM -0800, Elijah Newren wrote:
> > On Wed, Feb 2, 2022 at 2:01 PM Junio C Hamano <gitster@pobox.com> wrote:
> > >
> > > Elijah Newren <newren@gmail.com> writes:
> > >
> > > > Yes, you are reading right.  I think the cherry-pick/rebase
> > > > replacement actually deserves a separate command from what merges
> > > > should use; replaying a sequence of commits just has a number of UI
> > > > differences and abilities that I think pull it in a different
> > > > direction.
> > >
> > > I completely disagree.  Each individual step in a sequence of
> > > replaying commits in order (or in reverse order) should be
> > > scriptable as a single merge-tree that takes "apply the change to go
> > > from A^ to A on X".  Sequencing and placing UI around it is a job
> > > for the script that drives merge-tree.
> >
> > Adding such an ability to merge-tree would be trivial -- it basically
> > involves just two things: (1) accepting one extra argument, and (2)
> > calling merge_incore_nonrecursive() instead of
> > merge_incore_recursive().
> >
> > However, I think forking a subprocess for every merge of a series of
> > commits is a completely unreasonable overhead, so even if we provide
> > such an option to merge-tree, I still want a separate plumbing-ish
> > tool that does non-worktree/non-index replaying of commits which is
> > not written as a driver of merge-tree.  That other tool should just
> > call merge_incore_nonrecursive() directly.  And such a tool, since it
> > should handle an arbitrary number of commits, should certainly be able
> > to handle just one commit.  From that angle, it feels like adding
> > another mode to merge-tree would just be a partial duplication of the
> > other tool.
>
> I wonder how the UI of a tool that does non-worktree/non-index cherry-picks
> will look like.  I'd expect it to produce the same output as merge-tree,
> except cherry-pick should probably output a commit OID, not a tree.
>
> Maybe we want a unified command that produces commits from any sequence of
> merge/cherry-pick/revert/reword steps. The obvious UI would use something
> like the rebase-todo list as input.  For example:
>
>         $ echo '
>         pick commit1
>         reword commit2  # edit commit message in $GIT_EDITOR
>         merge commit3 -m "log message"
>         ' | git create-commit commit0
>         <OID of final commit>
>
> we start from commit0 and apply steps one-by-one. Obviously, one unsolved
> problem is how to pass parameters like commit messages if no editor should
> be invoked (my sketch uses -m).
> If any of the steps fails when merging merge, then we get the tree with
> conflicts
>
>         $ echo '
>         pick commit1
>         pick commit2
>         pick commit-that-does-not-apply
>         ' | git create-commit commit0
>         <OID of commit after step 2>
>         <OID of toplevel tree after failed merge>
>         <Conflicted file info>
>         <Informational messages>
>
> Replaying a series of commits might look like this:
>
>         $ echo 'pick commit1 ^commit0' | git create-commit new-base
>
> I'm concluding that this is a difficult UI problem

I agree.  I've got a lot of thoughts on it, and some work in progress
towards it (https://github.com/newren/git/tree/replay -- _very_ hacky,
not even close to alpha quality, lots of fixup commits, todo comments,
random brain dump files added to the tree, based on a previous round
of this patch series, not updated for weeks, etc., etc.)

> and having a merge-tree
> command that accepts a "common ancestor" parameter could make it easier
> to experiment.  Of course that depends on who is experimenting.

I think that would result in experiments and eventually full-blown
scripts designed around forking subprocesses for every merge, and
pushes us back into the world of having a scripted-rebase again.  Yes,
I know people can transliterate shell back to C; it seems to always be
done as a half-way measure with the forking just being done from C or
have other UI-warts guided by the shell design.  In fact, *that* was
the primary reason for me not providing a merge-tree option based on
merge_incore_nonrecursive(), despite how trivial it'd be to provide
it.  If someone wanted a merge_incore_nonrecursive() mode for
merge-tree for reasons other than attempting to build a
rebase/cherry-pick replacement based on it, then I'd be much happier
to provide it.

If someone wants to experiment with what a plumbing-ish
rebase/cherry-pick would look like, the _right_ way to do it would be
making using of merge_incore_nonrecursive() directly.  If they want
example code, I already provided some a year and a half ago and got it
merged into git.git in the form of t/helper/test-fast-rebase.c.  My
"replay" branch is based on that code, but (a) moves it from t/helper
to a real builtin, (b) removes the hardcoded very strict input, (c)
removes the line of code doing the index & working tree updates, and
(d) modifies the output to be a more plumbing-ish style.

We'll certainly have discussions on what that should look like.  But a
plumbing-ish replacement for merge was much simpler, and made sense to
do first.  I would prefer to concentrate on getting that hammered down
first.  Then I'll start discussions on a plumbing-ish
rebase/cherry-pick.  And if that doesn't fulfill all the needs that
folks think they want out of merge-tree, then we can add a
merge_incore_nonrecursive()-based mode to merge-tree.  It's all
coming, but having fought transliterations-of-scripts in
merge-recursive.c, sequencer.c, stash.c, rebase.c, etc. for years I
really, really don't want any more of that.  Let's end that insanity.
Junio C Hamano Feb. 3, 2022, 8:05 p.m. UTC | #7
Johannes Altmanninger <aclopte@gmail.com> writes:

> I wonder how the UI of a tool that does non-worktree/non-index cherry-picks
> will look like.  I'd expect it to produce the same output as merge-tree,
> except cherry-pick should probably output a commit OID, not a tree.

A tool that creates a "merge", "cherry-pick" and "revert" should all
output a resulting commit object name, not a tree object.

And "merge-tree" that creates a three-way merge to apply a change to
go from tree A to tree B on top of tree X should take three tree-ish
object names and produce a resulting one tree object name.

Using the "merge-tree" that merges two trees using the third tree as
a common ancestor, these three higher-level tools can perform
"merge", "cherry-pick", and "revert".  They would all take two
commit objects and produce one commit object.
Josh Steadmon Feb. 4, 2022, 4:48 a.m. UTC | #8
On 2022.02.02 07:34, Elijah Newren via GitGitGadget wrote:
> diff --git a/Documentation/git-merge-tree.txt b/Documentation/git-merge-tree.txt
> index 58731c19422..569485815a0 100644
> --- a/Documentation/git-merge-tree.txt
> +++ b/Documentation/git-merge-tree.txt
> @@ -3,26 +3,73 @@ git-merge-tree(1)
>  
>  NAME
>  ----
> -git-merge-tree - Show three-way merge without touching index
> +git-merge-tree - Perform merge without touching index or working tree
>  
>  
>  SYNOPSIS
>  --------
>  [verse]
> -'git merge-tree' <base-tree> <branch1> <branch2>
> +'git merge-tree' [--write-tree] <branch1> <branch2>
> +'git merge-tree' [--trivial-merge] <base-tree> <branch1> <branch2> (deprecated)
>  
>  DESCRIPTION
>  -----------
> -Reads three tree-ish, and output trivial merge results and
> -conflicting stages to the standard output.  This is similar to
> -what three-way 'git read-tree -m' does, but instead of storing the
> -results in the index, the command outputs the entries to the
> -standard output.
> -
> -This is meant to be used by higher level scripts to compute
> -merge results outside of the index, and stuff the results back into the
> -index.  For this reason, the output from the command omits
> -entries that match the <branch1> tree.
> +
> +Performs a merge, but does not make any new commits and does not read
> +from or write to either the working tree or index.
> +
> +The second form is deprecated and supported only for backward
> +compatibility.  It will likely be removed in the future, and will not
> +be discussed further in this manual.
> +
> +The first form will merge the two branches, doing a real merge.  A real
> +merge is distinguished from a trivial merge in that it includes:
> +
> +  * three way content merges of individual files
> +  * rename detection
> +  * proper directory/file conflict handling
> +  * recursive ancestor consolidation (i.e. when there is more than one
> +    merge base, creating a virtual merge base by merging the merge bases)
> +  * etc.
> +
> +After the merge completes, it will create a new toplevel tree object.
> +See `OUTPUT` below for details.
> +
> +OUTPUT
> +------
> +
> +For either a successful or conflicted merge, the output from
> +git-merge-tree is simply one line:
> +
> +	<OID of toplevel tree>
> +
> +The printed tree object corresponds to what would be checked out in
> +the working tree at the end of `git merge`, and thus may have files
> +with conflict markers in them.
> +
> +EXIT STATUS
> +-----------
> +
> +For a successful, non-conflicted merge, the exit status is 0.  When the
> +merge has conflicts, the exit status is 1.  If the merge is not able to
> +complete (or start) due to some kind of error, the exit status is
> +something other than 0 or 1.
> +
> +USAGE NOTES
> +-----------
> +
> +git-merge-tree was written to be low-level plumbing, similar to
> +hash-object, mktree, commit-tree, update-ref, and mktag.  Thus, it could
> +be used as a part of a series of steps such as
> +
> +       NEWTREE=$(git merge-tree --write-tree $BRANCH1 $BRANCH2)
> +       test $? -eq 0 || die "There were conflicts..."
> +       NEWCOMMIT=$(git commit-tree $NEWTREE -p $BRANCH1 -p $BRANCH2)
> +       git update-ref $BRANCH1 $NEWCOMMIT
> +
> +However, it does not quite fit into the same category of low-level
> +plumbing commands since the possibility of merge conflicts give it a
> +much higher chance of the command not succeeding.

I found this final paragraph confusing. It seems to be hinting at some
conclusion it expects readers to make, but I haven't been able to figure
out what. Could this be made more explicit, or perhaps dropped
altogether?
Elijah Newren Feb. 4, 2022, 6:08 a.m. UTC | #9
On Thu, Feb 3, 2022 at 8:48 PM Josh Steadmon <steadmon@google.com> wrote:
>
> On 2022.02.02 07:34, Elijah Newren via GitGitGadget wrote:
[...]
> > +USAGE NOTES
> > +-----------
> > +
> > +git-merge-tree was written to be low-level plumbing, similar to
> > +hash-object, mktree, commit-tree, update-ref, and mktag.  Thus, it could
> > +be used as a part of a series of steps such as
> > +
> > +       NEWTREE=$(git merge-tree --write-tree $BRANCH1 $BRANCH2)
> > +       test $? -eq 0 || die "There were conflicts..."
> > +       NEWCOMMIT=$(git commit-tree $NEWTREE -p $BRANCH1 -p $BRANCH2)
> > +       git update-ref $BRANCH1 $NEWCOMMIT
> > +
> > +However, it does not quite fit into the same category of low-level
> > +plumbing commands since the possibility of merge conflicts give it a
> > +much higher chance of the command not succeeding.
>
> I found this final paragraph confusing. It seems to be hinting at some
> conclusion it expects readers to make, but I haven't been able to figure
> out what. Could this be made more explicit, or perhaps dropped
> altogether?

Yep, I'll drop it.
Johannes Schindelin Feb. 21, 2022, 9:06 a.m. UTC | #10
Hi,

On Thu, 3 Feb 2022, Elijah Newren wrote:

> On Thu, Feb 3, 2022 at 2:42 AM Johannes Altmanninger <aclopte@gmail.com> wrote:
> >
> > On Wed, Feb 02, 2022 at 04:18:39PM -0800, Elijah Newren wrote:
> > > On Wed, Feb 2, 2022 at 2:01 PM Junio C Hamano <gitster@pobox.com> wrote:
> > > >
> > > > Elijah Newren <newren@gmail.com> writes:
> > > >
> > > > > Yes, you are reading right.  I think the cherry-pick/rebase
> > > > > replacement actually deserves a separate command from what merges
> > > > > should use; replaying a sequence of commits just has a number of UI
> > > > > differences and abilities that I think pull it in a different
> > > > > direction.
> > > >
> > > > I completely disagree.  Each individual step in a sequence of
> > > > replaying commits in order (or in reverse order) should be
> > > > scriptable as a single merge-tree that takes "apply the change to go
> > > > from A^ to A on X".  Sequencing and placing UI around it is a job
> > > > for the script that drives merge-tree.
> > >
> > > Adding such an ability to merge-tree would be trivial -- it basically
> > > involves just two things: (1) accepting one extra argument, and (2)
> > > calling merge_incore_nonrecursive() instead of
> > > merge_incore_recursive().
> > >
> > > However, I think forking a subprocess for every merge of a series of
> > > commits is a completely unreasonable overhead, so even if we provide
> > > such an option to merge-tree, I still want a separate plumbing-ish
> > > tool that does non-worktree/non-index replaying of commits which is
> > > not written as a driver of merge-tree.  That other tool should just
> > > call merge_incore_nonrecursive() directly.  And such a tool, since it
> > > should handle an arbitrary number of commits, should certainly be able
> > > to handle just one commit.  From that angle, it feels like adding
> > > another mode to merge-tree would just be a partial duplication of the
> > > other tool.
> >
> > I wonder how the UI of a tool that does non-worktree/non-index cherry-picks
> > will look like.  I'd expect it to produce the same output as merge-tree,
> > except cherry-pick should probably output a commit OID, not a tree.
> >
> > Maybe we want a unified command that produces commits from any sequence of
> > merge/cherry-pick/revert/reword steps. The obvious UI would use something
> > like the rebase-todo list as input.  For example:
> >
> >         $ echo '
> >         pick commit1
> >         reword commit2  # edit commit message in $GIT_EDITOR
> >         merge commit3 -m "log message"
> >         ' | git create-commit commit0
> >         <OID of final commit>
> >
> > we start from commit0 and apply steps one-by-one. Obviously, one unsolved
> > problem is how to pass parameters like commit messages if no editor should
> > be invoked (my sketch uses -m).
> > If any of the steps fails when merging merge, then we get the tree with
> > conflicts
> >
> >         $ echo '
> >         pick commit1
> >         pick commit2
> >         pick commit-that-does-not-apply
> >         ' | git create-commit commit0
> >         <OID of commit after step 2>
> >         <OID of toplevel tree after failed merge>
> >         <Conflicted file info>
> >         <Informational messages>
> >
> > Replaying a series of commits might look like this:
> >
> >         $ echo 'pick commit1 ^commit0' | git create-commit new-base
> >
> > I'm concluding that this is a difficult UI problem
>
> I agree.  I've got a lot of thoughts on it, and some work in progress
> towards it (https://github.com/newren/git/tree/replay -- _very_ hacky,
> not even close to alpha quality, lots of fixup commits, todo comments,
> random brain dump files added to the tree, based on a previous round
> of this patch series, not updated for weeks, etc., etc.)

Just chiming in that I find that very exciting. But it's a tangent, and
slightly distracting from the topic at hand, so I would like to ask to
focus back on server-side merges.

> > and having a merge-tree command that accepts a "common ancestor"
> > parameter could make it easier to experiment.  Of course that depends
> > on who is experimenting.
>
> I think that would result in experiments and eventually full-blown
> scripts designed around forking subprocesses for every merge, and
> pushes us back into the world of having a scripted-rebase again.  Yes,
> I know people can transliterate shell back to C; it seems to always be
> done as a half-way measure with the forking just being done from C or
> have other UI-warts guided by the shell design.  In fact, *that* was
> the primary reason for me not providing a merge-tree option based on
> merge_incore_nonrecursive(), despite how trivial it'd be to provide
> it.  If someone wanted a merge_incore_nonrecursive() mode for
> merge-tree for reasons other than attempting to build a
> rebase/cherry-pick replacement based on it, then I'd be much happier
> to provide it.
>
> If someone wants to experiment with what a plumbing-ish
> rebase/cherry-pick would look like, the _right_ way to do it would be
> making using of merge_incore_nonrecursive() directly.  If they want
> example code, I already provided some a year and a half ago and got it
> merged into git.git in the form of t/helper/test-fast-rebase.c.  My
> "replay" branch is based on that code, but (a) moves it from t/helper
> to a real builtin, (b) removes the hardcoded very strict input, (c)
> removes the line of code doing the index & working tree updates, and
> (d) modifies the output to be a more plumbing-ish style.

I actually implemented that so I could provide apples-to-apples
speed comparisons between libgit2 and merge-ort:

-- snip --
From 6a865c691810b67dc15ddb57ad110bd6fdfc2f12 Mon Sep 17 00:00:00 2001
From: Johannes Schindelin <johannes.schindelin@gmx.de>
Date: Fri, 28 Jan 2022 23:28:20 +0100
Subject: [PATCH] merge-tree: optionally force a simple 3-way merge

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 builtin/merge-tree.c | 72 ++++++++++++++++++++++++++++++--------------
 1 file changed, 50 insertions(+), 22 deletions(-)

diff --git a/builtin/merge-tree.c b/builtin/merge-tree.c
index 58c0ddc5a3..1007aaaede 100644
--- a/builtin/merge-tree.c
+++ b/builtin/merge-tree.c
@@ -396,6 +396,7 @@ struct merge_tree_options {
 	int allow_unrelated_histories;
 	int show_messages;
 	int exclude_modes_oids_stages;
+	const char *nonrecursive_base;
 };

 static int real_merge(struct merge_tree_options *o,
@@ -409,34 +410,58 @@ static int real_merge(struct merge_tree_options *o,
 	struct merge_options opt;
 	struct merge_result result = { 0 };

-	parent1 = get_merge_parent(branch1);
-	if (!parent1)
-		help_unknown_ref(branch1, "merge-tree",
-				 _("not something we can merge"));
-
-	parent2 = get_merge_parent(branch2);
-	if (!parent2)
-		help_unknown_ref(branch2, "merge-tree",
-				 _("not something we can merge"));
-
 	init_merge_options(&opt, the_repository);

 	opt.show_rename_progress = 0;

-	opt.branch1 = branch1;
-	opt.branch2 = branch2;
+	if (o->nonrecursive_base) {
+		struct object_id base_oid, head_oid, merge_oid;
+		struct tree *base_tree, *head_tree, *merge_tree;
+
+		opt.ancestor = "(base)";
+		opt.branch1 = "(branch1)";
+		opt.branch2 = "(branch2)";
+
+		if (get_oid_treeish(o->nonrecursive_base, &base_oid))
+			die("could not parse base '%s'", o->nonrecursive_base);
+		base_tree = parse_tree_indirect(&base_oid);
+		if (get_oid_treeish(branch1, &head_oid))
+			die("could not parse head '%s'", branch1);
+		head_tree = parse_tree_indirect(&head_oid);
+		if (get_oid_treeish(branch2, &merge_oid))
+			die("could not parse merge '%s'", branch2);
+		merge_tree = parse_tree_indirect(&merge_oid);
+
+		merge_incore_nonrecursive(&opt,
+					  base_tree, head_tree, merge_tree,
+					  &result);
+	} else {
+		parent1 = get_merge_parent(branch1);
+		if (!parent1)
+			help_unknown_ref(branch1, "merge-tree",
+					 _("not something we can merge"));
+
+		parent2 = get_merge_parent(branch2);
+		if (!parent2)
+			help_unknown_ref(branch2, "merge-tree",
+					 _("not something we can merge"));
+
+		opt.branch1 = branch1;
+		opt.branch2 = branch2;

-	/*
-	 * Get the merge bases, in reverse order; see comment above
-	 * merge_incore_recursive in merge-ort.h
-	 */
-	common = get_merge_bases(parent1, parent2);
-	if (!common && !o->allow_unrelated_histories)
-		die(_("refusing to merge unrelated histories"));
-	for (j = common; j; j = j->next)
-		commit_list_insert(j->item, &merge_bases);
+		/*
+		 * Get the merge bases, in reverse order; see comment above
+		 * merge_incore_recursive in merge-ort.h
+		 */
+		common = get_merge_bases(parent1, parent2);
+		if (!common && !o->allow_unrelated_histories)
+			die(_("refusing to merge unrelated histories"));
+		for (j = common; j; j = j->next)
+			commit_list_insert(j->item, &merge_bases);
+
+		merge_incore_recursive(&opt, merge_bases, parent1, parent2, &result);
+	}

-	merge_incore_recursive(&opt, merge_bases, parent1, parent2, &result);
 	if (result.clean < 0)
 		die(_("failure to merge"));

@@ -501,6 +526,9 @@ int cmd_merge_tree(int argc, const char **argv, const char *prefix)
 			   &o.allow_unrelated_histories,
 			   N_("allow merging unrelated histories"),
 			   PARSE_OPT_NONEG),
+		OPT_STRING(0, "force-non-recursive-base", &o.nonrecursive_base,
+			   N_("base-tree"),
+			   N_("force a simple three-way merge")),
 		OPT_END()
 	};

-- snap --

I do strongly agree that this should _not_ enter core Git's code, I just
provide this in case someone else wants to play with merge-ort on the
server side in an existing code base.

> We'll certainly have discussions on what that should look like.  But a
> plumbing-ish replacement for merge was much simpler, and made sense to
> do first.  I would prefer to concentrate on getting that hammered down
> first.  Then I'll start discussions on a plumbing-ish
> rebase/cherry-pick.  And if that doesn't fulfill all the needs that
> folks think they want out of merge-tree, then we can add a
> merge_incore_nonrecursive()-based mode to merge-tree.  It's all
> coming, but having fought transliterations-of-scripts in
> merge-recursive.c, sequencer.c, stash.c, rebase.c, etc. for years I
> really, really don't want any more of that.  Let's end that insanity.

Being the driving force behind many a "built-in-ification" of scripted
commands, I wholeheartedly agree. You can still see the fall-out of
designing commands in a scripted fashion, without any way to represent
data structures other than strings. I wish we had come up with a better
design to prototype commands than to write shell scripts. But I have to
admit that even I do not have any better idea than to work on a proper API
for libgit.a (which has historically invariably seen push-back from
Junio).

While I agree that this discussion is a valuable one, right now I would
like to focus on getting the server-side merges done, and once that has
happened, move on to the replay/sequencer/API discussion (which will
probably be a big one, not so much for technical reasons but more for all
too human ones).

Ciao,
Dscho
Junio C Hamano Feb. 21, 2022, 6:55 p.m. UTC | #11
Elijah Newren <newren@gmail.com> writes:

> Adding such an ability to merge-tree would be trivial -- it basically
> involves just two things: (1) accepting one extra argument, and (2)
> calling merge_incore_nonrecursive() instead of
> merge_incore_recursive().
>
> However, I think forking a subprocess for every merge of a series of
> commits is a completely unreasonable overhead, so even if we provide
> such an option to merge-tree, I still want a separate plumbing-ish
> tool that does non-worktree/non-index replaying of commits which is
> not written as a driver of merge-tree.  That other tool should just
> call merge_incore_nonrecursive() directly.  And such a tool, since it
> should handle an arbitrary number of commits, should certainly be able
> to handle just one commit.  From that angle, it feels like adding
> another mode to merge-tree would just be a partial duplication of the
> other tool.

The above does not make much sense to me.

I am hearing that "multi-step cherry-picks and reverts need to be
fast and we need something like sequencer that is all written in C,
and single-step cherry-pick is merely a special case that does not
deserve a plumbing".

But that argument leads to "and the same something-like-sequencer
that is all written in C would need '--rebase-merges' that can pick
multi-step merge sequences, and single-step merge does not deserve a
plumbing", which is an argument against this topic that is utterly
absurd.

So why isn't your objection not equally absurd against having a
single step cherry-pick or revert primitive as a plumbing?
Elijah Newren Feb. 22, 2022, 2:37 a.m. UTC | #12
On Mon, Feb 21, 2022 at 1:06 AM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>
> Hi,
>
> On Thu, 3 Feb 2022, Elijah Newren wrote:

[...]
> > I agree.  I've got a lot of thoughts on it, and some work in progress
> > towards it (https://github.com/newren/git/tree/replay -- _very_ hacky,
> > not even close to alpha quality, lots of fixup commits, todo comments,
> > random brain dump files added to the tree, based on a previous round
> > of this patch series, not updated for weeks, etc., etc.)
>
> Just chiming in that I find that very exciting. But it's a tangent, and
> slightly distracting from the topic at hand, so I would like to ask to
> focus back on server-side merges.

+1 for refocusing.

> > We'll certainly have discussions on what that should look like.  But a
> > plumbing-ish replacement for merge was much simpler, and made sense to
> > do first.  I would prefer to concentrate on getting that hammered down
> > first.  Then I'll start discussions on a plumbing-ish
> > rebase/cherry-pick.  And if that doesn't fulfill all the needs that
> > folks think they want out of merge-tree, then we can add a
> > merge_incore_nonrecursive()-based mode to merge-tree.  It's all
> > coming, but having fought transliterations-of-scripts in
> > merge-recursive.c, sequencer.c, stash.c, rebase.c, etc. for years I
> > really, really don't want any more of that.  Let's end that insanity.
>
> Being the driving force behind many a "built-in-ification" of scripted
> commands, I wholeheartedly agree. You can still see the fall-out of
> designing commands in a scripted fashion, without any way to represent
> data structures other than strings. I wish we had come up with a better
> design to prototype commands than to write shell scripts. But I have to
> admit that even I do not have any better idea than to work on a proper API
> for libgit.a (which has historically invariably seen push-back from
> Junio).
>
> While I agree that this discussion is a valuable one, right now I would
> like to focus on getting the server-side merges done, and once that has
> happened, move on to the replay/sequencer/API discussion (which will
> probably be a big one, not so much for technical reasons but more for all
> too human ones).

I'm just guessing, but I suspect your prediction for the future
replay/sequencer/rebase/API discussion will be spot on.  :-)
Elijah Newren Feb. 22, 2022, 4:26 p.m. UTC | #13
On Mon, Feb 21, 2022 at 10:55 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Elijah Newren <newren@gmail.com> writes:
>
> > Adding such an ability to merge-tree would be trivial -- it basically
> > involves just two things: (1) accepting one extra argument, and (2)
> > calling merge_incore_nonrecursive() instead of
> > merge_incore_recursive().
> >
> > However, I think forking a subprocess for every merge of a series of
> > commits is a completely unreasonable overhead, so even if we provide
> > such an option to merge-tree, I still want a separate plumbing-ish
> > tool that does non-worktree/non-index replaying of commits which is
> > not written as a driver of merge-tree.  That other tool should just
> > call merge_incore_nonrecursive() directly.  And such a tool, since it
> > should handle an arbitrary number of commits, should certainly be able
> > to handle just one commit.  From that angle, it feels like adding
> > another mode to merge-tree would just be a partial duplication of the
> > other tool.
>
> The above does not make much sense to me.
>
> I am hearing that "multi-step cherry-picks and reverts need to be
> fast and we need something like sequencer that is all written in C,

Yes, I agree with that part so far.  jj is kicking our butt on rebase
speed; I'm not sure if we can catch it, but it'd be nice to see us not
be more than a hundred times slower.

> and single-step cherry-pick is merely a special case that does not
> deserve a plumbing".

Well, apparently I failed at communication if that's what you heard.
Perhaps I can step back and provide my high-level goals, and then
mention how this series fits in.  My high-level goals:

  * new sequencer-like replay tool, including multiple abilities
today's rebase/cherry-pick tools don't have
  * enable folks to use merging machinery for server side operations
(merge, rebase, cherry-pick, revert)
  * do not repeat or encourage the rebase-as-shell-script mistakes of yesteryear
  * somehow split this up into reviewable chunks

Now, in particular, the "merge divergent branches" piece seemed like a
really simple portion of the problem space for which I could get some
early feedback without having to address the whole problem space all
at once, and which doesn't seem to have any downside risk.

And even with my attempt to narrow it in scope, and even despite lots
of early feedback from the Git Virtual Contributor Summit six months
ago, it's been nearly two months of active discussions including all
kinds of intrinsic and tangential points about the UI and design.  Why
try to prematurely widen the scope?  Can we just focus on merging
divergent branches for now, and cover the rest later?

> But that argument leads to "and the same something-like-sequencer
> that is all written in C would need '--rebase-merges' that can pick
> multi-step merge sequences, and single-step merge does not deserve a
> plumbing", which is an argument against this topic that is utterly
> absurd.
>
> So why isn't your objection not equally absurd against having a
> single step cherry-pick or revert primitive as a plumbing?

The objection you are arguing against is not my position.  In fact,
I'm not even objecting to having a single-step cherry-pick, I'm
objecting to providing it _now_, which I thought would have been clear
from the portion of my email you snipped ("...I'm happy to add [a
single step cherry-pick primitive] along with the tool I submit
later...").  Since that wasn't clear, and since that wasn't my only
communication failure here, let me attempt to be clearer about my
objection(s):

1. I'm really trying to pick off a small piece of the problem space
and get feedback on it without unnecessarily complicating things with
unrelated issues.  Thus, this series is _only_ about merging branches
that have diverged, and leaves commit replaying for later.

2. Two folks have chimed in about the single step cherry-pick, and the
ONLY reason given for wanting such a thing was to create a
rebasing/cherry-picking script which was driven by repeatedly invoking
this low-level primitive command.  That's also the only usecase I can
currently think of for such a primitive.  To me, that means providing
such a low-level command now would be likely to result in the
rebase-as-a-script mistake of yesteryear.  I think we can avoid that
pitfall by first providing a tool that avoids the
repeatedly-fork-git-subprocesses model.  (Also, providing a low-level
single-step cherry-pick command also has the added negative of further
distracting from the focus on merging divergent branches.)

3. The merge primitive in this series is useful completely independent
of any rebasing script (it would not be used solely for rebasing
merges, if it's used for that purpose at all, as evidenced by the fact
that dscho is already trying to use it for doing new real merges).

4. Once we have a git-replay tool that can replay a sequence of
commits, there _might_ not be a need for a single commit replaying
primitive.  If we provided one as you and Johannes Altimanninger were
asking for, and it turned out to be deemed useless because the later
tool I provide can do everything it can and more, haven't we just
wasted time in providing it?  And perhaps also wasted future time as
we then have work to do to deprecate and remove the new command or
mode? (NOTE: I did *not* say there was "no need" for a single-commit
replaying primitive -- I said there "might not" be a need.)

Also, since you bring up --rebase-merges, there's an additional point
about it that might be relevant:

5. While you could implement a naive --rebase-merges in terms of a
primitive for merging divergent branches (or vice-versa, i.e.
implement merging divergent branches from a naive --rebase-merges
implementation), I think replaying merges more intelligently[*] is
actually a distinct operation from doing a new merge of divergent
branches and that you probably can't implement one in terms of the
other.  (I'm not certain on this, and definitely don't want to argue
the finer points on it while my implementation is still half-baked,
but I really do think they are different things right now.)

[*] https://lore.kernel.org/git/CABPp-BHp+d62dCyAaJfh1cZ8xVpGyb97mZryd02aCOX=Qn=Ltw@mail.gmail.com/
Johannes Schindelin Feb. 22, 2022, 4:45 p.m. UTC | #14
Hi Junio,

On Mon, 21 Feb 2022, Junio C Hamano wrote:

> Elijah Newren <newren@gmail.com> writes:
>
> > Adding such an ability to merge-tree would be trivial -- it basically
> > involves just two things: (1) accepting one extra argument, and (2)
> > calling merge_incore_nonrecursive() instead of
> > merge_incore_recursive().
> >
> > However, I think forking a subprocess for every merge of a series of
> > commits is a completely unreasonable overhead, so even if we provide
> > such an option to merge-tree, I still want a separate plumbing-ish
> > tool that does non-worktree/non-index replaying of commits which is
> > not written as a driver of merge-tree.  That other tool should just
> > call merge_incore_nonrecursive() directly.  And such a tool, since it
> > should handle an arbitrary number of commits, should certainly be able
> > to handle just one commit.  From that angle, it feels like adding
> > another mode to merge-tree would just be a partial duplication of the
> > other tool.
>
> The above does not make much sense to me.
>
> I am hearing that "multi-step cherry-picks and reverts need to be
> fast and we need something like sequencer that is all written in C,
> and single-step cherry-pick is merely a special case that does not
> deserve a plumbing".

Correct. The single cherry-pick will be a trivial fall-out of such a
sequencer, and the opposite is not true: if we taught `merge-tree` the
options `--cherry-pick` or `--revert`, the result would be a dead end
because it does not make sense to extend `merge-tree` to do what the
`sequencer` would do.

> But that argument leads to "and the same something-like-sequencer
> that is all written in C would need '--rebase-merges' that can pick
> multi-step merge sequences, and single-step merge does not deserve a
> plumbing", which is an argument against this topic that is utterly
> absurd.

But that `--rebase-merges`-like behavior is far off in the future, and the
sequencer is not.

If you step back for a moment and think about the existing use cases where
we want to use `merge-tree` on the server side, there are GitHub's Pull
Requests (and I suspect that all other Git hosting services followed
suite), where you have three options:

- merge
- squash
- rebase

The first two options are actually pretty much done, as we already have
a way with the current iteration of `merge-tree` to get the tree, and
that's all we need from `merge-tree`, the rest can be done by calling
`commit-tree` with the appropriate parent(s) and commit messages.

In contrast, `rebase` will require not only `tree` objects to be
generated, but much more. It is a fundamentally more complex operation
because of that.

Now, if there were already server-side user interfaces to cherry-pick, I
could potentially see that the next logical step would be to support
something like the `--force-non-recursive-base=<tree-ish>` option I
have already implemented over here (for separate reasons).

But I am unaware of such a user interface. I _am_ aware of the `rebase`
option to apply Pull Requests. So I think that's the logical direction
we're going from here.

> So why isn't your objection not equally absurd against having a
> single step cherry-pick or revert primitive as a plumbing?

Well, if you care so deeply about it, I will offer up that patch to
support `--force-non-recursive-base=<tree-ish>` (where the `cherry-pick`
and `revert` primitive would not need to exist but be the special case of
passing `CHERRY_PICK_HEAD^` as the appropriate argument).

What gets me excited much more, though, is the `rebase` operation. And
therefore I would like to spend more focus on that, and focus is a limited
resource (especially here on the Git mailing list :-)).

Ciao,
Dscho
Junio C Hamano Feb. 23, 2022, 8:07 p.m. UTC | #15
Elijah Newren <newren@gmail.com> writes:

> The objection you are arguing against is not my position.  In fact,
> I'm not even objecting to having a single-step cherry-pick, I'm
> objecting to providing it _now_, which I thought would have been clear
> from the portion of my email you snipped ("...I'm happy to add [a
> single step cherry-pick primitive] along with the tool I submit
> later...").

The entry point into the in-core merge machinery of ort already
knows how to accept externally defined merge-base(s) to bypass the
"caller didn't give us the merge base, so let's figure them out by
using the two heads being merged" logic, so it just felt backwards
*not* to have a vanilla three-way merge that can take three trees
and be used for merge, cherry-pick or revert as the single primitive
in the very beginning before we talk about multi-step operations.

So, I guess I am still not getting where the "I'm happy to _add_"
part comes from.  If we start with a primitive (internally callable
from C) "here are three trees O A B, do the three-way merge", then
there is nothing to "add" at all to expose a single-step
cherry-pick.  In fact, to the users of merge-tree, the result does
not have to have any fixed meaning.  If they pass common ancestors
as the merge bases as Os and the current HEAD and the other branch
as A and B, they get a merge.  If they pass the commit to be picked
as B, current HEAD as A and B's parent as O, they get a cherry-pick.

Perhaps starting from "You are allowed to give me two commits A B,
and I do not let you specify the commit O to use as a common
'ancestor'" is the root cause of making this thing feel backwards.
I agree with the goal of having an all-incore machinery that can do
a merge.  I just do not see the reason why you have to build it in a
way that cannot be reused for other two directions of merge-y
operations.
Elijah Newren Feb. 24, 2022, 2:22 a.m. UTC | #16
On Wed, Feb 23, 2022 at 12:07 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Elijah Newren <newren@gmail.com> writes:
>
> > The objection you are arguing against is not my position.  In fact,
> > I'm not even objecting to having a single-step cherry-pick, I'm
> > objecting to providing it _now_, which I thought would have been clear
> > from the portion of my email you snipped ("...I'm happy to add [a
> > single step cherry-pick primitive] along with the tool I submit
> > later...").
>
> The entry point into the in-core merge machinery of ort already
> knows how to accept externally defined merge-base(s) to bypass the
> "caller didn't give us the merge base, so let's figure them out by
> using the two heads being merged" logic, so it just felt backwards
> *not* to have a vanilla three-way merge that can take three trees
> and be used for merge, cherry-pick or revert as the single primitive
> in the very beginning before we talk about multi-step operations.

"The entry point"?

There are two entry points: merge_incore_recursive(), and
merge_incore_nonrecursive().  The former is analogous to
merge-recursive's merge_recursive(), and the latter is analogous to
merge_trees().

"git merge" always uses merge_incore_recursive()/merge_recursive().
The other big merge-y operations (cherry-pick, rebase, revert), always
use merge_incore_nonrecursive()/merge_trees().

(Technically, merge-recursive has a third entry point used by am &
stash, but let's ignore it for a moment.)

> So, I guess I am still not getting where the "I'm happy to _add_"
> part comes from.  If we start with a primitive (internally callable
> from C) "here are three trees O A B, do the three-way merge", then
> there is nothing to "add" at all to expose a single-step
> cherry-pick.  In fact, to the users of merge-tree, the result does
> not have to have any fixed meaning.  If they pass common ancestors
> as the merge bases as Os and the current HEAD and the other branch
> as A and B, they get a merge.  If they pass the commit to be picked
> as B, current HEAD as A and B's parent as O, they get a cherry-pick.
>
> Perhaps starting from "You are allowed to give me two commits A B,
> and I do not let you specify the commit O to use as a common
> 'ancestor'" is the root cause of making this thing feel backwards.
> I agree with the goal of having an all-incore machinery that can do
> a merge.  I just do not see the reason why you have to build it in a
> way that cannot be reused for other two directions of merge-y
> operations.

So, am I correct to understand that what bugs you is actually
merge-recursive's and merge-ort's API?  That you don't want these two
types of merges to have different entry points, and that there should
in fact only be one?

That might be an interesting line of investigation to try to modify
the API to achieve that, but that feels like a bigger task that is
somewhat tangential to this series.

Without such a thing, handling both merging-of-divergent-branches and
three-way-merge-with-specified-merge-base would be separate codepaths
that call different functions.  I implemented one of those two
codepaths, and if we want the other then it needs to be added.  That's
where the "add" part comes from, in my view.


Does that help, or am I still missing what you're saying?
Junio C Hamano Feb. 24, 2022, 8:04 p.m. UTC | #17
Elijah Newren <newren@gmail.com> writes:

> So, am I correct to understand that what bugs you is actually
> merge-recursive's and merge-ort's API?  That you don't want these two
> types of merges to have different entry points, and that there should
> in fact only be one?

It is more like

    It is more than OK that there are two, but the basic primitive
    is the "we have this and that tree objects to merge, and use
    this tree object as the ancestor" non-recursive thing, with the
    recursive one being just a thin wrapper around it to compute
    common ancestors, using the three-way primitive to reduce them
    into a single virtual ancestor, and finally using the three-way
    primitive to come up with the final result.

And making the composite "recursive" feature available long before
the underlying "non-recursive" primitive becomes easily accessible
to the scripters and system builders simply felt backwards.
Junio C Hamano Feb. 24, 2022, 11:36 p.m. UTC | #18
Junio C Hamano <gitster@pobox.com> writes:

> It is more like
> ...

Actually, I misspoke.  It is a bit different.

In my mind, the building block hierarchy would have been

 (1) Take three tree objects A, B, and O, and do the three-way
     merge.  No history relation is assumed among them.

 (2) Take two tree objects A and B, with one or more commit objects
     Os; use (2) recursively to reduce Os into a single O and then
     apply (1) on A, B and O.

 (3) Take two commit objects A and B.  Compute Os out of A and B and
     use (2) once to merge A and B.

I think the basic primitive that should be exposed to an external
world (read: plumbing) this year, after all years of experience with
merge-recursive, should be (2), not (1).  

If you have (2), then (3) is trivially possible (it is just a single
call to get_merge_base()).  "git merge-tree A B" without having to
spell out bases is so convenent and you do not have to write
"git merge-tree A B $(git merge-base --all A B)", so I am OK for it
to exist, but it is not essential.

If you have (2) and exposed (2) as the primitive plumbing,
cherry-pick and revert would be a narrow special case of passing
only one O to the machinery.

And coming from the above point of view, exposing (3) as the
primitive plumbing to scripters and system builders, and later
having to _add_ support to allow (2), felt backwards.  It should be
trivial for us to make (2) available before we can even offer (3),
but what is happening to this new plumbing command goes in the
opposite order.

It may be, as you said, the problem the underlying ort API has that
somehow makes it harder to expose (2), in which case, yes, I think
that is what bugs me.
Johannes Altmanninger Feb. 27, 2022, 5:35 p.m. UTC | #19
On Tue, Feb 22, 2022 at 08:26:41AM -0800, Elijah Newren wrote:
> On Mon, Feb 21, 2022 at 10:55 AM Junio C Hamano <gitster@pobox.com> wrote:
> >
> > Elijah Newren <newren@gmail.com> writes:
> >
> > > Adding such an ability to merge-tree would be trivial -- it basically
> > > involves just two things: (1) accepting one extra argument, and (2)
> > > calling merge_incore_nonrecursive() instead of
> > > merge_incore_recursive().
> > >
> > > However, I think forking a subprocess for every merge of a series of
> > > commits is a completely unreasonable overhead, so even if we provide
> > > such an option to merge-tree, I still want a separate plumbing-ish
> > > tool that does non-worktree/non-index replaying of commits which is
> > > not written as a driver of merge-tree.  That other tool should just
> > > call merge_incore_nonrecursive() directly.  And such a tool, since it
> > > should handle an arbitrary number of commits, should certainly be able
> > > to handle just one commit.  From that angle, it feels like adding
> > > another mode to merge-tree would just be a partial duplication of the
> > > other tool.

I don't think "to avoid duplication" is a good argument for making this
plumbing command less flexible, because that's just chasing a local minimum
w.r.t. redundancy. More general APIs will lead to a global minimum.

> >
> > The above does not make much sense to me.
> >
> > I am hearing that "multi-step cherry-picks and reverts need to be
> > fast and we need something like sequencer that is all written in C,
> 
> Yes, I agree with that part so far.  jj is kicking our butt on rebase
> speed; I'm not sure if we can catch it, but it'd be nice to see us not
> be more than a hundred times slower.
> 
> > and single-step cherry-pick is merely a special case that does not
> > deserve a plumbing".
> 
> Well, apparently I failed at communication if that's what you heard.
> Perhaps I can step back and provide my high-level goals, and then
> mention how this series fits in.  My high-level goals:
> 
>   * new sequencer-like replay tool, including multiple abilities
> today's rebase/cherry-pick tools don't have
>   * enable folks to use merging machinery for server side operations
> (merge, rebase, cherry-pick, revert)
>   * do not repeat or encourage the rebase-as-shell-script mistakes of yesteryear
>   * somehow split this up into reviewable chunks
> 
> Now, in particular, the "merge divergent branches" piece seemed like a
> really simple portion of the problem space for which I could get some
> early feedback without having to address the whole problem space all
> at once, and which doesn't seem to have any downside risk.
> 
> And even with my attempt to narrow it in scope, and even despite lots
> of early feedback from the Git Virtual Contributor Summit six months
> ago, it's been nearly two months of active discussions including all
> kinds of intrinsic and tangential points about the UI and design.  Why
> try to prematurely widen the scope?  Can we just focus on merging
> divergent branches for now, and cover the rest later?
> 
> > But that argument leads to "and the same something-like-sequencer
> > that is all written in C would need '--rebase-merges' that can pick
> > multi-step merge sequences, and single-step merge does not deserve a
> > plumbing", which is an argument against this topic that is utterly
> > absurd.
> >
> > So why isn't your objection not equally absurd against having a
> > single step cherry-pick or revert primitive as a plumbing?
> 
> The objection you are arguing against is not my position.  In fact,
> I'm not even objecting to having a single-step cherry-pick, I'm
> objecting to providing it _now_, which I thought would have been clear
> from the portion of my email you snipped ("...I'm happy to add [a
> single step cherry-pick primitive] along with the tool I submit
> later...").  Since that wasn't clear, and since that wasn't my only
> communication failure here, let me attempt to be clearer about my
> objection(s):
> 
> 1. I'm really trying to pick off a small piece of the problem space
> and get feedback on it without unnecessarily complicating things with
> unrelated issues.  Thus, this series is _only_ about merging branches
> that have diverged, and leaves commit replaying for later.
> 
> 2. Two folks have chimed in about the single step cherry-pick, and the
> ONLY reason given for wanting such a thing was to create a
> rebasing/cherry-picking script which was driven by repeatedly invoking
> this low-level primitive command.  That's also the only usecase I can
> currently think of for such a primitive.  To me, that means providing
> such a low-level command now would be likely to result in the
> rebase-as-a-script mistake of yesteryear.  I think we can avoid that
> pitfall by first providing a tool that avoids the
> repeatedly-fork-git-subprocesses model.  (Also, providing a low-level
> single-step cherry-pick command also has the added negative of further
> distracting from the focus on merging divergent branches.)

I agree that it's not a good idea to call merge-tree in a loop for
cherry-picking commit sequences.

At the same time, it is weird for such a low-level tool to not allow
specifying merge bases.
Accepting merge bases is the more logical API, that might allow curious
users to figure out how revert/cherry-pick are implemented.

I intuitively prefer the version that accepts merge bases but I don't have
a good use case, so I think it's okay to add that later if we ever find use
for it.

> 
> 3. The merge primitive in this series is useful completely independent
> of any rebasing script (it would not be used solely for rebasing
> merges, if it's used for that purpose at all, as evidenced by the fact
> that dscho is already trying to use it for doing new real merges).
> 
> 4. Once we have a git-replay tool that can replay a sequence of
> commits, there _might_ not be a need for a single commit replaying
> primitive.  If we provided one as you and Johannes Altimanninger were
> asking for, and it turned out to be deemed useless because the later
> tool I provide can do everything it can and more, haven't we just
> wasted time in providing it?  And perhaps also wasted future time as
> we then have work to do to deprecate and remove the new command or
> mode? (NOTE: I did *not* say there was "no need" for a single-commit
> replaying primitive -- I said there "might not" be a need.)

If we get a tool that can do multiple cherry-picks, I think there is no
technical reason against having an equivalent tool that can do multiple merges.
In that future, merge-tree might be mostly obsolete.

In general, this is a difficult discussion.  It's really hard to judge
this series without a bigger picture of how our future UI will look like.
(Thanks for sharing the replay code BTW, there are some nice features in
there.)  Though I agree that integrating this (minimal) series first makes
a ton of sense, because it already supports a valid use case.

I feel like the output format is a bit experimental because it doesn't give
much of the conflict information in a machine-parseable format.  Of course
it's good enough for many uses (so I don't think this should block this
topic) but I think we should have a plan on how to change the output format
in future without adding ugly compatibility hacks. Marking merge-tree as
"experimental" (like git-switch/git-restore) comes to mind.  That would work
although it's not the most user-friendly way.

(OK I just saw that you are still looking into the output format in
CABPp-BG++YqesTxp+JL3XzwrogfMag1NscoMpCOExmV9z6Py9A@mail.gmail.com )

I wanted to implement some (cherry-picking) scripts using merge-tree but I
don't have enough time or need, so I don't have much feedback on the output
format today. I can imagine that it would be nice to have a clear distinction
between content conflicts and non-content conflicts, but let's worry about
that later..

> 
> Also, since you bring up --rebase-merges, there's an additional point
> about it that might be relevant:
> 
> 5. While you could implement a naive --rebase-merges in terms of a
> primitive for merging divergent branches (or vice-versa, i.e.
> implement merging divergent branches from a naive --rebase-merges
> implementation), I think replaying merges more intelligently[*] is
> actually a distinct operation from doing a new merge of divergent
> branches and that you probably can't implement one in terms of the
> other.  (I'm not certain on this, and definitely don't want to argue
> the finer points on it while my implementation is still half-baked,
> but I really do think they are different things right now.)
> 
> [*] https://lore.kernel.org/git/CABPp-BHp+d62dCyAaJfh1cZ8xVpGyb97mZryd02aCOX=Qn=Ltw@mail.gmail.com/
Johannes Altmanninger Feb. 27, 2022, 5:35 p.m. UTC | #20
On Thu, Feb 24, 2022 at 03:36:55PM -0800, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
> 
> > It is more like
> > ...
> 
> Actually, I misspoke.  It is a bit different.
> 
> In my mind, the building block hierarchy would have been
> 
>  (1) Take three tree objects A, B, and O, and do the three-way
>      merge.  No history relation is assumed among them.
> 
>  (2) Take two tree objects A and B, with one or more commit objects
>      Os; use (2) recursively to reduce Os into a single O and then
>      apply (1) on A, B and O.

Accepting multiple bases is nice (because it frees users of having
to recursively merge their merge-bases),

Let's say we take this series in its current form:

	git merge-tree [--write-tree] [<options>] <branch1> <branch2>
	git merge-tree [--trivial-merge] <base-tree> <branch1> <branch2> (deprecated)

and later discover we want to add (2), one possible syntax would be

	git merge-tree --write-tree [<options>] <branch1> <branch2> <base>...

(or put the bases in the middle like merge-file).
Though the mandatory --write-tree leaves a bad taste.

A separate option is a better alternative:

	git merge-tree [--write-tree] [<options>] --base=<base1>,<base2>,... <branch1> <branch2>

Anyway, no need to worry about that now, especially since the root cause of
the ugliness is the legacy --trivial-merge, and there is no way avoid that,
even if we add this now.

> 
>  (3) Take two commit objects A and B.  Compute Os out of A and B and
>      use (2) once to merge A and B.
> 
> I think the basic primitive that should be exposed to an external
> world (read: plumbing) this year, after all years of experience with
> merge-recursive, should be (2), not (1).  
> 
> If you have (2), then (3) is trivially possible (it is just a single
> call to get_merge_base()).  "git merge-tree A B" without having to
> spell out bases is so convenent and you do not have to write
> "git merge-tree A B $(git merge-base --all A B)", so I am OK for it
> to exist, but it is not essential.
> 
> If you have (2) and exposed (2) as the primitive plumbing,
> cherry-pick and revert would be a narrow special case of passing
> only one O to the machinery.
> 
> And coming from the above point of view, exposing (3) as the
> primitive plumbing to scripters and system builders, and later
> having to _add_ support to allow (2), felt backwards.  It should be
> trivial for us to make (2) available before we can even offer (3),
> but what is happening to this new plumbing command goes in the
> opposite order.
> 
> It may be, as you said, the problem the underlying ort API has that
> somehow makes it harder to expose (2), in which case, yes, I think
> that is what bugs me.
> 
> 
>
diff mbox series

Patch

diff --git a/Documentation/git-merge-tree.txt b/Documentation/git-merge-tree.txt
index 58731c19422..569485815a0 100644
--- a/Documentation/git-merge-tree.txt
+++ b/Documentation/git-merge-tree.txt
@@ -3,26 +3,73 @@  git-merge-tree(1)
 
 NAME
 ----
-git-merge-tree - Show three-way merge without touching index
+git-merge-tree - Perform merge without touching index or working tree
 
 
 SYNOPSIS
 --------
 [verse]
-'git merge-tree' <base-tree> <branch1> <branch2>
+'git merge-tree' [--write-tree] <branch1> <branch2>
+'git merge-tree' [--trivial-merge] <base-tree> <branch1> <branch2> (deprecated)
 
 DESCRIPTION
 -----------
-Reads three tree-ish, and output trivial merge results and
-conflicting stages to the standard output.  This is similar to
-what three-way 'git read-tree -m' does, but instead of storing the
-results in the index, the command outputs the entries to the
-standard output.
-
-This is meant to be used by higher level scripts to compute
-merge results outside of the index, and stuff the results back into the
-index.  For this reason, the output from the command omits
-entries that match the <branch1> tree.
+
+Performs a merge, but does not make any new commits and does not read
+from or write to either the working tree or index.
+
+The second form is deprecated and supported only for backward
+compatibility.  It will likely be removed in the future, and will not
+be discussed further in this manual.
+
+The first form will merge the two branches, doing a real merge.  A real
+merge is distinguished from a trivial merge in that it includes:
+
+  * three way content merges of individual files
+  * rename detection
+  * proper directory/file conflict handling
+  * recursive ancestor consolidation (i.e. when there is more than one
+    merge base, creating a virtual merge base by merging the merge bases)
+  * etc.
+
+After the merge completes, it will create a new toplevel tree object.
+See `OUTPUT` below for details.
+
+OUTPUT
+------
+
+For either a successful or conflicted merge, the output from
+git-merge-tree is simply one line:
+
+	<OID of toplevel tree>
+
+The printed tree object corresponds to what would be checked out in
+the working tree at the end of `git merge`, and thus may have files
+with conflict markers in them.
+
+EXIT STATUS
+-----------
+
+For a successful, non-conflicted merge, the exit status is 0.  When the
+merge has conflicts, the exit status is 1.  If the merge is not able to
+complete (or start) due to some kind of error, the exit status is
+something other than 0 or 1.
+
+USAGE NOTES
+-----------
+
+git-merge-tree was written to be low-level plumbing, similar to
+hash-object, mktree, commit-tree, update-ref, and mktag.  Thus, it could
+be used as a part of a series of steps such as
+
+       NEWTREE=$(git merge-tree --write-tree $BRANCH1 $BRANCH2)
+       test $? -eq 0 || die "There were conflicts..."
+       NEWCOMMIT=$(git commit-tree $NEWTREE -p $BRANCH1 -p $BRANCH2)
+       git update-ref $BRANCH1 $NEWCOMMIT
+
+However, it does not quite fit into the same category of low-level
+plumbing commands since the possibility of merge conflicts give it a
+much higher chance of the command not succeeding.
 
 GIT
 ---
diff --git a/builtin/merge-tree.c b/builtin/merge-tree.c
index e98ec8a9f1d..d14c9f6e44e 100644
--- a/builtin/merge-tree.c
+++ b/builtin/merge-tree.c
@@ -2,6 +2,9 @@ 
 #include "builtin.h"
 #include "tree-walk.h"
 #include "xdiff-interface.h"
+#include "help.h"
+#include "commit-reach.h"
+#include "merge-ort.h"
 #include "object-store.h"
 #include "parse-options.h"
 #include "repository.h"
@@ -392,7 +395,46 @@  struct merge_tree_options {
 static int real_merge(struct merge_tree_options *o,
 		      const char *branch1, const char *branch2)
 {
-	die(_("real merges are not yet implemented"));
+	struct commit *parent1, *parent2;
+	struct commit_list *common;
+	struct commit_list *merge_bases = NULL;
+	struct commit_list *j;
+	struct merge_options opt;
+	struct merge_result result = { 0 };
+
+	parent1 = get_merge_parent(branch1);
+	if (!parent1)
+		help_unknown_ref(branch1, "merge-tree",
+				 _("not something we can merge"));
+
+	parent2 = get_merge_parent(branch2);
+	if (!parent2)
+		help_unknown_ref(branch2, "merge-tree",
+				 _("not something we can merge"));
+
+	init_merge_options(&opt, the_repository);
+
+	opt.show_rename_progress = 0;
+
+	opt.branch1 = branch1;
+	opt.branch2 = branch2;
+
+	/*
+	 * Get the merge bases, in reverse order; see comment above
+	 * merge_incore_recursive in merge-ort.h
+	 */
+	common = get_merge_bases(parent1, parent2);
+	if (!common)
+		die(_("refusing to merge unrelated histories"));
+	for (j = common; j; j = j->next)
+		commit_list_insert(j->item, &merge_bases);
+
+	merge_incore_recursive(&opt, merge_bases, parent1, parent2, &result);
+	if (result.clean < 0)
+		die(_("failure to merge"));
+	puts(oid_to_hex(&result.tree->object.oid));
+	merge_finalize(&opt, &result);
+	return !result.clean; /* result.clean < 0 handled above */
 }
 
 int cmd_merge_tree(int argc, const char **argv, const char *prefix)
diff --git a/t/t4301-merge-tree-write-tree.sh b/t/t4301-merge-tree-write-tree.sh
new file mode 100755
index 00000000000..66c3eaf2021
--- /dev/null
+++ b/t/t4301-merge-tree-write-tree.sh
@@ -0,0 +1,88 @@ 
+#!/bin/sh
+
+test_description='git merge-tree --write-tree'
+
+. ./test-lib.sh
+
+# This test is ort-specific
+if test "${GIT_TEST_MERGE_ALGORITHM}" != "ort"
+then
+	skip_all="GIT_TEST_MERGE_ALGORITHM != ort"
+	test_done
+fi
+
+test_expect_success setup '
+	test_write_lines 1 2 3 4 5 >numbers &&
+	echo hello >greeting &&
+	echo foo >whatever &&
+	git add numbers greeting whatever &&
+	test_tick &&
+	git commit -m initial &&
+
+	git branch side1 &&
+	git branch side2 &&
+
+	git checkout side1 &&
+	test_write_lines 1 2 3 4 5 6 >numbers &&
+	echo hi >greeting &&
+	echo bar >whatever &&
+	git add numbers greeting whatever &&
+	test_tick &&
+	git commit -m modify-stuff &&
+
+	git checkout side2 &&
+	test_write_lines 0 1 2 3 4 5 >numbers &&
+	echo yo >greeting &&
+	git rm whatever &&
+	mkdir whatever &&
+	>whatever/empty &&
+	git add numbers greeting whatever/empty &&
+	test_tick &&
+	git commit -m other-modifications
+'
+
+test_expect_success 'Content merge and a few conflicts' '
+	git checkout side1^0 &&
+	test_must_fail git merge side2 &&
+	expected_tree=$(cat .git/AUTO_MERGE) &&
+
+	# We will redo the merge, while we are still in a conflicted state!
+	test_when_finished "git reset --hard" &&
+
+	test_expect_code 1 git merge-tree --write-tree side1 side2 >RESULT &&
+	actual_tree=$(head -n 1 RESULT) &&
+
+	# Due to differences of e.g. "HEAD" vs "side1", the results will not
+	# exactly match.  Dig into individual files.
+
+	# Numbers should have three-way merged cleanly
+	test_write_lines 0 1 2 3 4 5 6 >expect &&
+	git show ${actual_tree}:numbers >actual &&
+	test_cmp expect actual &&
+
+	# whatever and whatever~<branch> should have same HASHES
+	git rev-parse ${expected_tree}:whatever ${expected_tree}:whatever~HEAD >expect &&
+	git rev-parse ${actual_tree}:whatever ${actual_tree}:whatever~side1 >actual &&
+	test_cmp expect actual &&
+
+	# greeting should have a merge conflict
+	git show ${expected_tree}:greeting >tmp &&
+	cat tmp | sed -e s/HEAD/side1/ >expect &&
+	git show ${actual_tree}:greeting >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'Barf on misspelled option, with exit code other than 0 or 1' '
+	# Mis-spell with single "s" instead of double "s"
+	test_expect_code 129 git merge-tree --write-tree --mesages FOOBAR side1 side2 2>expect &&
+
+	grep "error: unknown option.*mesages" expect
+'
+
+test_expect_success 'Barf on too many arguments' '
+	test_expect_code 129 git merge-tree --write-tree side1 side2 side3 2>expect &&
+
+	grep "^usage: git merge-tree" expect
+'
+
+test_done