diff mbox series

[v7,1/2] merge-tree.c: add --merge-base=<commit> option

Message ID 1cf1c69b8e8e8e81eccc42b5d8efc605a36ab7eb.1668210314.git.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Commit 66265a693e8deb3ab86577eb7f69940410044081
Headers show
Series merge-tree: allow specifying a base commit when --write-tree is passed | expand

Commit Message

Kyle Zhao Nov. 11, 2022, 11:45 p.m. UTC
From: Kyle Zhao <kylezhao@tencent.com>

This patch will give our callers more flexibility to use `git merge-tree`,
such as:

    git merge-tree --write-tree --merge-base=branch^ HEAD branch

This does a merge of HEAD and branch, but uses branch^ as the merge-base.

And the reason why using an option flag instead of a positional argument
is to allow additional commits passed to merge-tree to be handled via an
octopus merge in the future.

Signed-off-by: Kyle Zhao <kylezhao@tencent.com>
---
 Documentation/git-merge-tree.txt |  4 +++
 builtin/merge-tree.c             | 46 ++++++++++++++++++++++++--------
 t/t4301-merge-tree-write-tree.sh | 27 +++++++++++++++++++
 3 files changed, 66 insertions(+), 11 deletions(-)

Comments

Junio C Hamano Nov. 22, 2022, 3:04 a.m. UTC | #1
"Kyle Zhao via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +--merge-base=<commit>::
> +	Instead of finding the merge-bases for <branch1> and <branch2>,
> +	specify a merge-base for the merge.

OK.

> +	const char *merge_base = NULL;
>  
>  	const char * const merge_tree_usage[] = {
>  		N_("git merge-tree [--write-tree] [<options>] <branch1> <branch2>"),
> @@ -515,6 +533,10 @@ int cmd_merge_tree(int argc, const char **argv, const char *prefix)
>  			   &o.use_stdin,
>  			   N_("perform multiple merges, one per line of input"),
>  			   PARSE_OPT_NONEG),
> +		OPT_STRING(0, "merge-base",
> +			   &merge_base,
> +			   N_("commit"),
> +			   N_("specify a merge-base for the merge")),
>  		OPT_END()
>  	};

This looks wrong, though.

Shouldn't "git merge-tree --merge-base=X --merge-base=Y A B"
allow you to compute the merge between A and B in a history
where there are two merge bases?

Unfortunately this is already in 'next', so let's see an incremental
fix on top.

Thanks.
Kyle Zhao Nov. 22, 2022, 3:52 a.m. UTC | #2
> > +	const char *merge_base = NULL;
> >
> >  	const char * const merge_tree_usage[] = {
> >  		N_("git merge-tree [--write-tree] [<options>] <branch1>
> > <branch2>"), @@ -515,6 +533,10 @@ int cmd_merge_tree(int argc, const
> char **argv, const char *prefix)
> >  			   &o.use_stdin,
> >  			   N_("perform multiple merges, one per line of input"),
> >  			   PARSE_OPT_NONEG),
> > +		OPT_STRING(0, "merge-base",
> > +			   &merge_base,
> > +			   N_("commit"),
> > +			   N_("specify a merge-base for the merge")),
> >  		OPT_END()
> >  	};
> 
> This looks wrong, though.
> 
> Shouldn't "git merge-tree --merge-base=X --merge-base=Y A B"
> allow you to compute the merge between A and B in a history where there are
> two merge bases?
> 
> Unfortunately this is already in 'next', so let's see an incremental fix on top.
> 
> Thanks.

I agree.

OPT_STRING only use the last value of "--merge-base".
It will mislead users, they may specify the merge-base multiple times, but found it doesn't work.

I went to check the api-parse-option.txt, but I didn't found an elegant solution to stop when the users uses the
second "--merge-base".  Did I miss it? Or we just need to mention this in the git-merge-tree.txt,
such as:
 --merge-base=<commit>::
 	Instead of finding the merge-bases for <branch1> and <branch2>,
-	specify a merge-base for the merge.
+            specify a merge-base for the merge. This option doesn't support
+            being specified multiple times, only the last value you provide will be used.


Thanks,
Kyle
Junio C Hamano Nov. 22, 2022, 4:28 a.m. UTC | #3
"kylezhao(赵柯宇)" <kylezhao@tencent.com> writes:

> I went to check the api-parse-option.txt, but I didn't found an
> elegant solution to stop when the users uses the second
> "--merge-base".

That's not even a fix, as it does not allow specifying more than one
merge bases, is it?

Just like how builtin/merge-tree.c::real_merge() is prepared to
handle parent1 and parent2 that have multiple merge bases and pass
all of them to the underlying merge machinery, you can accumulate
multiple strings using OPT_STRING_LIST() and use all of them as
merge_bases, no?
Junio C Hamano Nov. 22, 2022, 5:42 a.m. UTC | #4
Junio C Hamano <gitster@pobox.com> writes:

> "kylezhao(赵柯宇)" <kylezhao@tencent.com> writes:
>
>> I went to check the api-parse-option.txt, but I didn't found an
>> elegant solution to stop when the users uses the second
>> "--merge-base".
>
> That's not even a fix, as it does not allow specifying more than one
> merge bases, is it?
>
> Just like how builtin/merge-tree.c::real_merge() is prepared to
> handle parent1 and parent2 that have multiple merge bases and pass
> all of them to the underlying merge machinery, you can accumulate
> multiple strings using OPT_STRING_LIST() and use all of them as
> merge_bases, no?

I am a bit torn on this, though.  

Because it uses lookup_commit_reference_by_name() to obtain
base_commit in addition to parent1 and parent2, and then
get_commit_tree() on them to get their trees, the real_merge()
function in the posted patch is incapable of accepting a single
"pretend as if this tree object is the common ancestor tree" and
merging the two tree objects.  But once that flaw is fixed, using
merge_incore_nonrecursive() with an explicitly given "base", we can
merge totally trees regardless of how the commits they are found in
are related in the history, which is a very logical thing to do.
And while operating in that mode, there is no way to accept more
than one "base".

So, I would be PERFECTLY HAPPY if this new mode of operation can
take only one "base" tree object, if it allows BASE, PARENT1,
and PARENT2 to be all tree objects, not necessarily commit objects.

But if we insist that PARENT1 and PARENT2 must be commits, then I
would find it very unsatisfying if we only took a single BASE that
also must be a commit.  If the merge-base has to be a tree or trees,
then there is no way to perform recursive merge (because you cannot
compute common ancestors across tree objects) , so it is perfectly
justifiable to take only a single base (and error out upon seeing a
second --merge-base=X option).

But it has to be a commit, then there is no justification to forbid
recursive operation, and I do not see a good reason to take only one
COMMON thing.

So, it is easy to say "let's take the current patch as a good first
step", but it is unclear what the good second step would be.

We could correct the code to allow PARENT1, PARENT2 and BASE to be
tree objects, not necessarily commits, when only a single merge-base
is specified.  That corrects the current flaw that tree objects
cannot be used.  And then when more than one BASE is given (or no
BASE is given---which is the original code), we could correct the
code to call merge_incore_recursive() instead.

But the amount of the effort to get there from the current codebase
(without your patch) feels more or less the same ballpark as the
patch in question, so...
Kyle Zhao Nov. 22, 2022, 7:47 a.m. UTC | #5
> I am a bit torn on this, though.
> 
> Because it uses lookup_commit_reference_by_name() to obtain base_commit
> in addition to parent1 and parent2, and then
> get_commit_tree() on them to get their trees, the real_merge() function in the
> posted patch is incapable of accepting a single "pretend as if this tree object is
> the common ancestor tree" and merging the two tree objects.  But once that
> flaw is fixed, using
> merge_incore_nonrecursive() with an explicitly given "base", we can merge
> totally trees regardless of how the commits they are found in are related in the
> history, which is a very logical thing to do.
> And while operating in that mode, there is no way to accept more than one
> "base".
> 
> So, I would be PERFECTLY HAPPY if this new mode of operation can take only
> one "base" tree object, if it allows BASE, PARENT1, and PARENT2 to be all tree
> objects, not necessarily commit objects.
> 
> But if we insist that PARENT1 and PARENT2 must be commits, then I would find
> it very unsatisfying if we only took a single BASE that also must be a commit.  If
> the merge-base has to be a tree or trees, then there is no way to perform
> recursive merge (because you cannot compute common ancestors across tree
> objects) , so it is perfectly justifiable to take only a single base (and error out
> upon seeing a second --merge-base=X option).
> 
> But it has to be a commit, then there is no justification to forbid recursive
> operation, and I do not see a good reason to take only one COMMON thing.
> 
> So, it is easy to say "let's take the current patch as a good first step", but it is
> unclear what the good second step would be.

Yeah, I think so.  
In fact, I planned to implement a version that specifies *only one* merge-base
at the beginning and made the format of this option would be possible to
extend to support multiple bases and octopus at the same time.

> 
> We could correct the code to allow PARENT1, PARENT2 and BASE to be tree
> objects, not necessarily commits, when only a single merge-base is specified.
> That corrects the current flaw that tree objects cannot be used.  And then when
> more than one BASE is given (or no BASE is given---which is the original code),
> we could correct the code to call merge_incore_recursive() instead.

I prefer this solution.

> 
> But the amount of the effort to get there from the current codebase (without
> your patch) feels more or less the same ballpark as the patch in question, so...

It means we need to support specifying multiple bases in the first version, which makes
merge-tree command more complete. Since individual merge-tree support multiple bases, 
--stdin mode seems to need to support this too.

However, I can't think of when the user needs to manually specify multiple bases for a merge ;).
In other words, do we really need to support multiple bases in the first version?

Thanks,
Kyle
diff mbox series

Patch

diff --git a/Documentation/git-merge-tree.txt b/Documentation/git-merge-tree.txt
index 04bcc416e6e..d9dacb2ce54 100644
--- a/Documentation/git-merge-tree.txt
+++ b/Documentation/git-merge-tree.txt
@@ -64,6 +64,10 @@  OPTIONS
 	share no common history.  This flag can be given to override that
 	check and make the merge proceed anyway.
 
+--merge-base=<commit>::
+	Instead of finding the merge-bases for <branch1> and <branch2>,
+	specify a merge-base for the merge.
+
 [[OUTPUT]]
 OUTPUT
 ------
diff --git a/builtin/merge-tree.c b/builtin/merge-tree.c
index fe853aa8f91..62c6d43cdb9 100644
--- a/builtin/merge-tree.c
+++ b/builtin/merge-tree.c
@@ -3,6 +3,7 @@ 
 #include "tree-walk.h"
 #include "xdiff-interface.h"
 #include "help.h"
+#include "commit.h"
 #include "commit-reach.h"
 #include "merge-ort.h"
 #include "object-store.h"
@@ -406,6 +407,7 @@  struct merge_tree_options {
 };
 
 static int real_merge(struct merge_tree_options *o,
+		      const char *merge_base,
 		      const char *branch1, const char *branch2,
 		      const char *prefix)
 {
@@ -432,16 +434,31 @@  static int real_merge(struct merge_tree_options *o,
 	opt.branch1 = branch1;
 	opt.branch2 = branch2;
 
-	/*
-	 * Get the merge bases, in reverse order; see comment above
-	 * merge_incore_recursive in merge-ort.h
-	 */
-	merge_bases = get_merge_bases(parent1, parent2);
-	if (!merge_bases && !o->allow_unrelated_histories)
-		die(_("refusing to merge unrelated histories"));
-	merge_bases = reverse_commit_list(merge_bases);
+	if (merge_base) {
+		struct commit *base_commit;
+		struct tree *base_tree, *parent1_tree, *parent2_tree;
+
+		base_commit = lookup_commit_reference_by_name(merge_base);
+		if (!base_commit)
+			die(_("could not lookup commit %s"), merge_base);
+
+		opt.ancestor = merge_base;
+		base_tree = get_commit_tree(base_commit);
+		parent1_tree = get_commit_tree(parent1);
+		parent2_tree = get_commit_tree(parent2);
+		merge_incore_nonrecursive(&opt, base_tree, parent1_tree, parent2_tree, &result);
+	} else {
+		/*
+		 * Get the merge bases, in reverse order; see comment above
+		 * merge_incore_recursive in merge-ort.h
+		 */
+		merge_bases = get_merge_bases(parent1, parent2);
+		if (!merge_bases && !o->allow_unrelated_histories)
+			die(_("refusing to merge unrelated histories"));
+		merge_bases = reverse_commit_list(merge_bases);
+		merge_incore_recursive(&opt, merge_bases, parent1, parent2, &result);
+	}
 
-	merge_incore_recursive(&opt, merge_bases, parent1, parent2, &result);
 	if (result.clean < 0)
 		die(_("failure to merge"));
 
@@ -487,6 +504,7 @@  int cmd_merge_tree(int argc, const char **argv, const char *prefix)
 	struct merge_tree_options o = { .show_messages = -1 };
 	int expected_remaining_argc;
 	int original_argc;
+	const char *merge_base = NULL;
 
 	const char * const merge_tree_usage[] = {
 		N_("git merge-tree [--write-tree] [<options>] <branch1> <branch2>"),
@@ -515,6 +533,10 @@  int cmd_merge_tree(int argc, const char **argv, const char *prefix)
 			   &o.use_stdin,
 			   N_("perform multiple merges, one per line of input"),
 			   PARSE_OPT_NONEG),
+		OPT_STRING(0, "merge-base",
+			   &merge_base,
+			   N_("commit"),
+			   N_("specify a merge-base for the merge")),
 		OPT_END()
 	};
 
@@ -529,6 +551,8 @@  int cmd_merge_tree(int argc, const char **argv, const char *prefix)
 
 		if (o.mode == MODE_TRIVIAL)
 			die(_("--trivial-merge is incompatible with all other options"));
+		if (merge_base)
+			die(_("--merge-base is incompatible with --stdin"));
 		line_termination = '\0';
 		while (strbuf_getline_lf(&buf, stdin) != EOF) {
 			struct strbuf **split;
@@ -538,7 +562,7 @@  int cmd_merge_tree(int argc, const char **argv, const char *prefix)
 			if (!split[0] || !split[1] || split[2])
 				die(_("malformed input line: '%s'."), buf.buf);
 			strbuf_rtrim(split[0]);
-			result = real_merge(&o, split[0]->buf, split[1]->buf, prefix);
+			result = real_merge(&o, merge_base, split[0]->buf, split[1]->buf, prefix);
 			if (result < 0)
 				die(_("merging cannot continue; got unclean result of %d"), result);
 			strbuf_list_free(split);
@@ -581,7 +605,7 @@  int cmd_merge_tree(int argc, const char **argv, const char *prefix)
 
 	/* Do the relevant type of merge */
 	if (o.mode == MODE_REAL)
-		return real_merge(&o, argv[0], argv[1], prefix);
+		return real_merge(&o, merge_base, argv[0], argv[1], prefix);
 	else
 		return trivial_merge(argv[0], argv[1], argv[2]);
 }
diff --git a/t/t4301-merge-tree-write-tree.sh b/t/t4301-merge-tree-write-tree.sh
index cac85591b52..6db96ccbaae 100755
--- a/t/t4301-merge-tree-write-tree.sh
+++ b/t/t4301-merge-tree-write-tree.sh
@@ -860,4 +860,31 @@  test_expect_success '--stdin with both a successful and a conflicted merge' '
 	test_cmp expect actual
 '
 
+# specify merge-base as parent of branch2
+# git merge-tree --write-tree --merge-base=c2 c1 c3
+#   Commit c1: add file1
+#   Commit c2: add file2 after c1
+#   Commit c3: add file3 after c2
+#   Expected: add file3, and file2 does NOT appear
+
+test_expect_success 'specify merge-base as parent of branch2' '
+	# Setup
+	test_when_finished "rm -rf base-b2-p" &&
+	git init base-b2-p &&
+	test_commit -C base-b2-p c1 file1 &&
+	test_commit -C base-b2-p c2 file2 &&
+	test_commit -C base-b2-p c3 file3 &&
+
+	# Testing
+	TREE_OID=$(git -C base-b2-p merge-tree --write-tree --merge-base=c2 c1 c3) &&
+
+	q_to_tab <<-EOF >expect &&
+	100644 blob $(git -C base-b2-p rev-parse c1:file1)Qfile1
+	100644 blob $(git -C base-b2-p rev-parse c3:file3)Qfile3
+	EOF
+
+	git -C base-b2-p ls-tree $TREE_OID >actual &&
+	test_cmp expect actual
+'
+
 test_done