mbox series

[00/12] RFC: In-core git merge-tree ("Server side merges")

Message ID pull.1122.git.1642888562.gitgitgadget@gmail.com (mailing list archive)
Headers show
Series RFC: In-core git merge-tree ("Server side merges") | expand

Message

Philippe Blain via GitGitGadget Jan. 22, 2022, 9:55 p.m. UTC
Note1: Depends on en/remerge-diff (but no need to pick it up; it's still
RFC).

(Note2: This is a continuation of my series at [0], but I can't change the
base repository for a pull request in GitHub so I had to open a new PR and
that makes it look like a new series. [0]
https://lore.kernel.org/git/pull.1114.v2.git.git.1641403655.gitgitgadget@gmail.com/)

== Basic Summary ==

This series introduces a new mode to git merge-tree allowing it to perform
real merges (three-way text content merges, recursive ancestor
consolidation, rename detection, proper directory/file conflict handling,
etc.) and write the result as a toplevel tree. It doesn't touch the working
tree or index, and doesn't create any commits or update any refs.

== Updates Log ==

Updates since v2 (thanks to Christian, Dscho, Ramsay, and René for
suggestions and comments on v2):

 * Significant changes to output format:
   * Flags no longer take a filename for additional output; they write to
     stdout instead.
   * More information included by default when there are conflicts (no need
     to request it with additional flags, instead flags can be used to
     suppress it).
   * Provide (mode, oid, stage, file) tuples -- i.e. ls-files -u style of
     information -- when there are conflicts. Add a flag to only list
     conflicted files if that's preferred.
 * Much more thorough manual for git-merge-tree.txt
 * Renamed option from --real to --write-tree
 * Accept an optional --trivial-merge option to get old style merge-tree
   behavior
 * Allow both --write-tree and --trivial-merge to be omitted since we can
   deduce which from number of arguments
 * Document exit code when the merge cannot be run (so we can distinguish
   other error cases from conflicts)
 * testcase cleanups: test_tick, early skip of test when using recursive
   backend, variable renames, etc.
 * various minor code cleanups
 * Add a new --allow-unrelated-histories option (with same meaning as the
   one used in git merge)
 * Rebased on top of en/remerge-diff to avoid a small conflict

Updates since v1 (thanks to Johannes Altmanninger and Fabian for suggestions
on v1):

 * Fixed a bad patch splitting, and a style issue pointed out by Johannes
   Altimanninger
 * Fixed misleading commit messages in new test cases
 * Fixed my comments about how commit-tree could be used to correctly use
   two -p flags

== Other notes ==

Stuff intentionally NOT included, but which others seemed to feel strongly
about; they'd need to convince me more on these:

 * Any form of diff output[1]
 * A way to omit printing the generated tree hash[2][3] In regards to these,
   also see also some of the new info in Documentation/git-merge-tree.txt,
   namely the expanded paragraph on "the second form is deprecated" in the
   description as regards diff output usability and performance, and also
   the final paragraph of the new "Mistakes to avoid" section in regards to
   tree hash.

[1]
https://lore.kernel.org/git/nycvar.QRO.7.76.6.2201101427440.339@tvgsbejvaqbjf.bet/
(section starting with "Providing a tree") [2]
https://lore.kernel.org/git/CABPp-BHvXrP0sTTmuTYfACoJTCcm9+wk_f441nj4TstrmQdqMQ@mail.gmail.com/
(sections starting with "avoid printing" and "Where did I suggest") [3]
https://lore.kernel.org/git/CABPp-BGdbh=HM7jA+_RTwSWVcMr_qvEY7RoNXooeBT2NB4Ubzw@mail.gmail.com/
(section starting with "Providing a tree object")

Elijah Newren (12):
  merge-tree: rename merge_trees() to trivial_merge_trees()
  merge-tree: move logic for existing merge into new function
  merge-tree: add option parsing and initial shell for real merge
    function
  merge-tree: implement real merges
  merge-ort: split out a separate display_update_messages() function
  merge-ort: allow update messages to be written to different file
    stream
  merge-tree: support including merge messages in output
  merge-ort: provide a merge_get_conflicted_files() helper function
  merge-tree: provide a list of which files have conflicts
  merge-tree: provide easy access to `ls-files -u` style info
  merge-tree: add a --allow-unrelated-histories flag
  git-merge-tree.txt: add a section on potentional usage mistakes

 Documentation/git-merge-tree.txt | 182 +++++++++++++++++++++++++++++--
 builtin/merge-tree.c             | 178 ++++++++++++++++++++++++++++--
 git.c                            |   2 +-
 merge-ort.c                      | 109 ++++++++++++------
 merge-ort.h                      |  30 +++++
 t/t4301-merge-tree-real.sh       | 163 +++++++++++++++++++++++++++
 6 files changed, 603 insertions(+), 61 deletions(-)
 create mode 100755 t/t4301-merge-tree-real.sh


base-commit: ea5df61cf358d3c831189e2f04863abc2157e3e1
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1122%2Fnewren%2Fin-core-merge-tree-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1122/newren/in-core-merge-tree-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1122

Comments

Christian Couder Jan. 26, 2022, 8:48 a.m. UTC | #1
On Sat, Jan 22, 2022 at 10:56 PM Elijah Newren via GitGitGadget
<gitgitgadget@gmail.com> wrote:

> Updates since v2 (thanks to Christian, Dscho, Ramsay, and René for
> suggestions and comments on v2):
>
>  * Significant changes to output format:
>    * Flags no longer take a filename for additional output; they write to
>      stdout instead.
>    * More information included by default when there are conflicts (no need
>      to request it with additional flags, instead flags can be used to
>      suppress it).
>    * Provide (mode, oid, stage, file) tuples -- i.e. ls-files -u style of
>      information -- when there are conflicts. Add a flag to only list
>      conflicted files if that's preferred.

The above changes seem good to me.

>  * Much more thorough manual for git-merge-tree.txt
>  * Renamed option from --real to --write-tree
>  * Accept an optional --trivial-merge option to get old style merge-tree
>    behavior
>  * Allow both --write-tree and --trivial-merge to be omitted since we can
>    deduce which from number of arguments

I still think that it might be simpler and cleaner to leave 'git
merge-tree' alone for now, and just add a new command named for
example 'git write-merge-tree'. Later we can always add flags to 'git
merge-tree' or add 'git trivial-merge-tree' as an alias for 'git
merge-tree', and eventually slowly switch 'git merge-tree' to mean
only 'git write-merge-tree' if that's where we want to go.

>  * Document exit code when the merge cannot be run (so we can distinguish
>    other error cases from conflicts)
>  * testcase cleanups: test_tick, early skip of test when using recursive
>    backend, variable renames, etc.
>  * various minor code cleanups
>  * Add a new --allow-unrelated-histories option (with same meaning as the
>    one used in git merge)

The above changes seem good to me too.

> Stuff intentionally NOT included, but which others seemed to feel strongly
> about; they'd need to convince me more on these:
>
>  * Any form of diff output[1]

It's not a big issue for me to not include them right now as long as
it's possible to add cli options later that add them. The reason is
that I think in many cases when there are conflicts, the conflicts
will be small and the user will want to see them. So it would be
simpler to just have an option to show any conflict right away, rather
than have the user launch another command (a diff-tree against which
tree and with which options?).

Thanks for working on this anyway!
Johannes Schindelin Jan. 26, 2022, 12:02 p.m. UTC | #2
Hi Christian,

On Wed, 26 Jan 2022, Christian Couder wrote:

> On Sat, Jan 22, 2022 at 10:56 PM Elijah Newren via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>
> >  * Accept an optional --trivial-merge option to get old style merge-tree
> >    behavior
> >  * Allow both --write-tree and --trivial-merge to be omitted since we can
> >    deduce which from number of arguments
>
> I still think that it might be simpler and cleaner to leave 'git
> merge-tree' alone for now, and just add a new command named for
> example 'git write-merge-tree'.

That would assume that the original `git merge-tree` implementation was
useful. That notion has been thoroughly refuted in the meantime, though.

I am really opposed to introducing a new command here. Elijah took the
best approach we can take here: save the `merge-tree` command by teaching
it to do something useful.

> Later we can always add flags to 'git merge-tree' or add 'git
> trivial-merge-tree' as an alias for 'git merge-tree', and eventually
> slowly switch 'git merge-tree' to mean only 'git write-merge-tree' if
> that's where we want to go.

I suggested before, and seem to need to repeat again, that we need to let
ourselves be guided less by hypothetical scenarios, and more by actual,
concrete use cases where the revamped `merge-tree` command is useful.

And since I already provided some feedback based on my work from working
on a server-side backend, I am fairly certain that we already have a
pretty good idea where we want to go.

> > Stuff intentionally NOT included, but which others seemed to feel strongly
> > about; they'd need to convince me more on these:
> >
> >  * Any form of diff output[1]
>
> It's not a big issue for me to not include them right now as long as
> it's possible to add cli options later that add them.

But why? That _so_ smells like a hypothetical scenario.

We do not need the diffs. It is highly unlikely that the server-side wants
to have diffs, and if a user does want the diffs, it is very, very easy to
generate them by chaining low-level commands.

So there is absolutely no need for `git merge-tree` to produce diffs.

> The reason is that I think in many cases when there are conflicts, the
> conflicts will be small and the user will want to see them. So it would
> be simpler to just have an option to show any conflict right away,
> rather than have the user launch another command (a diff-tree against
> which tree and with which options?).

That assumes that server-side merge UIs will present merge conflicts in
the form of diffs containing merge conflict markers. Which I don't think
will happen, like, ever.

In short: I completely disagree that we should introduce a new command,
and I also completely disagree that the `merge-tree` command should output
any diffs.

I do agree that we need to be mindful of what we actually need, and in
that regard, I reiterate that we need to let concrete use cases guide us.
As part of GitLab, you might be in an excellent position to look at
GitLab's concrete server-side needs when it comes to use `git merge-tree`
to perform merges.

Ciao,
Dscho
Christian Couder Jan. 26, 2022, 2:44 p.m. UTC | #3
Hi Dscho,

On Wed, Jan 26, 2022 at 1:03 PM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>
> Hi Christian,
>
> On Wed, 26 Jan 2022, Christian Couder wrote:
>
> > On Sat, Jan 22, 2022 at 10:56 PM Elijah Newren via GitGitGadget
> > <gitgitgadget@gmail.com> wrote:
> >
> > >  * Accept an optional --trivial-merge option to get old style merge-tree
> > >    behavior
> > >  * Allow both --write-tree and --trivial-merge to be omitted since we can
> > >    deduce which from number of arguments
> >
> > I still think that it might be simpler and cleaner to leave 'git
> > merge-tree' alone for now, and just add a new command named for
> > example 'git write-merge-tree'.
>
> That would assume that the original `git merge-tree` implementation was
> useful. That notion has been thoroughly refuted in the meantime, though.
>
> I am really opposed to introducing a new command here. Elijah took the
> best approach we can take here: save the `merge-tree` command by teaching
> it to do something useful.

I think it's a question of point of view. If a command is completely
useless, then most of the time it needs to die, not be "saved". We
would need good statistics, but I doubt we have "saved" many useless
commands before, compared to commands we have just killed.

> > Later we can always add flags to 'git merge-tree' or add 'git
> > trivial-merge-tree' as an alias for 'git merge-tree', and eventually
> > slowly switch 'git merge-tree' to mean only 'git write-merge-tree' if
> > that's where we want to go.
>
> I suggested before, and seem to need to repeat again, that we need to let
> ourselves be guided less by hypothetical scenarios, and more by actual,
> concrete use cases where the revamped `merge-tree` command is useful.

Ok, see below.

> And since I already provided some feedback based on my work from working
> on a server-side backend, I am fairly certain that we already have a
> pretty good idea where we want to go.
>
> > > Stuff intentionally NOT included, but which others seemed to feel strongly
> > > about; they'd need to convince me more on these:
> > >
> > >  * Any form of diff output[1]
> >
> > It's not a big issue for me to not include them right now as long as
> > it's possible to add cli options later that add them.
>
> But why? That _so_ smells like a hypothetical scenario.
>
> We do not need the diffs. It is highly unlikely that the server-side wants
> to have diffs, and if a user does want the diffs, it is very, very easy to
> generate them by chaining low-level commands.
>
> So there is absolutely no need for `git merge-tree` to produce diffs.
>
> > The reason is that I think in many cases when there are conflicts, the
> > conflicts will be small and the user will want to see them. So it would
> > be simpler to just have an option to show any conflict right away,
> > rather than have the user launch another command (a diff-tree against
> > which tree and with which options?).
>
> That assumes that server-side merge UIs will present merge conflicts in
> the form of diffs containing merge conflict markers. Which I don't think
> will happen, like, ever.

Please take a look at:

https://docs.gitlab.com/ee/user/project/merge_requests/conflicts.html#resolve-conflicts-in-the-inline-editor

As you can see in the image there are conflict markers in the file
displayed by the server UI.

> In short: I completely disagree that we should introduce a new command,
> and I also completely disagree that the `merge-tree` command should output
> any diffs.
>
> I do agree that we need to be mindful of what we actually need, and in
> that regard, I reiterate that we need to let concrete use cases guide us.
> As part of GitLab, you might be in an excellent position to look at
> GitLab's concrete server-side needs when it comes to use `git merge-tree`
> to perform merges.

I hope I provided a concrete use case with the link above.
Johannes Schindelin Jan. 28, 2022, 12:58 p.m. UTC | #4
Hi Christian,

On Wed, 26 Jan 2022, Christian Couder wrote:

> On Wed, Jan 26, 2022 at 1:03 PM Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
> >
> > On Wed, 26 Jan 2022, Christian Couder wrote:
> >
> > > The reason is that I think in many cases when there are conflicts,
> > > the conflicts will be small and the user will want to see them. So
> > > it would be simpler to just have an option to show any conflict
> > > right away, rather than have the user launch another command (a
> > > diff-tree against which tree and with which options?).
> >
> > That assumes that server-side merge UIs will present merge conflicts in
> > the form of diffs containing merge conflict markers. Which I don't think
> > will happen, like, ever.
>
> Please take a look at:
>
> https://docs.gitlab.com/ee/user/project/merge_requests/conflicts.html#resolve-conflicts-in-the-inline-editor
>
> As you can see in the image there are conflict markers in the file
> displayed by the server UI.

Please note the difference between what I wrote above (present merge
conflicts in the form of diffs containing merge conflict markers) and what
is shown in the document you linked to (present a file annotated with
merge conflict markers).

There is no diff in that page.

What's more: there are not only conflict markers in the editor, there is
clearly a visual marker next to the line number that indicates that the
editor has a fundamental understanding where the conflict markers are.
Which means that the conflict markers must have been generated
independently of Git rather than parsed in some random diff that was
produced by Git.

In other words: you are making my case for me that `git merge-tree` should
not generate diff output because it would not even be used.

> > In short: I completely disagree that we should introduce a new command,
> > and I also completely disagree that the `merge-tree` command should output
> > any diffs.
> >
> > I do agree that we need to be mindful of what we actually need, and in
> > that regard, I reiterate that we need to let concrete use cases guide us.
> > As part of GitLab, you might be in an excellent position to look at
> > GitLab's concrete server-side needs when it comes to use `git merge-tree`
> > to perform merges.
>
> I hope I provided a concrete use case with the link above.

Sorry, I apparently was a bit unclear.

In the context of discussing `git merge-tree`, a low-level Git command,
when I talk about a user, I do not mean a human being, but a program that
calls that command and parses its output.

Corollary: by "use case" I refer to a concrete implementation of a
server-side merge operation, I refer to backend code that currently calls
into libgit2 to perform the merge, and would benefit from calling `git
merge-tree` instead. Such a use case informs us about the type and amount
of information that is required of the code that is currently being
discussed in this mail thread. And I highly doubt that you will find such
a use case that wants libgit2 (and later, `git merge-tree`) to generate
diffs. Because _diffs_ are certainly _not_ what is consumed by the inline
editor you referenced.

Of course, I am still left guessing what the server-side needs concretely,
because that is not at all obvious from the user-facing web site to which
you sent me. What is needed is a good, hard look at the actual _code_, the
code that calls into libgit2 to perform a merge, and that could instead
spawn a `git merge-tree` process to accomplish the same thing.

We need to get away from hypothetical scenarios. They're not helping.

Ciao,
Johannes
Christian Couder Jan. 28, 2022, 1:37 p.m. UTC | #5
Hi Dscho,

On Fri, Jan 28, 2022 at 1:58 PM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>
> Hi Christian,
>
> On Wed, 26 Jan 2022, Christian Couder wrote:
>
> > On Wed, Jan 26, 2022 at 1:03 PM Johannes Schindelin
> > <Johannes.Schindelin@gmx.de> wrote:
> > >
> > > On Wed, 26 Jan 2022, Christian Couder wrote:
> > >
> > > > The reason is that I think in many cases when there are conflicts,
> > > > the conflicts will be small and the user will want to see them. So
> > > > it would be simpler to just have an option to show any conflict
> > > > right away, rather than have the user launch another command (a
> > > > diff-tree against which tree and with which options?).
> > >
> > > That assumes that server-side merge UIs will present merge conflicts in
> > > the form of diffs containing merge conflict markers. Which I don't think
> > > will happen, like, ever.
> >
> > Please take a look at:
> >
> > https://docs.gitlab.com/ee/user/project/merge_requests/conflicts.html#resolve-conflicts-in-the-inline-editor
> >
> > As you can see in the image there are conflict markers in the file
> > displayed by the server UI.
>
> Please note the difference between what I wrote above (present merge
> conflicts in the form of diffs containing merge conflict markers) and what
> is shown in the document you linked to (present a file annotated with
> merge conflict markers).
>
> There is no diff in that page.

The server UI could just get a diff with the conflicts inside instead
of the full files with conflict inside, as the diff would be smaller
to parse than the full files. So even if it's not shown, the diff
could still be useful.

Also just above the section of the link I sent, there is this section

https://docs.gitlab.com/ee/user/project/merge_requests/conflicts.html#resolve-conflicts-in-interactive-mode

where one can see diff markers in the image. There are no conflict
markers in those images, but it's possible that a future UI could
combine both a diff and conflict markers.

Also please note that I don't absolutely require diffs. At the
beginning of the paragraph from my original email that you quoted
above, there was:

"It's not a big issue for me to not include them right now as long as
it's possible to add cli options later that add them."

So I was just saying that the format and code should be flexible
enough to be able to easily accommodate sending further data like
diffs with additional options. I think it's a very reasonable request.
So please don't make it a huge issue. You can always NACK a patch
adding such an option later.

> What's more: there are not only conflict markers in the editor,

You don't see the ">>>>>>>"?

> there is
> clearly a visual marker next to the line number that indicates that the
> editor has a fundamental understanding where the conflict markers are.

Yeah, so this shows that those markers can be important for the editor.

> Which means that the conflict markers must have been generated
> independently of Git rather than parsed in some random diff that was
> produced by Git.

Why couldn't they be generated by Git and then just parsed from a diff
in the future, even if that was not the case here?

> In other words: you are making my case for me that `git merge-tree` should
> not generate diff output because it would not even be used.

The other link above in this email actually shows that diffs are used
right now to resolve conflicts.

> > > In short: I completely disagree that we should introduce a new command,
> > > and I also completely disagree that the `merge-tree` command should output
> > > any diffs.
> > >
> > > I do agree that we need to be mindful of what we actually need, and in
> > > that regard, I reiterate that we need to let concrete use cases guide us.
> > > As part of GitLab, you might be in an excellent position to look at
> > > GitLab's concrete server-side needs when it comes to use `git merge-tree`
> > > to perform merges.
> >
> > I hope I provided a concrete use case with the link above.
>
> Sorry, I apparently was a bit unclear.
>
> In the context of discussing `git merge-tree`, a low-level Git command,
> when I talk about a user, I do not mean a human being, but a program that
> calls that command and parses its output.

So it could very well parse diffs containing conflict markers and show
those conflict markers.

[...]

> Of course, I am still left guessing what the server-side needs concretely,
> because that is not at all obvious from the user-facing web site to which
> you sent me. What is needed is a good, hard look at the actual _code_, the
> code that calls into libgit2 to perform a merge, and that could instead
> spawn a `git merge-tree` process to accomplish the same thing.
>
> We need to get away from hypothetical scenarios. They're not helping.

I am not even asking for a feature, just to make it possible to extend
the output of a brand new command in an RFC with some possibly useful
things, and you are making such requests...

Please relax a bit on this.
Johannes Schindelin Jan. 28, 2022, 4:05 p.m. UTC | #6
Hi Christian,

On Fri, 28 Jan 2022, Christian Couder wrote:

> On Fri, Jan 28, 2022 at 1:58 PM Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
> >
> > On Wed, 26 Jan 2022, Christian Couder wrote:
> >
> > > On Wed, Jan 26, 2022 at 1:03 PM Johannes Schindelin
> > > <Johannes.Schindelin@gmx.de> wrote:
> > > >
> > > > On Wed, 26 Jan 2022, Christian Couder wrote:
> > > >
> > > > > The reason is that I think in many cases when there are conflicts,
> > > > > the conflicts will be small and the user will want to see them. So
> > > > > it would be simpler to just have an option to show any conflict
> > > > > right away, rather than have the user launch another command (a
> > > > > diff-tree against which tree and with which options?).
> > > >
> > > > That assumes that server-side merge UIs will present merge conflicts in
> > > > the form of diffs containing merge conflict markers. Which I don't think
> > > > will happen, like, ever.
> > >
> > > Please take a look at:
> > >
> > > https://docs.gitlab.com/ee/user/project/merge_requests/conflicts.html#resolve-conflicts-in-the-inline-editor
> > >
> > > As you can see in the image there are conflict markers in the file
> > > displayed by the server UI.
> >
> > Please note the difference between what I wrote above (present merge
> > conflicts in the form of diffs containing merge conflict markers) and what
> > is shown in the document you linked to (present a file annotated with
> > merge conflict markers).
> >
> > There is no diff in that page.
>
> The server UI could just get a diff with the conflicts inside instead
> of the full files with conflict inside, as the diff would be smaller
> to parse than the full files. So even if it's not shown, the diff
> could still be useful.

You really need to get away from talking about this in hypothetical terms.

> Also just above the section of the link I sent, there is this section
>
> https://docs.gitlab.com/ee/user/project/merge_requests/conflicts.html#resolve-conflicts-in-interactive-mode
>
> where one can see diff markers in the image.

That's a side-by-side diff. Git cannot even produce those.

> > What's more: there are not only conflict markers in the editor,
>
> You don't see the ">>>>>>>"?

Yes, I do. And not only that. I also see that the editor knows very
specifically where the conflict happens.

And since any file can contain `>>>>>>>` _without_ it being a conflict
marker, the editor most likely does not parse the output of Git that
contains a conflict marker. At least I hope it does not because it would
then very easily be confused by strings that look like conflict markers,
but aren't.

Think about our very own test suite, and why we specifically set
`conflict-marker-size=32` for those files. Same reason why the server
backend cannot simply ingest files with conflict markers and then hope to
figure out which `>>>>>>>` are real conflict markers and which are not.

> > there is clearly a visual marker next to the line number that
> > indicates that the editor has a fundamental understanding where the
> > conflict markers are.
>
> Yeah, so this shows that those markers can be important for the editor.

Of course they are important! That's my point!

> > In other words: you are making my case for me that `git merge-tree` should
> > not generate diff output because it would not even be used.
>
> The other link above in this email actually shows that diffs are used
> right now to resolve conflicts.

It shows that Git was not used to generate the diff, is what it shows.

I see that you are still trying to guess what the server-side needs
actually are. It really is time to stop guessing. So I will keep
challenging you to actually look at the GitLab code, to take a stab at
teaching it to use `git merge-tree` to perform merges. And then to come
back with what you learned. I guarantee you that that will be multiple
times more useful than talking about it in hypotheticals.

And you are in such an almost unique position to contribute to this patch
series, to provide that very valuable feedback how `git merge-tree` could
be improved to support actual, real-life server-side code that is
currently in use! So why not make the most out of it?

Ciao,
Johannes
Johannes Schindelin Jan. 28, 2022, 5 p.m. UTC | #7
Hi Elijah,

On Sat, 22 Jan 2022, Elijah Newren via GitGitGadget wrote:

> Note1: Depends on en/remerge-diff (but no need to pick it up; it's still
> RFC).

For reasons, I will have to backport this on top of v2.33.1 ;-)

> == Updates Log ==
>
> Updates since v2 (thanks to Christian, Dscho, Ramsay, and René for
> suggestions and comments on v2):
>
>  * Significant changes to output format:
>    * Flags no longer take a filename for additional output; they write to
>      stdout instead.
>    * More information included by default when there are conflicts (no need
>      to request it with additional flags, instead flags can be used to
>      suppress it).
>    * Provide (mode, oid, stage, file) tuples -- i.e. ls-files -u style of
>      information -- when there are conflicts. Add a flag to only list
>      conflicted files if that's preferred.
>  * Much more thorough manual for git-merge-tree.txt
>  * Renamed option from --real to --write-tree
>  * Accept an optional --trivial-merge option to get old style merge-tree
>    behavior
>  * Allow both --write-tree and --trivial-merge to be omitted since we can
>    deduce which from number of arguments
>  * Document exit code when the merge cannot be run (so we can distinguish
>    other error cases from conflicts)
>  * testcase cleanups: test_tick, early skip of test when using recursive
>    backend, variable renames, etc.
>  * various minor code cleanups
>  * Add a new --allow-unrelated-histories option (with same meaning as the
>    one used in git merge)
>  * Rebased on top of en/remerge-diff to avoid a small conflict

I am really happy with the way this patch series is going, and hope to be
a lot more active on it in the near future.

I've read through the current iteration and left a few suggestions,
nothing major.

Thank you so much for your outstanding work!
Dscho
Elijah Newren Jan. 29, 2022, 7:03 a.m. UTC | #8
On Wed, Jan 26, 2022 at 12:48 AM Christian Couder
<christian.couder@gmail.com> wrote:
>
> On Sat, Jan 22, 2022 at 10:56 PM Elijah Newren via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>
> > Updates since v2 (thanks to Christian, Dscho, Ramsay, and René for
> > suggestions and comments on v2):
> >
> >  * Significant changes to output format:
> >    * Flags no longer take a filename for additional output; they write to
> >      stdout instead.
> >    * More information included by default when there are conflicts (no need
> >      to request it with additional flags, instead flags can be used to
> >      suppress it).
> >    * Provide (mode, oid, stage, file) tuples -- i.e. ls-files -u style of
> >      information -- when there are conflicts. Add a flag to only list
> >      conflicted files if that's preferred.
>
> The above changes seem good to me.

:-)

> >  * Much more thorough manual for git-merge-tree.txt
> >  * Renamed option from --real to --write-tree
> >  * Accept an optional --trivial-merge option to get old style merge-tree
> >    behavior
> >  * Allow both --write-tree and --trivial-merge to be omitted since we can
> >    deduce which from number of arguments
>
> I still think that it might be simpler and cleaner to leave 'git
> merge-tree' alone for now, and just add a new command named for
> example 'git write-merge-tree'. Later we can always add flags to 'git
> merge-tree' or add 'git trivial-merge-tree' as an alias for 'git
> merge-tree', and eventually slowly switch 'git merge-tree' to mean
> only 'git write-merge-tree' if that's where we want to go.

Understood.  Since we can't immediately kill the old merge-tree, I
don't think there's a perfectly clean solution here, and it's totally
understandable that different folks would have different opinions on
which interim choice would be cleanest.  What you suggest is
reasonable, I just personally prefer the path in this series.

> >  * Document exit code when the merge cannot be run (so we can distinguish
> >    other error cases from conflicts)
> >  * testcase cleanups: test_tick, early skip of test when using recursive
> >    backend, variable renames, etc.
> >  * various minor code cleanups
> >  * Add a new --allow-unrelated-histories option (with same meaning as the
> >    one used in git merge)
>
> The above changes seem good to me too.

Thanks for reading over things carefully and for providing many
detailed, helpful comments!

> > Stuff intentionally NOT included, but which others seemed to feel strongly
> > about; they'd need to convince me more on these:
> >
> >  * Any form of diff output[1]
>
> It's not a big issue for me to not include them right now as long as
> it's possible to add cli options later that add them.

My main concern is just that `merge-tree` remain a low-level tool and
have machine-parseable output.  I was a little worried that both you
and Dscho wanted everything on stdout rather than in separate files,
as the <Informational messages> part of the output is rather
free-form.  But since it's at the end, and has a machine-parseable
beginning, it can just be slurped in and we're all good.  The diff
output raises my eyebrow because I'm worried we're losing this
property.  If there are clear usecases for adding more output, and we
can do so without losing this machine-parseable property, I don't have
a problem with adding an option for it.

One analogy we might use here is that `git merge` provides a diffstat
at the end.  What you're asking is more than a diffstat, but might be
considered similar-ish in nature.

> The reason is
> that I think in many cases when there are conflicts, the conflicts
> will be small and the user will want to see them.

I'm a little worried about the assumption here that conflict size is
measurable and visible via diffs.  That might be true in some cases,
but a UI written with that assumption is going to be very confusing
when hitting cases where that assumption does not hold.  For example:

  * What if there is a binary file conflict, or a modify/delete or
rename/delete conflict, or failed-to-merge submodule conflict, or a
file location conflict? (For these, there is no diff relative to the
first parent and hence this conflict would have no diff output for
it)?
  * What if there was a simple file/directory conflict?  A diff would
show a rename (even when neither side did any renames), but not any
conflict markers.
  * What if there was a rename/rename conflict (both sides renamed
same file differently) or a distinct types conflict?  The former
results in three different conflicting files, none of them with
conflict markers, while the latter results in two different
conflicting files both without conflict markers?  Showing individual
per-file diffs totally loses all context here -- it'll show no-diff
for one of the files, and totally new additions for the ones.

Such a problem statement just seems fraught with edge cases to me, and
suggests that the problem statement might be in need of revisiting.

Don't read this as me closing the door on the possibility of diffs;
I'm not trying to do that.  I'm listing my misgivings about how I
think they might be used (i.e. be careful if you're headed down this
path as you might be digging yourself a never-ending support hole).
You can also think of my comments as feedback to consider and address
when you propose a future feature addition for adding diffs.  If/when
you propose such a feature, we'd probably be able to dive more into
specifics and usecases at that time, which may or may not circumvent
my concerns.

> So it would be
> simpler to just have an option to show any conflict right away, rather
> than have the user launch another command (a diff-tree against which
> tree and with which options?).

Um, this part I'm not sure I get.  I thought the reason for the diffs
was performance -- you knew you wanted the diffs, and you wanted it
done as part of the same process.  But why would this be simpler?
Your patch series included three different diffs, and the emails you
pointed me at suggested all kinds of configurability.  That suggests
the merge-tree command would have to take the exact same options the
user would supply to diff, and thus would have to be told all the same
options, right?  I don't see how this removes any complexity at all
for the user.

Unless...is the request in some way similar to merge's diffstat where
there is always a very specific type of diff that is wanted and you
aren't envisioning much flexibility in what kind of diff or what to
diff against -- is that where the simplification comes from?
Christian Couder Jan. 29, 2022, 8:17 a.m. UTC | #9
On Sat, Jan 29, 2022 at 8:04 AM Elijah Newren <newren@gmail.com> wrote:
>
> On Wed, Jan 26, 2022 at 12:48 AM Christian Couder
> <christian.couder@gmail.com> wrote:
> >
> > On Sat, Jan 22, 2022 at 10:56 PM Elijah Newren via GitGitGadget
> > <gitgitgadget@gmail.com> wrote:

> > > Stuff intentionally NOT included, but which others seemed to feel strongly
> > > about; they'd need to convince me more on these:
> > >
> > >  * Any form of diff output[1]
> >
> > It's not a big issue for me to not include them right now as long as
> > it's possible to add cli options later that add them.
>
> My main concern is just that `merge-tree` remain a low-level tool and
> have machine-parseable output.  I was a little worried that both you
> and Dscho wanted everything on stdout rather than in separate files,
> as the <Informational messages> part of the output is rather
> free-form.  But since it's at the end, and has a machine-parseable
> beginning, it can just be slurped in and we're all good.  The diff
> output raises my eyebrow because I'm worried we're losing this
> property.  If there are clear usecases for adding more output, and we
> can do so without losing this machine-parseable property, I don't have
> a problem with adding an option for it.

That's ok for me for now. I will certainly not work on adding options
for diff output without any usecase.

> One analogy we might use here is that `git merge` provides a diffstat
> at the end.  What you're asking is more than a diffstat, but might be
> considered similar-ish in nature.
>
> > The reason is
> > that I think in many cases when there are conflicts, the conflicts
> > will be small and the user will want to see them.
>
> I'm a little worried about the assumption here that conflict size is
> measurable and visible via diffs.  That might be true in some cases,
> but a UI written with that assumption is going to be very confusing
> when hitting cases where that assumption does not hold.  For example:
>
>   * What if there is a binary file conflict, or a modify/delete or
> rename/delete conflict, or failed-to-merge submodule conflict, or a
> file location conflict? (For these, there is no diff relative to the
> first parent and hence this conflict would have no diff output for
> it)?
>   * What if there was a simple file/directory conflict?  A diff would
> show a rename (even when neither side did any renames), but not any
> conflict markers.
>   * What if there was a rename/rename conflict (both sides renamed
> same file differently) or a distinct types conflict?  The former
> results in three different conflicting files, none of them with
> conflict markers, while the latter results in two different
> conflicting files both without conflict markers?  Showing individual
> per-file diffs totally loses all context here -- it'll show no-diff
> for one of the files, and totally new additions for the ones.

In those cases we just tell users that they cannot resolve those
conflicts in the user interface, see the following doc:

https://docs.gitlab.com/ee/user/project/merge_requests/conflicts.html#conflicts-you-can-resolve-in-the-user-interface

> Such a problem statement just seems fraught with edge cases to me, and
> suggests that the problem statement might be in need of revisiting.

Users understand that some kinds of conflicts cannot yet be resolved
using a user interface. Maybe we will be able to make improvements so
that more kinds of conflicts can be resolved in a UI in the future
though. That's why a flexible and extensible output could help.

> Don't read this as me closing the door on the possibility of diffs;
> I'm not trying to do that.  I'm listing my misgivings about how I
> think they might be used (i.e. be careful if you're headed down this
> path as you might be digging yourself a never-ending support hole).
> You can also think of my comments as feedback to consider and address
> when you propose a future feature addition for adding diffs.  If/when
> you propose such a feature, we'd probably be able to dive more into
> specifics and usecases at that time, which may or may not circumvent
> my concerns.

I know that diffs, or any new single feature, will likely not be a
silver bullet.

> > So it would be
> > simpler to just have an option to show any conflict right away, rather
> > than have the user launch another command (a diff-tree against which
> > tree and with which options?).
>
> Um, this part I'm not sure I get.  I thought the reason for the diffs
> was performance -- you knew you wanted the diffs, and you wanted it
> done as part of the same process.  But why would this be simpler?

In the commit message of 4/12 you show an example of using it in simple scripts:

NEWTREE=$(git merge-tree --write-tree $BRANCH1 $BRANCH2)
test $? -eq 0 || die "There were conflicts..."
...

So I think it would be simpler for someone interested in seeing the
conflicts, like a script writer, or maybe someone using it manually
for example as a dry run before performing a merge, to be able to get
them right away from the command rather than to have to use another
command (which means finding the right command, arguments and options
for that) to get them.

> Your patch series included three different diffs, and the emails you
> pointed me at suggested all kinds of configurability.  That suggests
> the merge-tree command would have to take the exact same options the
> user would supply to diff, and thus would have to be told all the same
> options, right?  I don't see how this removes any complexity at all
> for the user.
>
> Unless...is the request in some way similar to merge's diffstat where
> there is always a very specific type of diff that is wanted and you
> aren't envisioning much flexibility in what kind of diff or what to
> diff against -- is that where the simplification comes from?

Well I just think the default diff output could be tailored for the
most likely usecases and options made available later for more
advanced usecases or users.
Elijah Newren Jan. 29, 2022, 5:43 p.m. UTC | #10
Hi Christian,

On Sat, Jan 29, 2022 at 12:18 AM Christian Couder
<christian.couder@gmail.com> wrote:
>
> On Sat, Jan 29, 2022 at 8:04 AM Elijah Newren <newren@gmail.com> wrote:
> >
> > On Wed, Jan 26, 2022 at 12:48 AM Christian Couder
> > <christian.couder@gmail.com> wrote:
> > >
> > > On Sat, Jan 22, 2022 at 10:56 PM Elijah Newren via GitGitGadget
> > > <gitgitgadget@gmail.com> wrote:
>
> > > > Stuff intentionally NOT included, but which others seemed to feel strongly
> > > > about; they'd need to convince me more on these:
> > > >
> > > >  * Any form of diff output[1]
> > >
> > > It's not a big issue for me to not include them right now as long as
> > > it's possible to add cli options later that add them.
> >
> > My main concern is just that `merge-tree` remain a low-level tool and
> > have machine-parseable output.  I was a little worried that both you
> > and Dscho wanted everything on stdout rather than in separate files,
> > as the <Informational messages> part of the output is rather
> > free-form.  But since it's at the end, and has a machine-parseable
> > beginning, it can just be slurped in and we're all good.  The diff
> > output raises my eyebrow because I'm worried we're losing this
> > property.  If there are clear usecases for adding more output, and we
> > can do so without losing this machine-parseable property, I don't have
> > a problem with adding an option for it.
>
> That's ok for me for now. I will certainly not work on adding options
> for diff output without any usecase.
>
> > One analogy we might use here is that `git merge` provides a diffstat
> > at the end.  What you're asking is more than a diffstat, but might be
> > considered similar-ish in nature.
> >
> > > The reason is
> > > that I think in many cases when there are conflicts, the conflicts
> > > will be small and the user will want to see them.
> >
> > I'm a little worried about the assumption here that conflict size is
> > measurable and visible via diffs.  That might be true in some cases,
> > but a UI written with that assumption is going to be very confusing
> > when hitting cases where that assumption does not hold.  For example:
> >
> >   * What if there is a binary file conflict, or a modify/delete or
> > rename/delete conflict, or failed-to-merge submodule conflict, or a
> > file location conflict? (For these, there is no diff relative to the
> > first parent and hence this conflict would have no diff output for
> > it)?
> >   * What if there was a simple file/directory conflict?  A diff would
> > show a rename (even when neither side did any renames), but not any
> > conflict markers.
> >   * What if there was a rename/rename conflict (both sides renamed
> > same file differently) or a distinct types conflict?  The former
> > results in three different conflicting files, none of them with
> > conflict markers, while the latter results in two different
> > conflicting files both without conflict markers?  Showing individual
> > per-file diffs totally loses all context here -- it'll show no-diff
> > for one of the files, and totally new additions for the ones.
>
> In those cases we just tell users that they cannot resolve those
> conflicts in the user interface, see the following doc:
>
> https://docs.gitlab.com/ee/user/project/merge_requests/conflicts.html#conflicts-you-can-resolve-in-the-user-interface

So...I think you may have just convinced me that my fears were
justified and that I should probably NAK any attempt to add diffs to
the merge-tree command.  I won't jump to conclusions but you've
provided some pretty strong signal to me against going down that
route.  The list of limitations in the link you provide do mostly
avoid the broken cases I listed above, but it enshrines those
limitations on that webpage as fundamental rather than just as current
implementation shortcomings.  You may not be able to remove those
limitations on that webpage without either expunging the diffs from
the UI or exposing the brokenness of the various cases above.

If you do propose a diff option in the future, come prepared to
discuss how you'll avoid accidentally leading others down into paths
with the same fundamental issues, and/or how the above types of
conflicts might still be meaningfully handled.

Also, the list of limitations you have may not be quite comprehensive
enough to avoid all problems (though it certainly avoids most of
them).  Can I ask a couple clarifying questions about your list of
limitations in that link? :

  * When that page says the file cannot already contain conflict
markers, is the check performed on the version of the file in the two
trees being merged, or is the check performed on the 2nd and 3rd index
stage of the merge result (these are not equivalent checks, even if
they often give the same answer)?
  * When that page says the file must already exist in the same path
on both branches, is the check performed on by checking the path in
the two trees being merged, or is the check performed on the 2nd and
3rd index stage of the merge result (again, these are not equivalent
checks)?

> > Such a problem statement just seems fraught with edge cases to me, and
> > suggests that the problem statement might be in need of revisiting.
>
> Users understand that some kinds of conflicts cannot yet be resolved
> using a user interface. Maybe we will be able to make improvements so
> that more kinds of conflicts can be resolved in a UI in the future
> though. That's why a flexible and extensible output could help.

I think future improvements to handle more conflict types may well
hinge on removing the diff-output-using portion of your interface; I
think it may well be fundamentally incompatible.

I agree we want to leave the output format open for extension, I'm
just saying we have to be careful about what extensions are included
and this one worries me.

> > Don't read this as me closing the door on the possibility of diffs;
> > I'm not trying to do that.  I'm listing my misgivings about how I
> > think they might be used (i.e. be careful if you're headed down this
> > path as you might be digging yourself a never-ending support hole).
> > You can also think of my comments as feedback to consider and address
> > when you propose a future feature addition for adding diffs.  If/when
> > you propose such a feature, we'd probably be able to dive more into
> > specifics and usecases at that time, which may or may not circumvent
> > my concerns.
>
> I know that diffs, or any new single feature, will likely not be a
> silver bullet.

Sure that's fair; not being a silver bullet is fine.  We do need to
avoid providing a kryptonite bullet, though.

> > > So it would be
> > > simpler to just have an option to show any conflict right away, rather
> > > than have the user launch another command (a diff-tree against which
> > > tree and with which options?).
> >
> > Um, this part I'm not sure I get.  I thought the reason for the diffs
> > was performance -- you knew you wanted the diffs, and you wanted it
> > done as part of the same process.  But why would this be simpler?
>
> In the commit message of 4/12 you show an example of using it in simple scripts:
>
> NEWTREE=$(git merge-tree --write-tree $BRANCH1 $BRANCH2)
> test $? -eq 0 || die "There were conflicts..."
> ...
>
> So I think it would be simpler for someone interested in seeing the
> conflicts, like a script writer, or maybe someone using it manually
> for example as a dry run before performing a merge, to be able to get
> them right away from the command...

Okay, but the command already does that.  When there are conflicts,
NEWTREE won't actually be a tree; it'll be lots of lines of output.
That's (part of) the reason for the exit status check.  So users
already get that information from the command.

> ...rather than to have to use another
> command (which means finding the right command, arguments and options
> for that) to get them.

As for finding the right arguments and options...

> > Your patch series included three different diffs, and the emails you
> > pointed me at suggested all kinds of configurability.  That suggests
> > the merge-tree command would have to take the exact same options the
> > user would supply to diff, and thus would have to be told all the same
> > options, right?  I don't see how this removes any complexity at all
> > for the user.
> >
> > Unless...is the request in some way similar to merge's diffstat where
> > there is always a very specific type of diff that is wanted and you
> > aren't envisioning much flexibility in what kind of diff or what to
> > diff against -- is that where the simplification comes from?
>
> Well I just think the default diff output could be tailored for the
> most likely usecases and options made available later for more
> advanced usecases or users.

Ah, so this may go against your earlier comments at [1] about a
merge-tree on steroids and a huge array of diff options, or against
your comments about diffs not being provided by default (also [1]).
Because if you have that huge range of diff options, and the diffs are
not provided by default, then it's not clear how you've simplified
things because users would still need to figure out the right
arguments and options to pass, it's just that the user would have to
pass all (or maybe just most?) of those arguments and options to
merge-tree instead of to diff.  Or is the simpler UI you're discussing
really just about not needing to include 1 argument, the name of the
new toplevel tree, since that single argument could be implicit?

I'm having a hard time buying the "simpler UI for script writers"
angle of argument here, especially for script writers who should fully
be able to look up the appropriate commands and use them.  Your
earlier arguments about performance being important (having both the
merge & the diff in the same process) seemed much more convincing to
me.  But maybe I'm still just missing something about your "simpler"
angle?

[1] https://lore.kernel.org/git/CAP8UFD0wKnAg5oyMWchXysPTg3K9Vb4M1tRcPzPE81QM903pYg@mail.gmail.com/
Elijah Newren Jan. 31, 2022, 5:45 p.m. UTC | #11
On Sat, Jan 29, 2022 at 9:43 AM Elijah Newren <newren@gmail.com> wrote:
>
> Hi Christian,
>
> On Sat, Jan 29, 2022 at 12:18 AM Christian Couder
> <christian.couder@gmail.com> wrote:
> >
> > On Sat, Jan 29, 2022 at 8:04 AM Elijah Newren <newren@gmail.com> wrote:
> > >
> > > On Wed, Jan 26, 2022 at 12:48 AM Christian Couder
> > > <christian.couder@gmail.com> wrote:
[...]
> > > > The reason is
> > > > that I think in many cases when there are conflicts, the conflicts
> > > > will be small and the user will want to see them.
> > >
> > > I'm a little worried about the assumption here that conflict size is
> > > measurable and visible via diffs.  That might be true in some cases,
> > > but a UI written with that assumption is going to be very confusing
> > > when hitting cases where that assumption does not hold.  For example:
> > >
> > >   * What if there is a binary file conflict, or a modify/delete or
> > > rename/delete conflict, or failed-to-merge submodule conflict, or a
> > > file location conflict? (For these, there is no diff relative to the
> > > first parent and hence this conflict would have no diff output for
> > > it)?
> > >   * What if there was a simple file/directory conflict?  A diff would
> > > show a rename (even when neither side did any renames), but not any
> > > conflict markers.
> > >   * What if there was a rename/rename conflict (both sides renamed
> > > same file differently) or a distinct types conflict?  The former
> > > results in three different conflicting files, none of them with
> > > conflict markers, while the latter results in two different
> > > conflicting files both without conflict markers?  Showing individual
> > > per-file diffs totally loses all context here -- it'll show no-diff
> > > for one of the files, and totally new additions for the ones.
> >
> > In those cases we just tell users that they cannot resolve those
> > conflicts in the user interface, see the following doc:
> >
> > https://docs.gitlab.com/ee/user/project/merge_requests/conflicts.html#conflicts-you-can-resolve-in-the-user-interface
>
> So...I think you may have just convinced me that my fears were
> justified and that I should probably NAK any attempt to add diffs to
> the merge-tree command.  I won't jump to conclusions but you've
> provided some pretty strong signal to me against going down that
> route.  The list of limitations in the link you provide do mostly
> avoid the broken cases I listed above, but it enshrines those
> limitations on that webpage as fundamental rather than just as current
> implementation shortcomings.  You may not be able to remove those
> limitations on that webpage without either expunging the diffs from
> the UI or exposing the brokenness of the various cases above.
>
> If you do propose a diff option in the future, come prepared to
> discuss how you'll avoid accidentally leading others down into paths
> with the same fundamental issues, and/or how the above types of
> conflicts might still be meaningfully handled.

Actually, after having a few extra days to think about it, I thought
of something that should have been obvious to me, given my other
in-flight series that this depends upon...

If you used the same trick that remerge-diff does to include the
CONFLICT (and related messages) headers in the diff, then the kinds of
conflicts that are normally either invisible or misleading/confusing
to show via a diff would suddenly have the extra notices needed to
explain them, and make this problem tractable.

Further, it'd only make sense to do the special diff as part of the
merge-tree process, since it has the conflict messages strmap needed
to do this.

And there's not all that much work that would be needed to take
advantage of this, especially since this series already depends upon
the remerge-diff series.

So, maybe this is fine after all.

> Also, the list of limitations you have may not be quite comprehensive
> enough to avoid all problems (though it certainly avoids most of
> them).  Can I ask a couple clarifying questions about your list of
> limitations in that link? :
>
>   * When that page says the file cannot already contain conflict
> markers, is the check performed on the version of the file in the two
> trees being merged, or is the check performed on the 2nd and 3rd index
> stage of the merge result (these are not equivalent checks, even if
> they often give the same answer)?
>   * When that page says the file must already exist in the same path
> on both branches, is the check performed on by checking the path in
> the two trees being merged, or is the check performed on the 2nd and
> 3rd index stage of the merge result (again, these are not equivalent
> checks)?

I am still curious about this either way.