[RFC,0/2] Introduce new merge-tree-ort command

Message ID	20220105163324.73369-1-chriscool@tuxfamily.org (mailing list archive)
Headers	show Return-Path: <git-owner@kernel.org> From: Christian Couder <christian.couder@gmail.com> To: git@vger.kernel.org Cc: Junio C Hamano <gitster@pobox.com>, Christian Couder <chriscool@tuxfamily.org>, =?utf-8?b?w4Z2YXIgQXJuZmrDtnI=?= =?utf-8?b?w7AgQmphcm1hc29u?= <avarab@gmail.com>, Taylor Blau <me@ttaylorr.com>, Johannes Schindelin <Johannes.Schindelin@gmx.de>, Elijah Newren <newren@gmail.com> Subject: [RFC PATCH 0/2] Introduce new merge-tree-ort command Date: Wed, 5 Jan 2022 17:33:22 +0100 Message-Id: <20220105163324.73369-1-chriscool@tuxfamily.org> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	Introduce new merge-tree-ort command \| expand [RFC,0/2] Introduce new merge-tree-ort command [RFC,1/2] merge-ort: add new merge-tree-ort command [RFC,2/2] merge-ort: add t/t4310-merge-tree-ort.sh

Christian Couder Jan. 5, 2022, 4:33 p.m. UTC

During the 2nd Virtual Git Contributors’ Summit last October, and even
before, the subject of performing server side merges and rebases came
up, as platforms like GitHub and GitLab would like to support many
features and data formats that libgit2 doesn't support, like for
example SHA256 hashes and partial clone.

It's hard for them to get rid of libgit2 though, because Git itself
doesn't have a good way to support server side merges and rebases,
while libgit2 has ways to perform them. Without server side merges and
rebases, those platforms would have to launch some kind of checkout,
which can be very expensive, before any merge or rebase.

The latest discussions on this topic following the 2nd Virtual
Summit[1] ended with some proposals around a `git merge-tree` on
steroids that could be a good solution to this issue.

The current `git merge-tree` command though seems to have a number of
issues, especially:

  - it's too much related to the old merge recursive strategy which is
    not the default anymore since v2.34.0 and is likely to be
    deprecated over time,

  - it seems to output things in its own special format, which is not
    easy to customize, and which needs special code and logic to parse

To move forward on this, this small RFC patch series introduces a new
`git merge-tree-ort` command with the following design:

  - it uses merge-ort's API as is to perform the merge
  
  - it gets back a tree oid and a cleanliness status from merge-ort's
    API and prints them out first

  - it uses diff's API as is to output changed paths and code

  - the diff API, actually diff_tree_oid() is called 3 times: once for
    the diff versus branch1 ("ours"), once for the diff versus branch2
    ("theirs"), and once for the diff versus the base.

Therefore:

  - its code is very simple and very easy to extend and customize, for
    example by passing diff or merge-ort options that the code would
    just pass on to the merge-ort and diff APIs respectively

  - its output can easily be parsed using simple code and existing
    diff parsers

This of course means that merge-tree-ort's output is not backward
compatible with merge-tree's output, but it doesn't seem that there is
much value in keeping the same output anyway. On the contrary
merge-tree's output is likely to hold us back already.

The first patch in the series adds the new command without any test
and documentation.

The second patch in the series adds a few tests that let us see how
the command's output looks like in different very simple cases.

Of course if this approach is considered valuable, I plan to add some
documentation, more tests and very likely a number of options before
submitting the next iteration.

I am not sure that it's worth showing the 3 diffs (versus branch1,
branch2 and base) by default. Maybe by default no diff at all should
be shown and the command should have --branch1 (or --ours), --branch2
(or --theirs) and --base options to ask for such output, but for an
RFC patch I thought it would be better to output the 3 diffs so that
people get a better idea of the approach this patch series is taking.

[1] https://lore.kernel.org/git/nycvar.QRO.7.76.6.2110211147490.56@tvgsbejvaqbjf.bet/


Christian Couder (2):
  merge-ort: add new merge-tree-ort command
  merge-ort: add t/t4310-merge-tree-ort.sh

 .gitignore                |   1 +
 Makefile                  |   1 +
 builtin.h                 |   1 +
 builtin/merge-tree-ort.c  |  93 ++++++++++++++++++++++
 git.c                     |   1 +
 t/t4310-merge-tree-ort.sh | 162 ++++++++++++++++++++++++++++++++++++++
 6 files changed, 259 insertions(+)
 create mode 100644 builtin/merge-tree-ort.c
 create mode 100755 t/t4310-merge-tree-ort.sh

Elijah Newren Jan. 5, 2022, 4:53 p.m. UTC | #1

On Wed, Jan 5, 2022 at 8:33 AM Christian Couder
<christian.couder@gmail.com> wrote:
>
> During the 2nd Virtual Git Contributors’ Summit last October, and even
> before, the subject of performing server side merges and rebases came
> up, as platforms like GitHub and GitLab would like to support many
> features and data formats that libgit2 doesn't support, like for
> example SHA256 hashes and partial clone.
>
> It's hard for them to get rid of libgit2 though, because Git itself
> doesn't have a good way to support server side merges and rebases,
> while libgit2 has ways to perform them. Without server side merges and
> rebases, those platforms would have to launch some kind of checkout,
> which can be very expensive, before any merge or rebase.
>
> The latest discussions on this topic following the 2nd Virtual
> Summit[1] ended with some proposals around a `git merge-tree` on
> steroids that could be a good solution to this issue.
>
> The current `git merge-tree` command though seems to have a number of
> issues, especially:
>
>   - it's too much related to the old merge recursive strategy which is
>     not the default anymore since v2.34.0 and is likely to be
>     deprecated over time,
>
>   - it seems to output things in its own special format, which is not
>     easy to customize, and which needs special code and logic to parse

I agree we don't want those...but why would new merge-tree options
have to use the old merge strategy or the old output format?

> To move forward on this, this small RFC patch series introduces a new
> `git merge-tree-ort` command with the following design:

Slightly dislike the command name.  `ort` was meant as a temporary,
internal name.  I don't think it's very meaningful to users, so I was
hoping to just make `recursive` mean `ort` after we had enough
testing, and to delete merge-recursive.[ch] at that time.  Then `ort`
merely becomes a historical footnote (...and perhaps part of the name
of the file where the `recursive` algorithm is implemented).

>   - it uses merge-ort's API as is to perform the merge
>
>   - it gets back a tree oid and a cleanliness status from merge-ort's
>     API and prints them out first

Good so far.

>
>   - it uses diff's API as is to output changed paths and code
>
>   - the diff API, actually diff_tree_oid() is called 3 times: once for
>     the diff versus branch1 ("ours"), once for the diff versus branch2
>     ("theirs"), and once for the diff versus the base.

Why?  That seems to be a performance penalty for anyone that doesn't
want/need the diffs, and since we return a tree, a caller can go and
get whatever diffs they like.

> Therefore:
>
>   - its code is very simple and very easy to extend and customize, for
>     example by passing diff or merge-ort options that the code would
>     just pass on to the merge-ort and diff APIs respectively
>
>   - its output can easily be parsed using simple code

These points are good.

>     and existing diff parsers
>
> This of course means that merge-tree-ort's output is not backward
> compatible with merge-tree's output, but it doesn't seem that there is
> much value in keeping the same output anyway. On the contrary
> merge-tree's output is likely to hold us back already.
>
> The first patch in the series adds the new command without any test
> and documentation.
>
> The second patch in the series adds a few tests that let us see how
> the command's output looks like in different very simple cases.
>
> Of course if this approach is considered valuable, I plan to add some
> documentation, more tests and very likely a number of options before
> submitting the next iteration.

Was there something you didn't like about
https://lore.kernel.org/git/pull.1114.git.git.1640927044.gitgitgadget@gmail.com/?

> I am not sure that it's worth showing the 3 diffs (versus branch1,
> branch2 and base) by default. Maybe by default no diff at all should
> be shown and the command should have --branch1 (or --ours), --branch2
> (or --theirs) and --base options to ask for such output, but for an
> RFC patch I thought it would be better to output the 3 diffs so that
> people get a better idea of the approach this patch series is taking.

I think not showing, neither by default or at all would be better.
All three of these are things users could easily generate for
themselves with the tree we return.  I'm curious, though, what's the
usecase for wanting these specific diffs?

Two things you didn't return that users cannot get any other way: (1)
conflict and warning messages, (2) list of conflicted paths.

> [1] https://lore.kernel.org/git/nycvar.QRO.7.76.6.2110211147490.56@tvgsbejvaqbjf.bet/
>
>
> Christian Couder (2):
>   merge-ort: add new merge-tree-ort command
>   merge-ort: add t/t4310-merge-tree-ort.sh
>
>  .gitignore                |   1 +
>  Makefile                  |   1 +
>  builtin.h                 |   1 +
>  builtin/merge-tree-ort.c  |  93 ++++++++++++++++++++++
>  git.c                     |   1 +
>  t/t4310-merge-tree-ort.sh | 162 ++++++++++++++++++++++++++++++++++++++
>  6 files changed, 259 insertions(+)
>  create mode 100644 builtin/merge-tree-ort.c
>  create mode 100755 t/t4310-merge-tree-ort.sh
>
> --
> 2.34.1.433.g7bc349372a.dirty

Elijah Newren Jan. 5, 2022, 5:32 p.m. UTC | #2

On Wed, Jan 5, 2022 at 8:53 AM Elijah Newren <newren@gmail.com> wrote:
>
> On Wed, Jan 5, 2022 at 8:33 AM Christian Couder
> <christian.couder@gmail.com> wrote:
> >
> > During the 2nd Virtual Git Contributors’ Summit last October, and even
> > before, the subject of performing server side merges and rebases came
> > up, as platforms like GitHub and GitLab would like to support many
> > features and data formats that libgit2 doesn't support, like for
> > example SHA256 hashes and partial clone.
> >
...
> > The first patch in the series adds the new command without any test
> > and documentation.
> >
> > The second patch in the series adds a few tests that let us see how
> > the command's output looks like in different very simple cases.
> >
> > Of course if this approach is considered valuable, I plan to add some
> > documentation, more tests and very likely a number of options before
> > submitting the next iteration.
>
> Was there something you didn't like about
> https://lore.kernel.org/git/pull.1114.git.git.1640927044.gitgitgadget@gmail.com/?

Since I already had some fixes queued up for that series based on
feedback from Johannes Altmanninger and Fabian Stelzer, I decided to
submit those so you wouldn't have to stumble over the same problems.
So see https://lore.kernel.org/git/pull.1114.v2.git.git.1641403655.gitgitgadget@gmail.com/
instead (which also now has a pointer to this series in the cover
letter).

Christian Couder Jan. 7, 2022, 5:58 p.m. UTC | #3

On Wed, Jan 5, 2022 at 5:54 PM Elijah Newren <newren@gmail.com> wrote:
>
> On Wed, Jan 5, 2022 at 8:33 AM Christian Couder
> <christian.couder@gmail.com> wrote:

> > The current `git merge-tree` command though seems to have a number of
> > issues, especially:
> >
> >   - it's too much related to the old merge recursive strategy which is
> >     not the default anymore since v2.34.0 and is likely to be
> >     deprecated over time,
> >
> >   - it seems to output things in its own special format, which is not
> >     easy to customize, and which needs special code and logic to parse
>
> I agree we don't want those...but why would new merge-tree options
> have to use the old merge strategy or the old output format?

Yeah, it's not necessary if there are 2 separate modes, a "real" mode
(like what you implemented with --real in your recent patch series)
and a "trivial" mode (which is the name you give to the old code).

Adding modes like this to a command is likely to make the command and
its documentation more difficult to understand though. For example I
think that we created `git switch` because the different modes of `git
checkout` made the command hard to learn.

I gave other reasons in [1] why I prefer a separate command.

[1] https://lore.kernel.org/git/CAP8UFD1LgfZ0MT9=cMvxCcox++_MBBhWG9Twf42cMiXL42AdpQ@mail.gmail.com/

> > To move forward on this, this small RFC patch series introduces a new
> > `git merge-tree-ort` command with the following design:
>
> Slightly dislike the command name.

I am ok with changing the command name.

> `ort` was meant as a temporary,
> internal name.  I don't think it's very meaningful to users, so I was
> hoping to just make `recursive` mean `ort` after we had enough
> testing, and to delete merge-recursive.[ch] at that time.  Then `ort`
> merely becomes a historical footnote (...and perhaps part of the name
> of the file where the `recursive` algorithm is implemented).

I think something similar could still be done with `git
merge-tree-ort` or whatever name we give to this command. For example
we could first add --ort to `git merge-tree` and make it call `git
merge-tree-ort`, then after some time make --ort the default, then
after some more time remove `git merge-tree-ort`. And whenever we make
those changes we could also rename the builtin/merge-tree*.{h,c}
accordingly.

> >   - it uses merge-ort's API as is to perform the merge
> >
> >   - it gets back a tree oid and a cleanliness status from merge-ort's
> >     API and prints them out first
>
> Good so far.
>
> >   - it uses diff's API as is to output changed paths and code
> >
> >   - the diff API, actually diff_tree_oid() is called 3 times: once for
> >     the diff versus branch1 ("ours"), once for the diff versus branch2
> >     ("theirs"), and once for the diff versus the base.
>
> Why?  That seems to be a performance penalty for anyone that doesn't
> want/need the diffs, and since we return a tree, a caller can go and
> get whatever diffs they like.

I say somewhere else that I am planning to add options to disable this
or parts of this diff output.

I think it's still interesting for the command to be able to output
diffs, especially diffs of conflicting parts. In [2] Ævar said:

=> I.e. I'm not the first or last to have (not for anything serious)
=> implement a dry-run bare-repo merge with something like:
=>
=>     git merge-tree origin/master git-for-windows/main origin/seen >diff
=>     # Better regex needed, but basically this
=>     grep "^\+<<<<<<< \.our$" diff && conflict=t

Also `git merge-tree` currently outputs diffs, so I thought it would
be sad if the new command couldn't do it.

[2] https://lore.kernel.org/git/211109.861r3qdpt8.gmgdl@evledraar.gmail.com/

> > Therefore:
> >
> >   - its code is very simple and very easy to extend and customize, for
> >     example by passing diff or merge-ort options that the code would
> >     just pass on to the merge-ort and diff APIs respectively
> >
> >   - its output can easily be parsed using simple code
>
> These points are good.
>
> >     and existing diff parsers
> >
> > This of course means that merge-tree-ort's output is not backward
> > compatible with merge-tree's output, but it doesn't seem that there is
> > much value in keeping the same output anyway. On the contrary
> > merge-tree's output is likely to hold us back already.
> >
> > The first patch in the series adds the new command without any test
> > and documentation.
> >
> > The second patch in the series adds a few tests that let us see how
> > the command's output looks like in different very simple cases.
> >
> > Of course if this approach is considered valuable, I plan to add some
> > documentation, more tests and very likely a number of options before
> > submitting the next iteration.
>
> Was there something you didn't like about
> https://lore.kernel.org/git/pull.1114.git.git.1640927044.gitgitgadget@gmail.com/?

I was having a vacation at the time and even though I skimmed the
mailing list, I missed it. Sorry.

Also I thought that you might not be interested in this anymore as you
didn't reply to [1] and [2] where Ævar and I both said that we were
interested in your opinion on a git merge-tree on steroids.

> > I am not sure that it's worth showing the 3 diffs (versus branch1,
> > branch2 and base) by default. Maybe by default no diff at all should
> > be shown and the command should have --branch1 (or --ours), --branch2
> > (or --theirs) and --base options to ask for such output, but for an
> > RFC patch I thought it would be better to output the 3 diffs so that
> > people get a better idea of the approach this patch series is taking.
>
> I think not showing, neither by default

I am ok with not showing them by default.

> or at all would be better.
> All three of these are things users could easily generate for
> themselves with the tree we return.  I'm curious, though, what's the
> usecase for wanting these specific diffs?

I think I replied to this above.

> Two things you didn't return that users cannot get any other way: (1)
> conflict and warning messages, (2) list of conflicted paths.

Yeah, I wasn't sure how they could be returned with the merge-ort (or
maybe diff) API, and I thought that the current `git merge-tree`
doesn't report them, so I was aiming for something roughly just as
powerful as the current `git merge-tree`.

Elijah Newren Jan. 7, 2022, 7:06 p.m. UTC | #4

On Fri, Jan 7, 2022 at 9:58 AM Christian Couder
<christian.couder@gmail.com> wrote:
>
> On Wed, Jan 5, 2022 at 5:54 PM Elijah Newren <newren@gmail.com> wrote:
> >
> > On Wed, Jan 5, 2022 at 8:33 AM Christian Couder
> > <christian.couder@gmail.com> wrote:
>
> > > The current `git merge-tree` command though seems to have a number of
> > > issues, especially:
> > >
> > >   - it's too much related to the old merge recursive strategy which is
> > >     not the default anymore since v2.34.0 and is likely to be
> > >     deprecated over time,
> > >
> > >   - it seems to output things in its own special format, which is not
> > >     easy to customize, and which needs special code and logic to parse
> >
> > I agree we don't want those...but why would new merge-tree options
> > have to use the old merge strategy or the old output format?
>
> Yeah, it's not necessary if there are 2 separate modes, a "real" mode
> (like what you implemented with --real in your recent patch series)
> and a "trivial" mode (which is the name you give to the old code).
>
> Adding modes like this to a command is likely to make the command and
> its documentation more difficult to understand though. For example I
> think that we created `git switch` because the different modes of `git
> checkout` made the command hard to learn.
>
> I gave other reasons in [1] why I prefer a separate command.
>
> [1] https://lore.kernel.org/git/CAP8UFD1LgfZ0MT9=cMvxCcox++_MBBhWG9Twf42cMiXL42AdpQ@mail.gmail.com/

I can see where you're coming from, but I think the particular
comparison you use isn't quite apples to apples.  `git checkout` has a
"change branches" mode and a "revert specific paths" mode.  While
those have significant implementation overlap, they seem like
different concepts to users.  For merge-tree, the implementation is
completely orthogonal between the two modes, but the concept is
basically the same from the user viewpoint.  Yes, the output differs
in the two merge-tree modes, but command line arguments are often used
to tweak the output (much like diff has different output styles based
on various flags).  In fact, in [1] where you suggest a new command
you suggest that it should have "roughly the same features as git
merge-tree and a similar interface".  To me, that suggests that the
two may be good candidates for being similar commands.

That said, I'm not against a new command.  Personally, my main reason
for using merge-tree wasn't because that's where I thought it was best
to put it, but just that (IIRC) each of Dscho, Peff, and Junio
suggested that location.

My biggest gripe was just the command name...

> > > To move forward on this, this small RFC patch series introduces a new
> > > `git merge-tree-ort` command with the following design:
> >
> > Slightly dislike the command name.
>
> I am ok with changing the command name.

:-)

> > `ort` was meant as a temporary,
> > internal name.  I don't think it's very meaningful to users, so I was
> > hoping to just make `recursive` mean `ort` after we had enough
> > testing, and to delete merge-recursive.[ch] at that time.  Then `ort`
> > merely becomes a historical footnote (...and perhaps part of the name
> > of the file where the `recursive` algorithm is implemented).
>
> I think something similar could still be done with `git
> merge-tree-ort` or whatever name we give to this command. For example
> we could first add --ort to `git merge-tree` and make it call `git
> merge-tree-ort`, then after some time make --ort the default, then
> after some more time remove `git merge-tree-ort`. And whenever we make
> those changes we could also rename the builtin/merge-tree*.{h,c}
> accordingly.

User-facing names tend to take a while to remove; I'd rather start
with user-facing command and option names that are meaningful to users
in terms of what they want to accomplish.  I don't think `ort`
qualifies as such; I'd rather `ort` only be used by expert users (e.g.
folks wanting to test out a new merge algorithm they heard about
before it became the default).

> > >   - it uses merge-ort's API as is to perform the merge
> > >
> > >   - it gets back a tree oid and a cleanliness status from merge-ort's
> > >     API and prints them out first
> >
> > Good so far.
> >
> > >   - it uses diff's API as is to output changed paths and code
> > >
> > >   - the diff API, actually diff_tree_oid() is called 3 times: once for
> > >     the diff versus branch1 ("ours"), once for the diff versus branch2
> > >     ("theirs"), and once for the diff versus the base.
> >
> > Why?  That seems to be a performance penalty for anyone that doesn't
> > want/need the diffs, and since we return a tree, a caller can go and
> > get whatever diffs they like.
>
> I say somewhere else that I am planning to add options to disable this
> or parts of this diff output.
>
> I think it's still interesting for the command to be able to output
> diffs, especially diffs of conflicting parts.

Are you presuming that you can output a diff of conflicting parts?  A
diff on a modify/delete or can't-merge-binary-files won't show
anything, and won't provide any notification that there even was a
conflict.

> In [2] Ævar said:
>
> => I.e. I'm not the first or last to have (not for anything serious)
> => implement a dry-run bare-repo merge with something like:
> =>
> =>     git merge-tree origin/master git-for-windows/main origin/seen >diff
> =>     # Better regex needed, but basically this
> =>     grep "^\+<<<<<<< \.our$" diff && conflict=t

This code is pretty hacky; it presumes that content conflicts are the
only type of conflict.  Inability to merge binary files,
file/directory conflicts, mode conflicts, modify/delete conflicts,
various rename conflicts, etc. are all other types.  I'd rather not
encourage such hacks.

Returning whether there were conflicts in the exit status is far
cleaner.  Providing some mechanism for getting a list of files which
had conflicts may also be useful, but both of these are things where a
diff is the wrong tool.

> Also `git merge-tree` currently outputs diffs, so I thought it would
> be sad if the new command couldn't do it.
>
> [2] https://lore.kernel.org/git/211109.861r3qdpt8.gmgdl@evledraar.gmail.com/

Hmm, I had a totally different opinion.  I felt the diffs in the
current merge-tree was a hack to deal with the fact that they didn't
have good tools to make use of the results, and ended up providing a
Rube-Goldberg scheme to do anything useful with it.

Providing a tree is a concrete and easily usable object for end users.
They can feed that tree to other git commands to do additional things,
and obviates the need for the Rube-Goldberg contraption.

> > Was there something you didn't like about
> > https://lore.kernel.org/git/pull.1114.git.git.1640927044.gitgitgadget@gmail.com/?
>
> I was having a vacation at the time and even though I skimmed the
> mailing list, I missed it. Sorry.

Ah, gotcha.  I was just curious if there was something you felt was
wrong and you wanted to propose an alternative.

> Also I thought that you might not be interested in this anymore as you
> didn't reply to [1] and [2] where Ævar and I both said that we were
> interested in your opinion on a git merge-tree on steroids.

Sorry,  I saw Ævar's email and noticed various assumptions I felt were
problematic, but noticed he said he was just spitballing (which is
totally fair).  I figured that rather than attempt to explain each
detail, it'd be better to update my code and provide it...and I then
ignored the rest of the thread and missed your question.  Sorry.  I
certainly meant to update this series and send it out earlier, but
there were multiple other series of mine that were also backed up and
several series I needed to review, so it took me a while to clean out
the queue a bit.

Johannes Schindelin Jan. 7, 2022, 7:54 p.m. UTC | #5

Hi Christian,

On Fri, 7 Jan 2022, Christian Couder wrote:

> On Wed, Jan 5, 2022 at 5:54 PM Elijah Newren <newren@gmail.com> wrote:
> >
> > On Wed, Jan 5, 2022 at 8:33 AM Christian Couder
> > <christian.couder@gmail.com> wrote:
>
> > > To move forward on this, this small RFC patch series introduces a
> > > new `git merge-tree-ort` command with the following design:
> >
> > Slightly dislike the command name.
>
> I am ok with changing the command name.

Or just going with `merge-tree`?

That command name has the distinct advantage of _already_ being used by
Git. So there is no chance for users to have a `git-merge-tree` script
lying around that all of a sudden would stop working because a superseding
built-in using the same name is introduced.

That is a distinct advantage of using the existing command for the new
code flow, even if the command name could be interpreted as misleading. Of
course, the name could also be construed to be on point: it merges, and it
produces a tree, hence "merge-tree".

> > >   - the diff API, actually diff_tree_oid() is called 3 times: once for
> > >     the diff versus branch1 ("ours"), once for the diff versus branch2
> > >     ("theirs"), and once for the diff versus the base.
> >
> > Why?  That seems to be a performance penalty for anyone that doesn't
> > want/need the diffs, and since we return a tree, a caller can go and
> > get whatever diffs they like.
>
> I say somewhere else that I am planning to add options to disable this
> or parts of this diff output.

Well, now you got me really curious. Since you are listed as part of
GitLab (https://about.gitlab.com/company/team/#chriscool), I assume that
you are pretty familiar with how merges are done on GitLab's server side,
and plan on experimenting with your own work to use `merge-ort` on
GitLab's servers.

Which makes me wonder where that idea comes from to use the diff API?

What operations does GitLab need from that Git command, which code flows
need to be supported (and what inputs/outputs are there)?

> I think it's still interesting for the command to be able to output
> diffs, especially diffs of conflicting parts. In [2] Ævar said:
>
> => I.e. I'm not the first or last to have (not for anything serious)
> => implement a dry-run bare-repo merge with something like:
> =>
> =>     git merge-tree origin/master git-for-windows/main origin/seen >diff
> =>     # Better regex needed, but basically this
> =>     grep "^\+<<<<<<< \.our$" diff && conflict=t
>
> Also `git merge-tree` currently outputs diffs, so I thought it would
> be sad if the new command couldn't do it.
>
> [2] https://lore.kernel.org/git/211109.861r3qdpt8.gmgdl@evledraar.gmail.com/

But `git merge-tree` was only meant as a proof-of-concept, intended to
entice other developers to turn it into something useful. As it took
_this_ long, we can safely assume that the user interface of `merge-tree`
left a bit of a room for improvement. Wouldn't you agree?

> Also I thought that you might not be interested in this anymore as you
> didn't reply to [1] and [2] where Ævar and I both said that we were
> interested in your opinion on a git merge-tree on steroids.

Elijah stated at the time that he was scarce on time for Git.

But back to the topic: have you played with using `merge-ort` in GitLab
already? What were the needs you saw that would be required of the Git
command, what flows would it have to support?

Ciao,
Johannes

Johannes Schindelin Jan. 10, 2022, 1:49 p.m. UTC | #6

Hi Elijah,

On Fri, 7 Jan 2022, Elijah Newren wrote:

> On Fri, Jan 7, 2022 at 9:58 AM Christian Couder
> <christian.couder@gmail.com> wrote:
> >
> > On Wed, Jan 5, 2022 at 5:54 PM Elijah Newren <newren@gmail.com> wrote:
> > >
> > > On Wed, Jan 5, 2022 at 8:33 AM Christian Couder
> > > <christian.couder@gmail.com> wrote:
> >
> > > > The current `git merge-tree` command though seems to have a number of
> > > > issues, especially:
> > > >
> > > >   - it's too much related to the old merge recursive strategy which is
> > > >     not the default anymore since v2.34.0 and is likely to be
> > > >     deprecated over time,
> > > >
> > > >   - it seems to output things in its own special format, which is not
> > > >     easy to customize, and which needs special code and logic to parse
> > >
> > > I agree we don't want those...but why would new merge-tree options
> > > have to use the old merge strategy or the old output format?
> >
> > Yeah, it's not necessary if there are 2 separate modes, a "real" mode
> > (like what you implemented with --real in your recent patch series)
> > and a "trivial" mode (which is the name you give to the old code).
> >
> > Adding modes like this to a command is likely to make the command and
> > its documentation more difficult to understand though. For example I
> > think that we created `git switch` because the different modes of `git
> > checkout` made the command hard to learn.
> >
> > I gave other reasons in [1] why I prefer a separate command.
> >
> > [1] https://lore.kernel.org/git/CAP8UFD1LgfZ0MT9=cMvxCcox++_MBBhWG9Twf42cMiXL42AdpQ@mail.gmail.com/
>
> I can see where you're coming from, but I think the particular
> comparison you use isn't quite apples to apples.  `git checkout` has a
> "change branches" mode and a "revert specific paths" mode.  While
> those have significant implementation overlap, they seem like
> different concepts to users.  For merge-tree, the implementation is
> completely orthogonal between the two modes, but the concept is
> basically the same from the user viewpoint.  Yes, the output differs
> in the two merge-tree modes, but command line arguments are often used
> to tweak the output (much like diff has different output styles based
> on various flags).  In fact, in [1] where you suggest a new command
> you suggest that it should have "roughly the same features as git
> merge-tree and a similar interface".  To me, that suggests that the
> two may be good candidates for being similar commands.
>
> That said, I'm not against a new command.  Personally, my main reason
> for using merge-tree wasn't because that's where I thought it was best
> to put it, but just that (IIRC) each of Dscho, Peff, and Junio
> suggested that location.
>
> My biggest gripe was just the command name...

I am against a new command for what essentially serves the original
purpose of `merge-tree`.

The fact that `merge-tree` has not seen any work in almost 12 years is
testament not only to how hard it was to disentangle the work-tree
requirement from the recursive merge (it is one of my favorite
counterexamples when anybody claims that you can easily prototype code in
a script and then convert it to C), but the fact that there is no user
within Git itself (apart from t/t4300-merge-tree.sh, which does not count)
speaks volumes about the design of that `merge-tree` tool.

So it's only fair to breathe life into it by letting it do what it was
meant to do all along.

> > Also `git merge-tree` currently outputs diffs, so I thought it would
> > be sad if the new command couldn't do it.
> >
> > [2] https://lore.kernel.org/git/211109.861r3qdpt8.gmgdl@evledraar.gmail.com/
>
> Hmm, I had a totally different opinion.  I felt the diffs in the
> current merge-tree was a hack to deal with the fact that they didn't
> have good tools to make use of the results, and ended up providing a
> Rube-Goldberg scheme to do anything useful with it.

Indeed. When I had a look how libgit2 is used to perform a server-side
merge, I saw how careful the code is not to produce anything unneeded. And
since the `merge` operation is performed a ton of times even without any
user interaction, producing a diff is the _least_ of said code's concerns.

> Providing a tree is a concrete and easily usable object for end users.
> They can feed that tree to other git commands to do additional things,
> and obviates the need for the Rube-Goldberg contraption.

I strongly concur.

So I would like to reiterate my challenge to you, Christian: could you
have a look at the server-side merge of GitLab, and see what it actually
would need of the `git merge-tree` command?

I _bet_ it needs first and foremost the information "is this mergeable?".

That is by far the most common question the code I saw had to answer,
without any follow-up questions asked.

An add-on question is then "which files/directories conflicted?". And
there, it really only wanted the file names, along with the OIDs of the
respective item in the base, the HEAD and the merge branch.

In my work in December, I also had to implement another thing that I think
we will have to implement in `merge-tree` in one form or another: when
users provided merge conflict resolutions via the web UI, we want the
merge to succeed. What I implemented was this (in a nutshell, a way to
provide file names with associated blob OIDs that resolve the content
conflicts):

-- snip --
Author: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Date:   Thu Dec 16 20:46:35 2021 +0100
Subject: merge-ort: allow pre-resolving merge conflicts

One of merge-ort's particular strengths is that it works well even in a
worktree-less setting, e.g. in a backend for a server-side merge.

In such a scenario, it is conceivable that the merge in question not
only results in a conflict, but that the caller figures out some sort of
resolution before calling the merge again. The second call, of course,
is meant to apply the merge conflict resolution.

With this commit, we allow just that. The new, optional
`pre_resolved_conflicts` field of `struct merge_options` maps paths to
the blob OID of the resolution.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>

diff --git a/merge-ort.c b/merge-ort.c
index a74d4251c3..fa59ce2f97 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -3961,6 +3961,7 @@ static void process_entries(struct merge_options *opt,
 	prefetch_for_content_merges(opt, &plist);
 	for (entry = &plist.items[plist.nr-1]; entry >= plist.items; --entry) {
 		char *path = entry->string;
+		struct object_id *resolved_oid;
 		/*
 		 * NOTE: mi may actually be a pointer to a conflict_info, but
 		 * we have to check mi->clean first to see if it's safe to
@@ -3972,7 +3973,17 @@ static void process_entries(struct merge_options *opt,
 					  &dir_metadata);
 		if (mi->clean)
 			record_entry_for_tree(&dir_metadata, path, mi);
-		else {
+		else if (opt->pre_resolved_conflicts &&
+			 (resolved_oid =
+			  strmap_get(opt->pre_resolved_conflicts, path))) {
+			mi->clean = 1;
+			mi->is_null = 0;
+			memcpy(&mi->result.oid, resolved_oid,
+			       sizeof(*resolved_oid));
+			if (!mi->result.mode)
+				mi->result.mode = 0100644;
+			record_entry_for_tree(&dir_metadata, path, mi);
+		} else {
 			struct conflict_info *ci = (struct conflict_info *)mi;
 			process_entry(opt, path, ci, &dir_metadata);
 		}
diff --git a/merge-recursive.h b/merge-recursive.h
index 0795a1d3ec..1f45effdd0 100644
--- a/merge-recursive.h
+++ b/merge-recursive.h
@@ -47,6 +47,13 @@ struct merge_options {
 	const char *subtree_shift;
 	unsigned renormalize : 1;

+	/*
+	 * (merge-ORT only) If non-NULL, contains a map from `path` to OID. If
+	 * the given `path would be marked as conflict, it is instead resolved
+	 * to the specified blob.
+	 */
+	struct strmap *pre_resolved_conflicts;
+
 	/* internal fields used by the implementation */
 	struct merge_options_internal *priv;
 };
-- snap --

It is a proof-of-concept, therefore it expects the resolved conflicts to
be in _files_. I don't think that there is a way to reasonably handle
symlink target conflicts or directory/file/symlink conflicts, but there
might be.

A subsequent commit changed my hacky `merge-tree` hack to optionally
accept NUL-terminated merge conflict resolutions in <path>/<OID> pairs via
stdin (again, avoiding files to be written where we do not _need_ spend
I/O unnecessarily).

The biggest problem we faced at the Contributor Summit was that our
discussion was not informed by the actual server-side needs. So I would
like to reiterate my challenge to make that the driver. Let's not use any
hypothetical scenario as the basis for the design of `git merge-tree`, but
let's use the concrete requirements of actual server-side merges that are
currently implemented using libgit2, when trying to figure out what
functionality we need from `merge-tree`, and in what shape.

Here is an initial list:

- need to determine whether a merge will succeed, quickly

- need the tree OID for a successful merge

- need the list of items that conflict, along with OIDs and modes, if the
  merge failed

- need a way to provide merge conflict resolutions

And now that I wrote all this, I realize I should have sent it to the
`merge-tree` thread, not the `merge-tree-ort` thread...

Ciao,
Dscho

Junio C Hamano Jan. 10, 2022, 5:56 p.m. UTC | #7

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> I am against a new command for what essentially serves the original
> purpose of `merge-tree`.
>
> The fact that `merge-tree` has not seen any work in almost 12 years is
> testament not only to how hard it was to disentangle the work-tree
> requirement from the recursive merge (it is one of my favorite
> counterexamples when anybody claims that you can easily prototype code in
> a script and then convert it to C), but the fact that there is no user
> within Git itself (apart from t/t4300-merge-tree.sh, which does not count)
> speaks volumes about the design of that `merge-tree` tool.
>
> So it's only fair to breathe life into it by letting it do what it was
> meant to do all along.

My "Yup" would not weigh as much as one that Linus (whose original
merge-tree survived this long without seeing much enhancements)
might give us, but he is busy elsewhere so you guys have to live
with mine ;-)

As to its output, I do agree that "we give a tree when it is already
usable to record in a new commit" is a valuable option to have.  The
original behaviour should be made available somehow, for those who
built their workflow (including scripts) around it, though.

Elijah Newren Jan. 10, 2022, 5:59 p.m. UTC | #8

On Mon, Jan 10, 2022 at 5:49 AM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>
> Hi Elijah,
>
> On Fri, 7 Jan 2022, Elijah Newren wrote:
>
> > On Fri, Jan 7, 2022 at 9:58 AM Christian Couder
> > <christian.couder@gmail.com> wrote:
> > >
> > > On Wed, Jan 5, 2022 at 5:54 PM Elijah Newren <newren@gmail.com> wrote:
> > > >
> > > > On Wed, Jan 5, 2022 at 8:33 AM Christian Couder
> > > > <christian.couder@gmail.com> wrote:
> > >
...
>
> I _bet_ it needs first and foremost the information "is this mergeable?".
>
> That is by far the most common question the code I saw had to answer,
> without any follow-up questions asked.
>
> An add-on question is then "which files/directories conflicted?". And
> there, it really only wanted the file names, along with the OIDs of the
> respective item in the base, the HEAD and the merge branch.

This might be difficult to provide for some cases, e.g.
rename/rename(1to2), especially if that conflict gets entangled with
any others.  Users would also have difficulty gaining these even using
the command line `git merge` (with either recursive or ort) for these
cases.

Also, does this presume three OIDs?  What about the cases where there
are 4 or 6 for a given path?

I'm a bit worried about the assumptions underlying this request, and
that it might not be satisfiable in general depending on what those
assumptions are.

> In my work in December, I also had to implement another thing that I think
> we will have to implement in `merge-tree` in one form or another: when
> users provided merge conflict resolutions via the web UI, we want the
> merge to succeed. What I implemented was this (in a nutshell, a way to
> provide file names with associated blob OIDs that resolve the content
> conflicts):

Interesting.  I'm curious, though -- if they are providing conflict
resolutions, then would they not have had a previous merge to find out
what the conflicts were?  If so, wouldn't they be able to provide the
tree to avoid the need for a second merge?

>
> -- snip --
> Author: Johannes Schindelin <Johannes.Schindelin@gmx.de>
> Date:   Thu Dec 16 20:46:35 2021 +0100
> Subject: merge-ort: allow pre-resolving merge conflicts
>
> One of merge-ort's particular strengths is that it works well even in a
> worktree-less setting, e.g. in a backend for a server-side merge.
>
> In such a scenario, it is conceivable that the merge in question not
> only results in a conflict, but that the caller figures out some sort of
> resolution before calling the merge again. The second call, of course,
> is meant to apply the merge conflict resolution.
>
> With this commit, we allow just that. The new, optional
> `pre_resolved_conflicts` field of `struct merge_options` maps paths to
> the blob OID of the resolution.
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> diff --git a/merge-ort.c b/merge-ort.c
> index a74d4251c3..fa59ce2f97 100644
> --- a/merge-ort.c
> +++ b/merge-ort.c
> @@ -3961,6 +3961,7 @@ static void process_entries(struct merge_options *opt,
>         prefetch_for_content_merges(opt, &plist);
>         for (entry = &plist.items[plist.nr-1]; entry >= plist.items; --entry) {
>                 char *path = entry->string;
> +               struct object_id *resolved_oid;
>                 /*
>                  * NOTE: mi may actually be a pointer to a conflict_info, but
>                  * we have to check mi->clean first to see if it's safe to
> @@ -3972,7 +3973,17 @@ static void process_entries(struct merge_options *opt,
>                                           &dir_metadata);
>                 if (mi->clean)
>                         record_entry_for_tree(&dir_metadata, path, mi);
> -               else {
> +               else if (opt->pre_resolved_conflicts &&
> +                        (resolved_oid =
> +                         strmap_get(opt->pre_resolved_conflicts, path))) {

You have a couple problematic assumptions here:

  * What if the user's resolution required fixing a semantic conflict,
meaning they needed to modify a file that had no conflicts?  Your code
here would ignore those, because the clean case is handled above.

  * What if the user's resolution required adding a totally new file
(either because a rename is handled as a separate add/delete, or they
just needed a new file)?  The loop above isn't over items in
pre_resolved_conflicts, it just assumes that items in
pre_resolved_conflicts are also in the plist.items being looped over.

> +                       mi->clean = 1;
> +                       mi->is_null = 0;
> +                       memcpy(&mi->result.oid, resolved_oid,
> +                              sizeof(*resolved_oid));

And here's another:

  * What if the provided resolution was a deletion of the file in
question (especially after e.g. a modify/delete or rename/delete
conflict)?  Don't you need to have a special check for the zero_oid
here?

And if I combine the things above:

  * What if the user wants to remove a file and add a directory of
files in its place at some location in order to resolve things?
Granted, you didn't handle either deletes or new files above, but if
you did add both, then you might have this issue.  The code at this
point used some very carefully constructed logic and order of walking
items specifically to handle file/delete conflicts correctly.  I'm
worried that your conflict resolution stuff here might interact badly
with that.

> +                       if (!mi->result.mode)
> +                               mi->result.mode = 0100644;

How do you know it's not supposed to be 0100755?

> +                       record_entry_for_tree(&dir_metadata, path, mi);
> +               } else {
>                         struct conflict_info *ci = (struct conflict_info *)mi;
>                         process_entry(opt, path, ci, &dir_metadata);
>                 }
> diff --git a/merge-recursive.h b/merge-recursive.h
> index 0795a1d3ec..1f45effdd0 100644
> --- a/merge-recursive.h
> +++ b/merge-recursive.h
> @@ -47,6 +47,13 @@ struct merge_options {
>         const char *subtree_shift;
>         unsigned renormalize : 1;
>
> +       /*
> +        * (merge-ORT only) If non-NULL, contains a map from `path` to OID. If
> +        * the given `path would be marked as conflict, it is instead resolved
> +        * to the specified blob.
> +        */
> +       struct strmap *pre_resolved_conflicts;
> +
>         /* internal fields used by the implementation */
>         struct merge_options_internal *priv;
>  };
> -- snap --
>
> It is a proof-of-concept

Fair enough.

>, therefore it expects the resolved conflicts to
> be in _files_. I don't think that there is a way to reasonably handle
> symlink target conflicts or directory/file/symlink conflicts, but there
> might be.

You really need (mode,oid) pairs to be provided by the user.  That
fixes the executable issue I mentioned above, and makes it clear how
to handle symlinks and file/symlink conflicts.

directory/file are still handled by providing individual files, but
ordering traversal becomes really tricky.  The directory/file case
might even require the pre_resolved_conflicts to be pulled out of that
loop somehow.  It'd take some investigative work, or some deep
thought, or both.

> A subsequent commit changed my hacky `merge-tree` hack to optionally
> accept NUL-terminated merge conflict resolutions in <path>/<OID> pairs via
> stdin (again, avoiding files to be written where we do not _need_ spend
> I/O unnecessarily).
>
> The biggest problem we faced at the Contributor Summit was that our
> discussion was not informed by the actual server-side needs. So I would
> like to reiterate my challenge to make that the driver. Let's not use any
> hypothetical scenario as the basis for the design of `git merge-tree`, but
> let's use the concrete requirements of actual server-side merges that are
> currently implemented using libgit2, when trying to figure out what
> functionality we need from `merge-tree`, and in what shape.
>
> Here is an initial list:
>
> - need to determine whether a merge will succeed, quickly
>
> - need the tree OID for a successful merge
>
> - need the list of items that conflict...

I think my version is good up to here.

>  , along with OIDs and modes, if the merge failed

Could you clarify what you mean here by OIDs and modes?  For a given
path, are you just looking for a (pathname, OID, mode) tuple?  (In
which case, you'd get the OID of a file that potentially has embedded
conflict markers)

Or are you looking for multiple OIDs and modes for each file?

If you are looking for multiple OIDs and modes for each file
(representing where the content came from on the different sides of
the merge), are you looking for:
  * the OID and mode of each file that played a part in the merge resolution
OR
  * just the OIDs and modes that would have been recorded in the index
had the merge been done by `git merge`?

(Those last two possibilities are usually the same answer, but no they
are not always the same.  The index cannot hold all the OIDs and modes
in various cases and we have to squash them down to three, possibly
tossing some information that might have been of interest to the user.
This can even occur when you have a unique merge base.)

>
> - need a way to provide merge conflict resolutions
>
> And now that I wrote all this, I realize I should have sent it to the
> `merge-tree` thread, not the `merge-tree-ort` thread...

My submission was RFC and you're providing C, so it's all good in my
book.  I'm reading both threads.  :-)

Johannes Schindelin Jan. 11, 2022, 1:47 p.m. UTC | #9

Hi Junio,

On Mon, 10 Jan 2022, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> > I am against a new command for what essentially serves the original
> > purpose of `merge-tree`.
> >
> > The fact that `merge-tree` has not seen any work in almost 12 years is
> > testament not only to how hard it was to disentangle the work-tree
> > requirement from the recursive merge (it is one of my favorite
> > counterexamples when anybody claims that you can easily prototype code in
> > a script and then convert it to C), but the fact that there is no user
> > within Git itself (apart from t/t4300-merge-tree.sh, which does not count)
> > speaks volumes about the design of that `merge-tree` tool.
> >
> > So it's only fair to breathe life into it by letting it do what it was
> > meant to do all along.
>
> My "Yup" would not weigh as much as one that Linus (whose original
> merge-tree survived this long without seeing much enhancements)
> might give us, but he is busy elsewhere so you guys have to live
> with mine ;-)
>
> As to its output, I do agree that "we give a tree when it is already
> usable to record in a new commit" is a valuable option to have.  The
> original behaviour should be made available somehow, for those who
> built their workflow (including scripts) around it, though.

No, I don't think it is a good idea to keep the original behavior around
indefinitely, when it is totally unclear whether there actually _is_ any
user of this feature out there.

We intentionally broke any existing users of `git-parse-remote.sh` by
removing it, when that feature was much more likely to be used in scripts
than `git merge-tree`. We cannot say on the one hand that we will get rid
of some useful script just because we don't want the maintenance burden
when on the other hand trying to keep support for an operation that is
unlikely to have any users. That does not compute. (And don't get hung up
on the specific example of `git-parse-remote.sh`, you know there we
shuffled around _many_ more things recently that had a good chance of
breaking existing users.)

Besides, the original `git remote-tree` behavior will be very easy to
recreate using the better UI, the one that outputs tree OIDs.

Ciao,
Dscho

Ævar Arnfjörð Bjarmason Jan. 11, 2022, 5 p.m. UTC | #10

On Tue, Jan 11 2022, Johannes Schindelin wrote:

> Hi Junio,
>
> On Mon, 10 Jan 2022, Junio C Hamano wrote:
>
>> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>>
>> > I am against a new command for what essentially serves the original
>> > purpose of `merge-tree`.
>> >
>> > The fact that `merge-tree` has not seen any work in almost 12 years is
>> > testament not only to how hard it was to disentangle the work-tree
>> > requirement from the recursive merge (it is one of my favorite
>> > counterexamples when anybody claims that you can easily prototype code in
>> > a script and then convert it to C), but the fact that there is no user
>> > within Git itself (apart from t/t4300-merge-tree.sh, which does not count)
>> > speaks volumes about the design of that `merge-tree` tool.
>> >
>> > So it's only fair to breathe life into it by letting it do what it was
>> > meant to do all along.
>>
>> My "Yup" would not weigh as much as one that Linus (whose original
>> merge-tree survived this long without seeing much enhancements)
>> might give us, but he is busy elsewhere so you guys have to live
>> with mine ;-)
>>
>> As to its output, I do agree that "we give a tree when it is already
>> usable to record in a new commit" is a valuable option to have.  The
>> original behaviour should be made available somehow, for those who
>> built their workflow (including scripts) around it, though.
>
> No, I don't think it is a good idea to keep the original behavior around
> indefinitely, when it is totally unclear whether there actually _is_ any
> user of this feature out there.

For what it's worth I've used it for some automation at an ex-employer
at least once. Grepping through my #*git* IRC logs there's a few
mentions of it, and likewise for StackOverflow.

I'm not opposed to replacing it, and I think that probably in-the-wild
users of it are almost certainly just grepping for the conflict markers
to see if there's conflicts, or parsing which files have them etc.

So if we can provide a better interface that they can use (or even make
git merge-tree a thin wrapper...).

> We intentionally broke any existing users of `git-parse-remote.sh` by
> removing it, when that feature was much more likely to be used in scripts
> than `git merge-tree`. We cannot say on the one hand that we will get rid
> of some useful script just because we don't want the maintenance burden
> when on the other hand trying to keep support for an operation that is
> unlikely to have any users. That does not compute.  (And don't get hung up
> on the specific example of `git-parse-remote.sh`[...]

The specific example or case really matters. We don't have some generic
flowchart for deprecating things that applies in all cases.

I removed git-parse-remote.sh in a89a2fbfccd (parse-remote: remove this
now-unused library, 2020-11-14) and had the opposite impression you're
expressing here.

I.e. searching around for in-the-wild uses via search engines, stack
overflow etc. either found nothing interesting, or just a reflection of
ourselves (i.e. generated manpages etc.).

That along with its history, i.e. it wasn't *really* meant for
out-of-tree exposure if you look at the history of how it came about, it
was just made at a time when similar libraries had similar manpages
etc. added as a template.

Maybe that was the right thing to do, maybe not, but it went out with
v2.30.0 and the lack of complaints since then would seem to suggest that
I was right that removing it wouldn't be a big deal.

Of course it may have broken someone's script somewhere.

But an important distinction is that they can get it working again by
just copy/pasting that ~100 line shell library into their own script, or
calling the underlying commands it was invoking themselves.

Which doesn't apply to "git merge-tree" unless we come up with a
replacement, and even then a 1=1 port of some in-the-wild code might be
much harder, i.e. if it's deeply coupled to parsing the specific output
it emits now.

So I'd personally be much more hesitant to remove or change that, but of
course we might still come up with good reasons for why it's worth it,
especially with some advertised deprecation period.

And isn't any doubt around that even more reason to just go with
Elijah's plan of introducing new plumbing? I.e. is it really costing us
to just leave these "legacy merge" plumbing built-ins and
merge-recursive.c etc. in place?

Elijah Newren Jan. 11, 2022, 9:15 p.m. UTC | #11

On Mon, Jan 10, 2022 at 9:59 AM Elijah Newren <newren@gmail.com> wrote:
>
> On Mon, Jan 10, 2022 at 5:49 AM Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
> >
...
> >, therefore it expects the resolved conflicts to
> > be in _files_. I don't think that there is a way to reasonably handle
> > symlink target conflicts or directory/file/symlink conflicts, but there
> > might be.
>
> You really need (mode,oid) pairs to be provided by the user.  That
> fixes the executable issue I mentioned above, and makes it clear how
> to handle symlinks and file/symlink conflicts.
>
> directory/file are still handled by providing individual files, but
> ordering traversal becomes really tricky.  The directory/file case
> might even require the pre_resolved_conflicts to be pulled out of that
> loop somehow.  It'd take some investigative work, or some deep
> thought, or both.

I think I came up with a solution to this during my run yesterday,
though I haven't tried or tested it.  Instead of modifying the loop
over plist.items, you instead add a preliminary loop over
pre_resolved_conflicts that modifies opt->priv->paths (and add this
preliminary loop just before the items from opt->priv->paths are added
to plist.items).  In that preliminary loop, you need to make sure that
(a) any files in pre_resolved_conflicts corresponding to existing
_files_ in opt->priv->path result in updating that item's clean &
is_null & mode & oid state, (b) any files in pre_resolved_conflicts
that correspond to existing _directories_ in opt->priv->path need the
value to be expanded to be a conflict_info instead of just a
merged_info, you need to set the df_conflict bit, and don't update the
merge_info fields but do update the extended conflict_info ones, (c)
any new files in pre_resolved_conflicts result in new entries
opt->priv->paths, (d) any leading directories of new files in
pre_resolved_conflicts are appropriately handled, meaning: (d1) new
opt->priv->paths are created if the directory path wasn't in
opt->priv->paths before, (d2) a tweak to df_conflict for the directory
item if it previously existed in opt->priv->paths but only as a file
(possibly also necessitating expanding from a merged_info to a
conflict_info), (d3) no-op if the directory already existed in
opt->priv->paths and was just a directory (and in this case, you can
stop walking the parent directories to the toplevel).

Then, after this preliminary loop that modifies opt->priv->paths, the
rest can just proceed as-is.

That should handle new files, new directories, and all directory/file
conflicts.  Yeah, it's a bunch to look at, but directory/file
conflicts are always a bear.

Elijah Newren Jan. 11, 2022, 10:25 p.m. UTC | #12

On Tue, Jan 11, 2022 at 12:38 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
> On Tue, Jan 11 2022, Johannes Schindelin wrote:
>
> > Hi Junio,
> >
> > On Mon, 10 Jan 2022, Junio C Hamano wrote:
> >
> >> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> >>
...
> > No, I don't think it is a good idea to keep the original behavior around
> > indefinitely, when it is totally unclear whether there actually _is_ any
> > user of this feature out there.
>
> For what it's worth I've used it for some automation at an ex-employer
> at least once. Grepping through my #*git* IRC logs there's a few
> mentions of it, and likewise for StackOverflow.
>
> I'm not opposed to replacing it, and I think that probably in-the-wild
> users of it are almost certainly just grepping for the conflict markers
> to see if there's conflicts, or parsing which files have them etc.
>
> So if we can provide a better interface that they can use (or even make
> git merge-tree a thin wrapper...).
...
> So I'd personally be much more hesitant to remove or change that, but of
> course we might still come up with good reasons for why it's worth it,
> especially with some advertised deprecation period.
>
> And isn't any doubt around that even more reason to just go with
> Elijah's plan of introducing new plumbing? I.e. is it really costing us
> to just leave these "legacy merge" plumbing built-ins

I definitely think it's worth guiding users away from the old `git
merge-tree` behavior in documentation (i.e. deprecating it).  That may
also lead towards its eventual removal, but I'm not as worried about
that.

`git merge-recursive` was actively used in various places, including
in `git cherry-pick`.  I had used it a few times myself in a script.
I don't see a need to deprecate it currently, which naturally would
push its removal (if anyone is pushing for it) even further away.

> and merge-recursive.c etc. in place?

This, however, is different.  I have much stronger opinions here and I
do want to eventually remove merge-recursive.c.  Definitely not now,
because:

  * `ort` has not been the default long enough.  We had one bug
reported this cycle where ort did the wrong thing and using
`recursive` was a good workaround.  There may be a long tail here, so
I'd like to wait a couple years for reports to trickle in.
  * `merge-recursive.c` is still hard-coded in three places in the
code; you can't even set a configuration option to choose merge-ort.c
in those places: builtin/am, builtin/merge-recursive, and
builtin/stash.

More details on that second point: All three of these use
merge_recursive_generic() and need that usage to be replaced.  It's on
my TODO list.  Although it might look like a trivial job of just
copying merge_recursive_generic() and replace its merge_recursive()
call with merge_ort_recursive(), that would only get you something
that would die on assert()s in ort's setup.  If you removed those
assert()s, then you'd merely be copying the bugs in
merge_recursive_generic() usage into the new world (and adding one new
bug).  We should instead fix the problems identified with that usage
and those callers during the review of ort.  So don't try to do this
simplistic translation.

Johannes Schindelin Jan. 11, 2022, 10:30 p.m. UTC | #13

Hi Elijah,

On Mon, 10 Jan 2022, Elijah Newren wrote:

> On Mon, Jan 10, 2022 at 5:49 AM Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
> >
> > On Fri, 7 Jan 2022, Elijah Newren wrote:
> >
> > > On Fri, Jan 7, 2022 at 9:58 AM Christian Couder
> > > <christian.couder@gmail.com> wrote:
> > > >
> > > > On Wed, Jan 5, 2022 at 5:54 PM Elijah Newren <newren@gmail.com>
> > > > wrote:
> > > > >
> > > > > On Wed, Jan 5, 2022 at 8:33 AM Christian Couder
> > > > > <christian.couder@gmail.com> wrote:
> > > >
> ...
> >
> > I _bet_ it needs first and foremost the information "is this
> > mergeable?".
> >
> > That is by far the most common question the code I saw had to answer,
> > without any follow-up questions asked.
> >
> > An add-on question is then "which files/directories conflicted?". And
> > there, it really only wanted the file names, along with the OIDs of
> > the respective item in the base, the HEAD and the merge branch.
>
> This might be difficult to provide for some cases, e.g.
> rename/rename(1to2), especially if that conflict gets entangled with any
> others.  Users would also have difficulty gaining these even using the
> command line `git merge` (with either recursive or ort) for these cases.
>
> Also, does this presume three OIDs?  What about the cases where there
> are 4 or 6 for a given path?
>
> I'm a bit worried about the assumptions underlying this request, and
> that it might not be satisfiable in general depending on what those
> assumptions are.

Well, that request is driven by the reality of a web UI.

You cannot reasonably resolve just any merge conflict in a web UI. But you
can easily resolve a trivial content conflict in, say, a README file,
without opening a console window, cloning the repository, starting an
editor, then saving the result after resolving the textual conflict,
committing it, then trying to force-push to the original PR branch (if the
contributor has given you permission to push).

So what I am talking about here really is the case where no rename
conflict happened, no directory/file conflict, no type change. Just the
good ole' textual conflicts where the same lines were modified in
divergent ways.

This means that we need the base, HEAD and MERGE OIDs here (and modes, you
are absolutely correct).

> > In my work in December, I also had to implement another thing that I
> > think we will have to implement in `merge-tree` in one form or
> > another: when users provided merge conflict resolutions via the web
> > UI, we want the merge to succeed. What I implemented was this (in a
> > nutshell, a way to provide file names with associated blob OIDs that
> > resolve the content conflicts):
>
> Interesting.  I'm curious, though -- if they are providing conflict
> resolutions, then would they not have had a previous merge to find out
> what the conflicts were?  If so, wouldn't they be able to provide the
> tree to avoid the need for a second merge?

No, the conflict resolutions are apparently stored independently from the
commits. Probably to allow for the same resolutions to be applied in case
that the PR's target branch changes but leaves the same conflicts.

Again, we are dealing with the realities of a web UI ;-)

BTW I am not saying that the way I implemented it is 100% the best way to
achieve the intended goal. It is quite possible that there is a better
way, or that we need to at least provide the (mode,OID) triplet
corresponding to the conflict, too, so that the merge machinery can verify
that the resolution is applied to the _correct_ conflict ;-)

> > -- snip --
> > Author: Johannes Schindelin <Johannes.Schindelin@gmx.de>
> > Date:   Thu Dec 16 20:46:35 2021 +0100
> > Subject: merge-ort: allow pre-resolving merge conflicts
> >
> > One of merge-ort's particular strengths is that it works well even in a
> > worktree-less setting, e.g. in a backend for a server-side merge.
> >
> > In such a scenario, it is conceivable that the merge in question not
> > only results in a conflict, but that the caller figures out some sort of
> > resolution before calling the merge again. The second call, of course,
> > is meant to apply the merge conflict resolution.
> >
> > With this commit, we allow just that. The new, optional
> > `pre_resolved_conflicts` field of `struct merge_options` maps paths to
> > the blob OID of the resolution.
> >
> > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> >
> > diff --git a/merge-ort.c b/merge-ort.c
> > index a74d4251c3..fa59ce2f97 100644
> > --- a/merge-ort.c
> > +++ b/merge-ort.c
> > @@ -3961,6 +3961,7 @@ static void process_entries(struct merge_options *opt,
> >         prefetch_for_content_merges(opt, &plist);
> >         for (entry = &plist.items[plist.nr-1]; entry >= plist.items; --entry) {
> >                 char *path = entry->string;
> > +               struct object_id *resolved_oid;
> >                 /*
> >                  * NOTE: mi may actually be a pointer to a conflict_info, but
> >                  * we have to check mi->clean first to see if it's safe to
> > @@ -3972,7 +3973,17 @@ static void process_entries(struct merge_options *opt,
> >                                           &dir_metadata);
> >                 if (mi->clean)
> >                         record_entry_for_tree(&dir_metadata, path, mi);
> > -               else {
> > +               else if (opt->pre_resolved_conflicts &&
> > +                        (resolved_oid =
> > +                         strmap_get(opt->pre_resolved_conflicts, path))) {
>
> You have a couple problematic assumptions here:
>
>   * What if the user's resolution required fixing a semantic conflict,
> meaning they needed to modify a file that had no conflicts?  Your code
> here would ignore those, because the clean case is handled above.
>
>   * What if the user's resolution required adding a totally new file
> (either because a rename is handled as a separate add/delete, or they
> just needed a new file)?  The loop above isn't over items in
> pre_resolved_conflicts, it just assumes that items in
> pre_resolved_conflicts are also in the plist.items being looped over.

I can see how these assumptions may look ludicrous when coming from the
command-line. But again, we are talking about the realities of a conflict
resolution via a web UI.

So I think that it is out of the question to address non-textual conflicts
in this scenario. And then the assumptions make much more sense.

>
> > +                       mi->clean = 1;
> > +                       mi->is_null = 0;
> > +                       memcpy(&mi->result.oid, resolved_oid,
> > +                              sizeof(*resolved_oid));
>
> And here's another:
>
>   * What if the provided resolution was a deletion of the file in
> question (especially after e.g. a modify/delete or rename/delete
> conflict)?  Don't you need to have a special check for the zero_oid
> here?
>
> And if I combine the things above:
>
>   * What if the user wants to remove a file and add a directory of
> files in its place at some location in order to resolve things?
> Granted, you didn't handle either deletes or new files above, but if
> you did add both, then you might have this issue.  The code at this
> point used some very carefully constructed logic and order of walking
> items specifically to handle file/delete conflicts correctly.  I'm
> worried that your conflict resolution stuff here might interact badly
> with that.
>
> > +                       if (!mi->result.mode)
> > +                               mi->result.mode = 0100644;
>
> How do you know it's not supposed to be 0100755?

I don't ;-)

This was a proof-of-concept, and I meant to look into this a bit further,
trying to figure out from where I could get this information, but I never
got to that.

Now that I think about it, the best solution would probably be for the
mode to be provided in addition to the blob OID, so that the caller has to
decide.

> > +                       record_entry_for_tree(&dir_metadata, path, mi);
> > +               } else {
> >                         struct conflict_info *ci = (struct conflict_info *)mi;
> >                         process_entry(opt, path, ci, &dir_metadata);
> >                 }
> > diff --git a/merge-recursive.h b/merge-recursive.h
> > index 0795a1d3ec..1f45effdd0 100644
> > --- a/merge-recursive.h
> > +++ b/merge-recursive.h
> > @@ -47,6 +47,13 @@ struct merge_options {
> >         const char *subtree_shift;
> >         unsigned renormalize : 1;
> >
> > +       /*
> > +        * (merge-ORT only) If non-NULL, contains a map from `path` to OID. If
> > +        * the given `path would be marked as conflict, it is instead resolved
> > +        * to the specified blob.
> > +        */
> > +       struct strmap *pre_resolved_conflicts;
> > +
> >         /* internal fields used by the implementation */
> >         struct merge_options_internal *priv;
> >  };
> > -- snap --
> >
> > It is a proof-of-concept
>
> Fair enough.
>
> >, therefore it expects the resolved conflicts to
> > be in _files_. I don't think that there is a way to reasonably handle
> > symlink target conflicts or directory/file/symlink conflicts, but there
> > might be.
>
> You really need (mode,oid) pairs to be provided by the user.  That
> fixes the executable issue I mentioned above, and makes it clear how
> to handle symlinks and file/symlink conflicts.

It's a really good point, too.

> directory/file are still handled by providing individual files, but
> ordering traversal becomes really tricky.  The directory/file case
> might even require the pre_resolved_conflicts to be pulled out of that
> loop somehow.  It'd take some investigative work, or some deep
> thought, or both.

My idea for directory/file conflicts is to essentially teach the web UI to
throw its hands in the air upon encountering them, and telling the user
that it's not possible to resolve these types of conflicts via the UI.

And since my principal driver here is the concrete need for such a
server-side merge with conflict resolution, that's as far as I want to
push `merge-tree`, too, and leave any more complicated resolution to a
future date, or never, whichever comes first.

> > A subsequent commit changed my hacky `merge-tree` hack to optionally
> > accept NUL-terminated merge conflict resolutions in <path>/<OID> pairs via
> > stdin (again, avoiding files to be written where we do not _need_ spend
> > I/O unnecessarily).
> >
> > The biggest problem we faced at the Contributor Summit was that our
> > discussion was not informed by the actual server-side needs. So I would
> > like to reiterate my challenge to make that the driver. Let's not use any
> > hypothetical scenario as the basis for the design of `git merge-tree`, but
> > let's use the concrete requirements of actual server-side merges that are
> > currently implemented using libgit2, when trying to figure out what
> > functionality we need from `merge-tree`, and in what shape.
> >
> > Here is an initial list:
> >
> > - need to determine whether a merge will succeed, quickly
> >
> > - need the tree OID for a successful merge
> >
> > - need the list of items that conflict...
>
> I think my version is good up to here.

Yes!

The only thing I would change is to not even bother providing a tree
object in case of conflicts. Either we provide it, and expect the user to
somehow reconstruct the conflict types from there, or we simply don't say
anything about the tree whose (transitive) file contents may or may not
contain conflict markers (or not, think: binary files). Providing a tree
object in case of a failed merge isn't helpful IMO.

> >  , along with OIDs and modes, if the merge failed
>
> Could you clarify what you mean here by OIDs and modes?  For a given
> path, are you just looking for a (pathname, OID, mode) tuple?  (In
> which case, you'd get the OID of a file that potentially has embedded
> conflict markers)
>
> Or are you looking for multiple OIDs and modes for each file?

This. In case of a conflict, I am looking for (mode,OID) for the merge
base (which _can_ be a virtual one in case of recursive merges) as well as
for the two divergent revisions that were supposed to be merged.

I do realize that other types of conflicts can occur, such as a
rename/rename conflict, and we would need a way to represent these in the
output, too. But in such an instance, where no clear (mode,OID) triplet
can be provided that represents the merge conflicts for this file, the
current web UI cannot offer a way to resolve the conflict, so I am a bit
less interested in that scenario.

> If you are looking for multiple OIDs and modes for each file
> (representing where the content came from on the different sides of
> the merge), are you looking for:
>   * the OID and mode of each file that played a part in the merge resolution
> OR
>   * just the OIDs and modes that would have been recorded in the index
> had the merge been done by `git merge`?

I think the latter was my idea.

But again, you made me think of rename/rename conflicts and friends, and
we would need a way to represent those, too. Even if my current use case
would need to only detect their presence in order to say "nope, can't
resolve that on the web".

> (Those last two possibilities are usually the same answer, but no they
> are not always the same.  The index cannot hold all the OIDs and modes
> in various cases and we have to squash them down to three, possibly
> tossing some information that might have been of interest to the user.
> This can even occur when you have a unique merge base.)
>
> >
> > - need a way to provide merge conflict resolutions
> >
> > And now that I wrote all this, I realize I should have sent it to the
> > `merge-tree` thread, not the `merge-tree-ort` thread...
>
> My submission was RFC and you're providing C, so it's all good in my
> book.  I'm reading both threads.  :-)

:-)

Thank you so much!
Dscho

Elijah Newren Jan. 12, 2022, 12:41 a.m. UTC | #14

Hi Dscho,

On Tue, Jan 11, 2022 at 2:30 PM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>
> Hi Elijah,
>
> On Mon, 10 Jan 2022, Elijah Newren wrote:
>
> > On Mon, Jan 10, 2022 at 5:49 AM Johannes Schindelin
> > <Johannes.Schindelin@gmx.de> wrote:
> > >
> > > On Fri, 7 Jan 2022, Elijah Newren wrote:
> > >
> > > > On Fri, Jan 7, 2022 at 9:58 AM Christian Couder
> > > > <christian.couder@gmail.com> wrote:
> > > > >
> > > > > On Wed, Jan 5, 2022 at 5:54 PM Elijah Newren <newren@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > On Wed, Jan 5, 2022 at 8:33 AM Christian Couder
> > > > > > <christian.couder@gmail.com> wrote:
> > > > >
> > ...
> > >
> > > I _bet_ it needs first and foremost the information "is this
> > > mergeable?".
> > >
> > > That is by far the most common question the code I saw had to answer,
> > > without any follow-up questions asked.
> > >
> > > An add-on question is then "which files/directories conflicted?". And
> > > there, it really only wanted the file names, along with the OIDs of
> > > the respective item in the base, the HEAD and the merge branch.
> >
> > This might be difficult to provide for some cases, e.g.
> > rename/rename(1to2), especially if that conflict gets entangled with any
> > others.  Users would also have difficulty gaining these even using the
> > command line `git merge` (with either recursive or ort) for these cases.
> >
> > Also, does this presume three OIDs?  What about the cases where there
> > are 4 or 6 for a given path?
> >
> > I'm a bit worried about the assumptions underlying this request, and
> > that it might not be satisfiable in general depending on what those
> > assumptions are.
>
> Well, that request is driven by the reality of a web UI.
>
> You cannot reasonably resolve just any merge conflict in a web UI. But you
> can easily resolve a trivial content conflict in, say, a README file,
> without opening a console window, cloning the repository, starting an
> editor, then saving the result after resolving the textual conflict,
> committing it, then trying to force-push to the original PR branch (if the
> contributor has given you permission to push).
>
> So what I am talking about here really is the case where no rename
> conflict happened, no directory/file conflict, no type change. Just the
> good ole' textual conflicts where the same lines were modified in
> divergent ways.

Sure, but we were talking about needs for `git merge-tree`.  You are
saying we need more information out of it (OIDs and modes) to fullfill
user requests.  That's totally reasonable.  But stating that you don't
care about certain types of conflicts or cases where renames happen
doesn't really help answer my question; while that's a restriction you
can put on your internal implementation, other users of `merge-tree`
doing the exact same type of thing (even involving a web UI) might
decide they also want to handle the other types of conflicts.  I don't
want merge-tree to have a design only capable of handling unrenamed
files, or only textual conflicts, since the new mode for merge-tree
was all about handling the complexity of a real merge.

> This means that we need the base, HEAD and MERGE OIDs here (and modes, you
> are absolutely correct).

Now you're back to requiring three again.  What about modify/delete
conflicts where there are only two available?

Your stated requirements either lack precision about what you really
want (e.g. "the OIDs and modes that would have appeared in the higher
order stages in the index for the given path") or aren't satisfiable.
I'm against providing anything that knowingly paints us into a corner.

...
> > > @@ -3972,7 +3973,17 @@ static void process_entries(struct merge_options *opt,
> > >                                           &dir_metadata);
> > >                 if (mi->clean)
> > >                         record_entry_for_tree(&dir_metadata, path, mi);
> > > -               else {
> > > +               else if (opt->pre_resolved_conflicts &&
> > > +                        (resolved_oid =
> > > +                         strmap_get(opt->pre_resolved_conflicts, path))) {
> >
> > You have a couple problematic assumptions here:
> >
> >   * What if the user's resolution required fixing a semantic conflict,
> > meaning they needed to modify a file that had no conflicts?  Your code
> > here would ignore those, because the clean case is handled above.
> >
> >   * What if the user's resolution required adding a totally new file
> > (either because a rename is handled as a separate add/delete, or they
> > just needed a new file)?  The loop above isn't over items in
> > pre_resolved_conflicts, it just assumes that items in
> > pre_resolved_conflicts are also in the plist.items being looped over.
>
> I can see how these assumptions may look ludicrous when coming from the
> command-line. But again, we are talking about the realities of a conflict
> resolution via a web UI.
>
> So I think that it is out of the question to address non-textual conflicts
> in this scenario. And then the assumptions make much more sense.

Waving your hands and saying it came from a web UI doesn't reduce my
worries or concerns at all.  I could easily imagine a web UI that
allows users to specify other modifications they want to make, even
limited to textual ones, to include in the merge.  What happens when
they modify some file that had otherwise merged cleanly (e.g. a file
that gains a new call to some function, which after the merge needs an
extra parameter passed to it)?  Your solution doesn't handle it; it'd
send that user-specified change to /dev/null.

To be fair, you mentioned below that this is just a proof of concept,
and that certainly reduces worries/concerns.  It's totally fine to
have such things.  Maybe you intend to keep this patch internal
indefinitely.  That's fine too.  My commentary is just feedback on if
we want merge-ort/merge-tree extended more in this kind of fashion
(and it might also serve as useful pointers on how to extend your
internal patch if you get requests to extend your web UI a bit to
handle more cases).  :-)

> > > +                       mi->clean = 1;
> > > +                       mi->is_null = 0;
> > > +                       memcpy(&mi->result.oid, resolved_oid,
> > > +                              sizeof(*resolved_oid));
> >
> > And here's another:
> >
> >   * What if the provided resolution was a deletion of the file in
> > question (especially after e.g. a modify/delete or rename/delete
> > conflict)?  Don't you need to have a special check for the zero_oid
> > here?
> >
> > And if I combine the things above:
> >
> >   * What if the user wants to remove a file and add a directory of
> > files in its place at some location in order to resolve things?
> > Granted, you didn't handle either deletes or new files above, but if
> > you did add both, then you might have this issue.  The code at this
> > point used some very carefully constructed logic and order of walking
> > items specifically to handle file/delete conflicts correctly.  I'm
> > worried that your conflict resolution stuff here might interact badly
> > with that.
> >
> > > +                       if (!mi->result.mode)
> > > +                               mi->result.mode = 0100644;
> >
> > How do you know it's not supposed to be 0100755?
>
> I don't ;-)
>
> This was a proof-of-concept, and I meant to look into this a bit further,
> trying to figure out from where I could get this information, but I never
> got to that.
>
> Now that I think about it, the best solution would probably be for the
> mode to be provided in addition to the blob OID, so that the caller has to
> decide.

+1.

...
> > directory/file are still handled by providing individual files, but
> > ordering traversal becomes really tricky.  The directory/file case
> > might even require the pre_resolved_conflicts to be pulled out of that
> > loop somehow.  It'd take some investigative work, or some deep
> > thought, or both.
>
> My idea for directory/file conflicts is to essentially teach the web UI to
> throw its hands in the air upon encountering them, and telling the user
> that it's not possible to resolve these types of conflicts via the UI.
>
> And since my principal driver here is the concrete need for such a
> server-side merge with conflict resolution, that's as far as I want to
> push `merge-tree`, too, and leave any more complicated resolution to a
> future date, or never, whichever comes first.

:-)

> > > A subsequent commit changed my hacky `merge-tree` hack to optionally
> > > accept NUL-terminated merge conflict resolutions in <path>/<OID> pairs via
> > > stdin (again, avoiding files to be written where we do not _need_ spend
> > > I/O unnecessarily).
> > >
> > > The biggest problem we faced at the Contributor Summit was that our
> > > discussion was not informed by the actual server-side needs. So I would
> > > like to reiterate my challenge to make that the driver. Let's not use any
> > > hypothetical scenario as the basis for the design of `git merge-tree`, but
> > > let's use the concrete requirements of actual server-side merges that are
> > > currently implemented using libgit2, when trying to figure out what
> > > functionality we need from `merge-tree`, and in what shape.
> > >
> > > Here is an initial list:
> > >
> > > - need to determine whether a merge will succeed, quickly
> > >
> > > - need the tree OID for a successful merge
> > >
> > > - need the list of items that conflict...
> >
> > I think my version is good up to here.
>
> Yes!

:-)

> The only thing I would change is to not even bother providing a tree
> object in case of conflicts. Either we provide it, and expect the user to
> somehow reconstruct the conflict types from there, or we simply don't say
> anything about the tree whose (transitive) file contents may or may not
> contain conflict markers (or not, think: binary files). Providing a tree
> object in case of a failed merge isn't helpful IMO.

You've got a false dichotomy here.  Providing a tree object does NOT
imply that we expect users to reconstruct a list of conflicted files
or the types of conflicts from that tree.  I pretty strongly argued
against that, in fact (see [*], , starting at "???" -- and yes, I mean
a literal triple question mark).

I also don't see why you are so bothered by the tree object being
printed.  I already pointed out that we can't easily avoid computing
the tree object without risking hurting performance more than you're
trying to help it (again, see [*]).

But beyond performance, by trying to require that the tree object not
be printed, you are trying to prevent users from getting information
that they simply cannot obtain any other way.  Even if `git
merge-tree` provided a list of up to three (stage, mode, oid) triplets
for each conflicted filename corresponding to what you'd find in the
index in the higher order stages (which sounds like a totally
reasonable request, by the way), you could not reconstruct the
information available in the tree for the same conflicted paths.
Sure, you could get "close", and for some uses, maybe that's good
enough, but most users I've ever seen prefer the file in the working
copy after a merge for trying to resolve conflicts over the (stage,
mode, oid) pairs found in the index.  And since file in the tree
potentially has some information not reconstructable from the index, I
don't see why we should avoid printing 41 characters to stdout to
allow the user to make use of it.  If it's not useful to you,
whatever, toss those 41 printed bytes.

[*] https://lore.kernel.org/git/CABPp-BHvXrP0sTTmuTYfACoJTCcm9+wk_f441nj4TstrmQdqMQ@mail.gmail.com/

> > >  , along with OIDs and modes, if the merge failed
> >
> > Could you clarify what you mean here by OIDs and modes?  For a given
> > path, are you just looking for a (pathname, OID, mode) tuple?  (In
> > which case, you'd get the OID of a file that potentially has embedded
> > conflict markers)
> >
> > Or are you looking for multiple OIDs and modes for each file?
>
> This. In case of a conflict, I am looking for (mode,OID) for the merge
> base (which _can_ be a virtual one in case of recursive merges) as well as
> for the two divergent revisions that were supposed to be merged.
>
> I do realize that other types of conflicts can occur, such as a
> rename/rename conflict, and we would need a way to represent these in the
> output, too. But in such an instance, where no clear (mode,OID) triplet
> can be provided that represents the merge conflicts for this file, the
> current web UI cannot offer a way to resolve the conflict, so I am a bit
> less interested in that scenario.

Okay, this is helping explain a bit better.

Out of curiosity, does this mean the web UI only can resolve cases
where all three modes & OIDs are present, and the files are text
rather than binary?  For example, does this mean the web UI cannot
handle cases like modify/delete or add/add?

> > If you are looking for multiple OIDs and modes for each file
> > (representing where the content came from on the different sides of
> > the merge), are you looking for:
> >   * the OID and mode of each file that played a part in the merge resolution
> > OR
> >   * just the OIDs and modes that would have been recorded in the index
> > had the merge been done by `git merge`?
>
> I think the latter was my idea.

Ah, perfect; that's something we can provide!

> But again, you made me think of rename/rename conflicts and friends, and
> we would need a way to represent those, too. Even if my current use case
> would need to only detect their presence in order to say "nope, can't
> resolve that on the web".

But now you're making me worry whether I can satisfy your requests
again, or at minimum, whether I need to do a lot more work in
merge-ort to answer them.  Do you need a representation other than the
list of (stage, mode, oid) triplets?

That might be hard to come by, especially for cases like mod6 of
t6422.  The code basically assumes there's going to be 0 or more
conflict types at each path (and yes, you can absolutely have more
than 1), but doesn't store these conflict types other than in terms of
outputting various "CONFLICT (type): <more info>" strings (though it
does keep the output messages in memory and associated with each file
before printing them at the end).  Further, while some conflict types
which could be considered to affect multiple paths (e.g. rename/rename
where both sides rename the same file differently), they are only
stored once relative to only one of the filenames to avoid printing
the same conflict notice three times. (For example, in the
rename/rename case, the conflict message is associated with the base
filename).

If you do only need the list of (stage, mode, oid) triplets, then that
can be provided for every file.  However, things like
rename/rename(1to2) will only provide a single (stage, mode, oid)
triplet for the oldfilename, and also only provide one triplet for the
new filename on side1, and also only provide one triplet for the new
filename on side2.  I don't know if it'll confuse users why they have
a conflict for a file with only a single triplet.

> > > And now that I wrote all this, I realize I should have sent it to the
> > > `merge-tree` thread, not the `merge-tree-ort` thread...
> >
> > My submission was RFC and you're providing C, so it's all good in my
> > book.  I'm reading both threads.  :-)
>
> :-)

I'm a little worried I might come across sounding a little picky since
I tend to dive into details and really fixate on them.  I apologize in
advance if it sounds that way at all; you provide lots of good points
to think about and I can't help but dive into the quirks surrounding
each.  I really appreciate all the awesome feedback and food for
thought.

Junio C Hamano Jan. 12, 2022, 5:54 p.m. UTC | #15

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> Maybe that was the right thing to do, maybe not, but it went out with
> v2.30.0 and the lack of complaints since then would seem to suggest that
> I was right that removing it wouldn't be a big deal.
>
> Of course it may have broken someone's script somewhere.
>
> But an important distinction is that they can get it working again by
> just copy/pasting that ~100 line shell library into their own script, or
> calling the underlying commands it was invoking themselves.

Was parse-remote a part of what promised our end-users?  merge-tree
has been since it was written.  

That distinction is even more important.

Junio C Hamano Jan. 12, 2022, 6:06 p.m. UTC | #16

Elijah Newren <newren@gmail.com> writes:

>> And isn't any doubt around that even more reason to just go with
>> Elijah's plan of introducing new plumbing? I.e. is it really costing us
>> to just leave these "legacy merge" plumbing built-ins
>
> I definitely think it's worth guiding users away from the old `git
> merge-tree` behavior in documentation (i.e. deprecating it).  That may
> also lead towards its eventual removal, but I'm not as worried about
> that.

Yup, promising users that we will remove it and telling them that
they should migrate away from it is necessary.  Doing anything else
is simply irresponsible.

I however suspect that Ævar didn't mean by "legacy merge plumbing
built-in" the strategy backends.  IOW, I had an impression that what
is on the chopping block is merge-tree and not merge-recursive.

But since you brought up deprecation of recursive, let's spend a few
minutes on the topic.

> `git merge-recursive` was actively used in various places, including
> in `git cherry-pick`.  I had used it a few times myself in a script.
> I don't see a need to deprecate it currently, which naturally would
> push its removal (if anyone is pushing for it) even further away.

I suspect that we may be able to just invoke ort when recursive is
invoked, and such a wrapping may even be easier than wrapping "git
blame" to replace "git annotate" (where a command line option or two
needs to change behaviour).  I doubt there is -X<strategy-option>
that affects recursive that ORT does not understand, so it may be
quite simple to deprecate "merge -s recursive".

As you say, replacing the internal implementation is a different
matter.

>   * `merge-recursive.c` is still hard-coded in three places in the
> code; you can't even set a configuration option to choose merge-ort.c
> in those places: builtin/am, builtin/merge-recursive, and
> builtin/stash.
>
> More details on that second point: All three of these use
> merge_recursive_generic() and need that usage to be replaced.  It's on
> my TODO list.

Yes, I do recall mentioning that when we were reviewing the series
that added ort.

Elijah Newren Jan. 12, 2022, 8:06 p.m. UTC | #17

On Wed, Jan 12, 2022 at 10:06 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Elijah Newren <newren@gmail.com> writes:
>
...
> I however suspect that Ævar didn't mean by "legacy merge plumbing
> built-in" the strategy backends.  IOW, I had an impression that what
> is on the chopping block is merge-tree and not merge-recursive.
>
> But since you brought up deprecation of recursive, let's spend a few
> minutes on the topic.

Not sure it matters, but for reference, Ævar explicitly brought up
merge-recursive.c.  The fuller quote was:

> >> I.e. is it really costing us
> >> to just leave these "legacy merge" plumbing built-ins and
> >> merge-recursive.c etc. in place?

Because he brought it up, I decided to address it.  It was unclear to
me whether he meant builtin/merge-recursive.c or the toplevel
merge-recursive.c, so I just addressed both.

> I suspect that we may be able to just invoke ort when recursive is
> invoked, and such a wrapping may even be easier than wrapping "git
> blame" to replace "git annotate" (where a command line option or two
> needs to change behaviour).

Yes, that was the plan I had.

> I doubt there is -X<strategy-option>
> that affects recursive that ORT does not understand,

Technically there are currently two.  As documented in
merge-strategies.txt, ORT takes the same -X options as recursive, but
currently ignores both -Xdiff-algorithm (instead passing
HISTOGRAM_DIFF down to ll_merge()), and -Xno-renames (instead passing
DIFF_DETECT_RENAME down to diffcore_rename_extended()).  I guess there
are three if you count -Xpatience separately from
-Xdiff-algorithm=patience.

> so it may be quite simple to deprecate "merge -s recursive".

Yes...but why deprecate?  I thought the plan was to (eventually) make
requests for either `recursive` or `ort` be handled by running the
`ort` backend.  Making that kind of switch is much smaller than the
one we already made to switch the default backend from `recursive` to
`ort`, so I'm not sure I see what we gain by doing such a switch in
stages.  Maybe I'm missing something?

Junio C Hamano Jan. 13, 2022, 6:08 a.m. UTC | #18

Elijah Newren <newren@gmail.com> writes:

>> so it may be quite simple to deprecate "merge -s recursive".
>
> Yes...but why deprecate?  I thought the plan was to (eventually) make
> requests for either `recursive` or `ort` be handled by running the
> `ort` backend.  Making that kind of switch is much smaller than the
> one we already made to switch the default backend from `recursive` to
> `ort`, so I'm not sure I see what we gain by doing such a switch in
> stages.  Maybe I'm missing something?

Didn't we "deprecate" but still indefinitely support "annotate"?  I
have been assuming that recursive will be in that category after ort
establishes itself.

Elijah Newren Jan. 13, 2022, 8:01 a.m. UTC | #19

On Wed, Jan 12, 2022 at 10:08 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Elijah Newren <newren@gmail.com> writes:
>
> >> so it may be quite simple to deprecate "merge -s recursive".
> >
> > Yes...but why deprecate?  I thought the plan was to (eventually) make
> > requests for either `recursive` or `ort` be handled by running the
> > `ort` backend.  Making that kind of switch is much smaller than the
> > one we already made to switch the default backend from `recursive` to
> > `ort`, so I'm not sure I see what we gain by doing such a switch in
> > stages.  Maybe I'm missing something?
>
> Didn't we "deprecate" but still indefinitely support "annotate"?  I
> have been assuming that recursive will be in that category after ort
> establishes itself.

But in that case, we suggested folks use "blame" instead of "annotate", right?

From a user point of view, my intention was that "ort" is just the
newer version of "recursive" (which happens to be an essentially total
rewrite).  ort only got a separate name because we wanted a special
testing period and a way for users to select the old version of
recursive or the new version of recursive.  Of course, that period is
multiple release cycles, so both names will persist, but they'll just
be aliases.

Once we switch "recursive" to mean "ort" so they both run the exact
same code, I was intending to have the documentation prefer the term
"recursive" over "ort" since it is a more meaningful user-facing name.

Ævar Arnfjörð Bjarmason Jan. 13, 2022, 9:22 a.m. UTC | #20

On Wed, Jan 12 2022, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>> Maybe that was the right thing to do, maybe not, but it went out with
>> v2.30.0 and the lack of complaints since then would seem to suggest that
>> I was right that removing it wouldn't be a big deal.
>>
>> Of course it may have broken someone's script somewhere.
>>
>> But an important distinction is that they can get it working again by
>> just copy/pasting that ~100 line shell library into their own script, or
>> calling the underlying commands it was invoking themselves.
>
> Was parse-remote a part of what promised our end-users? [...]

It was listed under "LOW-LEVEL COMMANDS (PLUMBING)" => "Syncing
repositories", which has git-daemon, git-{upload,receive}-pack,
git-shell etc. now.

So, pedantically we removed a removed a plumbing utility without much if
any warning.

But as I argued when it was removed I think that realistically some of
our plumbing tools were made for internal-only use, and as the *.sh->C
rewriting of built-ins progressed they became orphaned.

They only had public-facing manpages due to plans that never came to
fruition, or just in copy/pasting an existing template at the time.

I don't think that would matter for git-parse-remote if it had
subsequently gotted widespread use in the wild (even if it were
undocumented), but when I investigated that it really seemed to be used
by approximately nobody.

But a while after it was removed you pushed back on some further similar
changes to git-sh-setup. I asked some follow-up questions in
https://lore.kernel.org/git/87tuiwjfvi.fsf@evledraar.gmail.com/ about
how we should consider these that you didn't reply to.

The greater context being that I was removing git-parse-remote.sh so
that I could eventually get rid of the bridge of extending
libintl/gettext to *.sh-land. The current state of that on "master" in
that regard being:

    $ git grep '\b(eval_)?gettext(ln)?\b' -- ':!t/' ':!ci/' ':!git-gui' ':!git-sh-i18n.sh' '*.sh'
    git-merge-octopus.sh:    gettextln "Error: Your local changes to the following files would be overwritten by merge"
    git-merge-octopus.sh:           gettextln "Automated merge did not work."
    git-merge-octopus.sh:           gettextln "Should not be doing an octopus."
    git-merge-octopus.sh:           die "$(eval_gettext "Unable to find common commit with \$pretty_name")"
    git-merge-octopus.sh:           eval_gettextln "Already up to date with \$pretty_name"
    git-merge-octopus.sh:           eval_gettextln "Fast-forwarding to: \$pretty_name"
    git-merge-octopus.sh:   eval_gettextln "Trying simple merge with \$pretty_name"
    git-merge-octopus.sh:           gettextln "Simple merge did not work, trying automatic merge."
    git-sh-setup.sh:# Source git-sh-i18n for gettext support.
    git-sh-setup.sh:                die "$(eval_gettext "usage: \$dashless \$USAGE")"
    git-sh-setup.sh:                LONG_USAGE="$(eval_gettext "usage: \$dashless \$USAGE")"
    git-sh-setup.sh:                LONG_USAGE="$(eval_gettext "usage: \$dashless \$USAGE
    git-sh-setup.sh:                gettextln "Cannot chdir to \$cdup, the toplevel of the working tree" >&2
    git-sh-setup.sh:                die "$(eval_gettext "fatal: \$program_name cannot be used without a working tree.")"
    git-sh-setup.sh:                die "$(eval_gettext "fatal: \$program_name cannot be used without a working tree.")"
    git-sh-setup.sh:                        gettextln "Cannot rewrite branches: You have unstaged changes." >&2
    git-sh-setup.sh:                        eval_gettextln "Cannot \$action: You have unstaged changes." >&2
    git-sh-setup.sh:                        eval_gettextln "Cannot \$action: Your index contains uncommitted changes." >&2
    git-sh-setup.sh:                    gettextln "Additionally, your index contains uncommitted changes." >&2
    git-sh-setup.sh:                        gettextln "You need to run this command from the toplevel of the working tree." >&2
    git-sh-setup.sh:                gettextln "Unable to determine absolute path of git directory" >&2
    git-submodule.sh:                       die "fatal: $(eval_gettext "Unable to find current revision in submodule path '\$displaypath'")"
    git-submodule.sh:                               die "fatal: $(eval_gettext "Unable to fetch in submodule path '\$sm_path'")"
    git-submodule.sh:                       die "fatal: $(eval_gettext "Unable to find current \${remote_name}/\${branch} revision in submodule path '\$sm_path'")"
    git-submodule.sh:                               die_msg="fatal: $(eval_gettext "Failed to recurse into submodule path '\$displaypath'")"

I.e. if we were able to get rid of that we could remove
sh-i18n--envsubst.c and git-sh-i18n.sh itself.

Some of that is dead code, others have pending *.sh->C rewrites. For the
rest we could expose a trivial git-i18n.c helper to emit the <10
messages that remained pending further rewrites, which would be much
simpler than extending the generic libintl functionality to *.sh.

But since git-sh-i18n.sh latter is publicly documented as plumbing
(which I'm responsible for, merely by copy/pasting an existing template)
I stalled on that. Since you seemed to suggest in the linked-to thread
that removing any such publicly documented shellscripting functions was
a no-go, even if we'd previously removed git-parse-remote.sh.

Ævar Arnfjörð Bjarmason Jan. 13, 2022, 9:26 a.m. UTC | #21

On Wed, Jan 12 2022, Elijah Newren wrote:

> On Wed, Jan 12, 2022 at 10:06 AM Junio C Hamano <gitster@pobox.com> wrote:
>>
>> Elijah Newren <newren@gmail.com> writes:
>>
> ...
>> I however suspect that Ævar didn't mean by "legacy merge plumbing
>> built-in" the strategy backends.  IOW, I had an impression that what
>> is on the chopping block is merge-tree and not merge-recursive.
>>
>> But since you brought up deprecation of recursive, let's spend a few
>> minutes on the topic.
>
> Not sure it matters, but for reference, Ævar explicitly brought up
> merge-recursive.c.  The fuller quote was:
>
>> >> I.e. is it really costing us
>> >> to just leave these "legacy merge" plumbing built-ins and
>> >> merge-recursive.c etc. in place?
>
> Because he brought it up, I decided to address it.  It was unclear to
> me whether he meant builtin/merge-recursive.c or the toplevel
> merge-recursive.c, so I just addressed both.

FWIW what I meant (but clearly didn't make clear enough) is whether we'd
deprecate the git-merge-tree(1) command, not whatever powers it under
the hood.

I.e. I took the greater discussion here to mean (but may have
misunderstood it) that we were talking about the needs for a
libgit2-replacing merge plumbing.

The existing git-merge-tree command probably gets us 5% towards that,
and I can see how being bug-for-bug compatible with it might be
inconvenient in some future on-top-of-ort rewrite and extension of it.

So we probably SHOULD keep it, but I don't think it's a MUST. I.e. if
you/someone wrote some more powerful version of it, and keeping it
became hard to support I think it would be OK to transition/deprecate
it, as presumably its existing users wouldn't be too inconvenienced, or
would be happier with the more powerful plumbing tool.

Johannes Schindelin Feb. 22, 2022, 12:44 p.m. UTC | #22

Hi Elijah,

to save every reader some time, I snipped the parts that were addressed
elsewhere already.

On Tue, 11 Jan 2022, Elijah Newren wrote:

> On Tue, Jan 11, 2022 at 2:30 PM Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
> >
> > On Mon, 10 Jan 2022, Elijah Newren wrote:
> >
> > > You have a couple problematic assumptions here:
> > >
> > >   * What if the user's resolution required fixing a semantic
> > >   conflict, meaning they needed to modify a file that had no
> > >   conflicts?  Your code here would ignore those, because the clean
> > >   case is handled above.
> > >
> > >   * What if the user's resolution required adding a totally new file
> > >   (either because a rename is handled as a separate add/delete, or
> > >   they just needed a new file)?  The loop above isn't over items in
> > >   pre_resolved_conflicts, it just assumes that items in
> > >   pre_resolved_conflicts are also in the plist.items being looped
> > >   over.
> >
> > I can see how these assumptions may look ludicrous when coming from
> > the command-line. But again, we are talking about the realities of a
> > conflict resolution via a web UI.
> >
> > So I think that it is out of the question to address non-textual
> > conflicts in this scenario. And then the assumptions make much more
> > sense.
>
> Waving your hands and saying it came from a web UI doesn't reduce my
> worries or concerns at all.  I could easily imagine a web UI that
> allows users to specify other modifications they want to make, even
> limited to textual ones, to include in the merge.  What happens when
> they modify some file that had otherwise merged cleanly (e.g. a file
> that gains a new call to some function, which after the merge needs an
> extra parameter passed to it)?  Your solution doesn't handle it; it'd
> send that user-specified change to /dev/null.
>
> To be fair, you mentioned below that this is just a proof of concept,
> and that certainly reduces worries/concerns.  It's totally fine to
> have such things.  Maybe you intend to keep this patch internal
> indefinitely.  That's fine too.  My commentary is just feedback on if
> we want merge-ort/merge-tree extended more in this kind of fashion
> (and it might also serve as useful pointers on how to extend your
> internal patch if you get requests to extend your web UI a bit to
> handle more cases).  :-)

Fair enough. I think I'll keep this patch internal-only for now, and
iterate with real scenarios to figure out what we need/not need.

> > > Could you clarify what you mean here by OIDs and modes?  For a given
> > > path, are you just looking for a (pathname, OID, mode) tuple?  (In
> > > which case, you'd get the OID of a file that potentially has embedded
> > > conflict markers)
> > >
> > > Or are you looking for multiple OIDs and modes for each file?
> >
> > This. In case of a conflict, I am looking for (mode,OID) for the merge
> > base (which _can_ be a virtual one in case of recursive merges) as well as
> > for the two divergent revisions that were supposed to be merged.
> >
> > I do realize that other types of conflicts can occur, such as a
> > rename/rename conflict, and we would need a way to represent these in the
> > output, too. But in such an instance, where no clear (mode,OID) triplet
> > can be provided that represents the merge conflicts for this file, the
> > current web UI cannot offer a way to resolve the conflict, so I am a bit
> > less interested in that scenario.
>
> Okay, this is helping explain a bit better.
>
> Out of curiosity, does this mean the web UI only can resolve cases
> where all three modes & OIDs are present, and the files are text
> rather than binary?  For example, does this mean the web UI cannot
> handle cases like modify/delete or add/add?

Right now, the UI is based on the assumption that the underlying merge
machinery does not even try to detect renames. Which simplifies the range
of supported scenarios somewhat.

I have not looked at the underlying code, but
https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/addressing-merge-conflicts/resolving-a-merge-conflict-on-github
clearly says:

	You can resolve simple merge conflicts that involve competing line
	changes on GitHub, using the conflict editor.

So yes, the UI does not even try to support resolving modify/delete
conflicts at this time.

> > But again, you made me think of rename/rename conflicts and friends, and
> > we would need a way to represent those, too. Even if my current use case
> > would need to only detect their presence in order to say "nope, can't
> > resolve that on the web".
>
> But now you're making me worry whether I can satisfy your requests
> again, or at minimum, whether I need to do a lot more work in
> merge-ort to answer them.  Do you need a representation other than the
> list of (stage, mode, oid) triplets?

I guess we will have to accept that the (stage, mode, OID) list is the
best form, for now.

We will have to see what is possible to do on the UI side, and then extend
the `merge-ort` side as needed.

> I'm a little worried I might come across sounding a little picky since I
> tend to dive into details and really fixate on them.  I apologize in
> advance if it sounds that way at all; you provide lots of good points to
> think about and I can't help but dive into the quirks surrounding each.
> I really appreciate all the awesome feedback and food for thought.

Personally, I did not find your comments picky at all, but rather
constructive. I find this conversation thoroughly enjoyable and productive
even at times when you prove me wrong. You set a really high bar as far as
collaboration style goes. Thank you very much for that!

Ciao,
Dscho

Johannes Schindelin Feb. 22, 2022, 1:08 p.m. UTC | #23

Hi Elijah,

On Tue, 11 Jan 2022, Elijah Newren wrote:

> On Mon, Jan 10, 2022 at 9:59 AM Elijah Newren <newren@gmail.com> wrote:
> >
> > On Mon, Jan 10, 2022 at 5:49 AM Johannes Schindelin
> > <Johannes.Schindelin@gmx.de> wrote:
> > >
> ...
> > >, therefore it expects the resolved conflicts to
> > > be in _files_. I don't think that there is a way to reasonably handle
> > > symlink target conflicts or directory/file/symlink conflicts, but there
> > > might be.
> >
> > You really need (mode,oid) pairs to be provided by the user.  That
> > fixes the executable issue I mentioned above, and makes it clear how
> > to handle symlinks and file/symlink conflicts.
> >
> > directory/file are still handled by providing individual files, but
> > ordering traversal becomes really tricky.  The directory/file case
> > might even require the pre_resolved_conflicts to be pulled out of that
> > loop somehow.  It'd take some investigative work, or some deep
> > thought, or both.
>
> I think I came up with a solution to this during my run yesterday,
> though I haven't tried or tested it.  Instead of modifying the loop
> over plist.items, you instead add a preliminary loop over
> pre_resolved_conflicts that modifies opt->priv->paths (and add this
> preliminary loop just before the items from opt->priv->paths are added
> to plist.items).  In that preliminary loop, you need to make sure that
> (a) any files in pre_resolved_conflicts corresponding to existing
> _files_ in opt->priv->path result in updating that item's clean &
> is_null & mode & oid state, (b) any files in pre_resolved_conflicts
> that correspond to existing _directories_ in opt->priv->path need the
> value to be expanded to be a conflict_info instead of just a
> merged_info, you need to set the df_conflict bit, and don't update the
> merge_info fields but do update the extended conflict_info ones, (c)
> any new files in pre_resolved_conflicts result in new entries
> opt->priv->paths, (d) any leading directories of new files in
> pre_resolved_conflicts are appropriately handled, meaning: (d1) new
> opt->priv->paths are created if the directory path wasn't in
> opt->priv->paths before, (d2) a tweak to df_conflict for the directory
> item if it previously existed in opt->priv->paths but only as a file
> (possibly also necessitating expanding from a merged_info to a
> conflict_info), (d3) no-op if the directory already existed in
> opt->priv->paths and was just a directory (and in this case, you can
> stop walking the parent directories to the toplevel).
>
> Then, after this preliminary loop that modifies opt->priv->paths, the
> rest can just proceed as-is.
>
> That should handle new files, new directories, and all directory/file
> conflicts.  Yeah, it's a bunch to look at, but directory/file
> conflicts are always a bear.

Indeed.

What's even worse is the question how to represent that in a web UI, and I
think I'll wait for that to happen (if ever) to give that part of the
design more thought.

Ciao,
Dscho

[RFC,0/2] Introduce new merge-tree-ort command

Message

Comments