Message ID | 20220105163324.73369-1-chriscool@tuxfamily.org (mailing list archive) |
---|---|
Headers | show |
Series | Introduce new merge-tree-ort command | expand |
On Wed, Jan 5, 2022 at 8:33 AM Christian Couder <christian.couder@gmail.com> wrote: > > During the 2nd Virtual Git Contributors’ Summit last October, and even > before, the subject of performing server side merges and rebases came > up, as platforms like GitHub and GitLab would like to support many > features and data formats that libgit2 doesn't support, like for > example SHA256 hashes and partial clone. > > It's hard for them to get rid of libgit2 though, because Git itself > doesn't have a good way to support server side merges and rebases, > while libgit2 has ways to perform them. Without server side merges and > rebases, those platforms would have to launch some kind of checkout, > which can be very expensive, before any merge or rebase. > > The latest discussions on this topic following the 2nd Virtual > Summit[1] ended with some proposals around a `git merge-tree` on > steroids that could be a good solution to this issue. > > The current `git merge-tree` command though seems to have a number of > issues, especially: > > - it's too much related to the old merge recursive strategy which is > not the default anymore since v2.34.0 and is likely to be > deprecated over time, > > - it seems to output things in its own special format, which is not > easy to customize, and which needs special code and logic to parse I agree we don't want those...but why would new merge-tree options have to use the old merge strategy or the old output format? > To move forward on this, this small RFC patch series introduces a new > `git merge-tree-ort` command with the following design: Slightly dislike the command name. `ort` was meant as a temporary, internal name. I don't think it's very meaningful to users, so I was hoping to just make `recursive` mean `ort` after we had enough testing, and to delete merge-recursive.[ch] at that time. Then `ort` merely becomes a historical footnote (...and perhaps part of the name of the file where the `recursive` algorithm is implemented). > - it uses merge-ort's API as is to perform the merge > > - it gets back a tree oid and a cleanliness status from merge-ort's > API and prints them out first Good so far. > > - it uses diff's API as is to output changed paths and code > > - the diff API, actually diff_tree_oid() is called 3 times: once for > the diff versus branch1 ("ours"), once for the diff versus branch2 > ("theirs"), and once for the diff versus the base. Why? That seems to be a performance penalty for anyone that doesn't want/need the diffs, and since we return a tree, a caller can go and get whatever diffs they like. > Therefore: > > - its code is very simple and very easy to extend and customize, for > example by passing diff or merge-ort options that the code would > just pass on to the merge-ort and diff APIs respectively > > - its output can easily be parsed using simple code These points are good. > and existing diff parsers > > This of course means that merge-tree-ort's output is not backward > compatible with merge-tree's output, but it doesn't seem that there is > much value in keeping the same output anyway. On the contrary > merge-tree's output is likely to hold us back already. > > The first patch in the series adds the new command without any test > and documentation. > > The second patch in the series adds a few tests that let us see how > the command's output looks like in different very simple cases. > > Of course if this approach is considered valuable, I plan to add some > documentation, more tests and very likely a number of options before > submitting the next iteration. Was there something you didn't like about https://lore.kernel.org/git/pull.1114.git.git.1640927044.gitgitgadget@gmail.com/? > I am not sure that it's worth showing the 3 diffs (versus branch1, > branch2 and base) by default. Maybe by default no diff at all should > be shown and the command should have --branch1 (or --ours), --branch2 > (or --theirs) and --base options to ask for such output, but for an > RFC patch I thought it would be better to output the 3 diffs so that > people get a better idea of the approach this patch series is taking. I think not showing, neither by default or at all would be better. All three of these are things users could easily generate for themselves with the tree we return. I'm curious, though, what's the usecase for wanting these specific diffs? Two things you didn't return that users cannot get any other way: (1) conflict and warning messages, (2) list of conflicted paths. > [1] https://lore.kernel.org/git/nycvar.QRO.7.76.6.2110211147490.56@tvgsbejvaqbjf.bet/ > > > Christian Couder (2): > merge-ort: add new merge-tree-ort command > merge-ort: add t/t4310-merge-tree-ort.sh > > .gitignore | 1 + > Makefile | 1 + > builtin.h | 1 + > builtin/merge-tree-ort.c | 93 ++++++++++++++++++++++ > git.c | 1 + > t/t4310-merge-tree-ort.sh | 162 ++++++++++++++++++++++++++++++++++++++ > 6 files changed, 259 insertions(+) > create mode 100644 builtin/merge-tree-ort.c > create mode 100755 t/t4310-merge-tree-ort.sh > > -- > 2.34.1.433.g7bc349372a.dirty
On Wed, Jan 5, 2022 at 8:53 AM Elijah Newren <newren@gmail.com> wrote: > > On Wed, Jan 5, 2022 at 8:33 AM Christian Couder > <christian.couder@gmail.com> wrote: > > > > During the 2nd Virtual Git Contributors’ Summit last October, and even > > before, the subject of performing server side merges and rebases came > > up, as platforms like GitHub and GitLab would like to support many > > features and data formats that libgit2 doesn't support, like for > > example SHA256 hashes and partial clone. > > ... > > The first patch in the series adds the new command without any test > > and documentation. > > > > The second patch in the series adds a few tests that let us see how > > the command's output looks like in different very simple cases. > > > > Of course if this approach is considered valuable, I plan to add some > > documentation, more tests and very likely a number of options before > > submitting the next iteration. > > Was there something you didn't like about > https://lore.kernel.org/git/pull.1114.git.git.1640927044.gitgitgadget@gmail.com/? Since I already had some fixes queued up for that series based on feedback from Johannes Altmanninger and Fabian Stelzer, I decided to submit those so you wouldn't have to stumble over the same problems. So see https://lore.kernel.org/git/pull.1114.v2.git.git.1641403655.gitgitgadget@gmail.com/ instead (which also now has a pointer to this series in the cover letter).
On Wed, Jan 5, 2022 at 5:54 PM Elijah Newren <newren@gmail.com> wrote: > > On Wed, Jan 5, 2022 at 8:33 AM Christian Couder > <christian.couder@gmail.com> wrote: > > The current `git merge-tree` command though seems to have a number of > > issues, especially: > > > > - it's too much related to the old merge recursive strategy which is > > not the default anymore since v2.34.0 and is likely to be > > deprecated over time, > > > > - it seems to output things in its own special format, which is not > > easy to customize, and which needs special code and logic to parse > > I agree we don't want those...but why would new merge-tree options > have to use the old merge strategy or the old output format? Yeah, it's not necessary if there are 2 separate modes, a "real" mode (like what you implemented with --real in your recent patch series) and a "trivial" mode (which is the name you give to the old code). Adding modes like this to a command is likely to make the command and its documentation more difficult to understand though. For example I think that we created `git switch` because the different modes of `git checkout` made the command hard to learn. I gave other reasons in [1] why I prefer a separate command. [1] https://lore.kernel.org/git/CAP8UFD1LgfZ0MT9=cMvxCcox++_MBBhWG9Twf42cMiXL42AdpQ@mail.gmail.com/ > > To move forward on this, this small RFC patch series introduces a new > > `git merge-tree-ort` command with the following design: > > Slightly dislike the command name. I am ok with changing the command name. > `ort` was meant as a temporary, > internal name. I don't think it's very meaningful to users, so I was > hoping to just make `recursive` mean `ort` after we had enough > testing, and to delete merge-recursive.[ch] at that time. Then `ort` > merely becomes a historical footnote (...and perhaps part of the name > of the file where the `recursive` algorithm is implemented). I think something similar could still be done with `git merge-tree-ort` or whatever name we give to this command. For example we could first add --ort to `git merge-tree` and make it call `git merge-tree-ort`, then after some time make --ort the default, then after some more time remove `git merge-tree-ort`. And whenever we make those changes we could also rename the builtin/merge-tree*.{h,c} accordingly. > > - it uses merge-ort's API as is to perform the merge > > > > - it gets back a tree oid and a cleanliness status from merge-ort's > > API and prints them out first > > Good so far. > > > - it uses diff's API as is to output changed paths and code > > > > - the diff API, actually diff_tree_oid() is called 3 times: once for > > the diff versus branch1 ("ours"), once for the diff versus branch2 > > ("theirs"), and once for the diff versus the base. > > Why? That seems to be a performance penalty for anyone that doesn't > want/need the diffs, and since we return a tree, a caller can go and > get whatever diffs they like. I say somewhere else that I am planning to add options to disable this or parts of this diff output. I think it's still interesting for the command to be able to output diffs, especially diffs of conflicting parts. In [2] Ævar said: => I.e. I'm not the first or last to have (not for anything serious) => implement a dry-run bare-repo merge with something like: => => git merge-tree origin/master git-for-windows/main origin/seen >diff => # Better regex needed, but basically this => grep "^\+<<<<<<< \.our$" diff && conflict=t Also `git merge-tree` currently outputs diffs, so I thought it would be sad if the new command couldn't do it. [2] https://lore.kernel.org/git/211109.861r3qdpt8.gmgdl@evledraar.gmail.com/ > > Therefore: > > > > - its code is very simple and very easy to extend and customize, for > > example by passing diff or merge-ort options that the code would > > just pass on to the merge-ort and diff APIs respectively > > > > - its output can easily be parsed using simple code > > These points are good. > > > and existing diff parsers > > > > This of course means that merge-tree-ort's output is not backward > > compatible with merge-tree's output, but it doesn't seem that there is > > much value in keeping the same output anyway. On the contrary > > merge-tree's output is likely to hold us back already. > > > > The first patch in the series adds the new command without any test > > and documentation. > > > > The second patch in the series adds a few tests that let us see how > > the command's output looks like in different very simple cases. > > > > Of course if this approach is considered valuable, I plan to add some > > documentation, more tests and very likely a number of options before > > submitting the next iteration. > > Was there something you didn't like about > https://lore.kernel.org/git/pull.1114.git.git.1640927044.gitgitgadget@gmail.com/? I was having a vacation at the time and even though I skimmed the mailing list, I missed it. Sorry. Also I thought that you might not be interested in this anymore as you didn't reply to [1] and [2] where Ævar and I both said that we were interested in your opinion on a git merge-tree on steroids. > > I am not sure that it's worth showing the 3 diffs (versus branch1, > > branch2 and base) by default. Maybe by default no diff at all should > > be shown and the command should have --branch1 (or --ours), --branch2 > > (or --theirs) and --base options to ask for such output, but for an > > RFC patch I thought it would be better to output the 3 diffs so that > > people get a better idea of the approach this patch series is taking. > > I think not showing, neither by default I am ok with not showing them by default. > or at all would be better. > All three of these are things users could easily generate for > themselves with the tree we return. I'm curious, though, what's the > usecase for wanting these specific diffs? I think I replied to this above. > Two things you didn't return that users cannot get any other way: (1) > conflict and warning messages, (2) list of conflicted paths. Yeah, I wasn't sure how they could be returned with the merge-ort (or maybe diff) API, and I thought that the current `git merge-tree` doesn't report them, so I was aiming for something roughly just as powerful as the current `git merge-tree`.
On Fri, Jan 7, 2022 at 9:58 AM Christian Couder <christian.couder@gmail.com> wrote: > > On Wed, Jan 5, 2022 at 5:54 PM Elijah Newren <newren@gmail.com> wrote: > > > > On Wed, Jan 5, 2022 at 8:33 AM Christian Couder > > <christian.couder@gmail.com> wrote: > > > > The current `git merge-tree` command though seems to have a number of > > > issues, especially: > > > > > > - it's too much related to the old merge recursive strategy which is > > > not the default anymore since v2.34.0 and is likely to be > > > deprecated over time, > > > > > > - it seems to output things in its own special format, which is not > > > easy to customize, and which needs special code and logic to parse > > > > I agree we don't want those...but why would new merge-tree options > > have to use the old merge strategy or the old output format? > > Yeah, it's not necessary if there are 2 separate modes, a "real" mode > (like what you implemented with --real in your recent patch series) > and a "trivial" mode (which is the name you give to the old code). > > Adding modes like this to a command is likely to make the command and > its documentation more difficult to understand though. For example I > think that we created `git switch` because the different modes of `git > checkout` made the command hard to learn. > > I gave other reasons in [1] why I prefer a separate command. > > [1] https://lore.kernel.org/git/CAP8UFD1LgfZ0MT9=cMvxCcox++_MBBhWG9Twf42cMiXL42AdpQ@mail.gmail.com/ I can see where you're coming from, but I think the particular comparison you use isn't quite apples to apples. `git checkout` has a "change branches" mode and a "revert specific paths" mode. While those have significant implementation overlap, they seem like different concepts to users. For merge-tree, the implementation is completely orthogonal between the two modes, but the concept is basically the same from the user viewpoint. Yes, the output differs in the two merge-tree modes, but command line arguments are often used to tweak the output (much like diff has different output styles based on various flags). In fact, in [1] where you suggest a new command you suggest that it should have "roughly the same features as git merge-tree and a similar interface". To me, that suggests that the two may be good candidates for being similar commands. That said, I'm not against a new command. Personally, my main reason for using merge-tree wasn't because that's where I thought it was best to put it, but just that (IIRC) each of Dscho, Peff, and Junio suggested that location. My biggest gripe was just the command name... > > > To move forward on this, this small RFC patch series introduces a new > > > `git merge-tree-ort` command with the following design: > > > > Slightly dislike the command name. > > I am ok with changing the command name. :-) > > `ort` was meant as a temporary, > > internal name. I don't think it's very meaningful to users, so I was > > hoping to just make `recursive` mean `ort` after we had enough > > testing, and to delete merge-recursive.[ch] at that time. Then `ort` > > merely becomes a historical footnote (...and perhaps part of the name > > of the file where the `recursive` algorithm is implemented). > > I think something similar could still be done with `git > merge-tree-ort` or whatever name we give to this command. For example > we could first add --ort to `git merge-tree` and make it call `git > merge-tree-ort`, then after some time make --ort the default, then > after some more time remove `git merge-tree-ort`. And whenever we make > those changes we could also rename the builtin/merge-tree*.{h,c} > accordingly. User-facing names tend to take a while to remove; I'd rather start with user-facing command and option names that are meaningful to users in terms of what they want to accomplish. I don't think `ort` qualifies as such; I'd rather `ort` only be used by expert users (e.g. folks wanting to test out a new merge algorithm they heard about before it became the default). > > > - it uses merge-ort's API as is to perform the merge > > > > > > - it gets back a tree oid and a cleanliness status from merge-ort's > > > API and prints them out first > > > > Good so far. > > > > > - it uses diff's API as is to output changed paths and code > > > > > > - the diff API, actually diff_tree_oid() is called 3 times: once for > > > the diff versus branch1 ("ours"), once for the diff versus branch2 > > > ("theirs"), and once for the diff versus the base. > > > > Why? That seems to be a performance penalty for anyone that doesn't > > want/need the diffs, and since we return a tree, a caller can go and > > get whatever diffs they like. > > I say somewhere else that I am planning to add options to disable this > or parts of this diff output. > > I think it's still interesting for the command to be able to output > diffs, especially diffs of conflicting parts. Are you presuming that you can output a diff of conflicting parts? A diff on a modify/delete or can't-merge-binary-files won't show anything, and won't provide any notification that there even was a conflict. > In [2] Ævar said: > > => I.e. I'm not the first or last to have (not for anything serious) > => implement a dry-run bare-repo merge with something like: > => > => git merge-tree origin/master git-for-windows/main origin/seen >diff > => # Better regex needed, but basically this > => grep "^\+<<<<<<< \.our$" diff && conflict=t This code is pretty hacky; it presumes that content conflicts are the only type of conflict. Inability to merge binary files, file/directory conflicts, mode conflicts, modify/delete conflicts, various rename conflicts, etc. are all other types. I'd rather not encourage such hacks. Returning whether there were conflicts in the exit status is far cleaner. Providing some mechanism for getting a list of files which had conflicts may also be useful, but both of these are things where a diff is the wrong tool. > Also `git merge-tree` currently outputs diffs, so I thought it would > be sad if the new command couldn't do it. > > [2] https://lore.kernel.org/git/211109.861r3qdpt8.gmgdl@evledraar.gmail.com/ Hmm, I had a totally different opinion. I felt the diffs in the current merge-tree was a hack to deal with the fact that they didn't have good tools to make use of the results, and ended up providing a Rube-Goldberg scheme to do anything useful with it. Providing a tree is a concrete and easily usable object for end users. They can feed that tree to other git commands to do additional things, and obviates the need for the Rube-Goldberg contraption. > > Was there something you didn't like about > > https://lore.kernel.org/git/pull.1114.git.git.1640927044.gitgitgadget@gmail.com/? > > I was having a vacation at the time and even though I skimmed the > mailing list, I missed it. Sorry. Ah, gotcha. I was just curious if there was something you felt was wrong and you wanted to propose an alternative. > Also I thought that you might not be interested in this anymore as you > didn't reply to [1] and [2] where Ævar and I both said that we were > interested in your opinion on a git merge-tree on steroids. Sorry, I saw Ævar's email and noticed various assumptions I felt were problematic, but noticed he said he was just spitballing (which is totally fair). I figured that rather than attempt to explain each detail, it'd be better to update my code and provide it...and I then ignored the rest of the thread and missed your question. Sorry. I certainly meant to update this series and send it out earlier, but there were multiple other series of mine that were also backed up and several series I needed to review, so it took me a while to clean out the queue a bit.
Hi Christian, On Fri, 7 Jan 2022, Christian Couder wrote: > On Wed, Jan 5, 2022 at 5:54 PM Elijah Newren <newren@gmail.com> wrote: > > > > On Wed, Jan 5, 2022 at 8:33 AM Christian Couder > > <christian.couder@gmail.com> wrote: > > > > To move forward on this, this small RFC patch series introduces a > > > new `git merge-tree-ort` command with the following design: > > > > Slightly dislike the command name. > > I am ok with changing the command name. Or just going with `merge-tree`? That command name has the distinct advantage of _already_ being used by Git. So there is no chance for users to have a `git-merge-tree` script lying around that all of a sudden would stop working because a superseding built-in using the same name is introduced. That is a distinct advantage of using the existing command for the new code flow, even if the command name could be interpreted as misleading. Of course, the name could also be construed to be on point: it merges, and it produces a tree, hence "merge-tree". > > > - the diff API, actually diff_tree_oid() is called 3 times: once for > > > the diff versus branch1 ("ours"), once for the diff versus branch2 > > > ("theirs"), and once for the diff versus the base. > > > > Why? That seems to be a performance penalty for anyone that doesn't > > want/need the diffs, and since we return a tree, a caller can go and > > get whatever diffs they like. > > I say somewhere else that I am planning to add options to disable this > or parts of this diff output. Well, now you got me really curious. Since you are listed as part of GitLab (https://about.gitlab.com/company/team/#chriscool), I assume that you are pretty familiar with how merges are done on GitLab's server side, and plan on experimenting with your own work to use `merge-ort` on GitLab's servers. Which makes me wonder where that idea comes from to use the diff API? What operations does GitLab need from that Git command, which code flows need to be supported (and what inputs/outputs are there)? > I think it's still interesting for the command to be able to output > diffs, especially diffs of conflicting parts. In [2] Ævar said: > > => I.e. I'm not the first or last to have (not for anything serious) > => implement a dry-run bare-repo merge with something like: > => > => git merge-tree origin/master git-for-windows/main origin/seen >diff > => # Better regex needed, but basically this > => grep "^\+<<<<<<< \.our$" diff && conflict=t > > Also `git merge-tree` currently outputs diffs, so I thought it would > be sad if the new command couldn't do it. > > [2] https://lore.kernel.org/git/211109.861r3qdpt8.gmgdl@evledraar.gmail.com/ But `git merge-tree` was only meant as a proof-of-concept, intended to entice other developers to turn it into something useful. As it took _this_ long, we can safely assume that the user interface of `merge-tree` left a bit of a room for improvement. Wouldn't you agree? > Also I thought that you might not be interested in this anymore as you > didn't reply to [1] and [2] where Ævar and I both said that we were > interested in your opinion on a git merge-tree on steroids. Elijah stated at the time that he was scarce on time for Git. But back to the topic: have you played with using `merge-ort` in GitLab already? What were the needs you saw that would be required of the Git command, what flows would it have to support? Ciao, Johannes
Hi Elijah, On Fri, 7 Jan 2022, Elijah Newren wrote: > On Fri, Jan 7, 2022 at 9:58 AM Christian Couder > <christian.couder@gmail.com> wrote: > > > > On Wed, Jan 5, 2022 at 5:54 PM Elijah Newren <newren@gmail.com> wrote: > > > > > > On Wed, Jan 5, 2022 at 8:33 AM Christian Couder > > > <christian.couder@gmail.com> wrote: > > > > > > The current `git merge-tree` command though seems to have a number of > > > > issues, especially: > > > > > > > > - it's too much related to the old merge recursive strategy which is > > > > not the default anymore since v2.34.0 and is likely to be > > > > deprecated over time, > > > > > > > > - it seems to output things in its own special format, which is not > > > > easy to customize, and which needs special code and logic to parse > > > > > > I agree we don't want those...but why would new merge-tree options > > > have to use the old merge strategy or the old output format? > > > > Yeah, it's not necessary if there are 2 separate modes, a "real" mode > > (like what you implemented with --real in your recent patch series) > > and a "trivial" mode (which is the name you give to the old code). > > > > Adding modes like this to a command is likely to make the command and > > its documentation more difficult to understand though. For example I > > think that we created `git switch` because the different modes of `git > > checkout` made the command hard to learn. > > > > I gave other reasons in [1] why I prefer a separate command. > > > > [1] https://lore.kernel.org/git/CAP8UFD1LgfZ0MT9=cMvxCcox++_MBBhWG9Twf42cMiXL42AdpQ@mail.gmail.com/ > > I can see where you're coming from, but I think the particular > comparison you use isn't quite apples to apples. `git checkout` has a > "change branches" mode and a "revert specific paths" mode. While > those have significant implementation overlap, they seem like > different concepts to users. For merge-tree, the implementation is > completely orthogonal between the two modes, but the concept is > basically the same from the user viewpoint. Yes, the output differs > in the two merge-tree modes, but command line arguments are often used > to tweak the output (much like diff has different output styles based > on various flags). In fact, in [1] where you suggest a new command > you suggest that it should have "roughly the same features as git > merge-tree and a similar interface". To me, that suggests that the > two may be good candidates for being similar commands. > > That said, I'm not against a new command. Personally, my main reason > for using merge-tree wasn't because that's where I thought it was best > to put it, but just that (IIRC) each of Dscho, Peff, and Junio > suggested that location. > > My biggest gripe was just the command name... I am against a new command for what essentially serves the original purpose of `merge-tree`. The fact that `merge-tree` has not seen any work in almost 12 years is testament not only to how hard it was to disentangle the work-tree requirement from the recursive merge (it is one of my favorite counterexamples when anybody claims that you can easily prototype code in a script and then convert it to C), but the fact that there is no user within Git itself (apart from t/t4300-merge-tree.sh, which does not count) speaks volumes about the design of that `merge-tree` tool. So it's only fair to breathe life into it by letting it do what it was meant to do all along. > > Also `git merge-tree` currently outputs diffs, so I thought it would > > be sad if the new command couldn't do it. > > > > [2] https://lore.kernel.org/git/211109.861r3qdpt8.gmgdl@evledraar.gmail.com/ > > Hmm, I had a totally different opinion. I felt the diffs in the > current merge-tree was a hack to deal with the fact that they didn't > have good tools to make use of the results, and ended up providing a > Rube-Goldberg scheme to do anything useful with it. Indeed. When I had a look how libgit2 is used to perform a server-side merge, I saw how careful the code is not to produce anything unneeded. And since the `merge` operation is performed a ton of times even without any user interaction, producing a diff is the _least_ of said code's concerns. > Providing a tree is a concrete and easily usable object for end users. > They can feed that tree to other git commands to do additional things, > and obviates the need for the Rube-Goldberg contraption. I strongly concur. So I would like to reiterate my challenge to you, Christian: could you have a look at the server-side merge of GitLab, and see what it actually would need of the `git merge-tree` command? I _bet_ it needs first and foremost the information "is this mergeable?". That is by far the most common question the code I saw had to answer, without any follow-up questions asked. An add-on question is then "which files/directories conflicted?". And there, it really only wanted the file names, along with the OIDs of the respective item in the base, the HEAD and the merge branch. In my work in December, I also had to implement another thing that I think we will have to implement in `merge-tree` in one form or another: when users provided merge conflict resolutions via the web UI, we want the merge to succeed. What I implemented was this (in a nutshell, a way to provide file names with associated blob OIDs that resolve the content conflicts): -- snip -- Author: Johannes Schindelin <Johannes.Schindelin@gmx.de> Date: Thu Dec 16 20:46:35 2021 +0100 Subject: merge-ort: allow pre-resolving merge conflicts One of merge-ort's particular strengths is that it works well even in a worktree-less setting, e.g. in a backend for a server-side merge. In such a scenario, it is conceivable that the merge in question not only results in a conflict, but that the caller figures out some sort of resolution before calling the merge again. The second call, of course, is meant to apply the merge conflict resolution. With this commit, we allow just that. The new, optional `pre_resolved_conflicts` field of `struct merge_options` maps paths to the blob OID of the resolution. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> diff --git a/merge-ort.c b/merge-ort.c index a74d4251c3..fa59ce2f97 100644 --- a/merge-ort.c +++ b/merge-ort.c @@ -3961,6 +3961,7 @@ static void process_entries(struct merge_options *opt, prefetch_for_content_merges(opt, &plist); for (entry = &plist.items[plist.nr-1]; entry >= plist.items; --entry) { char *path = entry->string; + struct object_id *resolved_oid; /* * NOTE: mi may actually be a pointer to a conflict_info, but * we have to check mi->clean first to see if it's safe to @@ -3972,7 +3973,17 @@ static void process_entries(struct merge_options *opt, &dir_metadata); if (mi->clean) record_entry_for_tree(&dir_metadata, path, mi); - else { + else if (opt->pre_resolved_conflicts && + (resolved_oid = + strmap_get(opt->pre_resolved_conflicts, path))) { + mi->clean = 1; + mi->is_null = 0; + memcpy(&mi->result.oid, resolved_oid, + sizeof(*resolved_oid)); + if (!mi->result.mode) + mi->result.mode = 0100644; + record_entry_for_tree(&dir_metadata, path, mi); + } else { struct conflict_info *ci = (struct conflict_info *)mi; process_entry(opt, path, ci, &dir_metadata); } diff --git a/merge-recursive.h b/merge-recursive.h index 0795a1d3ec..1f45effdd0 100644 --- a/merge-recursive.h +++ b/merge-recursive.h @@ -47,6 +47,13 @@ struct merge_options { const char *subtree_shift; unsigned renormalize : 1; + /* + * (merge-ORT only) If non-NULL, contains a map from `path` to OID. If + * the given `path would be marked as conflict, it is instead resolved + * to the specified blob. + */ + struct strmap *pre_resolved_conflicts; + /* internal fields used by the implementation */ struct merge_options_internal *priv; }; -- snap -- It is a proof-of-concept, therefore it expects the resolved conflicts to be in _files_. I don't think that there is a way to reasonably handle symlink target conflicts or directory/file/symlink conflicts, but there might be. A subsequent commit changed my hacky `merge-tree` hack to optionally accept NUL-terminated merge conflict resolutions in <path>/<OID> pairs via stdin (again, avoiding files to be written where we do not _need_ spend I/O unnecessarily). The biggest problem we faced at the Contributor Summit was that our discussion was not informed by the actual server-side needs. So I would like to reiterate my challenge to make that the driver. Let's not use any hypothetical scenario as the basis for the design of `git merge-tree`, but let's use the concrete requirements of actual server-side merges that are currently implemented using libgit2, when trying to figure out what functionality we need from `merge-tree`, and in what shape. Here is an initial list: - need to determine whether a merge will succeed, quickly - need the tree OID for a successful merge - need the list of items that conflict, along with OIDs and modes, if the merge failed - need a way to provide merge conflict resolutions And now that I wrote all this, I realize I should have sent it to the `merge-tree` thread, not the `merge-tree-ort` thread... Ciao, Dscho
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > I am against a new command for what essentially serves the original > purpose of `merge-tree`. > > The fact that `merge-tree` has not seen any work in almost 12 years is > testament not only to how hard it was to disentangle the work-tree > requirement from the recursive merge (it is one of my favorite > counterexamples when anybody claims that you can easily prototype code in > a script and then convert it to C), but the fact that there is no user > within Git itself (apart from t/t4300-merge-tree.sh, which does not count) > speaks volumes about the design of that `merge-tree` tool. > > So it's only fair to breathe life into it by letting it do what it was > meant to do all along. My "Yup" would not weigh as much as one that Linus (whose original merge-tree survived this long without seeing much enhancements) might give us, but he is busy elsewhere so you guys have to live with mine ;-) As to its output, I do agree that "we give a tree when it is already usable to record in a new commit" is a valuable option to have. The original behaviour should be made available somehow, for those who built their workflow (including scripts) around it, though.
On Mon, Jan 10, 2022 at 5:49 AM Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote: > > Hi Elijah, > > On Fri, 7 Jan 2022, Elijah Newren wrote: > > > On Fri, Jan 7, 2022 at 9:58 AM Christian Couder > > <christian.couder@gmail.com> wrote: > > > > > > On Wed, Jan 5, 2022 at 5:54 PM Elijah Newren <newren@gmail.com> wrote: > > > > > > > > On Wed, Jan 5, 2022 at 8:33 AM Christian Couder > > > > <christian.couder@gmail.com> wrote: > > > ... > > I _bet_ it needs first and foremost the information "is this mergeable?". > > That is by far the most common question the code I saw had to answer, > without any follow-up questions asked. > > An add-on question is then "which files/directories conflicted?". And > there, it really only wanted the file names, along with the OIDs of the > respective item in the base, the HEAD and the merge branch. This might be difficult to provide for some cases, e.g. rename/rename(1to2), especially if that conflict gets entangled with any others. Users would also have difficulty gaining these even using the command line `git merge` (with either recursive or ort) for these cases. Also, does this presume three OIDs? What about the cases where there are 4 or 6 for a given path? I'm a bit worried about the assumptions underlying this request, and that it might not be satisfiable in general depending on what those assumptions are. > In my work in December, I also had to implement another thing that I think > we will have to implement in `merge-tree` in one form or another: when > users provided merge conflict resolutions via the web UI, we want the > merge to succeed. What I implemented was this (in a nutshell, a way to > provide file names with associated blob OIDs that resolve the content > conflicts): Interesting. I'm curious, though -- if they are providing conflict resolutions, then would they not have had a previous merge to find out what the conflicts were? If so, wouldn't they be able to provide the tree to avoid the need for a second merge? > > -- snip -- > Author: Johannes Schindelin <Johannes.Schindelin@gmx.de> > Date: Thu Dec 16 20:46:35 2021 +0100 > Subject: merge-ort: allow pre-resolving merge conflicts > > One of merge-ort's particular strengths is that it works well even in a > worktree-less setting, e.g. in a backend for a server-side merge. > > In such a scenario, it is conceivable that the merge in question not > only results in a conflict, but that the caller figures out some sort of > resolution before calling the merge again. The second call, of course, > is meant to apply the merge conflict resolution. > > With this commit, we allow just that. The new, optional > `pre_resolved_conflicts` field of `struct merge_options` maps paths to > the blob OID of the resolution. > > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> > > diff --git a/merge-ort.c b/merge-ort.c > index a74d4251c3..fa59ce2f97 100644 > --- a/merge-ort.c > +++ b/merge-ort.c > @@ -3961,6 +3961,7 @@ static void process_entries(struct merge_options *opt, > prefetch_for_content_merges(opt, &plist); > for (entry = &plist.items[plist.nr-1]; entry >= plist.items; --entry) { > char *path = entry->string; > + struct object_id *resolved_oid; > /* > * NOTE: mi may actually be a pointer to a conflict_info, but > * we have to check mi->clean first to see if it's safe to > @@ -3972,7 +3973,17 @@ static void process_entries(struct merge_options *opt, > &dir_metadata); > if (mi->clean) > record_entry_for_tree(&dir_metadata, path, mi); > - else { > + else if (opt->pre_resolved_conflicts && > + (resolved_oid = > + strmap_get(opt->pre_resolved_conflicts, path))) { You have a couple problematic assumptions here: * What if the user's resolution required fixing a semantic conflict, meaning they needed to modify a file that had no conflicts? Your code here would ignore those, because the clean case is handled above. * What if the user's resolution required adding a totally new file (either because a rename is handled as a separate add/delete, or they just needed a new file)? The loop above isn't over items in pre_resolved_conflicts, it just assumes that items in pre_resolved_conflicts are also in the plist.items being looped over. > + mi->clean = 1; > + mi->is_null = 0; > + memcpy(&mi->result.oid, resolved_oid, > + sizeof(*resolved_oid)); And here's another: * What if the provided resolution was a deletion of the file in question (especially after e.g. a modify/delete or rename/delete conflict)? Don't you need to have a special check for the zero_oid here? And if I combine the things above: * What if the user wants to remove a file and add a directory of files in its place at some location in order to resolve things? Granted, you didn't handle either deletes or new files above, but if you did add both, then you might have this issue. The code at this point used some very carefully constructed logic and order of walking items specifically to handle file/delete conflicts correctly. I'm worried that your conflict resolution stuff here might interact badly with that. > + if (!mi->result.mode) > + mi->result.mode = 0100644; How do you know it's not supposed to be 0100755? > + record_entry_for_tree(&dir_metadata, path, mi); > + } else { > struct conflict_info *ci = (struct conflict_info *)mi; > process_entry(opt, path, ci, &dir_metadata); > } > diff --git a/merge-recursive.h b/merge-recursive.h > index 0795a1d3ec..1f45effdd0 100644 > --- a/merge-recursive.h > +++ b/merge-recursive.h > @@ -47,6 +47,13 @@ struct merge_options { > const char *subtree_shift; > unsigned renormalize : 1; > > + /* > + * (merge-ORT only) If non-NULL, contains a map from `path` to OID. If > + * the given `path would be marked as conflict, it is instead resolved > + * to the specified blob. > + */ > + struct strmap *pre_resolved_conflicts; > + > /* internal fields used by the implementation */ > struct merge_options_internal *priv; > }; > -- snap -- > > It is a proof-of-concept Fair enough. >, therefore it expects the resolved conflicts to > be in _files_. I don't think that there is a way to reasonably handle > symlink target conflicts or directory/file/symlink conflicts, but there > might be. You really need (mode,oid) pairs to be provided by the user. That fixes the executable issue I mentioned above, and makes it clear how to handle symlinks and file/symlink conflicts. directory/file are still handled by providing individual files, but ordering traversal becomes really tricky. The directory/file case might even require the pre_resolved_conflicts to be pulled out of that loop somehow. It'd take some investigative work, or some deep thought, or both. > A subsequent commit changed my hacky `merge-tree` hack to optionally > accept NUL-terminated merge conflict resolutions in <path>/<OID> pairs via > stdin (again, avoiding files to be written where we do not _need_ spend > I/O unnecessarily). > > The biggest problem we faced at the Contributor Summit was that our > discussion was not informed by the actual server-side needs. So I would > like to reiterate my challenge to make that the driver. Let's not use any > hypothetical scenario as the basis for the design of `git merge-tree`, but > let's use the concrete requirements of actual server-side merges that are > currently implemented using libgit2, when trying to figure out what > functionality we need from `merge-tree`, and in what shape. > > Here is an initial list: > > - need to determine whether a merge will succeed, quickly > > - need the tree OID for a successful merge > > - need the list of items that conflict... I think my version is good up to here. > , along with OIDs and modes, if the merge failed Could you clarify what you mean here by OIDs and modes? For a given path, are you just looking for a (pathname, OID, mode) tuple? (In which case, you'd get the OID of a file that potentially has embedded conflict markers) Or are you looking for multiple OIDs and modes for each file? If you are looking for multiple OIDs and modes for each file (representing where the content came from on the different sides of the merge), are you looking for: * the OID and mode of each file that played a part in the merge resolution OR * just the OIDs and modes that would have been recorded in the index had the merge been done by `git merge`? (Those last two possibilities are usually the same answer, but no they are not always the same. The index cannot hold all the OIDs and modes in various cases and we have to squash them down to three, possibly tossing some information that might have been of interest to the user. This can even occur when you have a unique merge base.) > > - need a way to provide merge conflict resolutions > > And now that I wrote all this, I realize I should have sent it to the > `merge-tree` thread, not the `merge-tree-ort` thread... My submission was RFC and you're providing C, so it's all good in my book. I'm reading both threads. :-)
Hi Junio, On Mon, 10 Jan 2022, Junio C Hamano wrote: > Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > > > I am against a new command for what essentially serves the original > > purpose of `merge-tree`. > > > > The fact that `merge-tree` has not seen any work in almost 12 years is > > testament not only to how hard it was to disentangle the work-tree > > requirement from the recursive merge (it is one of my favorite > > counterexamples when anybody claims that you can easily prototype code in > > a script and then convert it to C), but the fact that there is no user > > within Git itself (apart from t/t4300-merge-tree.sh, which does not count) > > speaks volumes about the design of that `merge-tree` tool. > > > > So it's only fair to breathe life into it by letting it do what it was > > meant to do all along. > > My "Yup" would not weigh as much as one that Linus (whose original > merge-tree survived this long without seeing much enhancements) > might give us, but he is busy elsewhere so you guys have to live > with mine ;-) > > As to its output, I do agree that "we give a tree when it is already > usable to record in a new commit" is a valuable option to have. The > original behaviour should be made available somehow, for those who > built their workflow (including scripts) around it, though. No, I don't think it is a good idea to keep the original behavior around indefinitely, when it is totally unclear whether there actually _is_ any user of this feature out there. We intentionally broke any existing users of `git-parse-remote.sh` by removing it, when that feature was much more likely to be used in scripts than `git merge-tree`. We cannot say on the one hand that we will get rid of some useful script just because we don't want the maintenance burden when on the other hand trying to keep support for an operation that is unlikely to have any users. That does not compute. (And don't get hung up on the specific example of `git-parse-remote.sh`, you know there we shuffled around _many_ more things recently that had a good chance of breaking existing users.) Besides, the original `git remote-tree` behavior will be very easy to recreate using the better UI, the one that outputs tree OIDs. Ciao, Dscho
On Tue, Jan 11 2022, Johannes Schindelin wrote: > Hi Junio, > > On Mon, 10 Jan 2022, Junio C Hamano wrote: > >> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: >> >> > I am against a new command for what essentially serves the original >> > purpose of `merge-tree`. >> > >> > The fact that `merge-tree` has not seen any work in almost 12 years is >> > testament not only to how hard it was to disentangle the work-tree >> > requirement from the recursive merge (it is one of my favorite >> > counterexamples when anybody claims that you can easily prototype code in >> > a script and then convert it to C), but the fact that there is no user >> > within Git itself (apart from t/t4300-merge-tree.sh, which does not count) >> > speaks volumes about the design of that `merge-tree` tool. >> > >> > So it's only fair to breathe life into it by letting it do what it was >> > meant to do all along. >> >> My "Yup" would not weigh as much as one that Linus (whose original >> merge-tree survived this long without seeing much enhancements) >> might give us, but he is busy elsewhere so you guys have to live >> with mine ;-) >> >> As to its output, I do agree that "we give a tree when it is already >> usable to record in a new commit" is a valuable option to have. The >> original behaviour should be made available somehow, for those who >> built their workflow (including scripts) around it, though. > > No, I don't think it is a good idea to keep the original behavior around > indefinitely, when it is totally unclear whether there actually _is_ any > user of this feature out there. For what it's worth I've used it for some automation at an ex-employer at least once. Grepping through my #*git* IRC logs there's a few mentions of it, and likewise for StackOverflow. I'm not opposed to replacing it, and I think that probably in-the-wild users of it are almost certainly just grepping for the conflict markers to see if there's conflicts, or parsing which files have them etc. So if we can provide a better interface that they can use (or even make git merge-tree a thin wrapper...). > We intentionally broke any existing users of `git-parse-remote.sh` by > removing it, when that feature was much more likely to be used in scripts > than `git merge-tree`. We cannot say on the one hand that we will get rid > of some useful script just because we don't want the maintenance burden > when on the other hand trying to keep support for an operation that is > unlikely to have any users. That does not compute. (And don't get hung up > on the specific example of `git-parse-remote.sh`[...] The specific example or case really matters. We don't have some generic flowchart for deprecating things that applies in all cases. I removed git-parse-remote.sh in a89a2fbfccd (parse-remote: remove this now-unused library, 2020-11-14) and had the opposite impression you're expressing here. I.e. searching around for in-the-wild uses via search engines, stack overflow etc. either found nothing interesting, or just a reflection of ourselves (i.e. generated manpages etc.). That along with its history, i.e. it wasn't *really* meant for out-of-tree exposure if you look at the history of how it came about, it was just made at a time when similar libraries had similar manpages etc. added as a template. Maybe that was the right thing to do, maybe not, but it went out with v2.30.0 and the lack of complaints since then would seem to suggest that I was right that removing it wouldn't be a big deal. Of course it may have broken someone's script somewhere. But an important distinction is that they can get it working again by just copy/pasting that ~100 line shell library into their own script, or calling the underlying commands it was invoking themselves. Which doesn't apply to "git merge-tree" unless we come up with a replacement, and even then a 1=1 port of some in-the-wild code might be much harder, i.e. if it's deeply coupled to parsing the specific output it emits now. So I'd personally be much more hesitant to remove or change that, but of course we might still come up with good reasons for why it's worth it, especially with some advertised deprecation period. And isn't any doubt around that even more reason to just go with Elijah's plan of introducing new plumbing? I.e. is it really costing us to just leave these "legacy merge" plumbing built-ins and merge-recursive.c etc. in place?
On Mon, Jan 10, 2022 at 9:59 AM Elijah Newren <newren@gmail.com> wrote: > > On Mon, Jan 10, 2022 at 5:49 AM Johannes Schindelin > <Johannes.Schindelin@gmx.de> wrote: > > ... > >, therefore it expects the resolved conflicts to > > be in _files_. I don't think that there is a way to reasonably handle > > symlink target conflicts or directory/file/symlink conflicts, but there > > might be. > > You really need (mode,oid) pairs to be provided by the user. That > fixes the executable issue I mentioned above, and makes it clear how > to handle symlinks and file/symlink conflicts. > > directory/file are still handled by providing individual files, but > ordering traversal becomes really tricky. The directory/file case > might even require the pre_resolved_conflicts to be pulled out of that > loop somehow. It'd take some investigative work, or some deep > thought, or both. I think I came up with a solution to this during my run yesterday, though I haven't tried or tested it. Instead of modifying the loop over plist.items, you instead add a preliminary loop over pre_resolved_conflicts that modifies opt->priv->paths (and add this preliminary loop just before the items from opt->priv->paths are added to plist.items). In that preliminary loop, you need to make sure that (a) any files in pre_resolved_conflicts corresponding to existing _files_ in opt->priv->path result in updating that item's clean & is_null & mode & oid state, (b) any files in pre_resolved_conflicts that correspond to existing _directories_ in opt->priv->path need the value to be expanded to be a conflict_info instead of just a merged_info, you need to set the df_conflict bit, and don't update the merge_info fields but do update the extended conflict_info ones, (c) any new files in pre_resolved_conflicts result in new entries opt->priv->paths, (d) any leading directories of new files in pre_resolved_conflicts are appropriately handled, meaning: (d1) new opt->priv->paths are created if the directory path wasn't in opt->priv->paths before, (d2) a tweak to df_conflict for the directory item if it previously existed in opt->priv->paths but only as a file (possibly also necessitating expanding from a merged_info to a conflict_info), (d3) no-op if the directory already existed in opt->priv->paths and was just a directory (and in this case, you can stop walking the parent directories to the toplevel). Then, after this preliminary loop that modifies opt->priv->paths, the rest can just proceed as-is. That should handle new files, new directories, and all directory/file conflicts. Yeah, it's a bunch to look at, but directory/file conflicts are always a bear.
On Tue, Jan 11, 2022 at 12:38 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote: > > On Tue, Jan 11 2022, Johannes Schindelin wrote: > > > Hi Junio, > > > > On Mon, 10 Jan 2022, Junio C Hamano wrote: > > > >> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > >> ... > > No, I don't think it is a good idea to keep the original behavior around > > indefinitely, when it is totally unclear whether there actually _is_ any > > user of this feature out there. > > For what it's worth I've used it for some automation at an ex-employer > at least once. Grepping through my #*git* IRC logs there's a few > mentions of it, and likewise for StackOverflow. > > I'm not opposed to replacing it, and I think that probably in-the-wild > users of it are almost certainly just grepping for the conflict markers > to see if there's conflicts, or parsing which files have them etc. > > So if we can provide a better interface that they can use (or even make > git merge-tree a thin wrapper...). ... > So I'd personally be much more hesitant to remove or change that, but of > course we might still come up with good reasons for why it's worth it, > especially with some advertised deprecation period. > > And isn't any doubt around that even more reason to just go with > Elijah's plan of introducing new plumbing? I.e. is it really costing us > to just leave these "legacy merge" plumbing built-ins I definitely think it's worth guiding users away from the old `git merge-tree` behavior in documentation (i.e. deprecating it). That may also lead towards its eventual removal, but I'm not as worried about that. `git merge-recursive` was actively used in various places, including in `git cherry-pick`. I had used it a few times myself in a script. I don't see a need to deprecate it currently, which naturally would push its removal (if anyone is pushing for it) even further away. > and merge-recursive.c etc. in place? This, however, is different. I have much stronger opinions here and I do want to eventually remove merge-recursive.c. Definitely not now, because: * `ort` has not been the default long enough. We had one bug reported this cycle where ort did the wrong thing and using `recursive` was a good workaround. There may be a long tail here, so I'd like to wait a couple years for reports to trickle in. * `merge-recursive.c` is still hard-coded in three places in the code; you can't even set a configuration option to choose merge-ort.c in those places: builtin/am, builtin/merge-recursive, and builtin/stash. More details on that second point: All three of these use merge_recursive_generic() and need that usage to be replaced. It's on my TODO list. Although it might look like a trivial job of just copying merge_recursive_generic() and replace its merge_recursive() call with merge_ort_recursive(), that would only get you something that would die on assert()s in ort's setup. If you removed those assert()s, then you'd merely be copying the bugs in merge_recursive_generic() usage into the new world (and adding one new bug). We should instead fix the problems identified with that usage and those callers during the review of ort. So don't try to do this simplistic translation.
Hi Elijah, On Mon, 10 Jan 2022, Elijah Newren wrote: > On Mon, Jan 10, 2022 at 5:49 AM Johannes Schindelin > <Johannes.Schindelin@gmx.de> wrote: > > > > On Fri, 7 Jan 2022, Elijah Newren wrote: > > > > > On Fri, Jan 7, 2022 at 9:58 AM Christian Couder > > > <christian.couder@gmail.com> wrote: > > > > > > > > On Wed, Jan 5, 2022 at 5:54 PM Elijah Newren <newren@gmail.com> > > > > wrote: > > > > > > > > > > On Wed, Jan 5, 2022 at 8:33 AM Christian Couder > > > > > <christian.couder@gmail.com> wrote: > > > > > ... > > > > I _bet_ it needs first and foremost the information "is this > > mergeable?". > > > > That is by far the most common question the code I saw had to answer, > > without any follow-up questions asked. > > > > An add-on question is then "which files/directories conflicted?". And > > there, it really only wanted the file names, along with the OIDs of > > the respective item in the base, the HEAD and the merge branch. > > This might be difficult to provide for some cases, e.g. > rename/rename(1to2), especially if that conflict gets entangled with any > others. Users would also have difficulty gaining these even using the > command line `git merge` (with either recursive or ort) for these cases. > > Also, does this presume three OIDs? What about the cases where there > are 4 or 6 for a given path? > > I'm a bit worried about the assumptions underlying this request, and > that it might not be satisfiable in general depending on what those > assumptions are. Well, that request is driven by the reality of a web UI. You cannot reasonably resolve just any merge conflict in a web UI. But you can easily resolve a trivial content conflict in, say, a README file, without opening a console window, cloning the repository, starting an editor, then saving the result after resolving the textual conflict, committing it, then trying to force-push to the original PR branch (if the contributor has given you permission to push). So what I am talking about here really is the case where no rename conflict happened, no directory/file conflict, no type change. Just the good ole' textual conflicts where the same lines were modified in divergent ways. This means that we need the base, HEAD and MERGE OIDs here (and modes, you are absolutely correct). > > In my work in December, I also had to implement another thing that I > > think we will have to implement in `merge-tree` in one form or > > another: when users provided merge conflict resolutions via the web > > UI, we want the merge to succeed. What I implemented was this (in a > > nutshell, a way to provide file names with associated blob OIDs that > > resolve the content conflicts): > > Interesting. I'm curious, though -- if they are providing conflict > resolutions, then would they not have had a previous merge to find out > what the conflicts were? If so, wouldn't they be able to provide the > tree to avoid the need for a second merge? No, the conflict resolutions are apparently stored independently from the commits. Probably to allow for the same resolutions to be applied in case that the PR's target branch changes but leaves the same conflicts. Again, we are dealing with the realities of a web UI ;-) BTW I am not saying that the way I implemented it is 100% the best way to achieve the intended goal. It is quite possible that there is a better way, or that we need to at least provide the (mode,OID) triplet corresponding to the conflict, too, so that the merge machinery can verify that the resolution is applied to the _correct_ conflict ;-) > > -- snip -- > > Author: Johannes Schindelin <Johannes.Schindelin@gmx.de> > > Date: Thu Dec 16 20:46:35 2021 +0100 > > Subject: merge-ort: allow pre-resolving merge conflicts > > > > One of merge-ort's particular strengths is that it works well even in a > > worktree-less setting, e.g. in a backend for a server-side merge. > > > > In such a scenario, it is conceivable that the merge in question not > > only results in a conflict, but that the caller figures out some sort of > > resolution before calling the merge again. The second call, of course, > > is meant to apply the merge conflict resolution. > > > > With this commit, we allow just that. The new, optional > > `pre_resolved_conflicts` field of `struct merge_options` maps paths to > > the blob OID of the resolution. > > > > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> > > > > diff --git a/merge-ort.c b/merge-ort.c > > index a74d4251c3..fa59ce2f97 100644 > > --- a/merge-ort.c > > +++ b/merge-ort.c > > @@ -3961,6 +3961,7 @@ static void process_entries(struct merge_options *opt, > > prefetch_for_content_merges(opt, &plist); > > for (entry = &plist.items[plist.nr-1]; entry >= plist.items; --entry) { > > char *path = entry->string; > > + struct object_id *resolved_oid; > > /* > > * NOTE: mi may actually be a pointer to a conflict_info, but > > * we have to check mi->clean first to see if it's safe to > > @@ -3972,7 +3973,17 @@ static void process_entries(struct merge_options *opt, > > &dir_metadata); > > if (mi->clean) > > record_entry_for_tree(&dir_metadata, path, mi); > > - else { > > + else if (opt->pre_resolved_conflicts && > > + (resolved_oid = > > + strmap_get(opt->pre_resolved_conflicts, path))) { > > You have a couple problematic assumptions here: > > * What if the user's resolution required fixing a semantic conflict, > meaning they needed to modify a file that had no conflicts? Your code > here would ignore those, because the clean case is handled above. > > * What if the user's resolution required adding a totally new file > (either because a rename is handled as a separate add/delete, or they > just needed a new file)? The loop above isn't over items in > pre_resolved_conflicts, it just assumes that items in > pre_resolved_conflicts are also in the plist.items being looped over. I can see how these assumptions may look ludicrous when coming from the command-line. But again, we are talking about the realities of a conflict resolution via a web UI. So I think that it is out of the question to address non-textual conflicts in this scenario. And then the assumptions make much more sense. > > > + mi->clean = 1; > > + mi->is_null = 0; > > + memcpy(&mi->result.oid, resolved_oid, > > + sizeof(*resolved_oid)); > > And here's another: > > * What if the provided resolution was a deletion of the file in > question (especially after e.g. a modify/delete or rename/delete > conflict)? Don't you need to have a special check for the zero_oid > here? > > And if I combine the things above: > > * What if the user wants to remove a file and add a directory of > files in its place at some location in order to resolve things? > Granted, you didn't handle either deletes or new files above, but if > you did add both, then you might have this issue. The code at this > point used some very carefully constructed logic and order of walking > items specifically to handle file/delete conflicts correctly. I'm > worried that your conflict resolution stuff here might interact badly > with that. > > > + if (!mi->result.mode) > > + mi->result.mode = 0100644; > > How do you know it's not supposed to be 0100755? I don't ;-) This was a proof-of-concept, and I meant to look into this a bit further, trying to figure out from where I could get this information, but I never got to that. Now that I think about it, the best solution would probably be for the mode to be provided in addition to the blob OID, so that the caller has to decide. > > + record_entry_for_tree(&dir_metadata, path, mi); > > + } else { > > struct conflict_info *ci = (struct conflict_info *)mi; > > process_entry(opt, path, ci, &dir_metadata); > > } > > diff --git a/merge-recursive.h b/merge-recursive.h > > index 0795a1d3ec..1f45effdd0 100644 > > --- a/merge-recursive.h > > +++ b/merge-recursive.h > > @@ -47,6 +47,13 @@ struct merge_options { > > const char *subtree_shift; > > unsigned renormalize : 1; > > > > + /* > > + * (merge-ORT only) If non-NULL, contains a map from `path` to OID. If > > + * the given `path would be marked as conflict, it is instead resolved > > + * to the specified blob. > > + */ > > + struct strmap *pre_resolved_conflicts; > > + > > /* internal fields used by the implementation */ > > struct merge_options_internal *priv; > > }; > > -- snap -- > > > > It is a proof-of-concept > > Fair enough. > > >, therefore it expects the resolved conflicts to > > be in _files_. I don't think that there is a way to reasonably handle > > symlink target conflicts or directory/file/symlink conflicts, but there > > might be. > > You really need (mode,oid) pairs to be provided by the user. That > fixes the executable issue I mentioned above, and makes it clear how > to handle symlinks and file/symlink conflicts. It's a really good point, too. > directory/file are still handled by providing individual files, but > ordering traversal becomes really tricky. The directory/file case > might even require the pre_resolved_conflicts to be pulled out of that > loop somehow. It'd take some investigative work, or some deep > thought, or both. My idea for directory/file conflicts is to essentially teach the web UI to throw its hands in the air upon encountering them, and telling the user that it's not possible to resolve these types of conflicts via the UI. And since my principal driver here is the concrete need for such a server-side merge with conflict resolution, that's as far as I want to push `merge-tree`, too, and leave any more complicated resolution to a future date, or never, whichever comes first. > > A subsequent commit changed my hacky `merge-tree` hack to optionally > > accept NUL-terminated merge conflict resolutions in <path>/<OID> pairs via > > stdin (again, avoiding files to be written where we do not _need_ spend > > I/O unnecessarily). > > > > The biggest problem we faced at the Contributor Summit was that our > > discussion was not informed by the actual server-side needs. So I would > > like to reiterate my challenge to make that the driver. Let's not use any > > hypothetical scenario as the basis for the design of `git merge-tree`, but > > let's use the concrete requirements of actual server-side merges that are > > currently implemented using libgit2, when trying to figure out what > > functionality we need from `merge-tree`, and in what shape. > > > > Here is an initial list: > > > > - need to determine whether a merge will succeed, quickly > > > > - need the tree OID for a successful merge > > > > - need the list of items that conflict... > > I think my version is good up to here. Yes! The only thing I would change is to not even bother providing a tree object in case of conflicts. Either we provide it, and expect the user to somehow reconstruct the conflict types from there, or we simply don't say anything about the tree whose (transitive) file contents may or may not contain conflict markers (or not, think: binary files). Providing a tree object in case of a failed merge isn't helpful IMO. > > , along with OIDs and modes, if the merge failed > > Could you clarify what you mean here by OIDs and modes? For a given > path, are you just looking for a (pathname, OID, mode) tuple? (In > which case, you'd get the OID of a file that potentially has embedded > conflict markers) > > Or are you looking for multiple OIDs and modes for each file? This. In case of a conflict, I am looking for (mode,OID) for the merge base (which _can_ be a virtual one in case of recursive merges) as well as for the two divergent revisions that were supposed to be merged. I do realize that other types of conflicts can occur, such as a rename/rename conflict, and we would need a way to represent these in the output, too. But in such an instance, where no clear (mode,OID) triplet can be provided that represents the merge conflicts for this file, the current web UI cannot offer a way to resolve the conflict, so I am a bit less interested in that scenario. > If you are looking for multiple OIDs and modes for each file > (representing where the content came from on the different sides of > the merge), are you looking for: > * the OID and mode of each file that played a part in the merge resolution > OR > * just the OIDs and modes that would have been recorded in the index > had the merge been done by `git merge`? I think the latter was my idea. But again, you made me think of rename/rename conflicts and friends, and we would need a way to represent those, too. Even if my current use case would need to only detect their presence in order to say "nope, can't resolve that on the web". > (Those last two possibilities are usually the same answer, but no they > are not always the same. The index cannot hold all the OIDs and modes > in various cases and we have to squash them down to three, possibly > tossing some information that might have been of interest to the user. > This can even occur when you have a unique merge base.) > > > > > - need a way to provide merge conflict resolutions > > > > And now that I wrote all this, I realize I should have sent it to the > > `merge-tree` thread, not the `merge-tree-ort` thread... > > My submission was RFC and you're providing C, so it's all good in my > book. I'm reading both threads. :-) :-) Thank you so much! Dscho
Hi Dscho, On Tue, Jan 11, 2022 at 2:30 PM Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote: > > Hi Elijah, > > On Mon, 10 Jan 2022, Elijah Newren wrote: > > > On Mon, Jan 10, 2022 at 5:49 AM Johannes Schindelin > > <Johannes.Schindelin@gmx.de> wrote: > > > > > > On Fri, 7 Jan 2022, Elijah Newren wrote: > > > > > > > On Fri, Jan 7, 2022 at 9:58 AM Christian Couder > > > > <christian.couder@gmail.com> wrote: > > > > > > > > > > On Wed, Jan 5, 2022 at 5:54 PM Elijah Newren <newren@gmail.com> > > > > > wrote: > > > > > > > > > > > > On Wed, Jan 5, 2022 at 8:33 AM Christian Couder > > > > > > <christian.couder@gmail.com> wrote: > > > > > > > ... > > > > > > I _bet_ it needs first and foremost the information "is this > > > mergeable?". > > > > > > That is by far the most common question the code I saw had to answer, > > > without any follow-up questions asked. > > > > > > An add-on question is then "which files/directories conflicted?". And > > > there, it really only wanted the file names, along with the OIDs of > > > the respective item in the base, the HEAD and the merge branch. > > > > This might be difficult to provide for some cases, e.g. > > rename/rename(1to2), especially if that conflict gets entangled with any > > others. Users would also have difficulty gaining these even using the > > command line `git merge` (with either recursive or ort) for these cases. > > > > Also, does this presume three OIDs? What about the cases where there > > are 4 or 6 for a given path? > > > > I'm a bit worried about the assumptions underlying this request, and > > that it might not be satisfiable in general depending on what those > > assumptions are. > > Well, that request is driven by the reality of a web UI. > > You cannot reasonably resolve just any merge conflict in a web UI. But you > can easily resolve a trivial content conflict in, say, a README file, > without opening a console window, cloning the repository, starting an > editor, then saving the result after resolving the textual conflict, > committing it, then trying to force-push to the original PR branch (if the > contributor has given you permission to push). > > So what I am talking about here really is the case where no rename > conflict happened, no directory/file conflict, no type change. Just the > good ole' textual conflicts where the same lines were modified in > divergent ways. Sure, but we were talking about needs for `git merge-tree`. You are saying we need more information out of it (OIDs and modes) to fullfill user requests. That's totally reasonable. But stating that you don't care about certain types of conflicts or cases where renames happen doesn't really help answer my question; while that's a restriction you can put on your internal implementation, other users of `merge-tree` doing the exact same type of thing (even involving a web UI) might decide they also want to handle the other types of conflicts. I don't want merge-tree to have a design only capable of handling unrenamed files, or only textual conflicts, since the new mode for merge-tree was all about handling the complexity of a real merge. > This means that we need the base, HEAD and MERGE OIDs here (and modes, you > are absolutely correct). Now you're back to requiring three again. What about modify/delete conflicts where there are only two available? Your stated requirements either lack precision about what you really want (e.g. "the OIDs and modes that would have appeared in the higher order stages in the index for the given path") or aren't satisfiable. I'm against providing anything that knowingly paints us into a corner. ... > > > @@ -3972,7 +3973,17 @@ static void process_entries(struct merge_options *opt, > > > &dir_metadata); > > > if (mi->clean) > > > record_entry_for_tree(&dir_metadata, path, mi); > > > - else { > > > + else if (opt->pre_resolved_conflicts && > > > + (resolved_oid = > > > + strmap_get(opt->pre_resolved_conflicts, path))) { > > > > You have a couple problematic assumptions here: > > > > * What if the user's resolution required fixing a semantic conflict, > > meaning they needed to modify a file that had no conflicts? Your code > > here would ignore those, because the clean case is handled above. > > > > * What if the user's resolution required adding a totally new file > > (either because a rename is handled as a separate add/delete, or they > > just needed a new file)? The loop above isn't over items in > > pre_resolved_conflicts, it just assumes that items in > > pre_resolved_conflicts are also in the plist.items being looped over. > > I can see how these assumptions may look ludicrous when coming from the > command-line. But again, we are talking about the realities of a conflict > resolution via a web UI. > > So I think that it is out of the question to address non-textual conflicts > in this scenario. And then the assumptions make much more sense. Waving your hands and saying it came from a web UI doesn't reduce my worries or concerns at all. I could easily imagine a web UI that allows users to specify other modifications they want to make, even limited to textual ones, to include in the merge. What happens when they modify some file that had otherwise merged cleanly (e.g. a file that gains a new call to some function, which after the merge needs an extra parameter passed to it)? Your solution doesn't handle it; it'd send that user-specified change to /dev/null. To be fair, you mentioned below that this is just a proof of concept, and that certainly reduces worries/concerns. It's totally fine to have such things. Maybe you intend to keep this patch internal indefinitely. That's fine too. My commentary is just feedback on if we want merge-ort/merge-tree extended more in this kind of fashion (and it might also serve as useful pointers on how to extend your internal patch if you get requests to extend your web UI a bit to handle more cases). :-) > > > + mi->clean = 1; > > > + mi->is_null = 0; > > > + memcpy(&mi->result.oid, resolved_oid, > > > + sizeof(*resolved_oid)); > > > > And here's another: > > > > * What if the provided resolution was a deletion of the file in > > question (especially after e.g. a modify/delete or rename/delete > > conflict)? Don't you need to have a special check for the zero_oid > > here? > > > > And if I combine the things above: > > > > * What if the user wants to remove a file and add a directory of > > files in its place at some location in order to resolve things? > > Granted, you didn't handle either deletes or new files above, but if > > you did add both, then you might have this issue. The code at this > > point used some very carefully constructed logic and order of walking > > items specifically to handle file/delete conflicts correctly. I'm > > worried that your conflict resolution stuff here might interact badly > > with that. > > > > > + if (!mi->result.mode) > > > + mi->result.mode = 0100644; > > > > How do you know it's not supposed to be 0100755? > > I don't ;-) > > This was a proof-of-concept, and I meant to look into this a bit further, > trying to figure out from where I could get this information, but I never > got to that. > > Now that I think about it, the best solution would probably be for the > mode to be provided in addition to the blob OID, so that the caller has to > decide. +1. ... > > directory/file are still handled by providing individual files, but > > ordering traversal becomes really tricky. The directory/file case > > might even require the pre_resolved_conflicts to be pulled out of that > > loop somehow. It'd take some investigative work, or some deep > > thought, or both. > > My idea for directory/file conflicts is to essentially teach the web UI to > throw its hands in the air upon encountering them, and telling the user > that it's not possible to resolve these types of conflicts via the UI. > > And since my principal driver here is the concrete need for such a > server-side merge with conflict resolution, that's as far as I want to > push `merge-tree`, too, and leave any more complicated resolution to a > future date, or never, whichever comes first. :-) > > > A subsequent commit changed my hacky `merge-tree` hack to optionally > > > accept NUL-terminated merge conflict resolutions in <path>/<OID> pairs via > > > stdin (again, avoiding files to be written where we do not _need_ spend > > > I/O unnecessarily). > > > > > > The biggest problem we faced at the Contributor Summit was that our > > > discussion was not informed by the actual server-side needs. So I would > > > like to reiterate my challenge to make that the driver. Let's not use any > > > hypothetical scenario as the basis for the design of `git merge-tree`, but > > > let's use the concrete requirements of actual server-side merges that are > > > currently implemented using libgit2, when trying to figure out what > > > functionality we need from `merge-tree`, and in what shape. > > > > > > Here is an initial list: > > > > > > - need to determine whether a merge will succeed, quickly > > > > > > - need the tree OID for a successful merge > > > > > > - need the list of items that conflict... > > > > I think my version is good up to here. > > Yes! :-) > The only thing I would change is to not even bother providing a tree > object in case of conflicts. Either we provide it, and expect the user to > somehow reconstruct the conflict types from there, or we simply don't say > anything about the tree whose (transitive) file contents may or may not > contain conflict markers (or not, think: binary files). Providing a tree > object in case of a failed merge isn't helpful IMO. You've got a false dichotomy here. Providing a tree object does NOT imply that we expect users to reconstruct a list of conflicted files or the types of conflicts from that tree. I pretty strongly argued against that, in fact (see [*], , starting at "???" -- and yes, I mean a literal triple question mark). I also don't see why you are so bothered by the tree object being printed. I already pointed out that we can't easily avoid computing the tree object without risking hurting performance more than you're trying to help it (again, see [*]). But beyond performance, by trying to require that the tree object not be printed, you are trying to prevent users from getting information that they simply cannot obtain any other way. Even if `git merge-tree` provided a list of up to three (stage, mode, oid) triplets for each conflicted filename corresponding to what you'd find in the index in the higher order stages (which sounds like a totally reasonable request, by the way), you could not reconstruct the information available in the tree for the same conflicted paths. Sure, you could get "close", and for some uses, maybe that's good enough, but most users I've ever seen prefer the file in the working copy after a merge for trying to resolve conflicts over the (stage, mode, oid) pairs found in the index. And since file in the tree potentially has some information not reconstructable from the index, I don't see why we should avoid printing 41 characters to stdout to allow the user to make use of it. If it's not useful to you, whatever, toss those 41 printed bytes. [*] https://lore.kernel.org/git/CABPp-BHvXrP0sTTmuTYfACoJTCcm9+wk_f441nj4TstrmQdqMQ@mail.gmail.com/ > > > , along with OIDs and modes, if the merge failed > > > > Could you clarify what you mean here by OIDs and modes? For a given > > path, are you just looking for a (pathname, OID, mode) tuple? (In > > which case, you'd get the OID of a file that potentially has embedded > > conflict markers) > > > > Or are you looking for multiple OIDs and modes for each file? > > This. In case of a conflict, I am looking for (mode,OID) for the merge > base (which _can_ be a virtual one in case of recursive merges) as well as > for the two divergent revisions that were supposed to be merged. > > I do realize that other types of conflicts can occur, such as a > rename/rename conflict, and we would need a way to represent these in the > output, too. But in such an instance, where no clear (mode,OID) triplet > can be provided that represents the merge conflicts for this file, the > current web UI cannot offer a way to resolve the conflict, so I am a bit > less interested in that scenario. Okay, this is helping explain a bit better. Out of curiosity, does this mean the web UI only can resolve cases where all three modes & OIDs are present, and the files are text rather than binary? For example, does this mean the web UI cannot handle cases like modify/delete or add/add? > > If you are looking for multiple OIDs and modes for each file > > (representing where the content came from on the different sides of > > the merge), are you looking for: > > * the OID and mode of each file that played a part in the merge resolution > > OR > > * just the OIDs and modes that would have been recorded in the index > > had the merge been done by `git merge`? > > I think the latter was my idea. Ah, perfect; that's something we can provide! > But again, you made me think of rename/rename conflicts and friends, and > we would need a way to represent those, too. Even if my current use case > would need to only detect their presence in order to say "nope, can't > resolve that on the web". But now you're making me worry whether I can satisfy your requests again, or at minimum, whether I need to do a lot more work in merge-ort to answer them. Do you need a representation other than the list of (stage, mode, oid) triplets? That might be hard to come by, especially for cases like mod6 of t6422. The code basically assumes there's going to be 0 or more conflict types at each path (and yes, you can absolutely have more than 1), but doesn't store these conflict types other than in terms of outputting various "CONFLICT (type): <more info>" strings (though it does keep the output messages in memory and associated with each file before printing them at the end). Further, while some conflict types which could be considered to affect multiple paths (e.g. rename/rename where both sides rename the same file differently), they are only stored once relative to only one of the filenames to avoid printing the same conflict notice three times. (For example, in the rename/rename case, the conflict message is associated with the base filename). If you do only need the list of (stage, mode, oid) triplets, then that can be provided for every file. However, things like rename/rename(1to2) will only provide a single (stage, mode, oid) triplet for the oldfilename, and also only provide one triplet for the new filename on side1, and also only provide one triplet for the new filename on side2. I don't know if it'll confuse users why they have a conflict for a file with only a single triplet. > > > And now that I wrote all this, I realize I should have sent it to the > > > `merge-tree` thread, not the `merge-tree-ort` thread... > > > > My submission was RFC and you're providing C, so it's all good in my > > book. I'm reading both threads. :-) > > :-) I'm a little worried I might come across sounding a little picky since I tend to dive into details and really fixate on them. I apologize in advance if it sounds that way at all; you provide lots of good points to think about and I can't help but dive into the quirks surrounding each. I really appreciate all the awesome feedback and food for thought.
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > Maybe that was the right thing to do, maybe not, but it went out with > v2.30.0 and the lack of complaints since then would seem to suggest that > I was right that removing it wouldn't be a big deal. > > Of course it may have broken someone's script somewhere. > > But an important distinction is that they can get it working again by > just copy/pasting that ~100 line shell library into their own script, or > calling the underlying commands it was invoking themselves. Was parse-remote a part of what promised our end-users? merge-tree has been since it was written. That distinction is even more important.
Elijah Newren <newren@gmail.com> writes: >> And isn't any doubt around that even more reason to just go with >> Elijah's plan of introducing new plumbing? I.e. is it really costing us >> to just leave these "legacy merge" plumbing built-ins > > I definitely think it's worth guiding users away from the old `git > merge-tree` behavior in documentation (i.e. deprecating it). That may > also lead towards its eventual removal, but I'm not as worried about > that. Yup, promising users that we will remove it and telling them that they should migrate away from it is necessary. Doing anything else is simply irresponsible. I however suspect that Ævar didn't mean by "legacy merge plumbing built-in" the strategy backends. IOW, I had an impression that what is on the chopping block is merge-tree and not merge-recursive. But since you brought up deprecation of recursive, let's spend a few minutes on the topic. > `git merge-recursive` was actively used in various places, including > in `git cherry-pick`. I had used it a few times myself in a script. > I don't see a need to deprecate it currently, which naturally would > push its removal (if anyone is pushing for it) even further away. I suspect that we may be able to just invoke ort when recursive is invoked, and such a wrapping may even be easier than wrapping "git blame" to replace "git annotate" (where a command line option or two needs to change behaviour). I doubt there is -X<strategy-option> that affects recursive that ORT does not understand, so it may be quite simple to deprecate "merge -s recursive". As you say, replacing the internal implementation is a different matter. > * `merge-recursive.c` is still hard-coded in three places in the > code; you can't even set a configuration option to choose merge-ort.c > in those places: builtin/am, builtin/merge-recursive, and > builtin/stash. > > More details on that second point: All three of these use > merge_recursive_generic() and need that usage to be replaced. It's on > my TODO list. Yes, I do recall mentioning that when we were reviewing the series that added ort.
On Wed, Jan 12, 2022 at 10:06 AM Junio C Hamano <gitster@pobox.com> wrote: > > Elijah Newren <newren@gmail.com> writes: > ... > I however suspect that Ævar didn't mean by "legacy merge plumbing > built-in" the strategy backends. IOW, I had an impression that what > is on the chopping block is merge-tree and not merge-recursive. > > But since you brought up deprecation of recursive, let's spend a few > minutes on the topic. Not sure it matters, but for reference, Ævar explicitly brought up merge-recursive.c. The fuller quote was: > >> I.e. is it really costing us > >> to just leave these "legacy merge" plumbing built-ins and > >> merge-recursive.c etc. in place? Because he brought it up, I decided to address it. It was unclear to me whether he meant builtin/merge-recursive.c or the toplevel merge-recursive.c, so I just addressed both. > I suspect that we may be able to just invoke ort when recursive is > invoked, and such a wrapping may even be easier than wrapping "git > blame" to replace "git annotate" (where a command line option or two > needs to change behaviour). Yes, that was the plan I had. > I doubt there is -X<strategy-option> > that affects recursive that ORT does not understand, Technically there are currently two. As documented in merge-strategies.txt, ORT takes the same -X options as recursive, but currently ignores both -Xdiff-algorithm (instead passing HISTOGRAM_DIFF down to ll_merge()), and -Xno-renames (instead passing DIFF_DETECT_RENAME down to diffcore_rename_extended()). I guess there are three if you count -Xpatience separately from -Xdiff-algorithm=patience. > so it may be quite simple to deprecate "merge -s recursive". Yes...but why deprecate? I thought the plan was to (eventually) make requests for either `recursive` or `ort` be handled by running the `ort` backend. Making that kind of switch is much smaller than the one we already made to switch the default backend from `recursive` to `ort`, so I'm not sure I see what we gain by doing such a switch in stages. Maybe I'm missing something?
Elijah Newren <newren@gmail.com> writes: >> so it may be quite simple to deprecate "merge -s recursive". > > Yes...but why deprecate? I thought the plan was to (eventually) make > requests for either `recursive` or `ort` be handled by running the > `ort` backend. Making that kind of switch is much smaller than the > one we already made to switch the default backend from `recursive` to > `ort`, so I'm not sure I see what we gain by doing such a switch in > stages. Maybe I'm missing something? Didn't we "deprecate" but still indefinitely support "annotate"? I have been assuming that recursive will be in that category after ort establishes itself.
On Wed, Jan 12, 2022 at 10:08 PM Junio C Hamano <gitster@pobox.com> wrote: > > Elijah Newren <newren@gmail.com> writes: > > >> so it may be quite simple to deprecate "merge -s recursive". > > > > Yes...but why deprecate? I thought the plan was to (eventually) make > > requests for either `recursive` or `ort` be handled by running the > > `ort` backend. Making that kind of switch is much smaller than the > > one we already made to switch the default backend from `recursive` to > > `ort`, so I'm not sure I see what we gain by doing such a switch in > > stages. Maybe I'm missing something? > > Didn't we "deprecate" but still indefinitely support "annotate"? I > have been assuming that recursive will be in that category after ort > establishes itself. But in that case, we suggested folks use "blame" instead of "annotate", right? From a user point of view, my intention was that "ort" is just the newer version of "recursive" (which happens to be an essentially total rewrite). ort only got a separate name because we wanted a special testing period and a way for users to select the old version of recursive or the new version of recursive. Of course, that period is multiple release cycles, so both names will persist, but they'll just be aliases. Once we switch "recursive" to mean "ort" so they both run the exact same code, I was intending to have the documentation prefer the term "recursive" over "ort" since it is a more meaningful user-facing name.
On Wed, Jan 12 2022, Junio C Hamano wrote: > Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > >> Maybe that was the right thing to do, maybe not, but it went out with >> v2.30.0 and the lack of complaints since then would seem to suggest that >> I was right that removing it wouldn't be a big deal. >> >> Of course it may have broken someone's script somewhere. >> >> But an important distinction is that they can get it working again by >> just copy/pasting that ~100 line shell library into their own script, or >> calling the underlying commands it was invoking themselves. > > Was parse-remote a part of what promised our end-users? [...] It was listed under "LOW-LEVEL COMMANDS (PLUMBING)" => "Syncing repositories", which has git-daemon, git-{upload,receive}-pack, git-shell etc. now. So, pedantically we removed a removed a plumbing utility without much if any warning. But as I argued when it was removed I think that realistically some of our plumbing tools were made for internal-only use, and as the *.sh->C rewriting of built-ins progressed they became orphaned. They only had public-facing manpages due to plans that never came to fruition, or just in copy/pasting an existing template at the time. I don't think that would matter for git-parse-remote if it had subsequently gotted widespread use in the wild (even if it were undocumented), but when I investigated that it really seemed to be used by approximately nobody. But a while after it was removed you pushed back on some further similar changes to git-sh-setup. I asked some follow-up questions in https://lore.kernel.org/git/87tuiwjfvi.fsf@evledraar.gmail.com/ about how we should consider these that you didn't reply to. The greater context being that I was removing git-parse-remote.sh so that I could eventually get rid of the bridge of extending libintl/gettext to *.sh-land. The current state of that on "master" in that regard being: $ git grep '\b(eval_)?gettext(ln)?\b' -- ':!t/' ':!ci/' ':!git-gui' ':!git-sh-i18n.sh' '*.sh' git-merge-octopus.sh: gettextln "Error: Your local changes to the following files would be overwritten by merge" git-merge-octopus.sh: gettextln "Automated merge did not work." git-merge-octopus.sh: gettextln "Should not be doing an octopus." git-merge-octopus.sh: die "$(eval_gettext "Unable to find common commit with \$pretty_name")" git-merge-octopus.sh: eval_gettextln "Already up to date with \$pretty_name" git-merge-octopus.sh: eval_gettextln "Fast-forwarding to: \$pretty_name" git-merge-octopus.sh: eval_gettextln "Trying simple merge with \$pretty_name" git-merge-octopus.sh: gettextln "Simple merge did not work, trying automatic merge." git-sh-setup.sh:# Source git-sh-i18n for gettext support. git-sh-setup.sh: die "$(eval_gettext "usage: \$dashless \$USAGE")" git-sh-setup.sh: LONG_USAGE="$(eval_gettext "usage: \$dashless \$USAGE")" git-sh-setup.sh: LONG_USAGE="$(eval_gettext "usage: \$dashless \$USAGE git-sh-setup.sh: gettextln "Cannot chdir to \$cdup, the toplevel of the working tree" >&2 git-sh-setup.sh: die "$(eval_gettext "fatal: \$program_name cannot be used without a working tree.")" git-sh-setup.sh: die "$(eval_gettext "fatal: \$program_name cannot be used without a working tree.")" git-sh-setup.sh: gettextln "Cannot rewrite branches: You have unstaged changes." >&2 git-sh-setup.sh: eval_gettextln "Cannot \$action: You have unstaged changes." >&2 git-sh-setup.sh: eval_gettextln "Cannot \$action: Your index contains uncommitted changes." >&2 git-sh-setup.sh: gettextln "Additionally, your index contains uncommitted changes." >&2 git-sh-setup.sh: gettextln "You need to run this command from the toplevel of the working tree." >&2 git-sh-setup.sh: gettextln "Unable to determine absolute path of git directory" >&2 git-submodule.sh: die "fatal: $(eval_gettext "Unable to find current revision in submodule path '\$displaypath'")" git-submodule.sh: die "fatal: $(eval_gettext "Unable to fetch in submodule path '\$sm_path'")" git-submodule.sh: die "fatal: $(eval_gettext "Unable to find current \${remote_name}/\${branch} revision in submodule path '\$sm_path'")" git-submodule.sh: die_msg="fatal: $(eval_gettext "Failed to recurse into submodule path '\$displaypath'")" I.e. if we were able to get rid of that we could remove sh-i18n--envsubst.c and git-sh-i18n.sh itself. Some of that is dead code, others have pending *.sh->C rewrites. For the rest we could expose a trivial git-i18n.c helper to emit the <10 messages that remained pending further rewrites, which would be much simpler than extending the generic libintl functionality to *.sh. But since git-sh-i18n.sh latter is publicly documented as plumbing (which I'm responsible for, merely by copy/pasting an existing template) I stalled on that. Since you seemed to suggest in the linked-to thread that removing any such publicly documented shellscripting functions was a no-go, even if we'd previously removed git-parse-remote.sh.
On Wed, Jan 12 2022, Elijah Newren wrote: > On Wed, Jan 12, 2022 at 10:06 AM Junio C Hamano <gitster@pobox.com> wrote: >> >> Elijah Newren <newren@gmail.com> writes: >> > ... >> I however suspect that Ævar didn't mean by "legacy merge plumbing >> built-in" the strategy backends. IOW, I had an impression that what >> is on the chopping block is merge-tree and not merge-recursive. >> >> But since you brought up deprecation of recursive, let's spend a few >> minutes on the topic. > > Not sure it matters, but for reference, Ævar explicitly brought up > merge-recursive.c. The fuller quote was: > >> >> I.e. is it really costing us >> >> to just leave these "legacy merge" plumbing built-ins and >> >> merge-recursive.c etc. in place? > > Because he brought it up, I decided to address it. It was unclear to > me whether he meant builtin/merge-recursive.c or the toplevel > merge-recursive.c, so I just addressed both. FWIW what I meant (but clearly didn't make clear enough) is whether we'd deprecate the git-merge-tree(1) command, not whatever powers it under the hood. I.e. I took the greater discussion here to mean (but may have misunderstood it) that we were talking about the needs for a libgit2-replacing merge plumbing. The existing git-merge-tree command probably gets us 5% towards that, and I can see how being bug-for-bug compatible with it might be inconvenient in some future on-top-of-ort rewrite and extension of it. So we probably SHOULD keep it, but I don't think it's a MUST. I.e. if you/someone wrote some more powerful version of it, and keeping it became hard to support I think it would be OK to transition/deprecate it, as presumably its existing users wouldn't be too inconvenienced, or would be happier with the more powerful plumbing tool.
Hi Elijah, to save every reader some time, I snipped the parts that were addressed elsewhere already. On Tue, 11 Jan 2022, Elijah Newren wrote: > On Tue, Jan 11, 2022 at 2:30 PM Johannes Schindelin > <Johannes.Schindelin@gmx.de> wrote: > > > > On Mon, 10 Jan 2022, Elijah Newren wrote: > > > > > You have a couple problematic assumptions here: > > > > > > * What if the user's resolution required fixing a semantic > > > conflict, meaning they needed to modify a file that had no > > > conflicts? Your code here would ignore those, because the clean > > > case is handled above. > > > > > > * What if the user's resolution required adding a totally new file > > > (either because a rename is handled as a separate add/delete, or > > > they just needed a new file)? The loop above isn't over items in > > > pre_resolved_conflicts, it just assumes that items in > > > pre_resolved_conflicts are also in the plist.items being looped > > > over. > > > > I can see how these assumptions may look ludicrous when coming from > > the command-line. But again, we are talking about the realities of a > > conflict resolution via a web UI. > > > > So I think that it is out of the question to address non-textual > > conflicts in this scenario. And then the assumptions make much more > > sense. > > Waving your hands and saying it came from a web UI doesn't reduce my > worries or concerns at all. I could easily imagine a web UI that > allows users to specify other modifications they want to make, even > limited to textual ones, to include in the merge. What happens when > they modify some file that had otherwise merged cleanly (e.g. a file > that gains a new call to some function, which after the merge needs an > extra parameter passed to it)? Your solution doesn't handle it; it'd > send that user-specified change to /dev/null. > > To be fair, you mentioned below that this is just a proof of concept, > and that certainly reduces worries/concerns. It's totally fine to > have such things. Maybe you intend to keep this patch internal > indefinitely. That's fine too. My commentary is just feedback on if > we want merge-ort/merge-tree extended more in this kind of fashion > (and it might also serve as useful pointers on how to extend your > internal patch if you get requests to extend your web UI a bit to > handle more cases). :-) Fair enough. I think I'll keep this patch internal-only for now, and iterate with real scenarios to figure out what we need/not need. > > > Could you clarify what you mean here by OIDs and modes? For a given > > > path, are you just looking for a (pathname, OID, mode) tuple? (In > > > which case, you'd get the OID of a file that potentially has embedded > > > conflict markers) > > > > > > Or are you looking for multiple OIDs and modes for each file? > > > > This. In case of a conflict, I am looking for (mode,OID) for the merge > > base (which _can_ be a virtual one in case of recursive merges) as well as > > for the two divergent revisions that were supposed to be merged. > > > > I do realize that other types of conflicts can occur, such as a > > rename/rename conflict, and we would need a way to represent these in the > > output, too. But in such an instance, where no clear (mode,OID) triplet > > can be provided that represents the merge conflicts for this file, the > > current web UI cannot offer a way to resolve the conflict, so I am a bit > > less interested in that scenario. > > Okay, this is helping explain a bit better. > > Out of curiosity, does this mean the web UI only can resolve cases > where all three modes & OIDs are present, and the files are text > rather than binary? For example, does this mean the web UI cannot > handle cases like modify/delete or add/add? Right now, the UI is based on the assumption that the underlying merge machinery does not even try to detect renames. Which simplifies the range of supported scenarios somewhat. I have not looked at the underlying code, but https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/addressing-merge-conflicts/resolving-a-merge-conflict-on-github clearly says: You can resolve simple merge conflicts that involve competing line changes on GitHub, using the conflict editor. So yes, the UI does not even try to support resolving modify/delete conflicts at this time. > > But again, you made me think of rename/rename conflicts and friends, and > > we would need a way to represent those, too. Even if my current use case > > would need to only detect their presence in order to say "nope, can't > > resolve that on the web". > > But now you're making me worry whether I can satisfy your requests > again, or at minimum, whether I need to do a lot more work in > merge-ort to answer them. Do you need a representation other than the > list of (stage, mode, oid) triplets? I guess we will have to accept that the (stage, mode, OID) list is the best form, for now. We will have to see what is possible to do on the UI side, and then extend the `merge-ort` side as needed. > I'm a little worried I might come across sounding a little picky since I > tend to dive into details and really fixate on them. I apologize in > advance if it sounds that way at all; you provide lots of good points to > think about and I can't help but dive into the quirks surrounding each. > I really appreciate all the awesome feedback and food for thought. Personally, I did not find your comments picky at all, but rather constructive. I find this conversation thoroughly enjoyable and productive even at times when you prove me wrong. You set a really high bar as far as collaboration style goes. Thank you very much for that! Ciao, Dscho
Hi Elijah, On Tue, 11 Jan 2022, Elijah Newren wrote: > On Mon, Jan 10, 2022 at 9:59 AM Elijah Newren <newren@gmail.com> wrote: > > > > On Mon, Jan 10, 2022 at 5:49 AM Johannes Schindelin > > <Johannes.Schindelin@gmx.de> wrote: > > > > ... > > >, therefore it expects the resolved conflicts to > > > be in _files_. I don't think that there is a way to reasonably handle > > > symlink target conflicts or directory/file/symlink conflicts, but there > > > might be. > > > > You really need (mode,oid) pairs to be provided by the user. That > > fixes the executable issue I mentioned above, and makes it clear how > > to handle symlinks and file/symlink conflicts. > > > > directory/file are still handled by providing individual files, but > > ordering traversal becomes really tricky. The directory/file case > > might even require the pre_resolved_conflicts to be pulled out of that > > loop somehow. It'd take some investigative work, or some deep > > thought, or both. > > I think I came up with a solution to this during my run yesterday, > though I haven't tried or tested it. Instead of modifying the loop > over plist.items, you instead add a preliminary loop over > pre_resolved_conflicts that modifies opt->priv->paths (and add this > preliminary loop just before the items from opt->priv->paths are added > to plist.items). In that preliminary loop, you need to make sure that > (a) any files in pre_resolved_conflicts corresponding to existing > _files_ in opt->priv->path result in updating that item's clean & > is_null & mode & oid state, (b) any files in pre_resolved_conflicts > that correspond to existing _directories_ in opt->priv->path need the > value to be expanded to be a conflict_info instead of just a > merged_info, you need to set the df_conflict bit, and don't update the > merge_info fields but do update the extended conflict_info ones, (c) > any new files in pre_resolved_conflicts result in new entries > opt->priv->paths, (d) any leading directories of new files in > pre_resolved_conflicts are appropriately handled, meaning: (d1) new > opt->priv->paths are created if the directory path wasn't in > opt->priv->paths before, (d2) a tweak to df_conflict for the directory > item if it previously existed in opt->priv->paths but only as a file > (possibly also necessitating expanding from a merged_info to a > conflict_info), (d3) no-op if the directory already existed in > opt->priv->paths and was just a directory (and in this case, you can > stop walking the parent directories to the toplevel). > > Then, after this preliminary loop that modifies opt->priv->paths, the > rest can just proceed as-is. > > That should handle new files, new directories, and all directory/file > conflicts. Yeah, it's a bunch to look at, but directory/file > conflicts are always a bear. Indeed. What's even worse is the question how to represent that in a web UI, and I think I'll wait for that to happen (if ever) to give that part of the design more thought. Ciao, Dscho