mbox series

[0/2] RFC: implement new zdiff3 conflict style

Message ID pull.1036.git.git.1623734171.gitgitgadget@gmail.com (mailing list archive)
Headers show
Series RFC: implement new zdiff3 conflict style | expand

Message

Philippe Blain via GitGitGadget June 15, 2021, 5:16 a.m. UTC
Implement a zealous diff3, or "zdiff3". This new mode is identical to
ordinary diff3 except that it allows compaction of common lines between the
two sides of history, if those common lines occur at the beginning or end of
a conflict hunk.

This is just RFC, because I need to add tests. Also, while I've remerged
every merge, revert, or duly marked cherry-pick from both git.git and
linux.git with this patch using the new zdiff3 mode, that only shows it
doesn't segfault. (Though I also reran 10% of the linux remerges with zdiff3
under valgrind without issues.) I looked through some differences between
--remerge-diff with diff3 and --remerge-diff with zdiff3, but those are
essentially diffs of a diff of a diff, which I found hard to read. I'd like
to look through more examples, and use it for a while before submitting the
patches without the RFC tag.

Elijah Newren (2):
  xdiff: implement a zealous diff3, or "zdiff3"
  update documentation for new zdiff3 conflictStyle

 Documentation/config/merge.txt         |  9 ++++-
 Documentation/git-checkout.txt         |  3 +-
 Documentation/git-merge-file.txt       |  3 ++
 Documentation/git-merge.txt            | 32 +++++++++++++---
 Documentation/git-rebase.txt           |  6 +--
 Documentation/git-restore.txt          |  3 +-
 Documentation/git-switch.txt           |  3 +-
 Documentation/technical/rerere.txt     | 10 ++---
 builtin/checkout.c                     |  2 +-
 builtin/merge-file.c                   |  2 +
 contrib/completion/git-completion.bash |  6 +--
 xdiff-interface.c                      |  2 +
 xdiff/xdiff.h                          |  1 +
 xdiff/xmerge.c                         | 52 ++++++++++++++++++++++++--
 14 files changed, 107 insertions(+), 27 deletions(-)


base-commit: 670b81a890388c60b7032a4f5b879f2ece8c4558
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1036%2Fnewren%2Fzdiff3-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1036/newren/zdiff3-v1
Pull-Request: https://github.com/git/git/pull/1036

Comments

Jeff King June 15, 2021, 9:43 a.m. UTC | #1
On Tue, Jun 15, 2021 at 05:16:08AM +0000, Elijah Newren via GitGitGadget wrote:

> Implement a zealous diff3, or "zdiff3". This new mode is identical to
> ordinary diff3 except that it allows compaction of common lines between the
> two sides of history, if those common lines occur at the beginning or end of
> a conflict hunk.
> 
> This is just RFC, because I need to add tests. Also, while I've remerged
> every merge, revert, or duly marked cherry-pick from both git.git and
> linux.git with this patch using the new zdiff3 mode, that only shows it
> doesn't segfault. (Though I also reran 10% of the linux remerges with zdiff3
> under valgrind without issues.) I looked through some differences between
> --remerge-diff with diff3 and --remerge-diff with zdiff3, but those are
> essentially diffs of a diff of a diff, which I found hard to read. I'd like
> to look through more examples, and use it for a while before submitting the
> patches without the RFC tag.

I did something similar (but I wasn't smart enough to try your
remerge-diff, and just re-ran a bunch of merges).

Skimming over the results, I didn't see anything that looked incorrect.
Many of them are pretty non-exciting, though. A common case seems to be
ones like 01a2a03c56 (Merge branch 'jc/diff-filter-negation',
2013-09-09), where two sides both add functions in the same place, and
the common lines are just the closing "}" followed by a blank line.

Removing those shared lines actually makes things less readable, IMHO,
but I don't think it's the wrong thing to do. The usual "merge" zealous
minimization likewise produces the same unreadability. If we want to
address that, I think the best way would be by teaching the minimization
some heuristics about which lines are trivial.

Here's another interesting one. In 0c52457b7c (Merge branch
'nd/daemon-informative-errors-typofix', 2014-01-10), the diff3 looks
like:

  <<<<<<< ours
                  if (starts_with(arg, "--informative-errors")) {
  ||||||| base
                  if (!prefixcmp(arg, "--informative-errors")) {
  =======
                  if (!strcmp(arg, "--informative-errors")) {
  >>>>>>> theirs
                          informative_errors = 1;
                          continue;
                  }
  <<<<<<< ours
                  if (starts_with(arg, "--no-informative-errors")) {
  ||||||| base
                  if (!prefixcmp(arg, "--no-informative-errors")) {
  =======
                  if (!strcmp(arg, "--no-informative-errors")) {
  >>>>>>> theirs

A little clunky, but it's easy-ish to see what's going on. With zdiff3,
the context between the two hunks is rolled into a single hunk:

  <<<<<<< ours
                  if (starts_with(arg, "--informative-errors")) {
                          informative_errors = 1;
                          continue;
                  }
                  if (starts_with(arg, "--no-informative-errors")) {
  ||||||| base
                  if (!prefixcmp(arg, "--informative-errors")) {
  =======
                  if (!strcmp(arg, "--informative-errors")) {
                          informative_errors = 1;
                          continue;
                  }
                  if (!strcmp(arg, "--no-informative-errors")) {
  >>>>>>> theirs

which seems worse. I haven't dug/thought carefully enough into your
change yet to know if this is expected, or if there's a bug.

-Peff
Elijah Newren June 15, 2021, 7:35 p.m. UTC | #2
On Tue, Jun 15, 2021 at 2:43 AM Jeff King <peff@peff.net> wrote:
>
> On Tue, Jun 15, 2021 at 05:16:08AM +0000, Elijah Newren via GitGitGadget wrote:
>
> > Implement a zealous diff3, or "zdiff3". This new mode is identical to
> > ordinary diff3 except that it allows compaction of common lines between the
> > two sides of history, if those common lines occur at the beginning or end of
> > a conflict hunk.
> >
> > This is just RFC, because I need to add tests. Also, while I've remerged
> > every merge, revert, or duly marked cherry-pick from both git.git and
> > linux.git with this patch using the new zdiff3 mode, that only shows it
> > doesn't segfault. (Though I also reran 10% of the linux remerges with zdiff3
> > under valgrind without issues.) I looked through some differences between
> > --remerge-diff with diff3 and --remerge-diff with zdiff3, but those are
> > essentially diffs of a diff of a diff, which I found hard to read. I'd like
> > to look through more examples, and use it for a while before submitting the
> > patches without the RFC tag.
>
> I did something similar (but I wasn't smart enough to try your
> remerge-diff, and just re-ran a bunch of merges).

What I did was great for testing if there were funny merges that might
cause segfaults or such with zdiff3, but not so clever for viewing the
direct output from zdiff3.  Using remerge-diff in this way does not
show the [z]diff3 output directly, but a diff of that output against
what was ultimately recorded in the merge, forcing me to attempt to
recreate the original in my head.

(And, of course, I made it even worse by taking the remerge-diff
output with diff3, and the remerge-diff output with zdiff3, and then
diffing those, resulting in yet another layer of diffs that I'd have
to undo in my head to attempt to construct the original.)

> Skimming over the results, I didn't see anything that looked incorrect.
> Many of them are pretty non-exciting, though. A common case seems to be
> ones like 01a2a03c56 (Merge branch 'jc/diff-filter-negation',
> 2013-09-09), where two sides both add functions in the same place, and
> the common lines are just the closing "}" followed by a blank line.
>
> Removing those shared lines actually makes things less readable, IMHO,
> but I don't think it's the wrong thing to do. The usual "merge" zealous
> minimization likewise produces the same unreadability. If we want to
> address that, I think the best way would be by teaching the minimization
> some heuristics about which lines are trivial.
>
> Here's another interesting one. In 0c52457b7c (Merge branch
> 'nd/daemon-informative-errors-typofix', 2014-01-10), the diff3 looks
> like:
>
>   <<<<<<< ours
>                   if (starts_with(arg, "--informative-errors")) {
>   ||||||| base
>                   if (!prefixcmp(arg, "--informative-errors")) {
>   =======
>                   if (!strcmp(arg, "--informative-errors")) {
>   >>>>>>> theirs
>                           informative_errors = 1;
>                           continue;
>                   }
>   <<<<<<< ours
>                   if (starts_with(arg, "--no-informative-errors")) {
>   ||||||| base
>                   if (!prefixcmp(arg, "--no-informative-errors")) {
>   =======
>                   if (!strcmp(arg, "--no-informative-errors")) {
>   >>>>>>> theirs
>
> A little clunky, but it's easy-ish to see what's going on. With zdiff3,
> the context between the two hunks is rolled into a single hunk:
>
>   <<<<<<< ours
>                   if (starts_with(arg, "--informative-errors")) {
>                           informative_errors = 1;
>                           continue;
>                   }
>                   if (starts_with(arg, "--no-informative-errors")) {
>   ||||||| base
>                   if (!prefixcmp(arg, "--informative-errors")) {
>   =======
>                   if (!strcmp(arg, "--informative-errors")) {
>                           informative_errors = 1;
>                           continue;
>                   }
>                   if (!strcmp(arg, "--no-informative-errors")) {
>   >>>>>>> theirs
>
> which seems worse. I haven't dug/thought carefully enough into your
> change yet to know if this is expected, or if there's a bug.

Yeah, I don't know for sure either but that does look buggy to me.
Thanks for digging up these examples.  I'm a bit overdrawn on time for
this, so I'll pick it back up in a week or two and investigate this
case further.
Johannes Sixt June 15, 2021, 9:36 p.m. UTC | #3
Am 15.06.21 um 07:16 schrieb Elijah Newren via GitGitGadget:
> Implement a zealous diff3, or "zdiff3". This new mode is identical to
> ordinary diff3 except that it allows compaction of common lines between the
> two sides of history, if those common lines occur at the beginning or end of
> a conflict hunk.

As a data point, I tried this series (cf9d93e547 en/zdiff3) on my
criss-cross merge test case that started this adventure, and it produces
the very same output as diff3; cf.
https://lore.kernel.org/git/60883e1b-787f-5ec2-a9af-f2f6757d3c43@kdbg.org/

-- Hannes
Elijah Newren June 15, 2021, 9:45 p.m. UTC | #4
On Tue, Jun 15, 2021 at 2:36 PM Johannes Sixt <j6t@kdbg.org> wrote:
>
> Am 15.06.21 um 07:16 schrieb Elijah Newren via GitGitGadget:
> > Implement a zealous diff3, or "zdiff3". This new mode is identical to
> > ordinary diff3 except that it allows compaction of common lines between the
> > two sides of history, if those common lines occur at the beginning or end of
> > a conflict hunk.
>
> As a data point, I tried this series (cf9d93e547 en/zdiff3) on my
> criss-cross merge test case that started this adventure, and it produces
> the very same output as diff3; cf.
> https://lore.kernel.org/git/60883e1b-787f-5ec2-a9af-f2f6757d3c43@kdbg.org/

That's good to hear; your two sides had no common text at the
beginning or end of the conflict hunk, so I wouldn't expect zdiff3 to
change that particular example.

The XDL_MERGE_FAVOR_BASE idea (cf.
https://lore.kernel.org/git/20210611190235.1970106-1-newren@gmail.com/),
though would I think simplify the diff3 conflict markers in your
example to

<<<<<<< HEAD
    CClustering ComputeSSLClusters(double threshPercent, const
CDataInfo* scale) const override;
    void ComputeDist(DistFunc distFunc, CDoubleArray& dist,
        double& minDist, double& maxDist) const;
    double EstimateNodeDist2() const override;
    std::vector<double> EstimateNeighborMinDist() const override;
||||||| merged common ancestors
    CClustering ComputeClusters(const double* dist, double threshold,
        const CDataInfo* scale) const override;
    virtual void ComputeDist(DistFunc distFunc, CDoubleArray& dist,
        double& minDist, double& maxDist);
    virtual void ComputeUMatrix();
    virtual void ComputeKNearest(int K, const double*,
        Neighborhood& result) const;
=======
    CClustering ComputeSSLClusters(double threshPercent,
        const CDoubleArray& compWeights, const CDataInfo* scale) const override;
    static void ComputeDist(const CNetNodeHolder& vecs, CDoubleArray& dist,
        double& minDist, double& maxDist);
>>>>>>> no-compweights-in-cnet

That seems like it might be nicer, but I don't do many criss-cross
merges myself so it'd be nice to get opinions of others like you who
do.
Johannes Sixt June 16, 2021, 6:16 a.m. UTC | #5
Am 15.06.21 um 23:45 schrieb Elijah Newren:
> On Tue, Jun 15, 2021 at 2:36 PM Johannes Sixt <j6t@kdbg.org> wrote:
>>
>> Am 15.06.21 um 07:16 schrieb Elijah Newren via GitGitGadget:
>>> Implement a zealous diff3, or "zdiff3". This new mode is identical to
>>> ordinary diff3 except that it allows compaction of common lines between the
>>> two sides of history, if those common lines occur at the beginning or end of
>>> a conflict hunk.
>>
>> As a data point, I tried this series (cf9d93e547 en/zdiff3) on my
>> criss-cross merge test case that started this adventure, and it produces
>> the very same output as diff3; cf.
>> https://lore.kernel.org/git/60883e1b-787f-5ec2-a9af-f2f6757d3c43@kdbg.org/
> 
> That's good to hear; your two sides had no common text at the
> beginning or end of the conflict hunk, so I wouldn't expect zdiff3 to
> change that particular example.
> 
> The XDL_MERGE_FAVOR_BASE idea (cf.
> https://lore.kernel.org/git/20210611190235.1970106-1-newren@gmail.com/),
> though would I think simplify the diff3 conflict markers in your
> example to
> 
> <<<<<<< HEAD
>     CClustering ComputeSSLClusters(double threshPercent, const
> CDataInfo* scale) const override;
>     void ComputeDist(DistFunc distFunc, CDoubleArray& dist,
>         double& minDist, double& maxDist) const;
>     double EstimateNodeDist2() const override;
>     std::vector<double> EstimateNeighborMinDist() const override;
> ||||||| merged common ancestors
>     CClustering ComputeClusters(const double* dist, double threshold,
>         const CDataInfo* scale) const override;
>     virtual void ComputeDist(DistFunc distFunc, CDoubleArray& dist,
>         double& minDist, double& maxDist);
>     virtual void ComputeUMatrix();
>     virtual void ComputeKNearest(int K, const double*,
>         Neighborhood& result) const;
> =======
>     CClustering ComputeSSLClusters(double threshPercent,
>         const CDoubleArray& compWeights, const CDataInfo* scale) const override;
>     static void ComputeDist(const CNetNodeHolder& vecs, CDoubleArray& dist,
>         double& minDist, double& maxDist);
>>>>>>>> no-compweights-in-cnet
> 
> That seems like it might be nicer, but I don't do many criss-cross
> merges myself so it'd be nice to get opinions of others like you who
> do.

That *is* nicer as it is just the regular conflict with some base
context. Does that mean that the regular recursive merge is a bit sloppy
because it throws away too many conflicts that occur in virtual
ancestors? Even if that is the case, I couldn't tell whether that is a
disadvantage or not, as I've actually never seen nested conflicts in the
past; the diff3 test was the first time I saw one.

With the referenced patch applied (after a small tweak around needs_cr
to make it compile), my testcase does indeed produce the above conflict
under zdiff3 mode. The diff3 mode, OTOH, produces an exceedingly large
non-nested and obviously incorrect conflict (I'm not going to post it
here): our and their side are large and share a lot of text, but the
base part is identical to the above and is far too small.

-- Hannes
Elijah Newren June 16, 2021, 8:14 a.m. UTC | #6
On Tue, Jun 15, 2021 at 11:16 PM Johannes Sixt <j6t@kdbg.org> wrote:
>
> Am 15.06.21 um 23:45 schrieb Elijah Newren:
> > On Tue, Jun 15, 2021 at 2:36 PM Johannes Sixt <j6t@kdbg.org> wrote:
> >>
> >> Am 15.06.21 um 07:16 schrieb Elijah Newren via GitGitGadget:
> >>> Implement a zealous diff3, or "zdiff3". This new mode is identical to
> >>> ordinary diff3 except that it allows compaction of common lines between the
> >>> two sides of history, if those common lines occur at the beginning or end of
> >>> a conflict hunk.
> >>
> >> As a data point, I tried this series (cf9d93e547 en/zdiff3) on my
> >> criss-cross merge test case that started this adventure, and it produces
> >> the very same output as diff3; cf.
> >> https://lore.kernel.org/git/60883e1b-787f-5ec2-a9af-f2f6757d3c43@kdbg.org/
> >
> > That's good to hear; your two sides had no common text at the
> > beginning or end of the conflict hunk, so I wouldn't expect zdiff3 to
> > change that particular example.
> >
> > The XDL_MERGE_FAVOR_BASE idea (cf.
> > https://lore.kernel.org/git/20210611190235.1970106-1-newren@gmail.com/),
> > though would I think simplify the diff3 conflict markers in your
> > example to
> >
> > <<<<<<< HEAD
> >     CClustering ComputeSSLClusters(double threshPercent, const
> > CDataInfo* scale) const override;
> >     void ComputeDist(DistFunc distFunc, CDoubleArray& dist,
> >         double& minDist, double& maxDist) const;
> >     double EstimateNodeDist2() const override;
> >     std::vector<double> EstimateNeighborMinDist() const override;
> > ||||||| merged common ancestors
> >     CClustering ComputeClusters(const double* dist, double threshold,
> >         const CDataInfo* scale) const override;
> >     virtual void ComputeDist(DistFunc distFunc, CDoubleArray& dist,
> >         double& minDist, double& maxDist);
> >     virtual void ComputeUMatrix();
> >     virtual void ComputeKNearest(int K, const double*,
> >         Neighborhood& result) const;
> > =======
> >     CClustering ComputeSSLClusters(double threshPercent,
> >         const CDoubleArray& compWeights, const CDataInfo* scale) const override;
> >     static void ComputeDist(const CNetNodeHolder& vecs, CDoubleArray& dist,
> >         double& minDist, double& maxDist);
> >>>>>>>> no-compweights-in-cnet
> >
> > That seems like it might be nicer, but I don't do many criss-cross
> > merges myself so it'd be nice to get opinions of others like you who
> > do.
>
> That *is* nicer as it is just the regular conflict with some base
> context. Does that mean that the regular recursive merge is a bit sloppy
> because it throws away too many conflicts that occur in virtual
> ancestors? Even if that is the case, I couldn't tell whether that is a
> disadvantage or not, as I've actually never seen nested conflicts in the
> past; the diff3 test was the first time I saw one.
>
> With the referenced patch applied (after a small tweak around needs_cr
> to make it compile), my testcase does indeed produce the above conflict
> under zdiff3 mode. The diff3 mode, OTOH, produces an exceedingly large
> non-nested and obviously incorrect conflict (I'm not going to post it
> here): our and their side are large and share a lot of text, but the
> base part is identical to the above and is far too small.

The thing about XDL_MERGE_FAVOR_BASE in combination with a recursive
merge, is that if you have two merge-bases, then XDL_MERGE_FAVOR_BASE
will put the text from the merge base of the merge-bases in the
"original text" region.  If your merge bases themselves had multiple
merge-bases, and those merge bases in turn had multiple merge-bases,
etc., so that you end up with a deeply nested recursive merge, then
XDL_MERGE_FAVOR_BASE will cause you to get the "original text" filled
by the text from some ancient ancestor.  The net result is that it
makes it look like you have done a three-way merge between the two
sides you expect with a really ancient merge base.  So, a really small
original text is probably something to be expected from the
XDL_MERGE_FAVOR_BASE patch.

The fact that zdiff3 shrinks it down considerably reflects the fact
that your two sides had long sections of code that matched at the
beginning and end of the conflict hunk.  So it sounds to me like both
pieces were behaving implementationally as designed for your testcase.

Whether using "ancient original text" makes sense in general rather
than using a "more recent original text" (in the form of an attempted
merge of the merge-bases complete with conflict markers), feels like
it's one of those tradeoff things that I figured those who do lots of
criss-cross merges should weigh in on rather than me.
Phillip Wood June 16, 2021, 8:57 a.m. UTC | #7
On 15/06/2021 20:35, Elijah Newren wrote:
> On Tue, Jun 15, 2021 at 2:43 AM Jeff King <peff@peff.net> wrote:
>>
>> On Tue, Jun 15, 2021 at 05:16:08AM +0000, Elijah Newren via GitGitGadget wrote:
>>
>>> Implement a zealous diff3, or "zdiff3". This new mode is identical to
>>> ordinary diff3 except that it allows compaction of common lines between the
>>> two sides of history, if those common lines occur at the beginning or end of
>>> a conflict hunk.
>>>
>>> This is just RFC, because I need to add tests. Also, while I've remerged
>>> every merge, revert, or duly marked cherry-pick from both git.git and
>>> linux.git with this patch using the new zdiff3 mode, that only shows it
>>> doesn't segfault. (Though I also reran 10% of the linux remerges with zdiff3
>>> under valgrind without issues.) I looked through some differences between
>>> --remerge-diff with diff3 and --remerge-diff with zdiff3, but those are
>>> essentially diffs of a diff of a diff, which I found hard to read. I'd like
>>> to look through more examples, and use it for a while before submitting the
>>> patches without the RFC tag.
>>
>> I did something similar (but I wasn't smart enough to try your
>> remerge-diff, and just re-ran a bunch of merges).
> 
> What I did was great for testing if there were funny merges that might
> cause segfaults or such with zdiff3, but not so clever for viewing the
> direct output from zdiff3.  Using remerge-diff in this way does not
> show the [z]diff3 output directly, but a diff of that output against
> what was ultimately recorded in the merge, forcing me to attempt to
> recreate the original in my head.
> 
> (And, of course, I made it even worse by taking the remerge-diff
> output with diff3, and the remerge-diff output with zdiff3, and then
> diffing those, resulting in yet another layer of diffs that I'd have
> to undo in my head to attempt to construct the original.)
> 
>> Skimming over the results, I didn't see anything that looked incorrect.
>> Many of them are pretty non-exciting, though. A common case seems to be
>> ones like 01a2a03c56 (Merge branch 'jc/diff-filter-negation',
>> 2013-09-09), where two sides both add functions in the same place, and
>> the common lines are just the closing "}" followed by a blank line.
>>
>> Removing those shared lines actually makes things less readable, IMHO,
>> but I don't think it's the wrong thing to do. The usual "merge" zealous
>> minimization likewise produces the same unreadability. If we want to
>> address that, I think the best way would be by teaching the minimization
>> some heuristics about which lines are trivial.
>>
>> Here's another interesting one. In 0c52457b7c (Merge branch
>> 'nd/daemon-informative-errors-typofix', 2014-01-10), the diff3 looks
>> like:
>>
>>    <<<<<<< ours
>>                    if (starts_with(arg, "--informative-errors")) {
>>    ||||||| base
>>                    if (!prefixcmp(arg, "--informative-errors")) {
>>    =======
>>                    if (!strcmp(arg, "--informative-errors")) {
>>    >>>>>>> theirs
>>                            informative_errors = 1;
>>                            continue;
>>                    }
>>    <<<<<<< ours
>>                    if (starts_with(arg, "--no-informative-errors")) {
>>    ||||||| base
>>                    if (!prefixcmp(arg, "--no-informative-errors")) {
>>    =======
>>                    if (!strcmp(arg, "--no-informative-errors")) {
>>    >>>>>>> theirs
>>
>> A little clunky, but it's easy-ish to see what's going on. With zdiff3,
>> the context between the two hunks is rolled into a single hunk:
>>
>>    <<<<<<< ours
>>                    if (starts_with(arg, "--informative-errors")) {
>>                            informative_errors = 1;
>>                            continue;
>>                    }
>>                    if (starts_with(arg, "--no-informative-errors")) {
>>    ||||||| base
>>                    if (!prefixcmp(arg, "--informative-errors")) {
>>    =======
>>                    if (!strcmp(arg, "--informative-errors")) {
>>                            informative_errors = 1;
>>                            continue;
>>                    }
>>                    if (!strcmp(arg, "--no-informative-errors")) {
>>    >>>>>>> theirs
>>
>> which seems worse. I haven't dug/thought carefully enough into your
>> change yet to know if this is expected, or if there's a bug.

XDL_MERGE_ZEALOUS coalesces adjacent conflicts that are separated by 
fewer than four lines. Unfortunately the existing code in 
xdl_merge_two_conflicts() only coalesces 'ours' and 'theirs', not 
'base'. Applying

diff --git a/xdiff/xmerge.c b/xdiff/xmerge.c
index b1dc9df7ea..5f76957169 100644
--- a/xdiff/xmerge.c
+++ b/xdiff/xmerge.c
@@ -455,6 +455,7 @@ static int lines_contain_alnum(xdfenv_t *xe, int i, 
int chg)
  static void xdl_merge_two_conflicts(xdmerge_t *m)
  {
         xdmerge_t *next_m = m->next;
+       m->chg0 = next_m->i0 + next_m->chg0 - m->i0;
         m->chg1 = next_m->i1 + next_m->chg1 - m->i1;
         m->chg2 = next_m->i2 + next_m->chg2 - m->i2;
         m->next = next_m->next;

and running
     git checkout 0c52457b7c^ &&
     bin-wrappers/git -c merge.conflictstyle=zdiff3 merge 0c52457b7c^2
gives

<<<<<<< HEAD
		if (starts_with(arg, "--informative-errors")) {
			informative_errors = 1;
			continue;
		}
		if (starts_with(arg, "--no-informative-errors")) {
||||||| 2f93541d88
		if (!prefixcmp(arg, "--informative-errors")) {
			informative_errors = 1;
			continue;
		}
		if (!prefixcmp(arg, "--no-informative-errors")) {
=======
		if (!strcmp(arg, "--informative-errors")) {
			informative_errors = 1;
			continue;
		}
		if (!strcmp(arg, "--no-informative-errors")) {
 >>>>>>> 0c52457b7c^2

Which I think is correct. Whether combining single line conflicts in 
this way is useful is a different question (and is independent of your 
patch). I can see that with larger conflicts it is worth it but here we 
end up with conflicts where 60% of the lines are from the base version. 
One the other hand there are fewer conflicts to resolve - I'm not sure 
which I prefer.

Best Wishes

Phillip

> Yeah, I don't know for sure either but that does look buggy to me.
> Thanks for digging up these examples.  I'm a bit overdrawn on time for
> this, so I'll pick it back up in a week or two and investigate this
> case further.
>
Jeff King June 16, 2021, 10:31 a.m. UTC | #8
On Wed, Jun 16, 2021 at 09:57:49AM +0100, Phillip Wood wrote:

> > > which seems worse. I haven't dug/thought carefully enough into your
> > > change yet to know if this is expected, or if there's a bug.
> 
> XDL_MERGE_ZEALOUS coalesces adjacent conflicts that are separated by fewer
> than four lines. Unfortunately the existing code in
> xdl_merge_two_conflicts() only coalesces 'ours' and 'theirs', not 'base'.
> Applying
> [...]
> gives
> 
> <<<<<<< HEAD
> 		if (starts_with(arg, "--informative-errors")) {
> 			informative_errors = 1;
> 			continue;
> 		}
> 		if (starts_with(arg, "--no-informative-errors")) {
> ||||||| 2f93541d88
> 		if (!prefixcmp(arg, "--informative-errors")) {
> 			informative_errors = 1;
> 			continue;
> 		}
> 		if (!prefixcmp(arg, "--no-informative-errors")) {
> =======
> 		if (!strcmp(arg, "--informative-errors")) {
> 			informative_errors = 1;
> 			continue;
> 		}
> 		if (!strcmp(arg, "--no-informative-errors")) {
> >>>>>>> 0c52457b7c^2
> 
> Which I think is correct. Whether combining single line conflicts in this
> way is useful is a different question (and is independent of your patch). I
> can see that with larger conflicts it is worth it but here we end up with
> conflicts where 60% of the lines are from the base version. One the other
> hand there are fewer conflicts to resolve - I'm not sure which I prefer.

Thanks for figuring that out. I agree that the output after the patch
you showed is correct, in the sense that the common lines show up in the
base now. It does feel like it's working against the point of zdiff3,
though, which is to reduce the number of common lines shown in the
"ours" and "theirs" hunks.

Likewise, I think this coalescing makes things worse even for "merge",
where you get:

  <<<<<<< ours
                  if (starts_with(arg, "--informative-errors")) {
                          informative_errors = 1;
                          continue;
                  }
                  if (starts_with(arg, "--no-informative-errors")) {
  =======
                  if (!strcmp(arg, "--informative-errors")) {
                          informative_errors = 1;
                          continue;
                  }
                  if (!strcmp(arg, "--no-informative-errors")) {
  >>>>>>> theirs

and have to figure out manually that those interior lines are common.
But I imagine there are cases where you have a large number of
uninteresting lines (blank lines, "}", etc) that create a lot of noise
in the output by breaking up the actual changed lines into tiny hunks
that are hard to digest on their own.

-Peff
Elijah Newren June 17, 2021, 5:03 a.m. UTC | #9
On Wed, Jun 16, 2021 at 1:57 AM Phillip Wood <phillip.wood123@gmail.com> wrote:
>
> On 15/06/2021 20:35, Elijah Newren wrote:
> > On Tue, Jun 15, 2021 at 2:43 AM Jeff King <peff@peff.net> wrote:
> >>
> >> On Tue, Jun 15, 2021 at 05:16:08AM +0000, Elijah Newren via GitGitGadget wrote:
> >>
> >>> Implement a zealous diff3, or "zdiff3". This new mode is identical to
> >>> ordinary diff3 except that it allows compaction of common lines between the
> >>> two sides of history, if those common lines occur at the beginning or end of
> >>> a conflict hunk.
> >>>
> >>> This is just RFC, because I need to add tests. Also, while I've remerged
> >>> every merge, revert, or duly marked cherry-pick from both git.git and
> >>> linux.git with this patch using the new zdiff3 mode, that only shows it
> >>> doesn't segfault. (Though I also reran 10% of the linux remerges with zdiff3
> >>> under valgrind without issues.) I looked through some differences between
> >>> --remerge-diff with diff3 and --remerge-diff with zdiff3, but those are
> >>> essentially diffs of a diff of a diff, which I found hard to read. I'd like
> >>> to look through more examples, and use it for a while before submitting the
> >>> patches without the RFC tag.
> >>
> >> I did something similar (but I wasn't smart enough to try your
> >> remerge-diff, and just re-ran a bunch of merges).
> >
> > What I did was great for testing if there were funny merges that might
> > cause segfaults or such with zdiff3, but not so clever for viewing the
> > direct output from zdiff3.  Using remerge-diff in this way does not
> > show the [z]diff3 output directly, but a diff of that output against
> > what was ultimately recorded in the merge, forcing me to attempt to
> > recreate the original in my head.
> >
> > (And, of course, I made it even worse by taking the remerge-diff
> > output with diff3, and the remerge-diff output with zdiff3, and then
> > diffing those, resulting in yet another layer of diffs that I'd have
> > to undo in my head to attempt to construct the original.)
> >
> >> Skimming over the results, I didn't see anything that looked incorrect.
> >> Many of them are pretty non-exciting, though. A common case seems to be
> >> ones like 01a2a03c56 (Merge branch 'jc/diff-filter-negation',
> >> 2013-09-09), where two sides both add functions in the same place, and
> >> the common lines are just the closing "}" followed by a blank line.
> >>
> >> Removing those shared lines actually makes things less readable, IMHO,
> >> but I don't think it's the wrong thing to do. The usual "merge" zealous
> >> minimization likewise produces the same unreadability. If we want to
> >> address that, I think the best way would be by teaching the minimization
> >> some heuristics about which lines are trivial.
> >>
> >> Here's another interesting one. In 0c52457b7c (Merge branch
> >> 'nd/daemon-informative-errors-typofix', 2014-01-10), the diff3 looks
> >> like:
> >>
> >>    <<<<<<< ours
> >>                    if (starts_with(arg, "--informative-errors")) {
> >>    ||||||| base
> >>                    if (!prefixcmp(arg, "--informative-errors")) {
> >>    =======
> >>                    if (!strcmp(arg, "--informative-errors")) {
> >>    >>>>>>> theirs
> >>                            informative_errors = 1;
> >>                            continue;
> >>                    }
> >>    <<<<<<< ours
> >>                    if (starts_with(arg, "--no-informative-errors")) {
> >>    ||||||| base
> >>                    if (!prefixcmp(arg, "--no-informative-errors")) {
> >>    =======
> >>                    if (!strcmp(arg, "--no-informative-errors")) {
> >>    >>>>>>> theirs
> >>
> >> A little clunky, but it's easy-ish to see what's going on. With zdiff3,
> >> the context between the two hunks is rolled into a single hunk:
> >>
> >>    <<<<<<< ours
> >>                    if (starts_with(arg, "--informative-errors")) {
> >>                            informative_errors = 1;
> >>                            continue;
> >>                    }
> >>                    if (starts_with(arg, "--no-informative-errors")) {
> >>    ||||||| base
> >>                    if (!prefixcmp(arg, "--informative-errors")) {
> >>    =======
> >>                    if (!strcmp(arg, "--informative-errors")) {
> >>                            informative_errors = 1;
> >>                            continue;
> >>                    }
> >>                    if (!strcmp(arg, "--no-informative-errors")) {
> >>    >>>>>>> theirs
> >>
> >> which seems worse. I haven't dug/thought carefully enough into your
> >> change yet to know if this is expected, or if there's a bug.
>
> XDL_MERGE_ZEALOUS coalesces adjacent conflicts that are separated by
> fewer than four lines. Unfortunately the existing code in
> xdl_merge_two_conflicts() only coalesces 'ours' and 'theirs', not
> 'base'. Applying
>
> diff --git a/xdiff/xmerge.c b/xdiff/xmerge.c
> index b1dc9df7ea..5f76957169 100644
> --- a/xdiff/xmerge.c
> +++ b/xdiff/xmerge.c
> @@ -455,6 +455,7 @@ static int lines_contain_alnum(xdfenv_t *xe, int i,
> int chg)
>   static void xdl_merge_two_conflicts(xdmerge_t *m)
>   {
>          xdmerge_t *next_m = m->next;
> +       m->chg0 = next_m->i0 + next_m->chg0 - m->i0;
>          m->chg1 = next_m->i1 + next_m->chg1 - m->i1;
>          m->chg2 = next_m->i2 + next_m->chg2 - m->i2;
>          m->next = next_m->next;
>
> and running
>      git checkout 0c52457b7c^ &&
>      bin-wrappers/git -c merge.conflictstyle=zdiff3 merge 0c52457b7c^2
> gives
>
> <<<<<<< HEAD
>                 if (starts_with(arg, "--informative-errors")) {
>                         informative_errors = 1;
>                         continue;
>                 }
>                 if (starts_with(arg, "--no-informative-errors")) {
> ||||||| 2f93541d88
>                 if (!prefixcmp(arg, "--informative-errors")) {
>                         informative_errors = 1;
>                         continue;
>                 }
>                 if (!prefixcmp(arg, "--no-informative-errors")) {
> =======
>                 if (!strcmp(arg, "--informative-errors")) {
>                         informative_errors = 1;
>                         continue;
>                 }
>                 if (!strcmp(arg, "--no-informative-errors")) {
>  >>>>>>> 0c52457b7c^2
>
> Which I think is correct. Whether combining single line conflicts in
> this way is useful is a different question (and is independent of your
> patch). I can see that with larger conflicts it is worth it but here we
> end up with conflicts where 60% of the lines are from the base version.
> One the other hand there are fewer conflicts to resolve - I'm not sure
> which I prefer.

Oh, sweet, thanks for tracking this down!  I'll try to find some time
to play with it some more.
Phillip Wood June 23, 2021, 9:53 a.m. UTC | #10
On 16/06/2021 11:31, Jeff King wrote:
> On Wed, Jun 16, 2021 at 09:57:49AM +0100, Phillip Wood wrote:
> 
>>>> which seems worse. I haven't dug/thought carefully enough into your
>>>> change yet to know if this is expected, or if there's a bug.
>>
>> XDL_MERGE_ZEALOUS coalesces adjacent conflicts that are separated by fewer
>> than four lines. Unfortunately the existing code in
>> xdl_merge_two_conflicts() only coalesces 'ours' and 'theirs', not 'base'.
>> Applying
>> [...]
>> gives
>>
>> <<<<<<< HEAD
>> 		if (starts_with(arg, "--informative-errors")) {
>> 			informative_errors = 1;
>> 			continue;
>> 		}
>> 		if (starts_with(arg, "--no-informative-errors")) {
>> ||||||| 2f93541d88
>> 		if (!prefixcmp(arg, "--informative-errors")) {
>> 			informative_errors = 1;
>> 			continue;
>> 		}
>> 		if (!prefixcmp(arg, "--no-informative-errors")) {
>> =======
>> 		if (!strcmp(arg, "--informative-errors")) {
>> 			informative_errors = 1;
>> 			continue;
>> 		}
>> 		if (!strcmp(arg, "--no-informative-errors")) {
>>>>>>>>> 0c52457b7c^2
>>
>> Which I think is correct. Whether combining single line conflicts in this
>> way is useful is a different question (and is independent of your patch). I
>> can see that with larger conflicts it is worth it but here we end up with
>> conflicts where 60% of the lines are from the base version. One the other
>> hand there are fewer conflicts to resolve - I'm not sure which I prefer.
> 
> Thanks for figuring that out. I agree that the output after the patch
> you showed is correct, in the sense that the common lines show up in the
> base now. It does feel like it's working against the point of zdiff3,
> though, which is to reduce the number of common lines shown in the
> "ours" and "theirs" hunks.

I agree - the output is longer rather than shorter. As we only want to 
trim the common prefix and suffix from the conflicts I wonder if it 
would be better to special case zdiff3 rather than piggy backing on the 
existing XDL_MERGE_ZEALOUS implementation. We can trim the common lines 
by looping over the begging and end of the hunk comparing the lines 
with xdl_recmatch() without going to the trouble of diffing them as 
XDL_MERGE_ZEALOUS does. I don't think we need to worry about coalescing 
adjacent conflicts for zdiff3. It makes sense to coalesce in the 
XDL_MERGE_ZEALOUS case as it can potentially split a  N line conflict 
hunk into N/2 single line conflict hunks but zdiff3 does not split 
conflict hunks.

> Likewise, I think this coalescing makes things worse even for "merge",
> where you get:
> 
>    <<<<<<< ours
>                    if (starts_with(arg, "--informative-errors")) {
>                            informative_errors = 1;
>                            continue;
>                    }
>                    if (starts_with(arg, "--no-informative-errors")) {
>    =======
>                    if (!strcmp(arg, "--informative-errors")) {
>                            informative_errors = 1;
>                            continue;
>                    }
>                    if (!strcmp(arg, "--no-informative-errors")) {
>    >>>>>>> theirs
> 
> and have to figure out manually that those interior lines are common.
> But I imagine there are cases where you have a large number of
> uninteresting lines (blank lines, "}", etc that create a lot of noise > in the output by breaking up the actual changed lines into tiny hunks
> that are hard to digest on their own.

Yes, I think the heuristic for coalescing conflict hunks could be 
improved. It would be fairly simple to only join two hunks if the 
conflicts are longer that the context between them and the existing 
XDL_MERGE_ZEALOUS_ALNUM logic allows conflicts with more context between 
them to be coalesced if the context lines are uninteresting. I think 
XDL_MERGE_ZEALOUS_ALNUM is only used by `git merge-file` at the moment, 
with everything else going through ll_merge() which uses XDL_MERGE_ZEALOUS

Best Wishes

Phillip

> 
> -Peff
>
Jeff King June 23, 2021, 10:28 p.m. UTC | #11
On Wed, Jun 23, 2021 at 10:53:25AM +0100, Phillip Wood wrote:

> > Thanks for figuring that out. I agree that the output after the patch
> > you showed is correct, in the sense that the common lines show up in the
> > base now. It does feel like it's working against the point of zdiff3,
> > though, which is to reduce the number of common lines shown in the
> > "ours" and "theirs" hunks.
> 
> I agree - the output is longer rather than shorter. As we only want to trim
> the common prefix and suffix from the conflicts I wonder if it would be
> better to special case zdiff3 rather than piggy backing on the existing
> XDL_MERGE_ZEALOUS implementation. We can trim the common lines by looping
> over the begging and end of the hunk comparing the lines with xdl_recmatch()
> without going to the trouble of diffing them as XDL_MERGE_ZEALOUS does. I
> don't think we need to worry about coalescing adjacent conflicts for zdiff3.
> It makes sense to coalesce in the XDL_MERGE_ZEALOUS case as it can
> potentially split a  N line conflict hunk into N/2 single line conflict
> hunks but zdiff3 does not split conflict hunks.

That matches my intuition of a reasonable path forward (but I confess to
not being too well-versed in the details of the XDL_MERGE internals).

> Yes, I think the heuristic for coalescing conflict hunks could be improved.
> It would be fairly simple to only join two hunks if the conflicts are longer
> that the context between them and the existing XDL_MERGE_ZEALOUS_ALNUM logic
> allows conflicts with more context between them to be coalesced if the
> context lines are uninteresting. I think XDL_MERGE_ZEALOUS_ALNUM is only
> used by `git merge-file` at the moment, with everything else going through
> ll_merge() which uses XDL_MERGE_ZEALOUS

I don't recall much discussion around using ALNUM versus not, nor could
I find much in the archive. It looks like merge-file was converted to
show off ALNUM when it was added in ee95ec5d58 (xdl_merge(): introduce
XDL_MERGE_ZEALOUS_ALNUM, 2008-02-17), and it never really progressed
from there.

It might be an interesting exercise to re-run a bunch of merges and see
if ALNUM produces better (or worse) results on the whole.

-Peff