Message ID | pull.1036.git.git.1623734171.gitgitgadget@gmail.com (mailing list archive) |
---|---|
Headers | show |
Series | RFC: implement new zdiff3 conflict style | expand |
On Tue, Jun 15, 2021 at 05:16:08AM +0000, Elijah Newren via GitGitGadget wrote: > Implement a zealous diff3, or "zdiff3". This new mode is identical to > ordinary diff3 except that it allows compaction of common lines between the > two sides of history, if those common lines occur at the beginning or end of > a conflict hunk. > > This is just RFC, because I need to add tests. Also, while I've remerged > every merge, revert, or duly marked cherry-pick from both git.git and > linux.git with this patch using the new zdiff3 mode, that only shows it > doesn't segfault. (Though I also reran 10% of the linux remerges with zdiff3 > under valgrind without issues.) I looked through some differences between > --remerge-diff with diff3 and --remerge-diff with zdiff3, but those are > essentially diffs of a diff of a diff, which I found hard to read. I'd like > to look through more examples, and use it for a while before submitting the > patches without the RFC tag. I did something similar (but I wasn't smart enough to try your remerge-diff, and just re-ran a bunch of merges). Skimming over the results, I didn't see anything that looked incorrect. Many of them are pretty non-exciting, though. A common case seems to be ones like 01a2a03c56 (Merge branch 'jc/diff-filter-negation', 2013-09-09), where two sides both add functions in the same place, and the common lines are just the closing "}" followed by a blank line. Removing those shared lines actually makes things less readable, IMHO, but I don't think it's the wrong thing to do. The usual "merge" zealous minimization likewise produces the same unreadability. If we want to address that, I think the best way would be by teaching the minimization some heuristics about which lines are trivial. Here's another interesting one. In 0c52457b7c (Merge branch 'nd/daemon-informative-errors-typofix', 2014-01-10), the diff3 looks like: <<<<<<< ours if (starts_with(arg, "--informative-errors")) { ||||||| base if (!prefixcmp(arg, "--informative-errors")) { ======= if (!strcmp(arg, "--informative-errors")) { >>>>>>> theirs informative_errors = 1; continue; } <<<<<<< ours if (starts_with(arg, "--no-informative-errors")) { ||||||| base if (!prefixcmp(arg, "--no-informative-errors")) { ======= if (!strcmp(arg, "--no-informative-errors")) { >>>>>>> theirs A little clunky, but it's easy-ish to see what's going on. With zdiff3, the context between the two hunks is rolled into a single hunk: <<<<<<< ours if (starts_with(arg, "--informative-errors")) { informative_errors = 1; continue; } if (starts_with(arg, "--no-informative-errors")) { ||||||| base if (!prefixcmp(arg, "--informative-errors")) { ======= if (!strcmp(arg, "--informative-errors")) { informative_errors = 1; continue; } if (!strcmp(arg, "--no-informative-errors")) { >>>>>>> theirs which seems worse. I haven't dug/thought carefully enough into your change yet to know if this is expected, or if there's a bug. -Peff
On Tue, Jun 15, 2021 at 2:43 AM Jeff King <peff@peff.net> wrote: > > On Tue, Jun 15, 2021 at 05:16:08AM +0000, Elijah Newren via GitGitGadget wrote: > > > Implement a zealous diff3, or "zdiff3". This new mode is identical to > > ordinary diff3 except that it allows compaction of common lines between the > > two sides of history, if those common lines occur at the beginning or end of > > a conflict hunk. > > > > This is just RFC, because I need to add tests. Also, while I've remerged > > every merge, revert, or duly marked cherry-pick from both git.git and > > linux.git with this patch using the new zdiff3 mode, that only shows it > > doesn't segfault. (Though I also reran 10% of the linux remerges with zdiff3 > > under valgrind without issues.) I looked through some differences between > > --remerge-diff with diff3 and --remerge-diff with zdiff3, but those are > > essentially diffs of a diff of a diff, which I found hard to read. I'd like > > to look through more examples, and use it for a while before submitting the > > patches without the RFC tag. > > I did something similar (but I wasn't smart enough to try your > remerge-diff, and just re-ran a bunch of merges). What I did was great for testing if there were funny merges that might cause segfaults or such with zdiff3, but not so clever for viewing the direct output from zdiff3. Using remerge-diff in this way does not show the [z]diff3 output directly, but a diff of that output against what was ultimately recorded in the merge, forcing me to attempt to recreate the original in my head. (And, of course, I made it even worse by taking the remerge-diff output with diff3, and the remerge-diff output with zdiff3, and then diffing those, resulting in yet another layer of diffs that I'd have to undo in my head to attempt to construct the original.) > Skimming over the results, I didn't see anything that looked incorrect. > Many of them are pretty non-exciting, though. A common case seems to be > ones like 01a2a03c56 (Merge branch 'jc/diff-filter-negation', > 2013-09-09), where two sides both add functions in the same place, and > the common lines are just the closing "}" followed by a blank line. > > Removing those shared lines actually makes things less readable, IMHO, > but I don't think it's the wrong thing to do. The usual "merge" zealous > minimization likewise produces the same unreadability. If we want to > address that, I think the best way would be by teaching the minimization > some heuristics about which lines are trivial. > > Here's another interesting one. In 0c52457b7c (Merge branch > 'nd/daemon-informative-errors-typofix', 2014-01-10), the diff3 looks > like: > > <<<<<<< ours > if (starts_with(arg, "--informative-errors")) { > ||||||| base > if (!prefixcmp(arg, "--informative-errors")) { > ======= > if (!strcmp(arg, "--informative-errors")) { > >>>>>>> theirs > informative_errors = 1; > continue; > } > <<<<<<< ours > if (starts_with(arg, "--no-informative-errors")) { > ||||||| base > if (!prefixcmp(arg, "--no-informative-errors")) { > ======= > if (!strcmp(arg, "--no-informative-errors")) { > >>>>>>> theirs > > A little clunky, but it's easy-ish to see what's going on. With zdiff3, > the context between the two hunks is rolled into a single hunk: > > <<<<<<< ours > if (starts_with(arg, "--informative-errors")) { > informative_errors = 1; > continue; > } > if (starts_with(arg, "--no-informative-errors")) { > ||||||| base > if (!prefixcmp(arg, "--informative-errors")) { > ======= > if (!strcmp(arg, "--informative-errors")) { > informative_errors = 1; > continue; > } > if (!strcmp(arg, "--no-informative-errors")) { > >>>>>>> theirs > > which seems worse. I haven't dug/thought carefully enough into your > change yet to know if this is expected, or if there's a bug. Yeah, I don't know for sure either but that does look buggy to me. Thanks for digging up these examples. I'm a bit overdrawn on time for this, so I'll pick it back up in a week or two and investigate this case further.
Am 15.06.21 um 07:16 schrieb Elijah Newren via GitGitGadget: > Implement a zealous diff3, or "zdiff3". This new mode is identical to > ordinary diff3 except that it allows compaction of common lines between the > two sides of history, if those common lines occur at the beginning or end of > a conflict hunk. As a data point, I tried this series (cf9d93e547 en/zdiff3) on my criss-cross merge test case that started this adventure, and it produces the very same output as diff3; cf. https://lore.kernel.org/git/60883e1b-787f-5ec2-a9af-f2f6757d3c43@kdbg.org/ -- Hannes
On Tue, Jun 15, 2021 at 2:36 PM Johannes Sixt <j6t@kdbg.org> wrote: > > Am 15.06.21 um 07:16 schrieb Elijah Newren via GitGitGadget: > > Implement a zealous diff3, or "zdiff3". This new mode is identical to > > ordinary diff3 except that it allows compaction of common lines between the > > two sides of history, if those common lines occur at the beginning or end of > > a conflict hunk. > > As a data point, I tried this series (cf9d93e547 en/zdiff3) on my > criss-cross merge test case that started this adventure, and it produces > the very same output as diff3; cf. > https://lore.kernel.org/git/60883e1b-787f-5ec2-a9af-f2f6757d3c43@kdbg.org/ That's good to hear; your two sides had no common text at the beginning or end of the conflict hunk, so I wouldn't expect zdiff3 to change that particular example. The XDL_MERGE_FAVOR_BASE idea (cf. https://lore.kernel.org/git/20210611190235.1970106-1-newren@gmail.com/), though would I think simplify the diff3 conflict markers in your example to <<<<<<< HEAD CClustering ComputeSSLClusters(double threshPercent, const CDataInfo* scale) const override; void ComputeDist(DistFunc distFunc, CDoubleArray& dist, double& minDist, double& maxDist) const; double EstimateNodeDist2() const override; std::vector<double> EstimateNeighborMinDist() const override; ||||||| merged common ancestors CClustering ComputeClusters(const double* dist, double threshold, const CDataInfo* scale) const override; virtual void ComputeDist(DistFunc distFunc, CDoubleArray& dist, double& minDist, double& maxDist); virtual void ComputeUMatrix(); virtual void ComputeKNearest(int K, const double*, Neighborhood& result) const; ======= CClustering ComputeSSLClusters(double threshPercent, const CDoubleArray& compWeights, const CDataInfo* scale) const override; static void ComputeDist(const CNetNodeHolder& vecs, CDoubleArray& dist, double& minDist, double& maxDist); >>>>>>> no-compweights-in-cnet That seems like it might be nicer, but I don't do many criss-cross merges myself so it'd be nice to get opinions of others like you who do.
Am 15.06.21 um 23:45 schrieb Elijah Newren: > On Tue, Jun 15, 2021 at 2:36 PM Johannes Sixt <j6t@kdbg.org> wrote: >> >> Am 15.06.21 um 07:16 schrieb Elijah Newren via GitGitGadget: >>> Implement a zealous diff3, or "zdiff3". This new mode is identical to >>> ordinary diff3 except that it allows compaction of common lines between the >>> two sides of history, if those common lines occur at the beginning or end of >>> a conflict hunk. >> >> As a data point, I tried this series (cf9d93e547 en/zdiff3) on my >> criss-cross merge test case that started this adventure, and it produces >> the very same output as diff3; cf. >> https://lore.kernel.org/git/60883e1b-787f-5ec2-a9af-f2f6757d3c43@kdbg.org/ > > That's good to hear; your two sides had no common text at the > beginning or end of the conflict hunk, so I wouldn't expect zdiff3 to > change that particular example. > > The XDL_MERGE_FAVOR_BASE idea (cf. > https://lore.kernel.org/git/20210611190235.1970106-1-newren@gmail.com/), > though would I think simplify the diff3 conflict markers in your > example to > > <<<<<<< HEAD > CClustering ComputeSSLClusters(double threshPercent, const > CDataInfo* scale) const override; > void ComputeDist(DistFunc distFunc, CDoubleArray& dist, > double& minDist, double& maxDist) const; > double EstimateNodeDist2() const override; > std::vector<double> EstimateNeighborMinDist() const override; > ||||||| merged common ancestors > CClustering ComputeClusters(const double* dist, double threshold, > const CDataInfo* scale) const override; > virtual void ComputeDist(DistFunc distFunc, CDoubleArray& dist, > double& minDist, double& maxDist); > virtual void ComputeUMatrix(); > virtual void ComputeKNearest(int K, const double*, > Neighborhood& result) const; > ======= > CClustering ComputeSSLClusters(double threshPercent, > const CDoubleArray& compWeights, const CDataInfo* scale) const override; > static void ComputeDist(const CNetNodeHolder& vecs, CDoubleArray& dist, > double& minDist, double& maxDist); >>>>>>>> no-compweights-in-cnet > > That seems like it might be nicer, but I don't do many criss-cross > merges myself so it'd be nice to get opinions of others like you who > do. That *is* nicer as it is just the regular conflict with some base context. Does that mean that the regular recursive merge is a bit sloppy because it throws away too many conflicts that occur in virtual ancestors? Even if that is the case, I couldn't tell whether that is a disadvantage or not, as I've actually never seen nested conflicts in the past; the diff3 test was the first time I saw one. With the referenced patch applied (after a small tweak around needs_cr to make it compile), my testcase does indeed produce the above conflict under zdiff3 mode. The diff3 mode, OTOH, produces an exceedingly large non-nested and obviously incorrect conflict (I'm not going to post it here): our and their side are large and share a lot of text, but the base part is identical to the above and is far too small. -- Hannes
On Tue, Jun 15, 2021 at 11:16 PM Johannes Sixt <j6t@kdbg.org> wrote: > > Am 15.06.21 um 23:45 schrieb Elijah Newren: > > On Tue, Jun 15, 2021 at 2:36 PM Johannes Sixt <j6t@kdbg.org> wrote: > >> > >> Am 15.06.21 um 07:16 schrieb Elijah Newren via GitGitGadget: > >>> Implement a zealous diff3, or "zdiff3". This new mode is identical to > >>> ordinary diff3 except that it allows compaction of common lines between the > >>> two sides of history, if those common lines occur at the beginning or end of > >>> a conflict hunk. > >> > >> As a data point, I tried this series (cf9d93e547 en/zdiff3) on my > >> criss-cross merge test case that started this adventure, and it produces > >> the very same output as diff3; cf. > >> https://lore.kernel.org/git/60883e1b-787f-5ec2-a9af-f2f6757d3c43@kdbg.org/ > > > > That's good to hear; your two sides had no common text at the > > beginning or end of the conflict hunk, so I wouldn't expect zdiff3 to > > change that particular example. > > > > The XDL_MERGE_FAVOR_BASE idea (cf. > > https://lore.kernel.org/git/20210611190235.1970106-1-newren@gmail.com/), > > though would I think simplify the diff3 conflict markers in your > > example to > > > > <<<<<<< HEAD > > CClustering ComputeSSLClusters(double threshPercent, const > > CDataInfo* scale) const override; > > void ComputeDist(DistFunc distFunc, CDoubleArray& dist, > > double& minDist, double& maxDist) const; > > double EstimateNodeDist2() const override; > > std::vector<double> EstimateNeighborMinDist() const override; > > ||||||| merged common ancestors > > CClustering ComputeClusters(const double* dist, double threshold, > > const CDataInfo* scale) const override; > > virtual void ComputeDist(DistFunc distFunc, CDoubleArray& dist, > > double& minDist, double& maxDist); > > virtual void ComputeUMatrix(); > > virtual void ComputeKNearest(int K, const double*, > > Neighborhood& result) const; > > ======= > > CClustering ComputeSSLClusters(double threshPercent, > > const CDoubleArray& compWeights, const CDataInfo* scale) const override; > > static void ComputeDist(const CNetNodeHolder& vecs, CDoubleArray& dist, > > double& minDist, double& maxDist); > >>>>>>>> no-compweights-in-cnet > > > > That seems like it might be nicer, but I don't do many criss-cross > > merges myself so it'd be nice to get opinions of others like you who > > do. > > That *is* nicer as it is just the regular conflict with some base > context. Does that mean that the regular recursive merge is a bit sloppy > because it throws away too many conflicts that occur in virtual > ancestors? Even if that is the case, I couldn't tell whether that is a > disadvantage or not, as I've actually never seen nested conflicts in the > past; the diff3 test was the first time I saw one. > > With the referenced patch applied (after a small tweak around needs_cr > to make it compile), my testcase does indeed produce the above conflict > under zdiff3 mode. The diff3 mode, OTOH, produces an exceedingly large > non-nested and obviously incorrect conflict (I'm not going to post it > here): our and their side are large and share a lot of text, but the > base part is identical to the above and is far too small. The thing about XDL_MERGE_FAVOR_BASE in combination with a recursive merge, is that if you have two merge-bases, then XDL_MERGE_FAVOR_BASE will put the text from the merge base of the merge-bases in the "original text" region. If your merge bases themselves had multiple merge-bases, and those merge bases in turn had multiple merge-bases, etc., so that you end up with a deeply nested recursive merge, then XDL_MERGE_FAVOR_BASE will cause you to get the "original text" filled by the text from some ancient ancestor. The net result is that it makes it look like you have done a three-way merge between the two sides you expect with a really ancient merge base. So, a really small original text is probably something to be expected from the XDL_MERGE_FAVOR_BASE patch. The fact that zdiff3 shrinks it down considerably reflects the fact that your two sides had long sections of code that matched at the beginning and end of the conflict hunk. So it sounds to me like both pieces were behaving implementationally as designed for your testcase. Whether using "ancient original text" makes sense in general rather than using a "more recent original text" (in the form of an attempted merge of the merge-bases complete with conflict markers), feels like it's one of those tradeoff things that I figured those who do lots of criss-cross merges should weigh in on rather than me.
On 15/06/2021 20:35, Elijah Newren wrote: > On Tue, Jun 15, 2021 at 2:43 AM Jeff King <peff@peff.net> wrote: >> >> On Tue, Jun 15, 2021 at 05:16:08AM +0000, Elijah Newren via GitGitGadget wrote: >> >>> Implement a zealous diff3, or "zdiff3". This new mode is identical to >>> ordinary diff3 except that it allows compaction of common lines between the >>> two sides of history, if those common lines occur at the beginning or end of >>> a conflict hunk. >>> >>> This is just RFC, because I need to add tests. Also, while I've remerged >>> every merge, revert, or duly marked cherry-pick from both git.git and >>> linux.git with this patch using the new zdiff3 mode, that only shows it >>> doesn't segfault. (Though I also reran 10% of the linux remerges with zdiff3 >>> under valgrind without issues.) I looked through some differences between >>> --remerge-diff with diff3 and --remerge-diff with zdiff3, but those are >>> essentially diffs of a diff of a diff, which I found hard to read. I'd like >>> to look through more examples, and use it for a while before submitting the >>> patches without the RFC tag. >> >> I did something similar (but I wasn't smart enough to try your >> remerge-diff, and just re-ran a bunch of merges). > > What I did was great for testing if there were funny merges that might > cause segfaults or such with zdiff3, but not so clever for viewing the > direct output from zdiff3. Using remerge-diff in this way does not > show the [z]diff3 output directly, but a diff of that output against > what was ultimately recorded in the merge, forcing me to attempt to > recreate the original in my head. > > (And, of course, I made it even worse by taking the remerge-diff > output with diff3, and the remerge-diff output with zdiff3, and then > diffing those, resulting in yet another layer of diffs that I'd have > to undo in my head to attempt to construct the original.) > >> Skimming over the results, I didn't see anything that looked incorrect. >> Many of them are pretty non-exciting, though. A common case seems to be >> ones like 01a2a03c56 (Merge branch 'jc/diff-filter-negation', >> 2013-09-09), where two sides both add functions in the same place, and >> the common lines are just the closing "}" followed by a blank line. >> >> Removing those shared lines actually makes things less readable, IMHO, >> but I don't think it's the wrong thing to do. The usual "merge" zealous >> minimization likewise produces the same unreadability. If we want to >> address that, I think the best way would be by teaching the minimization >> some heuristics about which lines are trivial. >> >> Here's another interesting one. In 0c52457b7c (Merge branch >> 'nd/daemon-informative-errors-typofix', 2014-01-10), the diff3 looks >> like: >> >> <<<<<<< ours >> if (starts_with(arg, "--informative-errors")) { >> ||||||| base >> if (!prefixcmp(arg, "--informative-errors")) { >> ======= >> if (!strcmp(arg, "--informative-errors")) { >> >>>>>>> theirs >> informative_errors = 1; >> continue; >> } >> <<<<<<< ours >> if (starts_with(arg, "--no-informative-errors")) { >> ||||||| base >> if (!prefixcmp(arg, "--no-informative-errors")) { >> ======= >> if (!strcmp(arg, "--no-informative-errors")) { >> >>>>>>> theirs >> >> A little clunky, but it's easy-ish to see what's going on. With zdiff3, >> the context between the two hunks is rolled into a single hunk: >> >> <<<<<<< ours >> if (starts_with(arg, "--informative-errors")) { >> informative_errors = 1; >> continue; >> } >> if (starts_with(arg, "--no-informative-errors")) { >> ||||||| base >> if (!prefixcmp(arg, "--informative-errors")) { >> ======= >> if (!strcmp(arg, "--informative-errors")) { >> informative_errors = 1; >> continue; >> } >> if (!strcmp(arg, "--no-informative-errors")) { >> >>>>>>> theirs >> >> which seems worse. I haven't dug/thought carefully enough into your >> change yet to know if this is expected, or if there's a bug. XDL_MERGE_ZEALOUS coalesces adjacent conflicts that are separated by fewer than four lines. Unfortunately the existing code in xdl_merge_two_conflicts() only coalesces 'ours' and 'theirs', not 'base'. Applying diff --git a/xdiff/xmerge.c b/xdiff/xmerge.c index b1dc9df7ea..5f76957169 100644 --- a/xdiff/xmerge.c +++ b/xdiff/xmerge.c @@ -455,6 +455,7 @@ static int lines_contain_alnum(xdfenv_t *xe, int i, int chg) static void xdl_merge_two_conflicts(xdmerge_t *m) { xdmerge_t *next_m = m->next; + m->chg0 = next_m->i0 + next_m->chg0 - m->i0; m->chg1 = next_m->i1 + next_m->chg1 - m->i1; m->chg2 = next_m->i2 + next_m->chg2 - m->i2; m->next = next_m->next; and running git checkout 0c52457b7c^ && bin-wrappers/git -c merge.conflictstyle=zdiff3 merge 0c52457b7c^2 gives <<<<<<< HEAD if (starts_with(arg, "--informative-errors")) { informative_errors = 1; continue; } if (starts_with(arg, "--no-informative-errors")) { ||||||| 2f93541d88 if (!prefixcmp(arg, "--informative-errors")) { informative_errors = 1; continue; } if (!prefixcmp(arg, "--no-informative-errors")) { ======= if (!strcmp(arg, "--informative-errors")) { informative_errors = 1; continue; } if (!strcmp(arg, "--no-informative-errors")) { >>>>>>> 0c52457b7c^2 Which I think is correct. Whether combining single line conflicts in this way is useful is a different question (and is independent of your patch). I can see that with larger conflicts it is worth it but here we end up with conflicts where 60% of the lines are from the base version. One the other hand there are fewer conflicts to resolve - I'm not sure which I prefer. Best Wishes Phillip > Yeah, I don't know for sure either but that does look buggy to me. > Thanks for digging up these examples. I'm a bit overdrawn on time for > this, so I'll pick it back up in a week or two and investigate this > case further. >
On Wed, Jun 16, 2021 at 09:57:49AM +0100, Phillip Wood wrote: > > > which seems worse. I haven't dug/thought carefully enough into your > > > change yet to know if this is expected, or if there's a bug. > > XDL_MERGE_ZEALOUS coalesces adjacent conflicts that are separated by fewer > than four lines. Unfortunately the existing code in > xdl_merge_two_conflicts() only coalesces 'ours' and 'theirs', not 'base'. > Applying > [...] > gives > > <<<<<<< HEAD > if (starts_with(arg, "--informative-errors")) { > informative_errors = 1; > continue; > } > if (starts_with(arg, "--no-informative-errors")) { > ||||||| 2f93541d88 > if (!prefixcmp(arg, "--informative-errors")) { > informative_errors = 1; > continue; > } > if (!prefixcmp(arg, "--no-informative-errors")) { > ======= > if (!strcmp(arg, "--informative-errors")) { > informative_errors = 1; > continue; > } > if (!strcmp(arg, "--no-informative-errors")) { > >>>>>>> 0c52457b7c^2 > > Which I think is correct. Whether combining single line conflicts in this > way is useful is a different question (and is independent of your patch). I > can see that with larger conflicts it is worth it but here we end up with > conflicts where 60% of the lines are from the base version. One the other > hand there are fewer conflicts to resolve - I'm not sure which I prefer. Thanks for figuring that out. I agree that the output after the patch you showed is correct, in the sense that the common lines show up in the base now. It does feel like it's working against the point of zdiff3, though, which is to reduce the number of common lines shown in the "ours" and "theirs" hunks. Likewise, I think this coalescing makes things worse even for "merge", where you get: <<<<<<< ours if (starts_with(arg, "--informative-errors")) { informative_errors = 1; continue; } if (starts_with(arg, "--no-informative-errors")) { ======= if (!strcmp(arg, "--informative-errors")) { informative_errors = 1; continue; } if (!strcmp(arg, "--no-informative-errors")) { >>>>>>> theirs and have to figure out manually that those interior lines are common. But I imagine there are cases where you have a large number of uninteresting lines (blank lines, "}", etc) that create a lot of noise in the output by breaking up the actual changed lines into tiny hunks that are hard to digest on their own. -Peff
On Wed, Jun 16, 2021 at 1:57 AM Phillip Wood <phillip.wood123@gmail.com> wrote: > > On 15/06/2021 20:35, Elijah Newren wrote: > > On Tue, Jun 15, 2021 at 2:43 AM Jeff King <peff@peff.net> wrote: > >> > >> On Tue, Jun 15, 2021 at 05:16:08AM +0000, Elijah Newren via GitGitGadget wrote: > >> > >>> Implement a zealous diff3, or "zdiff3". This new mode is identical to > >>> ordinary diff3 except that it allows compaction of common lines between the > >>> two sides of history, if those common lines occur at the beginning or end of > >>> a conflict hunk. > >>> > >>> This is just RFC, because I need to add tests. Also, while I've remerged > >>> every merge, revert, or duly marked cherry-pick from both git.git and > >>> linux.git with this patch using the new zdiff3 mode, that only shows it > >>> doesn't segfault. (Though I also reran 10% of the linux remerges with zdiff3 > >>> under valgrind without issues.) I looked through some differences between > >>> --remerge-diff with diff3 and --remerge-diff with zdiff3, but those are > >>> essentially diffs of a diff of a diff, which I found hard to read. I'd like > >>> to look through more examples, and use it for a while before submitting the > >>> patches without the RFC tag. > >> > >> I did something similar (but I wasn't smart enough to try your > >> remerge-diff, and just re-ran a bunch of merges). > > > > What I did was great for testing if there were funny merges that might > > cause segfaults or such with zdiff3, but not so clever for viewing the > > direct output from zdiff3. Using remerge-diff in this way does not > > show the [z]diff3 output directly, but a diff of that output against > > what was ultimately recorded in the merge, forcing me to attempt to > > recreate the original in my head. > > > > (And, of course, I made it even worse by taking the remerge-diff > > output with diff3, and the remerge-diff output with zdiff3, and then > > diffing those, resulting in yet another layer of diffs that I'd have > > to undo in my head to attempt to construct the original.) > > > >> Skimming over the results, I didn't see anything that looked incorrect. > >> Many of them are pretty non-exciting, though. A common case seems to be > >> ones like 01a2a03c56 (Merge branch 'jc/diff-filter-negation', > >> 2013-09-09), where two sides both add functions in the same place, and > >> the common lines are just the closing "}" followed by a blank line. > >> > >> Removing those shared lines actually makes things less readable, IMHO, > >> but I don't think it's the wrong thing to do. The usual "merge" zealous > >> minimization likewise produces the same unreadability. If we want to > >> address that, I think the best way would be by teaching the minimization > >> some heuristics about which lines are trivial. > >> > >> Here's another interesting one. In 0c52457b7c (Merge branch > >> 'nd/daemon-informative-errors-typofix', 2014-01-10), the diff3 looks > >> like: > >> > >> <<<<<<< ours > >> if (starts_with(arg, "--informative-errors")) { > >> ||||||| base > >> if (!prefixcmp(arg, "--informative-errors")) { > >> ======= > >> if (!strcmp(arg, "--informative-errors")) { > >> >>>>>>> theirs > >> informative_errors = 1; > >> continue; > >> } > >> <<<<<<< ours > >> if (starts_with(arg, "--no-informative-errors")) { > >> ||||||| base > >> if (!prefixcmp(arg, "--no-informative-errors")) { > >> ======= > >> if (!strcmp(arg, "--no-informative-errors")) { > >> >>>>>>> theirs > >> > >> A little clunky, but it's easy-ish to see what's going on. With zdiff3, > >> the context between the two hunks is rolled into a single hunk: > >> > >> <<<<<<< ours > >> if (starts_with(arg, "--informative-errors")) { > >> informative_errors = 1; > >> continue; > >> } > >> if (starts_with(arg, "--no-informative-errors")) { > >> ||||||| base > >> if (!prefixcmp(arg, "--informative-errors")) { > >> ======= > >> if (!strcmp(arg, "--informative-errors")) { > >> informative_errors = 1; > >> continue; > >> } > >> if (!strcmp(arg, "--no-informative-errors")) { > >> >>>>>>> theirs > >> > >> which seems worse. I haven't dug/thought carefully enough into your > >> change yet to know if this is expected, or if there's a bug. > > XDL_MERGE_ZEALOUS coalesces adjacent conflicts that are separated by > fewer than four lines. Unfortunately the existing code in > xdl_merge_two_conflicts() only coalesces 'ours' and 'theirs', not > 'base'. Applying > > diff --git a/xdiff/xmerge.c b/xdiff/xmerge.c > index b1dc9df7ea..5f76957169 100644 > --- a/xdiff/xmerge.c > +++ b/xdiff/xmerge.c > @@ -455,6 +455,7 @@ static int lines_contain_alnum(xdfenv_t *xe, int i, > int chg) > static void xdl_merge_two_conflicts(xdmerge_t *m) > { > xdmerge_t *next_m = m->next; > + m->chg0 = next_m->i0 + next_m->chg0 - m->i0; > m->chg1 = next_m->i1 + next_m->chg1 - m->i1; > m->chg2 = next_m->i2 + next_m->chg2 - m->i2; > m->next = next_m->next; > > and running > git checkout 0c52457b7c^ && > bin-wrappers/git -c merge.conflictstyle=zdiff3 merge 0c52457b7c^2 > gives > > <<<<<<< HEAD > if (starts_with(arg, "--informative-errors")) { > informative_errors = 1; > continue; > } > if (starts_with(arg, "--no-informative-errors")) { > ||||||| 2f93541d88 > if (!prefixcmp(arg, "--informative-errors")) { > informative_errors = 1; > continue; > } > if (!prefixcmp(arg, "--no-informative-errors")) { > ======= > if (!strcmp(arg, "--informative-errors")) { > informative_errors = 1; > continue; > } > if (!strcmp(arg, "--no-informative-errors")) { > >>>>>>> 0c52457b7c^2 > > Which I think is correct. Whether combining single line conflicts in > this way is useful is a different question (and is independent of your > patch). I can see that with larger conflicts it is worth it but here we > end up with conflicts where 60% of the lines are from the base version. > One the other hand there are fewer conflicts to resolve - I'm not sure > which I prefer. Oh, sweet, thanks for tracking this down! I'll try to find some time to play with it some more.
On 16/06/2021 11:31, Jeff King wrote: > On Wed, Jun 16, 2021 at 09:57:49AM +0100, Phillip Wood wrote: > >>>> which seems worse. I haven't dug/thought carefully enough into your >>>> change yet to know if this is expected, or if there's a bug. >> >> XDL_MERGE_ZEALOUS coalesces adjacent conflicts that are separated by fewer >> than four lines. Unfortunately the existing code in >> xdl_merge_two_conflicts() only coalesces 'ours' and 'theirs', not 'base'. >> Applying >> [...] >> gives >> >> <<<<<<< HEAD >> if (starts_with(arg, "--informative-errors")) { >> informative_errors = 1; >> continue; >> } >> if (starts_with(arg, "--no-informative-errors")) { >> ||||||| 2f93541d88 >> if (!prefixcmp(arg, "--informative-errors")) { >> informative_errors = 1; >> continue; >> } >> if (!prefixcmp(arg, "--no-informative-errors")) { >> ======= >> if (!strcmp(arg, "--informative-errors")) { >> informative_errors = 1; >> continue; >> } >> if (!strcmp(arg, "--no-informative-errors")) { >>>>>>>>> 0c52457b7c^2 >> >> Which I think is correct. Whether combining single line conflicts in this >> way is useful is a different question (and is independent of your patch). I >> can see that with larger conflicts it is worth it but here we end up with >> conflicts where 60% of the lines are from the base version. One the other >> hand there are fewer conflicts to resolve - I'm not sure which I prefer. > > Thanks for figuring that out. I agree that the output after the patch > you showed is correct, in the sense that the common lines show up in the > base now. It does feel like it's working against the point of zdiff3, > though, which is to reduce the number of common lines shown in the > "ours" and "theirs" hunks. I agree - the output is longer rather than shorter. As we only want to trim the common prefix and suffix from the conflicts I wonder if it would be better to special case zdiff3 rather than piggy backing on the existing XDL_MERGE_ZEALOUS implementation. We can trim the common lines by looping over the begging and end of the hunk comparing the lines with xdl_recmatch() without going to the trouble of diffing them as XDL_MERGE_ZEALOUS does. I don't think we need to worry about coalescing adjacent conflicts for zdiff3. It makes sense to coalesce in the XDL_MERGE_ZEALOUS case as it can potentially split a N line conflict hunk into N/2 single line conflict hunks but zdiff3 does not split conflict hunks. > Likewise, I think this coalescing makes things worse even for "merge", > where you get: > > <<<<<<< ours > if (starts_with(arg, "--informative-errors")) { > informative_errors = 1; > continue; > } > if (starts_with(arg, "--no-informative-errors")) { > ======= > if (!strcmp(arg, "--informative-errors")) { > informative_errors = 1; > continue; > } > if (!strcmp(arg, "--no-informative-errors")) { > >>>>>>> theirs > > and have to figure out manually that those interior lines are common. > But I imagine there are cases where you have a large number of > uninteresting lines (blank lines, "}", etc that create a lot of noise > in the output by breaking up the actual changed lines into tiny hunks > that are hard to digest on their own. Yes, I think the heuristic for coalescing conflict hunks could be improved. It would be fairly simple to only join two hunks if the conflicts are longer that the context between them and the existing XDL_MERGE_ZEALOUS_ALNUM logic allows conflicts with more context between them to be coalesced if the context lines are uninteresting. I think XDL_MERGE_ZEALOUS_ALNUM is only used by `git merge-file` at the moment, with everything else going through ll_merge() which uses XDL_MERGE_ZEALOUS Best Wishes Phillip > > -Peff >
On Wed, Jun 23, 2021 at 10:53:25AM +0100, Phillip Wood wrote: > > Thanks for figuring that out. I agree that the output after the patch > > you showed is correct, in the sense that the common lines show up in the > > base now. It does feel like it's working against the point of zdiff3, > > though, which is to reduce the number of common lines shown in the > > "ours" and "theirs" hunks. > > I agree - the output is longer rather than shorter. As we only want to trim > the common prefix and suffix from the conflicts I wonder if it would be > better to special case zdiff3 rather than piggy backing on the existing > XDL_MERGE_ZEALOUS implementation. We can trim the common lines by looping > over the begging and end of the hunk comparing the lines with xdl_recmatch() > without going to the trouble of diffing them as XDL_MERGE_ZEALOUS does. I > don't think we need to worry about coalescing adjacent conflicts for zdiff3. > It makes sense to coalesce in the XDL_MERGE_ZEALOUS case as it can > potentially split a N line conflict hunk into N/2 single line conflict > hunks but zdiff3 does not split conflict hunks. That matches my intuition of a reasonable path forward (but I confess to not being too well-versed in the details of the XDL_MERGE internals). > Yes, I think the heuristic for coalescing conflict hunks could be improved. > It would be fairly simple to only join two hunks if the conflicts are longer > that the context between them and the existing XDL_MERGE_ZEALOUS_ALNUM logic > allows conflicts with more context between them to be coalesced if the > context lines are uninteresting. I think XDL_MERGE_ZEALOUS_ALNUM is only > used by `git merge-file` at the moment, with everything else going through > ll_merge() which uses XDL_MERGE_ZEALOUS I don't recall much discussion around using ALNUM versus not, nor could I find much in the archive. It looks like merge-file was converted to show off ALNUM when it was added in ee95ec5d58 (xdl_merge(): introduce XDL_MERGE_ZEALOUS_ALNUM, 2008-02-17), and it never really progressed from there. It might be an interesting exercise to re-run a bunch of merges and see if ALNUM produces better (or worse) results on the whole. -Peff