Message ID | xmqqjzka7p2t.fsf_-_@gitster.g (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Re* [PATCH v4 0/3] advice: add "all" option to disable all hints | expand |
On Fri, May 3, 2024 at 2:00 PM Junio C Hamano <gitster@pobox.com> wrote: > Dragan Simic <dsimic@manjaro.org> writes: > > Just a small suggestion... Perhaps the creation factor needs adjusting > > for the range diff to actually be produced. [...] > > I see this happen to too many series I see on the list. There are > cases when the user knows that they are comparing an old and a new > iterations of the same series, e.g. running it from format-patch. > We probably should use a much higher creation factor than default to > run range-diff in such a context. > > IOW, this shouldn't have to be done by individual users, but by the > tool. > > Perhaps something along this line may not be a bad idea. > > ----- >8 --------- >8 --------- >8 --------- >8 ----- > Subject: [PATCH] format-patch: run range-diff with larger creation-factor > > We see too often that a range-diff added to format-patch output > shows too many "unmatched" patches. This is because the default > value for creation-factor is set to a relatively low value. > > It may be justified for other uses (like you have a yet-to-be-sent > new iteration of your series, and compare it against the 'seen' > branch that has an older iteration, probably with the '--left-only' > option, to pick out only your patches while ignoring the others) of > "range-diff" command, but when the command is run as part of the > format-patch, the user _knows_ and expects that the patches in the > old and the new iterations roughly correspond to each other, so we > can and should use a much higher default. > > Signed-off-by: Junio C Hamano <gitster@pobox.com> Ævar had posted[1] pretty much the exact same patch a few years ago. At the time, I had trouble understanding why `git range-diff` and `git format-patch --range-dif` would need separate default creation factors[2], and I still have trouble understanding why they should be different. Aren't both commands addressing the same use-case of comparing one version of a series against a subsequent version? In your response[3], you seemed to agree with that observation and suggested instead simply increasing the default creation factor for both commands (which sounds reasonable to me). [1]: https://lore.kernel.org/git/87y35g9l18.fsf@evledraar.gmail.com/ [2]: https://lore.kernel.org/git/CAPig+cRMiEcXVRYrgp+B3tcDreh41-a5_k0zABe+HNce0G=Cyw@mail.gmail.com/ [3]: https://lore.kernel.org/git/xmqqzhps4uyq.fsf@gitster-ct.c.googlers.com/ > --- > diff --git c/builtin/log.c w/builtin/log.c > index 4da7399905..7a019476c3 100644 > --- c/builtin/log.c > +++ w/builtin/log.c > @@ -2294,7 +2294,7 @@ int cmd_format_patch(int argc, const char **argv, const char *prefix) > if (creation_factor < 0) > - creation_factor = RANGE_DIFF_CREATION_FACTOR_DEFAULT; > + creation_factor = CREATION_FACTOR_FOR_THE_SAME_SERIES; > else if (!rdiff_prev) > die(_("the option '%s' requires '%s'"), "--creation-factor", "--range-diff"); > diff --git c/range-diff.h w/range-diff.h > index 04ffe217be..2f69f6a434 100644 > --- c/range-diff.h > +++ w/range-diff.h > @@ -6,6 +6,12 @@ > #define RANGE_DIFF_CREATION_FACTOR_DEFAULT 60 > > +/* > + * A much higher value than the default, when we KNOW we are comparing > + * the same series (e.g., used when format-patch calls range-diff). > + */ > +#define CREATION_FACTOR_FOR_THE_SAME_SERIES 999
Eric Sunshine <sunshine@sunshineco.com> writes: > different. Aren't both commands addressing the same use-case of > comparing one version of a series against a subsequent version? In > your response[3], you seemed to agree with that observation and > suggested instead simply increasing the default creation factor for > both commands (which sounds reasonable to me). I think Dscho's use case was compare only single updated series of his with seen that has tons of irrelevant other patches, to have the command sift the patches belong to other topics away while making comparison with the older incarnation of his topic. I never use the command for such a comparison, but I can imagine that in such a context lower criteria may help reduce false matches of his new commits with unrelated commits on 'seen'.
Junio C Hamano <gitster@pobox.com> writes: > Eric Sunshine <sunshine@sunshineco.com> writes: > >> different. Aren't both commands addressing the same use-case of >> comparing one version of a series against a subsequent version? In >> your response[3], you seemed to agree with that observation and >> suggested instead simply increasing the default creation factor for >> both commands (which sounds reasonable to me). > > I think Dscho's use case was compare only single updated series of > his with seen that has tons of irrelevant other patches, to have the > command sift the patches belong to other topics away while making > comparison with the older incarnation of his topic. I never use the > command for such a comparison, but I can imagine that in such a > context lower criteria may help reduce false matches of his new > commits with unrelated commits on 'seen'. It seems that Dscho was in agreement that format-patch's use case should try to be more aggressive at least back then. In the message in the thread you pointed https://lore.kernel.org/git/nycvar.QRO.7.76.6.1903211209280.41@tvgsbejvaqbjf.bet/ he does not give us the exact reason why he does not think the "more aggressive" mode is not suitable for other use cases, though. A similar thread was raised more recently: https://lore.kernel.org/git/rq6919s9-qspp-rn6o-n704-r0400q10747r@tzk.qr/ Also, there was an attempt to clarify what the "factor" really meant, but we did not get a real conclusion other than the UNIT to measure the "factor" is called "percent" (without specifying what 100% is relative to, "percent" does not mean much to guide users to guess what the right value would be). https://lore.kernel.org/git/85snn12q-po05-osqs-n1o0-n6040392q01q@tzk.qr/ So, yes, --creation-factor=<value> is messy, I think a very high value, much higher than the hardcoded default of 60, is more appropriate for use in format-patch, but we do not know what bad effect it would have if we used much higher default everywhere.
On Fri, May 3, 2024 at 4:08 PM Junio C Hamano <gitster@pobox.com> wrote: > It seems that Dscho was in agreement that format-patch's use case > should try to be more aggressive at least back then. In the message > in the thread you pointed > > https://lore.kernel.org/git/nycvar.QRO.7.76.6.1903211209280.41@tvgsbejvaqbjf.bet/ > > he does not give us the exact reason why he does not think the "more > aggressive" mode is not suitable for other use cases, though. Having an answer to that question could be helpful. > A similar thread was raised more recently: > > https://lore.kernel.org/git/rq6919s9-qspp-rn6o-n704-r0400q10747r@tzk.qr/ I think I missed this thread. > Also, there was an attempt to clarify what the "factor" really > meant, but we did not get a real conclusion other than the UNIT to > measure the "factor" is called "percent" (without specifying what > 100% is relative to, "percent" does not mean much to guide users > to guess what the right value would be). > > https://lore.kernel.org/git/85snn12q-po05-osqs-n1o0-n6040392q01q@tzk.qr/ I recall this discussion, as well as its followup (in which I proclaimed my continuing lack of understanding of what creation-factor really represents): https://lore.kernel.org/git/CAPig+cRp4N=6EktoisKAH09aVAPkPgZfHJYcB5pJFJ-CUpBHFA@mail.gmail.com/ > So, yes, --creation-factor=<value> is messy, I think a very high > value, much higher than the hardcoded default of 60, is more > appropriate for use in format-patch, but we do not know what bad > effect it would have if we used much higher default everywhere. At this point in time, I'm not particularly against giving `git format-patch` its own default creation-factor. My only worry (and perhaps it's very minor) is that separate default creation-factors for `git range-diff` and `git format-patch --range-diff` could catch the unwary off-guard when invoking `git range-diff` to manually check the difference between an old an new version of a series before finally invoking `git format-patch --range-diff` to actually create the patches for sending. (I've done this myself on a very few occasions, but perhaps not often enough to warrant the concern(?).)
Hi, On Fri, 3 May 2024, Eric Sunshine wrote: > On Fri, May 3, 2024 at 4:08 PM Junio C Hamano <gitster@pobox.com> wrote: > > It seems that Dscho was in agreement that format-patch's use case > > should try to be more aggressive at least back then. In the message > > in the thread you pointed > > > > https://lore.kernel.org/git/nycvar.QRO.7.76.6.1903211209280.41@tvgsbejvaqbjf.bet/ > > > > he does not give us the exact reason why he does not think the "more > > aggressive" mode is not suitable for other use cases, though. > > Having an answer to that question could be helpful. Sure. The `creation factor` essentially tells Git how confident the caller is that the two provided commit ranges contain only matching pairs of patches, no matter how different they look. When calling `git format-patch --range-diff`, we can be _quite_ certain. Not necessarily 100% (in which case the creation factor would need to be a lot higher than 100), as it does happen that contributors have to drop a patch or two, and add a patch or two, sometimes in the same iteration. When calling `git range-diff` in general, we can be less certain about that. In fact, users like me often call `git range-diff` to _determine_ whether there are obvious matches (for example, to see whether commits in GitGitGadget PRs have corresponding commits in `seen`). That difference in certainty is the entire reason why I contend that `range-diff` and `format-patch --range-diff` need different defaults for the creation factor. > > A similar thread was raised more recently: > > > > https://lore.kernel.org/git/rq6919s9-qspp-rn6o-n704-r0400q10747r@tzk.qr/ > > I think I missed this thread. Heh. I had forgotten about it. Ciao, Johannes
Hello Eric, On 2024-05-03 23:24, Eric Sunshine wrote: > On Fri, May 3, 2024 at 4:08 PM Junio C Hamano <gitster@pobox.com> > wrote: >> So, yes, --creation-factor=<value> is messy, I think a very high >> value, much higher than the hardcoded default of 60, is more >> appropriate for use in format-patch, but we do not know what bad >> effect it would have if we used much higher default everywhere. > > At this point in time, I'm not particularly against giving `git > format-patch` its own default creation-factor. > > My only worry (and perhaps it's very minor) is that separate default > creation-factors for `git range-diff` and `git format-patch > --range-diff` could catch the unwary off-guard when invoking `git > range-diff` to manually check the difference between an old an new > version of a series before finally invoking `git format-patch > --range-diff` to actually create the patches for sending. (I've done > this myself on a very few occasions, but perhaps not often enough to > warrant the concern(?).) I agree that having two different default values for the creation factor in "git format-patch" and "git range-diff" is a valid concern, but I think that the resulting benefits, in form of patch submissions that don't lack the range diff contents, outweigh this concern.
diff --git c/builtin/log.c w/builtin/log.c index 4da7399905..7a019476c3 100644 --- c/builtin/log.c +++ w/builtin/log.c @@ -2294,7 +2294,7 @@ int cmd_format_patch(int argc, const char **argv, const char *prefix) } if (creation_factor < 0) - creation_factor = RANGE_DIFF_CREATION_FACTOR_DEFAULT; + creation_factor = CREATION_FACTOR_FOR_THE_SAME_SERIES; else if (!rdiff_prev) die(_("the option '%s' requires '%s'"), "--creation-factor", "--range-diff"); diff --git c/range-diff.h w/range-diff.h index 04ffe217be..2f69f6a434 100644 --- c/range-diff.h +++ w/range-diff.h @@ -6,6 +6,12 @@ #define RANGE_DIFF_CREATION_FACTOR_DEFAULT 60 +/* + * A much higher value than the default, when we KNOW we are comparing + * the same series (e.g., used when format-patch calls range-diff). + */ +#define CREATION_FACTOR_FOR_THE_SAME_SERIES 999 + struct range_diff_options { int creation_factor; unsigned dual_color:1;