Message ID | 3f21e10f919eead049dc2440a29bb2bed6c99d0d.1611339373.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | Range diff with ranges lacking dotdot | expand |
"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: Johannes Schindelin <johannes.schindelin@gmx.de> > > Currently, when called with exactly two arguments, `git range-diff` > tests for a literal `..` in each of the two. Likewise, the argument > provided via `--range-diff` to `git format-patch` is checked in the same > manner. > > However, `<commit>^!` is a perfectly valid commit range, equivalent to > `<commit>^..<commit>` according to the `SPECIFYING RANGES` section of > gitrevisions[7]. > > In preparation for allowing more sophisticated ways to specify commit > ranges, let's refactor the check into its own function. I think the sharing between the two makes sense, but the helper function should make it clear in its name that this is "the kind of commit range range-diff wants to take". Among the commit range "git log" and friends can take, range-diff can take only a subset of it, and only a subset of it is meaningful to range-diff (e.g. HEAD^@ is still a commit range you can give to "git log", but it would not make much sense to give it to range-diff). Perhaps s/specifies_commit_range/is_range_diff_range/ or something. > diff --git a/revision.h b/revision.h Move this to range-diff.h, not revision.h which is about true commit ranges. > index 086ff10280d..66777c8e60f 100644 > --- a/revision.h > +++ b/revision.h > @@ -457,4 +457,11 @@ int rewrite_parents(struct rev_info *revs, > */ > struct commit_list *get_saved_parents(struct rev_info *revs, const struct commit *commit); > > +/* > + * Determine whether the given argument defines a commit range, e.g. A..B. > + * Note that this only validates the format but does _not_ parse it, i.e. > + * it does _not_ look up the specified commits in the local repository. > + */ And s/defines a commit range/is usable as a range to range-diff/ Thanks. > +int specifies_commit_range(const char *range); > + > #endif
On Fri, Jan 22, 2021 at 12:27:10PM -0800, Junio C Hamano wrote: > "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com> > writes: > > > From: Johannes Schindelin <johannes.schindelin@gmx.de> > > > > Currently, when called with exactly two arguments, `git range-diff` > > tests for a literal `..` in each of the two. Likewise, the argument > > provided via `--range-diff` to `git format-patch` is checked in the same > > manner. > > > > However, `<commit>^!` is a perfectly valid commit range, equivalent to > > `<commit>^..<commit>` according to the `SPECIFYING RANGES` section of > > gitrevisions[7]. > > > > In preparation for allowing more sophisticated ways to specify commit > > ranges, let's refactor the check into its own function. > > I think the sharing between the two makes sense, but the helper > function should make it clear in its name that this is "the kind of > commit range range-diff wants to take". Among the commit range "git > log" and friends can take, range-diff can take only a subset of it, > and only a subset of it is meaningful to range-diff (e.g. HEAD^@ is > still a commit range you can give to "git log", but it would not > make much sense to give it to range-diff). Does it make so little sense to forbid passing HEAD^@ as a range to range-diff? I can imagine situations where is would make sense, e.g. I often create customer patch stacks from a set of topic branches using octopus merge. To compare two of these ^@ might be handy. My POV is that if it's easy to use the same function (and so the same set of range descriptors) for git log and git range-diff then do so. This yields a consistent behaviour which is IMHO better than preventing people to do things that are considered strange today. Best regards Uwe
Uwe Kleine-König <u.kleine-koenig@pengutronix.de> writes: >> > In preparation for allowing more sophisticated ways to specify commit >> > ranges, let's refactor the check into its own function. >> >> I think the sharing between the two makes sense, but the helper >> function should make it clear in its name that this is "the kind of >> commit range range-diff wants to take". Among the commit range "git >> log" and friends can take, range-diff can take only a subset of it, >> and only a subset of it is meaningful to range-diff (e.g. HEAD^@ is >> still a commit range you can give to "git log", but it would not >> make much sense to give it to range-diff). > > Does it make so little sense to forbid passing HEAD^@ as a range to > range-diff? I can imagine situations where is would make sense, e.g. I > often create customer patch stacks from a set of topic branches using > octopus merge. To compare two of these ^@ might be handy. You can discuss for each individual syntax of a single-token range and decide which ones are and are not suitable for range-diff, but I suspect the reason behind this business started with dot-dot is to perform a superficial "sanity check" at the command line parser level before passing them to the revision machinery, and having to deal with potential errors and/or having to compare unreasonably large segments of history that the user did not intend. Also I first thought that the command changes the behaviour, given two tokens, depending on the shape of these two tokens (i.e. when they satisfy the "is-range?" we are discussing, they are taken as two ranges to be compared, and otherwise does something else), but after revisiting the code and "git help range-diff", it always does one thing when given (1) one arg: gives a symmetric range and what is to be compared is its left and right half, (2) two args: each is meant to name a set of commits and these two are to be compared) or (3) three args: each is meant to name a commit, and the arg1..arg2 and arg1..arg3 are the ranges to be compared. so ... > My POV is that if it's easy to use the same function (and so the same > set of range descriptors) for git log and git range-diff then do so. > This yields a consistent behaviour which is IMHO better than preventing > people to do things that are considered strange today. ... I am OK with that point of view. It certainly is simpler to explain to end users. Having said that, it would make it much harder to implement efficiently, though. For example, when your user says git range-diff A B to compare "git log A" (all the way down to the root) and "git log B" (likewise), you'd certainly optimize the older common part of the history out, essentially turning it into git range-diff A..B B..A or its moral equivalent git range-diff A...B But you cannot apply such an optimization blindly. When the user gives A..B and B..A as two args, you somehow need to notice that you shouldn't rewrite it to "A..B...B..A", and for that, you'd still need some "parsing" of these args. So, I dunno. Limiting the second form to only forms that the implementation does not have to do such optimization would certainly make it simpler for Dscho to implement ;-)
Hello Junio, On Mon, Jan 25, 2021 at 11:24:39AM -0800, Junio C Hamano wrote: > Uwe Kleine-König <u.kleine-koenig@pengutronix.de> writes: > > >> > In preparation for allowing more sophisticated ways to specify commit > >> > ranges, let's refactor the check into its own function. > >> > >> I think the sharing between the two makes sense, but the helper > >> function should make it clear in its name that this is "the kind of > >> commit range range-diff wants to take". Among the commit range "git > >> log" and friends can take, range-diff can take only a subset of it, > >> and only a subset of it is meaningful to range-diff (e.g. HEAD^@ is > >> still a commit range you can give to "git log", but it would not > >> make much sense to give it to range-diff). > > > > Does it make so little sense to forbid passing HEAD^@ as a range to > > range-diff? I can imagine situations where is would make sense, e.g. I > > often create customer patch stacks from a set of topic branches using > > octopus merge. To compare two of these ^@ might be handy. > > You can discuss for each individual syntax of a single-token range > and decide which ones are and are not suitable for range-diff, but I > suspect the reason behind this business started with dot-dot is to > perform a superficial "sanity check" at the command line parser > level before passing them to the revision machinery, and having to > deal with potential errors and/or having to compare unreasonably > large segments of history that the user did not intend. > > Also I first thought that the command changes the behaviour, given > two tokens, depending on the shape of these two tokens (i.e. when > they satisfy the "is-range?" we are discussing, they are taken as > two ranges to be compared, and otherwise does something else), but > after revisiting the code and "git help range-diff", it always does > one thing when given > > (1) one arg: gives a symmetric range and what is to be compared > is its left and right half, > > (2) two args: each is meant to name a set of commits and these two > are to be compared) or > > (3) three args: each is meant to name a commit, and the arg1..arg2 > and arg1..arg3 are the ranges to be compared. > > so ... > > > My POV is that if it's easy to use the same function (and so the same > > set of range descriptors) for git log and git range-diff then do so. > > This yields a consistent behaviour which is IMHO better than preventing > > people to do things that are considered strange today. > > ... I am OK with that point of view. It certainly is simpler to > explain to end users. It seems you understood my argument :-) > Having said that, it would make it much harder to implement > efficiently, though. For example, when your user says > > git range-diff A B > > to compare "git log A" (all the way down to the root) and "git log > B" (likewise), you'd certainly optimize the older common part of the > history out, essentially turning it into > > git range-diff A..B B..A > > or its moral equivalent > > git range-diff A...B > > But you cannot apply such an optimization blindly. When the user > gives A..B and B..A as two args, you somehow need to notice that > you shouldn't rewrite it to "A..B...B..A", and for that, you'd still > need some "parsing" of these args. I agree that for a long history git range-diff A B is an expensive request and I wouldn't invest too many cycles optimizing it. (And if I'd optimize it, it wouldn't be done using textual combination of the two strings but by checking if the two ranges intersect. This way something like git range-diff v4.0..v4.6-rc1 v4.0..v4.5.6 and maybe even git range-diff v4.0..v4.6-rc1 v4.0-rc1..v4.5.6 would benefit, too. But note I'm not (anymore) familiar with the git source code, so I don't know if this is easy/sensible to do and I'm just looking at the problem from an architectural and theoretical POV.) > So, I dunno. Limiting the second form to only forms that the > implementation does not have to do such optimization would certainly > make it simpler for Dscho to implement ;-) I don't want to make it more complicated for Dscho, I'm happy if I can in the near future use range-diff with $rev1^! $ref2^! . So feel free to ignore me. Best regards Uwe
Uwe Kleine-König <u.kleine-koenig@pengutronix.de> writes: >> > My POV is that if it's easy to use the same function (and so the same >> > set of range descriptors) for git log and git range-diff then do so. >> > This yields a consistent behaviour which is IMHO better than preventing >> > people to do things that are considered strange today. >> >> ... I am OK with that point of view. It certainly is simpler to >> explain to end users. > > It seems you understood my argument :-) So it seems ;-). >> Having said that, it would make it much harder to implement >> efficiently, though. For example, when your user says >> >> git range-diff A B >> >> to compare "git log A" (all the way down to the root) and "git log >> B" (likewise), you'd certainly optimize the older common part of the >> history out, essentially turning it into >> >> git range-diff A..B B..A >> >> or its moral equivalent >> >> git range-diff A...B >> >> But you cannot apply such an optimization blindly. When the user >> gives A..B and B..A as two args, you somehow need to notice that >> you shouldn't rewrite it to "A..B...B..A", and for that, you'd still >> need some "parsing" of these args. > > I agree that for a long history > > git range-diff A B > > is an expensive request and I wouldn't invest too many cycles optimizing > it. Well, your devil's advocate can argue that accepting an input that can easily make the tool unusable would be irresponsible, though. And there are two possible ways out: (1) declare that optimizing "A B" into "A...B" is way too difficult to do in general, and find a good enough way to see if A and B individually gives a "range" that should be manageable; or (2) invest cycles to optimize, so your users do not have to care. I think the series takes the former approach, and I find it understandable. It is a different matter if the way found and implemented in the patch is "good enough" to tell if a given argument names a manageable range. You said something about "rev^{/^here are two..dots}" that can be mistaken as a "good enough" range, but it actually names a revision and all the way down to the root. > (And if I'd optimize it, it wouldn't be done using textual > combination of the two strings but by checking if the two ranges > intersect. Yup, that is in line with what I mumbled earlier about setup_revisions() and inspecting the rev_info.pending, I think. >> So, I dunno. Limiting the second form to only forms that the >> implementation does not have to do such optimization would certainly >> make it simpler for Dscho to implement ;-) > > I don't want to make it more complicated for Dscho, I'm happy if I can > in the near future use range-diff with $rev1^! $ref2^! . So feel free to > ignore me. > > Best regards > Uwe
diff --git a/builtin/log.c b/builtin/log.c index bd6ff4f9f95..099abdfb7e6 100644 --- a/builtin/log.c +++ b/builtin/log.c @@ -1680,7 +1680,7 @@ static void infer_range_diff_ranges(struct strbuf *r1, struct commit *head) { const char *head_oid = oid_to_hex(&head->object.oid); - int prev_is_range = !!strstr(prev, ".."); + int prev_is_range = specifies_commit_range(prev); if (prev_is_range) strbuf_addstr(r1, prev); diff --git a/builtin/range-diff.c b/builtin/range-diff.c index 24c4162f744..89d54158011 100644 --- a/builtin/range-diff.c +++ b/builtin/range-diff.c @@ -3,6 +3,7 @@ #include "parse-options.h" #include "range-diff.h" #include "config.h" +#include "revision.h" static const char * const builtin_range_diff_usage[] = { N_("git range-diff [<options>] <old-base>..<old-tip> <new-base>..<new-tip>"), @@ -46,12 +47,12 @@ int cmd_range_diff(int argc, const char **argv, const char *prefix) diffopt.use_color = 1; if (argc == 2) { - if (!strstr(argv[0], "..")) - die(_("no .. in range: '%s'"), argv[0]); + if (!specifies_commit_range(argv[0])) + die(_("not a commit range: '%s'"), argv[0]); strbuf_addstr(&range1, argv[0]); - if (!strstr(argv[1], "..")) - die(_("no .. in range: '%s'"), argv[1]); + if (!specifies_commit_range(argv[1])) + die(_("not a commit range: '%s'"), argv[1]); strbuf_addstr(&range2, argv[1]); } else if (argc == 3) { strbuf_addf(&range1, "%s..%s", argv[0], argv[1]); diff --git a/revision.c b/revision.c index 9dff845bed6..00675f598a3 100644 --- a/revision.c +++ b/revision.c @@ -4206,3 +4206,8 @@ void put_revision_mark(const struct rev_info *revs, const struct commit *commit) fputs(mark, stdout); putchar(' '); } + +int specifies_commit_range(const char *range) +{ + return !!strstr(range, ".."); +} diff --git a/revision.h b/revision.h index 086ff10280d..66777c8e60f 100644 --- a/revision.h +++ b/revision.h @@ -457,4 +457,11 @@ int rewrite_parents(struct rev_info *revs, */ struct commit_list *get_saved_parents(struct rev_info *revs, const struct commit *commit); +/* + * Determine whether the given argument defines a commit range, e.g. A..B. + * Note that this only validates the format but does _not_ parse it, i.e. + * it does _not_ look up the specified commits in the local repository. + */ +int specifies_commit_range(const char *range); + #endif