Message ID | f96bb36930a6e5e42f0d3b9c5dfa3b2cc27c1f9d.1607223276.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | diffcore-rename improvements | expand |
On Sun, Dec 06, 2020 at 02:54:31AM +0000, Elijah Newren via GitGitGadget wrote: > From: Elijah Newren <newren@gmail.com> > > diffcore-rename had two different checks of the form > > if ((a < limit || b < limit) && > a * b <= limit * limit) > > Since these are all non-negative integers, this can be simplified to > > if (a * b <= limit * limit) Makes sense. > The only advantage of the former would be in avoiding a couple > multiplications in the rare case that both a and b are BOTH very large. > I see no reason for such an optimization given that this code is not in > any kind of loop. Prefer code simplicity here and change to the latter > form. If you were really paranoid, you could perform these checks with unsigned_mult_overflows(), but I don't think that it's worth doing so here. > Signed-off-by: Elijah Newren <newren@gmail.com> > --- > diffcore-rename.c | 10 ++++------ > 1 file changed, 4 insertions(+), 6 deletions(-) > > diff --git a/diffcore-rename.c b/diffcore-rename.c > index 68ddf51a2a1..0f8fce9293e 100644 > --- a/diffcore-rename.c > +++ b/diffcore-rename.c > @@ -450,9 +450,8 @@ static int too_many_rename_candidates(int num_targets, int num_sources, > */ > if (rename_limit <= 0) > rename_limit = 32767; > - if ((num_targets <= rename_limit || num_sources <= rename_limit) && > - ((uint64_t)num_targets * (uint64_t)num_sources > - <= (uint64_t)rename_limit * (uint64_t)rename_limit)) > + if ((uint64_t)num_targets * (uint64_t)num_sources > + <= (uint64_t)rename_limit * (uint64_t)rename_limit) One small nit here (and below) is that not all of these need casting. Only one operand of each multiplication needs to be widened for the compiler to widen the other, too. So, I'd write this instead as: > + if ((num_targets * (uint64_t)num_sources) <= > + (rename_limit * (uint64_t)rename_limit)) or something. (I tend to prefer the cast on the right-most operand, since it makes clear that there's no need to cast the "first" operand, and that casting either will do the trick). > return 0; > > options->needed_rename_limit = > @@ -468,9 +467,8 @@ static int too_many_rename_candidates(int num_targets, int num_sources, > continue; > limited_sources++; > } > - if ((num_targets <= rename_limit || limited_sources <= rename_limit) && > - ((uint64_t)num_targets * (uint64_t)limited_sources > - <= (uint64_t)rename_limit * (uint64_t)rename_limit)) > + if ((uint64_t)num_targets * (uint64_t)limited_sources > + <= (uint64_t)rename_limit * (uint64_t)rename_limit) Same notes here, of course. I was hoping that we could only do this multiplication once, but it looks like limited_sources grows between the two checks, so we have to repeat it here, I suppose. Thanks, Taylor
On Wed, Dec 9, 2020 at 2:10 PM Taylor Blau <me@ttaylorr.com> wrote: > > On Sun, Dec 06, 2020 at 02:54:31AM +0000, Elijah Newren via GitGitGadget wrote: > > From: Elijah Newren <newren@gmail.com> > > > > diffcore-rename had two different checks of the form > > > > if ((a < limit || b < limit) && > > a * b <= limit * limit) > > > > Since these are all non-negative integers, this can be simplified to > > > > if (a * b <= limit * limit) > > Makes sense. > > > The only advantage of the former would be in avoiding a couple > > multiplications in the rare case that both a and b are BOTH very large. > > I see no reason for such an optimization given that this code is not in > > any kind of loop. Prefer code simplicity here and change to the latter > > form. > > If you were really paranoid, you could perform these checks with > unsigned_mult_overflows(), but I don't think that it's worth doing so > here. > > > Signed-off-by: Elijah Newren <newren@gmail.com> > > --- > > diffcore-rename.c | 10 ++++------ > > 1 file changed, 4 insertions(+), 6 deletions(-) > > > > diff --git a/diffcore-rename.c b/diffcore-rename.c > > index 68ddf51a2a1..0f8fce9293e 100644 > > --- a/diffcore-rename.c > > +++ b/diffcore-rename.c > > @@ -450,9 +450,8 @@ static int too_many_rename_candidates(int num_targets, int num_sources, > > */ > > if (rename_limit <= 0) > > rename_limit = 32767; > > - if ((num_targets <= rename_limit || num_sources <= rename_limit) && > > - ((uint64_t)num_targets * (uint64_t)num_sources > > - <= (uint64_t)rename_limit * (uint64_t)rename_limit)) > > + if ((uint64_t)num_targets * (uint64_t)num_sources > > + <= (uint64_t)rename_limit * (uint64_t)rename_limit) > > One small nit here (and below) is that not all of these need casting. > Only one operand of each multiplication needs to be widened for the > compiler to widen the other, too. So, I'd write this instead as: > > > + if ((num_targets * (uint64_t)num_sources) <= > > + (rename_limit * (uint64_t)rename_limit)) > > or something. (I tend to prefer the cast on the right-most operand, > since it makes clear that there's no need to cast the "first" operand, > and that casting either will do the trick). Yeah, I think that reads slightly better too, but on the other hand it does make the diff slightly harder to read. *shrug*. > > return 0; > > > > options->needed_rename_limit = > > @@ -468,9 +467,8 @@ static int too_many_rename_candidates(int num_targets, int num_sources, > > continue; > > limited_sources++; > > } > > - if ((num_targets <= rename_limit || limited_sources <= rename_limit) && > > - ((uint64_t)num_targets * (uint64_t)limited_sources > > - <= (uint64_t)rename_limit * (uint64_t)rename_limit)) > > + if ((uint64_t)num_targets * (uint64_t)limited_sources > > + <= (uint64_t)rename_limit * (uint64_t)rename_limit) > > Same notes here, of course. I was hoping that we could only do this > multiplication once, but it looks like limited_sources grows between the > two checks, so we have to repeat it here, I suppose. The right hand side of the expression is the same -- rename_limit * rename_limit -- so it could be stored off (though I don't think there's much point in doing so unless it made the code clearer; this is not remotely close to a hot path). However, in the left hand side, it's not so much that one of the variables has changed since the last check, it's that it uses a different set of variables in the check. In particular, it replaces 'num_sources' with 'limited_sources'.
Taylor Blau <me@ttaylorr.com> writes: > On Sun, Dec 06, 2020 at 02:54:31AM +0000, Elijah Newren via GitGitGadget wrote: >> From: Elijah Newren <newren@gmail.com> >> >> diffcore-rename had two different checks of the form >> >> if ((a < limit || b < limit) && >> a * b <= limit * limit) >> >> Since these are all non-negative integers, this can be simplified to >> >> if (a * b <= limit * limit) > > Makes sense. I've always assumed that the original was for correctness (if a and b are both larger than limit, a*b could end up being smaller than limit*limit when the result of multiplication of the former wraps around but not the latter) ... >> The only advantage of the former would be in avoiding a couple >> multiplications in the rare case that both a and b are BOTH very large. >> I see no reason for such an optimization given that this code is not in >> any kind of loop. Prefer code simplicity here and change to the latter >> form. > > If you were really paranoid, you could perform these checks with > unsigned_mult_overflows(), but I don't think that it's worth doing so > here. ... and in no way as an optimization. So, I dunno.
On Wed, Dec 9, 2020 at 6:03 PM Junio C Hamano <gitster@pobox.com> wrote: > > Taylor Blau <me@ttaylorr.com> writes: > > > On Sun, Dec 06, 2020 at 02:54:31AM +0000, Elijah Newren via GitGitGadget wrote: > >> From: Elijah Newren <newren@gmail.com> > >> > >> diffcore-rename had two different checks of the form > >> > >> if ((a < limit || b < limit) && > >> a * b <= limit * limit) > >> > >> Since these are all non-negative integers, this can be simplified to > >> > >> if (a * b <= limit * limit) > > > > Makes sense. > > I've always assumed that the original was for correctness (if a and > b are both larger than limit, a*b could end up being smaller than > limit*limit when the result of multiplication of the former wraps > around but not the latter) ... > > >> The only advantage of the former would be in avoiding a couple > >> multiplications in the rare case that both a and b are BOTH very large. > >> I see no reason for such an optimization given that this code is not in > >> any kind of loop. Prefer code simplicity here and change to the latter > >> form. > > > > If you were really paranoid, you could perform these checks with > > unsigned_mult_overflows(), but I don't think that it's worth doing so > > here. > > ... and in no way as an optimization. > > So, I dunno. Ah, so would you be okay replacing these with if (st_mult(num_targets, limited_sources) <= st_mult(rename_limit, rename_limit)) ? That'd make the correctness intent far clearer, and allow us to drop the casting as well since st_mult converts to size_t. (If, on the off chance you're on a platform where size_t is 32-bits and the multiplications don't fit in that size, then the requested matrices for computing rename detection won't fit in memory for you.)
Elijah Newren <newren@gmail.com> writes: > Ah, so would you be okay replacing these with > > if (st_mult(num_targets, limited_sources) <= st_mult(rename_limit, > rename_limit)) It's subtle that the use of size_t is a sensible thing to do, and definitely deserves an in-line comment, but I think that is a good way to go.
diff --git a/diffcore-rename.c b/diffcore-rename.c index 68ddf51a2a1..0f8fce9293e 100644 --- a/diffcore-rename.c +++ b/diffcore-rename.c @@ -450,9 +450,8 @@ static int too_many_rename_candidates(int num_targets, int num_sources, */ if (rename_limit <= 0) rename_limit = 32767; - if ((num_targets <= rename_limit || num_sources <= rename_limit) && - ((uint64_t)num_targets * (uint64_t)num_sources - <= (uint64_t)rename_limit * (uint64_t)rename_limit)) + if ((uint64_t)num_targets * (uint64_t)num_sources + <= (uint64_t)rename_limit * (uint64_t)rename_limit) return 0; options->needed_rename_limit = @@ -468,9 +467,8 @@ static int too_many_rename_candidates(int num_targets, int num_sources, continue; limited_sources++; } - if ((num_targets <= rename_limit || limited_sources <= rename_limit) && - ((uint64_t)num_targets * (uint64_t)limited_sources - <= (uint64_t)rename_limit * (uint64_t)rename_limit)) + if ((uint64_t)num_targets * (uint64_t)limited_sources + <= (uint64_t)rename_limit * (uint64_t)rename_limit) return 2; return 1; }