Message ID | 20200214182222.GG150965@coredump.intra.peff.net (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | combining object filters and bitmaps | expand |
On Fri, Feb 14, 2020 at 01:22:22PM -0500, Jeff King wrote: > The prior commit taught "--count --objects" to work without bitmaps. We > should be able to get the same answer much more quickly with bitmaps. > > Note that we punt on the max_count case here. This perhaps _could_ be > made to work if we find all of the boundary commits and treat them as > UNINTERESTING, subtracting them (and their reachable objects) from the > set we return. That implies an actual commit traversal, but we'd still > be faster due to avoiding opening up any trees. Given the complexity and > the fact that anyone is unlikely to want this, it makes sense to just > fall back to the non-bitmap case for now. > > Signed-off-by: Jeff King <peff@peff.net> > --- > builtin/rev-list.c | 21 ++++++++++++++++++--- > t/t5310-pack-bitmaps.sh | 6 ++++++ > 2 files changed, 24 insertions(+), 3 deletions(-) > > diff --git a/builtin/rev-list.c b/builtin/rev-list.c > index 9452123988..70f3207ecc 100644 > --- a/builtin/rev-list.c > +++ b/builtin/rev-list.c > @@ -374,7 +374,10 @@ static inline int parse_missing_action_value(const char *value) > > static int try_bitmap_count(struct rev_info *revs) > { > - uint32_t commit_count; > + uint32_t commit_count = 0, > + tag_count = 0, > + tree_count = 0, > + blob_count = 0; Hmm, I don't usually see the comma-separated declaration/initialization in git.git. Is there a reason you did it here? Not that I really mind one way or the other, just interested. > int max_count; > struct bitmap_index *bitmap_git; > > @@ -389,6 +392,15 @@ static int try_bitmap_count(struct rev_info *revs) > if (revs->left_right || revs->cherry_mark) > return -1; > > + /* > + * If we're counting reachable objects, we can't handle a max count of > + * commits to traverse, since we don't know which objects go with which > + * commit. > + */ > + if (revs->max_count >= 0 && > + (revs->tag_objects || revs->tree_objects || revs->blob_objects)) An aside unrelated to the patch at hand: the expression (revs->tag_objects || revs->tree_objects || revs->blob_objects) does occur in an awful lot of places throughout this file. Do you imagine it'd be useful to pull this check out into its own function, perhaps as a preparatory patch in a later version of this series? I'm also not fussed if you don't think that such a change would be useful, it's just an observation I had after seeing this expression a few times. > + return -1; > + > /* > * This must be saved before doing any walking, since the revision > * machinery will count it down to zero while traversing. > @@ -399,11 +411,14 @@ static int try_bitmap_count(struct rev_info *revs) > if (!bitmap_git) > return -1; > > - count_bitmap_commit_list(bitmap_git, &commit_count, NULL, NULL, NULL); > + count_bitmap_commit_list(bitmap_git, &commit_count, > + revs->tree_objects ? &tree_count : NULL, > + revs->blob_objects ? &blob_count : NULL, > + revs->tag_objects ? &tag_count : NULL); > if (max_count >= 0 && max_count < commit_count) > commit_count = max_count; > > - printf("%d\n", commit_count); > + printf("%d\n", commit_count + tree_count + blob_count + tag_count); > free_bitmap_index(bitmap_git); > return 0; > } > diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh > index 6640329ebf..7ba7d294a5 100755 > --- a/t/t5310-pack-bitmaps.sh > +++ b/t/t5310-pack-bitmaps.sh > @@ -74,6 +74,12 @@ rev_list_tests() { > test_cmp expect actual > ' > > + test_expect_success "counting objects via bitmap ($state)" ' > + git rev-list --count --objects HEAD >expect && > + git rev-list --use-bitmap-index --count --objects HEAD >actual && > + test_cmp expect actual > + ' > + > test_expect_success "enumerate --objects ($state)" ' > git rev-list --objects --use-bitmap-index HEAD >tmp && > cut -d" " -f1 <tmp >tmp2 && > -- > 2.25.0.796.gcc29325708 Your tests look good to me, too. Thanks, Taylor
On Fri, Feb 14, 2020 at 04:45:55PM -0800, Taylor Blau wrote: > > static int try_bitmap_count(struct rev_info *revs) > > { > > - uint32_t commit_count; > > + uint32_t commit_count = 0, > > + tag_count = 0, > > + tree_count = 0, > > + blob_count = 0; > > Hmm, I don't usually see the comma-separated declaration/initialization > in git.git. Is there a reason you did it here? Not that I really mind > one way or the other, just interested. The variables are all related, and all should have the same type. I'd complain about a patch that did: int ret, count; because there's no logical reason those two variables have the same type. They just happen to. And putting them both on the same line is even worse, because it makes a diff changing one of them noisy. But in the code quoted above, if one of them changes, they would all (presumably) change. So I think it communicates something to group them like this. That said, if we had a style guideline that strictly forbade this (because it's often applied in _bad_ spots), I wouldn't be that sad to convert it. > > + if (revs->max_count >= 0 && > > + (revs->tag_objects || revs->tree_objects || revs->blob_objects)) > > An aside unrelated to the patch at hand: the expression > > (revs->tag_objects || revs->tree_objects || revs->blob_objects) > > does occur in an awful lot of places throughout this file. Do you > imagine it'd be useful to pull this check out into its own function, > perhaps as a preparatory patch in a later version of this series? > > I'm also not fussed if you don't think that such a change would be > useful, it's just an observation I had after seeing this expression a > few times. The most amusing part about it is that there's actually no way to specify one type without the other. Most of the revision code tries to be respectful of the fact that we might one day change that (though there are some subtleties; e.g., asking for blobs but not trees would still need to traverse the trees, but just not show them). So this really could have been revs->objects. :) But I think given the effort to treat them individually so far, it would be a step backwards to switch now. As far as whether there should be something like: int revs_show_objects(struct rev_info *revs) { return revs->tag_objects || revs->tree_objects || revs->blob_objects; } I don't feel strongly either way. I find doing it inline perfectly readable, but it's entirely possible my brain has been rotted by years of looking at the revision code. -Peff
Jeff King <peff@peff.net> writes: >> > + uint32_t commit_count = 0, >> > + tag_count = 0, >> > + tree_count = 0, >> > + blob_count = 0; >> >> Hmm, I don't usually see the comma-separated declaration/initialization >> in git.git. Is there a reason you did it here? Not that I really mind >> one way or the other, just interested. > > The variables are all related, and all should have the same type. I'd > complain about a patch that did: > > int ret, count; > > because there's no logical reason those two variables have the same > type. They just happen to. And putting them both on the same line is > even worse, because it makes a diff changing one of them noisy. > > But in the code quoted above, if one of them changes, they would all > (presumably) change. So I think it communicates something to group them > like this. I often apply exactly the same criteria as above to my code and review---since it is not just you (or me), perhaps CodingGuideline can help other readers, but I am OK to delay documenting it until we find the third person who has been applying this rule that has not been spelt out explicitly ;-)
diff --git a/builtin/rev-list.c b/builtin/rev-list.c index 9452123988..70f3207ecc 100644 --- a/builtin/rev-list.c +++ b/builtin/rev-list.c @@ -374,7 +374,10 @@ static inline int parse_missing_action_value(const char *value) static int try_bitmap_count(struct rev_info *revs) { - uint32_t commit_count; + uint32_t commit_count = 0, + tag_count = 0, + tree_count = 0, + blob_count = 0; int max_count; struct bitmap_index *bitmap_git; @@ -389,6 +392,15 @@ static int try_bitmap_count(struct rev_info *revs) if (revs->left_right || revs->cherry_mark) return -1; + /* + * If we're counting reachable objects, we can't handle a max count of + * commits to traverse, since we don't know which objects go with which + * commit. + */ + if (revs->max_count >= 0 && + (revs->tag_objects || revs->tree_objects || revs->blob_objects)) + return -1; + /* * This must be saved before doing any walking, since the revision * machinery will count it down to zero while traversing. @@ -399,11 +411,14 @@ static int try_bitmap_count(struct rev_info *revs) if (!bitmap_git) return -1; - count_bitmap_commit_list(bitmap_git, &commit_count, NULL, NULL, NULL); + count_bitmap_commit_list(bitmap_git, &commit_count, + revs->tree_objects ? &tree_count : NULL, + revs->blob_objects ? &blob_count : NULL, + revs->tag_objects ? &tag_count : NULL); if (max_count >= 0 && max_count < commit_count) commit_count = max_count; - printf("%d\n", commit_count); + printf("%d\n", commit_count + tree_count + blob_count + tag_count); free_bitmap_index(bitmap_git); return 0; } diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh index 6640329ebf..7ba7d294a5 100755 --- a/t/t5310-pack-bitmaps.sh +++ b/t/t5310-pack-bitmaps.sh @@ -74,6 +74,12 @@ rev_list_tests() { test_cmp expect actual ' + test_expect_success "counting objects via bitmap ($state)" ' + git rev-list --count --objects HEAD >expect && + git rev-list --use-bitmap-index --count --objects HEAD >actual && + test_cmp expect actual + ' + test_expect_success "enumerate --objects ($state)" ' git rev-list --objects --use-bitmap-index HEAD >tmp && cut -d" " -f1 <tmp >tmp2 &&
The prior commit taught "--count --objects" to work without bitmaps. We should be able to get the same answer much more quickly with bitmaps. Note that we punt on the max_count case here. This perhaps _could_ be made to work if we find all of the boundary commits and treat them as UNINTERESTING, subtracting them (and their reachable objects) from the set we return. That implies an actual commit traversal, but we'd still be faster due to avoiding opening up any trees. Given the complexity and the fact that anyone is unlikely to want this, it makes sense to just fall back to the non-bitmap case for now. Signed-off-by: Jeff King <peff@peff.net> --- builtin/rev-list.c | 21 ++++++++++++++++++--- t/t5310-pack-bitmaps.sh | 6 ++++++ 2 files changed, 24 insertions(+), 3 deletions(-)