Message ID | 42399a1c2e52e1d055a2d0ad96af2ca4dce6b1a0.1605649533.git.me@ttaylorr.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | pack-bitmap: bitmap generation improvements | expand |
> diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c > index b0493d971d..3ac90ae410 100644 > --- a/pack-bitmap-write.c > +++ b/pack-bitmap-write.c > @@ -195,7 +195,8 @@ struct bitmap_builder { > }; > > static void bitmap_builder_init(struct bitmap_builder *bb, > - struct bitmap_writer *writer) > + struct bitmap_writer *writer, > + struct bitmap_index *old_bitmap) > { > struct rev_info revs; > struct commit *commit; > @@ -234,12 +235,26 @@ static void bitmap_builder_init(struct bitmap_builder *bb, > > c_ent = bb_data_at(&bb->data, commit); > > + if (old_bitmap && bitmap_for_commit(old_bitmap, commit)) { > + /* > + * This commit has an existing bitmap, so we can > + * get its bits immediately without an object > + * walk. There is no need to continue walking > + * beyond this commit. > + */ OK - as far as I understand, the reason for continuing the walk would be to find reverse edges that connect this commit and its ancestors so that this commit's ancestors can contribute bitmaps to this commit, but we do not need such contributions, so we do not need to continue the walk. Makes sense. > + c_ent->maximal = 1; > + p = NULL; Here, we're setting maximal without also setting a bit in this commit's commit_mask. This is fine because we're not propagating this commit's commit_mask to any parents (we're not continuing the walk from this commit), but it seems like a code smell. Suggested fix is below. > + } > + > if (c_ent->maximal) { > num_maximal++; > ALLOC_GROW(bb->commits, bb->commits_nr + 1, bb->commits_alloc); > bb->commits[bb->commits_nr++] = commit; > } As far as I can tell, this means that this commit occupies a bit position in the commit mask that it doesn't need. Could this go into a separate list instead, to be appended to bb->commits at the very end? We could even skip the whole maximal stuff (for commits with existing bitmaps) and replace "c_ent->maximal = 1;" above with "add to list that we're going to append to bb->commits at the very end". That has the advantage of not having to redefine "maximal". > > + if (!c_ent->commit_mask) > + continue; I think this should be moved as far up as possible (right after the call to bb_data_at()) and commented, something like: If there is no commit_mask, there is no reason to iterate over this commit; it is not selected (if it were, it would not have a blank commit mask) and all its children have existing bitmaps (see the comment starting with "This commit has an existing bitmap" below), so it does not contribute anything to the final bitmap file or its descendants.
On Wed, Dec 02, 2020 at 12:08:08AM -0800, Jonathan Tan wrote: > > + c_ent->maximal = 1; > > + p = NULL; > > Here, we're setting maximal without also setting a bit in this commit's > commit_mask. This is fine because we're not propagating this commit's > commit_mask to any parents (we're not continuing the walk from this > commit), but it seems like a code smell. Suggested fix is below. > > > + } > > + > > if (c_ent->maximal) { > > num_maximal++; > > ALLOC_GROW(bb->commits, bb->commits_nr + 1, bb->commits_alloc); > > bb->commits[bb->commits_nr++] = commit; > > } > > As far as I can tell, this means that this commit occupies a bit > position in the commit mask that it doesn't need. Could this go into a > separate list instead, to be appended to bb->commits at the very end? > > We could even skip the whole maximal stuff (for commits with existing > bitmaps) and replace "c_ent->maximal = 1;" above with "add to list that > we're going to append to bb->commits at the very end". That has the > advantage of not having to redefine "maximal". Hmm. I'd trust Stolee's opinion over mine here, so I'll be curious what he has to say. > > > > + if (!c_ent->commit_mask) > > + continue; > > I think this should be moved as far up as possible (right after > the call to bb_data_at()) and commented, something like: > > If there is no commit_mask, there is no reason to iterate over this > commit; it is not selected (if it were, it would not have a blank > commit mask) and all its children have existing bitmaps (see the > comment starting with "This commit has an existing bitmap" below), so > it does not contribute anything to the final bitmap file or its > descendants. Good suggestion, thanks. Thanks, Taylor
On 12/2/2020 11:35 AM, Taylor Blau wrote: > On Wed, Dec 02, 2020 at 12:08:08AM -0800, Jonathan Tan wrote: >>> + c_ent->maximal = 1; >>> + p = NULL; >> >> Here, we're setting maximal without also setting a bit in this commit's >> commit_mask. This is fine because we're not propagating this commit's >> commit_mask to any parents (we're not continuing the walk from this >> commit), but it seems like a code smell. Suggested fix is below. >> >>> + } >>> + >>> if (c_ent->maximal) { >>> num_maximal++; >>> ALLOC_GROW(bb->commits, bb->commits_nr + 1, bb->commits_alloc); >>> bb->commits[bb->commits_nr++] = commit; >>> } >> >> As far as I can tell, this means that this commit occupies a bit >> position in the commit mask that it doesn't need. Could this go into a >> separate list instead, to be appended to bb->commits at the very end? I don't see any value in having a second list here. That only makes things more complicated. >> We could even skip the whole maximal stuff (for commits with existing >> bitmaps) and replace "c_ent->maximal = 1;" above with "add to list that >> we're going to append to bb->commits at the very end". That has the >> advantage of not having to redefine "maximal". > > Hmm. I'd trust Stolee's opinion over mine here, so I'll be curious what > he has to say. It would be equivalent to add it to the list and then continuing the loop instead of piggy-backing on the if (c_ent->maximal) block, followed by a trivial loop over the (nullified) parents. >>> >>> + if (!c_ent->commit_mask) >>> + continue; >> >> I think this should be moved as far up as possible (right after >> the call to bb_data_at()) and commented, something like: >> >> If there is no commit_mask, there is no reason to iterate over this >> commit; it is not selected (if it were, it would not have a blank >> commit mask) and all its children have existing bitmaps (see the >> comment starting with "This commit has an existing bitmap" below), so >> it does not contribute anything to the final bitmap file or its >> descendants. > > Good suggestion, thanks. Yeah, makes sense to me. Thanks, -Stolee
On Wed, Dec 02, 2020 at 01:22:27PM -0500, Derrick Stolee wrote: > >> We could even skip the whole maximal stuff (for commits with existing > >> bitmaps) and replace "c_ent->maximal = 1;" above with "add to list that > >> we're going to append to bb->commits at the very end". That has the > >> advantage of not having to redefine "maximal". > > > > Hmm. I'd trust Stolee's opinion over mine here, so I'll be curious what > > he has to say. > > It would be equivalent to add it to the list and then continuing the > loop instead of piggy-backing on the if (c_ent->maximal) block, followed > by a trivial loop over the (nullified) parents. Jonathan: does that seem OK to you to leave it as-is? If you don't have strong objections, I'll go ahead with sending v3 a little later today. Thanks, Taylor
> On 12/2/2020 11:35 AM, Taylor Blau wrote: > > On Wed, Dec 02, 2020 at 12:08:08AM -0800, Jonathan Tan wrote: > >>> + c_ent->maximal = 1; > >>> + p = NULL; > >> > >> Here, we're setting maximal without also setting a bit in this commit's > >> commit_mask. This is fine because we're not propagating this commit's > >> commit_mask to any parents (we're not continuing the walk from this > >> commit), but it seems like a code smell. Suggested fix is below. > >> > >>> + } > >>> + > >>> if (c_ent->maximal) { > >>> num_maximal++; > >>> ALLOC_GROW(bb->commits, bb->commits_nr + 1, bb->commits_alloc); > >>> bb->commits[bb->commits_nr++] = commit; > >>> } > >> > >> As far as I can tell, this means that this commit occupies a bit > >> position in the commit mask that it doesn't need. Could this go into a > >> separate list instead, to be appended to bb->commits at the very end? > > I don't see any value in having a second list here. That only makes > things more complicated. It does make things more complicated, but it could help shrink commit bitmasks (which seem to be a concern, according to patch 23). Suppose num_maximal was 3 and we encountered such a commit (not selected, but has an old bitmap). So we increment num_maximal. Then, we encounter a selected commit. That commit would then have a bitmask of ???01. If we had not incremented num_maximal (which would require a second list), then the bitmask would be ???1. > >> We could even skip the whole maximal stuff (for commits with existing > >> bitmaps) and replace "c_ent->maximal = 1;" above with "add to list that > >> we're going to append to bb->commits at the very end". That has the > >> advantage of not having to redefine "maximal". > > > > Hmm. I'd trust Stolee's opinion over mine here, so I'll be curious what > > he has to say. > > It would be equivalent to add it to the list and then continuing the > loop instead of piggy-backing on the if (c_ent->maximal) block, followed > by a trivial loop over the (nullified) parents. That is true. This suggestion was for code clarity, not for correctness.
> On Wed, Dec 02, 2020 at 01:22:27PM -0500, Derrick Stolee wrote: > > >> We could even skip the whole maximal stuff (for commits with existing > > >> bitmaps) and replace "c_ent->maximal = 1;" above with "add to list that > > >> we're going to append to bb->commits at the very end". That has the > > >> advantage of not having to redefine "maximal". > > > > > > Hmm. I'd trust Stolee's opinion over mine here, so I'll be curious what > > > he has to say. > > > > It would be equivalent to add it to the list and then continuing the > > loop instead of piggy-backing on the if (c_ent->maximal) block, followed > > by a trivial loop over the (nullified) parents. > > Jonathan: does that seem OK to you to leave it as-is? If you don't have > strong objections, I'll go ahead with sending v3 a little later today. Like I (just) said in [1], I think that my comment stands, but this is a minor and local issue that does not affect the functionality of the overall patch set so I think you can go ahead and send v3. [1] https://lore.kernel.org/git/20201207182418.3034961-1-jonathantanmy@google.com/
On 12/7/2020 1:24 PM, Jonathan Tan wrote: >> On 12/2/2020 11:35 AM, Taylor Blau wrote: >>> On Wed, Dec 02, 2020 at 12:08:08AM -0800, Jonathan Tan wrote: >>>>> + c_ent->maximal = 1; >>>>> + p = NULL; >>>> >>>> Here, we're setting maximal without also setting a bit in this commit's >>>> commit_mask. This is fine because we're not propagating this commit's >>>> commit_mask to any parents (we're not continuing the walk from this >>>> commit), but it seems like a code smell. Suggested fix is below. >>>> >>>>> + } >>>>> + >>>>> if (c_ent->maximal) { >>>>> num_maximal++; >>>>> ALLOC_GROW(bb->commits, bb->commits_nr + 1, bb->commits_alloc); >>>>> bb->commits[bb->commits_nr++] = commit; >>>>> } >>>> >>>> As far as I can tell, this means that this commit occupies a bit >>>> position in the commit mask that it doesn't need. Could this go into a >>>> separate list instead, to be appended to bb->commits at the very end? >> >> I don't see any value in having a second list here. That only makes >> things more complicated. > > It does make things more complicated, but it could help shrink commit > bitmasks (which seem to be a concern, according to patch 23). > > Suppose num_maximal was 3 and we encountered such a commit (not > selected, but has an old bitmap). So we increment num_maximal. Then, we > encounter a selected commit. That commit would then have a bitmask of > ???01. If we had not incremented num_maximal (which would require a > second list), then the bitmask would be ???1. OK, I see the value. The value is bounded, since the number of these "0" gaps is bounded by the number of selected commits _and_ reduces the possible number of maximal commits. However, that seems like enough justification to create the second list. Thanks, -Stolee
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index b0493d971d..3ac90ae410 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -195,7 +195,8 @@ struct bitmap_builder { }; static void bitmap_builder_init(struct bitmap_builder *bb, - struct bitmap_writer *writer) + struct bitmap_writer *writer, + struct bitmap_index *old_bitmap) { struct rev_info revs; struct commit *commit; @@ -234,12 +235,26 @@ static void bitmap_builder_init(struct bitmap_builder *bb, c_ent = bb_data_at(&bb->data, commit); + if (old_bitmap && bitmap_for_commit(old_bitmap, commit)) { + /* + * This commit has an existing bitmap, so we can + * get its bits immediately without an object + * walk. There is no need to continue walking + * beyond this commit. + */ + c_ent->maximal = 1; + p = NULL; + } + if (c_ent->maximal) { num_maximal++; ALLOC_GROW(bb->commits, bb->commits_nr + 1, bb->commits_alloc); bb->commits[bb->commits_nr++] = commit; } + if (!c_ent->commit_mask) + continue; + if (p) { struct bb_commit *p_ent = bb_data_at(&bb->data, p->item); int c_not_p, p_not_c; @@ -422,7 +437,7 @@ void bitmap_writer_build(struct packing_data *to_pack) else mapping = NULL; - bitmap_builder_init(&bb, &writer); + bitmap_builder_init(&bb, &writer, old_bitmap); for (i = bb.commits_nr; i > 0; i--) { struct commit *commit = bb.commits[i-1]; struct bb_commit *ent = bb_data_at(&bb.data, commit);