Message ID | 4bf5e78a54dfdcbe13dd66ba4c5955a159ea181d.1605649533.git.me@ttaylorr.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | pack-bitmap: bitmap generation improvements | expand |
> From: Derrick Stolee <dstolee@microsoft.com> > > When constructing new bitmaps, we perform a commit and tree walk in > fill_bitmap_commit() and fill_bitmap_tree(). This walk would benefit > from using existing bitmaps when available. We must track the existing > bitmaps and translate them into the new object order, but this is > generally faster than parsing trees. Makes sense. > In fill_bitmap_commit(), we must reorder thing somewhat. The priority > queue walks commits from newest-to-oldest, which means we correctly stop > walking when reaching a commit with a bitmap. Makes sense. > However, if we walk trees > from top to bottom, then we might be parsing trees that are actually > part of a re-used bitmap. Isn't the issue that we shouldn't walk trees at all before exhausting our commit search, not the direction that we walk the trees in (top to bottom or bottom to top or whatever)? > To avoid over-walking trees, add them to a > LIFO queue and walk them from bottom-to-top after exploring commits > completely. Just to clarify - would it work just as well with a FIFO queue (not LIFO queue)? It seems to me that the most important part is doing this after exploring commits completely. > On git.git, this reduces a second immediate bitmap computation from 2.0s > to 1.0s. On linux.git, we go from 32s to 22s. On chromium's fork > network, we go from 227s to 198s. Nice timings. > static void fill_bitmap_commit(struct bb_commit *ent, > struct commit *commit, > - struct prio_queue *queue) > + struct prio_queue *queue, > + struct prio_queue *tree_queue, > + struct bitmap_index *old_bitmap, > + const uint32_t *mapping) > { > if (!ent->bitmap) > ent->bitmap = bitmap_new(); > > - bitmap_set(ent->bitmap, find_object_pos(&commit->object.oid)); > prio_queue_put(queue, commit); > > while (queue->nr) { > struct commit_list *p; > struct commit *c = prio_queue_get(queue); > > + /* > + * If this commit has an old bitmap, then translate that > + * bitmap and add its bits to this one. No need to walk > + * parents or the tree for this commit. > + */ This comment should be right before "if (old && ...", I think. Here, it is a bit misleading. It leads me to think that "this commit has an old bitmap" means old_bitmap != NULL, but it is actually old != NULL. > + if (old_bitmap && mapping) { This is defensive in that if we somehow calculate old_bitmap without mapping (or the other way around) (which is a bug), things just slow down instead of breaking. I'm OK with this, but I still wanted to call it out. > + struct ewah_bitmap *old; > + > + old = bitmap_for_commit(old_bitmap, c); > + if (old && !rebuild_bitmap(mapping, old, ent->bitmap)) > + continue; > + } > + > + /* > + * Mark ourselves and queue our tree. The commit > + * walk ensures we cover all parents. > + */ > bitmap_set(ent->bitmap, find_object_pos(&c->object.oid)); > - fill_bitmap_tree(ent->bitmap, get_commit_tree(c)); > + prio_queue_put(tree_queue, get_commit_tree(c)); > > for (p = c->parents; p; p = p->next) { > int pos = find_object_pos(&p->item->object.oid); [snip] > @@ -386,6 +408,9 @@ void bitmap_writer_build(struct packing_data *to_pack) > size_t i; > int nr_stored = 0; /* for progress */ > struct prio_queue queue = { compare_commits_by_gen_then_commit_date }; > + struct prio_queue tree_queue = { NULL }; NULL here does mean LIFO queue. OK. > @@ -395,6 +420,12 @@ void bitmap_writer_build(struct packing_data *to_pack) > trace2_region_enter("pack-bitmap-write", "building_bitmaps_total", > the_repository); > > + old_bitmap = prepare_bitmap_git(to_pack->repo); > + if (old_bitmap) > + mapping = create_bitmap_mapping(old_bitmap, to_pack); > + else > + mapping = NULL; Here, we prepare the old_bitmap and mapping arguments. OK.
On Tue, Dec 01, 2020 at 11:28:11PM -0800, Jonathan Tan wrote: > > However, if we walk trees > > from top to bottom, then we might be parsing trees that are actually > > part of a re-used bitmap. > > Isn't the issue that we shouldn't walk trees at all before exhausting > our commit search, not the direction that we walk the trees in (top to > bottom or bottom to top or whatever)? Right, the direction that we explore trees in isn't important: what matters is that we consider them after the commits. I've clarified the commit message to reflect this. > > To avoid over-walking trees, add them to a > > LIFO queue and walk them from bottom-to-top after exploring commits > > completely. > > Just to clarify - would it work just as well with a FIFO queue (not LIFO > queue)? It seems to me that the most important part is doing this after > exploring commits completely. Yup, see above. > > static void fill_bitmap_commit(struct bb_commit *ent, > > struct commit *commit, > > - struct prio_queue *queue) > > + struct prio_queue *queue, > > + struct prio_queue *tree_queue, > > + struct bitmap_index *old_bitmap, > > + const uint32_t *mapping) > > { > > if (!ent->bitmap) > > ent->bitmap = bitmap_new(); > > > > - bitmap_set(ent->bitmap, find_object_pos(&commit->object.oid)); > > prio_queue_put(queue, commit); > > > > while (queue->nr) { > > struct commit_list *p; > > struct commit *c = prio_queue_get(queue); > > > > + /* > > + * If this commit has an old bitmap, then translate that > > + * bitmap and add its bits to this one. No need to walk > > + * parents or the tree for this commit. > > + */ > > This comment should be right before "if (old && ...", I think. Here, it > is a bit misleading. It leads me to think that "this commit has an old > bitmap" means old_bitmap != NULL, but it is actually old != NULL. Yup, the comment is much more clear when placed there, thanks. > > + if (old_bitmap && mapping) { > > This is defensive in that if we somehow calculate old_bitmap without > mapping (or the other way around) (which is a bug), things just slow > down instead of breaking. I'm OK with this, but I still wanted to call > it out. Right, we should never have one without the other, so in that sense this is a defensive check. IOW, this could easily be written as `if (old_bitmap)` or `if (mapping)`, but being extra defensive here doesn't hurt. Thanks, Taylor
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index 1995f75818..37204b691c 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -340,20 +340,39 @@ static void fill_bitmap_tree(struct bitmap *bitmap, static void fill_bitmap_commit(struct bb_commit *ent, struct commit *commit, - struct prio_queue *queue) + struct prio_queue *queue, + struct prio_queue *tree_queue, + struct bitmap_index *old_bitmap, + const uint32_t *mapping) { if (!ent->bitmap) ent->bitmap = bitmap_new(); - bitmap_set(ent->bitmap, find_object_pos(&commit->object.oid)); prio_queue_put(queue, commit); while (queue->nr) { struct commit_list *p; struct commit *c = prio_queue_get(queue); + /* + * If this commit has an old bitmap, then translate that + * bitmap and add its bits to this one. No need to walk + * parents or the tree for this commit. + */ + if (old_bitmap && mapping) { + struct ewah_bitmap *old; + + old = bitmap_for_commit(old_bitmap, c); + if (old && !rebuild_bitmap(mapping, old, ent->bitmap)) + continue; + } + + /* + * Mark ourselves and queue our tree. The commit + * walk ensures we cover all parents. + */ bitmap_set(ent->bitmap, find_object_pos(&c->object.oid)); - fill_bitmap_tree(ent->bitmap, get_commit_tree(c)); + prio_queue_put(tree_queue, get_commit_tree(c)); for (p = c->parents; p; p = p->next) { int pos = find_object_pos(&p->item->object.oid); @@ -363,6 +382,9 @@ static void fill_bitmap_commit(struct bb_commit *ent, } } } + + while (tree_queue->nr) + fill_bitmap_tree(ent->bitmap, prio_queue_get(tree_queue)); } static void store_selected(struct bb_commit *ent, struct commit *commit) @@ -386,6 +408,9 @@ void bitmap_writer_build(struct packing_data *to_pack) size_t i; int nr_stored = 0; /* for progress */ struct prio_queue queue = { compare_commits_by_gen_then_commit_date }; + struct prio_queue tree_queue = { NULL }; + struct bitmap_index *old_bitmap; + uint32_t *mapping; writer.bitmaps = kh_init_oid_map(); writer.to_pack = to_pack; @@ -395,6 +420,12 @@ void bitmap_writer_build(struct packing_data *to_pack) trace2_region_enter("pack-bitmap-write", "building_bitmaps_total", the_repository); + old_bitmap = prepare_bitmap_git(to_pack->repo); + if (old_bitmap) + mapping = create_bitmap_mapping(old_bitmap, to_pack); + else + mapping = NULL; + bitmap_builder_init(&bb, &writer); for (i = bb.commits_nr; i > 0; i--) { struct commit *commit = bb.commits[i-1]; @@ -402,7 +433,8 @@ void bitmap_writer_build(struct packing_data *to_pack) struct commit *child; int reused = 0; - fill_bitmap_commit(ent, commit, &queue); + fill_bitmap_commit(ent, commit, &queue, &tree_queue, + old_bitmap, mapping); if (ent->selected) { store_selected(ent, commit); @@ -428,7 +460,9 @@ void bitmap_writer_build(struct packing_data *to_pack) ent->bitmap = NULL; } clear_prio_queue(&queue); + clear_prio_queue(&tree_queue); bitmap_builder_clear(&bb); + free(mapping); stop_progress(&writer.progress);