Message ID | 2728513.vuYhMxLoTh@mintaka.ncbr.muni.cz (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [RFC,resend] builtin/repack: Honor --keep-pack and .keep when repacking promisor objects | expand |
On 2025-01-29 at 10:02:06, Tomáš Trnka wrote: > diff --git a/builtin/repack.c b/builtin/repack.c > index d6bb37e84a..fe62fe03eb 100644 > --- a/builtin/repack.c > +++ b/builtin/repack.c > @@ -388,15 +388,23 @@ static int has_pack_ext(const struct generated_pack_data > *data, > } > > static void repack_promisor_objects(const struct pack_objects_args *args, > - struct string_list *names) > + struct string_list *names, > + struct string_list > *keep_pack_list) I don't have a strong opinion about the technical aspects of this patch (nor sufficient knowledge to review it)[0], but I noticed that there's a couple of places, this line among them, which are unexpectedly wrapped, so I don't believe this patch will actually apply. I noticed that the email didn't specify an MUA header (or I missed it), so I can't make a suggestion on how to fix your MUA, but you may want to use `git send-email` to avoid this problem in the future. [0] In other words, no need to CC me on a resend.
On Thursday, 30 January 2025 3:26:19, CET brian m. carlson wrote: > I don't have a strong opinion about the technical aspects of this patch > (nor sufficient knowledge to review it)[0], but I noticed that there's a > couple of places, this line among them, which are unexpectedly wrapped, > so I don't believe this patch will actually apply. Oops, my sincere apologies to everyone. I'll resend once again. (I messed up by not realizing that KMail will re-enable line wrapping when re- sending my original message, which was formatted correctly.) 2T
On Wed, Jan 29, 2025 at 6:12 PM Tomáš Trnka <trnka@scm.com> wrote: > > git-repack currently does not pass --keep-pack or --honor-pack-keep to > the git-pack-objects handling promisor packs. This means that settings > like gc.bigPackThreshold are completely ignored for promisor packs. > > The simple fix is to just copy the keep-pack logic into > repack_promisor_objects(), although this could possibly be improved by > making prepare_pack_objects() handle it instead. > We repack promisor packs by reading all the objects in promisor packs (in repack.c), and send them to pack-objects. pack-objects then write a single pack containing all the promisor objects. The actual old promisor pack deletion happens in repack.c So just simply copying the keep-pack logic to repack_promisor_objects() does not prevent the keep promisor packs from being repacked. One way to achieve what you wanted would be to filter the keep packs in repack_promisor_objects's for_each_packed_object(). Thanks.
On Monday, 10 February 2025 12:38:17, CET Han Young wrote: > We repack promisor packs by reading all the objects in promisor packs > (in repack.c), and send them to pack-objects. pack-objects then write a > single pack containing all the promisor objects. The actual old promisor > pack deletion happens in repack.c > > So just simply copying the keep-pack logic to repack_promisor_objects() > does not prevent the keep promisor packs from being repacked. I don't know much about the internals so maybe I misinterpreted what I saw, but the patch seems to fix the issue I observed: I have two 40 GiB promisor packs, and as soon as I accumulated 50 additional small packs (due to fetches), gc triggered a repack including the two big packs, despite them being above the bigPackThreshold so they could be kept. Repacking them is very disruptive, both because of the CPU and RAM load this produces, and also because this is on btrfs with snapshots, so rewriting those 80 GiB into a new file means consuming that much extra disk space for no good reason. With my patch, gc did not touch these two big packs but still collected all the small ones into one new pack as expected. Everything else also seems to work fine. According to the man page for git-pack-objects, it seems to me that this is how it's meant to work, because the description for --keep-pack says "This flag causes an object already in the given pack to be ignored, even if it would have otherwise been packed." (and something similar for --honor-pack- keep). To my untrained eyes, it looks like that's also how want_found_object()/add_object_entry() in pack-objects.c handle it. 2T
On Mon, Feb 10, 2025 at 8:53 PM Tomáš Trnka <trnka@scm.com> wrote: > With my patch, gc did not touch these two big packs but still collected all > the small ones into one new pack as expected. Everything else also seems to > work fine. Sorry for the long wait. I have tested the patch against a repo with only promisor packs. There are one big promisor pack and many small ones. This patch do work as expected, I'll take back the earlier "... does not prevent the keep promisor packs from being repacked." > According to the man page for git-pack-objects, it seems to me that this is > how it's meant to work, because the description for --keep-pack says "This > flag causes an object already in the given pack to be ignored, even if it > would have otherwise been packed." (and something similar for --honor-pack- > keep). To my untrained eyes, it looks like that's also how > want_found_object()/add_object_entry() in pack-objects.c handle it. This is also true. However, the for_each_packed_object macro in repack.c does not ignore the keep packs. repack still iterating objects in keep packs and sending them to pack-objects. pack-objects will then exclude these objects. To avoid doing unnecessary work, objects in keep packs should not be send over to pack-objects. Checking if the object should be ignore takes some time, after all. As for the test, t0410-partial-clone.sh is a better place imo. For the test is for the partial-clone repos. Thanks.
diff --git a/builtin/repack.c b/builtin/repack.c index d6bb37e84a..fe62fe03eb 100644 --- a/builtin/repack.c +++ b/builtin/repack.c @@ -388,15 +388,23 @@ static int has_pack_ext(const struct generated_pack_data *data, } static void repack_promisor_objects(const struct pack_objects_args *args, - struct string_list *names) + struct string_list *names, + struct string_list *keep_pack_list) { struct child_process cmd = CHILD_PROCESS_INIT; FILE *out; struct strbuf line = STRBUF_INIT; + int i; prepare_pack_objects(&cmd, args, packtmp); cmd.in = -1; + if (!pack_kept_objects) + strvec_push(&cmd.args, "--honor-pack-keep"); + for (i = 0; i < keep_pack_list->nr; i++) + strvec_pushf(&cmd.args, "--keep-pack=%s", + keep_pack_list->items[i].string); + /*
git-repack currently does not pass --keep-pack or --honor-pack-keep to the git-pack-objects handling promisor packs. This means that settings like gc.bigPackThreshold are completely ignored for promisor packs. The simple fix is to just copy the keep-pack logic into repack_promisor_objects(), although this could possibly be improved by making prepare_pack_objects() handle it instead. Signed-off-by: Tomáš Trnka <trnka@scm.com> --- RFC: This probably needs a test, but where and how should it be implemented? Perhaps in t7700-repack.sh, copying one of the tests using prepare_for_keep_packs and just touching .promisor files? Or instead in t/t0410-partial-clone.sh using a copy/variant of one of the basic repack tests there? builtin/repack.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) * NEEDSWORK: Giving pack-objects only the OIDs without any ordering * hints may result in suboptimal deltas in the resulting pack. See if @@ -1350,7 +1358,7 @@ int cmd_repack(int argc, strvec_push(&cmd.args, "--delta-islands"); if (pack_everything & ALL_INTO_ONE) { - repack_promisor_objects(&po_args, &names); + repack_promisor_objects(&po_args, &names, &keep_pack_list); if (has_existing_non_kept_packs(&existing) && delete_redundant && base-commit: 92999a42db1c5f43f330e4f2bca4026b5b81576f