Message ID | d3992c98aaa54c3635c249a15d919aa1177324d8.1699311386.git.me@ttaylorr.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 4263f9279e331fabb34fac7ef92a5a1061ae1d01 |
Headers | show |
Series | revision: exclude all packed objects with `--unpacked` | expand |
Taylor Blau <me@ttaylorr.com> writes: > Subject: Re: [PATCH 1/2] list-objects: drop --unpacked non-commit objects from results I thought the purpose of this change is to make sure that we drop packed objects from results when --unpacked is given? This makes it sound as if we are dropping unpacked ones instead? I dunno. > In git-rev-list(1), we describe the `--unpacked` option as: > > Only useful with `--objects`; print the object IDs that are not in > packs. > > This is true of commits, which we discard via get_commit_action(), but > not of the objects they reach. So if we ask for an --objects traversal > with --unpacked, we may get arbitrarily many objects which are indeed > packed. Strictly speaking, as long as all the objects that are not in packs are shown, "print the object IDs that are not in packs" is satisfied. With this fix, perhaps we would want to tighten the explanation a little bit while we are at it. Perhaps print the object names but exclude those that are in packs or something along that line? > diff --git a/list-objects.c b/list-objects.c > index c25c72b32c..c8a5fb998e 100644 > --- a/list-objects.c > +++ b/list-objects.c > @@ -39,6 +39,9 @@ static void show_object(struct traversal_context *ctx, > { > if (!ctx->show_object) > return; > + if (ctx->revs->unpacked && has_object_pack(&object->oid)) > + return; > + > ctx->show_object(object, name, ctx->show_data); > } The implementation is as straight-forward as it can get. Very pleasing. > diff --git a/t/t6000-rev-list-misc.sh b/t/t6000-rev-list-misc.sh > index 12def7bcbf..6289a2e8b0 100755 > --- a/t/t6000-rev-list-misc.sh > +++ b/t/t6000-rev-list-misc.sh > @@ -169,4 +169,17 @@ test_expect_success 'rev-list --count --objects' ' > test_line_count = $count actual > ' > > +test_expect_success 'rev-list --unpacked' ' > + git repack -ad && > + test_commit unpacked && > + > + git rev-list --objects --no-object-names unpacked^.. >expect.raw && > + sort expect.raw >expect && > + > + git rev-list --all --objects --unpacked --no-object-names >actual.raw && > + sort actual.raw >actual && > + > + test_cmp expect actual > +' > + > test_done
On Tue, Nov 07, 2023 at 11:21:29AM +0900, Junio C Hamano wrote: > > In git-rev-list(1), we describe the `--unpacked` option as: > > > > Only useful with `--objects`; print the object IDs that are not in > > packs. > > > > This is true of commits, which we discard via get_commit_action(), but > > not of the objects they reach. So if we ask for an --objects traversal > > with --unpacked, we may get arbitrarily many objects which are indeed > > packed. > > Strictly speaking, as long as all the objects that are not in packs > are shown, "print the object IDs that are not in packs" is satisfied. > With this fix, perhaps we would want to tighten the explanation a > little bit while we are at it. Perhaps > > print the object names but exclude those that are in packs > > or something along that line? I think using the word "exclude" is a good idea, as it makes it clear that we are omitting objects that otherwise would be traversed (as opposed to just showing unpacked objects, reachable or not). But I wanted to point out one other subtlety here. The existing code (before this patch) checks for already-packed commits, and avoids adding them to the traversal. The problem this patch is fixing is that we may see objects they point to via other non-packed commits. But the opposite problem exists, too: we have unpacked objects that are reachable from those packed commits. It's probably reasonably rare, since we _tend_ to make packs by rolling up reachable chunks of history. But that's not a guarantee. One way I can think of for it to happen in practice is that somebody pushes (or fetches) a thin pack with commit C as a delta against an unpacked C'. In that case "index-pack --fix-thin" will create a duplicate of C' in the new pack, but its trees and blobs may remain unpacked. I think with the patch in this series we could actually drop that "do not traverse commits that are unpacked" line of code, and end up "more correct". But I suspect performance of an incremental "git repack -d" would suffer. This is kind of analagous to the "we do not traverse every UNINTERESTING commit just to mark its trees/blobs as UNINTERESTING" optimization. We know that it is not a true set difference, but it is OK in practice and it buys us a lot of performance. And just like that case, bitmaps do let us cheaply compute the true set difference. ;) -Peff
diff --git a/list-objects.c b/list-objects.c index c25c72b32c..c8a5fb998e 100644 --- a/list-objects.c +++ b/list-objects.c @@ -39,6 +39,9 @@ static void show_object(struct traversal_context *ctx, { if (!ctx->show_object) return; + if (ctx->revs->unpacked && has_object_pack(&object->oid)) + return; + ctx->show_object(object, name, ctx->show_data); } diff --git a/t/t6000-rev-list-misc.sh b/t/t6000-rev-list-misc.sh index 12def7bcbf..6289a2e8b0 100755 --- a/t/t6000-rev-list-misc.sh +++ b/t/t6000-rev-list-misc.sh @@ -169,4 +169,17 @@ test_expect_success 'rev-list --count --objects' ' test_line_count = $count actual ' +test_expect_success 'rev-list --unpacked' ' + git repack -ad && + test_commit unpacked && + + git rev-list --objects --no-object-names unpacked^.. >expect.raw && + sort expect.raw >expect && + + git rev-list --all --objects --unpacked --no-object-names >actual.raw && + sort actual.raw >actual && + + test_cmp expect actual +' + test_done