mbox series

[0/2] pack-objects: freshen objects with multi-cruft packs

Message ID cover.1740680964.git.me@ttaylorr.com (mailing list archive)
Headers show
Series pack-objects: freshen objects with multi-cruft packs | expand

Message

Taylor Blau Feb. 27, 2025, 6:29 p.m. UTC
This short series contains a fix for a bug I noticed while rolling out
multi-cruft packs (via 'git repack --max-cruft-size') within GitHub's
infrastructure.

The series is structured as follows:

 - The first patch simplifies how 'repack' aggregates cruft packs
   together when their size is below the '--max-cruft-size' or
   '--max-pack-size' threshold. This simplification changes behavior
   slightly, but not in a meaningful way. It occurred to me while
   writing the second patch.

 - The second patch describes and fixes the main bug. The gist here is
   that objects which are (a) unreachable, (b) exist in a cruft pack
   being retained, and (c) were freshened to have a more recent mtime
   than any existing cruft copy are unable to be freshened.

The fix pursued in the second patch changes the rules around when we
want to retain an object via builtin/pack-objects.c::want_found_object()
when at least one cruft pack will survive the repack.

Previously the rule was to discard any object which appears in any
surviving pack, regardless of mtime. The rule now is to only discard an
object if it appears in either (a) a non-cruft pack which will survive
the repack, or (b) a cruft pack whose mtime for that object is older
than the one we are trying to pack.

I think that this is the right behavior, but admittedly putting this
series together hurt my brain trying to think through all of the cases.
I'm fairly confident in the testing here as I remember it being fairly
exhaustive of all interesting cases. But I'd appreciate a sanity check
from others that they too are convinced this is the right approach.

Thanks in advance for your review!

Taylor Blau (2):
  builtin/repack.c: simplify cruft pack aggregation
  builtin/pack-objects.c: freshen objects from existing cruft packs

 builtin/pack-objects.c  | 118 ++++++++++++++++++++++++++++++++++------
 builtin/repack.c        |  38 +------------
 packfile.c              |   3 +-
 packfile.h              |   2 +
 t/t7704-repack-cruft.sh | 106 ++++++++++++++++++++++--------------
 5 files changed, 171 insertions(+), 96 deletions(-)


base-commit: 08bdfd453584e489d5a551aecbdcb77584e1b958

Comments

Elijah Newren Feb. 27, 2025, 7:28 p.m. UTC | #1
On Thu, Feb 27, 2025 at 10:29 AM Taylor Blau <me@ttaylorr.com> wrote:
>
> This short series contains a fix for a bug I noticed while rolling out
> multi-cruft packs (via 'git repack --max-cruft-size') within GitHub's
> infrastructure.
>
> The series is structured as follows:
>
>  - The first patch simplifies how 'repack' aggregates cruft packs
>    together when their size is below the '--max-cruft-size' or
>    '--max-pack-size' threshold. This simplification changes behavior
>    slightly, but not in a meaningful way. It occurred to me while
>    writing the second patch.
>
>  - The second patch describes and fixes the main bug. The gist here is
>    that objects which are (a) unreachable, (b) exist in a cruft pack
>    being retained, and (c) were freshened to have a more recent mtime
>    than any existing cruft copy are unable to be freshened.
>
> The fix pursued in the second patch changes the rules around when we
> want to retain an object via builtin/pack-objects.c::want_found_object()
> when at least one cruft pack will survive the repack.
>
> Previously the rule was to discard any object which appears in any
> surviving pack, regardless of mtime. The rule now is to only discard an
> object if it appears in either (a) a non-cruft pack which will survive
> the repack, or (b) a cruft pack whose mtime for that object is older
> than the one we are trying to pack.

I think in (b) you got the meaning reversed, and instead mean s/older
than/at least as new as/ ?

> I think that this is the right behavior, but admittedly putting this
> series together hurt my brain trying to think through all of the cases.
> I'm fairly confident in the testing here as I remember it being fairly
> exhaustive of all interesting cases. But I'd appreciate a sanity check
> from others that they too are convinced this is the right approach.
>
> Thanks in advance for your review!
>
> Taylor Blau (2):
>   builtin/repack.c: simplify cruft pack aggregation
>   builtin/pack-objects.c: freshen objects from existing cruft packs
>
>  builtin/pack-objects.c  | 118 ++++++++++++++++++++++++++++++++++------
>  builtin/repack.c        |  38 +------------
>  packfile.c              |   3 +-
>  packfile.h              |   2 +
>  t/t7704-repack-cruft.sh | 106 ++++++++++++++++++++++--------------
>  5 files changed, 171 insertions(+), 96 deletions(-)

Code changes look good to me, but I had some wording suggestions in a
few places for commit messages and comments.  (Sorry for missing some
of those in my preliminary review before you sent this series to the
list.)
Taylor Blau Feb. 27, 2025, 11:05 p.m. UTC | #2
On Thu, Feb 27, 2025 at 11:28:18AM -0800, Elijah Newren wrote:
> > Previously the rule was to discard any object which appears in any
> > surviving pack, regardless of mtime. The rule now is to only discard an
> > object if it appears in either (a) a non-cruft pack which will survive
> > the repack, or (b) a cruft pack whose mtime for that object is older
> > than the one we are trying to pack.
>
> I think in (b) you got the meaning reversed, and instead mean s/older
> than/at least as new as/ ?

Oops, you are definitely right, that is a clear typo on my part.

> Code changes look good to me, but I had some wording suggestions in a
> few places for commit messages and comments.  (Sorry for missing some
> of those in my preliminary review before you sent this series to the
> list.)

No problem. I had only shared them before sending them here for a sanity
check on the implementation, so I was figuring that I'd get some
comments here on both the patch message and contents.

I adjusted the wording of both patches slightly to reflect your review.
I'll hold off on sending another round for a day or so in case any other
reviewers want to chime in. Thanks again for the careful review!

Thanks,
Taylor