mbox series

[v2,0/5] packfile: use oidset for bad objects

Message ID e50c1465-59de-7fe1-de01-800404c7640e@web.de (mailing list archive)
Headers show
Series packfile: use oidset for bad objects | expand

Message

René Scharfe Sept. 11, 2021, 8:31 p.m. UTC
Replace the custom hash array for remembering corrupt pack entries with
an oidset.  This shortens and simplifies the code.

Changes since v1:
- inline oidset_size()
- inline nth_midxed_pack_entry() early
- use oidset_size() to avoid a function call if no bad objects exist

  oidset: make oidset_size() an inline function
  midx: inline nth_midxed_pack_entry()
  packfile: convert mark_bad_packed_object() to object_id
  packfile: convert has_packed_and_bad() to object_id
  packfile: use oidset for bad objects

 midx.c         | 37 +++++++++++--------------------------
 object-file.c  |  4 ++--
 object-store.h |  4 ++--
 oidset.c       |  5 -----
 oidset.h       |  5 ++++-
 packfile.c     | 38 +++++++++++---------------------------
 packfile.h     |  4 ++--
 7 files changed, 32 insertions(+), 65 deletions(-)

--
2.33.0

Comments

René Scharfe Sept. 11, 2021, 8:45 p.m. UTC | #1
Am 11.09.21 um 22:31 schrieb René Scharfe:
> Replace the custom hash array for remembering corrupt pack entries with
> an oidset.  This shortens and simplifies the code.
>
> Changes since v1:
> - inline oidset_size()
> - inline nth_midxed_pack_entry() early
> - use oidset_size() to avoid a function call if no bad objects exist

- forgot to add "v2" to the subject of patches 1, 2, 3 :-/

René
Jeff King Sept. 11, 2021, 9:22 p.m. UTC | #2
On Sat, Sep 11, 2021 at 10:31:52PM +0200, René Scharfe wrote:

> Replace the custom hash array for remembering corrupt pack entries with
> an oidset.  This shortens and simplifies the code.
> 
> Changes since v1:
> - inline oidset_size()
> - inline nth_midxed_pack_entry() early
> - use oidset_size() to avoid a function call if no bad objects exist

Thanks, these all look fine to me. I raised a question elsewhere in the
thread about inlining oidset_contains(), but I think that can be
considered separately (and is less clear-cut; we might win by saving a
function call, but we might lose due to larger code size).

-Peff
Derrick Stolee Sept. 11, 2021, 11:59 p.m. UTC | #3
On 9/11/21 4:31 PM, René Scharfe wrote:
> Replace the custom hash array for remembering corrupt pack entries with
> an oidset.  This shortens and simplifies the code.
> 
> Changes since v1:
> - inline oidset_size()
> - inline nth_midxed_pack_entry() early
> - use oidset_size() to avoid a function call if no bad objects exist
> 
>   oidset: make oidset_size() an inline function
>   midx: inline nth_midxed_pack_entry()
>   packfile: convert mark_bad_packed_object() to object_id
>   packfile: convert has_packed_and_bad() to object_id
>   packfile: use oidset for bad objects

These were easy reads, and I understand the value of them.

I initially hesitated to support the drop of
nth_midxed_pack_entry(), since it was designed with things
like midx bitmaps in mind (specifically, to also support
lex-order-to-stable-order conversions). However, it seems
that the midx bitmap series by Taylor is succeeding without
needing such a translation.

Thanks,
-Stolee
Taylor Blau Sept. 12, 2021, 1:51 a.m. UTC | #4
On Sat, Sep 11, 2021 at 07:59:40PM -0400, Derrick Stolee wrote:
> I initially hesitated to support the drop of
> nth_midxed_pack_entry(), since it was designed with things
> like midx bitmaps in mind (specifically, to also support
> lex-order-to-stable-order conversions).

I didn't know that nth_midxed_pack_entry was designed with either
purpose in mind, since it predates midx bitmaps by quite a bit.

> However, it seems that the midx bitmap series by Taylor is succeeding
> without needing such a translation.

Right, it looks like that function is only called by fill_midx_entry()
where it was inlined.

Thanks,
Taylor
Taylor Blau Sept. 12, 2021, 2:29 a.m. UTC | #5
On Sat, Sep 11, 2021 at 09:51:04PM -0400, Taylor Blau wrote:
> On Sat, Sep 11, 2021 at 07:59:40PM -0400, Derrick Stolee wrote:
> > I initially hesitated to support the drop of
> > nth_midxed_pack_entry(), since it was designed with things
> > like midx bitmaps in mind (specifically, to also support
> > lex-order-to-stable-order conversions).
>
> I didn't know that nth_midxed_pack_entry was designed with either
> purpose in mind, since it predates midx bitmaps by quite a bit.

Thinking on it more, I can imagine that you wrote this function
aspirationally envisioning something like MIDX bitmaps. And since you
and I discussed the design together quite a bit, I imagine that that's
the case ;-).

But I agree that after reading this series again, that the inline-ing
suggested makes sense (and doesn't conflict with any series I have in
flight which don't add any new callers).

Thanks,
Taylor
Derrick Stolee Sept. 12, 2021, 3:51 a.m. UTC | #6
On 9/11/21 10:29 PM, Taylor Blau wrote:
> On Sat, Sep 11, 2021 at 09:51:04PM -0400, Taylor Blau wrote:
>> On Sat, Sep 11, 2021 at 07:59:40PM -0400, Derrick Stolee wrote:
>>> I initially hesitated to support the drop of
>>> nth_midxed_pack_entry(), since it was designed with things
>>> like midx bitmaps in mind (specifically, to also support
>>> lex-order-to-stable-order conversions).
>>
>> I didn't know that nth_midxed_pack_entry was designed with either
>> purpose in mind, since it predates midx bitmaps by quite a bit.
> 
> Thinking on it more, I can imagine that you wrote this function
> aspirationally envisioning something like MIDX bitmaps. And since you
> and I discussed the design together quite a bit, I imagine that that's
> the case ;-).
> 
> But I agree that after reading this series again, that the inline-ing
> suggested makes sense (and doesn't conflict with any series I have in
> flight which don't add any new callers).

I'm thinking more to my original design of the multi-pack-index.
At that time, I was thinking about the possible integration
with bitmaps based on my experience in other systems which used
a stable object order to allow writing bitmaps asynchronously
with respect to the multi-pack-index write and object packing.

One thing that you did when first considering bitmaps over the
multi-pack-index was to demonstrate that a stable object order
is not required, which surprised and delighted me. It greatly
reduced the complexity of the problem, and being able to inline
this method is only one small fallout from that simplicity.

Thanks,
-Stolee
Taylor Blau Sept. 12, 2021, 4:01 a.m. UTC | #7
On Sat, Sep 11, 2021 at 11:51:36PM -0400, Derrick Stolee wrote:
> On 9/11/21 10:29 PM, Taylor Blau wrote:
> > On Sat, Sep 11, 2021 at 09:51:04PM -0400, Taylor Blau wrote:
> >> On Sat, Sep 11, 2021 at 07:59:40PM -0400, Derrick Stolee wrote:
> >>> I initially hesitated to support the drop of
> >>> nth_midxed_pack_entry(), since it was designed with things
> >>> like midx bitmaps in mind (specifically, to also support
> >>> lex-order-to-stable-order conversions).
> >>
> >> I didn't know that nth_midxed_pack_entry was designed with either
> >> purpose in mind, since it predates midx bitmaps by quite a bit.
> >
> > Thinking on it more, I can imagine that you wrote this function
> > aspirationally envisioning something like MIDX bitmaps. And since you
> > and I discussed the design together quite a bit, I imagine that that's
> > the case ;-).
> >
> > But I agree that after reading this series again, that the inline-ing
> > suggested makes sense (and doesn't conflict with any series I have in
> > flight which don't add any new callers).
>
> I'm thinking more to my original design of the multi-pack-index.
> At that time, I was thinking about the possible integration
> with bitmaps based on my experience in other systems which used
> a stable object order to allow writing bitmaps asynchronously
> with respect to the multi-pack-index write and object packing.

Makes sense, and thank you for clarifying. After re-reading my first
email, I figured that this is what you must have been talking about,
which is why I felt like I should rephrase (hence the follow-up email).

> One thing that you did when first considering bitmaps over the
> multi-pack-index was to demonstrate that a stable object order
> is not required, which surprised and delighted me. It greatly
> reduced the complexity of the problem, and being able to inline
> this method is only one small fallout from that simplicity.

This was definitely a consequence of what I had observed from seeing
what was "slow" when running bitmaps in production at GitHub. There,
repacking a repository's objects all into one pack each time we ran our
automated background jobs far outpaced the amount of time we spent
generating bitmaps.

And with your and Peff's work on improving bitmap generation itself,
things are in a pretty good place. I do have some potential ideas for
future improvement, like a mode where bitmaps are only "fast forwarded",
meaning that new bitmaps are only added between the commits selected for
bitmapping in the previous round and the current reference tips.

I think things like that end up getting you pretty far, but it may be
interesting to come back eventually and revisit adding a stable object
order. In the meantime, though... ;)

Thanks,
Taylor