Message ID | cover.1716318088.git.me@ttaylorr.com (mailing list archive) |
---|---|
Headers | show |
Series | pack-bitmap: pseudo-merge reachability bitmaps | expand |
On Tue, May 21, 2024 at 03:01:38PM -0400, Taylor Blau wrote:
> Here is another reroll my topic to introduce pseudo-merge bitmaps.
OK, I got through the whole thing. I left a few small comments, but
mostly just observations. Overall, the shape of it looks pretty good.
The much bigger question to me is: does it work?
The perf results you showed at the end are underwhelming, but I think
that's mostly because it's not an interesting repository. I think it
would be nice to see at least some point-in-time benchmarks on a
single repository.
But much more interesting to me is how it performs in the real world in
aggregate, over time:
- how often / how much do pseudo-merges speed up queries in the real
world. Clones/fetches, but also reachability queries. Could
connectivity checks use bitmaps with this?
- how often do pseudo-merge groups get invalidated by refs changing
(and thus we lose the speedups from above)?
- what's the cost like to generate them initially?
- what's the cost like for subsequent repacks? Does the selection /
grouping algorithm do a good job of keeping the older, larger groups
stable (so that we can reuse them verbatim)?
I know you don't have those answers yet, and I know there's some
chicken-and-egg with getting this integrated so that you can start to
explore that. So I mostly reviewed this with an eye towards:
- does the idea make sense (I think it does, but I'm kind of biased)
- are the patches going to hurt anybody who isn't using the new
feature (I think the answer is no)
- does the on-disk representation seem right, since that is hard to
change later (I didn't see any issues)
- does the implementation look clean and plausibly correct (yes, but
what I'm getting at is that I didn't pore over all of the new code
with a microscope. Mostly I think the proof is in the pudding that
it provides the same correct answers more quickly).
So to my eyes it looks good to move forward and let people start playing
with it. The big "experimental" warning in the config is good. Maybe
we'd want want in gitpacking(7), too?
I did wonder briefly what the backup plan is if there are problems with
the on-disk format (or worst case, if it turns out to be a dead end).
We've allocated a flag bit which I think we'd need to respect forever
(though as an optional extension, it's OK to understand and ignore it).
If we needed a "pseudo-merge bitmaps v2", how would that work? I think
we'd have to allocate another bit in the flags field.
I wonder if the start of the pseudo-merge section should have a 4-byte
version/flags field itself? I don't think that's something we've done
before, and maybe it's overkill. I dunno. It's just a lot easier to do
now than later.
-Peff
On Thu, May 23, 2024 at 07:05:32AM -0400, Jeff King wrote: > I wonder if the start of the pseudo-merge section should have a 4-byte > version/flags field itself? I don't think that's something we've done > before, and maybe it's overkill. I dunno. It's just a lot easier to do > now than later. I think the tricky thing here would be that the extension itself is a variable size, so every version would have to put the "extension size" field in the same place. Otherwise, an older Git client which doesn't understand a future version of the pseudo-merge extension wouldn't know how large the extension is, and wouldn't be able to adjust the index_end field appropriately to skip over it. Of course, we could make it a convention that says "all versions have to place the extension size field at the same relative offset", but it feels weird to read some of the extension while not understanding the whole thing. I'm definitely not saying that I think the specification ought to be set in stone forever, but I think any changes would want to be behind a new bitmap extension, not a version within the same extension. Thanks, Taylor
On Thu, May 23, 2024 at 07:05:32AM -0400, Jeff King wrote: > On Tue, May 21, 2024 at 03:01:38PM -0400, Taylor Blau wrote: > > > Here is another reroll my topic to introduce pseudo-merge bitmaps. > > OK, I got through the whole thing. I left a few small comments, but > mostly just observations. Overall, the shape of it looks pretty good. > The much bigger question to me is: does it work? Thanks for working through this series, I appreciate you taking a close look, especially since it's on the longer end. > The perf results you showed at the end are underwhelming, but I think > that's mostly because it's not an interesting repository. I think it > would be nice to see at least some point-in-time benchmarks on a > single repository. I'd say it's mostly because I misspelt "pseudo" as "psuedo", but either way ;-). > I know you don't have those answers yet, and I know there's some > chicken-and-egg with getting this integrated so that you can start to > explore that. So I mostly reviewed this with an eye towards: > > - does the idea make sense (I think it does, but I'm kind of biased) > > - are the patches going to hurt anybody who isn't using the new > feature (I think the answer is no) > > - does the on-disk representation seem right, since that is hard to > change later (I didn't see any issues) > > - does the implementation look clean and plausibly correct (yes, but > what I'm getting at is that I didn't pore over all of the new code > with a microscope. Mostly I think the proof is in the pudding that > it provides the same correct answers more quickly). > > So to my eyes it looks good to move forward and let people start playing > with it. The big "experimental" warning in the config is good. Maybe > we'd want want in gitpacking(7), too? I think adding a matching warning in gitpacking(7) is a good idea. Your feeling matches my own here that the burden of this series on those not using pseudo-merge bitmaps is negligible, and that this unblocks those who want to move forward with testing this out in the wild on doing so. I'll send a v4 shortly with the minor changes that I picked up from your last read-through, and then I think we are good to go here. Thanks, Taylor
On Thu, May 23, 2024 at 04:04:15PM -0400, Taylor Blau wrote: > On Thu, May 23, 2024 at 07:05:32AM -0400, Jeff King wrote: > > I wonder if the start of the pseudo-merge section should have a 4-byte > > version/flags field itself? I don't think that's something we've done > > before, and maybe it's overkill. I dunno. It's just a lot easier to do > > now than later. > > I think the tricky thing here would be that the extension itself is a > variable size, so every version would have to put the "extension size" > field in the same place. > > Otherwise, an older Git client which doesn't understand a future version > of the pseudo-merge extension wouldn't know how large the extension is, > and wouldn't be able to adjust the index_end field appropriately to skip > over it. > > Of course, we could make it a convention that says "all versions have to > place the extension size field at the same relative offset", but it > feels weird to read some of the extension while not understanding the > whole thing. Ah, yeah, I didn't think of that. That definitely complicates things. It certainly would be possible to have a version+size header at the start. Which is...basically reinventing the chunk format. Let's not worry about it for now, and as a long-term thing we might consider moving the bitmap format over to the chunk style. -Peff