Message ID | 20241008081350.8950-1-hanyang.tony@bytedance.com (mailing list archive) |
---|---|
Headers | show |
Series | repack: pack everything into promisor packfile in partial repos | expand |
Han Young <hanyang.tony@bytedance.com> writes: > As suggested by Jonathan[1], there are number of ways to fix this issue. > We have already explored some of them in this thread, and so far none of them > is satisfiable. Calvin and I tried to address the problem from fetch-pack side > and rev-list side. But the fix either consumes too much CPU power or results > in inefficient bandwidth use. > > So let's attack the problem from repack side. The goal is to prevent repack > from discarding local objects, previously it is done by carefully > separating promisor objects and normal objects in rev-list. > The implementation is flawed and no solution have been found so far. > Instead, we can get ride of rev-list and just pack everything into promisor > files. This way, no objects would be lost. > > By using 'repack everything', repacking requires less work and we are not > using more bandwidth. OK, perhaps. > Packing local objects into promisor packfiles means that it is no longer > possible to detect if an object is missing due to repository corruption > or because we need to fetch it from a promisor remote. Is it true that without doing this, we can tell between these two cases, though? More importantly, even if it is true, would there be a practical difference? In the sample scenario used in [1/3] where you created C2/T2/B2 on top of C1/T1/B1 (which came from a promisor remote), somebody else built C3/T3/B3 on top, and it came back from the promisor remote, you could lose 3's objects and 1's objects and they can be refetched but even if you lose 2's objects, since 3's objects are building on top of them, you should be able to fetch them from the promisor remote just like objects from 1 and 3, no? So strictly speaking, missing 2's objects may be "repository corruption" while missing 1's and 3's objects may not be, would there be a practical use for that information? > Promisor objects packing does not benefiting from the history and > path based delta calculation, and GC does not remove unreachable promisor > objects. By packing locally created normal objects into promisor packfile, > normal objects are converted into promisor objects. However, in partial cloned > repos, the number of locally created objects are small compared to promisor > objects. The impact should be negligible. > [1] https://lore.kernel.org/git/20240813004508.2768102-1-jonathantanmy@google.com/ > > *** Changes since v1 *** > Added tradeoffs in cover letter. > Fixed some partial clone test cases. > Updated partial clone documentation. These patches are based on the tip of master before 365529e1 (Merge branch 'ps/leakfixes-part-7', 2024-10-02), which will give mildly annoying conflicts when merged to 'seen'. I've managed to apply and then merge, so unless review discussions find needs for updates, there is no need for immediate reroll, but if you end up having to update these patches, it is a good idea to rebase the topic on top of v2.47.0 that was released early this week, as we are now entering a new development cycle. Thanks. > > Han Young (3): > repack: pack everything into packfile > t0410: adapt tests to repack changes > partial-clone: update doc > > Documentation/technical/partial-clone.txt | 16 +- > builtin/repack.c | 257 ++++++++++++---------- > t/t0410-partial-clone.sh | 68 +----- > t/t5616-partial-clone.sh | 9 +- > 4 files changed, 157 insertions(+), 193 deletions(-)
Junio C Hamano <gitster@pobox.com> writes: > I've managed to apply and then merge, so unless review discussions > find needs for updates, there is no need for immediate reroll, but > if you end up having to update these patches, it is a good idea to > rebase the topic on top of v2.47.0 that was released early this > week, as we are now entering a new development cycle. When merged to the tip of 'seen', it seems to break t5710. It might be due to mismerge, but can you check on your end? Thanks.
On Wed, Oct 9, 2024 at 5:57 AM Junio C Hamano <gitster@pobox.com> wrote: > > Packing local objects into promisor packfiles means that it is no longer > > possible to detect if an object is missing due to repository corruption > > or because we need to fetch it from a promisor remote. > > Is it true that without doing this, we can tell between these two > cases, though? More importantly, even if it is true, would there be > a practical difference? > > In the sample scenario used in [1/3] where you created C2/T2/B2 on > top of C1/T1/B1 (which came from a promisor remote), somebody else > built C3/T3/B3 on top, and it came back from the promisor remote, > you could lose 3's objects and 1's objects and they can be refetched > but even if you lose 2's objects, since 3's objects are building on > top of them, you should be able to fetch them from the promisor > remote just like objects from 1 and 3, no? So strictly speaking, > missing 2's objects may be "repository corruption" while missing 1's > and 3's objects may not be, would there be a practical use for that > information? Some code path does check if the missing object is promisor object before lazy fetching, `--missing=` does this check. But in that case, C2 is also a promisor object, the check would pass. There are no partial clone filter that omits commits, missing commit will always result in error. And even if we do report "repository corruption", the best course of action is still try to fetching them. So, no. I don't think there are practical uses for that information > These patches are based on the tip of master before 365529e1 (Merge > branch 'ps/leakfixes-part-7', 2024-10-02), which will give mildly > annoying conflicts when merged to 'seen'. > > I've managed to apply and then merge, so unless review discussions > find needs for updates, there is no need for immediate reroll, but > if you end up having to update these patches, it is a good idea to > rebase the topic on top of v2.47.0 that was released early this > week, as we are now entering a new development cycle. Thanks, I will rebase to master and see if any other tests break.