Message ID | cover.1621451532.git.ps@pks.im
---|---
Series | Speed up connectivity checks via quarantine dir
On Wed, May 19, 2021 at 1:22 PM Patrick Steinhardt <ps@pks.im> wrote:
>
> Test                                     v2.32.0-rc0             HEAD
> --------------------------------------------------------------------------------------------
> 5400.3: receive-pack clone create        1.27(1.11+0.16)         0.02(0.01+0.01)       -98.4%
> 5400.5: receive-pack clone update        1.27(1.13+0.13)         0.02(0.02+0.00)       -98.4%
> 5400.7: receive-pack clone reset         0.13(0.11+0.02)         0.02(0.01+0.01)       -84.6%
> 5400.9: receive-pack clone delete        0.02(0.01+0.01)         0.03(0.02+0.01)       +50.0%
> 5400.11: receive-pack extrarefs create   33.01(18.80+14.43)      9.00(4.30+4.65)       -72.7%
> 5400.13: receive-pack extrarefs update   33.13(18.85+14.50)      9.01(4.28+4.67)       -72.8%
> 5400.15: receive-pack extrarefs reset    32.90(18.82+14.32)      9.04(4.26+4.77)       -72.5%
> 5400.17: receive-pack extrarefs delete   9.13(4.35+4.77)         8.94(4.29+4.64)       -2.1%
> 5400.19: receive-pack empty create       223.35(640.63+127.74)   227.55(651.75+130.94) +1.9%
>
> These rather clearly show that the previous rev-walk has been a major
> bottleneck in the implementation.

These are pretty impressive speedups! :-)

I didn't look too closely at the C code for the connectivity scan as I am not very familiar with the background. I did look at everything lightly though, for whatever that's worth.

Chris
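The numbers above have the shape of output from git's t/perf harness (real time with user+sys in parentheses). A minimal sketch of how such a before/after comparison is typically produced is below; the script name `p5400-*.sh` and the repository path are assumptions for illustration (the series presumably adds a perf script with that number).

```sh
# Compare the two builds quoted above using git's perf harness.
# Assumption: the series adds a perf script numbered p5400; the glob and
# the large-repo path below are illustrative only.
cd t/perf
GIT_PERF_REPEAT_COUNT=3 \
GIT_PERF_LARGE_REPO=/path/to/large-repo \
    ./run v2.32.0-rc0 HEAD ./p5400-*.sh
```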
On Wed, May 19, 2021 at 09:13:18PM +0200, Patrick Steinhardt wrote:

> One of the issues I've found is the object connectivity check, which may run for a significant amount of time. The root cause here is that we're computing connectivity via `git rev-list --not --all`: if we've got many refs in the repository, computing `--not --all` is hugely expensive.
>
> This commit series thus implements an alternative way of computing reachability, which reuses information from the object quarantine environment. Instead of doing a refwalk, we just iterate over all packed and loose quarantined objects and for each of them, we determine whether their immediate references are all satisfied.

If I am reading the patches correctly, your definition of "satisfied" is: the referenced object exists already on the receiving side.

But that's subtly different from the current rule, which is: the object must be reachable from the current ref tips. The invariant that Git has traditionally tried to maintain (for a repo not to be corrupt) is only that we have the complete graph of objects reachable from the tips.

If we have an unreachable tree in the object database which references blobs we don't have, that doesn't make the repository corrupt. And with the current code, we would not accept a push that references that tree (unless it also pushes the necessary blobs). But after your patch, we would, and that would _make_ the repository corrupt.

I will say that:

  1. Modern versions of git-repack and git-prune try to keep even unreachable parts of the graph complete (if we are keeping object X that refers to Y, then we try to keep Y, too). But I don't know how foolproof it is (certainly the traversal we do there is "best effort"; if there's a reference to a missing object, we don't bail).

  2. This is not the only place that just checks object existence in the name of speed. When updating a ref, for example, we only check that the tip object exists.

So I suspect it might work OK in practice. But it is a pretty big loosening of the current rules for pushes, and that makes me nervous.

There's another related change here that is actually a tightening of the rules. The current code checks that the ref tips proposed by the sender are valid. If there are objects in the pack not needed for the ref update, their connectivity isn't checked (though normal senders would obviously avoid sending extra objects for no reason). Your "iterate over all quarantined objects" makes that stricter.

I'm of two minds there:

  1. We could easily keep the original rule by just traversing the object graph starting from the ref tips, as we do now, but ending the traversal any time we hit an object that we already have outside the quarantine.

  2. This tightening is actually important if we want to avoid letting people _intentionally_ introduce the unreachable-but-incomplete scenario. Without it, an easy denial-of-service corruption against a repository you can push to is:

     - push an update to change a ref from X to Y. Include all objects necessary for X..Y, but _also_ include a tree T which points to a missing blob B. This will be accepted by the current rules (but not by your patch).

     - push an update to change the ref from Y to C, where C is a commit whose root tree is T. Your patch allows this (because we already have T in the repository). But the resulting repository is corrupt (the ref now points to an incomplete object graph).
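The unreachable-but-incomplete state Peff describes is easy to reproduce locally with plumbing commands. The sketch below is purely illustrative (object contents and the file name are made up); it builds a tree T whose entry points at a blob that was hashed but never written, which is the kind of already-present-but-incomplete object the scenario above relies on.

```sh
# Hash a blob WITHOUT writing it (no -w), so the object store never sees it.
missing_blob=$(echo "never stored" | git hash-object --stdin)

# Build tree T referencing the missing blob; --missing lets mktree skip the
# existence check. The entry name is made up.
T=$(printf '100644 blob %s\tmissing-file.txt\n' "$missing_blob" | git mktree --missing)

# T now exists in the object database but is incomplete. The refs are still
# fine, so this is not "corruption" under the traditional rule; fsck will
# complain about the broken link, but nothing reachable is affected.
git fsck

# A later commit whose root tree is T, however, would make a ref point at an
# incomplete object graph once a ref is updated to it.
C=$(git commit-tree "$T" -m "commit on top of incomplete tree")
echo "$C"
```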
If we wanted to keep the existing rule (requiring that any objects that the sender didn't provide are actually reachable from the current refs), then we'd want to be able to check reachability quickly. And there I'd probably turn to reachability bitmaps.

I suspect that "rev-list --use-bitmap-index" in the connectivity check might help in some cases. Especially when you are looking at the union of objects reachable from all refs, we can avoid a lot of fill-in traversal (because if the bitmap for a recent ref includes the object in an older ref, then we know the older ref is covered, even if it doesn't have an on-disk bitmap at its tip). But I would not be at all surprised if there are other slowdowns in the traversal code when you have a lot of refs (e.g., I think we're pretty eager to parse all of the traversal tips as part of the setup).

-Peff
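A rough way to experiment with this suggestion from the command line: the invocation below mirrors the shape of the connectivity check that receive-pack performs (the exact flags used in connected.c may differ slightly, so treat this as an approximation), once with and once without bitmaps.

```sh
# Stand-in for a proposed ref tip; in a real check the new tips arrive on stdin.
new_tip=$(git rev-parse HEAD)

# Current style of check: walk from the proposed tips, excluding everything
# already reachable from existing refs.
time echo "$new_tip" |
    git rev-list --objects --stdin --not --all --quiet

# Same check, asking rev-list to take advantage of reachability bitmaps where
# it can (requires bitmaps, e.g. built with "git repack -adb").
time echo "$new_tip" |
    git rev-list --objects --stdin --not --all --quiet --use-bitmap-index
```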
Jeff King <peff@peff.net> writes:

> If we have an unreachable tree in the object database which references blobs we don't have, that doesn't make the repository corrupt. And with the current code, we would not accept a push that references that tree (unless it also pushes the necessary blobs). But after your patch, we would, and that would _make_ the repository corrupt.
>
> I will say that:
>
>   1. Modern versions of git-repack and git-prune try to keep even unreachable parts of the graph complete (if we are keeping object X that refers to Y, then we try to keep Y, too). But I don't know how foolproof it is (certainly the traversal we do there is "best effort"; if there's a reference to a missing object, we don't bail).
>
>   2. This is not the only place that just checks object existence in the name of speed. When updating a ref, for example, we only check that the tip object exists.

There might already be other ways to corrupt repositories, and for a corrupted repository to be left unnoticed, in other words. But that does not make it OK to add more ways to corrupt repositories.

>   1. We could easily keep the original rule by just traversing the object graph starting from the ref tips, as we do now, but ending the traversal any time we hit an object that we already have outside the quarantine.
>
>   2. This tightening is actually important if we want to avoid letting people _intentionally_ introduce the unreachable-but-incomplete scenario. Without it, an easy denial-of-service corruption against a repository you can push to is:
>
>      - push an update to change a ref from X to Y. Include all objects necessary for X..Y, but _also_ include a tree T which points to a missing blob B. This will be accepted by the current rules (but not by your patch).
>
>      - push an update to change the ref from Y to C, where C is a commit whose root tree is T. Your patch allows this (because we already have T in the repository). But the resulting repository is corrupt (the ref now points to an incomplete object graph).

Hmph, the last step of that attack would not work with our current check; is this the same new hole the series brings in as you explained earlier, for a case where a newly pushed tree/commit starts to reference a left-over dangling tree already in the repository whose content blobs are missing?

> If we wanted to keep the existing rule (requiring that any objects that the sender didn't provide are actually reachable from the current refs), then we'd want to be able to check reachability quickly. And there I'd probably turn to reachability bitmaps.

True. As we are not a "performance is king---code that corrupts repositories as quickly as possible is an improvement" kind of project, we should keep the existing rule---an object can become part of the DAG referred to by ref tips only when the objects it refers to all exist in the object store, because we want to keep the invariant that an object reachable from a ref is guaranteed to have everything reachable from it in the object store---and find a way to make it fast to enforce that rule somehow.

Thanks for the review.
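The invariant Junio restates here (everything reachable from a ref must be present) is the one git-fsck can verify after the fact; a quick illustration of checking a repository against it:

```sh
# Verify the "everything reachable from refs is present" invariant.
# --connectivity-only follows the object graph without checking blob
# contents, which is usually much faster on large repositories.
git fsck --connectivity-only

# The exit status is non-zero (and "missing ..."/"broken link ..." messages
# are printed) if a ref points at an incomplete object graph.
```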
On Fri, May 21, 2021 at 06:45:02AM +0900, Junio C Hamano wrote:

> >   2. This tightening is actually important if we want to avoid letting people _intentionally_ introduce the unreachable-but-incomplete scenario. Without it, an easy denial-of-service corruption against a repository you can push to is:
> >
> >      - push an update to change a ref from X to Y. Include all objects necessary for X..Y, but _also_ include a tree T which points to a missing blob B. This will be accepted by the current rules (but not by your patch).
> >
> >      - push an update to change the ref from Y to C, where C is a commit whose root tree is T. Your patch allows this (because we already have T in the repository). But the resulting repository is corrupt (the ref now points to an incomplete object graph).
>
> Hmph, the last step of that attack would not work with our current check; is this the same new hole the series brings in as you explained earlier, for a case where a newly pushed tree/commit starts to reference a left-over dangling tree already in the repository whose content blobs are missing?

Right. The current state is immune to this attack; the rule is "refs must be complete, but nothing else has to be". The state after Patrick's series is (I think) likewise immune; the rule is "all objects must be complete". I'm not sure if that was intentional, or a lucky save because the series tries to optimize both parts (both which objects we decide to check, as well as which we consider we "already have"). But it's quite subtle. :)

> > If we wanted to keep the existing rule (requiring that any objects that the sender didn't provide are actually reachable from the current refs), then we'd want to be able to check reachability quickly. And there I'd probably turn to reachability bitmaps.
>
> True. As we are not a "performance is king---code that corrupts repositories as quickly as possible is an improvement" kind of project, we should keep the existing rule---an object can become part of the DAG referred to by ref tips only when the objects it refers to all exist in the object store, because we want to keep the invariant that an object reachable from a ref is guaranteed to have everything reachable from it in the object store---and find a way to make it fast to enforce that rule somehow.

That's my feeling, too. A rule of "you are not allowed to introduce objects which directly reference a missing object" would be likewise correct, if consistently enforced. But it makes me nervous to switch from one to the other, especially in just one place.

And I guess in that sense, this series isn't as immune as I said earlier. It looks like a push from a shallow clone would use the old "reachable from refs" check even after this series.

-Peff
On Thu, May 20, 2021 at 12:50:24PM -0400, Jeff King wrote:
> On Wed, May 19, 2021 at 09:13:18PM +0200, Patrick Steinhardt wrote:
>
> > One of the issues I've found is the object connectivity check, which may run for a significant amount of time. The root cause here is that we're computing connectivity via `git rev-list --not --all`: if we've got many refs in the repository, computing `--not --all` is hugely expensive.
> >
> > This commit series thus implements an alternative way of computing reachability, which reuses information from the object quarantine environment. Instead of doing a refwalk, we just iterate over all packed and loose quarantined objects and for each of them, we determine whether their immediate references are all satisfied.
>
> If I am reading the patches correctly, your definition of "satisfied" is: the referenced object exists already on the receiving side.

Indeed. Each referenced object must either be in the quarantine directory or already exist in the main repo. If it's quarantined, then we'll eventually check it, too, so we don't need to traverse any further than immediate parents.

> But that's subtly different from the current rule, which is: the object must be reachable from the current ref tips. The invariant that Git has traditionally tried to maintain (for a repo not to be corrupt) is only that we have the complete graph of objects reachable from the tips.
>
> If we have an unreachable tree in the object database which references blobs we don't have, that doesn't make the repository corrupt. And with the current code, we would not accept a push that references that tree (unless it also pushes the necessary blobs). But after your patch, we would, and that would _make_ the repository corrupt.

Right. The assumption basically is that if it's part of the repository's ODB already, then it was checked at some point before and should be fully connected. Also, we typically wouldn't delete any objects which are referenced by anything else, so either we delete the unreachable tree and its referenced blob together, or neither of them.

> I will say that:
>
>   1. Modern versions of git-repack and git-prune try to keep even unreachable parts of the graph complete (if we are keeping object X that refers to Y, then we try to keep Y, too). But I don't know how foolproof it is (certainly the traversal we do there is "best effort"; if there's a reference to a missing object, we don't bail).
>
>   2. This is not the only place that just checks object existence in the name of speed. When updating a ref, for example, we only check that the tip object exists.
>
> So I suspect it might work OK in practice. But it is a pretty big loosening of the current rules for pushes, and that makes me nervous.
>
> There's another related change here that is actually a tightening of the rules. The current code checks that the ref tips proposed by the sender are valid. If there are objects in the pack not needed for the ref update, their connectivity isn't checked (though normal senders would obviously avoid sending extra objects for no reason). Your "iterate over all quarantined objects" makes that stricter.
>
> I'm of two minds there:
>
>   1. We could easily keep the original rule by just traversing the object graph starting from the ref tips, as we do now, but ending the traversal any time we hit an object that we already have outside the quarantine.

This should be doable, although it's a bit more complex compared to my current proposal. As far as I know we don't have any APIs to say "please look up objects in this object directory only", but we'd need that, as otherwise we're not able to tell where an object was looked up from.

Alternatively, we can implement this via a combination of both approaches: first, we enumerate all quarantined objects and mark them as such. Second, we do a refwalk on all tips and stop whenever we find a non-quarantined object. My guess is that this is still going to be a lot faster compared to the current implementation, at least when updating existing branches. But in the case where we're pushing into an empty repo (or when we're pushing entirely unrelated history), we now essentially have to iterate over all objects twice. I'm sure this is going to be noticeable performance-wise.

>   2. This tightening is actually important if we want to avoid letting people _intentionally_ introduce the unreachable-but-incomplete scenario. Without it, an easy denial-of-service corruption against a repository you can push to is:
>
>      - push an update to change a ref from X to Y. Include all objects necessary for X..Y, but _also_ include a tree T which points to a missing blob B. This will be accepted by the current rules (but not by your patch).
>
>      - push an update to change the ref from Y to C, where C is a commit whose root tree is T. Your patch allows this (because we already have T in the repository). But the resulting repository is corrupt (the ref now points to an incomplete object graph).

So with my patch we catch it right when it's being introduced, while right now we only detect it as soon as somebody actually wants to start using it. Depending on one's viewpoint, one could probably argue that the repository is already corrupt as soon as we accept the tree into our repo, because we're missing objects. But except for git-fsck(1), one wouldn't typically notice.

This does surface a second issue: right now, if receive.fsckObjects is set to `false` (which is the default), we'd also happily accept objects into the repo which aren't even parseable, as long as they're not referenced. This is another form of corruption we allow with the current implementation, but which gets detected by the new check.

> If we wanted to keep the existing rule (requiring that any objects that the sender didn't provide are actually reachable from the current refs), then we'd want to be able to check reachability quickly. And there I'd probably turn to reachability bitmaps.

Honestly, at GitLab we've only had problems with using reachability bitmaps to determine newly pushed objects. We tried to use them to detect all newly pushed blobs which are below a certain size (to detect LFS pointers; we had this discussion before). But computing the set of new objects with bitmaps enabled was orders of magnitude slower for some repos than with bitmaps disabled and has caused quite some pain already. One of these edge cases, negative tags causing a complete graph walk, I've fixed with 540cdc11ad (pack-bitmap: avoid traversal of objects referenced by uninteresting tag, 2021-03-22), but seemingly there are more. I haven't yet found the time to dig into this though.

> I suspect that "rev-list --use-bitmap-index" in the connectivity check might help in some cases. Especially when you are looking at the union of objects reachable from all refs, we can avoid a lot of fill-in traversal (because if the bitmap for a recent ref includes the object in an older ref, then we know the older ref is covered, even if it doesn't have an on-disk bitmap at its tip). But I would not be at all surprised if there are other slowdowns in the traversal code when you have a lot of refs (e.g., I think we're pretty eager to parse all of the traversal tips as part of the setup).

Does this help if you've got hundreds of thousands of refs, though? As I said, one of the benchmarks I'm using is a repo with 880k+ references. And lots of objects referenced by them will never be covered by bitmaps in the first place because they're not part of the usual user-visible references but part of an alternate.

Patrick
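To make Patrick's definition of "satisfied" above concrete, here is a rough shell rendering of the per-object check the series performs in C (an illustration of the idea, not the series' actual code): for every quarantined object, list its immediate references and verify that each one exists, either in the quarantine or in the main object store. The file quarantined.txt is an assumption standing in for the enumeration of quarantined loose and packed objects.

```sh
# Assumption: quarantined.txt lists the IDs of the objects that arrived in
# quarantine, one per line, and we run inside the quarantine environment set
# up by receive-pack, where the main object store is visible as an alternate.
while read -r oid; do
	case "$(git cat-file -t "$oid")" in
	commit)
		# Immediate references of a commit: its tree and its parents
		# (only parse the header, i.e. stop at the first blank line).
		refs=$(git cat-file commit "$oid" |
			sed -n -e '/^$/q' -e 's/^tree //p' -e 's/^parent //p')
		;;
	tree)
		# Immediate references of a tree: the objects of its entries.
		refs=$(git ls-tree "$oid" | awk '{print $3}')
		;;
	tag)
		# Immediate reference of a tag: the tagged object.
		refs=$(git cat-file tag "$oid" |
			sed -n -e '/^$/q' -e 's/^object //p')
		;;
	blob)
		refs=
		;;
	esac

	for ref in $refs; do
		# "Satisfied" means: present in quarantine or in the main repo.
		git cat-file -e "$ref" ||
			echo "missing $ref (referenced by $oid)"
	done
done <quarantined.txt
```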
On Thu, May 20 2021, Jeff King wrote:

> On Wed, May 19, 2021 at 09:13:18PM +0200, Patrick Steinhardt wrote:
>
>> One of the issues I've found is the object connectivity check, which may run for a significant amount of time. The root cause here is that we're computing connectivity via `git rev-list --not --all`: if we've got many refs in the repository, computing `--not --all` is hugely expensive.
>>
>> This commit series thus implements an alternative way of computing reachability, which reuses information from the object quarantine environment. Instead of doing a refwalk, we just iterate over all packed and loose quarantined objects and for each of them, we determine whether their immediate references are all satisfied.
>
> If I am reading the patches correctly, your definition of "satisfied" is: the referenced object exists already on the receiving side.
>
> But that's subtly different from the current rule, which is: the object must be reachable from the current ref tips. The invariant that Git has traditionally tried to maintain (for a repo not to be corrupt) is only that we have the complete graph of objects reachable from the tips.
>
> If we have an unreachable tree in the object database which references blobs we don't have, that doesn't make the repository corrupt. And with the current code, we would not accept a push that references that tree (unless it also pushes the necessary blobs). But after your patch, we would, and that would _make_ the repository corrupt.
>
> I will say that:
>
>   1. Modern versions of git-repack and git-prune try to keep even unreachable parts of the graph complete (if we are keeping object X that refers to Y, then we try to keep Y, too). But I don't know how foolproof it is (certainly the traversal we do there is "best effort"; if there's a reference to a missing object, we don't bail).
>
>   2. This is not the only place that just checks object existence in the name of speed. When updating a ref, for example, we only check that the tip object exists.

Hopefully you mean "when we update a ref locally", i.e. update-ref, not receive-pack.

I think that's fine, and we should consider these corruption detections to have two different classes: there's the local update-ref, mktag etc., which typically only do skin-deep checking, and the full check we want to do in receive-pack and other transfer.fsckObjects cases.

It's fine in practice for the "local" case to be fast and loose, but when you're accepting foreign objects over the network we should always be as paranoid as possible, both to prevent accidental corruption and deliberate attack.

None of that goes against what you're saying, just a bit of an elaboration.

> [...]
> There's another related change here that is actually a tightening of the rules. The current code checks that the ref tips proposed by the sender are valid. If there are objects in the pack not needed for the ref update, their connectivity isn't checked (though normal senders would obviously avoid sending extra objects for no reason). Your "iterate over all quarantined objects" makes that stricter.
>
> I'm of two minds there:
>
>   1. We could easily keep the original rule by just traversing the object graph starting from the ref tips, as we do now, but ending the traversal any time we hit an object that we already have outside the quarantine.
>
>   2. This tightening is actually important if we want to avoid letting people _intentionally_ introduce the unreachable-but-incomplete scenario. Without it, an easy denial-of-service corruption against a repository you can push to is:
>
>      - push an update to change a ref from X to Y. Include all objects necessary for X..Y, but _also_ include a tree T which points to a missing blob B. This will be accepted by the current rules (but not by your patch).
>
>      - push an update to change the ref from Y to C, where C is a commit whose root tree is T. Your patch allows this (because we already have T in the repository). But the resulting repository is corrupt (the ref now points to an incomplete object graph).

We should also consider not closing the door to some future optimizations and features by being overly strict with #2 here. Maybe I've misunderstood things, but I think tightening it would prevent things like:

A. I'm pushing a ref update for X..Y, the server is at X, but I happen to have a pack (e.g. from an earlier pull) that contains objects from W..Y. The server doesn't need W..X, but I just send the whole W..Y set over saying "please update to Y".

B. I got halfway through patches to make clients aid servers with server-side corruption of objects (the root cause was some NFS shenanigans + our non-fsync()-ing). A server would have an empty loose object; to recover, I needed to manually scp it from a client to the server. This happened a few times at odd hours.

With the not-accepted core.checkCollisions patch I hacked up for related reasons[1], I found that we were actually quite close to supporting a mode on the server side where we'd just blindly accept such objects (the client would also need to learn to do a hail-mary push).

Strictly speaking we could support such a recovery mode while still having the #2 under discussion here (only accepting such objects if our own repo is corrupt), but I thought it was rather neat that it would naturally fall out of the general rule that we didn't care about "redundant" objects + my tweaks to make the "there's a collision" check less anal (in that case it was a false alarm: our local object was corrupt, but not the one the remote end tried to send).

1. https://lore.kernel.org/git/20181113201910.11518-1-avarab@gmail.com/
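As a footnote to Ævar's point about being maximally paranoid at the network boundary: the relevant knobs already exist in git's configuration. A minimal sketch of turning them on for a server-side repository (the values shown are just an illustration):

```sh
# Check the well-formedness of incoming objects at the transport boundary.
# transfer.fsckObjects is the umbrella setting; the other two override it
# for the receive (push) and fetch directions respectively.
git config transfer.fsckObjects true
git config receive.fsckObjects true
git config fetch.fsckObjects true
```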