
[v2,5/8] Documentation: add Packfile URIs design doc

Message ID 5ce56844d3fb740e29d2f3d4be2ade0b2ad5f7fd.1552073690.git.jonathantanmy@google.com (mailing list archive)
State New, archived
Series CDN offloading of fetch response

Commit Message

Jonathan Tan March 8, 2019, 9:55 p.m. UTC
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/technical/packfile-uri.txt | 78 ++++++++++++++++++++++++
 Documentation/technical/protocol-v2.txt  | 28 ++++++++-
 2 files changed, 105 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/technical/packfile-uri.txt

Comments

Jeff King April 23, 2019, 5:31 a.m. UTC | #1
On Fri, Mar 08, 2019 at 01:55:17PM -0800, Jonathan Tan wrote:

> +If the 'packfile-uris' feature is advertised, the following argument
> +can be included in the client's request as well as the potential
> +addition of the 'packfile-uris' section in the server's response as
> +explained below.
> +
> +    packfile-uris <comma-separated list of protocols>
> +	Indicates to the server that the client is willing to receive
> +	URIs of any of the given protocols in place of objects in the
> +	sent packfile. Before performing the connectivity check, the
> +	client should download from all given URIs. Currently, the
> +	protocols supported are "http" and "https".

This negotiation seems backwards to me, because it puts too much power
in the hands of the server.

The server says "OK, I support this feature". Then the client says "I
support it, too. But only if you like these protocols". And then the
server dumps a bunch of URIs and expects the client to respect them.

The problem I see is that the client doesn't get to vet the list of
URIs; it only gets to specify a protocol match. But there are many other
reasons it might want to reject a URI: we don't like the protocol, the
domain name is on a blacklist (or not on a whitelist), the domain name
can't resolve, we can't make a TCP connection to the server, we can't
successfully fetch the pack.

You'll note that those rise in complexity and time as you go down the
list. I'm not sure where on that spectrum we'd want our clients to stop
vetting (and it may even depend on config). But I think we ought to
design the protocol to put the decision in the hands of the client so
that it _can_ make those choices itself.
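
To make that concrete, here is a rough sketch (Python, with invented
policy knobs; none of this exists in git today) of what escalating
client-side vetting might look like:

    import socket
    from urllib.parse import urlparse

    ALLOWED_SCHEMES = {"https"}          # local policy: accepted protocols
    BLOCKED_HOSTS = {"cdn.example.com"}  # local policy: domain denylist

    def vet_uri(uri, timeout=5):
        # Checks ordered from cheap to expensive, mirroring the
        # spectrum above; a client could stop at whichever tier it likes.
        parts = urlparse(uri)
        if parts.scheme not in ALLOWED_SCHEMES:    # 1. protocol match
            return False
        if parts.hostname in BLOCKED_HOSTS:        # 2. domain policy
            return False
        try:                                       # 3. DNS resolution
            socket.getaddrinfo(parts.hostname, parts.port or 443)
        except socket.gaierror:
            return False
        try:                                       # 4. TCP reachability
            socket.create_connection(
                (parts.hostname, parts.port or 443), timeout=timeout).close()
        except OSError:
            return False
        return True  # 5. fetching the pack itself is the final check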

I.e., I think the conversation ought to be more like:

  Server: I support packfile-uris X, Y, Z.

  Client: Great. I'll use URIs X and Z.

  Server: OK, here's your pack, minus any objects I know are in X and Z.
          I'll send you the objects from Y as normal.

And then the client is free to pick and choose. The initial server uri
list can come in the capabilities list, or it can be a separate request
once the client sees the server supports packfile-uris and wants to ask
about them. We may need some way for the server to group the uris so
that the client knows which ones are alternates of each other (and which
ones are needed to make a complete set).

-Peff
Jonathan Tan April 23, 2019, 8:38 p.m. UTC | #2
> The problem I see is that the client doesn't get to vet the list of
> URIs; it only gets to specify a protocol match. But there are many other
> reasons it might want to reject a URI: we don't like the protocol, the
> domain name is on a blacklist (or not on a whitelist), the domain name
> can't resolve, we can't make a TCP connection to the server, we can't
> successfully fetch the pack.
> 
> You'll note that those rise in complexity and time as you go down the
> list. I'm not sure where on that spectrum we'd want our clients to stop
> vetting (and it may even depend on config). But I think we ought to
> design the protocol to put the decision in the hands of the client so
> that it _can_ make those choices itself.
> 
> I.e., I think the conversation ought to be more like:
> 
>   Server: I support packfile-uris X, Y, Z.
> 
>   Client: Great. I'll use URIs X and Z.
> 
>   Server: OK, here's your pack, minus any objects I know are in X and Z.
>           I'll send you the objects from Y as normal.
> 
> And then the client is free to pick and choose.

One drawback I see is that the server needs to compute objects to be
sent twice - once to generate the URIs and once after the client has
informed the server which URIs it wants.

If we expect some packfile-uris to not be usable sometimes (for any of
the reasons you listed), this would be nice. The protocol in my patches
supports a rudimentary version of this (if any of the URIs don't work for
any reason, just fetch again without advertising that we support
packfile URIs), but the packfile received during the first fetch is
wasted.
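
Roughly, the fallback looks like this (a sketch; do_fetch and
download_pack are stand-ins, not actual git internals):

    class DownloadError(Exception):
        pass

    def fetch_with_fallback(do_fetch, download_pack):
        # do_fetch(advertise) -> (inline_pack, uri_list);
        # download_pack(uri) raises DownloadError on failure.
        inline_pack, uris = do_fetch(advertise=True)
        try:
            extra = [download_pack(u) for u in uris]
        except DownloadError:
            # The inline pack from the first fetch is thrown away; redo
            # the whole request without advertising packfile-uris.
            inline_pack, _ = do_fetch(advertise=False)
            extra = []
        return [inline_pack] + extra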

So the tradeoff is: in the good case, your suggestion means that we make
another fetch request, increasing the load on the server and taking more
time. In the bad case, your suggestion means that we avoid sending a
useless packfile upon the first fetch, and if the server is smart
enough, even the second packfile will be smaller. It depends on how
often we think the bad case occurs, whether servers will typically send
more than one packfile-uri, and how smart we think servers will
typically be.

> The initial server uri
> list can come in the capabilities list, or it can be a separate request
> once the client sees the server supports packfile-uris and wants to ask
> about them.

I don't think this is possible - the URI list is dependent on the wants
and haves.

> We may need some way for the server to group the uris so
> that the client knows which ones are alternates of each other (and which
> ones are needed to make a complete set).

My initial design didn't have this feature (in fact, it seems to me that
each thing should have one canonical URL, which means that there is no
need for an alternate) - do you think we should be thinking about this
at this stage?
Jonathan Nieder April 23, 2019, 10:11 p.m. UTC | #3
Hi,

Jeff King wrote:
> On Fri, Mar 08, 2019 at 01:55:17PM -0800, Jonathan Tan wrote:

>> +If the 'packfile-uris' feature is advertised, the following argument
>> +can be included in the client's request as well as the potential
>> +addition of the 'packfile-uris' section in the server's response as
>> +explained below.
>> +
>> +    packfile-uris <comma-separated list of protocols>
>> +	Indicates to the server that the client is willing to receive
>> +	URIs of any of the given protocols in place of objects in the
>> +	sent packfile. Before performing the connectivity check, the
>> +	client should download from all given URIs. Currently, the
>> +	protocols supported are "http" and "https".
>
> This negotiation seems backwards to me, because it puts too much power
> in the hands of the server.

Thanks.  Forgive me if this was covered earlier in the conversation, but
why do we need more than one protocol at all here?  Can we restrict this
to only-https, all the time?

[...]
> The problem I see is that the client doesn't get to vet the list of
> URIs; it only gets to specify a protocol match. But there are many other
> reasons it might want to reject a URI: we don't like the protocol, the
> domain name is on a blacklist (or not on a whitelist), the domain name
> can't resolve, we can't make a TCP connection to the server, we can't
> successfully fetch the pack.

Christian mentioned this desire to vet URIs before, and I'll admit I
found it hard to imagine a use case.  Why can't it work like e.g.
<frame> on the web, where if you don't like that domain, then you
don't get to access the page?  From a server operator's point of view,
if you want to support a second URI that more clients support, why
wouldn't you just always use that second URI instead of making clients
choose?

Thanks and hope that helps,
Jonathan
Ævar Arnfjörð Bjarmason April 23, 2019, 10:18 p.m. UTC | #4
On Tue, Apr 23 2019, Jonathan Tan wrote:

>> The problem I see is that the client doesn't get to vet the list of
>> URIs; it only gets to specify a protocol match. But there are many other
>> reasons it might want to reject a URI: we don't like the protocol, the
>> domain name is on a blacklist (or not on a whitelist), the domain name
>> can't resolve, we can't make a TCP connection to the server, we can't
>> successfully fetch the pack.
>>
>> You'll note that those rise in complexity and time as you go down the
>> list. I'm not sure where on that spectrum we'd want our clients to stop
>> vetting (and it may even depend on config). But I think we ought to
>> design the protocol to put the decision in the hands of the client so
>> that it _can_ make those choices itself.
>>
>> I.e., I think the conversation ought to be more like:
>>
>>   Server: I support packfile-uris X, Y, Z.
>>
>>   Client: Great. I'll use URIs X and Z.
>>
>>   Server: OK, here's your pack, minus any objects I know are in X and Z.
>>           I'll send you the objects from Y as normal.
>>
>> And then the client is free to pick and choose.
>
> One drawback I see is that the server needs to compute objects to be
> sent twice - once to generate the URIs and once after the client has
> informed the server which URIs it wants.
>
> If we expect some packfile-uris to not be usable sometimes (for any of
> the reasons you listed), this would be nice. The protocol in my patches
> supports a rudimentary version of this (if any of the URIs don't work for
> any reason, just fetch again without advertising that we support
> packfile URIs), but the packfile received during the first fetch is
> wasted.

This is really orthogonal to this series, but wouldn't a better
resumption strategy here be to walk the pack we just downloaded, run the
equivalent of 'commit-graph write' on it to figure out likely "tip"
commits, and use those in "have" lines to negotiate with the server the
next time around?

I've sometimes wished we optionally had that sort of fetch algorithm; in
particular, right now setting "alternates" on a freshly init-ed repo will
do the full initial fetch even though we have most/all of the objects in
the alternates already.
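
A toy sketch of the tip-finding step (the pack walking and the actual
negotiation are hand-waved away):

    def likely_tips(commits):
        # commits: {commit_id: [parent_ids]} recovered from the pack we
        # just downloaded. Any commit that no other commit in the pack
        # names as a parent is a candidate "have" line for the next
        # negotiation round.
        parents = {p for ps in commits.values() for p in ps}
        return [c for c in commits if c not in parents]

    print(likely_tips({"a": [], "b": ["a"], "c": []}))  # -> ['b', 'c']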

> So the tradeoff is: in the good case, your suggestion means that we make
> another fetch request, increasing the load on the server and taking more
> time. In the bad case, your suggestion means that we avoid sending a
> useless packfile upon the first fetch, and if the server is smart
> enough, even the second packfile will be smaller. It depends on how
> often we think the bad case occurs, whether servers will typically send
> more than one packfile-uri, and how smart we think servers will
> typically be.
>
>> The initial server uri
>> list can come in the capabilities list, or it can be a separate request
>> once the client sees the server supports packfile-uris and wants to ask
>> about them.
>
> I don't think this is possible - the URI list is dependent on the wants
> and haves.
>
>> We may need some way for the server to group the uris so
>> that the client knows which ones are alternates of each other (and which
>> ones are needed to make a complete set).
>
> My initial design didn't have this feature (in fact, it seems to me that
> each thing should have one canonical URL, which means that there is no
> need for an alternate) - do you think we should be thinking about this
> at this stage?
Jonathan Nieder April 23, 2019, 10:22 p.m. UTC | #5
Hi,

Ævar Arnfjörð Bjarmason wrote:

> This is really orthogonal to this series, but wouldn't a better
> resumption strategy here be to walk the pack we just downloaded, run the
> equivalent of 'commit-graph write' on it to figure out likely "tip"
> commits, and use those in "have" lines to negotiate with the server the
> next time around?

Do you mean this for when a pack is self-contained and contains all
objects reachable from those "tip" commits?

What would you do when a pack is not self-contained in that way?

Thanks,
Jonathan
Ævar Arnfjörð Bjarmason April 23, 2019, 10:25 p.m. UTC | #6
On Wed, Apr 24 2019, Jonathan Nieder wrote:

> Hi,
>
> Jeff King wrote:
>> On Fri, Mar 08, 2019 at 01:55:17PM -0800, Jonathan Tan wrote:
>
>>> +If the 'packfile-uris' feature is advertised, the following argument
>>> +can be included in the client's request as well as the potential
>>> +addition of the 'packfile-uris' section in the server's response as
>>> +explained below.
>>> +
>>> +    packfile-uris <comma-separated list of protocols>
>>> +	Indicates to the server that the client is willing to receive
>>> +	URIs of any of the given protocols in place of objects in the
>>> +	sent packfile. Before performing the connectivity check, the
>>> +	client should download from all given URIs. Currently, the
>>> +	protocols supported are "http" and "https".
>>
>> This negotiation seems backwards to me, because it puts too much power
>> in the hands of the server.
>
> Thanks.  Forgive me if this was covered earlier in the conversation, but
> why do we need more than one protocol at all here?  Can we restrict this
> to only-https, all the time?

There was an earlier discussion about this:
https://public-inbox.org/git/877eds5fpl.fsf@evledraar.gmail.com/

It seems arbitrary to break http for new features if we support it in
general, especially with the design as it is now, where the checksum of
the pack is transmitted out-of-band.
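
I.e. something like this (a toy check; the real design hashes whatever
'git index-pack' reports, rather than the raw bytes as done here):

    import hashlib

    def verify_pack(advertised_hex, pack_bytes):
        # The hash arrives over the Git connection itself, so even a
        # plain-http CDN cannot tamper with the pack undetected.
        if hashlib.sha1(pack_bytes).hexdigest() != advertised_hex:
            raise ValueError("pack does not match advertised hash")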

> [...]
>> The problem I see is that the client doesn't get to vet the list of
>> URIs; it only gets to specify a protocol match. But there are many other
>> reasons it might want to reject a URI: we don't like the protocol, the
>> domain name is on a blacklist (or not on a whitelist), the domain name
>> can't resolve, we can't make a TCP connection to the server, we can't
>> successfully fetch the pack.
>
> Christian mentioned this desire to vet URIs before, and I'll admit I
> found it hard to imagine a use case.  Why can't it work like e.g.
> <frame> on the web, where if you don't like that domain, then you
> don't get to access the page?  From a server operator's point of view,
> if you want to support a second URI that more clients support, why
> wouldn't you just always use that second URI instead of making clients
> choose?
>
> Thanks and hope that helps,
> Jonathan
Ævar Arnfjörð Bjarmason April 23, 2019, 10:30 p.m. UTC | #7
On Wed, Apr 24 2019, Jonathan Nieder wrote:

> Hi,
>
> Ævar Arnfjörð Bjarmason wrote:
>
>> This is really orthogonal to this series, but wouldn't a better
>> resumption strategy here be to walk the pack we just downloaded, run the
>> equivalent of 'commit-graph write' on it to figure out likely "tip"
>> commits, and use those in "have" lines to negotiate with the server the
>> next time around?
>
> Do you mean this for when a pack is self-contained and contains all
> objects reachable from those "tip" commits?
>
> What would you do when a pack is not self-contained in that way?

Indeed, it had been a while since I read the first version of this. I
was assuming a "base pack" use-case, but it seems it's narrowly isolated
to just "N packs each containing one big blob", right?
Jonathan Nieder April 23, 2019, 10:48 p.m. UTC | #8
Ævar Arnfjörð Bjarmason wrote:
> On Wed, Apr 24 2019, Jonathan Nieder wrote:
>> Jeff King wrote:
>>> On Fri, Mar 08, 2019 at 01:55:17PM -0800, Jonathan Tan wrote:

>>>> +If the 'packfile-uris' feature is advertised, the following argument
>>>> +can be included in the client's request as well as the potential
>>>> +addition of the 'packfile-uris' section in the server's response as
>>>> +explained below.
>>>> +
>>>> +    packfile-uris <comma-separated list of protocols>
>>>> +	Indicates to the server that the client is willing to receive
>>>> +	URIs of any of the given protocols in place of objects in the
>>>> +	sent packfile. Before performing the connectivity check, the
>>>> +	client should download from all given URIs. Currently, the
>>>> +	protocols supported are "http" and "https".
>>>
>>> This negotiation seems backwards to me, because it puts too much power
>>> in the hands of the server.
>>
>> Thanks.  Forgive me if this was covered earlier in the conversation, but
>> why do we need more than one protocol at all here?  Can we restrict this
>> to only-https, all the time?
>
> There was this in an earlier discussion about this:
> https://public-inbox.org/git/877eds5fpl.fsf@evledraar.gmail.com/
>
> It seems arbitrary to break it for new features if we support http in
> general, especially with a design as it is now where the checksum of the
> pack is transmitted out-of-band.

Thanks for the pointer.  TLS provides privacy, too, but I can see why
in today's world it might not always be easy to set it up, and given
that we have integrity protection via that checksum, I can see why
some people might have a legitimate need for using plain "http" here.

We may also want to support packfile-uris using SSH protocol in the
future.  Might as well figure out how the protocol negotiation works
now.  So let's delve more into it:

Peff mentioned that it feels backwards for the client to specify what
protocols they support in the request, instead of the server
specifying them upfront in the capability advertisement.  I'm inclined
to agree: it's probably reasonable to put this in server capabilities
instead.  That would even allow the client to do something like

	This server only supports HTTP without TLS, which you have
	indicated is a condition in which you want to be prompted.
	Proceed?

	[Use HTTP packfiles]  [Use slower but safer inline packs]

Peff also asked whether protocol scheme is the right granularity:
should the server list what domains they can serve packfiles from
instead?  In other words, once you're doing it for protocol schemes,
why not do it for whole URIs too?  I'm grateful for the question since
it's a way to probe at design assumptions.

- protocol schemes are likely to be low in number because each has its
  own code path to handle it.  By comparison, domains or URIs may be
  too numerous to be something we want to jam into the capability
  advertisement.  (Or the server operator could always use the same
  domain as the Git repo, and then use a 302 to redirect to the CDN.
  I suspect this is likely to be a common setup anyway: it allows the
  Git server to generate a short-lived signed URL that it uses as the
  target of a 302.  But in this case, what is the point of a domain
  whitelist?)

- relatedly, because the list of protocol schemes is small, it is
  feasible to test client behavior with each subset of protocol
  schemes enabled.  Finer-grained filtering would mean more esoteric
  client configurations for server operators to support and debug.

- supported protocol schemes do not vary per request.  The actual
  packfile URI is dynamic and varies per request.

- separately from questions of preference or security policy,
  clients may have support for a limited subset of protocol schemes.
  For example, imagine a stripped-down client without SSH support.
  So we need a way to agree about this capability anyway.

So I suspect that, at least to start, protocol scheme negotiation
should be enough and we don't need full URI negotiation.
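
If the schemes did move into the server's capability advertisement, the
client side could be as simple as this (the capability value format is
invented for illustration):

    def usable_schemes(server_caps, client_schemes=frozenset({"https"})):
        # server_caps: parsed advertisement, e.g.
        # {"packfile-uris": "http,https"}.
        adv = server_caps.get("packfile-uris")
        if not adv:
            return frozenset()
        return frozenset(adv.split(",")) & client_schemes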

There are a few escape valves:

- affected clients can complain to the server operator, who will then
  reconfigure the server to use more appropriate packfile URIs

- if there is a need for different clients to use different packfile
  URIs, clients can pass a flag, using --server-option, to the server
  to help it choose.

- a client can disable support for packfile URIs on a particular
  request and fall back to inline packs.

- if and when an affected client materializes, they can help us
  improve the protocol to handle their needs.

Sensible?

Thanks,
Jonathan
Jonathan Nieder April 23, 2019, 10:51 p.m. UTC | #9
Ævar Arnfjörð Bjarmason wrote:
> On Wed, Apr 24 2019, Jonathan Nieder wrote:

>> Do you mean this for when a pack is self-contained and contains all
>> objects reachable from those "tip" commits?
>>
>> What would you do when a pack is not self-contained in that way?
>
> Indeed, it had been a while since I read the first version of this. I
> was assuming a "base pack" use-case, but it seems it's narrowly isolated
> to just "N packs each containing one big blob", right?

The demo in this patch series covers the single isolated blob case.
The protocol supports the "base pack" use case but many others as
well:

* daily "catch-up fetch" packs
* "base pack without blobs"
* ... etc ...

Thanks,
Jonathan
Junio C Hamano April 24, 2019, 3:01 a.m. UTC | #10
Jeff King <peff@peff.net> writes:

> I.e., I think the conversation ought to be more like:
>
>   Server: I support packfile-uris X, Y, Z.
>
>   Client: Great. I'll use URIs X and Z.
>
>   Server: OK, here's your pack, minus any objects I know are in X and Z.
>           I'll send you the objects from Y as normal.

I agree with the overall direction, but I am afraid that the server
advertisement may have to become a lot more complex.

For example, are you assuming in the above example that X, Y and Z
are not overlapping?

Or perhaps X is the whole repository that is quick to access only
from European clients, while Y and Z combined have the same set of
objects that are meant for our Asian friends.  IOW, the server may
expect the client to say one of three things: "there is nothing I
can use then", "I am happy with X and won't use Y and Z", or "I'll
use Y and Z then, without X"; no other combination, like "I'll
use X and Z", would make sense.
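
One way to express that constraint would be to advertise the URIs in
groups that must be taken whole (an invented representation, just to
illustrate):

    offers = [
        {"https://eu.example.com/all.pack"},             # "X" alone
        {"https://asia1.example.com/a.pack",             # "Y" and "Z",
         "https://asia2.example.com/b.pack"},            # only as a pair
    ]

    def choose(offers, acceptable):
        # Pick the first group whose URIs all pass the client's vetting;
        # an empty result means "take the plain inline pack instead".
        for group in offers:
            if all(acceptable(uri) for uri in group):
                return group
        return set()

    print(choose(offers, lambda uri: "asia" in uri))  # the Y+Z pair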

> And then the client is free to pick and choose. The initial server uri
> list can come in the capabilities list, or it can be a separate request
> once the client sees the server supports packfile-uris and wants to ask
> about them. We may need some way for the server to group the uris so
> that the client knows which ones are alternates of each other (and which
> ones are needed to make a complete set).

I guess what I am saying is that it is not so clear how we can
present the server-offered URIs to the client in such a way that the
client (either mechanically, or by consulting a human) can make a
useful choice.
Ævar Arnfjörð Bjarmason April 24, 2019, 7:48 a.m. UTC | #11
On Wed, Apr 24 2019, Jonathan Nieder wrote:

> Ævar Arnfjörð Bjarmason wrote:
>> On Wed, Apr 24 2019, Jonathan Nieder wrote:
>>> Jeff King wrote:
>>>> On Fri, Mar 08, 2019 at 01:55:17PM -0800, Jonathan Tan wrote:
>
>>>>> +If the 'packfile-uris' feature is advertised, the following argument
>>>>> +can be included in the client's request as well as the potential
>>>>> +addition of the 'packfile-uris' section in the server's response as
>>>>> +explained below.
>>>>> +
>>>>> +    packfile-uris <comma-separated list of protocols>
>>>>> +	Indicates to the server that the client is willing to receive
>>>>> +	URIs of any of the given protocols in place of objects in the
>>>>> +	sent packfile. Before performing the connectivity check, the
>>>>> +	client should download from all given URIs. Currently, the
>>>>> +	protocols supported are "http" and "https".
>>>>
>>>> This negotiation seems backwards to me, because it puts too much power
>>>> in the hands of the server.
>>>
>>> Thanks.  Forgive me if this was covered earlier in the conversation, but
>>> why do we need more than one protocol at all here?  Can we restrict this
>>> to only-https, all the time?
>>
>> There was an earlier discussion about this:
>> https://public-inbox.org/git/877eds5fpl.fsf@evledraar.gmail.com/
>>
>> It seems arbitrary to break http for new features if we support it in
>> general, especially with the design as it is now, where the checksum of
>> the pack is transmitted out-of-band.
>
> Thanks for the pointer.  TLS provides privacy, too, but I can see why
> in today's world it might not always be easy to set it up, and given
> that we have integrity protection via that checksum, I can see why
> some people might have a legitimate need for using plain "http" here.
>
> We may also want to support packfile-uris using SSH protocol in the
> future.  Might as well figure out how the protocol negotiation works
> now.  So let's delve more into it:
>
> Peff mentioned that it feels backwards for the client to specify what
> protocols they support in the request, instead of the server
> specifying them upfront in the capability advertisement.  I'm inclined
> to agree: it's probably reasonable to put this in server capabilities
> instead.  That would even allow the client to do something like
>
> 	This server only supports HTTP without TLS, which you have
> 	indicated is a condition in which you want to be prompted.
> 	Proceed?
>
> 	[Use HTTP packfiles]  [Use slower but safer inline packs]
>
> Peff also asked whether protocol scheme is the right granularity:
> should the server list what domains they can serve packfiles from
> instead?  In other words, once you're doing it for protocol schemes,
> why not do it for whole URIs too?  I'm grateful for the question since
> it's a way to probe at design assumptions.
>
> - protocol schemes are likely to be low in number because each has its
>   own code path to handle it.  By comparison, domains or URIs may be
>   too numerous to be something we want to jam into the capability
>   advertisement.  (Or the server operator could always use the same
>   domain as the Git repo, and then use a 302 to redirect to the CDN.
>   I suspect this is likely to be a common setup anyway: it allows the
>   Git server to generate a short-lived signed URL that it uses as the
>   target of a 302.  But in this case, what is the point of a domain
>   whitelist?)
>
> - relatedly, because the list of protocol schemes is small, it is
>   feasible to test client behavior with each subset of protocol
>   schemes enabled.  Finer-grained filtering would mean more esoteric
>   client configurations for server operators to support and debug.
>
> - supported protocol schemes do not vary per request.  The actual
>   packfile URI is dynamic and varies per request.
>
> - separately from questions of preference or security policy,
>   clients may have support for a limited subset of protocol schemes.
>   For example, imagine a stripped-down client without SSH support.
>   So we need a way to agree about this capability anyway.
>
> So I suspect that, at least to start, protocol scheme negotiation
> should be enough and we don't need full URI negotiation.
>
> There are a few escape valves:
>
> - affected clients can complain to the server operator, who will then
>   reconfigure the server to use more appropriate packfile URIs
>
> - if there is a need for different clients to use different packfile
>   URIs, clients can pass a flag, using --server-option, to the server
>   to help it choose.
>
> - a client can disable support for packfile URIs on a particular
>   request and fall back to inline packs.
>
> - if and when an affected client materializes, they can help us
>   improve the protocol to handle their needs.
>
> Sensible?

Food for thought: would we consider ssh->https a "downgrade"? I think
"maybe". We're going from whatever custom setting the user has
(e.g. manually approving new hosts) to the CA system.

But I think it would be fine to just whitelist ssh->https and put
everything else behind a very scary config option or something; we could
always flesh out the semantics of upgrade/downgrade/switching later, and
it would IMO suck less than outright banning a protocol we otherwise
support in the design, one which (unlike git://) people are still
finding uses for in the wild for non-legacy reasons.
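
Concretely, something like this invented policy table (none of it is
existing git behavior):

    # Which (negotiation transport -> packfile URI scheme) switches are
    # allowed silently; anything else needs an explicit, deliberately
    # scary config override.
    SAFE_SWITCHES = {
        ("ssh", "https"),    # CA-backed, so arguably acceptable
        ("https", "https"),
        ("http", "http"),
    }

    def switch_allowed(transport, uri_scheme, override=False):
        return override or (transport, uri_scheme) in SAFE_SWITCHES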

Patch

diff --git a/Documentation/technical/packfile-uri.txt b/Documentation/technical/packfile-uri.txt
new file mode 100644
index 0000000000..6a5a6440d5
--- /dev/null
+++ b/Documentation/technical/packfile-uri.txt
@@ -0,0 +1,78 @@ 
+Packfile URIs
+=============
+
+This feature allows servers to serve part of their packfile response as URIs.
+This allows server designs that improve scalability in bandwidth and CPU usage
+(for example, by serving some data through a CDN), and (in the future) provides
+some measure of resumability to clients.
+
+This feature is available only in protocol version 2.
+
+Protocol
+--------
+
+The server advertises `packfile-uris`.
+
+If the client then communicates which protocols (HTTPS, etc.) it supports with
+a `packfile-uris` argument, the server MAY send a `packfile-uris` section
+directly before the `packfile` section (right after `wanted-refs` if it is
+sent) containing URIs of any of the given protocols. The URIs point to
+packfiles that use only features that the client has declared that it supports
+(e.g. ofs-delta and thin-pack). See protocol-v2.txt for the documentation of
+this section.
+
+Clients should then understand that the returned packfile could be incomplete,
+and that they need to download all the given URIs before the fetch or clone is
+complete.
+
+Server design
+-------------
+
+The server can be trivially made compatible with the proposed protocol by
+having it advertise `packfile-uris`, tolerating the client sending
+`packfile-uris`, and never sending any `packfile-uris` section. But we should
+include some sort of non-trivial implementation in the Minimum Viable Product,
+at least so that we can test the client.
+
+This is the implementation: a feature, marked experimental, that allows the
+server to be configured by one or more `uploadpack.blobPackfileUri=<sha1>
+<uri>` entries. Whenever the list of objects to be sent is assembled, a blob
+with the given sha1 can be replaced by the given URI. This allows, for example,
+servers to delegate serving of large blobs to CDNs.
+
+Client design
+-------------
+
+While fetching, the client needs to remember the list of URIs and cannot
+declare that the fetch is complete until all URIs have been downloaded as
+packfiles.
+
+The division of work (initial fetch + additional URIs) introduces convenient
+points for resumption of an interrupted clone - such resumption can be done
+after the Minimum Viable Product (see "Future work").
+
+The client can inhibit this feature (i.e. refrain from sending the
+`packfile-uris` parameter) by passing --no-packfile-uris to `git fetch`.
+
+Future work
+-----------
+
+The protocol design allows some evolution of the server and client without any
+need for protocol changes, so only a small-scoped design is included here to
+form the MVP. For example, the following can be done:
+
+ * On the server, a long-running process that takes in entire requests and
+   outputs a list of URIs and the corresponding inclusion and exclusion sets of
+   objects. This allows, e.g., signed URIs to be used and packfiles for common
+   requests to be cached.
+ * On the client, resumption of clone. If a clone is interrupted, information
+   could be recorded in the repository's config and a "clone-resume" command
+   can resume the clone in progress. (Resumption of subsequent fetches is more
+   difficult because it must deal with the user wanting to use the repository
+   even after the fetch was interrupted.)
+
+There are some possible features that will require a change in protocol:
+
+ * Additional HTTP headers (e.g. authentication)
+ * Byte range support
+ * Different file formats referenced by URIs (e.g. raw object)
diff --git a/Documentation/technical/protocol-v2.txt b/Documentation/technical/protocol-v2.txt
index 36239ec7e9..7b63c26ecd 100644
--- a/Documentation/technical/protocol-v2.txt
+++ b/Documentation/technical/protocol-v2.txt
@@ -323,13 +323,26 @@  included in the client's request:
 	indicating its sideband (1, 2, or 3), and the server may send "0005\2"
 	(a PKT-LINE of sideband 2 with no payload) as a keepalive packet.
 
+If the 'packfile-uris' feature is advertised, the following argument
+can be included in the client's request as well as the potential
+addition of the 'packfile-uris' section in the server's response as
+explained below.
+
+    packfile-uris <comma-separated list of protocols>
+	Indicates to the server that the client is willing to receive
+	URIs of any of the given protocols in place of objects in the
+	sent packfile. Before performing the connectivity check, the
+	client should download from all given URIs. Currently, the
+	protocols supported are "http" and "https".
+
 The response of `fetch` is broken into a number of sections separated by
 delimiter packets (0001), with each section beginning with its section
 header. Most sections are sent only when the packfile is sent.
 
     output = acknowledgements flush-pkt |
 	     [acknowledgments delim-pkt] [shallow-info delim-pkt]
-	     [wanted-refs delim-pkt] packfile flush-pkt
+	     [wanted-refs delim-pkt] [packfile-uris delim-pkt]
+	     packfile flush-pkt
 
     acknowledgments = PKT-LINE("acknowledgments" LF)
 		      (nak | *ack)
@@ -347,6 +360,9 @@  header. Most sections are sent only when the packfile is sent.
 		  *PKT-LINE(wanted-ref LF)
     wanted-ref = obj-id SP refname
 
+    packfile-uris = PKT-LINE("packfile-uris" LF) *packfile-uri
+    packfile-uri = PKT-LINE(40*(HEXDIGIT) SP *%x20-ff LF)
+
     packfile = PKT-LINE("packfile" LF)
 	       *PKT-LINE(%x01-03 *%x00-ff)
 
@@ -418,6 +434,16 @@  header. Most sections are sent only when the packfile is sent.
 	* The server MUST NOT send any refs which were not requested
 	  using 'want-ref' lines.
 
+    packfile-uris section
+	* This section is only included if the client sent
+	  'packfile-uris' and the server has at least one such URI to
+	  send.
+
+	* Always begins with the section header "packfile-uris".
+
+	* For each URI the server sends, it sends a hash of the pack's
+	  contents (as output by git index-pack) followed by the URI.
+
     packfile section
 	* This section is only included if the client has sent 'want'
 	  lines in its request and either requested that no more