
[v2,5/8] Documentation: add Packfile URIs design doc

Message ID 5ce56844d3fb740e29d2f3d4be2ade0b2ad5f7fd.1552073690.git.jonathantanmy@google.com (mailing list archive)
State New, archived
Series CDN offloading of fetch response

Commit Message

Jonathan Tan March 8, 2019, 9:55 p.m. UTC
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/technical/packfile-uri.txt | 78 ++++++++++++++++++++++++
 Documentation/technical/protocol-v2.txt  | 28 ++++++++-
 2 files changed, 105 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/technical/packfile-uri.txt

Comments

Jeff King April 23, 2019, 5:31 a.m. UTC | #1
On Fri, Mar 08, 2019 at 01:55:17PM -0800, Jonathan Tan wrote:

> +If the 'packfile-uris' feature is advertised, the following argument
> +can be included in the client's request as well as the potential
> +addition of the 'packfile-uris' section in the server's response as
> +explained below.
> +
> +    packfile-uris <comma-separated list of protocols>
> +	Indicates to the server that the client is willing to receive
> +	URIs of any of the given protocols in place of objects in the
> +	sent packfile. Before performing the connectivity check, the
> +	client should download from all given URIs. Currently, the
> +	protocols supported are "http" and "https".

This negotiation seems backwards to me, because it puts too much power
in the hands of the server.

The server says "OK, I support this feature". Then the client says "I
support it, too. But only if you like these protocols". And then the
server dumps a bunch of URIs and expects the client to respect them.

The problem I see is that the client doesn't get to vet the list of
URIs; it only gets to specify a protocol match. But there are many other
reasons it might want to reject a URI: we don't like the protocol, the
domain name is on a blacklist (or not on a whitelist), the domain name
can't resolve, we can't make a TCP connection to the server, we can't
successfully fetch the pack.

You'll note that those rise in complexity and time as you go down the
list. I'm not sure where on that spectrum we'd want our clients to stop
vetting (and it may even depend on config). But I think we ought to
design the protocol to put the decision in the hands of the client so
that it _can_ make those choices itself.
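
To make that concrete, here is a rough sketch (Python, with invented
policy knobs; none of this exists in git today) of what escalating
client-side vetting might look like:

    import socket
    from urllib.parse import urlparse

    ALLOWED_SCHEMES = {"https"}          # local policy: accepted protocols
    BLOCKED_HOSTS = {"cdn.example.com"}  # local policy: domain denylist

    def vet_uri(uri, timeout=5):
        # Checks ordered from cheap to expensive, mirroring the
        # spectrum above; a client could stop at whichever tier it likes.
        parts = urlparse(uri)
        if parts.scheme not in ALLOWED_SCHEMES:    # 1. protocol match
            return False
        if parts.hostname in BLOCKED_HOSTS:        # 2. domain policy
            return False
        try:                                       # 3. DNS resolution
            socket.getaddrinfo(parts.hostname, parts.port or 443)
        except socket.gaierror:
            return False
        try:                                       # 4. TCP reachability
            socket.create_connection(
                (parts.hostname, parts.port or 443), timeout=timeout).close()
        except OSError:
            return False
        return True  # 5. fetching the pack itself is the final check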

I.e., I think the conversation ought to be more like:

  Server: I support packfile-uris X, Y, Z.

  Client: Great. I'll use URIs X and Z.

  Server: OK, here's your pack, minus any objects I know are in X and Z.
          I'll send you the objects from Y as normal.

And then the client is free to pick and choose. The initial server uri
list can come in the capabilities list, or it can be a separate request
once the client sees the server supports packfile-uris and wants to ask
about them. We may need some way for the server to group the uris so
that the client knows which ones are alternates of each other (and which
ones are needed to make a complete set).

-Peff
Jonathan Tan April 23, 2019, 8:38 p.m. UTC | #2
> The problem I see is that the client doesn't get to vet the list of
> URIs; it only gets to specify a protocol match. But there are many other
> reasons it might want to reject a URI: we don't like the protocol, the
> domain name is on a blacklist (or not on a whitelist), the domain name
> can't resolve, we can't make a TCP connection to the server, we can't
> successfully fetch the pack.
> 
> You'll note that those rise in complexity and time as you go down the
> list. I'm not sure where on that spectrum we'd want our clients to stop
> vetting (and it may even depend on config). But I think we ought to
> design the protocol to put the decision in the hands of the client so
> that it _can_ make those choices itself.
> 
> I.e., I think the conversation ought to be more like:
> 
>   Server: I support packfile-uris X, Y, Z.
> 
>   Client: Great. I'll use URIs X and Z.
> 
>   Server: OK, here's your pack, minus any objects I know are in X and Z.
>           I'll send you the objects from Y as normal.
> 
> And then the client is free to pick and choose.

One drawback I see is that the server needs to compute objects to be
sent twice - once to generate the URIs and once after the client has
informed the server which URIs it wants.

If we expect some packfile-uris to not be usable sometimes (for any of
the reasons you listed), this would be nice. The protocol in my patches
supports a rudimentary version of this (if any of the URIs don't work for
any reason, just fetch again without advertising that we support
packfile URIs), but the packfile received during the first fetch is
wasted.
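
Roughly, the fallback looks like this (a sketch; do_fetch and
download_pack are stand-ins, not actual git internals):

    class DownloadError(Exception):
        pass

    def fetch_with_fallback(do_fetch, download_pack):
        # do_fetch(advertise) -> (inline_pack, uri_list);
        # download_pack(uri) raises DownloadError on failure.
        inline_pack, uris = do_fetch(advertise=True)
        try:
            extra = [download_pack(u) for u in uris]
        except DownloadError:
            # The inline pack from the first fetch is thrown away; redo
            # the whole request without advertising packfile-uris.
            inline_pack, _ = do_fetch(advertise=False)
            extra = []
        return [inline_pack] + extra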

So the tradeoff is: in the good case, your suggestion means that we make
another fetch request, increasing the load on the server and taking more
time. In the bad case, your suggestion means that we avoid sending a
useless packfile upon the first fetch, and if the server is smart
enough, even the second packfile will be smaller. It depends on how
often we think the bad case occurs, whether servers will typically send
more than one packfile-uri, and how smart we think servers will
typically be.

> The initial server uri
> list can come in the capabilities list, or it can be a separate request
> once the client sees the server supports packfile-uris and wants to ask
> about them.

I don't think this is possible - the URI list is dependent on the wants
and haves.

> We may need some way for the server to group the uris so
> that the client knows which ones are alternates of each other (and which
> ones are needed to make a complete set).

My initial design didn't have this feature (in fact, it seems to me that
each thing should have one canonical URL, which means that there is no
need for an alternate) - do you think we should be thinking about this
at this stage?
Jonathan Nieder April 23, 2019, 10:11 p.m. UTC | #3
Hi,

Jeff King wrote:
> On Fri, Mar 08, 2019 at 01:55:17PM -0800, Jonathan Tan wrote:

>> +If the 'packfile-uris' feature is advertised, the following argument
>> +can be included in the client's request as well as the potential
>> +addition of the 'packfile-uris' section in the server's response as
>> +explained below.
>> +
>> +    packfile-uris <comma-separated list of protocols>
>> +	Indicates to the server that the client is willing to receive
>> +	URIs of any of the given protocols in place of objects in the
>> +	sent packfile. Before performing the connectivity check, the
>> +	client should download from all given URIs. Currently, the
>> +	protocols supported are "http" and "https".
>
> This negotiation seems backwards to me, because it puts too much power
> in the hands of the server.

Thanks.  Forgive me if this was covered earlier in the conversation, but
why do we need more than one protocol at all here?  Can we restrict this
to only-https, all the time?

[...]
> The problem I see is that the client doesn't get to vet the list of
> URIs; it only gets to specify a protocol match. But there are many other
> reasons it might want to reject a URI: we don't like the protocol, the
> domain name is on a blacklist (or not on a whitelist), the domain name
> can't resolve, we can't make a TCP connection to the server, we can't
> successfully fetch the pack.

Christian mentioned this desire to vet URIs before, and I'll admit I
found it hard to imagine a use case.  Why can't it work like e.g.
<frame> on the web, where if you don't like that domain, then you
don't get to access the page?  From a server operator's point of view,
if you want to support a second URI that more clients support, why
wouldn't you just always use that second URI instead of making clients
choose?

Thanks and hope that helps,
Jonathan
Ævar Arnfjörð Bjarmason April 23, 2019, 10:18 p.m. UTC | #4
On Tue, Apr 23 2019, Jonathan Tan wrote:

>> The problem I see is that the client doesn't get to vet the list of
>> URIs; it only gets to specify a protocol match. But there are many other
>> reasons it might want to reject a URI: we don't like the protocol, the
>> domain name is on a blacklist (or not on a whitelist), the domain name
>> can't resolve, we can't make a TCP connection to the server, we can't
>> successfully fetch the pack.
>>
>> You'll note that those rise in complexity and time as you go down the
>> list. I'm not sure where on that spectrum we'd want our clients to stop
>> vetting (and it may even depend on config). But I think we ought to
>> design the protocol to put the decision in the hands of the client so
>> that it _can_ make those choices itself.
>>
>> I.e., I think the conversation ought to be more like:
>>
>>   Server: I support packfile-uris X, Y, Z.
>>
>>   Client: Great. I'll use URIs X and Z.
>>
>>   Server: OK, here's your pack, minus any objects I know are in X and Z.
>>           I'll send you the objects from Y as normal.
>>
>> And then the client is free to pick and choose.
>
> One drawback I see is that the server needs to compute objects to be
> sent twice - once to generate the URIs and once after the client has
> informed the server which URIs it wants.
>
> If we expect some packfile-uris to not be usable sometimes (for any of
> the reasons you listed), this would be nice. The protocol in my patches
> supports a rudimentary version of this (if any of the URIs don't work for
> any reason, just fetch again without advertising that we support
> packfile URIs), but the packfile received during the first fetch is
> wasted.

This is really orthogonal to this series, but wouldn't a better
resumption strategy here be to walk the pack we just downloaded, run the
equivalent of 'commit-graph write' on it to figure out likely "tip"
commits, and use those in "have" lines to negotiate with the server the
next time around?

I've sometimes wished we optionally had that sort of fetch algorithm; in
particular, right now setting "alternates" on a freshly init-ed repo will
do the full initial fetch even though we have most/all of the objects in
the alternates already.
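
A toy sketch of the tip-finding step (the pack walking and the actual
negotiation are hand-waved away):

    def likely_tips(commits):
        # commits: {commit_id: [parent_ids]} recovered from the pack we
        # just downloaded. Any commit that no other commit in the pack
        # names as a parent is a candidate "have" line for the next
        # negotiation round.
        parents = {p for ps in commits.values() for p in ps}
        return [c for c in commits if c not in parents]

    print(likely_tips({"a": [], "b": ["a"], "c": []}))  # -> ['b', 'c']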

> So the tradeoff is: in the good case, your suggestion means that we make
> another fetch request, increasing the load on the server and taking more
> time. In the bad case, your suggestion means that we avoid sending a
> useless packfile upon the first fetch, and if the server is smart
> enough, even the second packfile will be smaller. It depends on how
> often we think the bad case occurs, whether servers will typically send
> more than one packfile-uri, and how smart we think servers will
> typically be.
>
>> The initial server uri
>> list can come in the capabilities list, or it can be a separate request
>> once the client sees the server supports packfile-uris and wants to ask
>> about them.
>
> I don't think this is possible - the URI list is dependent on the wants
> and haves.
>
>> We may need some way for the server to group the uris so
>> that the client knows which ones are alternates of each other (and which
>> ones are needed to make a complete set).
>
> My initial design didn't have this feature (in fact, it seems to me that
> each thing should have one canonical URL, which means that there is no
> need for an alternate) - do you think we should be thinking about this
> at this stage?
Jonathan Nieder April 23, 2019, 10:22 p.m. UTC | #5
Hi,

Ævar Arnfjörð Bjarmason wrote:

> This is really orthogonal to this series, but wouldn't a better
> resumption strategy here be to walk the pack we just downloaded, run the
> equivalent of 'commit-graph write' on it to figure out likely "tip"
> commits, and use those in "have" lines to negotiate with the server the
> next time around?

Do you mean this for when a pack is self-contained and contains all
objects reachable from those "tip" commits?

What would you do when a pack is not self-contained in that way?

Thanks,
Jonathan
Ævar Arnfjörð Bjarmason April 23, 2019, 10:25 p.m. UTC | #6
On Wed, Apr 24 2019, Jonathan Nieder wrote:

> Hi,
>
> Jeff King wrote:
>> On Fri, Mar 08, 2019 at 01:55:17PM -0800, Jonathan Tan wrote:
>
>>> +If the 'packfile-uris' feature is advertised, the following argument
>>> +can be included in the client's request as well as the potential
>>> +addition of the 'packfile-uris' section in the server's response as
>>> +explained below.
>>> +
>>> +    packfile-uris <comma-separated list of protocols>
>>> +	Indicates to the server that the client is willing to receive
>>> +	URIs of any of the given protocols in place of objects in the
>>> +	sent packfile. Before performing the connectivity check, the
>>> +	client should download from all given URIs. Currently, the
>>> +	protocols supported are "http" and "https".
>>
>> This negotiation seems backwards to me, because it puts too much power
>> in the hands of the server.
>
> Thanks.  Forgive me if this was covered earlier in the conversation, but
> why do we need more than one protocol at all here?  Can we restrict this
> to only-https, all the time?

There was an earlier discussion about this:
https://public-inbox.org/git/877eds5fpl.fsf@evledraar.gmail.com/

It seems arbitrary to break http for new features if we support it in
general, especially with the design as it is now, where the checksum of
the pack is transmitted out-of-band.
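
I.e. something like this (a toy check; the real design hashes whatever
'git index-pack' reports, rather than the raw bytes as done here):

    import hashlib

    def verify_pack(advertised_hex, pack_bytes):
        # The hash arrives over the Git connection itself, so even a
        # plain-http CDN cannot tamper with the pack undetected.
        if hashlib.sha1(pack_bytes).hexdigest() != advertised_hex:
            raise ValueError("pack does not match advertised hash")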

> [...]
>> The problem I see is that the client doesn't get to vet the list of
>> URIs; it only gets to specify a protocol match. But there are many other
>> reasons it might want to reject a URI: we don't like the protocol, the
>> domain name is on a blacklist (or not on a whitelist), the domain name
>> can't resolve, we can't make a TCP connection to the server, we can't
>> successfully fetch the pack.
>
> Christian mentioned this desire to vet URIs before, and I'll admit I
> found it hard to imagine a use case.  Why can't it work like e.g.
> <frame> on the web, where if you don't like that domain, then you
> don't get to access the page?  From a server operator's point of view,
> if you want to support a second URI that more clients support, why
> wouldn't you just always use that second URI instead of making clients
> choose?
>
> Thanks and hope that helps,
> Jonathan
Ævar Arnfjörð Bjarmason April 23, 2019, 10:30 p.m. UTC | #7
On Wed, Apr 24 2019, Jonathan Nieder wrote:

> Hi,
>
> Ævar Arnfjörð Bjarmason wrote:
>
>> This is really orthogonal to this series, but wouldn't a better
>> resumption strategy here be to walk the pack we just downloaded, run the
>> equivalent of 'commit-graph write' on it to figure out likely "tip"
>> commits, and use those in "have" lines to negotiate with the server the
>> next time around?
>
> Do you mean this for when a pack is self-contained and contains all
> objects reachable from those "tip" commits?
>
> What would you do when a pack is not self-contained in that way?

Indeed, it had been a while since I read the first version of this. I
was assuming a "base pack" use-case, but it seems it's narrowly isolated
to just "N packs each containing one big blob", right?
Jonathan Nieder April 23, 2019, 10:48 p.m. UTC | #8
Ævar Arnfjörð Bjarmason wrote:
> On Wed, Apr 24 2019, Jonathan Nieder wrote:
>> Jeff King wrote:
>>> On Fri, Mar 08, 2019 at 01:55:17PM -0800, Jonathan Tan wrote:

>>>> +If the 'packfile-uris' feature is advertised, the following argument
>>>> +can be included in the client's request as well as the potential
>>>> +addition of the 'packfile-uris' section in the server's response as
>>>> +explained below.
>>>> +
>>>> +    packfile-uris <comma-separated list of protocols>
>>>> +	Indicates to the server that the client is willing to receive
>>>> +	URIs of any of the given protocols in place of objects in the
>>>> +	sent packfile. Before performing the connectivity check, the
>>>> +	client should download from all given URIs. Currently, the
>>>> +	protocols supported are "http" and "https".
>>>
>>> This negotiation seems backwards to me, because it puts too much power
>>> in the hands of the server.
>>
>> Thanks.  Forgive me if this was covered earlier in the conversation, but
>> why do we need more than one protocol at all here?  Can we restrict this
>> to only-https, all the time?
>
> There was this in an earlier discussion about this:
> https://public-inbox.org/git/877eds5fpl.fsf@evledraar.gmail.com/
>
> It seems arbitrary to break it for new features if we support http in
> general, especially with a design as it is now where the checksum of the
> pack is transmitted out-of-band.

Thanks for the pointer.  TLS provides privacy, too, but I can see why
in today's world it might not always be easy to set it up, and given
that we have integrity protection via that checksum, I can see why
some people might have a legitimate need for using plain "http" here.

We may also want to support packfile-uris using SSH protocol in the
future.  Might as well figure out how the protocol negotiation works
now.  So let's delve more into it:

Peff mentioned that it feels backwards for the client to specify what
protocols they support in the request, instead of the server
specifying them upfront in the capability advertisement.  I'm inclined
to agree: it's probably reasonable to put this in server capabilities
instead.  That would even allow the client to do something like

	This server only supports HTTP without TLS, which you have
	indicated is a condition in which you want to be prompted.
	Proceed?

	[Use HTTP packfiles]  [Use slower but safer inline packs]

Peff also asked whether protocol scheme is the right granularity:
should the server list what domains they can serve packfiles from
instead?  In other words, once you're doing it for protocol schemes,
why not do it for whole URIs too?  I'm grateful for the question since
it's a way to probe at design assumptions.

- protocol schemes are likely to be low in number because each has its
  own code path to handle it.  By comparison, domains or URIs may be
  too numerous to be something we want to jam into the capability
  advertisement.  (Or the server operator could always use the same
  domain as the Git repo, and then use a 302 to redirect to the CDN.
  I suspect this is likely to be a common setup anyway: it allows the
  Git server to generate a short-lived signed URL that it uses as the
  target of a 302.  But in this case, what is the point of a domain
  whitelist?)

- relatedly, because the list of protocol schemes is small, it is
  feasible to test client behavior with each subset of protocol
  schemes enabled.  Finer-grained filtering would mean more esoteric
  client configurations for server operators to support and debug.

- supported protocol schemes do not vary per request.  The actual
  packfile URI is dynamic and varies per request.

- separately from questions of preference or security policy,
  clients may have support for a limited subset of protocol schemes.
  For example, imagine a stripped-down client without SSH support.
  So we need a way to agree about this capability anyway.

So I suspect that, at least to start, protocol scheme negotiation
should be enough and we don't need full URI negotiation.
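
If the schemes did move into the server's capability advertisement, the
client side could be as simple as this (the capability value format is
invented for illustration):

    def usable_schemes(server_caps, client_schemes=frozenset({"https"})):
        # server_caps: parsed advertisement, e.g.
        # {"packfile-uris": "http,https"}.
        adv = server_caps.get("packfile-uris")
        if not adv:
            return frozenset()
        return frozenset(adv.split(",")) & client_schemes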

There are a few escape valves:

- affected clients can complain to the server operator, who will then
  reconfigure the server to use more appropriate packfile URIs

- if there is a need for different clients to use different packfile
  URIs, clients can pass a flag, using --server-option, to the server
  to help it choose.

- a client can disable support for packfile URIs on a particular
  request and fall back to inline packs.

- if and when an affected client materializes, they can help us
  improve the protocol to handle their needs.

Sensible?

Thanks,
Jonathan
Jonathan Nieder April 23, 2019, 10:51 p.m. UTC | #9
Ævar Arnfjörð Bjarmason wrote:
> On Wed, Apr 24 2019, Jonathan Nieder wrote:

>> Do you mean this for when a pack is self-contained and contains all
>> objects reachable from those "tip" commits?
>>
>> What would you do when a pack is not self-contained in that way?
>
> Indeed, it had been a while since I read the first version of this. I
> was assuming a "base pack" use-case, but it seems it's narrowly isolated
> to just "N packs each containing one big blob", right?

The demo in this patch series covers the single isolated blob case.
The protocol supports the "base pack" use case but many others as
well:

* daily "catch-up fetch" packs
* "base pack without blobs"
* ... etc ...

Thanks,
Jonathan
Junio C Hamano April 24, 2019, 3:01 a.m. UTC | #10
Jeff King <peff@peff.net> writes:

> I.e., I think the conversation ought to be more like:
>
>   Server: I support packfile-uris X, Y, Z.
>
>   Client: Great. I'll use URIs X and Z.
>
>   Server: OK, here's your pack, minus any objects I know are in X and Z.
>           I'll send you the objects from Y as normal.

I agree with the overall direction, but I am afraid that the server
advertisement may have to become a lot more complex.

For example, are you assuming in the above example that X, Y and Z
are not overlapping?

Or perhaps X is the whole repository that is quick to access only
from European clients, while Y and Z combined have the same set of
objects that are meant for our Asian friends.  IOW, the server may
expect the client to say one of three things: "there is nothing I
can use then", "I am happy with X and won't use Y and Z", or "I'll
use Y and Z then, without X"; no other combination, like "I'll
use X and Z", would make sense.
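
One way to express that constraint would be to advertise the URIs in
groups that must be taken whole (an invented representation, just to
illustrate):

    offers = [
        {"https://eu.example.com/all.pack"},             # "X" alone
        {"https://asia1.example.com/a.pack",             # "Y" and "Z",
         "https://asia2.example.com/b.pack"},            # only as a pair
    ]

    def choose(offers, acceptable):
        # Pick the first group whose URIs all pass the client's vetting;
        # an empty result means "take the plain inline pack instead".
        for group in offers:
            if all(acceptable(uri) for uri in group):
                return group
        return set()

    print(choose(offers, lambda uri: "asia" in uri))  # the Y+Z pair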

> And then the client is free to pick and choose. The initial server uri
> list can come in the capabilities list, or it can be a separate request
> once the client sees the server supports packfile-uris and wants to ask
> about them. We may need some way for the server to group the uris so
> that the client knows which ones are alternates of each other (and which
> ones are needed to make a complete set).

I guess what I am saying is that it is not so clear how we can
present the server-offered URIs to the client in such a way that the
client (either mechanically, or by consulting a human) can make a
useful choice.
Ævar Arnfjörð Bjarmason April 24, 2019, 7:48 a.m. UTC | #11
On Wed, Apr 24 2019, Jonathan Nieder wrote:

> Ævar Arnfjörð Bjarmason wrote:
>> On Wed, Apr 24 2019, Jonathan Nieder wrote:
>>> Jeff King wrote:
>>>> On Fri, Mar 08, 2019 at 01:55:17PM -0800, Jonathan Tan wrote:
>
>>>>> +If the 'packfile-uris' feature is advertised, the following argument
>>>>> +can be included in the client's request as well as the potential
>>>>> +addition of the 'packfile-uris' section in the server's response as
>>>>> +explained below.
>>>>> +
>>>>> +    packfile-uris <comma-separated list of protocols>
>>>>> +	Indicates to the server that the client is willing to receive
>>>>> +	URIs of any of the given protocols in place of objects in the
>>>>> +	sent packfile. Before performing the connectivity check, the
>>>>> +	client should download from all given URIs. Currently, the
>>>>> +	protocols supported are "http" and "https".
>>>>
>>>> This negotiation seems backwards to me, because it puts too much power
>>>> in the hands of the server.
>>>
>>> Thanks.  Forgive me if this was covered earlier in the conversation, but
>>> why do we need more than one protocol at all here?  Can we restrict this
>>> to only-https, all the time?
>>
>> There was an earlier discussion about this:
>> https://public-inbox.org/git/877eds5fpl.fsf@evledraar.gmail.com/
>>
>> It seems arbitrary to break http for new features if we support it in
>> general, especially with the design as it is now, where the checksum of
>> the pack is transmitted out-of-band.
>
> Thanks for the pointer.  TLS provides privacy, too, but I can see why
> in today's world it might not always be easy to set it up, and given
> that we have integrity protection via that checksum, I can see why
> some people might have a legitimate need for using plain "http" here.
>
> We may also want to support packfile-uris using SSH protocol in the
> future.  Might as well figure out how the protocol negotiation works
> now.  So let's delve more into it:
>
> Peff mentioned that it feels backwards for the client to specify what
> protocols they support in the request, instead of the server
> specifying them upfront in the capability advertisement.  I'm inclined
> to agree: it's probably reasonable to put this in server capabilities
> instead.  That would even allow the client to do something like
>
> 	This server only supports HTTP without TLS, which you have
> 	indicated is a condition in which you want to be prompted.
> 	Proceed?
>
> 	[Use HTTP packfiles]  [Use slower but safer inline packs]
>
> Peff also asked whether protocol scheme is the right granularity:
> should the server list what domains they can serve packfiles from
> instead?  In other words, once you're doing it for protocol schemes,
> why not do it for whole URIs too?  I'm grateful for the question since
> it's a way to probe at design assumptions.
>
> - protocol schemes are likely to be low in number because each has its
>   own code path to handle it.  By comparison, domains or URIs may be
>   too numerous to be something we want to jam into the capability
>   advertisement.  (Or the server operator could always use the same
>   domain as the Git repo, and then use a 302 to redirect to the CDN.
>   I suspect this is likely to be a common setup anyway: it allows the
>   Git server to generate a short-lived signed URL that it uses as the
>   target of a 302.  But in this case, what is the point of a domain
>   whitelist?)
>
> - relatedly, because the list of protocol schemes is small, it is
>   feasible to test client behavior with each subset of protocol
>   schemes enabled.  Finer-grained filtering would mean more esoteric
>   client configurations for server operators to support and debug.
>
> - supported protocol schemes do not vary per request.  The actual
>   packfile URI is dynamic and varies per request.
>
> - separately from questions of preference or security policy,
>   clients may have support for a limited subset of protocol schemes.
>   For example, imagine a stripped-down client without SSH support.
>   So we need a way to agree about this capability anyway.
>
> So I suspect that, at least to start, protocol scheme negotiation
> should be enough and we don't need full URI negotiation.
>
> There are a few escape valves:
>
> - affected clients can complain to the server operator, who will then
>   reconfigure the server to use more appropriate packfile URIs
>
> - if there is a need for different clients to use different packfile
>   URIs, clients can pass a flag, using --server-option, to the server
>   to help it choose.
>
> - a client can disable support for packfile URIs on a particular
>   request and fall back to inline packs.
>
> - if and when an affected client materializes, they can help us
>   improve the protocol to handle their needs.
>
> Sensible?

Food for thought: would we consider ssh->https a "downgrade"? I think
"maybe". We're going from whatever custom setting the user has
(e.g. manually approving new hosts) to the CA system.

But I think it would be fine to just whitelist ssh->https and put
everything else behind a very scary config option or something; we could
always flesh out the semantics of upgrade/downgrade/switching later, and
it would IMO suck less than outright banning a protocol we otherwise
support in the design, one which (unlike git://) people are still
finding uses for in the wild for non-legacy reasons.
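
Concretely, something like this invented policy table (none of it is
existing git behavior):

    # Which (negotiation transport -> packfile URI scheme) switches are
    # allowed silently; anything else needs an explicit, deliberately
    # scary config override.
    SAFE_SWITCHES = {
        ("ssh", "https"),    # CA-backed, so arguably acceptable
        ("https", "https"),
        ("http", "http"),
    }

    def switch_allowed(transport, uri_scheme, override=False):
        return override or (transport, uri_scheme) in SAFE_SWITCHES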

Patch

diff --git a/Documentation/technical/packfile-uri.txt b/Documentation/technical/packfile-uri.txt
new file mode 100644
index 0000000000..6a5a6440d5
--- /dev/null
+++ b/Documentation/technical/packfile-uri.txt
@@ -0,0 +1,78 @@ 
+Packfile URIs
+=============
+
+This feature allows servers to serve part of their packfile response as URIs.
+This allows server designs that improve scalability in bandwidth and CPU usage
+(for example, by serving some data through a CDN), and (in the future) provides
+some measure of resumability to clients.
+
+This feature is available only in protocol version 2.
+
+Protocol
+--------
+
+The server advertises `packfile-uris`.
+
+If the client then communicates which protocols (HTTPS, etc.) it supports with
+a `packfile-uris` argument, the server MAY send a `packfile-uris` section
+directly before the `packfile` section (right after `wanted-refs` if it is
+sent) containing URIs of any of the given protocols. The URIs point to
+packfiles that use only features that the client has declared that it supports
+(e.g. ofs-delta and thin-pack). See protocol-v2.txt for the documentation of
+this section.
+
+Clients should then understand that the returned packfile could be incomplete,
+and that they need to download all the given URIs before the fetch or clone is
+complete.
+
+Server design
+-------------
+
+The server can be trivially made compatible with the proposed protocol by
+having it advertise `packfile-uris`, tolerating the client sending
+`packfile-uris`, and never sending any `packfile-uris` section. But we should
+include some sort of non-trivial implementation in the Minimum Viable Product,
+at least so that we can test the client.
+
+This is the implementation: a feature, marked experimental, that allows the
+server to be configured by one or more `uploadpack.blobPackfileUri=<sha1>
+<uri>` entries. Whenever the list of objects to be sent is assembled, a blob
+with the given sha1 can be replaced by the given URI. This allows, for example,
+servers to delegate serving of large blobs to CDNs.
+
+Client design
+-------------
+
+While fetching, the client needs to remember the list of URIs and cannot
+declare that the fetch is complete until all URIs have been downloaded as
+packfiles.
+
+The division of work (initial fetch + additional URIs) introduces convenient
+points for resumption of an interrupted clone - such resumption can be done
+after the Minimum Viable Product (see "Future work").
+
+The client can inhibit this feature (i.e. refrain from sending the
+`packfile-uris` parameter) by passing --no-packfile-uris to `git fetch`.
+
+Future work
+-----------
+
+The protocol design allows some evolution of the server and client without any
+need for protocol changes, so only a small-scoped design is included here to
+form the MVP. For example, the following can be done:
+
+ * On the server, a long-running process that takes in entire requests and
+   outputs a list of URIs and the corresponding inclusion and exclusion sets of
+   objects. This allows, e.g., signed URIs to be used and packfiles for common
+   requests to be cached.
+ * On the client, resumption of clone. If a clone is interrupted, information
+   could be recorded in the repository's config and a "clone-resume" command
+   can resume the clone in progress. (Resumption of subsequent fetches is more
+   difficult because it must deal with the user wanting to use the repository
+   even after the fetch was interrupted.)
+
+There are some possible features that will require a change in protocol:
+
+ * Additional HTTP headers (e.g. authentication)
+ * Byte range support
+ * Different file formats referenced by URIs (e.g. raw object)
diff --git a/Documentation/technical/protocol-v2.txt b/Documentation/technical/protocol-v2.txt
index 36239ec7e9..7b63c26ecd 100644
--- a/Documentation/technical/protocol-v2.txt
+++ b/Documentation/technical/protocol-v2.txt
@@ -323,13 +323,26 @@  included in the client's request:
 	indicating its sideband (1, 2, or 3), and the server may send "0005\2"
 	(a PKT-LINE of sideband 2 with no payload) as a keepalive packet.
 
+If the 'packfile-uris' feature is advertised, the following argument
+can be included in the client's request as well as the potential
+addition of the 'packfile-uris' section in the server's response as
+explained below.
+
+    packfile-uris <comma-separated list of protocols>
+	Indicates to the server that the client is willing to receive
+	URIs of any of the given protocols in place of objects in the
+	sent packfile. Before performing the connectivity check, the
+	client should download from all given URIs. Currently, the
+	protocols supported are "http" and "https".
+
 The response of `fetch` is broken into a number of sections separated by
 delimiter packets (0001), with each section beginning with its section
 header. Most sections are sent only when the packfile is sent.
 
     output = acknowledgements flush-pkt |
 	     [acknowledgments delim-pkt] [shallow-info delim-pkt]
-	     [wanted-refs delim-pkt] packfile flush-pkt
+	     [wanted-refs delim-pkt] [packfile-uris delim-pkt]
+	     packfile flush-pkt
 
     acknowledgments = PKT-LINE("acknowledgments" LF)
 		      (nak | *ack)
@@ -347,6 +360,9 @@  header. Most sections are sent only when the packfile is sent.
 		  *PKT-LINE(wanted-ref LF)
     wanted-ref = obj-id SP refname
 
+    packfile-uris = PKT-LINE("packfile-uris" LF) *packfile-uri
+    packfile-uri = PKT-LINE(40*(HEXDIGIT) SP *%x20-ff LF)
+
     packfile = PKT-LINE("packfile" LF)
 	       *PKT-LINE(%x01-03 *%x00-ff)
 
@@ -418,6 +434,16 @@  header. Most sections are sent only when the packfile is sent.
 	* The server MUST NOT send any refs which were not requested
 	  using 'want-ref' lines.
 
+    packfile-uris section
+	* This section is only included if the client sent
+	  'packfile-uris' and the server has at least one such URI to
+	  send.
+
+	* Always begins with the section header "packfile-uris".
+
+	* For each URI the server sends, it sends a hash of the pack's
+	  contents (as output by git index-pack) followed by the URI.
+
     packfile section
 	* This section is only included if the client has sent 'want'
 	  lines in its request and either requested that no more