[v2,0/4] Add proto v2 archive command with HTTP support

Message ID 20180927012455.234876-1-steadmon@google.com (mailing list archive)

Message

Josh Steadmon Sept. 27, 2018, 1:24 a.m. UTC
This is the second version of my series to add a new protocol v2 command
for archiving, with support for HTTP(S).

NEEDSWORK: a server built with this series is not backwards-compatible
with clients that set GIT_PROTOCOL=version=2 or configure
protocol.version=2. The old client will unconditionally send "argument
..." packet lines, which breaks the server's expectations of a
"command=archive" request, while the server's capability advertisement
in turn breaks the client's expectation of either an ACK or NACK.

I've been discussing workarounds for this with Jonathan Nieder, but
please let me know if you have any suggestions for v3 of this series.
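
To make the incompatibility concrete, here is a minimal sketch of the pkt-line framing both sides rely on: each line carries a four-digit hexadecimal length that counts the prefix itself. The helper name pkt_line_format is hypothetical and is not git's pkt-line.c API, and "HEAD" is only an example argument.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Largest payload a single pkt-line may carry (65520 minus the 4-byte prefix). */
#define PKT_MAX_PAYLOAD 65516

/*
 * Hypothetical helper (not git's API): frame "payload" as one pkt-line
 * in "out". Returns the framed length, or -1 if the payload is too long
 * or does not fit in "out" together with a terminating NUL.
 */
int pkt_line_format(char *out, size_t outsz, const char *payload)
{
	size_t len;

	assert(out != NULL && payload != NULL);
	if (strlen(payload) > PKT_MAX_PAYLOAD)
		return -1;
	len = strlen(payload) + 4;	/* the length prefix counts itself */
	if (len + 1 > outsz)
		return -1;
	snprintf(out, outsz, "%04zx%s", len, payload);
	return (int)len;
}
```

With this framing, an old client's first line would go out as "0012argument HEAD\n", where a v2 server is waiting for something like "0014command=archive\n".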


Josh Steadmon (4):
  archive: follow test standards around assertions
  archive: use packet_reader for communications
  archive: implement protocol v2 archive command
  archive: allow archive over HTTP(S) with proto v2

 Documentation/technical/protocol-v2.txt | 21 ++++++++-
 builtin/archive.c                       | 58 +++++++++++++++++++------
 builtin/upload-archive.c                | 27 ++++++++++--
 http-backend.c                          | 13 +++++-
 serve.c                                 |  7 +++
 t/t5000-tar-tree.sh                     | 33 +++++++-------
 t/t5701-git-serve.sh                    |  1 +
 transport-helper.c                      |  7 +--
 8 files changed, 130 insertions(+), 37 deletions(-)

Range-diff against v1:
-:  ---------- > 1:  c2e371ad24 archive: follow test standards around assertions
1:  b514184273 ! 2:  a65f73f627 archive: use packet_reader for communications
    @@ -6,7 +6,10 @@
         handling, which will make implementation of protocol v2 support in
         git-archive easier.
     
    +    This refactoring does not change the behavior of "git archive".
    +
         Signed-off-by: Josh Steadmon <steadmon@google.com>
    +    Reviewed-by: Stefan Beller <sbeller@google.com>
     
      diff --git a/builtin/archive.c b/builtin/archive.c
    @@ -42,24 +45,24 @@
     -	if (!buf)
     +	status = packet_reader_read(&reader);
     +
    -+	if (status == PACKET_READ_FLUSH)
    ++	if (status != PACKET_READ_NORMAL || reader.pktlen <= 0)
      		die(_("git archive: expected ACK/NAK, got a flush packet"));
     -	if (strcmp(buf, "ACK")) {
     -		if (starts_with(buf, "NACK "))
     -			die(_("git archive: NACK %s"), buf + 5);
     -		if (starts_with(buf, "ERR "))
     -			die(_("remote error: %s"), buf + 4);
    -+	if (strcmp(reader.buffer, "ACK")) {
    -+		if (starts_with(reader.buffer, "NACK "))
    -+			die(_("git archive: NACK %s"), reader.buffer + 5);
    -+		if (starts_with(reader.buffer, "ERR "))
    -+			die(_("remote error: %s"), reader.buffer + 4);
    ++	if (strcmp(reader.line, "ACK")) {
    ++		if (starts_with(reader.line, "NACK "))
    ++			die(_("git archive: NACK %s"), reader.line + 5);
    ++		if (starts_with(reader.line, "ERR "))
    ++			die(_("remote error: %s"), reader.line + 4);
      		die(_("git archive: protocol error"));
      	}
      
     -	if (packet_read_line(fd[0], NULL))
     +	status = packet_reader_read(&reader);
    -+	if (status != PACKET_READ_FLUSH)
    ++	if (status == PACKET_READ_NORMAL && reader.pktlen > 0)
      		die(_("git archive: expected a flush"));
      
      	/* Now, start reading from fd[0] and spit it out to stdout */
2:  1518c15dc1 < -:  ---------- archive: implement protocol v2 archive command
-:  ---------- > 3:  0a8cc5e331 archive: implement protocol v2 archive command
3:  1b7ad8d8f6 ! 4:  97a1424f32 archive: allow archive over HTTP(S) with proto v2
    @@ -10,16 +10,20 @@
      +++ b/builtin/archive.c
     @@
      		status = packet_reader_read(&reader);
    - 		if (status != PACKET_READ_FLUSH)
    + 		if (status == PACKET_READ_NORMAL && reader.pktlen > 0)
      			die(_("git archive: expected a flush"));
     -	}
     +	} else if (version == protocol_v2 &&
    -+		   starts_with(transport->url, "http"))
    ++		   (starts_with(transport->url, "http://") ||
    ++		    starts_with(transport->url, "https://")))
     +		/*
     +		 * Commands over HTTP require two requests, so there's an
    -+		 * additional server response to parse.
    ++		 * additional server response to parse. We do only basic sanity
    ++		 * checking here that the versions presented match across
    ++		 * requests.
     +		 */
    -+		discover_version(&reader);
    ++		if (version != discover_version(&reader))
    ++			die(_("git archive: received different protocol versions in subsequent requests"));
      
      	/* Now, start reading from fd[0] and spit it out to stdout */
      	rv = recv_sideband("archive", fd[0], 1);
    @@ -40,7 +44,10 @@
      	struct strbuf buf = STRBUF_INIT;
      
     +	if (!strcmp(service_name, "git-upload-archive")) {
    -+		/* git-upload-archive doesn't need --stateless-rpc */
    ++		/*
    ++		 * git-upload-archive doesn't need --stateless-rpc, because it
    ++		 * always handles only a single request.
    ++		 */
     +		argv[1] = ".";
     +		argv[2] = NULL;
     +	}
    @@ -63,10 +70,12 @@
      --- a/transport-helper.c
      +++ b/transport-helper.c
     @@
    + 		strbuf_addf(&cmdbuf, "connect %s\n", name);
      		ret = run_connect(transport, &cmdbuf);
      	} else if (data->stateless_connect &&
    - 		   (get_protocol_version_config() == protocol_v2) &&
    +-		   (get_protocol_version_config() == protocol_v2) &&
     -		   !strcmp("git-upload-pack", name)) {
    ++		   get_protocol_version_config() == protocol_v2 &&
     +		   (!strcmp("git-upload-pack", name) ||
     +		    !strcmp("git-upload-archive", name))) {
      		strbuf_addf(&cmdbuf, "stateless-connect %s\n", name);

Comments

Stefan Beller Sept. 27, 2018, 6:20 p.m. UTC | #1
On Wed, Sep 26, 2018 at 6:25 PM Josh Steadmon <steadmon@google.com> wrote:
>
> This is the second version of my series to add a new protocol v2 command
> for archiving, with support for HTTP(S).
>
> NEEDSWORK: a server built with this series is not backwards-compatible
> with clients that set GIT_PROTOCOL=version=2 or configure
> protocol.version=2. The old client will unconditionally send "argument
> ..." packet lines, which breaks the server's expectations of a
> "command=archive" request,

So if an old client sets protocol to v2, it would only apply that
protocol version to fetch, not archive, so it would start following a
v0 conversation, but as the protocol version is set, it would be
transmitted to the server. This sounds like a bug in the client?

>  while the server's capability advertisement
> in turn breaks the client's expectation of either an ACK or NACK.

Could a modern client send either another protocol version (3?)
or a special capability along the protocol version ("fixed_archive")?

> I've been discussing workarounds for this with Jonathan Nieder, but
> please let me know if you have any suggestions for v3 of this series.

Care to open the discussion to the list? What are the different
approaches, what are the pros/cons?

Jonathan Nieder Sept. 27, 2018, 6:30 p.m. UTC | #2
Stefan Beller wrote:
> On Wed, Sep 26, 2018 at 6:25 PM Josh Steadmon <steadmon@google.com> wrote:

>> I've been discussing workarounds for this with Jonathan Nieder, but
>> please let me know if you have any suggestions for v3 of this series.
>
> Care to open the discussion to the list? What are the different
> approaches, what are the pros/cons?

Do you mean sending video of chatting in the office?

Josh and I discussed that

 1. Clients sending version=2 when they do not, in fact, speak protocol
    v2 for a service is a (serious) bug.  (Separately from this
    series) we should fix it.

 2. That bug is already in the wild, alas.  Fortunately the semantics of
    GIT_PROTOCOL as a list of key/value pairs is well defined.  So we
    have choices of (a) bump version to version=3 (b) pass another
    value 'version=2:yesreallyversion=2' (c) etc.

 3. This is likely to affect push, too.

Thanks and hope that helps,
Jonathan

Josh Steadmon Sept. 27, 2018, 6:30 p.m. UTC | #3
On 2018.09.27 11:20, Stefan Beller wrote:
> On Wed, Sep 26, 2018 at 6:25 PM Josh Steadmon <steadmon@google.com> wrote:
> >
> > This is the second version of my series to add a new protocol v2 command
> > for archiving, with support for HTTP(S).
> >
> > NEEDSWORK: a server built with this series is not backwards-compatible
> > with clients that set GIT_PROTOCOL=version=2 or configure
> > protocol.version=2. The old client will unconditionally send "argument
> > ..." packet lines, which breaks the server's expectations of a
> > "command=archive" request,
> 
> So if an old client sets protocol to v2, it would only apply that
> protocol version to fetch, not archive, so it would start following a
> v0 conversation, but as the protocol version is set, it would be
> transmitted to the server. This sounds like a bug in the client?

Yeah, basically. We're telling the server we support v2, even if the
specific operation we're trying to do doesn't have a v2 implementation
on the client. So this is going to make it ugly to replace existing
commands.

> >  while the server's capability advertisement
> > in turn breaks the client's expectation of either an ACK or NACK.
> 
> Could a modern client send either another protocol version (3?)
> or a special capability along the protocol version ("fixed_archive")?
> 
> > I've been discussing workarounds for this with Jonathan Nieder, but
> > please let me know if you have any suggestions for v3 of this series.
> 
> Care to open the discussion to the list? What are the different
> approaches, what are the pros/cons?

Jonathan suggested something along the lines of what you said above,
adding a new field in GIT_PROTOCOL. So we'd send something like
"version=2:archive_version=2" and have the server detect the latter.
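
A rough sketch of what detecting such a field could look like on the server, assuming GIT_PROTOCOL is a colon-separated list of key=value pairs. The function name protocol_lookup is hypothetical, and archive_version is only the key floated in this thread, not an existing git field.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: look up "key" in a colon-separated list of
 * key=value pairs such as "version=2:archive_version=2".
 * Returns 1 and copies the value into "out" if the key is present,
 * or 0 if it is absent or the value does not fit in "out".
 */
int protocol_lookup(const char *list, const char *key,
		    char *out, size_t outsz)
{
	size_t keylen;
	const char *p = list;

	assert(list != NULL && key != NULL && out != NULL);
	keylen = strlen(key);
	while (*p) {
		const char *end = strchr(p, ':');
		size_t fieldlen = end ? (size_t)(end - p) : strlen(p);

		/* Match the whole key, so "version" does not match "archive_version". */
		if (fieldlen > keylen && !strncmp(p, key, keylen) &&
		    p[keylen] == '=') {
			size_t vallen = fieldlen - keylen - 1;

			if (vallen + 1 > outsz)
				return 0;
			memcpy(out, p + keylen + 1, vallen);
			out[vallen] = '\0';
			return 1;
		}
		if (!end)
			break;
		p = end + 1;
	}
	return 0;
}
```

An old client that only sends "version=2" would fail the archive_version lookup, which is the signal the server would need to fall back to the original conversation.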

I'm not sure if that's the best way to go about this since I'm not
familiar with the version detection code for other parts of the system.
I worry that it will lead us down the path of having to specify a
version for every command that we eventually convert to protocol v2. On
the other hand, I don't see any other way to work around this, at least
in the archive case. We can't peek at the client's transmissions on the
server, because v2 requires that the server speaks first...

Junio C Hamano Sept. 27, 2018, 10:20 p.m. UTC | #4
Jonathan Nieder <jrnieder@gmail.com> writes:

>  1. Clients sending version=2 when they do not, in fact, speak protocol
>     v2 for a service is a (serious) bug.  (Separately from this
>     series) we should fix it.
>
>  2. That bug is already in the wild, alas.  Fortunately the semantics of
>     GIT_PROTOCOL as a list of key/value pairs is well defined.  So we
>     have choices of (a) bump version to version=3 (b) pass another
>     value 'version=2:yesreallyversion=2' (c) etc.
>
>  3. This is likely to affect push, too.

Do you mean that existing "git push", "git fetch" and "git archive"
send version=2 even when they are not capable of speaking protocol
v2?  I thought that "git archive [--remote]" was left outside of the
protocol update (that was the reason why the earlier attempt took a
hacky route of "shallow clone followed by local archive"), so there
is no "git archive" in the wild that can even say "version=$n"
(which requires you to be at least version=1)?

Josh Steadmon Sept. 27, 2018, 10:33 p.m. UTC | #5
On 2018.09.27 15:20, Junio C Hamano wrote:
> Jonathan Nieder <jrnieder@gmail.com> writes:
> 
> >  1. Clients sending version=2 when they do not, in fact, speak protocol
> >     v2 for a service is a (serious) bug.  (Separately from this
> >     series) we should fix it.
> >
> >  2. That bug is already in the wild, alas.  Fortunately the semantics of
> >     GIT_PROTOCOL as a list of key/value pairs is well defined.  So we
> >     have choices of (a) bump version to version=3 (b) pass another
> >     value 'version=2:yesreallyversion=2' (c) etc.
> >
> >  3. This is likely to affect push, too.
> 
> Do you mean that existing "git push", "git fetch" and "git archive"
> send version=2 even when they are not capable of speaking protocol
> v2?  I thought that "git archive [--remote]" was left outside of the
> protocol update (that was the reason why the earlier attempt took a
> hacky route of "shallow clone followed by local archive"), so there
> is no "git archive" in the wild that can even say "version=$n"
> (which requires you to be at least version=1)?

Yes, the version on my desktop sends version=2 when archiving:

∫ which git
/usr/bin/git
∫ git --version
git version 2.19.0.605.g01d371f741-goog
∫ GIT_TRACE_PACKET=${HOME}/server_trace git daemon \
  --enable=upload-archive \
  --base-path=${HOME}/src/bare-repos &
[1] 258496
∫ git archive --remote git://localhost/test-repo.git HEAD >! test.tar
∫ grep version ~/server_trace
15:31:22.377869 pkt-line.c:80           packet:          git< git-upload-archive /test-repo.git\0host=localhost\0\0version=2\0
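
For illustration, the request line in the trace can be read as NUL-separated fields, with an empty field (the double NUL) introducing the extra parameters that carry version=2. The sketch below walks such a buffer; extra_param is a hypothetical helper, not git's daemon code.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: scan a request of the form
 *   "<service> <path>\0host=<host>\0\0<param>\0..."
 * and return the value of the first extra parameter that starts with
 * "prefix" (e.g. "version="), or NULL if there is none. Only fields
 * after the empty field are treated as extra parameters.
 */
const char *extra_param(const char *buf, size_t len, const char *prefix)
{
	size_t prefixlen;
	const char *p = buf;
	const char *end = buf + len;
	int in_extra = 0;

	assert(buf != NULL && prefix != NULL);
	prefixlen = strlen(prefix);
	while (p < end) {
		const char *nul = memchr(p, '\0', (size_t)(end - p));
		size_t fieldlen;

		if (!nul)
			break;		/* unterminated trailing field: ignore it */
		fieldlen = (size_t)(nul - p);
		if (!fieldlen)
			in_extra = 1;	/* the double NUL starts the extra parameters */
		else if (in_extra && fieldlen >= prefixlen &&
			 !strncmp(p, prefix, prefixlen))
			return p + prefixlen;
		p = nul + 1;
	}
	return NULL;
}
```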
Junio C Hamano Sept. 28, 2018, 1:25 a.m. UTC | #6
Josh Steadmon <steadmon@google.com> writes:

> Yes, the version on my desktop sends version=2 when archiving:
>
> ∫ which git
> /usr/bin/git
> ∫ git --version
> git version 2.19.0.605.g01d371f741-goog
> ∫ GIT_TRACE_PACKET=${HOME}/server_trace git daemon \
>   --enable=upload-archive \
>   --base-path=${HOME}/src/bare-repos &
> [1] 258496
> ∫ git archive --remote git://localhost/test-repo.git HEAD >! test.tar
> ∫ grep version ~/server_trace
> 15:31:22.377869 pkt-line.c:80           packet:          git< git-upload-archive /test-repo.git\0host=localhost\0\0version=2\0

Ah, that's truly broken.

Come to think of it, do we need to be using uniform versions across
different endpoints?  The archive request could be at v3 while fetch
request could still be at v2, in which case the design to use a
single protocol.version variable is probably the root cause of the
confusion?  Perhaps like protocol.<name>.allow, we would want
protocol.<name>.version or something like that (and no
protocol.version) to make it clear that protocol v2 used for
fetching has nothing to do with protocol v1 or v2 or v3 used for
archiving?
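
As a sketch only, a per-service lookup along these lines might look as follows. The struct, the function name, and the choice to fall back to version 0 when nothing is configured are all assumptions for illustration; no protocol.<name>.version setting exists in this series.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical: one configured entry per service, e.g. { "fetch", 2 }. */
struct service_version {
	const char *service;
	int version;
};

/*
 * Hypothetical lookup: return the version configured for "service",
 * or 0 (the original protocol) if that service has no entry.
 */
int service_protocol_version(const struct service_version *cfg, size_t n,
			     const char *service)
{
	size_t i;

	assert(service != NULL && (cfg != NULL || n == 0));
	for (i = 0; i < n; i++)
		if (!strcmp(cfg[i].service, service))
			return cfg[i].version;
	return 0;
}
```

With only fetch configured at version 2, an archive request would resolve to 0 and the client would have no reason to advertise version=2 for it.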

Luckily, protocol.version is still marked as experimental so it is
not too bad that we caught the design mistake (if it is one) and can
now correct it before the damage spreads too widely.