Message ID | 20180906193516.28909-1-max@max630.net (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | http-backend: allow empty CONTENT_LENGTH | expand |
Max Kirillov <max@max630.net> writes: > According to RFC3875, empty environment variable is equivalent to unset, > and for CONTENT_LENGTH it should mean zero body to read. > > However, as discussed in [1], unset CONTENT_LENGTH is also used for > chunked encoding to indicate reading until EOF, so keep this behavior also > for empty CONTENT_LENGTH. Makes sense. > > Add a test for the case. > > [1] https://public-inbox.org/git/20160329201349.GB9527@sigill.intra.peff.net/ > > Signed-off-by: Max Kirillov <max@max630.net> > --- > Hi. > > This should fix it. I'm not sure should it treat it as 0 or "-1" > At least the tests mentioned by Jeff fails if I try to treat missing CONTENT_LENGTH as "-1" > So keep the existing behavior as much as possible I am not sure what you mean by the above, between 0 and -1. The code signals the caller of get_content_length() that req_len is -1 which is used as a sign to read through to the EOF, so it appears to me that the code treats missing content-length (i.e. str == NULL case) as "-1". > http-backend.c | 2 +- > t/t5562-http-backend-content-length.sh | 11 +++++++++++ > 2 files changed, 12 insertions(+), 1 deletion(-) > > diff --git a/http-backend.c b/http-backend.c > index e88d29f62b..a1230d7ead 100644 > --- a/http-backend.c > +++ b/http-backend.c > @@ -353,7 +353,7 @@ static ssize_t get_content_length(void) > ssize_t val = -1; > const char *str = getenv("CONTENT_LENGTH"); > > - if (str && !git_parse_ssize_t(str, &val)) > + if (str && *str && !git_parse_ssize_t(str, &val)) > die("failed to parse CONTENT_LENGTH: %s", str); > return val; > } > diff --git a/t/t5562-http-backend-content-length.sh b/t/t5562-http-backend-content-length.sh > index 057dcb85d6..ca34c2f054 100755 > --- a/t/t5562-http-backend-content-length.sh > +++ b/t/t5562-http-backend-content-length.sh > @@ -152,4 +152,15 @@ test_expect_success 'CONTENT_LENGTH overflow ssite_t' ' > grep "fatal:.*CONTENT_LENGTH" err > ' > > +test_expect_success 'empty CONTENT_LENGTH' ' > + env \ > + QUERY_STRING=/repo.git/HEAD \ > + PATH_TRANSLATED="$PWD"/.git/HEAD \ > + GIT_HTTP_EXPORT_ALL=TRUE \ > + REQUEST_METHOD=GET \ > + CONTENT_LENGTH="" \ > + git http-backend <empty_body >act.out 2>act.err && > + verify_http_result "200 OK" > +' > + > test_done
Hi, Max Kirillov wrote: > According to RFC3875, empty environment variable is equivalent to unset, > and for CONTENT_LENGTH it should mean zero body to read. > > However, as discussed in [1], unset CONTENT_LENGTH is also used for > chunked encoding to indicate reading until EOF, so keep this behavior also > for empty CONTENT_LENGTH. > > Add a test for the case. > > [1] https://public-inbox.org/git/20160329201349.GB9527@sigill.intra.peff.net/ > > Signed-off-by: Max Kirillov <max@max630.net> Reported-by: Jelmer Vernooij <jelmer@jelmer.uk> Thanks for fixing it. Can you include a summary of [1] instead of relying on the mailing list archive? Perhaps just omiting "as discussed in [1]" would do the trick. Alternatively, if there's a point from that discussion that's relevant to the change, please include it here. That way, people finding this change later can save some time by avoiding having to dig through that mailing list thread. For example, it's probably worth mentioning that this was discovered using dulwich's test suite. [...] > This should fix it. I'm not sure should it treat it as 0 or "-1" > At least the tests mentioned by Jeff fails if I try to treat missing CONTENT_LENGTH as "-1" > So keep the existing behavior as much as possible That sounds worth figuring out so we can understand and possibly document it better. What are the ramifications of this choice --- what would work / not work with each choice? [...] > --- a/t/t5562-http-backend-content-length.sh > +++ b/t/t5562-http-backend-content-length.sh > @@ -152,4 +152,15 @@ test_expect_success 'CONTENT_LENGTH overflow ssite_t' ' Yay, thanks for this as well. Sincerely, Jonathan
On Thu, Sep 06, 2018 at 02:54:18PM -0700, Junio C Hamano wrote: > Max Kirillov <max@max630.net> writes: >> This should fix it. I'm not sure should it treat it as 0 or "-1" >> At least the tests mentioned by Jeff fails if I try to treat missing CONTENT_LENGTH as "-1" >> So keep the existing behavior as much as possible > > I am not sure what you mean by the above, between 0 and -1. The > code signals the caller of get_content_length() that req_len is -1 > which is used as a sign to read through to the EOF, so it appears to > me that the code treats missing content-length (i.e. str == NULL > case) as "-1". I made a mistake in this, it should be "if I try to treat missing CONTENT_LENGTH as 0". This, as far as I understand, what the RFC specifies. That is, after the following change, the test "large fetch-pack requests can be split across POSTs" from t5551 starts faliing: -- >8 -- @@ -353,8 +353,12 @@ static ssize_t get_content_length(void) ssize_t val = -1; const char *str = getenv("CONTENT_LENGTH"); - if (str && *str && !git_parse_ssize_t(str, &val)) - die("failed to parse CONTENT_LENGTH: %s", str); + if (str && *str) { + if (!git_parse_ssize_t(str, &val)) + die("failed to parse CONTENT_LENGTH: %s", str); + } else + val = 0; + return val; }
On Fri, Sep 07, 2018 at 06:27:40AM +0300, Max Kirillov wrote: > On Thu, Sep 06, 2018 at 02:54:18PM -0700, Junio C Hamano wrote: > > Max Kirillov <max@max630.net> writes: > >> This should fix it. I'm not sure should it treat it as 0 or "-1" > >> At least the tests mentioned by Jeff fails if I try to treat missing CONTENT_LENGTH as "-1" > >> So keep the existing behavior as much as possible > > > > I am not sure what you mean by the above, between 0 and -1. The > > code signals the caller of get_content_length() that req_len is -1 > > which is used as a sign to read through to the EOF, so it appears to > > me that the code treats missing content-length (i.e. str == NULL > > case) as "-1". > > I made a mistake in this, it should be "if I try to treat missing > CONTENT_LENGTH as 0". This, as far as I understand, what the > RFC specifies. > > That is, after the following change, the test "large fetch-pack > requests can be split across POSTs" from t5551 starts faliing: > > -- >8 -- > @@ -353,8 +353,12 @@ static ssize_t get_content_length(void) > ssize_t val = -1; > const char *str = getenv("CONTENT_LENGTH"); > > - if (str && *str && !git_parse_ssize_t(str, &val)) > - die("failed to parse CONTENT_LENGTH: %s", str); > + if (str && *str) { > + if (!git_parse_ssize_t(str, &val)) > + die("failed to parse CONTENT_LENGTH: %s", str); > + } else > + val = 0; > + Right, I'm pretty sure it is a problem if you treat a missing CONTENT_LENGTH as "present, but zero". Because chunked encodings from apache really do want us to read until EOF. My understanding from Jelmer's report is that a present-but-empty variable should be counted as "0" to mean "do not read any body bytes". That matches my reading of RFC 3875, which says: If no data is attached, then NULL (or unset). (and earlier they explicitly define NULL as the empty string). That said, we do not do what they say for the "unset" case. And cannot without breaking chunked encoding from apache. So I don't know how much we want to follow that rfc to the letter, but at least it makes sense to me to revert this case back to what Git used to do, and what the rfc says. In other words, I think the logic we want is: if (!str) { /* * RFC3875 says this must mean "no body", but in practice we * receive chunked encodings with no CONTENT_LENGTH. Tell the * caller to read until EOF. */ val = -1; } else if (!*str) { /* * An empty length should be treated as "no body" according to * RFC3875, and this seems to hold in practice. */ val = 0; } else { /* * We have a CONTENT_LENGTH; trust what's in it as long as it * can be parsed. */ if (!git_parse_ssize_t(str, &val)) die(...); } -Peff
On Thu, Sep 06, 2018 at 11:38:31PM -0400, Jeff King wrote: > My understanding from Jelmer's report is that a present-but-empty > variable should be counted as "0" to mean "do not read any body bytes". > That matches my reading of RFC 3875, which says: > > If no data is attached, then NULL (or unset). > > (and earlier they explicitly define NULL as the empty string). That > said, we do not do what they say for the "unset" case. And cannot > without breaking chunked encoding from apache. So I don't know how much > we want to follow that rfc to the letter, but at least it makes sense to > me to revert this case back to what Git used to do, and what the rfc > says. I could find this discussion about it: https://lists.gt.net/apache/users/373042 Basically, it says the CGI RFC was written before chunked encoding appeared, so implementations should choose between caching all boody before calling script, or breaking the spec some way. So apache does it so. (I wonder how IIS would handle it) > In other words, I think the logic we want is: > > if (!str) { > /* > * RFC3875 says this must mean "no body", but in practice we > * receive chunked encodings with no CONTENT_LENGTH. Tell the > * caller to read until EOF. > */ > val = -1; > } else if (!*str) { > /* > * An empty length should be treated as "no body" according to > * RFC3875, and this seems to hold in practice. > */ > val = 0; > } else { > /* > * We have a CONTENT_LENGTH; trust what's in it as long as it > * can be parsed. > */ > if (!git_parse_ssize_t(str, &val)) > die(...); > } I feel reluctant to treat empty and unset differently, but probably this is the only thing which could be done. I'll resumbmit some time later.
Actually, another reason for the latest issue was that CONTENT_LENGTH is parsed for GET requests at all. It should be parsed only for POST requests, or, rather, only for upoad-pack and receive-pack requests.
Max Kirillov <max@max630.net> writes: > Actually, another reason for the latest issue was that CONTENT_LENGTH > is parsed for GET requests at all. It should be parsed only for POST > requests, or, rather, only for upoad-pack and receive-pack requests. Not really. The layered design of the HTTP protocol means that any request type can have non-empty body, but request types for which no semantics of the body is defined must ignore what is in the body, which in turn means we need to parse and pay attention to the content length etc. to find the end of the body, if only to ignore it. In any case, hopefully we can fix this before the final, as this is a regression introduced during this cycle?
On Fri, Sep 07, 2018 at 02:49:23AM -0700, Junio C Hamano wrote: > Max Kirillov <max@max630.net> writes: > >> Actually, another reason for the latest issue was that CONTENT_LENGTH >> is parsed for GET requests at all. It should be parsed only for POST >> requests, or, rather, only for upoad-pack and receive-pack requests. > > Not really. The layered design of the HTTP protocol means that any > request type can have non-empty body, but request types for which > no semantics of the body is defined must ignore what is in the body, > which in turn means we need to parse and pay attention to the > content length etc. to find the end of the body, if only to ignore > it. I don't think it is git's job to police web server implementations, especially considering that there is a gap between letter of RFC and actual behavior. Anyway, it only runs the check for "*/info/refs" GET request, which ends up in get_info_refs(). Other GET requests do not check CONTENT_LENGTH. Also, the version of service which is started from get_info_refs() do not consume input (I think, actually, the "--stateless-rpc" argument is not needed there). > In any case, hopefully we can fix this before the final, as this is > a regression introduced during this cycle? Yes, I'm working on it.
On Fri, Sep 07, 2018 at 02:49:23AM -0700, Junio C Hamano wrote: > In any case, hopefully we can fix this before the final, as this is > a regression introduced during this cycle? I think I am going to stop at the v4. Unless there are some corrections requested.
diff --git a/http-backend.c b/http-backend.c index e88d29f62b..a1230d7ead 100644 --- a/http-backend.c +++ b/http-backend.c @@ -353,7 +353,7 @@ static ssize_t get_content_length(void) ssize_t val = -1; const char *str = getenv("CONTENT_LENGTH"); - if (str && !git_parse_ssize_t(str, &val)) + if (str && *str && !git_parse_ssize_t(str, &val)) die("failed to parse CONTENT_LENGTH: %s", str); return val; } diff --git a/t/t5562-http-backend-content-length.sh b/t/t5562-http-backend-content-length.sh index 057dcb85d6..ca34c2f054 100755 --- a/t/t5562-http-backend-content-length.sh +++ b/t/t5562-http-backend-content-length.sh @@ -152,4 +152,15 @@ test_expect_success 'CONTENT_LENGTH overflow ssite_t' ' grep "fatal:.*CONTENT_LENGTH" err ' +test_expect_success 'empty CONTENT_LENGTH' ' + env \ + QUERY_STRING=/repo.git/HEAD \ + PATH_TRANSLATED="$PWD"/.git/HEAD \ + GIT_HTTP_EXPORT_ALL=TRUE \ + REQUEST_METHOD=GET \ + CONTENT_LENGTH="" \ + git http-backend <empty_body >act.out 2>act.err && + verify_http_result "200 OK" +' + test_done
According to RFC3875, empty environment variable is equivalent to unset, and for CONTENT_LENGTH it should mean zero body to read. However, as discussed in [1], unset CONTENT_LENGTH is also used for chunked encoding to indicate reading until EOF, so keep this behavior also for empty CONTENT_LENGTH. Add a test for the case. [1] https://public-inbox.org/git/20160329201349.GB9527@sigill.intra.peff.net/ Signed-off-by: Max Kirillov <max@max630.net> --- Hi. This should fix it. I'm not sure should it treat it as 0 or "-1" At least the tests mentioned by Jeff fails if I try to treat missing CONTENT_LENGTH as "-1" So keep the existing behavior as much as possible http-backend.c | 2 +- t/t5562-http-backend-content-length.sh | 11 +++++++++++ 2 files changed, 12 insertions(+), 1 deletion(-)