Message ID | 20240402200517.GA875182@coredump.intra.peff.net (mailing list archive) |
---|---|
State | Accepted |
Commit | 324231174247c70e66908b78924908fbfcebed19 |
Headers | show |
Series | git+curl 8.7.0 workaround | expand |
Jeff King <peff@peff.net> writes: > In get_active_slot(), we return a CURL handle that may have been used > before (reusing them is good because it lets curl reuse the same > connection across many requests). We set a few curl options back to > defaults that may have been modified by previous requests. > > We reset POSTFIELDS to NULL, but do not reset POSTFIELDSIZE (which > defaults to "-1"). This usually doesn't matter because most POSTs will > set both fields together anyway. But there is one exception: when > handling a large request in remote-curl's post_rpc(), we don't set > _either_, and instead set a READFUNCTION to stream data into libcurl. > > This can interact weirdly with a stale POSTFIELDSIZE setting, because > curl will assume it should read only some set number of bytes from our > READFUNCTION. However, it has worked in practice because we also > manually set a "Transfer-Encoding: chunked" header, which libcurl uses > as a clue to set the POSTFIELDSIZE to -1 itself. > > So everything works, but we're better off resetting the size manually > for a few reasons: > > - there was a regression in curl 8.7.0 where the chunked header > detection didn't kick in, causing any large HTTP requests made by > Git to fail. This has since been fixed (but not yet released). In > the issue, curl folks recommended setting it explicitly to -1: > > https://github.com/curl/curl/issues/13229#issuecomment-2029826058 > > and it indeed works around the regression. So even though it won't > be strictly necessary after the fix there, this will help folks who > end up using the affected libcurl versions. > > - it's consistent with what a new curl handle would look like. Since > get_active_slot() may or may not return a used handle, this reduces > the possibility of heisenbugs that only appear with certain request > patterns. > > Note that the recommendation in the curl issue is to actually drop the > manual Transfer-Encoding header. Modern libcurl will add the header > itself when streaming from a READFUNCTION. However, that code wasn't > added until 802aa5ae2 (HTTP: use chunked Transfer-Encoding for HTTP_POST > if size unknown, 2019-07-22), which is in curl 7.66.0. We claim to > support back to 7.19.5, so those older versions still need the manual > header. > > Signed-off-by: Jeff King <peff@peff.net> > --- > http.c | 1 + > 1 file changed, 1 insertion(+) As always, well articulated. Thanks. Will queue. > diff --git a/http.c b/http.c > index e73b136e58..3d80bd6116 100644 > --- a/http.c > +++ b/http.c > @@ -1452,6 +1452,7 @@ struct active_request_slot *get_active_slot(void) > curl_easy_setopt(slot->curl, CURLOPT_READFUNCTION, NULL); > curl_easy_setopt(slot->curl, CURLOPT_WRITEFUNCTION, NULL); > curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDS, NULL); > + curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDSIZE, -1L); > curl_easy_setopt(slot->curl, CURLOPT_UPLOAD, 0); > curl_easy_setopt(slot->curl, CURLOPT_HTTPGET, 1); > curl_easy_setopt(slot->curl, CURLOPT_FAILONERROR, 1);
On Tue, Apr 02, 2024 at 04:05:17PM -0400, Jeff King wrote: > So everything works, but we're better off resetting the size manually > for a few reasons: > > - there was a regression in curl 8.7.0 where the chunked header > detection didn't kick in, causing any large HTTP requests made by > Git to fail. This has since been fixed (but not yet released). In > the issue, curl folks recommended setting it explicitly to -1: > > https://github.com/curl/curl/issues/13229#issuecomment-2029826058 > > and it indeed works around the regression. So even though it won't > be strictly necessary after the fix there, this will help folks who > end up using the affected libcurl versions. Hmph. This isn't quite enough to make things work with 8.7.0, because there are two things wrong there: 1. curl uses the leftover POSTFIELDSIZE, which this patch fixes. So HTTP/1.1 is fine (including t5551). 2. In HTTP/2 mode, it sends chunked data, even though HTTP/2 does not use "Transfer-Encoding: chunked" (it correctly does not send the header, but it puts unnecessary chunk framing around the data). t5559 (which covers HTTP/2) fails, and so does accessing HTTP/2 capable hosts like github.com (though score another win for t5559; it has introduced some hassles, but it diagnosed a real problem we would not have otherwise seen in the test suite). This second problem can be fixed by eliminating the manual transfer-encoding header, which is what's confusing curl's HTTP/2 code. But as I said earlier... > Note that the recommendation in the curl issue is to actually drop the > manual Transfer-Encoding header. Modern libcurl will add the header > itself when streaming from a READFUNCTION. However, that code wasn't > added until 802aa5ae2 (HTTP: use chunked Transfer-Encoding for HTTP_POST > if size unknown, 2019-07-22), which is in curl 7.66.0. We claim to > support back to 7.19.5, so those older versions still need the manual > header. ...we can't do so unconditionally. So we'd need something like: diff --git a/remote-curl.c b/remote-curl.c index 31b02b8840..215bcc6e10 100644 --- a/remote-curl.c +++ b/remote-curl.c @@ -955,7 +955,9 @@ static int post_rpc(struct rpc_state *rpc, int stateless_connect, int flush_rece /* The request body is large and the size cannot be predicted. * We must use chunked encoding to send it. */ +#if LIBCURL_VERSION_NUM < 0x074200 headers = curl_slist_append(headers, "Transfer-Encoding: chunked"); +#endif rpc->initial_buffer = 1; curl_easy_setopt(slot->curl, CURLOPT_READFUNCTION, rpc_out); curl_easy_setopt(slot->curl, CURLOPT_INFILE, rpc); I was hoping not to have to carry any version-specific cruft (especially since newer versions of curl should fix the regression). But it's not too bad, and it actually marks the code as something that we can ditch in the future when that version of curl becomes obsolete. I can try to prepare a cleaner patch for it tomorrow, but comments welcome in the meantime. -Peff
On Tue, Apr 02, 2024 at 04:05:17PM -0400, Jeff King wrote: > In get_active_slot(), we return a CURL handle that may have been used > before (reusing them is good because it lets curl reuse the same > connection across many requests). We set a few curl options back to > defaults that may have been modified by previous requests. > > We reset POSTFIELDS to NULL, but do not reset POSTFIELDSIZE (which > defaults to "-1"). This usually doesn't matter because most POSTs will > set both fields together anyway. But there is one exception: when > handling a large request in remote-curl's post_rpc(), we don't set > _either_, and instead set a READFUNCTION to stream data into libcurl. > > This can interact weirdly with a stale POSTFIELDSIZE setting, because > curl will assume it should read only some set number of bytes from our > READFUNCTION. However, it has worked in practice because we also > manually set a "Transfer-Encoding: chunked" header, which libcurl uses > as a clue to set the POSTFIELDSIZE to -1 itself. > > So everything works, but we're better off resetting the size manually > for a few reasons: > > - there was a regression in curl 8.7.0 where the chunked header > detection didn't kick in, causing any large HTTP requests made by > Git to fail. This has since been fixed (but not yet released). In > the issue, curl folks recommended setting it explicitly to -1: > > https://github.com/curl/curl/issues/13229#issuecomment-2029826058 > > and it indeed works around the regression. So even though it won't > be strictly necessary after the fix there, this will help folks who > end up using the affected libcurl versions. > > - it's consistent with what a new curl handle would look like. Since > get_active_slot() may or may not return a used handle, this reduces > the possibility of heisenbugs that only appear with certain request > patterns. > > Note that the recommendation in the curl issue is to actually drop the > manual Transfer-Encoding header. Modern libcurl will add the header > itself when streaming from a READFUNCTION. However, that code wasn't > added until 802aa5ae2 (HTTP: use chunked Transfer-Encoding for HTTP_POST > if size unknown, 2019-07-22), which is in curl 7.66.0. We claim to > support back to 7.19.5, so those older versions still need the manual > header. > > Signed-off-by: Jeff King <peff@peff.net> > --- > http.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/http.c b/http.c > index e73b136e58..3d80bd6116 100644 > --- a/http.c > +++ b/http.c > @@ -1452,6 +1452,7 @@ struct active_request_slot *get_active_slot(void) > curl_easy_setopt(slot->curl, CURLOPT_READFUNCTION, NULL); > curl_easy_setopt(slot->curl, CURLOPT_WRITEFUNCTION, NULL); > curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDS, NULL); > + curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDSIZE, -1L); > curl_easy_setopt(slot->curl, CURLOPT_UPLOAD, 0); > curl_easy_setopt(slot->curl, CURLOPT_HTTPGET, 1); > curl_easy_setopt(slot->curl, CURLOPT_FAILONERROR, 1); Can't we refactor this code to instead use `curl_easy_reset()`? That function already resets most of the data we want to reset and would also end up setting `POSFIELDSIZE = -1` via `Curl_init_userdefined()`. So wouldn't the following be a more sensible fix? diff --git a/http.c b/http.c index e73b136e58..e5f5bc23db 100644 --- a/http.c +++ b/http.c @@ -1442,20 +1442,14 @@ struct active_request_slot *get_active_slot(void) slot->finished = NULL; slot->callback_data = NULL; slot->callback_func = NULL; + curl_easy_reset(slot->curl); curl_easy_setopt(slot->curl, CURLOPT_COOKIEFILE, curl_cookie_file); if (curl_save_cookies) curl_easy_setopt(slot->curl, CURLOPT_COOKIEJAR, curl_cookie_file); curl_easy_setopt(slot->curl, CURLOPT_HTTPHEADER, pragma_header); curl_easy_setopt(slot->curl, CURLOPT_RESOLVE, host_resolutions); curl_easy_setopt(slot->curl, CURLOPT_ERRORBUFFER, curl_errorstr); - curl_easy_setopt(slot->curl, CURLOPT_CUSTOMREQUEST, NULL); - curl_easy_setopt(slot->curl, CURLOPT_READFUNCTION, NULL); - curl_easy_setopt(slot->curl, CURLOPT_WRITEFUNCTION, NULL); - curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDS, NULL); - curl_easy_setopt(slot->curl, CURLOPT_UPLOAD, 0); - curl_easy_setopt(slot->curl, CURLOPT_HTTPGET, 1); curl_easy_setopt(slot->curl, CURLOPT_FAILONERROR, 1); - curl_easy_setopt(slot->curl, CURLOPT_RANGE, NULL); /* * Default following to off unless "ALWAYS" is configured; this gives Patrick
On Wed, Apr 03, 2024 at 08:30:54AM +0200, Patrick Steinhardt wrote: > On Tue, Apr 02, 2024 at 04:05:17PM -0400, Jeff King wrote: > > In get_active_slot(), we return a CURL handle that may have been used > > before (reusing them is good because it lets curl reuse the same > > connection across many requests). We set a few curl options back to > > defaults that may have been modified by previous requests. > > > > We reset POSTFIELDS to NULL, but do not reset POSTFIELDSIZE (which > > defaults to "-1"). This usually doesn't matter because most POSTs will > > set both fields together anyway. But there is one exception: when > > handling a large request in remote-curl's post_rpc(), we don't set > > _either_, and instead set a READFUNCTION to stream data into libcurl. > > > > This can interact weirdly with a stale POSTFIELDSIZE setting, because > > curl will assume it should read only some set number of bytes from our > > READFUNCTION. However, it has worked in practice because we also > > manually set a "Transfer-Encoding: chunked" header, which libcurl uses > > as a clue to set the POSTFIELDSIZE to -1 itself. > > > > So everything works, but we're better off resetting the size manually > > for a few reasons: > > > > - there was a regression in curl 8.7.0 where the chunked header > > detection didn't kick in, causing any large HTTP requests made by > > Git to fail. This has since been fixed (but not yet released). In > > the issue, curl folks recommended setting it explicitly to -1: > > > > https://github.com/curl/curl/issues/13229#issuecomment-2029826058 > > > > and it indeed works around the regression. So even though it won't > > be strictly necessary after the fix there, this will help folks who > > end up using the affected libcurl versions. > > > > - it's consistent with what a new curl handle would look like. Since > > get_active_slot() may or may not return a used handle, this reduces > > the possibility of heisenbugs that only appear with certain request > > patterns. > > > > Note that the recommendation in the curl issue is to actually drop the > > manual Transfer-Encoding header. Modern libcurl will add the header > > itself when streaming from a READFUNCTION. However, that code wasn't > > added until 802aa5ae2 (HTTP: use chunked Transfer-Encoding for HTTP_POST > > if size unknown, 2019-07-22), which is in curl 7.66.0. We claim to > > support back to 7.19.5, so those older versions still need the manual > > header. > > > > Signed-off-by: Jeff King <peff@peff.net> > > --- > > http.c | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/http.c b/http.c > > index e73b136e58..3d80bd6116 100644 > > --- a/http.c > > +++ b/http.c > > @@ -1452,6 +1452,7 @@ struct active_request_slot *get_active_slot(void) > > curl_easy_setopt(slot->curl, CURLOPT_READFUNCTION, NULL); > > curl_easy_setopt(slot->curl, CURLOPT_WRITEFUNCTION, NULL); > > curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDS, NULL); > > + curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDSIZE, -1L); > > curl_easy_setopt(slot->curl, CURLOPT_UPLOAD, 0); > > curl_easy_setopt(slot->curl, CURLOPT_HTTPGET, 1); > > curl_easy_setopt(slot->curl, CURLOPT_FAILONERROR, 1); > > Can't we refactor this code to instead use `curl_easy_reset()`? That > function already resets most of the data we want to reset and would also > end up setting `POSFIELDSIZE = -1` via `Curl_init_userdefined()`. So > wouldn't the following be a more sensible fix? > > diff --git a/http.c b/http.c > index e73b136e58..e5f5bc23db 100644 > --- a/http.c > +++ b/http.c > @@ -1442,20 +1442,14 @@ struct active_request_slot *get_active_slot(void) > slot->finished = NULL; > slot->callback_data = NULL; > slot->callback_func = NULL; > + curl_easy_reset(slot->curl); > curl_easy_setopt(slot->curl, CURLOPT_COOKIEFILE, curl_cookie_file); > if (curl_save_cookies) > curl_easy_setopt(slot->curl, CURLOPT_COOKIEJAR, curl_cookie_file); > curl_easy_setopt(slot->curl, CURLOPT_HTTPHEADER, pragma_header); > curl_easy_setopt(slot->curl, CURLOPT_RESOLVE, host_resolutions); > curl_easy_setopt(slot->curl, CURLOPT_ERRORBUFFER, curl_errorstr); > - curl_easy_setopt(slot->curl, CURLOPT_CUSTOMREQUEST, NULL); > - curl_easy_setopt(slot->curl, CURLOPT_READFUNCTION, NULL); > - curl_easy_setopt(slot->curl, CURLOPT_WRITEFUNCTION, NULL); > - curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDS, NULL); > - curl_easy_setopt(slot->curl, CURLOPT_UPLOAD, 0); > - curl_easy_setopt(slot->curl, CURLOPT_HTTPGET, 1); > curl_easy_setopt(slot->curl, CURLOPT_FAILONERROR, 1); > - curl_easy_setopt(slot->curl, CURLOPT_RANGE, NULL); > > /* > * Default following to off unless "ALWAYS" is configured; this gives Oh well, the answer is "no", or at least not as easily as this, as the failing tests tell us. I guess it resets more data than we actually want it to reset, but I didn't dig any deeper than that. Patrick
On Wed, Apr 03, 2024 at 08:34:37AM +0200, Patrick Steinhardt wrote: > > Can't we refactor this code to instead use `curl_easy_reset()`? That > > function already resets most of the data we want to reset and would also > > end up setting `POSFIELDSIZE = -1` via `Curl_init_userdefined()`. So > > wouldn't the following be a more sensible fix? > [...] > Oh well, the answer is "no", or at least not as easily as this, as the > failing tests tell us. I guess it resets more data than we actually want > it to reset, but I didn't dig any deeper than that. Yeah. The curl setup is really in two parts: 1. we make a handle in get_curl_handle(), which also sets up a bunch of options. We use that to make a single "curl_default" handle, and then when we want a new handle we curl_easy_duphandle() that 2. when we want to make a request we call get_active_slot(), which will either return an already-used handle or duphandle() a new one. And then reset some options, but also do some more setup. Your patch touches spot (2), so it's erasing all of the setup done in (1). I don't think there's a way to say "go back to state when we called duphandle(), but keep reusing connections, etc". Possibly it could be solved by pushing all of the setup from (1) into (2). Though that would probably require separating out some config handling, etc, from the actual curl_easy_setopt() calls (we wouldn't want to complain about invalid config for _every_ request, for example, just once per program run). This is all also a bit more complicated than it needs to be for smart-http. The dumb-http code may want to have several handles going at once to do individual object/pack downloads. Whereas for smart http, we really are only working with one handle, and doing one request at a time. I don't know if we'll ever drop dumb-http support, but there would probably be a lot of cleanup possibilities if we did. -Peff
diff --git a/http.c b/http.c index e73b136e58..3d80bd6116 100644 --- a/http.c +++ b/http.c @@ -1452,6 +1452,7 @@ struct active_request_slot *get_active_slot(void) curl_easy_setopt(slot->curl, CURLOPT_READFUNCTION, NULL); curl_easy_setopt(slot->curl, CURLOPT_WRITEFUNCTION, NULL); curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDS, NULL); + curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDSIZE, -1L); curl_easy_setopt(slot->curl, CURLOPT_UPLOAD, 0); curl_easy_setopt(slot->curl, CURLOPT_HTTPGET, 1); curl_easy_setopt(slot->curl, CURLOPT_FAILONERROR, 1);
In get_active_slot(), we return a CURL handle that may have been used before (reusing them is good because it lets curl reuse the same connection across many requests). We set a few curl options back to defaults that may have been modified by previous requests. We reset POSTFIELDS to NULL, but do not reset POSTFIELDSIZE (which defaults to "-1"). This usually doesn't matter because most POSTs will set both fields together anyway. But there is one exception: when handling a large request in remote-curl's post_rpc(), we don't set _either_, and instead set a READFUNCTION to stream data into libcurl. This can interact weirdly with a stale POSTFIELDSIZE setting, because curl will assume it should read only some set number of bytes from our READFUNCTION. However, it has worked in practice because we also manually set a "Transfer-Encoding: chunked" header, which libcurl uses as a clue to set the POSTFIELDSIZE to -1 itself. So everything works, but we're better off resetting the size manually for a few reasons: - there was a regression in curl 8.7.0 where the chunked header detection didn't kick in, causing any large HTTP requests made by Git to fail. This has since been fixed (but not yet released). In the issue, curl folks recommended setting it explicitly to -1: https://github.com/curl/curl/issues/13229#issuecomment-2029826058 and it indeed works around the regression. So even though it won't be strictly necessary after the fix there, this will help folks who end up using the affected libcurl versions. - it's consistent with what a new curl handle would look like. Since get_active_slot() may or may not return a used handle, this reduces the possibility of heisenbugs that only appear with certain request patterns. Note that the recommendation in the curl issue is to actually drop the manual Transfer-Encoding header. Modern libcurl will add the header itself when streaming from a READFUNCTION. However, that code wasn't added until 802aa5ae2 (HTTP: use chunked Transfer-Encoding for HTTP_POST if size unknown, 2019-07-22), which is in curl 7.66.0. We claim to support back to 7.19.5, so those older versions still need the manual header. Signed-off-by: Jeff King <peff@peff.net> --- http.c | 1 + 1 file changed, 1 insertion(+)