Message ID | xmqqczgqjr8y.fsf_-_@gitster.g (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | http.c: clear the 'finished' member once we are done with it | expand |
On Fri, May 06, 2022 at 02:17:01PM -0700, Junio C Hamano wrote: > diff --git a/http.c b/http.c > index 229da4d148..85437b1980 100644 > --- a/http.c > +++ b/http.c > @@ -1367,6 +1367,9 @@ void run_active_slot(struct active_request_slot *slot) > select(max_fd+1, &readfds, &writefds, &excfds, &select_timeout); > } > } > + > + if (slot->finished == &finished) > + slot->finished = NULL; I am not completely sure yet (since I looked at it long ago and got sidetracked) but I think this might be optimized out (at least by gcc12) since it is technically UB, which is why it never "fixed" the warning. the "correct" way to implement this would be to make "finished" a thread local static, which is finally one good reason to support C99, but the syntax to do so with Windows broke my first attempt at doing so and now I can even find the code I used then which required a per platform macro and was better looking than the following Carlo --- >8 ---- Date: Wed, 20 Apr 2022 23:25:55 -0700 Subject: [PATCH] http: avoid using out of scope pointers baa7b67d091 (HTTP slot reuse fixes, 2006-03-10) introduces a way to notify a curl thread that its slot is finished by using a pointer to a stack variable from run_active_slot(), but then gcc 12 was released and started rightfully to complain about it (-Wdangling-pointer). Use instead a thread storage static variable which is safe to use between threads since C99 and doesn't go out of scope, while being functionally equivalent to the original code, and also remove the workaround from 9c539d1027d (config.mak.dev: alternative workaround to gcc 12 warning in http.c, 2022-04-15) Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com> --- config.mak.dev | 1 - http.c | 2 +- 2 files changed, 1 insertion(+), 2 deletions(-) diff --git a/config.mak.dev b/config.mak.dev index c3104f400b..335efd4620 100644 --- a/config.mak.dev +++ b/config.mak.dev @@ -68,7 +68,6 @@ endif # https://bugzilla.redhat.com/show_bug.cgi?id=2075786 ifneq ($(filter gcc12,$(COMPILER_FEATURES)),) DEVELOPER_CFLAGS += -Wno-error=stringop-overread -DEVELOPER_CFLAGS += -Wno-error=dangling-pointer endif GIT_TEST_PERL_FATAL_WARNINGS = YesPlease diff --git a/http.c b/http.c index 229da4d148..cb9acfca19 100644 --- a/http.c +++ b/http.c @@ -1327,7 +1327,7 @@ void run_active_slot(struct active_request_slot *slot) fd_set excfds; int max_fd; struct timeval select_timeout; - int finished = 0; + static __thread int finished; slot->finished = &finished; while (!finished) {
Carlo Marcelo Arenas Belón <carenas@gmail.com> writes: > On Fri, May 06, 2022 at 02:17:01PM -0700, Junio C Hamano wrote: >> diff --git a/http.c b/http.c >> index 229da4d148..85437b1980 100644 >> --- a/http.c >> +++ b/http.c >> @@ -1367,6 +1367,9 @@ void run_active_slot(struct active_request_slot *slot) >> select(max_fd+1, &readfds, &writefds, &excfds, &select_timeout); >> } >> } >> + >> + if (slot->finished == &finished) >> + slot->finished = NULL; > > I am not completely sure yet (since I looked at it long ago and got > sidetracked) but I think this might be optimized out (at least by gcc12) > since it is technically UB, which is why it never "fixed" the warning. UB meaning "undefined behaviour"? Which part is? Taking the address of an on-stack variable "finished"? Comparing it with a pointer that may or may not have been assigned/overwritten elsewhere in a structure? Not clearing the member in the struct unconditionally? Puzzled.
On Sat, May 7, 2022 at 11:42 AM Junio C Hamano <gitster@pobox.com> wrote: > > Carlo Marcelo Arenas Belón <carenas@gmail.com> writes: > > > On Fri, May 06, 2022 at 02:17:01PM -0700, Junio C Hamano wrote: > >> diff --git a/http.c b/http.c > >> index 229da4d148..85437b1980 100644 > >> --- a/http.c > >> +++ b/http.c > >> @@ -1367,6 +1367,9 @@ void run_active_slot(struct active_request_slot *slot) > >> select(max_fd+1, &readfds, &writefds, &excfds, &select_timeout); > >> } > >> } > >> + > >> + if (slot->finished == &finished) > >> + slot->finished = NULL; > > > > I am not completely sure yet (since I looked at it long ago and got > > sidetracked) but I think this might be optimized out (at least by gcc12) > > since it is technically UB, which is why it never "fixed" the warning. > > UB meaning "undefined behaviour"? Which part is? Taking the > address of an on-stack variable "finished"? > Comparing it with a > pointer that may or may not have been assigned/overwritten elsewhere > in a structure? it is not very intuitive, but using a pointer to a variable that is out of scope is UB, and in this case the value of slot->finished might point to an address that is not in our own stack (because it came from a different thread), hence undefined Carlo
Hi Junio, On Fri, 6 May 2022, Junio C Hamano wrote: > In http.c, the run_active_slot() function allows the given "slot" to > make progress by calling step_active_slots() in a loop repeatedly, > and the loop is not left until the request held in the slot > completes. > > Ages ago, we used to use the slot->in_use member to get out of the > loop, which misbehaved when the request in "slot" completes (at > which time, the result of the request is copied away from the slot, > and the in_use member is cleared, making the slot ready to be > reused), and the "slot" gets reused to service a different request > (at which time, the "slot" becomes in_use again, even though it is > for a different request). The loop terminating condition mistakenly > thought that the original request has yet to be completed. > > Today's code, after baa7b67d (HTTP slot reuse fixes, 2006-03-10) > fixed this issue, uses a separate "slot->finished" member that is > set in run_active_slot() to point to an on-stack variable, and the > code that completes the request in finish_active_slot() clears the > on-stack variable via the pointer to signal that the particular > request held by the slot has completed. It also clears the in_use > member (as before that fix), so that the slot itself can safely be > reused for an unrelated request. > > One thing that is not quite clean in this arrangement is that, > unless the slot gets reused, at which point the finished member is > reset to NULL, the member keeps the value of &finished, which > becomes a dangling pointer into the stack when run_active_slot() > returns. Clear the finished member before the control leaves the > function, but make sure to limit it to the case where the pointer > still points at the on-stack variable of ours (the pointer may be > set to point at the on-stack variable of somebody else after the > slot gets reused, in which case we do not want to touch it). > > Signed-off-by: Junio C Hamano <gitster@pobox.com> > --- > > * So, this has been sitting in my pile of random patches for a few > weeks. I stumbled over the need for this while investigating the build failures caused by upgrading Git for Windows' SDK's GCC to v12.x. > diff --git a/http.c b/http.c > index 229da4d148..85437b1980 100644 > --- a/http.c > +++ b/http.c > @@ -1367,6 +1367,9 @@ void run_active_slot(struct active_request_slot *slot) > select(max_fd+1, &readfds, &writefds, &excfds, &select_timeout); > } > } > + > + if (slot->finished == &finished) > + slot->finished = NULL; First of all, I suspect that https://github.com/git/git/blob/v2.36.1/http.c#L1207 makes sure that GCC's complaint is not actually accurate: we always re-set `finished` to `NULL` when getting an unused slot, so even if there is a left-over dangling pointer, it is not actually used, ever. But we need something to pacify GCC. Let's look at your patch. The first thing to note is that this is not _quite_ thread-safe: between checking the condition `slot->finished == &finished` and assigning `slot->finished`, another thread could potentially have noticed that the slot is not in use and overwritten the `finished` attribute, which would then be set to `NULL` in this thread, in which case _that other_ thread's `while (!finished)` loop would become an infinite loop. Having said that, the time window is really narrow. Besides, I suspect that we _already_ have an equivalent "offender" in https://github.com/git/git/blob/v2.36.1/http.c#L1336: we look at `in_use` there, assuming that it is either `1` if "our" request is still active, and otherwise it is `0`. However, it might have turned to `0` _and_ to `1` again in the meantime (but the `in_use` would now refer to _another_ request). I am not quite sure how correct my reading of the situation is, so please double-check my analysis. If that analysis is correct, I would expect the correct solution to turn `finished` into an attribute of the slot, and change its role to be a flag that this slot is spoken for and cannot be re-used quite yet even if it is not currently in use. Something like this: -- snip -- diff --git a/http-walker.c b/http-walker.c index 910fae539b89..5cc369dea853 100644 --- a/http-walker.c +++ b/http-walker.c @@ -225,13 +225,9 @@ static void process_alternates_response(void *callback_data) alt_req->url->buf); active_requests++; slot->in_use = 1; - if (slot->finished != NULL) - (*slot->finished) = 0; if (!start_active_slot(slot)) { cdata->got_alternates = -1; slot->in_use = 0; - if (slot->finished != NULL) - (*slot->finished) = 1; } return; } diff --git a/http.c b/http.c index b08795715f8a..2d125132fb90 100644 --- a/http.c +++ b/http.c @@ -205,8 +205,7 @@ static void finish_active_slot(struct active_request_slot *slot) closedown_active_slot(slot); curl_easy_getinfo(slot->curl, CURLINFO_HTTP_CODE, &slot->http_code); - if (slot->finished != NULL) - (*slot->finished) = 1; + slot->in_use = 0; /* Store slot results so they can be read after the slot is reused */ if (slot->results != NULL) { @@ -1212,13 +1211,14 @@ struct active_request_slot *get_active_slot(void) process_curl_messages(); } - while (slot != NULL && slot->in_use) + while (slot != NULL && (slot->in_use || slot->reserved_for_use)) slot = slot->next; if (slot == NULL) { newslot = xmalloc(sizeof(*newslot)); newslot->curl = NULL; newslot->in_use = 0; + newslot->reserved_for_use = 0; newslot->next = NULL; slot = active_queue_head; @@ -1240,7 +1240,6 @@ struct active_request_slot *get_active_slot(void) active_requests++; slot->in_use = 1; slot->results = NULL; - slot->finished = NULL; slot->callback_data = NULL; slot->callback_func = NULL; curl_easy_setopt(slot->curl, CURLOPT_COOKIEFILE, curl_cookie_file); @@ -1332,7 +1331,7 @@ void fill_active_slots(void) } while (slot != NULL) { - if (!slot->in_use && slot->curl != NULL + if (!slot->in_use && !slot->reserved_for_use && slot->curl && curl_session_count > min_curl_sessions) { curl_easy_cleanup(slot->curl); slot->curl = NULL; @@ -1363,10 +1362,9 @@ void run_active_slot(struct active_request_slot *slot) fd_set excfds; int max_fd; struct timeval select_timeout; - int finished = 0; - slot->finished = &finished; - while (!finished) { + slot->reserved_for_use = 1; + while (slot->in_use) { step_active_slots(); if (slot->in_use) { @@ -1403,6 +1401,7 @@ void run_active_slot(struct active_request_slot *slot) select(max_fd+1, &readfds, &writefds, &excfds, &select_timeout); } } + slot->reserved_for_use = 0; } static void release_active_slot(struct active_request_slot *slot) diff --git a/http.h b/http.h index df1590e53a45..3b2f6da570cd 100644 --- a/http.h +++ b/http.h @@ -22,9 +22,9 @@ struct slot_results { struct active_request_slot { CURL *curl; int in_use; + int reserved_for_use; CURLcode curl_result; long http_code; - int *finished; struct slot_results *results; void *callback_data; void (*callback_func)(void *data); -- snap -- I integrated this into a local branch that fixes the build with GCC v12.x (required so that our CI/PR builds work again after Git for Windows' SDK upgraded its GCC) and plan on contributing these patches in a bit. Ciao, Dscho > } > > static void release_active_slot(struct active_request_slot *slot) > -- > 2.36.1-200-gf89ea983ca > > >
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > I stumbled over the need for this while investigating the build failures > caused by upgrading Git for Windows' SDK's GCC to v12.x. > >> diff --git a/http.c b/http.c >> index 229da4d148..85437b1980 100644 >> --- a/http.c >> +++ b/http.c >> @@ -1367,6 +1367,9 @@ void run_active_slot(struct active_request_slot *slot) >> select(max_fd+1, &readfds, &writefds, &excfds, &select_timeout); >> } >> } >> + >> + if (slot->finished == &finished) >> + slot->finished = NULL; > > First of all, I suspect that > https://github.com/git/git/blob/v2.36.1/http.c#L1207 makes sure that GCC's > complaint is not actually accurate: we always re-set `finished` to `NULL` > when getting an unused slot, so even if there is a left-over dangling > pointer, it is not actually used, ever. > > But we need something to pacify GCC. Let's look at your patch. > > The first thing to note is that this is not _quite_ thread-safe: between Does this part of the code ever run multi-threaded? > If that analysis is correct, I would expect the correct solution to turn > `finished` into an attribute of the slot, and change its role to be a flag > that this slot is spoken for and cannot be re-used quite yet even if it is > not currently in use. I have a feeling that we've mentioned that at least twice (perhaps three times) in the recent past that it is in essense reverting what the "finished" change baa7b67d (HTTP slot reuse fixes, 2006-03-10) did. We used to use the in-use bit of the slot as an indicator that the slot dispatched by run_active_slot() has finished (i.e. the in-use bit must be cleared when the request held in the struct is fully done), but that broke when a slot we are looking at in run_active_slot() is serviced (which makes in_use false), and then another request reuses the slot (now no longer in_use), before the control comes back to the loop. "while (slot->in_use)" at the beginning of the loop was still true, but the original request the slot was being used for, the one that the run_active_slot() function cares about, has completed. So...
Junio C Hamano <gitster@pobox.com> writes: > Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > >> I stumbled over the need for this while investigating the build failures >> caused by upgrading Git for Windows' SDK's GCC to v12.x. >> >>> diff --git a/http.c b/http.c >>> index 229da4d148..85437b1980 100644 >>> --- a/http.c >>> +++ b/http.c >>> @@ -1367,6 +1367,9 @@ void run_active_slot(struct active_request_slot *slot) >>> select(max_fd+1, &readfds, &writefds, &excfds, &select_timeout); >>> } >>> } >>> + >>> + if (slot->finished == &finished) >>> + slot->finished = NULL; > ... > I have a feeling that we've mentioned that at least twice (perhaps > three times) in the recent past that it is in essense reverting what > the "finished" change baa7b67d (HTTP slot reuse fixes, 2006-03-10) > did. We used to use the in-use bit of the slot as an indicator that > the slot dispatched by run_active_slot() has finished (i.e. the > in-use bit must be cleared when the request held in the struct is > fully done), but that broke when a slot we are looking at in > run_active_slot() is serviced (which makes in_use false), and then > another request reuses the slot (now no longer in_use), before the > control comes back to the loop. "while (slot->in_use)" at the > beginning of the loop was still true, but the original request the > slot was being used for, the one that the run_active_slot() function > cares about, has completed. Given that the breakage we fixed in 2006 is about run_active_slot() calling step_active_slots() repeatedly, during which this and other requests in flight completes when curl_multi_perform() receives and handles responses, and recursively ends up calling run_active_slot() for _another_ request reusing the slot we are interested in in the codepath in the above disccussion, I _think_ we do not have to consider the case where slot->finished is pointing at somebody else's finished variable on stack here. Yes, while we repeatedly call step_active_slots(), our request in the slot may complete, the slot may be marked as unused, somebody else may reuse the slot, marking it as in_use again and using slot->finished pointer to their on-stack finsihed. But that somebody else's invocation of run_active_slot() will not give control back before their on-stack finished indicates that their recursive call to step_active_slots() completes their request. So after they come back and we exit our while() loop, either slot->finished points at our finished if slot did not get reused, or it points at an unused part of the stack that has long been rewound when we returned from the recursive call. In either case, slot->finished never points at an on-stack address of an ongoing run_active_slot() call made by somebody else that the guard I added (i.e. we must only clear it if it points our on-stack "finished") was trying to protect against clobbering. So, I guess an unconditional assignment of slot->finished = NULL; there would be sufficient.
Hi Junio, On Mon, 23 May 2022, Junio C Hamano wrote: > Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > > > I stumbled over the need for this while investigating the build failures > > caused by upgrading Git for Windows' SDK's GCC to v12.x. > > > >> diff --git a/http.c b/http.c > >> index 229da4d148..85437b1980 100644 > >> --- a/http.c > >> +++ b/http.c > >> @@ -1367,6 +1367,9 @@ void run_active_slot(struct active_request_slot *slot) > >> select(max_fd+1, &readfds, &writefds, &excfds, &select_timeout); > >> } > >> } > >> + > >> + if (slot->finished == &finished) > >> + slot->finished = NULL; > > > > First of all, I suspect that > > https://github.com/git/git/blob/v2.36.1/http.c#L1207 makes sure that GCC's > > complaint is not actually accurate: we always re-set `finished` to `NULL` > > when getting an unused slot, so even if there is a left-over dangling > > pointer, it is not actually used, ever. > > > > But we need something to pacify GCC. Let's look at your patch. > > > > The first thing to note is that this is not _quite_ thread-safe: between > > Does this part of the code ever run multi-threaded? It calls into cURL, which I suspect has a multi-threaded mode of operation, and we do use some callbacks in `http-walker.c` and I was worried that these callbacks could be called via cURL calls. That's why I am concerned about thread-safety. I have looked for a while and not found any way that code in `http.c` or `http-walker.c` could set `finished` by way of a cURL callback, but there is a lot of code there and I could have missed something crucial. > > If that analysis is correct, I would expect the correct solution to turn > > `finished` into an attribute of the slot, and change its role to be a flag > > that this slot is spoken for and cannot be re-used quite yet even if it is > > not currently in use. > > I have a feeling that we've mentioned that at least twice (perhaps > three times) in the recent past that it is in essense reverting what > the "finished" change baa7b67d (HTTP slot reuse fixes, 2006-03-10) > did. We used to use the in-use bit of the slot as an indicator that > the slot dispatched by run_active_slot() has finished (i.e. the > in-use bit must be cleared when the request held in the struct is > fully done), but that broke when a slot we are looking at in > run_active_slot() is serviced (which makes in_use false), and then > another request reuses the slot (now no longer in_use), before the > control comes back to the loop. "while (slot->in_use)" at the > beginning of the loop was still true, but the original request the > slot was being used for, the one that the run_active_slot() function > cares about, has completed. > > So... No, I suggested to replace the `finished` variable with an attribute (or "field" or "member variable") of the slot, and to respect it when looking for an unused slot, i.e. not only look for a slot whose `in_use` is 0 but also require `reserved_for_use` to be 0. In essence, the `run_active_slot()` function owns the slot, even if it is not marked as `in_use`. That should address the same concern as baa7b67d but without using a pointer to a local variable. Ciao, Dscho
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > It calls into cURL, which I suspect has a multi-threaded mode of > operation, https://curl.se/libcurl/c/threadsafe.html ;-) My understanding is that what we have is pretty much select() driven single-threaded multi-fd transfer. > No, I suggested to replace the `finished` variable with an attribute (or > "field" or "member variable") of the slot, and to respect it when looking > for an unused slot, i.e. not only look for a slot whose `in_use` is 0 but > also require `reserved_for_use` to be 0. In essence, the > `run_active_slot()` function owns the slot, even if it is not marked as > `in_use`. That should address the same concern as baa7b67d but without > using a pointer to a local variable. Not really. An outer run_active_slot() and an inner run_active_slot() have a pointer to the same slot object. The inner one got hold of that object because the request the slot used to represent for the outer run_active_slot() has finished, so we would toggle either *(slot->finished) or the new slot->done in an attempt to signal the completion to the outer run_active_slot() and then make the slot not-in-use. The slot becomes in-use again with a different request and the inner run_active_slot() is run. It first says "this slot is not done yet---we are making a request using it". How would the inner one say that, exactly? In baa7b67d's fix, it is done by setting slot->finished = &finished to its own stackframe. Because the outer run_active_slot() does not look at slot->finished, but it looks at the finished on its stackframe, what the inner run_active_slot() does here would not break the outer one. If we replace the mechanism with a separate member in the slot structure, so that the outer run_active_slot() looks at slot->done and the inner run_active_slot() also clears slot->done before proceeding, then the inner one clobbers what the outer one will look at when the recursive call that led to the inner one returns.
On Mon, 23 May 2022, Junio C Hamano wrote: >> It calls into cURL, which I suspect has a multi-threaded mode of >> operation, > > https://curl.se/libcurl/c/threadsafe.html ;-) > > My understanding is that what we have is pretty much select() driven > single-threaded multi-fd transfer. Confirmed. libcurl *can* use threads (if built that way), but the only use it has for such subthreads is for resolving host names. libcurl, its API and its callbacks etc always operate in the same single thread.
Hi Daniel, On Tue, 24 May 2022, Daniel Stenberg wrote: > On Mon, 23 May 2022, Junio C Hamano wrote: > > > > It calls into cURL, which I suspect has a multi-threaded mode of > > > operation, > > > > https://curl.se/libcurl/c/threadsafe.html ;-) > > > > My understanding is that what we have is pretty much select() driven > > single-threaded multi-fd transfer. > > Confirmed. libcurl *can* use threads (if built that way), but the only use it > has for such subthreads is for resolving host names. libcurl, its API and its > callbacks etc always operate in the same single thread. Great, thanks for clarifying! Ciao, Dscho P.S.: I'm enjoying your book, Uncurled. It feels great to see validation in some experiences (I'm a youngster compared to you, maintaining Open Source software only since 2001 or so, but still), and to get a fresh and inspiring perspective on others.
Hi Junio, On Mon, 23 May 2022, Junio C Hamano wrote: > Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > > > I suggested to replace the `finished` variable with an attribute (or > > "field" or "member variable") of the slot, and to respect it when > > looking for an unused slot, i.e. not only look for a slot whose > > `in_use` is 0 but also require `reserved_for_use` to be 0. In essence, > > the `run_active_slot()` function owns the slot, even if it is not > > marked as `in_use`. That should address the same concern as baa7b67d > > but without using a pointer to a local variable. > > Not really. An outer run_active_slot() and an inner > run_active_slot() have a pointer to the same slot object. How is that possible? One of the first things that function does is to assign `slot->finished = &finished`, and then run that `while (!finished)` loop. How would the outer `run_active_slot()` ever get signaled via `finished` when the inner `run_active_slot()` would overwrite `slot->finished`? I am puzzled why we do not see infinite loops in such outer calls all the time, then. Ciao, Dscho
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: >> Not really. An outer run_active_slot() and an inner >> run_active_slot() have a pointer to the same slot object. > > How is that possible? One of the first things that function does is to > assign `slot->finished = &finished`, and then run that `while (!finished)` > loop. > > How would the outer `run_active_slot()` ever get signaled via `finished` > when the inner `run_active_slot()` would overwrite `slot->finished`? I am > puzzled why we do not see infinite loops in such outer calls all the time, > then. The idea in http subsystem goes like this. * Generally, we have multiple curl requests in flight. A curlm passed to curl_multi_perform() call knows about them and attempts to make as much progress without blocking. * After calling curl_multi_perform(), we call process_curl_messages() to collect the response that corresponds to the request. This is done using the slot data structure. Once we read the response, we may process it further by making a callback. * A slot, when finished, can be reused. THe reuse is controlled by its in_use member. So, let's trace a code flow, http-walker.c::fetch_object() is used as a sample starting point. * http-walker.c::fetch_object() - pushes the object name to object_request queue. - calls step_active_slots() to make progress. This function in turn - calls curl_multi_perform() repeatedly to make progress - calls process_curl_messages() to possibly complete some active slots - calls fill_active_slots() to fill more requests. This function calls the "fill" function repeatedly to make more requests, which is http-walker.c::fill_active_slot() in this code path. It - repeatedly calls start_object_request() * start_object_request() does these: - calls new_http_object_request(), which prepares object-request structure, in which there is a slot member that was obtained by calling get_active_slot(). * get_active_slot() does many things, but all we need to know here is that it does "in_use = 1". - sets callback for the slot to process_object_response() - calls start_active_slot(), which adds the slot to curlm and calls curl_multi_perform() to make progress on the active slots. - calls run_active_slots() repeatedly. Now run_active_slots() we know about. Before baa7b67d (HTTP slot reuse fixes, 2006-03-10), we used to loop on slot->in_use but to fix a bug we updated it to use slot->finished. * run_active_slot() - takes a slot - clears finished on its stack - makes slot->finished point at &finished on its stack - loops until "finished" is set - calls step_active_slots(); what it does can be seen above, but here, we need to know what process_curl_messages() it calls does, in order to complete some requests. * process_curl_messages() - reads the response from curl - finds the slot with request that resulted in the response - sets its result member - calls finish_active_slot() on it, which in turn does these: - calls closedown_active_slot(), slot->in_use becomes 0 - sets (*slot->finished) = 1 - calls slot->callback_func The callback_func was set to process_object_response() earlier in this code flow. * http-walker.c::process_object_response() - calls process_http_object_request(), which dissociates the slot from the http_object_request object. - may call fetch_alternates() when the object is not found, otherwise calls finish_object_request(). Let's see what happens when fetch_alternates() gets called here. * http-walker.c::fetch_alternates() - calls step_active_slots() to make progress - calls get_active_slot() - calls start_active_slot() - calls run_active_slot() Now we can see how the "slot" we used in the "outer" run_active_slot() can be reused for a different request. We received response to the request, and in process_curl_messages(), we called finish_active_slot() on the slot, which did three things: (1) slot is now not-in-use, (2) the "finished" on the stack of the outer run_active_slot() is set to 1, and (3) called the process_object_response() callback. The callback then asked for an unused slot, and got the slot we just used, because we no longer need it (the necessary information in the response have been copied away to http_object_request object before the slot was dissociated from it, and the only one bit of information the outer run_active_slot() needs has already been sent there on its on-stack "finished" variable). The reused slot goes through the usual start_active_slot() call to add it to curlm, and then the "inner" run_active_slot() is started on it. Until the inner run_active_slot() returns, fetch_alternates() would not return, but once it does, the control goes back to the outer run_active_slot(), where it finds that its "finished" is now set to 1. This incidentally is a good illustration why the thread-starter patch that did if (&finished == slot->finished) slot->finished = NULL; would be sufficient, and the "clear only ours" guard is not necessary, I think. If the inner run_active_slot() did not trigger a callback that adds more reuse of the slot, it will clear slot->finished to NULL itself, with or without the guard. And the outer run_active_slot() may fail to clear if the guard is there, but slot->finished is NULL in that case, so there is no point in clearing it again. And if the inner run_active_slot() did trigger a callback that ended up reusing the slot, then eventually the innermost one would have cleared slot->finished to NULL, with or without the guard, before it returned the control to inner run_active_slot(). The inference goes the same way to show that the guard is not necessary but is not hurting. I _think_ we can even get away by not doing anything to slot->finished at the end of run_active_slot(), as we are not multi-threaded and the callee only returns to the caller, but if it helps pleasing the warning compiler, I'd prefer the simplest workaround, perhaps with an unconditional clearing there? What did I miss? I must be missing something, as I can explain how the current "(*slot->finished) = 1" with "while (finished)" correctly works, but I cannot quite explain why the original "while (slot->in_use)" would not, which is annoying. In other words, why we needed baa7b67d (HTTP slot reuse fixes, 2006-03-10) in the first place? It is possible that we had some code paths that forgot to drop in_use before the inner run_active returned that have been fixed in the 16 years and this fix was hiding that bug, but I dunno.
Daniel Stenberg <daniel@haxx.se> writes: > On Mon, 23 May 2022, Junio C Hamano wrote: > >>> It calls into cURL, which I suspect has a multi-threaded mode of >>> operation, >> >> https://curl.se/libcurl/c/threadsafe.html ;-) >> >> My understanding is that what we have is pretty much select() driven >> single-threaded multi-fd transfer. > > Confirmed. libcurl *can* use threads (if built that way), but the only > use it has for such subthreads is for resolving host names. libcurl, > its API and its callbacks etc always operate in the same single > thread. Thanks. It always is nice to have the authority/expert readily answering our stupid questions ;-)
On Tue, May 24, 2022 at 10:15:57AM -0700, Junio C Hamano wrote: > > I _think_ we can even get away by not doing anything to > slot->finished at the end of run_active_slot(), as we are not > multi-threaded and the callee only returns to the caller, but if it > helps pleasing the warning compiler, I'd prefer the simplest > workaround, perhaps with an unconditional clearing there? Assuming that some overly clever compiler might optimize that out (either because it might think it is Undefined Behaviour or for other unknown reasons) then Ævar's version would be better for clearing the "warning". But your patch fixed the "bug" that a probably overeager compiler was "detecting". > What did I miss? I must be missing something, as I can explain how > the current "(*slot->finished) = 1" with "while (finished)" > correctly works, but I cannot quite explain why the original "while > (slot->in_use)" would not, which is annoying. My guess is that there is a curl version somewhere that is patched to use threads more extensible than upstream and where this code is stil needed. I think it is also safe to assume (like you did) that this is a 16 year bug that was already fixed and reverting that code would be an alternative too. Carlo
On Tue, May 24 2022, Junio C Hamano wrote: > [...] > This incidentally is a good illustration why the thread-starter > patch that did > > if (&finished == slot->finished) > slot->finished = NULL; > > would be sufficient, and the "clear only ours" guard is not > necessary, I think. If the inner run_active_slot() did not trigger > a callback that adds more reuse of the slot, it will clear > slot->finished to NULL itself, with or without the guard. And the > outer run_active_slot() may fail to clear if the guard is there, but > slot->finished is NULL in that case, so there is no point in clearing > it again. > > And if the inner run_active_slot() did trigger a callback that ended > up reusing the slot, then eventually the innermost one would have > cleared slot->finished to NULL, with or without the guard, before it > returned the control to inner run_active_slot(). The inference goes > the same way to show that the guard is not necessary but is not > hurting. > > I _think_ we can even get away by not doing anything to > slot->finished at the end of run_active_slot(), as we are not > multi-threaded and the callee only returns to the caller, but if it > helps pleasing the warning compiler, I'd prefer the simplest > workaround, perhaps with an unconditional clearing there? I'll admit I haven't fully looked into this again, but does anything in the subsequent analysis suggest that my original patch wouldn't be a working solution to this, still: https://lore.kernel.org/git/patch-1.1-1cec367e805-20220126T212921Z-avarab@gmail.com/ ? The advantage of it over any small and narrow fix like your hunk quoted above is that it doesn't make the end result look as though we care about e.g. thread races, which evidently is something more than one person looking at this has (needlessly) ended up worrying about. > What did I miss? I must be missing something, as I can explain how > the current "(*slot->finished) = 1" with "while (finished)" > correctly works, but I cannot quite explain why the original "while > (slot->in_use)" would not, which is annoying. Perhaps that change was also worried about thread safety? I only briefly re-looked at this again, but I don't think I ever found exactly what that 2006-era fix was meant to fix, specifically. > In other words, why we needed baa7b67d (HTTP slot reuse fixes, > 2006-03-10) in the first place? It is possible that we had some > code paths that forgot to drop in_use before the inner run_active > returned that have been fixed in the 16 years and this fix was > hiding that bug, but I dunno. I haven't found that out either, either back in January or just now.
On Tue, May 24 2022, Carlo Marcelo Arenas Belón wrote: > On Tue, May 24, 2022 at 10:15:57AM -0700, Junio C Hamano wrote: >> >> I _think_ we can even get away by not doing anything to >> slot->finished at the end of run_active_slot(), as we are not >> multi-threaded and the callee only returns to the caller, but if it >> helps pleasing the warning compiler, I'd prefer the simplest >> workaround, perhaps with an unconditional clearing there? > > Assuming that some overly clever compiler might optimize that out (either > because it might think it is Undefined Behaviour or for other unknown > reasons) then Ævar's version would be better for clearing the "warning". > > But your patch fixed the "bug" that a probably overeager compiler was > "detecting". Just briefly for those who perhaps didn't fully read the initial thread. Per [1] and [2] (search for -fanalyzer in that [2]) it's not a bug, undesired behavior etc. from GCC that it's "overeager" to warn in this case. Most warnings from compilers are in the category of not being triggered on the basis of exhaustive code analysis which tries to prove that it's a practical issue for *your codebase*. It's equally about warning you about patterns that might be a problem in the future. In this case I don't know how this line of reasoning started, or how the output is confusing. E.g. Johannes notes upthread that the "complaint is not actually accurate". Well, yes it is, because the warning says: http.c: In function ‘run_active_slot’: http.c:1332:24: warning: storing the address of local variable ‘finished’ in ‘*slot.finished’ [-Wdangling-pointer=] 1332 | slot->finished = &finished; | ~~~~~~~~~~~~~~~^~~~~~~~~~~ I.e. it's telling us that we're *storing* the address, which we're doing. "Storing" meaning "past the lifetime of the function". It doesn't mean that GCC has additionally proved that we'll later used it in a way that will have a meaningful impact on the behavior of our program, or even that it's tried to do that. See an excerpt from the GCC code (a comment) in [1]. 1. https://lore.kernel.org/git/220127.86mtjhdeme.gmgdl@evledraar.gmail.com/ 2. https://lore.kernel.org/git/220414.86h76vd69t.gmgdl@evledraar.gmail.com/
Carlo Marcelo Arenas Belón <carenas@gmail.com> writes: > My guess is that there is a curl version somewhere that is patched to use > threads more extensible than upstream and where this code is stil needed. > I think it is also safe to assume (like you did) that this is a 16 year bug > that was already fixed and reverting that code would be an alternative too. Sorry, but this does not make any sense to me. It wasn't like we were working around somebody else's bug 16 years ago. Reverting the bugfix we made 16 years ago will make the issue fixed 16 years ago to reappear. Also, even if your version of curl library uses multi-threading internally, it cannot magically make _our_ calls into curl library to be multi-threaded, letting a control flow that came from run_active_slot() down to another recursive invocation of run_active_slot() to spin there calling curl_multi_perform() repeatedly, while returning the control to the outer run_active_slot() at the same time. So "a curl version somewhere that is patched" does not sound like a plausible explanation, either.
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: >> I _think_ we can even get away by not doing anything to >> slot->finished at the end of run_active_slot(), as we are not >> multi-threaded and the callee only returns to the caller, but if it >> helps pleasing the warning compiler, I'd prefer the simplest >> workaround, perhaps with an unconditional clearing there? > > I'll admit I haven't fully looked into this again, but does anything in > the subsequent analysis suggest that my original patch wouldn't be a > working solution to this, still: > https://lore.kernel.org/git/patch-1.1-1cec367e805-20220126T212921Z-avarab@gmail.com/ ? I traced _one_ code path as a demonstration to show why the current "slot->finished = &finished" based solution works. But I think what we need is to demonstrate a code path in the old version that shows why the old slot->in_use would not have worked and the slot->finished was needed, and demonstrate why it NO LONGER is the case in today's code. Without that, especially with the latter, I cannot take the "just revert 16-year old bugfix because a new compiler throws a warning related to multi-threaded code to it, even though we are strictly single-threaded" as a serious solution. And because I do not think I've seen anybody has done that necessary digging, I would still prefer the "if the compiler somehow cares, then let's clear the finished member once we are done with it" much better than "we do not know why but we somehow think we can do without this bugfix, even though we wouldn't be making noises about this piece of code if a new compiler did not start emitting a warning". Thanks.
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > It doesn't mean that GCC has additionally proved that we'll later used > it in a way that will have a meaningful impact on the behavior of our > program, or even that it's tried to do that. See an excerpt from the GCC > code (a comment) in [1]. But that means the warning just as irrelevant as "you stored 438 to this integer variable". Sure, there may be cases where that integer variable should not exceed 400 and if the compiler can tell us that, that would be a valuable help to developers. But "you stored an address of an object that can go out of scope in another object whose lifetime lasts beyond the scope" alone is not, without "and the caller that passed the latter object later dereferences that address here". We certainly shouldn't -Werror on such a warning and bend our code because of it.
Carlo Marcelo Arenas Belón <carenas@gmail.com> writes: > On Tue, May 24, 2022 at 10:15:57AM -0700, Junio C Hamano wrote: >> >> I _think_ we can even get away by not doing anything to >> slot->finished at the end of run_active_slot(), as we are not >> multi-threaded and the callee only returns to the caller, but if it >> helps pleasing the warning compiler, I'd prefer the simplest >> workaround, perhaps with an unconditional clearing there? > > Assuming that some overly clever compiler might optimize that out (either > because it might think it is Undefined Behaviour or for other unknown > reasons) then Ævar's version would be better for clearing the "warning". You keep saying undefined behaviour but in this case I do not quite see there is anything undefined. The warning, as Ævar said in a message, is about storing an address of an object on the stack in another object whose lifetime lasts beyond the life of the stackframe. If you dereference such a pointer _after_ we return from run_active_slot() function, the behaviour may indeed be undefined. But if you recall one such call trace I walked through for Dscho in another message this morning, we do not make such a dereferencing. The run_active_slot() function sets up the slot with the pointer to its on-stack variable in it, we make a call chain that is several levels deep, and at some point in the call chain, the request that is represented by the slot may complete and slot may be passed to finish_active_slot(), which would update (*slot->finished) thus modifying the on-stack variable of the run_active_slot() that we will eventually return to. Is such a use of the pointer in the structure a cause for an undefined behaviour?
Please, I apologize in advance because I am making statements way above my paygrade and without the relevant foundation (because as I mentioned, I looked at it briefly a while ago, and hadn't been able to get the time to complete my analysis). I did read all your analysis and while as Aevar pointed out, I might be mistaken, or not making myself clear enough, it is not intentional and if told to do so, will extract myself from this thread until I can do a full analysis. On Tue, May 24, 2022 at 4:19 PM Junio C Hamano <gitster@pobox.com> wrote: > > Is such a use of the pointer in the structure a cause for an > undefined behaviour? If you make an IMHO overly strict reading of the standard (which it would seem gcc12 implementers might have done) then ANY access to a pointer that might be out of scope is Undefined Behaviour. gcc doesn't know what happens to the variable after it gets shipped out of this function inside that opaque structure that is passed around between threads in curl, so it MIGHT reasonably assume that: 1) There is a layering violation somewhere and another thread is modifying that field to point it to a different thread local from the same function. 2) Once it gets the value back, then it can assume that reading that pointer might be undefined behaviour because it might be coming from a different stack frame than ours (after all, any sane developer would store instead a flag) 3) since the if uses a condition that is UB then it can optimize it out I can't see any other reason why in the original code, with the if, the warning was still being triggered, but not without it. BTW while trying to debug gcc to find out if this is a bug or a lost opportunity for optimization of something else, I noticed that it would only trigger in cases where the structure was shipped outside the function through another function, not when it was returned back from it, for example. I don't even have gcc-12 in the current workstation I am using, but would prepare a machine with it and debug further if you think giving a more complete answer is necessary. Carlo
Junio C Hamano venit, vidit, dixit 2022-05-25 00:34:40: > Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > > > It doesn't mean that GCC has additionally proved that we'll later used > > it in a way that will have a meaningful impact on the behavior of our > > program, or even that it's tried to do that. See an excerpt from the GCC > > code (a comment) in [1]. > > But that means the warning just as irrelevant as "you stored 438 to > this integer variable". Sure, there may be cases where that integer > variable should not exceed 400 and if the compiler can tell us that, > that would be a valuable help to developers. An integer can hold 438 perfectly, without any help by carefully designed code. > But "you stored an > address of an object that can go out of scope in another object > whose lifetime lasts beyond the scope" alone is not, without "and > the caller that passed the latter object later dereferences that > address here". We certainly shouldn't -Werror on such a warning > and bend our code because of it. A global variable cannot hold a meaningful pointer to a local variable, unless the carefully designed code helps. So that "analogy" rather highlights the essential difference, unless you think about a pointer as "just some number" rather than "something that can be dereferenced". [read global as outer scope, local as inner scope for simplicity] Common practice is not necessarily good practice. In a traditional C mindset, everything around pointers and memory management is doomed to boom unless the code is designed carefully and "you know what you are doing". I'm not indicating that you do not - on the contrary, you very well do, as your detailed analysis of the code flow shows. At the same time, it shows that we cannot be certain about that piece of code without that detailed expert analysis. C is not C++ nor rust, nor should it be; but the warnings and errors in newer standards typically try to avoid those pitfalls by making sure that, e.g., pointers do not go stale for "obvious reasons". They might missflag cases where this is preempted for non-obvious reasons, but forcing you to be explicit (more obvious) in your code is a good thing, especially for maintainability of the code base. Pointing from outer scope to memory in an inner scope should be a no-go; that's what this error is about. Unsetting that pointer (by setting the pointer to NULL) right before the inner scope ends is exactly the right solution. If this "breaks" the code, the code is broken already. Ironically, my original one line patch seems to work here, as your detailed analysis shows. Truth in advertising: I arrived at that patch after a considerably less detailed analysis ;) Michael
Hi Junio, On Tue, 24 May 2022, Junio C Hamano wrote: > Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > > >> I _think_ we can even get away by not doing anything to > >> slot->finished at the end of run_active_slot(), as we are not > >> multi-threaded and the callee only returns to the caller, but if it > >> helps pleasing the warning compiler, I'd prefer the simplest > >> workaround, perhaps with an unconditional clearing there? > > > > I'll admit I haven't fully looked into this again, but does anything in > > the subsequent analysis suggest that my original patch wouldn't be a > > working solution to this, still: > > https://lore.kernel.org/git/patch-1.1-1cec367e805-20220126T212921Z-avarab@gmail.com/ ? > > I traced _one_ code path as a demonstration to show why the current > "slot->finished = &finished" based solution works. > > But I think what we need is to demonstrate a code path in the old > version that shows why the old slot->in_use would not have worked > and the slot->finished was needed, and demonstrate why it NO LONGER > is the case in today's code. Without that, especially with the > latter, I cannot take the "just revert 16-year old bugfix because a > new compiler throws a warning related to multi-threaded code to it, > even though we are strictly single-threaded" as a serious solution. > > And because I do not think I've seen anybody has done that necessary > digging, I would still prefer the "if the compiler somehow cares, > then let's clear the finished member once we are done with it" much > better than "we do not know why but we somehow think we can do > without this bugfix, even though we wouldn't be making noises about > this piece of code if a new compiler did not start emitting a > warning". The commit in question is baa7b67d091 (HTTP slot reuse fixes, 2006-03-10), and I did look around in the Git mailing list archives for mails that were sent around the same date, but did not see much that would help understand the context, except that the patch series clearly talks about `http-push`: https://lore.kernel.org/git/20060311041749.GB3997@reactrix.com/ The thing about `http-push` is that it adds a "fill function" that is executed in `fill_active_slots()`, which is called in turn by `step_active_slots()`, which, as you will recall, is called within that busy loop in `run_active_slot()`. And that "fill function" is where it starts to get interesting. It's called `fill_active_slot()`: https://github.com/git/git/blob/v2.36.1/http-push.c#L604-L625 This function starts requests, such as fetching loose objects, starting a PUT or a MKCOL. Notably, though, `fill_active_slot()` does not wait for the request to finish. In other words, it will potentially reuse the current slot if it was _just_ marked as no longer in use, and then the code flow will eventually return to that loop in `run_active_slot()`, with any reused slot still being marked as `in_use`. So yes, reverting that commit would reintroduce the regression, and I am very happy that we now have a grip on this Chesterton's Fence. This same analysis, of course, also puts a nail into the coffin of the `reserved_for_use` idea because while it would fix the reuse bug, it would unnecessarily squat on slots that might well be needed. Ciao, Dscho
On Tue, May 24 2022, Junio C Hamano wrote: > Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > >> It doesn't mean that GCC has additionally proved that we'll later used >> it in a way that will have a meaningful impact on the behavior of our >> program, or even that it's tried to do that. See an excerpt from the GCC >> code (a comment) in [1]. > > But that means the warning just as irrelevant as "you stored 438 to > this integer variable". Sure, there may be cases where that integer > variable should not exceed 400 and if the compiler can tell us that, > that would be a valuable help to developers. But "you stored an > address of an object that can go out of scope in another object > whose lifetime lasts beyond the scope" alone is not, without "and > the caller that passed the latter object later dereferences that > address here". We certainly shouldn't -Werror on such a warning > and bend our code because of it. I think it says something that 1) we had exactly one of these in our codebase 2) as we've discussed the pointer isn't actually *needed* outside the scope of the function, it's just left-over. Now, if it were used, e.g. let's say we had some code that took the struct and inspected its members we'd likely have a segfault here, or worse it would "work", but only on the platforms we'd test at first. Which isn't the case with a leftover "int finished" holding a 438. The point of this warning, like so many others, is to ask "hey, do you really need to be running around with this particular pair of scissors?".
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > This same analysis, of course, also puts a nail into the coffin of the > `reserved_for_use` idea because while it would fix the reuse bug, it would > unnecessarily squat on slots that might well be needed. It is like that an in-kernel structure that represents a process has to stay around in the zombie state until its exit status is culled. With s/reserved_for_use/zombie/ the name of the new member would make more sense ;-) With the "slot->finished" trick, compared to the approach to delay the reuse, we can reuse them a bit earlier, but because I do not think we accumulate unbounded number of these zombie requests, and when we run out the active slots in the active queue, and because get_active_slot() will allocate a new one, the wastage might not be too bad. So, I am not sure if it is that bad to be called a nail in the coffin. In any case, https://github.com/git/git/actions/runs/2381379417 is the run with the single liner "clear slot->finished before leaving" with your other 3 gcc12 fixes. The tests are not clean because we have linux-leaks complaining on ds/bundle-uri-more RFC patches and win test (9) seems to have issues with t7527 (fsmonitor), but I am taking the fact that any of the "win test" jobs even start as an evidence that we have pleased gcc12 enough to get there? Thanks.
On Tue, 24 May 2022, Junio C Hamano wrote: > It always is nice to have the authority/expert readily answering our stupid > questions ;-) I probably miss a few here and there, but I do keep an eye out for curl related subjects where I can help. After all, I'm git fan and user myself.
diff --git a/http.c b/http.c index 229da4d148..85437b1980 100644 --- a/http.c +++ b/http.c @@ -1367,6 +1367,9 @@ void run_active_slot(struct active_request_slot *slot) select(max_fd+1, &readfds, &writefds, &excfds, &select_timeout); } } + + if (slot->finished == &finished) + slot->finished = NULL; } static void release_active_slot(struct active_request_slot *slot)
In http.c, the run_active_slot() function allows the given "slot" to make progress by calling step_active_slots() in a loop repeatedly, and the loop is not left until the request held in the slot completes. Ages ago, we used to use the slot->in_use member to get out of the loop, which misbehaved when the request in "slot" completes (at which time, the result of the request is copied away from the slot, and the in_use member is cleared, making the slot ready to be reused), and the "slot" gets reused to service a different request (at which time, the "slot" becomes in_use again, even though it is for a different request). The loop terminating condition mistakenly thought that the original request has yet to be completed. Today's code, after baa7b67d (HTTP slot reuse fixes, 2006-03-10) fixed this issue, uses a separate "slot->finished" member that is set in run_active_slot() to point to an on-stack variable, and the code that completes the request in finish_active_slot() clears the on-stack variable via the pointer to signal that the particular request held by the slot has completed. It also clears the in_use member (as before that fix), so that the slot itself can safely be reused for an unrelated request. One thing that is not quite clean in this arrangement is that, unless the slot gets reused, at which point the finished member is reset to NULL, the member keeps the value of &finished, which becomes a dangling pointer into the stack when run_active_slot() returns. Clear the finished member before the control leaves the function, but make sure to limit it to the case where the pointer still points at the on-stack variable of ours (the pointer may be set to point at the on-stack variable of somebody else after the slot gets reused, in which case we do not want to touch it). Signed-off-by: Junio C Hamano <gitster@pobox.com> --- * So, this has been sitting in my pile of random patches for a few weeks. http.c | 3 +++ 1 file changed, 3 insertions(+)