Message ID | 0ed09e9abb85e73a80d044c1ddaed303517752ac.1722632287.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | Small fixes for issues detected during internal CI runs | expand |
"Kyle Lippincott via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: Kyle Lippincott <spectral@google.com> > > If the loop executes more than once due to cwd being longer than 128 > bytes, then `errno = ERANGE` might persist outside of this function. > This technically shouldn't be a problem, as all locations where the > value in `errno` is tested should either (a) call a function that's > guaranteed to set `errno` to 0 on success, or (b) set `errno` to 0 prior > to calling the function that only conditionally sets errno, such as the > `strtod` function. In the case of functions in category (b), it's easy > to forget to do that. > > Set `errno = 0;` prior to exiting from `strbuf_getcwd` successfully. > This matches the behavior in functions like `run_transaction_hook` > (refs.c:2176) and `read_ref_internal` (refs/files-backend.c:564). I am still uneasy to see this unconditional clearing, which looks more like spreading the bad practice from two places you identified than following good behaviour modelled after these two places. But I'll let it pass. As long as our programmers understand that across strbuf_getcwd(), errno will *not* be preserved, even if the function returns success, it would be OK. As the usual convention around errno is that a successful call would leave errno intact, not clear it to 0, it would make it a bit harder to learn our API for newcomers, though. Thanks. > Signed-off-by: Kyle Lippincott <spectral@google.com> > --- > strbuf.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/strbuf.c b/strbuf.c > index 3d2189a7f64..b94ef040ab0 100644 > --- a/strbuf.c > +++ b/strbuf.c > @@ -601,6 +601,7 @@ int strbuf_getcwd(struct strbuf *sb) > strbuf_grow(sb, guessed_len); > if (getcwd(sb->buf, sb->alloc)) { > strbuf_setlen(sb, strlen(sb->buf)); > + errno = 0; > return 0; > }
On Fri, Aug 2, 2024 at 5:32 PM Junio C Hamano <gitster@pobox.com> wrote: > > [...] > > Set `errno = 0;` prior to exiting from `strbuf_getcwd` successfully. > > This matches the behavior in functions like `run_transaction_hook` > > (refs.c:2176) and `read_ref_internal` (refs/files-backend.c:564). > > I am still uneasy to see this unconditional clearing, which looks > more like spreading the bad practice from two places you identified > than following good behaviour modelled after these two places. > > But I'll let it pass. > > As long as our programmers understand that across strbuf_getcwd(), > errno will *not* be preserved, even if the function returns success, > it would be OK. As the usual convention around errno is that a > successful call would leave errno intact, not clear it to 0, it > would make it a bit harder to learn our API for newcomers, though. For what it's worth, I share your misgivings about this change and consider the suggestion[*] to make it save/restore `errno` upon success more sensible. It would also be a welcome change to see the function documentation in strbuf.h updated to mention that it follows the usual convention of leaving `errno` untouched upon success and clobbered upon error. [*]: https://lore.kernel.org/git/xmqqv80jeza5.fsf@gitster.g/
On Fri, Aug 2, 2024 at 2:32 PM Junio C Hamano <gitster@pobox.com> wrote: > > "Kyle Lippincott via GitGitGadget" <gitgitgadget@gmail.com> writes: > > > From: Kyle Lippincott <spectral@google.com> > > > > If the loop executes more than once due to cwd being longer than 128 > > bytes, then `errno = ERANGE` might persist outside of this function. > > This technically shouldn't be a problem, as all locations where the > > value in `errno` is tested should either (a) call a function that's > > guaranteed to set `errno` to 0 on success, or (b) set `errno` to 0 prior > > to calling the function that only conditionally sets errno, such as the > > `strtod` function. In the case of functions in category (b), it's easy > > to forget to do that. > > > > Set `errno = 0;` prior to exiting from `strbuf_getcwd` successfully. > > This matches the behavior in functions like `run_transaction_hook` > > (refs.c:2176) and `read_ref_internal` (refs/files-backend.c:564). > > I am still uneasy to see this unconditional clearing, which looks > more like spreading the bad practice from two places you identified > than following good behaviour modelled after these two places. > > But I'll let it pass. > > As long as our programmers understand that across strbuf_getcwd(), > errno will *not* be preserved, even if the function returns success, > it would be OK. As the usual convention around errno is that a > successful call would leave errno intact, not clear it to 0, it > would make it a bit harder to learn our API for newcomers, though. I'm sympathetic to that argument. If you'd prefer to not have this patch, I'm fine with it not landing, and instead at some future date I may try to work on those #leftoverbits from the previous patch (to make a safer wrapper around strtoX, and ban the use of the unwrapped versions), or someone else can if they beat me to it. Since this is wrapping a posix function, and posix has things to say about this (see below), I agree that it shouldn't set it to 0, and withdraw this patch. I'm including my references below mostly because with the information I just acquired, I think that any attempt to _preserve_ errno is also folly. No function we write, unless we explicitly state that it _will_ preserve errno, should feel obligated to do so. The number of cases where errno _could_ be modified according to the various specifications (C99 and posix) are just too numerous. --- Perhaps because I'm not all that experienced with C, but when I did C a couple decades ago, I operated in a mode where basically every function was actively hostile. If I wanted errno preserved across a function call, then it's up to me (the caller) to do so, regardless of what the current implementation of that function says will happen, because that can change at any point. Unless the function is documented as errno-preserving, I'm going to treat it as errno-hostile. In practice, this didn't really matter much, as I've never found `if (some_func()) { if (!some_other_func()) { /* use errno from `some_func` */ } }` logic to happen often, but maybe it does in "real" programs, I was just a hobbyist self-teaching at the time. The C standard has a very precise definition of how the library functions defined in the C specification will act. It guarantees: - the library functions defined in the specification will never set errno to 0. - the library functions defined in the specification may set the value to non-zero whether an error occurs or not, "provided the use of errno is not documented in the description of the function in this International Standard". What this means is that (a) if the function as defined in the C standard mentions errno, it can only set the values as specified there, and (b) if the function as defined in the C standard does _not_ mention errno, such as `fopen` or `strstr`, it can do _whatever it wants_ to errno, even on success, _except_ set it to 0. POSIX has similar language (https://pubs.opengroup.org/onlinepubs/009695399/functions/errno.html), with some key differences: - The value of errno should only be examined when it is indicated to be valid by a function's return value. - The setting of errno after a successful call to a function is unspecified unless the description of that function specifies that errno shall not be modified. This means that unlike the C specification, which says that if a function doesn't describe its use of errno it can do anything it wants to errno [except set it to 0], in POSIX, a function can do anything it wants to errno [except set it to 0] at any time. What this means in practice is that errno should never be assumed to be preserved across calls to posix functions (like getcwd). Also, strbuf_getcwd calls free, malloc, and realloc, none of which mention errno in the C specification, so they can do whatever they want to it [except set it to 0]. That I was able to find one single function that was causing problems is luck, and not guaranteed by any specification. Kind of makes me want to try writing an actively hostile C99 and POSIX environment, and see how many things break with it. :) C99 spec doesn't say anything about malloc setting errno? Ok! malloc now sets errno to ENOENT on tuesdays [in GMT because I'm not a monster], but only on success. On any other day, it'll set it to ERANGE, regardless of success or failure. > > Thanks. > > > Signed-off-by: Kyle Lippincott <spectral@google.com> > > --- > > strbuf.c | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/strbuf.c b/strbuf.c > > index 3d2189a7f64..b94ef040ab0 100644 > > --- a/strbuf.c > > +++ b/strbuf.c > > @@ -601,6 +601,7 @@ int strbuf_getcwd(struct strbuf *sb) > > strbuf_grow(sb, guessed_len); > > if (getcwd(sb->buf, sb->alloc)) { > > strbuf_setlen(sb, strlen(sb->buf)); > > + errno = 0; > > return 0; > > }
Eric Sunshine <sunshine@sunshineco.com> writes: > On Fri, Aug 2, 2024 at 5:32 PM Junio C Hamano <gitster@pobox.com> wrote: >> > [...] >> > Set `errno = 0;` prior to exiting from `strbuf_getcwd` successfully. >> > This matches the behavior in functions like `run_transaction_hook` >> > (refs.c:2176) and `read_ref_internal` (refs/files-backend.c:564). >> >> I am still uneasy to see this unconditional clearing, which looks >> more like spreading the bad practice from two places you identified >> than following good behaviour modelled after these two places. >> >> But I'll let it pass. >> >> As long as our programmers understand that across strbuf_getcwd(), >> errno will *not* be preserved, even if the function returns success, >> it would be OK. As the usual convention around errno is that a >> successful call would leave errno intact, not clear it to 0, it >> would make it a bit harder to learn our API for newcomers, though. > > For what it's worth, I share your misgivings about this change and > consider the suggestion[*] to make it save/restore `errno` upon > success more sensible. It would also be a welcome change to see the > function documentation in strbuf.h updated to mention that it follows > the usual convention of leaving `errno` untouched upon success and > clobbered upon error. > > [*]: https://lore.kernel.org/git/xmqqv80jeza5.fsf@gitster.g/ Yup, of course save/restore would be safer, and probably easier to reason about for many people. Thanks.
On Fri, Aug 2, 2024 at 4:51 PM Kyle Lippincott <spectral@google.com> wrote: > > On Fri, Aug 2, 2024 at 2:32 PM Junio C Hamano <gitster@pobox.com> wrote: > > > > "Kyle Lippincott via GitGitGadget" <gitgitgadget@gmail.com> writes: > > > > > From: Kyle Lippincott <spectral@google.com> > > > > > > If the loop executes more than once due to cwd being longer than 128 > > > bytes, then `errno = ERANGE` might persist outside of this function. > > > This technically shouldn't be a problem, as all locations where the > > > value in `errno` is tested should either (a) call a function that's > > > guaranteed to set `errno` to 0 on success, or (b) set `errno` to 0 prior > > > to calling the function that only conditionally sets errno, such as the > > > `strtod` function. In the case of functions in category (b), it's easy > > > to forget to do that. > > > > > > Set `errno = 0;` prior to exiting from `strbuf_getcwd` successfully. > > > This matches the behavior in functions like `run_transaction_hook` > > > (refs.c:2176) and `read_ref_internal` (refs/files-backend.c:564). > > > > I am still uneasy to see this unconditional clearing, which looks > > more like spreading the bad practice from two places you identified > > than following good behaviour modelled after these two places. > > > > But I'll let it pass. > > > > As long as our programmers understand that across strbuf_getcwd(), > > errno will *not* be preserved, even if the function returns success, > > it would be OK. As the usual convention around errno is that a > > successful call would leave errno intact, not clear it to 0, it > > would make it a bit harder to learn our API for newcomers, though. > > I'm sympathetic to that argument. If you'd prefer to not have this > patch, I'm fine with it not landing, and instead at some future date I > may try to work on those #leftoverbits from the previous patch (to > make a safer wrapper around strtoX, and ban the use of the unwrapped > versions), or someone else can if they beat me to it. > > Since this is wrapping a posix function, and posix has things to say > about this (see below), I agree that it shouldn't set it to 0, and > withdraw this patch. Dropped this patch in the reroll that (I think) I just sent. > > I'm including my references below mostly because with the information > I just acquired, I think that any attempt to _preserve_ errno is also > folly. No function we write, unless we explicitly state that it _will_ > preserve errno, should feel obligated to do so. The number of cases > where errno _could_ be modified according to the various > specifications (C99 and posix) are just too numerous. > > --- > > Perhaps because I'm not all that experienced with C, but when I did C > a couple decades ago, I operated in a mode where basically every > function was actively hostile. If I wanted errno preserved across a > function call, then it's up to me (the caller) to do so, regardless of > what the current implementation of that function says will happen, > because that can change at any point. Unless the function is > documented as errno-preserving, I'm going to treat it as > errno-hostile. In practice, this didn't really matter much, as I've > never found `if (some_func()) { if (!some_other_func()) { /* use errno > from `some_func` */ } }` logic to happen often, but maybe it does in > "real" programs, I was just a hobbyist self-teaching at the time. > > The C standard has a very precise definition of how the library > functions defined in the C specification will act. It guarantees: > - the library functions defined in the specification will never set errno to 0. > - the library functions defined in the specification may set the value > to non-zero whether an error occurs or not, "provided the use of errno > is not documented in the description of the function in this > International Standard". What this means is that (a) if the function > as defined in the C standard mentions errno, it can only set the > values as specified there, and (b) if the function as defined in the C > standard does _not_ mention errno, such as `fopen` or `strstr`, it can > do _whatever it wants_ to errno, even on success, _except_ set it to > 0. > > POSIX has similar language > (https://pubs.opengroup.org/onlinepubs/009695399/functions/errno.html), > with some key differences: > - The value of errno should only be examined when it is indicated to > be valid by a function's return value. > - The setting of errno after a successful call to a function is > unspecified unless the description of that function specifies that > errno shall not be modified. > > This means that unlike the C specification, which says that if a > function doesn't describe its use of errno it can do anything it wants > to errno [except set it to 0], in POSIX, a function can do anything it > wants to errno [except set it to 0] at any time. > > What this means in practice is that errno should never be assumed to > be preserved across calls to posix functions (like getcwd). Also, > strbuf_getcwd calls free, malloc, and realloc, none of which mention > errno in the C specification, so they can do whatever they want to it > [except set it to 0]. That I was able to find one single function that > was causing problems is luck, and not guaranteed by any specification. > > Kind of makes me want to try writing an actively hostile C99 and POSIX > environment, and see how many things break with it. :) C99 spec > doesn't say anything about malloc setting errno? Ok! malloc now sets > errno to ENOENT on tuesdays [in GMT because I'm not a monster], but > only on success. On any other day, it'll set it to ERANGE, regardless > of success or failure. > > > > > Thanks. > > > > > Signed-off-by: Kyle Lippincott <spectral@google.com> > > > --- > > > strbuf.c | 1 + > > > 1 file changed, 1 insertion(+) > > > > > > diff --git a/strbuf.c b/strbuf.c > > > index 3d2189a7f64..b94ef040ab0 100644 > > > --- a/strbuf.c > > > +++ b/strbuf.c > > > @@ -601,6 +601,7 @@ int strbuf_getcwd(struct strbuf *sb) > > > strbuf_grow(sb, guessed_len); > > > if (getcwd(sb->buf, sb->alloc)) { > > > strbuf_setlen(sb, strlen(sb->buf)); > > > + errno = 0; > > > return 0; > > > }
On Mon, Aug 05, 2024 at 08:51:50AM -0700, Junio C Hamano wrote: > Eric Sunshine <sunshine@sunshineco.com> writes: > > > On Fri, Aug 2, 2024 at 5:32 PM Junio C Hamano <gitster@pobox.com> wrote: > >> > [...] > >> > Set `errno = 0;` prior to exiting from `strbuf_getcwd` successfully. > >> > This matches the behavior in functions like `run_transaction_hook` > >> > (refs.c:2176) and `read_ref_internal` (refs/files-backend.c:564). > >> > >> I am still uneasy to see this unconditional clearing, which looks > >> more like spreading the bad practice from two places you identified > >> than following good behaviour modelled after these two places. > >> > >> But I'll let it pass. > >> > >> As long as our programmers understand that across strbuf_getcwd(), > >> errno will *not* be preserved, even if the function returns success, > >> it would be OK. As the usual convention around errno is that a > >> successful call would leave errno intact, not clear it to 0, it > >> would make it a bit harder to learn our API for newcomers, though. > > > > For what it's worth, I share your misgivings about this change and > > consider the suggestion[*] to make it save/restore `errno` upon > > success more sensible. It would also be a welcome change to see the > > function documentation in strbuf.h updated to mention that it follows > > the usual convention of leaving `errno` untouched upon success and > > clobbered upon error. > > > > [*]: https://lore.kernel.org/git/xmqqv80jeza5.fsf@gitster.g/ > > Yup, of course save/restore would be safer, and probably easier to > reason about for many people. Is it really all that reasonable? We're essentially partitioning our set of APIs into two sets, where one set knows to keep `errno` intact whereas another set doesn't. In such a world, you have to be very careful about which APIs you are calling in a function that wants to keep `errno` intact, which to me sounds like a maintenance headache. I'd claim that most callers never care about `errno` at all. For the callers that do, I feel it is way more fragile to rely on whether or not a called function leaves `errno` intact or not. For one, it's fragile because that may easily change due to a bug. Second, it is fragile because the dependency on `errno` is not explicitly documented via code, but rather an implicit dependency. So isn't it more reasonable to rather make the few callers that do require `errno` to be left intact to save it? It makes the dependency explicit, avoids splitting our functions into two sets and allows us to just ignore this issue for the majority of functions that couldn't care less about `errno`. Patrick
On Mon, Aug 5, 2024 at 11:26 PM Patrick Steinhardt <ps@pks.im> wrote: > > On Mon, Aug 05, 2024 at 08:51:50AM -0700, Junio C Hamano wrote: > > Eric Sunshine <sunshine@sunshineco.com> writes: > > > > > On Fri, Aug 2, 2024 at 5:32 PM Junio C Hamano <gitster@pobox.com> wrote: > > >> > [...] > > >> > Set `errno = 0;` prior to exiting from `strbuf_getcwd` successfully. > > >> > This matches the behavior in functions like `run_transaction_hook` > > >> > (refs.c:2176) and `read_ref_internal` (refs/files-backend.c:564). > > >> > > >> I am still uneasy to see this unconditional clearing, which looks > > >> more like spreading the bad practice from two places you identified > > >> than following good behaviour modelled after these two places. > > >> > > >> But I'll let it pass. > > >> > > >> As long as our programmers understand that across strbuf_getcwd(), > > >> errno will *not* be preserved, even if the function returns success, > > >> it would be OK. As the usual convention around errno is that a > > >> successful call would leave errno intact, not clear it to 0, it > > >> would make it a bit harder to learn our API for newcomers, though. > > > > > > For what it's worth, I share your misgivings about this change and > > > consider the suggestion[*] to make it save/restore `errno` upon > > > success more sensible. It would also be a welcome change to see the > > > function documentation in strbuf.h updated to mention that it follows > > > the usual convention of leaving `errno` untouched upon success and > > > clobbered upon error. > > > > > > [*]: https://lore.kernel.org/git/xmqqv80jeza5.fsf@gitster.g/ > > > > Yup, of course save/restore would be safer, and probably easier to > > reason about for many people. > > Is it really all that reasonable? We're essentially partitioning our set > of APIs into two sets, where one set knows to keep `errno` intact > whereas another set doesn't. In such a world, you have to be very > careful about which APIs you are calling in a function that wants to > keep `errno` intact, which to me sounds like a maintenance headache. > > I'd claim that most callers never care about `errno` at all. For the > callers that do, I feel it is way more fragile to rely on whether or not > a called function leaves `errno` intact or not. For one, it's fragile > because that may easily change due to a bug. Second, it is fragile > because the dependency on `errno` is not explicitly documented via code, > but rather an implicit dependency. > > So isn't it more reasonable to rather make the few callers that do > require `errno` to be left intact to save it? It makes the dependency > explicit, avoids splitting our functions into two sets and allows us to > just ignore this issue for the majority of functions that couldn't care > less about `errno`. 100% agreed. The C language specification says you can't rely on errno persisting across function calls, and that the caller must preserve it if it needs that behavior for some reason. The POSIX specification says you can't either except in very rare circumstances where it guarantees errno will not change. The Linux man page for errno says you can't rely on errno not changing, even for printf: https://man7.org/linux/man-pages/man3/errno.3.html A common mistake is to do if (somecall() == -1) { printf("somecall() failed\n"); if (errno == ...) { ... } } where errno no longer needs to have the value it had upon return from somecall() (i.e., it may have been changed by the printf(3)). If the value of errno should be preserved across a library call, it must be saved: if (somecall() == -1) { int errsv = errno; printf("somecall() failed\n"); if (errsv == ...) { ... } } Basically: errno is _extremely_ volatile. One should assume that _every_ function call is going to change it, even if they return successfully. The only thing that can't happen is that the functions defined in the C and POSIX standards set errno to 0, which is why I withdrew the patch (since it's a wrapper around a function defined in POSIX). But in general, I don't see any reason for any of the functions we write to be errno preserving, especially since any call to malloc, printf, trace functionality, etc. may modify errno. > > Patrick
diff --git a/strbuf.c b/strbuf.c index 3d2189a7f64..b94ef040ab0 100644 --- a/strbuf.c +++ b/strbuf.c @@ -601,6 +601,7 @@ int strbuf_getcwd(struct strbuf *sb) strbuf_grow(sb, guessed_len); if (getcwd(sb->buf, sb->alloc)) { strbuf_setlen(sb, strlen(sb->buf)); + errno = 0; return 0; }