Message ID | bbe00b9e-64d8-4ec8-a2b9-2c6917c72dbd@web.de (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | mem-pool: use st_add() in mem_pool_strvfmt() | expand |
On Sun, Mar 31, 2024 at 08:53:07PM +0200, René Scharfe wrote: > If len is INT_MAX in mem_pool_strvfmt(), then len + 1 overflows. > Casting it to size_t would prevent that. Use st_add() to go a step > further and make the addition *obviously* safe. The compiler can > optimize the check away on platforms where SIZE_MAX > INT_MAX, i.e. > basically everywhere. Yeah, I think this is a good thing to do. I was confused at first why we had an "int" at all, but it's the usual crappy snprintf interface. Which, by the way... > @@ -123,13 +124,14 @@ static char *mem_pool_strvfmt(struct mem_pool *pool, const char *fmt, > if (len < 0) > BUG("your vsnprintf is broken (returned %d)", len); Not new in your patch, and I know this is copied from the strbuf code, but I think a BUG() is probably the wrong thing. We added it long ago to let us know about broken vsnprintf() implementations, but we'd have flushed those out by now, as nothing in Git would work on such a platform. And meanwhile there are legitimate reasons for a non-broken vsnprintf() to return -1: namely that it is the only useful thing they can do when the requested string is larger than INT_MAX (e.g., "%s" on a string that is over 2GB). This is sort of academic, of course. There's no useful error to return here, and anybody who manages to shove 2GB into a place where we expect a short string fully deserves to have their program abort. I don't have a good example of where you can trigger this (it used to be easy with long attribute names, but these days we refuse to parse them). But in general probably calling die() is more appropriate. There's a similar call in vreportf() that tries to keep going, but it ends up with lousy results. E.g., try: perl -le 'print "create refs/heads/", "a"x2147483648, "HEAD"' | git update-ref --stdin which results in just "fatal: ", since formatting the error string fails. Perhaps we should just print the unexpanded format string ("invalid ref format: %s" in this case). It's not great, but it's better than nothing. I guess I diverged quite far from reviewing your patch. ;) It obviously looks fine, but the snprintf() int return value got me off on a tangent. -Peff
Am 01.04.24 um 05:36 schrieb Jeff King: > On Sun, Mar 31, 2024 at 08:53:07PM +0200, René Scharfe wrote: > > Which, by the way... > >> @@ -123,13 +124,14 @@ static char *mem_pool_strvfmt(struct mem_pool *pool, const char *fmt, >> if (len < 0) >> BUG("your vsnprintf is broken (returned %d)", len); > > Not new in your patch, and I know this is copied from the strbuf code, > but I think a BUG() is probably the wrong thing. We added it long ago to > let us know about broken vsnprintf() implementations, but we'd have > flushed those out by now, as nothing in Git would work on such a > platform. > > And meanwhile there are legitimate reasons for a non-broken vsnprintf() > to return -1: namely that it is the only useful thing they can do when > the requested string is larger than INT_MAX (e.g., "%s" on a string that > is over 2GB). This is sort of academic, of course. There's no useful > error to return here, and anybody who manages to shove 2GB into a place > where we expect a short string fully deserves to have their program > abort. > > I don't have a good example of where you can trigger this (it used to be > easy with long attribute names, but these days we refuse to parse them). > But in general probably calling die() is more appropriate. Makes sense. Could be rolled into a new wrapper, xvsnprintf(); imap-send.c::nfvasprintf() could call it as well. There are also callers of vsnprintf(3) that use its return value without checking for error: builtin/receive-pack.c::report_message(), path.c::mksnpath() and arguably imap-send.c::nfsnprintf(). > There's a similar call in vreportf() that tries to keep going, but it > ends up with lousy results. E.g., try: > > perl -le 'print "create refs/heads/", "a"x2147483648, "HEAD"' | > git update-ref --stdin > > which results in just "fatal: ", since formatting the error string > fails. Perhaps we should just print the unexpanded format string > ("invalid ref format: %s" in this case). It's not great, but it's better > than nothing. We can throw in errno to distinguish between EILSEQ (invalid wide character) and EOVERFLOW. And we'd better not call die_errno() to avoid triggering a recursion warning. We can open-code it instead: if (vsnprintf(p, pend - p, err, params) < 0) { fprintf(stderr, _("%sunable to format message '%s': %s\n"), _("fatal: "), err, strerror(errno)); exit(128); } But when I ran your test command (on macOS 14.4.1) ten times with this change I got: fatal: unable to format message 'invalid ref format: %s': Invalid argument fatal: unable to format message 'invalid ref format: %s': Undefined error: 0 fatal: unable to format message 'invalid ref format: %s': Invalid argument fatal: unable to format message 'invalid ref format: %s': Undefined error: 0 fatal: unable to format message 'invalid ref format: %s': Invalid argument fatal: unable to format message 'invalid ref format: %s': Undefined error: 0 fatal: unable to format message 'invalid ref format: %s': Invalid argument fatal: unable to format message 'invalid ref format: %s': Undefined error: 0 fatal: unable to format message 'invalid ref format: %s': Undefined error: 0 fatal: unable to format message 'invalid ref format: %s': Undefined error: 0 Which scares me. Why is errno sometimes zero? Why EINVAL instead of EOVERFLOW? O_o René
On Tue, Apr 02, 2024 at 03:48:45PM +0200, René Scharfe wrote: > Makes sense. Could be rolled into a new wrapper, xvsnprintf(); > imap-send.c::nfvasprintf() could call it as well. > > There are also callers of vsnprintf(3) that use its return value without > checking for error: builtin/receive-pack.c::report_message(), > path.c::mksnpath() and arguably imap-send.c::nfsnprintf(). Hmm, yeah. Those are all OK not to use xsnprintf(), since they handle truncation themselves. But the first two don't look like they handle a negative return well. In report_message(), we'd end up shrinking "sz". That's potentially an out-of-bounds problem, except that we'll always have put a non-empty prefix into the buffer. For mksnpath(), though, I suspect that trying to format a very long name could result in the output being full of uninitialized bytes. It only has one caller, which creates "foo~1" when we got EEXIST from "foo". So I doubt you can get up to too much mischief with it. But it could easily be replaced by mkpathdup() (or even a reusable strbuf outside the loop if you really wanted to hyper-optimize) And then we could get rid of mksnpath() entirely, and its horrible bad_path failure mode. > We can throw in errno to distinguish between EILSEQ (invalid wide > character) and EOVERFLOW. And we'd better not call die_errno() to avoid > triggering a recursion warning. We can open-code it instead: > > if (vsnprintf(p, pend - p, err, params) < 0) { > fprintf(stderr, _("%sunable to format message '%s': %s\n"), > _("fatal: "), err, strerror(errno)); > exit(128); > } We could also just throw it into the buffer and let the rest of the function proceed, like: diff --git a/usage.c b/usage.c index 09f0ed509b..5baab98fa3 100644 --- a/usage.c +++ b/usage.c @@ -19,8 +19,10 @@ static void vreportf(const char *prefix, const char *err, va_list params) } memcpy(msg, prefix, prefix_len); p = msg + prefix_len; - if (vsnprintf(p, pend - p, err, params) < 0) + if (vsnprintf(p, pend - p, err, params) < 0) { + if (snprintf(p, pend - p, "could not format error: %s", err) < 0) *p = '\0'; /* vsnprintf() failed, clip at prefix */ + } for (; p != pend - 1 && *p; p++) { if (iscntrl(*p) && *p != '\t' && *p != '\n') Though most of the rest of the function is not that useful (it is mostly removing unreadable chars, and hopefully we do not have any of those in our format strings!). I had not thought about showing strerror(). The C standard does mention a negative value for encoding errors, but says nothing about errno. POSIX does seem to mention EILSEQ and EOVERFLOW. So this... > But when I ran your test command (on macOS 14.4.1) ten times with this > change I got: > > fatal: unable to format message 'invalid ref format: %s': Invalid argument > fatal: unable to format message 'invalid ref format: %s': Undefined error: 0 > fatal: unable to format message 'invalid ref format: %s': Invalid argument > fatal: unable to format message 'invalid ref format: %s': Undefined error: 0 > fatal: unable to format message 'invalid ref format: %s': Invalid argument > fatal: unable to format message 'invalid ref format: %s': Undefined error: 0 > fatal: unable to format message 'invalid ref format: %s': Invalid argument > fatal: unable to format message 'invalid ref format: %s': Undefined error: 0 > fatal: unable to format message 'invalid ref format: %s': Undefined error: 0 > fatal: unable to format message 'invalid ref format: %s': Undefined error: 0 > > Which scares me. Why is errno sometimes zero? Why EINVAL instead of > EOVERFLOW? O_o ...is just confusing. I do think even without worrying about errno, simply saying "I tried to format 'some error: %s' and couldn't" is going to be way more useful than just "fatal: ". The only reason it would fail is that there's something gross in that "%s". We can't be more specific without interpreting the printf-format ourselves (which is probably not worth it). -Peff
Am 03.04.24 um 03:18 schrieb Jeff King: > On Tue, Apr 02, 2024 at 03:48:45PM +0200, René Scharfe wrote: > >> Makes sense. Could be rolled into a new wrapper, xvsnprintf(); >> imap-send.c::nfvasprintf() could call it as well. >> >> There are also callers of vsnprintf(3) that use its return value without >> checking for error: builtin/receive-pack.c::report_message(), >> path.c::mksnpath() and arguably imap-send.c::nfsnprintf(). > > Hmm, yeah. Those are all OK not to use xsnprintf(), since they handle > truncation themselves. But the first two don't look like they handle a > negative return well. In report_message(), we'd end up shrinking "sz". > That's potentially an out-of-bounds problem, except that we'll always > have put a non-empty prefix into the buffer. Getting only a truncated prefix is hopefully detected as invalid, but explicit handling would probably be better. > For mksnpath(), though, I > suspect that trying to format a very long name could result in the > output being full of uninitialized bytes. > > It only has one caller, which creates "foo~1" when we got EEXIST from > "foo". So I doubt you can get up to too much mischief with it. But it > could easily be replaced by mkpathdup() (or even a reusable strbuf > outside the loop if you really wanted to hyper-optimize) > > And then we could get rid of mksnpath() entirely, and its horrible > bad_path failure mode. mkpath() would be perfect but unusable in multiple threads. Cleaning up after mkpathdup() is a bit iffy in that caller. strbuf would be a bit much in that error path, I think, and you might have to export or reimplement strbuf_cleanup_path(). >> We can throw in errno to distinguish between EILSEQ (invalid wide >> character) and EOVERFLOW. And we'd better not call die_errno() to avoid >> triggering a recursion warning. We can open-code it instead: >> >> if (vsnprintf(p, pend - p, err, params) < 0) { >> fprintf(stderr, _("%sunable to format message '%s': %s\n"), >> _("fatal: "), err, strerror(errno)); >> exit(128); >> } > > We could also just throw it into the buffer and let the rest of the > function proceed, like: > > diff --git a/usage.c b/usage.c > index 09f0ed509b..5baab98fa3 100644 > --- a/usage.c > +++ b/usage.c > @@ -19,8 +19,10 @@ static void vreportf(const char *prefix, const char *err, va_list params) > } > memcpy(msg, prefix, prefix_len); > p = msg + prefix_len; > - if (vsnprintf(p, pend - p, err, params) < 0) > + if (vsnprintf(p, pend - p, err, params) < 0) { > + if (snprintf(p, pend - p, "could not format error: %s", err) < 0) > *p = '\0'; /* vsnprintf() failed, clip at prefix */ > + } > > for (; p != pend - 1 && *p; p++) { > if (iscntrl(*p) && *p != '\t' && *p != '\n') > > Though most of the rest of the function is not that useful (it is mostly > removing unreadable chars, and hopefully we do not have any of those in > our format strings!). For warnings and usage messages this would keep the prefix and not die. This would look a bit strange, no? usage: could not format error: TERRIBLE MESSAGE! > I had not thought about showing strerror(). The C > standard does mention a negative value for encoding errors, but says > nothing about errno. POSIX does seem to mention EILSEQ and EOVERFLOW. > So this... > >> But when I ran your test command (on macOS 14.4.1) ten times with this >> change I got: >> >> fatal: unable to format message 'invalid ref format: %s': Invalid argument >> fatal: unable to format message 'invalid ref format: %s': Undefined error: 0 >> fatal: unable to format message 'invalid ref format: %s': Invalid argument >> fatal: unable to format message 'invalid ref format: %s': Undefined error: 0 >> fatal: unable to format message 'invalid ref format: %s': Invalid argument >> fatal: unable to format message 'invalid ref format: %s': Undefined error: 0 >> fatal: unable to format message 'invalid ref format: %s': Invalid argument >> fatal: unable to format message 'invalid ref format: %s': Undefined error: 0 >> fatal: unable to format message 'invalid ref format: %s': Undefined error: 0 >> fatal: unable to format message 'invalid ref format: %s': Undefined error: 0 >> >> Which scares me. Why is errno sometimes zero? Why EINVAL instead of >> EOVERFLOW? O_o > > ...is just confusing. I do think even without worrying about errno, > simply saying "I tried to format 'some error: %s' and couldn't" is going > to be way more useful than just "fatal: ". The only reason it would fail > is that there's something gross in that "%s". We can't be more specific > without interpreting the printf-format ourselves (which is probably not > worth it). Yes, showing errno doesn't add that much value. Except in this case it shows that there's something going on that I don't understand. Dare I dig deeper? Probably not.. René
On Wed, Apr 03, 2024 at 12:01:13PM +0200, René Scharfe wrote: > > For mksnpath(), though, I > > suspect that trying to format a very long name could result in the > > output being full of uninitialized bytes. > > > > It only has one caller, which creates "foo~1" when we got EEXIST from > > "foo". So I doubt you can get up to too much mischief with it. But it > > could easily be replaced by mkpathdup() (or even a reusable strbuf > > outside the loop if you really wanted to hyper-optimize) > > > > And then we could get rid of mksnpath() entirely, and its horrible > > bad_path failure mode. > > mkpath() would be perfect but unusable in multiple threads. Cleaning > up after mkpathdup() is a bit iffy in that caller. strbuf would be a > bit much in that error path, I think, and you might have to export or > reimplement strbuf_cleanup_path(). Yeah, I'd prefer not to go to mkpath(), even though that's the simplest thing, just because we should be reducing the subtle non-reentrant parts of the code over time. I don't think the cleanup handling for mkpathdup() is too bad: diff --git a/apply.c b/apply.c index 432837a674..15dfe607ff 100644 --- a/apply.c +++ b/apply.c @@ -4502,20 +4502,25 @@ static int create_one_file(struct apply_state *state, unsigned int nr = getpid(); for (;;) { - char newpath[PATH_MAX]; - mksnpath(newpath, sizeof(newpath), "%s~%u", path, nr); + char *newpath = mkpathdup("%s~%u", path, nr); res = try_create_file(state, newpath, mode, buf, size); - if (res < 0) + if (res < 0) { + free(newpath); return -1; + } if (!res) { - if (!rename(newpath, path)) + if (!rename(newpath, path)) { + free(newpath); return 0; + } unlink_or_warn(newpath); + free(newpath); break; } if (errno != EEXIST) break; ++nr; + free(newpath); } } return error_errno(_("unable to write file '%s' mode %o"), It even gets a little easier if you hoist it to a strbuf outside the loop, as you don't need to free on each loop iteration. That seems worth it to me to get rid of a function that IMHO is mis-designed (it is really only useful if you assume paths are bounded by PATH_MAX, which we know isn't always true). > > diff --git a/usage.c b/usage.c > > index 09f0ed509b..5baab98fa3 100644 > > --- a/usage.c > > +++ b/usage.c > > @@ -19,8 +19,10 @@ static void vreportf(const char *prefix, const char *err, va_list params) > > } > > memcpy(msg, prefix, prefix_len); > > p = msg + prefix_len; > > - if (vsnprintf(p, pend - p, err, params) < 0) > > + if (vsnprintf(p, pend - p, err, params) < 0) { > > + if (snprintf(p, pend - p, "could not format error: %s", err) < 0) > > *p = '\0'; /* vsnprintf() failed, clip at prefix */ > > + } > > > > for (; p != pend - 1 && *p; p++) { > > if (iscntrl(*p) && *p != '\t' && *p != '\n') > > > > Though most of the rest of the function is not that useful (it is mostly > > removing unreadable chars, and hopefully we do not have any of those in > > our format strings!). > > For warnings and usage messages this would keep the prefix and not > die. This would look a bit strange, no? > > usage: could not format error: TERRIBLE MESSAGE! Sure, but I think any solution here is going to look strange. Keep in mind that we're trying to improve the case where we print _nothing_ useful at all. If you do see this on a non-fatal message, the subsequent messages may be informative. E.g.: error: could not format error: unable to open loose object %s fatal: bad object HEAD~12 is probably better than just exiting after the first. > Yes, showing errno doesn't add that much value. Except in this case it > shows that there's something going on that I don't understand. Dare I > dig deeper? Probably not.. Well, let us know if you do. ;) -Peff
On Wed, Apr 03, 2024 at 04:48:36PM -0400, Jeff King wrote: > Yeah, I'd prefer not to go to mkpath(), even though that's the simplest > thing, just because we should be reducing the subtle non-reentrant parts > of the code over time. I don't think the cleanup handling for > mkpathdup() is too bad: > > diff --git a/apply.c b/apply.c > index 432837a674..15dfe607ff 100644 > --- a/apply.c > +++ b/apply.c > @@ -4502,20 +4502,25 @@ static int create_one_file(struct apply_state *state, > unsigned int nr = getpid(); > > for (;;) { > - char newpath[PATH_MAX]; > - mksnpath(newpath, sizeof(newpath), "%s~%u", path, nr); > + char *newpath = mkpathdup("%s~%u", path, nr); > res = try_create_file(state, newpath, mode, buf, size); > - if (res < 0) > + if (res < 0) { > + free(newpath); > return -1; > + } > if (!res) { > - if (!rename(newpath, path)) > + if (!rename(newpath, path)) { > + free(newpath); > return 0; > + } > unlink_or_warn(newpath); > + free(newpath); > break; > } > if (errno != EEXIST) > break; > ++nr; > + free(newpath); > } > } > return error_errno(_("unable to write file '%s' mode %o"), OK, this misses one of the breaks, and potentially clobbers errno in that path by calling free(). So it is harder than I said. I still think it's worth doing, though. -Peff
Am 03.04.24 um 22:48 schrieb Jeff King: > On Wed, Apr 03, 2024 at 12:01:13PM +0200, René Scharfe wrote: > >>> diff --git a/usage.c b/usage.c >>> index 09f0ed509b..5baab98fa3 100644 >>> --- a/usage.c >>> +++ b/usage.c >>> @@ -19,8 +19,10 @@ static void vreportf(const char *prefix, const char *err, va_list params) >>> } >>> memcpy(msg, prefix, prefix_len); >>> p = msg + prefix_len; >>> - if (vsnprintf(p, pend - p, err, params) < 0) >>> + if (vsnprintf(p, pend - p, err, params) < 0) { >>> + if (snprintf(p, pend - p, "could not format error: %s", err) < 0) >>> *p = '\0'; /* vsnprintf() failed, clip at prefix */ >>> + } >>> >>> for (; p != pend - 1 && *p; p++) { >>> if (iscntrl(*p) && *p != '\t' && *p != '\n') >>> >>> Though most of the rest of the function is not that useful (it is mostly >>> removing unreadable chars, and hopefully we do not have any of those in >>> our format strings!). Hmm, this might be worth doing to avoid messing up the terminal if 'err' points into the weeds for some reason. >> For warnings and usage messages this would keep the prefix and not >> die. This would look a bit strange, no? >> >> usage: could not format error: TERRIBLE MESSAGE! > > Sure, but I think any solution here is going to look strange. Keep in > mind that we're trying to improve the case where we print _nothing_ > useful at all. If you do see this on a non-fatal message, the subsequent > messages may be informative. E.g.: > > error: could not format error: unable to open loose object %s > fatal: bad object HEAD~12 > > is probably better than just exiting after the first. OK, right, a format error doesn't have to be fatal and we can keep running and possibly provide more details. But mixing the format error with the actual payload message is not ideal. At least we should give the format error its proper prefix, while still reporting the prefix of the original message, e.g. like this: error: unable to format message: unable to open loose object %s fatal: ... or this: error: unable to format message: fatal: unable to open loose object %s I tend to like the first one slightly better, even though the empty message looks silly. It doesn't mix the two and answers the questions that I would have: Why did the program stop? Due to a fatal error. Why is the fatal message silent? The preceding error explains it. While the second one only reveals the fatality somewhere in the middle of the text, weakening the meaning of prefixes. >> Yes, showing errno doesn't add that much value. Except in this case it >> shows that there's something going on that I don't understand. Dare I >> dig deeper? Probably not.. > > Well, let us know if you do. ;) I still don't know why the error code varies between runs, but it clearly does not come from vsnprintf(3) -- if I set errno to some arbitrary value before the call, it is kept. Which is enough to convince me to ignore errno here. René
On Fri, Apr 05, 2024 at 03:10:33PM +0200, René Scharfe wrote: > >>> memcpy(msg, prefix, prefix_len); > >>> p = msg + prefix_len; > >>> - if (vsnprintf(p, pend - p, err, params) < 0) > >>> + if (vsnprintf(p, pend - p, err, params) < 0) { > >>> + if (snprintf(p, pend - p, "could not format error: %s", err) < 0) > >>> *p = '\0'; /* vsnprintf() failed, clip at prefix */ > >>> + } > >>> > >>> for (; p != pend - 1 && *p; p++) { > >>> if (iscntrl(*p) && *p != '\t' && *p != '\n') > >>> > >>> Though most of the rest of the function is not that useful (it is mostly > >>> removing unreadable chars, and hopefully we do not have any of those in > >>> our format strings!). > > Hmm, this might be worth doing to avoid messing up the terminal if > 'err' points into the weeds for some reason. I think we have bigger problems if "err" is messed up, because we'll be interpreting garbage as a format string and almost certainly triggering undefined behavior. And in general if we are not using a constant string as the format it's a potential security vulnerability. So I think we can mostly rely on the compile-time -Wformat checks for this. > OK, right, a format error doesn't have to be fatal and we can keep > running and possibly provide more details. > > But mixing the format error with the actual payload message is not ideal. > At least we should give the format error its proper prefix, while still > reporting the prefix of the original message, e.g. like this: > > error: unable to format message: unable to open loose object %s > fatal: > > ... or this: > > error: unable to format message: fatal: unable to open loose object %s > > I tend to like the first one slightly better, even though the empty > message looks silly. It doesn't mix the two and answers the questions > that I would have: Why did the program stop? Due to a fatal error. > Why is the fatal message silent? The preceding error explains it. > > While the second one only reveals the fatality somewhere in the middle > of the text, weakening the meaning of prefixes. Yeah, I think your first one is good. It's _ugly_, but it's easy to figure out what happened, and this is not something people generally see (and the status quo is just "fatal:", so you're easily 50% less ugly). > I still don't know why the error code varies between runs, but it > clearly does not come from vsnprintf(3) -- if I set errno to some > arbitrary value before the call, it is kept. Which is enough to > convince me to ignore errno here. Agreed. Thanks for digging into it. -Peff
diff --git a/mem-pool.c b/mem-pool.c index 2078c22b09..3065b12b23 100644 --- a/mem-pool.c +++ b/mem-pool.c @@ -115,6 +115,7 @@ static char *mem_pool_strvfmt(struct mem_pool *pool, const char *fmt, size_t available = block ? block->end - block->next_free : 0; va_list cp; int len, len2; + size_t size; char *ret; va_copy(cp, ap); @@ -123,13 +124,14 @@ static char *mem_pool_strvfmt(struct mem_pool *pool, const char *fmt, if (len < 0) BUG("your vsnprintf is broken (returned %d)", len); - ret = mem_pool_alloc(pool, len + 1); /* 1 for NUL */ + size = st_add(len, 1); /* 1 for NUL */ + ret = mem_pool_alloc(pool, size); /* Shortcut; relies on mem_pool_alloc() not touching buffer contents. */ if (ret == next_free) return ret; - len2 = vsnprintf(ret, len + 1, fmt, ap); + len2 = vsnprintf(ret, size, fmt, ap); if (len2 != len) BUG("your vsnprintf is broken (returns inconsistent lengths)"); return ret;
If len is INT_MAX in mem_pool_strvfmt(), then len + 1 overflows. Casting it to size_t would prevent that. Use st_add() to go a step further and make the addition *obviously* safe. The compiler can optimize the check away on platforms where SIZE_MAX > INT_MAX, i.e. basically everywhere. Signed-off-by: René Scharfe <l.s.r@web.de> --- mem-pool.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) -- 2.44.0