Message ID | 20210208140154.10964-2-rf@opensource.cirrus.com (mailing list archive) |
---|---|
State | New |
Series | [v5,1/4] lib: vsprintf: scanf: Negative number must have field width > 1 |
On Mon, Feb 08, 2021 at 02:01:52PM +0000, Richard Fitzgerald wrote:
> The existing code attempted to handle numbers by doing a strto[u]l(),
> ignoring the field width, and then repeatedly dividing to extract the
> field out of the full converted value. If the string contains a run of
> valid digits longer than will fit in a long or long long, this would
> overflow and no amount of dividing can recover the correct value.
>
> This patch fixes vsscanf() to obey number field widths when parsing
> the number.
>
> A new _parse_integer_limit() is added that takes a limit for the number
> of characters to parse. The number field conversion in vsscanf is changed
> to use this new function.
>
> If a number starts with a radix prefix, the field width must be long
> enough for at least one digit after the prefix. If not, it will be handled
> like this:
>
>  sscanf("0x4", "%1i", &i): i=0, scanning continues with the 'x'
>  sscanf("0x4", "%2i", &i): i=0, scanning continues with the '4'
>
> This is consistent with the observed behaviour of userland sscanf.
>
> Note that this patch does NOT fix the problem of a single field value
> overflowing the target type. So for example:
>
>  sscanf("123456789abcdef", "%x", &i);
>
> Will not produce the correct result because the value obviously overflows
> INT_MAX. But sscanf will report a successful conversion.

I have a few nit-picks, but it's up to you and maintainers how to proceed.

...

> -unsigned long long simple_strtoull(const char *cp, char **endp, unsigned int base)
> +static unsigned long long simple_strntoull(const char *startp, size_t max_chars,
> +					   char **endp, unsigned int base)
>  {
> -	unsigned long long result;
> +	const char *cp;
> +	unsigned long long result = 0ULL;
>  	unsigned int rv;
>
> -	cp = _parse_integer_fixup_radix(cp, &base);
> -	rv = _parse_integer(cp, base, &result);
> +	cp = _parse_integer_fixup_radix(startp, &base);
> +	if ((cp - startp) >= max_chars) {
> +		cp = startp + max_chars;
> +		goto out;
> +	}
> +
> +	max_chars -= (cp - startp);
> +	rv = _parse_integer_limit(cp, base, &result, max_chars);
>  	/* FIXME */
>  	cp += (rv & ~KSTRTOX_OVERFLOW);
>
> +out:
>  	if (endp)
>  		*endp = (char *)cp;
>
>  	return result;
>  }

A nit-pick: What if we rewrite above as

static unsigned long long simple_strntoull(const char *cp, size_t max_chars,
					   char **endp, unsigned int base)
{
	unsigned long long result = 0ULL;
	const char *startp = cp;
	unsigned int rv;
	size_t chars;

	cp = _parse_integer_fixup_radix(cp, &base);
	chars = cp - startp;
	if (chars >= max_chars) {
		/* We hit the limit */
		cp = startp + max_chars;
	} else {
		rv = _parse_integer_limit(cp, base, &result, max_chars - chars);
		/* FIXME */
		cp += (rv & ~KSTRTOX_OVERFLOW);
	}

	if (endp)
		*endp = (char *)cp;

	return result;
}

...

> +static long long simple_strntoll(const char *cp, size_t max_chars, char **endp,
> +				 unsigned int base)
> +{
> +	/*
> +	 * simple_strntoull safely handles receiving max_chars==0 in the
> +	 * case we start with max_chars==1 and find a '-' prefix.

A nit-pick: Spaces surrounding '=='? simple_strntoull -> simple_strntoull()?

> +	 */

The above is missing something like:

 "Otherwise we hit the '-' as an illegal number in the following
 simple_strntoull() call."

> +	if (*cp == '-' && max_chars > 0)
> +		return -simple_strntoull(cp + 1, max_chars - 1, endp, base);
> +
> +	return simple_strntoull(cp, max_chars, endp, base);
> +}

...

> +			val.s = simple_strntoll(str,
> +						field_width > 0 ? field_width : SIZE_MAX,
> +						&next, base);

A nit-pick: Wouldn't a negative field_width be "big enough" to just be used as
is? Also, should field_width == 0 be treated as "parse to the MAX"?

...

> +			val.u = simple_strntoull(str,
> +						 field_width > 0 ? field_width : SIZE_MAX,
> +						 &next, base);

Ditto.
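To make the proposed restructuring easier to try outside the kernel, here is a minimal user-space sketch of the if/else form of simple_strntoull(). The helpers sketch_fixup_radix() and sketch_parse_limit() and the SKETCH_OVERFLOW flag are hypothetical stand-ins for _parse_integer_fixup_radix(), _parse_integer_limit() and KSTRTOX_OVERFLOW; they are simplified models written for this illustration, not the kernel implementations.

```c
#include <ctype.h>
#include <limits.h>
#include <stddef.h>

#define SKETCH_OVERFLOW (1U << 31)	/* stand-in for KSTRTOX_OVERFLOW */

/* Model of the radix fixup: resolve base 0 and skip a "0x" prefix. */
static const char *sketch_fixup_radix(const char *s, unsigned int *base)
{
	if (*base == 0) {
		if (s[0] != '0')
			*base = 10;
		else if (tolower((unsigned char)s[1]) == 'x' &&
			 isxdigit((unsigned char)s[2]))
			*base = 16;
		else
			*base = 8;
	}
	if (*base == 16 && s[0] == '0' && tolower((unsigned char)s[1]) == 'x')
		s += 2;
	return s;
}

/* Model of the limited parser: convert at most max_chars digits. */
static unsigned int sketch_parse_limit(const char *s, unsigned int base,
				       unsigned long long *p, size_t max_chars)
{
	unsigned long long res = 0;
	unsigned int rv = 0;

	while (max_chars--) {
		int c = tolower((unsigned char)*s);
		unsigned int val;

		if (c >= '0' && c <= '9')
			val = (unsigned int)(c - '0');
		else if (c >= 'a' && c <= 'f')
			val = (unsigned int)(c - 'a') + 10;
		else
			break;
		if (val >= base)
			break;
		if (res > (ULLONG_MAX - val) / base)
			rv |= SKETCH_OVERFLOW;	/* flag it, keep consuming */
		res = res * base + val;
		rv++;
		s++;
	}
	*p = res;
	return rv;
}

/* The if/else shape suggested in the review, on top of the models above. */
static unsigned long long sketch_strntoull(const char *cp, size_t max_chars,
					   char **endp, unsigned int base)
{
	unsigned long long result = 0ULL;
	const char *startp = cp;
	unsigned int rv;
	size_t chars;

	cp = sketch_fixup_radix(cp, &base);
	chars = (size_t)(cp - startp);
	if (chars >= max_chars) {
		/* The prefix alone used up the field width. */
		cp = startp + max_chars;
	} else {
		rv = sketch_parse_limit(cp, base, &result, max_chars - chars);
		cp += (rv & ~SKETCH_OVERFLOW);
	}

	if (endp)
		*endp = (char *)cp;

	return result;
}
```

With the input "0x4" and base 0, a limit of 1 or 2 returns 0 and leaves the end pointer at 'x' or '4' respectively, which is the behaviour described in the commit message; a limit of 3 returns 4.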
On 08/02/2021 15:18, Andy Shevchenko wrote:
> On Mon, Feb 08, 2021 at 02:01:52PM +0000, Richard Fitzgerald wrote:
>> The existing code attempted to handle numbers by doing a strto[u]l(),
>> ignoring the field width, and then repeatedly dividing to extract the
>> field out of the full converted value. If the string contains a run of
>> valid digits longer than will fit in a long or long long, this would
>> overflow and no amount of dividing can recover the correct value.
>>
>> This patch fixes vsscanf() to obey number field widths when parsing
>> the number.
>>
>> A new _parse_integer_limit() is added that takes a limit for the number
>> of characters to parse. The number field conversion in vsscanf is changed
>> to use this new function.
>>
>> If a number starts with a radix prefix, the field width must be long
>> enough for at least one digit after the prefix. If not, it will be handled
>> like this:
>>
>>  sscanf("0x4", "%1i", &i): i=0, scanning continues with the 'x'
>>  sscanf("0x4", "%2i", &i): i=0, scanning continues with the '4'
>>
>> This is consistent with the observed behaviour of userland sscanf.
>>
>> Note that this patch does NOT fix the problem of a single field value
>> overflowing the target type. So for example:
>>
>>  sscanf("123456789abcdef", "%x", &i);
>>
>> Will not produce the correct result because the value obviously overflows
>> INT_MAX. But sscanf will report a successful conversion.
>
>
> I have a few nit-picks, but it's up to you and maintainers how to proceed.
>
> ...
>
>> -unsigned long long simple_strtoull(const char *cp, char **endp, unsigned int base)
>> +static unsigned long long simple_strntoull(const char *startp, size_t max_chars,
>> +					   char **endp, unsigned int base)
>>  {
>> -	unsigned long long result;
>> +	const char *cp;
>> +	unsigned long long result = 0ULL;
>>  	unsigned int rv;
>>
>> -	cp = _parse_integer_fixup_radix(cp, &base);
>> -	rv = _parse_integer(cp, base, &result);
>> +	cp = _parse_integer_fixup_radix(startp, &base);
>> +	if ((cp - startp) >= max_chars) {
>> +		cp = startp + max_chars;
>> +		goto out;
>> +	}
>> +
>> +	max_chars -= (cp - startp);
>> +	rv = _parse_integer_limit(cp, base, &result, max_chars);
>>  	/* FIXME */
>>  	cp += (rv & ~KSTRTOX_OVERFLOW);
>>
>> +out:
>>  	if (endp)
>>  		*endp = (char *)cp;
>>
>>  	return result;
>>  }
>
> A nit-pick: What if we rewrite above as
>
> static unsigned long long simple_strntoull(const char *cp, size_t max_chars,
> 					   char **endp, unsigned int base)
> {
> 	unsigned long long result = 0ULL;
> 	const char *startp = cp;
> 	unsigned int rv;
> 	size_t chars;
>
> 	cp = _parse_integer_fixup_radix(cp, &base);
> 	chars = cp - startp;
> 	if (chars >= max_chars) {
> 		/* We hit the limit */
> 		cp = startp + max_chars;
> 	} else {
> 		rv = _parse_integer_limit(cp, base, &result, max_chars - chars);
> 		/* FIXME */
> 		cp += (rv & ~KSTRTOX_OVERFLOW);
> 	}
>
> 	if (endp)
> 		*endp = (char *)cp;
>
> 	return result;
> }
>
> ...

I don't mind rewriting that code if you prefer that way.
I am used to working on other kernel subsystems where the preference is
to bail out on the error case so that the "normal" case flows without
nesting.

>
>> +static long long simple_strntoll(const char *cp, size_t max_chars, char **endp,
>> +				 unsigned int base)
>> +{
>> +	/*
>> +	 * simple_strntoull safely handles receiving max_chars==0 in the
>> +	 * case we start with max_chars==1 and find a '-' prefix.
>
> A nit-pick: Spaces surrounding '=='? simple_strntoull -> simple_strntoull()?
>
>> +	 */
>
> The above is missing something like:
>
>  "Otherwise we hit the '-' as an illegal number in the following
>  simple_strntoull() call."
>
>> +	if (*cp == '-' && max_chars > 0)
>> +		return -simple_strntoull(cp + 1, max_chars - 1, endp, base);
>> +
>> +	return simple_strntoull(cp, max_chars, endp, base);
>
>
>> +}
>
> ...
>
>> +			val.s = simple_strntoll(str,
>> +						field_width > 0 ? field_width : SIZE_MAX,
>> +						&next, base);
>
> A nit-pick: Wouldn't a negative field_width be "big enough" to just be used as

field_width is s16 so really should be sign-extended to make it "very
big". I think this would be less readable what the intention is and what
assumptions it is based on. There's a risk someone would look at

  (size_t)(long)field_width

and think the (long) is redundant.
Perhaps change field_width to int? There I ask myself "if it can be an
int, why is it declared s16?" and worry there is something subtle in the
code.

My personal preference is to avoid using tricks in code that isn't time
critical.

> is? Also, should field_width == 0 be treated as "parse to the MAX"?
>
> ...

Earlier code terminates scanning if the width parsed from the format
string is <= 0. So field_width can only be -1 or > 0 here. But now you
point it out, that test would be better as field_width >= 0 ... so
it deals with 0 if it ever happened to sneak through to here
somehow.

>
>> +			val.u = simple_strntoull(str,
>> +						 field_width > 0 ? field_width : SIZE_MAX,
>> +						 &next, base);
>
> Ditto.
>
On Mon 2021-02-08 17:38:29, Richard Fitzgerald wrote:
> On 08/02/2021 15:18, Andy Shevchenko wrote:
> > On Mon, Feb 08, 2021 at 02:01:52PM +0000, Richard Fitzgerald wrote:
> > > The existing code attempted to handle numbers by doing a strto[u]l(),
> > > ignoring the field width, and then repeatedly dividing to extract the
> > > field out of the full converted value. If the string contains a run of
> > > valid digits longer than will fit in a long or long long, this would
> > > overflow and no amount of dividing can recover the correct value.
> > >
> > > -unsigned long long simple_strtoull(const char *cp, char **endp, unsigned int base)
> > > +static unsigned long long simple_strntoull(const char *startp, size_t max_chars,
> > > +					   char **endp, unsigned int base)
> > >  {
> > > -	unsigned long long result;
> > > +	const char *cp;
> > > +	unsigned long long result = 0ULL;
> > >  	unsigned int rv;
> > > -	cp = _parse_integer_fixup_radix(cp, &base);
> > > -	rv = _parse_integer(cp, base, &result);
> > > +	cp = _parse_integer_fixup_radix(startp, &base);
> > > +	if ((cp - startp) >= max_chars) {
> > > +		cp = startp + max_chars;
> > > +		goto out;
> > > +	}
> > > +
> > > +	max_chars -= (cp - startp);
> > > +	rv = _parse_integer_limit(cp, base, &result, max_chars);
> > >  	/* FIXME */
> > >  	cp += (rv & ~KSTRTOX_OVERFLOW);
> > > +out:
> > >  	if (endp)
> > >  		*endp = (char *)cp;
> > >  	return result;
> > >  }
> >
> > A nit-pick: What if we rewrite above as
> >
> > static unsigned long long simple_strntoull(const char *cp, size_t max_chars,
> > 					   char **endp, unsigned int base)
> > {
> > 	unsigned long long result = 0ULL;
> > 	const char *startp = cp;
> > 	unsigned int rv;
> > 	size_t chars;
> >
> > 	cp = _parse_integer_fixup_radix(cp, &base);
> > 	chars = cp - startp;
> > 	if (chars >= max_chars) {
> > 		/* We hit the limit */
> > 		cp = startp + max_chars;
> > 	} else {
> > 		rv = _parse_integer_limit(cp, base, &result, max_chars - chars);
> > 		/* FIXME */
> > 		cp += (rv & ~KSTRTOX_OVERFLOW);
> > 	}
> >
> > 	if (endp)
> > 		*endp = (char *)cp;
> >
> > 	return result;
> > }
> >
> > ...
> >
> I don't mind rewriting that code if you prefer that way.
> I am used to working on other kernel subsystems where the preference is
> to bail out on the error case so that the "normal" case flows without
> nesting.

Yeah. But in this case Andy's variant looks slightly more readable to me.

...

> >
> > > +			val.s = simple_strntoll(str,
> > > +						field_width > 0 ? field_width : SIZE_MAX,
> > > +						&next, base);
> >
> > A nit-pick: Wouldn't a negative field_width be "big enough" to just be used as

> field_width is s16 so really should be sign-extended

I guess that Andy just missed that it was a signed type. And it has to be
because -1 means SIZE_MAX.

> to make it "very
> big". I think this would be less readable what the intention is and what
> assumptions it is based on. There's a risk someone would look at
>
>   (size_t)(long)field_width
>
> and think the (long) is redundant.
> Perhaps change field_width to int? There I ask myself "if it can be an
> int, why is it declared s16?" and worry there is something subtle in the
> code.
>
> My personal preference is to avoid using tricks in code that isn't time
> critical.

I agree. Let's keep the check with signed type.

> > is? Also, should field_width == 0 be treated as "parse to the MAX"?

field_width == 0 actually means that no characters are read. It should
return a zero value.

> > ...
>
> Earlier code terminates scanning if the width parsed from the format
> string is <= 0.

To make it clear what "earlier code" means: vsscanf() bails out earlier
when field_width == 0. It is handled by this code:

		/* get field width */
		field_width = -1;
		if (isdigit(*fmt)) {
			field_width = skip_atoi(&fmt);
			if (field_width <= 0)
				break;
		}

> So field_width can only be -1 or > 0 here. But now you
> point it out, that test would be better as field_width >= 0 ... so
> it deals with 0 if it ever happened to sneak through to here
> somehow.

It might make sense to be proactive and change it to >= 0.
But I would do it in a separate patch. The "> 0" condition
matches the original code.

Best Regards,
Petr
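The two candidate conditions discussed in this thread differ only in how a field width of 0 is mapped to the parser's character limit. The sketch below puts them side by side; limit_gt0() and limit_ge0() are hypothetical helper names used only for this illustration, and int16_t stands in for the kernel's s16.

```c
#include <stddef.h>
#include <stdint.h>

/* Condition as posted in v5: any non-positive width means "no limit". */
static size_t limit_gt0(int16_t field_width)
{
	return field_width > 0 ? (size_t)field_width : SIZE_MAX;
}

/* Proactive variant: only a negative width (the -1 default) means "no limit". */
static size_t limit_ge0(int16_t field_width)
{
	return field_width >= 0 ? (size_t)field_width : SIZE_MAX;
}
```

Both helpers agree for -1 and for positive widths. They differ only for 0, which vsscanf() is said above to reject before this point: the first would parse without a limit, the second would parse no characters at all.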
On Thu 2021-02-11 13:55:26, Petr Mladek wrote:
> On Mon 2021-02-08 17:38:29, Richard Fitzgerald wrote:
> > On 08/02/2021 15:18, Andy Shevchenko wrote:
> > > On Mon, Feb 08, 2021 at 02:01:52PM +0000, Richard Fitzgerald wrote:
> > > A nit-pick: What if we rewrite above as
> > >
> > > static unsigned long long simple_strntoull(const char *cp, size_t max_chars,
> > > 					   char **endp, unsigned int base)
> > > {
> > > 	unsigned long long result = 0ULL;
> > > 	const char *startp = cp;
> > > 	unsigned int rv;
> > > 	size_t chars;
> > >
> > > 	cp = _parse_integer_fixup_radix(cp, &base);
> > > 	chars = cp - startp;
> > > 	if (chars >= max_chars) {
> > > 		/* We hit the limit */
> > > 		cp = startp + max_chars;
> > > 	} else {
> > > 		rv = _parse_integer_limit(cp, base, &result, max_chars - chars);
> > > 		/* FIXME */
> > > 		cp += (rv & ~KSTRTOX_OVERFLOW);
> > > 	}
> > >
> > > 	if (endp)
> > > 		*endp = (char *)cp;
> > >
> > > 	return result;
> > > }
> > >
> > > ...
> >
> >
> > I don't mind rewriting that code if you prefer that way.
> > I am used to working on other kernel subsystems where the preference is
> > to bail out on the error case so that the "normal" case flows without
> > nesting.
>
> Yeah. But in this case Andy's variant looks slightly more readable to me.
> ...
>
> > >
> > > > +			val.s = simple_strntoll(str,
> > > > +						field_width > 0 ? field_width : SIZE_MAX,
> > > > +						&next, base);
> > >
> > > is? Also, should field_width == 0 be treated as "parse to the MAX"?
> >
> > Earlier code terminates scanning if the width parsed from the format
> > string is <= 0.
>
> > So field_width can only be -1 or > 0 here. But now you
> > point it out, that test would be better as field_width >= 0 ... so
> > it deals with 0 if it ever happened to sneak through to here
> > somehow.
>
> It might make sense to be proactive and change it to >= 0.
> But I would do it in a separate patch. The "> 0" condition
> matches the original code.

Ah, I missed that you have already sent v6, where you made this change
in the same patch. There is no need to resend it just because of this.
I am going to look at v6.

Best Regards,
Petr
diff --git a/lib/kstrtox.c b/lib/kstrtox.c
index a118b0b1e9b2..0fdd07a03564 100644
--- a/lib/kstrtox.c
+++ b/lib/kstrtox.c
@@ -39,20 +39,22 @@ const char *_parse_integer_fixup_radix(const char *s, unsigned int *base)
 
 /*
  * Convert non-negative integer string representation in explicitly given radix
- * to an integer.
+ * to an integer. A maximum of max_chars characters will be converted.
+ *
  * Return number of characters consumed maybe or-ed with overflow bit.
  * If overflow occurs, result integer (incorrect) is still returned.
  *
  * Don't you dare use this function.
  */
-unsigned int _parse_integer(const char *s, unsigned int base, unsigned long long *p)
+unsigned int _parse_integer_limit(const char *s, unsigned int base, unsigned long long *p,
+				  size_t max_chars)
 {
 	unsigned long long res;
 	unsigned int rv;
 
 	res = 0;
 	rv = 0;
-	while (1) {
+	while (max_chars--) {
 		unsigned int c = *s;
 		unsigned int lc = c | 0x20; /* don't tolower() this line */
 		unsigned int val;
@@ -82,6 +84,11 @@ unsigned int _parse_integer(const char *s, unsigned int base, unsigned long long
 	return rv;
 }
 
+unsigned int _parse_integer(const char *s, unsigned int base, unsigned long long *p)
+{
+	return _parse_integer_limit(s, base, p, SIZE_MAX);
+}
+
 static int _kstrtoull(const char *s, unsigned int base, unsigned long long *res)
 {
 	unsigned long long _res;
diff --git a/lib/kstrtox.h b/lib/kstrtox.h
index 3b4637bcd254..158c400ca865 100644
--- a/lib/kstrtox.h
+++ b/lib/kstrtox.h
@@ -4,6 +4,8 @@
 
 #define KSTRTOX_OVERFLOW	(1U << 31)
 const char *_parse_integer_fixup_radix(const char *s, unsigned int *base);
+unsigned int _parse_integer_limit(const char *s, unsigned int base, unsigned long long *res,
+				  size_t max_chars);
 unsigned int _parse_integer(const char *s, unsigned int base, unsigned long long *res);
 
 #endif
diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index 28bb26cd1f67..1ede80c376b7 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -53,29 +53,43 @@
 #include <linux/string_helpers.h>
 #include "kstrtox.h"
 
-/**
- * simple_strtoull - convert a string to an unsigned long long
- * @cp: The start of the string
- * @endp: A pointer to the end of the parsed string will be placed here
- * @base: The number base to use
- *
- * This function has caveats. Please use kstrtoull instead.
- */
-unsigned long long simple_strtoull(const char *cp, char **endp, unsigned int base)
+static unsigned long long simple_strntoull(const char *startp, size_t max_chars,
+					   char **endp, unsigned int base)
 {
-	unsigned long long result;
+	const char *cp;
+	unsigned long long result = 0ULL;
 	unsigned int rv;
 
-	cp = _parse_integer_fixup_radix(cp, &base);
-	rv = _parse_integer(cp, base, &result);
+	cp = _parse_integer_fixup_radix(startp, &base);
+	if ((cp - startp) >= max_chars) {
+		cp = startp + max_chars;
+		goto out;
+	}
+
+	max_chars -= (cp - startp);
+	rv = _parse_integer_limit(cp, base, &result, max_chars);
 	/* FIXME */
 	cp += (rv & ~KSTRTOX_OVERFLOW);
 
+out:
 	if (endp)
 		*endp = (char *)cp;
 
 	return result;
 }
+
+/**
+ * simple_strtoull - convert a string to an unsigned long long
+ * @cp: The start of the string
+ * @endp: A pointer to the end of the parsed string will be placed here
+ * @base: The number base to use
+ *
+ * This function has caveats. Please use kstrtoull instead.
+ */
+unsigned long long simple_strtoull(const char *cp, char **endp, unsigned int base)
+{
+	return simple_strntoull(cp, SIZE_MAX, endp, base);
+}
 EXPORT_SYMBOL(simple_strtoull);
 
 /**
@@ -88,7 +102,7 @@ EXPORT_SYMBOL(simple_strtoull);
  */
 unsigned long simple_strtoul(const char *cp, char **endp, unsigned int base)
 {
-	return simple_strtoull(cp, endp, base);
+	return simple_strntoull(cp, SIZE_MAX, endp, base);
 }
 EXPORT_SYMBOL(simple_strtoul);
 
@@ -109,6 +123,19 @@ long simple_strtol(const char *cp, char **endp, unsigned int base)
 }
 EXPORT_SYMBOL(simple_strtol);
 
+static long long simple_strntoll(const char *cp, size_t max_chars, char **endp,
+				 unsigned int base)
+{
+	/*
+	 * simple_strntoull safely handles receiving max_chars==0 in the
+	 * case we start with max_chars==1 and find a '-' prefix.
+	 */
+	if (*cp == '-' && max_chars > 0)
+		return -simple_strntoull(cp + 1, max_chars - 1, endp, base);
+
+	return simple_strntoull(cp, max_chars, endp, base);
+}
+
 /**
  * simple_strtoll - convert a string to a signed long long
  * @cp: The start of the string
@@ -119,10 +146,7 @@ EXPORT_SYMBOL(simple_strtol);
  */
 long long simple_strtoll(const char *cp, char **endp, unsigned int base)
 {
-	if (*cp == '-')
-		return -simple_strtoull(cp + 1, endp, base);
-
-	return simple_strtoull(cp, endp, base);
+	return simple_strntoll(cp, SIZE_MAX, endp, base);
 }
 EXPORT_SYMBOL(simple_strtoll);
 
@@ -3449,25 +3473,13 @@ int vsscanf(const char *buf, const char *fmt, va_list args)
 			break;
 
 		if (is_sign)
-			val.s = qualifier != 'L' ?
-				simple_strtol(str, &next, base) :
-				simple_strtoll(str, &next, base);
+			val.s = simple_strntoll(str,
+						field_width > 0 ? field_width : SIZE_MAX,
+						&next, base);
 		else
-			val.u = qualifier != 'L' ?
-				simple_strtoul(str, &next, base) :
-				simple_strtoull(str, &next, base);
-
-		if (field_width > 0 && next - str > field_width) {
-			if (base == 0)
-				_parse_integer_fixup_radix(str, &base);
-			while (next - str > field_width) {
-				if (is_sign)
-					val.s = div_s64(val.s, base);
-				else
-					val.u = div_u64(val.u, base);
-				--next;
-			}
-		}
+			val.u = simple_strntoull(str,
+						 field_width > 0 ? field_width : SIZE_MAX,
+						 &next, base);
 
 		switch (qualifier) {
 		case 'H':	/* that's 'hh' in format */
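The sign handling added by simple_strntoll() in the diff above can be tried in isolation with a small user-space sketch. strntoull_dec() is a hypothetical, decimal-only stand-in for simple_strntoull() (no radix prefix, no overflow flag), so this only illustrates how the '-' consumes one character of the field width; it is not the kernel code.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for simple_strntoull(): decimal digits only. */
static unsigned long long strntoull_dec(const char *cp, size_t max_chars,
					char **endp)
{
	unsigned long long result = 0;

	/* max_chars == 0 means "convert nothing", as in the patch. */
	while (max_chars-- && *cp >= '0' && *cp <= '9')
		result = result * 10 + (unsigned long long)(*cp++ - '0');

	if (endp)
		*endp = (char *)cp;

	return result;
}

/* Same shape as simple_strntoll(): the '-' uses up one char of the width. */
static long long strntoll_dec(const char *cp, size_t max_chars, char **endp)
{
	/*
	 * With max_chars == 1 and a leading '-', the unsigned helper is
	 * called with a limit of 0 and converts nothing.
	 */
	if (*cp == '-' && max_chars > 0)
		/* Negate as unsigned, then convert, like the patch does. */
		return (long long)-strntoull_dec(cp + 1, max_chars - 1, endp);

	return (long long)strntoull_dec(cp, max_chars, endp);
}
```

For the input "-123", a width of 1 consumes only the '-' and yields 0, while a width of 2 yields -1. That is why a negative number needs a field width greater than 1 to carry any digits.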
The existing code attempted to handle numbers by doing a strto[u]l(),
ignoring the field width, and then repeatedly dividing to extract the
field out of the full converted value. If the string contains a run of
valid digits longer than will fit in a long or long long, this would
overflow and no amount of dividing can recover the correct value.

This patch fixes vsscanf() to obey number field widths when parsing
the number.

A new _parse_integer_limit() is added that takes a limit for the number
of characters to parse. The number field conversion in vsscanf is changed
to use this new function.

If a number starts with a radix prefix, the field width must be long
enough for at least one digit after the prefix. If not, it will be handled
like this:

 sscanf("0x4", "%1i", &i): i=0, scanning continues with the 'x'
 sscanf("0x4", "%2i", &i): i=0, scanning continues with the '4'

This is consistent with the observed behaviour of userland sscanf.

Note that this patch does NOT fix the problem of a single field value
overflowing the target type. So for example:

 sscanf("123456789abcdef", "%x", &i);

Will not produce the correct result because the value obviously overflows
INT_MAX. But sscanf will report a successful conversion.

Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
---
Changed since v3:
- Consistently use SIZE_MAX as the "infinity" value when passing to size_t
  arguments.
- Use while-loop instead of for-loop in _parse_integer_limit().
- Keep the existing arguments for _parse_integer() on their original
  line. And the corresponding arguments to _parse_integer_limit()
  formatted/wrapped the same way as _parse_integer().
- Remove redundant check for (max_chars == 0) in simple_strntoull().
- Fixed "vsscanf" -> "vsscanf()" in commit message.
---
 lib/kstrtox.c  | 13 ++++++--
 lib/kstrtox.h  |  2 ++
 lib/vsprintf.c | 82 +++++++++++++++++++++++++++++---------------------
 3 files changed, 59 insertions(+), 38 deletions(-)
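The failure mode described in the commit message can be reproduced in user space with a small model. The sketch below is decimal-only and uses hypothetical helper names (parse_dec(), field_by_dividing(), field_by_limit()); it assumes that the conversion wraps silently on overflow, which is a simplification of the kernel parser.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert at most max_chars decimal digits; wraps silently on overflow. */
static unsigned long long parse_dec(const char *s, size_t max_chars,
				    size_t *used)
{
	unsigned long long res = 0;
	size_t n = 0;

	while (n < max_chars && s[n] >= '0' && s[n] <= '9') {
		res = res * 10 + (unsigned long long)(s[n] - '0');
		n++;
	}
	*used = n;
	return res;
}

/* Old approach: convert the whole digit run, then divide back to the width. */
static unsigned long long field_by_dividing(const char *s, size_t width)
{
	size_t used;
	unsigned long long val = parse_dec(s, SIZE_MAX, &used);

	while (used > width) {
		val /= 10;	/* drop one trailing digit per step */
		used--;
	}
	return val;
}

/* New approach: never read past the field width in the first place. */
static unsigned long long field_by_limit(const char *s, size_t width)
{
	size_t used;

	return parse_dec(s, width, &used);
}
```

For a short run such as "1234" with a width of 2, both functions return 12. For a run of 23 nines, the full conversion no longer fits in 64 bits, so the divide-back variant cannot recover the leading 99, while the width-limited variant still returns it.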