Message ID | 20191205021459.29920-1-tao3.xu@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | util/cutils: Expand do_strtosz parsing precision to 64 bits | expand |
Tao Xu <tao3.xu@intel.com> writes: > Parse input string both as a double and as a uint64_t, then use the > method which consumes more characters. Update the related test cases. > > Signed-off-by: Tao Xu <tao3.xu@intel.com> > --- [...] > diff --git a/util/cutils.c b/util/cutils.c > index 77acadc70a..b08058c57c 100644 > --- a/util/cutils.c > +++ b/util/cutils.c > @@ -212,24 +212,43 @@ static int do_strtosz(const char *nptr, const char **end, > const char default_suffix, int64_t unit, > uint64_t *result) > { > - int retval; > - const char *endptr; > + int retval, retd, retu; > + const char *suffix, *suffixd, *suffixu; > unsigned char c; > int mul_required = 0; > - double val, mul, integral, fraction; > + bool use_strtod; > + uint64_t valu; > + double vald, mul, integral, fraction; Note for later: @mul is double. > + > + retd = qemu_strtod_finite(nptr, &suffixd, &vald); > + retu = qemu_strtou64(nptr, &suffixu, 0, &valu); > + use_strtod = strlen(suffixd) < strlen(suffixu); > + > + /* > + * Parse @nptr both as a double and as a uint64_t, then use the method > + * which consumes more characters. > + */ The comment is in a funny place. I'd put it right before the qemu_strtod_finite() line. > + if (use_strtod) { > + suffix = suffixd; > + retval = retd; > + } else { > + suffix = suffixu; > + retval = retu; > + } > > - retval = qemu_strtod_finite(nptr, &endptr, &val); > if (retval) { > goto out; > } This is even more subtle than it looks. A close reading of the function contracts leads to three cases for each conversion: * parse error (including infinity and NaN) @retu / @retd is -EINVAL @valu / @vald is uninitialized @suffixu / @suffixd is @nptr * range error @retu / @retd is -ERANGE @valu / @vald is our best approximation of the conversion result @suffixu / @suffixd points to the first character not consumed by the conversion. Sub-cases: - uint64_t overflow We know the conversion result exceeds UINT64_MAX. - double overflow we know the conversion result's magnitude exceeds the largest representable finite double DBL_MAX. - double underflow we know the conversion result is close to zero (closer than DBL_MIN, the smallest normalized positive double). * success @retu / @retd is 0 @valu / @vald is the conversion result @suffixu / @suffixd points to the first character not consumed by the conversion. This leads to a matrix (parse error, uint64_t overflow, success) x (parse error, double overflow, double underflow, success). We need to check the code does what we want for each element of this matrix, and document any behavior that's not perfectly obvious. (success, success): we pick uint64_t if qemu_strtou64() consumed more characters than qemu_strtod_finite(), else double. "More" is important here; when they consume the same characters, we *need* to use the uint64_t result. Example: for "18446744073709551615", we need to use uint64_t 18446744073709551615, not double 18446744073709551616.0. But for "18446744073709551616.", we need to use the double. Good. (success, parse error) and (parse error, success): we pick the one that succeeds, because success consumes characters, and failure to parse does not. Good. (parse error, parse error): neither consumes characters, so we pick uint64_t. Good. (parse error, double overflow), (parse error, double underflow) and (uint64_t overflow, parse error): we pick the range error, because it consumes characters. Good. These are the simple combinations. The remainder are hairier: (success, double overflow), (success, double underflow), (uint64_t overflow, success). I lack the time to analyze them today. Must be done before we take this patch. Any takers? > - fraction = modf(val, &integral); > - if (fraction != 0) { > - mul_required = 1; > + if (use_strtod) { > + fraction = modf(vald, &integral); > + if (fraction != 0) { > + mul_required = 1; > + } > } Here, @suffix points to the suffix character, if any. > - c = *endptr; > + c = *suffix; > mul = suffix_mul(c, unit); > if (mul >= 0) { > - endptr++; > + suffix++; Now @suffix points to the first character not consumed, *not* the suffix. Your patch effectively renames @endptr to @suffix. I think @endptr is the better name. Keeping the name also makes the diff smaller and slightly easier to review. > } else { > mul = suffix_mul(default_suffix, unit); suffix_mul() returns int64_t. The assignment converts it to double. Fine before the patch, because @mul is the multiplier for a double value. No longer true after the patch, see below. > assert(mul >= 0); > @@ -238,23 +257,36 @@ static int do_strtosz(const char *nptr, const char **end, > retval = -EINVAL; > goto out; > } > - /* > - * Values near UINT64_MAX overflow to 2**64 when converting to double > - * precision. Compare against the maximum representable double precision > - * value below 2**64, computed as "the next value after 2**64 (0x1p64) in > - * the direction of 0". > - */ > - if ((val * mul > nextafter(0x1p64, 0)) || val < 0) { > - retval = -ERANGE; > - goto out; > + > + if (use_strtod) { > + /* > + * Values near UINT64_MAX overflow to 2**64 when converting to double > + * precision. Compare against the maximum representable double precision > + * value below 2**64, computed as "the next value after 2**64 (0x1p64) > + * in the direction of 0". > + */ > + if ((vald * mul > nextafter(0x1p64, 0)) || vald < 0) { > + retval = -ERANGE; > + goto out; > + } > + *result = vald * mul; Here, @mul is a multiplier for double vald. > + } else { > + /* Reject negative input and overflow output */ > + while (qemu_isspace(*nptr)) { > + nptr++; > + } > + if (*nptr == '-' || UINT64_MAX / (uint64_t) mul < valu) { > + retval = -ERANGE; > + goto out; > + } > + *result = valu * (uint64_t) mul; Here, @mul is a multiplier for uint64_t valu. Please change @mul to int64_t to reduce conversions. > } > - *result = val * mul; > retval = 0; > > out: > if (end) { > - *end = endptr; > - } else if (*endptr) { > + *end = suffix; > + } else if (*suffix) { > retval = -EINVAL; > }
On 12/5/19 11:29 PM, Markus Armbruster wrote: > Tao Xu <tao3.xu@intel.com> writes: > >> Parse input string both as a double and as a uint64_t, then use the >> method which consumes more characters. Update the related test cases. >> >> Signed-off-by: Tao Xu <tao3.xu@intel.com> >> --- > [...] >> diff --git a/util/cutils.c b/util/cutils.c >> index 77acadc70a..b08058c57c 100644 >> --- a/util/cutils.c >> +++ b/util/cutils.c >> @@ -212,24 +212,43 @@ static int do_strtosz(const char *nptr, const char **end, >> const char default_suffix, int64_t unit, >> uint64_t *result) >> { >> - int retval; >> - const char *endptr; >> + int retval, retd, retu; >> + const char *suffix, *suffixd, *suffixu; >> unsigned char c; >> int mul_required = 0; >> - double val, mul, integral, fraction; >> + bool use_strtod; >> + uint64_t valu; >> + double vald, mul, integral, fraction; > > Note for later: @mul is double. > >> + >> + retd = qemu_strtod_finite(nptr, &suffixd, &vald); >> + retu = qemu_strtou64(nptr, &suffixu, 0, &valu); >> + use_strtod = strlen(suffixd) < strlen(suffixu); >> + >> + /* >> + * Parse @nptr both as a double and as a uint64_t, then use the method >> + * which consumes more characters. >> + */ > > The comment is in a funny place. I'd put it right before the > qemu_strtod_finite() line. > >> + if (use_strtod) { >> + suffix = suffixd; >> + retval = retd; >> + } else { >> + suffix = suffixu; >> + retval = retu; >> + } >> >> - retval = qemu_strtod_finite(nptr, &endptr, &val); >> if (retval) { >> goto out; >> } > > This is even more subtle than it looks. > > A close reading of the function contracts leads to three cases for each > conversion: > > * parse error (including infinity and NaN) > > @retu / @retd is -EINVAL > @valu / @vald is uninitialized > @suffixu / @suffixd is @nptr > > * range error > > @retu / @retd is -ERANGE > @valu / @vald is our best approximation of the conversion result > @suffixu / @suffixd points to the first character not consumed by the > conversion. > > Sub-cases: > > - uint64_t overflow > > We know the conversion result exceeds UINT64_MAX. > > - double overflow > > we know the conversion result's magnitude exceeds the largest > representable finite double DBL_MAX. > > - double underflow > > we know the conversion result is close to zero (closer than DBL_MIN, > the smallest normalized positive double). > > * success > > @retu / @retd is 0 > @valu / @vald is the conversion result > @suffixu / @suffixd points to the first character not consumed by the > conversion. > > This leads to a matrix (parse error, uint64_t overflow, success) x > (parse error, double overflow, double underflow, success). We need to > check the code does what we want for each element of this matrix, and > document any behavior that's not perfectly obvious. > > (success, success): we pick uint64_t if qemu_strtou64() consumed more > characters than qemu_strtod_finite(), else double. "More" is important > here; when they consume the same characters, we *need* to use the > uint64_t result. Example: for "18446744073709551615", we need to use > uint64_t 18446744073709551615, not double 18446744073709551616.0. But > for "18446744073709551616.", we need to use the double. Good. > > (success, parse error) and (parse error, success): we pick the one that > succeeds, because success consumes characters, and failure to parse does > not. Good. > > (parse error, parse error): neither consumes characters, so we pick > uint64_t. Good. > > (parse error, double overflow), (parse error, double underflow) and > (uint64_t overflow, parse error): we pick the range error, because it > consumes characters. Good. > > These are the simple combinations. The remainder are hairier: (success, > double overflow), (success, double underflow), (uint64_t overflow, > success). I lack the time to analyze them today. Must be done before > we take this patch. Any takers? (success, double overflow), (success, double underflow), pick double overflow error, return -ERANGE. Because it consumes characters. Example: for "1.79769e+309", qemu_strtou64 consumes "1", and prases as uint64_t; but qemu_strtod_finite return -ERANGE and consumes all characters. It is OK. (uint64_t overflow, success), consume the same characters, use the uint64_t return -ERANGE. Note that even if qemu_strtod_finite can parse these cases such as "18446744073709551617", but the result is uint64_t so we also need to return -ERANGE. It is OK. Thank you for your analysis and suggestion. I will add more test cases to cover some of these analysis. > >> - fraction = modf(val, &integral); >> - if (fraction != 0) { >> - mul_required = 1; >> + if (use_strtod) { >> + fraction = modf(vald, &integral); >> + if (fraction != 0) { >> + mul_required = 1; >> + } >> } > > Here, @suffix points to the suffix character, if any. > >> - c = *endptr; >> + c = *suffix; >> mul = suffix_mul(c, unit); >> if (mul >= 0) { >> - endptr++; >> + suffix++; > > Now @suffix points to the first character not consumed, *not* the > suffix. > > Your patch effectively renames @endptr to @suffix. I think @endptr is > the better name. Keeping the name also makes the diff smaller and > slightly easier to review. > >> } else { >> mul = suffix_mul(default_suffix, unit); > > suffix_mul() returns int64_t. The assignment converts it to double. > Fine before the patch, because @mul is the multiplier for a double > value. No longer true after the patch, see below. > >> assert(mul >= 0); >> @@ -238,23 +257,36 @@ static int do_strtosz(const char *nptr, const char **end, >> retval = -EINVAL; >> goto out; >> } >> - /* >> - * Values near UINT64_MAX overflow to 2**64 when converting to double >> - * precision. Compare against the maximum representable double precision >> - * value below 2**64, computed as "the next value after 2**64 (0x1p64) in >> - * the direction of 0". >> - */ >> - if ((val * mul > nextafter(0x1p64, 0)) || val < 0) { >> - retval = -ERANGE; >> - goto out; >> + >> + if (use_strtod) { >> + /* >> + * Values near UINT64_MAX overflow to 2**64 when converting to double >> + * precision. Compare against the maximum representable double precision >> + * value below 2**64, computed as "the next value after 2**64 (0x1p64) >> + * in the direction of 0". >> + */ >> + if ((vald * mul > nextafter(0x1p64, 0)) || vald < 0) { >> + retval = -ERANGE; >> + goto out; >> + } >> + *result = vald * mul; > > Here, @mul is a multiplier for double vald. > >> + } else { >> + /* Reject negative input and overflow output */ >> + while (qemu_isspace(*nptr)) { >> + nptr++; >> + } >> + if (*nptr == '-' || UINT64_MAX / (uint64_t) mul < valu) { >> + retval = -ERANGE; >> + goto out; >> + } >> + *result = valu * (uint64_t) mul; > > Here, @mul is a multiplier for uint64_t valu. > > Please change @mul to int64_t to reduce conversions. > >> } >> - *result = val * mul; >> retval = 0; >> >> out: >> if (end) { >> - *end = endptr; >> - } else if (*endptr) { >> + *end = suffix; >> + } else if (*suffix) { >> retval = -EINVAL; >> } >
Tao Xu <tao3.xu@intel.com> writes: > On 12/5/19 11:29 PM, Markus Armbruster wrote: >> Tao Xu <tao3.xu@intel.com> writes: >> >>> Parse input string both as a double and as a uint64_t, then use the >>> method which consumes more characters. Update the related test cases. >>> >>> Signed-off-by: Tao Xu <tao3.xu@intel.com> >>> --- >> [...] >>> diff --git a/util/cutils.c b/util/cutils.c >>> index 77acadc70a..b08058c57c 100644 >>> --- a/util/cutils.c >>> +++ b/util/cutils.c >>> @@ -212,24 +212,43 @@ static int do_strtosz(const char *nptr, const char **end, >>> const char default_suffix, int64_t unit, >>> uint64_t *result) >>> { >>> - int retval; >>> - const char *endptr; >>> + int retval, retd, retu; >>> + const char *suffix, *suffixd, *suffixu; >>> unsigned char c; >>> int mul_required = 0; >>> - double val, mul, integral, fraction; >>> + bool use_strtod; >>> + uint64_t valu; >>> + double vald, mul, integral, fraction; >> >> Note for later: @mul is double. >> >>> + >>> + retd = qemu_strtod_finite(nptr, &suffixd, &vald); >>> + retu = qemu_strtou64(nptr, &suffixu, 0, &valu); Note for later: passing 0 to base accepts octal and hexadecimal integers. >>> + use_strtod = strlen(suffixd) < strlen(suffixu); >>> + >>> + /* >>> + * Parse @nptr both as a double and as a uint64_t, then use the method >>> + * which consumes more characters. >>> + */ >> >> The comment is in a funny place. I'd put it right before the >> qemu_strtod_finite() line. >> >>> + if (use_strtod) { >>> + suffix = suffixd; >>> + retval = retd; >>> + } else { >>> + suffix = suffixu; >>> + retval = retu; >>> + } >>> - retval = qemu_strtod_finite(nptr, &endptr, &val); >>> if (retval) { >>> goto out; >>> } >> >> This is even more subtle than it looks. >> >> A close reading of the function contracts leads to three cases for each >> conversion: >> >> * parse error (including infinity and NaN) >> >> @retu / @retd is -EINVAL >> @valu / @vald is uninitialized >> @suffixu / @suffixd is @nptr >> >> * range error >> >> @retu / @retd is -ERANGE >> @valu / @vald is our best approximation of the conversion result >> @suffixu / @suffixd points to the first character not consumed by the >> conversion. >> >> Sub-cases: >> >> - uint64_t overflow >> >> We know the conversion result exceeds UINT64_MAX. >> >> - double overflow >> >> we know the conversion result's magnitude exceeds the largest >> representable finite double DBL_MAX. >> >> - double underflow >> >> we know the conversion result is close to zero (closer than DBL_MIN, >> the smallest normalized positive double). >> >> * success >> >> @retu / @retd is 0 >> @valu / @vald is the conversion result >> @suffixu / @suffixd points to the first character not consumed by the >> conversion. >> >> This leads to a matrix (parse error, uint64_t overflow, success) x >> (parse error, double overflow, double underflow, success). We need to >> check the code does what we want for each element of this matrix, and >> document any behavior that's not perfectly obvious. >> >> (success, success): we pick uint64_t if qemu_strtou64() consumed more >> characters than qemu_strtod_finite(), else double. "More" is important >> here; when they consume the same characters, we *need* to use the >> uint64_t result. Example: for "18446744073709551615", we need to use >> uint64_t 18446744073709551615, not double 18446744073709551616.0. But >> for "18446744073709551616.", we need to use the double. Good. Also fun: for "0123", we use uint64_t 83, not double 123.0. But for "0123.", we use 123.0, not 83. Do we really want to accept octal and hexadecimal integers? >> (success, parse error) and (parse error, success): we pick the one that >> succeeds, because success consumes characters, and failure to parse does >> not. Good. >> >> (parse error, parse error): neither consumes characters, so we pick >> uint64_t. Good. >> >> (parse error, double overflow), (parse error, double underflow) and >> (uint64_t overflow, parse error): we pick the range error, because it >> consumes characters. Good. >> >> These are the simple combinations. The remainder are hairier: (success, >> double overflow), (success, double underflow), (uint64_t overflow, >> success). I lack the time to analyze them today. Must be done before >> we take this patch. Any takers? > > (success, double overflow), (success, double underflow), pick double > overflow error, return -ERANGE. Because it consumes > characters. Example: for "1.79769e+309", qemu_strtou64 consumes "1", > and prases as uint64_t; but qemu_strtod_finite return -ERANGE and > consumes all characters. It is OK. The only way to have double overflow when uint64_t succeeds is an exponent. Double consumes the characters making up the exponent, uint64_t does not. We use double. The only way to have double underflow is with an exponent or a decimal point. Double consumes their characters, uint64_t does not. We use double. Okay. > (uint64_t overflow, success), consume the same characters, use the > uint64_t return -ERANGE. Note that even if qemu_strtod_finite can > parse these cases such as "18446744073709551617", but the result is > uint64_t so we also need to return -ERANGE. It is OK. That's just one of two cases, I think. The other one is when the overflowing integer is followed by an exponent or decimal point. We use double then. Converting the double to uint64_t overflows, except when a negative exponent brings the number into range. Examples: "18446744073709551617" picks uint64_t overflow, "18446744073709551617.0" picks double success (but converting it to uint64_t below overflows), and "18446744073709551617e-10" picks double success (converted to 1844674407 below). Okay. > Thank you for your analysis and suggestion. I will add more test cases > to cover some of these analysis. Good move.
> On 5 Dec 2019, at 16:29, Markus Armbruster <armbru@redhat.com> wrote: > > Tao Xu <tao3.xu@intel.com> writes: > >> Parse input string both as a double and as a uint64_t, then use the >> method which consumes more characters. Update the related test cases. >> >> Signed-off-by: Tao Xu <tao3.xu@intel.com> >> --- > [...] >> diff --git a/util/cutils.c b/util/cutils.c >> index 77acadc70a..b08058c57c 100644 >> --- a/util/cutils.c >> +++ b/util/cutils.c >> @@ -212,24 +212,43 @@ static int do_strtosz(const char *nptr, const char **end, >> const char default_suffix, int64_t unit, >> uint64_t *result) >> { >> - int retval; >> - const char *endptr; >> + int retval, retd, retu; >> + const char *suffix, *suffixd, *suffixu; >> unsigned char c; >> int mul_required = 0; >> - double val, mul, integral, fraction; >> + bool use_strtod; >> + uint64_t valu; >> + double vald, mul, integral, fraction; > > Note for later: @mul is double. > >> + >> + retd = qemu_strtod_finite(nptr, &suffixd, &vald); >> + retu = qemu_strtou64(nptr, &suffixu, 0, &valu); >> + use_strtod = strlen(suffixd) < strlen(suffixu); >> + >> + /* >> + * Parse @nptr both as a double and as a uint64_t, then use the method >> + * which consumes more characters. >> + */ > > The comment is in a funny place. I'd put it right before the > qemu_strtod_finite() line. > >> + if (use_strtod) { >> + suffix = suffixd; >> + retval = retd; >> + } else { >> + suffix = suffixu; >> + retval = retu; >> + } >> >> - retval = qemu_strtod_finite(nptr, &endptr, &val); >> if (retval) { >> goto out; >> } > > This is even more subtle than it looks. But why it is even necessary? The “contract” for the function used to be that it returned rounded values beyond 2^53, which in itself is curious. But now it’s a 6-dimensional matrix of hell with NaNs and barfnots, when the name implies it’s simply doing a text to u64 conversion… There is certainly a reason, but I’m really curious what it is :-) > > A close reading of the function contracts leads to three cases for each > conversion: > > * parse error (including infinity and NaN) > > @retu / @retd is -EINVAL > @valu / @vald is uninitialized > @suffixu / @suffixd is @nptr > > * range error > > @retu / @retd is -ERANGE > @valu / @vald is our best approximation of the conversion result > @suffixu / @suffixd points to the first character not consumed by the > conversion. > > Sub-cases: > > - uint64_t overflow > > We know the conversion result exceeds UINT64_MAX. > > - double overflow > > we know the conversion result's magnitude exceeds the largest > representable finite double DBL_MAX. > > - double underflow > > we know the conversion result is close to zero (closer than DBL_MIN, > the smallest normalized positive double). > > * success > > @retu / @retd is 0 > @valu / @vald is the conversion result > @suffixu / @suffixd points to the first character not consumed by the > conversion. > > This leads to a matrix (parse error, uint64_t overflow, success) x > (parse error, double overflow, double underflow, success). We need to > check the code does what we want for each element of this matrix, and > document any behavior that's not perfectly obvious. > > (success, success): we pick uint64_t if qemu_strtou64() consumed more > characters than qemu_strtod_finite(), else double. "More" is important > here; when they consume the same characters, we *need* to use the > uint64_t result. Example: for "18446744073709551615", we need to use > uint64_t 18446744073709551615, not double 18446744073709551616.0. But > for "18446744073709551616.", we need to use the double. Good. > > (success, parse error) and (parse error, success): we pick the one that > succeeds, because success consumes characters, and failure to parse does > not. Good. > > (parse error, parse error): neither consumes characters, so we pick > uint64_t. Good. > > (parse error, double overflow), (parse error, double underflow) and > (uint64_t overflow, parse error): we pick the range error, because it > consumes characters. Good. > > These are the simple combinations. The remainder are hairier: (success, > double overflow), (success, double underflow), (uint64_t overflow, > success). I lack the time to analyze them today. Must be done before > we take this patch. Any takers? > >> - fraction = modf(val, &integral); >> - if (fraction != 0) { >> - mul_required = 1; >> + if (use_strtod) { >> + fraction = modf(vald, &integral); >> + if (fraction != 0) { >> + mul_required = 1; >> + } >> } > > Here, @suffix points to the suffix character, if any. > >> - c = *endptr; >> + c = *suffix; >> mul = suffix_mul(c, unit); >> if (mul >= 0) { >> - endptr++; >> + suffix++; > > Now @suffix points to the first character not consumed, *not* the > suffix. > > Your patch effectively renames @endptr to @suffix. I think @endptr is > the better name. Keeping the name also makes the diff smaller and > slightly easier to review. > >> } else { >> mul = suffix_mul(default_suffix, unit); > > suffix_mul() returns int64_t. The assignment converts it to double. > Fine before the patch, because @mul is the multiplier for a double > value. No longer true after the patch, see below. > >> assert(mul >= 0); >> @@ -238,23 +257,36 @@ static int do_strtosz(const char *nptr, const char **end, >> retval = -EINVAL; >> goto out; >> } >> - /* >> - * Values near UINT64_MAX overflow to 2**64 when converting to double >> - * precision. Compare against the maximum representable double precision >> - * value below 2**64, computed as "the next value after 2**64 (0x1p64) in >> - * the direction of 0". >> - */ >> - if ((val * mul > nextafter(0x1p64, 0)) || val < 0) { >> - retval = -ERANGE; >> - goto out; >> + >> + if (use_strtod) { >> + /* >> + * Values near UINT64_MAX overflow to 2**64 when converting to double >> + * precision. Compare against the maximum representable double precision >> + * value below 2**64, computed as "the next value after 2**64 (0x1p64) >> + * in the direction of 0". >> + */ >> + if ((vald * mul > nextafter(0x1p64, 0)) || vald < 0) { >> + retval = -ERANGE; >> + goto out; >> + } >> + *result = vald * mul; > > Here, @mul is a multiplier for double vald. > >> + } else { >> + /* Reject negative input and overflow output */ >> + while (qemu_isspace(*nptr)) { >> + nptr++; >> + } >> + if (*nptr == '-' || UINT64_MAX / (uint64_t) mul < valu) { >> + retval = -ERANGE; >> + goto out; >> + } >> + *result = valu * (uint64_t) mul; > > Here, @mul is a multiplier for uint64_t valu. > > Please change @mul to int64_t to reduce conversions. > >> } >> - *result = val * mul; >> retval = 0; >> >> out: >> if (end) { >> - *end = endptr; >> - } else if (*endptr) { >> + *end = suffix; >> + } else if (*suffix) { >> retval = -EINVAL; >> } > >
Christophe de Dinechin <dinechin@redhat.com> writes: >> On 5 Dec 2019, at 16:29, Markus Armbruster <armbru@redhat.com> wrote: >> >> Tao Xu <tao3.xu@intel.com> writes: >> >>> Parse input string both as a double and as a uint64_t, then use the >>> method which consumes more characters. Update the related test cases. >>> >>> Signed-off-by: Tao Xu <tao3.xu@intel.com> >>> --- >> [...] >>> diff --git a/util/cutils.c b/util/cutils.c >>> index 77acadc70a..b08058c57c 100644 >>> --- a/util/cutils.c >>> +++ b/util/cutils.c >>> @@ -212,24 +212,43 @@ static int do_strtosz(const char *nptr, const char **end, >>> const char default_suffix, int64_t unit, >>> uint64_t *result) >>> { >>> - int retval; >>> - const char *endptr; >>> + int retval, retd, retu; >>> + const char *suffix, *suffixd, *suffixu; >>> unsigned char c; >>> int mul_required = 0; >>> - double val, mul, integral, fraction; >>> + bool use_strtod; >>> + uint64_t valu; >>> + double vald, mul, integral, fraction; >> >> Note for later: @mul is double. >> >>> + >>> + retd = qemu_strtod_finite(nptr, &suffixd, &vald); >>> + retu = qemu_strtou64(nptr, &suffixu, 0, &valu); >>> + use_strtod = strlen(suffixd) < strlen(suffixu); >>> + >>> + /* >>> + * Parse @nptr both as a double and as a uint64_t, then use the method >>> + * which consumes more characters. >>> + */ >> >> The comment is in a funny place. I'd put it right before the >> qemu_strtod_finite() line. >> >>> + if (use_strtod) { >>> + suffix = suffixd; >>> + retval = retd; >>> + } else { >>> + suffix = suffixu; >>> + retval = retu; >>> + } >>> >>> - retval = qemu_strtod_finite(nptr, &endptr, &val); >>> if (retval) { >>> goto out; >>> } >> >> This is even more subtle than it looks. > > But why it is even necessary? > > The “contract” for the function used to be that it returned rounded values > beyond 2^53, which in itself is curious. > > But now it’s a 6-dimensional matrix of hell with NaNs and barfnots, when the > name implies it’s simply doing a text to u64 conversion… > > There is certainly a reason, but I’m really curious what it is :-) It all goes back to commit 9f9b17a4f0 "Introduce strtosz() library function to convert a string to a byte count.". To support "convenient" usage like "1.5G", it parses the number part with strtod(). This limits us to 53 bits of precision. Larger sizes get rounded. I guess the excuse for this was that when you're dealing with sizes that large (petabytes!), your least significant bits are zero anyway. Regardless, the interface is *awful*. We should've forced the author to spell it out in all its glory in a proper function contract. That tends to cool the enthusiasm for "convenient" syntax amazingly fast. The awful interface has been confusing people for close to a decade now. What to do? Tao Xu's patch tries to make the function do what its users expect, namely parse a bleepin' 64 bit integer, without breaking any of the "convenience" syntax. Turns out that's amazingly subtle. Are we making things less confusing or more?
> On 17 Dec 2019, at 15:08, Markus Armbruster <armbru@redhat.com> wrote: > > Christophe de Dinechin <dinechin@redhat.com> writes: > >>> On 5 Dec 2019, at 16:29, Markus Armbruster <armbru@redhat.com> wrote: >>> >>> Tao Xu <tao3.xu@intel.com> writes: >>> >>>> Parse input string both as a double and as a uint64_t, then use the >>>> method which consumes more characters. Update the related test cases. >>>> >>>> Signed-off-by: Tao Xu <tao3.xu@intel.com> >>>> --- >>> [...] >>>> diff --git a/util/cutils.c b/util/cutils.c >>>> index 77acadc70a..b08058c57c 100644 >>>> --- a/util/cutils.c >>>> +++ b/util/cutils.c >>>> @@ -212,24 +212,43 @@ static int do_strtosz(const char *nptr, const char **end, >>>> const char default_suffix, int64_t unit, >>>> uint64_t *result) >>>> { >>>> - int retval; >>>> - const char *endptr; >>>> + int retval, retd, retu; >>>> + const char *suffix, *suffixd, *suffixu; >>>> unsigned char c; >>>> int mul_required = 0; >>>> - double val, mul, integral, fraction; >>>> + bool use_strtod; >>>> + uint64_t valu; >>>> + double vald, mul, integral, fraction; >>> >>> Note for later: @mul is double. >>> >>>> + >>>> + retd = qemu_strtod_finite(nptr, &suffixd, &vald); >>>> + retu = qemu_strtou64(nptr, &suffixu, 0, &valu); >>>> + use_strtod = strlen(suffixd) < strlen(suffixu); >>>> + >>>> + /* >>>> + * Parse @nptr both as a double and as a uint64_t, then use the method >>>> + * which consumes more characters. >>>> + */ >>> >>> The comment is in a funny place. I'd put it right before the >>> qemu_strtod_finite() line. >>> >>>> + if (use_strtod) { >>>> + suffix = suffixd; >>>> + retval = retd; >>>> + } else { >>>> + suffix = suffixu; >>>> + retval = retu; >>>> + } >>>> >>>> - retval = qemu_strtod_finite(nptr, &endptr, &val); >>>> if (retval) { >>>> goto out; >>>> } >>> >>> This is even more subtle than it looks. >> >> But why it is even necessary? >> >> The “contract” for the function used to be that it returned rounded values >> beyond 2^53, which in itself is curious. >> >> But now it’s a 6-dimensional matrix of hell with NaNs and barfnots, when the >> name implies it’s simply doing a text to u64 conversion… >> >> There is certainly a reason, but I’m really curious what it is :-) > > It all goes back to commit 9f9b17a4f0 "Introduce strtosz() library > function to convert a string to a byte count.". To support "convenient" > usage like "1.5G", it parses the number part with strtod(). This limits > us to 53 bits of precision. Larger sizes get rounded. > > I guess the excuse for this was that when you're dealing with sizes that > large (petabytes!), your least significant bits are zero anyway. > > Regardless, the interface is *awful*. We should've forced the author to > spell it out in all its glory in a proper function contract. That tends > to cool the enthusiasm for "convenient" syntax amazingly fast. > > The awful interface has been confusing people for close to a decade now. > > What to do? I see. Thanks for the rationale. I knew it had to make sense :-) I’d probably avoid strtod even with the convenient syntax above. Do you want 1.33e-6M to be allowed? Do we want to ever accept or generate NaN or Inf values? > > Tao Xu's patch tries to make the function do what its users expect, > namely parse a bleepin' 64 bit integer, without breaking any of the > "convenience" syntax. Turns out that's amazingly subtle. Are we making > things less confusing or more?
Christophe de Dinechin <dinechin@redhat.com> writes: >> On 17 Dec 2019, at 15:08, Markus Armbruster <armbru@redhat.com> wrote: >> >> Christophe de Dinechin <dinechin@redhat.com> writes: >> >>>> On 5 Dec 2019, at 16:29, Markus Armbruster <armbru@redhat.com> wrote: >>>> >>>> Tao Xu <tao3.xu@intel.com> writes: >>>> >>>>> Parse input string both as a double and as a uint64_t, then use the >>>>> method which consumes more characters. Update the related test cases. >>>>> >>>>> Signed-off-by: Tao Xu <tao3.xu@intel.com> >>>>> --- >>>> [...] >>>>> diff --git a/util/cutils.c b/util/cutils.c >>>>> index 77acadc70a..b08058c57c 100644 >>>>> --- a/util/cutils.c >>>>> +++ b/util/cutils.c >>>>> @@ -212,24 +212,43 @@ static int do_strtosz(const char *nptr, const char **end, >>>>> const char default_suffix, int64_t unit, >>>>> uint64_t *result) >>>>> { >>>>> - int retval; >>>>> - const char *endptr; >>>>> + int retval, retd, retu; >>>>> + const char *suffix, *suffixd, *suffixu; >>>>> unsigned char c; >>>>> int mul_required = 0; >>>>> - double val, mul, integral, fraction; >>>>> + bool use_strtod; >>>>> + uint64_t valu; >>>>> + double vald, mul, integral, fraction; >>>> >>>> Note for later: @mul is double. >>>> >>>>> + >>>>> + retd = qemu_strtod_finite(nptr, &suffixd, &vald); >>>>> + retu = qemu_strtou64(nptr, &suffixu, 0, &valu); >>>>> + use_strtod = strlen(suffixd) < strlen(suffixu); >>>>> + >>>>> + /* >>>>> + * Parse @nptr both as a double and as a uint64_t, then use the method >>>>> + * which consumes more characters. >>>>> + */ >>>> >>>> The comment is in a funny place. I'd put it right before the >>>> qemu_strtod_finite() line. >>>> >>>>> + if (use_strtod) { >>>>> + suffix = suffixd; >>>>> + retval = retd; >>>>> + } else { >>>>> + suffix = suffixu; >>>>> + retval = retu; >>>>> + } >>>>> >>>>> - retval = qemu_strtod_finite(nptr, &endptr, &val); >>>>> if (retval) { >>>>> goto out; >>>>> } >>>> >>>> This is even more subtle than it looks. >>> >>> But why it is even necessary? >>> >>> The “contract” for the function used to be that it returned rounded values >>> beyond 2^53, which in itself is curious. >>> >>> But now it’s a 6-dimensional matrix of hell with NaNs and barfnots, when the >>> name implies it’s simply doing a text to u64 conversion… >>> >>> There is certainly a reason, but I’m really curious what it is :-) >> >> It all goes back to commit 9f9b17a4f0 "Introduce strtosz() library >> function to convert a string to a byte count.". To support "convenient" >> usage like "1.5G", it parses the number part with strtod(). This limits >> us to 53 bits of precision. Larger sizes get rounded. >> >> I guess the excuse for this was that when you're dealing with sizes that >> large (petabytes!), your least significant bits are zero anyway. >> >> Regardless, the interface is *awful*. We should've forced the author to >> spell it out in all its glory in a proper function contract. That tends >> to cool the enthusiasm for "convenient" syntax amazingly fast. >> >> The awful interface has been confusing people for close to a decade now. >> >> What to do? > > I see. Thanks for the rationale. I knew it had to make sense :-) For a value of "sense"... > I’d probably avoid strtod even with the convenient syntax above. > Do you want 1.33e-6M to be allowed? Do we want to ever > accept or generate NaN or Inf values? NaN or Inf definitely not. That's why we use qemu_strtod_finite() before and after the patch. No sane person should ever use 1.33e-6M. Or even 1.1k (which yields 1126, rounded silently from machine number 1126.4000000000001, which approximates the true value 1126.4). Certain fractions are actually sane. 1.5k denotes a perfectly fine integer, which the code manages not to screw up. I'd recommend against using fractions regardless. What usage are we prepared to break? What kind of confusion are we willing to bear? Those are the questions. >> Tao Xu's patch tries to make the function do what its users expect, >> namely parse a bleepin' 64 bit integer, without breaking any of the >> "convenience" syntax. Turns out that's amazingly subtle. Are we making >> things less confusing or more?
On 12/17/2019 6:25 PM, Markus Armbruster wrote: > Tao Xu <tao3.xu@intel.com> writes: > >> On 12/5/19 11:29 PM, Markus Armbruster wrote: >>> Tao Xu <tao3.xu@intel.com> writes: >>> >>>> Parse input string both as a double and as a uint64_t, then use the >>>> method which consumes more characters. Update the related test cases. >>>> >>>> Signed-off-by: Tao Xu <tao3.xu@intel.com> >>>> --- >>> [...] >>>> diff --git a/util/cutils.c b/util/cutils.c >>>> index 77acadc70a..b08058c57c 100644 >>>> --- a/util/cutils.c >>>> +++ b/util/cutils.c >>>> @@ -212,24 +212,43 @@ static int do_strtosz(const char *nptr, const char **end, >>>> const char default_suffix, int64_t unit, >>>> uint64_t *result) >>>> { >>>> - int retval; >>>> - const char *endptr; >>>> + int retval, retd, retu; >>>> + const char *suffix, *suffixd, *suffixu; >>>> unsigned char c; >>>> int mul_required = 0; >>>> - double val, mul, integral, fraction; >>>> + bool use_strtod; >>>> + uint64_t valu; >>>> + double vald, mul, integral, fraction; >>> >>> Note for later: @mul is double. >>> >>>> + >>>> + retd = qemu_strtod_finite(nptr, &suffixd, &vald); >>>> + retu = qemu_strtou64(nptr, &suffixu, 0, &valu); > > Note for later: passing 0 to base accepts octal and hexadecimal > integers. > >>>> + use_strtod = strlen(suffixd) < strlen(suffixu); >>>> + >>>> + /* >>>> + * Parse @nptr both as a double and as a uint64_t, then use the method >>>> + * which consumes more characters. >>>> + */ >>> >>> The comment is in a funny place. I'd put it right before the >>> qemu_strtod_finite() line. >>> >>>> + if (use_strtod) { >>>> + suffix = suffixd; >>>> + retval = retd; >>>> + } else { >>>> + suffix = suffixu; >>>> + retval = retu; >>>> + } >>>> - retval = qemu_strtod_finite(nptr, &endptr, &val); >>>> if (retval) { >>>> goto out; >>>> } >>> >>> This is even more subtle than it looks. >>> >>> A close reading of the function contracts leads to three cases for each >>> conversion: >>> >>> * parse error (including infinity and NaN) >>> >>> @retu / @retd is -EINVAL >>> @valu / @vald is uninitialized >>> @suffixu / @suffixd is @nptr >>> >>> * range error >>> >>> @retu / @retd is -ERANGE >>> @valu / @vald is our best approximation of the conversion result >>> @suffixu / @suffixd points to the first character not consumed by the >>> conversion. >>> >>> Sub-cases: >>> >>> - uint64_t overflow >>> >>> We know the conversion result exceeds UINT64_MAX. >>> >>> - double overflow >>> >>> we know the conversion result's magnitude exceeds the largest >>> representable finite double DBL_MAX. >>> >>> - double underflow >>> >>> we know the conversion result is close to zero (closer than DBL_MIN, >>> the smallest normalized positive double). >>> >>> * success >>> >>> @retu / @retd is 0 >>> @valu / @vald is the conversion result >>> @suffixu / @suffixd points to the first character not consumed by the >>> conversion. >>> >>> This leads to a matrix (parse error, uint64_t overflow, success) x >>> (parse error, double overflow, double underflow, success). We need to >>> check the code does what we want for each element of this matrix, and >>> document any behavior that's not perfectly obvious. >>> >>> (success, success): we pick uint64_t if qemu_strtou64() consumed more >>> characters than qemu_strtod_finite(), else double. "More" is important >>> here; when they consume the same characters, we *need* to use the >>> uint64_t result. Example: for "18446744073709551615", we need to use >>> uint64_t 18446744073709551615, not double 18446744073709551616.0. But >>> for "18446744073709551616.", we need to use the double. Good. > > Also fun: for "0123", we use uint64_t 83, not double 123.0. But for > "0123.", we use 123.0, not 83. > > Do we really want to accept octal and hexadecimal integers? > Thank you for reminding me. Octal and hexadecimal may bring more confusion. I will use qemu_strtou64(nptr, &suffixu, 10, &valu) and add test for input like "0123". >>> (success, parse error) and (parse error, success): we pick the one that >>> succeeds, because success consumes characters, and failure to parse does >>> not. Good. >>> >>> (parse error, parse error): neither consumes characters, so we pick >>> uint64_t. Good. >>> >>> (parse error, double overflow), (parse error, double underflow) and >>> (uint64_t overflow, parse error): we pick the range error, because it >>> consumes characters. Good. >>> >>> These are the simple combinations. The remainder are hairier: (success, >>> double overflow), (success, double underflow), (uint64_t overflow, >>> success). I lack the time to analyze them today. Must be done before >>> we take this patch. Any takers? >> >> (success, double overflow), (success, double underflow), pick double >> overflow error, return -ERANGE. Because it consumes >> characters. Example: for "1.79769e+309", qemu_strtou64 consumes "1", >> and prases as uint64_t; but qemu_strtod_finite return -ERANGE and >> consumes all characters. It is OK. > > The only way to have double overflow when uint64_t succeeds is an > exponent. Double consumes the characters making up the exponent, > uint64_t does not. We use double. > > The only way to have double underflow is with an exponent or a decimal > point. Double consumes their characters, uint64_t does not. We use > double. > > Okay. > >> (uint64_t overflow, success), consume the same characters, use the >> uint64_t return -ERANGE. Note that even if qemu_strtod_finite can >> parse these cases such as "18446744073709551617", but the result is >> uint64_t so we also need to return -ERANGE. It is OK. > > That's just one of two cases, I think. The other one is when the > overflowing integer is followed by an exponent or decimal point. We use > double then. Converting the double to uint64_t overflows, except when a > negative exponent brings the number into range. > > Examples: "18446744073709551617" picks uint64_t overflow, > "18446744073709551617.0" picks double success (but converting it to > uint64_t below overflows), and "18446744073709551617e-10" picks double > success (converted to 1844674407 below). > > Okay. > >> Thank you for your analysis and suggestion. I will add more test cases >> to cover some of these analysis. > > Good move. > > Thank you for your further analysis.
On 12/17/2019 11:01 PM, Markus Armbruster wrote: > Christophe de Dinechin <dinechin@redhat.com> writes: > >>> On 17 Dec 2019, at 15:08, Markus Armbruster <armbru@redhat.com> wrote: >>> >>> Christophe de Dinechin <dinechin@redhat.com> writes: >>> >>>>> On 5 Dec 2019, at 16:29, Markus Armbruster <armbru@redhat.com> wrote: >>>>> >>>>> Tao Xu <tao3.xu@intel.com> writes: >>>>> >>>>>> Parse input string both as a double and as a uint64_t, then use the >>>>>> method which consumes more characters. Update the related test cases. >>>>>> >>>>>> Signed-off-by: Tao Xu <tao3.xu@intel.com> >>>>>> --- >>>>> [...] >>>>>> diff --git a/util/cutils.c b/util/cutils.c >>>>>> index 77acadc70a..b08058c57c 100644 >>>>>> --- a/util/cutils.c >>>>>> +++ b/util/cutils.c >>>>>> @@ -212,24 +212,43 @@ static int do_strtosz(const char *nptr, const char **end, >>>>>> const char default_suffix, int64_t unit, >>>>>> uint64_t *result) >>>>>> { >>>>>> - int retval; >>>>>> - const char *endptr; >>>>>> + int retval, retd, retu; >>>>>> + const char *suffix, *suffixd, *suffixu; >>>>>> unsigned char c; >>>>>> int mul_required = 0; >>>>>> - double val, mul, integral, fraction; >>>>>> + bool use_strtod; >>>>>> + uint64_t valu; >>>>>> + double vald, mul, integral, fraction; >>>>> >>>>> Note for later: @mul is double. >>>>> >>>>>> + >>>>>> + retd = qemu_strtod_finite(nptr, &suffixd, &vald); >>>>>> + retu = qemu_strtou64(nptr, &suffixu, 0, &valu); >>>>>> + use_strtod = strlen(suffixd) < strlen(suffixu); >>>>>> + >>>>>> + /* >>>>>> + * Parse @nptr both as a double and as a uint64_t, then use the method >>>>>> + * which consumes more characters. >>>>>> + */ >>>>> >>>>> The comment is in a funny place. I'd put it right before the >>>>> qemu_strtod_finite() line. >>>>> >>>>>> + if (use_strtod) { >>>>>> + suffix = suffixd; >>>>>> + retval = retd; >>>>>> + } else { >>>>>> + suffix = suffixu; >>>>>> + retval = retu; >>>>>> + } >>>>>> >>>>>> - retval = qemu_strtod_finite(nptr, &endptr, &val); >>>>>> if (retval) { >>>>>> goto out; >>>>>> } >>>>> >>>>> This is even more subtle than it looks. >>>> >>>> But why it is even necessary? >>>> >>>> The “contract” for the function used to be that it returned rounded values >>>> beyond 2^53, which in itself is curious. >>>> >>>> But now it’s a 6-dimensional matrix of hell with NaNs and barfnots, when the >>>> name implies it’s simply doing a text to u64 conversion… >>>> >>>> There is certainly a reason, but I’m really curious what it is :-) >>> >>> It all goes back to commit 9f9b17a4f0 "Introduce strtosz() library >>> function to convert a string to a byte count.". To support "convenient" >>> usage like "1.5G", it parses the number part with strtod(). This limits >>> us to 53 bits of precision. Larger sizes get rounded. >>> >>> I guess the excuse for this was that when you're dealing with sizes that >>> large (petabytes!), your least significant bits are zero anyway. >>> >>> Regardless, the interface is *awful*. We should've forced the author to >>> spell it out in all its glory in a proper function contract. That tends >>> to cool the enthusiasm for "convenient" syntax amazingly fast. >>> >>> The awful interface has been confusing people for close to a decade now. >>> >>> What to do? >> >> I see. Thanks for the rationale. I knew it had to make sense :-) > > For a value of "sense"... > >> I’d probably avoid strtod even with the convenient syntax above. >> Do you want 1.33e-6M to be allowed? Do we want to ever >> accept or generate NaN or Inf values? > > NaN or Inf definitely not. That's why we use qemu_strtod_finite() > before and after the patch. > > No sane person should ever use 1.33e-6M. Or even 1.1k (which yields > 1126, rounded silently from machine number 1126.4000000000001, which > approximates the true value 1126.4). > > Certain fractions are actually sane. 1.5k denotes a perfectly fine > integer, which the code manages not to screw up. I'd recommend against > using fractions regardless. > > What usage are we prepared to break? What kind of confusion are we > willing to bear? Those are the questions. > >>> Tao Xu's patch tries to make the function do what its users expect, >>> namely parse a bleepin' 64 bit integer, without breaking any of the >>> "convenience" syntax. Turns out that's amazingly subtle. Are we making >>> things less confusing or more? > Thanks for your explanation. I think another reason is build-in 'size' is really commonly used. May be someone use '-m 1.5G' to boot QEMU or write it to a config file.
On 12/18/2019 9:33 AM, Tao Xu wrote: > On 12/17/2019 6:25 PM, Markus Armbruster wrote: >> Tao Xu <tao3.xu@intel.com> writes: >> >>> On 12/5/19 11:29 PM, Markus Armbruster wrote: >>>> Tao Xu <tao3.xu@intel.com> writes: >>>> >>>>> Parse input string both as a double and as a uint64_t, then use the >>>>> method which consumes more characters. Update the related test cases. >>>>> >>>>> Signed-off-by: Tao Xu <tao3.xu@intel.com> >>>>> --- >>>> [...] >>>>> diff --git a/util/cutils.c b/util/cutils.c >>>>> index 77acadc70a..b08058c57c 100644 >>>>> --- a/util/cutils.c >>>>> +++ b/util/cutils.c >>>>> @@ -212,24 +212,43 @@ static int do_strtosz(const char *nptr, const >>>>> char **end, >>>>> const char default_suffix, int64_t unit, >>>>> uint64_t *result) >>>>> { >>>>> - int retval; >>>>> - const char *endptr; >>>>> + int retval, retd, retu; >>>>> + const char *suffix, *suffixd, *suffixu; >>>>> unsigned char c; >>>>> int mul_required = 0; >>>>> - double val, mul, integral, fraction; >>>>> + bool use_strtod; >>>>> + uint64_t valu; >>>>> + double vald, mul, integral, fraction; >>>> >>>> Note for later: @mul is double. >>>> >>>>> + >>>>> + retd = qemu_strtod_finite(nptr, &suffixd, &vald); >>>>> + retu = qemu_strtou64(nptr, &suffixu, 0, &valu); >> >> Note for later: passing 0 to base accepts octal and hexadecimal >> integers. >> >>>>> + use_strtod = strlen(suffixd) < strlen(suffixu); >>>>> + >>>>> + /* >>>>> + * Parse @nptr both as a double and as a uint64_t, then use >>>>> the method >>>>> + * which consumes more characters. >>>>> + */ >>>> >>>> The comment is in a funny place. I'd put it right before the >>>> qemu_strtod_finite() line. >>>> >>>>> + if (use_strtod) { >>>>> + suffix = suffixd; >>>>> + retval = retd; >>>>> + } else { >>>>> + suffix = suffixu; >>>>> + retval = retu; >>>>> + } >>>>> - retval = qemu_strtod_finite(nptr, &endptr, &val); >>>>> if (retval) { >>>>> goto out; >>>>> } >>>> >>>> This is even more subtle than it looks. >>>> >>>> A close reading of the function contracts leads to three cases for each >>>> conversion: >>>> >>>> * parse error (including infinity and NaN) >>>> >>>> @retu / @retd is -EINVAL >>>> @valu / @vald is uninitialized >>>> @suffixu / @suffixd is @nptr >>>> >>>> * range error >>>> >>>> @retu / @retd is -ERANGE >>>> @valu / @vald is our best approximation of the conversion result >>>> @suffixu / @suffixd points to the first character not consumed >>>> by the >>>> conversion. >>>> >>>> Sub-cases: >>>> >>>> - uint64_t overflow >>>> >>>> We know the conversion result exceeds UINT64_MAX. >>>> >>>> - double overflow >>>> >>>> we know the conversion result's magnitude exceeds the largest >>>> representable finite double DBL_MAX. >>>> >>>> - double underflow >>>> >>>> we know the conversion result is close to zero (closer than >>>> DBL_MIN, >>>> the smallest normalized positive double). >>>> >>>> * success >>>> >>>> @retu / @retd is 0 >>>> @valu / @vald is the conversion result >>>> @suffixu / @suffixd points to the first character not consumed >>>> by the >>>> conversion. >>>> >>>> This leads to a matrix (parse error, uint64_t overflow, success) x >>>> (parse error, double overflow, double underflow, success). We need to >>>> check the code does what we want for each element of this matrix, and >>>> document any behavior that's not perfectly obvious. >>>> >>>> (success, success): we pick uint64_t if qemu_strtou64() consumed more >>>> characters than qemu_strtod_finite(), else double. "More" is important >>>> here; when they consume the same characters, we *need* to use the >>>> uint64_t result. Example: for "18446744073709551615", we need to use >>>> uint64_t 18446744073709551615, not double 18446744073709551616.0. But >>>> for "18446744073709551616.", we need to use the double. Good. >> >> Also fun: for "0123", we use uint64_t 83, not double 123.0. But for >> "0123.", we use 123.0, not 83. >> >> Do we really want to accept octal and hexadecimal integers? >> > > Thank you for reminding me. Octal and hexadecimal may bring more > confusion. I will use qemu_strtou64(nptr, &suffixu, 10, &valu) and add > test for input like "0123". > Hi Markus, After I use qemu_strtou64(nptr, &suffixu, 10, &valu), it cause another question. Because qemu_strtod_finite support hexadecimal input, so in this situation, it will parsed as double. It will also let large hexadecimal integers be rounded. So there may be two solution: 1: use qemu_strtou64(nptr, &suffixu, 0, &valu) and parse octal as decimal. This will keep hexadecimal valid as now. "0123" --> 123; "0x123" --> 291 2: use qemu_strtou64(nptr, &suffixu, 10, &valu) and reject octal and decimal. "0123" --> Error; "0x123" --> Error
Tao Xu <tao3.xu@intel.com> writes: > On 12/18/2019 9:33 AM, Tao Xu wrote: >> On 12/17/2019 6:25 PM, Markus Armbruster wrote: [...] >>> Also fun: for "0123", we use uint64_t 83, not double 123.0. But for >>> "0123.", we use 123.0, not 83. >>> >>> Do we really want to accept octal and hexadecimal integers? >>> >> >> Thank you for reminding me. Octal and hexadecimal may bring more >> confusion. I will use qemu_strtou64(nptr, &suffixu, 10, &valu) and >> add test for input like "0123". >> > > Hi Markus, > > After I use qemu_strtou64(nptr, &suffixu, 10, &valu), it cause another > question. Because qemu_strtod_finite support hexadecimal input, so in > this situation, it will parsed as double. It will also let large > hexadecimal integers be rounded. So there may be two solution: > > 1: use qemu_strtou64(nptr, &suffixu, 0, &valu) and parse octal as > decimal. This will keep hexadecimal valid as now. > > "0123" --> 123; "0x123" --> 291 How would you make qemu_strtou64() parse octal as decimal? > 2: use qemu_strtou64(nptr, &suffixu, 10, &valu) and reject octal and > decimal. > > "0123" --> Error; "0x123" --> Error How would you reject the 0x prefix?
On 12/17/19 7:33 PM, Tao Xu wrote: >> Also fun: for "0123", we use uint64_t 83, not double 123.0. But for >> "0123.", we use 123.0, not 83. >> >> Do we really want to accept octal and hexadecimal integers? >> > > Thank you for reminding me. Octal and hexadecimal may bring more > confusion. I will use qemu_strtou64(nptr, &suffixu, 10, &valu) and add > test for input like "0123". Note that JSON does not permit octal numbers, but ALSO does not permit '0123' as a valid JSON number. Of course, this parser is intended for human users rather than a JSON parser, so silently accepting it as decimal 123 is probably okay, but it is worth remembering that decisions are not trivial here.
On 12/19/2019 2:26 AM, Markus Armbruster wrote: > Tao Xu <tao3.xu@intel.com> writes: > >> On 12/18/2019 9:33 AM, Tao Xu wrote: >>> On 12/17/2019 6:25 PM, Markus Armbruster wrote: > [...] >>>> Also fun: for "0123", we use uint64_t 83, not double 123.0. But for >>>> "0123.", we use 123.0, not 83. >>>> >>>> Do we really want to accept octal and hexadecimal integers? >>>> >>> >>> Thank you for reminding me. Octal and hexadecimal may bring more >>> confusion. I will use qemu_strtou64(nptr, &suffixu, 10, &valu) and >>> add test for input like "0123". >>> >> >> Hi Markus, >> >> After I use qemu_strtou64(nptr, &suffixu, 10, &valu), it cause another >> question. Because qemu_strtod_finite support hexadecimal input, so in >> this situation, it will parsed as double. It will also let large >> hexadecimal integers be rounded. So there may be two solution: >> >> 1: use qemu_strtou64(nptr, &suffixu, 0, &valu) and parse octal as >> decimal. This will keep hexadecimal valid as now. >> >> "0123" --> 123; "0x123" --> 291 > > How would you make qemu_strtou64() parse octal as decimal? How about this solution, set @base as variable, if we detect hexadecimal, we use 0, then can prase decimal as u64, else we use 10, then can prase octal as decimal, because 0 prefix will be ignored in qemu_strtou64(nptr, &suffixu, 10, &valu); const char *p = nptr; while (qemu_isspace(*p)) { p++; } if (*p == '0' && (qemu_toupper(*(p+1)) == 'X' ||) { base = 0; } else { base = 10; } retd = qemu_strtod_finite(nptr, &suffixd, &vald); retu = qemu_strtou64(nptr, &suffixu, base, &valu); use_strtod = strlen(suffixd) < strlen(suffixu); if (use_strtod) { endptr = suffixd; retval = retd; } else { endptr = suffixu; retval = retu; } > >> 2: use qemu_strtou64(nptr, &suffixu, 10, &valu) and reject octal and >> decimal. >> >> "0123" --> Error; "0x123" --> Error > > How would you reject the 0x prefix? > How about check the first&second character is '0' and 'x' and then return -EINVAL.
Tao Xu <tao3.xu@intel.com> writes: > On 12/19/2019 2:26 AM, Markus Armbruster wrote: >> Tao Xu <tao3.xu@intel.com> writes: >> >>> On 12/18/2019 9:33 AM, Tao Xu wrote: >>>> On 12/17/2019 6:25 PM, Markus Armbruster wrote: >> [...] >>>>> Also fun: for "0123", we use uint64_t 83, not double 123.0. But for >>>>> "0123.", we use 123.0, not 83. >>>>> >>>>> Do we really want to accept octal and hexadecimal integers? >>>>> >>>> >>>> Thank you for reminding me. Octal and hexadecimal may bring more >>>> confusion. I will use qemu_strtou64(nptr, &suffixu, 10, &valu) and >>>> add test for input like "0123". >>>> >>> >>> Hi Markus, >>> >>> After I use qemu_strtou64(nptr, &suffixu, 10, &valu), it cause another >>> question. Because qemu_strtod_finite support hexadecimal input, so in >>> this situation, it will parsed as double. It will also let large >>> hexadecimal integers be rounded. So there may be two solution: >>> >>> 1: use qemu_strtou64(nptr, &suffixu, 0, &valu) and parse octal as >>> decimal. This will keep hexadecimal valid as now. >>> >>> "0123" --> 123; "0x123" --> 291 >> >> How would you make qemu_strtou64() parse octal as decimal? > > How about this solution, set @base as variable, if we detect > hexadecimal, we use 0, then can prase decimal as u64, else we use 10, > then can prase octal as decimal, because 0 prefix will be ignored in > qemu_strtou64(nptr, &suffixu, 10, &valu); > > const char *p = nptr; > while (qemu_isspace(*p)) { > p++; > } > if (*p == '0' && (qemu_toupper(*(p+1)) == 'X' ||) { > base = 0; > } else { > base = 10; > } > > retd = qemu_strtod_finite(nptr, &suffixd, &vald); > retu = qemu_strtou64(nptr, &suffixu, base, &valu); > use_strtod = strlen(suffixd) < strlen(suffixu); > > if (use_strtod) { > endptr = suffixd; > retval = retd; > } else { > endptr = suffixu; > retval = retu; > } You skip whitespace, but neglect to skip the sign. Peeking into the input to predict what qemu_strtou64() will do feels unadvisable. We could try both base 10 and 16, and use whatever consumes more characters, but that's even more complicated. This function's contract is *terrible*. We've tried to improve on it, but so far only managed to make it more complex and differently terrible. Do we really, really, really need full precision? If no, let's flee this swamp without looking back. If yes, what's the *simplest* solution that provides it? >>> 2: use qemu_strtou64(nptr, &suffixu, 10, &valu) and reject octal and >>> decimal. >>> >>> "0123" --> Error; "0x123" --> Error >> >> How would you reject the 0x prefix? >> > How about check the first&second character is '0' and 'x' and then > return -EINVAL.
diff --git a/tests/test-cutils.c b/tests/test-cutils.c index 1aa8351520..4a7030c611 100644 --- a/tests/test-cutils.c +++ b/tests/test-cutils.c @@ -1970,40 +1970,25 @@ static void test_qemu_strtosz_simple(void) g_assert_cmpint(err, ==, 0); g_assert_cmpint(res, ==, 12345); - /* Note: precision is 53 bits since we're parsing with strtod() */ - - str = "9007199254740991"; /* 2^53-1 */ - err = qemu_strtosz(str, &endptr, &res); - g_assert_cmpint(err, ==, 0); - g_assert_cmpint(res, ==, 0x1fffffffffffff); - g_assert(endptr == str + 16); - - str = "9007199254740992"; /* 2^53 */ - err = qemu_strtosz(str, &endptr, &res); - g_assert_cmpint(err, ==, 0); - g_assert_cmpint(res, ==, 0x20000000000000); - g_assert(endptr == str + 16); + /* Note: precision is 64 bits (UINT64_MAX) */ str = "9007199254740993"; /* 2^53+1 */ err = qemu_strtosz(str, &endptr, &res); g_assert_cmpint(err, ==, 0); - g_assert_cmpint(res, ==, 0x20000000000000); /* rounded to 53 bits */ + g_assert_cmpint(res, ==, 0x20000000000001); g_assert(endptr == str + 16); - str = "18446744073709549568"; /* 0xfffffffffffff800 (53 msbs set) */ + str = "18446744073709550591"; /* 0xfffffffffffffbff */ err = qemu_strtosz(str, &endptr, &res); g_assert_cmpint(err, ==, 0); - g_assert_cmpint(res, ==, 0xfffffffffffff800); + g_assert_cmpint(res, ==, 0xfffffffffffffbff); g_assert(endptr == str + 20); - str = "18446744073709550591"; /* 0xfffffffffffffbff */ + str = "18446744073709551615"; /* 2^64-1 (UINT64_MAX) */ err = qemu_strtosz(str, &endptr, &res); g_assert_cmpint(err, ==, 0); - g_assert_cmpint(res, ==, 0xfffffffffffff800); /* rounded to 53 bits */ + g_assert_cmpint(res, ==, 0xffffffffffffffff); g_assert(endptr == str + 20); - - /* 0x7ffffffffffffe00..0x7fffffffffffffff get rounded to - * 0x8000000000000000, thus -ERANGE; see test_qemu_strtosz_erange() */ } static void test_qemu_strtosz_units(void) @@ -2145,16 +2130,6 @@ static void test_qemu_strtosz_erange(void) g_assert_cmpint(err, ==, -ERANGE); g_assert(endptr == str + 2); - str = "18446744073709550592"; /* 0xfffffffffffffc00 */ - err = qemu_strtosz(str, &endptr, &res); - g_assert_cmpint(err, ==, -ERANGE); - g_assert(endptr == str + 20); - - str = "18446744073709551615"; /* 2^64-1 */ - err = qemu_strtosz(str, &endptr, &res); - g_assert_cmpint(err, ==, -ERANGE); - g_assert(endptr == str + 20); - str = "18446744073709551616"; /* 2^64 */ err = qemu_strtosz(str, &endptr, &res); g_assert_cmpint(err, ==, -ERANGE); diff --git a/tests/test-keyval.c b/tests/test-keyval.c index 09b0ae3c68..fad941fcb8 100644 --- a/tests/test-keyval.c +++ b/tests/test-keyval.c @@ -383,59 +383,26 @@ static void test_keyval_visit_size(void) visit_end_struct(v, NULL); visit_free(v); - /* Note: precision is 53 bits since we're parsing with strtod() */ + /* Note: precision is 64 bits (UINT64_MAX) */ - /* Around limit of precision: 2^53-1, 2^53, 2^53+1 */ - qdict = keyval_parse("sz1=9007199254740991," - "sz2=9007199254740992," - "sz3=9007199254740993", + /* Around limit of precision: UINT64_MAX - 1, UINT64_MAX */ + qdict = keyval_parse("sz1=18446744073709551614," + "sz2=18446744073709551615", NULL, &error_abort); v = qobject_input_visitor_new_keyval(QOBJECT(qdict)); qobject_unref(qdict); visit_start_struct(v, NULL, NULL, 0, &error_abort); visit_type_size(v, "sz1", &sz, &error_abort); - g_assert_cmphex(sz, ==, 0x1fffffffffffff); + g_assert_cmphex(sz, ==, 0xfffffffffffffffe); visit_type_size(v, "sz2", &sz, &error_abort); - g_assert_cmphex(sz, ==, 0x20000000000000); - visit_type_size(v, "sz3", &sz, &error_abort); - g_assert_cmphex(sz, ==, 0x20000000000000); - visit_check_struct(v, &error_abort); - visit_end_struct(v, NULL); - visit_free(v); - - /* Close to signed upper limit 0x7ffffffffffffc00 (53 msbs set) */ - qdict = keyval_parse("sz1=9223372036854774784," /* 7ffffffffffffc00 */ - "sz2=9223372036854775295", /* 7ffffffffffffdff */ - NULL, &error_abort); - v = qobject_input_visitor_new_keyval(QOBJECT(qdict)); - qobject_unref(qdict); - visit_start_struct(v, NULL, NULL, 0, &error_abort); - visit_type_size(v, "sz1", &sz, &error_abort); - g_assert_cmphex(sz, ==, 0x7ffffffffffffc00); - visit_type_size(v, "sz2", &sz, &error_abort); - g_assert_cmphex(sz, ==, 0x7ffffffffffffc00); - visit_check_struct(v, &error_abort); - visit_end_struct(v, NULL); - visit_free(v); - - /* Close to actual upper limit 0xfffffffffffff800 (53 msbs set) */ - qdict = keyval_parse("sz1=18446744073709549568," /* fffffffffffff800 */ - "sz2=18446744073709550591", /* fffffffffffffbff */ - NULL, &error_abort); - v = qobject_input_visitor_new_keyval(QOBJECT(qdict)); - qobject_unref(qdict); - visit_start_struct(v, NULL, NULL, 0, &error_abort); - visit_type_size(v, "sz1", &sz, &error_abort); - g_assert_cmphex(sz, ==, 0xfffffffffffff800); - visit_type_size(v, "sz2", &sz, &error_abort); - g_assert_cmphex(sz, ==, 0xfffffffffffff800); + g_assert_cmphex(sz, ==, 0xffffffffffffffff); visit_check_struct(v, &error_abort); visit_end_struct(v, NULL); visit_free(v); /* Beyond limits */ qdict = keyval_parse("sz1=-1," - "sz2=18446744073709550592", /* fffffffffffffc00 */ + "sz2=18446744073709551616", /* 2^64 */ NULL, &error_abort); v = qobject_input_visitor_new_keyval(QOBJECT(qdict)); qobject_unref(qdict); diff --git a/tests/test-qemu-opts.c b/tests/test-qemu-opts.c index ef96e84aed..3a8b8c0168 100644 --- a/tests/test-qemu-opts.c +++ b/tests/test-qemu-opts.c @@ -650,50 +650,25 @@ static void test_opts_parse_size(void) g_assert_cmpuint(opts_count(opts), ==, 1); g_assert_cmpuint(qemu_opt_get_size(opts, "size1", 1), ==, 0); - /* Note: precision is 53 bits since we're parsing with strtod() */ + /* Note: precision is 64 bits (UINT64_MAX) */ - /* Around limit of precision: 2^53-1, 2^53, 2^54 */ + /* Around limit of precision: UINT64_MAX - 1, UINT64_MAX */ opts = qemu_opts_parse(&opts_list_02, - "size1=9007199254740991," - "size2=9007199254740992," - "size3=9007199254740993", - false, &error_abort); - g_assert_cmpuint(opts_count(opts), ==, 3); - g_assert_cmphex(qemu_opt_get_size(opts, "size1", 1), - ==, 0x1fffffffffffff); - g_assert_cmphex(qemu_opt_get_size(opts, "size2", 1), - ==, 0x20000000000000); - g_assert_cmphex(qemu_opt_get_size(opts, "size3", 1), - ==, 0x20000000000000); - - /* Close to signed upper limit 0x7ffffffffffffc00 (53 msbs set) */ - opts = qemu_opts_parse(&opts_list_02, - "size1=9223372036854774784," /* 7ffffffffffffc00 */ - "size2=9223372036854775295", /* 7ffffffffffffdff */ - false, &error_abort); - g_assert_cmpuint(opts_count(opts), ==, 2); - g_assert_cmphex(qemu_opt_get_size(opts, "size1", 1), - ==, 0x7ffffffffffffc00); - g_assert_cmphex(qemu_opt_get_size(opts, "size2", 1), - ==, 0x7ffffffffffffc00); - - /* Close to actual upper limit 0xfffffffffffff800 (53 msbs set) */ - opts = qemu_opts_parse(&opts_list_02, - "size1=18446744073709549568," /* fffffffffffff800 */ - "size2=18446744073709550591", /* fffffffffffffbff */ + "size1=18446744073709551614," + "size2=18446744073709551615", false, &error_abort); g_assert_cmpuint(opts_count(opts), ==, 2); g_assert_cmphex(qemu_opt_get_size(opts, "size1", 1), - ==, 0xfffffffffffff800); + ==, 0xfffffffffffffffe); g_assert_cmphex(qemu_opt_get_size(opts, "size2", 1), - ==, 0xfffffffffffff800); + ==, 0xffffffffffffffff); /* Beyond limits */ opts = qemu_opts_parse(&opts_list_02, "size1=-1", false, &err); error_free_or_abort(&err); g_assert(!opts); opts = qemu_opts_parse(&opts_list_02, - "size1=18446744073709550592", /* fffffffffffffc00 */ + "size1=18446744073709551616", /* 2^64 */ false, &err); error_free_or_abort(&err); g_assert(!opts); diff --git a/util/cutils.c b/util/cutils.c index 77acadc70a..b08058c57c 100644 --- a/util/cutils.c +++ b/util/cutils.c @@ -212,24 +212,43 @@ static int do_strtosz(const char *nptr, const char **end, const char default_suffix, int64_t unit, uint64_t *result) { - int retval; - const char *endptr; + int retval, retd, retu; + const char *suffix, *suffixd, *suffixu; unsigned char c; int mul_required = 0; - double val, mul, integral, fraction; + bool use_strtod; + uint64_t valu; + double vald, mul, integral, fraction; + + retd = qemu_strtod_finite(nptr, &suffixd, &vald); + retu = qemu_strtou64(nptr, &suffixu, 0, &valu); + use_strtod = strlen(suffixd) < strlen(suffixu); + + /* + * Parse @nptr both as a double and as a uint64_t, then use the method + * which consumes more characters. + */ + if (use_strtod) { + suffix = suffixd; + retval = retd; + } else { + suffix = suffixu; + retval = retu; + } - retval = qemu_strtod_finite(nptr, &endptr, &val); if (retval) { goto out; } - fraction = modf(val, &integral); - if (fraction != 0) { - mul_required = 1; + if (use_strtod) { + fraction = modf(vald, &integral); + if (fraction != 0) { + mul_required = 1; + } } - c = *endptr; + c = *suffix; mul = suffix_mul(c, unit); if (mul >= 0) { - endptr++; + suffix++; } else { mul = suffix_mul(default_suffix, unit); assert(mul >= 0); @@ -238,23 +257,36 @@ static int do_strtosz(const char *nptr, const char **end, retval = -EINVAL; goto out; } - /* - * Values near UINT64_MAX overflow to 2**64 when converting to double - * precision. Compare against the maximum representable double precision - * value below 2**64, computed as "the next value after 2**64 (0x1p64) in - * the direction of 0". - */ - if ((val * mul > nextafter(0x1p64, 0)) || val < 0) { - retval = -ERANGE; - goto out; + + if (use_strtod) { + /* + * Values near UINT64_MAX overflow to 2**64 when converting to double + * precision. Compare against the maximum representable double precision + * value below 2**64, computed as "the next value after 2**64 (0x1p64) + * in the direction of 0". + */ + if ((vald * mul > nextafter(0x1p64, 0)) || vald < 0) { + retval = -ERANGE; + goto out; + } + *result = vald * mul; + } else { + /* Reject negative input and overflow output */ + while (qemu_isspace(*nptr)) { + nptr++; + } + if (*nptr == '-' || UINT64_MAX / (uint64_t) mul < valu) { + retval = -ERANGE; + goto out; + } + *result = valu * (uint64_t) mul; } - *result = val * mul; retval = 0; out: if (end) { - *end = endptr; - } else if (*endptr) { + *end = suffix; + } else if (*suffix) { retval = -EINVAL; }
Parse input string both as a double and as a uint64_t, then use the method which consumes more characters. Update the related test cases. Signed-off-by: Tao Xu <tao3.xu@intel.com> --- tests/test-cutils.c | 37 ++++----------------- tests/test-keyval.c | 47 ++++----------------------- tests/test-qemu-opts.c | 39 ++++------------------ util/cutils.c | 74 ++++++++++++++++++++++++++++++------------ 4 files changed, 73 insertions(+), 124 deletions(-)