diff mbox series

[v7,08/13] iio: afe: rescale: fix precision on fractional log scale

Message ID 20210801194000.3646303-9-liambeguin@gmail.com (mailing list archive)
State Changes Requested
Series iio: afe: add temperature rescaling support

Commit Message

Liam Beguin Aug. 1, 2021, 7:39 p.m. UTC
From: Liam Beguin <lvb@xiphos.com>

The IIO_VAL_FRACTIONAL_LOG2 scale type doesn't return the expected
scale. Update the case so that the rescaler returns a fractional type
and a more precise scale.

Signed-off-by: Liam Beguin <lvb@xiphos.com>
---
 drivers/iio/afe/iio-rescale.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

Comments

Peter Rosin Aug. 2, 2021, 9:17 a.m. UTC | #1
On 2021-08-01 21:39, Liam Beguin wrote:
> From: Liam Beguin <lvb@xiphos.com>
> 
> The IIO_VAL_FRACTIONAL_LOG2 scale type doesn't return the expected
> scale. Update the case so that the rescaler returns a fractional type
> and a more precise scale.
> 
> Signed-off-by: Liam Beguin <lvb@xiphos.com>
> ---
>  drivers/iio/afe/iio-rescale.c | 15 ++++++++++-----
>  1 file changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/iio/afe/iio-rescale.c b/drivers/iio/afe/iio-rescale.c
> index abd7ad73d1ce..e37a9766080c 100644
> --- a/drivers/iio/afe/iio-rescale.c
> +++ b/drivers/iio/afe/iio-rescale.c
> @@ -47,12 +47,17 @@ int rescale_process_scale(struct rescale *rescale, int scale_type,
>  		*val2 = rescale->denominator;
>  		return IIO_VAL_FRACTIONAL;
>  	case IIO_VAL_FRACTIONAL_LOG2:
> -		tmp = *val * 1000000000LL;
> -		do_div(tmp, rescale->denominator);
> -		tmp *= rescale->numerator;
> -		do_div(tmp, 1000000000LL);
> +		if (check_mul_overflow(*val, rescale->numerator, (s32 *)&tmp) ||
> +		    check_mul_overflow(rescale->denominator, (1 << *val2), (s32 *)&tmp2)) {
> +			tmp = (s64)*val * rescale->numerator;
> +			tmp2 = (s64)rescale->denominator * (1 << *val2);
> +			factor = gcd(abs(tmp), abs(tmp2));
> +			tmp = div_s64(tmp, factor);
> +			tmp2 = div_s64(tmp2, factor);

The case I really worry about is when trying to get an exact result by using
gcd() really doesn't improve the situation, and the only way to avoid overflow
is to reduce the precision. A perhaps contrived example:

scale numerator   1,220,703,125    i.e. 5 ^ 13
scale denominator 1,162,261,467    i.e. 3 ^ 19
*val              1,129,900,996    i.e. 7 ^ 10  *  2 ^ 2
*val2             2                i.e. value = 7 ^ 10

Then you get overflow for both the calls to check_mul_overflow(). But when gcd()
returns 1 (or something too small) the overflow is "returned" as-is.

With the old code you get something that is at least not completely wrong, just
not as accurate as is perhaps possible:
*val   1,186,715,480
*val2  2
Or 1,186,715,480 / 2^2 = 296,678,870.

With this patch the above makes you attempt to return the fraction:
*val   1,379,273,676,757,812,500
*val2  4,649,045,868
Or 296,678,870.443403528 (or something like that, not 100% sure about all the
fractional digits, but they are not really important for my argument)

While the latter is more correct, truncation to 32-bit clobbers the result so
in reality this is returned:
*val   -281,918,188
*val2  354,078,572
Or -0.796202341
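
The arithmetic in this example can be reproduced with a minimal userspace sketch. It is not kernel code: it assumes C99 fixed-width types in place of s64/s32, plain `/` in place of do_div() and div_s64(), and a hand-written gcd_u64() in place of the kernel's gcd(). All function names are made up for illustration, and the patched path is sketched for positive inputs only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's gcd(); Euclid's algorithm. */
static uint64_t gcd_u64(uint64_t a, uint64_t b)
{
	while (b) {
		uint64_t t = a % b;

		a = b;
		b = t;
	}
	return a;
}

/* Old IIO_VAL_FRACTIONAL_LOG2 path: rescale *val, leave *val2 untouched. */
static int32_t old_log2_scale(int32_t val, int32_t num, int32_t den)
{
	int64_t tmp = (int64_t)val * 1000000000LL;

	tmp /= den;		/* do_div(tmp, rescale->denominator) */
	tmp *= num;
	tmp /= 1000000000LL;	/* do_div(tmp, 1000000000LL) */
	return (int32_t)tmp;
}

/* 64-bit fraction built by the patch's overflow branch. */
struct frac64 {
	int64_t num;
	int64_t den;
};

static struct frac64 patched_log2_fraction(int32_t val, int32_t val2,
					   int32_t num, int32_t den)
{
	struct frac64 f;
	uint64_t factor;

	/* Sketch only: positive inputs and a shift that fits in 64 bits. */
	assert(val > 0 && num > 0 && den > 0);
	assert(val2 >= 0 && val2 < 31);

	f.num = (int64_t)val * num;
	f.den = (int64_t)den * (1LL << val2);
	factor = gcd_u64((uint64_t)f.num, (uint64_t)f.den);
	f.num /= (int64_t)factor;
	f.den /= (int64_t)factor;
	return f;
}

/*
 * What storing a 64-bit value into an int keeps: the low 32 bits.
 * The unsigned-to-signed conversion is implementation-defined in C;
 * gcc and clang wrap, which is what this sketch relies on.
 */
static int32_t truncate_to_s32(int64_t v)
{
	return (int32_t)(uint32_t)(uint64_t)v;
}
```

With the inputs above, old_log2_scale(1129900996, 1220703125, 1162261467) gives 1,186,715,480, and truncating the two unreduced products 1,379,273,676,757,812,500 and 4,649,045,868 gives -281,918,188 and 354,078,572. For these particular inputs gcd() does find the shared factor 4, which is the "something too small" case: the reduced fraction is 344,818,419,189,453,125 / 1,162,261,467, and its numerator still does not fit in 32 bits.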

So, while it might seem unlucky that gcd() will not find a big enough factor,
it is certainly possible. And I also worry that when this happens it will only
happen once in a while, and that the resulting bad values might be extremely
unexpected and difficult to track down. Things that happen once in a blue moon
are simply not fun to debug.

I.e. I worry that small islands of input will cause failures. With the old code
there are no such islands. The scale factor alone determines the precision, and
if you get poor precision you get poor precision throughout the range. And any
problem will therefore be "stable" and much easier to debug for "innocent" 3rd
party users that may not even be aware that the rescaler is involved at all.

This is also an issue I have with patch 7/13, but there the only thing that is
sacrificed is CPU cycles. But nonetheless, I'm dubious if patch 7/13 is wise
precisely because it might cause issues that are intermittent and therefore
difficult to debug.

Also, changing the calculation so that you get more precision whenever that is
possible feels dangerous. I fear linearity breaks and that bigger inputs cause
smaller outputs due to rounding if the bigger value has to be rounded down, but
that this isn't done carefully enough. I.e. attempting to return an exact
fraction and only falling back to the old code when that is not possible is
still not safe since the old code isn't careful enough about rounding. I think
it is really important that bigger inputs cause bigger (or equal) outputs.
Otherwise you might trigger instability in feedback loops should a rescaler be
involved in some regulator function.
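
The property asked for here, that a bigger input never produces a smaller output, can be checked by brute force over an input range. The following is a userspace sketch under stated assumptions: scale_old_style() only mirrors the lines removed by the patch (with `/` in place of do_div()), and find_decrease() is a hypothetical helper, not something from the series. A real check would pass in whatever mixed exact/fallback computation is under review.

```c
#include <stddef.h>
#include <stdint.h>

/* Signature of a scale computation to test; hypothetical. */
typedef int64_t (*scale_fn)(int64_t val, int32_t num, int32_t den);

/* Mirrors the removed lines of the patch, applied to one raw value. */
static int64_t scale_old_style(int64_t val, int32_t num, int32_t den)
{
	int64_t tmp = val * 1000000000LL;

	tmp /= den;
	tmp *= num;
	return tmp / 1000000000LL;
}

/*
 * Sweep lo..hi and look for the first input whose output is smaller
 * than the output of the previous input. Returns 1 and stores that
 * input in *where (if where is not NULL) when one is found, else 0.
 */
static int find_decrease(scale_fn fn, int64_t lo, int64_t hi,
			 int32_t num, int32_t den, int64_t *where)
{
	int64_t prev = fn(lo, num, den);

	for (int64_t v = lo + 1; v <= hi; v++) {
		int64_t cur = fn(v, num, den);

		if (cur < prev) {
			if (where)
				*where = v;
			return 1;
		}
		prev = cur;
	}
	return 0;
}
```

Called as find_decrease(scale_old_style, 0, 100000, 5, 3, NULL), the sweep finds no decrease, since each step of the old computation is a truncating division or a multiplication by a positive constant. The interesting inputs for a hybrid scheme are the ones around the switch between the exact path and the fallback.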

Cheers,
Peter

> +		}
>  		*val = tmp;
> -		return scale_type;
> +		*val2 = tmp2;
> +		return IIO_VAL_FRACTIONAL;
>  	case IIO_VAL_INT_PLUS_NANO:
>  	case IIO_VAL_INT_PLUS_MICRO:
>  		if (scale_type == IIO_VAL_INT_PLUS_NANO)
>
Liam Beguin Aug. 15, 2021, 10:14 p.m. UTC | #2
On Mon Aug 2, 2021 at 5:17 AM EDT, Peter Rosin wrote:
> On 2021-08-01 21:39, Liam Beguin wrote:
> > From: Liam Beguin <lvb@xiphos.com>
> > 
> > The IIO_VAL_FRACTIONAL_LOG2 scale type doesn't return the expected
> > scale. Update the case so that the rescaler returns a fractional type
> > and a more precise scale.
> > 
> > Signed-off-by: Liam Beguin <lvb@xiphos.com>
> > ---
> >  drivers/iio/afe/iio-rescale.c | 15 ++++++++++-----
> >  1 file changed, 10 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/iio/afe/iio-rescale.c b/drivers/iio/afe/iio-rescale.c
> > index abd7ad73d1ce..e37a9766080c 100644
> > --- a/drivers/iio/afe/iio-rescale.c
> > +++ b/drivers/iio/afe/iio-rescale.c
> > @@ -47,12 +47,17 @@ int rescale_process_scale(struct rescale *rescale, int scale_type,
> >  		*val2 = rescale->denominator;
> >  		return IIO_VAL_FRACTIONAL;
> >  	case IIO_VAL_FRACTIONAL_LOG2:
> > -		tmp = *val * 1000000000LL;
> > -		do_div(tmp, rescale->denominator);
> > -		tmp *= rescale->numerator;
> > -		do_div(tmp, 1000000000LL);
> > +		if (check_mul_overflow(*val, rescale->numerator, (s32 *)&tmp) ||
> > +		    check_mul_overflow(rescale->denominator, (1 << *val2), (s32 *)&tmp2)) {
> > +			tmp = (s64)*val * rescale->numerator;
> > +			tmp2 = (s64)rescale->denominator * (1 << *val2);
> > +			factor = gcd(abs(tmp), abs(tmp2));
> > +			tmp = div_s64(tmp, factor);
> > +			tmp2 = div_s64(tmp2, factor);

Hi Peter,

Apologies for the delay, I got caught up on some other work.

>
> The case I really worry about is when trying to get an exact result by using
> gcd() really doesn't improve the situation, and the only way to avoid overflow
> is to reduce the precision. A perhaps contrived example:
>
> scale numerator   1,220,703,125    i.e. 5 ^ 13
> scale denominator 1,162,261,467    i.e. 3 ^ 19
> *val              1,129,900,996    i.e. 7 ^ 10  *  2 ^ 2
> *val2             2                i.e. value = 7 ^ 10
>
> Then you get overflow for both the calls to check_mul_overflow(). But when gcd()
> returns 1 (or something too small) the overflow is "returned" as-is.

I was aware of the issue when gcd() returns 1 and thought it would be
unlikely enough to not be an issue, but as you pointed out there are also
cases where it returns something that's not good enough to take care of
the overflow. This is unfortunately more likely to happen, and makes it
impossible to ignore.

>
> With the old code you get something that is at least not completely wrong, just
> not as accurate as is perhaps possible:
> *val   1,186,715,480
> *val2  2
> Or 1,186,715,480 / 2^2 = 296,678,870.
>
> With this patch the above makes you attempt to return the fraction:
> *val   1,379,273,676,757,812,500
> *val2  4,649,045,868
> Or 296,678,870.443403528 (or something like that, not 100% sure about all the
> fractional digits, but they are not really important for my argument)
>
> While the latter is more correct, truncation to 32-bit clobbers the result so
> in reality this is returned:
> *val   -281,918,188
> *val2  354,078,572
> Or -0.796202341
>
> So, while it might seem unlucky that gcd() will not find a big enough factor,
> it is certainly possible. And I also worry that when this happens it will only
> happen once in a while, and that the resulting bad values might be extremely
> unexpected and difficult to track down. Things that happen once in a blue moon
> are simply not fun to debug.
>
> I.e. I worry that small islands of input will cause failures. With the old code
> there are no such islands. The scale factor alone determines the precision, and
> if you get poor precision you get poor precision throughout the range. And any
> problem will therefore be "stable" and much easier to debug for "innocent" 3rd
> party users that may not even be aware that the rescaler is involved at all.

I agree with you that such islands are a bad thing that might cause a
lot of pain, and it's probably not worth it just to gain a few digits of
precision (that can sometimes be irrelevant).

I'll drop this change and will update the test cases to take into
account an error margin.
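
One way such an error margin could be expressed is a relative tolerance in parts per million. The sketch below is an assumption for illustration only: the thread does not say how the margin will be defined, and within_ppm() is a hypothetical helper, not part of the series.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True if actual is within ppm parts-per-million of expected.
 * Assumes the values are small enough that the multiplications
 * below do not overflow 64 bits.
 */
static bool within_ppm(int64_t expected, int64_t actual, int64_t ppm)
{
	int64_t diff = expected > actual ? expected - actual
					 : actual - expected;
	int64_t mag = expected < 0 ? -expected : expected;

	/* diff / |expected| <= ppm / 1e6, rearranged to avoid division. */
	return diff * 1000000 <= mag * ppm;
}
```

For instance, within_ppm(1000000, 1000001, 10) holds because the values differ by 1 ppm, while within_ppm(1000000, 1000100, 10) does not because they differ by 100 ppm.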

>
> This is also an issue I have with patch 7/13, but there the only thing that is
> sacrificed is CPU cycles. But nonetheless, I'm dubious if patch 7/13 is wise
> precisely because it might cause issues that are intermittent and therefore
> difficult to debug.

Again, I agree with you: patch 7/13 has the same limitations.
Unfortunately, I did run into an overflow while testing this on a real
setup.

>
> Also, changing the calculation so that you get more precision whenever that is
> possible feels dangerous. I fear linearity breaks and that bigger inputs cause
> smaller outputs due to rounding if the bigger value has to be rounded down, but
> that this isn't done carefully enough. I.e. attempting to return an exact
> fraction and only falling back to the old code when that is not possible is
> still not safe since the old code isn't careful enough about rounding. I think
> it is really important that bigger inputs cause bigger (or equal) outputs.
> Otherwise you might trigger instability in feedback loops should a rescaler be
> involved in some regulator function.

I see what you mean here, and it's a good point I hadn't considered.

To address some of these concerns, I was thinking of using consecutive
right shifts instead of gcd(), but that seems like the wrong way to go
given that we're working with signed integers.
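
To make the shift idea concrete, here is a userspace sketch that sidesteps the signedness problem by shifting magnitudes and restoring the sign afterwards. It is an illustration under assumptions, not a proposal for the driver: shrink_to_s32() and struct frac32 are hypothetical names, and the caller is assumed to handle a denominator that has been shifted down to 0.

```c
#include <stdint.h>

struct frac32 {
	int32_t num;
	int32_t den;
};

/*
 * Halve both terms of num/den until each magnitude fits in an int32_t.
 * Shifts are done on unsigned magnitudes so that no signed value is
 * ever right-shifted; the sign of the fraction is kept on the numerator.
 */
static struct frac32 shrink_to_s32(int64_t num, int64_t den)
{
	struct frac32 out;
	int negative = (num < 0) != (den < 0);
	uint64_t n = num < 0 ? 0 - (uint64_t)num : (uint64_t)num;
	uint64_t d = den < 0 ? 0 - (uint64_t)den : (uint64_t)den;

	while (n > INT32_MAX || d > INT32_MAX) {
		n >>= 1;
		d >>= 1;	/* may reach 0; caller must check */
	}

	out.num = negative ? -(int32_t)n : (int32_t)n;
	out.den = (int32_t)d;
	return out;
}
```

For the fraction from the example earlier in the thread, 1,379,273,676,757,812,500 / 4,649,045,868, this takes 30 shifts and leaves 1,284,548,711 / 4. That is roughly 321 million against an exact value near 296.7 million, so plain shifting loses a lot of precision when the denominator is small compared with the numerator.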

For 7/13, I'll look into approximating like you did here originally.

Thanks,
Liam

>
> Cheers,
> Peter
>
> > +		}
> >  		*val = tmp;
> > -		return scale_type;
> > +		*val2 = tmp2;
> > +		return IIO_VAL_FRACTIONAL;
> >  	case IIO_VAL_INT_PLUS_NANO:
> >  	case IIO_VAL_INT_PLUS_MICRO:
> >  		if (scale_type == IIO_VAL_INT_PLUS_NANO)
> >

Patch

diff --git a/drivers/iio/afe/iio-rescale.c b/drivers/iio/afe/iio-rescale.c
index abd7ad73d1ce..e37a9766080c 100644
--- a/drivers/iio/afe/iio-rescale.c
+++ b/drivers/iio/afe/iio-rescale.c
@@ -47,12 +47,17 @@  int rescale_process_scale(struct rescale *rescale, int scale_type,
 		*val2 = rescale->denominator;
 		return IIO_VAL_FRACTIONAL;
 	case IIO_VAL_FRACTIONAL_LOG2:
-		tmp = *val * 1000000000LL;
-		do_div(tmp, rescale->denominator);
-		tmp *= rescale->numerator;
-		do_div(tmp, 1000000000LL);
+		if (check_mul_overflow(*val, rescale->numerator, (s32 *)&tmp) ||
+		    check_mul_overflow(rescale->denominator, (1 << *val2), (s32 *)&tmp2)) {
+			tmp = (s64)*val * rescale->numerator;
+			tmp2 = (s64)rescale->denominator * (1 << *val2);
+			factor = gcd(abs(tmp), abs(tmp2));
+			tmp = div_s64(tmp, factor);
+			tmp2 = div_s64(tmp2, factor);
+		}
 		*val = tmp;
-		return scale_type;
+		*val2 = tmp2;
+		return IIO_VAL_FRACTIONAL;
 	case IIO_VAL_INT_PLUS_NANO:
 	case IIO_VAL_INT_PLUS_MICRO:
 		if (scale_type == IIO_VAL_INT_PLUS_NANO)