Message ID | 20171220142001.18161-1-cmo@melexis.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
From: Crt Mori > Sent: 20 December 2017 14:20 > > There is no option to perform 64bit integer sqrt on 32bit platform. > Added stronger typed int_sqrt64 enables the 64bit calculations to > be performed on 32bit platforms. Although int_sqrt() is a rough > approximation, the same algorithm is used in int_sqrt64() as good > enough on 32bit platform. The algorithm used gives an exact (rounded down) square root, not an approximation. With minor changes it ought to be possible to remove most of the 64bit arithmetic and shifts. If you care about performance then using 32 bit maths will be much faster. David -- To unsubscribe from this list: send the line "unsubscribe linux-iio" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 20 December 2017 at 15:39, David Laight <David.Laight@aculab.com> wrote: > From: Crt Mori >> Sent: 20 December 2017 14:20 >> >> There is no option to perform 64bit integer sqrt on 32bit platform. >> Added stronger typed int_sqrt64 enables the 64bit calculations to >> be performed on 32bit platforms. Although int_sqrt() is a rough >> approximation, the same algorithm is used in int_sqrt64() as good >> enough on 32bit platform. > > The algorithm used gives an exact (rounded down) square root, > not an approximation. > OK, noted. This is from new changed function indeed - will change in v11. > With minor changes it ought to be possible to remove most of the > 64bit arithmetic and shifts. > > If you care about performance then using 32 bit maths will be much faster. > I need precision not the performance. That is why I would like to leave int_sqrt as one for good performance while my will be used when you require a precision. > David > -- To unsubscribe from this list: send the line "unsubscribe linux-iio" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Dec 20, 2017 at 02:39:26PM +0000, David Laight wrote: > With minor changes it ought to be possible to remove most of the > 64bit arithmetic and shifts. > > If you care about performance then using 32 bit maths will be much faster. Some, u64 add/sub/shift isn't exactly expensive, but yes, I also indicated that improvement is possible. At the very least y can be made a u32 I suppose. -- To unsubscribe from this list: send the line "unsubscribe linux-iio" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 20 December 2017 at 17:00, Peter Zijlstra <peterz@infradead.org> wrote: > On Wed, Dec 20, 2017 at 02:39:26PM +0000, David Laight wrote: > >> With minor changes it ought to be possible to remove most of the >> 64bit arithmetic and shifts. >> >> If you care about performance then using 32 bit maths will be much faster. > > Some, u64 add/sub/shift isn't exactly expensive, but yes, I also > indicated that improvement is possible. At the very least y can be made > a u32 I suppose. OK, is there any more easy optimizations you see? -- To unsubscribe from this list: send the line "unsubscribe linux-iio" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Crt Mori > Sent: 20 December 2017 16:17 > > On 20 December 2017 at 17:00, Peter Zijlstra <peterz@infradead.org> wrote: > > On Wed, Dec 20, 2017 at 02:39:26PM +0000, David Laight wrote: > > > >> With minor changes it ought to be possible to remove most of the > >> 64bit arithmetic and shifts. > >> > >> If you care about performance then using 32 bit maths will be much faster. > > > > Some, u64 add/sub/shift isn't exactly expensive, but yes, I also > > indicated that improvement is possible. At the very least y can be made > > a u32 I suppose. > > OK, is there any more easy optimizations you see? I think this version works. It doesn't have the optimisation for small values. unsigned int sqrt64(unsigned long long x) { unsigned int x_hi = x >> 32; unsigned int b = 0; unsigned int y = 0; unsigned int i; for (i = 0; i < 32; i++) { b <<= 2; b |= x_hi >> 30; x_hi <<= 2; if (i == 15) x_hi = x; y <<= 1; if (b > y) b -= ++y; } return y; } Put it through cc -O3 -m32 -c -o sqrt64.o sqrt64.c and then objdump sqrt64.o and compare to that of your version. David
On Wed, 2017-12-20 at 16:46 +0000, David Laight wrote: > From: Crt Mori > > Sent: 20 December 2017 16:17 > > > > On 20 December 2017 at 17:00, Peter Zijlstra <peterz@infradead.org> wrote: > > > On Wed, Dec 20, 2017 at 02:39:26PM +0000, David Laight wrote: > > > > > > > With minor changes it ought to be possible to remove most of the > > > > 64bit arithmetic and shifts. > > > > > > > > If you care about performance then using 32 bit maths will be much faster. > > > > > > Some, u64 add/sub/shift isn't exactly expensive, but yes, I also > > > indicated that improvement is possible. At the very least y can be made > > > a u32 I suppose. > > > > OK, is there any more easy optimizations you see? > > I think this version works. > It doesn't have the optimisation for small values. > > unsigned int sqrt64(unsigned long long x) > { > unsigned int x_hi = x >> 32; > > unsigned int b = 0; > unsigned int y = 0; > unsigned int i; > Perhaps add: if (x <= UINT_MAX) return int_sqrt((unsigned long)x); > for (i = 0; i < 32; i++) { > b <<= 2; > b |= x_hi >> 30; > x_hi <<= 2; > if (i == 15) > x_hi = x; > y <<= 1; > if (b > y) > b -= ++y; > } > return y; > } > > Put it through cc -O3 -m32 -c -o sqrt64.o sqrt64.c and then objdump sqrt64.o > and compare to that of your version. > > David > -- To unsubscribe from this list: send the line "unsubscribe linux-iio" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 20 December 2017 at 17:46, David Laight <David.Laight@aculab.com> wrote: > From: Crt Mori >> Sent: 20 December 2017 16:17 >> >> On 20 December 2017 at 17:00, Peter Zijlstra <peterz@infradead.org> wrote: >> > On Wed, Dec 20, 2017 at 02:39:26PM +0000, David Laight wrote: >> > >> >> With minor changes it ought to be possible to remove most of the >> >> 64bit arithmetic and shifts. >> >> >> >> If you care about performance then using 32 bit maths will be much faster. >> > >> > Some, u64 add/sub/shift isn't exactly expensive, but yes, I also >> > indicated that improvement is possible. At the very least y can be made >> > a u32 I suppose. >> >> OK, is there any more easy optimizations you see? > > I think this version works. > It doesn't have the optimisation for small values. > > unsigned int sqrt64(unsigned long long x) > { > unsigned int x_hi = x >> 32; > > unsigned int b = 0; > unsigned int y = 0; > unsigned int i; > > for (i = 0; i < 32; i++) { > b <<= 2; > b |= x_hi >> 30; > x_hi <<= 2; > if (i == 15) > x_hi = x; > y <<= 1; > if (b > y) > b -= ++y; > } > return y; > } > > Put it through cc -O3 -m32 -c -o sqrt64.o sqrt64.c and then objdump sqrt64.o > and compare to that of your version. > > David > Wow, thanks. This seems like a major change. I did a quick run through unit tests for the sensor and the results are way off. On the sensor I had to convert double calculations to integer calculations and target was to get end result within 0.02 degC (with previous approximate sqrt implementation) in sensor working range. This now gets into 3 degC delta at least and some are way off. It might be off because of some scaling on the other hand during the equation (not exactly comparing sqrt implementations here). I will need a bit more testing of this to see if it is replaceable. -- To unsubscribe from this list: send the line "unsubscribe linux-iio" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Joe Perches > Sent: 20 December 2017 17:20 ... > > I think this version works. > > It doesn't have the optimisation for small values. > > > > unsigned int sqrt64(unsigned long long x) > > { > > unsigned int x_hi = x >> 32; > > > > unsigned int b = 0; > > unsigned int y = 0; > > unsigned int i; > > > > Perhaps add: > > if (x <= UINT_MAX) > return int_sqrt((unsigned long)x); Actually something like: i = 32; if (!x_hi) { x_hi = x; i = 16; } if (!(x_hi & 0xffff0000)) { x_hi <<= 16; i -= 8; } Repeat for 0xff000000, 0xf0000000 and 0xc0000000 and adjust loop to count down. David > > > for (i = 0; i < 32; i++) { > > b <<= 2; > > b |= x_hi >> 30; > > x_hi <<= 2; > > if (i == 15) > > x_hi = x; > > y <<= 1; > > if (b > y) > > b -= ++y; > > } > > return y; > > } > > > > Put it through cc -O3 -m32 -c -o sqrt64.o sqrt64.c and then objdump sqrt64.o > > and compare to that of your version. > > > > David > > -- To unsubscribe from this list: send the line "unsubscribe linux-iio" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RnJvbTogQ3J0IE1vcmkNCj4gU2VudDogMjAgRGVjZW1iZXIgMjAxNyAxNzozMA0KPiBUbzogRGF2 aWQgTGFpZ2h0DQouLi4NCj4gPiBJIHRoaW5rIHRoaXMgdmVyc2lvbiB3b3Jrcy4NCj4gPiBJdCBk b2Vzbid0IGhhdmUgdGhlIG9wdGltaXNhdGlvbiBmb3Igc21hbGwgdmFsdWVzLg0KPiA+DQo+ID4g dW5zaWduZWQgaW50IHNxcnQ2NCh1bnNpZ25lZCBsb25nIGxvbmcgeCkNCj4gPiB7DQo+ID4gICAg ICAgICB1bnNpZ25lZCBpbnQgeF9oaSA9IHggPj4gMzI7DQo+ID4NCj4gPiAgICAgICAgIHVuc2ln bmVkIGludCBiID0gMDsNCj4gPiAgICAgICAgIHVuc2lnbmVkIGludCB5ID0gMDsNCj4gPiAgICAg ICAgIHVuc2lnbmVkIGludCBpOw0KPiA+DQo+ID4gICAgICAgICBmb3IgKGkgPSAwOyBpIDwgMzI7 IGkrKykgew0KPiA+ICAgICAgICAgICAgICAgICBiIDw8PSAyOw0KPiA+ICAgICAgICAgICAgICAg ICBiIHw9IHhfaGkgPj4gMzA7DQo+ID4gICAgICAgICAgICAgICAgIHhfaGkgPDw9IDI7DQo+ID4g ICAgICAgICAgICAgICAgIGlmIChpID09IDE1KQ0KPiA+ICAgICAgICAgICAgICAgICAgICAgICAg IHhfaGkgPSB4Ow0KPiA+ICAgICAgICAgICAgICAgICB5IDw8PSAxOw0KPiA+ICAgICAgICAgICAg ICAgICBpZiAoYiA+IHkpDQo+ID4gICAgICAgICAgICAgICAgICAgICAgICAgYiAtPSArK3k7DQo+ ID4gICAgICAgICB9DQo+ID4gICAgICAgICByZXR1cm4geTsNCj4gPiB9DQo+IA0KPiBXb3csIHRo YW5rcy4gVGhpcyBzZWVtcyBsaWtlIGEgbWFqb3IgY2hhbmdlLg0KDQpOb3QgcmVhbGx5LCBpdCBp cyBqdXN0IGRvaW5nIGl0IHNsaWdodGx5IGRpZmZlcmVudGx5Lg0KVGhlIGFsZ29yaXRobSBpcyB0 aGUgJ3NjaG9vbGJveScgb25lIHRoYXQgdXNlZCB0byBiZSB0YXVnaHQgYXQgc2Nob29sLg0KKEFs dGhvdWdoIEkgZm91bmQgaXQgaW4gYSBib290IGZyb20gdGhlIDE5MzBzLikNCg0KSSBjb21wYXJl ZCB0aGUgb2JqZWN0IGNvZGUgZm9yIGFtZDY0IChkb2Vzbid0IG5lZWQgdG8gcmVsb2FkDQoneF9o aScgaGFsZiB3YXkgdGhyb3VnaCkgYWdhaW5zdCB0aGUgb3JpZ2luYWwgbG9vcC4NClRoZXkgYXJl IG11Y2ggdGhlIHNhbWUgc2l6ZSwgZXhlY3V0aW9uIHNwZWVkIHdpbGwgZGVwZW5kIHRoZSBsZW5n dGhzDQpvZiB0aGUgZGVwZW5kZW5jeSBjaGFpbnMuDQoNCj4gSSBkaWQgYSBxdWljayBydW4gdGhy b3VnaCB1bml0IHRlc3RzIGZvciB0aGUgc2Vuc29yIGFuZCB0aGUgcmVzdWx0cw0KPiBhcmUgd2F5 IG9mZi4gT24gdGhlIHNlbnNvciBJIGhhZCB0byBjb252ZXJ0IGRvdWJsZSBjYWxjdWxhdGlvbnMg dG8NCj4gaW50ZWdlciBjYWxjdWxhdGlvbnMgYW5kIHRhcmdldCB3YXMgdG8gZ2V0IGVuZCByZXN1 bHQgd2l0aGluIDAuMDIgZGVnQw0KPiAod2l0aCBwcmV2aW91cyBhcHByb3hpbWF0ZSBzcXJ0IGlt cGxlbWVudGF0aW9uKSBpbiBzZW5zb3Igd29ya2luZw0KPiByYW5nZS4gVGhpcyBub3cgZ2V0cyBp bnRvIDMgZGVnQyBkZWx0YSBhdCBsZWFzdCBhbmQgc29tZSBhcmUgd2F5IG9mZi4NCj4gSXQgbWln aHQgYmUgb2ZmIGJlY2F1c2Ugb2Ygc29tZSBzY2FsaW5nIG9uIHRoZSBvdGhlciBoYW5kIGR1cmlu ZyB0aGUNCj4gZXF1YXRpb24gKG5vdCBleGFjdGx5IGNvbXBhcmluZyBzcXJ0IGltcGxlbWVudGF0 aW9ucyBoZXJlKS4NCj4gDQo+IEkgd2lsbCBuZWVkIGEgYml0IG1vcmUgdGVzdGluZyBvZiB0aGlz IHRvIHNlZSBpZiBpdCBpcyByZXBsYWNlYWJsZS4NCg0KWW91IHByb2JhYmx5IG5lZWQgdG8gcHV0 IHZhbHVlcyB0aHJvdWdoIGJvdGggKGFsbCAzKSBmdW5jdGlvbnMgdG8gc2VlDQp3aGVyZSB5b3Ug YXJlIGdldHRpbmcgZXJyb3JzLg0KSXQgbWlnaHQgYmUgcm91bmRpbmcsIG9yIHRoZXJlIG1pZ2h0 IGJlIGEgZnViYXIgc29tZXdoZXJlLg0KDQoJRGF2aWQNCg0K -- To unsubscribe from this list: send the line "unsubscribe linux-iio" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Crt Mori > Sent: 20 December 2017 17:30 > >> OK, is there any more easy optimizations you see? > > > > I think this version works. > > It doesn't have the optimisation for small values. > > > > unsigned int sqrt64(unsigned long long x) > > { > > unsigned int x_hi = x >> 32; > > > > unsigned int b = 0; > > unsigned int y = 0; > > unsigned int i; > > > > for (i = 0; i < 32; i++) { > > b <<= 2; > > b |= x_hi >> 30; > > x_hi <<= 2; > > if (i == 15) > > x_hi = x; > > y <<= 1; > > if (b > y) > > b -= ++y; > > } > > return y; > > } .. > > I did a quick run through unit tests for the sensor and the results > are way off. On the sensor I had to convert double calculations to > integer calculations and target was to get end result within 0.02 degC > (with previous approximate sqrt implementation) in sensor working > range. This now gets into 3 degC delta at least and some are way off. > It might be off because of some scaling on the other hand during the > equation (not exactly comparing sqrt implementations here). I didn't get it quite right... The last few lines need to be: if (b > y) { b -= ++y; y++; } } return y >> 1; } Although that then fails for inputs larger than 2^62. David
From: Crt Mori > Sent: 20 December 2017 17:30 > I did a quick run through unit tests for the sensor and the results > are way off > ... Try this version instead: unsigned int sqrt64(unsigned long long x_in) { unsigned int x = x_in >> 32; unsigned int b = 0; unsigned int y = 0; unsigned int i; i = 31; if (!x) { x = x_in; i = 15; } if (!(x & 0xffff0000)) { x <<= 16; i -= 8; } if (!(x & 0xff000000)) { x <<= 8; i -= 4; } if (!(x & 0xf0000000)) { x <<= 4; i -= 2; } do { b <<= 2; b |= x >> 30; x <<= 2; if (i == 16) x = x_in; y <<= 1; if (b > y) { b -= ++y; y++; } } while (--i); /* 'b' becomes 33 bits if the input is greater than 2^62 */ b <<= 1; b |= x >> 31; if (b > y || (b == y && x & (1u << 30))) y |= 1; return y; } I've tested that one with more values. David
On 21 December 2017 at 12:43, David Laight <David.Laight@aculab.com> wrote: > From: Crt Mori >> Sent: 20 December 2017 17:30 >> I did a quick run through unit tests for the sensor and the results >> are way off >> ... > > Try this version instead: > unsigned int sqrt64(unsigned long long x_in) > { > unsigned int x = x_in >> 32; > > unsigned int b = 0; > unsigned int y = 0; > unsigned int i; i can be u8. And I will still use explicit typing. > > i = 31; > if (!x) { > x = x_in; > i = 15; > } > if (!(x & 0xffff0000)) { > x <<= 16; > i -= 8; > } > if (!(x & 0xff000000)) { > x <<= 8; > i -= 4; > } > if (!(x & 0xf0000000)) { > x <<= 4; > i -= 2; > } > This part above looks like FLS > do { > b <<= 2; > b |= x >> 30; > x <<= 2; > if (i == 16) > x = x_in; > y <<= 1; > if (b > y) { > b -= ++y; > y++; > } > } while (--i); > > /* 'b' becomes 33 bits if the input is greater than 2^62 */ > b <<= 1; > b |= x >> 31; > if (b > y || (b == y && x & (1u << 30))) > y |= 1; > > return y; > } > > I've tested that one with more values. > > David > This one indeed works. I did some more testing this morning and I am fine with either. So question is: Do I make change as per David's suggestion with his sign-off, or leave the version it was before the change? -- To unsubscribe from this list: send the line "unsubscribe linux-iio" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RnJvbTogQ3J0IE1vcmkNCj4gU2VudDogMjEgRGVjZW1iZXIgMjAxNyAxMzoxOA0KLi4uDQo+ID4g ICAgICAgICB1bnNpZ25lZCBpbnQgaTsNCj4gDQo+IGkgY2FuIGJlIHU4LiBBbmQgSSB3aWxsIHN0 aWxsIHVzZSBleHBsaWNpdCB0eXBpbmcuDQoNCnU4IHdpbGwgYWRkIGV4dHJhIGNvZGUsIHVuc2ln bmVkIGludCBpcyBnb29kLg0KJ3gnIG5lZWRzIHRvIGJlIHUzMiwgJ3knIGFuZCAnYicgY291bGQg YmUgbGFyZ2VyLg0KDQpJIHdhcyB0ZXN0aW5nIGluIHVzZXJzcGFjZS4NCg0KLi4uDQo+IFRoaXMg cGFydCBhYm92ZSBsb29rcyBsaWtlIEZMUw0KSXQgYWxzbyBkb2VzIHRoZSByZXN0IG9mIHRoZSBy ZXF1aXJlZCBzaGlmdHMuDQoNCi4uLg0KPiBUaGlzIG9uZSBpbmRlZWQgd29ya3MuIEkgZGlkIHNv bWUgbW9yZSB0ZXN0aW5nIHRoaXMgbW9ybmluZyBhbmQgSSBhbQ0KPiBmaW5lIHdpdGggZWl0aGVy Lg0KPiANCj4gU28gcXVlc3Rpb24gaXM6IERvIEkgbWFrZSBjaGFuZ2UgYXMgcGVyIERhdmlkJ3Mg c3VnZ2VzdGlvbiB3aXRoIGhpcw0KPiBzaWduLW9mZiwgb3IgbGVhdmUgdGhlIHZlcnNpb24gaXQg d2FzIGJlZm9yZSB0aGUgY2hhbmdlPw0KDQpJZiB5b3UgZ2VuZXJhdGUgdGhlIGFjdHVhbCBwYXRj aCBJIGNhbiBhZGQgYW4gYXBwcm9wcmlhdGUgc2lnbi1vZmYNCihvciB3aGF0ZXZlciBpcyBuZWVk ZWQpLg0KDQoJRGF2aWQNCg0K -- To unsubscribe from this list: send the line "unsubscribe linux-iio" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Dec 21, 2017 at 01:56:55PM +0000, David Laight wrote: > > This part above looks like FLS > It also does the rest of the required shifts. Still, fls() + shift is way faster on hardware that has an fls instruction. Writing out that binary search doesn't make sense. -- To unsubscribe from this list: send the line "unsubscribe linux-iio" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Peter Zijlstra > Sent: 21 December 2017 14:12 ... > > > This part above looks like FLS > > It also does the rest of the required shifts. > > Still, fls() + shift is way faster on hardware that has an fls > instruction. > > Writing out that binary search doesn't make sense. If the hardware doesn't have an appropriate fls instruction the soft fls()will be worse. If you used fls() you'd still need quite a bit of code to generate the correct shift and loop count adjustment. Given the cost of the loop iterations the 3 tests are noise. The open coded version is obviously correct... I didn't add the 4th one because the code always does 2 iterations. If you were really worried about performance there are faster algorithms (even doing 2 or 4 bits a time is faster). David -- To unsubscribe from this list: send the line "unsubscribe linux-iio" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From simple strong typing of existing int_sqrt we came to something a bit more complex or better. Can we decide now which we want in, or I submit v12 and we decide then (although it is not a v12, but whole new thing)? On 21 December 2017 at 15:48, David Laight <David.Laight@aculab.com> wrote: > From: Peter Zijlstra >> Sent: 21 December 2017 14:12 > ... >> > > This part above looks like FLS >> > It also does the rest of the required shifts. >> >> Still, fls() + shift is way faster on hardware that has an fls >> instruction. >> >> Writing out that binary search doesn't make sense. > > If the hardware doesn't have an appropriate fls instruction > the soft fls()will be worse. > > If you used fls() you'd still need quite a bit of code > to generate the correct shift and loop count adjustment. > Given the cost of the loop iterations the 3 tests are noise. > The open coded version is obviously correct... > > I didn't add the 4th one because the code always does 2 iterations. > > If you were really worried about performance there are faster > algorithms (even doing 2 or 4 bits a time is faster). > > David > -- To unsubscribe from this list: send the line "unsubscribe linux-iio" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
It has been some time now since this moved. I have decided not to use David's implementation because I want to maintain also range above 2^62 Are there any additional objections for this not to go in as it is? On 22 December 2017 at 14:44, Crt Mori <cmo@melexis.com> wrote: > > From simple strong typing of existing int_sqrt we came to something a > bit more complex or better. Can we decide now which we want in, or I > submit v12 and we decide then (although it is not a v12, but whole new > thing)? > > On 21 December 2017 at 15:48, David Laight <David.Laight@aculab.com> wrote: > > From: Peter Zijlstra > >> Sent: 21 December 2017 14:12 > > ... > >> > > This part above looks like FLS > >> > It also does the rest of the required shifts. > >> > >> Still, fls() + shift is way faster on hardware that has an fls > >> instruction. > >> > >> Writing out that binary search doesn't make sense. > > > > If the hardware doesn't have an appropriate fls instruction > > the soft fls()will be worse. > > > > If you used fls() you'd still need quite a bit of code > > to generate the correct shift and loop count adjustment. > > Given the cost of the loop iterations the 3 tests are noise. > > The open coded version is obviously correct... > > > > I didn't add the 4th one because the code always does 2 iterations. > > > > If you were really worried about performance there are faster > > algorithms (even doing 2 or 4 bits a time is faster). > > > > David > > -- To unsubscribe from this list: send the line "unsubscribe linux-iio" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RnJvbTogQ3J0IE1vcmkgW21haWx0bzpjbW9AbWVsZXhpcy5jb21dDQo+IFNlbnQ6IDA5IEphbnVh cnkgMjAxOCAxNToxOA0KPiANCj4gSXQgaGFzIGJlZW4gc29tZSB0aW1lIG5vdyBzaW5jZSB0aGlz IG1vdmVkLiBJIGhhdmUgZGVjaWRlZCBub3QgdG8gdXNlDQo+IERhdmlkJ3MgaW1wbGVtZW50YXRp b24gYmVjYXVzZSBJIHdhbnQgdG8gbWFpbnRhaW4gYWxzbyByYW5nZSBhYm92ZQ0KPiAyXjYyDQoN ClRoZSBsYXN0IHZlcnNpb24gSSBkaWQgc3VwcG9ydGVkIHRoZSBmdWxsIHJhbmdlLg0KDQoJRGF2 aWQNCg0K -- To unsubscribe from this list: send the line "unsubscribe linux-iio" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 12 January 2018 at 10:41, David Laight <David.Laight@aculab.com> wrote: > From: Crt Mori [mailto:cmo@melexis.com] >> Sent: 09 January 2018 15:18 >> >> It has been some time now since this moved. I have decided not to use >> David's implementation because I want to maintain also range above >> 2^62 > > The last version I did supported the full range. Nothing changed below that comment, so I was assuming it is not supported (or did I miss a mail?). Also fls discussion had opposite opinions or it just seems inconclusive to me? > > David > So you all agree David Laight version is much better and should be used instead of currently proposed version? -- To unsubscribe from this list: send the line "unsubscribe linux-iio" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/include/linux/kernel.h b/include/linux/kernel.h index 0ad4c3044cf9..e4a3dc64e650 100644 --- a/include/linux/kernel.h +++ b/include/linux/kernel.h @@ -460,6 +460,15 @@ extern int func_ptr_is_kernel_text(void *ptr); unsigned long int_sqrt(unsigned long); +#if BITS_PER_LONG < 64 +u32 int_sqrt64(u64 x); +#else +static inline u32 int_sqrt64(u64 x) +{ + return (u32)int_sqrt(x); +} +#endif + extern void bust_spinlocks(int yes); extern int oops_in_progress; /* If set, an oops, panic(), BUG() or die() is in progress */ extern int panic_timeout; diff --git a/lib/int_sqrt.c b/lib/int_sqrt.c index 1ef4cc344977..184da54677ec 100644 --- a/lib/int_sqrt.c +++ b/lib/int_sqrt.c @@ -7,6 +7,7 @@ #include <linux/kernel.h> #include <linux/export.h> +#include <linux/bitops.h> /** * int_sqrt - rough approximation to sqrt @@ -36,3 +37,33 @@ unsigned long int_sqrt(unsigned long x) return y; } EXPORT_SYMBOL(int_sqrt); + +#if BITS_PER_LONG < 64 +/** + * int_sqrt64 - strongly typed int_sqrt function when minimum 64 bit input + * is expected. + * @x: 64bit integer of which to calculate the sqrt + */ +u32 int_sqrt64(u64 x) +{ + u64 b, m, y = 0; + + if (x <= 1) + return x; + + m = 1ULL << (fls64(x) & ~1ULL); + while (m != 0) { + b = y + m; + y >>= 1; + + if (x >= b) { + x -= b; + y += m; + } + m >>= 2; + } + + return y; +} +EXPORT_SYMBOL(int_sqrt64); +#endif
There is no option to perform 64bit integer sqrt on 32bit platform. Added stronger typed int_sqrt64 enables the 64bit calculations to be performed on 32bit platforms. Although int_sqrt() is a rough approximation, the same algorithm is used in int_sqrt64() as good enough on 32bit platform. Signed-off-by: Crt Mori <cmo@melexis.com> --- include/linux/kernel.h | 9 +++++++++ lib/int_sqrt.c | 31 +++++++++++++++++++++++++++++++ 2 files changed, 40 insertions(+)