diff mbox series

[v2,07/13] x86/bitops: Improve arch_ffs() in the general case

Message ID 20240524200338.1232391-8-andrew.cooper3@citrix.com (mailing list archive)
State New, archived
Headers show
Series xen/bitops: Untangle ffs()/fls() infrastructure | expand

Commit Message

Andrew Cooper May 24, 2024, 8:03 p.m. UTC
The asm in arch_ffs() is safe but inefficient.

CMOV would be an improvement over a conditional branch, but for 64bit CPUs
both Intel and AMD have provided enough details about the behaviour for a zero
input.  It is safe to pre-load the destination register with -1 and drop the
conditional logic.

However, it is common to find ffs() in a context where the optimiser knows
that x in nonzero even if it the value isn't known precisely, and in that case
it's safe to drop the preload of -1 too.

There are only a handful of uses of ffs() in the x86 build, and all of them
improve as a result of this:

  add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-31 (-31)
  Function                                     old     new   delta
  mask_write                                   114     107      -7
  xmem_pool_alloc                             1063    1039     -24

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Julien Grall <julien@xen.org>
CC: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
CC: Bertrand Marquis <bertrand.marquis@arm.com>
CC: Michal Orzel <michal.orzel@amd.com>
CC: Oleksii Kurochko <oleksii.kurochko@gmail.com>
CC: Shawn Anastasio <sanastasio@raptorengineering.com>
CC: consulting@bugseng.com <consulting@bugseng.com>
CC: Simone Ballarin <simone.ballarin@bugseng.com>
CC: Federico Serafini <federico.serafini@bugseng.com>
CC: Nicola Vetrini <nicola.vetrini@bugseng.com>

v2:
 * New.
 * Use __builtin_constant_p(x > 0) to optimise better.
---
 xen/arch/x86/include/asm/bitops.h | 26 +++++++++++++++++++++-----
 1 file changed, 21 insertions(+), 5 deletions(-)

Comments

Jan Beulich May 27, 2024, 12:40 p.m. UTC | #1
On 24.05.2024 22:03, Andrew Cooper wrote:
> The asm in arch_ffs() is safe but inefficient.
> 
> CMOV would be an improvement over a conditional branch, but for 64bit CPUs
> both Intel and AMD have provided enough details about the behaviour for a zero
> input.  It is safe to pre-load the destination register with -1 and drop the
> conditional logic.
> 
> However, it is common to find ffs() in a context where the optimiser knows
> that x in nonzero even if it the value isn't known precisely, and in that case
> it's safe to drop the preload of -1 too.
> 
> There are only a handful of uses of ffs() in the x86 build, and all of them
> improve as a result of this:
> 
>   add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-31 (-31)
>   Function                                     old     new   delta
>   mask_write                                   114     107      -7
>   xmem_pool_alloc                             1063    1039     -24
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>
with one suggestion:

> --- a/xen/arch/x86/include/asm/bitops.h
> +++ b/xen/arch/x86/include/asm/bitops.h
> @@ -432,12 +432,28 @@ static inline int ffsl(unsigned long x)
>  
>  static always_inline unsigned int arch_ffs(unsigned int x)
>  {
> -    int r;
> +    unsigned int r;
> +
> +    if ( __builtin_constant_p(x > 0) && x > 0 )

__builtin_constant_p(x) surely will do. In fact even the other "> 0" could
in principle be left out here.

Jan
Jan Beulich May 27, 2024, 1:27 p.m. UTC | #2
On 24.05.2024 22:03, Andrew Cooper wrote:
> --- a/xen/arch/x86/include/asm/bitops.h
> +++ b/xen/arch/x86/include/asm/bitops.h
> @@ -432,12 +432,28 @@ static inline int ffsl(unsigned long x)
>  
>  static always_inline unsigned int arch_ffs(unsigned int x)
>  {
> -    int r;
> +    unsigned int r;
> +
> +    if ( __builtin_constant_p(x > 0) && x > 0 )
> +    {
> +        /* Safe, when the compiler knows that x is nonzero. */
> +        asm ( "bsf %[val], %[res]"
> +              : [res] "=r" (r)
> +              : [val] "rm" (x) );
> +    }

In patch 11 relevant things are all in a single patch, making it easier
to spot that this is dead code: The sole caller already has a
__builtin_constant_p(), hence I don't see how the one here could ever
return true. With that the respective part of the description is then
questionable, too, I'm afraid: Where did you observe any actual effect
from this? Or if you did - what am I missing?

Jan
Jan Beulich May 27, 2024, 1:37 p.m. UTC | #3
On 27.05.2024 15:27, Jan Beulich wrote:
> On 24.05.2024 22:03, Andrew Cooper wrote:
>> --- a/xen/arch/x86/include/asm/bitops.h
>> +++ b/xen/arch/x86/include/asm/bitops.h
>> @@ -432,12 +432,28 @@ static inline int ffsl(unsigned long x)
>>  
>>  static always_inline unsigned int arch_ffs(unsigned int x)
>>  {
>> -    int r;
>> +    unsigned int r;
>> +
>> +    if ( __builtin_constant_p(x > 0) && x > 0 )
>> +    {
>> +        /* Safe, when the compiler knows that x is nonzero. */
>> +        asm ( "bsf %[val], %[res]"
>> +              : [res] "=r" (r)
>> +              : [val] "rm" (x) );
>> +    }
> 
> In patch 11 relevant things are all in a single patch, making it easier
> to spot that this is dead code: The sole caller already has a
> __builtin_constant_p(), hence I don't see how the one here could ever
> return true. With that the respective part of the description is then
> questionable, too, I'm afraid: Where did you observe any actual effect
> from this? Or if you did - what am I missing?

Hmm, thinking about it: I suppose that's why you have
__builtin_constant_p(x > 0), not __builtin_constant_p(x). I have to admit
I'm (positively) surprised that the former may return true when the latter
doesn't. Nevertheless I'm inclined to think this deserves a brief comment.

As an aside, to better match the comment inside the if()'s body, how about

    if ( __builtin_constant_p(!!x) && x )

? That also may make a little more clear that this isn't just a style
choice, but actually needed for the intended purpose.

Jan
Andrew Cooper May 28, 2024, 12:30 p.m. UTC | #4
On 27/05/2024 2:37 pm, Jan Beulich wrote:
> On 27.05.2024 15:27, Jan Beulich wrote:
>> On 24.05.2024 22:03, Andrew Cooper wrote:
>>> --- a/xen/arch/x86/include/asm/bitops.h
>>> +++ b/xen/arch/x86/include/asm/bitops.h
>>> @@ -432,12 +432,28 @@ static inline int ffsl(unsigned long x)
>>>  
>>>  static always_inline unsigned int arch_ffs(unsigned int x)
>>>  {
>>> -    int r;
>>> +    unsigned int r;
>>> +
>>> +    if ( __builtin_constant_p(x > 0) && x > 0 )
>>> +    {
>>> +        /* Safe, when the compiler knows that x is nonzero. */
>>> +        asm ( "bsf %[val], %[res]"
>>> +              : [res] "=r" (r)
>>> +              : [val] "rm" (x) );
>>> +    }
>> In patch 11 relevant things are all in a single patch, making it easier
>> to spot that this is dead code: The sole caller already has a
>> __builtin_constant_p(), hence I don't see how the one here could ever
>> return true. With that the respective part of the description is then
>> questionable, too, I'm afraid: Where did you observe any actual effect
>> from this? Or if you did - what am I missing?
> Hmm, thinking about it: I suppose that's why you have
> __builtin_constant_p(x > 0), not __builtin_constant_p(x). I have to admit
> I'm (positively) surprised that the former may return true when the latter
> doesn't.

So was I, but this recommendation came straight from the GCC mailing
list.  And it really does work, even back in obsolete versions of GCC.

__builtin_constant_p() operates on an expression not a value, and is
documented as such.


>  Nevertheless I'm inclined to think this deserves a brief comment.

There is a comment, and it's even visible in the snippet.

> As an aside, to better match the comment inside the if()'s body, how about
>
>     if ( __builtin_constant_p(!!x) && x )
>
> ? That also may make a little more clear that this isn't just a style
> choice, but actually needed for the intended purpose.

I am not changing the logic.

Apart from anything else, your suggestion is trivially buggy.  I care
about whether the RHS collapses to a constant, and the only way of doing
that correctly is asking the compiler about the *exact* expression. 
Asking about some other expression which you hope - but do not know -
that the compiler will treat equivalently is bogus.  It would be
strictly better to only take the else clause, than to have both halves
emitted.

This is the form I've tested extensively.  It's also the clearest form
IMO.  You can experiment with alternative forms when we're not staring
down code freeze of 4.19.

~Andrew
Jan Beulich May 28, 2024, 1:12 p.m. UTC | #5
On 28.05.2024 14:30, Andrew Cooper wrote:
> On 27/05/2024 2:37 pm, Jan Beulich wrote:
>> On 27.05.2024 15:27, Jan Beulich wrote:
>>> On 24.05.2024 22:03, Andrew Cooper wrote:
>>>> --- a/xen/arch/x86/include/asm/bitops.h
>>>> +++ b/xen/arch/x86/include/asm/bitops.h
>>>> @@ -432,12 +432,28 @@ static inline int ffsl(unsigned long x)
>>>>  
>>>>  static always_inline unsigned int arch_ffs(unsigned int x)
>>>>  {
>>>> -    int r;
>>>> +    unsigned int r;
>>>> +
>>>> +    if ( __builtin_constant_p(x > 0) && x > 0 )
>>>> +    {
>>>> +        /* Safe, when the compiler knows that x is nonzero. */
>>>> +        asm ( "bsf %[val], %[res]"
>>>> +              : [res] "=r" (r)
>>>> +              : [val] "rm" (x) );
>>>> +    }
>>> In patch 11 relevant things are all in a single patch, making it easier
>>> to spot that this is dead code: The sole caller already has a
>>> __builtin_constant_p(), hence I don't see how the one here could ever
>>> return true. With that the respective part of the description is then
>>> questionable, too, I'm afraid: Where did you observe any actual effect
>>> from this? Or if you did - what am I missing?
>> Hmm, thinking about it: I suppose that's why you have
>> __builtin_constant_p(x > 0), not __builtin_constant_p(x). I have to admit
>> I'm (positively) surprised that the former may return true when the latter
>> doesn't.
> 
> So was I, but this recommendation came straight from the GCC mailing
> list.  And it really does work, even back in obsolete versions of GCC.
> 
> __builtin_constant_p() operates on an expression not a value, and is
> documented as such.

Of course.

>>  Nevertheless I'm inclined to think this deserves a brief comment.
> 
> There is a comment, and it's even visible in the snippet.

The comment is about the asm(); it is neither placed to clearly relate
to __builtin_constant_p(), nor is it saying anything about this specific
property of it. You said you were equally surprised; don't you think
that when both of us are surprised, a specific (even if brief) comment
is warranted?

>> As an aside, to better match the comment inside the if()'s body, how about
>>
>>     if ( __builtin_constant_p(!!x) && x )
>>
>> ? That also may make a little more clear that this isn't just a style
>> choice, but actually needed for the intended purpose.
> 
> I am not changing the logic.
> 
> Apart from anything else, your suggestion is trivially buggy.  I care
> about whether the RHS collapses to a constant, and the only way of doing
> that correctly is asking the compiler about the *exact* expression. 
> Asking about some other expression which you hope - but do not know -
> that the compiler will treat equivalently is bogus.  It would be
> strictly better to only take the else clause, than to have both halves
> emitted.
> 
> This is the form I've tested extensively.  It's also the clearest form
> IMO.  You can experiment with alternative forms when we're not staring
> down code freeze of 4.19.

"Clearest form" is almost always a matter of taste. To me, comparing
unsigned values with > or < against 0 is generally at least suspicious.
Using != is typically better (again: imo), and simply omitting the != 0
then is shorter with no difference in effect. Except in peculiar cases
like this one, where indeed it took me some time to figure why the
comparison operator may not be omitted.

All that said: I'm not going to insist on any change; the R-b previously
offered still stands. I would highly appreciate though if the (further)
comment asked for could be added.

What I definitely dislike here is you - not for the first time - turning
down remarks because a change of yours is late. This feels even more so
bad when considering that I'm typically replying to your patches with
pretty little turnaround. Whereas various of mine, pending in part for
years, do not seem to deserve any review comments at all. Unlike before,
where it was "only" improvements or feature additions, meanwhile even
bug fixes are left sit like that. If I may be blunt: This may not work
this way for much longer. At some point I will need to artificially
delay reviews, making them dependent on my own work also being allowed
to make progress. I question though whether that would be in everyone's
interest.

Jan
Andrew Cooper June 1, 2024, 1:47 a.m. UTC | #6
On 28/05/2024 2:12 pm, Jan Beulich wrote:
> On 28.05.2024 14:30, Andrew Cooper wrote:
>> On 27/05/2024 2:37 pm, Jan Beulich wrote:
>>> On 27.05.2024 15:27, Jan Beulich wrote:
>>>> On 24.05.2024 22:03, Andrew Cooper wrote:
>>>>> --- a/xen/arch/x86/include/asm/bitops.h
>>>>> +++ b/xen/arch/x86/include/asm/bitops.h
>>>>> @@ -432,12 +432,28 @@ static inline int ffsl(unsigned long x)
>>>>>  
>>>>>  static always_inline unsigned int arch_ffs(unsigned int x)
>>>>>  {
>>>>> -    int r;
>>>>> +    unsigned int r;
>>>>> +
>>>>> +    if ( __builtin_constant_p(x > 0) && x > 0 )
>>>>> +    {
>>>>> +        /* Safe, when the compiler knows that x is nonzero. */
>>>>> +        asm ( "bsf %[val], %[res]"
>>>>> +              : [res] "=r" (r)
>>>>> +              : [val] "rm" (x) );
>>>>> +    }
>>>> In patch 11 relevant things are all in a single patch, making it easier
>>>> to spot that this is dead code: The sole caller already has a
>>>> __builtin_constant_p(), hence I don't see how the one here could ever
>>>> return true. With that the respective part of the description is then
>>>> questionable, too, I'm afraid: Where did you observe any actual effect
>>>> from this? Or if you did - what am I missing?
>>> Hmm, thinking about it: I suppose that's why you have
>>> __builtin_constant_p(x > 0), not __builtin_constant_p(x). I have to admit
>>> I'm (positively) surprised that the former may return true when the latter
>>> doesn't.
>> So was I, but this recommendation came straight from the GCC mailing
>> list.  And it really does work, even back in obsolete versions of GCC.
>>
>> __builtin_constant_p() operates on an expression not a value, and is
>> documented as such.
> Of course.
>
>>>  Nevertheless I'm inclined to think this deserves a brief comment.
>> There is a comment, and it's even visible in the snippet.
> The comment is about the asm(); it is neither placed to clearly relate
> to __builtin_constant_p(), nor is it saying anything about this specific
> property of it. You said you were equally surprised; don't you think
> that when both of us are surprised, a specific (even if brief) comment
> is warranted?

Spell it out for me like I'm an idiot.

Because I'm looking at the patch I submitted, and at your request for "a
brief comment", and I still have no idea what you think is wrong at the
moment.

I'm also not included to write a comment saying "go and read the GCC
manual more carefully".

>
>>> As an aside, to better match the comment inside the if()'s body, how about
>>>
>>>     if ( __builtin_constant_p(!!x) && x )
>>>
>>> ? That also may make a little more clear that this isn't just a style
>>> choice, but actually needed for the intended purpose.
>> I am not changing the logic.
>>
>> Apart from anything else, your suggestion is trivially buggy.  I care
>> about whether the RHS collapses to a constant, and the only way of doing
>> that correctly is asking the compiler about the *exact* expression. 
>> Asking about some other expression which you hope - but do not know -
>> that the compiler will treat equivalently is bogus.  It would be
>> strictly better to only take the else clause, than to have both halves
>> emitted.
>>
>> This is the form I've tested extensively.  It's also the clearest form
>> IMO.  You can experiment with alternative forms when we're not staring
>> down code freeze of 4.19.
> "Clearest form" is almost always a matter of taste. To me, comparing
> unsigned values with > or < against 0 is generally at least suspicious.
> Using != is typically better (again: imo), and simply omitting the != 0
> then is shorter with no difference in effect. Except in peculiar cases
> like this one, where indeed it took me some time to figure why the
> comparison operator may not be omitted.
>
> All that said: I'm not going to insist on any change; the R-b previously
> offered still stands. I would highly appreciate though if the (further)
> comment asked for could be added.
>
> What I definitely dislike here is you - not for the first time - turning
> down remarks because a change of yours is late.

Actually it's not to do with the release.  I'd reject it at any point
because it's an unreasonable request to make; to me, or to anyone else.

It would be a matter of taste (which again you have a singular view on),
if it wasn't for the fact that what you actually said was:

"I don't like it, and you should discard all the careful analysis you
did because here's a form I prefer, that I haven't tested concerning a
behaviour I didn't even realise until this email."

and even if it wasn't a buggy suggestion to begin with, it's still toxic
maintainer feedback.


Frankly, I'd have more time to review other peoples patches if I wasn't
wasting all of my time on premium grade manure like this, while trying
to help Oleksii who's had it far worse this release trying to clean up
droppings of maintainers-past.

~Andrew
Jan Beulich June 3, 2024, 6:24 a.m. UTC | #7
On 01.06.2024 03:47, Andrew Cooper wrote:
> On 28/05/2024 2:12 pm, Jan Beulich wrote:
>> On 28.05.2024 14:30, Andrew Cooper wrote:
>>> On 27/05/2024 2:37 pm, Jan Beulich wrote:
>>>> On 27.05.2024 15:27, Jan Beulich wrote:
>>>>> On 24.05.2024 22:03, Andrew Cooper wrote:
>>>>>> --- a/xen/arch/x86/include/asm/bitops.h
>>>>>> +++ b/xen/arch/x86/include/asm/bitops.h
>>>>>> @@ -432,12 +432,28 @@ static inline int ffsl(unsigned long x)
>>>>>>  
>>>>>>  static always_inline unsigned int arch_ffs(unsigned int x)
>>>>>>  {
>>>>>> -    int r;
>>>>>> +    unsigned int r;
>>>>>> +
>>>>>> +    if ( __builtin_constant_p(x > 0) && x > 0 )
>>>>>> +    {
>>>>>> +        /* Safe, when the compiler knows that x is nonzero. */
>>>>>> +        asm ( "bsf %[val], %[res]"
>>>>>> +              : [res] "=r" (r)
>>>>>> +              : [val] "rm" (x) );
>>>>>> +    }
>>>>> In patch 11 relevant things are all in a single patch, making it easier
>>>>> to spot that this is dead code: The sole caller already has a
>>>>> __builtin_constant_p(), hence I don't see how the one here could ever
>>>>> return true. With that the respective part of the description is then
>>>>> questionable, too, I'm afraid: Where did you observe any actual effect
>>>>> from this? Or if you did - what am I missing?
>>>> Hmm, thinking about it: I suppose that's why you have
>>>> __builtin_constant_p(x > 0), not __builtin_constant_p(x). I have to admit
>>>> I'm (positively) surprised that the former may return true when the latter
>>>> doesn't.
>>> So was I, but this recommendation came straight from the GCC mailing
>>> list.  And it really does work, even back in obsolete versions of GCC.
>>>
>>> __builtin_constant_p() operates on an expression not a value, and is
>>> documented as such.
>> Of course.
>>
>>>>  Nevertheless I'm inclined to think this deserves a brief comment.
>>> There is a comment, and it's even visible in the snippet.
>> The comment is about the asm(); it is neither placed to clearly relate
>> to __builtin_constant_p(), nor is it saying anything about this specific
>> property of it. You said you were equally surprised; don't you think
>> that when both of us are surprised, a specific (even if brief) comment
>> is warranted?
> 
> Spell it out for me like I'm an idiot.
> 
> Because I'm looking at the patch I submitted, and at your request for "a
> brief comment", and I still have no idea what you think is wrong at the
> moment.
> 
> I'm also not included to write a comment saying "go and read the GCC
> manual more carefully".
> 
>>
>>>> As an aside, to better match the comment inside the if()'s body, how about
>>>>
>>>>     if ( __builtin_constant_p(!!x) && x )
>>>>
>>>> ? That also may make a little more clear that this isn't just a style
>>>> choice, but actually needed for the intended purpose.
>>> I am not changing the logic.
>>>
>>> Apart from anything else, your suggestion is trivially buggy.  I care
>>> about whether the RHS collapses to a constant, and the only way of doing
>>> that correctly is asking the compiler about the *exact* expression. 
>>> Asking about some other expression which you hope - but do not know -
>>> that the compiler will treat equivalently is bogus.  It would be
>>> strictly better to only take the else clause, than to have both halves
>>> emitted.
>>>
>>> This is the form I've tested extensively.  It's also the clearest form
>>> IMO.  You can experiment with alternative forms when we're not staring
>>> down code freeze of 4.19.
>> "Clearest form" is almost always a matter of taste. To me, comparing
>> unsigned values with > or < against 0 is generally at least suspicious.
>> Using != is typically better (again: imo), and simply omitting the != 0
>> then is shorter with no difference in effect. Except in peculiar cases
>> like this one, where indeed it took me some time to figure why the
>> comparison operator may not be omitted.
>>
>> All that said: I'm not going to insist on any change; the R-b previously
>> offered still stands. I would highly appreciate though if the (further)
>> comment asked for could be added.
>>
>> What I definitely dislike here is you - not for the first time - turning
>> down remarks because a change of yours is late.
> 
> Actually it's not to do with the release.  I'd reject it at any point
> because it's an unreasonable request to make; to me, or to anyone else.
> 
> It would be a matter of taste (which again you have a singular view on),
> if it wasn't for the fact that what you actually said was:
> 
> "I don't like it, and you should discard all the careful analysis you
> did because here's a form I prefer, that I haven't tested concerning a
> behaviour I didn't even realise until this email."

Just to clarify: Long before this reply of yours I understood and admitted
my mistake. A more clear / well placed comment (see further up) might have
avoided that. Still - thanks for extending the comment in what you have
committed.

> and even if it wasn't a buggy suggestion to begin with, it's still toxic
> maintainer feedback.

What's toxic about making a mistake? What's toxic about disliking "x > 0"
for unsigned quantities? As you say, it's a matter of taste to a fair
degree. Yet there are ample cases where taste as in "make it as clear as
possible to every reader" is used to ask me or others to change style. I
don't see why I shouldn't be permitted to at least make a similar remark,
even if then it's turned down (for good or bad reasons).

Jan
diff mbox series

Patch

diff --git a/xen/arch/x86/include/asm/bitops.h b/xen/arch/x86/include/asm/bitops.h
index 122767fc0d10..1d7aea6065ef 100644
--- a/xen/arch/x86/include/asm/bitops.h
+++ b/xen/arch/x86/include/asm/bitops.h
@@ -432,12 +432,28 @@  static inline int ffsl(unsigned long x)
 
 static always_inline unsigned int arch_ffs(unsigned int x)
 {
-    int r;
+    unsigned int r;
+
+    if ( __builtin_constant_p(x > 0) && x > 0 )
+    {
+        /* Safe, when the compiler knows that x is nonzero. */
+        asm ( "bsf %[val], %[res]"
+              : [res] "=r" (r)
+              : [val] "rm" (x) );
+    }
+    else
+    {
+        /*
+         * The AMD manual states that BSF won't modify the destination
+         * register if x=0.  The Intel manual states that the result is
+         * undefined, but the architects have said that the register is
+         * written back with it's old value (zero extended as normal).
+         */
+        asm ( "bsf %[val], %[res]"
+              : [res] "=r" (r)
+              : [val] "rm" (x), "[res]" (-1) );
+    }
 
-    asm ( "bsf %1,%0\n\t"
-          "jnz 1f\n\t"
-          "mov $-1,%0\n"
-          "1:" : "=r" (r) : "rm" (x));
     return r + 1;
 }
 #define arch_ffs arch_ffs