[for-4.19,1/9] x86/irq: remove offline CPUs from old CPU mask when adjusting move_cleanup_count

Message ID 20240529090132.59434-2-roger.pau@citrix.com (mailing list archive)
State Superseded
Series x86/irq: fixes for CPU hot{,un}plug

Commit Message

Roger Pau Monné May 29, 2024, 9:01 a.m. UTC
When adjusting move_cleanup_count to account for CPUs that are offline also
adjust old_cpu_mask, otherwise further calls to fixup_irqs() could subtract
those again creating and create an imbalance in move_cleanup_count.

Fixes: 472e0b74c5c4 ('x86/IRQ: deal with move cleanup count state in fixup_irqs()')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/arch/x86/irq.c | 8 ++++++++
 1 file changed, 8 insertions(+)
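
The double subtraction described in the commit message can be illustrated with a small stand-alone model. This is only a sketch, not Xen code: `struct model_desc`, `weight()` and `model_fixup()` are hypothetical names, CPU masks are modelled as plain 64-bit integers, and the counter is assumed to be unsigned so that an underflow wraps around.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two descriptor fields the patch touches. */
struct model_desc {
    uint64_t old_cpu_mask;           /* one bit per CPU (model only) */
    unsigned int move_cleanup_count; /* assumed unsigned in this model */
};

/* Count the set bits in a mask, playing the role of cpumask_weight(). */
static unsigned int weight(uint64_t m)
{
    unsigned int n = 0;

    for ( ; m; m &= m - 1 )
        n++;

    return n;
}

/*
 * One simplified fixup pass over a single descriptor.  With adjust_mask
 * set, offline CPUs are also dropped from old_cpu_mask, which is what
 * the patch adds; with it clear, the mask is left untouched.
 */
static void model_fixup(struct model_desc *d, uint64_t online, int adjust_mask)
{
    uint64_t offline_old = d->old_cpu_mask & ~online;

    if ( !d->move_cleanup_count )
        return;

    d->move_cleanup_count -= weight(offline_old);

    if ( d->move_cleanup_count && adjust_mask )
        d->old_cpu_mask &= online;
}
```

Starting from a mask of CPUs 0-2 with a count of 3 and only CPU 0 online, two passes without the mask adjustment subtract CPUs 1 and 2 twice and wrap the counter, whereas with the adjustment the second pass leaves the count at 1.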

Comments

Jan Beulich May 29, 2024, 12:40 p.m. UTC | #1
On 29.05.2024 11:01, Roger Pau Monne wrote:
> When adjusting move_cleanup_count to account for CPUs that are offline also
> adjust old_cpu_mask, otherwise further calls to fixup_irqs() could subtract
> those again creating and create an imbalance in move_cleanup_count.

I'm in trouble with "creating"; I can't seem to be able to guess what you may
have meant.

> Fixes: 472e0b74c5c4 ('x86/IRQ: deal with move cleanup count state in fixup_irqs()')
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

With the above clarified (adjustment can be done while committing)
Reviewed-by: Jan Beulich <jbeulich@suse.com>

> --- a/xen/arch/x86/irq.c
> +++ b/xen/arch/x86/irq.c
> @@ -2572,6 +2572,14 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
>              desc->arch.move_cleanup_count -= cpumask_weight(affinity);
>              if ( !desc->arch.move_cleanup_count )
>                  release_old_vec(desc);
> +            else
> +                /*
> +                 * Adjust old_cpu_mask to account for the offline CPUs,
> +                 * otherwise further calls to fixup_irqs() could subtract those
> +                 * again and possibly underflow the counter.
> +                 */
> +                cpumask_and(desc->arch.old_cpu_mask, desc->arch.old_cpu_mask,
> +                            &cpu_online_map);
>          }

While functionality-wise okay, imo it would be slightly better to use
"affinity" here as well, so that even without looking at context beyond
what's shown here there is a direct connection to the cpumask_weight()
call. I.e.

                cpumask_andnot(desc->arch.old_cpu_mask, desc->arch.old_cpu_mask,
                               affinity);

Thoughts?

Jan
Roger Pau Monné May 29, 2024, 3:15 p.m. UTC | #2
On Wed, May 29, 2024 at 02:40:51PM +0200, Jan Beulich wrote:
> On 29.05.2024 11:01, Roger Pau Monne wrote:
> > When adjusting move_cleanup_count to account for CPUs that are offline also
> > adjust old_cpu_mask, otherwise further calls to fixup_irqs() could subtract
> > those again creating and create an imbalance in move_cleanup_count.
> 
> I'm in trouble with "creating"; I can't seem to be able to guess what you may
> have meant.

Oh, sorry, that's a typo.

I was meaning to point out that not removing the already subtracted
CPUs from the mask can lead to further calls to fixup_irqs()
subtracting them again and move_cleanup_count possibly underflowing.

Would you prefer to write it as:

"... could subtract those again and possibly underflow move_cleanup_count."

> > Fixes: 472e0b74c5c4 ('x86/IRQ: deal with move cleanup count state in fixup_irqs()')
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> 
> With the above clarified (adjustment can be done while committing)
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> 
> > --- a/xen/arch/x86/irq.c
> > +++ b/xen/arch/x86/irq.c
> > @@ -2572,6 +2572,14 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
> >              desc->arch.move_cleanup_count -= cpumask_weight(affinity);
> >              if ( !desc->arch.move_cleanup_count )
> >                  release_old_vec(desc);
> > +            else
> > +                /*
> > +                 * Adjust old_cpu_mask to account for the offline CPUs,
> > +                 * otherwise further calls to fixup_irqs() could subtract those
> > +                 * again and possibly underflow the counter.
> > +                 */
> > +                cpumask_and(desc->arch.old_cpu_mask, desc->arch.old_cpu_mask,
> > +                            &cpu_online_map);
> >          }
> 
> While functionality-wise okay, imo it would be slightly better to use
> "affinity" here as well, so that even without looking at context beyond
> what's shown here there is a direct connection to the cpumask_weight()
> call. I.e.
> 
>                 cpumask_andnot(desc->arch.old_cpu_mask, desc->arch.old_cpu_mask,
>                                affinity);
> 
> Thoughts?

It was more straightforward for me to reason that removing the offline
CPUs is OK, but I can see that you might prefer to use 'affinity',
because that's the weight that's subtracted from move_cleanup_count.
Using either should lead to the same result if my understanding is
correct.

Thanks, Roger.
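
The equivalence claimed above can be checked with a small sketch. It assumes that `affinity` holds the CPUs of `old_cpu_mask` that are no longer online (the hunk does not show how it is computed, so this is an assumption), and it models masks as plain integers. The helper names `via_online()`, `via_affinity()` and `forms_agree()` are hypothetical.

```c
#include <stdint.h>

/* Form used in the patch as posted: keep only the CPUs still online. */
static uint64_t via_online(uint64_t old, uint64_t online)
{
    return old & online;
}

/*
 * Suggested alternative: remove exactly the CPUs whose weight was
 * subtracted.  "affinity" is ASSUMED to be old & ~online here.
 */
static uint64_t via_affinity(uint64_t old, uint64_t online)
{
    uint64_t affinity = old & ~online;

    return old & ~affinity;
}

/* Exhaustively compare both forms over all 4-CPU mask combinations. */
static int forms_agree(void)
{
    for ( uint64_t old = 0; old < 16; old++ )
        for ( uint64_t online = 0; online < 16; online++ )
            if ( via_online(old, online) != via_affinity(old, online) )
                return 0;

    return 1;
}
```

Under that assumption both forms yield the same mask, since old & ~(old & ~online) reduces to old & online; the choice between them is about readability next to the cpumask_weight() call.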
Jan Beulich May 29, 2024, 3:27 p.m. UTC | #3
On 29.05.2024 17:15, Roger Pau Monné wrote:
> On Wed, May 29, 2024 at 02:40:51PM +0200, Jan Beulich wrote:
>> On 29.05.2024 11:01, Roger Pau Monne wrote:
>>> When adjusting move_cleanup_count to account for CPUs that are offline also
>>> adjust old_cpu_mask, otherwise further calls to fixup_irqs() could subtract
>>> those again creating and create an imbalance in move_cleanup_count.
>>
>> I'm in trouble with "creating"; I can't seem to be able to guess what you may
>> have meant.
> 
> Oh, sorry, that's a typo.
> 
> I was meaning to point out that not removing the already subtracted
> CPUs from the mask can lead to further calls to fixup_irqs()
> subtracting them again and move_cleanup_count possibly underflowing.
> 
> Would you prefer to write it as:
> 
> "... could subtract those again and possibly underflow move_cleanup_count."

Fine with me. Looks like simply deleting "creating" and keeping the rest
as it was would be okay too? Whatever you prefer in the end.

>>> --- a/xen/arch/x86/irq.c
>>> +++ b/xen/arch/x86/irq.c
>>> @@ -2572,6 +2572,14 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
>>>              desc->arch.move_cleanup_count -= cpumask_weight(affinity);
>>>              if ( !desc->arch.move_cleanup_count )
>>>                  release_old_vec(desc);
>>> +            else
>>> +                /*
>>> +                 * Adjust old_cpu_mask to account for the offline CPUs,
>>> +                 * otherwise further calls to fixup_irqs() could subtract those
>>> +                 * again and possibly underflow the counter.
>>> +                 */
>>> +                cpumask_and(desc->arch.old_cpu_mask, desc->arch.old_cpu_mask,
>>> +                            &cpu_online_map);
>>>          }
>>
>> While functionality-wise okay, imo it would be slightly better to use
>> "affinity" here as well, so that even without looking at context beyond
>> what's shown here there is a direct connection to the cpumask_weight()
>> call. I.e.
>>
>>                 cpumask_andnot(desc->arch.old_cpu_mask, desc->arch.old_cpu_mask,
>>                                affinity);
>>
>> Thoughts?
> 
> It was more straightforward for me to reason that removing the offline
> CPUs is OK, but I can see that you might prefer to use 'affinity',
> because that's the weight that's subtracted from move_cleanup_count.
> Using either should lead to the same result if my understanding is
> correct.

That was the conclusion I came to, or else I wouldn't have made the
suggestion. Unless you have a strong preference for the as-is form, I'd
indeed prefer the suggested alternative.

Jan
Roger Pau Monné May 29, 2024, 3:34 p.m. UTC | #4
On Wed, May 29, 2024 at 05:27:06PM +0200, Jan Beulich wrote:
> On 29.05.2024 17:15, Roger Pau Monné wrote:
> > On Wed, May 29, 2024 at 02:40:51PM +0200, Jan Beulich wrote:
> >> On 29.05.2024 11:01, Roger Pau Monne wrote:
> >>> When adjusting move_cleanup_count to account for CPUs that are offline also
> >>> adjust old_cpu_mask, otherwise further calls to fixup_irqs() could subtract
> >>> those again creating and create an imbalance in move_cleanup_count.
> >>
> >> I'm in trouble with "creating"; I can't seem to be able to guess what you may
> >> have meant.
> > 
> > Oh, sorry, that's a typo.
> > 
> > I was meaning to point out that not removing the already subtracted
> > CPUs from the mask can lead to further calls to fixup_irqs()
> > subtracting them again and move_cleanup_count possibly underflowing.
> > 
> > Would you prefer to write it as:
> > 
> > "... could subtract those again and possibly underflow move_cleanup_count."
> 
> Fine with me. Looks like simply deleting "creating" and keeping the rest
> as it was would be okay too? Whatever you prefer in the end.

Yes, whatever you think is clearer TBH, I don't really have a
preference.

Thanks, Roger.
Patch

diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index c16205a9beb6..9716e00e873b 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -2572,6 +2572,14 @@  void fixup_irqs(const cpumask_t *mask, bool verbose)
             desc->arch.move_cleanup_count -= cpumask_weight(affinity);
             if ( !desc->arch.move_cleanup_count )
                 release_old_vec(desc);
+            else
+                /*
+                 * Adjust old_cpu_mask to account for the offline CPUs,
+                 * otherwise further calls to fixup_irqs() could subtract those
+                 * again and possibly underflow the counter.
+                 */
+                cpumask_and(desc->arch.old_cpu_mask, desc->arch.old_cpu_mask,
+                            &cpu_online_map);
         }
 
         if ( !desc->action || cpumask_subset(desc->affinity, mask) )