| Message ID | 1403765395-16978-1-git-send-email-pgaikwad@nvidia.com (mailing list archive) |
| --- | --- |
| State | New, archived |
Hello,

On Thu, Jun 26, 2014 at 07:49:55AM +0100, Prashant Gaikwad wrote:
> Unconditional copying cpu_online_mask to affinity
> may result in migrating affinity to wrong CPU.

We have a bug, but I don't follow your reasoning.

> For example, IRQ 5 affinity mask contains CPU 4-7,

Ok, so d->affinity is 0xf0...

> it was affined to CPU4 and CPU 0-7 are online.

...and cpu_online_mask is 0xff.

> Now if we hot-unplug CPU4 then with current
> implementation affinity mask will contain
> CPU 0-3,5-7 and IRQ 5 will be affined to CPU0.

cpumask_any_and(affinity, cpu_online_mask) will return a value < nr_cpu_ids
since there is an intersection of 0xf0. That means ret is false.

The bug is that we then do affinity = cpu_online_mask; unconditionally,
but we *won't* do the cpumask_copy, since ret is false.

You can fix this by simply bringing the arm64 code into line with the arm
code, which begs the question as to why this has to exist in the arch/
backend at all!

Will
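[Editorial note: for reference, the pre-patch affinity handling in arch/arm64/kernel/irq.c that this analysis walks through looks roughly as below; it is reconstructed from the lines the diff further down removes, with variable declarations omitted.]

	/* ret stays false when the affinity mask still intersects the online mask */
	if (cpumask_any_and(affinity, cpu_online_mask) >= nr_cpu_ids)
		ret = true;

	/*
	 * when using forced irq_set_affinity we must ensure that the cpu
	 * being offlined is not present in the affinity mask, it may be
	 * selected as the target CPU otherwise
	 */
	affinity = cpu_online_mask;	/* done unconditionally -- the bug */

	c = irq_data_get_irq_chip(d);
	if (!c->irq_set_affinity)
		pr_debug("IRQ%u: unable to set affinity\n", d->irq);
	else if (c->irq_set_affinity(d, affinity, true) == IRQ_SET_MASK_OK && ret)
		cpumask_copy(d->affinity, affinity);	/* skipped when ret == false */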
On Thu, 2014-06-26 at 15:50 +0530, Will Deacon wrote:
> Hello,
>
> On Thu, Jun 26, 2014 at 07:49:55AM +0100, Prashant Gaikwad wrote:
> > Unconditional copying cpu_online_mask to affinity
> > may result in migrating affinity to wrong CPU.
>
> We have a bug, but I don't follow your reasoning.
>
> > For example, IRQ 5 affinity mask contains CPU 4-7,
>
> Ok, so d->affinity is 0xf0...
>
> > it was affined to CPU4 and CPU 0-7 are online.
>
> ...and cpu_online_mask is 0xff.
>
> > Now if we hot-unplug CPU4 then with current
> > implementation affinity mask will contain
> > CPU 0-3,5-7 and IRQ 5 will be affined to CPU0.
>
> cpumask_any_and(affinity, cpu_online_mask) will return a value < nr_cpu_ids
> since there is an intersection of 0xf0. That means ret is false.
>
> The bug is that we then do affinity = cpu_online_mask; unconditionally,
> but we *won't* do the cpumask_copy, since ret is false.

We do not copy, but the affinity mask passed to the irq_set_affinity
function is nothing but cpu_online_mask. So in GIC it will set affinity
to CPU0.

> You can fix this by simply bringing the arm64 code into line with the arm
> code, which begs the question as to why this has to exist in the arch/
> backend at all!

Where can we move this code?

> Will
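[Editorial note: for context, the target-CPU selection in the GIC driver's irq_set_affinity callback at the time of this thread looks roughly like the simplified sketch below (register programming and the GIC CPU-interface limit check omitted). With affinity forced to cpu_online_mask, either branch ends up picking CPU0, the lowest-numbered online CPU.]

	static int gic_set_affinity(struct irq_data *d,
				    const struct cpumask *mask_val, bool force)
	{
		unsigned int cpu;

		if (!force)
			/* only consider CPUs that are both requested and online */
			cpu = cpumask_any_and(mask_val, cpu_online_mask);
		else
			/* trust the caller: take the first CPU in the mask as-is */
			cpu = cpumask_first(mask_val);

		if (cpu >= nr_cpu_ids)
			return -EINVAL;

		/* ... route the interrupt to the chosen cpu ... */
		return IRQ_SET_MASK_OK;
	}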
On Thu, Jun 26, 2014 at 01:00:24PM +0100, Prashant Gaikwad wrote:
> On Thu, 2014-06-26 at 15:50 +0530, Will Deacon wrote:
> > On Thu, Jun 26, 2014 at 07:49:55AM +0100, Prashant Gaikwad wrote:
> > > Unconditional copying cpu_online_mask to affinity
> > > may result in migrating affinity to wrong CPU.
> >
> > We have a bug, but I don't follow your reasoning.
> >
> > > For example, IRQ 5 affinity mask contains CPU 4-7,
> >
> > Ok, so d->affinity is 0xf0...
> >
> > > it was affined to CPU4 and CPU 0-7 are online.
> >
> > ...and cpu_online_mask is 0xff.
> >
> > > Now if we hot-unplug CPU4 then with current
> > > implementation affinity mask will contain
> > > CPU 0-3,5-7 and IRQ 5 will be affined to CPU0.
> >
> > cpumask_any_and(affinity, cpu_online_mask) will return a value < nr_cpu_ids
> > since there is an intersection of 0xf0. That means ret is false.
> >
> > The bug is that we then do affinity = cpu_online_mask; unconditionally,
> > but we *won't* do the cpumask_copy, since ret is false.
>
> We do not copy, but the affinity mask passed to the irq_set_affinity
> function is nothing but cpu_online_mask. So in GIC it will set affinity
> to CPU0.

Exactly, but your proposed patch changed more than that.

> > You can fix this by simply bringing the arm64 code into line with the arm
> > code, which begs the question as to why this has to exist in the arch/
> > backend at all!
>
> Where can we move this code?

kernel/irq/migration.c?

Will
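[Editorial note: purely to illustrate the consolidation idea raised here -- this is a hypothetical sketch, not what was merged, and the irq_migrate_all_off_this_cpu name and placement are illustrative only -- the per-arch migrate_irqs() loop that arm and arm64 both duplicate could live in generic code along these lines.]

	/* Hypothetical generic version of the per-arch migrate_irqs() loop. */
	void irq_migrate_all_off_this_cpu(void)
	{
		unsigned int irq;
		unsigned long flags;

		local_irq_save(flags);

		for_each_active_irq(irq) {
			struct irq_desc *desc = irq_to_desc(irq);
			bool affinity_broken;

			raw_spin_lock(&desc->lock);
			affinity_broken = migrate_one_irq(desc);
			raw_spin_unlock(&desc->lock);

			if (affinity_broken)
				pr_warn("IRQ%u: no longer affine to CPU%u\n",
					irq, smp_processor_id());
		}

		local_irq_restore(flags);
	}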
On Thu, 2014-06-26 at 18:41 +0530, Will Deacon wrote:
> On Thu, Jun 26, 2014 at 01:00:24PM +0100, Prashant Gaikwad wrote:
> > On Thu, 2014-06-26 at 15:50 +0530, Will Deacon wrote:
> > > On Thu, Jun 26, 2014 at 07:49:55AM +0100, Prashant Gaikwad wrote:
> > > > Unconditional copying cpu_online_mask to affinity
> > > > may result in migrating affinity to wrong CPU.
> > >
> > > We have a bug, but I don't follow your reasoning.
> > >
> > > > For example, IRQ 5 affinity mask contains CPU 4-7,
> > >
> > > Ok, so d->affinity is 0xf0...
> > >
> > > > it was affined to CPU4 and CPU 0-7 are online.
> > >
> > > ...and cpu_online_mask is 0xff.
> > >
> > > > Now if we hot-unplug CPU4 then with current
> > > > implementation affinity mask will contain
> > > > CPU 0-3,5-7 and IRQ 5 will be affined to CPU0.
> > >
> > > cpumask_any_and(affinity, cpu_online_mask) will return a value < nr_cpu_ids
> > > since there is an intersection of 0xf0. That means ret is false.
> > >
> > > The bug is that we then do affinity = cpu_online_mask; unconditionally,
> > > but we *won't* do the cpumask_copy, since ret is false.
> >
> > We do not copy, but the affinity mask passed to the irq_set_affinity
> > function is nothing but cpu_online_mask. So in GIC it will set affinity
> > to CPU0.
>
> Exactly, but your proposed patch changed more than that.

I am changing the force flag to false. That is because after I fix this
behavior we have another bug, where the IRQ affinity is set to an offline
CPU.

When cpumask_any_and(affinity, cpu_online_mask) returns < nr_cpu_ids we
pass the affinity mask as it is, which contains the offline CPU too, and
if the force flag is true then the GIC driver skips the online-CPU check.
If CPU0 is going down then the affinity mask will have CPU0 and the GIC
driver will keep the affinity on CPU0. Changing the force flag to false
ensures that the GIC driver checks for an online CPU.

> > > You can fix this by simply bringing the arm64 code into line with the arm
> > > code, which begs the question as to why this has to exist in the arch/
> > > backend at all!
> >
> > Where can we move this code?
>
> kernel/irq/migration.c?
>
> Will
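[Editorial note: a minimal sketch of the scenario described here, assuming CPU0 is the CPU going down and the IRQ's affinity mask is {CPU0, CPU1}; the surrounding code is the migrate_one_irq() logic from the patch, condensed.]

	const struct cpumask *affinity = d->affinity;	/* contains CPU0 and CPU1 */

	/*
	 * CPU0 has already been cleared from cpu_online_mask, but CPU1 is
	 * still online, so there is an intersection and the mask is kept as-is.
	 */
	if (cpumask_any_and(affinity, cpu_online_mask) >= nr_cpu_ids)
		affinity = cpu_online_mask;	/* not taken in this scenario */

	/*
	 * force == true : the GIC takes cpumask_first(affinity) == CPU0, i.e.
	 *                 the CPU that is being offlined -- the second bug.
	 * force == false: the GIC takes cpumask_any_and(affinity, cpu_online_mask),
	 *                 which skips CPU0 and lands on CPU1.
	 */
	c->irq_set_affinity(d, affinity, false);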
Hi Will,

On 26/06/14 11:20, Will Deacon wrote:
> Hello,
>
> On Thu, Jun 26, 2014 at 07:49:55AM +0100, Prashant Gaikwad wrote:
>> Unconditional copying cpu_online_mask to affinity
>> may result in migrating affinity to wrong CPU.
>
> We have a bug, but I don't follow your reasoning.
>
>> For example, IRQ 5 affinity mask contains CPU 4-7,
>
> Ok, so d->affinity is 0xf0...
>
>> it was affined to CPU4 and CPU 0-7 are online.
>
> ...and cpu_online_mask is 0xff.
>
>> Now if we hot-unplug CPU4 then with current
>> implementation affinity mask will contain
>> CPU 0-3,5-7 and IRQ 5 will be affined to CPU0.
>
> cpumask_any_and(affinity, cpu_online_mask) will return a value < nr_cpu_ids
> since there is an intersection of 0xf0. That means ret is false.
>
> The bug is that we then do affinity = cpu_online_mask; unconditionally,
> but we *won't* do the cpumask_copy, since ret is false.
>
> You can fix this by simply bringing the arm64 code into line with the arm
> code, which begs the question as to why this has to exist in the arch/
> backend at all!
>

The unconditional assignment was added by me to fix the CPU0 hotplug issue
explained in commit 601c942176d8, which is wrong, as is evident from the
above use case. It was added to retain the forced irq_set_affinity. The
difference between arm and arm64 exists because arm doesn't have the
patch [1].

We can move to irq_set_affinity without the force option, as this patch
does. I had mentioned a similar solution [2], but Russell wants to get
feedback from tglx [3].

And yes, I see similar implementations for many architectures; they can
definitely be unified.

Regards,
Sudeep

[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2014-May/254838.html
[2] http://lists.infradead.org/pipermail/linux-arm-kernel/2014-May/259255.html
[3] http://www.spinics.net/lists/arm-kernel/msg340279.html
Hi,

On 26/06/14 14:40, Prashant Gaikwad wrote:
> On Thu, 2014-06-26 at 18:41 +0530, Will Deacon wrote:
>> On Thu, Jun 26, 2014 at 01:00:24PM +0100, Prashant Gaikwad wrote:
>>> On Thu, 2014-06-26 at 15:50 +0530, Will Deacon wrote:
>>>> On Thu, Jun 26, 2014 at 07:49:55AM +0100, Prashant Gaikwad wrote:
>>>>> Unconditional copying cpu_online_mask to affinity
>>>>> may result in migrating affinity to wrong CPU.
>>>>
>>>> We have a bug, but I don't follow your reasoning.
>>>>
>>>>> For example, IRQ 5 affinity mask contains CPU 4-7,
>>>>
>>>> Ok, so d->affinity is 0xf0...
>>>>
>>>>> it was affined to CPU4 and CPU 0-7 are online.
>>>>
>>>> ...and cpu_online_mask is 0xff.
>>>>
>>>>> Now if we hot-unplug CPU4 then with current
>>>>> implementation affinity mask will contain
>>>>> CPU 0-3,5-7 and IRQ 5 will be affined to CPU0.
>>>>
>>>> cpumask_any_and(affinity, cpu_online_mask) will return a value < nr_cpu_ids
>>>> since there is an intersection of 0xf0. That means ret is false.
>>>>
>>>> The bug is that we then do affinity = cpu_online_mask; unconditionally,
>>>> but we *won't* do the cpumask_copy, since ret is false.
>>>>
>>>
>>> We do not copy, but the affinity mask passed to the irq_set_affinity
>>> function is nothing but cpu_online_mask. So in GIC it will set affinity
>>> to CPU0.
>>
>> Exactly, but your proposed patch changed more than that.
>>
>
> I am changing the force flag to false. That is because after I fix this
> behavior we have another bug, where the IRQ affinity is set to an offline
> CPU.
>

That's correct; it's the original issue I saw and fixed incorrectly which
triggered the bug you have now.

The main reason to retain the force flag as true is that the
implementation is irqchip specific. The GIC implements it the way you
explained, but some other irqchip implementation might behave differently.
I believe that's the reason why Russell wants to get feedback from tglx.

Regards,
Sudeep
diff --git a/arch/arm64/kernel/irq.c b/arch/arm64/kernel/irq.c
index 0f08dfd..dfa6e3e 100644
--- a/arch/arm64/kernel/irq.c
+++ b/arch/arm64/kernel/irq.c
@@ -97,19 +97,15 @@ static bool migrate_one_irq(struct irq_desc *desc)
 	if (irqd_is_per_cpu(d) || !cpumask_test_cpu(smp_processor_id(), affinity))
 		return false;
 
-	if (cpumask_any_and(affinity, cpu_online_mask) >= nr_cpu_ids)
+	if (cpumask_any_and(affinity, cpu_online_mask) >= nr_cpu_ids) {
+		affinity = cpu_online_mask;
 		ret = true;
+	}
 
-	/*
-	 * when using forced irq_set_affinity we must ensure that the cpu
-	 * being offlined is not present in the affinity mask, it may be
-	 * selected as the target CPU otherwise
-	 */
-	affinity = cpu_online_mask;
 	c = irq_data_get_irq_chip(d);
 	if (!c->irq_set_affinity)
 		pr_debug("IRQ%u: unable to set affinity\n", d->irq);
-	else if (c->irq_set_affinity(d, affinity, true) == IRQ_SET_MASK_OK && ret)
+	else if (c->irq_set_affinity(d, affinity, false) == IRQ_SET_MASK_OK && ret)
 		cpumask_copy(d->affinity, affinity);
 
 	return ret;
Unconditional copying of cpu_online_mask to affinity may result in
migrating affinity to the wrong CPU. For example, IRQ 5's affinity mask
contains CPUs 4-7, it is affined to CPU4, and CPUs 0-7 are online. Now
if we hot-unplug CPU4 then with the current implementation the affinity
mask will contain CPUs 0-3,5-7 and IRQ 5 will be affined to CPU0.

Instead, copy cpu_online_mask to affinity only if no online CPU is
present in the affinity mask, and do not force the affinity setting, so
that the irqchip driver performs the CPU online check.

Signed-off-by: Prashant Gaikwad <pgaikwad@nvidia.com>
---
 arch/arm64/kernel/irq.c | 12 ++++--------
 1 files changed, 4 insertions(+), 8 deletions(-)
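[Editorial note: a quick stand-alone illustration of the changelog example, written as a user-space sketch with plain bitmasks rather than kernel cpumask code.]

	#include <stdio.h>

	int main(void)
	{
		unsigned int affinity = 0xf0;	/* IRQ 5 affinity: CPUs 4-7 */
		unsigned int online   = 0xef;	/* CPUs 0-3,5-7 online after CPU4 is unplugged */

		/* Old behaviour: affinity is overwritten with the online mask,
		 * so the lowest online CPU (CPU0) becomes the target. */
		unsigned int old_mask = online;

		/* Fixed behaviour: keep the mask because it still intersects
		 * the online mask; the target stays within CPUs 5-7. */
		unsigned int new_mask = (affinity & online) ? affinity : online;

		printf("old target CPU: %d\n", __builtin_ctz(old_mask & online));
		printf("new target CPU: %d\n", __builtin_ctz(new_mask & online));
		return 0;
	}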