diff mbox

arm64: kdump: fix interrupt handling done during machine_crash_shutdown

Message ID 1519837260-30662-1-git-send-email-jaz@semihalf.com (mailing list archive)
State New, archived
Headers show

Commit Message

Grzegorz Jaszczyk Feb. 28, 2018, 5:01 p.m. UTC
Hitherto during machine_kexec_mask_interrupts there was an attempt to
remove active state using irq_set_irqchip_state() routine and only if it
failed, the attempt to EOI the interrupt was made. Nevertheless relaying
on return value from irq_set_irqchip_state inside
machine_kexec_mask_interrupts is incorrect - it only returns the status
of the routine but doesn't provide information if the interrupt was
deactivated correctly or not. Therefore the irq_eoi wasn't call even if
the interrupt remained active.

To determine the sate correctly the irq_get_irqchip_state() could be
used but according to the ARM Generic Interrupt Controller Architecture
Spec, non-secure reading from GICD_ISACTIVERn/GICD_ICACTIVERn can be not
permitted (depending on NS_access setting of Non-secure Access Control
Registers, a.k.a. GICD_NSACRn). What is more interesting GICD_NSACRn
is optional Secure register.

Moreover de-activating the interrupt via GICD_ISACTIVERn register
(regardless of the possibility of checking status or not) seems to not
do the job, when the GIC Distributor is configured to forward the
interrupts to the CPU interfaces.

Because of all above the attempt to deactivate interrupts via
irq_set_irqchip_state() is removed in this patch. Instead the irq_eoi is
called whenever the interrupt is in progress(irqd_irq_inprogress).

Before this patch the kdump triggered from interrupt context worked
correctly by accident when the GIC was configured with
GIC_CPU_CTRL_EOImodeNS == 1 (supports_deactivate == true). In mentioned
mode GIC_CPU_EOI has priority drop functionality only and
GIC_CPU_DEACTIVATE is used for interrupt deactivation. Also the
gic_handle_irq behaviour is a bit different in mentioned mode and
performs write to the GIC_CPU_EOI which causes the priority drop to the
idle priority. So even if the irq_eoi wasn't called during
machine_kexec_mask_interrupts, the interrupts of the crashdump kernel
was handled due to interrupt preemption (since the priority of still
active interrupt was dropped to idle priority).

Nevertheless when the kdump was triggered from interrupt context while
the GIC was configured to work in GIC_CPU_CTRL_EOImodeNS == 0, the
crashdump kernel hang in early stage due to lack of timer interrupt
arrival.

After this fix the kdump behaves correctly when triggered from interrupt
context independently of GIC_CPU_CTRL_EOImodeNS configuration.

Signed-off-by: Grzegorz Jaszczyk <jaz@semihalf.com>
---
 arch/arm64/kernel/machine_kexec.c | 10 +---------
 1 file changed, 1 insertion(+), 9 deletions(-)

Comments

Mark Rutland Feb. 28, 2018, 5:16 p.m. UTC | #1
[Adding MarcZ]

On Wed, Feb 28, 2018 at 06:01:00PM +0100, Grzegorz Jaszczyk wrote:
> Hitherto during machine_kexec_mask_interrupts there was an attempt to
> remove active state using irq_set_irqchip_state() routine and only if it
> failed, the attempt to EOI the interrupt was made. Nevertheless relaying
> on return value from irq_set_irqchip_state inside
> machine_kexec_mask_interrupts is incorrect - it only returns the status
> of the routine but doesn't provide information if the interrupt was
> deactivated correctly or not. 

This doesn't sound right. The return value of irq_set_irqchip_state() is
certainly supposed to indicate that it did what it was asked to (i.e.
correctly (de)activated the interrupt).

IIUC, you're saying that there's a problem whereby:

(a) irq_set_irqchip_state() returns succesfully, but:

(b) irq_set_irqchip_state() does not alter the interrupt state to that
    requested.

... which sounds like a bug. 

When does this happen, exactly?

Thanks,
Mark.

> Therefore the irq_eoi wasn't call even if the interrupt remained
> active.
> 
> To determine the sate correctly the irq_get_irqchip_state() could be
> used but according to the ARM Generic Interrupt Controller Architecture
> Spec, non-secure reading from GICD_ISACTIVERn/GICD_ICACTIVERn can be not
> permitted (depending on NS_access setting of Non-secure Access Control
> Registers, a.k.a. GICD_NSACRn). What is more interesting GICD_NSACRn
> is optional Secure register.
> 
> Moreover de-activating the interrupt via GICD_ISACTIVERn register
> (regardless of the possibility of checking status or not) seems to not
> do the job, when the GIC Distributor is configured to forward the
> interrupts to the CPU interfaces.
> 
> Because of all above the attempt to deactivate interrupts via
> irq_set_irqchip_state() is removed in this patch. Instead the irq_eoi is
> called whenever the interrupt is in progress(irqd_irq_inprogress).
> 
> Before this patch the kdump triggered from interrupt context worked
> correctly by accident when the GIC was configured with
> GIC_CPU_CTRL_EOImodeNS == 1 (supports_deactivate == true). In mentioned
> mode GIC_CPU_EOI has priority drop functionality only and
> GIC_CPU_DEACTIVATE is used for interrupt deactivation. Also the
> gic_handle_irq behaviour is a bit different in mentioned mode and
> performs write to the GIC_CPU_EOI which causes the priority drop to the
> idle priority. So even if the irq_eoi wasn't called during
> machine_kexec_mask_interrupts, the interrupts of the crashdump kernel
> was handled due to interrupt preemption (since the priority of still
> active interrupt was dropped to idle priority).
> 
> Nevertheless when the kdump was triggered from interrupt context while
> the GIC was configured to work in GIC_CPU_CTRL_EOImodeNS == 0, the
> crashdump kernel hang in early stage due to lack of timer interrupt
> arrival.
> 
> After this fix the kdump behaves correctly when triggered from interrupt
> context independently of GIC_CPU_CTRL_EOImodeNS configuration.
> 
> Signed-off-by: Grzegorz Jaszczyk <jaz@semihalf.com>
> ---
>  arch/arm64/kernel/machine_kexec.c | 10 +---------
>  1 file changed, 1 insertion(+), 9 deletions(-)
> 
> diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
> index f76ea92..30ad183 100644
> --- a/arch/arm64/kernel/machine_kexec.c
> +++ b/arch/arm64/kernel/machine_kexec.c
> @@ -220,20 +220,12 @@ static void machine_kexec_mask_interrupts(void)
>  
>  	for_each_irq_desc(i, desc) {
>  		struct irq_chip *chip;
> -		int ret;
>  
>  		chip = irq_desc_get_chip(desc);
>  		if (!chip)
>  			continue;
>  
> -		/*
> -		 * First try to remove the active state. If this
> -		 * fails, try to EOI the interrupt.
> -		 */
> -		ret = irq_set_irqchip_state(i, IRQCHIP_STATE_ACTIVE, false);
> -
> -		if (ret && irqd_irq_inprogress(&desc->irq_data) &&
> -		    chip->irq_eoi)
> +		if (irqd_irq_inprogress(&desc->irq_data) && chip->irq_eoi)
>  			chip->irq_eoi(&desc->irq_data);
>  
>  		if (chip->irq_mask)
> -- 
> 2.7.4
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Marc Zyngier Feb. 28, 2018, 5:45 p.m. UTC | #2
On 28/02/18 17:16, Mark Rutland wrote:
> [Adding MarcZ]
> 
> On Wed, Feb 28, 2018 at 06:01:00PM +0100, Grzegorz Jaszczyk wrote:
>> Hitherto during machine_kexec_mask_interrupts there was an attempt to
>> remove active state using irq_set_irqchip_state() routine and only if it
>> failed, the attempt to EOI the interrupt was made. Nevertheless relaying
>> on return value from irq_set_irqchip_state inside
>> machine_kexec_mask_interrupts is incorrect - it only returns the status
>> of the routine but doesn't provide information if the interrupt was
>> deactivated correctly or not. 
> 
> This doesn't sound right. The return value of irq_set_irqchip_state() is
> certainly supposed to indicate that it did what it was asked to (i.e.
> correctly (de)activated the interrupt).
> 
> IIUC, you're saying that there's a problem whereby:
> 
> (a) irq_set_irqchip_state() returns succesfully, but:
> 
> (b) irq_set_irqchip_state() does not alter the interrupt state to that
>     requested.
> 
> ... which sounds like a bug. 
> 
> When does this happen, exactly?
> 
> Thanks,
> Mark.
> 
>> Therefore the irq_eoi wasn't call even if the interrupt remained
>> active.
>>
>> To determine the sate correctly the irq_get_irqchip_state() could be
>> used but according to the ARM Generic Interrupt Controller Architecture
>> Spec, non-secure reading from GICD_ISACTIVERn/GICD_ICACTIVERn can be not
>> permitted (depending on NS_access setting of Non-secure Access Control
>> Registers, a.k.a. GICD_NSACRn). What is more interesting GICD_NSACRn
>> is optional Secure register.

All the Linux interrupts are either non-secure, or accessible using
non-secure accessors. I'm afraid you're misunderstanding a thing or two
about the architecture.

>>
>> Moreover de-activating the interrupt via GICD_ISACTIVERn register
>> (regardless of the possibility of checking status or not) seems to not
>> do the job, when the GIC Distributor is configured to forward the
>> interrupts to the CPU interfaces.

Then you seem to have really buggy HW.

>>
>> Because of all above the attempt to deactivate interrupts via
>> irq_set_irqchip_state() is removed in this patch. Instead the irq_eoi is
>> called whenever the interrupt is in progress(irqd_irq_inprogress).
>>
>> Before this patch the kdump triggered from interrupt context worked
>> correctly by accident when the GIC was configured with
>> GIC_CPU_CTRL_EOImodeNS == 1 (supports_deactivate == true). In mentioned
>> mode GIC_CPU_EOI has priority drop functionality only and
>> GIC_CPU_DEACTIVATE is used for interrupt deactivation. Also the
>> gic_handle_irq behaviour is a bit different in mentioned mode and
>> performs write to the GIC_CPU_EOI which causes the priority drop to the
>> idle priority. So even if the irq_eoi wasn't called during
>> machine_kexec_mask_interrupts, the interrupts of the crashdump kernel
>> was handled due to interrupt preemption (since the priority of still
>> active interrupt was dropped to idle priority).
>>
>> Nevertheless when the kdump was triggered from interrupt context while
>> the GIC was configured to work in GIC_CPU_CTRL_EOImodeNS == 0, the
>> crashdump kernel hang in early stage due to lack of timer interrupt
>> arrival.
>>
>> After this fix the kdump behaves correctly when triggered from interrupt
>> context independently of GIC_CPU_CTRL_EOImodeNS configuration.
>>
>> Signed-off-by: Grzegorz Jaszczyk <jaz@semihalf.com>
>> ---
>>  arch/arm64/kernel/machine_kexec.c | 10 +---------
>>  1 file changed, 1 insertion(+), 9 deletions(-)
>>
>> diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
>> index f76ea92..30ad183 100644
>> --- a/arch/arm64/kernel/machine_kexec.c
>> +++ b/arch/arm64/kernel/machine_kexec.c
>> @@ -220,20 +220,12 @@ static void machine_kexec_mask_interrupts(void)
>>  
>>  	for_each_irq_desc(i, desc) {
>>  		struct irq_chip *chip;
>> -		int ret;
>>  
>>  		chip = irq_desc_get_chip(desc);
>>  		if (!chip)
>>  			continue;
>>  
>> -		/*
>> -		 * First try to remove the active state. If this
>> -		 * fails, try to EOI the interrupt.
>> -		 */
>> -		ret = irq_set_irqchip_state(i, IRQCHIP_STATE_ACTIVE, false);
>> -
>> -		if (ret && irqd_irq_inprogress(&desc->irq_data) &&
>> -		    chip->irq_eoi)
>> +		if (irqd_irq_inprogress(&desc->irq_data) && chip->irq_eoi)

This really doesn't make any sense. Either you've successfully
deactivated the interrupt, or you haven't. Doing both is a terrible
violation of the GIC architecture, don't do that. There is also 0
guarantee that the interrupt was active on the CPU you're currently
running. If you activated it on another CPU, good luck.

To echo to Mark's concerns, you don't explain the root cause: Why is
irq_set_irqchip_state silently failing? With what IRQ type? On what GIC
implementation?

>>  			chip->irq_eoi(&desc->irq_data);
>>  
>>  		if (chip->irq_mask)

As it stands, this patch breaks more things than it should. I'd suggest
you start from the beginning and explain the issue you're facing.

Thanks,

	M.
Grzegorz Jaszczyk March 2, 2018, 11:56 a.m. UTC | #3
Thank you for your feedback. I probably over-interpreted some of the
documentation paragraph to justify (probably) buggy behavior that I am
seeing. Regardless of correctness of this patch I will appreciate if
you could help understanding this issue.

First the whole story: I was debugging why the crashdump kernel hangs
in v. early stage, when the kdump was triggered from the
ARM_SBSA_WATCHDOG interrupt handler, while everything worked fine when
it was triggered from the process context. Finally It occurred that it
is because the crashdump kernel doesn't get any timer interrupt. I
also notice that this problem doesn't occur when the gic is configured
to work in EOImode == 1. In such circumstances, the write to
GIC_CPU_EOI in gic_handle_irq is causing priority drop to idle, and
therefore when the crashdump kernel starts, the timer interrupt is
able to preempt still active watchdog interrupt (I know that this
interrupt shouldn't be active after irq_set_irqchip_state but for some
reason it seems to not do the job correctly).

In my commit log I wrongly describe the bahaviour of
irq_set_irqchip_state and irq_get_irqchip_state. In
machine_kexec_mask_interrupts (when watchdog interrupt is active)
after adding some debugs I see that (focusing only on watchdog
interrupt):
1) before calling irq_set_irqchip_state when I check the status with
irq_get_irqchip_state I see that watchdog interrupt is active
2) decative interrupt via irq_set_irqchip_state
3) check the status via irq_get_irqchip_state which indicates that the
status has changed to inactive, so everything seems to be fine, but
still in crashdump kernel I don't get any interrupts (when the EOImode
== 0).

When I modify the machine_kexec_mask_interrupts, to call the eoi for
watchdog (only temporary to observe the effect):
if (i == watchdog_irq)
     chip->irq_eoi(&desc->irq_data);

everything is working. So it seems that deactivating the interrupt via
write to GIC_CPU_EOI (EOImode == 0) or GIC_CPU_EOI +
GIC_CPU_DEACTIVATE (EOImode == 1) does the job, while deactivating it
with use of GIC_DIST_ACTIVE_CLEAR doesn't.

I am using the unmodified GICv2m ("arm,gic-400") and the watchdog
interrupt is connected as one of the SPI. Do you have any idea what
can be wrong? Maybe I am missing something? gic configuration? I also
don't exclude that nobody who work with kdump doesn't use (EOImode ==
0) and therefore didn't see this behavior.

Thank you in advance,
Grzegorz


2018-02-28 18:45 GMT+01:00 Marc Zyngier <marc.zyngier@arm.com>:
> On 28/02/18 17:16, Mark Rutland wrote:
>> [Adding MarcZ]
>>
>> On Wed, Feb 28, 2018 at 06:01:00PM +0100, Grzegorz Jaszczyk wrote:
>>> Hitherto during machine_kexec_mask_interrupts there was an attempt to
>>> remove active state using irq_set_irqchip_state() routine and only if it
>>> failed, the attempt to EOI the interrupt was made. Nevertheless relaying
>>> on return value from irq_set_irqchip_state inside
>>> machine_kexec_mask_interrupts is incorrect - it only returns the status
>>> of the routine but doesn't provide information if the interrupt was
>>> deactivated correctly or not.
>>
>> This doesn't sound right. The return value of irq_set_irqchip_state() is
>> certainly supposed to indicate that it did what it was asked to (i.e.
>> correctly (de)activated the interrupt).
>>
>> IIUC, you're saying that there's a problem whereby:
>>
>> (a) irq_set_irqchip_state() returns succesfully, but:
>>
>> (b) irq_set_irqchip_state() does not alter the interrupt state to that
>>     requested.
>>
>> ... which sounds like a bug.
>>
>> When does this happen, exactly?
>>
>> Thanks,
>> Mark.
>>
>>> Therefore the irq_eoi wasn't call even if the interrupt remained
>>> active.
>>>
>>> To determine the sate correctly the irq_get_irqchip_state() could be
>>> used but according to the ARM Generic Interrupt Controller Architecture
>>> Spec, non-secure reading from GICD_ISACTIVERn/GICD_ICACTIVERn can be not
>>> permitted (depending on NS_access setting of Non-secure Access Control
>>> Registers, a.k.a. GICD_NSACRn). What is more interesting GICD_NSACRn
>>> is optional Secure register.
>
> All the Linux interrupts are either non-secure, or accessible using
> non-secure accessors. I'm afraid you're misunderstanding a thing or two
> about the architecture.
>
>>>
>>> Moreover de-activating the interrupt via GICD_ISACTIVERn register
>>> (regardless of the possibility of checking status or not) seems to not
>>> do the job, when the GIC Distributor is configured to forward the
>>> interrupts to the CPU interfaces.
>
> Then you seem to have really buggy HW.
>
>>>
>>> Because of all above the attempt to deactivate interrupts via
>>> irq_set_irqchip_state() is removed in this patch. Instead the irq_eoi is
>>> called whenever the interrupt is in progress(irqd_irq_inprogress).
>>>
>>> Before this patch the kdump triggered from interrupt context worked
>>> correctly by accident when the GIC was configured with
>>> GIC_CPU_CTRL_EOImodeNS == 1 (supports_deactivate == true). In mentioned
>>> mode GIC_CPU_EOI has priority drop functionality only and
>>> GIC_CPU_DEACTIVATE is used for interrupt deactivation. Also the
>>> gic_handle_irq behaviour is a bit different in mentioned mode and
>>> performs write to the GIC_CPU_EOI which causes the priority drop to the
>>> idle priority. So even if the irq_eoi wasn't called during
>>> machine_kexec_mask_interrupts, the interrupts of the crashdump kernel
>>> was handled due to interrupt preemption (since the priority of still
>>> active interrupt was dropped to idle priority).
>>>
>>> Nevertheless when the kdump was triggered from interrupt context while
>>> the GIC was configured to work in GIC_CPU_CTRL_EOImodeNS == 0, the
>>> crashdump kernel hang in early stage due to lack of timer interrupt
>>> arrival.
>>>
>>> After this fix the kdump behaves correctly when triggered from interrupt
>>> context independently of GIC_CPU_CTRL_EOImodeNS configuration.
>>>
>>> Signed-off-by: Grzegorz Jaszczyk <jaz@semihalf.com>
>>> ---
>>>  arch/arm64/kernel/machine_kexec.c | 10 +---------
>>>  1 file changed, 1 insertion(+), 9 deletions(-)
>>>
>>> diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
>>> index f76ea92..30ad183 100644
>>> --- a/arch/arm64/kernel/machine_kexec.c
>>> +++ b/arch/arm64/kernel/machine_kexec.c
>>> @@ -220,20 +220,12 @@ static void machine_kexec_mask_interrupts(void)
>>>
>>>      for_each_irq_desc(i, desc) {
>>>              struct irq_chip *chip;
>>> -            int ret;
>>>
>>>              chip = irq_desc_get_chip(desc);
>>>              if (!chip)
>>>                      continue;
>>>
>>> -            /*
>>> -             * First try to remove the active state. If this
>>> -             * fails, try to EOI the interrupt.
>>> -             */
>>> -            ret = irq_set_irqchip_state(i, IRQCHIP_STATE_ACTIVE, false);
>>> -
>>> -            if (ret && irqd_irq_inprogress(&desc->irq_data) &&
>>> -                chip->irq_eoi)
>>> +            if (irqd_irq_inprogress(&desc->irq_data) && chip->irq_eoi)
>
> This really doesn't make any sense. Either you've successfully
> deactivated the interrupt, or you haven't. Doing both is a terrible
> violation of the GIC architecture, don't do that. There is also 0
> guarantee that the interrupt was active on the CPU you're currently
> running. If you activated it on another CPU, good luck.
>
> To echo to Mark's concerns, you don't explain the root cause: Why is
> irq_set_irqchip_state silently failing? With what IRQ type? On what GIC
> implementation?
>
>>>                      chip->irq_eoi(&desc->irq_data);
>>>
>>>              if (chip->irq_mask)
>
> As it stands, this patch breaks more things than it should. I'd suggest
> you start from the beginning and explain the issue you're facing.
>
> Thanks,
>
>         M.
> --
> Jazz is not dead. It just smells funny...
Mark Rutland March 2, 2018, 12:05 p.m. UTC | #4
On Fri, Mar 02, 2018 at 12:56:24PM +0100, Grzegorz Jaszczyk wrote:
> Thank you for your feedback. I probably over-interpreted some of the
> documentation paragraph to justify (probably) buggy behavior that I am
> seeing. Regardless of correctness of this patch I will appreciate if
> you could help understanding this issue.
> 
> First the whole story: I was debugging why the crashdump kernel hangs
> in v. early stage, when the kdump was triggered from the
> ARM_SBSA_WATCHDOG interrupt handler, while everything worked fine when
> it was triggered from the process context. Finally It occurred that it
> is because the crashdump kernel doesn't get any timer interrupt. I
> also notice that this problem doesn't occur when the gic is configured
> to work in EOImode == 1. In such circumstances, the write to
> GIC_CPU_EOI in gic_handle_irq is causing priority drop to idle, and
> therefore when the crashdump kernel starts, the timer interrupt is
> able to preempt still active watchdog interrupt (I know that this
> interrupt shouldn't be active after irq_set_irqchip_state but for some
> reason it seems to not do the job correctly).

Do you have a way to reproduce the problem?

Is there an easy way to cause the watchdog to trigger a kdump as above,
e.g. via LKDTM?

> In my commit log I wrongly describe the bahaviour of
> irq_set_irqchip_state and irq_get_irqchip_state. In
> machine_kexec_mask_interrupts (when watchdog interrupt is active)
> after adding some debugs I see that (focusing only on watchdog
> interrupt):
> 1) before calling irq_set_irqchip_state when I check the status with
> irq_get_irqchip_state I see that watchdog interrupt is active
> 2) decative interrupt via irq_set_irqchip_state
> 3) check the status via irq_get_irqchip_state which indicates that the
> status has changed to inactive, so everything seems to be fine, but
> still in crashdump kernel I don't get any interrupts (when the EOImode
> == 0).
> 
> When I modify the machine_kexec_mask_interrupts, to call the eoi for
> watchdog (only temporary to observe the effect):
> if (i == watchdog_irq)
>      chip->irq_eoi(&desc->irq_data);
> 
> everything is working. So it seems that deactivating the interrupt via
> write to GIC_CPU_EOI (EOImode == 0) or GIC_CPU_EOI +
> GIC_CPU_DEACTIVATE (EOImode == 1) does the job, while deactivating it
> with use of GIC_DIST_ACTIVE_CLEAR doesn't.
> 
> I am using the unmodified GICv2m ("arm,gic-400") and the watchdog
> interrupt is connected as one of the SPI. 

I think you just mean GICv2 here. GICv2m is an MSI controller, and
shouldn't interact with the SBSA watchdog's SPI.

Can you tell us which platform you are seeing this on?

Thanks,
Mark.
Grzegorz Jaszczyk March 2, 2018, 12:59 p.m. UTC | #5
2018-03-02 13:05 GMT+01:00 Mark Rutland <mark.rutland@arm.com>:
> On Fri, Mar 02, 2018 at 12:56:24PM +0100, Grzegorz Jaszczyk wrote:
>> Thank you for your feedback. I probably over-interpreted some of the
>> documentation paragraph to justify (probably) buggy behavior that I am
>> seeing. Regardless of correctness of this patch I will appreciate if
>> you could help understanding this issue.
>>
>> First the whole story: I was debugging why the crashdump kernel hangs
>> in v. early stage, when the kdump was triggered from the
>> ARM_SBSA_WATCHDOG interrupt handler, while everything worked fine when
>> it was triggered from the process context. Finally It occurred that it
>> is because the crashdump kernel doesn't get any timer interrupt. I
>> also notice that this problem doesn't occur when the gic is configured
>> to work in EOImode == 1. In such circumstances, the write to
>> GIC_CPU_EOI in gic_handle_irq is causing priority drop to idle, and
>> therefore when the crashdump kernel starts, the timer interrupt is
>> able to preempt still active watchdog interrupt (I know that this
>> interrupt shouldn't be active after irq_set_irqchip_state but for some
>> reason it seems to not do the job correctly).
>
> Do you have a way to reproduce the problem?
>
> Is there an easy way to cause the watchdog to trigger a kdump as above,
> e.g. via LKDTM?

You can reproduce this problem by:
- enabling CONFIG_ARM_SBSA_WATCHDOG in your kernel
- passing via command-line: sbsa_gwdt.action=1 sbsa_gwdt.timeout=170
- then load/prepare crasdump kernel (I am doing it via kexec tool)
- echo 1 > /dev/watchdog

and after 170s the watchdog interrupt will hit triggering panic and
the whole kexec machinery will run. The sbsa_gwdt.timeout can't be too
small since it is also used for reset:
|----timeout-----(panic)----timeout-----reset.
If it is too small the crasdump kernel will not have enough time to start.

It is also reproducible with different interrupts, e.g. for test I put
the panic to i2c interrupt handler and it was behaving the same.

To use gic with EOImode == 0 mode, you can fulfill some of
gic_check_eoimode ( irqchip/irq-gic.c) conditions or just for test
"return false;" in this function.


> I think you just mean GICv2 here. GICv2m is an MSI controller, and
> shouldn't interact with the SBSA watchdog's SPI.

Yes of course, I just wanted to mention that it has MSI controller.

Thank you,
Grzegorz
Mark Rutland March 2, 2018, 1:15 p.m. UTC | #6
On Fri, Mar 02, 2018 at 01:59:27PM +0100, Grzegorz Jaszczyk wrote:
> 2018-03-02 13:05 GMT+01:00 Mark Rutland <mark.rutland@arm.com>:
> > Do you have a way to reproduce the problem?
> >
> > Is there an easy way to cause the watchdog to trigger a kdump as above,
> > e.g. via LKDTM?
> 
> You can reproduce this problem by:
> - enabling CONFIG_ARM_SBSA_WATCHDOG in your kernel
> - passing via command-line: sbsa_gwdt.action=1 sbsa_gwdt.timeout=170
> - then load/prepare crasdump kernel (I am doing it via kexec tool)
> - echo 1 > /dev/watchdog
> 
> and after 170s the watchdog interrupt will hit triggering panic and
> the whole kexec machinery will run. The sbsa_gwdt.timeout can't be too
> small since it is also used for reset:
> |----timeout-----(panic)----timeout-----reset.
> If it is too small the crasdump kernel will not have enough time to start.
> 
> It is also reproducible with different interrupts, e.g. for test I put
> the panic to i2c interrupt handler and it was behaving the same.

Do you see this for a panic() in *any* interrupt handler?

Can you trigger the issue with magic-sysrq c, for example?

> > I think you just mean GICv2 here. GICv2m is an MSI controller, and
> > shouldn't interact with the SBSA watchdog's SPI.
> 
> Yes of course, I just wanted to mention that it has MSI controller.

Can you please tell us which platform you're seeing this on?

Thanks,
Mark.
Grzegorz Jaszczyk March 2, 2018, 1:52 p.m. UTC | #7
2018-03-02 14:15 GMT+01:00 Mark Rutland <mark.rutland@arm.com>:
> On Fri, Mar 02, 2018 at 01:59:27PM +0100, Grzegorz Jaszczyk wrote:
>> 2018-03-02 13:05 GMT+01:00 Mark Rutland <mark.rutland@arm.com>:
>> > Do you have a way to reproduce the problem?
>> >
>> > Is there an easy way to cause the watchdog to trigger a kdump as above,
>> > e.g. via LKDTM?
>>
>> You can reproduce this problem by:
>> - enabling CONFIG_ARM_SBSA_WATCHDOG in your kernel
>> - passing via command-line: sbsa_gwdt.action=1 sbsa_gwdt.timeout=170
>> - then load/prepare crasdump kernel (I am doing it via kexec tool)
>> - echo 1 > /dev/watchdog
>>
>> and after 170s the watchdog interrupt will hit triggering panic and
>> the whole kexec machinery will run. The sbsa_gwdt.timeout can't be too
>> small since it is also used for reset:
>> |----timeout-----(panic)----timeout-----reset.
>> If it is too small the crasdump kernel will not have enough time to start.
>>
>> It is also reproducible with different interrupts, e.g. for test I put
>> the panic to i2c interrupt handler and it was behaving the same.
>
> Do you see this for a panic() in *any* interrupt handler?

I only test with this two interrupt handlers: watchdog and i2c but I
think it will behave the same with others - I can try with other if
you want, any suggestion which? Maybe with some PPI interrupt instead?
>
> Can you trigger the issue with magic-sysrq c, for example?

There is no problem when I trigger it via 'echo c >
/proc/sysrq-trigger' - it works well all the time. The problem appears
only, when the kexec/kdump procedure is triggered from interrupt
context - as I said it seems that deactivating the interrupt via
irq_set_irqchip_state doesn't do the job and because of that any new
interrupt (e.g. timer interrupt) can't interrupt the CPU (the previous
irq watchdog/i2c irq seems to be still active preventing other irq to
interrupt the CPU). This result with crashdump kernel hang (it waits
for the timer interrupt, which never interrupts the CPU).

Reworking the machine_kexec_mask_interrupts routine so it will call
'chip->irq_eoi(&desc->irq_data);' independently of
irq_set_irqchip_state return value, solves the problem.

Thank you,
Grzegorz
Mark Rutland March 2, 2018, 4:44 p.m. UTC | #8
On Fri, Mar 02, 2018 at 02:52:07PM +0100, Grzegorz Jaszczyk wrote:
> 2018-03-02 14:15 GMT+01:00 Mark Rutland <mark.rutland@arm.com>:
> > Do you see this for a panic() in *any* interrupt handler?
> 
> I only test with this two interrupt handlers: watchdog and i2c but I
> think it will behave the same with others - I can try with other if
> you want, any suggestion which? Maybe with some PPI interrupt instead?
> >
> > Can you trigger the issue with magic-sysrq c, for example?
> 
> There is no problem when I trigger it via 'echo c >
> /proc/sysrq-trigger' - it works well all the time. The problem appears
> only, when the kexec/kdump procedure is triggered from interrupt
> context

I'd meant that you'd send sysrq + c over serial, rather than writing to
/proc/sysrq-trigger. That way, the panic will be in the context of the
UART IRQ handler.

If that shows the issue, that's ilikely to be the easiest way for
someone else to reproduce and investigate this.

Thanks,
Mark.
Mark Rutland March 2, 2018, 4:57 p.m. UTC | #9
On Fri, Mar 02, 2018 at 04:44:13PM +0000, Mark Rutland wrote:
> On Fri, Mar 02, 2018 at 02:52:07PM +0100, Grzegorz Jaszczyk wrote:
> > 2018-03-02 14:15 GMT+01:00 Mark Rutland <mark.rutland@arm.com>:
> > > Do you see this for a panic() in *any* interrupt handler?
> > 
> > I only test with this two interrupt handlers: watchdog and i2c but I
> > think it will behave the same with others - I can try with other if
> > you want, any suggestion which? Maybe with some PPI interrupt instead?
> > >
> > > Can you trigger the issue with magic-sysrq c, for example?
> > 
> > There is no problem when I trigger it via 'echo c >
> > /proc/sysrq-trigger' - it works well all the time. The problem appears
> > only, when the kexec/kdump procedure is triggered from interrupt
> > context
> 
> I'd meant that you'd send sysrq + c over serial, rather than writing to
> /proc/sysrq-trigger. That way, the panic will be in the context of the
> UART IRQ handler.
> 
> If that shows the issue, that's ilikely to be the easiest way for
> someone else to reproduce and investigate this.

FWIW, having just given this a go on my Juno R1 with v4.16-rc3
defconfig, the UART IRQs work fine in the crash kernel. That crash
happened in IRQ context:

[  384.653153] Call trace:
[  384.655581]  sysrq_handle_crash+0x20/0x30
[  384.659559]  __handle_sysrq+0xa8/0x1a0
[  384.663278]  handle_sysrq+0x28/0x38
[  384.666738]  pl011_fifo_to_tty+0x150/0x1a8
[  384.670801]  pl011_int+0x30c/0x430
[  384.674177]  __handle_irq_event_percpu+0x5c/0x148
[  384.678843]  handle_irq_event_percpu+0x34/0x88
[  384.683250]  handle_irq_event+0x48/0x78
[  384.687056]  handle_fasteoi_irq+0xa8/0x180
[  384.691119]  generic_handle_irq+0x24/0x38
[  384.695095]  __handle_domain_irq+0x5c/0xb0
[  384.699158]  gic_handle_irq+0x58/0xa8
[  384.702790]  el1_irq+0xb0/0x128
[  384.705907]  cpuidle_enter_state+0x138/0x220
[  384.710142]  cpuidle_enter+0x18/0x20
[  384.713690]  call_cpuidle+0x1c/0x38
[  384.717151]  do_idle+0x1b0/0x1e8
[  384.720354]  cpu_startup_entry+0x20/0x28
[  384.724246]  rest_init+0xd0/0xe0
[  384.727450]  start_kernel+0x3e4/0x410

On a separate note, the crashkernel complained:

[    0.224730] CPU: CPUs started in inconsistent modes

... which is a separate disaster. I suspect the kexec code failed to punt the
crash CPU back to EL2 as it should have.

Thanks,
Mark.
Marc Zyngier March 7, 2018, 6:14 p.m. UTC | #10
On 02/03/18 11:56, Grzegorz Jaszczyk wrote:
> Thank you for your feedback. I probably over-interpreted some of the
> documentation paragraph to justify (probably) buggy behavior that I am
> seeing. Regardless of correctness of this patch I will appreciate if
> you could help understanding this issue.
> 
> First the whole story: I was debugging why the crashdump kernel hangs
> in v. early stage, when the kdump was triggered from the
> ARM_SBSA_WATCHDOG interrupt handler, while everything worked fine when
> it was triggered from the process context. Finally It occurred that it
> is because the crashdump kernel doesn't get any timer interrupt. I
> also notice that this problem doesn't occur when the gic is configured
> to work in EOImode == 1. In such circumstances, the write to
> GIC_CPU_EOI in gic_handle_irq is causing priority drop to idle, and
> therefore when the crashdump kernel starts, the timer interrupt is
> able to preempt still active watchdog interrupt (I know that this
> interrupt shouldn't be active after irq_set_irqchip_state but for some
> reason it seems to not do the job correctly).
> 
> In my commit log I wrongly describe the bahaviour of
> irq_set_irqchip_state and irq_get_irqchip_state. In
> machine_kexec_mask_interrupts (when watchdog interrupt is active)
> after adding some debugs I see that (focusing only on watchdog
> interrupt):
> 1) before calling irq_set_irqchip_state when I check the status with
> irq_get_irqchip_state I see that watchdog interrupt is active
> 2) decative interrupt via irq_set_irqchip_state
> 3) check the status via irq_get_irqchip_state which indicates that the
> status has changed to inactive, so everything seems to be fine, but
> still in crashdump kernel I don't get any interrupts (when the EOImode
> == 0).
> 
> When I modify the machine_kexec_mask_interrupts, to call the eoi for
> watchdog (only temporary to observe the effect):
> if (i == watchdog_irq)
>      chip->irq_eoi(&desc->irq_data);
> 
> everything is working. So it seems that deactivating the interrupt via
> write to GIC_CPU_EOI (EOImode == 0) or GIC_CPU_EOI +
> GIC_CPU_DEACTIVATE (EOImode == 1) does the job, while deactivating it
> with use of GIC_DIST_ACTIVE_CLEAR doesn't.
> 
> I am using the unmodified GICv2m ("arm,gic-400") and the watchdog
> interrupt is connected as one of the SPI. Do you have any idea what
> can be wrong? Maybe I am missing something? gic configuration? I also
> don't exclude that nobody who work with kdump doesn't use (EOImode ==
> 0) and therefore didn't see this behavior.

Not using EOImode==1 is definitely an oddity (at least on the host), but
that doesn't mean it shouldn't work.

The reason the thing is hanging is that although we correctly deactivate
the interrupt, nothing performs the priority drop. Your write to EOI
helps in the sense that it guarantees that both priority drop and
deactivate are done with the same operation, but that's not something
we'd want to expose.

My preferred approach would be to nuke the active priority registers at
boot time, as the CPUs come up. I'll try to write something this week.

	M.
Grzegorz Jaszczyk March 8, 2018, 10:06 p.m. UTC | #11
2018-03-02 17:57 GMT+01:00 Mark Rutland <mark.rutland@arm.com>:
>> > > Do you see this for a panic() in *any* interrupt handler?
>> >
>> > I only test with this two interrupt handlers: watchdog and i2c but I
>> > think it will behave the same with others - I can try with other if
>> > you want, any suggestion which? Maybe with some PPI interrupt instead?

I was able to reproduce it from other interrupts handler (UART, I2C,
timer and watchdog) no difference if it is PPI or SPI interrupt. I
also reproduce this issue with GICv3. But again it only happens when
eoimode = 0.
>> > >
>> > > Can you trigger the issue with magic-sysrq c, for example?
>> >
>> > There is no problem when I trigger it via 'echo c >
>> > /proc/sysrq-trigger' - it works well all the time. The problem appears
>> > only, when the kexec/kdump procedure is triggered from interrupt
>> > context
>>
>> I'd meant that you'd send sysrq + c over serial, rather than writing to
>> /proc/sysrq-trigger. That way, the panic will be in the context of the
>> UART IRQ handler.
>>
>> If that shows the issue, that's ilikely to be the easiest way for
>> someone else to reproduce and investigate this.

Yes it can be triggered by sending sysrq + c and indeed it is the
easiest way to reproduce it.
>
> FWIW, having just given this a go on my Juno R1 with v4.16-rc3
> defconfig, the UART IRQs work fine in the crash kernel. That crash
> happened in IRQ context:

I think that by default Juno uses eoimode = 1, did you try it when
eoimode was forced to be 0? Only eoimode = 0 triggers the issue.

Thank you,
Grzegorz
Grzegorz Jaszczyk March 9, 2018, 10:33 a.m. UTC | #12
> Not using EOImode==1 is definitely an oddity (at least on the host), but
> that doesn't mean it shouldn't work.
>
> The reason the thing is hanging is that although we correctly deactivate
> the interrupt, nothing performs the priority drop. Your write to EOI
> helps in the sense that it guarantees that both priority drop and
> deactivate are done with the same operation, but that's not something
> we'd want to expose.
>
> My preferred approach would be to nuke the active priority registers at
> boot time, as the CPUs come up. I'll try to write something this week.

I've made a PoC which performs priority drop at boot time as you
suggested and it works with both GICv2 and GICv3 but I see some
problem:
It seems that the only way to drop the priority is to perform write to
EOI register (the GIC_RPR is RO). According to GIC documentation a
write to EOI register must correspond to the most recent valid read
from IAR. The problem is that the interrupt was already acked in the
'original' kernel, so reading GICC_IAR in crashdump kernel returns
spurious interrupt and it seems that there is no way to figure out
appropriate irqnr for EOI write. Nevertheless I've observed that
choosing random irqnr for EOI write works fine (maybe because all
interrupts in Linux uses the same priority?).

Here is the PoC (not ready for submission only for further
discussion): https://pastebin.com/gLYNuRiZ

Looking forward to your feedback.

Thank you in advance,
Grzegorz
Marc Zyngier March 13, 2018, 5:25 p.m. UTC | #13
On 08/03/18 22:06, Grzegorz Jaszczyk wrote:
> 2018-03-02 17:57 GMT+01:00 Mark Rutland <mark.rutland@arm.com>:
>>>>> Do you see this for a panic() in *any* interrupt handler?
>>>>
>>>> I only test with this two interrupt handlers: watchdog and i2c but I
>>>> think it will behave the same with others - I can try with other if
>>>> you want, any suggestion which? Maybe with some PPI interrupt instead?
> 
> I was able to reproduce it from other interrupts handler (UART, I2C,
> timer and watchdog) no difference if it is PPI or SPI interrupt. I
> also reproduce this issue with GICv3. But again it only happens when
> eoimode = 0.
>>>>>
>>>>> Can you trigger the issue with magic-sysrq c, for example?
>>>>
>>>> There is no problem when I trigger it via 'echo c >
>>>> /proc/sysrq-trigger' - it works well all the time. The problem appears
>>>> only, when the kexec/kdump procedure is triggered from interrupt
>>>> context
>>>
>>> I'd meant that you'd send sysrq + c over serial, rather than writing to
>>> /proc/sysrq-trigger. That way, the panic will be in the context of the
>>> UART IRQ handler.
>>>
>>> If that shows the issue, that's ilikely to be the easiest way for
>>> someone else to reproduce and investigate this.
> 
> Yes it can be triggered by sending sysrq + c and indeed it is the
> easiest way to reproduce it.
>>
>> FWIW, having just given this a go on my Juno R1 with v4.16-rc3
>> defconfig, the UART IRQs work fine in the crash kernel. That crash
>> happened in IRQ context:
> 
> I think that by default Juno uses eoimode = 1, did you try it when
> eoimode was forced to be 0? Only eoimode = 0 triggers the issue.

FWIW, I've now posted fixes to LKML[1]. Feel free to test them and
report whether they fix the issue for you.

Thanks,

	M.

[1] https://lkml.org/lkml/2018/3/13/1088
diff mbox

Patch

diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
index f76ea92..30ad183 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -220,20 +220,12 @@  static void machine_kexec_mask_interrupts(void)
 
 	for_each_irq_desc(i, desc) {
 		struct irq_chip *chip;
-		int ret;
 
 		chip = irq_desc_get_chip(desc);
 		if (!chip)
 			continue;
 
-		/*
-		 * First try to remove the active state. If this
-		 * fails, try to EOI the interrupt.
-		 */
-		ret = irq_set_irqchip_state(i, IRQCHIP_STATE_ACTIVE, false);
-
-		if (ret && irqd_irq_inprogress(&desc->irq_data) &&
-		    chip->irq_eoi)
+		if (irqd_irq_inprogress(&desc->irq_data) && chip->irq_eoi)
 			chip->irq_eoi(&desc->irq_data);
 
 		if (chip->irq_mask)