mbox series

[0/3] ACPI / APEI: Kick the memory_failure() queue for synchronous errors

Message ID 20200228174817.74278-1-james.morse@arm.com (mailing list archive)
Headers show
Series ACPI / APEI: Kick the memory_failure() queue for synchronous errors | expand

Message

James Morse Feb. 28, 2020, 5:48 p.m. UTC
Hello!

These are the remaining patches from the SDEI series[0] that fix
a race between memory_failure() and user-space re-triggering the error
in ghes.c.


ghes_handle_memory_failure() calls memory_failure_queue() from
IRQ context to schedule memory_failure()s work as it needs to sleep.
Once the GHES machinery returns from the IRQ, it may return to user-space
before memory_failure() runs.

If the error that kicked all this off is specific to user-space, e.g. a
load from corrupted memory, we may find ourselves taking the error
again. If the user-space task is scheduled out, and memory_failure() runs,
the same user-space task may be scheduled in on another CPU, which could
also take the same error.

These lead to exaggerated error counters, which may cause some threshold
to be reached early.

This can happen with any error that causes a Synchronous External Abort
on arm64. I can't see why the same wouldn't happen with a machine-check
handled firmware first on x86.


This series adds a memory_failure_queue_kick() helper to
memory-failure.c, and calls it as task-work before returning to
user-space.


Currently arm64 papers over this problem by ignoring ghes_notify_sea()'s
return code as it knows there is still work to do. arm64 generates its
own signal to user-space, which means the first task to discover an
error will always be killed, even if the error was later handled.
(which is no improvement on the no-RAS behaviour)

As a final piece, arm64 can try to process the irq work queued by
ghes_notify_sea() while its still in the external abort handler. A succesfull
return value here now means the memory_failure() work will be done before we
return to user-space, we no longer need to generate our own signal.
This lets the original task survive the error if memory_failure() can
recover the corrupted memory.

Based on v5.6-rc2. I'm afraid it touches three different trees.
$subject says ACPI as that is where the bulk of the diffstat is.

This series may conflict in arm64 with a series from Mark Rutland to
cleanup the daif/PMR toggling.


This would be v9 of these patches, but after a year I figure I should
start the numbering again. I've dropped any collected tags.

Known issues:
 * arm64's apei_claim_sea() may unwittingly re-enable debug if it takes
   an external-abort from debug context. Patch 3 makes this worse
   instead of fixing it. The fix would make use of helpers from Mark R's
   series.


Thanks,

James


[0] https://lore.kernel.org/linux-arm-kernel/20190129184902.102850-1-james.morse@arm.com/
[1] https://lore.kernel.org/linux-acpi/1506516620-20033-3-git-send-email-xiexiuqi@huawei.com/

James Morse (3):
  mm/memory-failure: Add memory_failure_queue_kick()
  ACPI / APEI: Kick the memory_failure() queue for synchronous errors
  arm64: acpi: Make apei_claim_sea() synchronise with APEI's irq work

 arch/arm64/kernel/acpi.c | 25 +++++++++++++++
 arch/arm64/mm/fault.c    | 12 ++++---
 drivers/acpi/apei/ghes.c | 68 +++++++++++++++++++++++++++++++++-------
 include/acpi/ghes.h      |  3 ++
 include/linux/mm.h       |  1 +
 mm/memory-failure.c      | 15 ++++++++-
 6 files changed, 107 insertions(+), 17 deletions(-)

Comments

Tyler Baicar March 20, 2020, 7:19 p.m. UTC | #1
Hello James,

I think my one comment on patch 2 is valid, right? But for this series:

Tested-by: Tyler Baicar <baicar@os.amperecomputing.com>

Thanks,
Tyler

On Fri, Feb 28, 2020 at 12:48 PM James Morse <james.morse@arm.com> wrote:
>
> Hello!
>
> These are the remaining patches from the SDEI series[0] that fix
> a race between memory_failure() and user-space re-triggering the error
> in ghes.c.
>
>
> ghes_handle_memory_failure() calls memory_failure_queue() from
> IRQ context to schedule memory_failure()s work as it needs to sleep.
> Once the GHES machinery returns from the IRQ, it may return to user-space
> before memory_failure() runs.
>
> If the error that kicked all this off is specific to user-space, e.g. a
> load from corrupted memory, we may find ourselves taking the error
> again. If the user-space task is scheduled out, and memory_failure() runs,
> the same user-space task may be scheduled in on another CPU, which could
> also take the same error.
>
> These lead to exaggerated error counters, which may cause some threshold
> to be reached early.
>
> This can happen with any error that causes a Synchronous External Abort
> on arm64. I can't see why the same wouldn't happen with a machine-check
> handled firmware first on x86.
>
>
> This series adds a memory_failure_queue_kick() helper to
> memory-failure.c, and calls it as task-work before returning to
> user-space.
>
>
> Currently arm64 papers over this problem by ignoring ghes_notify_sea()'s
> return code as it knows there is still work to do. arm64 generates its
> own signal to user-space, which means the first task to discover an
> error will always be killed, even if the error was later handled.
> (which is no improvement on the no-RAS behaviour)
>
> As a final piece, arm64 can try to process the irq work queued by
> ghes_notify_sea() while its still in the external abort handler. A succesfull
> return value here now means the memory_failure() work will be done before we
> return to user-space, we no longer need to generate our own signal.
> This lets the original task survive the error if memory_failure() can
> recover the corrupted memory.
>
> Based on v5.6-rc2. I'm afraid it touches three different trees.
> $subject says ACPI as that is where the bulk of the diffstat is.
>
> This series may conflict in arm64 with a series from Mark Rutland to
> cleanup the daif/PMR toggling.
>
>
> This would be v9 of these patches, but after a year I figure I should
> start the numbering again. I've dropped any collected tags.
>
> Known issues:
>  * arm64's apei_claim_sea() may unwittingly re-enable debug if it takes
>    an external-abort from debug context. Patch 3 makes this worse
>    instead of fixing it. The fix would make use of helpers from Mark R's
>    series.
>
>
> Thanks,
>
> James
>
>
> [0] https://lore.kernel.org/linux-arm-kernel/20190129184902.102850-1-james.morse@arm.com/
> [1] https://lore.kernel.org/linux-acpi/1506516620-20033-3-git-send-email-xiexiuqi@huawei.com/
>
> James Morse (3):
>   mm/memory-failure: Add memory_failure_queue_kick()
>   ACPI / APEI: Kick the memory_failure() queue for synchronous errors
>   arm64: acpi: Make apei_claim_sea() synchronise with APEI's irq work
>
>  arch/arm64/kernel/acpi.c | 25 +++++++++++++++
>  arch/arm64/mm/fault.c    | 12 ++++---
>  drivers/acpi/apei/ghes.c | 68 +++++++++++++++++++++++++++++++++-------
>  include/acpi/ghes.h      |  3 ++
>  include/linux/mm.h       |  1 +
>  mm/memory-failure.c      | 15 ++++++++-
>  6 files changed, 107 insertions(+), 17 deletions(-)
>
> --
> 2.24.1