Message ID | 20180921221705.6478-18-james.morse@arm.com (mailing list archive)
---|---
State | New, archived
Series | APEI in_nmi() rework
+ Peter.

On Fri, Sep 21, 2018 at 11:17:04PM +0100, James Morse wrote:
> arm64 can take an NMI-like error notification when user-space steps in
> some corrupt memory. APEI's GHES code will call memory_failure_queue()
> to schedule the recovery work. We then return to user-space, possibly
> taking the fault again.
>
> Currently the arch code unconditionally signals user-space from this
> path, so we don't get stuck in this loop, but the affected process
> never benefits from memory_failure()s recovery work. To fix this we
> need to know the recovery work will run before we get back to user-space.
>
> Increase the priority of the recovery work by scheduling it on the
> system_highpri_wq, then try to bump the current task off this CPU
> so that the recovery work starts immediately.
>
> Reported-by: Xie XiuQi <xiexiuqi@huawei.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Punit Agrawal <punit.agrawal@arm.com>
> Tested-by: Tyler Baicar <tbaicar@codeaurora.org>
> Tested-by: gengdongjiu <gengdongjiu@huawei.com>
> CC: Xie XiuQi <xiexiuqi@huawei.com>
> CC: gengdongjiu <gengdongjiu@huawei.com>
> ---
>  mm/memory-failure.c | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 0cd3de3550f0..4e7b115cea5a 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -56,6 +56,7 @@
>  #include <linux/memory_hotplug.h>
>  #include <linux/mm_inline.h>
>  #include <linux/memremap.h>
> +#include <linux/preempt.h>
>  #include <linux/kfifo.h>
>  #include <linux/ratelimit.h>
>  #include <linux/page-isolation.h>
> @@ -1454,6 +1455,7 @@ static DEFINE_PER_CPU(struct memory_failure_cpu, memory_failure_cpu);
>   */
>  void memory_failure_queue(unsigned long pfn, int flags)
>  {
> +	int cpu = smp_processor_id();
>  	struct memory_failure_cpu *mf_cpu;
>  	unsigned long proc_flags;
>  	struct memory_failure_entry entry = {
> @@ -1463,11 +1465,14 @@ void memory_failure_queue(unsigned long pfn, int flags)
>
>  	mf_cpu = &get_cpu_var(memory_failure_cpu);
>  	spin_lock_irqsave(&mf_cpu->lock, proc_flags);
> -	if (kfifo_put(&mf_cpu->fifo, entry))
> -		schedule_work_on(smp_processor_id(), &mf_cpu->work);
> -	else
> +	if (kfifo_put(&mf_cpu->fifo, entry)) {
> +		queue_work_on(cpu, system_highpri_wq, &mf_cpu->work);
> +		set_tsk_need_resched(current);
> +		preempt_set_need_resched();

What guarantees the workqueue would run before the process? I see this:

``WQ_HIGHPRI``
  Work items of a highpri wq are queued to the highpri
  worker-pool of the target cpu.  Highpri worker-pools are
  served by worker threads with elevated nice level.

but is that enough?
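For reference, the "elevated nice level" here is HIGHPRI_NICE_LEVEL in kernel/workqueue.c (MIN_NICE, i.e. -20), so a highpri kworker is a strongly favoured but still ordinary CFS task. A minimal sketch of the reschedule hint the patch pairs with the queueing, wrapped in a hypothetical helper name for illustration:

#include <linux/preempt.h>
#include <linux/sched.h>

/*
 * Sketch only: flag a reschedule so the exception-return path calls
 * schedule() before user-space runs again.  Which task CFS picks next
 * is not guaranteed - that is the gap the question above is poking at.
 */
static void hint_resched_to_highpri_worker(void)
{
	set_tsk_need_resched(current);	/* set TIF_NEED_RESCHED on this task */
	preempt_set_need_resched();	/* per-CPU copy, as in the patch */
}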
On Mon, Oct 15, 2018 at 06:49:13PM +0200, Borislav Petkov wrote:
> On Fri, Sep 21, 2018 at 11:17:04PM +0100, James Morse wrote:
> > @@ -1463,11 +1465,14 @@ void memory_failure_queue(unsigned long pfn, int flags)
> >
> >  	mf_cpu = &get_cpu_var(memory_failure_cpu);
> >  	spin_lock_irqsave(&mf_cpu->lock, proc_flags);
> > -	if (kfifo_put(&mf_cpu->fifo, entry))
> > -		schedule_work_on(smp_processor_id(), &mf_cpu->work);
> > -	else
> > +	if (kfifo_put(&mf_cpu->fifo, entry)) {
> > +		queue_work_on(cpu, system_highpri_wq, &mf_cpu->work);
> > +		set_tsk_need_resched(current);
> > +		preempt_set_need_resched();
>
> What guarantees the workqueue would run before the process? I see this:
>
> ``WQ_HIGHPRI``
>   Work items of a highpri wq are queued to the highpri
>   worker-pool of the target cpu.  Highpri worker-pools are
>   served by worker threads with elevated nice level.
>
> but is that enough?

Nope. Nice just makes it more likely, but no guarantees whatsoever.

If you want to absolutely run something before we return to userspace,
would not task_work() be what we're looking for?
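A minimal sketch of the task_work() alternative, against the 4.19-era API; the names and the static per-event storage are made up for illustration, and whether task_work_add() is safe to call from APEI's NMI-like notification path is exactly what would need checking:

#include <linux/task_work.h>
#include <linux/sched.h>

static struct callback_head mf_twork;	/* would need real per-task/per-event storage */

/* Runs in process context on the affected task, before it returns to
 * user-space - the ordering guarantee the highpri workqueue cannot give. */
static void mf_recover_on_resume(struct callback_head *twork)
{
	/* ... run, or wait for, the memory_failure() recovery here ... */
}

static int queue_mf_recovery(void)
{
	init_task_work(&mf_twork, mf_recover_on_resume);
	/* 4.19-era signature: notify=true sets TIF_NOTIFY_RESUME so the
	 * callback runs on the way back to user-space; returns -ESRCH if
	 * the task is already exiting. */
	return task_work_add(current, &mf_twork, true);
}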
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 0cd3de3550f0..4e7b115cea5a 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -56,6 +56,7 @@
 #include <linux/memory_hotplug.h>
 #include <linux/mm_inline.h>
 #include <linux/memremap.h>
+#include <linux/preempt.h>
 #include <linux/kfifo.h>
 #include <linux/ratelimit.h>
 #include <linux/page-isolation.h>
@@ -1454,6 +1455,7 @@ static DEFINE_PER_CPU(struct memory_failure_cpu, memory_failure_cpu);
  */
 void memory_failure_queue(unsigned long pfn, int flags)
 {
+	int cpu = smp_processor_id();
 	struct memory_failure_cpu *mf_cpu;
 	unsigned long proc_flags;
 	struct memory_failure_entry entry = {
@@ -1463,11 +1465,14 @@ void memory_failure_queue(unsigned long pfn, int flags)
 
 	mf_cpu = &get_cpu_var(memory_failure_cpu);
 	spin_lock_irqsave(&mf_cpu->lock, proc_flags);
-	if (kfifo_put(&mf_cpu->fifo, entry))
-		schedule_work_on(smp_processor_id(), &mf_cpu->work);
-	else
+	if (kfifo_put(&mf_cpu->fifo, entry)) {
+		queue_work_on(cpu, system_highpri_wq, &mf_cpu->work);
+		set_tsk_need_resched(current);
+		preempt_set_need_resched();
+	} else {
 		pr_err("Memory failure: buffer overflow when queuing memory failure at %#lx\n",
 		       pfn);
+	}
 	spin_unlock_irqrestore(&mf_cpu->lock, proc_flags);
 	put_cpu_var(memory_failure_cpu);
 }
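For completeness, the work item whose priority the patch raises drains a per-CPU kfifo; roughly, paraphrased from the 4.19-era mm/memory-failure.c:

/* Until this work runs, the entry queued above sits unprocessed -
 * hence the concern with getting it scheduled before the faulting
 * task returns to user-space. */
static void memory_failure_work_func(struct work_struct *work)
{
	struct memory_failure_cpu *mf_cpu;
	struct memory_failure_entry entry = { 0, };
	unsigned long proc_flags;
	int gotten;

	mf_cpu = this_cpu_ptr(&memory_failure_cpu);
	for (;;) {
		spin_lock_irqsave(&mf_cpu->lock, proc_flags);
		gotten = kfifo_get(&mf_cpu->fifo, &entry);
		spin_unlock_irqrestore(&mf_cpu->lock, proc_flags);
		if (!gotten)
			break;
		if (entry.flags & MF_SOFT_OFFLINE)
			soft_offline_page(pfn_to_page(entry.pfn), entry.flags);
		else
			memory_failure(entry.pfn, entry.flags);
	}
}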