Message ID | 20210120162422.0ed3dd56.yaoaili@kingsoft.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v5] mm,hwpoison: send SIGBUS to PF_MCE_EARLY processes on action required events | expand |
On Wed, Jan 20, 2021 at 04:24:22PM +0800, Aili Yao wrote: > When a memory uncorrected error is triggered by process who accessed > the address with error, It's Action Required Case for only current > process which triggered this; This Action Required case means Action > optional to other process who share the same page. Usually killing > current process will be sufficient, other processes sharing the same > page will get be signaled when they really touch the poisoned page. > > But there is another scenario that other processes > sharing the same page want to be signaled early with PF_MCE_EARLY set, > In this case, we should get them into kill list and signal > BUS_MCEERR_AO to them. > > So in this patch, task_early_kill will check current process if > force_early is set, and if not current,the code will fallback to > find_early_kill_thread() to check if there is PF_MCE_EARLY process > who cares the error. > > In kill_proc(), BUS_MCEERR_AR is only send to current, other processes in > kill list will be signaled with BUS_MCEERR_AO. > > Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> > Signed-off-by: Aili Yao <yaoaili@kingsoft.com> Looks good to me, a few nits below. Reviewed-by: Oscar Salvador <osalvador@suse.de> > @@ -243,9 +243,12 @@ static int kill_proc(struct to_kill *tk, unsigned long pfn, int flags) > pfn, t->comm, t->pid); > > if (flags & MF_ACTION_REQUIRED) { > - WARN_ON_ONCE(t != current); > - ret = force_sig_mceerr(BUS_MCEERR_AR, > + if (tk->tsk == current) You can re-use "t" here. > + ret = force_sig_mceerr(BUS_MCEERR_AR, > (void __user *)tk->addr, addr_lsb); > + else > + ret = send_sig_mceerr(BUS_MCEERR_AO, (void __user *)tk->addr, > + addr_lsb, t); I would place a brief comment above explaining why we are sending BUS_MCEER_AO to non-current tasks. E.g: "Signal other processes sharing the page if they have PF_MCE_EARLY set" > @@ -457,8 +463,6 @@ static struct task_struct *task_early_kill(struct task_struct *tsk, > */ > if (tsk->mm == current->mm) > return current; > - else > - return NULL; if (force_early && task->mm == current->mm) return current;
On Thu, 21 Jan 2021 11:03:05 +0100 Oscar Salvador <osalvador@suse.de> wrote: > On Wed, Jan 20, 2021 at 04:24:22PM +0800, Aili Yao wrote: > > When a memory uncorrected error is triggered by process who accessed > > the address with error, It's Action Required Case for only current > > process which triggered this; This Action Required case means Action > > optional to other process who share the same page. Usually killing > > current process will be sufficient, other processes sharing the same > > page will get be signaled when they really touch the poisoned page. > > > > But there is another scenario that other processes > > sharing the same page want to be signaled early with PF_MCE_EARLY set, > > In this case, we should get them into kill list and signal > > BUS_MCEERR_AO to them. > > > > So in this patch, task_early_kill will check current process if > > force_early is set, and if not current,the code will fallback to > > find_early_kill_thread() to check if there is PF_MCE_EARLY process > > who cares the error. > > > > In kill_proc(), BUS_MCEERR_AR is only send to current, other processes in > > kill list will be signaled with BUS_MCEERR_AO. > > > > Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> > > Signed-off-by: Aili Yao <yaoaili@kingsoft.com> > > Looks good to me, a few nits below. > > Reviewed-by: Oscar Salvador <osalvador@suse.de> > > > > @@ -243,9 +243,12 @@ static int kill_proc(struct to_kill *tk, unsigned long pfn, int flags) > > pfn, t->comm, t->pid); > > > > if (flags & MF_ACTION_REQUIRED) { > > - WARN_ON_ONCE(t != current); > > - ret = force_sig_mceerr(BUS_MCEERR_AR, > > + if (tk->tsk == current) > You can re-use "t" here. yeah, this look better, I will change to that. Thanks! > > > + ret = force_sig_mceerr(BUS_MCEERR_AR, > > (void __user *)tk->addr, addr_lsb); > > + else > > + ret = send_sig_mceerr(BUS_MCEERR_AO, (void __user *)tk->addr, > > + addr_lsb, t); > > I would place a brief comment above explaining why we are sending BUS_MCEER_AO > to non-current tasks. > E.g: "Signal other processes sharing the page if they have PF_MCE_EARLY set" Yes, it's good to have > > > @@ -457,8 +463,6 @@ static struct task_struct *task_early_kill(struct task_struct *tsk, > > */ > > if (tsk->mm == current->mm) > > return current; > > - else > > - return NULL; > > if (force_early && task->mm == current->mm) > return current; > > This looks more cleaner. I will modify above in a v6 patch! Thanks,
diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 5a38e9eade94..441fe2787cd4 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -243,9 +243,12 @@ static int kill_proc(struct to_kill *tk, unsigned long pfn, int flags) pfn, t->comm, t->pid); if (flags & MF_ACTION_REQUIRED) { - WARN_ON_ONCE(t != current); - ret = force_sig_mceerr(BUS_MCEERR_AR, + if (tk->tsk == current) + ret = force_sig_mceerr(BUS_MCEERR_AR, (void __user *)tk->addr, addr_lsb); + else + ret = send_sig_mceerr(BUS_MCEERR_AO, (void __user *)tk->addr, + addr_lsb, t); } else { /* * Don't use force here, it's convenient if the signal @@ -440,10 +443,13 @@ static struct task_struct *find_early_kill_thread(struct task_struct *tsk) * Determine whether a given process is "early kill" process which expects * to be signaled when some page under the process is hwpoisoned. * Return task_struct of the dedicated thread (main thread unless explicitly - * specified) if the process is "early kill," and otherwise returns NULL. + * specified) if the process is "early kill" and otherwise returns NULL. * - * Note that the above is true for Action Optional case, but not for Action - * Required case where SIGBUS should sent only to the current thread. + * Note that the above is true for Action Optional case. For Action Required + * case, it's only meaningful to the current thread which need to be signaled + * with SIGBUS, this error is Action Optional for other non current + * processes sharing the same error page,if the process is "early kill", the + * task_struct of the dedicated thread will also be returned. */ static struct task_struct *task_early_kill(struct task_struct *tsk, int force_early) @@ -457,8 +463,6 @@ static struct task_struct *task_early_kill(struct task_struct *tsk, */ if (tsk->mm == current->mm) return current; - else - return NULL; } return find_early_kill_thread(tsk); }