Message ID | 20210326000235.370514-1-tony.luck@intel.com (mailing list archive) |
---|---|
Series | Fix machine check recovery for copy_from_user |
On Thu, 25 Mar 2021 17:02:31 -0700
Tony Luck <tony.luck@intel.com> wrote:

> Maybe this is the way forward? I made some poor choices before
> to treat poison consumption in the kernel when accessing user data
> (get_user() or copy_from_user()) ... in particular assuming that
> the right action was sending a SIGBUS to the task as if it had
> synchronously accessed the poison location.
>
> First three patches may need to be combined (or broken up differently)
> for bisectability. But they are presented separately here since they
> touch separate parts of the problem.
>
> Second part is definitely incomplete. But I'd like to check that it
> is the right approach before expending more brain cells in the maze
> of nested macros that is lib/iov_iter.c
>
> Last part has been posted before. It covers the case where the kernel
> takes more than one swing at reading poison data before returning to
> user.
>
> Tony Luck (4):
>   x86/mce: Fix copyin code to return -EFAULT on machine check.
>   mce/iter: Check for copyin failure & return error up stack
>   mce/copyin: fix to not SIGBUS when copying from user hits poison
>   x86/mce: Avoid infinite loop for copy from user recovery
>
>  arch/x86/kernel/cpu/mce/core.c     | 63 +++++++++++++++++++++---------
>  arch/x86/kernel/cpu/mce/severity.c |  2 -
>  arch/x86/lib/copy_user_64.S        | 18 +++++----
>  fs/iomap/buffered-io.c             |  8 +++-
>  include/linux/sched.h              |  2 +-
>  include/linux/uio.h                |  2 +-
>  lib/iov_iter.c                     | 15 ++++++-
>  7 files changed, 77 insertions(+), 33 deletions(-)
>

I have one scenario that you may want to take into account:

If a copyin case occurs and write() returns an error with your patch, the
user process may check the return value and, on error, exit. The error page
will then be freed and may be allocated to another process or to the kernel
itself. The new owner will initialize the page, and this will trigger an
SRAO. If the page is used by the kernel, we may do nothing about this, the
kernel may still touch the page, and that leads to a panic.

Is this what we expect?

Thanks! Aili Yao
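
The userspace half of this scenario (a caller that checks the return value of write() and treats an error as fatal) can be sketched as below. This is only an illustrative sketch, not code from the patch series: the helper name `write_all_or_errno` is hypothetical, and it assumes that a failed copy from the user buffer surfaces to the caller as `errno == EFAULT`, which is what the first patch title describes.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the patch series): write len bytes
 * to fd. Returns 0 on success or a negative errno value on failure,
 * so a failed copy from the user buffer shows up as -EFAULT.
 */
static int write_all_or_errno(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	assert(buf != NULL || len == 0);

	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, retry */
			return -errno;		/* e.g. -EFAULT */
		}
		if (n == 0)
			return -EIO;		/* no progress, give up */
		p += n;
		len -= (size_t)n;
	}
	return 0;
}
```

A caller that exits as soon as this helper returns -EFAULT is the case the question above is about: the process goes away while the page it was writing from is still poisoned.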
> I have one scenario that you may want to take into account:
>
> If a copyin case occurs and write() returns an error with your patch, the
> user process may check the return value and, on error, exit. The error page
> will then be freed and may be allocated to another process or to the kernel
> itself. The new owner will initialize the page, and this will trigger an
> SRAO. If the page is used by the kernel, we may do nothing about this, the
> kernel may still touch the page, and that leads to a panic.

In this case kill_me_never() calls memory_failure() with flags == 0. I think (hope!)
that means that it will unmap the page from the task, but will not send a signal.

When the task exits, the PTE for this page has the swap/poison signature, so the
page is not freed for re-use.

-Tony

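
The argument in this reply can be restated as a small toy model. The sketch below is not kernel code and does not use any kernel API: every type and function in it (`toy_page`, `toy_task`, `toy_memory_failure`, `toy_task_exit`) is hypothetical, and it simply encodes the behavior the reply hopes for, namely that the page is unmapped without a signal and that a poison entry left in the page table keeps the page off the free list at exit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the states discussed above; not kernel types. */
enum toy_pte { TOY_PTE_PRESENT, TOY_PTE_POISON };

struct toy_page {
	bool hwpoison;		/* page is known to contain poison */
	bool on_free_list;	/* page is available for re-use */
};

struct toy_task {
	enum toy_pte pte;	/* the task's mapping of the page */
	struct toy_page *page;
	int signals_sent;	/* SIGBUS count in this model */
};

/*
 * Models the hoped-for effect of memory_failure() with flags == 0:
 * mark the page poisoned, replace the mapping with a poison entry,
 * and only signal the task when explicitly asked to.
 */
static void toy_memory_failure(struct toy_task *t, bool send_signal)
{
	assert(t->page != NULL);
	t->page->hwpoison = true;
	t->pte = TOY_PTE_POISON;
	if (send_signal)
		t->signals_sent++;
}

/*
 * Models task exit: only a present mapping gives its page back to
 * the free list. A poison entry does not.
 */
static void toy_task_exit(struct toy_task *t)
{
	assert(t->page != NULL);
	if (t->pte == TOY_PTE_PRESENT)
		t->page->on_free_list = true;
	t->page = NULL;
}
```

In this model, calling `toy_memory_failure(&task, false)` followed by `toy_task_exit(&task)` leaves the page poisoned, off the free list, and with no signal recorded, which is the outcome the reply describes.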
On Thu, 8 Apr 2021 14:39:09 +0000
"Luck, Tony" <tony.luck@intel.com> wrote:

> > I have one scenario that you may want to take into account:
> >
> > If a copyin case occurs and write() returns an error with your patch, the
> > user process may check the return value and, on error, exit. The error page
> > will then be freed and may be allocated to another process or to the kernel
> > itself. The new owner will initialize the page, and this will trigger an
> > SRAO. If the page is used by the kernel, we may do nothing about this, the
> > kernel may still touch the page, and that leads to a panic.
>
> In this case kill_me_never() calls memory_failure() with flags == 0. I think (hope!)
> that means that it will unmap the page from the task, but will not send a signal.
>
> When the task exits, the PTE for this page has the swap/poison signature, so the
> page is not freed for re-use.
>
> -Tony

Oh, yes. Sorry for my rudeness and my misunderstanding; I couldn't control my
emotions and got confused by some other things.

Thanks!
Aili Yao