Message ID | 20180112005940.23279-8-ebiederm@xmission.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Thu, Jan 11, 2018 at 06:59:37PM -0600, Eric W. Biederman wrote: > Setting si_code to 0 results in a userspace seeing an si_code of 0. > This is the same si_code as SI_USER. Posix and common sense requires > that SI_USER not be a signal specific si_code. As such this use of 0 > for the si_code is a pretty horribly broken ABI. > > Further use of si_code == 0 guaranteed that copy_siginfo_to_user saw a > value of __SI_KILL and now sees a value of SIL_KILL with the result > that uid and pid fields are copied and which might copying the si_addr > field by accident but certainly not by design. Making this a very > flakey implementation. > > Utilizing FPE_FIXME, siginfo_layout will now return SIL_FAULT and the > appropriate fields will be reliably copied. So what do you suggest when none of the SIGFPE FPE_xxx codes match the condition that "we don't know what happened" ? Raise a SIGKILL instead maybe? We will have dumped the VFP state into the kernel log at this point, things are pretty much fscked. It's probably an impossible condition unless the hardware has failed, no one has knowingly reported getting such a dump in their kernel log, so it's something that could very likely be changed in some way without anyone noticing.
Russell King - ARM Linux <linux@armlinux.org.uk> writes: > On Thu, Jan 11, 2018 at 06:59:37PM -0600, Eric W. Biederman wrote: >> Setting si_code to 0 results in a userspace seeing an si_code of 0. >> This is the same si_code as SI_USER. Posix and common sense requires >> that SI_USER not be a signal specific si_code. As such this use of 0 >> for the si_code is a pretty horribly broken ABI. >> >> Further use of si_code == 0 guaranteed that copy_siginfo_to_user saw a >> value of __SI_KILL and now sees a value of SIL_KILL with the result >> that uid and pid fields are copied and which might copying the si_addr >> field by accident but certainly not by design. Making this a very >> flakey implementation. >> >> Utilizing FPE_FIXME, siginfo_layout will now return SIL_FAULT and the >> appropriate fields will be reliably copied. > > So what do you suggest when none of the SIGFPE FPE_xxx codes match the > condition that "we don't know what happened" ? Raise a SIGKILL instead > maybe? We will have dumped the VFP state into the kernel log at this > point, things are pretty much fscked. > > It's probably an impossible condition unless the hardware has failed, > no one has knowingly reported getting such a dump in their kernel log, > so it's something that could very likely be changed in some way > without anyone noticing. It sounds like we have two equally valid possible solutions: 1) force_sig(SIGKILL, current); 2) Allocate a new FPE_xxx code in asm-generic/siginfo.h I believe the next available number is 15. If no one is going to notice this should be comparatively easy to fix. I just don't have the knowledge of arm to make the judgement myself. Eric
On Mon, Jan 15, 2018 at 05:49:47PM +0000, Russell King - ARM Linux wrote: > On Thu, Jan 11, 2018 at 06:59:37PM -0600, Eric W. Biederman wrote: > > Setting si_code to 0 results in a userspace seeing an si_code of 0. > > This is the same si_code as SI_USER. Posix and common sense requires > > that SI_USER not be a signal specific si_code. As such this use of 0 > > for the si_code is a pretty horribly broken ABI. > > > > Further use of si_code == 0 guaranteed that copy_siginfo_to_user saw a > > value of __SI_KILL and now sees a value of SIL_KILL with the result > > that uid and pid fields are copied and which might copying the si_addr > > field by accident but certainly not by design. Making this a very > > flakey implementation. > > > > Utilizing FPE_FIXME, siginfo_layout will now return SIL_FAULT and the > > appropriate fields will be reliably copied. > > So what do you suggest when none of the SIGFPE FPE_xxx codes match the > condition that "we don't know what happened" ? Raise a SIGKILL instead > maybe? We will have dumped the VFP state into the kernel log at this > point, things are pretty much fscked. > > It's probably an impossible condition unless the hardware has failed, > no one has knowingly reported getting such a dump in their kernel log, > so it's something that could very likely be changed in some way > without anyone noticing. arm64 can optionally not tell you exactly what exception(s) happened for vector operations. This may also be true for 32-bit. Unfortunately, there's nothing sensible to report in this case, and there is no such constant as FPE_UNKNOWN. Currently arm64 sets si_code to 0, which doesn't match any FPE_* constant (but unfortunately matches SI_USER) -- defining a new, distinct FPE_UNKNOWN is probably no worse than that. I'm not sure yet whether a similar argument applies for 32-bit. Cheers ---Dave
On Mon, Jan 15, 2018 at 05:49:47PM +0000, Russell King - ARM Linux wrote: > On Thu, Jan 11, 2018 at 06:59:37PM -0600, Eric W. Biederman wrote: > > Setting si_code to 0 results in a userspace seeing an si_code of 0. > > This is the same si_code as SI_USER. Posix and common sense requires > > that SI_USER not be a signal specific si_code. As such this use of 0 > > for the si_code is a pretty horribly broken ABI. > > > > Further use of si_code == 0 guaranteed that copy_siginfo_to_user saw a > > value of __SI_KILL and now sees a value of SIL_KILL with the result > > that uid and pid fields are copied and which might copying the si_addr > > field by accident but certainly not by design. Making this a very > > flakey implementation. > > > > Utilizing FPE_FIXME, siginfo_layout will now return SIL_FAULT and the > > appropriate fields will be reliably copied. > > So what do you suggest when none of the SIGFPE FPE_xxx codes match the > condition that "we don't know what happened" ? Raise a SIGKILL instead > maybe? We will have dumped the VFP state into the kernel log at this > point, things are pretty much fscked. > > It's probably an impossible condition unless the hardware has failed, > no one has knowingly reported getting such a dump in their kernel log, > so it's something that could very likely be changed in some way > without anyone noticing. Relating to this, what's your view on how to clean up the si_code zeros in fsr-2level.c and fsr-3level.c? Due to the historical evolution of the fault codes I'm less confident of getting these right than for arm64. Many are things that shouldn't happen and likely indicate a kernel bug or system failure if they do, so at least some of the { do_bad, SIGxxx, 0, ... } entries can probably be changed to { do_bad, SIGKILL, SI_KERNEL, ... } with no ill effects. But there are many fault codes whose meaning has changed over time. Cheers ---Dave
diff --git a/arch/arm/include/uapi/asm/siginfo.h b/arch/arm/include/uapi/asm/siginfo.h new file mode 100644 index 000000000000..d0513880be21 --- /dev/null +++ b/arch/arm/include/uapi/asm/siginfo.h @@ -0,0 +1,13 @@ +#ifndef __ASM_SIGINFO_H +#define __ASM_SIGINFO_H + +#include <asm-generic/siginfo.h> + +/* + * SIGFPE si_codes + */ +#ifdef __KERNEL__ +#define FPE_FIXME 0 /* Broken dup of SI_USER */ +#endif /* __KERNEL__ */ + +#endif diff --git a/arch/arm/vfp/vfpmodule.c b/arch/arm/vfp/vfpmodule.c index a71a48e71fff..03c6a3c72f9c 100644 --- a/arch/arm/vfp/vfpmodule.c +++ b/arch/arm/vfp/vfpmodule.c @@ -257,7 +257,7 @@ static void vfp_raise_exceptions(u32 exceptions, u32 inst, u32 fpscr, struct pt_ if (exceptions == VFP_EXCEPTION_ERROR) { vfp_panic("unhandled bounce", inst); - vfp_raise_sigfpe(0, regs); + vfp_raise_sigfpe(FPE_FIXME, regs); return; }
Setting si_code to 0 results in a userspace seeing an si_code of 0. This is the same si_code as SI_USER. Posix and common sense requires that SI_USER not be a signal specific si_code. As such this use of 0 for the si_code is a pretty horribly broken ABI. Further use of si_code == 0 guaranteed that copy_siginfo_to_user saw a value of __SI_KILL and now sees a value of SIL_KILL with the result that uid and pid fields are copied and which might copying the si_addr field by accident but certainly not by design. Making this a very flakey implementation. Utilizing FPE_FIXME, siginfo_layout will now return SIL_FAULT and the appropriate fields will be reliably copied. Possible ABI fixes includee: - Send the signal without siginfo - Don't generate a signal - Possibly assign and use an appropriate si_code - Don't handle cases which can't happen Cc: Russell King <rmk@flint.arm.linux.org.uk> Cc: linux-arm-kernel@lists.infradead.org Ref: 451436b7bbb2 ("[ARM] Add support code for ARM hardware vector floating point") History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> --- arch/arm/include/uapi/asm/siginfo.h | 13 +++++++++++++ arch/arm/vfp/vfpmodule.c | 2 +- 2 files changed, 14 insertions(+), 1 deletion(-) create mode 100644 arch/arm/include/uapi/asm/siginfo.h