diff mbox

[08/11] signal/arm: Document conflicts with SI_USER and SIGFPE

Message ID 20180112005940.23279-8-ebiederm@xmission.com (mailing list archive)
State New, archived
Headers show

Commit Message

Eric W. Biederman Jan. 12, 2018, 12:59 a.m. UTC
Setting si_code to 0 results in a userspace seeing an si_code of 0.
This is the same si_code as SI_USER.  Posix and common sense requires
that SI_USER not be a signal specific si_code.  As such this use of 0
for the si_code is a pretty horribly broken ABI.

Further use of si_code == 0 guaranteed that copy_siginfo_to_user saw a
value of __SI_KILL and now sees a value of SIL_KILL with the result
that uid and pid fields are copied and which might copying the si_addr
field by accident but certainly not by design.  Making this a very
flakey implementation.

Utilizing FPE_FIXME, siginfo_layout will now return SIL_FAULT and the
appropriate fields will be reliably copied.

Possible ABI fixes includee:
- Send the signal without siginfo
- Don't generate a signal
- Possibly assign and use an appropriate si_code
- Don't handle cases which can't happen

Cc: Russell King <rmk@flint.arm.linux.org.uk>
Cc: linux-arm-kernel@lists.infradead.org
Ref: 451436b7bbb2 ("[ARM] Add support code for ARM hardware vector floating point")
History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 arch/arm/include/uapi/asm/siginfo.h | 13 +++++++++++++
 arch/arm/vfp/vfpmodule.c            |  2 +-
 2 files changed, 14 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm/include/uapi/asm/siginfo.h

Comments

Russell King (Oracle) Jan. 15, 2018, 5:49 p.m. UTC | #1
On Thu, Jan 11, 2018 at 06:59:37PM -0600, Eric W. Biederman wrote:
> Setting si_code to 0 results in a userspace seeing an si_code of 0.
> This is the same si_code as SI_USER.  Posix and common sense requires
> that SI_USER not be a signal specific si_code.  As such this use of 0
> for the si_code is a pretty horribly broken ABI.
> 
> Further use of si_code == 0 guaranteed that copy_siginfo_to_user saw a
> value of __SI_KILL and now sees a value of SIL_KILL with the result
> that uid and pid fields are copied and which might copying the si_addr
> field by accident but certainly not by design.  Making this a very
> flakey implementation.
> 
> Utilizing FPE_FIXME, siginfo_layout will now return SIL_FAULT and the
> appropriate fields will be reliably copied.

So what do you suggest when none of the SIGFPE FPE_xxx codes match the
condition that "we don't know what happened" ?  Raise a SIGKILL instead
maybe?  We will have dumped the VFP state into the kernel log at this
point, things are pretty much fscked.

It's probably an impossible condition unless the hardware has failed,
no one has knowingly reported getting such a dump in their kernel log,
so it's something that could very likely be changed in some way
without anyone noticing.
Eric W. Biederman Jan. 15, 2018, 8:12 p.m. UTC | #2
Russell King - ARM Linux <linux@armlinux.org.uk> writes:

> On Thu, Jan 11, 2018 at 06:59:37PM -0600, Eric W. Biederman wrote:
>> Setting si_code to 0 results in a userspace seeing an si_code of 0.
>> This is the same si_code as SI_USER.  Posix and common sense requires
>> that SI_USER not be a signal specific si_code.  As such this use of 0
>> for the si_code is a pretty horribly broken ABI.
>> 
>> Further use of si_code == 0 guaranteed that copy_siginfo_to_user saw a
>> value of __SI_KILL and now sees a value of SIL_KILL with the result
>> that uid and pid fields are copied and which might copying the si_addr
>> field by accident but certainly not by design.  Making this a very
>> flakey implementation.
>> 
>> Utilizing FPE_FIXME, siginfo_layout will now return SIL_FAULT and the
>> appropriate fields will be reliably copied.
>
> So what do you suggest when none of the SIGFPE FPE_xxx codes match the
> condition that "we don't know what happened" ?  Raise a SIGKILL instead
> maybe?  We will have dumped the VFP state into the kernel log at this
> point, things are pretty much fscked.
>
> It's probably an impossible condition unless the hardware has failed,
> no one has knowingly reported getting such a dump in their kernel log,
> so it's something that could very likely be changed in some way
> without anyone noticing.

It sounds like we have two equally valid possible solutions:
1) force_sig(SIGKILL, current);
2) Allocate a new FPE_xxx code in asm-generic/siginfo.h
   I believe the next available number is 15.

If no one is going to notice this should be comparatively easy to fix.
I just don't have the knowledge of arm to make the judgement myself.

Eric
Dave Martin Jan. 16, 2018, 5:41 p.m. UTC | #3
On Mon, Jan 15, 2018 at 05:49:47PM +0000, Russell King - ARM Linux wrote:
> On Thu, Jan 11, 2018 at 06:59:37PM -0600, Eric W. Biederman wrote:
> > Setting si_code to 0 results in a userspace seeing an si_code of 0.
> > This is the same si_code as SI_USER.  Posix and common sense requires
> > that SI_USER not be a signal specific si_code.  As such this use of 0
> > for the si_code is a pretty horribly broken ABI.
> > 
> > Further use of si_code == 0 guaranteed that copy_siginfo_to_user saw a
> > value of __SI_KILL and now sees a value of SIL_KILL with the result
> > that uid and pid fields are copied and which might copying the si_addr
> > field by accident but certainly not by design.  Making this a very
> > flakey implementation.
> > 
> > Utilizing FPE_FIXME, siginfo_layout will now return SIL_FAULT and the
> > appropriate fields will be reliably copied.
> 
> So what do you suggest when none of the SIGFPE FPE_xxx codes match the
> condition that "we don't know what happened" ?  Raise a SIGKILL instead
> maybe?  We will have dumped the VFP state into the kernel log at this
> point, things are pretty much fscked.
> 
> It's probably an impossible condition unless the hardware has failed,
> no one has knowingly reported getting such a dump in their kernel log,
> so it's something that could very likely be changed in some way
> without anyone noticing.

arm64 can optionally not tell you exactly what exception(s) happened for 
vector operations.  This may also be true for 32-bit.

Unfortunately, there's nothing sensible to report in this case, and
there is no such constant as FPE_UNKNOWN.

Currently arm64 sets si_code to 0, which doesn't match any FPE_*
constant (but unfortunately matches SI_USER) -- defining a new, distinct
FPE_UNKNOWN is probably no worse than that. 

I'm not sure yet whether a similar argument applies for 32-bit.

Cheers
---Dave
Dave Martin Jan. 19, 2018, 12:05 p.m. UTC | #4
On Mon, Jan 15, 2018 at 05:49:47PM +0000, Russell King - ARM Linux wrote:
> On Thu, Jan 11, 2018 at 06:59:37PM -0600, Eric W. Biederman wrote:
> > Setting si_code to 0 results in a userspace seeing an si_code of 0.
> > This is the same si_code as SI_USER.  Posix and common sense requires
> > that SI_USER not be a signal specific si_code.  As such this use of 0
> > for the si_code is a pretty horribly broken ABI.
> > 
> > Further use of si_code == 0 guaranteed that copy_siginfo_to_user saw a
> > value of __SI_KILL and now sees a value of SIL_KILL with the result
> > that uid and pid fields are copied and which might copying the si_addr
> > field by accident but certainly not by design.  Making this a very
> > flakey implementation.
> > 
> > Utilizing FPE_FIXME, siginfo_layout will now return SIL_FAULT and the
> > appropriate fields will be reliably copied.
> 
> So what do you suggest when none of the SIGFPE FPE_xxx codes match the
> condition that "we don't know what happened" ?  Raise a SIGKILL instead
> maybe?  We will have dumped the VFP state into the kernel log at this
> point, things are pretty much fscked.
> 
> It's probably an impossible condition unless the hardware has failed,
> no one has knowingly reported getting such a dump in their kernel log,
> so it's something that could very likely be changed in some way
> without anyone noticing.

Relating to this, what's your view on how to clean up the si_code zeros
in fsr-2level.c and fsr-3level.c?

Due to the historical evolution of the fault codes I'm less
confident of getting these right than for arm64.

Many are things that shouldn't happen and likely indicate a kernel bug
or system failure if they do, so at least some of the
{ do_bad, SIGxxx, 0, ... } entries can probably be changed to
{ do_bad, SIGKILL, SI_KERNEL, ... } with no ill effects.  But there
are many fault codes whose meaning has changed over time.

Cheers
---Dave
diff mbox

Patch

diff --git a/arch/arm/include/uapi/asm/siginfo.h b/arch/arm/include/uapi/asm/siginfo.h
new file mode 100644
index 000000000000..d0513880be21
--- /dev/null
+++ b/arch/arm/include/uapi/asm/siginfo.h
@@ -0,0 +1,13 @@ 
+#ifndef __ASM_SIGINFO_H
+#define __ASM_SIGINFO_H
+
+#include <asm-generic/siginfo.h>
+
+/*
+ * SIGFPE si_codes
+ */
+#ifdef __KERNEL__
+#define FPE_FIXME	0	/* Broken dup of SI_USER */
+#endif /* __KERNEL__ */
+
+#endif
diff --git a/arch/arm/vfp/vfpmodule.c b/arch/arm/vfp/vfpmodule.c
index a71a48e71fff..03c6a3c72f9c 100644
--- a/arch/arm/vfp/vfpmodule.c
+++ b/arch/arm/vfp/vfpmodule.c
@@ -257,7 +257,7 @@  static void vfp_raise_exceptions(u32 exceptions, u32 inst, u32 fpscr, struct pt_
 
 	if (exceptions == VFP_EXCEPTION_ERROR) {
 		vfp_panic("unhandled bounce", inst);
-		vfp_raise_sigfpe(0, regs);
+		vfp_raise_sigfpe(FPE_FIXME, regs);
 		return;
 	}