Message ID | CA+55aFwoThxqUeAZSsMhf--ODyhGkmOENH5R6=4+CuaopFx9eA@mail.gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Fri, Apr 13, 2018 at 09:33:17AM -0700, Linus Torvalds wrote: > On Fri, Apr 13, 2018 at 2:42 AM, Russell King - ARM Linux > <linux@armlinux.org.uk> wrote: > > > > Yes, it does solve the problem at hand with strace - the exact patch I > > tested against 4.16 is below. > > Ok, good. > > > However, FPE_FLTUNK is not defined in older kernels, so while we can > > fix it this way for the current merge window, that doesn't help 4.16. > > I wonder if we should even bother with FPE_FLTUNK. > > I suspect we might as well use FPE_FLTINV, I suspect, and not have > this complexity at all. That case is not worth worrying about, since > it's a "this shouldn't happen anyway" and the *real* reason will be in > the kernel logs due to vfs_panic(). > > So it's not like this is something that the user should ever care > about the si_code about. Ack, my intended meaning for FPE_FLTUNK is that the fp exception is either spurious or we can't tell easily (or possibly at all) which FPE_XXX should be returned. It's up to userspace to figure it out if it really cares. Previously we were accidentally returning SI_USER in si_code for arm64. This case on arm looks like a more serious error for which FPE_FLTINV may be more appropriate anyway. [...] Cheers ---Dave
On Fri, Apr 13, 2018 at 06:08:28PM +0100, Dave Martin wrote: > On Fri, Apr 13, 2018 at 09:33:17AM -0700, Linus Torvalds wrote: > > On Fri, Apr 13, 2018 at 2:42 AM, Russell King - ARM Linux > > <linux@armlinux.org.uk> wrote: > > > > > > Yes, it does solve the problem at hand with strace - the exact patch I > > > tested against 4.16 is below. > > > > Ok, good. > > > > > However, FPE_FLTUNK is not defined in older kernels, so while we can > > > fix it this way for the current merge window, that doesn't help 4.16. > > > > I wonder if we should even bother with FPE_FLTUNK. > > > > I suspect we might as well use FPE_FLTINV, I suspect, and not have > > this complexity at all. That case is not worth worrying about, since > > it's a "this shouldn't happen anyway" and the *real* reason will be in > > the kernel logs due to vfs_panic(). > > > > So it's not like this is something that the user should ever care > > about the si_code about. > > Ack, my intended meaning for FPE_FLTUNK is that the fp exception is > either spurious or we can't tell easily (or possibly at all) which > FPE_XXX should be returned. It's up to userspace to figure it out > if it really cares. Previously we were accidentally returning SI_USER > in si_code for arm64. > > This case on arm looks like a more serious error for which FPE_FLTINV > may be more appropriate anyway. No. The cases where we get to this point are: 1. A trap concerning a coprocessor register transfer instruction (iow, move between a VFP register and ARM register.) 2. A trap concerning a coprocessor register load or save instruction. (In both of these, "concerning" means that the VFP hardware provides such an instruction as the reason for the fault, *not* that it is the faulting instruction.) 3. A combination of the exception bits (EX and DEX) on certain VFP implementations. All of these can be summarised as "the hardware went wrong in some way" rather than "the user program did something wrong." FPE_FLTINV means "floating point invalid operation". Does it really cover the case where hardware has failed, or is it intended to cover the case where userspace did something wrong and asked for an invalid operation from the FP hardware?
On Fri, Apr 13, 2018 at 10:54 AM, Russell King - ARM Linux <linux@armlinux.org.uk> wrote: > > FPE_FLTINV means "floating point invalid operation". Does it really > cover the case where hardware has failed, or is it intended to cover > the case where userspace did something wrong and asked for an invalid > operation from the FP hardware? Note that the number of people who actually look at the si_code is approximately zero. But the ones that _do_ check the si_code are certainly not going to check it against a new code that they don't know about. I suspect that if you start searching for FLT_xyz occurrences in code, approximately 100% of them are from the kernel code that generates them, not from any actual users. So I'd be very surprised if you can find *anybody* who cares about that exact value (with the possible exceptions of test-suites). Sadly, google code-search is no more. It was useful for things like that. Linus
On Fri, Apr 13, 2018 at 06:54:08PM +0100, Russell King - ARM Linux wrote: > On Fri, Apr 13, 2018 at 06:08:28PM +0100, Dave Martin wrote: > > On Fri, Apr 13, 2018 at 09:33:17AM -0700, Linus Torvalds wrote: > > > On Fri, Apr 13, 2018 at 2:42 AM, Russell King - ARM Linux > > > <linux@armlinux.org.uk> wrote: > > > > > > > > Yes, it does solve the problem at hand with strace - the exact patch I > > > > tested against 4.16 is below. > > > > > > Ok, good. > > > > > > > However, FPE_FLTUNK is not defined in older kernels, so while we can > > > > fix it this way for the current merge window, that doesn't help 4.16. > > > > > > I wonder if we should even bother with FPE_FLTUNK. > > > > > > I suspect we might as well use FPE_FLTINV, I suspect, and not have > > > this complexity at all. That case is not worth worrying about, since > > > it's a "this shouldn't happen anyway" and the *real* reason will be in > > > the kernel logs due to vfs_panic(). > > > > > > So it's not like this is something that the user should ever care > > > about the si_code about. > > > > Ack, my intended meaning for FPE_FLTUNK is that the fp exception is > > either spurious or we can't tell easily (or possibly at all) which > > FPE_XXX should be returned. It's up to userspace to figure it out > > if it really cares. Previously we were accidentally returning SI_USER > > in si_code for arm64. > > > > This case on arm looks like a more serious error for which FPE_FLTINV > > may be more appropriate anyway. > > No. The cases where we get to this point are: > > 1. A trap concerning a coprocessor register transfer instruction (iow, move > between a VFP register and ARM register.) > 2. A trap concerning a coprocessor register load or save instruction. > > (In both of these, "concerning" means that the VFP hardware provides > such an instruction as the reason for the fault, *not* that it is the > faulting instruction.) > > 3. A combination of the exception bits (EX and DEX) on certain VFP > implementations. > > All of these can be summarised as "the hardware went wrong in some way" > rather than "the user program did something wrong." Although my understanding of VFP bounces is a bit hazy, I think this is broadly in line with my assumptions. > FPE_FLTINV means "floating point invalid operation". Does it really > cover the case where hardware has failed, or is it intended to cover > the case where userspace did something wrong and asked for an invalid > operation from the FP hardware? So, there's an argument that FPE_FLTINV is not really correct. My rationale was that there is nothing correct that we can return, and FPE_FLTINV may be no worse than the alternatives. If we can only hit this case as the result of a hardware failure or kernel bug though, should this be delivered as SIGKILL instead? That's the approach I eventually followed for various exceptions on arm64 that were theoretically delivered to userspace with si_code==0, but really should be impossible unless and kernel and/or hardware is buggy. If that's the case though, I don't see how a userspace testsuite is hitting this code path. Maybe I've misunderstood the context of this thread. Cheers ---Dave
On Fri, Apr 13, 2018 at 11:23:36AM -0700, Linus Torvalds wrote: > On Fri, Apr 13, 2018 at 10:54 AM, Russell King - ARM Linux > <linux@armlinux.org.uk> wrote: > > > > FPE_FLTINV means "floating point invalid operation". Does it really > > cover the case where hardware has failed, or is it intended to cover > > the case where userspace did something wrong and asked for an invalid > > operation from the FP hardware? > > Note that the number of people who actually look at the si_code is > approximately zero. > > But the ones that _do_ check the si_code are certainly not going to > check it against a new code that they don't know about. > > I suspect that if you start searching for FLT_xyz occurrences in code, > approximately 100% of them are from the kernel code that generates > them, not from any actual users. > > So I'd be very surprised if you can find *anybody* who cares about > that exact value (with the possible exceptions of test-suites). > > Sadly, google code-search is no more. It was useful for things like that. I've found https://codesearch.debian.net/ useful for digging into this kind of question, though it tends to throw up a lot of false positives. Most uses I've seen do nothing more than use the FPE_xyz value to format diagnostic messages while dying. I struggled to find code that made a meaningful functional decision based on the value, though that's not proof... Cheers ---Dave
On Fri, Apr 13, 2018 at 07:35:38PM +0100, Dave Martin wrote: > If that's the case though, I don't see how a userspace testsuite is > hitting this code path. Maybe I've misunderstood the context of this > thread. It isn't hitting this exact case. The userspace testsuite is hitting an entirely different case: kill(getpid(), SIGFPE); As one expects, this generates a SIGFPE to the current process, which then inspects the siginfo structure. Being a userspace generated signal, si_code is set to SI_USER, which has the value 0. With FPE_FIXME defined to zero, as Eric has done: enum siginfo_layout siginfo_layout(int sig, int si_code) { enum siginfo_layout layout = SIL_KILL; if ((si_code > SI_USER) && (si_code < SI_KERNEL)) { ... } else { ... #ifdef FPE_FIXME if ((sig == SIGFPE) && (si_code == FPE_FIXME)) layout = SIL_FAULT; #endif } return layout; } This causes siginfo_layout() to return SIL_FAULT for this userspace generated signal, rather than the correct SIL_KILL. This affects which fields we copy to userspace. SI_USER is defined to pass si_pid and si_uid to the userspace process, which on ARM are the first two consecutive 32-bit quantities in the union, which is done when siginfo_layout() returns SIL_KILL. However, when SIL_FAULT is returned, we only copy si_addr in the union, which on ARM is just one 32-bit quantity. Consequently, userspace sees a correct value for si_pid, and si_uid remains set to whatever was there in userspace. In the case of the strace program, that's zero. This means if you run the strace testsuite as root, the problem doesn't appear, but if you run it as a non-root user, it will. So, the testsuite case has little to do with the behaviour of the VFP handling - it's to do with the behaviour of the kernel's signal handling.
On Fri, Apr 13, 2018 at 07:50:17PM +0100, Russell King - ARM Linux wrote: > On Fri, Apr 13, 2018 at 07:35:38PM +0100, Dave Martin wrote: > > If that's the case though, I don't see how a userspace testsuite is > > hitting this code path. Maybe I've misunderstood the context of this > > thread. > > It isn't hitting this exact case. > > The userspace testsuite is hitting an entirely different case: > > kill(getpid(), SIGFPE); > > As one expects, this generates a SIGFPE to the current process, which > then inspects the siginfo structure. Being a userspace generated > signal, si_code is set to SI_USER, which has the value 0. > > With FPE_FIXME defined to zero, as Eric has done: > > enum siginfo_layout siginfo_layout(int sig, int si_code) > { > enum siginfo_layout layout = SIL_KILL; > if ((si_code > SI_USER) && (si_code < SI_KERNEL)) { > ... > } else { > ... > #ifdef FPE_FIXME > if ((sig == SIGFPE) && (si_code == FPE_FIXME)) > layout = SIL_FAULT; > #endif > } > return layout; > } > > This causes siginfo_layout() to return SIL_FAULT for this userspace > generated signal, rather than the correct SIL_KILL. > > This affects which fields we copy to userspace. > > SI_USER is defined to pass si_pid and si_uid to the userspace process, > which on ARM are the first two consecutive 32-bit quantities in the > union, which is done when siginfo_layout() returns SIL_KILL. However, > when SIL_FAULT is returned, we only copy si_addr in the union, which > on ARM is just one 32-bit quantity. > > Consequently, userspace sees a correct value for si_pid, and si_uid > remains set to whatever was there in userspace. In the case of the > strace program, that's zero. This means if you run the strace > testsuite as root, the problem doesn't appear, but if you run it as > a non-root user, it will. > > So, the testsuite case has little to do with the behaviour of the VFP > handling - it's to do with the behaviour of the kernel's signal handling. Oh, right. So, going back to the unhandled VFP bounce question, is it reasonable for that to be a SIGKILL? That avoids the question of what si_code userspace should see, because userspace doesn't get to see siginfo at all in that case: it's dead. Or do we hit this in real situations that we want userspace to bail out of more gracefully? Cheers ---Dave
On Fri, Apr 13, 2018 at 11:45 AM, Dave Martin <Dave.Martin@arm.com> wrote: > > Most uses I've seen do nothing more than use the FPE_xyz value to > format diagnostic messages while dying. I struggled to find code that > made a meaningful functional decision based on the value, though that's > not proof... Yeah. I've seen code that cares about SIGFPE deeply, but it's almost invariably about some emulated environment (eg Java VM, or CPU emulation). And the siginfo data is basically never good enough for those environments anyway on its own, so they will go and look at the actual instruction that caused the fault and the register state instead, because they need *all* the information. The cases that use si_code are the ones that just trapped signals in order to give a more helpful abort message. So I could certainly imagine that si_code is actually used by somebody who then decides to actuall act differently on it, but aside from perhaps printing out a different message, it sounds far-fetched. Linus
On Fri, Apr 13, 2018 at 12:53:49PM -0700, Linus Torvalds wrote: > On Fri, Apr 13, 2018 at 11:45 AM, Dave Martin <Dave.Martin@arm.com> wrote: > > > > Most uses I've seen do nothing more than use the FPE_xyz value to > > format diagnostic messages while dying. I struggled to find code that > > made a meaningful functional decision based on the value, though that's > > not proof... > > Yeah. I've seen code that cares about SIGFPE deeply, but it's almost > invariably about some emulated environment (eg Java VM, or CPU > emulation). > > And the siginfo data is basically never good enough for those > environments anyway on its own, so they will go and look at the actual > instruction that caused the fault and the register state instead, > because they need *all* the information. > > The cases that use si_code are the ones that just trapped signals in > order to give a more helpful abort message. > > So I could certainly imagine that si_code is actually used by somebody > who then decides to actuall act differently on it, but aside from > perhaps printing out a different message, it sounds far-fetched. Okay, in that case let's just use FPE_FLTINV. That makes the patch easily back-portable for stable kernels.
Russell King - ARM Linux <linux@armlinux.org.uk> writes: > On Fri, Apr 13, 2018 at 12:53:49PM -0700, Linus Torvalds wrote: >> On Fri, Apr 13, 2018 at 11:45 AM, Dave Martin <Dave.Martin@arm.com> wrote: >> > >> > Most uses I've seen do nothing more than use the FPE_xyz value to >> > format diagnostic messages while dying. I struggled to find code that >> > made a meaningful functional decision based on the value, though that's >> > not proof... >> >> Yeah. I've seen code that cares about SIGFPE deeply, but it's almost >> invariably about some emulated environment (eg Java VM, or CPU >> emulation). >> >> And the siginfo data is basically never good enough for those >> environments anyway on its own, so they will go and look at the actual >> instruction that caused the fault and the register state instead, >> because they need *all* the information. >> >> The cases that use si_code are the ones that just trapped signals in >> order to give a more helpful abort message. >> >> So I could certainly imagine that si_code is actually used by somebody >> who then decides to actuall act differently on it, but aside from >> perhaps printing out a different message, it sounds far-fetched. > > Okay, in that case let's just use FPE_FLTINV. That makes the patch > easily back-portable for stable kernels. If we want to I don't think backporting 266da65e9156 ("signal: Add FPE_FLTUNK si_code for undiagnosable fp exceptions") would be at all difficult. What it is changing has been stable for quite a while. The surroundings might change and so it might require some trivial manual fixup but I don't expect any problems. Not that I want to derail the consensus but if we want to backport similar fixes for arm64 or the other architectures that wind up using FPE_FLTUNK for their fix we would need to backport 266da65e9156 anyway. Eric
Linus, Would you consider the patchset below for -rc2? Dealing with the aliases of SI_USER has been a challenge as we have had a b0rked ABI in some cases since 2.5. So far no one except myself has suggested that changing the si_code of from 0 to something else for those problematic aliases of SI_USER is going to be a problem. So it looks like just fixing the issue is a real possibility. Fixing the cases that do kill(SIGFPE, ...) because at least test cases care seems important. The best fixes to backport appear to be the real architecture fixes that remove the aliases for SI_USER as those are deep fixes that fundamentally fix the problems, and are also very small changes. I am not yet brave enough to merge architectural fixes like that without arch maintainers buy-in. Getting at least an ack if nothing else takes a little bit of time. Still we have a arm fix upthread and David Miller has given his nod to a sparc fix that uses FPE_FLTUNK. So it appears real architecture fixes are progressing. Further I have looked and that leaves only powerpc, parisc, ia64, and alpha. The new si_code FPE_FLTUNK appears to address most of those, and there is an untested patch for parisc. So real progress appears possible. The generic code can do better, and that is what this rfc patchset is about. It ensures siginfo is fully initialized and uses copy_to_user to copy siginfo to userspace. This takes siginfo_layout out of the picture and so for non-compat non-signalfd siginfos the status quo returns to what it was before I introduced siginfo_layout (AKA regressions go bye-bye). I believe given the issues these changes are a candiate for -rc2. Otherwise I will keep these changes for the next merge window. Eric W. Biederman (3): signal: Ensure every siginfo we send has all bits initialized signal: Reduce copy_siginfo_to_user to just copy_to_user signal: Stop special casing TRAP_FIXME and FPE_FIXME in siginfo_layout arch/alpha/kernel/osf_sys.c | 1 + arch/alpha/kernel/signal.c | 2 + arch/alpha/kernel/traps.c | 5 ++ arch/alpha/mm/fault.c | 2 + arch/arc/mm/fault.c | 2 + arch/arm/kernel/ptrace.c | 1 + arch/arm/kernel/swp_emulate.c | 1 + arch/arm/kernel/traps.c | 5 ++ arch/arm/mm/alignment.c | 1 + arch/arm/mm/fault.c | 5 ++ arch/arm/vfp/vfpmodule.c | 3 +- arch/arm64/kernel/fpsimd.c | 2 +- arch/arm64/kernel/sys_compat.c | 1 + arch/arm64/kernel/traps.c | 1 + arch/arm64/mm/fault.c | 18 ++++-- arch/c6x/kernel/traps.c | 1 + arch/hexagon/kernel/traps.c | 1 + arch/hexagon/mm/vm_fault.c | 1 + arch/ia64/kernel/brl_emu.c | 1 + arch/ia64/kernel/signal.c | 2 + arch/ia64/kernel/traps.c | 27 ++++++++- arch/ia64/kernel/unaligned.c | 1 + arch/ia64/mm/fault.c | 4 +- arch/m68k/kernel/traps.c | 2 + arch/microblaze/kernel/exceptions.c | 1 + arch/microblaze/mm/fault.c | 4 +- arch/mips/mm/fault.c | 1 + arch/nds32/kernel/traps.c | 6 +- arch/nds32/mm/fault.c | 1 + arch/nios2/kernel/traps.c | 1 + arch/openrisc/kernel/traps.c | 5 +- arch/openrisc/mm/fault.c | 1 + arch/parisc/kernel/ptrace.c | 1 + arch/parisc/kernel/traps.c | 2 + arch/parisc/kernel/unaligned.c | 1 + arch/parisc/math-emu/driver.c | 1 + arch/parisc/mm/fault.c | 1 + arch/powerpc/kernel/process.c | 1 + arch/powerpc/kernel/traps.c | 3 +- arch/powerpc/mm/fault.c | 1 + arch/powerpc/platforms/cell/spufs/fault.c | 2 +- arch/riscv/kernel/traps.c | 1 + arch/s390/kernel/traps.c | 5 +- arch/s390/mm/fault.c | 2 + arch/sh/kernel/hw_breakpoint.c | 1 + arch/sh/kernel/traps_32.c | 2 + arch/sh/math-emu/math.c | 1 + arch/sh/mm/fault.c | 1 + arch/sparc/kernel/process_64.c | 1 + arch/sparc/kernel/sys_sparc_32.c | 1 + arch/sparc/kernel/traps_32.c | 10 ++++ arch/sparc/kernel/traps_64.c | 14 +++++ arch/sparc/kernel/unaligned_32.c | 1 + arch/sparc/mm/fault_32.c | 1 + arch/sparc/mm/fault_64.c | 1 + arch/um/kernel/trap.c | 2 + arch/unicore32/kernel/fpu-ucf64.c | 2 +- arch/unicore32/mm/fault.c | 3 + arch/x86/entry/vsyscall/vsyscall_64.c | 2 +- arch/x86/kernel/ptrace.c | 2 +- arch/x86/kernel/traps.c | 3 + arch/x86/kernel/umip.c | 1 + arch/x86/kvm/mmu.c | 1 + arch/x86/mm/fault.c | 1 + arch/xtensa/kernel/traps.c | 1 + arch/xtensa/mm/fault.c | 1 + include/linux/ptrace.h | 1 - include/linux/tracehook.h | 1 + kernel/signal.c | 93 +------------------------------ virt/kvm/arm/mmu.c | 1 + 70 files changed, 165 insertions(+), 115 deletions(-) Eric
( On Sun, Apr 15, 2018 at 8:56 AM, Eric W. Biederman <ebiederm@xmission.com> wrote: > > Would you consider the patchset below for -rc2? Ugh. I have an irrational dislike of "clear_siginfo()". It's a nasty broken interface, which just mis-spells "memset()", and makes it non-obvious that you could clear it other ways too (eg "kzalloc()" or whatever). And this series brings in a lot of new users. Honestly, both "clear_siginfo()" and "copy_siginfo()" are crazy interfaces. They may have made sense when converting code, but not so much now. If we want to have a function that initializes a siginfo, I think it should _actually_ initialize it, and at least take the two fields that a siginfo has to have to be valid: si_signo and si_code. So I'd rather see a patch that does something like this: - clear_siginfo(info); - info->si_signo = sig; - info->si_errno = 0; - info->si_code = SI_USER; - info->si_pid = 0; - info->si_uid = 0; + init_siginfo(info, sig, SI_USER); which at least makes that function be *worth* something, not just a bad spelling of memset. It's not just the removal of pointless "set to zero", it's the whole concept of "this function actually makes a valid siginfo", which is lacking in the existing function. (Yeah, yeah, si_errno is "generic" too and always part of a siginfo, but nobody cares. It's pretty much always set to zero, it would be stupid to add that interface to the "init_siginfo()" function. So just clearing it is fine, the one or two places that want to set it to some silly value can do it themselves). Then your series would incidentally also: (a) make for fewer lines overall, rather than add lines (b) make it clear that we always initialize si_code, which now *must* be a valid value with all the recent siginfo changes. Hmm? The other thing we should do is to get rid of the stupid padding. Right now "struct siginfo" is pointlessly padded to 128 bytes. That is completely insane, when it's always just zero in the kernel. So put that _pad[] thing inside #ifndef __KERNEL__, and make copy_siginfo_to_user() write the padding zeroes when copying to user space. The reason for the padding is "future expansion", so we do want to tell the user space that it's maybe up to 128 bytes in size, but if we don't fill it all, we shouldn't waste time and memory on clearing the padding internally. I'm certainly *hoping* nobody depends on the whole 128 bytes in rt_sigqueueinfo(). In theory you can fill it all (as long as si_code is negative), but the man-page only says "si_value", and the compat function doesn't copy any more than that either, so any user that tries to fill in more than si_value is already broken. In fact, it might even be worth enforcing that in rt_sigqueueinfo(), just to see if anybody plays any games.. On x86-64, without the pointless padding, the size of 'struct siginfo' inside the kernel would be 48 bytes. That's quite a big difference for something that is often allocated on the kernel stack. So I'm certainly willing to make those kinds of changes, but let's make them real *improvements* now, ok? Wasn't that the point of all the cleanups in the end? Linus
Linus Torvalds <torvalds@linux-foundation.org> writes: > ( > > On Sun, Apr 15, 2018 at 8:56 AM, Eric W. Biederman > <ebiederm@xmission.com> wrote: >> >> Would you consider the patchset below for -rc2? > > Ugh. The point of this series is to squash the potential for regressions even from the weird broken code that fills in fields for si_code 0 that do not match SI_USER. The copy_siginfo_to_user32 never handled that so we don't need to worry about that. All of the signals that abuse si_code 0 go through force_sig_info so signalfd can not catch them. Which means that only copy_siginfo_to_user needs to be worried about. Last time I was benchmarking I could not see a difference between copying the entire siginfo and only a small part so I don't see the point of optimizing now. For an actual cleanup I intend to go farther than you are proposing and convert everthing to the set of helper functions I have added to kernel/signal.c force_sig_fault, force_sig_mceerr, force_sig_bnderr, force_sig_pkuerr. When I am done there will be maybe 5 instances of clear_siginfo outside of those helper functions. At which point I won't care if we remove clear_siginfo and just use memset. That is a real improvement that looks something like: "105 files changed, 933 insertions(+), 1698 deletions(-)" That is my end goal with all of this. You complained to me about regressions and you are right with the current handling of FPE_FIXME TRAP_FIXME the code is not bug compatible with old versions of linux, and people have noticed. What this patchset represents is the bare minimum needed to convert copy_siginfo_to_user to a copy_to_user, and remove the possibility of regressions. To that end I need to ensure every struct siginfo in the entire kernel is memset to 0. A structure initializer is absolutely not enough. So I don't mind people thinking clear_siginfo is necessary. To that end clear_siginfo where it does not take the size of struct siginfo as a parameter is much less suceptible to typos, and allows me to ensure every instance of struct siginfo is fully initialized. I fully intend to remove practically every instance of struct siginfo from the architecture specific code, and use the aforementioned helpers. The trouble is that some of the architecture code need refactoring for that to happen. Even your proposed init_siginfo can't be widely used as a common pattern is to fill in siginfo partially in the code with si_code and si_signo being filled in almost last. So in short. I intend to remove most of these clear_siginfo calls when I remove the siginfos later. This series is focused on making copy_siginfo_to_user simple enough we don't need to use siginfo_layout, and thus can be fully backwards compatibile with ourselves and we won't need to worry about regressions. I have aimed to keep this simple enough we can merge this after -rc1 because we don't want regressions. Eric ps. I intend to place this change first in my series wether or not it makes it into -rc2 so that I can be certain we remove any possible regressions in behavior on the buggy architectures. Then I can take my time and ensure the non-trivial changes of refactoring etc are done carefully so I don't introduce other bugs. I need that so I can sleep at night. pps. I can look at some of your other suggestions but cleverness leads to regressions, and if you are going to complain at me harshly when I have been being careful and taking things seriously I am not particularly willing to take unnecessary chances.
Linus Torvalds <torvalds@linux-foundation.org> writes: > ( > > On Sun, Apr 15, 2018 at 8:56 AM, Eric W. Biederman > <ebiederm@xmission.com> wrote: [snip bit about wanting what is effectively force_sig_fault instead of clear_siginfo everywhere] > The other thing we should do is to get rid of the stupid padding. > Right now "struct siginfo" is pointlessly padded to 128 bytes. That is > completely insane, when it's always just zero in the kernel. > > So put that _pad[] thing inside #ifndef __KERNEL__, and make > copy_siginfo_to_user() write the padding zeroes when copying to user > space. The reason for the padding is "future expansion", so we do want > to tell the user space that it's maybe up to 128 bytes in size, but if > we don't fill it all, we shouldn't waste time and memory on clearing > the padding internally. > > I'm certainly *hoping* nobody depends on the whole 128 bytes in > rt_sigqueueinfo(). In theory you can fill it all (as long as si_code > is negative), but the man-page only says "si_value", and the compat > function doesn't copy any more than that either, so any user that > tries to fill in more than si_value is already broken. In fact, it > might even be worth enforcing that in rt_sigqueueinfo(), just to see > if anybody plays any games.. From my earlier looking I think this is all doable except detecting if > On x86-64, without the pointless padding, the size of 'struct siginfo' > inside the kernel would be 48 bytes. That's quite a big difference for > something that is often allocated on the kernel stack. From my earlier looking I can say that I know of no case where signals are injected into the kernel that we need more bytes than the kernel currently provides. The two cases I know of are signal reinjection for checkpoint/restart or something that just uses pid, uid and ptr. Generally that is enough. If we just truncate siginfo to everything except the pad bytes in the kernel there should be no problems. What I don't see how to do is to limit the size of struct siginfo the kernel accepts in a forward compatible way. Something that would fail if for some reason you used more siginfo bytes. Looking at userspace. glibc always memsets siginfo_t to 0. Criu just uses whatever it captured with PTRACE_PEEKSIGINFO, and criu uses unitialized memory for the target of PTRACE_PEEKSIGINFO. If we truncate siginfo in the kernel then we check for a known si_code in which case we are safe to truncate siginfo. If the si_code is unknown then we should check to see if the extra bytes are 0. That should work with everything. Plus it is a natural point to test to see if userspace is using signals that the kernel does not currently support. I will put that in my queue. > So I'm certainly willing to make those kinds of changes, but let's > make them real *improvements* now, ok? Wasn't that the point of all > the cleanups in the end? Definitely. With the strace test case causing people to talk about regressions I was just looking to see if it would make sense to do something minimal for -rc2 so reduce concerns about regressions. Now I am going to focus on getting what I can ready for the next merge window. Eric
On Sun, Apr 15, 2018 at 11:16:04AM -0700, Linus Torvalds wrote: [...] > The other thing we should do is to get rid of the stupid padding. > Right now "struct siginfo" is pointlessly padded to 128 bytes. That is > completely insane, when it's always just zero in the kernel. Agreed, inside the kernel the padding achieves nothing. > So put that _pad[] thing inside #ifndef __KERNEL__, and make > copy_siginfo_to_user() write the padding zeroes when copying to user > space. The reason for the padding is "future expansion", so we do want > to tell the user space that it's maybe up to 128 bytes in size, but if > we don't fill it all, we shouldn't waste time and memory on clearing > the padding internally. > > I'm certainly *hoping* nobody depends on the whole 128 bytes in > rt_sigqueueinfo(). In theory you can fill it all (as long as si_code > is negative), but the man-page only says "si_value", and the compat > function doesn't copy any more than that either, so any user that > tries to fill in more than si_value is already broken. In fact, it > might even be worth enforcing that in rt_sigqueueinfo(), just to see > if anybody plays any games.. [...] Digression: Since we don't traditionally zero the tail-padding in the user sigframe, is there a reliable way for userspace to detect newly-added fields in siginfo other than by having an explicit sigaction sa_flags flag to request them? I imagine something like [1] below from the userspace perspective. On a separate thread, the issue of how to report syndrome information for SIGSEGV came up [2] (such as whether the faulting instruction was a read or a write). This information is useful (and used) by things like userspace sanitisers and qemu. Currently, reporting this to userspace relies on arch-specific cruft in the sigframe. We're committed to maintaining what's already in each arch sigframe, but it would be preferable to have a portable way of adding information to siginfo in a generic way. si_code doesn't really work for that, since si_codes are mutually exclusive: I can't see a way of adding supplementary information using si_code. Anyway, that would be a separate RFC in the future (if ever). Cheers ---Dave [1] static volatile int have_extflags = 0; static void handler(int n, siginfo_t *si, void *uc) { /* ... */ if (have_extflags) { /* Check si->si_extflags */ } else { /* fallback */ } /* ... */ } int main(void) { /* ... */ struct sigaction sa; /* ... */ sa.sa_flags = SA_SIGINFO | SA_SIGINFO_EXTFLAGS; sa.sa_sigaction = handler; if (!sigaction(SIGSEGV, &sa, NULL)) { have_extflags = 1; } else { sa.sa_flags &= ~SA_SIGINFO_EXTFLAGS; if (sigaction(SIGSEGV, &sa, NULL)) goto error; } /* ... */ } [2] [RFC PATCH] arm64: fault: Don't leak data in ESR context for user fault on kernel VA http://lists.infradead.org/pipermail/linux-arm-kernel/2018-April/571428.html
Dave Martin <Dave.Martin@arm.com> writes: > On Sun, Apr 15, 2018 at 11:16:04AM -0700, Linus Torvalds wrote: > > [...] > >> The other thing we should do is to get rid of the stupid padding. >> Right now "struct siginfo" is pointlessly padded to 128 bytes. That is >> completely insane, when it's always just zero in the kernel. > > Agreed, inside the kernel the padding achieves nothing. > >> So put that _pad[] thing inside #ifndef __KERNEL__, and make >> copy_siginfo_to_user() write the padding zeroes when copying to user >> space. The reason for the padding is "future expansion", so we do want >> to tell the user space that it's maybe up to 128 bytes in size, but if >> we don't fill it all, we shouldn't waste time and memory on clearing >> the padding internally. >> >> I'm certainly *hoping* nobody depends on the whole 128 bytes in >> rt_sigqueueinfo(). In theory you can fill it all (as long as si_code >> is negative), but the man-page only says "si_value", and the compat >> function doesn't copy any more than that either, so any user that >> tries to fill in more than si_value is already broken. In fact, it >> might even be worth enforcing that in rt_sigqueueinfo(), just to see >> if anybody plays any games.. > > [...] > > Digression: > > Since we don't traditionally zero the tail-padding in the user sigframe, > is there a reliable way for userspace to detect newly-added fields in > siginfo other than by having an explicit sigaction sa_flags flag to > request them? I imagine something like [1] below from the userspace > perspective. > > On a separate thread, the issue of how to report syndrome information > for SIGSEGV came up [2] (such as whether the faulting instruction was a > read or a write). This information is useful (and used) by things like > userspace sanitisers and qemu. Currently, reporting this to userspace > relies on arch-specific cruft in the sigframe. > > We're committed to maintaining what's already in each arch sigframe, > but it would be preferable to have a portable way of adding information > to siginfo in a generic way. si_code doesn't really work for that, > since si_codes are mutually exclusive: I can't see a way of adding > supplementary information using si_code. > > Anyway, that would be a separate RFC in the future (if ever). So far what I have seen done is ``registers'' added to sigcontext. Which it looks like you have done with esr. Scrubbing information from faults to where the addresses point outside of the userspace mapping makes sense. I think before I would pursue what you are talking about on a generic level I would want to look at the fact that we handle unblockable faults wrong. While unlikely it is possible for someone to send a thread specific signal at just the right time, and have that signal delivered before the synchronous fault. Then we could pass through additional arguments through that new ``generic'' path. Especially what are arguments such as tsk->thread.fault_address and tsk->thread.fault_code. We can do anything we like with a new SA_flag as that allows us to change the format of the sigframe. If we are very careful we can add generic fields after that crazy union anonymous union in the _sigfault case of struct siginfo. The trick would be to find something that would be enough so that people don't need to implement their own instruction decoder to see what is going on. Something that is applicable to every sigfault case not just SIGSEGV. Something that we can and want to implement on multiple architectures. The point being doing something generic can be a lot of work, even if it is worth it in the end. > [1] > > static volatile int have_extflags = 0; > > static void handler(int n, siginfo_t *si, void *uc) > { > /* ... */ > > if (have_extflags) { > /* Check si->si_extflags */ > } else { > /* fallback */ > } > > /* ... */ > } > > int main(void) > { > /* ... */ > > struct sigaction sa; > > /* ... */ > > sa.sa_flags = SA_SIGINFO | SA_SIGINFO_EXTFLAGS; > sa.sa_sigaction = handler; > if (!sigaction(SIGSEGV, &sa, NULL)) { > have_extflags = 1; > } else { > sa.sa_flags &= ~SA_SIGINFO_EXTFLAGS; > if (sigaction(SIGSEGV, &sa, NULL)) > goto error; > } > > /* ... */ > } > > [2] [RFC PATCH] arm64: fault: Don't leak data in ESR context for user fault on kernel VA > http://lists.infradead.org/pipermail/linux-arm-kernel/2018-April/571428.html Eric
arch/arm/include/uapi/asm/siginfo.h | 13 ------------- arch/arm/vfp/vfpmodule.c | 2 +- 2 files changed, 1 insertion(+), 14 deletions(-) diff --git a/arch/arm/include/uapi/asm/siginfo.h b/arch/arm/include/uapi/asm/siginfo.h deleted file mode 100644 index d0513880be21..000000000000 --- a/arch/arm/include/uapi/asm/siginfo.h +++ /dev/null @@ -1,13 +0,0 @@ -#ifndef __ASM_SIGINFO_H -#define __ASM_SIGINFO_H - -#include <asm-generic/siginfo.h> - -/* - * SIGFPE si_codes - */ -#ifdef __KERNEL__ -#define FPE_FIXME 0 /* Broken dup of SI_USER */ -#endif /* __KERNEL__ */ - -#endif diff --git a/arch/arm/vfp/vfpmodule.c b/arch/arm/vfp/vfpmodule.c index 4c375e11ae95..af4ee2cef2f9 100644 --- a/arch/arm/vfp/vfpmodule.c +++ b/arch/arm/vfp/vfpmodule.c @@ -257,7 +257,7 @@ static void vfp_raise_exceptions(u32 exceptions, u32 inst, u32 fpscr, struct pt_ if (exceptions == VFP_EXCEPTION_ERROR) { vfp_panic("unhandled bounce", inst); - vfp_raise_sigfpe(FPE_FIXME, regs); + vfp_raise_sigfpe(FPE_FLTINV, regs); return; }