Message ID | CALCETrUUyBzPv4ZqXJ6pC8YVTZBs+rw305f53yw_xc1xi6EEjw@mail.gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Sun, Aug 13, 2017 at 10:53 PM, Andy Lutomirski <luto@kernel.org> wrote: > On Sun, Aug 13, 2017 at 7:44 PM, Brian Gerst <brgerst@gmail.com> wrote: >> On Mon, Aug 7, 2017 at 11:59 PM, Andy Lutomirski <luto@kernel.org> wrote: >>> /* Normal 64-bit system call target */ >>> ENTRY(xen_syscall_target) >>> - undo_xen_syscall >>> - jmp entry_SYSCALL_64_after_swapgs >>> + popq %rcx >>> + popq %r11 >>> + jmp entry_SYSCALL_64_after_hwframe >>> ENDPROC(xen_syscall_target) >>> >>> #ifdef CONFIG_IA32_EMULATION >>> >>> /* 32-bit compat syscall target */ >>> ENTRY(xen_syscall32_target) >>> - undo_xen_syscall >>> - jmp entry_SYSCALL_compat >>> + popq %rcx >>> + popq %r11 >>> + jmp entry_SYSCALL_compat_after_hwframe >>> ENDPROC(xen_syscall32_target) >>> >>> /* 32-bit compat sysenter target */ >>> ENTRY(xen_sysenter_target) >>> - undo_xen_syscall >>> + mov 0*8(%rsp), %rcx >>> + mov 1*8(%rsp), %r11 >>> + mov 5*8(%rsp), %rsp >>> jmp entry_SYSENTER_compat >>> ENDPROC(xen_sysenter_target) >> >> This patch causes the iopl_32 and ioperm_32 self-tests to fail on a >> 64-bit PV kernel. The 64-bit versions pass. It gets a seg fault after >> "parent: write to 0x80 (should fail)", and the fault isn't caught by >> the signal handler. It just dumps back to the shell. The tests pass >> after reverting this. > > I can reproduce it if I emulate an AMD machine. I can "fix" it like this: > > diff --git a/arch/x86/xen/xen-asm_64.S b/arch/x86/xen/xen-asm_64.S > index a8a4f4c460a6..6255e00f425e 100644 > --- a/arch/x86/xen/xen-asm_64.S > +++ b/arch/x86/xen/xen-asm_64.S > @@ -97,6 +97,9 @@ ENDPROC(xen_syscall_target) > ENTRY(xen_syscall32_target) > popq %rcx > popq %r11 > + movq $__USER32_DS, 4*8(%rsp) > + movq $__USER32_CS, 1*8(%rsp) > + movq %r11, 2*8(%rsp) > jmp entry_SYSCALL_compat_after_hwframe > ENDPROC(xen_syscall32_target) > > but I haven't tried to diagnose precisely what's going on. > > Xen seems to be putting the 0xe0?? values in ss and cs, which oughtn't > to be a problem, but it kills opportunistic sysretl. Maybe that's > triggering a preexisting bug? It is indeed triggering an existing but, but that bug is not a kernel bug :) It's this thing: https://sourceware.org/bugzilla/show_bug.cgi?id=21269 See, we have this old legacy garbage in which, when running with nonstandard SS, a certain special, otherwise nonsensical input to sigaction() causes a stack switch. Xen PV runs user code with a nonstandard SS, and glibc accidentally passes this weird parameter to sigaction() on a regular basis. With this patch applied, the kernel suddenly starts to *realize* that ss is weird, and boom. (Or maybe it increases the chance that SS is actually weird, since I'd expect this to trip on #GP, not SYSCALL. But I don't care quite enough to dig further.) Patch coming.
On 14/08/2017 06:53, Andy Lutomirski wrote: > On Sun, Aug 13, 2017 at 7:44 PM, Brian Gerst <brgerst@gmail.com> wrote: >> On Mon, Aug 7, 2017 at 11:59 PM, Andy Lutomirski <luto@kernel.org> wrote: >>> /* Normal 64-bit system call target */ >>> ENTRY(xen_syscall_target) >>> - undo_xen_syscall >>> - jmp entry_SYSCALL_64_after_swapgs >>> + popq %rcx >>> + popq %r11 >>> + jmp entry_SYSCALL_64_after_hwframe >>> ENDPROC(xen_syscall_target) >>> >>> #ifdef CONFIG_IA32_EMULATION >>> >>> /* 32-bit compat syscall target */ >>> ENTRY(xen_syscall32_target) >>> - undo_xen_syscall >>> - jmp entry_SYSCALL_compat >>> + popq %rcx >>> + popq %r11 >>> + jmp entry_SYSCALL_compat_after_hwframe >>> ENDPROC(xen_syscall32_target) >>> >>> /* 32-bit compat sysenter target */ >>> ENTRY(xen_sysenter_target) >>> - undo_xen_syscall >>> + mov 0*8(%rsp), %rcx >>> + mov 1*8(%rsp), %r11 >>> + mov 5*8(%rsp), %rsp >>> jmp entry_SYSENTER_compat >>> ENDPROC(xen_sysenter_target) >> This patch causes the iopl_32 and ioperm_32 self-tests to fail on a >> 64-bit PV kernel. The 64-bit versions pass. It gets a seg fault after >> "parent: write to 0x80 (should fail)", and the fault isn't caught by >> the signal handler. It just dumps back to the shell. The tests pass >> after reverting this. > I can reproduce it if I emulate an AMD machine. I can "fix" it like this: > > diff --git a/arch/x86/xen/xen-asm_64.S b/arch/x86/xen/xen-asm_64.S > index a8a4f4c460a6..6255e00f425e 100644 > --- a/arch/x86/xen/xen-asm_64.S > +++ b/arch/x86/xen/xen-asm_64.S > @@ -97,6 +97,9 @@ ENDPROC(xen_syscall_target) > ENTRY(xen_syscall32_target) > popq %rcx > popq %r11 > + movq $__USER32_DS, 4*8(%rsp) > + movq $__USER32_CS, 1*8(%rsp) > + movq %r11, 2*8(%rsp) > jmp entry_SYSCALL_compat_after_hwframe > ENDPROC(xen_syscall32_target) > > but I haven't tried to diagnose precisely what's going on. > > Xen seems to be putting the 0xe0?? values in ss and cs, which oughtn't > to be a problem, but it kills opportunistic sysretl. Maybe that's > triggering a preexisting bug? The reason %rcx/%r11 are extra on the stack is because Xen uses sysret to get here. This is part of the 64bit PV ABI. %cs will be 0xe033 (FLAT_CS64) and %ss will be 0xe02b (FLAT_SS64). I would expect it to kill opportunistic sysret, as Xen's and Linux's idea of using sysret differ. ~Andrew
On Mon, Aug 14, 2017 at 1:53 AM, Andy Lutomirski <luto@kernel.org> wrote: > On Sun, Aug 13, 2017 at 7:44 PM, Brian Gerst <brgerst@gmail.com> wrote: >> On Mon, Aug 7, 2017 at 11:59 PM, Andy Lutomirski <luto@kernel.org> wrote: >>> /* Normal 64-bit system call target */ >>> ENTRY(xen_syscall_target) >>> - undo_xen_syscall >>> - jmp entry_SYSCALL_64_after_swapgs >>> + popq %rcx >>> + popq %r11 >>> + jmp entry_SYSCALL_64_after_hwframe >>> ENDPROC(xen_syscall_target) >>> >>> #ifdef CONFIG_IA32_EMULATION >>> >>> /* 32-bit compat syscall target */ >>> ENTRY(xen_syscall32_target) >>> - undo_xen_syscall >>> - jmp entry_SYSCALL_compat >>> + popq %rcx >>> + popq %r11 >>> + jmp entry_SYSCALL_compat_after_hwframe >>> ENDPROC(xen_syscall32_target) >>> >>> /* 32-bit compat sysenter target */ >>> ENTRY(xen_sysenter_target) >>> - undo_xen_syscall >>> + mov 0*8(%rsp), %rcx >>> + mov 1*8(%rsp), %r11 >>> + mov 5*8(%rsp), %rsp >>> jmp entry_SYSENTER_compat >>> ENDPROC(xen_sysenter_target) >> >> This patch causes the iopl_32 and ioperm_32 self-tests to fail on a >> 64-bit PV kernel. The 64-bit versions pass. It gets a seg fault after >> "parent: write to 0x80 (should fail)", and the fault isn't caught by >> the signal handler. It just dumps back to the shell. The tests pass >> after reverting this. > > I can reproduce it if I emulate an AMD machine. I can "fix" it like this: Yes, this is an AMD processor. > diff --git a/arch/x86/xen/xen-asm_64.S b/arch/x86/xen/xen-asm_64.S > index a8a4f4c460a6..6255e00f425e 100644 > --- a/arch/x86/xen/xen-asm_64.S > +++ b/arch/x86/xen/xen-asm_64.S > @@ -97,6 +97,9 @@ ENDPROC(xen_syscall_target) > ENTRY(xen_syscall32_target) > popq %rcx > popq %r11 > + movq $__USER32_DS, 4*8(%rsp) > + movq $__USER32_CS, 1*8(%rsp) > + movq %r11, 2*8(%rsp) > jmp entry_SYSCALL_compat_after_hwframe > ENDPROC(xen_syscall32_target) > > but I haven't tried to diagnose precisely what's going on. > > Xen seems to be putting the 0xe0?? values in ss and cs, which oughtn't > to be a problem, but it kills opportunistic sysretl. Maybe that's > triggering a preexisting bug? Resetting the CS/SS values worked. Looking at the Xen hypervisor code, EFLAGS on the stack should already be set to the value in R11, so that part doesn't appear necessary. Shouldn't this also be done for the 64-bit SYSCALL entry, for consistency with previous code? -- Brian Gerst
diff --git a/arch/x86/xen/xen-asm_64.S b/arch/x86/xen/xen-asm_64.S index a8a4f4c460a6..6255e00f425e 100644 --- a/arch/x86/xen/xen-asm_64.S +++ b/arch/x86/xen/xen-asm_64.S @@ -97,6 +97,9 @@ ENDPROC(xen_syscall_target) ENTRY(xen_syscall32_target) popq %rcx popq %r11 + movq $__USER32_DS, 4*8(%rsp) + movq $__USER32_CS, 1*8(%rsp) + movq %r11, 2*8(%rsp) jmp entry_SYSCALL_compat_after_hwframe ENDPROC(xen_syscall32_target)