[v2] x86/xen/64: Rearrange the SYSCALL entries

Message ID	CALCETrUUyBzPv4ZqXJ6pC8YVTZBs+rw305f53yw_xc1xi6EEjw@mail.gmail.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <xen-devel-bounces@lists.xen.org> DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org AA861239F6 MIME-Version: 1.0 In-Reply-To: <CAMzpN2h7UyjJ3034ospL-_eZbiEFgi85XCZRX=sQU2rAkzpzqg@mail.gmail.com> References: <7c88ed36805d36841ab03ec3b48b4122c4418d71.1502164668.git.luto@kernel.org> <CAMzpN2h7UyjJ3034ospL-_eZbiEFgi85XCZRX=sQU2rAkzpzqg@mail.gmail.com> From: Andy Lutomirski <luto@kernel.org> Date: Sun, 13 Aug 2017 22:53:36 -0700 Message-ID: <CALCETrUUyBzPv4ZqXJ6pC8YVTZBs+rw305f53yw_xc1xi6EEjw@mail.gmail.com> To: Brian Gerst <brgerst@gmail.com>, Andrew Cooper <andrew.cooper3@citrix.com> Cc: Juergen Gross <jgross@suse.com>, X86 ML <x86@kernel.org>, Linux Kernel Mailing List <linux-kernel@vger.kernel.org>, Andy Lutomirski <luto@kernel.org>, "H. Peter Anvin" <hpa@zytor.com>, "xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>, Boris Ostrovsky <boris.ostrovsky@oracle.com>, Borislav Petkov <bpetkov@suse.de> Subject: Re: [Xen-devel] [PATCH v2] x86/xen/64: Rearrange the SYSCALL entries Precedence: list Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: base64 Errors-To: xen-devel-bounces@lists.xen.org Sender: "Xen-devel" <xen-devel-bounces@lists.xen.org>

Message ID

CALCETrUUyBzPv4ZqXJ6pC8YVTZBs+rw305f53yw_xc1xi6EEjw@mail.gmail.com (mailing list archive)

State

New, archived

Headers

DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org AA861239F6
MIME-Version: 1.0
In-Reply-To: <CAMzpN2h7UyjJ3034ospL-_eZbiEFgi85XCZRX=sQU2rAkzpzqg@mail.gmail.com>
References: <7c88ed36805d36841ab03ec3b48b4122c4418d71.1502164668.git.luto@kernel.org>
	<CAMzpN2h7UyjJ3034ospL-_eZbiEFgi85XCZRX=sQU2rAkzpzqg@mail.gmail.com>
From: Andy Lutomirski <luto@kernel.org>
Date: Sun, 13 Aug 2017 22:53:36 -0700
Message-ID: <CALCETrUUyBzPv4ZqXJ6pC8YVTZBs+rw305f53yw_xc1xi6EEjw@mail.gmail.com>
To: Brian Gerst <brgerst@gmail.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Juergen Gross <jgross@suse.com>, X86 ML <x86@kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Andy Lutomirski <luto@kernel.org>, "H. Peter Anvin" <hpa@zytor.com>, 
	"xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
	Boris Ostrovsky <boris.ostrovsky@oracle.com>,
	Borislav Petkov <bpetkov@suse.de>
Subject: Re: [Xen-devel] [PATCH v2] x86/xen/64: Rearrange the SYSCALL entries
Precedence: list
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64
Errors-To: xen-devel-bounces@lists.xen.org
Sender: "Xen-devel" <xen-devel-bounces@lists.xen.org>

Commit Message

Andy Lutomirski Aug. 14, 2017, 5:53 a.m. UTC

On Sun, Aug 13, 2017 at 7:44 PM, Brian Gerst <brgerst@gmail.com> wrote:
> On Mon, Aug 7, 2017 at 11:59 PM, Andy Lutomirski <luto@kernel.org> wrote:
>>  /* Normal 64-bit system call target */
>>  ENTRY(xen_syscall_target)
>> -       undo_xen_syscall
>> -       jmp entry_SYSCALL_64_after_swapgs
>> +       popq %rcx
>> +       popq %r11
>> +       jmp entry_SYSCALL_64_after_hwframe
>>  ENDPROC(xen_syscall_target)
>>
>>  #ifdef CONFIG_IA32_EMULATION
>>
>>  /* 32-bit compat syscall target */
>>  ENTRY(xen_syscall32_target)
>> -       undo_xen_syscall
>> -       jmp entry_SYSCALL_compat
>> +       popq %rcx
>> +       popq %r11
>> +       jmp entry_SYSCALL_compat_after_hwframe
>>  ENDPROC(xen_syscall32_target)
>>
>>  /* 32-bit compat sysenter target */
>>  ENTRY(xen_sysenter_target)
>> -       undo_xen_syscall
>> +       mov 0*8(%rsp), %rcx
>> +       mov 1*8(%rsp), %r11
>> +       mov 5*8(%rsp), %rsp
>>         jmp entry_SYSENTER_compat
>>  ENDPROC(xen_sysenter_target)
>
> This patch causes the iopl_32 and ioperm_32 self-tests to fail on a
> 64-bit PV kernel.  The 64-bit versions pass. It gets a seg fault after
> "parent: write to 0x80 (should fail)", and the fault isn't caught by
> the signal handler.  It just dumps back to the shell.  The tests pass
> after reverting this.

I can reproduce it if I emulate an AMD machine.  I can "fix" it like this:

but I haven't tried to diagnose precisely what's going on.

Xen seems to be putting the 0xe0?? values in ss and cs, which oughtn't
to be a problem, but it kills opportunistic sysretl.  Maybe that's
triggering a preexisting bug?

Comments

Andy Lutomirski Aug. 14, 2017, 6:42 a.m. UTC | #1

On Sun, Aug 13, 2017 at 10:53 PM, Andy Lutomirski <luto@kernel.org> wrote:
> On Sun, Aug 13, 2017 at 7:44 PM, Brian Gerst <brgerst@gmail.com> wrote:
>> On Mon, Aug 7, 2017 at 11:59 PM, Andy Lutomirski <luto@kernel.org> wrote:
>>>  /* Normal 64-bit system call target */
>>>  ENTRY(xen_syscall_target)
>>> -       undo_xen_syscall
>>> -       jmp entry_SYSCALL_64_after_swapgs
>>> +       popq %rcx
>>> +       popq %r11
>>> +       jmp entry_SYSCALL_64_after_hwframe
>>>  ENDPROC(xen_syscall_target)
>>>
>>>  #ifdef CONFIG_IA32_EMULATION
>>>
>>>  /* 32-bit compat syscall target */
>>>  ENTRY(xen_syscall32_target)
>>> -       undo_xen_syscall
>>> -       jmp entry_SYSCALL_compat
>>> +       popq %rcx
>>> +       popq %r11
>>> +       jmp entry_SYSCALL_compat_after_hwframe
>>>  ENDPROC(xen_syscall32_target)
>>>
>>>  /* 32-bit compat sysenter target */
>>>  ENTRY(xen_sysenter_target)
>>> -       undo_xen_syscall
>>> +       mov 0*8(%rsp), %rcx
>>> +       mov 1*8(%rsp), %r11
>>> +       mov 5*8(%rsp), %rsp
>>>         jmp entry_SYSENTER_compat
>>>  ENDPROC(xen_sysenter_target)
>>
>> This patch causes the iopl_32 and ioperm_32 self-tests to fail on a
>> 64-bit PV kernel.  The 64-bit versions pass. It gets a seg fault after
>> "parent: write to 0x80 (should fail)", and the fault isn't caught by
>> the signal handler.  It just dumps back to the shell.  The tests pass
>> after reverting this.
>
> I can reproduce it if I emulate an AMD machine.  I can "fix" it like this:
>
> diff --git a/arch/x86/xen/xen-asm_64.S b/arch/x86/xen/xen-asm_64.S
> index a8a4f4c460a6..6255e00f425e 100644
> --- a/arch/x86/xen/xen-asm_64.S
> +++ b/arch/x86/xen/xen-asm_64.S
> @@ -97,6 +97,9 @@ ENDPROC(xen_syscall_target)
>  ENTRY(xen_syscall32_target)
>         popq %rcx
>         popq %r11
> +       movq $__USER32_DS, 4*8(%rsp)
> +       movq $__USER32_CS, 1*8(%rsp)
> +       movq %r11, 2*8(%rsp)
>         jmp entry_SYSCALL_compat_after_hwframe
>  ENDPROC(xen_syscall32_target)
>
> but I haven't tried to diagnose precisely what's going on.
>
> Xen seems to be putting the 0xe0?? values in ss and cs, which oughtn't
> to be a problem, but it kills opportunistic sysretl.  Maybe that's
> triggering a preexisting bug?

It is indeed triggering an existing but, but that bug is not a kernel
bug :)  It's this thing:

https://sourceware.org/bugzilla/show_bug.cgi?id=21269

See, we have this old legacy garbage in which, when running with
nonstandard SS, a certain special, otherwise nonsensical input to
sigaction() causes a stack switch.  Xen PV runs user code with a
nonstandard SS, and glibc accidentally passes this weird parameter to
sigaction() on a regular basis.  With this patch applied, the kernel
suddenly starts to *realize* that ss is weird, and boom.  (Or maybe it
increases the chance that SS is actually weird, since I'd expect this
to trip on #GP, not SYSCALL.  But I don't care quite enough to dig
further.)

Patch coming.

Andrew Cooper Aug. 14, 2017, 6:45 a.m. UTC | #2

On 14/08/2017 06:53, Andy Lutomirski wrote:
> On Sun, Aug 13, 2017 at 7:44 PM, Brian Gerst <brgerst@gmail.com> wrote:
>> On Mon, Aug 7, 2017 at 11:59 PM, Andy Lutomirski <luto@kernel.org> wrote:
>>>  /* Normal 64-bit system call target */
>>>  ENTRY(xen_syscall_target)
>>> -       undo_xen_syscall
>>> -       jmp entry_SYSCALL_64_after_swapgs
>>> +       popq %rcx
>>> +       popq %r11
>>> +       jmp entry_SYSCALL_64_after_hwframe
>>>  ENDPROC(xen_syscall_target)
>>>
>>>  #ifdef CONFIG_IA32_EMULATION
>>>
>>>  /* 32-bit compat syscall target */
>>>  ENTRY(xen_syscall32_target)
>>> -       undo_xen_syscall
>>> -       jmp entry_SYSCALL_compat
>>> +       popq %rcx
>>> +       popq %r11
>>> +       jmp entry_SYSCALL_compat_after_hwframe
>>>  ENDPROC(xen_syscall32_target)
>>>
>>>  /* 32-bit compat sysenter target */
>>>  ENTRY(xen_sysenter_target)
>>> -       undo_xen_syscall
>>> +       mov 0*8(%rsp), %rcx
>>> +       mov 1*8(%rsp), %r11
>>> +       mov 5*8(%rsp), %rsp
>>>         jmp entry_SYSENTER_compat
>>>  ENDPROC(xen_sysenter_target)
>> This patch causes the iopl_32 and ioperm_32 self-tests to fail on a
>> 64-bit PV kernel.  The 64-bit versions pass. It gets a seg fault after
>> "parent: write to 0x80 (should fail)", and the fault isn't caught by
>> the signal handler.  It just dumps back to the shell.  The tests pass
>> after reverting this.
> I can reproduce it if I emulate an AMD machine.  I can "fix" it like this:
>
> diff --git a/arch/x86/xen/xen-asm_64.S b/arch/x86/xen/xen-asm_64.S
> index a8a4f4c460a6..6255e00f425e 100644
> --- a/arch/x86/xen/xen-asm_64.S
> +++ b/arch/x86/xen/xen-asm_64.S
> @@ -97,6 +97,9 @@ ENDPROC(xen_syscall_target)
>  ENTRY(xen_syscall32_target)
>         popq %rcx
>         popq %r11
> +       movq $__USER32_DS, 4*8(%rsp)
> +       movq $__USER32_CS, 1*8(%rsp)
> +       movq %r11, 2*8(%rsp)
>         jmp entry_SYSCALL_compat_after_hwframe
>  ENDPROC(xen_syscall32_target)
>
> but I haven't tried to diagnose precisely what's going on.
>
> Xen seems to be putting the 0xe0?? values in ss and cs, which oughtn't
> to be a problem, but it kills opportunistic sysretl.  Maybe that's
> triggering a preexisting bug?

The reason %rcx/%r11 are extra on the stack is because Xen uses sysret
to get here.  This is part of the 64bit PV ABI.

%cs will be 0xe033 (FLAT_CS64) and %ss will be 0xe02b (FLAT_SS64).

I would expect it to kill opportunistic sysret, as Xen's and Linux's
idea of using sysret differ.

~Andrew

Brian Gerst Aug. 14, 2017, 12:46 p.m. UTC | #3

On Mon, Aug 14, 2017 at 1:53 AM, Andy Lutomirski <luto@kernel.org> wrote:
> On Sun, Aug 13, 2017 at 7:44 PM, Brian Gerst <brgerst@gmail.com> wrote:
>> On Mon, Aug 7, 2017 at 11:59 PM, Andy Lutomirski <luto@kernel.org> wrote:
>>>  /* Normal 64-bit system call target */
>>>  ENTRY(xen_syscall_target)
>>> -       undo_xen_syscall
>>> -       jmp entry_SYSCALL_64_after_swapgs
>>> +       popq %rcx
>>> +       popq %r11
>>> +       jmp entry_SYSCALL_64_after_hwframe
>>>  ENDPROC(xen_syscall_target)
>>>
>>>  #ifdef CONFIG_IA32_EMULATION
>>>
>>>  /* 32-bit compat syscall target */
>>>  ENTRY(xen_syscall32_target)
>>> -       undo_xen_syscall
>>> -       jmp entry_SYSCALL_compat
>>> +       popq %rcx
>>> +       popq %r11
>>> +       jmp entry_SYSCALL_compat_after_hwframe
>>>  ENDPROC(xen_syscall32_target)
>>>
>>>  /* 32-bit compat sysenter target */
>>>  ENTRY(xen_sysenter_target)
>>> -       undo_xen_syscall
>>> +       mov 0*8(%rsp), %rcx
>>> +       mov 1*8(%rsp), %r11
>>> +       mov 5*8(%rsp), %rsp
>>>         jmp entry_SYSENTER_compat
>>>  ENDPROC(xen_sysenter_target)
>>
>> This patch causes the iopl_32 and ioperm_32 self-tests to fail on a
>> 64-bit PV kernel.  The 64-bit versions pass. It gets a seg fault after
>> "parent: write to 0x80 (should fail)", and the fault isn't caught by
>> the signal handler.  It just dumps back to the shell.  The tests pass
>> after reverting this.
>
> I can reproduce it if I emulate an AMD machine.  I can "fix" it like this:

Yes, this is an AMD processor.

> diff --git a/arch/x86/xen/xen-asm_64.S b/arch/x86/xen/xen-asm_64.S
> index a8a4f4c460a6..6255e00f425e 100644
> --- a/arch/x86/xen/xen-asm_64.S
> +++ b/arch/x86/xen/xen-asm_64.S
> @@ -97,6 +97,9 @@ ENDPROC(xen_syscall_target)
>  ENTRY(xen_syscall32_target)
>         popq %rcx
>         popq %r11
> +       movq $__USER32_DS, 4*8(%rsp)
> +       movq $__USER32_CS, 1*8(%rsp)
> +       movq %r11, 2*8(%rsp)
>         jmp entry_SYSCALL_compat_after_hwframe
>  ENDPROC(xen_syscall32_target)
>
> but I haven't tried to diagnose precisely what's going on.
>
> Xen seems to be putting the 0xe0?? values in ss and cs, which oughtn't
> to be a problem, but it kills opportunistic sysretl.  Maybe that's
> triggering a preexisting bug?

Resetting the CS/SS values worked.  Looking at the Xen hypervisor
code, EFLAGS on the stack should already be set to the value in R11,
so that part doesn't appear necessary.

Shouldn't this also be done for the 64-bit SYSCALL entry, for
consistency with previous code?

--
Brian Gerst

diff --git a/arch/x86/xen/xen-asm_64.S b/arch/x86/xen/xen-asm_64.S
index a8a4f4c460a6..6255e00f425e 100644
--- a/arch/x86/xen/xen-asm_64.S
+++ b/arch/x86/xen/xen-asm_64.S
@@ -97,6 +97,9 @@  ENDPROC(xen_syscall_target)
 ENTRY(xen_syscall32_target)
        popq %rcx
        popq %r11
+       movq $__USER32_DS, 4*8(%rsp)
+       movq $__USER32_CS, 1*8(%rsp)
+       movq %r11, 2*8(%rsp)
        jmp entry_SYSCALL_compat_after_hwframe
 ENDPROC(xen_syscall32_target)