[2/2] x86/HVM: re-order operations in hvm_ud_intercept()

Message ID	57583D2B02000078000F31CE@prv-mh.provo.novell.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <xen-devel-bounces@lists.xen.org> Message-Id: <57583D2B02000078000F31CE@prv-mh.provo.novell.com> Date: Wed, 08 Jun 2016 07:43:39 -0600 From: "Jan Beulich" <JBeulich@suse.com> To: "xen-devel" <xen-devel@lists.xenproject.org> References: <57583B4E02000078000F3198@prv-mh.provo.novell.com> In-Reply-To: <57583B4E02000078000F3198@prv-mh.provo.novell.com> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="=__Part5D6B5F1B.1__=" Cc: Andrew Cooper <andrew.cooper3@citrix.com> Subject: [Xen-devel] [PATCH 2/2] x86/HVM: re-order operations in hvm_ud_intercept() Precedence: list Errors-To: xen-devel-bounces@lists.xen.org Sender: "Xen-devel" <xen-devel-bounces@lists.xen.org>

Jan Beulich June 8, 2016, 1:43 p.m. UTC

Don't fetch CS explicitly, leverage the fact that hvm_emulate_prepare()
already does (and that hvm_virtual_to_linear_addr() doesn't alter it).

At once increase the length passed to hvm_virtual_to_linear_addr() by
one: There definitely needs to be at least one more opcode byte, and we
can avoid missing a wraparound case this way.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
x86/HVM: re-order operations in hvm_ud_intercept()

Don't fetch CS explicitly, leverage the fact that hvm_emulate_prepare()
already does (and that hvm_virtual_to_linear_addr() doesn't alter it).

At once increase the length passed to hvm_virtual_to_linear_addr() by
one: There definitely needs to be at least one more opcode byte, and we
can avoid missing a wraparound case this way.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -3834,19 +3834,20 @@ void hvm_ud_intercept(struct cpu_user_re
 {
     struct hvm_emulate_ctxt ctxt;
 
+    hvm_emulate_prepare(&ctxt, regs);
+
     if ( opt_hvm_fep )
     {
         struct vcpu *cur = current;
-        struct segment_register cs;
+        const struct segment_register *cs = &ctxt.seg_reg[x86_seg_cs];
         unsigned long addr;
         char sig[5]; /* ud2; .ascii "xen" */
 
-        hvm_get_segment_register(cur, x86_seg_cs, &cs);
-        if ( hvm_virtual_to_linear_addr(x86_seg_cs, &cs, regs->eip,
-                                        sizeof(sig), hvm_access_insn_fetch,
+        if ( hvm_virtual_to_linear_addr(x86_seg_cs, cs, regs->eip,
+                                        sizeof(sig) + 1, hvm_access_insn_fetch,
                                         (hvm_long_mode_enabled(cur) &&
-                                         cs.attr.fields.l) ? 64 :
-                                        cs.attr.fields.db ? 32 : 16, &addr) &&
+                                         cs->attr.fields.l) ? 64 :
+                                        cs->attr.fields.db ? 32 : 16, &addr) &&
              (hvm_fetch_from_guest_virt_nofault(sig, addr, sizeof(sig),
                                                 0) == HVMCOPY_okay) &&
              (memcmp(sig, "\xf\xbxen", sizeof(sig)) == 0) )
@@ -3856,8 +3857,6 @@ void hvm_ud_intercept(struct cpu_user_re
         }
     }
 
-    hvm_emulate_prepare(&ctxt, regs);
-
     switch ( hvm_emulate_one(&ctxt) )
     {
     case X86EMUL_UNHANDLEABLE:

Andrew Cooper June 9, 2016, 11:34 a.m. UTC | #1

On 08/06/16 14:43, Jan Beulich wrote:
> Don't fetch CS explicitly, leverage the fact that hvm_emulate_prepare()
> already does (and that hvm_virtual_to_linear_addr() doesn't alter it).
>
> At once increase the length passed to hvm_virtual_to_linear_addr() by
> one: There definitely needs to be at least one more opcode byte, and we
> can avoid missing a wraparound case this way.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

I looked into this when you suggested it, but it latches the wrong eip
in the emulation state, and you will end up re-emulating the ud2a
instruction, rather than the following instruction.

I would be tempted just to leave this code in its current condition.

~Andrew

Jan Beulich June 9, 2016, 12:31 p.m. UTC | #2

>>> On 09.06.16 at 13:34, <andrew.cooper3@citrix.com> wrote:
> On 08/06/16 14:43, Jan Beulich wrote:
>> Don't fetch CS explicitly, leverage the fact that hvm_emulate_prepare()
>> already does (and that hvm_virtual_to_linear_addr() doesn't alter it).
>>
>> At once increase the length passed to hvm_virtual_to_linear_addr() by
>> one: There definitely needs to be at least one more opcode byte, and we
>> can avoid missing a wraparound case this way.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> I looked into this when you suggested it, but it latches the wrong eip
> in the emulation state, and you will end up re-emulating the ud2a
> instruction, rather than the following instruction.

Where is there any latching of eip? All hvm_emulate_prepare() does
is storing the regs pointer.

Jan

Andrew Cooper June 9, 2016, 2:06 p.m. UTC | #3

On 09/06/16 13:31, Jan Beulich wrote:
>>>> On 09.06.16 at 13:34, <andrew.cooper3@citrix.com> wrote:
>> On 08/06/16 14:43, Jan Beulich wrote:
>>> Don't fetch CS explicitly, leverage the fact that hvm_emulate_prepare()
>>> already does (and that hvm_virtual_to_linear_addr() doesn't alter it).
>>>
>>> At once increase the length passed to hvm_virtual_to_linear_addr() by
>>> one: There definitely needs to be at least one more opcode byte, and we
>>> can avoid missing a wraparound case this way.
>>>
>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> I looked into this when you suggested it, but it latches the wrong eip
>> in the emulation state, and you will end up re-emulating the ud2a
>> instruction, rather than the following instruction.
> Where is there any latching of eip? All hvm_emulate_prepare() does
> is storing the regs pointer.

Oh - so it does.  I clearly looked over it too quickly.

What wraparound issue are you referring to?  Adding 1 will cause
incorrect behaviour when the emulation prefix ends at the segment limit.

~Andrew

Jan Beulich June 9, 2016, 2:13 p.m. UTC | #4

>>> On 09.06.16 at 16:06, <andrew.cooper3@citrix.com> wrote:
> On 09/06/16 13:31, Jan Beulich wrote:
>>>>> On 09.06.16 at 13:34, <andrew.cooper3@citrix.com> wrote:
>>> On 08/06/16 14:43, Jan Beulich wrote:
>>>> Don't fetch CS explicitly, leverage the fact that hvm_emulate_prepare()
>>>> already does (and that hvm_virtual_to_linear_addr() doesn't alter it).
>>>>
>>>> At once increase the length passed to hvm_virtual_to_linear_addr() by
>>>> one: There definitely needs to be at least one more opcode byte, and we
>>>> can avoid missing a wraparound case this way.
>>>>
>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>> I looked into this when you suggested it, but it latches the wrong eip
>>> in the emulation state, and you will end up re-emulating the ud2a
>>> instruction, rather than the following instruction.
>> Where is there any latching of eip? All hvm_emulate_prepare() does
>> is storing the regs pointer.
> 
> Oh - so it does.  I clearly looked over it too quickly.
> 
> What wraparound issue are you referring to?  Adding 1 will cause
> incorrect behaviour when the emulation prefix ends at the segment limit.

I don't think so: The prefix together with the actual instruction
encoding should be viewed as a single instruction, and it crossing
the segment limit should #GP. It wrapping at the prefix/encoding
boundary is the case that I'm specifically referring to (this case
should also #GP, but wouldn't without this adjustment).

Jan

Andrew Cooper June 9, 2016, 2:27 p.m. UTC | #5

On 09/06/16 15:13, Jan Beulich wrote:
>>>> On 09.06.16 at 16:06, <andrew.cooper3@citrix.com> wrote:
>> On 09/06/16 13:31, Jan Beulich wrote:
>>>>>> On 09.06.16 at 13:34, <andrew.cooper3@citrix.com> wrote:
>>>> On 08/06/16 14:43, Jan Beulich wrote:
>>>>> Don't fetch CS explicitly, leverage the fact that hvm_emulate_prepare()
>>>>> already does (and that hvm_virtual_to_linear_addr() doesn't alter it).
>>>>>
>>>>> At once increase the length passed to hvm_virtual_to_linear_addr() by
>>>>> one: There definitely needs to be at least one more opcode byte, and we
>>>>> can avoid missing a wraparound case this way.
>>>>>
>>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>> I looked into this when you suggested it, but it latches the wrong eip
>>>> in the emulation state, and you will end up re-emulating the ud2a
>>>> instruction, rather than the following instruction.
>>> Where is there any latching of eip? All hvm_emulate_prepare() does
>>> is storing the regs pointer.
>> Oh - so it does.  I clearly looked over it too quickly.
>>
>> What wraparound issue are you referring to?  Adding 1 will cause
>> incorrect behaviour when the emulation prefix ends at the segment limit.
> I don't think so: The prefix together with the actual instruction
> encoding should be viewed as a single instruction, and it crossing
> the segment limit should #GP. It wrapping at the prefix/encoding
> boundary is the case that I'm specifically referring to (this case
> should also #GP, but wouldn't without this adjustment).

But the force emulation prefix specifically doesn't behave like other
prefixes.

It doesn't count towards the 15 byte instruction limit, and if the
emulated instruction does fault, we want the fault pointing at the
emulated instruction, not the force prefix.  We should avoid making any
link.

~Andrew

Jan Beulich June 9, 2016, 3:05 p.m. UTC | #6

>>> On 09.06.16 at 16:27, <andrew.cooper3@citrix.com> wrote:
> On 09/06/16 15:13, Jan Beulich wrote:
>>>>> On 09.06.16 at 16:06, <andrew.cooper3@citrix.com> wrote:
>>> On 09/06/16 13:31, Jan Beulich wrote:
>>>>>>> On 09.06.16 at 13:34, <andrew.cooper3@citrix.com> wrote:
>>>>> On 08/06/16 14:43, Jan Beulich wrote:
>>>>>> Don't fetch CS explicitly, leverage the fact that hvm_emulate_prepare()
>>>>>> already does (and that hvm_virtual_to_linear_addr() doesn't alter it).
>>>>>>
>>>>>> At once increase the length passed to hvm_virtual_to_linear_addr() by
>>>>>> one: There definitely needs to be at least one more opcode byte, and we
>>>>>> can avoid missing a wraparound case this way.
>>>>>>
>>>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>>> I looked into this when you suggested it, but it latches the wrong eip
>>>>> in the emulation state, and you will end up re-emulating the ud2a
>>>>> instruction, rather than the following instruction.
>>>> Where is there any latching of eip? All hvm_emulate_prepare() does
>>>> is storing the regs pointer.
>>> Oh - so it does.  I clearly looked over it too quickly.
>>>
>>> What wraparound issue are you referring to?  Adding 1 will cause
>>> incorrect behaviour when the emulation prefix ends at the segment limit.
>> I don't think so: The prefix together with the actual instruction
>> encoding should be viewed as a single instruction, and it crossing
>> the segment limit should #GP. It wrapping at the prefix/encoding
>> boundary is the case that I'm specifically referring to (this case
>> should also #GP, but wouldn't without this adjustment).
> 
> But the force emulation prefix specifically doesn't behave like other
> prefixes.
> 
> It doesn't count towards the 15 byte instruction limit, and if the
> emulated instruction does fault, we want the fault pointing at the
> emulated instruction, not the force prefix.  We should avoid making any
> link.

Well, are you saying placing such a prefix right below the boundary
of a flat segment is _expected_ to get the instruction at address 0
emulated? I don't think I could buy that. The patch makes no other
connection between prefix and actual insn. And #GP because of
such a boundary condition should imo point at the prefix; only all
faults associated with the actual insn should point there.

Jan

Jan Beulich June 17, 2016, 8:19 a.m. UTC | #7

>>> On 09.06.16 at 17:05, <JBeulich@suse.com> wrote:
>>>> On 09.06.16 at 16:27, <andrew.cooper3@citrix.com> wrote:
>> On 09/06/16 15:13, Jan Beulich wrote:
>>>>>> On 09.06.16 at 16:06, <andrew.cooper3@citrix.com> wrote:
>>>> On 09/06/16 13:31, Jan Beulich wrote:
>>>>>>>> On 09.06.16 at 13:34, <andrew.cooper3@citrix.com> wrote:
>>>>>> On 08/06/16 14:43, Jan Beulich wrote:
>>>>>>> Don't fetch CS explicitly, leverage the fact that hvm_emulate_prepare()
>>>>>>> already does (and that hvm_virtual_to_linear_addr() doesn't alter it).
>>>>>>>
>>>>>>> At once increase the length passed to hvm_virtual_to_linear_addr() by
>>>>>>> one: There definitely needs to be at least one more opcode byte, and we
>>>>>>> can avoid missing a wraparound case this way.
>>>>>>>
>>>>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>>>> I looked into this when you suggested it, but it latches the wrong eip
>>>>>> in the emulation state, and you will end up re-emulating the ud2a
>>>>>> instruction, rather than the following instruction.
>>>>> Where is there any latching of eip? All hvm_emulate_prepare() does
>>>>> is storing the regs pointer.
>>>> Oh - so it does.  I clearly looked over it too quickly.
>>>>
>>>> What wraparound issue are you referring to?  Adding 1 will cause
>>>> incorrect behaviour when the emulation prefix ends at the segment limit.
>>> I don't think so: The prefix together with the actual instruction
>>> encoding should be viewed as a single instruction, and it crossing
>>> the segment limit should #GP. It wrapping at the prefix/encoding
>>> boundary is the case that I'm specifically referring to (this case
>>> should also #GP, but wouldn't without this adjustment).
>> 
>> But the force emulation prefix specifically doesn't behave like other
>> prefixes.
>> 
>> It doesn't count towards the 15 byte instruction limit, and if the
>> emulated instruction does fault, we want the fault pointing at the
>> emulated instruction, not the force prefix.  We should avoid making any
>> link.
> 
> Well, are you saying placing such a prefix right below the boundary
> of a flat segment is _expected_ to get the instruction at address 0
> emulated? I don't think I could buy that. The patch makes no other
> connection between prefix and actual insn. And #GP because of
> such a boundary condition should imo point at the prefix; only all
> faults associated with the actual insn should point there.

Ping?

Jan

Andrew Cooper June 17, 2016, 9:37 a.m. UTC | #8

On 09/06/16 16:05, Jan Beulich wrote:
>>>> On 09.06.16 at 16:27, <andrew.cooper3@citrix.com> wrote:
>> On 09/06/16 15:13, Jan Beulich wrote:
>>>>>> On 09.06.16 at 16:06, <andrew.cooper3@citrix.com> wrote:
>>>> On 09/06/16 13:31, Jan Beulich wrote:
>>>>>>>> On 09.06.16 at 13:34, <andrew.cooper3@citrix.com> wrote:
>>>>>> On 08/06/16 14:43, Jan Beulich wrote:
>>>>>>> Don't fetch CS explicitly, leverage the fact that hvm_emulate_prepare()
>>>>>>> already does (and that hvm_virtual_to_linear_addr() doesn't alter it).
>>>>>>>
>>>>>>> At once increase the length passed to hvm_virtual_to_linear_addr() by
>>>>>>> one: There definitely needs to be at least one more opcode byte, and we
>>>>>>> can avoid missing a wraparound case this way.
>>>>>>>
>>>>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>>>> I looked into this when you suggested it, but it latches the wrong eip
>>>>>> in the emulation state, and you will end up re-emulating the ud2a
>>>>>> instruction, rather than the following instruction.
>>>>> Where is there any latching of eip? All hvm_emulate_prepare() does
>>>>> is storing the regs pointer.
>>>> Oh - so it does.  I clearly looked over it too quickly.
>>>>
>>>> What wraparound issue are you referring to?  Adding 1 will cause
>>>> incorrect behaviour when the emulation prefix ends at the segment limit.
>>> I don't think so: The prefix together with the actual instruction
>>> encoding should be viewed as a single instruction, and it crossing
>>> the segment limit should #GP. It wrapping at the prefix/encoding
>>> boundary is the case that I'm specifically referring to (this case
>>> should also #GP, but wouldn't without this adjustment).
>> But the force emulation prefix specifically doesn't behave like other
>> prefixes.
>>
>> It doesn't count towards the 15 byte instruction limit, and if the
>> emulated instruction does fault, we want the fault pointing at the
>> emulated instruction, not the force prefix.  We should avoid making any
>> link.
> Well, are you saying placing such a prefix right below the boundary
> of a flat segment is _expected_ to get the instruction at address 0
> emulated? I don't think I could buy that. The patch makes no other
> connection between prefix and actual insn. And #GP because of
> such a boundary condition should imo point at the prefix; only all
> faults associated with the actual insn should point there.

Ok.  That sounds reasonable.  Would it be possible to add a small
comment to the code? Even with the commit message, I was confused as to
the nature of the +1.

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

Jan Beulich June 17, 2016, 10:01 a.m. UTC | #9

>>> On 17.06.16 at 11:37, <andrew.cooper3@citrix.com> wrote:
> On 09/06/16 16:05, Jan Beulich wrote:
>>>>> On 09.06.16 at 16:27, <andrew.cooper3@citrix.com> wrote:
>>> On 09/06/16 15:13, Jan Beulich wrote:
>>>>>>> On 09.06.16 at 16:06, <andrew.cooper3@citrix.com> wrote:
>>>>> On 09/06/16 13:31, Jan Beulich wrote:
>>>>>>>>> On 09.06.16 at 13:34, <andrew.cooper3@citrix.com> wrote:
>>>>>>> On 08/06/16 14:43, Jan Beulich wrote:
>>>>>>>> Don't fetch CS explicitly, leverage the fact that hvm_emulate_prepare()
>>>>>>>> already does (and that hvm_virtual_to_linear_addr() doesn't alter it).
>>>>>>>>
>>>>>>>> At once increase the length passed to hvm_virtual_to_linear_addr() by
>>>>>>>> one: There definitely needs to be at least one more opcode byte, and we
>>>>>>>> can avoid missing a wraparound case this way.
>>>>>>>>
>>>>>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>>>>> I looked into this when you suggested it, but it latches the wrong eip
>>>>>>> in the emulation state, and you will end up re-emulating the ud2a
>>>>>>> instruction, rather than the following instruction.
>>>>>> Where is there any latching of eip? All hvm_emulate_prepare() does
>>>>>> is storing the regs pointer.
>>>>> Oh - so it does.  I clearly looked over it too quickly.
>>>>>
>>>>> What wraparound issue are you referring to?  Adding 1 will cause
>>>>> incorrect behaviour when the emulation prefix ends at the segment limit.
>>>> I don't think so: The prefix together with the actual instruction
>>>> encoding should be viewed as a single instruction, and it crossing
>>>> the segment limit should #GP. It wrapping at the prefix/encoding
>>>> boundary is the case that I'm specifically referring to (this case
>>>> should also #GP, but wouldn't without this adjustment).
>>> But the force emulation prefix specifically doesn't behave like other
>>> prefixes.
>>>
>>> It doesn't count towards the 15 byte instruction limit, and if the
>>> emulated instruction does fault, we want the fault pointing at the
>>> emulated instruction, not the force prefix.  We should avoid making any
>>> link.
>> Well, are you saying placing such a prefix right below the boundary
>> of a flat segment is _expected_ to get the instruction at address 0
>> emulated? I don't think I could buy that. The patch makes no other
>> connection between prefix and actual insn. And #GP because of
>> such a boundary condition should imo point at the prefix; only all
>> faults associated with the actual insn should point there.
> 
> Ok.  That sounds reasonable.  Would it be possible to add a small
> comment to the code? Even with the commit message, I was confused as to
> the nature of the +1.

+        /*
+         * Note that in the call below we pass 1 more than the signature
+         * size, to guard against the overall code sequence wrapping between
+         * "prefix" and actual instruction. There's necessarily at least one
+         * actual instruction byte required, so this won't cause failure on
+         * legitimate uses.
+         */

> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

Thanks, Jan

[2/2] x86/HVM: re-order operations in hvm_ud_intercept()

Commit Message

Comments

Patch