KVM: fix sil/dil/bpl/spl in the mod/rm fields

Message ID 1369924555-30216-1-git-send-email-pbonzini@redhat.com (mailing list archive)
State New, archived

Commit Message

Paolo Bonzini May 30, 2013, 2:35 p.m. UTC
The x86-64 extended low-byte registers were fetched correctly from reg,
but not from mod/rm.

This fixes another bug in the boot of RHEL5.9 64-bit, but it is still
not enough.

Cc: gnatapov@redhat.com
Cc: kvm@vger.kernel.org
Cc: <stable@vger.kernel.org> # 3.9
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/emulate.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
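
To illustrate the register selection the commit message describes, here is a minimal sketch in C. It is a toy model and not the code in arch/x86/kvm/emulate.c: the register-file layout (16 registers stored as little-endian byte arrays) and the helper name `byte_reg` are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Toy register file indices for the first eight GPRs (hypothetical model). */
enum { RAX, RCX, RDX, RBX, RSP, RBP, RSI, RDI };

/*
 * Return a pointer to the byte register selected by a register index.
 * Without a REX prefix, indices 4..7 name AH, CH, DH, BH, which are
 * byte 1 of RAX, RCX, RDX, RBX. With any REX prefix the same indices
 * name SPL, BPL, SIL, DIL, which are byte 0 of RSP, RBP, RSI, RDI.
 */
static uint8_t *byte_reg(uint8_t regs[16][8], unsigned idx, int has_rex)
{
    assert(idx < 16);
    if (!has_rex && idx >= 4 && idx < 8)
        return &regs[idx & 3][1];   /* legacy high-byte register */
    return &regs[idx][0];           /* low byte of the full register */
}
```

With index 5, this model returns CH when no REX prefix is present and BPL when one is. That is the distinction the commit message says was applied to the reg field but not to the mod/rm field.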

Comments

Paolo Bonzini May 30, 2013, 3:34 p.m. UTC | #1
On 30/05/2013 16:35, Paolo Bonzini wrote:
> The x86-64 extended low-byte registers were fetched correctly from reg,
> but not from mod/rm.
> 
> This fixes another bug in the boot of RHEL5.9 64-bit, but it is still
> not enough.

Well, it is enough but it takes 2 minutes to reach the point where
hardware virtualization is used.  It is doing a lot of stuff in
emulation mode because FS and GS have leftovers from the A20 test:

FS =0000 0000000000000000 0000ffff 00009300 DPL=0 DS16 [-WA]
GS =ffff 00000000000ffff0 0000ffff 00009300 DPL=0 DS16 [-WA]

0x00000000000113be:  in     $0x92,%al
0x00000000000113c0:  or     $0x2,%al
0x00000000000113c2:  out    %al,$0x92
0x00000000000113c4:  xor    %ax,%ax
0x00000000000113c6:  mov    %ax,%fs
0x00000000000113c8:  dec    %ax
0x00000000000113c9:  mov    %ax,%gs
0x00000000000113cb:  inc    %ax
0x00000000000113cc:  mov    %ax,%fs:0x200
0x00000000000113d0:  cmp    %gs:0x210,%ax
0x00000000000113d5:  je     0x113cb
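
The address arithmetic behind the loop above can be sketched in C. This is an illustrative model only; the helper names `real_mode_linear` and `apply_a20` are made up for the example. FS:0x200 and GS:0x210 differ by exactly 1 MiB, so they alias the same byte only while the A20 line is masked.

```c
#include <stdint.h>

/* Real-mode linear address: segment * 16 + offset (can reach bit 20). */
static uint32_t real_mode_linear(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}

/* With the A20 line masked, address bit 20 is forced to zero. */
static uint32_t apply_a20(uint32_t linear, int a20_enabled)
{
    return a20_enabled ? linear : (linear & ~(UINT32_C(1) << 20));
}
```

In this model, 0x0000:0x0200 is linear 0x200 and 0xffff:0x0210 is linear 0x100200. With A20 masked the second address wraps to 0x200, the compare matches, and the loop repeats. Once A20 is enabled the two addresses differ and the loop exits, leaving FS = 0x0000 and GS = 0xffff behind.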

The DPL < RPL test fails.  Any ideas?  Should we introduce a new
intermediate value for emulate_invalid_guest_state (0=none, 1=some, 2=full)?

Paolo
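
The DPL versus RPL comparison mentioned in this message can be sketched as follows. This is a simplified model, not KVM's implementation. It assumes the flags column in the dump uses the layout of the descriptor's high dword (DPL in bits 13-14), and the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* RPL is the low two bits of the selector. */
static unsigned selector_rpl(uint16_t selector)
{
    return selector & 3;
}

/* Assumed layout: DPL sits in bits 13-14 of the dumped flags word. */
static unsigned flags_dpl(uint32_t flags)
{
    return (flags >> 13) & 3;
}

/* For a data segment, the check passes only when DPL >= RPL. */
static bool data_segment_dpl_ok(uint16_t selector, uint32_t flags)
{
    return flags_dpl(flags) >= selector_rpl(selector);
}
```

Under these assumptions FS (selector 0x0000, flags 0x9300) passes, while GS (selector 0xffff, flags 0x9300) has RPL 3 and DPL 0 and fails.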

> Cc: gnatapov@redhat.com
> Cc: kvm@vger.kernel.org
> Cc: <stable@vger.kernel.org> # 3.9
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  arch/x86/kvm/emulate.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index aa68106..028b34f 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -1239,9 +1239,12 @@ static int decode_modrm(struct x86_emulate_ctxt *ctxt,
>  	ctxt->modrm_seg = VCPU_SREG_DS;
>  
>  	if (ctxt->modrm_mod == 3) {
> +		int highbyte_regs = ctxt->rex_prefix == 0;
> +
>  		op->type = OP_REG;
>  		op->bytes = (ctxt->d & ByteOp) ? 1 : ctxt->op_bytes;
> -		op->addr.reg = decode_register(ctxt, ctxt->modrm_rm, ctxt->d & ByteOp);
> +		op->addr.reg = decode_register(ctxt, ctxt->modrm_rm,
> +					       highbyte_regs && (ctxt->d & ByteOp));
>  		if (ctxt->d & Sse) {
>  			op->type = OP_XMM;
>  			op->bytes = 16;
> 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paolo Bonzini May 30, 2013, 4:34 p.m. UTC | #2
On 30/05/2013 17:34, Paolo Bonzini wrote:
> On 30/05/2013 16:35, Paolo Bonzini wrote:
>> The x86-64 extended low-byte registers were fetched correctly from reg,
>> but not from mod/rm.
>>
>> This fixes another bug in the boot of RHEL5.9 64-bit, but it is still
>> not enough.
> 
> Well, it is enough but it takes 2 minutes to reach the point where
> hardware virtualization is used.  It is doing a lot of stuff in
> emulation mode because FS and GS have leftovers from the A20 test:
> 
> FS =0000 0000000000000000 0000ffff 00009300 DPL=0 DS16 [-WA]
> GS =ffff 00000000000ffff0 0000ffff 00009300 DPL=0 DS16 [-WA]
> 
> 0x00000000000113be:  in     $0x92,%al
> 0x00000000000113c0:  or     $0x2,%al
> 0x00000000000113c2:  out    %al,$0x92
> 0x00000000000113c4:  xor    %ax,%ax
> 0x00000000000113c6:  mov    %ax,%fs
> 0x00000000000113c8:  dec    %ax
> 0x00000000000113c9:  mov    %ax,%gs
> 0x00000000000113cb:  inc    %ax
> 0x00000000000113cc:  mov    %ax,%fs:0x200
> 0x00000000000113d0:  cmp    %gs:0x210,%ax
> 0x00000000000113d5:  je     0x113cb
> 
> The DPL < RPL test fails.  Any ideas?  Should we introduce a new
> intermediate value for emulate_invalid_guest_state (0=none, 1=some, 2=full)?

One idea could be to replace invalid descriptors with NULL ones.  Then
you can intercept this in the #GP handler and trigger emulation for that
instruction only.

Paolo
Gleb Natapov June 2, 2013, 6:12 p.m. UTC | #3
On Thu, May 30, 2013 at 04:35:55PM +0200, Paolo Bonzini wrote:
> The x86-64 extended low-byte registers were fetched correctly from reg,
> but not from mod/rm.
> 
> This fixes another bug in the boot of RHEL5.9 64-bit, but it is still
> not enough.
> 
Did I miss the unit test patch? :)

> Cc: gnatapov@redhat.com
> Cc: kvm@vger.kernel.org
> Cc: <stable@vger.kernel.org> # 3.9
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  arch/x86/kvm/emulate.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index aa68106..028b34f 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -1239,9 +1239,12 @@ static int decode_modrm(struct x86_emulate_ctxt *ctxt,
>  	ctxt->modrm_seg = VCPU_SREG_DS;
>  
>  	if (ctxt->modrm_mod == 3) {
> +		int highbyte_regs = ctxt->rex_prefix == 0;
> +
>  		op->type = OP_REG;
>  		op->bytes = (ctxt->d & ByteOp) ? 1 : ctxt->op_bytes;
> -		op->addr.reg = decode_register(ctxt, ctxt->modrm_rm, ctxt->d & ByteOp);
> +		op->addr.reg = decode_register(ctxt, ctxt->modrm_rm,
> +					       highbyte_regs && (ctxt->d & ByteOp));
>  		if (ctxt->d & Sse) {
>  			op->type = OP_XMM;
>  			op->bytes = 16;
> -- 
> 1.8.1.4

--
			Gleb.
Paolo Bonzini June 3, 2013, 6:27 a.m. UTC | #4
On 02/06/2013 20:12, Gleb Natapov wrote:
> On Thu, May 30, 2013 at 04:35:55PM +0200, Paolo Bonzini wrote:
>> The x86-64 extended low-byte registers were fetched correctly from reg,
>> but not from mod/rm.
>>
>> This fixes another bug in the boot of RHEL5.9 64-bit, but it is still
>> not enough.
>>
> Did I miss the unit test patch? :)

I wanted to ask the GSoC student to do it.  If it doesn't come in a
couple of weeks, I'll send it.

Paolo

>> Cc: gnatapov@redhat.com
>> Cc: kvm@vger.kernel.org
>> Cc: <stable@vger.kernel.org> # 3.9
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>>  arch/x86/kvm/emulate.c | 5 ++++-
>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
>> index aa68106..028b34f 100644
>> --- a/arch/x86/kvm/emulate.c
>> +++ b/arch/x86/kvm/emulate.c
>> @@ -1239,9 +1239,12 @@ static int decode_modrm(struct x86_emulate_ctxt *ctxt,
>>  	ctxt->modrm_seg = VCPU_SREG_DS;
>>  
>>  	if (ctxt->modrm_mod == 3) {
>> +		int highbyte_regs = ctxt->rex_prefix == 0;
>> +
>>  		op->type = OP_REG;
>>  		op->bytes = (ctxt->d & ByteOp) ? 1 : ctxt->op_bytes;
>> -		op->addr.reg = decode_register(ctxt, ctxt->modrm_rm, ctxt->d & ByteOp);
>> +		op->addr.reg = decode_register(ctxt, ctxt->modrm_rm,
>> +					       highbyte_regs && (ctxt->d & ByteOp));
>>  		if (ctxt->d & Sse) {
>>  			op->type = OP_XMM;
>>  			op->bytes = 16;
>> -- 
>> 1.8.1.4
> 
> --
> 			Gleb.
> 

Gleb Natapov June 3, 2013, 8:04 a.m. UTC | #5
On Mon, Jun 03, 2013 at 08:27:57AM +0200, Paolo Bonzini wrote:
> On 02/06/2013 20:12, Gleb Natapov wrote:
> > On Thu, May 30, 2013 at 04:35:55PM +0200, Paolo Bonzini wrote:
> >> The x86-64 extended low-byte registers were fetched correctly from reg,
> >> but not from mod/rm.
> >>
> >> This fixes another bug in the boot of RHEL5.9 64-bit, but it is still
> >> not enough.
> >>
> > Did I miss the unit test patch? :)
> 
> I wanted to ask the GSoC student to do it.  If it doesn't come in a
> couple of weeks, I'll send it.
> 
Which instruction did you see the bug happening with? Is this a 3.10 regression?

> Paolo
> 
> >> Cc: gnatapov@redhat.com
Please use my other email :)

> >> Cc: kvm@vger.kernel.org
> >> Cc: <stable@vger.kernel.org> # 3.9
> >> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> >> ---
> >>  arch/x86/kvm/emulate.c | 5 ++++-
> >>  1 file changed, 4 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> >> index aa68106..028b34f 100644
> >> --- a/arch/x86/kvm/emulate.c
> >> +++ b/arch/x86/kvm/emulate.c
> >> @@ -1239,9 +1239,12 @@ static int decode_modrm(struct x86_emulate_ctxt *ctxt,
> >>  	ctxt->modrm_seg = VCPU_SREG_DS;
> >>  
> >>  	if (ctxt->modrm_mod == 3) {
> >> +		int highbyte_regs = ctxt->rex_prefix == 0;
> >> +
> >>  		op->type = OP_REG;
> >>  		op->bytes = (ctxt->d & ByteOp) ? 1 : ctxt->op_bytes;
> >> -		op->addr.reg = decode_register(ctxt, ctxt->modrm_rm, ctxt->d & ByteOp);
> >> +		op->addr.reg = decode_register(ctxt, ctxt->modrm_rm,
> >> +					       highbyte_regs && (ctxt->d & ByteOp));
> >>  		if (ctxt->d & Sse) {
> >>  			op->type = OP_XMM;
> >>  			op->bytes = 16;
> >> -- 
> >> 1.8.1.4
> > 
> > --
> > 			Gleb.
> > 

--
			Gleb.
Paolo Bonzini June 3, 2013, 8:15 a.m. UTC | #6
On 03/06/2013 10:04, Gleb Natapov wrote:
> On Mon, Jun 03, 2013 at 08:27:57AM +0200, Paolo Bonzini wrote:
>> On 02/06/2013 20:12, Gleb Natapov wrote:
>>> On Thu, May 30, 2013 at 04:35:55PM +0200, Paolo Bonzini wrote:
>>>> The x86-64 extended low-byte registers were fetched correctly from reg,
>>>> but not from mod/rm.
>>>>
>>>> This fixes another bug in the boot of RHEL5.9 64-bit, but it is still
>>>> not enough.
>>>>
>>> Did I miss the unit test patch? :)
>>
>> I wanted to ask the GSoC student to do it.  If it doesn't come in a
>> couple of weeks, I'll send it.
>>
> Which instruction did you see the bug happening with? Is this a 3.10 regression?

cmp $0x1f, %bpl

Like the NOP, it is a regression introduced in the switch of
emulate_invalid_guest_state from 0 to 1.
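
For illustration, here is how the ModRM byte of such an instruction splits into fields. The sketch assumes the usual encoding `40 80 fd 1f` for `cmp $0x1f, %bpl` (REX prefix, opcode 0x80 with /7 selecting CMP, ModRM byte, 8-bit immediate). The struct and helper names are hypothetical and are not taken from the emulator.

```c
#include <stdint.h>

/* The three fields of a ModRM byte. */
struct modrm {
    unsigned mod;   /* bits 7-6: 3 means the r/m field names a register */
    unsigned reg;   /* bits 5-3: register or opcode extension (/digit) */
    unsigned rm;    /* bits 2-0: register or memory operand selector */
};

static struct modrm split_modrm(uint8_t byte)
{
    struct modrm m = { byte >> 6, (byte >> 3) & 7, byte & 7 };
    return m;
}

/* In 64-bit mode, any byte in the range 0x40..0x4f is a REX prefix. */
static int is_rex(uint8_t byte)
{
    return (byte & 0xf0) == 0x40;
}
```

Under that assumed encoding, ModRM 0xfd gives mod = 3, reg = 7, rm = 5. Because a REX prefix (0x40) is present, byte register 5 in the rm field means BPL; without the prefix it would mean CH.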

Paolo

> 
>> Paolo
>>
>>>> Cc: gnatapov@redhat.com
> Please use my other email :)
> 
>>>> Cc: kvm@vger.kernel.org
>>>> Cc: <stable@vger.kernel.org> # 3.9
>>>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>>>> ---
>>>>  arch/x86/kvm/emulate.c | 5 ++++-
>>>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
>>>> index aa68106..028b34f 100644
>>>> --- a/arch/x86/kvm/emulate.c
>>>> +++ b/arch/x86/kvm/emulate.c
>>>> @@ -1239,9 +1239,12 @@ static int decode_modrm(struct x86_emulate_ctxt *ctxt,
>>>>  	ctxt->modrm_seg = VCPU_SREG_DS;
>>>>  
>>>>  	if (ctxt->modrm_mod == 3) {
>>>> +		int highbyte_regs = ctxt->rex_prefix == 0;
>>>> +
>>>>  		op->type = OP_REG;
>>>>  		op->bytes = (ctxt->d & ByteOp) ? 1 : ctxt->op_bytes;
>>>> -		op->addr.reg = decode_register(ctxt, ctxt->modrm_rm, ctxt->d & ByteOp);
>>>> +		op->addr.reg = decode_register(ctxt, ctxt->modrm_rm,
>>>> +					       highbyte_regs && (ctxt->d & ByteOp));
>>>>  		if (ctxt->d & Sse) {
>>>>  			op->type = OP_XMM;
>>>>  			op->bytes = 16;
>>>> -- 
>>>> 1.8.1.4
>>>
>>> --
>>> 			Gleb.
>>>
> 
> --
> 			Gleb.
> 

Gleb Natapov June 3, 2013, 8:28 a.m. UTC | #7
On Thu, May 30, 2013 at 04:35:55PM +0200, Paolo Bonzini wrote:
> The x86-64 extended low-byte registers were fetched correctly from reg,
> but not from mod/rm.
> 
> This fixes another bug in the boot of RHEL5.9 64-bit, but it is still
> not enough.
> 
> Cc: gnatapov@redhat.com
> Cc: kvm@vger.kernel.org
> Cc: <stable@vger.kernel.org> # 3.9
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Applied to master, thanks.

> ---
>  arch/x86/kvm/emulate.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index aa68106..028b34f 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -1239,9 +1239,12 @@ static int decode_modrm(struct x86_emulate_ctxt *ctxt,
>  	ctxt->modrm_seg = VCPU_SREG_DS;
>  
>  	if (ctxt->modrm_mod == 3) {
> +		int highbyte_regs = ctxt->rex_prefix == 0;
> +
>  		op->type = OP_REG;
>  		op->bytes = (ctxt->d & ByteOp) ? 1 : ctxt->op_bytes;
> -		op->addr.reg = decode_register(ctxt, ctxt->modrm_rm, ctxt->d & ByteOp);
> +		op->addr.reg = decode_register(ctxt, ctxt->modrm_rm,
> +					       highbyte_regs && (ctxt->d & ByteOp));
>  		if (ctxt->d & Sse) {
>  			op->type = OP_XMM;
>  			op->bytes = 16;
> -- 
> 1.8.1.4

--
			Gleb.
Gleb Natapov June 3, 2013, 10:25 a.m. UTC | #8
On Thu, May 30, 2013 at 05:34:21PM +0200, Paolo Bonzini wrote:
> On 30/05/2013 16:35, Paolo Bonzini wrote:
> > The x86-64 extended low-byte registers were fetched correctly from reg,
> > but not from mod/rm.
> > 
> > This fixes another bug in the boot of RHEL5.9 64-bit, but it is still
> > not enough.
> 
> Well, it is enough but it takes 2 minutes to reach the point where
> hardware virtualization is used.  It is doing a lot of stuff in
> emulation mode because FS and GS have leftovers from the A20 test:
> 
> FS =0000 0000000000000000 0000ffff 00009300 DPL=0 DS16 [-WA]
> GS =ffff 00000000000ffff0 0000ffff 00009300 DPL=0 DS16 [-WA]
> 
> 0x00000000000113be:  in     $0x92,%al
> 0x00000000000113c0:  or     $0x2,%al
> 0x00000000000113c2:  out    %al,$0x92
> 0x00000000000113c4:  xor    %ax,%ax
> 0x00000000000113c6:  mov    %ax,%fs
> 0x00000000000113c8:  dec    %ax
> 0x00000000000113c9:  mov    %ax,%gs
> 0x00000000000113cb:  inc    %ax
> 0x00000000000113cc:  mov    %ax,%fs:0x200
> 0x00000000000113d0:  cmp    %gs:0x210,%ax
> 0x00000000000113d5:  je     0x113cb
> 
This is 16-bit code that sets them up. So the 32-bit transition code does not
reload them?

> The DPL < RPL test fails.  Any ideas?  Should we introduce a new
> intermediate value for emulate_invalid_guest_state (0=none, 1=some, 2=full)?
> 
> Paolo
> 
> > Cc: gnatapov@redhat.com
> > Cc: kvm@vger.kernel.org
> > Cc: <stable@vger.kernel.org> # 3.9
> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > ---
> >  arch/x86/kvm/emulate.c | 5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> > index aa68106..028b34f 100644
> > --- a/arch/x86/kvm/emulate.c
> > +++ b/arch/x86/kvm/emulate.c
> > @@ -1239,9 +1239,12 @@ static int decode_modrm(struct x86_emulate_ctxt *ctxt,
> >  	ctxt->modrm_seg = VCPU_SREG_DS;
> >  
> >  	if (ctxt->modrm_mod == 3) {
> > +		int highbyte_regs = ctxt->rex_prefix == 0;
> > +
> >  		op->type = OP_REG;
> >  		op->bytes = (ctxt->d & ByteOp) ? 1 : ctxt->op_bytes;
> > -		op->addr.reg = decode_register(ctxt, ctxt->modrm_rm, ctxt->d & ByteOp);
> > +		op->addr.reg = decode_register(ctxt, ctxt->modrm_rm,
> > +					       highbyte_regs && (ctxt->d & ByteOp));
> >  		if (ctxt->d & Sse) {
> >  			op->type = OP_XMM;
> >  			op->bytes = 16;
> > 

--
			Gleb.
Paolo Bonzini June 3, 2013, 12:53 p.m. UTC | #9
On 03/06/2013 12:25, Gleb Natapov wrote:
> On Thu, May 30, 2013 at 05:34:21PM +0200, Paolo Bonzini wrote:
>> On 30/05/2013 16:35, Paolo Bonzini wrote:
>>> The x86-64 extended low-byte registers were fetched correctly from reg,
>>> but not from mod/rm.
>>>
>>> This fixes another bug in the boot of RHEL5.9 64-bit, but it is still
>>> not enough.
>>
>> Well, it is enough but it takes 2 minutes to reach the point where
>> hardware virtualization is used.  It is doing a lot of stuff in
>> emulation mode because FS and GS have leftovers from the A20 test:
>>
>> FS =0000 0000000000000000 0000ffff 00009300 DPL=0 DS16 [-WA]
>> GS =ffff 00000000000ffff0 0000ffff 00009300 DPL=0 DS16 [-WA]
>>
>> 0x00000000000113be:  in     $0x92,%al
>> 0x00000000000113c0:  or     $0x2,%al
>> 0x00000000000113c2:  out    %al,$0x92
>> 0x00000000000113c4:  xor    %ax,%ax
>> 0x00000000000113c6:  mov    %ax,%fs
>> 0x00000000000113c8:  dec    %ax
>> 0x00000000000113c9:  mov    %ax,%gs
>> 0x00000000000113cb:  inc    %ax
>> 0x00000000000113cc:  mov    %ax,%fs:0x200
>> 0x00000000000113d0:  cmp    %gs:0x210,%ax
>> 0x00000000000113d5:  je     0x113cb
>>
> This is 16-bit code that sets them up. So the 32-bit transition code does not
> reload them?

Yes.  It does this:

        movw    $1, %ax                         # protected mode (PE) bit
        lmsw    %ax                             # This is it!
        jmp     flush_instr

flush_instr:
        xorw    %bx, %bx                        # Flag to indicate a boot
        xorl    %esi, %esi                      # Pointer to real-mode code
        movw    %cs, %si
        subw    $DELTA_INITSEG, %si
        shll    $4, %esi                        # Convert to 32-bit pointer
        .byte 0x66, 0xea                        # prefix + jmpi-opcode
code32: .long   0x1000                          # will be set to 0x100000
                                                # for big kernels
        .word   __KERNEL_CS

which jumps to boot/compressed/head.S:

startup_32:
        cld
        cli
        movl    $(__KERNEL_DS), %eax
        movl    %eax, %ds
        movl    %eax, %es
        movl    %eax, %ss

and totally ignores fs/gs.  Much later there is this (in kernel/head.S):

        /*
         * We don't really need to load %fs or %gs, but load them anyway
         * to kill any stale realmode selectors.  This allows execution
         * under VT hardware.
         */
        movl %eax,%fs
        movl %eax,%gs
 
but the whole decompression is run under emulation.

Paolo

>> The DPL < RPL test fails.  Any ideas?  Should we introduce a new
>> intermediate value for emulate_invalid_guest_state (0=none, 1=some, 2=full)?
>>
>> Paolo
>>
>>> Cc: gnatapov@redhat.com
>>> Cc: kvm@vger.kernel.org
>>> Cc: <stable@vger.kernel.org> # 3.9
>>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>>> ---
>>>  arch/x86/kvm/emulate.c | 5 ++++-
>>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
>>> index aa68106..028b34f 100644
>>> --- a/arch/x86/kvm/emulate.c
>>> +++ b/arch/x86/kvm/emulate.c
>>> @@ -1239,9 +1239,12 @@ static int decode_modrm(struct x86_emulate_ctxt *ctxt,
>>>  	ctxt->modrm_seg = VCPU_SREG_DS;
>>>  
>>>  	if (ctxt->modrm_mod == 3) {
>>> +		int highbyte_regs = ctxt->rex_prefix == 0;
>>> +
>>>  		op->type = OP_REG;
>>>  		op->bytes = (ctxt->d & ByteOp) ? 1 : ctxt->op_bytes;
>>> -		op->addr.reg = decode_register(ctxt, ctxt->modrm_rm, ctxt->d & ByteOp);
>>> +		op->addr.reg = decode_register(ctxt, ctxt->modrm_rm,
>>> +					       highbyte_regs && (ctxt->d & ByteOp));
>>>  		if (ctxt->d & Sse) {
>>>  			op->type = OP_XMM;
>>>  			op->bytes = 16;
>>>
> 
> --
> 			Gleb.
> 

Avi Kivity June 3, 2013, 3:42 p.m. UTC | #10
On Thu, May 30, 2013 at 7:34 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 30/05/2013 17:34, Paolo Bonzini wrote:
>> On 30/05/2013 16:35, Paolo Bonzini wrote:
>>> The x86-64 extended low-byte registers were fetched correctly from reg,
>>> but not from mod/rm.
>>>
>>> This fixes another bug in the boot of RHEL5.9 64-bit, but it is still
>>> not enough.
>>
>> Well, it is enough but it takes 2 minutes to reach the point where
>> hardware virtualization is used.  It is doing a lot of stuff in
>> emulation mode because FS and GS have leftovers from the A20 test:
>>
>> FS =0000 0000000000000000 0000ffff 00009300 DPL=0 DS16 [-WA]
>> GS =ffff 00000000000ffff0 0000ffff 00009300 DPL=0 DS16 [-WA]
>>
>> 0x00000000000113be:  in     $0x92,%al
>> 0x00000000000113c0:  or     $0x2,%al
>> 0x00000000000113c2:  out    %al,$0x92
>> 0x00000000000113c4:  xor    %ax,%ax
>> 0x00000000000113c6:  mov    %ax,%fs
>> 0x00000000000113c8:  dec    %ax
>> 0x00000000000113c9:  mov    %ax,%gs
>> 0x00000000000113cb:  inc    %ax
>> 0x00000000000113cc:  mov    %ax,%fs:0x200
>> 0x00000000000113d0:  cmp    %gs:0x210,%ax
>> 0x00000000000113d5:  je     0x113cb
>>
>> The DPL < RPL test fails.  Any ideas?  Should we introduce a new
>> intermediate value for emulate_invalid_guest_state (0=none, 1=some, 2=full)?
>
> One idea could be to replace invalid descriptors with NULL ones.  Then
> you can intercept this in the #GP handler and trigger emulation for that
> instruction only.

Won't work, vmx won't let you enter in such a configuration.

Maybe you can detect the exact code sequence (%eip, some instructions,
register state) and clear %fs and %gs.
Gleb Natapov June 3, 2013, 4:40 p.m. UTC | #11
On Mon, Jun 03, 2013 at 06:42:11PM +0300, Avi Kivity wrote:
> On Thu, May 30, 2013 at 7:34 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> > On 30/05/2013 17:34, Paolo Bonzini wrote:
> >> On 30/05/2013 16:35, Paolo Bonzini wrote:
> >>> The x86-64 extended low-byte registers were fetched correctly from reg,
> >>> but not from mod/rm.
> >>>
> >>> This fixes another bug in the boot of RHEL5.9 64-bit, but it is still
> >>> not enough.
> >>
> >> Well, it is enough but it takes 2 minutes to reach the point where
> >> hardware virtualization is used.  It is doing a lot of stuff in
> >> emulation mode because FS and GS have leftovers from the A20 test:
> >>
> >> FS =0000 0000000000000000 0000ffff 00009300 DPL=0 DS16 [-WA]
> >> GS =ffff 00000000000ffff0 0000ffff 00009300 DPL=0 DS16 [-WA]
> >>
> >> 0x00000000000113be:  in     $0x92,%al
> >> 0x00000000000113c0:  or     $0x2,%al
> >> 0x00000000000113c2:  out    %al,$0x92
> >> 0x00000000000113c4:  xor    %ax,%ax
> >> 0x00000000000113c6:  mov    %ax,%fs
> >> 0x00000000000113c8:  dec    %ax
> >> 0x00000000000113c9:  mov    %ax,%gs
> >> 0x00000000000113cb:  inc    %ax
> >> 0x00000000000113cc:  mov    %ax,%fs:0x200
> >> 0x00000000000113d0:  cmp    %gs:0x210,%ax
> >> 0x00000000000113d5:  je     0x113cb
> >>
> >> The DPL < RPL test fails.  Any ideas?  Should we introduce a new
> >> intermediate value for emulate_invalid_guest_state (0=none, 1=some, 2=full)?
> >
> > One idea could be to replace invalid descriptors with NULL ones.  Then
> > you can intercept this in the #GP handler and trigger emulation for that
> > instruction only.
> 
> Won't work, vmx won't let you enter in such a configuration.
> 
Why? It is possible to have a NULL descriptor in 32-bit mode with vmx. But
we do not usually intercept #GP while executing in 32-bit mode, so we would
have to track whether there is an artificial NULL selector, enable #GP
interception, and then emulate on every #GP.

> Maybe you can detect the exact code sequence (%eip, some instructions,
> register state) and clear %fs and %gs.
Maybe we can set dpl to rpl unconditionally on a switch from 16 to 32
bit. The only problem I can see with it is that if a guest enters user
mode without explicitly reloading the segment, it will be accessible by
user mode code. But I am not sure it is well defined what the dpl of a 16
bit segment is after the transition to 32 bit mode anyway, so it would be
crazy for a guest to do so.
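
The "set dpl to rpl" idea can be sketched on a toy segment structure. This is purely illustrative; `struct toy_seg` and `fixup_dpl` are hypothetical names and do not correspond to any KVM data structure or function.

```c
#include <stdint.h>

/* Hypothetical cached segment state: only the fields the idea needs. */
struct toy_seg {
    uint16_t selector;
    uint8_t dpl;
};

/* Force DPL to equal the selector's RPL (its low two bits). */
static void fixup_dpl(struct toy_seg *seg)
{
    seg->dpl = seg->selector & 3;
}
```

Applied to the stale GS from the dump (selector 0xffff, DPL 0), this would raise DPL to 3, which is why the message notes that the segment would then be reachable from user mode code.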

--
			Gleb.
Paolo Bonzini June 3, 2013, 4:58 p.m. UTC | #12
On 03/06/2013 18:40, Gleb Natapov wrote:
>> > Won't work, vmx won't let you enter in such a configuration.
> 
> Why? It is possible to have a NULL descriptor in 32-bit mode with vmx. But
> we do not usually intercept #GP while executing in 32-bit mode, so we would
> have to track whether there is an artificial NULL selector, enable #GP
> interception, and then emulate on every #GP.

Yes, that's what I had in mind.  Of course for invalid CS you do have to
emulate.

>> > Maybe you can detect the exact code sequence (%eip, some instructions,
>> > register state) and clear %fs and %gs.
> Maybe we can set dpl to rpl unconditionally on a switch from 16 to 32
> bit. The only problem I can see with it is that if a guest enters user
> mode without explicitly reloading the segment, it will be accessible by
> user mode code. But I am not sure it is well defined what the dpl of a 16
> bit segment is after the transition to 32 bit mode anyway, so it would be
> crazy for a guest to do so.

That too, or just set it to 3.  But perhaps the #GP interception
wouldn't be too hard.

Paolo
Gleb Natapov June 3, 2013, 5:45 p.m. UTC | #13
On Mon, Jun 03, 2013 at 08:30:18PM +0300, Avi Kivity wrote:
> On Jun 3, 2013 7:41 PM, "Gleb Natapov" <gleb@redhat.com> wrote:
> >
> > On Mon, Jun 03, 2013 at 06:42:11PM +0300, Avi Kivity wrote:
> > > On Thu, May 30, 2013 at 7:34 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> > > > On 30/05/2013 17:34, Paolo Bonzini wrote:
> > > >> On 30/05/2013 16:35, Paolo Bonzini wrote:
> > > >>> The x86-64 extended low-byte registers were fetched correctly from reg,
> > > >>> but not from mod/rm.
> > > >>>
> > > >>> This fixes another bug in the boot of RHEL5.9 64-bit, but it is still
> > > >>> not enough.
> > > >>
> > > >> Well, it is enough but it takes 2 minutes to reach the point where
> > > >> hardware virtualization is used.  It is doing a lot of stuff in
> > > >> emulation mode because FS and GS have leftovers from the A20 test:
> > > >>
> > > >> FS =0000 0000000000000000 0000ffff 00009300 DPL=0 DS16 [-WA]
> > > >> GS =ffff 00000000000ffff0 0000ffff 00009300 DPL=0 DS16 [-WA]
> > > >>
> > > >> 0x00000000000113be:  in     $0x92,%al
> > > >> 0x00000000000113c0:  or     $0x2,%al
> > > >> 0x00000000000113c2:  out    %al,$0x92
> > > >> 0x00000000000113c4:  xor    %ax,%ax
> > > >> 0x00000000000113c6:  mov    %ax,%fs
> > > >> 0x00000000000113c8:  dec    %ax
> > > >> 0x00000000000113c9:  mov    %ax,%gs
> > > >> 0x00000000000113cb:  inc    %ax
> > > >> 0x00000000000113cc:  mov    %ax,%fs:0x200
> > > >> 0x00000000000113d0:  cmp    %gs:0x210,%ax
> > > >> 0x00000000000113d5:  je     0x113cb
> > > >>
> > > >> The DPL < RPL test fails.  Any ideas?  Should we introduce a new
> > > >> intermediate value for emulate_invalid_guest_state (0=none, 1=some, 2=full)?
> > > >
> > > > One idea could be to replace invalid descriptors with NULL ones.  Then
> > > > you can intercept this in the #GP handler and trigger emulation for that
> > > > instruction only.
> > >
> > > Won't work, vmx won't let you enter in such a configuration.
> > >
> > Why? It is possible to have a NULL descriptor in 32-bit mode with vmx. But
> > we do not usually intercept #GP while executing in 32-bit mode, so we would
> > have to track whether there is an artificial NULL selector, enable #GP
> > interception, and then emulate on every #GP.
> 
> Sorry, was thinking of virtual-8086 mode. It should work.
> 
> >
> > > Maybe you can detect the exact code sequence (%eip, some instructions,
> > > register state) and clear %fs and %gs.
> > Maybe we can set dpl to rpl unconditionally on a switch from 16 to 32
> > bit. The only problem I can see with it is that if a guest enters user
> > mode without explicitly reloading the segment, it will be accessible by
> > user mode code. But I am not sure it is well defined what the dpl of a 16
> > bit segment is after the transition to 32 bit mode anyway, so it would be
> > crazy for a guest to do so.
> 
> The problem is you cannot detect a segment reload if you do that. Trapping
> #GP preserves correctness in all cases (at the cost of some complexity).
> 
I do not see why I would want to detect a reload. Setting the segment to NULL
has the disadvantage that if the guest reads the selector it will get the wrong
value, but maybe we can leave the selector alone and mark the segment unusable.
I always wondered what VMX has the "unusable" attribute for; maybe this is
it.
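
As a rough sketch of the "unusable" idea: in the VMX segment access-rights format, bit 16 marks a segment as unusable, independently of the selector value. The macro and function names below are invented for this example and are not KVM's.

```c
#include <stdint.h>

/* VMX segment access-rights bits used here (names are hypothetical). */
#define TOY_AR_DPL_SHIFT 5                  /* DPL occupies bits 6-5 */
#define TOY_AR_UNUSABLE  (UINT32_C(1) << 16) /* segment-unusable flag */

/* Mark a segment unusable without touching its selector. */
static uint32_t mark_unusable(uint32_t access_rights)
{
    return access_rights | TOY_AR_UNUSABLE;
}

static int ar_is_unusable(uint32_t access_rights)
{
    return (access_rights & TOY_AR_UNUSABLE) != 0;
}

static unsigned ar_dpl(uint32_t access_rights)
{
    return (access_rights >> TOY_AR_DPL_SHIFT) & 3;
}
```

For a present, writable, accessed data segment with DPL 0 (access rights 0x93), `mark_unusable` yields 0x10093. The selector is stored separately and stays at its original value, which is the property this message is after.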

--
			Gleb.

Patch

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index aa68106..028b34f 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1239,9 +1239,12 @@  static int decode_modrm(struct x86_emulate_ctxt *ctxt,
 	ctxt->modrm_seg = VCPU_SREG_DS;
 
 	if (ctxt->modrm_mod == 3) {
+		int highbyte_regs = ctxt->rex_prefix == 0;
+
 		op->type = OP_REG;
 		op->bytes = (ctxt->d & ByteOp) ? 1 : ctxt->op_bytes;
-		op->addr.reg = decode_register(ctxt, ctxt->modrm_rm, ctxt->d & ByteOp);
+		op->addr.reg = decode_register(ctxt, ctxt->modrm_rm,
+					       highbyte_regs && (ctxt->d & ByteOp));
 		if (ctxt->d & Sse) {
 			op->type = OP_XMM;
 			op->bytes = 16;