diff mbox

x86: svm: don't intercept CR0 TS or MP bit write

Message ID 20150220224445.2875.66846.stgit@joelvmguard2.amd.com (mailing list archive)
State New, archived
Headers show

Commit Message

Joel Schopp Feb. 20, 2015, 10:44 p.m. UTC
From: David Kaplan <david.kaplan@amd.com>

Reduce the number of exits by avoiding exiting when the guest writes TS or MP
bits of CR0.  INTERCEPT_CR0_WRITE intercepts all writes to CR0 including TS and
MP bits. It intercepts these even if INTERCEPT_SELECTIVE_CR0 is set.  What we
should be doing is setting INTERCEPT_SELECTIVE_CR0 and not setting
INTERCEPT_CR0_WRITE.

Signed-off-by: David Kaplan <david.kaplan@amd.com>
[added remove of clr_cr_intercept in init_vmcb, fixed check in handle_exit,
added emulation on interception back in, forward ported, tested]
Signed-off-by: Joel Schopp <joel.schopp@amd.com>
---
 arch/x86/kvm/svm.c |   13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Radim Krčmář Feb. 23, 2015, 7:17 p.m. UTC | #1
2015-02-20 16:44-0600, Joel Schopp:
> From: David Kaplan <david.kaplan@amd.com>
> 
> Reduce the number of exits by avoiding exiting when the guest writes TS or MP
> bits of CR0.  INTERCEPT_CR0_WRITE intercepts all writes to CR0 including TS and
> MP bits. It intercepts these even if INTERCEPT_SELECTIVE_CR0 is set.  What we
> should be doing is setting INTERCEPT_SELECTIVE_CR0 and not setting
> INTERCEPT_CR0_WRITE.
> 
> Signed-off-by: David Kaplan <david.kaplan@amd.com>
> [added remove of clr_cr_intercept in init_vmcb, fixed check in handle_exit,
> added emulation on interception back in, forward ported, tested]
> Signed-off-by: Joel Schopp <joel.schopp@amd.com>
> ---
>  arch/x86/kvm/svm.c |   13 +++++++------
>  1 file changed, 7 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index d319e0c..55822e5 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -1539,10 +1538,8 @@ static void update_cr0_intercept(struct vcpu_svm *svm)
>  
>  	if (gcr0 == *hcr0 && svm->vcpu.fpu_active) {
>  		clr_cr_intercept(svm, INTERCEPT_CR0_READ);


> -		clr_cr_intercept(svm, INTERCEPT_CR0_WRITE);
>  	} else {
>  		set_cr_intercept(svm, INTERCEPT_CR0_READ);

(There is no point in checking fpu_active if cr0s are equal.)

> -		set_cr_intercept(svm, INTERCEPT_CR0_WRITE);

KVM uses lazy FPU and the state is undefined before the first access.
We set cr0.ts when !svm->vcpu.fpu_active to detect the first access, but
if we allow the guest to clear cr0.ts without exiting, it can access FPU
with undefined state.

Is this code failing to disable the intercept when FPU is active?

> @@ -2940,7 +2937,11 @@ static int cr_interception(struct vcpu_svm *svm)
> +	if (svm->vmcb->control.exit_code == SVM_EXIT_CR0_SEL_WRITE)
> +	   cr = 16;
   	^^^

> +	else
> +	   cr = svm->vmcb->control.exit_code - SVM_EXIT_READ_CR0;
   	^^^

Linux uses tabs for indentation.

> @@ -3502,7 +3503,7 @@ static int handle_exit(struct kvm_vcpu *vcpu)
>  	struct kvm_run *kvm_run = vcpu->run;
>  	u32 exit_code = svm->vmcb->control.exit_code;
>  
> -	if (!is_cr_intercept(svm, INTERCEPT_CR0_WRITE))
> +	if (!is_cr_intercept(svm, INTERCEPT_SELECTIVE_CR0))
>  		vcpu->arch.cr0 = svm->vmcb->save.cr0;

I think the purpose of this code is to get changes that happened while
we weren't monitoring CR0, and we introduce a bug here ... MP and TS can
change, but those changes are lost to arch.cr0 now.

(The original code was also suspicious -- we propagate changes of
 vmcb->save.cr0.  I don't think we want to clear guest's CD and NW /
 set MP and TS, which is the result of what we do in svm_set_cr0() /
 update_cr0_intercept() ...)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Joel Schopp Feb. 24, 2015, 9:25 p.m. UTC | #2
>> -		clr_cr_intercept(svm, INTERCEPT_CR0_WRITE);
>>  	} else {
>>  		set_cr_intercept(svm, INTERCEPT_CR0_READ);
> (There is no point in checking fpu_active if cr0s are equal.)
>
>> -		set_cr_intercept(svm, INTERCEPT_CR0_WRITE);
> KVM uses lazy FPU and the state is undefined before the first access.
> We set cr0.ts when !svm->vcpu.fpu_active to detect the first access, but
> if we allow the guest to clear cr0.ts without exiting, it can access FPU
> with undefined state.
Thanks for the valuable feedback.  It's apparent I hadn't thought
through the interaction with lazy FPU and will need to go back and
rethink my approach here.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Radim Krčmář Feb. 25, 2015, 8:26 p.m. UTC | #3
2015-02-24 15:25-0600, Joel Schopp:
> 
> >> -		clr_cr_intercept(svm, INTERCEPT_CR0_WRITE);
> >>  	} else {
> >>  		set_cr_intercept(svm, INTERCEPT_CR0_READ);
> > (There is no point in checking fpu_active if cr0s are equal.)
> >
> >> -		set_cr_intercept(svm, INTERCEPT_CR0_WRITE);
> > KVM uses lazy FPU and the state is undefined before the first access.
> > We set cr0.ts when !svm->vcpu.fpu_active to detect the first access, but
> > if we allow the guest to clear cr0.ts without exiting, it can access FPU
> > with undefined state.
> Thanks for the valuable feedback.  It's apparent I hadn't thought
> through the interaction with lazy FPU and will need to go back and
> rethink my approach here.

I don't think we can gain much without sacrificing some laziness, like:
when a guest with lazy FPU clears CR0.TS, it is going to use that FPU,
so we could pre-load FPU in this case and drop the write intercept too;
guests that unconditionally clear CR0.TS would perform worse though.

It's going to take a lot of time, but two hunks in your patch, that made
selective intercept benefit from decode assists, look useful even now.

Would you post them separately?

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Joel Schopp Feb. 25, 2015, 10:39 p.m. UTC | #4
On 02/25/2015 02:26 PM, Radim Kr?má? wrote:
> 2015-02-24 15:25-0600, Joel Schopp:
>>>> -		clr_cr_intercept(svm, INTERCEPT_CR0_WRITE);
>>>>  	} else {
>>>>  		set_cr_intercept(svm, INTERCEPT_CR0_READ);
>>> (There is no point in checking fpu_active if cr0s are equal.)
>>>
>>>> -		set_cr_intercept(svm, INTERCEPT_CR0_WRITE);
>>> KVM uses lazy FPU and the state is undefined before the first access.
>>> We set cr0.ts when !svm->vcpu.fpu_active to detect the first access, but
>>> if we allow the guest to clear cr0.ts without exiting, it can access FPU
>>> with undefined state.
>> Thanks for the valuable feedback.  It's apparent I hadn't thought
>> through the interaction with lazy FPU and will need to go back and
>> rethink my approach here.
> I don't think we can gain much without sacrificing some laziness, like:
> when a guest with lazy FPU clears CR0.TS, it is going to use that FPU,
> so we could pre-load FPU in this case and drop the write intercept too;
> guests that unconditionally clear CR0.TS would perform worse though.
>
> It's going to take a lot of time, but two hunks in your patch, that made
> selective intercept benefit from decode assists, look useful even now.
>
> Would you post them separately?
I can re-post those separately.  They are less useful, though probably
still worth doing, on their own because SVM_EXIT_WRITE_CR0 takes
precidence over SVM_EXIT_CR0_SEL_WRITE
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index d319e0c..55822e5 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1093,7 +1093,6 @@  static void init_vmcb(struct vcpu_svm *svm)
 	set_cr_intercept(svm, INTERCEPT_CR0_READ);
 	set_cr_intercept(svm, INTERCEPT_CR3_READ);
 	set_cr_intercept(svm, INTERCEPT_CR4_READ);
-	set_cr_intercept(svm, INTERCEPT_CR0_WRITE);
 	set_cr_intercept(svm, INTERCEPT_CR3_WRITE);
 	set_cr_intercept(svm, INTERCEPT_CR4_WRITE);
 	set_cr_intercept(svm, INTERCEPT_CR8_WRITE);
@@ -1539,10 +1538,8 @@  static void update_cr0_intercept(struct vcpu_svm *svm)
 
 	if (gcr0 == *hcr0 && svm->vcpu.fpu_active) {
 		clr_cr_intercept(svm, INTERCEPT_CR0_READ);
-		clr_cr_intercept(svm, INTERCEPT_CR0_WRITE);
 	} else {
 		set_cr_intercept(svm, INTERCEPT_CR0_READ);
-		set_cr_intercept(svm, INTERCEPT_CR0_WRITE);
 	}
 }
 
@@ -2940,7 +2937,11 @@  static int cr_interception(struct vcpu_svm *svm)
 		return emulate_on_interception(svm);
 
 	reg = svm->vmcb->control.exit_info_1 & SVM_EXITINFO_REG_MASK;
-	cr = svm->vmcb->control.exit_code - SVM_EXIT_READ_CR0;
+
+	if (svm->vmcb->control.exit_code == SVM_EXIT_CR0_SEL_WRITE)
+	   cr = 16;
+	else
+	   cr = svm->vmcb->control.exit_code - SVM_EXIT_READ_CR0;
 
 	err = 0;
 	if (cr >= 16) { /* mov to cr */
@@ -3325,7 +3326,7 @@  static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = {
 	[SVM_EXIT_READ_CR3]			= cr_interception,
 	[SVM_EXIT_READ_CR4]			= cr_interception,
 	[SVM_EXIT_READ_CR8]			= cr_interception,
-	[SVM_EXIT_CR0_SEL_WRITE]		= emulate_on_interception,
+	[SVM_EXIT_CR0_SEL_WRITE]		= cr_interception,
 	[SVM_EXIT_WRITE_CR0]			= cr_interception,
 	[SVM_EXIT_WRITE_CR3]			= cr_interception,
 	[SVM_EXIT_WRITE_CR4]			= cr_interception,
@@ -3502,7 +3503,7 @@  static int handle_exit(struct kvm_vcpu *vcpu)
 	struct kvm_run *kvm_run = vcpu->run;
 	u32 exit_code = svm->vmcb->control.exit_code;
 
-	if (!is_cr_intercept(svm, INTERCEPT_CR0_WRITE))
+	if (!is_cr_intercept(svm, INTERCEPT_SELECTIVE_CR0))
 		vcpu->arch.cr0 = svm->vmcb->save.cr0;
 	if (npt_enabled)
 		vcpu->arch.cr3 = svm->vmcb->save.cr3;