[3/3] KVM: x86: correct mwait and monitor emulation

Message ID	20140618184601.GE1695@ERROL.INI.CMU.EDU (mailing list archive)
State	New, archived
Headers	show Return-Path: <kvm-owner@kernel.org> Date: Wed, 18 Jun 2014 14:46:01 -0400 From: "Gabriel L. Somlo" <gsomlo@gmail.com> To: Eric Northup <digitaleric@google.com> Cc: Nadav Amit <namit@cs.technion.ac.il>, Paolo Bonzini <pbonzini@redhat.com>, gleb@kernel.org, Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>, the arch/x86 maintainers <x86@kernel.org>, Linux Kernel Mailing List <linux-kernel@vger.kernel.org>, KVM <kvm@vger.kernel.org>, joro@8bytes.org, mst@redhat.com, agraf@suse.de Subject: Re: [PATCH 3/3] KVM: x86: correct mwait and monitor emulation Message-ID: <20140618184601.GE1695@ERROL.INI.CMU.EDU> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <CAG7+5M1T=OSqMLOJjb-txRCL_J_q=KSNV9yOfeP=98YuyD27Fw@mail.gmail.com> User-Agent: Mutt/1.5.23 (2014-03-12) Sender: kvm-owner@vger.kernel.org Precedence: bulk

Gabriel L. Somlo June 18, 2014, 6:46 p.m. UTC

On Wed, Jun 18, 2014 at 10:59:14AM -0700, Eric Northup wrote:
> On Wed, Jun 18, 2014 at 7:19 AM, Nadav Amit <namit@cs.technion.ac.il> wrote:
> > mwait and monitor are currently handled as nop. Considering this behavior, they
> > should still be handled correctly, i.e., check execution conditions and generate
> > exceptions when required. mwait and monitor may also be executed in real-mode
> > and are not handled in that case.  This patch performs the emulation of
> > monitor-mwait according to Intel SDM (other than checking whether interrupt can
> > be used as a break event).
> >
> > Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>

How about this instead (details in the commit log below) ? Please let
me know what you think, and if you'd prefer me to send it out as a
separate patch rather than a reply to this thread.

Thanks,
--Gabriel

From 0375a0aceb54cdbc26a6c0e5b43c46324f830ec3 Mon Sep 17 00:00:00 2001
From: "Gabriel L. Somlo" <somlo@cmu.edu>
Date: Wed, 18 Jun 2014 14:39:15 -0400
Subject: [PATCH] kvm: x86: revert "emulate monitor and mwait instructions as nop"

This reverts commit 87c00572ba05aa8c9db118da75c608f47eb10b9e.

OS X <= 10.7.* are the only known guests which realistically required
this functionality. As it turns out, OS X can be told to forego using
monitor/mwait by passing it "idlehalt=0" as a kernel argument, so we're
better off removing this hack from KVM altogether.

Signed-off-by: Gabriel L. Somlo <somlo@cmu.edu>
---
 arch/x86/kvm/cpuid.c |  2 --
 arch/x86/kvm/svm.c   |  8 +++-----
 arch/x86/kvm/vmx.c   | 10 ++++------
 3 files changed, 7 insertions(+), 13 deletions(-)

Bandan Das June 18, 2014, 7:09 p.m. UTC | #1

"Gabriel L. Somlo" <gsomlo@gmail.com> writes:

> On Wed, Jun 18, 2014 at 10:59:14AM -0700, Eric Northup wrote:
>> On Wed, Jun 18, 2014 at 7:19 AM, Nadav Amit <namit@cs.technion.ac.il> wrote:
>> > mwait and monitor are currently handled as nop. Considering this behavior, they
>> > should still be handled correctly, i.e., check execution conditions and generate
>> > exceptions when required. mwait and monitor may also be executed in real-mode
>> > and are not handled in that case.  This patch performs the emulation of
>> > monitor-mwait according to Intel SDM (other than checking whether interrupt can
>> > be used as a break event).
>> >
>> > Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
>
> How about this instead (details in the commit log below) ? Please let
> me know what you think, and if you'd prefer me to send it out as a
> separate patch rather than a reply to this thread.

I am not saying we should "punish" guests that don't do the right thing.
But if we can't emulate it the right way, we emulate it  ? 
And the fact that a workaround exists for the guest is all the more better :)

> Thanks,
> --Gabriel
>
>
> From 0375a0aceb54cdbc26a6c0e5b43c46324f830ec3 Mon Sep 17 00:00:00 2001
> From: "Gabriel L. Somlo" <somlo@cmu.edu>
> Date: Wed, 18 Jun 2014 14:39:15 -0400
> Subject: [PATCH] kvm: x86: revert "emulate monitor and mwait instructions as nop"
>
> This reverts commit 87c00572ba05aa8c9db118da75c608f47eb10b9e.
>
> OS X <= 10.7.* are the only known guests which realistically required
> this functionality. As it turns out, OS X can be told to forego using
> monitor/mwait by passing it "idlehalt=0" as a kernel argument, so we're
> better off removing this hack from KVM altogether.
>
> Signed-off-by: Gabriel L. Somlo <somlo@cmu.edu>
> ---
>  arch/x86/kvm/cpuid.c |  2 --
>  arch/x86/kvm/svm.c   |  8 +++-----
>  arch/x86/kvm/vmx.c   | 10 ++++------
>  3 files changed, 7 insertions(+), 13 deletions(-)
>
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 38a0afe..17b42fa 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -283,8 +283,6 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
>  		0 /* Reserved */ | f_lm | F(3DNOWEXT) | F(3DNOW);
>  	/* cpuid 1.ecx */
>  	const u32 kvm_supported_word4_x86_features =
> -		/* NOTE: MONITOR (and MWAIT) are emulated as NOP,
> -		 * but *not* advertised to guests via CPUID ! */
>  		F(XMM3) | F(PCLMULQDQ) | 0 /* DTES64, MONITOR */ |
>  		0 /* DS-CPL, VMX, SMX, EST */ |
>  		0 /* TM2 */ | F(SSSE3) | 0 /* CNXT-ID */ | 0 /* Reserved */ |
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index ec8366c..0e8ef20 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -3274,7 +3274,7 @@ static int pause_interception(struct vcpu_svm *svm)
>  	return 1;
>  }
>  
> -static int nop_interception(struct vcpu_svm *svm)
> +static int invalid_op_interception(struct vcpu_svm *svm)
>  {
>  	skip_emulated_instruction(&(svm->vcpu));
>  	return 1;
> @@ -3282,14 +3282,12 @@ static int nop_interception(struct vcpu_svm *svm)
>  
>  static int monitor_interception(struct vcpu_svm *svm)
>  {
> -	printk_once(KERN_WARNING "kvm: MONITOR instruction emulated as NOP!\n");
> -	return nop_interception(svm);
> +	return invalid_op_interception(svm);
>  }
>  
>  static int mwait_interception(struct vcpu_svm *svm)
>  {
> -	printk_once(KERN_WARNING "kvm: MWAIT instruction emulated as NOP!\n");
> -	return nop_interception(svm);
> +	return invalid_op_interception(svm);
>  }
>  
>  static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = {
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 801332e..577c7df 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -5672,22 +5672,20 @@ static int handle_pause(struct kvm_vcpu *vcpu)
>  	return 1;
>  }
>  
> -static int handle_nop(struct kvm_vcpu *vcpu)
> +static int handle_invalid_op(struct kvm_vcpu *vcpu)
>  {
> -	skip_emulated_instruction(vcpu);
> +	kvm_queue_exception(vcpu, UD_VECTOR);
>  	return 1;
>  }
>  
>  static int handle_mwait(struct kvm_vcpu *vcpu)
>  {
> -	printk_once(KERN_WARNING "kvm: MWAIT instruction emulated as NOP!\n");
> -	return handle_nop(vcpu);
> +	return handle_invalid_op(vcpu);
>  }
>  
>  static int handle_monitor(struct kvm_vcpu *vcpu)
>  {
> -	printk_once(KERN_WARNING "kvm: MONITOR instruction emulated as NOP!\n");
> -	return handle_nop(vcpu);
> +	return handle_invalid_op(vcpu);
>  }
>  
>  /*
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Michael S. Tsirkin June 19, 2014, 10:18 a.m. UTC | #2

On Wed, Jun 18, 2014 at 02:46:01PM -0400, Gabriel L. Somlo wrote:
> On Wed, Jun 18, 2014 at 10:59:14AM -0700, Eric Northup wrote:
> > On Wed, Jun 18, 2014 at 7:19 AM, Nadav Amit <namit@cs.technion.ac.il> wrote:
> > > mwait and monitor are currently handled as nop. Considering this behavior, they
> > > should still be handled correctly, i.e., check execution conditions and generate
> > > exceptions when required. mwait and monitor may also be executed in real-mode
> > > and are not handled in that case.  This patch performs the emulation of
> > > monitor-mwait according to Intel SDM (other than checking whether interrupt can
> > > be used as a break event).
> > >
> > > Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
> 
> How about this instead (details in the commit log below) ? Please let
> me know what you think, and if you'd prefer me to send it out as a
> separate patch rather than a reply to this thread.
> 
> Thanks,
> --Gabriel

If there's an easy workaround, I'm inclined to agree.
We can always go back to Gabriel's patch (and then we'll need
Nadav's one too) but if we release a kernel with this
support it becomes an ABI and we can't go back.

So let's be careful here, and revert the hack for 3.16.


Acked-by: Michael S. Tsirkin <mst@redhat.com>



> 
> >From 0375a0aceb54cdbc26a6c0e5b43c46324f830ec3 Mon Sep 17 00:00:00 2001
> From: "Gabriel L. Somlo" <somlo@cmu.edu>
> Date: Wed, 18 Jun 2014 14:39:15 -0400
> Subject: [PATCH] kvm: x86: revert "emulate monitor and mwait instructions as nop"
> 
> This reverts commit 87c00572ba05aa8c9db118da75c608f47eb10b9e.
> 
> OS X <= 10.7.* are the only known guests which realistically required
> this functionality. As it turns out, OS X can be told to forego using
> monitor/mwait by passing it "idlehalt=0" as a kernel argument, so we're
> better off removing this hack from KVM altogether.
> 
> Signed-off-by: Gabriel L. Somlo <somlo@cmu.edu>
> ---
>  arch/x86/kvm/cpuid.c |  2 --
>  arch/x86/kvm/svm.c   |  8 +++-----
>  arch/x86/kvm/vmx.c   | 10 ++++------
>  3 files changed, 7 insertions(+), 13 deletions(-)
> 
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 38a0afe..17b42fa 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -283,8 +283,6 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
>  		0 /* Reserved */ | f_lm | F(3DNOWEXT) | F(3DNOW);
>  	/* cpuid 1.ecx */
>  	const u32 kvm_supported_word4_x86_features =
> -		/* NOTE: MONITOR (and MWAIT) are emulated as NOP,
> -		 * but *not* advertised to guests via CPUID ! */
>  		F(XMM3) | F(PCLMULQDQ) | 0 /* DTES64, MONITOR */ |
>  		0 /* DS-CPL, VMX, SMX, EST */ |
>  		0 /* TM2 */ | F(SSSE3) | 0 /* CNXT-ID */ | 0 /* Reserved */ |
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index ec8366c..0e8ef20 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -3274,7 +3274,7 @@ static int pause_interception(struct vcpu_svm *svm)
>  	return 1;
>  }
>  
> -static int nop_interception(struct vcpu_svm *svm)
> +static int invalid_op_interception(struct vcpu_svm *svm)
>  {
>  	skip_emulated_instruction(&(svm->vcpu));
>  	return 1;
> @@ -3282,14 +3282,12 @@ static int nop_interception(struct vcpu_svm *svm)
>  
>  static int monitor_interception(struct vcpu_svm *svm)
>  {
> -	printk_once(KERN_WARNING "kvm: MONITOR instruction emulated as NOP!\n");
> -	return nop_interception(svm);
> +	return invalid_op_interception(svm);
>  }
>  
>  static int mwait_interception(struct vcpu_svm *svm)
>  {
> -	printk_once(KERN_WARNING "kvm: MWAIT instruction emulated as NOP!\n");
> -	return nop_interception(svm);
> +	return invalid_op_interception(svm);
>  }
>  
>  static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = {
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 801332e..577c7df 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -5672,22 +5672,20 @@ static int handle_pause(struct kvm_vcpu *vcpu)
>  	return 1;
>  }
>  
> -static int handle_nop(struct kvm_vcpu *vcpu)
> +static int handle_invalid_op(struct kvm_vcpu *vcpu)
>  {
> -	skip_emulated_instruction(vcpu);
> +	kvm_queue_exception(vcpu, UD_VECTOR);
>  	return 1;
>  }
>  
>  static int handle_mwait(struct kvm_vcpu *vcpu)
>  {
> -	printk_once(KERN_WARNING "kvm: MWAIT instruction emulated as NOP!\n");
> -	return handle_nop(vcpu);
> +	return handle_invalid_op(vcpu);
>  }
>  
>  static int handle_monitor(struct kvm_vcpu *vcpu)
>  {
> -	printk_once(KERN_WARNING "kvm: MONITOR instruction emulated as NOP!\n");
> -	return handle_nop(vcpu);
> +	return handle_invalid_op(vcpu);
>  }
>  
>  /*
> -- 
> 1.9.3
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Gleb Natapov June 19, 2014, 11:23 a.m. UTC | #3

On Thu, Jun 19, 2014 at 01:53:36PM +0300, Nadav Amit wrote:
> 
> On Jun 19, 2014, at 1:18 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> 
> > On Wed, Jun 18, 2014 at 02:46:01PM -0400, Gabriel L. Somlo wrote:
> >> On Wed, Jun 18, 2014 at 10:59:14AM -0700, Eric Northup wrote:
> >>> On Wed, Jun 18, 2014 at 7:19 AM, Nadav Amit <namit@cs.technion.ac.il> wrote:
> >>>> mwait and monitor are currently handled as nop. Considering this behavior, they
> >>>> should still be handled correctly, i.e., check execution conditions and generate
> >>>> exceptions when required. mwait and monitor may also be executed in real-mode
> >>>> and are not handled in that case.  This patch performs the emulation of
> >>>> monitor-mwait according to Intel SDM (other than checking whether interrupt can
> >>>> be used as a break event).
> >>>> 
> >>>> Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
> >> 
> >> How about this instead (details in the commit log below) ? Please let
> >> me know what you think, and if you'd prefer me to send it out as a
> >> separate patch rather than a reply to this thread.
> >> 
> >> Thanks,
> >> --Gabriel
> > 
> > If there's an easy workaround, I'm inclined to agree.
> > We can always go back to Gabriel's patch (and then we'll need
> > Nadav's one too) but if we release a kernel with this
> > support it becomes an ABI and we can't go back.
> > 
> > So let's be careful here, and revert the hack for 3.16.
> > 
> > 
> > Acked-by: Michael S. Tsirkin <mst@redhat.com>
> > 
> Personally, I got a custom guest which requires mwait for executing correctly.
Can you elaborate on this guest a little bit. With nop implementation
for mwait the guest will hog a host cpu. Do you consider this to be
"executing correctly?"

--
			Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Nadav Amit June 19, 2014, 11:52 a.m. UTC | #4

On 6/19/14, 2:23 PM, Gleb Natapov wrote:
> On Thu, Jun 19, 2014 at 01:53:36PM +0300, Nadav Amit wrote:
>>
>> On Jun 19, 2014, at 1:18 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>
>>> On Wed, Jun 18, 2014 at 02:46:01PM -0400, Gabriel L. Somlo wrote:
>>>> On Wed, Jun 18, 2014 at 10:59:14AM -0700, Eric Northup wrote:
>>>>> On Wed, Jun 18, 2014 at 7:19 AM, Nadav Amit <namit@cs.technion.ac.il> wrote:
>>>>>> mwait and monitor are currently handled as nop. Considering this behavior, they
>>>>>> should still be handled correctly, i.e., check execution conditions and generate
>>>>>> exceptions when required. mwait and monitor may also be executed in real-mode
>>>>>> and are not handled in that case.  This patch performs the emulation of
>>>>>> monitor-mwait according to Intel SDM (other than checking whether interrupt can
>>>>>> be used as a break event).
>>>>>>
>>>>>> Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
>>>>
>>>> How about this instead (details in the commit log below) ? Please let
>>>> me know what you think, and if you'd prefer me to send it out as a
>>>> separate patch rather than a reply to this thread.
>>>>
>>>> Thanks,
>>>> --Gabriel
>>>
>>> If there's an easy workaround, I'm inclined to agree.
>>> We can always go back to Gabriel's patch (and then we'll need
>>> Nadav's one too) but if we release a kernel with this
>>> support it becomes an ABI and we can't go back.
>>>
>>> So let's be careful here, and revert the hack for 3.16.
>>>
>>>
>>> Acked-by: Michael S. Tsirkin <mst@redhat.com>
>>>
>> Personally, I got a custom guest which requires mwait for executing correctly.
> Can you elaborate on this guest a little bit. With nop implementation
> for mwait the guest will hog a host cpu. Do you consider this to be
> "executing correctly?"
>
> --

mwait is not as "clean" as it may appear. It encounters false wake-ups 
due to a variety of reasons, and any code need to recheck the wake-up 
condition afterwards. Actually, some CPUs had bugs that caused excessive 
wake-ups that degraded performance considerably (Nehalem, if I am not 
mistaken).
Therefore, handling mwait as nop is logically correct (although it may 
degrade performance).

For the reference, if you look at the SDM 8.10.4, you'll see:
"Multiple events other than a write to the triggering address range can 
cause a processor that executed MWAIT to wake up. These include events 
that would lead to voluntary or involuntary context switches, such as..."

Note the words "include" in the sentence "These include events". 
Software has no way of controlling whether it gets false wake-ups and 
cannot rely on the wake-up as indication to anything.

Nadav

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Michael S. Tsirkin June 19, 2014, 12:01 p.m. UTC | #5

On Thu, Jun 19, 2014 at 02:52:20PM +0300, Nadav Amit wrote:
> On 6/19/14, 2:23 PM, Gleb Natapov wrote:
> >On Thu, Jun 19, 2014 at 01:53:36PM +0300, Nadav Amit wrote:
> >>
> >>On Jun 19, 2014, at 1:18 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >>
> >>>On Wed, Jun 18, 2014 at 02:46:01PM -0400, Gabriel L. Somlo wrote:
> >>>>On Wed, Jun 18, 2014 at 10:59:14AM -0700, Eric Northup wrote:
> >>>>>On Wed, Jun 18, 2014 at 7:19 AM, Nadav Amit <namit@cs.technion.ac.il> wrote:
> >>>>>>mwait and monitor are currently handled as nop. Considering this behavior, they
> >>>>>>should still be handled correctly, i.e., check execution conditions and generate
> >>>>>>exceptions when required. mwait and monitor may also be executed in real-mode
> >>>>>>and are not handled in that case.  This patch performs the emulation of
> >>>>>>monitor-mwait according to Intel SDM (other than checking whether interrupt can
> >>>>>>be used as a break event).
> >>>>>>
> >>>>>>Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
> >>>>
> >>>>How about this instead (details in the commit log below) ? Please let
> >>>>me know what you think, and if you'd prefer me to send it out as a
> >>>>separate patch rather than a reply to this thread.
> >>>>
> >>>>Thanks,
> >>>>--Gabriel
> >>>
> >>>If there's an easy workaround, I'm inclined to agree.
> >>>We can always go back to Gabriel's patch (and then we'll need
> >>>Nadav's one too) but if we release a kernel with this
> >>>support it becomes an ABI and we can't go back.
> >>>
> >>>So let's be careful here, and revert the hack for 3.16.
> >>>
> >>>
> >>>Acked-by: Michael S. Tsirkin <mst@redhat.com>
> >>>
> >>Personally, I got a custom guest which requires mwait for executing correctly.
> >Can you elaborate on this guest a little bit. With nop implementation
> >for mwait the guest will hog a host cpu. Do you consider this to be
> >"executing correctly?"
> >
> >--
> 
> mwait is not as "clean" as it may appear. It encounters false wake-ups due
> to a variety of reasons, and any code need to recheck the wake-up condition
> afterwards. Actually, some CPUs had bugs that caused excessive wake-ups that
> degraded performance considerably (Nehalem, if I am not mistaken).
> Therefore, handling mwait as nop is logically correct (although it may
> degrade performance).
> 
> For the reference, if you look at the SDM 8.10.4, you'll see:
> "Multiple events other than a write to the triggering address range can
> cause a processor that executed MWAIT to wake up. These include events that
> would lead to voluntary or involuntary context switches, such as..."
> 
> Note the words "include" in the sentence "These include events". Software
> has no way of controlling whether it gets false wake-ups and cannot rely on
> the wake-up as indication to anything.
> 
> Nadav

It's a quality of implementation question.
It is correct in the same sense that
a NIC dropping each second packet is correct.

If we ship this hack we have to maintain it forever,
so there needs to be a compelling reason beyond
just "because we can".

Gleb Natapov June 19, 2014, 12:07 p.m. UTC | #6

On Thu, Jun 19, 2014 at 02:52:20PM +0300, Nadav Amit wrote:
> On 6/19/14, 2:23 PM, Gleb Natapov wrote:
> >On Thu, Jun 19, 2014 at 01:53:36PM +0300, Nadav Amit wrote:
> >>
> >>On Jun 19, 2014, at 1:18 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >>
> >>>On Wed, Jun 18, 2014 at 02:46:01PM -0400, Gabriel L. Somlo wrote:
> >>>>On Wed, Jun 18, 2014 at 10:59:14AM -0700, Eric Northup wrote:
> >>>>>On Wed, Jun 18, 2014 at 7:19 AM, Nadav Amit <namit@cs.technion.ac.il> wrote:
> >>>>>>mwait and monitor are currently handled as nop. Considering this behavior, they
> >>>>>>should still be handled correctly, i.e., check execution conditions and generate
> >>>>>>exceptions when required. mwait and monitor may also be executed in real-mode
> >>>>>>and are not handled in that case.  This patch performs the emulation of
> >>>>>>monitor-mwait according to Intel SDM (other than checking whether interrupt can
> >>>>>>be used as a break event).
> >>>>>>
> >>>>>>Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
> >>>>
> >>>>How about this instead (details in the commit log below) ? Please let
> >>>>me know what you think, and if you'd prefer me to send it out as a
> >>>>separate patch rather than a reply to this thread.
> >>>>
> >>>>Thanks,
> >>>>--Gabriel
> >>>
> >>>If there's an easy workaround, I'm inclined to agree.
> >>>We can always go back to Gabriel's patch (and then we'll need
> >>>Nadav's one too) but if we release a kernel with this
> >>>support it becomes an ABI and we can't go back.
> >>>
> >>>So let's be careful here, and revert the hack for 3.16.
> >>>
> >>>
> >>>Acked-by: Michael S. Tsirkin <mst@redhat.com>
> >>>
> >>Personally, I got a custom guest which requires mwait for executing correctly.
> >Can you elaborate on this guest a little bit. With nop implementation
> >for mwait the guest will hog a host cpu. Do you consider this to be
> >"executing correctly?"
> >
> >--
> 
> mwait is not as "clean" as it may appear. It encounters false wake-ups due
> to a variety of reasons, and any code need to recheck the wake-up condition
> afterwards. Actually, some CPUs had bugs that caused excessive wake-ups that
> degraded performance considerably (Nehalem, if I am not mistaken).
> Therefore, handling mwait as nop is logically correct (although it may
> degrade performance).
> 
> For the reference, if you look at the SDM 8.10.4, you'll see:
> "Multiple events other than a write to the triggering address range can
> cause a processor that executed MWAIT to wake up. These include events that
> would lead to voluntary or involuntary context switches, such as..."
> 
> Note the words "include" in the sentence "These include events". Software
> has no way of controlling whether it gets false wake-ups and cannot rely on
> the wake-up as indication to anything.
> 
That's all well and good and I didn't say that nop is not a valid
mwait implementation, it is, though there is a big difference between
"encounters false wake-ups" and never sleeps.  What I asked is do you
consider your guest hogging host cpu to be "executing correctly?". What
this guest is doing that such behaviour is tolerated and shouldn't it
be better to just poll for a condition you are waiting for instead of
executing expensive vmexits. This will also hog 100% host cpu, but will
be actually faster.

--
			Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Nadav Amit June 19, 2014, 12:10 p.m. UTC | #7

On 6/19/14, 3:07 PM, Gleb Natapov wrote:
> On Thu, Jun 19, 2014 at 02:52:20PM +0300, Nadav Amit wrote:
>> On 6/19/14, 2:23 PM, Gleb Natapov wrote:
>>> On Thu, Jun 19, 2014 at 01:53:36PM +0300, Nadav Amit wrote:
>>>>
>>>> On Jun 19, 2014, at 1:18 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>>>
>>>>> On Wed, Jun 18, 2014 at 02:46:01PM -0400, Gabriel L. Somlo wrote:
>>>>>> On Wed, Jun 18, 2014 at 10:59:14AM -0700, Eric Northup wrote:
>>>>>>> On Wed, Jun 18, 2014 at 7:19 AM, Nadav Amit <namit@cs.technion.ac.il> wrote:
>>>>>>>> mwait and monitor are currently handled as nop. Considering this behavior, they
>>>>>>>> should still be handled correctly, i.e., check execution conditions and generate
>>>>>>>> exceptions when required. mwait and monitor may also be executed in real-mode
>>>>>>>> and are not handled in that case.  This patch performs the emulation of
>>>>>>>> monitor-mwait according to Intel SDM (other than checking whether interrupt can
>>>>>>>> be used as a break event).
>>>>>>>>
>>>>>>>> Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
>>>>>>
>>>>>> How about this instead (details in the commit log below) ? Please let
>>>>>> me know what you think, and if you'd prefer me to send it out as a
>>>>>> separate patch rather than a reply to this thread.
>>>>>>
>>>>>> Thanks,
>>>>>> --Gabriel
>>>>>
>>>>> If there's an easy workaround, I'm inclined to agree.
>>>>> We can always go back to Gabriel's patch (and then we'll need
>>>>> Nadav's one too) but if we release a kernel with this
>>>>> support it becomes an ABI and we can't go back.
>>>>>
>>>>> So let's be careful here, and revert the hack for 3.16.
>>>>>
>>>>>
>>>>> Acked-by: Michael S. Tsirkin <mst@redhat.com>
>>>>>
>>>> Personally, I got a custom guest which requires mwait for executing correctly.
>>> Can you elaborate on this guest a little bit. With nop implementation
>>> for mwait the guest will hog a host cpu. Do you consider this to be
>>> "executing correctly?"
>>>
>>> --
>>
>> mwait is not as "clean" as it may appear. It encounters false wake-ups due
>> to a variety of reasons, and any code need to recheck the wake-up condition
>> afterwards. Actually, some CPUs had bugs that caused excessive wake-ups that
>> degraded performance considerably (Nehalem, if I am not mistaken).
>> Therefore, handling mwait as nop is logically correct (although it may
>> degrade performance).
>>
>> For the reference, if you look at the SDM 8.10.4, you'll see:
>> "Multiple events other than a write to the triggering address range can
>> cause a processor that executed MWAIT to wake up. These include events that
>> would lead to voluntary or involuntary context switches, such as..."
>>
>> Note the words "include" in the sentence "These include events". Software
>> has no way of controlling whether it gets false wake-ups and cannot rely on
>> the wake-up as indication to anything.
>>
> That's all well and good and I didn't say that nop is not a valid
> mwait implementation, it is, though there is a big difference between
> "encounters false wake-ups" and never sleeps.  What I asked is do you
> consider your guest hogging host cpu to be "executing correctly?". What
> this guest is doing that such behaviour is tolerated and shouldn't it
> be better to just poll for a condition you are waiting for instead of
> executing expensive vmexits. This will also hog 100% host cpu, but will
> be actually faster.
>
You are correct, but unfortunately I have no control over the guest 
workload. In this specific workload I do not care about performance but 
only about correctness.

Nadav

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Gleb Natapov June 19, 2014, 12:16 p.m. UTC | #8

On Thu, Jun 19, 2014 at 03:10:21PM +0300, Nadav Amit wrote:
> On 6/19/14, 3:07 PM, Gleb Natapov wrote:
> >On Thu, Jun 19, 2014 at 02:52:20PM +0300, Nadav Amit wrote:
> >>On 6/19/14, 2:23 PM, Gleb Natapov wrote:
> >>>On Thu, Jun 19, 2014 at 01:53:36PM +0300, Nadav Amit wrote:
> >>>>
> >>>>On Jun 19, 2014, at 1:18 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >>>>
> >>>>>On Wed, Jun 18, 2014 at 02:46:01PM -0400, Gabriel L. Somlo wrote:
> >>>>>>On Wed, Jun 18, 2014 at 10:59:14AM -0700, Eric Northup wrote:
> >>>>>>>On Wed, Jun 18, 2014 at 7:19 AM, Nadav Amit <namit@cs.technion.ac.il> wrote:
> >>>>>>>>mwait and monitor are currently handled as nop. Considering this behavior, they
> >>>>>>>>should still be handled correctly, i.e., check execution conditions and generate
> >>>>>>>>exceptions when required. mwait and monitor may also be executed in real-mode
> >>>>>>>>and are not handled in that case.  This patch performs the emulation of
> >>>>>>>>monitor-mwait according to Intel SDM (other than checking whether interrupt can
> >>>>>>>>be used as a break event).
> >>>>>>>>
> >>>>>>>>Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
> >>>>>>
> >>>>>>How about this instead (details in the commit log below) ? Please let
> >>>>>>me know what you think, and if you'd prefer me to send it out as a
> >>>>>>separate patch rather than a reply to this thread.
> >>>>>>
> >>>>>>Thanks,
> >>>>>>--Gabriel
> >>>>>
> >>>>>If there's an easy workaround, I'm inclined to agree.
> >>>>>We can always go back to Gabriel's patch (and then we'll need
> >>>>>Nadav's one too) but if we release a kernel with this
> >>>>>support it becomes an ABI and we can't go back.
> >>>>>
> >>>>>So let's be careful here, and revert the hack for 3.16.
> >>>>>
> >>>>>
> >>>>>Acked-by: Michael S. Tsirkin <mst@redhat.com>
> >>>>>
> >>>>Personally, I got a custom guest which requires mwait for executing correctly.
> >>>Can you elaborate on this guest a little bit. With nop implementation
> >>>for mwait the guest will hog a host cpu. Do you consider this to be
> >>>"executing correctly?"
> >>>
> >>>--
> >>
> >>mwait is not as "clean" as it may appear. It encounters false wake-ups due
> >>to a variety of reasons, and any code need to recheck the wake-up condition
> >>afterwards. Actually, some CPUs had bugs that caused excessive wake-ups that
> >>degraded performance considerably (Nehalem, if I am not mistaken).
> >>Therefore, handling mwait as nop is logically correct (although it may
> >>degrade performance).
> >>
> >>For the reference, if you look at the SDM 8.10.4, you'll see:
> >>"Multiple events other than a write to the triggering address range can
> >>cause a processor that executed MWAIT to wake up. These include events that
> >>would lead to voluntary or involuntary context switches, such as..."
> >>
> >>Note the words "include" in the sentence "These include events". Software
> >>has no way of controlling whether it gets false wake-ups and cannot rely on
> >>the wake-up as indication to anything.
> >>
> >That's all well and good and I didn't say that nop is not a valid
> >mwait implementation, it is, though there is a big difference between
> >"encounters false wake-ups" and never sleeps.  What I asked is do you
> >consider your guest hogging host cpu to be "executing correctly?". What
> >this guest is doing that such behaviour is tolerated and shouldn't it
> >be better to just poll for a condition you are waiting for instead of
> >executing expensive vmexits. This will also hog 100% host cpu, but will
> >be actually faster.
> >
> You are correct, but unfortunately I have no control over the guest
> workload. In this specific workload I do not care about performance but only
> about correctness.
> 
Fair enough. But can you at least hint what is this mysterious guest?

--
			Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Michael S. Tsirkin June 19, 2014, 12:17 p.m. UTC | #9

On Thu, Jun 19, 2014 at 03:10:21PM +0300, Nadav Amit wrote:
> On 6/19/14, 3:07 PM, Gleb Natapov wrote:
> >On Thu, Jun 19, 2014 at 02:52:20PM +0300, Nadav Amit wrote:
> >>On 6/19/14, 2:23 PM, Gleb Natapov wrote:
> >>>On Thu, Jun 19, 2014 at 01:53:36PM +0300, Nadav Amit wrote:
> >>>>
> >>>>On Jun 19, 2014, at 1:18 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >>>>
> >>>>>On Wed, Jun 18, 2014 at 02:46:01PM -0400, Gabriel L. Somlo wrote:
> >>>>>>On Wed, Jun 18, 2014 at 10:59:14AM -0700, Eric Northup wrote:
> >>>>>>>On Wed, Jun 18, 2014 at 7:19 AM, Nadav Amit <namit@cs.technion.ac.il> wrote:
> >>>>>>>>mwait and monitor are currently handled as nop. Considering this behavior, they
> >>>>>>>>should still be handled correctly, i.e., check execution conditions and generate
> >>>>>>>>exceptions when required. mwait and monitor may also be executed in real-mode
> >>>>>>>>and are not handled in that case.  This patch performs the emulation of
> >>>>>>>>monitor-mwait according to Intel SDM (other than checking whether interrupt can
> >>>>>>>>be used as a break event).
> >>>>>>>>
> >>>>>>>>Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
> >>>>>>
> >>>>>>How about this instead (details in the commit log below) ? Please let
> >>>>>>me know what you think, and if you'd prefer me to send it out as a
> >>>>>>separate patch rather than a reply to this thread.
> >>>>>>
> >>>>>>Thanks,
> >>>>>>--Gabriel
> >>>>>
> >>>>>If there's an easy workaround, I'm inclined to agree.
> >>>>>We can always go back to Gabriel's patch (and then we'll need
> >>>>>Nadav's one too) but if we release a kernel with this
> >>>>>support it becomes an ABI and we can't go back.
> >>>>>
> >>>>>So let's be careful here, and revert the hack for 3.16.
> >>>>>
> >>>>>
> >>>>>Acked-by: Michael S. Tsirkin <mst@redhat.com>
> >>>>>
> >>>>Personally, I got a custom guest which requires mwait for executing correctly.
> >>>Can you elaborate on this guest a little bit. With nop implementation
> >>>for mwait the guest will hog a host cpu. Do you consider this to be
> >>>"executing correctly?"
> >>>
> >>>--
> >>
> >>mwait is not as "clean" as it may appear. It encounters false wake-ups due
> >>to a variety of reasons, and any code need to recheck the wake-up condition
> >>afterwards. Actually, some CPUs had bugs that caused excessive wake-ups that
> >>degraded performance considerably (Nehalem, if I am not mistaken).
> >>Therefore, handling mwait as nop is logically correct (although it may
> >>degrade performance).
> >>
> >>For the reference, if you look at the SDM 8.10.4, you'll see:
> >>"Multiple events other than a write to the triggering address range can
> >>cause a processor that executed MWAIT to wake up. These include events that
> >>would lead to voluntary or involuntary context switches, such as..."
> >>
> >>Note the words "include" in the sentence "These include events". Software
> >>has no way of controlling whether it gets false wake-ups and cannot rely on
> >>the wake-up as indication to anything.
> >>
> >That's all well and good and I didn't say that nop is not a valid
> >mwait implementation, it is, though there is a big difference between
> >"encounters false wake-ups" and never sleeps.  What I asked is do you
> >consider your guest hogging host cpu to be "executing correctly?". What
> >this guest is doing that such behaviour is tolerated and shouldn't it
> >be better to just poll for a condition you are waiting for instead of
> >executing expensive vmexits. This will also hog 100% host cpu, but will
> >be actually faster.
> >
> You are correct, but unfortunately I have no control over the guest
> workload. In this specific workload I do not care about performance but only
> about correctness.
> 
> Nadav

No one prevents you from patching your kernel to run this workload.  But
is this of use to anyone else? If yes why?

Nadav Amit June 19, 2014, 12:28 p.m. UTC | #10

On 6/19/14, 3:17 PM, Michael S. Tsirkin wrote:
> On Thu, Jun 19, 2014 at 03:10:21PM +0300, Nadav Amit wrote:
>> On 6/19/14, 3:07 PM, Gleb Natapov wrote:
>>> On Thu, Jun 19, 2014 at 02:52:20PM +0300, Nadav Amit wrote:
>>>> On 6/19/14, 2:23 PM, Gleb Natapov wrote:
>>>>> On Thu, Jun 19, 2014 at 01:53:36PM +0300, Nadav Amit wrote:
>>>>>>
>>>>>> On Jun 19, 2014, at 1:18 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>>>>>
>>>>>>> On Wed, Jun 18, 2014 at 02:46:01PM -0400, Gabriel L. Somlo wrote:
>>>>>>>> On Wed, Jun 18, 2014 at 10:59:14AM -0700, Eric Northup wrote:
>>>>>>>>> On Wed, Jun 18, 2014 at 7:19 AM, Nadav Amit <namit@cs.technion.ac.il> wrote:
>>>>>>>>>> mwait and monitor are currently handled as nop. Considering this behavior, they
>>>>>>>>>> should still be handled correctly, i.e., check execution conditions and generate
>>>>>>>>>> exceptions when required. mwait and monitor may also be executed in real-mode
>>>>>>>>>> and are not handled in that case.  This patch performs the emulation of
>>>>>>>>>> monitor-mwait according to Intel SDM (other than checking whether interrupt can
>>>>>>>>>> be used as a break event).
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
>>>>>>>>
>>>>>>>> How about this instead (details in the commit log below) ? Please let
>>>>>>>> me know what you think, and if you'd prefer me to send it out as a
>>>>>>>> separate patch rather than a reply to this thread.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> --Gabriel
>>>>>>>
>>>>>>> If there's an easy workaround, I'm inclined to agree.
>>>>>>> We can always go back to Gabriel's patch (and then we'll need
>>>>>>> Nadav's one too) but if we release a kernel with this
>>>>>>> support it becomes an ABI and we can't go back.
>>>>>>>
>>>>>>> So let's be careful here, and revert the hack for 3.16.
>>>>>>>
>>>>>>>
>>>>>>> Acked-by: Michael S. Tsirkin <mst@redhat.com>
>>>>>>>
>>>>>> Personally, I got a custom guest which requires mwait for executing correctly.
>>>>> Can you elaborate on this guest a little bit. With nop implementation
>>>>> for mwait the guest will hog a host cpu. Do you consider this to be
>>>>> "executing correctly?"
>>>>>
>>>>> --
>>>>
>>>> mwait is not as "clean" as it may appear. It encounters false wake-ups due
>>>> to a variety of reasons, and any code need to recheck the wake-up condition
>>>> afterwards. Actually, some CPUs had bugs that caused excessive wake-ups that
>>>> degraded performance considerably (Nehalem, if I am not mistaken).
>>>> Therefore, handling mwait as nop is logically correct (although it may
>>>> degrade performance).
>>>>
>>>> For the reference, if you look at the SDM 8.10.4, you'll see:
>>>> "Multiple events other than a write to the triggering address range can
>>>> cause a processor that executed MWAIT to wake up. These include events that
>>>> would lead to voluntary or involuntary context switches, such as..."
>>>>
>>>> Note the words "include" in the sentence "These include events". Software
>>>> has no way of controlling whether it gets false wake-ups and cannot rely on
>>>> the wake-up as indication to anything.
>>>>
>>> That's all well and good and I didn't say that nop is not a valid
>>> mwait implementation, it is, though there is a big difference between
>>> "encounters false wake-ups" and never sleeps.  What I asked is do you
>>> consider your guest hogging host cpu to be "executing correctly?". What
>>> this guest is doing that such behaviour is tolerated and shouldn't it
>>> be better to just poll for a condition you are waiting for instead of
>>> executing expensive vmexits. This will also hog 100% host cpu, but will
>>> be actually faster.
>>>
>> You are correct, but unfortunately I have no control over the guest
>> workload. In this specific workload I do not care about performance but only
>> about correctness.
>>
>> Nadav
>
> No one prevents you from patching your kernel to run this workload.  But
> is this of use to anyone else? If yes why?
>
I do not say it should be the default behavior, and I can try to push to 
qemu some setting to turn it on by demand.

Anyhow, I believe there are cases you may want mwait support - either an 
OS X guest which was not modified to run without mwait, or for debugging 
the monitor-mwait flow of a guest OS.

I am not going to argue too much. Since I was under the impression there 
are needs for mwait, other than mine, I thought it would make all of our 
lives easier to have a better implementation.

Nadav
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[3/3] KVM: x86: correct mwait and monitor emulation

Commit Message

Comments

Patch