
[2/4] KVM: VMX/pmu: Save host debugctlmsr just before vm entry

Message ID 20230616113353.45202-3-xiong.y.zhang@intel.com (mailing list archive)
State New, archived
Series Part of fix for host and guest LBR event coexist

Commit Message

Zhang, Xiong Y June 16, 2023, 11:33 a.m. UTC
Perf defines four types of perf event: per-cpu pinned event, per-process
pinned event, per-cpu event and per-process event, with priority from
high to low. The vLBR event is a per-process pinned event. So during the
vm exit handler, if the vLBR event preempts a lower priority perf LBR
event, perf disables LBR and lets the guest control it; if the vLBR
event is preempted by a higher priority perf LBR event, perf enables
LBR. In short, LBR status may change during the vm exit handler.

MSR_IA32_DEBUGCTLMSR[0] controls LBR enabling. KVM saves its value into
vmx->host_debugctlmsr in vcpu_load() and restores it from
vmx->host_debugctlmsr immediately after vm exit. Since
MSR_IA32_DEBUGCTLMSR[0] can change during the vm exit handler, the saved
vmx->host_debugctlmsr value could be stale. So this commit saves
MSR_IA32_DEBUGCTLMSR into vmx->host_debugctlmsr just before vm entry, so
that it reflects the real hardware value.
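
For reference, the restore side that consumes vmx->host_debugctlmsr
looks roughly like this (a sketch of the existing vmx_vcpu_run() flow,
not changed by this patch):

	/* MSR_IA32_DEBUGCTLMSR is zeroed on vmexit. Restore it if needed. */
	if (vmx->host_debugctlmsr)
		update_debugctlmsr(vmx->host_debugctlmsr);

update_debugctlmsr() is essentially a wrmsrl(MSR_IA32_DEBUGCTLMSR, ...),
so whatever value was captured in vmx->host_debugctlmsr is what the host
runs with after the exit.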

Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
---
 arch/x86/kvm/vmx/vmx.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

Comments

Sean Christopherson June 23, 2023, 8:15 p.m. UTC | #1
On Fri, Jun 16, 2023, Xiong Zhang wrote:
> Perf defines four types of perf event: per cpu pinned event, per process
> pinned event, per cpu event, per process event, their priority are from
> high to low. vLBR event is per process pinned event. So during vm exit
> handler, if vLBR event preempts perf low priority LBR event, perf will
> disable LBR and let guest control LBR, or if vLBR event is preempted by
> perf high priority LBR event, perf will enable LBR. In a word LBR status
> may be changed during vm exit handler.
> 
> MSR_IA32_DEBUGCTLMSR[0] controls LBR enabling, kvm saves its value into
> vmx->host_debugctlmsr in vcpu_load(), and kvm restores its value from
> vmx->host_debugctlmsr after vm exit immediately. Since
> MSR_IA32_DEBUGCTLMSR[0] could be changed during vm exit handler, the
> saved value vmx->host_debugctlmsr could be wrong. So this commit saves
> MSR_IA32_DEBUGCTLMSR into vmx->host_debugctlmsr just before vm entry to
> reflect the real hardware value.
> 
> Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
> ---
>  arch/x86/kvm/vmx/vmx.c | 5 +----
>  1 file changed, 1 insertion(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 44fb619803b8..5ca61a26d0d7 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -1459,13 +1459,9 @@ void vmx_vcpu_load_vmcs(struct kvm_vcpu *vcpu, int cpu,
>   */
>  static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  {
> -	struct vcpu_vmx *vmx = to_vmx(vcpu);
> -
>  	vmx_vcpu_load_vmcs(vcpu, cpu, NULL);
>  
>  	vmx_vcpu_pi_load(vcpu, cpu);
> -
> -	vmx->host_debugctlmsr = get_debugctlmsr();
>  }
>  
>  static void vmx_vcpu_put(struct kvm_vcpu *vcpu)
> @@ -7273,6 +7269,7 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
>  	atomic_switch_perf_msrs(vmx);
>  	if (intel_pmu_lbr_is_enabled(vcpu))
>  		vmx_passthrough_lbr_msrs(vcpu);
> +	vmx->host_debugctlmsr = get_debugctlmsr();

Reading DEBUG_CTL on every VM-Entry is either unnecessary or insufficient.  If
the DEBUG_CTL value is being changed synchronously, then just fix whatever KVM
path leads to a change in the host value.  If DEBUG_CTL is being changed
asynchronously, then I'm guessing the change is coming from NMI context, which
means that KVM is buggy no matter how close we put this to VM-Enter.
Zhang, Xiong Y June 25, 2023, 4:03 a.m. UTC | #2
> On Fri, Jun 16, 2023, Xiong Zhang wrote:
> > Perf defines four types of perf event: per cpu pinned event, per
> > process pinned event, per cpu event, per process event, their
> > priority are from high to low. vLBR event is per process pinned
> > event. So during vm exit handler, if vLBR event preempts perf low
> > priority LBR event, perf will disable LBR and let guest control LBR,
> > or if vLBR event is preempted by perf high priority LBR event, perf
> > will enable LBR. In a word LBR status may be changed during vm exit handler.
> >
> > MSR_IA32_DEBUGCTLMSR[0] controls LBR enabling, kvm saves its value
> > into
> > vmx->host_debugctlmsr in vcpu_load(), and kvm restores its value from
> > vmx->host_debugctlmsr after vm exit immediately. Since
> > MSR_IA32_DEBUGCTLMSR[0] could be changed during vm exit handler, the
> > saved value vmx->host_debugctlmsr could be wrong. So this commit saves
> > MSR_IA32_DEBUGCTLMSR into vmx->host_debugctlmsr just before vm entry
> > to reflect the real hardware value.
> >
> > Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
> > ---
> >  arch/x86/kvm/vmx/vmx.c | 5 +----
> >  1 file changed, 1 insertion(+), 4 deletions(-)
> >
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index 44fb619803b8..5ca61a26d0d7 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -1459,13 +1459,9 @@ void vmx_vcpu_load_vmcs(struct kvm_vcpu *vcpu, int cpu,
> >   */
> >  static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> >  {
> > -	struct vcpu_vmx *vmx = to_vmx(vcpu);
> > -
> >  	vmx_vcpu_load_vmcs(vcpu, cpu, NULL);
> >
> >  	vmx_vcpu_pi_load(vcpu, cpu);
> > -
> > -	vmx->host_debugctlmsr = get_debugctlmsr();
> >  }
> >
> >  static void vmx_vcpu_put(struct kvm_vcpu *vcpu)
> > @@ -7273,6 +7269,7 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
> >  	atomic_switch_perf_msrs(vmx);
> >  	if (intel_pmu_lbr_is_enabled(vcpu))
> >  		vmx_passthrough_lbr_msrs(vcpu);
> > +	vmx->host_debugctlmsr = get_debugctlmsr();
> 
> Reading DEBUG_CTL on every VM-Entry is either unnecessary or insufficient.  If
> the DEBUG_CTL value is being changed synchronously, then just fix whatever
> KVM path leads to a change in the host value.  If DEBUG_CTL is being changed
> asynchronously, then I'm guessing the change is coming from NMI context,
> which means that KVM is buggy no matter how close we put this to VM-Enter.
When a perf event reschedule is needed on a physical cpu, the perf core sends
an IPI to the target cpu, and LBR is enabled or disabled in the IPI handler
according to the attributes of the active event.
If a vLBR event is active, LBR is disabled in the IPI handler.
If a host LBR event is active, LBR is enabled in the IPI handler; this can
happen when a host LBR event preempts the vLBR event during the vm exit
handler.
So the change to DEBUG_CTL[0] is asynchronous, but it happens in the perf IPI
handler. Host irqs are disabled near VM-Enter, so the IPI cannot arrive there,
and host DEBUG_CTL[0] cannot change before kvm re-enables host irqs.
A perf counter overflow (PMI) is delivered as an NMI, but that NMI handler
does not change LBR status, so the saved host_debugctlmsr is still correct
after the PMI handler.
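
Roughly, the IPI handler ends up toggling DEBUG_CTL[0] like this (a
simplified illustration modeled on the __intel_pmu_lbr_enable()/
__intel_pmu_lbr_disable() helpers in arch/x86/events/intel/lbr.c for the
legacy, non-arch-LBR case; host_lbr_active below is a made-up condition,
not a real variable):

	u64 debugctl;

	rdmsrl(MSR_IA32_DEBUGCTLMSR, debugctl);
	if (host_lbr_active)	/* a host LBR event won the rescheduling */
		debugctl |= DEBUGCTLMSR_LBR;
	else			/* a vLBR event is active, hand LBR to the guest */
		debugctl &= ~(DEBUGCTLMSR_LBR | DEBUGCTLMSR_FREEZE_LBRS_ON_PMI);
	wrmsrl(MSR_IA32_DEBUGCTLMSR, debugctl);

Because this runs from IRQ context, it cannot slip in between the
DEBUG_CTL read just before VM-Enter and the VM-Enter itself, where irqs
are off.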

thanks
Like Xu June 28, 2023, 5:37 a.m. UTC | #3
On 25/6/2023 12:03 pm, Zhang, Xiong Y wrote:
>> On Fri, Jun 16, 2023, Xiong Zhang wrote:
>>> Perf defines four types of perf event: per cpu pinned event, per
>>> process pinned event, per cpu event, per process event, their
>>> priority are from high to low. vLBR event is per process pinned
>>> event. So during vm exit handler, if vLBR event preempts perf low
>>> priority LBR event, perf will disable LBR and let guest control LBR,
>>> or if vLBR event is preempted by perf high priority LBR event, perf
>>> will enable LBR. In a word LBR status may be changed during vm exit handler.
>>>
>>> MSR_IA32_DEBUGCTLMSR[0] controls LBR enabling, kvm saves its value
>>> into
>>> vmx->host_debugctlmsr in vcpu_load(), and kvm restores its value from
>>> vmx->host_debugctlmsr after vm exit immediately. Since
>>> MSR_IA32_DEBUGCTLMSR[0] could be changed during vm exit handler, the
>>> saved value vmx->host_debugctlmsr could be wrong. So this commit saves
>>> MSR_IA32_DEBUGCTLMSR into vmx->host_debugctlmsr just before vm entry
>>> to reflect the real hardware value.
>>>
>>> Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
>>> ---
>>>   arch/x86/kvm/vmx/vmx.c | 5 +----
>>>   1 file changed, 1 insertion(+), 4 deletions(-)
>>>
>>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>>> index 44fb619803b8..5ca61a26d0d7 100644
>>> --- a/arch/x86/kvm/vmx/vmx.c
>>> +++ b/arch/x86/kvm/vmx/vmx.c
>>> @@ -1459,13 +1459,9 @@ void vmx_vcpu_load_vmcs(struct kvm_vcpu *vcpu, int cpu,
>>>    */
>>>   static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>>>   {
>>> -	struct vcpu_vmx *vmx = to_vmx(vcpu);
>>> -
>>>   	vmx_vcpu_load_vmcs(vcpu, cpu, NULL);
>>>
>>>   	vmx_vcpu_pi_load(vcpu, cpu);
>>> -
>>> -	vmx->host_debugctlmsr = get_debugctlmsr();
>>>   }
>>>
>>>   static void vmx_vcpu_put(struct kvm_vcpu *vcpu)
>>> @@ -7273,6 +7269,7 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
>>>   	atomic_switch_perf_msrs(vmx);
>>>   	if (intel_pmu_lbr_is_enabled(vcpu))
>>>   		vmx_passthrough_lbr_msrs(vcpu);
>>> +	vmx->host_debugctlmsr = get_debugctlmsr();
>>
>> Reading DEBUG_CTL on every VM-Entry is either unnecessary or insufficient.  If
>> the DEBUG_CTL value is being changed synchronously, then just fix whatever
>> KVM path leads to a change in the host value.  If DEBUG_CTL is being changed
>> asynchronously, then I'm guessing the change is coming from NMI context,
>> which means that KVM is buggy no matter how close we put this to VM-Enter.
> When a perf event reschedule is needed on a physical cpu, the perf core sends an IPI to the target cpu, and LBR is enabled or disabled in the IPI handler according to the attributes of the active event.
> If a vLBR event is active, LBR is disabled in the IPI handler.
> If a host LBR event is active, LBR is enabled in the IPI handler; this can happen when a host LBR event preempts the vLBR event during the vm exit handler.
> DEBUG_CTL[0]'s change is asynchronous in the perf IPI handler, but host irqs are disabled near VM-Enter, so the IPI cannot arrive there, and host DEBUG_CTL[0] cannot change before kvm re-enables host irqs.
> A perf counter overflow (PMI) is delivered as an NMI, but that NMI handler doesn't change LBR status, so the kvm-saved host_debugctlmsr is still correct after the PMI handler.
> 
> thanks

This is not true. One example is Freezing LBRs on PMI (bit 11) in the host NMI ctx.

For "Legacy Freeze_LBR_on_PMI" feature on a host, "the LBR is frozen on the
overflowed condition of the buffer area, the processor clears the LBR bit
(bit 0) in IA32_DEBUGCTL."
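
For reference, the relevant bits, as defined in
arch/x86/include/asm/msr-index.h, are:

	#define DEBUGCTLMSR_LBR			(1UL <<  0) /* last branch recording */
	#define DEBUGCTLMSR_FREEZE_LBRS_ON_PMI	(1UL << 11)

So with bit 11 set, the hardware itself clears bit 0 when a PMI is
delivered, i.e. DEBUG_CTL can change in NMI context without any software
write.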

Not to mention that the commit message says nothing about the effect of
this change on the other features controlled by DEBUG_CTL.

I couldn't agree with Sean more here. I think the first step is to make sure
that debugctl's functionality is not broken in either root mode or non-root
mode, followed closely by deciding what policy should be set, and notifying
users, if the host/kvm is not in a position to support one side or the other.

Patch

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 44fb619803b8..5ca61a26d0d7 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1459,13 +1459,9 @@  void vmx_vcpu_load_vmcs(struct kvm_vcpu *vcpu, int cpu,
  */
 static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
-	struct vcpu_vmx *vmx = to_vmx(vcpu);
-
 	vmx_vcpu_load_vmcs(vcpu, cpu, NULL);
 
 	vmx_vcpu_pi_load(vcpu, cpu);
-
-	vmx->host_debugctlmsr = get_debugctlmsr();
 }
 
 static void vmx_vcpu_put(struct kvm_vcpu *vcpu)
@@ -7273,6 +7269,7 @@  static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
 	atomic_switch_perf_msrs(vmx);
 	if (intel_pmu_lbr_is_enabled(vcpu))
 		vmx_passthrough_lbr_msrs(vcpu);
+	vmx->host_debugctlmsr = get_debugctlmsr();
 
 	if (enable_preemption_timer)
 		vmx_update_hv_timer(vcpu);