[3/4] KVM: VMX/pmu: Enable inactive vLBR event in guest LBR MSR emulation

Message ID 20230616113353.45202-4-xiong.y.zhang@intel.com (mailing list archive)
State New, archived
Series Part of fix for host and guest LBR event coexist

Commit Message

Zhang, Xiong Y June 16, 2023, 11:33 a.m. UTC
A vLBR event can be inactive in two cases:
a. a host per-CPU pinned LBR event occupies the LBRs when the vLBR event
is created.
b. the vLBR event is preempted by a host per-CPU pinned LBR event in the
VM-exit handler.
While the vLBR event is inactive, the guest cannot access the LBR MSRs:
perf forces the event into the error state and excludes it from
scheduling. An event in the error state is never rescheduled by perf, so
the vLBR event stays inactive even after the host per-CPU pinned LBR
event releases the LBRs. KVM can instead re-enable the vLBR event
proactively; once it is active again, the LBR MSRs can be passed through
to the guest.

Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
---
 arch/x86/kvm/vmx/pmu_intel.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

Comments

Sean Christopherson June 23, 2023, 8:38 p.m. UTC | #1
On Fri, Jun 16, 2023, Xiong Zhang wrote:
> A vLBR event can be inactive in two cases:
> a. a host per-CPU pinned LBR event occupies the LBRs when the vLBR event
> is created.
> b. the vLBR event is preempted by a host per-CPU pinned LBR event in the
> VM-exit handler.
> While the vLBR event is inactive, the guest cannot access the LBR MSRs:
> perf forces the event into the error state and excludes it from
> scheduling. An event in the error state is never rescheduled by perf, so
> the vLBR event stays inactive even after the host per-CPU pinned LBR
> event releases the LBRs. KVM can instead re-enable the vLBR event
> proactively; once it is active again, the LBR MSRs can be passed through
> to the guest.
> 
> Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
> ---
>  arch/x86/kvm/vmx/pmu_intel.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> index 741efe2c497b..5a3ab8c8711b 100644
> --- a/arch/x86/kvm/vmx/pmu_intel.c
> +++ b/arch/x86/kvm/vmx/pmu_intel.c
> @@ -314,7 +314,16 @@ static bool intel_pmu_handle_lbr_msrs_access(struct kvm_vcpu *vcpu,
>  	if (!intel_pmu_is_valid_lbr_msr(vcpu, index))
>  		return false;
>  
> -	if (!lbr_desc->event && intel_pmu_create_guest_lbr_event(vcpu) < 0)
> +	/* vLBR event may be inactive, but physical LBR may be free now.

	/*
	 * This is the preferred block comment style.
	 */

> +	 * but vLBR event is pinned event, once it is inactive state, perf
> +	 * will force it to error state in merge_sched_in() and exclude it from
> +	 * perf schedule, so even if LBR is free now, vLBR event couldn't be active
> +	 * through perf scheduler and vLBR event could be active through
> +	 * perf_event_enable().
> +	 */

Trimming that down, is this what you mean?

	/*
	 * Attempt to re-enable the vLBR event if it was disabled due to
	 * contention with host LBR usage, i.e. was put into an error state.
	 * Perf doesn't notify KVM if the host stops using LBRs, i.e. KVM needs
	 * to manually re-enable the event.
	 */

Which begs the question, why can't there be a notification of some form that the
LBRs are once again available?

Assuming that's too difficult for whatever reason, why wait until the guest tries
to read LBRs?  E.g. why not be more aggressive and try to re-enable vLBRs on every
VM-Exit.

And if we do wait until the guest touches relevant MSRs, shouldn't writes to
DEBUG_CTL that set DEBUGCTLMSR_LBR also try to re-enable the event?

Lastly, what guarantees that the MSRs hold guest data?  I assume perf purges the
MSRs at some point, but it would be helpful to call that out in the changelog.

> +	if (lbr_desc->event && (lbr_desc->event->state == PERF_EVENT_STATE_ERROR))
> +		perf_event_enable(lbr_desc->event);
> +	else if (!lbr_desc->event && intel_pmu_create_guest_lbr_event(vcpu) < 0)
>  		goto dummy;
>  
>  	/*
> -- 
> 2.25.1
>
Zhang, Xiong Y June 25, 2023, 6:54 a.m. UTC | #2
> > On Fri, Jun 16, 2023, Xiong Zhang wrote:
> > A vLBR event can be inactive in two cases:
> > a. a host per-CPU pinned LBR event occupies the LBRs when the vLBR event
> > is created.
> > b. the vLBR event is preempted by a host per-CPU pinned LBR event in the
> > VM-exit handler.
> > While the vLBR event is inactive, the guest cannot access the LBR MSRs:
> > perf forces the event into the error state and excludes it from
> > scheduling. An event in the error state is never rescheduled by perf, so
> > the vLBR event stays inactive even after the host per-CPU pinned LBR
> > event releases the LBRs. KVM can instead re-enable the vLBR event
> > proactively; once it is active again, the LBR MSRs can be passed through
> > to the guest.
> >
> > Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
> > ---
> >  arch/x86/kvm/vmx/pmu_intel.c | 11 ++++++++++-
> >  1 file changed, 10 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> > index 741efe2c497b..5a3ab8c8711b 100644
> > --- a/arch/x86/kvm/vmx/pmu_intel.c
> > +++ b/arch/x86/kvm/vmx/pmu_intel.c
> > @@ -314,7 +314,16 @@ static bool intel_pmu_handle_lbr_msrs_access(struct kvm_vcpu *vcpu,
> >  	if (!intel_pmu_is_valid_lbr_msr(vcpu, index))
> >  		return false;
> >
> > -	if (!lbr_desc->event && intel_pmu_create_guest_lbr_event(vcpu) < 0)
> > +	/* vLBR event may be inactive, but physical LBR may be free now.
> 
> 	/*
> 	 * This is the preferred block comment style.
> 	 */
> 
> > +	 * but vLBR event is pinned event, once it is inactive state, perf
> > +	 * will force it to error state in merge_sched_in() and exclude it from
> > +	 * perf schedule, so even if LBR is free now, vLBR event couldn't be active
> > +	 * through perf scheduler and vLBR event could be active through
> > +	 * perf_event_enable().
> > +	 */
> 
> Trimming that down, is this what you mean?
Yes, thanks a lot.
> 
> 	/*
> 	 * Attempt to re-enable the vLBR event if it was disabled due to
> 	 * contention with host LBR usage, i.e. was put into an error state.
> 	 * Perf doesn't notify KVM if the host stops using LBRs, i.e. KVM needs
> 	 * to manually re-enable the event.
> 	 */
> 
> Which begs the question, why can't there be a notification of some form that
> the LBRs are once again available?
This is how the perf scheduler works. If a pinned event cannot get its hardware resource due to contention, perf puts it into the error state and excludes it from scheduling; even if the resource becomes available later, perf will not schedule the event again because it is in the error state, and the only way to reschedule it is to enable it again.
If a non-pinned event cannot get its resource, perf puts it into the inactive state and reschedules it automatically once the resource becomes available.
The vLBR event is a per-process pinned event.
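For reference, a simplified sketch of how the vLBR event is set up (the attr
fields here are illustrative, e.g. the real .config value is omitted;
intel_pmu_create_guest_lbr_event() in pmu_intel.c is authoritative):

	/*
	 * Because .pinned = true, a scheduling failure moves the event to
	 * PERF_EVENT_STATE_ERROR rather than leaving it INACTIVE, and perf
	 * never retries it on its own.
	 */
	struct perf_event_attr attr = {
		.type = PERF_TYPE_RAW,
		.size = sizeof(attr),
		.pinned = true,		/* losing the LBRs => error state */
		.exclude_host = true,	/* count only while in the guest */
		.sample_type = PERF_SAMPLE_BRANCH_STACK,
		.branch_sample_type = PERF_SAMPLE_BRANCH_CALL_STACK |
				      PERF_SAMPLE_BRANCH_USER,
	};
	struct perf_event *event;

	event = perf_event_create_kernel_counter(&attr, -1, current, NULL, NULL);
	if (IS_ERR(event))
		return PTR_ERR(event);

	/* Later: an event parked in the error state must be kicked manually. */
	if (event->state == PERF_EVENT_STATE_ERROR)
		perf_event_enable(event);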
> 
> Assuming that's too difficult for whatever reason, why wait until the guest tries
> to read LBRs?  E.g. why not be more aggressive and try to re-enable vLBRs on
> every VM-Exit.
Yes, that is a good suggestion. vmx_passthrough_lbr_msrs() is already called on
every VM-exit and checks the vLBR event state, so I could re-enable the vLBR
event in that function.
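Something like this untested sketch is what I have in mind
(vcpu_to_lbr_desc() and vmx_passthrough_lbr_msrs() already exist; the hunk
itself is illustrative):

	static void vmx_passthrough_lbr_msrs(struct kvm_vcpu *vcpu)
	{
		struct lbr_desc *lbr_desc = vcpu_to_lbr_desc(vcpu);

		/*
		 * The vLBR event is pinned, so perf parks it in the error
		 * state on contention and never reschedules it.  Kick it
		 * manually now that the LBRs may be free again.
		 */
		if (lbr_desc->event &&
		    lbr_desc->event->state == PERF_EVENT_STATE_ERROR)
			perf_event_enable(lbr_desc->event);

		/* ... existing passthrough/interception logic follows ... */
	}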
> 
> And if we do wait until the guest touches relevant MSRs, shouldn't writes to
> DEBUG_CTL that set DEBUGCTLMSR_LBR also try to re-enable the event?
Only perf knows whether the vLBR event can be active at any given moment. If the vLBR event is active, KVM could in theory read/write DEBUG_CTL[0] inside an IRQ disable/enable pair, but it is better that KVM not touch perf's hardware resources directly: the vLBR event is just one host LBR event, the host may have other LBR events, and perf schedules them all according to its scheduling rules. If the vLBR event is inactive, KVM should not touch the DEBUG_CTL MSR at all.
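The theoretical sequence would look something like this (sketch only; as said
above, KVM should not actually do this behind perf's back):

	unsigned long flags;
	u64 debugctl;

	local_irq_save(flags);
	/* Check under IRQ-off so perf cannot reclaim the LBRs mid-access. */
	if (lbr_desc->event->state == PERF_EVENT_STATE_ACTIVE) {
		rdmsrl(MSR_IA32_DEBUGCTLMSR, debugctl);
		debugctl |= DEBUGCTLMSR_LBR;	/* DEBUG_CTL[0] */
		wrmsrl(MSR_IA32_DEBUGCTLMSR, debugctl);
	}
	local_irq_restore(flags);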
> 
> Lastly, what guarantees that the MSRs hold guest data?  I assume perf purges
> the MSRs at some point, but it would be helpful to call that out in the changelog.
For the DEBUG_CTL MSR, the VMCS has two fields for this:
1. "Save debug controls" in the VM-Exit controls
2. "Load debug controls" in the VM-Entry controls
For the LBR record MSRs, perf saves them when the owning process is scheduled
out and restores them when it is scheduled back in.
Sure, I will call this out in the changelog.
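For illustration, the corresponding control bits as defined in
arch/x86/include/asm/vmx.h (values reproduced here from memory; the header is
authoritative):

	/* VM-Exit controls: save guest DEBUG_CTL/DR7 on exit. */
	#define VM_EXIT_SAVE_DEBUG_CONTROLS	0x00000004
	/* VM-Entry controls: load guest DEBUG_CTL/DR7 on entry. */
	#define VM_ENTRY_LOAD_DEBUG_CONTROLS	0x00000004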

> > +	if (lbr_desc->event && (lbr_desc->event->state == PERF_EVENT_STATE_ERROR))
> > +		perf_event_enable(lbr_desc->event);
> > +	else if (!lbr_desc->event && intel_pmu_create_guest_lbr_event(vcpu) < 0)
> >  		goto dummy;
> >
> >  	/*
> > --
> > 2.25.1
> >
Sean Christopherson June 26, 2023, 5 p.m. UTC | #3
+Weijiang

On Sun, Jun 25, 2023, Xiong Y Zhang wrote:
> > > On Fri, Jun 16, 2023, Xiong Zhang wrote:
> > > A vLBR event can be inactive in two cases:
> > > a. a host per-CPU pinned LBR event occupies the LBRs when the vLBR event
> > > is created.
> > > b. the vLBR event is preempted by a host per-CPU pinned LBR event in the
> > > VM-exit handler.
> > > While the vLBR event is inactive, the guest cannot access the LBR MSRs:
> > > perf forces the event into the error state and excludes it from
> > > scheduling. An event in the error state is never rescheduled by perf, so
> > > the vLBR event stays inactive even after the host per-CPU pinned LBR
> > > event releases the LBRs. KVM can instead re-enable the vLBR event
> > > proactively; once it is active again, the LBR MSRs can be passed through
> > > to the guest.
> > >
> > > Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
> > > ---
> > >  arch/x86/kvm/vmx/pmu_intel.c | 11 ++++++++++-
> > >  1 file changed, 10 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> > > index 741efe2c497b..5a3ab8c8711b 100644
> > > --- a/arch/x86/kvm/vmx/pmu_intel.c
> > > +++ b/arch/x86/kvm/vmx/pmu_intel.c
> > > @@ -314,7 +314,16 @@ static bool intel_pmu_handle_lbr_msrs_access(struct kvm_vcpu *vcpu,
> > >  	if (!intel_pmu_is_valid_lbr_msr(vcpu, index))
> > >  		return false;
> > >
> > > -	if (!lbr_desc->event && intel_pmu_create_guest_lbr_event(vcpu) < 0)
> > > +	/* vLBR event may be inactive, but physical LBR may be free now.
> > 
> > 	/*
> > 	 * This is the preferred block comment style.
> > 	 */
> > 
> > > +	 * but vLBR event is pinned event, once it is inactive state, perf
> > > +	 * will force it to error state in merge_sched_in() and exclude it from
> > > +	 * perf schedule, so even if LBR is free now, vLBR event couldn't be active
> > > +	 * through perf scheduler and vLBR event could be active through
> > > +	 * perf_event_enable().
> > > +	 */
> > 
> > Trimming that down, is this what you mean?
> Yes, thanks a lot.
> > 
> > 	/*
> > 	 * Attempt to re-enable the vLBR event if it was disabled due to
> > 	 * contention with host LBR usage, i.e. was put into an error state.
> > 	 * Perf doesn't notify KVM if the host stops using LBRs, i.e. KVM needs
> > 	 * to manually re-enable the event.
> > 	 */
> > 
> > Which begs the question, why can't there be a notification of some form that
> > the LBRs are once again available?
> This is how the perf scheduler works. If a pinned event cannot get its
> hardware resource due to contention, perf puts it into the error state and
> excludes it from scheduling; even if the resource becomes available later,
> perf will not schedule the event again because it is in the error state, and
> the only way to reschedule it is to enable it again.  If a non-pinned event
> cannot get its resource, perf puts it into the inactive state and reschedules
> it automatically once the resource becomes available.  The vLBR event is a
> per-process pinned event.

That doesn't answer my question.  I get that all of this is subject to perf
scheduling, I'm asking why perf doesn't communicate directly with KVM to coordinate
access to LBRs instead of pulling the rug out from under KVM.

> > Assuming that's too difficult for whatever reason, why wait until the guest tries
> > to read LBRs?  E.g. why not be more aggressive and try to re-enable vLBRs on
> > every VM-Exit.
> Yes, that is a good suggestion. vmx_passthrough_lbr_msrs() is already called
> on every VM-exit and checks the vLBR event state, so I could re-enable the
> vLBR event in that function.
> > 
> > And if we do wait until the guest touches relevant MSRs, shouldn't writes to
> > DEBUG_CTL that set DEBUGCTLMSR_LBR also try to re-enable the event?
> Only perf knows whether the vLBR event can be active at any given moment. If
> the vLBR event is active, KVM could in theory read/write DEBUG_CTL[0] inside
> an IRQ disable/enable pair, but it is better that KVM not touch perf's
> hardware resources directly: the vLBR event is just one host LBR event, the
> host may have other LBR events, and perf schedules them all according to its
> scheduling rules.  If the vLBR event is inactive, KVM should not touch the
> DEBUG_CTL MSR at all.

Again, this doesn't answer my question.  I didn't suggest KVM write anything
directly, I asked why KVM doesn't try to re-enable the perf LBR event when emulating
a guest write to DEBUG_CTL.

> > Lastly, what guarantees that the MSRs hold guest data?  I assume perf purges
> > the MSRs at some point, but it would be helpful to call that out in the changelog.
> For the DEBUG_CTL MSR, the VMCS has two fields for this:
> 1. "Save debug controls" in the VM-Exit controls
> 2. "Load debug controls" in the VM-Entry controls
> For the LBR record MSRs, perf saves them when the owning process is scheduled
> out and restores them when it is scheduled back in.

Once again, this doesn't answer my question.  I want to know *exactly* when perf
can take control of the LBRs.  The fact that intel_pmu_handle_lbr_msrs_access()
disables IRQs when checking lbr_desc->event->state strongly suggests that the
answer isn't "when a task is scheduled in".

Your other response[1] mostly answered that question, but I want explicit documentation
on the contract between perf and KVM with respect to LBRs.  In short, please work
with Weijiang to fulfill my request/demand[*] that someone document KVM's LBR support,
and justify the "design".  I am simply not willing to take KVM LBR patches until that
documentation is provided.

Copy+pasting my request/demand for convenience:

  : First and foremost, the existing LBR support needs to be documented.  Someone,
  : I don't care who, needs to provide a detailed writeup of the contract between KVM
  : and perf.  Specifically, I want to know:
  : 
  :   1. When exactly is perf allowed to take control of LBR MSRs.  Task switch?  IRQ?
  :      NMI?
  : 
  :   2. What is the expected behavior when perf is using LBRs?  Is the guest supposed
  :      to be traced?
  : 
  :   3. Why does KVM snapshot DEBUGCTL with IRQs enabled, but disables IRQs when
  :      accessing LBR MSRs?
  : 
  : It doesn't have to be polished, e.g. I'll happily wordsmith things into proper
  : documentation, but I want to have a very clear understanding of how LBR support
  : is _intended_ to function and how it all _actually_ functions without having to
  : make guesses.
  : 
  : And depending on the answers, I want to revisit KVM's LBR implementation before
  : tackling arch LBRs.  Letting perf usurp LBRs while KVM has the vCPU loaded is
  : frankly ridiculous.  Just have perf set a flag telling KVM that it needs to take
  : control of LBRs and have KVM service the flag as a request or something.  Stealing
  : the LBRs back in IRQ context adds a stupid amount of complexity without much value,
  : e.g. waiting a few branches for KVM to get to a safe place isn't going to meaningfully
  : change the traces.  If that can't actually happen, then why on earth does KVM need
  : to disable IRQs to read MSRs?
  : 
  : And AFAICT, since KVM unconditionally loads the guest's DEBUGCTL, whether or not
  : guest branches show up in the LBRs when the host is tracing is completely up to
  : the whims of the guest.  If that's correct, then again, what's the point of the
  : dance between KVM and perf?

[1] https://lkml.kernel.org/r/MW4PR11MB5824480492ED55C63769FF11BB21A%40MW4PR11MB5824.namprd11.prod.outlook.com
[2] https://lore.kernel.org/all/Y9RUOvJ5dkCU9J8C@google.com
Zhang, Xiong Y June 27, 2023, 3:29 a.m. UTC | #4
> On Sun, Jun 25, 2023, Xiong Y Zhang wrote:
> > > > On Fri, Jun 16, 2023, Xiong Zhang wrote:
> > > > A vLBR event can be inactive in two cases:
> > > > a. a host per-CPU pinned LBR event occupies the LBRs when the vLBR
> > > > event is created.
> > > > b. the vLBR event is preempted by a host per-CPU pinned LBR event in
> > > > the VM-exit handler.
> > > > While the vLBR event is inactive, the guest cannot access the LBR
> > > > MSRs: perf forces the event into the error state and excludes it from
> > > > scheduling. An event in the error state is never rescheduled by perf,
> > > > so the vLBR event stays inactive even after the host per-CPU pinned
> > > > LBR event releases the LBRs. KVM can instead re-enable the vLBR event
> > > > proactively; once it is active again, the LBR MSRs can be passed
> > > > through to the guest.
> > > >
> > > > Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
> > > > ---
> > > >  arch/x86/kvm/vmx/pmu_intel.c | 11 ++++++++++-
> > > >  1 file changed, 10 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> > > > index 741efe2c497b..5a3ab8c8711b 100644
> > > > --- a/arch/x86/kvm/vmx/pmu_intel.c
> > > > +++ b/arch/x86/kvm/vmx/pmu_intel.c
> > > > @@ -314,7 +314,16 @@ static bool intel_pmu_handle_lbr_msrs_access(struct kvm_vcpu *vcpu,
> > > >  	if (!intel_pmu_is_valid_lbr_msr(vcpu, index))
> > > >  		return false;
> > > >
> > > > -	if (!lbr_desc->event && intel_pmu_create_guest_lbr_event(vcpu) < 0)
> > > > +	/* vLBR event may be inactive, but physical LBR may be free now.
> > >
> > > 	/*
> > > 	 * This is the preferred block comment style.
> > > 	 */
> > >
> > > > +	 * but vLBR event is pinned event, once it is inactive state, perf
> > > > +	 * will force it to error state in merge_sched_in() and exclude it from
> > > > +	 * perf schedule, so even if LBR is free now, vLBR event couldn't be active
> > > > +	 * through perf scheduler and vLBR event could be active through
> > > > +	 * perf_event_enable().
> > > > +	 */
> > >
> > > Trimming that down, is this what you mean?
> > Yes, thanks a lot.
> > >
> > > 	/*
> > > 	 * Attempt to re-enable the vLBR event if it was disabled due to
> > > 	 * contention with host LBR usage, i.e. was put into an error state.
> > > 	 * Perf doesn't notify KVM if the host stops using LBRs, i.e. KVM needs
> > > 	 * to manually re-enable the event.
> > > 	 */
> > >
> > > Which begs the question, why can't there be a notification of some
> > > form that the LBRs are once again available?
> > This is how the perf scheduler works. If a pinned event cannot get its
> > hardware resource due to contention, perf puts it into the error state
> > and excludes it from scheduling; even if the resource becomes available
> > later, perf will not schedule the event again because it is in the error
> > state, and the only way to reschedule it is to enable it again.  If a
> > non-pinned event cannot get its resource, perf puts it into the inactive
> > state and reschedules it automatically once the resource becomes
> > available.  The vLBR event is a per-process pinned event.
> 
> That doesn't answer my question.  I get that all of this is subject to perf
> scheduling, I'm asking why perf doesn't communicate directly with KVM to
> coordinate access to LBRs instead of pulling the rug out from under KVM.
Perf doesn't have such a notification interface currently: a non-pinned event becomes active automatically once the resource is available, and only a pinned event stays inactive when the resource frees up. Perf may refuse to add such an interface for KVM's usage alone, and may instead suggest changing the vLBR event from pinned to non-pinned.
> 
> > > Assuming that's too difficult for whatever reason, why wait until
> > > the guest tries to read LBRs?  E.g. why not be more aggressive and
> > > try to re-enable vLBRs on every VM-Exit.
> > Yes, that is a good suggestion. vmx_passthrough_lbr_msrs() is already
> > called on every VM-exit and checks the vLBR event state, so I could
> > re-enable the vLBR event in that function.
> > >
> > > And if we do wait until the guest touches relevant MSRs, shouldn't
> > > writes to DEBUG_CTL that set DEBUGCTLMSR_LBR also try to re-enable the
> > > event?
> > Only perf knows whether the vLBR event can be active at any given
> > moment. If the vLBR event is active, KVM could in theory read/write
> > DEBUG_CTL[0] inside an IRQ disable/enable pair, but it is better that
> > KVM not touch perf's hardware resources directly: the vLBR event is
> > just one host LBR event, the host may have other LBR events, and perf
> > schedules them all according to its scheduling rules.  If the vLBR
> > event is inactive, KVM should not touch the DEBUG_CTL MSR at all.
> 
> Again, this doesn't answer my question.  I didn't suggest KVM write anything
> directly, I asked why KVM doesn't try to re-enable the perf LBR event when
> emulating a guest write to DEBUG_CTL.
Yes, KVM should re-enable the vLBR event when emulating a guest write to DEBUG_CTL. Since an inactive vLBR event will be re-enabled on every VM-Exit per the above suggestion, no extra code is needed here.
> 
> > > Lastly, what guarantees that the MSRs hold guest data?  I assume perf
> > > purges the MSRs at some point, but it would be helpful to call that out
> > > in the changelog.
> > For the DEBUG_CTL MSR, the VMCS has two fields for this:
> > 1. "Save debug controls" in the VM-Exit controls
> > 2. "Load debug controls" in the VM-Entry controls
> > For the LBR record MSRs, perf saves them when the owning process is
> > scheduled out and restores them when it is scheduled back in.
> 
> Once again, this doesn't answer my question.  I want to know *exactly* when
> perf can take control of the LBRs.  The fact that
> intel_pmu_handle_lbr_msrs_access() disables IRQs when checking
> lbr_desc->event->state strongly suggests that the answer isn't "when a task
> is scheduled in".
> 
> Your other response[1] mostly answered that question, but I want explicit
> documentation on the contract between perf and KVM with respect to LBRs.  In
> short, please work with Weijiang to fulfill my request/demand[*] that someone
> document KVM's LBR support, and justify the "design".  I am simply not willing to
> take KVM LBR patches until that documentation is provided.
Sure, I will work with Weijiang to supply such documentation. Will this document be put in Documentation/virt/kvm/x86/ ?
> 
> Copy+pasting my request/demand for convenience:
> 
>   : First and foremost, the existing LBR support needs to be documented.  Someone,
>   : I don't care who, needs to provide a detailed writeup of the contract between KVM
>   : and perf.  Specifically, I want to know:
>   :
>   :   1. When exactly is perf allowed to take control of LBR MSRs.  Task switch?  IRQ?
>   :      NMI?
>   :
>   :   2. What is the expected behavior when perf is using LBRs?  Is the guest supposed
>   :      to be traced?
>   :
>   :   3. Why does KVM snapshot DEBUGCTL with IRQs enabled, but disables IRQs when
>   :      accessing LBR MSRs?
>   :
>   : It doesn't have to be polished, e.g. I'll happily wordsmith things into proper
>   : documentation, but I want to have a very clear understanding of how LBR support
>   : is _intended_ to function and how it all _actually_ functions without having to
>   : make guesses.
>   :
>   : And depending on the answers, I want to revisit KVM's LBR implementation before
>   : tackling arch LBRs.  Letting perf usurp LBRs while KVM has the vCPU loaded is
>   : frankly ridiculous.  Just have perf set a flag telling KVM that it needs to take
>   : control of LBRs and have KVM service the flag as a request or something.  Stealing
>   : the LBRs back in IRQ context adds a stupid amount of complexity without much value,
>   : e.g. waiting a few branches for KVM to get to a safe place isn't going to meaningfully
>   : change the traces.  If that can't actually happen, then why on earth does KVM need
>   : to disable IRQs to read MSRs?
>   :
>   : And AFAICT, since KVM unconditionally loads the guest's DEBUGCTL, whether or not
>   : guest branches show up in the LBRs when the host is tracing is completely up to
>   : the whims of the guest.  If that's correct, then again, what's the point of the
>   : dance between KVM and perf?
> 
> [1] https://lkml.kernel.org/r/MW4PR11MB5824480492ED55C63769FF11BB21A%40MW4PR11MB5824.namprd11.prod.outlook.com
> [2] https://lore.kernel.org/all/Y9RUOvJ5dkCU9J8C@google.com
Sean Christopherson June 27, 2023, 3:07 p.m. UTC | #5
On Tue, Jun 27, 2023, Xiong Y Zhang wrote:
> > On Sun, Jun 25, 2023, Xiong Y Zhang wrote:
> > > > > On Fri, Jun 16, 2023, Xiong Zhang wrote:
> > > > 	/*
> > > > 	 * Attempt to re-enable the vLBR event if it was disabled due to
> > > > 	 * contention with host LBR usage, i.e. was put into an error state.
> > > > 	 * Perf doesn't notify KVM if the host stops using LBRs, i.e. KVM needs
> > > > 	 * to manually re-enable the event.
> > > > 	 */
> > > >
> > > > Which begs the question, why can't there be a notification of some
> > > > form that the LBRs are once again available?
> > > This is how the perf scheduler works. If a pinned event cannot get its
> > > hardware resource due to contention, perf puts it into the error state
> > > and excludes it from scheduling; even if the resource becomes available
> > > later, perf will not schedule the event again because it is in the error
> > > state, and the only way to reschedule it is to enable it again.  If a
> > > non-pinned event cannot get its resource, perf puts it into the inactive
> > > state and reschedules it automatically once the resource becomes
> > > available.  The vLBR event is a per-process pinned event.
> > 
> > That doesn't answer my question.  I get that all of this is subject to perf
> > scheduling, I'm asking why perf doesn't communicate directly with KVM to
> > coordinate access to LBRs instead of pulling the rug out from under KVM.
> Perf doesn't have such a notification interface currently: a non-pinned
> event becomes active automatically once the resource is available, and only
> a pinned event stays inactive when the resource frees up. Perf may refuse to
> add such an interface for KVM's usage alone.

Or maybe perf will be overjoyed that someone is finally proposing a coherent
interface.  Until we actually try/ask, we'll never know.

> > Your other response[1] mostly answered that question, but I want explicit
> > documentation on the contract between perf and KVM with respect to LBRs.  In
> > short, please work with Weijiang to fulfill my request/demand[*] that someone
> > document KVM's LBR support, and justify the "design".  I am simply not willing to
> > take KVM LBR patches until that documentation is provided.
> Sure, I will work with Weijiang to supply such documentation. Will this
> document be put in Documentation/virt/kvm/x86/ ?

Ya, Documentation/virt/kvm/x86/pmu.rst please.
Like Xu June 28, 2023, 5:50 a.m. UTC | #6
On 16/6/2023 7:33 pm, Xiong Zhang wrote:
> A vLBR event can be inactive in two cases:
> a. a host per-CPU pinned LBR event occupies the LBRs when the vLBR event
> is created.
> b. the vLBR event is preempted by a host per-CPU pinned LBR event in the
> VM-exit handler.
> While the vLBR event is inactive, the guest cannot access the LBR MSRs:
> perf forces the event into the error state and excludes it from
> scheduling. An event in the error state is never rescheduled by perf, so
> the vLBR event stays inactive even after the host per-CPU pinned LBR
> event releases the LBRs. KVM can instead re-enable the vLBR event
> proactively; once it is active again, the LBR MSRs can be passed through
> to the guest.
> 
> Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
> ---
>   arch/x86/kvm/vmx/pmu_intel.c | 11 ++++++++++-
>   1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> index 741efe2c497b..5a3ab8c8711b 100644
> --- a/arch/x86/kvm/vmx/pmu_intel.c
> +++ b/arch/x86/kvm/vmx/pmu_intel.c
> @@ -314,7 +314,16 @@ static bool intel_pmu_handle_lbr_msrs_access(struct kvm_vcpu *vcpu,
>   	if (!intel_pmu_is_valid_lbr_msr(vcpu, index))
>   		return false;
>   
> -	if (!lbr_desc->event && intel_pmu_create_guest_lbr_event(vcpu) < 0)
> +	/* vLBR event may be inactive, but physical LBR may be free now.
> +	 * but vLBR event is pinned event, once it is inactive state, perf
> +	 * will force it to error state in merge_sched_in() and exclude it from
> +	 * perf schedule, so even if LBR is free now, vLBR event couldn't be active
> +	 * through perf scheduler and vLBR event could be active through
> +	 * perf_event_enable().
> +	 */
> +	if (lbr_desc->event && (lbr_desc->event->state == PERF_EVENT_STATE_ERROR))
> +		perf_event_enable(lbr_desc->event);

Once LBR host/guest coexistence is allowed, calling perf_event_enable() here
does not always succeed, so this is not a good call site.

The expectation is that any perf_event left in the error state is released and
reprogrammed in kvm_pmu_handle_event(), and that the perf state of enabled
features is checked near atomic_switch_perf_msrs().
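In other words, something along these lines (an untested sketch of the idea;
kvm_make_request() and KVM_REQ_PMU are the existing request machinery):

	/*
	 * Sketch: rather than calling perf_event_enable() in the MSR
	 * handler, flag the vCPU so the LBR event is torn down and
	 * recreated via the KVM_REQ_PMU path before the next VM-entry.
	 */
	if (lbr_desc->event &&
	    lbr_desc->event->state == PERF_EVENT_STATE_ERROR)
		kvm_make_request(KVM_REQ_PMU, vcpu);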

> +	else if (!lbr_desc->event && intel_pmu_create_guest_lbr_event(vcpu) < 0)
>   		goto dummy;
>   
>   	/*
Like Xu June 28, 2023, 6:07 a.m. UTC | #7
On 27/6/2023 11:07 pm, Sean Christopherson wrote:
> On Tue, Jun 27, 2023, Xiong Y Zhang wrote:
>>> On Sun, Jun 25, 2023, Xiong Y Zhang wrote:
>>>>>> On Fri, Jun 16, 2023, Xiong Zhang wrote:
>>>>> 	/*
>>>>> 	 * Attempt to re-enable the vLBR event if it was disabled due to
>>>>> 	 * contention with host LBR usage, i.e. was put into an error state.
>>>>> 	 * Perf doesn't notify KVM if the host stops using LBRs, i.e. KVM needs
>>>>> 	 * to manually re-enable the event.
>>>>> 	 */
>>>>>
>>>>> Which begs the question, why can't there be a notification of some
>>>>> form that the LBRs are once again available?
>>>> This is how the perf scheduler works. If a pinned event cannot get its
>>>> hardware resource due to contention, perf puts it into the error state
>>>> and excludes it from scheduling; even if the resource becomes available
>>>> later, perf will not schedule the event again because it is in the error
>>>> state, and the only way to reschedule it is to enable it again.  If a
>>>> non-pinned event cannot get its resource, perf puts it into the inactive
>>>> state and reschedules it automatically once the resource becomes
>>>> available.  The vLBR event is a per-process pinned event.
>>>
>>> That doesn't answer my question.  I get that all of this is subject to perf
>>> scheduling, I'm asking why perf doesn't communicate directly with KVM to
>>> coordinate access to LBRs instead of pulling the rug out from under KVM.
>> Perf doesn't have such a notification interface currently: a non-pinned
>> event becomes active automatically once the resource is available, and only
>> a pinned event stays inactive when the resource frees up. Perf may refuse
>> to add such an interface for KVM's usage alone.
> 
> Or maybe perf will be overjoyed that someone is finally proposing a coherent
> interface.  Until we actually try/ask, we'll never know.

For the perf subsystem, KVM, like any other in-kernel perf_event user, is just
a client, and any external logic that deeply interferes with perf's management
of host PMU hardware resources will be rejected out of hand.

> 
>>> Your other response[1] mostly answered that question, but I want explicit
>>> documentation on the contract between perf and KVM with respect to LBRs.  In
>>> short, please work with Weijiang to fulfill my request/demand[*] that someone
>>> document KVM's LBR support, and justify the "design".  I am simply not willing to
>>> take KVM LBR patches until that documentation is provided.
>> Sure, I will work with Weijiang to supply such documentation. Will this
>> document be put in Documentation/virt/kvm/x86/ ?
> 
> Ya, Documentation/virt/kvm/x86/pmu.rst please.

Please pay extra attention in the documentation to the current status of, and
expectations for, host/guest coexistence. I'm really looking forward to a
document written from the perspective of a new vPMU developer. Thanks.

Patch

diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 741efe2c497b..5a3ab8c8711b 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -314,7 +314,16 @@  static bool intel_pmu_handle_lbr_msrs_access(struct kvm_vcpu *vcpu,
 	if (!intel_pmu_is_valid_lbr_msr(vcpu, index))
 		return false;
 
-	if (!lbr_desc->event && intel_pmu_create_guest_lbr_event(vcpu) < 0)
+	/* vLBR event may be inactive, but physical LBR may be free now.
+	 * but vLBR event is pinned event, once it is inactive state, perf
+	 * will force it to error state in merge_sched_in() and exclude it from
+	 * perf schedule, so even if LBR is free now, vLBR event couldn't be active
+	 * through perf scheduler and vLBR event could be active through
+	 * perf_event_enable().
+	 */
+	if (lbr_desc->event && (lbr_desc->event->state == PERF_EVENT_STATE_ERROR))
+		perf_event_enable(lbr_desc->event);
+	else if (!lbr_desc->event && intel_pmu_create_guest_lbr_event(vcpu) < 0)
 		goto dummy;
 
 	/*