diff mbox series

[v2,12/54] perf: x86: Add x86 function to switch PMI handler

Message ID 20240506053020.3911940-13-mizhang@google.com (mailing list archive)
State New, archived
Headers show
Series Mediated Passthrough vPMU 2.0 for x86 | expand

Commit Message

Mingwei Zhang May 6, 2024, 5:29 a.m. UTC
From: Xiong Zhang <xiong.y.zhang@linux.intel.com>

Add x86 specific function to switch PMI handler since passthrough PMU and host
PMU use different interrupt vectors.

x86_perf_guest_enter() switch PMU vector from NMI to KVM_GUEST_PMI_VECTOR,
and guest LVTPC_MASK value should be reflected onto HW to indicate whether
guest has cleared LVTPC_MASK or not, so guest lvt_pc is passed as parameter.

x86_perf_guest_exit() switch PMU vector from KVM_GUEST_PMI_VECTOR to NMI.

Signed-off-by: Xiong Zhang <xiong.y.zhang@linux.intel.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
---
 arch/x86/events/core.c            | 17 +++++++++++++++++
 arch/x86/include/asm/perf_event.h |  3 +++
 2 files changed, 20 insertions(+)

Comments

Peter Zijlstra May 7, 2024, 9:22 a.m. UTC | #1
On Mon, May 06, 2024 at 05:29:37AM +0000, Mingwei Zhang wrote:
> From: Xiong Zhang <xiong.y.zhang@linux.intel.com>
> 
> Add x86 specific function to switch PMI handler since passthrough PMU and host
> PMU use different interrupt vectors.
> 
> x86_perf_guest_enter() switch PMU vector from NMI to KVM_GUEST_PMI_VECTOR,
> and guest LVTPC_MASK value should be reflected onto HW to indicate whether
> guest has cleared LVTPC_MASK or not, so guest lvt_pc is passed as parameter.
> 
> x86_perf_guest_exit() switch PMU vector from KVM_GUEST_PMI_VECTOR to NMI.
> 
> Signed-off-by: Xiong Zhang <xiong.y.zhang@linux.intel.com>
> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
> ---
>  arch/x86/events/core.c            | 17 +++++++++++++++++
>  arch/x86/include/asm/perf_event.h |  3 +++
>  2 files changed, 20 insertions(+)
> 
> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> index 09050641ce5d..8167f2230d3a 100644
> --- a/arch/x86/events/core.c
> +++ b/arch/x86/events/core.c
> @@ -701,6 +701,23 @@ struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr, void *data)
>  }
>  EXPORT_SYMBOL_GPL(perf_guest_get_msrs);
>  
> +void x86_perf_guest_enter(u32 guest_lvtpc)
> +{
> +	lockdep_assert_irqs_disabled();
> +
> +	apic_write(APIC_LVTPC, APIC_DM_FIXED | KVM_GUEST_PMI_VECTOR |
> +			       (guest_lvtpc & APIC_LVT_MASKED));
> +}
> +EXPORT_SYMBOL_GPL(x86_perf_guest_enter);
> +
> +void x86_perf_guest_exit(void)
> +{
> +	lockdep_assert_irqs_disabled();
> +
> +	apic_write(APIC_LVTPC, APIC_DM_NMI);
> +}
> +EXPORT_SYMBOL_GPL(x86_perf_guest_exit);

Urgghh... because it makes sense for this bare APIC write to be exported
?!?

Can't this at the very least be hard tied to perf_guest_{enter,exit}() ?
Chen, Zide May 7, 2024, 9:40 p.m. UTC | #2
On 5/5/2024 10:29 PM, Mingwei Zhang wrote:
> From: Xiong Zhang <xiong.y.zhang@linux.intel.com>
> 
> Add x86 specific function to switch PMI handler since passthrough PMU and host
> PMU use different interrupt vectors.
> 
> x86_perf_guest_enter() switch PMU vector from NMI to KVM_GUEST_PMI_VECTOR,
> and guest LVTPC_MASK value should be reflected onto HW to indicate whether
> guest has cleared LVTPC_MASK or not, so guest lvt_pc is passed as parameter.
> 
> x86_perf_guest_exit() switch PMU vector from KVM_GUEST_PMI_VECTOR to NMI.
> 
> Signed-off-by: Xiong Zhang <xiong.y.zhang@linux.intel.com>
> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
> ---
>  arch/x86/events/core.c            | 17 +++++++++++++++++
>  arch/x86/include/asm/perf_event.h |  3 +++
>  2 files changed, 20 insertions(+)
> 
> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> index 09050641ce5d..8167f2230d3a 100644
> --- a/arch/x86/events/core.c
> +++ b/arch/x86/events/core.c
> @@ -701,6 +701,23 @@ struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr, void *data)
>  }
>  EXPORT_SYMBOL_GPL(perf_guest_get_msrs);
>  
> +void x86_perf_guest_enter(u32 guest_lvtpc)
> +{
> +	lockdep_assert_irqs_disabled();
> +
> +	apic_write(APIC_LVTPC, APIC_DM_FIXED | KVM_GUEST_PMI_VECTOR |
> +			       (guest_lvtpc & APIC_LVT_MASKED));

If CONFIG_KVM is not defined, KVM_GUEST_PMI_VECTOR is not available and
it causes compiling error.
Mi, Dapeng May 8, 2024, 3:44 a.m. UTC | #3
On 5/8/2024 5:40 AM, Chen, Zide wrote:
>
> On 5/5/2024 10:29 PM, Mingwei Zhang wrote:
>> From: Xiong Zhang <xiong.y.zhang@linux.intel.com>
>>
>> Add x86 specific function to switch PMI handler since passthrough PMU and host
>> PMU use different interrupt vectors.
>>
>> x86_perf_guest_enter() switch PMU vector from NMI to KVM_GUEST_PMI_VECTOR,
>> and guest LVTPC_MASK value should be reflected onto HW to indicate whether
>> guest has cleared LVTPC_MASK or not, so guest lvt_pc is passed as parameter.
>>
>> x86_perf_guest_exit() switch PMU vector from KVM_GUEST_PMI_VECTOR to NMI.
>>
>> Signed-off-by: Xiong Zhang <xiong.y.zhang@linux.intel.com>
>> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
>> ---
>>  arch/x86/events/core.c            | 17 +++++++++++++++++
>>  arch/x86/include/asm/perf_event.h |  3 +++
>>  2 files changed, 20 insertions(+)
>>
>> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
>> index 09050641ce5d..8167f2230d3a 100644
>> --- a/arch/x86/events/core.c
>> +++ b/arch/x86/events/core.c
>> @@ -701,6 +701,23 @@ struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr, void *data)
>>  }
>>  EXPORT_SYMBOL_GPL(perf_guest_get_msrs);
>>  
>> +void x86_perf_guest_enter(u32 guest_lvtpc)
>> +{
>> +	lockdep_assert_irqs_disabled();
>> +
>> +	apic_write(APIC_LVTPC, APIC_DM_FIXED | KVM_GUEST_PMI_VECTOR |
>> +			       (guest_lvtpc & APIC_LVT_MASKED));
> If CONFIG_KVM is not defined, KVM_GUEST_PMI_VECTOR is not available and
> it causes compiling error.
Zide, thanks. If CONFIG_KVM is not defined, these two helpers would not
need to be called, it can be defined as null.
Xiong Zhang May 8, 2024, 6:58 a.m. UTC | #4
On 5/7/2024 5:22 PM, Peter Zijlstra wrote:
> On Mon, May 06, 2024 at 05:29:37AM +0000, Mingwei Zhang wrote:
>> From: Xiong Zhang <xiong.y.zhang@linux.intel.com>
>>
>> Add x86 specific function to switch PMI handler since passthrough PMU and host
>> PMU use different interrupt vectors.
>>
>> x86_perf_guest_enter() switch PMU vector from NMI to KVM_GUEST_PMI_VECTOR,
>> and guest LVTPC_MASK value should be reflected onto HW to indicate whether
>> guest has cleared LVTPC_MASK or not, so guest lvt_pc is passed as parameter.
>>
>> x86_perf_guest_exit() switch PMU vector from KVM_GUEST_PMI_VECTOR to NMI.
>>
>> Signed-off-by: Xiong Zhang <xiong.y.zhang@linux.intel.com>
>> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
>> ---
>>  arch/x86/events/core.c            | 17 +++++++++++++++++
>>  arch/x86/include/asm/perf_event.h |  3 +++
>>  2 files changed, 20 insertions(+)
>>
>> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
>> index 09050641ce5d..8167f2230d3a 100644
>> --- a/arch/x86/events/core.c
>> +++ b/arch/x86/events/core.c
>> @@ -701,6 +701,23 @@ struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr, void *data)
>>  }
>>  EXPORT_SYMBOL_GPL(perf_guest_get_msrs);
>>  
>> +void x86_perf_guest_enter(u32 guest_lvtpc)
>> +{
>> +	lockdep_assert_irqs_disabled();
>> +
>> +	apic_write(APIC_LVTPC, APIC_DM_FIXED | KVM_GUEST_PMI_VECTOR |
>> +			       (guest_lvtpc & APIC_LVT_MASKED));
>> +}
>> +EXPORT_SYMBOL_GPL(x86_perf_guest_enter);
>> +
>> +void x86_perf_guest_exit(void)
>> +{
>> +	lockdep_assert_irqs_disabled();
>> +
>> +	apic_write(APIC_LVTPC, APIC_DM_NMI);
>> +}
>> +EXPORT_SYMBOL_GPL(x86_perf_guest_exit);
> 
> Urgghh... because it makes sense for this bare APIC write to be exported
> ?!?
Usually KVM doesn't access HW except vmx directly and requests other
components to access HW to avoid confliction, APIC_LVTPC is managed by x86
perf driver, so I added two functions here and exported them.
> 
> Can't this at the very least be hard tied to perf_guest_{enter,exit}() ?
> perf_guest_{enter, exit} is called from this function in another commit, I
should merge that commit into this one according to your suggestion in
other email.

thanks
Peter Zijlstra May 8, 2024, 8:37 a.m. UTC | #5
On Wed, May 08, 2024 at 02:58:30PM +0800, Zhang, Xiong Y wrote:
> On 5/7/2024 5:22 PM, Peter Zijlstra wrote:
> > On Mon, May 06, 2024 at 05:29:37AM +0000, Mingwei Zhang wrote:
> >> From: Xiong Zhang <xiong.y.zhang@linux.intel.com>

> >> +void x86_perf_guest_enter(u32 guest_lvtpc)
> >> +{
> >> +	lockdep_assert_irqs_disabled();
> >> +
> >> +	apic_write(APIC_LVTPC, APIC_DM_FIXED | KVM_GUEST_PMI_VECTOR |
> >> +			       (guest_lvtpc & APIC_LVT_MASKED));
> >> +}
> >> +EXPORT_SYMBOL_GPL(x86_perf_guest_enter);
> >> +
> >> +void x86_perf_guest_exit(void)
> >> +{
> >> +	lockdep_assert_irqs_disabled();
> >> +
> >> +	apic_write(APIC_LVTPC, APIC_DM_NMI);
> >> +}
> >> +EXPORT_SYMBOL_GPL(x86_perf_guest_exit);
> > 
> > Urgghh... because it makes sense for this bare APIC write to be exported
> > ?!?
> Usually KVM doesn't access HW except vmx directly and requests other
> components to access HW to avoid confliction, APIC_LVTPC is managed by x86
> perf driver, so I added two functions here and exported them.

Yes, I understand how you got here. But as with everything you export,
you should ask yourself, should I export this. The above
x86_perf_guest_enter() function allows any module to write random LVTPC
entries. That's not a good thing to export.

I utterly detest how KVM is a module and ends up exporting a ton of
stuff that *really* should not be exported.
Xiong Zhang May 9, 2024, 7:30 a.m. UTC | #6
On 5/8/2024 4:37 PM, Peter Zijlstra wrote:
> On Wed, May 08, 2024 at 02:58:30PM +0800, Zhang, Xiong Y wrote:
>> On 5/7/2024 5:22 PM, Peter Zijlstra wrote:
>>> On Mon, May 06, 2024 at 05:29:37AM +0000, Mingwei Zhang wrote:
>>>> From: Xiong Zhang <xiong.y.zhang@linux.intel.com>
> 
>>>> +void x86_perf_guest_enter(u32 guest_lvtpc)
>>>> +{
>>>> +	lockdep_assert_irqs_disabled();
>>>> +
>>>> +	apic_write(APIC_LVTPC, APIC_DM_FIXED | KVM_GUEST_PMI_VECTOR |
>>>> +			       (guest_lvtpc & APIC_LVT_MASKED));
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(x86_perf_guest_enter);
>>>> +
>>>> +void x86_perf_guest_exit(void)
>>>> +{
>>>> +	lockdep_assert_irqs_disabled();
>>>> +
>>>> +	apic_write(APIC_LVTPC, APIC_DM_NMI);
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(x86_perf_guest_exit);
>>>
>>> Urgghh... because it makes sense for this bare APIC write to be exported
>>> ?!?
>> Usually KVM doesn't access HW except vmx directly and requests other
>> components to access HW to avoid confliction, APIC_LVTPC is managed by x86
>> perf driver, so I added two functions here and exported them.
> 
> Yes, I understand how you got here. But as with everything you export,
> you should ask yourself, should I export this. The above
> x86_perf_guest_enter() function allows any module to write random LVTPC
> entries. That's not a good thing to export.
Totally agree with your concern. Here KVM need to switch PMI vector at PMU
context switch, export isn't good, could you kindly give guideline to
design or improve such interface?
I thought the following two method, but they are worse than this commit.
1. Perf register notification to KVM, but this makes perf depends on KVM.
2. KVM write APIC_LVTPC directly, but this needs x86 export apic_write().

thanks
> 
> I utterly detest how KVM is a module and ends up exporting a ton of
> stuff that *really* should not be exported.
Mingwei Zhang May 30, 2024, 5:12 a.m. UTC | #7
On Tue, May 07, 2024, Chen, Zide wrote:
> 
> 
> On 5/5/2024 10:29 PM, Mingwei Zhang wrote:
> > From: Xiong Zhang <xiong.y.zhang@linux.intel.com>
> > 
> > Add x86 specific function to switch PMI handler since passthrough PMU and host
> > PMU use different interrupt vectors.
> > 
> > x86_perf_guest_enter() switch PMU vector from NMI to KVM_GUEST_PMI_VECTOR,
> > and guest LVTPC_MASK value should be reflected onto HW to indicate whether
> > guest has cleared LVTPC_MASK or not, so guest lvt_pc is passed as parameter.
> > 
> > x86_perf_guest_exit() switch PMU vector from KVM_GUEST_PMI_VECTOR to NMI.
> > 
> > Signed-off-by: Xiong Zhang <xiong.y.zhang@linux.intel.com>
> > Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
> > ---
> >  arch/x86/events/core.c            | 17 +++++++++++++++++
> >  arch/x86/include/asm/perf_event.h |  3 +++
> >  2 files changed, 20 insertions(+)
> > 
> > diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> > index 09050641ce5d..8167f2230d3a 100644
> > --- a/arch/x86/events/core.c
> > +++ b/arch/x86/events/core.c
> > @@ -701,6 +701,23 @@ struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr, void *data)
> >  }
> >  EXPORT_SYMBOL_GPL(perf_guest_get_msrs);
> >  
> > +void x86_perf_guest_enter(u32 guest_lvtpc)
> > +{
> > +	lockdep_assert_irqs_disabled();
> > +
> > +	apic_write(APIC_LVTPC, APIC_DM_FIXED | KVM_GUEST_PMI_VECTOR |
> > +			       (guest_lvtpc & APIC_LVT_MASKED));
> 
> If CONFIG_KVM is not defined, KVM_GUEST_PMI_VECTOR is not available and
> it causes compiling error.

That is a good discovery, thanks. hmm, we could put the whole function
under IS_ENABLED(CONFIG_KVM) to avoid that.

Thanks.
-Mingwei
diff mbox series

Patch

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 09050641ce5d..8167f2230d3a 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -701,6 +701,23 @@  struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr, void *data)
 }
 EXPORT_SYMBOL_GPL(perf_guest_get_msrs);
 
+void x86_perf_guest_enter(u32 guest_lvtpc)
+{
+	lockdep_assert_irqs_disabled();
+
+	apic_write(APIC_LVTPC, APIC_DM_FIXED | KVM_GUEST_PMI_VECTOR |
+			       (guest_lvtpc & APIC_LVT_MASKED));
+}
+EXPORT_SYMBOL_GPL(x86_perf_guest_enter);
+
+void x86_perf_guest_exit(void)
+{
+	lockdep_assert_irqs_disabled();
+
+	apic_write(APIC_LVTPC, APIC_DM_NMI);
+}
+EXPORT_SYMBOL_GPL(x86_perf_guest_exit);
+
 /*
  * There may be PMI landing after enabled=0. The PMI hitting could be before or
  * after disable_all.
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 3736b8a46c04..807ea9c98567 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -577,6 +577,9 @@  static inline void perf_events_lapic_init(void)	{ }
 static inline void perf_check_microcode(void) { }
 #endif
 
+void x86_perf_guest_enter(u32 guest_lvtpc);
+void x86_perf_guest_exit(void);
+
 #if defined(CONFIG_PERF_EVENTS) && defined(CONFIG_CPU_SUP_INTEL)
 extern struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr, void *data);
 extern void x86_perf_get_lbr(struct x86_pmu_lbr *lbr);