[v8,00/14] Guest LBR Enabling

Message ID 1565075774-26671-1-git-send-email-wei.w.wang@intel.com (mailing list archive)

Message

Wang, Wei W Aug. 6, 2019, 7:16 a.m. UTC
Last Branch Recording (LBR) is a performance monitoring unit (PMU) feature
on Intel CPUs that captures branch related info. This patch series enables
this feature for KVM guests.

The LBR feature can be exposed on a per-guest basis: userspace sets the
enabling param of KVM_CAP_X86_GUEST_LBR (patch 3).

About the lbr emulation method:
When the vcpu gets scheduled in, the lbr related msrs are made
interceptible. This makes the guest's first access to an lbr related msr
always vm-exit to kvm, so that kvm knows whether the lbr feature is used
during the vcpu time slice. The kvm lbr msr handler does the following
things:
  - create an lbr perf event (task pinned) for the vcpu thread.
    The perf event mainly serves 2 purposes:
      -- follow the host perf scheduling rules to manage the vcpu's usage
         of lbr (e.g. a cpu pinned lbr event could reclaim lbr and thus
         stop the vcpu's use);
      -- have the host perf do context switching of the lbr state when
         the vcpu thread is switched.
  - pass the lbr related msrs through to the guest.
    This lets the guest's subsequent accesses to the lbr related msrs
    proceed without vm-exit, as long as the vcpu's lbr event owns the lbr
    feature. A cpu pinned lbr event on the host could come and take over
    the lbr feature via IPI calls. In this case, the pass-through will be
    cancelled (patch 13), and the guest's subsequent accesses to the lbr
    msrs will vm-exit to kvm, where the handler forbids them.

If the guest doesn't touch any of the lbr related msrs (so it likely
doesn't need to run lbr in the near future), the vcpu's lbr perf event
will be freed (please see the commit message of patch 12 for more details).
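
The flow above can be pictured with a small userspace model. This is only
an illustrative sketch, not code from the series: every name in it is
hypothetical, it assumes a newly created vcpu lbr event owns the lbr
feature right away, and it assumes an unused event is freed when the time
slice ends (patch 12 describes the real condition).

```c
#include <stdbool.h>

/* Hypothetical per-vcpu state; none of these names come from the series. */
struct vcpu_lbr_model {
	bool has_event;     /* task pinned lbr perf event exists for the vcpu thread */
	bool owns_lbr;      /* that event currently owns the lbr feature */
	bool passthrough;   /* lbr msrs are passed through (no vm-exit on access) */
	bool used_in_slice; /* guest touched an lbr msr in this time slice */
};

enum lbr_access_result {
	LBR_ACCESS_DIRECT,    /* no vm-exit, msrs are passed through */
	LBR_ACCESS_HANDLED,   /* vm-exit, kvm handler enabled pass-through */
	LBR_ACCESS_FORBIDDEN, /* vm-exit, handler refused the access */
};

/* vcpu scheduled in: the lbr msrs become interceptible again. */
static void model_sched_in(struct vcpu_lbr_model *v)
{
	v->passthrough = false;
	v->used_in_slice = false;
}

/* Guest reads or writes an lbr related msr. */
static enum lbr_access_result model_guest_access(struct vcpu_lbr_model *v)
{
	if (v->passthrough && v->owns_lbr)
		return LBR_ACCESS_DIRECT;

	/* vm-exit path: the kvm lbr msr handler runs. */
	if (!v->has_event) {
		/* Assumption: the new event gets the lbr feature immediately. */
		v->has_event = true;
		v->owns_lbr = true;
	}
	if (!v->owns_lbr)
		return LBR_ACCESS_FORBIDDEN;

	v->used_in_slice = true;
	v->passthrough = true;
	return LBR_ACCESS_HANDLED;
}

/* A cpu pinned host lbr event takes the lbr feature over. */
static void model_host_reclaim(struct vcpu_lbr_model *v)
{
	v->owns_lbr = false;
	v->passthrough = false;
}

/* End of the time slice: free the event if the guest never used lbr. */
static void model_sched_out(struct vcpu_lbr_model *v)
{
	if (v->has_event && !v->used_in_slice) {
		v->has_event = false;
		v->owns_lbr = false;
	}
}
```

In this model only the first access of a time slice takes the vm-exit
path; later ones are direct until a host event reclaims lbr.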

* Tests
Conclusion: the profiling results on the guest are similar to those on the host.

Run: ./perf -b ./test_program

- Test on the host:
Overhead  Command  Source Shared Object  Source Symbol    Target Symbol   
  22.35%  ftest    libc-2.23.so          [.] __random     [.] __random        
   8.20%  ftest    ftest                 [.] qux          [.] qux             
   5.88%  ftest    ftest                 [.] random@plt   [.] __random        
   5.88%  ftest    libc-2.23.so          [.] __random     [.] __random_r  
   5.79%  ftest    ftest                 [.] main         [.] random@plt  
   5.60%  ftest    ftest                 [.] main         [.] foo             
   5.24%  ftest    libc-2.23.so          [.] __random     [.] main            
   5.20%  ftest    libc-2.23.so          [.] __random_r   [.] __random        
   5.00%  ftest    ftest                 [.] foo          [.] qux             
   4.91%  ftest    ftest                 [.] main         [.] bar             
   4.83%  ftest    ftest                 [.] bar          [.] qux             
   4.57%  ftest    ftest                 [.] main         [.] main            
   4.38%  ftest    ftest                 [.] foo          [.] main            
   4.13%  ftest    ftest                 [.] qux          [.] foo             
   3.89%  ftest    ftest                 [.] qux          [.] bar             
   3.86%  ftest    ftest                 [.] bar          [.] main            

- Test on the guest:
Overhead  Command  Source Shared Object  Source Symbol    Target Symbol
  22.36%  ftest    libc-2.23.so          [.] random       [.] random  
   8.55%  ftest    ftest                 [.] qux          [.] qux                    
   5.79%  ftest    libc-2.23.so          [.] random       [.] random_r                     
   5.64%  ftest    ftest                 [.] random@plt   [.] random                     
   5.58%  ftest    ftest                 [.] main         [.] random@plt                       
   5.55%  ftest    ftest                 [.] main         [.] foo                       
   5.41%  ftest    libc-2.23.so          [.] random       [.] main                 
   5.31%  ftest    libc-2.23.so          [.] random_r     [.] random                      
   5.11%  ftest    ftest                 [.] foo          [.] qux                     
   4.93%  ftest    ftest                 [.] main         [.] main                     
   4.59%  ftest    ftest                 [.] qux          [.] bar                       
   4.49%  ftest    ftest                 [.] bar          [.] main                       
   4.42%  ftest    ftest                 [.] bar          [.] qux                       
   4.16%  ftest    ftest                 [.] main         [.] bar                       
   3.95%  ftest    ftest                 [.] qux          [.] foo                        
   3.79%  ftest    ftest                 [.] foo          [.] main
(due to the lib version difference, "random" is equivalent to "__random" above)

v7->v8 Changelog:
  - Patch 3:
    -- document KVM_CAP_X86_GUEST_LBR in api.txt
    -- make the check of KVM_CAP_X86_GUEST_LBR return the size of
       struct x86_perf_lbr_stack, to let userspace do a compatibility
       check.
  - Patch 7:
    -- let the perf scheduler skip counter assignment for a perf event
       that has PERF_EV_CAP_NO_COUNTER set (rather than skipping the perf
       scheduler). This allows the scheduler to detect lbr usage conflicts
       via get_event_constraints, and lower priority events will
       eventually fail to use lbr.
    -- define X86_PMC_IDX_NA as "-1", which represents a never assigned
       counter id. There are other places that use "-1", but could be
       updated to use the new macro in another patch series.
  - Patch 8:
    -- move the event->owner assignment into perf_event_alloc to have it
       set before event_init is called. Please see this patch's commit for
       reasons.
  - Patch 9:
    -- use "exclude_host" and "is_kernel_event" to decide if the lbr event
       is used for the vcpu lbr emulation, which doesn't need a counter,
       and remove the use of the perf_event_create API that the previous
       version added.
    -- remove the unused attr fields.
  - Patch 10:
    -- set a hardware reserved bit (bit 62 of LBR_SELECT) in reg->config
       for the vcpu lbr emulation event. This makes the config different
       from that of other host lbr events, so that they don't share the
       lbr. Please see the comments in the patch for the reasons why they
       shouldn't share.
  - Patch 12:
    -- disable interrupts and check if the vcpu lbr event owns the lbr
       feature before kvm writes to the lbr related msrs. This avoids kvm
       updating the lbr msrs after lbr has been reclaimed by other events
       via ipi.
    -- remove arch v4 related support.
  - Patch 13:
    -- double check if the vcpu lbr event owns the lbr feature before
       vm-entry into the guest. The lbr pass-through will be cancelled if
       the lbr feature has been reclaimed by a cpu pinned lbr event.
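
To illustrate the Patch 10 item above, here is a minimal sketch of how a
reserved bit keeps the vcpu lbr event's config distinct. It assumes that
two lbr events may share the lbr only when their configs are identical,
which is a simplification of the host perf logic. The macro and function
names are hypothetical and are not taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name for bit 62 of LBR_SELECT, which hardware reserves. */
#define GUEST_LBR_SEL_BIT (1ULL << 62)

/* Simplified sharing rule (assumption): identical configs may share. */
static bool lbr_configs_can_share(uint64_t a, uint64_t b)
{
	return a == b;
}

/* Config used for the vcpu lbr emulation event: tag it with bit 62. */
static uint64_t vcpu_lbr_config(uint64_t lbr_select)
{
	return lbr_select | GUEST_LBR_SEL_BIT;
}
```

Because no host lbr event sets the reserved bit, a host config never
compares equal to the tagged one, even when all other filter bits match.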

Previous:
https://lkml.kernel.org/r/1562548999-37095-1-git-send-email-wei.w.wang@intel.com

Wei Wang (14):
  perf/x86: fix the variable type of the lbr msrs
  perf/x86: add a function to get the addresses of the lbr stack msrs
  KVM/x86: KVM_CAP_X86_GUEST_LBR
  KVM/x86: intel_pmu_lbr_enable
  KVM/x86/vPMU: tweak kvm_pmu_get_msr
  KVM/x86: expose MSR_IA32_PERF_CAPABILITIES to the guest
  perf/x86: support to create a perf event without counter allocation
  perf/core: set the event->owner before event_init
  KVM/x86/vPMU: APIs to create/free lbr perf event for a vcpu thread
  perf/x86/lbr: don't share lbr for the vcpu usage case
  perf/x86: save/restore LBR_SELECT on vcpu switching
  KVM/x86/lbr: lbr emulation
  KVM/x86/vPMU: check the lbr feature before entering guest
  KVM/x86: remove the common handling of the debugctl msr

 Documentation/virt/kvm/api.txt    |  26 +++
 arch/x86/events/core.c            |  36 ++-
 arch/x86/events/intel/core.c      |   3 +
 arch/x86/events/intel/lbr.c       |  95 +++++++-
 arch/x86/events/perf_event.h      |   6 +-
 arch/x86/include/asm/kvm_host.h   |   5 +
 arch/x86/include/asm/perf_event.h |  17 ++
 arch/x86/kvm/cpuid.c              |   2 +-
 arch/x86/kvm/pmu.c                |  24 +-
 arch/x86/kvm/pmu.h                |  11 +-
 arch/x86/kvm/pmu_amd.c            |   7 +-
 arch/x86/kvm/vmx/pmu_intel.c      | 476 +++++++++++++++++++++++++++++++++++++-
 arch/x86/kvm/vmx/vmx.c            |   4 +-
 arch/x86/kvm/vmx/vmx.h            |   2 +
 arch/x86/kvm/x86.c                |  47 ++--
 include/linux/perf_event.h        |  18 ++
 include/uapi/linux/kvm.h          |   1 +
 kernel/events/core.c              |  19 +-
 18 files changed, 738 insertions(+), 61 deletions(-)

Comments

Wang, Wei W Sept. 6, 2019, 8:50 a.m. UTC | #1
A polite ping for comments on this version, thanks!

Eduardo Habkost Jan. 30, 2020, 8:14 p.m. UTC | #2
On Tue, Aug 06, 2019 at 03:16:00PM +0800, Wei Wang wrote:
> Last Branch Recording (LBR) is a performance monitor unit (PMU) feature
> on Intel CPUs that captures branch related info. This patch series enables
> this feature to KVM guests.
> 
> Each guest can be configured to expose this LBR feature to the guest via
> userspace setting the enabling param in KVM_CAP_X86_GUEST_LBR (patch 3).

Are QEMU patches for enabling KVM_CAP_X86_GUEST_LBR being planned?

Wang, Wei W Jan. 31, 2020, 1:01 a.m. UTC | #3
On Friday, January 31, 2020 4:14 AM, Eduardo Habkost wrote:
> On Tue, Aug 06, 2019 at 03:16:00PM +0800, Wei Wang wrote:
> > Last Branch Recording (LBR) is a performance monitor unit (PMU)
> > feature on Intel CPUs that captures branch related info. This patch
> > series enables this feature to KVM guests.
> >
> > Each guest can be configured to expose this LBR feature to the guest
> > via userspace setting the enabling param in KVM_CAP_X86_GUEST_LBR
> > (patch 3).
> 
> Are QEMU patches for enabling KVM_CAP_X86_GUEST_LBR being planned?
> 
Yes, we have a couple of qemu patches. They are planned to be reviewed after the kernel part gets finalized.