Message ID | 20231109180646.2963718-1-khorenko@virtuozzo.com (mailing list archive) |
---|---|
Headers | show |
Series | KVM: x86/vPMU: Speed up vmexit for AMD Zen 4 CPUs | expand |
On Thu, Nov 9, 2023 at 10:24 AM Konstantin Khorenko <khorenko@virtuozzo.com> wrote: > > We have detected significant performance drop of our atomic test which > checks the rate of CPUID instructions rate inside an L1 VM on an AMD > node. > > Investigation led to 2 mainstream patches which have introduced extra > events accounting: > > 018d70ffcfec ("KVM: x86: Update vPMCs when retiring branch instructions") > 9cd803d496e7 ("KVM: x86: Update vPMCs when retiring instructions") > > And on an AMD Zen 3 CPU that resulted in immediate 43% drop in the CPUID > rate. > > Checking latest mainsteam kernel the performance difference is much less > but still quite noticeable: 13.4% and shows up on AMD CPUs only. > > Looks like iteration over all PMCs in kvm_pmu_trigger_event() is cheap > on Intel and expensive on AMD CPUs. > > So the idea behind this patch is to skip iterations over PMCs at all in > case PMU is disabled for a VM completely or PMU is enabled for a VM, but > there are no active PMCs at all. A better solution may be to maintain two bitmaps of general purpose counters that need to be incremented, one for instructions retired and one for branch instructions retired. Set or clear these bits whenever the PerfEvtSelN MSRs are written. I think I would keep the PGC bits separate, on those microarchitectures that support PGC. Then, kvm_pmu_trigger_event() need only consult the appropriate bitmap (or the logical and of that bitmap with PGC). In most cases, the value will be zero, and the function can simply return. This would work even for AMD microarchitectures that don't support PGC. > Unfortunately > * current kernel code does not differentiate if PMU is globally enabled > for a VM or not (pmu->version is always 1) > * AMD CPUs older than Zen 4 do not support PMU v2 and thus efficient > check for enabled PMCs is not possible > > => the patch speeds up vmexit for AMD Zen 4 CPUs only, this is sad. > but the patch does not hurt other CPUs - and this is fortunate! > > i have no access to a node with AMD Zen 4 CPU, so i had to test on > AMD Zen 3 CPU and i hope my expectations are right for AMD Zen 4. > > i would appreciate if anyone perform the test of a real AMD Zen 4 node. > > AMD performance results: > CPU: AMD Zen 3 (three!): AMD EPYC 7443P 24-Core Processor > > * The test binary is run inside an AlmaLinux 9 VM with their stock kernel > 5.14.0-284.11.1.el9_2.x86_64. > * Test binary checks the CPUID instractions rate (instructions per sec). > * Default VM config (PMU is off, pmu->version is reported as 1). > * The Host runs the kernel under test. > > # for i in 1 2 3 4 5 ; do ./at_cpu_cpuid.pub ; done | \ > awk -e '{print $4;}' | \ > cut -f1 --delimiter='.' | \ > ./avg.sh > > Measurements: > 1. Host runs stock latest mainstream kernel commit 305230142ae0. > 2. Host runs same mainstream kernel + current patch. > 3. Host runs same mainstream kernel + current patch + force > guest_pmu_is_enabled() to always return "false" using following change: > > - if (pmu->version >= 2 && !(pmu->global_ctrl & ~pmu->global_ctrl_mask)) > + if (pmu->version == 1 && !(pmu->global_ctrl & ~pmu->global_ctrl_mask)) > > ----------------------------------------- > | Kernels | CPUID rate | > ----------------------------------------- > | 1. | 1360250 | > | 2. | 1365536 (+ 0.4%) | > | 3. | 1541850 (+13.4%) | > ----------------------------------------- > > Measurement (2) gives some fluctuation, the performance is not increased > because the test was done on a Zen 3 CPU, so we are unable to use fast > check for active PMCs. > Measurement (3) shows expected performance boost on a Zen 4 CPU under > the same test. > > The test used: > # cat at_cpu_cpuid.pub.cpp > /* > * The test executes CPUID instruction in a loop and reports the calls rate. > */ > > #include <stdio.h> > #include <time.h> > > /* #define CPUID_EAX 0x80000002 */ > #define CPUID_EAX 0x29a > #define CPUID_ECX 0 > > #define TEST_EXEC_SECS 30 // in seconds > #define LOOPS_APPROX_RATE 1000000 > > static inline void cpuid(unsigned int _eax, unsigned int _ecx) > { > unsigned int regs[4] = {_eax, 0, _ecx, 0}; > > asm __volatile__( > "cpuid" > : "=a" (regs[0]), "=b" (regs[1]), "=c" (regs[2]), "=d" (regs[3]) > : "0" (regs[0]), "1" (regs[1]), "2" (regs[2]), "3" (regs[3]) > : "memory"); > } > > double cpuid_rate_loops(int loops_num) > { > int i; > clock_t start_time, end_time; > double spent_time, rate; > > start_time = clock(); > > for (i = 0; i < loops_num; i++) > cpuid((unsigned int)CPUID_EAX, (unsigned int)CPUID_ECX); > > end_time = clock(); > spent_time = (double)(end_time - start_time) / CLOCKS_PER_SEC; > > rate = (double)loops_num / spent_time; > > return rate; > } > > int main(int argc, char* argv[]) > { > double approx_rate, rate; > int loops; > > /* First we detect approximate CPUIDs rate. */ > approx_rate = cpuid_rate_loops(LOOPS_APPROX_RATE); > > /* > * How many loops there should be in order to run the test for > * TEST_EXEC_SECS seconds? > */ > loops = (int)(approx_rate * TEST_EXEC_SECS); > > /* Get the precise instructions rate. */ > rate = cpuid_rate_loops(loops); > > printf( "CPUID instructions rate: %f instructions/second\n", rate); > > return 0; > } > > Konstantin Khorenko (1): > KVM: x86/vPMU: Check PMU is enabled for vCPU before searching for PMC > > arch/x86/kvm/pmu.c | 26 ++++++++++++++++++++++++++ > 1 file changed, 26 insertions(+) > > -- > 2.39.3 > >
On Thu, Nov 9, 2023 at 10:18 AM Konstantin Khorenko <khorenko@virtuozzo.com> wrote: > > Hi All, > > as a followup for my patch: i have noticed that > currently Intel kernel code provides an ability to detect if PMU is totally disabled for a VM > (pmu->version == 0 in this case), but for AMD code pmu->version is never 0, > no matter if PMU is enabled or disabled for a VM (i mean <pmu state='off'/> in the VM config which > results in "-cpu pmu=off" qemu option). > > So the question is - is it possible to enhance the code for AMD to also honor PMU VM setting or it is > impossible by design? The AMD architectural specification prior to AMD PMU v2 does not allow one to describe a CPU (via CPUID or MSRs) that has fewer than 4 general purpose PMU counters. While AMD PMU v2 does allow one to describe such a CPU, legacy software that knows nothing of AMD PMU v2 can expect four counters regardless. Having said that, KVM does provide a per-VM capability for disabling the virtual PMU: KVM_CAP_PMU_CAPABILITY(KVM_PMU_CAP_DISABLE). See section 8.35 in Documentation/virt/kvm/api.rst.
On Thu, Nov 09, 2023, Jim Mattson wrote: > On Thu, Nov 9, 2023 at 10:24 AM Konstantin Khorenko > <khorenko@virtuozzo.com> wrote: > > > > We have detected significant performance drop of our atomic test which > > checks the rate of CPUID instructions rate inside an L1 VM on an AMD > > node. > > > > Investigation led to 2 mainstream patches which have introduced extra > > events accounting: > > > > 018d70ffcfec ("KVM: x86: Update vPMCs when retiring branch instructions") > > 9cd803d496e7 ("KVM: x86: Update vPMCs when retiring instructions") > > > > And on an AMD Zen 3 CPU that resulted in immediate 43% drop in the CPUID > > rate. > > > > Checking latest mainsteam kernel the performance difference is much less > > but still quite noticeable: 13.4% and shows up on AMD CPUs only. > > > > Looks like iteration over all PMCs in kvm_pmu_trigger_event() is cheap > > on Intel and expensive on AMD CPUs. > > > > So the idea behind this patch is to skip iterations over PMCs at all in > > case PMU is disabled for a VM completely or PMU is enabled for a VM, but > > there are no active PMCs at all. > > A better solution may be to maintain two bitmaps of general purpose > counters that need to be incremented, one for instructions retired and > one for branch instructions retired. Set or clear these bits whenever > the PerfEvtSelN MSRs are written. I think I would keep the PGC bits > separate, on those microarchitectures that support PGC. Then, > kvm_pmu_trigger_event() need only consult the appropriate bitmap (or > the logical and of that bitmap with PGC). In most cases, the value > will be zero, and the function can simply return. > > This would work even for AMD microarchitectures that don't support PGC. Yeah. There are multiple lower-hanging fruits to be picked though, most of which won't conflict with using dedicated per-event bitmaps, or at worst are trivial to resolve. 1. Don't call into perf to get the eventsel (which generates an indirect call) on every invocation, let alone every iteration. 2. Avoid getting the CPL when it's irrelevant. 3. Check the eventsel before querying the event filter. 4. Mask out PMCs that aren't globally enabled from the get-go (masking out PMCs based on eventsel would essentially be the same as per-event bitmaps). I'm definitely not opposed to per-event bitmaps, but it'd be nice to avoid them, e.g. if we can eke out 99% of the performance just by doing a few obvious optimizations. This is the end result of what I'm testing and will (hopefully) post shortly: static inline bool pmc_is_eventsel_match(struct kvm_pmc *pmc, u64 eventsel) { return !((pmc->eventsel ^ eventsel) & AMD64_RAW_EVENT_MASK_NB); } static inline bool cpl_is_matched(struct kvm_pmc *pmc) { bool select_os, select_user; u64 config; if (pmc_is_gp(pmc)) { config = pmc->eventsel; select_os = config & ARCH_PERFMON_EVENTSEL_OS; select_user = config & ARCH_PERFMON_EVENTSEL_USR; } else { config = fixed_ctrl_field(pmc_to_pmu(pmc)->fixed_ctr_ctrl, pmc->idx - KVM_FIXED_PMC_BASE_IDX); select_os = config & 0x1; select_user = config & 0x2; } /* * Skip the CPL lookup, which isn't free on Intel, if the result will * be the same regardless of the CPL. */ if (select_os == select_user) return select_os; return (static_call(kvm_x86_get_cpl)(pmc->vcpu) == 0) ? select_os : select_user; } void kvm_pmu_trigger_event(struct kvm_vcpu *vcpu, u64 eventsel) { DECLARE_BITMAP(bitmap, X86_PMC_IDX_MAX); struct kvm_pmu *pmu = vcpu_to_pmu(vcpu); struct kvm_pmc *pmc; int i; BUILD_BUG_ON(sizeof(pmu->global_ctrl) * BITS_PER_BYTE != X86_PMC_IDX_MAX); if (!kvm_pmu_has_perf_global_ctrl(pmu)) bitmap_copy(bitmap, pmu->all_valid_pmc_idx, X86_PMC_IDX_MAX); else if (!bitmap_and(bitmap, pmu->all_valid_pmc_idx, (unsigned long *)&pmu->global_ctrl, X86_PMC_IDX_MAX)) return; kvm_for_each_pmc(pmu, pmc, i, bitmap) { /* Ignore checks for edge detect, pin control, invert and CMASK bits */ if (!pmc_is_eventsel_match(pmc, eventsel) || !pmc_event_is_allowed(pmc) || !cpl_is_matched(pmc)) continue; kvm_pmu_incr_counter(pmc); } }
On 11/9/23 23:52, Jim Mattson wrote: > On Thu, Nov 9, 2023 at 10:18 AM Konstantin Khorenko > <khorenko@virtuozzo.com> wrote: >> Hi All, >> >> as a followup for my patch: i have noticed that >> currently Intel kernel code provides an ability to detect if PMU is totally disabled for a VM >> (pmu->version == 0 in this case), but for AMD code pmu->version is never 0, >> no matter if PMU is enabled or disabled for a VM (i mean <pmu state='off'/> in the VM config which >> results in "-cpu pmu=off" qemu option). >> >> So the question is - is it possible to enhance the code for AMD to also honor PMU VM setting or it is >> impossible by design? > The AMD architectural specification prior to AMD PMU v2 does not allow > one to describe a CPU (via CPUID or MSRs) that has fewer than 4 > general purpose PMU counters. While AMD PMU v2 does allow one to > describe such a CPU, legacy software that knows nothing of AMD PMU v2 > can expect four counters regardless. > > Having said that, KVM does provide a per-VM capability for disabling > the virtual PMU: KVM_CAP_PMU_CAPABILITY(KVM_PMU_CAP_DISABLE). See > section 8.35 in Documentation/virt/kvm/api.rst. But this means in particular that QEMU should immediately use this KVM_PMU_CAP_DISABLE if this capability is supported and PMU=off. I am not seeing this code thus I believe that we have missed this. I think that this change worth adding. We will measure the impact :-) Den
On 11/9/23 3:46 PM, Denis V. Lunev wrote: > On 11/9/23 23:52, Jim Mattson wrote: >> On Thu, Nov 9, 2023 at 10:18 AM Konstantin Khorenko >> <khorenko@virtuozzo.com> wrote: >>> Hi All, >>> >>> as a followup for my patch: i have noticed that >>> currently Intel kernel code provides an ability to detect if PMU is totally >>> disabled for a VM >>> (pmu->version == 0 in this case), but for AMD code pmu->version is never 0, >>> no matter if PMU is enabled or disabled for a VM (i mean <pmu state='off'/> >>> in the VM config which >>> results in "-cpu pmu=off" qemu option). >>> >>> So the question is - is it possible to enhance the code for AMD to also honor >>> PMU VM setting or it is >>> impossible by design? >> The AMD architectural specification prior to AMD PMU v2 does not allow >> one to describe a CPU (via CPUID or MSRs) that has fewer than 4 >> general purpose PMU counters. While AMD PMU v2 does allow one to >> describe such a CPU, legacy software that knows nothing of AMD PMU v2 >> can expect four counters regardless. >> >> Having said that, KVM does provide a per-VM capability for disabling >> the virtual PMU: KVM_CAP_PMU_CAPABILITY(KVM_PMU_CAP_DISABLE). See >> section 8.35 in Documentation/virt/kvm/api.rst. > But this means in particular that QEMU should immediately > use this KVM_PMU_CAP_DISABLE if this capability is supported and PMU=off. I am > not seeing this code thus I believe that we have missed this. I think that this > change worth adding. We will measure the impact :-) Den > I used to have a patch to use KVM_PMU_CAP_DISABLE in QEMU, but that did not draw many developers' attention. https://lore.kernel.org/qemu-devel/20230621013821.6874-2-dongli.zhang@oracle.com/ It is time to first re-send that again. Dongli Zhang
On Thu, Nov 9, 2023 at 3:46 PM Denis V. Lunev <den@virtuozzo.com> wrote: > > On 11/9/23 23:52, Jim Mattson wrote: > > On Thu, Nov 9, 2023 at 10:18 AM Konstantin Khorenko > > <khorenko@virtuozzo.com> wrote: > >> Hi All, > >> > >> as a followup for my patch: i have noticed that > >> currently Intel kernel code provides an ability to detect if PMU is totally disabled for a VM > >> (pmu->version == 0 in this case), but for AMD code pmu->version is never 0, > >> no matter if PMU is enabled or disabled for a VM (i mean <pmu state='off'/> in the VM config which > >> results in "-cpu pmu=off" qemu option). > >> > >> So the question is - is it possible to enhance the code for AMD to also honor PMU VM setting or it is > >> impossible by design? > > The AMD architectural specification prior to AMD PMU v2 does not allow > > one to describe a CPU (via CPUID or MSRs) that has fewer than 4 > > general purpose PMU counters. While AMD PMU v2 does allow one to > > describe such a CPU, legacy software that knows nothing of AMD PMU v2 > > can expect four counters regardless. > > > > Having said that, KVM does provide a per-VM capability for disabling > > the virtual PMU: KVM_CAP_PMU_CAPABILITY(KVM_PMU_CAP_DISABLE). See > > section 8.35 in Documentation/virt/kvm/api.rst. > But this means in particular that QEMU should immediately > use this KVM_PMU_CAP_DISABLE if this capability is supported and > PMU=off. I am not seeing this code thus I believe that we have missed > this. I think that this change worth adding. We will measure the impact > :-) Den At present, KVM will still blindly cycle through each GP counter (4 minimum for AMD) until it checks vcpu->kvm->arch.enable_pmu at the top of get_gp_pmc_amd(). Sean's proposal to clear the metadata should eliminate the overhead for VMs with the virtual PMU disabled. My proposal should eliminate the overhead even for VMs with the virtual PMU enabled, as long as no counters are programmed for "instructions retired."
On Thu, Nov 9, 2023 at 3:42 PM Sean Christopherson <seanjc@google.com> wrote: > > On Thu, Nov 09, 2023, Jim Mattson wrote: > > On Thu, Nov 9, 2023 at 10:24 AM Konstantin Khorenko > > <khorenko@virtuozzo.com> wrote: > > > > > > We have detected significant performance drop of our atomic test which > > > checks the rate of CPUID instructions rate inside an L1 VM on an AMD > > > node. > > > > > > Investigation led to 2 mainstream patches which have introduced extra > > > events accounting: > > > > > > 018d70ffcfec ("KVM: x86: Update vPMCs when retiring branch instructions") > > > 9cd803d496e7 ("KVM: x86: Update vPMCs when retiring instructions") > > > > > > And on an AMD Zen 3 CPU that resulted in immediate 43% drop in the CPUID > > > rate. > > > > > > Checking latest mainsteam kernel the performance difference is much less > > > but still quite noticeable: 13.4% and shows up on AMD CPUs only. > > > > > > Looks like iteration over all PMCs in kvm_pmu_trigger_event() is cheap > > > on Intel and expensive on AMD CPUs. > > > > > > So the idea behind this patch is to skip iterations over PMCs at all in > > > case PMU is disabled for a VM completely or PMU is enabled for a VM, but > > > there are no active PMCs at all. > > > > A better solution may be to maintain two bitmaps of general purpose > > counters that need to be incremented, one for instructions retired and > > one for branch instructions retired. Set or clear these bits whenever > > the PerfEvtSelN MSRs are written. I think I would keep the PGC bits > > separate, on those microarchitectures that support PGC. Then, > > kvm_pmu_trigger_event() need only consult the appropriate bitmap (or > > the logical and of that bitmap with PGC). In most cases, the value > > will be zero, and the function can simply return. > > > > This would work even for AMD microarchitectures that don't support PGC. > > Yeah. There are multiple lower-hanging fruits to be picked though, most of which > won't conflict with using dedicated per-event bitmaps, or at worst are trivial > to resolve. > > 1. Don't call into perf to get the eventsel (which generates an indirect call) > on every invocation, let alone every iteration. > > 2. Avoid getting the CPL when it's irrelevant. > > 3. Check the eventsel before querying the event filter. > > 4. Mask out PMCs that aren't globally enabled from the get-go (masking out > PMCs based on eventsel would essentially be the same as per-event bitmaps). The code below only looks at PGC. Even on CPUs that support PGC, some PMU clients still use the enable bits in the PerfEvtSelN. Linux perf, for instance, can't seem to make up its mind whether to use PGC or PerfEvtSelN.EN. > I'm definitely not opposed to per-event bitmaps, but it'd be nice to avoid them, > e.g. if we can eke out 99% of the performance just by doing a few obvious > optimizations. > > This is the end result of what I'm testing and will (hopefully) post shortly: > > static inline bool pmc_is_eventsel_match(struct kvm_pmc *pmc, u64 eventsel) > { > return !((pmc->eventsel ^ eventsel) & AMD64_RAW_EVENT_MASK_NB); > } The top nybble of AMD's 3-nybble event select collides with Intel's IN_TX and IN_TXCP bits. I think we can assert that the vCPU can't be in a transaction if KVM is emulating an instruction, but this probably merits a comment. The function name should also be more descriptive, so that it doesn't get misused out of context. :) > static inline bool cpl_is_matched(struct kvm_pmc *pmc) > { > bool select_os, select_user; > u64 config; > > if (pmc_is_gp(pmc)) { > config = pmc->eventsel; > select_os = config & ARCH_PERFMON_EVENTSEL_OS; > select_user = config & ARCH_PERFMON_EVENTSEL_USR; > } else { > config = fixed_ctrl_field(pmc_to_pmu(pmc)->fixed_ctr_ctrl, > pmc->idx - KVM_FIXED_PMC_BASE_IDX); > select_os = config & 0x1; > select_user = config & 0x2; > } > > /* > * Skip the CPL lookup, which isn't free on Intel, if the result will > * be the same regardless of the CPL. > */ > if (select_os == select_user) > return select_os; > > return (static_call(kvm_x86_get_cpl)(pmc->vcpu) == 0) ? select_os : select_user; > } > > void kvm_pmu_trigger_event(struct kvm_vcpu *vcpu, u64 eventsel) > { > DECLARE_BITMAP(bitmap, X86_PMC_IDX_MAX); > struct kvm_pmu *pmu = vcpu_to_pmu(vcpu); > struct kvm_pmc *pmc; > int i; > > BUILD_BUG_ON(sizeof(pmu->global_ctrl) * BITS_PER_BYTE != X86_PMC_IDX_MAX); > > if (!kvm_pmu_has_perf_global_ctrl(pmu)) > bitmap_copy(bitmap, pmu->all_valid_pmc_idx, X86_PMC_IDX_MAX); > else if (!bitmap_and(bitmap, pmu->all_valid_pmc_idx, > (unsigned long *)&pmu->global_ctrl, X86_PMC_IDX_MAX)) > return; > > kvm_for_each_pmc(pmu, pmc, i, bitmap) { > /* Ignore checks for edge detect, pin control, invert and CMASK bits */ > if (!pmc_is_eventsel_match(pmc, eventsel) || > !pmc_event_is_allowed(pmc) || !cpl_is_matched(pmc)) > continue; > > kvm_pmu_incr_counter(pmc); > } > }
On Thu, Nov 09, 2023, Jim Mattson wrote: > On Thu, Nov 9, 2023 at 3:42 PM Sean Christopherson <seanjc@google.com> wrote: > > > > On Thu, Nov 09, 2023, Jim Mattson wrote: > > > A better solution may be to maintain two bitmaps of general purpose > > > counters that need to be incremented, one for instructions retired and > > > one for branch instructions retired. Set or clear these bits whenever > > > the PerfEvtSelN MSRs are written. I think I would keep the PGC bits > > > separate, on those microarchitectures that support PGC. Then, > > > kvm_pmu_trigger_event() need only consult the appropriate bitmap (or > > > the logical and of that bitmap with PGC). In most cases, the value > > > will be zero, and the function can simply return. > > > > > > This would work even for AMD microarchitectures that don't support PGC. > > > > Yeah. There are multiple lower-hanging fruits to be picked though, most of which > > won't conflict with using dedicated per-event bitmaps, or at worst are trivial > > to resolve. > > > > 1. Don't call into perf to get the eventsel (which generates an indirect call) > > on every invocation, let alone every iteration. > > > > 2. Avoid getting the CPL when it's irrelevant. > > > > 3. Check the eventsel before querying the event filter. > > > > 4. Mask out PMCs that aren't globally enabled from the get-go (masking out > > PMCs based on eventsel would essentially be the same as per-event bitmaps). > > The code below only looks at PGC. Even on CPUs that support PGC, some > PMU clients still use the enable bits in the PerfEvtSelN. Linux perf, > for instance, can't seem to make up its mind whether to use PGC or > PerfEvtSelN.EN. Ugh, as in perf leaves all GLOBAL_CTRL bits set and toggles only the eventsel enable bit? Lame. > > I'm definitely not opposed to per-event bitmaps, but it'd be nice to avoid them, > > e.g. if we can eke out 99% of the performance just by doing a few obvious > > optimizations. > > > > This is the end result of what I'm testing and will (hopefully) post shortly: > > > > static inline bool pmc_is_eventsel_match(struct kvm_pmc *pmc, u64 eventsel) > > { > > return !((pmc->eventsel ^ eventsel) & AMD64_RAW_EVENT_MASK_NB); > > } > > The top nybble of AMD's 3-nybble event select collides with Intel's > IN_TX and IN_TXCP bits. I think we can assert that the vCPU can't be > in a transaction if KVM is emulating an instruction, but this probably > merits a comment. Argh, more pre-existing crud. This is silly, the vendor mask is already in kvm_pmu_ops.EVENTSEL_EVENT. What's one more patch... > The function name should also be more descriptive, so that it doesn't get > misused out of context. :) Heh, I think I'll just delete the darn thing. That also makes it easier to understand the comment about ignoring certain fields.
On Fri, Nov 10, 2023, Sean Christopherson wrote: > On Thu, Nov 09, 2023, Jim Mattson wrote: > > On Thu, Nov 9, 2023 at 3:42 PM Sean Christopherson <seanjc@google.com> wrote: > > > static inline bool pmc_is_eventsel_match(struct kvm_pmc *pmc, u64 eventsel) > > > { > > > return !((pmc->eventsel ^ eventsel) & AMD64_RAW_EVENT_MASK_NB); > > > } > > > > The top nybble of AMD's 3-nybble event select collides with Intel's > > IN_TX and IN_TXCP bits. I think we can assert that the vCPU can't be > > in a transaction if KVM is emulating an instruction, but this probably > > merits a comment. > > Argh, more pre-existing crud. This is silly, the vendor mask is already in > kvm_pmu_ops.EVENTSEL_EVENT. What's one more patch... Ah, I see what your saying. Checking the bits is actually correct, probably through sheer dumb luck. I'll expand the comment to cover that and the reserved bits.
On 11/10/23 01:01, Dongli Zhang wrote: > > On 11/9/23 3:46 PM, Denis V. Lunev wrote: >> On 11/9/23 23:52, Jim Mattson wrote: >>> On Thu, Nov 9, 2023 at 10:18 AM Konstantin Khorenko >>> <khorenko@virtuozzo.com> wrote: >>>> Hi All, >>>> >>>> as a followup for my patch: i have noticed that >>>> currently Intel kernel code provides an ability to detect if PMU is totally >>>> disabled for a VM >>>> (pmu->version == 0 in this case), but for AMD code pmu->version is never 0, >>>> no matter if PMU is enabled or disabled for a VM (i mean <pmu state='off'/> >>>> in the VM config which >>>> results in "-cpu pmu=off" qemu option). >>>> >>>> So the question is - is it possible to enhance the code for AMD to also honor >>>> PMU VM setting or it is >>>> impossible by design? >>> The AMD architectural specification prior to AMD PMU v2 does not allow >>> one to describe a CPU (via CPUID or MSRs) that has fewer than 4 >>> general purpose PMU counters. While AMD PMU v2 does allow one to >>> describe such a CPU, legacy software that knows nothing of AMD PMU v2 >>> can expect four counters regardless. >>> >>> Having said that, KVM does provide a per-VM capability for disabling >>> the virtual PMU: KVM_CAP_PMU_CAPABILITY(KVM_PMU_CAP_DISABLE). See >>> section 8.35 in Documentation/virt/kvm/api.rst. >> But this means in particular that QEMU should immediately >> use this KVM_PMU_CAP_DISABLE if this capability is supported and PMU=off. I am >> not seeing this code thus I believe that we have missed this. I think that this >> change worth adding. We will measure the impact :-) Den >> > I used to have a patch to use KVM_PMU_CAP_DISABLE in QEMU, but that did not draw > many developers' attention. > > https://lore.kernel.org/qemu-devel/20230621013821.6874-2-dongli.zhang@oracle.com/ > > It is time to first re-send that again. > > Dongli Zhang We have checked that setting KVM_PMU_CAP_DISABLE really helps. Konstantin has done this and this is good. On the other hand, looking into these patches I disagree with them. We should not introduce new option for QEMU. If PMU is disabled, i.e. we assume that pmu=off passed in the command line, we should set KVM_PMU_CAP_DISABLE for that virtual machine. Den
Hi Denis, On 11/13/23 01:31, Denis V. Lunev wrote: > On 11/10/23 01:01, Dongli Zhang wrote: >> >> On 11/9/23 3:46 PM, Denis V. Lunev wrote: >>> On 11/9/23 23:52, Jim Mattson wrote: >>>> On Thu, Nov 9, 2023 at 10:18 AM Konstantin Khorenko >>>> <khorenko@virtuozzo.com> wrote: >>>>> Hi All, >>>>> >>>>> as a followup for my patch: i have noticed that >>>>> currently Intel kernel code provides an ability to detect if PMU is totally >>>>> disabled for a VM >>>>> (pmu->version == 0 in this case), but for AMD code pmu->version is never 0, >>>>> no matter if PMU is enabled or disabled for a VM (i mean <pmu state='off'/> >>>>> in the VM config which >>>>> results in "-cpu pmu=off" qemu option). >>>>> >>>>> So the question is - is it possible to enhance the code for AMD to also honor >>>>> PMU VM setting or it is >>>>> impossible by design? >>>> The AMD architectural specification prior to AMD PMU v2 does not allow >>>> one to describe a CPU (via CPUID or MSRs) that has fewer than 4 >>>> general purpose PMU counters. While AMD PMU v2 does allow one to >>>> describe such a CPU, legacy software that knows nothing of AMD PMU v2 >>>> can expect four counters regardless. >>>> >>>> Having said that, KVM does provide a per-VM capability for disabling >>>> the virtual PMU: KVM_CAP_PMU_CAPABILITY(KVM_PMU_CAP_DISABLE). See >>>> section 8.35 in Documentation/virt/kvm/api.rst. >>> But this means in particular that QEMU should immediately >>> use this KVM_PMU_CAP_DISABLE if this capability is supported and PMU=off. I am >>> not seeing this code thus I believe that we have missed this. I think that this >>> change worth adding. We will measure the impact :-) Den >>> >> I used to have a patch to use KVM_PMU_CAP_DISABLE in QEMU, but that did not draw >> many developers' attention. >> >> https://urldefense.com/v3/__https://lore.kernel.org/qemu-devel/20230621013821.6874-2-dongli.zhang@oracle.com/__;!!ACWV5N9M2RV99hQ!McSH2M-kuHmzAwTuXKxrjLkrdJoPqML6cY_Ndc-8k9LRQ7D1V9bSBRQPwHqtx9XCVLK3uzdsMaxyfwve$ >> It is time to first re-send that again. >> >> Dongli Zhang > We have checked that setting KVM_PMU_CAP_DISABLE really helps. Konstantin has > done this and this is good. On the other hand, looking into these patches I > disagree with them. We should not introduce new option for QEMU. If PMU is > disabled, i.e. we assume that pmu=off passed in the command line, we should set > KVM_PMU_CAP_DISABLE for that virtual machine. Den Can I assume you meant pmu=off, that is, cpu->enable_pmu in QEMU? In my opinion, cpu->enable_pmu indicates the option to control the cpu features. It may be used by any accelerators, and it is orthogonal to the KVM cap. The KVM_PMU_CAP_DISABLE is only specific to the KVM accelerator. That's why I had introduced a new option, to allow to configure the VM in my dimensions. It means one dimension to AMD, but two for Intel: to disable PMU via cpuid, or KVM cap. Anyway, this is KVM mailing list, and I may initiate the discussion in QEMU list. Thank you very much! Dongli Zhang
On 11/13/23 15:14, Dongli Zhang wrote: > Hi Denis, > > On 11/13/23 01:31, Denis V. Lunev wrote: >> On 11/10/23 01:01, Dongli Zhang wrote: >>> On 11/9/23 3:46 PM, Denis V. Lunev wrote: >>>> On 11/9/23 23:52, Jim Mattson wrote: >>>>> On Thu, Nov 9, 2023 at 10:18 AM Konstantin Khorenko >>>>> <khorenko@virtuozzo.com> wrote: >>>>>> Hi All, >>>>>> >>>>>> as a followup for my patch: i have noticed that >>>>>> currently Intel kernel code provides an ability to detect if PMU is totally >>>>>> disabled for a VM >>>>>> (pmu->version == 0 in this case), but for AMD code pmu->version is never 0, >>>>>> no matter if PMU is enabled or disabled for a VM (i mean <pmu state='off'/> >>>>>> in the VM config which >>>>>> results in "-cpu pmu=off" qemu option). >>>>>> >>>>>> So the question is - is it possible to enhance the code for AMD to also honor >>>>>> PMU VM setting or it is >>>>>> impossible by design? >>>>> The AMD architectural specification prior to AMD PMU v2 does not allow >>>>> one to describe a CPU (via CPUID or MSRs) that has fewer than 4 >>>>> general purpose PMU counters. While AMD PMU v2 does allow one to >>>>> describe such a CPU, legacy software that knows nothing of AMD PMU v2 >>>>> can expect four counters regardless. >>>>> >>>>> Having said that, KVM does provide a per-VM capability for disabling >>>>> the virtual PMU: KVM_CAP_PMU_CAPABILITY(KVM_PMU_CAP_DISABLE). See >>>>> section 8.35 in Documentation/virt/kvm/api.rst. >>>> But this means in particular that QEMU should immediately >>>> use this KVM_PMU_CAP_DISABLE if this capability is supported and PMU=off. I am >>>> not seeing this code thus I believe that we have missed this. I think that this >>>> change worth adding. We will measure the impact :-) Den >>>> >>> I used to have a patch to use KVM_PMU_CAP_DISABLE in QEMU, but that did not draw >>> many developers' attention. >>> >>> https://urldefense.com/v3/__https://lore.kernel.org/qemu-devel/20230621013821.6874-2-dongli.zhang@oracle.com/__;!!ACWV5N9M2RV99hQ!McSH2M-kuHmzAwTuXKxrjLkrdJoPqML6cY_Ndc-8k9LRQ7D1V9bSBRQPwHqtx9XCVLK3uzdsMaxyfwve$ >>> It is time to first re-send that again. >>> >>> Dongli Zhang >> We have checked that setting KVM_PMU_CAP_DISABLE really helps. Konstantin has >> done this and this is good. On the other hand, looking into these patches I >> disagree with them. We should not introduce new option for QEMU. If PMU is >> disabled, i.e. we assume that pmu=off passed in the command line, we should set >> KVM_PMU_CAP_DISABLE for that virtual machine. Den > Can I assume you meant pmu=off, that is, cpu->enable_pmu in QEMU? > > In my opinion, cpu->enable_pmu indicates the option to control the cpu features. > It may be used by any accelerators, and it is orthogonal to the KVM cap. > > > The KVM_PMU_CAP_DISABLE is only specific to the KVM accelerator. > > > That's why I had introduced a new option, to allow to configure the VM in my > dimensions. > > It means one dimension to AMD, but two for Intel: to disable PMU via cpuid, or > KVM cap. > > Anyway, this is KVM mailing list, and I may initiate the discussion in QEMU list. > > Thank you very much! > > Dongli Zhang with the option pmu='off' it is expected that PMU should be off for the guest. At the moment (without this KVM capability) we can disable PMU for Intel only and thus have performance degradation on AMD. This option disables PMU and thus normally when we are running KVM guest and wanting PMU to be off it would be required to * disable CPUID leaf for Intel * set KVM_PMU_CAP_DISABLE for both processors This would be quite natural and transparent for the libvirt. Alexander will prepare the patch today or tomorrow for the discussion. Den
On 11/13/23 06:42, Denis V. Lunev wrote: > On 11/13/23 15:14, Dongli Zhang wrote: >> Hi Denis, >> >> On 11/13/23 01:31, Denis V. Lunev wrote: >>> On 11/10/23 01:01, Dongli Zhang wrote: >>>> On 11/9/23 3:46 PM, Denis V. Lunev wrote: >>>>> On 11/9/23 23:52, Jim Mattson wrote: >>>>>> On Thu, Nov 9, 2023 at 10:18 AM Konstantin Khorenko >>>>>> <khorenko@virtuozzo.com> wrote: >>>>>>> Hi All, >>>>>>> >>>>>>> as a followup for my patch: i have noticed that >>>>>>> currently Intel kernel code provides an ability to detect if PMU is totally >>>>>>> disabled for a VM >>>>>>> (pmu->version == 0 in this case), but for AMD code pmu->version is never 0, >>>>>>> no matter if PMU is enabled or disabled for a VM (i mean <pmu state='off'/> >>>>>>> in the VM config which >>>>>>> results in "-cpu pmu=off" qemu option). >>>>>>> >>>>>>> So the question is - is it possible to enhance the code for AMD to also >>>>>>> honor >>>>>>> PMU VM setting or it is >>>>>>> impossible by design? >>>>>> The AMD architectural specification prior to AMD PMU v2 does not allow >>>>>> one to describe a CPU (via CPUID or MSRs) that has fewer than 4 >>>>>> general purpose PMU counters. While AMD PMU v2 does allow one to >>>>>> describe such a CPU, legacy software that knows nothing of AMD PMU v2 >>>>>> can expect four counters regardless. >>>>>> >>>>>> Having said that, KVM does provide a per-VM capability for disabling >>>>>> the virtual PMU: KVM_CAP_PMU_CAPABILITY(KVM_PMU_CAP_DISABLE). See >>>>>> section 8.35 in Documentation/virt/kvm/api.rst. >>>>> But this means in particular that QEMU should immediately >>>>> use this KVM_PMU_CAP_DISABLE if this capability is supported and PMU=off. I am >>>>> not seeing this code thus I believe that we have missed this. I think that >>>>> this >>>>> change worth adding. We will measure the impact :-) Den >>>>> >>>> I used to have a patch to use KVM_PMU_CAP_DISABLE in QEMU, but that did not >>>> draw >>>> many developers' attention. >>>> >>>> https://urldefense.com/v3/__https://lore.kernel.org/qemu-devel/20230621013821.6874-2-dongli.zhang@oracle.com/__;!!ACWV5N9M2RV99hQ!McSH2M-kuHmzAwTuXKxrjLkrdJoPqML6cY_Ndc-8k9LRQ7D1V9bSBRQPwHqtx9XCVLK3uzdsMaxyfwve$ >>>> It is time to first re-send that again. >>>> >>>> Dongli Zhang >>> We have checked that setting KVM_PMU_CAP_DISABLE really helps. Konstantin has >>> done this and this is good. On the other hand, looking into these patches I >>> disagree with them. We should not introduce new option for QEMU. If PMU is >>> disabled, i.e. we assume that pmu=off passed in the command line, we should set >>> KVM_PMU_CAP_DISABLE for that virtual machine. Den >> Can I assume you meant pmu=off, that is, cpu->enable_pmu in QEMU? >> >> In my opinion, cpu->enable_pmu indicates the option to control the cpu features. >> It may be used by any accelerators, and it is orthogonal to the KVM cap. >> >> >> The KVM_PMU_CAP_DISABLE is only specific to the KVM accelerator. >> >> >> That's why I had introduced a new option, to allow to configure the VM in my >> dimensions. >> >> It means one dimension to AMD, but two for Intel: to disable PMU via cpuid, or >> KVM cap. >> >> Anyway, this is KVM mailing list, and I may initiate the discussion in QEMU list. >> >> Thank you very much! >> >> Dongli Zhang > with the option pmu='off' it is expected that PMU should be > off for the guest. At the moment (without this KVM capability) > we can disable PMU for Intel only and thus have performance > degradation on AMD. > > This option disables PMU and thus normally when we are > running KVM guest and wanting PMU to be off it would > be required to > * disable CPUID leaf for Intel > * set KVM_PMU_CAP_DISABLE for both processors This would be quite natural and > transparent for the libvirt. Alexander will prepare the patch today or tomorrow > for the discussion. Den That is what I had implemented in the v1 of patch. https://lore.kernel.org/all/20221119122901.2469-3-dongli.zhang@oracle.com/ However, I changed that after people suggested introduce a new property. Dongli Zhang
On 11/13/23 17:17, Dongli Zhang wrote: > > On 11/13/23 06:42, Denis V. Lunev wrote: >> On 11/13/23 15:14, Dongli Zhang wrote: >>> Hi Denis, >>> >>> On 11/13/23 01:31, Denis V. Lunev wrote: >>>> On 11/10/23 01:01, Dongli Zhang wrote: >>>>> On 11/9/23 3:46 PM, Denis V. Lunev wrote: >>>>>> On 11/9/23 23:52, Jim Mattson wrote: >>>>>>> On Thu, Nov 9, 2023 at 10:18 AM Konstantin Khorenko >>>>>>> <khorenko@virtuozzo.com> wrote: >>>>>>>> Hi All, >>>>>>>> >>>>>>>> as a followup for my patch: i have noticed that >>>>>>>> currently Intel kernel code provides an ability to detect if PMU is totally >>>>>>>> disabled for a VM >>>>>>>> (pmu->version == 0 in this case), but for AMD code pmu->version is never 0, >>>>>>>> no matter if PMU is enabled or disabled for a VM (i mean <pmu state='off'/> >>>>>>>> in the VM config which >>>>>>>> results in "-cpu pmu=off" qemu option). >>>>>>>> >>>>>>>> So the question is - is it possible to enhance the code for AMD to also >>>>>>>> honor >>>>>>>> PMU VM setting or it is >>>>>>>> impossible by design? >>>>>>> The AMD architectural specification prior to AMD PMU v2 does not allow >>>>>>> one to describe a CPU (via CPUID or MSRs) that has fewer than 4 >>>>>>> general purpose PMU counters. While AMD PMU v2 does allow one to >>>>>>> describe such a CPU, legacy software that knows nothing of AMD PMU v2 >>>>>>> can expect four counters regardless. >>>>>>> >>>>>>> Having said that, KVM does provide a per-VM capability for disabling >>>>>>> the virtual PMU: KVM_CAP_PMU_CAPABILITY(KVM_PMU_CAP_DISABLE). See >>>>>>> section 8.35 in Documentation/virt/kvm/api.rst. >>>>>> But this means in particular that QEMU should immediately >>>>>> use this KVM_PMU_CAP_DISABLE if this capability is supported and PMU=off. I am >>>>>> not seeing this code thus I believe that we have missed this. I think that >>>>>> this >>>>>> change worth adding. We will measure the impact :-) Den >>>>>> >>>>> I used to have a patch to use KVM_PMU_CAP_DISABLE in QEMU, but that did not >>>>> draw >>>>> many developers' attention. >>>>> >>>>> https://urldefense.com/v3/__https://lore.kernel.org/qemu-devel/20230621013821.6874-2-dongli.zhang@oracle.com/__;!!ACWV5N9M2RV99hQ!McSH2M-kuHmzAwTuXKxrjLkrdJoPqML6cY_Ndc-8k9LRQ7D1V9bSBRQPwHqtx9XCVLK3uzdsMaxyfwve$ >>>>> It is time to first re-send that again. >>>>> >>>>> Dongli Zhang >>>> We have checked that setting KVM_PMU_CAP_DISABLE really helps. Konstantin has >>>> done this and this is good. On the other hand, looking into these patches I >>>> disagree with them. We should not introduce new option for QEMU. If PMU is >>>> disabled, i.e. we assume that pmu=off passed in the command line, we should set >>>> KVM_PMU_CAP_DISABLE for that virtual machine. Den >>> Can I assume you meant pmu=off, that is, cpu->enable_pmu in QEMU? >>> >>> In my opinion, cpu->enable_pmu indicates the option to control the cpu features. >>> It may be used by any accelerators, and it is orthogonal to the KVM cap. >>> >>> >>> The KVM_PMU_CAP_DISABLE is only specific to the KVM accelerator. >>> >>> >>> That's why I had introduced a new option, to allow to configure the VM in my >>> dimensions. >>> >>> It means one dimension to AMD, but two for Intel: to disable PMU via cpuid, or >>> KVM cap. >>> >>> Anyway, this is KVM mailing list, and I may initiate the discussion in QEMU list. >>> >>> Thank you very much! >>> >>> Dongli Zhang >> with the option pmu='off' it is expected that PMU should be >> off for the guest. At the moment (without this KVM capability) >> we can disable PMU for Intel only and thus have performance >> degradation on AMD. >> >> This option disables PMU and thus normally when we are >> running KVM guest and wanting PMU to be off it would >> be required to >> * disable CPUID leaf for Intel >> * set KVM_PMU_CAP_DISABLE for both processors This would be quite natural and >> transparent for the libvirt. Alexander will prepare the patch today or tomorrow >> for the discussion. Den > That is what I had implemented in the v1 of patch. > > https://lore.kernel.org/all/20221119122901.2469-3-dongli.zhang@oracle.com/ > > However, I changed that after people suggested introduce a new property. > > Dongli Zhang That would save a bit of our work :) For me this patch looks absolutely awesome and is doing exactly what I want to do in our downstream. This would get us required 15+% benefit for each VMexit. Den