Message ID | 1537437959-8751-5-git-send-email-wei.w.wang@intel.com (mailing list archive)
---|---
State | New, archived
Series | Guest LBR Enabling
On Thu, Sep 20, 2018 at 06:05:58PM +0800, Wei Wang wrote:
> diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
> index 1562863..a91fdef 100644
> --- a/arch/x86/events/perf_event.h
> +++ b/arch/x86/events/perf_event.h
> @@ -223,6 +223,7 @@ struct cpu_hw_events {
>  	 */
>  	u64				intel_ctrl_guest_mask;
>  	u64				intel_ctrl_host_mask;
> +	bool				guest_lbr;
>  	struct perf_guest_switch_msr	guest_switch_msrs[X86_PMC_IDX_MAX];
>
>  	/*

Please, no bools in aggregate types. Either use a bitfield or u8;
there are more LBR-related bools, so convert and group the lot to
avoid holes.
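A minimal sketch of what that suggestion could look like; apart from
guest_lbr, the flag names below are placeholders and not taken from the
real struct cpu_hw_events:

	struct cpu_hw_events {
		/* ... existing members ... */
		u64				intel_ctrl_guest_mask;
		u64				intel_ctrl_host_mask;

		/* group the LBR flags in one byte so they leave no padding holes */
		u8				guest_lbr	: 1;
		u8				lbr_flag_x	: 1;	/* placeholder */
		u8				lbr_flag_y	: 1;	/* placeholder */

		struct perf_guest_switch_msr	guest_switch_msrs[X86_PMC_IDX_MAX];
		/* ... */
	};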
On Thu, Sep 20, 2018 at 06:05:58PM +0800, Wei Wang wrote:
> diff --git a/arch/x86/kvm/pmu_intel.c b/arch/x86/kvm/pmu_intel.c
> index 5ab4a36..97a29d7 100644
> --- a/arch/x86/kvm/pmu_intel.c
> +++ b/arch/x86/kvm/pmu_intel.c
> @@ -342,6 +342,47 @@ static void intel_pmu_reset(struct kvm_vcpu *vcpu)
>  	pmu->global_ovf_ctrl = 0;
>  }
>
> +int intel_pmu_enable_save_guest_lbr(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
> +	struct perf_event *event;
> +	struct perf_event_attr attr = {
> +		.type = PERF_TYPE_RAW,
> +		.size = sizeof(attr),

Bit yuck to imply .config = 0, like that.

> +		.pinned = true,

Note that this can still result in failing to schedule the event: you
create a pinned task event, but a pinned cpu event takes precedence
and can claim the LBR.

How do you deal with that situation, where the LBR is in use by a host
event?

> +		.exclude_host = true,

Didn't you mean to use .exclude_guest? You don't want this thing to
run when the guest is running, right?

But I still don't like this hack much; what happens if userspace sets
that bit? You seem to have forgotten to combine it with PF_VCPU or
whatever is the appropriate bit to check. Similarly, how does the
existing exclude_{guest,host} crud work?

> +		.sample_type = PERF_SAMPLE_BRANCH_STACK,

What's the point of setting .sample_type if !.sample_period ?

> +		.branch_sample_type = PERF_SAMPLE_BRANCH_CALL_STACK |
> +				      PERF_SAMPLE_BRANCH_USER |
> +				      PERF_SAMPLE_BRANCH_KERNEL,
> +	};

I think this function wants a comment to explain what it is doing and
why, because if someone stumbles over this code in a few months nobody
will remember those things.

> +
> +	if (pmu->guest_lbr_event)
> +		return 0;
> +
> +	event = perf_event_create_kernel_counter(&attr, -1, current, NULL,
> +						 NULL);
> +	if (IS_ERR(event)) {
> +		pr_err("%s: failed %ld\n", __func__, PTR_ERR(event));
> +		return -ENOENT;
> +	}
> +	pmu->guest_lbr_event = event;
> +
> +	return 0;
> +}
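As a rough illustration of the comment being asked for (wording is a
suggestion based on this thread, not text from the patch), the function
could carry something like:

	/*
	 * Create a dummy, pinned perf event for the vCPU thread whose only
	 * purpose is to claim the LBR facility from host perf while this
	 * task runs, so that host events do not reprogram or read the LBR
	 * MSRs and the guest's LBR records survive across vCPU context
	 * switches.  No samples or counts are wanted from this event.
	 */
	int intel_pmu_enable_save_guest_lbr(struct kvm_vcpu *vcpu)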
> +int intel_pmu_enable_save_guest_lbr(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
> +	struct perf_event *event;
> +	struct perf_event_attr attr = {
> +		.type = PERF_TYPE_RAW,
> +		.size = sizeof(attr),
> +		.pinned = true,
> +		.exclude_host = true,
> +		.sample_type = PERF_SAMPLE_BRANCH_STACK,
> +		.branch_sample_type = PERF_SAMPLE_BRANCH_CALL_STACK |
> +				      PERF_SAMPLE_BRANCH_USER |
> +				      PERF_SAMPLE_BRANCH_KERNEL,

I think that will allocate an extra perfmon counter, right?

So if the guest wants to use all perfmon counters we would start to
multiplex, which would be bad.

Would need to fix perf core to not allocate a perfmon counter in this
case, if no period and no event count is requested.

-Andi
On Thu, Sep 20, 2018 at 08:30:35AM -0700, Andi Kleen wrote:
> > +int intel_pmu_enable_save_guest_lbr(struct kvm_vcpu *vcpu)
> > +{
> > +	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
> > +	struct perf_event *event;
> > +	struct perf_event_attr attr = {
> > +		.type = PERF_TYPE_RAW,
> > +		.size = sizeof(attr),
> > +		.pinned = true,
> > +		.exclude_host = true,
> > +		.sample_type = PERF_SAMPLE_BRANCH_STACK,
> > +		.branch_sample_type = PERF_SAMPLE_BRANCH_CALL_STACK |
> > +				      PERF_SAMPLE_BRANCH_USER |
> > +				      PERF_SAMPLE_BRANCH_KERNEL,
>
> I think that will allocate an extra perfmon counter, right?

I thought the same too, but I think the exclude_host/guest, whichever
is the right one, makes that work for regular counters too.

That code is a wee bit magical and I didn't take the time to reverse
engineer it. It most certainly needs a comment.
On Thursday, September 20, 2018 11:31 PM, Andi Kleen wrote:
> > +int intel_pmu_enable_save_guest_lbr(struct kvm_vcpu *vcpu)
> > +{
> > +	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
> > +	struct perf_event *event;
> > +	struct perf_event_attr attr = {
> > +		.type = PERF_TYPE_RAW,
> > +		.size = sizeof(attr),
> > +		.pinned = true,
> > +		.exclude_host = true,
> > +		.sample_type = PERF_SAMPLE_BRANCH_STACK,
> > +		.branch_sample_type = PERF_SAMPLE_BRANCH_CALL_STACK |
> > +				      PERF_SAMPLE_BRANCH_USER |
> > +				      PERF_SAMPLE_BRANCH_KERNEL,

Sorry for my late response (I was digging into some of the details).

> I think that will allocate an extra perfmon counter, right?
>
> So if the guest wants to use all perfmon counters we would start to
> multiplex, which would be bad

Right. The host side counter allocation is not necessary.

> Would need to fix perf core to not allocate a perfmon counter in this
> case, if no period and no event count is requested.

If no period (i.e. .sample_period = 0), the current code still uses
one counter, with the period set to the max period. If we change that
to do no counter allocation, would that break any of the existing
usages? (I didn't find the reason why the existing code doesn't avoid
that allocation for a non-sampling event.)

Best,
Wei
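For reference, the behaviour Wei describes comes from the x86 event
setup path, which (roughly, in kernels of that era) does the following
for non-sampling events in arch/x86/events/core.c:x86_setup_perfctr():

	if (!is_sampling_event(event)) {
		/* a non-sampling event still gets a counter, programmed
		   with the maximum period */
		hwc->sample_period = x86_pmu.max_period;
		hwc->last_period = hwc->sample_period;
		local64_set(&hwc->period_left, hwc->sample_period);
	}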
On Friday, September 21, 2018 12:24 AM, Peter Zijlstra wrote:
> On Thu, Sep 20, 2018 at 08:30:35AM -0700, Andi Kleen wrote:
> > > +int intel_pmu_enable_save_guest_lbr(struct kvm_vcpu *vcpu)
> > > +{
> > > +	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
> > > +	struct perf_event *event;
> > > +	struct perf_event_attr attr = {
> > > +		.type = PERF_TYPE_RAW,
> > > +		.size = sizeof(attr),
> > > +		.pinned = true,
> > > +		.exclude_host = true,
> > > +		.sample_type = PERF_SAMPLE_BRANCH_STACK,
> > > +		.branch_sample_type = PERF_SAMPLE_BRANCH_CALL_STACK |
> > > +				      PERF_SAMPLE_BRANCH_USER |
> > > +				      PERF_SAMPLE_BRANCH_KERNEL,
> >
> > I think that will allocate an extra perfmon counter, right?
>
> I thought the same too, but I think the exclude_host/guest, whichever
> is the right one, makes that work for regular counters too.

Sorry for being late. I'm not sure exclude_host/guest would be suitable
here. For example, if the guest wants to use a perf counter, the host
creates a perf event with "exclude_host=true" to have the counter not
count in the host, and "exclude_guest=true" is a flag to the perf core
that the counter should not count when the guest runs.

What would you think if we added a new flag (e.g. .force_no_counters)
to the perf core to indicate not allocating a perf counter?

> That code is a wee bit magical and I didn't take the time to reverse
> engineer it. It most certainly needs a comment.

No problem. I will add more comments in the next version.

Best,
Wei
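A hypothetical sketch of how the proposed flag might be used on the
KVM side; .force_no_counters does not exist in perf, it is only the
idea floated above:

	struct perf_event_attr attr = {
		.type = PERF_TYPE_RAW,
		.size = sizeof(attr),
		.pinned = true,
		.exclude_host = true,
		/* hypothetical: claim the LBR without consuming a counter */
		.force_no_counters = true,
		.sample_type = PERF_SAMPLE_BRANCH_STACK,
		...
	};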
On Thursday, September 20, 2018 8:32 PM, Peter Zijlstra wrote:
> On Thu, Sep 20, 2018 at 06:05:58PM +0800, Wei Wang wrote:
> How do you deal with that situation, where the LBR is in use by a
> host event?

I guess you were referring to this use case: "perf record -b
qemu-system-x86_64 ..", where the qemu process itself also needs the
LBR.

I also discussed this with Andi in v1, where we switched the LBR state
on VMX transitions. That could achieve the above, but it adds too much
overhead to VMX transitions, which are frequent. Considering that the
above usage is not common (it is more common to just have the task in
the guest use the LBR feature), we decided not to take that usage into
account and to switch the LBR state only on the vCPU switch.

Best,
Wei
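In other words, the intended hook points look roughly like the sketch
below (function names are illustrative, not the actual code of this
series):

	/* on host context switch away from the vCPU thread */
	static void guest_lbr_sched_out(struct kvm_vcpu *vcpu)
	{
		save_guest_lbr_msrs(vcpu);	/* read LBR MSRs into vCPU state */
	}

	/* on host context switch back to the vCPU thread */
	static void guest_lbr_sched_in(struct kvm_vcpu *vcpu)
	{
		restore_guest_lbr_msrs(vcpu);	/* write vCPU state back to LBR MSRs */
	}

	/* nothing is done on VM-entry/VM-exit, which happen far more often */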
diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
index c81f160..915fcc3 100644
--- a/arch/x86/events/intel/lbr.c
+++ b/arch/x86/events/intel/lbr.c
@@ -462,6 +462,9 @@ void intel_pmu_lbr_add(struct perf_event *event)
 	if (!x86_pmu.lbr_nr)
 		return;
 
+	if (event->attr.exclude_host)
+		cpuc->guest_lbr = true;
+
 	cpuc->br_sel = event->hw.branch_reg.reg;
 
 	if (branch_user_callstack(cpuc->br_sel) && event->ctx->task_ctx_data) {
@@ -507,6 +510,7 @@ void intel_pmu_lbr_del(struct perf_event *event)
 		task_ctx->lbr_callstack_users--;
 	}
 
+	cpuc->guest_lbr = false;
 	cpuc->lbr_users--;
 	WARN_ON_ONCE(cpuc->lbr_users < 0);
 	perf_sched_cb_dec(event->ctx->pmu);
@@ -516,7 +520,7 @@ void intel_pmu_lbr_enable_all(bool pmi)
 {
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
 
-	if (cpuc->lbr_users)
+	if (cpuc->lbr_users && !cpuc->guest_lbr)
 		__intel_pmu_lbr_enable(pmi);
 }
@@ -524,7 +528,7 @@ void intel_pmu_lbr_disable_all(void)
 {
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
 
-	if (cpuc->lbr_users)
+	if (cpuc->lbr_users && !cpuc->guest_lbr)
 		__intel_pmu_lbr_disable();
 }
@@ -658,7 +662,7 @@ void intel_pmu_lbr_read(void)
 {
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
 
-	if (!cpuc->lbr_users)
+	if (!cpuc->lbr_users || cpuc->guest_lbr)
 		return;
 
 	if (x86_pmu.intel_cap.lbr_format == LBR_FORMAT_32)
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 1562863..a91fdef 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -223,6 +223,7 @@ struct cpu_hw_events {
 	 */
 	u64				intel_ctrl_guest_mask;
 	u64				intel_ctrl_host_mask;
+	bool				guest_lbr;
 	struct perf_guest_switch_msr	guest_switch_msrs[X86_PMC_IDX_MAX];
 
 	/*
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4a46e31..fdcac01 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -432,6 +432,7 @@ struct kvm_pmu {
 	struct kvm_pmc fixed_counters[INTEL_PMC_MAX_FIXED];
 	struct irq_work irq_work;
 	u64 reprogram_pmi;
+	struct perf_event *guest_lbr_event;
 };
 
 struct kvm_pmu_ops;
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index ba8898e..e872aed 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -120,6 +120,9 @@ void kvm_pmu_destroy(struct kvm_vcpu *vcpu);
 
 bool is_vmware_backdoor_pmc(u32 pmc_idx);
 
+extern int intel_pmu_enable_save_guest_lbr(struct kvm_vcpu *vcpu);
+extern void intel_pmu_disable_save_guest_lbr(struct kvm_vcpu *vcpu);
+
 extern struct kvm_pmu_ops intel_pmu_ops;
 extern struct kvm_pmu_ops amd_pmu_ops;
 #endif /* __KVM_X86_PMU_H */
diff --git a/arch/x86/kvm/pmu_intel.c b/arch/x86/kvm/pmu_intel.c
index 5ab4a36..97a29d7 100644
--- a/arch/x86/kvm/pmu_intel.c
+++ b/arch/x86/kvm/pmu_intel.c
@@ -342,6 +342,47 @@ static void intel_pmu_reset(struct kvm_vcpu *vcpu)
 	pmu->global_ovf_ctrl = 0;
 }
 
+int intel_pmu_enable_save_guest_lbr(struct kvm_vcpu *vcpu)
+{
+	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+	struct perf_event *event;
+	struct perf_event_attr attr = {
+		.type = PERF_TYPE_RAW,
+		.size = sizeof(attr),
+		.pinned = true,
+		.exclude_host = true,
+		.sample_type = PERF_SAMPLE_BRANCH_STACK,
+		.branch_sample_type = PERF_SAMPLE_BRANCH_CALL_STACK |
+				      PERF_SAMPLE_BRANCH_USER |
+				      PERF_SAMPLE_BRANCH_KERNEL,
+	};
+
+	if (pmu->guest_lbr_event)
+		return 0;
+
+	event = perf_event_create_kernel_counter(&attr, -1, current, NULL,
+						 NULL);
+	if (IS_ERR(event)) {
+		pr_err("%s: failed %ld\n", __func__, PTR_ERR(event));
+		return -ENOENT;
+	}
+	pmu->guest_lbr_event = event;
+
+	return 0;
+}
+
+void intel_pmu_disable_save_guest_lbr(struct kvm_vcpu *vcpu)
+{
+	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+	struct perf_event *event = pmu->guest_lbr_event;
+
+	if (!event)
+		return;
+
+	perf_event_release_kernel(event);
+	pmu->guest_lbr_event = NULL;
+}
+
 struct kvm_pmu_ops intel_pmu_ops = {
 	.find_arch_event = intel_find_arch_event,
 	.find_fixed_event = intel_find_fixed_event,