From patchwork Tue Aug 6 07:16:01 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Wang, Wei W" X-Patchwork-Id: 11078401 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7CAEF14DB for ; Tue, 6 Aug 2019 08:05:11 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 6EBF8288A9 for ; Tue, 6 Aug 2019 08:05:11 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 6302D28893; Tue, 6 Aug 2019 08:05:11 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id F244F286BC for ; Tue, 6 Aug 2019 08:05:10 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1732259AbfHFIFD (ORCPT ); Tue, 6 Aug 2019 04:05:03 -0400 Received: from mga03.intel.com ([134.134.136.65]:5765 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728998AbfHFIFC (ORCPT ); Tue, 6 Aug 2019 04:05:02 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by orsmga103.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 06 Aug 2019 01:00:38 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.64,352,1559545200"; d="scan'208";a="373337277" Received: from devel-ww.sh.intel.com ([10.239.48.128]) by fmsmga005.fm.intel.com with ESMTP; 06 Aug 2019 01:00:35 -0700 From: Wei Wang To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org, ak@linux.intel.com, peterz@infradead.org, pbonzini@redhat.com Cc: kan.liang@intel.com, mingo@redhat.com, rkrcmar@redhat.com, like.xu@intel.com, wei.w.wang@intel.com, jannh@google.com, arei.gonglei@huawei.com, jmattson@google.com Subject: [PATCH v8 01/14] perf/x86: fix the variable type of the lbr msrs Date: Tue, 6 Aug 2019 15:16:01 +0800 Message-Id: <1565075774-26671-2-git-send-email-wei.w.wang@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1565075774-26671-1-git-send-email-wei.w.wang@intel.com> References: <1565075774-26671-1-git-send-email-wei.w.wang@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP The msr variable type can be "unsigned int", which uses less memory than the longer unsigned long. The lbr nr won't be a negative number, so make it "unsigned int" as well. Cc: Peter Zijlstra Cc: Andi Kleen Suggested-by: Peter Zijlstra Signed-off-by: Wei Wang --- arch/x86/events/perf_event.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h index 8751008..27e4d32 100644 --- a/arch/x86/events/perf_event.h +++ b/arch/x86/events/perf_event.h @@ -660,8 +660,8 @@ struct x86_pmu { /* * Intel LBR */ - unsigned long lbr_tos, lbr_from, lbr_to; /* MSR base regs */ - int lbr_nr; /* hardware stack size */ + unsigned int lbr_tos, lbr_from, lbr_to, + lbr_nr; /* lbr stack and size */ u64 lbr_sel_mask; /* LBR_SELECT valid bits */ const int *lbr_sel_map; /* lbr_select mappings */ bool lbr_double_abort; /* duplicated lbr aborts */ From patchwork Tue Aug 6 07:16:02 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Wang, Wei W" X-Patchwork-Id: 11078403 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6F0B214DB for ; Tue, 6 Aug 2019 08:05:14 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 603A12890B for ; Tue, 6 Aug 2019 08:05:14 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 541D527F98; Tue, 6 Aug 2019 08:05:14 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E3DB42890B for ; Tue, 6 Aug 2019 08:05:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1732291AbfHFIFK (ORCPT ); Tue, 6 Aug 2019 04:05:10 -0400 Received: from mga03.intel.com ([134.134.136.65]:5765 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1732235AbfHFIFD (ORCPT ); Tue, 6 Aug 2019 04:05:03 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by orsmga103.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 06 Aug 2019 01:00:40 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.64,352,1559545200"; d="scan'208";a="373337285" Received: from devel-ww.sh.intel.com ([10.239.48.128]) by fmsmga005.fm.intel.com with ESMTP; 06 Aug 2019 01:00:38 -0700 From: Wei Wang To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org, ak@linux.intel.com, peterz@infradead.org, pbonzini@redhat.com Cc: kan.liang@intel.com, mingo@redhat.com, rkrcmar@redhat.com, like.xu@intel.com, wei.w.wang@intel.com, jannh@google.com, arei.gonglei@huawei.com, jmattson@google.com Subject: [PATCH v8 02/14] perf/x86: add a function to get the addresses of the lbr stack msrs Date: Tue, 6 Aug 2019 15:16:02 +0800 Message-Id: <1565075774-26671-3-git-send-email-wei.w.wang@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1565075774-26671-1-git-send-email-wei.w.wang@intel.com> References: <1565075774-26671-1-git-send-email-wei.w.wang@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP The lbr stack msrs are model specific. The perf subsystem has already assigned the abstracted msr address values based on the cpu model. So add a function to enable callers outside the perf subsystem to get the lbr stack addresses. This is useful for hypervisors to emulate the lbr feature for the guest. Cc: Paolo Bonzini Cc: Andi Kleen Cc: Peter Zijlstra Signed-off-by: Wei Wang --- arch/x86/events/intel/lbr.c | 23 +++++++++++++++++++++++ arch/x86/include/asm/perf_event.h | 14 ++++++++++++++ 2 files changed, 37 insertions(+) diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c index 6f814a2..9b2d05c 100644 --- a/arch/x86/events/intel/lbr.c +++ b/arch/x86/events/intel/lbr.c @@ -1311,3 +1311,26 @@ void intel_pmu_lbr_init_knl(void) if (x86_pmu.intel_cap.lbr_format == LBR_FORMAT_LIP) x86_pmu.intel_cap.lbr_format = LBR_FORMAT_EIP_FLAGS; } + +/** + * x86_perf_get_lbr_stack - get the lbr stack related msrs + * + * @stack: the caller's memory to get the lbr stack + * + * Returns: 0 indicates that the lbr stack has been successfully obtained. + */ +int x86_perf_get_lbr_stack(struct x86_perf_lbr_stack *stack) +{ + stack->nr = x86_pmu.lbr_nr; + stack->tos = x86_pmu.lbr_tos; + stack->from = x86_pmu.lbr_from; + stack->to = x86_pmu.lbr_to; + + if (x86_pmu.intel_cap.lbr_format == LBR_FORMAT_INFO) + stack->info = MSR_LBR_INFO_0; + else + stack->info = 0; + + return 0; +} +EXPORT_SYMBOL_GPL(x86_perf_get_lbr_stack); diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h index 1392d5e..2606100 100644 --- a/arch/x86/include/asm/perf_event.h +++ b/arch/x86/include/asm/perf_event.h @@ -318,7 +318,16 @@ struct perf_guest_switch_msr { u64 host, guest; }; +struct x86_perf_lbr_stack { + unsigned int nr; + unsigned int tos; + unsigned int from; + unsigned int to; + unsigned int info; +}; + extern struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr); +extern int x86_perf_get_lbr_stack(struct x86_perf_lbr_stack *stack); extern void perf_get_x86_pmu_capability(struct x86_pmu_capability *cap); extern void perf_check_microcode(void); extern int x86_perf_rdpmc_index(struct perf_event *event); @@ -329,6 +338,11 @@ static inline struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr) return NULL; } +static inline int x86_perf_get_lbr_stack(struct x86_perf_lbr_stack *stack) +{ + return -1; +} + static inline void perf_get_x86_pmu_capability(struct x86_pmu_capability *cap) { memset(cap, 0, sizeof(*cap)); From patchwork Tue Aug 6 07:16:03 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Wang, Wei W" X-Patchwork-Id: 11078399 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9A53814DB for ; Tue, 6 Aug 2019 08:05:08 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 8B4BD2887E for ; Tue, 6 Aug 2019 08:05:08 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 7F08A2896D; Tue, 6 Aug 2019 08:05:08 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 142022887E for ; Tue, 6 Aug 2019 08:05:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1732277AbfHFIFD (ORCPT ); Tue, 6 Aug 2019 04:05:03 -0400 Received: from mga03.intel.com ([134.134.136.65]:5765 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1732264AbfHFIFD (ORCPT ); Tue, 6 Aug 2019 04:05:03 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by orsmga103.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 06 Aug 2019 01:00:42 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.64,352,1559545200"; d="scan'208";a="373337292" Received: from devel-ww.sh.intel.com ([10.239.48.128]) by fmsmga005.fm.intel.com with ESMTP; 06 Aug 2019 01:00:40 -0700 From: Wei Wang To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org, ak@linux.intel.com, peterz@infradead.org, pbonzini@redhat.com Cc: kan.liang@intel.com, mingo@redhat.com, rkrcmar@redhat.com, like.xu@intel.com, wei.w.wang@intel.com, jannh@google.com, arei.gonglei@huawei.com, jmattson@google.com Subject: [PATCH v8 03/14] KVM/x86: KVM_CAP_X86_GUEST_LBR Date: Tue, 6 Aug 2019 15:16:03 +0800 Message-Id: <1565075774-26671-4-git-send-email-wei.w.wang@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1565075774-26671-1-git-send-email-wei.w.wang@intel.com> References: <1565075774-26671-1-git-send-email-wei.w.wang@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Introduce KVM_CAP_X86_GUEST_LBR to allow per-VM enabling of the guest lbr feature. Signed-off-by: Wei Wang --- Documentation/virt/kvm/api.txt | 26 ++++++++++++++++++++++++++ arch/x86/include/asm/kvm_host.h | 2 ++ arch/x86/kvm/x86.c | 16 ++++++++++++++++ include/uapi/linux/kvm.h | 1 + 4 files changed, 45 insertions(+) diff --git a/Documentation/virt/kvm/api.txt b/Documentation/virt/kvm/api.txt index 2d06776..64632a8 100644 --- a/Documentation/virt/kvm/api.txt +++ b/Documentation/virt/kvm/api.txt @@ -5046,6 +5046,32 @@ it hard or impossible to use it correctly. The availability of KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 signals that those bugs are fixed. Userspace should not try to use KVM_CAP_MANUAL_DIRTY_LOG_PROTECT. +7.19 KVM_CAP_X86_GUEST_LBR +Architectures: x86 +Parameters: args[0] whether feature should be enabled or not + args[1] pointer to the userspace memory to load the lbr stack info + +The lbr stack info is described by +struct x86_perf_lbr_stack { + unsigned int nr; + unsigned int tos; + unsigned int from; + unsigned int to; + unsigned int info; +}; + +@nr: number of lbr stack entries +@tos: index of the top of stack msr +@from: index of the msr that stores a branch source address +@to: index of the msr that stores a branch destination address +@info: index of the msr that stores lbr related flags + +Enabling this capability allows guest accesses to the lbr feature. Otherwise, +#GP will be injected to the guest when it accesses to the lbr related msrs. + +After the feature is enabled, before exiting to userspace, kvm handlers should +fill the lbr stack info into the userspace memory pointed by args[1]. + 8. Other capabilities. ---------------------- diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 7b0a4ee..d29dddd 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -875,6 +875,7 @@ struct kvm_arch { atomic_t vapics_in_nmi_mode; struct mutex apic_map_lock; struct kvm_apic_map *apic_map; + struct x86_perf_lbr_stack lbr_stack; bool apic_access_page_done; @@ -884,6 +885,7 @@ struct kvm_arch { bool hlt_in_guest; bool pause_in_guest; bool cstate_in_guest; + bool lbr_in_guest; unsigned long irq_sources_bitmap; s64 kvmclock_offset; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index c6d951c..e1eb1be 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3129,6 +3129,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) case KVM_CAP_EXCEPTION_PAYLOAD: r = 1; break; + case KVM_CAP_X86_GUEST_LBR: + r = sizeof(struct x86_perf_lbr_stack); + break; case KVM_CAP_SYNC_REGS: r = KVM_SYNC_X86_VALID_FIELDS; break; @@ -4670,6 +4673,19 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm, kvm->arch.exception_payload_enabled = cap->args[0]; r = 0; break; + case KVM_CAP_X86_GUEST_LBR: + r = -EINVAL; + if (cap->args[0] && + x86_perf_get_lbr_stack(&kvm->arch.lbr_stack)) + break; + + if (copy_to_user((void __user *)cap->args[1], + &kvm->arch.lbr_stack, + sizeof(struct x86_perf_lbr_stack))) + break; + kvm->arch.lbr_in_guest = cap->args[0]; + r = 0; + break; default: r = -EINVAL; break; diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 5e3f12d..dd53edc 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -996,6 +996,7 @@ struct kvm_ppc_resize_hpt { #define KVM_CAP_ARM_PTRAUTH_ADDRESS 171 #define KVM_CAP_ARM_PTRAUTH_GENERIC 172 #define KVM_CAP_PMU_EVENT_FILTER 173 +#define KVM_CAP_X86_GUEST_LBR 174 #ifdef KVM_CAP_IRQ_ROUTING From patchwork Tue Aug 6 07:16:04 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Wang, Wei W" X-Patchwork-Id: 11078405 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7292F14DB for ; Tue, 6 Aug 2019 08:05:32 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 61B0728876 for ; Tue, 6 Aug 2019 08:05:32 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 55C052887E; Tue, 6 Aug 2019 08:05:32 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id A580D28876 for ; Tue, 6 Aug 2019 08:05:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1732127AbfHFIF2 (ORCPT ); Tue, 6 Aug 2019 04:05:28 -0400 Received: from mga03.intel.com ([134.134.136.65]:5805 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1731946AbfHFIF1 (ORCPT ); Tue, 6 Aug 2019 04:05:27 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by orsmga103.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 06 Aug 2019 01:00:44 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.64,352,1559545200"; d="scan'208";a="373337307" Received: from devel-ww.sh.intel.com ([10.239.48.128]) by fmsmga005.fm.intel.com with ESMTP; 06 Aug 2019 01:00:42 -0700 From: Wei Wang To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org, ak@linux.intel.com, peterz@infradead.org, pbonzini@redhat.com Cc: kan.liang@intel.com, mingo@redhat.com, rkrcmar@redhat.com, like.xu@intel.com, wei.w.wang@intel.com, jannh@google.com, arei.gonglei@huawei.com, jmattson@google.com Subject: [PATCH v8 04/14] KVM/x86: intel_pmu_lbr_enable Date: Tue, 6 Aug 2019 15:16:04 +0800 Message-Id: <1565075774-26671-5-git-send-email-wei.w.wang@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1565075774-26671-1-git-send-email-wei.w.wang@intel.com> References: <1565075774-26671-1-git-send-email-wei.w.wang@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP The lbr stack is model specific, for example, SKX has 32 lbr stack entries while HSW has 16 entries, so a HSW guest running on a SKX machine may not get accurate perf results. Currently, we forbid the guest lbr enabling when the guest and host see different lbr stack entries or the host and guest see different lbr stack msr indices. Cc: Paolo Bonzini Cc: Andi Kleen Cc: Peter Zijlstra Cc: Kan Liang Signed-off-by: Wei Wang --- arch/x86/kvm/pmu.c | 8 +++ arch/x86/kvm/pmu.h | 2 + arch/x86/kvm/vmx/pmu_intel.c | 136 +++++++++++++++++++++++++++++++++++++++++++ arch/x86/kvm/x86.c | 3 +- 4 files changed, 147 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c index 46875bb..26fac6c 100644 --- a/arch/x86/kvm/pmu.c +++ b/arch/x86/kvm/pmu.c @@ -331,6 +331,14 @@ int kvm_pmu_rdpmc(struct kvm_vcpu *vcpu, unsigned idx, u64 *data) return 0; } +bool kvm_pmu_lbr_enable(struct kvm_vcpu *vcpu) +{ + if (kvm_x86_ops->pmu_ops->lbr_enable) + return kvm_x86_ops->pmu_ops->lbr_enable(vcpu); + + return false; +} + void kvm_pmu_deliver_pmi(struct kvm_vcpu *vcpu) { if (lapic_in_kernel(vcpu)) diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h index 58265f7..d9eec9a 100644 --- a/arch/x86/kvm/pmu.h +++ b/arch/x86/kvm/pmu.h @@ -29,6 +29,7 @@ struct kvm_pmu_ops { u64 *mask); int (*is_valid_msr_idx)(struct kvm_vcpu *vcpu, unsigned idx); bool (*is_valid_msr)(struct kvm_vcpu *vcpu, u32 msr); + bool (*lbr_enable)(struct kvm_vcpu *vcpu); int (*get_msr)(struct kvm_vcpu *vcpu, u32 msr, u64 *data); int (*set_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr_info); void (*refresh)(struct kvm_vcpu *vcpu); @@ -107,6 +108,7 @@ void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel); void reprogram_fixed_counter(struct kvm_pmc *pmc, u8 ctrl, int fixed_idx); void reprogram_counter(struct kvm_pmu *pmu, int pmc_idx); +bool kvm_pmu_lbr_enable(struct kvm_vcpu *vcpu); void kvm_pmu_deliver_pmi(struct kvm_vcpu *vcpu); void kvm_pmu_handle_event(struct kvm_vcpu *vcpu); int kvm_pmu_rdpmc(struct kvm_vcpu *vcpu, unsigned pmc, u64 *data); diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c index 4dea0e0..6294a86 100644 --- a/arch/x86/kvm/vmx/pmu_intel.c +++ b/arch/x86/kvm/vmx/pmu_intel.c @@ -12,6 +12,7 @@ #include #include #include +#include #include "x86.h" #include "cpuid.h" #include "lapic.h" @@ -162,6 +163,140 @@ static bool intel_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr) return ret; } +static bool intel_pmu_lbr_enable(struct kvm_vcpu *vcpu) +{ + struct kvm *kvm = vcpu->kvm; + u8 vcpu_model = guest_cpuid_model(vcpu); + unsigned int vcpu_lbr_from, vcpu_lbr_nr; + + if (x86_perf_get_lbr_stack(&kvm->arch.lbr_stack)) + return false; + + if (guest_cpuid_family(vcpu) != boot_cpu_data.x86) + return false; + + /* + * It could be possible that people have vcpus of old model run on + * physcal cpus of newer model, for example a BDW guest on a SKX + * machine (but not possible to be the other way around). + * The BDW guest may not get accurate results on a SKX machine as it + * only reads 16 entries of the lbr stack while there are 32 entries + * of recordings. We currently forbid the lbr enabling when the vcpu + * and physical cpu see different lbr stack entries or the guest lbr + * msr indices are not compatible with the host. + */ + switch (vcpu_model) { + case INTEL_FAM6_CORE2_MEROM: + case INTEL_FAM6_CORE2_MEROM_L: + case INTEL_FAM6_CORE2_PENRYN: + case INTEL_FAM6_CORE2_DUNNINGTON: + /* intel_pmu_lbr_init_core() */ + vcpu_lbr_nr = 4; + vcpu_lbr_from = MSR_LBR_CORE_FROM; + break; + case INTEL_FAM6_NEHALEM: + case INTEL_FAM6_NEHALEM_EP: + case INTEL_FAM6_NEHALEM_EX: + /* intel_pmu_lbr_init_nhm() */ + vcpu_lbr_nr = 16; + vcpu_lbr_from = MSR_LBR_NHM_FROM; + break; + case INTEL_FAM6_ATOM_BONNELL: + case INTEL_FAM6_ATOM_BONNELL_MID: + case INTEL_FAM6_ATOM_SALTWELL: + case INTEL_FAM6_ATOM_SALTWELL_MID: + case INTEL_FAM6_ATOM_SALTWELL_TABLET: + /* intel_pmu_lbr_init_atom() */ + vcpu_lbr_nr = 8; + vcpu_lbr_from = MSR_LBR_CORE_FROM; + break; + case INTEL_FAM6_ATOM_SILVERMONT: + case INTEL_FAM6_ATOM_SILVERMONT_X: + case INTEL_FAM6_ATOM_SILVERMONT_MID: + case INTEL_FAM6_ATOM_AIRMONT: + case INTEL_FAM6_ATOM_AIRMONT_MID: + /* intel_pmu_lbr_init_slm() */ + vcpu_lbr_nr = 8; + vcpu_lbr_from = MSR_LBR_CORE_FROM; + break; + case INTEL_FAM6_ATOM_GOLDMONT: + case INTEL_FAM6_ATOM_GOLDMONT_X: + /* intel_pmu_lbr_init_skl(); */ + vcpu_lbr_nr = 32; + vcpu_lbr_from = MSR_LBR_NHM_FROM; + break; + case INTEL_FAM6_ATOM_GOLDMONT_PLUS: + /* intel_pmu_lbr_init_skl()*/ + vcpu_lbr_nr = 32; + vcpu_lbr_from = MSR_LBR_NHM_FROM; + break; + case INTEL_FAM6_WESTMERE: + case INTEL_FAM6_WESTMERE_EP: + case INTEL_FAM6_WESTMERE_EX: + /* intel_pmu_lbr_init_nhm() */ + vcpu_lbr_nr = 16; + vcpu_lbr_from = MSR_LBR_NHM_FROM; + break; + case INTEL_FAM6_SANDYBRIDGE: + case INTEL_FAM6_SANDYBRIDGE_X: + /* intel_pmu_lbr_init_snb() */ + vcpu_lbr_nr = 16; + vcpu_lbr_from = MSR_LBR_NHM_FROM; + break; + case INTEL_FAM6_IVYBRIDGE: + case INTEL_FAM6_IVYBRIDGE_X: + /* intel_pmu_lbr_init_snb() */ + vcpu_lbr_nr = 16; + vcpu_lbr_from = MSR_LBR_NHM_FROM; + break; + case INTEL_FAM6_HASWELL_CORE: + case INTEL_FAM6_HASWELL_X: + case INTEL_FAM6_HASWELL_ULT: + case INTEL_FAM6_HASWELL_GT3E: + /* intel_pmu_lbr_init_hsw() */ + vcpu_lbr_nr = 16; + vcpu_lbr_from = MSR_LBR_NHM_FROM; + break; + case INTEL_FAM6_BROADWELL_CORE: + case INTEL_FAM6_BROADWELL_XEON_D: + case INTEL_FAM6_BROADWELL_GT3E: + case INTEL_FAM6_BROADWELL_X: + /* intel_pmu_lbr_init_hsw() */ + vcpu_lbr_nr = 16; + vcpu_lbr_from = MSR_LBR_NHM_FROM; + break; + case INTEL_FAM6_XEON_PHI_KNL: + case INTEL_FAM6_XEON_PHI_KNM: + /* intel_pmu_lbr_init_knl() */ + vcpu_lbr_nr = 8; + vcpu_lbr_from = MSR_LBR_NHM_FROM; + break; + case INTEL_FAM6_SKYLAKE_MOBILE: + case INTEL_FAM6_SKYLAKE_DESKTOP: + case INTEL_FAM6_SKYLAKE_X: + case INTEL_FAM6_KABYLAKE_MOBILE: + case INTEL_FAM6_KABYLAKE_DESKTOP: + /* intel_pmu_lbr_init_skl() */ + vcpu_lbr_nr = 32; + vcpu_lbr_from = MSR_LBR_NHM_FROM; + break; + default: + vcpu_lbr_nr = 0; + vcpu_lbr_from = 0; + pr_warn("%s: vcpu model not supported %d\n", __func__, + vcpu_model); + } + + if (vcpu_lbr_nr != kvm->arch.lbr_stack.nr || + vcpu_lbr_from != kvm->arch.lbr_stack.from) { + pr_warn("%s: vcpu model %x incompatible to pcpu %x\n", + __func__, vcpu_model, boot_cpu_data.x86_model); + return false; + } + + return true; +} + static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *data) { struct kvm_pmu *pmu = vcpu_to_pmu(vcpu); @@ -366,6 +501,7 @@ struct kvm_pmu_ops intel_pmu_ops = { .msr_idx_to_pmc = intel_msr_idx_to_pmc, .is_valid_msr_idx = intel_is_valid_msr_idx, .is_valid_msr = intel_is_valid_msr, + .lbr_enable = intel_pmu_lbr_enable, .get_msr = intel_pmu_get_msr, .set_msr = intel_pmu_set_msr, .refresh = intel_pmu_refresh, diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index e1eb1be..da45962 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4675,8 +4675,7 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm, break; case KVM_CAP_X86_GUEST_LBR: r = -EINVAL; - if (cap->args[0] && - x86_perf_get_lbr_stack(&kvm->arch.lbr_stack)) + if (cap->args[0] && !kvm_pmu_lbr_enable(kvm->vcpus[0])) break; if (copy_to_user((void __user *)cap->args[1], From patchwork Tue Aug 6 07:16:05 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Wang, Wei W" X-Patchwork-Id: 11078407 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id F1FCB14DB for ; Tue, 6 Aug 2019 08:06:01 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E306828657 for ; Tue, 6 Aug 2019 08:06:01 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id D6B5E2847B; Tue, 6 Aug 2019 08:06:01 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 44BC32844C for ; Tue, 6 Aug 2019 08:06:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1732313AbfHFIFw (ORCPT ); Tue, 6 Aug 2019 04:05:52 -0400 Received: from mga03.intel.com ([134.134.136.65]:5847 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727259AbfHFIFw (ORCPT ); Tue, 6 Aug 2019 04:05:52 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by orsmga103.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 06 Aug 2019 01:00:47 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.64,352,1559545200"; d="scan'208";a="373337326" Received: from devel-ww.sh.intel.com ([10.239.48.128]) by fmsmga005.fm.intel.com with ESMTP; 06 Aug 2019 01:00:45 -0700 From: Wei Wang To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org, ak@linux.intel.com, peterz@infradead.org, pbonzini@redhat.com Cc: kan.liang@intel.com, mingo@redhat.com, rkrcmar@redhat.com, like.xu@intel.com, wei.w.wang@intel.com, jannh@google.com, arei.gonglei@huawei.com, jmattson@google.com Subject: [PATCH v8 05/14] KVM/x86/vPMU: tweak kvm_pmu_get_msr Date: Tue, 6 Aug 2019 15:16:05 +0800 Message-Id: <1565075774-26671-6-git-send-email-wei.w.wang@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1565075774-26671-1-git-send-email-wei.w.wang@intel.com> References: <1565075774-26671-1-git-send-email-wei.w.wang@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Change kvm_pmu_get_msr to get the msr_data struct, as the host_initiated field from the struct could be used by get_msr. This also makes this API consistent with kvm_pmu_set_msr. Cc: Paolo Bonzini Cc: Andi Kleen Signed-off-by: Wei Wang --- arch/x86/kvm/pmu.c | 4 ++-- arch/x86/kvm/pmu.h | 4 ++-- arch/x86/kvm/pmu_amd.c | 7 ++++--- arch/x86/kvm/vmx/pmu_intel.c | 19 +++++++++++-------- arch/x86/kvm/x86.c | 4 ++-- 5 files changed, 21 insertions(+), 17 deletions(-) diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c index 26fac6c..1a291ed 100644 --- a/arch/x86/kvm/pmu.c +++ b/arch/x86/kvm/pmu.c @@ -350,9 +350,9 @@ bool kvm_pmu_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr) return kvm_x86_ops->pmu_ops->is_valid_msr(vcpu, msr); } -int kvm_pmu_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *data) +int kvm_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) { - return kvm_x86_ops->pmu_ops->get_msr(vcpu, msr, data); + return kvm_x86_ops->pmu_ops->get_msr(vcpu, msr_info); } int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h index d9eec9a..f61024e 100644 --- a/arch/x86/kvm/pmu.h +++ b/arch/x86/kvm/pmu.h @@ -30,7 +30,7 @@ struct kvm_pmu_ops { int (*is_valid_msr_idx)(struct kvm_vcpu *vcpu, unsigned idx); bool (*is_valid_msr)(struct kvm_vcpu *vcpu, u32 msr); bool (*lbr_enable)(struct kvm_vcpu *vcpu); - int (*get_msr)(struct kvm_vcpu *vcpu, u32 msr, u64 *data); + int (*get_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr_info); int (*set_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr_info); void (*refresh)(struct kvm_vcpu *vcpu); void (*init)(struct kvm_vcpu *vcpu); @@ -114,7 +114,7 @@ void kvm_pmu_handle_event(struct kvm_vcpu *vcpu); int kvm_pmu_rdpmc(struct kvm_vcpu *vcpu, unsigned pmc, u64 *data); int kvm_pmu_is_valid_msr_idx(struct kvm_vcpu *vcpu, unsigned idx); bool kvm_pmu_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr); -int kvm_pmu_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *data); +int kvm_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info); int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info); void kvm_pmu_refresh(struct kvm_vcpu *vcpu); void kvm_pmu_reset(struct kvm_vcpu *vcpu); diff --git a/arch/x86/kvm/pmu_amd.c b/arch/x86/kvm/pmu_amd.c index c838838..4a64a3f 100644 --- a/arch/x86/kvm/pmu_amd.c +++ b/arch/x86/kvm/pmu_amd.c @@ -208,21 +208,22 @@ static bool amd_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr) return ret; } -static int amd_pmu_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *data) +static int amd_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) { struct kvm_pmu *pmu = vcpu_to_pmu(vcpu); struct kvm_pmc *pmc; + u32 msr = msr_info->index; /* MSR_PERFCTRn */ pmc = get_gp_pmc_amd(pmu, msr, PMU_TYPE_COUNTER); if (pmc) { - *data = pmc_read_counter(pmc); + msr_info->data = pmc_read_counter(pmc); return 0; } /* MSR_EVNTSELn */ pmc = get_gp_pmc_amd(pmu, msr, PMU_TYPE_EVNTSEL); if (pmc) { - *data = pmc->eventsel; + msr_info->data = pmc->eventsel; return 0; } diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c index 6294a86..53bb95e 100644 --- a/arch/x86/kvm/vmx/pmu_intel.c +++ b/arch/x86/kvm/vmx/pmu_intel.c @@ -297,35 +297,38 @@ static bool intel_pmu_lbr_enable(struct kvm_vcpu *vcpu) return true; } -static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *data) +static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) { struct kvm_pmu *pmu = vcpu_to_pmu(vcpu); struct kvm_pmc *pmc; + u32 msr = msr_info->index; switch (msr) { case MSR_CORE_PERF_FIXED_CTR_CTRL: - *data = pmu->fixed_ctr_ctrl; + msr_info->data = pmu->fixed_ctr_ctrl; return 0; case MSR_CORE_PERF_GLOBAL_STATUS: - *data = pmu->global_status; + msr_info->data = pmu->global_status; return 0; case MSR_CORE_PERF_GLOBAL_CTRL: - *data = pmu->global_ctrl; + msr_info->data = pmu->global_ctrl; return 0; case MSR_CORE_PERF_GLOBAL_OVF_CTRL: - *data = pmu->global_ovf_ctrl; + msr_info->data = pmu->global_ovf_ctrl; return 0; default: if ((pmc = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0))) { u64 val = pmc_read_counter(pmc); - *data = val & pmu->counter_bitmask[KVM_PMC_GP]; + msr_info->data = + val & pmu->counter_bitmask[KVM_PMC_GP]; return 0; } else if ((pmc = get_fixed_pmc(pmu, msr))) { u64 val = pmc_read_counter(pmc); - *data = val & pmu->counter_bitmask[KVM_PMC_FIXED]; + msr_info->data = + val & pmu->counter_bitmask[KVM_PMC_FIXED]; return 0; } else if ((pmc = get_gp_pmc(pmu, msr, MSR_P6_EVNTSEL0))) { - *data = pmc->eventsel; + msr_info->data = pmc->eventsel; return 0; } } diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index da45962..7a7cd93 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2824,7 +2824,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info) case MSR_P6_PERFCTR0 ... MSR_P6_PERFCTR1: case MSR_P6_EVNTSEL0 ... MSR_P6_EVNTSEL1: if (kvm_pmu_is_valid_msr(vcpu, msr_info->index)) - return kvm_pmu_get_msr(vcpu, msr_info->index, &msr_info->data); + return kvm_pmu_get_msr(vcpu, msr_info); msr_info->data = 0; break; case MSR_IA32_UCODE_REV: @@ -2982,7 +2982,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info) break; default: if (kvm_pmu_is_valid_msr(vcpu, msr_info->index)) - return kvm_pmu_get_msr(vcpu, msr_info->index, &msr_info->data); + return kvm_pmu_get_msr(vcpu, msr_info); if (!ignore_msrs) { vcpu_debug_ratelimited(vcpu, "unhandled rdmsr: 0x%x\n", msr_info->index); From patchwork Tue Aug 6 07:16:06 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Wang, Wei W" X-Patchwork-Id: 11078425 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 75DD017E0 for ; Tue, 6 Aug 2019 08:06:56 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 63DD9223A6 for ; Tue, 6 Aug 2019 08:06:56 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 568D128496; Tue, 6 Aug 2019 08:06:56 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E1051223A6 for ; Tue, 6 Aug 2019 08:06:55 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1732350AbfHFIGD (ORCPT ); Tue, 6 Aug 2019 04:06:03 -0400 Received: from mga03.intel.com ([134.134.136.65]:5858 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1731946AbfHFIGD (ORCPT ); Tue, 6 Aug 2019 04:06:03 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by orsmga103.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 06 Aug 2019 01:00:49 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.64,352,1559545200"; d="scan'208";a="373337339" Received: from devel-ww.sh.intel.com ([10.239.48.128]) by fmsmga005.fm.intel.com with ESMTP; 06 Aug 2019 01:00:47 -0700 From: Wei Wang To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org, ak@linux.intel.com, peterz@infradead.org, pbonzini@redhat.com Cc: kan.liang@intel.com, mingo@redhat.com, rkrcmar@redhat.com, like.xu@intel.com, wei.w.wang@intel.com, jannh@google.com, arei.gonglei@huawei.com, jmattson@google.com Subject: [PATCH v8 06/14] KVM/x86: expose MSR_IA32_PERF_CAPABILITIES to the guest Date: Tue, 6 Aug 2019 15:16:06 +0800 Message-Id: <1565075774-26671-7-git-send-email-wei.w.wang@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1565075774-26671-1-git-send-email-wei.w.wang@intel.com> References: <1565075774-26671-1-git-send-email-wei.w.wang@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Bits [0, 5] of MSR_IA32_PERF_CAPABILITIES tell about the format of the addresses stored in the lbr stack. Expose those bits to the guest when the guest lbr feature is enabled. Cc: Paolo Bonzini Cc: Andi Kleen Signed-off-by: Wei Wang --- arch/x86/include/asm/perf_event.h | 2 ++ arch/x86/kvm/cpuid.c | 2 +- arch/x86/kvm/vmx/pmu_intel.c | 16 ++++++++++++++++ 3 files changed, 19 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h index 2606100..aa77da2 100644 --- a/arch/x86/include/asm/perf_event.h +++ b/arch/x86/include/asm/perf_event.h @@ -95,6 +95,8 @@ #define PEBS_DATACFG_LBRS BIT_ULL(3) #define PEBS_DATACFG_LBR_SHIFT 24 +#define X86_PERF_CAP_MASK_LBR_FMT 0x3f + /* * Intel "Architectural Performance Monitoring" CPUID * detection/enumeration details: diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 22c2720..826b2dc 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -458,7 +458,7 @@ static inline int __do_cpuid_func(struct kvm_cpuid_entry2 *entry, u32 function, F(XMM3) | F(PCLMULQDQ) | 0 /* DTES64, MONITOR */ | 0 /* DS-CPL, VMX, SMX, EST */ | 0 /* TM2 */ | F(SSSE3) | 0 /* CNXT-ID */ | 0 /* Reserved */ | - F(FMA) | F(CX16) | 0 /* xTPR Update, PDCM */ | + F(FMA) | F(CX16) | 0 /* xTPR Update*/ | F(PDCM) | F(PCID) | 0 /* Reserved, DCA */ | F(XMM4_1) | F(XMM4_2) | F(X2APIC) | F(MOVBE) | F(POPCNT) | 0 /* Reserved*/ | F(AES) | F(XSAVE) | 0 /* OSXSAVE */ | F(AVX) | diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c index 53bb95e..f0ad78f 100644 --- a/arch/x86/kvm/vmx/pmu_intel.c +++ b/arch/x86/kvm/vmx/pmu_intel.c @@ -151,6 +151,7 @@ static bool intel_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr) case MSR_CORE_PERF_GLOBAL_STATUS: case MSR_CORE_PERF_GLOBAL_CTRL: case MSR_CORE_PERF_GLOBAL_OVF_CTRL: + case MSR_IA32_PERF_CAPABILITIES: ret = pmu->version > 1; break; default: @@ -316,6 +317,19 @@ static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) case MSR_CORE_PERF_GLOBAL_OVF_CTRL: msr_info->data = pmu->global_ovf_ctrl; return 0; + case MSR_IA32_PERF_CAPABILITIES: { + u64 data; + + if (!boot_cpu_has(X86_FEATURE_PDCM) || + (!msr_info->host_initiated && + !guest_cpuid_has(vcpu, X86_FEATURE_PDCM))) + return 1; + data = native_read_msr(MSR_IA32_PERF_CAPABILITIES); + msr_info->data = 0; + if (vcpu->kvm->arch.lbr_in_guest) + msr_info->data |= (data & X86_PERF_CAP_MASK_LBR_FMT); + return 0; + } default: if ((pmc = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0))) { u64 val = pmc_read_counter(pmc); @@ -374,6 +388,8 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) return 0; } break; + case MSR_IA32_PERF_CAPABILITIES: + return 1; /* RO MSR */ default: if ((pmc = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0))) { if (msr_info->host_initiated) From patchwork Tue Aug 6 07:16:07 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Wang, Wei W" X-Patchwork-Id: 11078423 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7A35D14DB for ; Tue, 6 Aug 2019 08:06:50 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 6982C2845E for ; Tue, 6 Aug 2019 08:06:50 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 5D252286BE; Tue, 6 Aug 2019 08:06:50 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 3FCB42845E for ; Tue, 6 Aug 2019 08:06:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1732275AbfHFIGs (ORCPT ); Tue, 6 Aug 2019 04:06:48 -0400 Received: from mga03.intel.com ([134.134.136.65]:5858 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1732068AbfHFIGD (ORCPT ); Tue, 6 Aug 2019 04:06:03 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by orsmga103.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 06 Aug 2019 01:00:52 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.64,352,1559545200"; d="scan'208";a="373337350" Received: from devel-ww.sh.intel.com ([10.239.48.128]) by fmsmga005.fm.intel.com with ESMTP; 06 Aug 2019 01:00:49 -0700 From: Wei Wang To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org, ak@linux.intel.com, peterz@infradead.org, pbonzini@redhat.com Cc: kan.liang@intel.com, mingo@redhat.com, rkrcmar@redhat.com, like.xu@intel.com, wei.w.wang@intel.com, jannh@google.com, arei.gonglei@huawei.com, jmattson@google.com Subject: [PATCH v8 07/14] perf/x86: support to create a perf event without counter allocation Date: Tue, 6 Aug 2019 15:16:07 +0800 Message-Id: <1565075774-26671-8-git-send-email-wei.w.wang@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1565075774-26671-1-git-send-email-wei.w.wang@intel.com> References: <1565075774-26671-1-git-send-email-wei.w.wang@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Hypervisors may create an lbr event for a vcpu's lbr emulation, and the emulation doesn't need a counter fundamentally. This makes the emulation follow the x86 SDM's description about lbr, which doesn't include a counter, and also avoids wasting a counter. The perf scheduler is supported to not assign a counter for a perf event which doesn't need a counter. Define a macro, X86_PMC_IDX_NA, to replace "-1", which represents a never assigned counter id. Cc: Andi Kleen Cc: Peter Zijlstra Signed-off-by: Wei Wang https://lkml.kernel.org/r/20180920162407.GA24124@hirez.programming.kicks-ass.net --- arch/x86/events/core.c | 36 +++++++++++++++++++++++++++--------- arch/x86/events/intel/core.c | 3 +++ arch/x86/include/asm/perf_event.h | 1 + include/linux/perf_event.h | 11 +++++++++++ 4 files changed, 42 insertions(+), 9 deletions(-) diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c index 81b005e..ffa27bb 100644 --- a/arch/x86/events/core.c +++ b/arch/x86/events/core.c @@ -73,7 +73,7 @@ u64 x86_perf_event_update(struct perf_event *event) int idx = hwc->idx; u64 delta; - if (idx == INTEL_PMC_IDX_FIXED_BTS) + if ((idx == INTEL_PMC_IDX_FIXED_BTS) || (idx == X86_PMC_IDX_NA)) return 0; /* @@ -595,7 +595,7 @@ static int __x86_pmu_event_init(struct perf_event *event) atomic_inc(&active_events); event->destroy = hw_perf_event_destroy; - event->hw.idx = -1; + event->hw.idx = X86_PMC_IDX_NA; event->hw.last_cpu = -1; event->hw.last_tag = ~0ULL; @@ -763,6 +763,8 @@ static bool perf_sched_restore_state(struct perf_sched *sched) static bool __perf_sched_find_counter(struct perf_sched *sched) { struct event_constraint *c; + struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events); + struct perf_event *e = cpuc->event_list[sched->state.event]; int idx; if (!sched->state.unassigned) @@ -772,6 +774,14 @@ static bool __perf_sched_find_counter(struct perf_sched *sched) return false; c = sched->constraints[sched->state.event]; + if (c == &emptyconstraint) + return false; + + if (is_no_counter_event(e)) { + idx = X86_PMC_IDX_NA; + goto done; + } + /* Prefer fixed purpose counters */ if (c->idxmsk64 & (~0ULL << INTEL_PMC_IDX_FIXED)) { idx = INTEL_PMC_IDX_FIXED; @@ -797,7 +807,7 @@ static bool __perf_sched_find_counter(struct perf_sched *sched) done: sched->state.counter = idx; - if (c->overlap) + if ((idx != X86_PMC_IDX_NA) && c->overlap) perf_sched_save_state(sched); return true; @@ -918,7 +928,7 @@ int x86_schedule_events(struct cpu_hw_events *cpuc, int n, int *assign) c = cpuc->event_constraint[i]; /* never assigned */ - if (hwc->idx == -1) + if (hwc->idx == X86_PMC_IDX_NA) break; /* constraint still honored */ @@ -969,7 +979,8 @@ int x86_schedule_events(struct cpu_hw_events *cpuc, int n, int *assign) if (!unsched && assign) { for (i = 0; i < n; i++) { e = cpuc->event_list[i]; - if (x86_pmu.commit_scheduling) + if (x86_pmu.commit_scheduling && + (assign[i] != X86_PMC_IDX_NA)) x86_pmu.commit_scheduling(cpuc, i, assign[i]); } } else { @@ -1038,7 +1049,8 @@ static inline void x86_assign_hw_event(struct perf_event *event, hwc->last_cpu = smp_processor_id(); hwc->last_tag = ++cpuc->tags[i]; - if (hwc->idx == INTEL_PMC_IDX_FIXED_BTS) { + if ((hwc->idx == INTEL_PMC_IDX_FIXED_BTS) || + (hwc->idx == X86_PMC_IDX_NA)) { hwc->config_base = 0; hwc->event_base = 0; } else if (hwc->idx >= INTEL_PMC_IDX_FIXED) { @@ -1115,7 +1127,7 @@ static void x86_pmu_enable(struct pmu *pmu) * - running on same CPU as last time * - no other event has used the counter since */ - if (hwc->idx == -1 || + if (hwc->idx == X86_PMC_IDX_NA || match_prev_assignment(hwc, cpuc, i)) continue; @@ -1169,7 +1181,7 @@ int x86_perf_event_set_period(struct perf_event *event) s64 period = hwc->sample_period; int ret = 0, idx = hwc->idx; - if (idx == INTEL_PMC_IDX_FIXED_BTS) + if ((idx == INTEL_PMC_IDX_FIXED_BTS) || (idx == X86_PMC_IDX_NA)) return 0; /* @@ -1306,7 +1318,7 @@ static void x86_pmu_start(struct perf_event *event, int flags) if (WARN_ON_ONCE(!(event->hw.state & PERF_HES_STOPPED))) return; - if (WARN_ON_ONCE(idx == -1)) + if (idx == X86_PMC_IDX_NA) return; if (flags & PERF_EF_RELOAD) { @@ -1388,6 +1400,9 @@ void x86_pmu_stop(struct perf_event *event, int flags) struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events); struct hw_perf_event *hwc = &event->hw; + if (hwc->idx == X86_PMC_IDX_NA) + return; + if (test_bit(hwc->idx, cpuc->active_mask)) { x86_pmu.disable(event); __clear_bit(hwc->idx, cpuc->active_mask); @@ -2128,6 +2143,9 @@ static int x86_pmu_event_idx(struct perf_event *event) if (!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED)) return 0; + if (idx == X86_PMC_IDX_NA) + return X86_PMC_IDX_NA; + if (x86_pmu.num_counters_fixed && idx >= INTEL_PMC_IDX_FIXED) { idx -= INTEL_PMC_IDX_FIXED; idx |= 1 << 30; diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c index 648260b5..177c321 100644 --- a/arch/x86/events/intel/core.c +++ b/arch/x86/events/intel/core.c @@ -2156,6 +2156,9 @@ static void intel_pmu_disable_event(struct perf_event *event) return; } + if (hwc->idx == X86_PMC_IDX_NA) + return; + cpuc->intel_ctrl_guest_mask &= ~(1ull << hwc->idx); cpuc->intel_ctrl_host_mask &= ~(1ull << hwc->idx); cpuc->intel_cp_status &= ~(1ull << hwc->idx); diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h index aa77da2..23ab90c 100644 --- a/arch/x86/include/asm/perf_event.h +++ b/arch/x86/include/asm/perf_event.h @@ -11,6 +11,7 @@ #define INTEL_PMC_IDX_FIXED 32 #define X86_PMC_IDX_MAX 64 +#define X86_PMC_IDX_NA -1 #define MSR_ARCH_PERFMON_PERFCTR0 0xc1 #define MSR_ARCH_PERFMON_PERFCTR1 0xc2 diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index e8ad3c5..2cae06a 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -530,6 +530,7 @@ typedef void (*perf_overflow_handler_t)(struct perf_event *, */ #define PERF_EV_CAP_SOFTWARE BIT(0) #define PERF_EV_CAP_READ_ACTIVE_PKG BIT(1) +#define PERF_EV_CAP_NO_COUNTER BIT(2) #define SWEVENT_HLIST_BITS 8 #define SWEVENT_HLIST_SIZE (1 << SWEVENT_HLIST_BITS) @@ -1039,6 +1040,16 @@ static inline bool is_sampling_event(struct perf_event *event) return event->attr.sample_period != 0; } +static inline bool is_no_counter_event(struct perf_event *event) +{ + return !!(event->event_caps & PERF_EV_CAP_NO_COUNTER); +} + +static inline void perf_event_set_no_counter(struct perf_event *event) +{ + event->event_caps |= PERF_EV_CAP_NO_COUNTER; +} + /* * Return 1 for a software event, 0 for a hardware event */ From patchwork Tue Aug 6 07:16:08 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Wang, Wei W" X-Patchwork-Id: 11078419 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D739414E5 for ; Tue, 6 Aug 2019 08:06:40 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C87622872A for ; Tue, 6 Aug 2019 08:06:40 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id BC1222878E; Tue, 6 Aug 2019 08:06:40 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 5FC25223A6 for ; Tue, 6 Aug 2019 08:06:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1732364AbfHFIGE (ORCPT ); Tue, 6 Aug 2019 04:06:04 -0400 Received: from mga03.intel.com ([134.134.136.65]:5861 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1732338AbfHFIGD (ORCPT ); Tue, 6 Aug 2019 04:06:03 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by orsmga103.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 06 Aug 2019 01:00:54 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.64,352,1559545200"; d="scan'208";a="373337365" Received: from devel-ww.sh.intel.com ([10.239.48.128]) by fmsmga005.fm.intel.com with ESMTP; 06 Aug 2019 01:00:52 -0700 From: Wei Wang To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org, ak@linux.intel.com, peterz@infradead.org, pbonzini@redhat.com Cc: kan.liang@intel.com, mingo@redhat.com, rkrcmar@redhat.com, like.xu@intel.com, wei.w.wang@intel.com, jannh@google.com, arei.gonglei@huawei.com, jmattson@google.com Subject: [PATCH v8 08/14] perf/core: set the event->owner before event_init Date: Tue, 6 Aug 2019 15:16:08 +0800 Message-Id: <1565075774-26671-9-git-send-email-wei.w.wang@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1565075774-26671-1-git-send-email-wei.w.wang@intel.com> References: <1565075774-26671-1-git-send-email-wei.w.wang@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Kernel and user events can be distinguished by checking event->owner. Some pmu driver implementation may need to know event->owner in event_init. For example, intel_pmu_setup_lbr_filter treats a kernel event with exclude_host set as an lbr event created for guest lbr emulation, which doesn't need a pmu counter. So move the event->owner assignment into perf_event_alloc to have it set before event_init is called. Signed-off-by: Wei Wang --- kernel/events/core.c | 12 +++++------- 1 file changed, 5 insertions(+), 7 deletions(-) diff --git a/kernel/events/core.c b/kernel/events/core.c index 0463c11..7663f85 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -10288,6 +10288,7 @@ static void account_event(struct perf_event *event) static struct perf_event * perf_event_alloc(struct perf_event_attr *attr, int cpu, struct task_struct *task, + struct task_struct *owner, struct perf_event *group_leader, struct perf_event *parent_event, perf_overflow_handler_t overflow_handler, @@ -10340,6 +10341,7 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu, event->group_leader = group_leader; event->pmu = NULL; event->oncpu = -1; + event->owner = owner; event->parent = parent_event; @@ -10891,7 +10893,7 @@ SYSCALL_DEFINE5(perf_event_open, if (flags & PERF_FLAG_PID_CGROUP) cgroup_fd = pid; - event = perf_event_alloc(&attr, cpu, task, group_leader, NULL, + event = perf_event_alloc(&attr, cpu, task, current, group_leader, NULL, NULL, NULL, cgroup_fd); if (IS_ERR(event)) { err = PTR_ERR(event); @@ -11153,8 +11155,6 @@ SYSCALL_DEFINE5(perf_event_open, perf_event__header_size(event); perf_event__id_header_size(event); - event->owner = current; - perf_install_in_context(ctx, event, event->cpu); perf_unpin_context(ctx); @@ -11231,16 +11231,13 @@ perf_event_create_kernel_counter(struct perf_event_attr *attr, int cpu, * Get the target context (task or percpu): */ - event = perf_event_alloc(attr, cpu, task, NULL, NULL, + event = perf_event_alloc(attr, cpu, task, TASK_TOMBSTONE, NULL, NULL, overflow_handler, context, -1); if (IS_ERR(event)) { err = PTR_ERR(event); goto err; } - /* Mark owner so we could distinguish it from user events. */ - event->owner = TASK_TOMBSTONE; - ctx = find_get_context(event->pmu, task, event); if (IS_ERR(ctx)) { err = PTR_ERR(ctx); @@ -11677,6 +11674,7 @@ inherit_event(struct perf_event *parent_event, child_event = perf_event_alloc(&parent_event->attr, parent_event->cpu, + parent_event->owner, child, group_leader, parent_event, NULL, NULL, -1); From patchwork Tue Aug 6 07:16:09 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Wang, Wei W" X-Patchwork-Id: 11078421 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id BD5A114DB for ; Tue, 6 Aug 2019 08:06:47 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id AD7EF2845E for ; Tue, 6 Aug 2019 08:06:47 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id A1041286BE; Tue, 6 Aug 2019 08:06:47 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id CE31F2845E for ; Tue, 6 Aug 2019 08:06:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1732469AbfHFIGl (ORCPT ); Tue, 6 Aug 2019 04:06:41 -0400 Received: from mga03.intel.com ([134.134.136.65]:5858 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1732358AbfHFIGE (ORCPT ); Tue, 6 Aug 2019 04:06:04 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by orsmga103.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 06 Aug 2019 01:00:57 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.64,352,1559545200"; d="scan'208";a="373337382" Received: from devel-ww.sh.intel.com ([10.239.48.128]) by fmsmga005.fm.intel.com with ESMTP; 06 Aug 2019 01:00:54 -0700 From: Wei Wang To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org, ak@linux.intel.com, peterz@infradead.org, pbonzini@redhat.com Cc: kan.liang@intel.com, mingo@redhat.com, rkrcmar@redhat.com, like.xu@intel.com, wei.w.wang@intel.com, jannh@google.com, arei.gonglei@huawei.com, jmattson@google.com Subject: [PATCH v8 09/14] KVM/x86/vPMU: APIs to create/free lbr perf event for a vcpu thread Date: Tue, 6 Aug 2019 15:16:09 +0800 Message-Id: <1565075774-26671-10-git-send-email-wei.w.wang@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1565075774-26671-1-git-send-email-wei.w.wang@intel.com> References: <1565075774-26671-1-git-send-email-wei.w.wang@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP VMX transition is much more frequent than vcpu switching, and saving/restoring tens of lbr msrs (e.g. 32 lbr stack entries) would add too much overhead to the frequent vmx transition, which is not necessary. So the vcpu's lbr state only gets saved/restored on the vcpu context switching. The main purposes of using the vcpu's lbr perf event are - follow the host perf scheduling rules to manage the vcpu's usage of lbr (e.g. a cpu pinned lbr event could reclaim lbr and thus stopping the vcpu's use); - have the host perf do context switching of the lbr state on the vcpu thread context switching. Please see the comments in intel_pmu_create_lbr_event for more details. To achieve the pure lbr emulation, the perf event is created only to claim for the lbr feature, and no perf counter is needed for it. The vcpu_lbr field is added to indicate to the host lbr driver that the lbr is currently assigned to a vcpu to use. The guest driver inside the vcpu has its own logic to use the lbr, thus the host side lbr driver doesn't need to enable and use the lbr feature in this case. Some design choice considerations: - Why using "is_kernel_event", instead of checking the PF_VCPU flag, to determine that it is a vcpu perf event for lbr emulation? This is because PF_VCPU is set right before vm-entry into the guest, and cleared after the guest vm-exits to the host. So that flag doesn't remain set when running the host code. Cc: Paolo Bonzini Cc: Andi Kleen Cc: Peter Zijlstra Co-developed-by: Like Xu Signed-off-by: Like Xu Signed-off-by: Wei Wang --- arch/x86/events/intel/lbr.c | 38 ++++++++++++++++++++++-- arch/x86/events/perf_event.h | 1 + arch/x86/include/asm/kvm_host.h | 1 + arch/x86/kvm/vmx/pmu_intel.c | 64 +++++++++++++++++++++++++++++++++++++++++ include/linux/perf_event.h | 7 +++++ kernel/events/core.c | 7 ----- 6 files changed, 108 insertions(+), 10 deletions(-) diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c index 9b2d05c..4f4bd18 100644 --- a/arch/x86/events/intel/lbr.c +++ b/arch/x86/events/intel/lbr.c @@ -462,6 +462,14 @@ void intel_pmu_lbr_add(struct perf_event *event) if (!x86_pmu.lbr_nr) return; + /* + * An lbr event without a counter indicates this is for the vcpu lbr + * emulation, so set the vcpu_lbr flag when the vcpu lbr event + * gets scheduled on the lbr here. + */ + if (is_no_counter_event(event)) + cpuc->vcpu_lbr = 1; + cpuc->br_sel = event->hw.branch_reg.reg; if (branch_user_callstack(cpuc->br_sel) && event->ctx->task_ctx_data) { @@ -509,6 +517,14 @@ void intel_pmu_lbr_del(struct perf_event *event) task_ctx->lbr_callstack_users--; } + /* + * An lbr event without a counter indicates this is for the vcpu lbr + * emulation, so clear the vcpu_lbr flag when the vcpu's lbr event + * gets scheduled out from the lbr. + */ + if (is_no_counter_event(event)) + cpuc->vcpu_lbr = 0; + if (x86_pmu.intel_cap.pebs_baseline && event->attr.precise_ip > 0) cpuc->lbr_pebs_users--; cpuc->lbr_users--; @@ -521,7 +537,12 @@ void intel_pmu_lbr_enable_all(bool pmi) { struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events); - if (cpuc->lbr_users) + /* + * The vcpu lbr emulation doesn't need host to enable lbr at this + * point, because the guest will set the enabling at a proper time + * itself. + */ + if (cpuc->lbr_users && !cpuc->vcpu_lbr) __intel_pmu_lbr_enable(pmi); } @@ -529,7 +550,11 @@ void intel_pmu_lbr_disable_all(void) { struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events); - if (cpuc->lbr_users) + /* + * Same as intel_pmu_lbr_enable_all, the guest is responsible for + * clearing the enabling itself. + */ + if (cpuc->lbr_users && !cpuc->vcpu_lbr) __intel_pmu_lbr_disable(); } @@ -668,8 +693,12 @@ void intel_pmu_lbr_read(void) * * This could be smarter and actually check the event, * but this simple approach seems to work for now. + * + * And no need to read the lbr msrs here if the vcpu lbr event + * is using it, as the guest will read them itself. */ - if (!cpuc->lbr_users || cpuc->lbr_users == cpuc->lbr_pebs_users) + if (!cpuc->lbr_users || cpuc->vcpu_lbr || + cpuc->lbr_users == cpuc->lbr_pebs_users) return; if (x86_pmu.intel_cap.lbr_format == LBR_FORMAT_32) @@ -802,6 +831,9 @@ int intel_pmu_setup_lbr_filter(struct perf_event *event) if (!x86_pmu.lbr_nr) return -EOPNOTSUPP; + if (event->attr.exclude_host && is_kernel_event(event)) + perf_event_set_no_counter(event); + /* * setup SW LBR filter */ diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h index 27e4d32..8b90a25 100644 --- a/arch/x86/events/perf_event.h +++ b/arch/x86/events/perf_event.h @@ -220,6 +220,7 @@ struct cpu_hw_events { /* * Intel LBR bits */ + u8 vcpu_lbr; int lbr_users; int lbr_pebs_users; struct perf_branch_stack lbr_stack; diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index d29dddd..692a0c2 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -474,6 +474,7 @@ struct kvm_pmu { struct kvm_pmc fixed_counters[INTEL_PMC_MAX_FIXED]; struct irq_work irq_work; u64 reprogram_pmi; + struct perf_event *lbr_event; }; struct kvm_pmu_ops; diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c index f0ad78f..89730f8 100644 --- a/arch/x86/kvm/vmx/pmu_intel.c +++ b/arch/x86/kvm/vmx/pmu_intel.c @@ -164,6 +164,70 @@ static bool intel_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr) return ret; } +int intel_pmu_create_lbr_event(struct kvm_vcpu *vcpu) +{ + struct kvm_pmu *pmu = vcpu_to_pmu(vcpu); + struct perf_event *event; + + /* + * The perf event is created for the following purposes: + * - have the host perf subsystem manage (prioritize) the guest's use + * of lbr with other host lbr events (if there are). The pinned field + * is set to true to make this event task pinned. If a cpu pinned + * lbr event reclaims lbr, the event->oncpu field will be set to -1. + * It will be checked at the moment before vm-entry, and the lbr + * feature will not be passed through to the guest for direct + * accesses if the vcpu's lbr event does not own the lbr feature + * anymore. This will cause the guest's lbr accesses to trap to the + * kvm's handler, where the accesses will be prevented in this case. + * - have the host perf subsystem help save/restore the guest lbr stack + * on vcpu switching. Since the host perf only performs this + * save/restore for the user callstack mode lbr event, we configure + * the sample_type and branch_sample_type fields accordingly to make + * this a user callstack mode lbr event. + * + * This perf event is used for the emulation of the lbr feature, which + * doesn't have a pmu counter. Accordingly, the related attr fields, + * such as config and sample period, don't need to be set here. + * exclude_host is set to tell the perf lbr driver that the event is for + * the guest lbr emulation. + */ + struct perf_event_attr attr = { + .type = PERF_TYPE_RAW, + .size = sizeof(attr), + .pinned = true, + .exclude_host = true, + .sample_type = PERF_SAMPLE_BRANCH_STACK, + .branch_sample_type = PERF_SAMPLE_BRANCH_CALL_STACK | + PERF_SAMPLE_BRANCH_USER, + }; + + if (pmu->lbr_event) + return 0; + + event = perf_event_create_kernel_counter(&attr, -1, current, NULL, + NULL); + if (IS_ERR(event)) { + pr_err("%s: failed %ld\n", __func__, PTR_ERR(event)); + return -ENOENT; + } + pmu->lbr_event = event; + + return 0; +} + +void intel_pmu_free_lbr_event(struct kvm_vcpu *vcpu) +{ + struct kvm_pmu *pmu = vcpu_to_pmu(vcpu); + struct perf_event *event = pmu->lbr_event; + + if (!event) + return; + + perf_event_release_kernel(event); + pmu->lbr_event = NULL; +} + static bool intel_pmu_lbr_enable(struct kvm_vcpu *vcpu) { struct kvm *kvm = vcpu->kvm; diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index 2cae06a..eb76cdc 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -1050,6 +1050,13 @@ static inline void perf_event_set_no_counter(struct perf_event *event) event->event_caps |= PERF_EV_CAP_NO_COUNTER; } +#define TASK_TOMBSTONE ((void *)-1L) + +static inline bool is_kernel_event(struct perf_event *event) +{ + return READ_ONCE(event->owner) == TASK_TOMBSTONE; +} + /* * Return 1 for a software event, 0 for a hardware event */ diff --git a/kernel/events/core.c b/kernel/events/core.c index 7663f85..9aa987a 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -164,13 +164,6 @@ static void perf_ctx_unlock(struct perf_cpu_context *cpuctx, raw_spin_unlock(&cpuctx->ctx.lock); } -#define TASK_TOMBSTONE ((void *)-1L) - -static bool is_kernel_event(struct perf_event *event) -{ - return READ_ONCE(event->owner) == TASK_TOMBSTONE; -} - /* * On task ctx scheduling... * From patchwork Tue Aug 6 07:16:10 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Wang, Wei W" X-Patchwork-Id: 11078409 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id F028614DB for ; Tue, 6 Aug 2019 08:06:17 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E0DE62845E for ; Tue, 6 Aug 2019 08:06:17 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id D524D286BE; Tue, 6 Aug 2019 08:06:17 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 75EFB2845E for ; Tue, 6 Aug 2019 08:06:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1732372AbfHFIGE (ORCPT ); Tue, 6 Aug 2019 04:06:04 -0400 Received: from mga03.intel.com ([134.134.136.65]:5861 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1732356AbfHFIGE (ORCPT ); Tue, 6 Aug 2019 04:06:04 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by orsmga103.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 06 Aug 2019 01:00:59 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.64,352,1559545200"; d="scan'208";a="373337398" Received: from devel-ww.sh.intel.com ([10.239.48.128]) by fmsmga005.fm.intel.com with ESMTP; 06 Aug 2019 01:00:57 -0700 From: Wei Wang To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org, ak@linux.intel.com, peterz@infradead.org, pbonzini@redhat.com Cc: kan.liang@intel.com, mingo@redhat.com, rkrcmar@redhat.com, like.xu@intel.com, wei.w.wang@intel.com, jannh@google.com, arei.gonglei@huawei.com, jmattson@google.com Subject: [PATCH v8 10/14] perf/x86/lbr: don't share lbr for the vcpu usage case Date: Tue, 6 Aug 2019 15:16:10 +0800 Message-Id: <1565075774-26671-11-git-send-email-wei.w.wang@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1565075774-26671-1-git-send-email-wei.w.wang@intel.com> References: <1565075774-26671-1-git-send-email-wei.w.wang@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Perf event scheduling lets multiple lbr events share the lbr if they use the same config for LBR_SELECT. For the vcpu case, the vcpu's lbr event created on the host deliberately sets the config to the user callstack mode to have the host support to save/restore the lbr state on vcpu context switching, and the config won't be written to the LBR_SELECT, as the LBR_SELECT is configured by the guest, which might not be the same as the user callstack mode. So don't allow the vcpu's lbr event to share lbr with other host lbr events. Signed-off-by: Wei Wang --- arch/x86/events/intel/lbr.c | 27 +++++++++++++++++++++++++++ 1 file changed, 27 insertions(+) diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c index 4f4bd18..a0f3686 100644 --- a/arch/x86/events/intel/lbr.c +++ b/arch/x86/events/intel/lbr.c @@ -45,6 +45,12 @@ static const enum { #define LBR_CALL_STACK_BIT 9 /* enable call stack */ /* + * Set this hardware reserved bit if the lbr perf event is for the vcpu lbr + * emulation. This makes the reg->config different from other regular lbr + * events' config, so that they will not share the lbr feature. + */ +#define LBR_VCPU_BIT 62 +/* * Following bit only exists in Linux; we mask it out before writing it to * the actual MSR. But it helps the constraint perf code to understand * that this is a separate configuration. @@ -62,6 +68,7 @@ static const enum { #define LBR_FAR (1 << LBR_FAR_BIT) #define LBR_CALL_STACK (1 << LBR_CALL_STACK_BIT) #define LBR_NO_INFO (1ULL << LBR_NO_INFO_BIT) +#define LBR_VCPU (1ULL << LBR_VCPU_BIT) #define LBR_PLM (LBR_KERNEL | LBR_USER) @@ -818,6 +825,26 @@ static int intel_pmu_setup_hw_lbr_filter(struct perf_event *event) (x86_pmu.intel_cap.lbr_format == LBR_FORMAT_INFO)) reg->config |= LBR_NO_INFO; + /* + * An lbr perf event without a counter indicates this is for the vcpu + * lbr emulation. The vcpu lbr emulation does not allow the lbr + * feature to be shared with other lbr events on the host, because the + * LBR_SELECT msr is configured by the guest itself. The reg->config + * is deliberately configured to be user call stack mode via the + * related attr fileds to get the host perf's help to save/restore the + * lbr state on vcpu context switching. It doesn't represent what + * LBR_SELECT will be configured. + * + * Set the reserved LBR_VCPU bit for the vcpu usage case, so that the + * vcpu's lbr perf event will not share the lbr feature with other perf + * events. (see __intel_shared_reg_get_constraints, failing to share + * makes it return the emptyconstraint, which finally makes + * x86_schedule_events fail to schedule the lower priority lbr event on + * the lbr feature). + */ + if (is_no_counter_event(event)) + reg->config |= LBR_VCPU; + return 0; } From patchwork Tue Aug 6 07:16:11 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Wang, Wei W" X-Patchwork-Id: 11078411 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id F03CC14E5 for ; Tue, 6 Aug 2019 08:06:19 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E09F22845E for ; Tue, 6 Aug 2019 08:06:19 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id D40F9286BE; Tue, 6 Aug 2019 08:06:19 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 7261C2845E for ; Tue, 6 Aug 2019 08:06:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1732419AbfHFIGR (ORCPT ); Tue, 6 Aug 2019 04:06:17 -0400 Received: from mga03.intel.com ([134.134.136.65]:5868 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1731735AbfHFIGQ (ORCPT ); Tue, 6 Aug 2019 04:06:16 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by orsmga103.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 06 Aug 2019 01:01:01 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.64,352,1559545200"; d="scan'208";a="373337421" Received: from devel-ww.sh.intel.com ([10.239.48.128]) by fmsmga005.fm.intel.com with ESMTP; 06 Aug 2019 01:00:59 -0700 From: Wei Wang To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org, ak@linux.intel.com, peterz@infradead.org, pbonzini@redhat.com Cc: kan.liang@intel.com, mingo@redhat.com, rkrcmar@redhat.com, like.xu@intel.com, wei.w.wang@intel.com, jannh@google.com, arei.gonglei@huawei.com, jmattson@google.com Subject: [PATCH v8 11/14] perf/x86: save/restore LBR_SELECT on vcpu switching Date: Tue, 6 Aug 2019 15:16:11 +0800 Message-Id: <1565075774-26671-12-git-send-email-wei.w.wang@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1565075774-26671-1-git-send-email-wei.w.wang@intel.com> References: <1565075774-26671-1-git-send-email-wei.w.wang@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP The regular host lbr perf event doesn't save/restore the LBR_SELECT msr during a thread context switching, because the LBR_SELECT value is generated from attr.branch_sample_type and already stored in event->hw.branch_reg (please see intel_pmu_setup_hw_filter), which doesn't get lost during thread context switching. The attr.branch_sample_type for the vcpu lbr event is deliberately set to the user call stack mode to enable the perf core to save/restore the lbr related msrs on vcpu switching. So the attr.branch_sample_type essentially doesn't represent what the guest pmu driver will write to LBR_SELECT. Meanwhile, the host lbr driver doesn't configure the lbr msrs, including the LBR_SELECT msr, for the vcpu thread case, as the pmu driver inside the vcpu will do that. So for the vcpu case, add the LBR_SELECT save/restore to ensure what the guest writes to the LBR_SELECT msr doesn't get lost during the vcpu context switching. Cc: Peter Zijlstra Cc: Andi Kleen Cc: Kan Liang Signed-off-by: Wei Wang --- arch/x86/events/intel/lbr.c | 7 +++++++ arch/x86/events/perf_event.h | 1 + 2 files changed, 8 insertions(+) diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c index a0f3686..236f8352 100644 --- a/arch/x86/events/intel/lbr.c +++ b/arch/x86/events/intel/lbr.c @@ -390,6 +390,9 @@ static void __intel_pmu_lbr_restore(struct x86_perf_task_context *task_ctx) wrmsrl(x86_pmu.lbr_tos, tos); task_ctx->lbr_stack_state = LBR_NONE; + + if (cpuc->vcpu_lbr) + wrmsrl(MSR_LBR_SELECT, task_ctx->lbr_sel); } static void __intel_pmu_lbr_save(struct x86_perf_task_context *task_ctx) @@ -416,6 +419,10 @@ static void __intel_pmu_lbr_save(struct x86_perf_task_context *task_ctx) if (x86_pmu.intel_cap.lbr_format == LBR_FORMAT_INFO) rdmsrl(MSR_LBR_INFO_0 + lbr_idx, task_ctx->lbr_info[i]); } + + if (cpuc->vcpu_lbr) + rdmsrl(MSR_LBR_SELECT, task_ctx->lbr_sel); + task_ctx->valid_lbrs = i; task_ctx->tos = tos; task_ctx->lbr_stack_state = LBR_VALID; diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h index 8b90a25..0b2f660 100644 --- a/arch/x86/events/perf_event.h +++ b/arch/x86/events/perf_event.h @@ -699,6 +699,7 @@ struct x86_perf_task_context { u64 lbr_from[MAX_LBR_ENTRIES]; u64 lbr_to[MAX_LBR_ENTRIES]; u64 lbr_info[MAX_LBR_ENTRIES]; + u64 lbr_sel; int tos; int valid_lbrs; int lbr_callstack_users; From patchwork Tue Aug 6 07:16:12 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Wang, Wei W" X-Patchwork-Id: 11078415 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2167814DB for ; Tue, 6 Aug 2019 08:06:33 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 10E482845E for ; Tue, 6 Aug 2019 08:06:33 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 03D8728496; Tue, 6 Aug 2019 08:06:33 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 0E25E286BE for ; Tue, 6 Aug 2019 08:06:32 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1732435AbfHFIGU (ORCPT ); Tue, 6 Aug 2019 04:06:20 -0400 Received: from mga03.intel.com ([134.134.136.65]:5868 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1732423AbfHFIGT (ORCPT ); Tue, 6 Aug 2019 04:06:19 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by orsmga103.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 06 Aug 2019 01:01:04 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.64,352,1559545200"; d="scan'208";a="373337443" Received: from devel-ww.sh.intel.com ([10.239.48.128]) by fmsmga005.fm.intel.com with ESMTP; 06 Aug 2019 01:01:01 -0700 From: Wei Wang To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org, ak@linux.intel.com, peterz@infradead.org, pbonzini@redhat.com Cc: kan.liang@intel.com, mingo@redhat.com, rkrcmar@redhat.com, like.xu@intel.com, wei.w.wang@intel.com, jannh@google.com, arei.gonglei@huawei.com, jmattson@google.com Subject: [PATCH v8 12/14] KVM/x86/lbr: lbr emulation Date: Tue, 6 Aug 2019 15:16:12 +0800 Message-Id: <1565075774-26671-13-git-send-email-wei.w.wang@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1565075774-26671-1-git-send-email-wei.w.wang@intel.com> References: <1565075774-26671-1-git-send-email-wei.w.wang@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP In general, the lbr emulation works in this way: Guest first access (since vcpu scheduled in) to the lbr related msr gets trapped to kvm, and the handler will do the following things: - create an lbr perf event to have the vcpu get the lbr feature from host perf following the perf scheduling rules; - pass the lbr related msrs through to the guest for direct accesses without vm-exits till the end of this vcpu time slice. The guest first access is made interceptible so that the kvm side lbr emulation can always get if the lbr feature has been used during the vcpu time slice. If the lbr feature isn't used during a time slice, the lbr event created for the vcpu will be freed. Some considerations: - Why not free the vcpu lbr event when the guest clears the lbr enable bit? Guest may frequently clear the lbr enable bit (in the debugctl msr) during its use of the lbr feature, e.g. in PMI handler. This will cause the kvm emulation to frequently alloc/free the vcpu lbr event, which is unnecessary. Technically, we want to free the vcpu lbr event when the guest doesn't need to run lbr anymore. Heuristically, we free the vcpu lbr event when the guest doesn't touch any of the lbr msrs during an entire vcpu time slice. Cc: Paolo Bonzini Cc: Andi Kleen Cc: Peter Zijlstra Suggested-by: Andi Kleen Signed-off-by: Wei Wang --- arch/x86/include/asm/kvm_host.h | 2 + arch/x86/kvm/pmu.c | 6 ++ arch/x86/kvm/pmu.h | 2 + arch/x86/kvm/vmx/pmu_intel.c | 206 ++++++++++++++++++++++++++++++++++++++++ arch/x86/kvm/vmx/vmx.c | 4 +- arch/x86/kvm/vmx/vmx.h | 2 + arch/x86/kvm/x86.c | 2 + 7 files changed, 222 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 692a0c2..ecd22b5 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -469,6 +469,8 @@ struct kvm_pmu { u64 global_ctrl_mask; u64 global_ovf_ctrl_mask; u64 reserved_bits; + /* Indicate if the lbr msrs were accessed in this vcpu time slice */ + bool lbr_used; u8 version; struct kvm_pmc gp_counters[INTEL_PMC_MAX_GENERIC]; struct kvm_pmc fixed_counters[INTEL_PMC_MAX_FIXED]; diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c index 1a291ed..afad092 100644 --- a/arch/x86/kvm/pmu.c +++ b/arch/x86/kvm/pmu.c @@ -360,6 +360,12 @@ int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) return kvm_x86_ops->pmu_ops->set_msr(vcpu, msr_info); } +void kvm_pmu_sched_in(struct kvm_vcpu *vcpu, int cpu) +{ + if (kvm_x86_ops->pmu_ops->sched_in) + kvm_x86_ops->pmu_ops->sched_in(vcpu, cpu); +} + /* refresh PMU settings. This function generally is called when underlying * settings are changed (such as changes of PMU CPUID by guest VMs), which * should rarely happen. diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h index f61024e..f875721 100644 --- a/arch/x86/kvm/pmu.h +++ b/arch/x86/kvm/pmu.h @@ -32,6 +32,7 @@ struct kvm_pmu_ops { bool (*lbr_enable)(struct kvm_vcpu *vcpu); int (*get_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr_info); int (*set_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr_info); + void (*sched_in)(struct kvm_vcpu *vcpu, int cpu); void (*refresh)(struct kvm_vcpu *vcpu); void (*init)(struct kvm_vcpu *vcpu); void (*reset)(struct kvm_vcpu *vcpu); @@ -116,6 +117,7 @@ int kvm_pmu_is_valid_msr_idx(struct kvm_vcpu *vcpu, unsigned idx); bool kvm_pmu_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr); int kvm_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info); int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info); +void kvm_pmu_sched_in(struct kvm_vcpu *vcpu, int cpu); void kvm_pmu_refresh(struct kvm_vcpu *vcpu); void kvm_pmu_reset(struct kvm_vcpu *vcpu); void kvm_pmu_init(struct kvm_vcpu *vcpu); diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c index 89730f8..5580f1a 100644 --- a/arch/x86/kvm/vmx/pmu_intel.c +++ b/arch/x86/kvm/vmx/pmu_intel.c @@ -17,6 +17,7 @@ #include "cpuid.h" #include "lapic.h" #include "pmu.h" +#include "vmx.h" static struct kvm_event_hw_type_mapping intel_arch_events[] = { /* Index must match CPUID 0x0A.EBX bit vector */ @@ -141,6 +142,19 @@ static struct kvm_pmc *intel_msr_idx_to_pmc(struct kvm_vcpu *vcpu, return &counters[idx]; } +/* Return true if it is one of the lbr related msrs. */ +static inline bool is_lbr_msr(struct kvm_vcpu *vcpu, u32 index) +{ + struct x86_perf_lbr_stack *stack = &vcpu->kvm->arch.lbr_stack; + int nr = stack->nr; + + return !!(index == MSR_LBR_SELECT || + index == stack->tos || + (index >= stack->from && index < stack->from + nr) || + (index >= stack->to && index < stack->to + nr) || + (index >= stack->info && index < stack->info)); +} + static bool intel_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr) { struct kvm_pmu *pmu = vcpu_to_pmu(vcpu); @@ -152,9 +166,12 @@ static bool intel_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr) case MSR_CORE_PERF_GLOBAL_CTRL: case MSR_CORE_PERF_GLOBAL_OVF_CTRL: case MSR_IA32_PERF_CAPABILITIES: + case MSR_IA32_DEBUGCTLMSR: ret = pmu->version > 1; break; default: + if (is_lbr_msr(vcpu, msr)) + return pmu->version > 1; ret = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0) || get_gp_pmc(pmu, msr, MSR_P6_EVNTSEL0) || get_fixed_pmc(pmu, msr); @@ -362,6 +379,163 @@ static bool intel_pmu_lbr_enable(struct kvm_vcpu *vcpu) return true; } +/* + * "set = 1" to make the lbr msrs interceptible, otherwise pass the lbr msrs + * through to the guest. + */ +static void intel_pmu_set_intercept_for_lbr_msrs(struct kvm_vcpu *vcpu, + bool set) +{ + unsigned long *msr_bitmap = to_vmx(vcpu)->vmcs01.msr_bitmap; + struct x86_perf_lbr_stack *stack = &vcpu->kvm->arch.lbr_stack; + int nr = stack->nr; + int i; + + vmx_set_intercept_for_msr(msr_bitmap, MSR_LBR_SELECT, + MSR_TYPE_RW, set); + vmx_set_intercept_for_msr(msr_bitmap, stack->tos, + MSR_TYPE_RW, set); + for (i = 0; i < nr; i++) { + vmx_set_intercept_for_msr(msr_bitmap, stack->from + i, + MSR_TYPE_RW, set); + vmx_set_intercept_for_msr(msr_bitmap, stack->to + i, + MSR_TYPE_RW, set); + if (stack->info) + vmx_set_intercept_for_msr(msr_bitmap, stack->info + i, + MSR_TYPE_RW, set); + } +} + +static bool intel_pmu_set_lbr_msr(struct kvm_vcpu *vcpu, + struct msr_data *msr_info) +{ + struct kvm_pmu *pmu = vcpu_to_pmu(vcpu); + u32 index = msr_info->index; + u64 data = msr_info->data; + bool ret = false; + + /* The lbr event should have been allocated when reaching here. */ + if (WARN_ON(!pmu->lbr_event)) + return ret; + + /* + * Host perf could reclaim the lbr feature via ipi calls, and this can + * be detected via lbr_event->oncpu being set to -1. To ensure the + * writes to the lbr msrs don't happen after the lbr feature has been + * reclaimed by the host, the interrupt is disabled before performing + * the writes. + */ + local_irq_disable(); + if (pmu->lbr_event->oncpu == -1) + goto out; + + switch (index) { + case MSR_IA32_DEBUGCTLMSR: + ret = true; + /* + * Currently, only FREEZE_LBRS_ON_PMI and DEBUGCTLMSR_LBR are + * supported. + */ + data &= (DEBUGCTLMSR_FREEZE_LBRS_ON_PMI | DEBUGCTLMSR_LBR); + vmcs_write64(GUEST_IA32_DEBUGCTL, data); + break; + default: + if (is_lbr_msr(vcpu, index)) { + ret = true; + wrmsrl(index, data); + } + } + +out: + local_irq_enable(); + return ret; +} + +static bool intel_pmu_get_lbr_msr(struct kvm_vcpu *vcpu, + struct msr_data *msr_info) +{ + struct kvm_pmu *pmu = vcpu_to_pmu(vcpu); + u32 index = msr_info->index; + bool ret = false; + + /* The lbr event should have been allocated when reaching here. */ + if (WARN_ON(!pmu->lbr_event)) + return ret; + + /* + * Disable irq to ensure the lbr feature doesn't get reclaimed by the + * host at the time the value is read from the msr, this avoids the + * host lbr value to be leaked to the guest. If lbr has been reclaimed, + * return 0 on guest reads. + */ + local_irq_disable(); + if (pmu->lbr_event->oncpu == -1) { + msr_info->data = 0; + goto out; + } + + switch (index) { + case MSR_IA32_DEBUGCTLMSR: + msr_info->data = vmcs_read64(GUEST_IA32_DEBUGCTL); + ret = true; + break; + default: + if (is_lbr_msr(vcpu, index)) { + ret = true; + rdmsrl(index, msr_info->data); + } + } + +out: + local_irq_enable(); + return ret; +} + +static bool intel_pmu_access_lbr_msr(struct kvm_vcpu *vcpu, + struct msr_data *msr_info, + bool set) +{ + u32 index = msr_info->index; + bool ret = false; + + /* Return false if the msr access has nothing to do with lbr. */ + if ((index != MSR_IA32_DEBUGCTLMSR) && !is_lbr_msr(vcpu, index)) + return false; + + /* + * Guest initiated access is allowed when userspace has explicitly + * enabled lbr. + */ + if (!msr_info->host_initiated && !vcpu->kvm->arch.lbr_in_guest) + return false; + + if (intel_pmu_create_lbr_event(vcpu)) + return false; + + if (set) + ret = intel_pmu_set_lbr_msr(vcpu, msr_info); + else + ret = intel_pmu_get_lbr_msr(vcpu, msr_info); + + /* + * If this is the guest's first access to an lbr related msr, pass + * the lbr related msrs through to the guest for direct accesses. + * It is possible in theory that a cpu pinned lbr event could take + * over the lbr feature the moment after the pass-through is set + * up via intel_pmu_set_intercept_for_lbr_msrs below. Don't worry, + * because it will be double checked right before vm-entry to + * ensure the lbr msrs are pass-throughed with the lbr being owned + * by this vcpu's lbr event (see vcpu_enter_guest for more + * details). + */ + if (ret && !vcpu->arch.pmu.lbr_used) { + vcpu->arch.pmu.lbr_used = true; + intel_pmu_set_intercept_for_lbr_msrs(vcpu, false); + } + + return ret; +} + static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) { struct kvm_pmu *pmu = vcpu_to_pmu(vcpu); @@ -408,6 +582,8 @@ static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) } else if ((pmc = get_gp_pmc(pmu, msr, MSR_P6_EVNTSEL0))) { msr_info->data = pmc->eventsel; return 0; + } else if (intel_pmu_access_lbr_msr(vcpu, msr_info, false)) { + return 0; } } @@ -471,12 +647,39 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) reprogram_gp_counter(pmc, data); return 0; } + } else if (intel_pmu_access_lbr_msr(vcpu, msr_info, true)) { + return 0; } } return 1; } +static void intel_pmu_sched_in(struct kvm_vcpu *vcpu, int cpu) +{ + struct kvm_pmu *pmu = vcpu_to_pmu(vcpu); + u64 guest_debugctl; + + /* + * The lbr feature was used in last vcpu time slice, so set the lbr + * msrs interceptible so that we can capture whether it's used again + * in this time slice. + */ + if (pmu->lbr_used) { + pmu->lbr_used = false; + intel_pmu_set_intercept_for_lbr_msrs(vcpu, true); + } else if (pmu->lbr_event) { + /* + * The lbr feature wasn't used during last vcpu time slice + * and the vcpu lbr event hasn't been freed, so it's time to + * free the lbr event. + */ + guest_debugctl = vmcs_read64(GUEST_IA32_DEBUGCTL); + if (!(guest_debugctl & DEBUGCTLMSR_LBR)) + intel_pmu_free_lbr_event(vcpu); + } +} + static void intel_pmu_refresh(struct kvm_vcpu *vcpu) { struct kvm_pmu *pmu = vcpu_to_pmu(vcpu); @@ -574,6 +777,8 @@ static void intel_pmu_reset(struct kvm_vcpu *vcpu) pmu->fixed_ctr_ctrl = pmu->global_ctrl = pmu->global_status = pmu->global_ovf_ctrl = 0; + + intel_pmu_free_lbr_event(vcpu); } struct kvm_pmu_ops intel_pmu_ops = { @@ -587,6 +792,7 @@ struct kvm_pmu_ops intel_pmu_ops = { .lbr_enable = intel_pmu_lbr_enable, .get_msr = intel_pmu_get_msr, .set_msr = intel_pmu_set_msr, + .sched_in = intel_pmu_sched_in, .refresh = intel_pmu_refresh, .init = intel_pmu_init, .reset = intel_pmu_reset, diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 074385c..af1a1f9 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -3557,8 +3557,8 @@ static __always_inline void vmx_enable_intercept_for_msr(unsigned long *msr_bitm } } -static __always_inline void vmx_set_intercept_for_msr(unsigned long *msr_bitmap, - u32 msr, int type, bool value) +void vmx_set_intercept_for_msr(unsigned long *msr_bitmap, u32 msr, int type, + bool value) { if (value) vmx_enable_intercept_for_msr(msr_bitmap, msr, type); diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h index 82d0bc3..85acaf0 100644 --- a/arch/x86/kvm/vmx/vmx.h +++ b/arch/x86/kvm/vmx/vmx.h @@ -328,6 +328,8 @@ void vmx_update_msr_bitmap(struct kvm_vcpu *vcpu); bool vmx_get_nmi_mask(struct kvm_vcpu *vcpu); void vmx_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked); void vmx_set_virtual_apic_mode(struct kvm_vcpu *vcpu); +void vmx_set_intercept_for_msr(unsigned long *msr_bitmap, u32 msr, int type, + bool value); struct shared_msr_entry *find_msr_entry(struct vcpu_vmx *vmx, u32 msr); void pt_update_intercept_for_msr(struct vcpu_vmx *vmx); void vmx_update_host_rsp(struct vcpu_vmx *vmx, unsigned long host_rsp); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 7a7cd93..b76f019 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -9294,6 +9294,8 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) { vcpu->arch.l1tf_flush_l1d = true; + + kvm_pmu_sched_in(vcpu, cpu); kvm_x86_ops->sched_in(vcpu, cpu); } From patchwork Tue Aug 6 07:16:13 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Wang, Wei W" X-Patchwork-Id: 11078417 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3D24214E5 for ; Tue, 6 Aug 2019 08:06:38 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 27988223A6 for ; Tue, 6 Aug 2019 08:06:38 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 1BE6228496; Tue, 6 Aug 2019 08:06:38 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 896B32845E for ; Tue, 6 Aug 2019 08:06:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1732282AbfHFIGd (ORCPT ); Tue, 6 Aug 2019 04:06:33 -0400 Received: from mga03.intel.com ([134.134.136.65]:5872 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1732426AbfHFIGS (ORCPT ); Tue, 6 Aug 2019 04:06:18 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by orsmga103.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 06 Aug 2019 01:01:07 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.64,352,1559545200"; d="scan'208";a="373337498" Received: from devel-ww.sh.intel.com ([10.239.48.128]) by fmsmga005.fm.intel.com with ESMTP; 06 Aug 2019 01:01:04 -0700 From: Wei Wang To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org, ak@linux.intel.com, peterz@infradead.org, pbonzini@redhat.com Cc: kan.liang@intel.com, mingo@redhat.com, rkrcmar@redhat.com, like.xu@intel.com, wei.w.wang@intel.com, jannh@google.com, arei.gonglei@huawei.com, jmattson@google.com Subject: [PATCH v8 13/14] KVM/x86/vPMU: check the lbr feature before entering guest Date: Tue, 6 Aug 2019 15:16:13 +0800 Message-Id: <1565075774-26671-14-git-send-email-wei.w.wang@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1565075774-26671-1-git-send-email-wei.w.wang@intel.com> References: <1565075774-26671-1-git-send-email-wei.w.wang@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP The guest can access the lbr related msrs only when the vcpu's lbr event has been assigned the lbr feature. A cpu pinned lbr event (though no such event usages in the current upstream kernel) could reclaim the lbr feature from the vcpu's lbr event (task pinned) via ipi calls. If the cpu is running in the non-root mode, this will cause the cpu to vm-exit to handle the host ipi and then vm-entry back to the guest. So on vm-entry (where interrupt has been disabled), we double confirm that the vcpu's lbr event is still assigned the lbr feature via checking event->oncpu. The pass-through of the lbr related msrs will be cancelled if the lbr is reclaimed, and the following guest accesses to the lbr related msrs will vm-exit to the related msr emulation handler in kvm, which will prevent the accesses. Signed-off-by: Wei Wang --- arch/x86/kvm/pmu.c | 6 ++++++ arch/x86/kvm/pmu.h | 3 +++ arch/x86/kvm/vmx/pmu_intel.c | 35 +++++++++++++++++++++++++++++++++++ arch/x86/kvm/x86.c | 13 +++++++++++++ 4 files changed, 57 insertions(+) diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c index afad092..ed10a57 100644 --- a/arch/x86/kvm/pmu.c +++ b/arch/x86/kvm/pmu.c @@ -339,6 +339,12 @@ bool kvm_pmu_lbr_enable(struct kvm_vcpu *vcpu) return false; } +void kvm_pmu_enabled_feature_confirm(struct kvm_vcpu *vcpu) +{ + if (kvm_x86_ops->pmu_ops->enabled_feature_confirm) + kvm_x86_ops->pmu_ops->enabled_feature_confirm(vcpu); +} + void kvm_pmu_deliver_pmi(struct kvm_vcpu *vcpu) { if (lapic_in_kernel(vcpu)) diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h index f875721..7467907 100644 --- a/arch/x86/kvm/pmu.h +++ b/arch/x86/kvm/pmu.h @@ -30,6 +30,7 @@ struct kvm_pmu_ops { int (*is_valid_msr_idx)(struct kvm_vcpu *vcpu, unsigned idx); bool (*is_valid_msr)(struct kvm_vcpu *vcpu, u32 msr); bool (*lbr_enable)(struct kvm_vcpu *vcpu); + void (*enabled_feature_confirm)(struct kvm_vcpu *vcpu); int (*get_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr_info); int (*set_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr_info); void (*sched_in)(struct kvm_vcpu *vcpu, int cpu); @@ -126,6 +127,8 @@ int kvm_vm_ioctl_set_pmu_event_filter(struct kvm *kvm, void __user *argp); bool is_vmware_backdoor_pmc(u32 pmc_idx); +void kvm_pmu_enabled_feature_confirm(struct kvm_vcpu *vcpu); + extern struct kvm_pmu_ops intel_pmu_ops; extern struct kvm_pmu_ops amd_pmu_ops; #endif /* __KVM_X86_PMU_H */ diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c index 5580f1a..421051aa 100644 --- a/arch/x86/kvm/vmx/pmu_intel.c +++ b/arch/x86/kvm/vmx/pmu_intel.c @@ -781,6 +781,40 @@ static void intel_pmu_reset(struct kvm_vcpu *vcpu) intel_pmu_free_lbr_event(vcpu); } +void intel_pmu_lbr_confirm(struct kvm_vcpu *vcpu) +{ + struct kvm_pmu *pmu = vcpu_to_pmu(vcpu); + + /* + * Either lbr_event being NULL or lbr_used being false indicates that + * the lbr msrs haven't been passed through to the guest, so no need + * to cancel passthrough. + */ + if (!pmu->lbr_event || !pmu->lbr_used) + return; + + /* + * The lbr feature gets reclaimed via IPI calls, so checking of + * lbr_event->oncpu needs to be in an atomic context. Just confirm + * that irq has been disabled already. + */ + lockdep_assert_irqs_disabled(); + + /* + * Cancel the pass-through of the lbr msrs if lbr has been reclaimed + * by the host perf. + */ + if (pmu->lbr_event->oncpu != -1) { + pmu->lbr_used = false; + intel_pmu_set_intercept_for_lbr_msrs(vcpu, true); + } +} + +void intel_pmu_enabled_feature_confirm(struct kvm_vcpu *vcpu) +{ + intel_pmu_lbr_confirm(vcpu); +} + struct kvm_pmu_ops intel_pmu_ops = { .find_arch_event = intel_find_arch_event, .find_fixed_event = intel_find_fixed_event, @@ -790,6 +824,7 @@ struct kvm_pmu_ops intel_pmu_ops = { .is_valid_msr_idx = intel_is_valid_msr_idx, .is_valid_msr = intel_is_valid_msr, .lbr_enable = intel_pmu_lbr_enable, + .enabled_feature_confirm = intel_pmu_enabled_feature_confirm, .get_msr = intel_pmu_get_msr, .set_msr = intel_pmu_set_msr, .sched_in = intel_pmu_sched_in, diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index b76f019..efaf0e8 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -7985,6 +7985,19 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) smp_mb__after_srcu_read_unlock(); /* + * Higher priority host perf events (e.g. cpu pinned) could reclaim the + * pmu resources (e.g. lbr) that were assigned to the vcpu. This is + * usually done via ipi calls (see perf_install_in_context for + * details). + * + * Before entering the non-root mode (with irq disabled here), double + * confirm that the pmu features enabled to the guest are not reclaimed + * by higher priority host events. Otherwise, disallow vcpu's access to + * the reclaimed features. + */ + kvm_pmu_enabled_feature_confirm(vcpu); + + /* * This handles the case where a posted interrupt was * notified with kvm_vcpu_kick. */ From patchwork Tue Aug 6 07:16:14 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Wang, Wei W" X-Patchwork-Id: 11078413 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3DCB214E5 for ; Tue, 6 Aug 2019 08:06:32 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 2CF482845E for ; Tue, 6 Aug 2019 08:06:32 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 1DF532872A; Tue, 6 Aug 2019 08:06:32 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id AFFE92845E for ; Tue, 6 Aug 2019 08:06:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1732445AbfHFIGV (ORCPT ); Tue, 6 Aug 2019 04:06:21 -0400 Received: from mga03.intel.com ([134.134.136.65]:5872 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1732429AbfHFIGT (ORCPT ); Tue, 6 Aug 2019 04:06:19 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by orsmga103.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 06 Aug 2019 01:01:09 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.64,352,1559545200"; d="scan'208";a="373337527" Received: from devel-ww.sh.intel.com ([10.239.48.128]) by fmsmga005.fm.intel.com with ESMTP; 06 Aug 2019 01:01:07 -0700 From: Wei Wang To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org, ak@linux.intel.com, peterz@infradead.org, pbonzini@redhat.com Cc: kan.liang@intel.com, mingo@redhat.com, rkrcmar@redhat.com, like.xu@intel.com, wei.w.wang@intel.com, jannh@google.com, arei.gonglei@huawei.com, jmattson@google.com Subject: [PATCH v8 14/14] KVM/x86: remove the common handling of the debugctl msr Date: Tue, 6 Aug 2019 15:16:14 +0800 Message-Id: <1565075774-26671-15-git-send-email-wei.w.wang@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1565075774-26671-1-git-send-email-wei.w.wang@intel.com> References: <1565075774-26671-1-git-send-email-wei.w.wang@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP The debugctl msr is not completely identical on AMD and Intel CPUs, for example, FREEZE_LBRS_ON_PMI is supported by Intel CPUs only. Now, this msr is handled separatedly in svm.c and intel_pmu.c. So remove the common debugctl msr handling code in kvm_get/set_msr_common. Cc: Paolo Bonzini Cc: Andi Kleen Cc: Peter Zijlstra Signed-off-by: Wei Wang --- arch/x86/kvm/x86.c | 13 ------------- 1 file changed, 13 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index efaf0e8..3839ebd 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2528,18 +2528,6 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info) return 1; } break; - case MSR_IA32_DEBUGCTLMSR: - if (!data) { - /* We support the non-activated case already */ - break; - } else if (data & ~(DEBUGCTLMSR_LBR | DEBUGCTLMSR_BTF)) { - /* Values other than LBR and BTF are vendor-specific, - thus reserved and should throw a #GP */ - return 1; - } - vcpu_unimpl(vcpu, "%s: MSR_IA32_DEBUGCTLMSR 0x%llx, nop\n", - __func__, data); - break; case 0x200 ... 0x2ff: return kvm_mtrr_set_msr(vcpu, msr, data); case MSR_IA32_APICBASE: @@ -2800,7 +2788,6 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info) switch (msr_info->index) { case MSR_IA32_PLATFORM_ID: case MSR_IA32_EBL_CR_POWERON: - case MSR_IA32_DEBUGCTLMSR: case MSR_IA32_LASTBRANCHFROMIP: case MSR_IA32_LASTBRANCHTOIP: case MSR_IA32_LASTINTFROMIP: