From patchwork Tue Aug 6 07:16:00 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Wang, Wei W"
X-Patchwork-Id: 11078397
From: Wei Wang
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org, ak@linux.intel.com, peterz@infradead.org,
 pbonzini@redhat.com
Cc: kan.liang@intel.com, mingo@redhat.com, rkrcmar@redhat.com, like.xu@intel.com, wei.w.wang@intel.com, jannh@google.com, arei.gonglei@huawei.com, jmattson@google.com
Subject: [PATCH v8 00/14] Guest LBR Enabling
Date: Tue, 6 Aug 2019 15:16:00 +0800
Message-Id: <1565075774-26671-1-git-send-email-wei.w.wang@intel.com>
X-Mailer: git-send-email 2.7.4
Sender: kvm-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org

Last Branch Recording (LBR) is a performance monitoring unit (PMU)
feature on Intel CPUs that captures branch related info. This patch
series enables this feature for KVM guests.

Userspace can expose the LBR feature to a guest by setting the enabling
param in KVM_CAP_X86_GUEST_LBR (patch 3).

About the lbr emulation method:
When the vcpu is scheduled in, the lbr related msrs are intercepted.
This makes the guest's first access to an lbr related msr always vm-exit
to kvm, so that kvm knows whether the lbr feature is used during the
vcpu time slice. The kvm lbr msr handler does the following things:
- create an lbr perf event (task pinned) for the vcpu thread.
  The perf event mainly serves 2 purposes:
  -- follow the host perf scheduling rules to manage the vcpu's usage
     of lbr (e.g. a cpu pinned lbr event could reclaim lbr and thus
     stop the vcpu's use);
  -- have the host perf context switch the lbr state when the vcpu
     thread is switched.
- pass the lbr related msrs through to the guest.
  This lets the guest's subsequent accesses to the lbr related msrs
  proceed without vm-exit, as long as the vcpu's lbr event owns the lbr
  feature.

A cpu pinned lbr event on the host could come and take over the lbr
feature via IPI calls. In this case, the pass-through is cancelled
(patch 13), and the guest's subsequent accesses to the lbr msrs vm-exit
to kvm, where the handler forbids them.
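
The flow described above can be sketched as a small state model. This is
only an illustration under stated assumptions, not code from the series:
struct vcpu_lbr, enum lbr_access and all function names are hypothetical,
the perf event is reduced to two flags, and the first event creation is
assumed to always obtain the lbr feature.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one vcpu's lbr state (not the series' structures). */
struct vcpu_lbr {
	bool event_created;   /* a task pinned lbr perf event exists */
	bool event_owns_lbr;  /* that event currently owns the lbr feature */
	bool passthrough;     /* lbr msrs are passed through to the guest */
};

enum lbr_access {
	LBR_ACCESS_DIRECT,    /* no vm-exit, msrs are passed through */
	LBR_ACCESS_HANDLED,   /* vm-exit, handled by kvm, pass-through set up */
	LBR_ACCESS_FORBIDDEN, /* vm-exit, access refused by the handler */
};

/* vcpu scheduled in: intercept the lbr msrs so the first access vm-exits. */
void vcpu_sched_in(struct vcpu_lbr *v)
{
	v->passthrough = false;
}

/* A cpu pinned lbr event on the host takes over the lbr feature. */
void host_reclaims_lbr(struct vcpu_lbr *v)
{
	v->event_owns_lbr = false;
	v->passthrough = false;   /* the pass-through is cancelled */
}

/* The guest reads or writes an lbr related msr. */
enum lbr_access guest_access_lbr_msr(struct vcpu_lbr *v)
{
	if (v->passthrough)
		return LBR_ACCESS_DIRECT;

	/* Intercepted path: this models the kvm lbr msr handler. */
	if (!v->event_created) {
		v->event_created = true;
		v->event_owns_lbr = true;   /* assumption: perf grants lbr */
	}
	if (!v->event_owns_lbr)
		return LBR_ACCESS_FORBIDDEN;

	v->passthrough = true;
	/* Invariant of the model: pass-through implies ownership. */
	assert(!v->passthrough || v->event_owns_lbr);
	return LBR_ACCESS_HANDLED;
}
```

In this model the first access after vcpu_sched_in() returns
LBR_ACCESS_HANDLED, later accesses return LBR_ACCESS_DIRECT, and any
access after host_reclaims_lbr() returns LBR_ACCESS_FORBIDDEN.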

If the guest doesn't touch any of the lbr related msrs (likely the guest
doesn't need to run lbr in the near future), the vcpu's lbr perf event
will be freed (please see the patch 12 commit for more details).

* Tests
Conclusion: the profiling results on the guest are similar to those on
the host.

Run: ./perf -b ./test_program

- Test on the host:
Overhead  Command  Source Shared Object  Source Symbol   Target Symbol
  22.35%  ftest    libc-2.23.so          [.] __random    [.] __random
   8.20%  ftest    ftest                 [.] qux         [.] qux
   5.88%  ftest    ftest                 [.] random@plt  [.] __random
   5.88%  ftest    libc-2.23.so          [.] __random    [.] __random_r
   5.79%  ftest    ftest                 [.] main        [.] random@plt
   5.60%  ftest    ftest                 [.] main        [.] foo
   5.24%  ftest    libc-2.23.so          [.] __random    [.] main
   5.20%  ftest    libc-2.23.so          [.] __random_r  [.] __random
   5.00%  ftest    ftest                 [.] foo         [.] qux
   4.91%  ftest    ftest                 [.] main        [.] bar
   4.83%  ftest    ftest                 [.] bar         [.] qux
   4.57%  ftest    ftest                 [.] main        [.] main
   4.38%  ftest    ftest                 [.] foo         [.] main
   4.13%  ftest    ftest                 [.] qux         [.] foo
   3.89%  ftest    ftest                 [.] qux         [.] bar
   3.86%  ftest    ftest                 [.] bar         [.] main

- Test on the guest:
Overhead  Command  Source Shared Object  Source Symbol   Target Symbol
  22.36%  ftest    libc-2.23.so          [.] random      [.] random
   8.55%  ftest    ftest                 [.] qux         [.] qux
   5.79%  ftest    libc-2.23.so          [.] random      [.] random_r
   5.64%  ftest    ftest                 [.] random@plt  [.] random
   5.58%  ftest    ftest                 [.] main        [.] random@plt
   5.55%  ftest    ftest                 [.] main        [.] foo
   5.41%  ftest    libc-2.23.so          [.] random      [.] main
   5.31%  ftest    libc-2.23.so          [.] random_r    [.] random
   5.11%  ftest    ftest                 [.] foo         [.] qux
   4.93%  ftest    ftest                 [.] main        [.] main
   4.59%  ftest    ftest                 [.] qux         [.] bar
   4.49%  ftest    ftest                 [.] bar         [.] main
   4.42%  ftest    ftest                 [.] bar         [.] qux
   4.16%  ftest    ftest                 [.] main        [.] bar
   3.95%  ftest    ftest                 [.] qux         [.] foo
   3.79%  ftest    ftest                 [.] foo         [.] main

(due to the lib version difference, "random" is equivalent to __random
above)

v7->v8 Changelog:
- Patch 3:
  -- document KVM_CAP_X86_GUEST_LBR in api.txt
  -- make the check of KVM_CAP_X86_GUEST_LBR return the size of
     struct x86_perf_lbr_stack, to let userspace do a compatibility
     check.
- Patch 7:
  -- support the perf scheduler not assigning a counter to a perf event
     that has PERF_EV_CAP_NO_COUNTER set (rather than skipping the perf
     scheduler). This allows the scheduler to detect lbr usage conflicts
     via get_event_constraints, so that lower priority events will
     eventually fail to use lbr.
  -- define X86_PMC_IDX_NA as "-1", which represents a never assigned
     counter id. There are other places that use "-1"; they could be
     updated to use the new macro in another patch series.
- Patch 8:
  -- move the event->owner assignment into perf_event_alloc to have it
     set before event_init is called. Please see this patch's commit
     for the reasons.
- Patch 9:
  -- use "exclude_host" and "is_kernel_event" to decide if the lbr event
     is used for the vcpu lbr emulation, which doesn't need a counter,
     and remove the usage of the previous new perf_event_create API.
  -- remove the unused attr fields.
- Patch 10:
  -- set a hardware reserved bit (bit 62 of LBR_SELECT) in reg->config
     for the vcpu lbr emulation event. This makes the config different
     from that of other host lbr events, so that they don't share the
     lbr. Please see the comments in the patch for the reasons why they
     shouldn't share.
- Patch 12:
  -- disable interrupts and check if the vcpu lbr event owns the lbr
     feature before kvm writes to the lbr related msrs. This avoids kvm
     updating the lbr msrs after lbr has been reclaimed by other events
     via ipi.
  -- remove arch v4 related support.
- Patch 13:
  -- double check if the vcpu lbr event owns the lbr feature before
     vm-entry into the guest. The lbr pass-through will be cancelled if
     the lbr feature has been reclaimed by a cpu pinned lbr event.
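
The Patch 10 item above can be illustrated roughly as follows. This is a
minimal sketch, assuming that lbr sharing is decided by comparing the
events' config values for equality; the macro and helper names are
hypothetical and are not taken from the series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name for bit 62 of LBR_SELECT, a hardware reserved bit. */
#define VCPU_LBR_CONFIG_FLAG (1ULL << 62)

/* Config value stored for a vcpu lbr emulation event. */
uint64_t vcpu_lbr_config(uint64_t lbr_select)
{
	return lbr_select | VCPU_LBR_CONFIG_FLAG;
}

/* Assumed sharing rule: only events with identical config share lbr. */
bool lbr_can_share(uint64_t config_a, uint64_t config_b)
{
	return config_a == config_b;
}
```

With the flag set, a vcpu event never compares equal to a host event that
uses the same LBR_SELECT value, so under the assumed rule the two are not
allowed to share the lbr.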

Previous:
https://lkml.kernel.org/r/1562548999-37095-1-git-send-email-wei.w.wang@intel.com

Wei Wang (14):
  perf/x86: fix the variable type of the lbr msrs
  perf/x86: add a function to get the addresses of the lbr stack msrs
  KVM/x86: KVM_CAP_X86_GUEST_LBR
  KVM/x86: intel_pmu_lbr_enable
  KVM/x86/vPMU: tweak kvm_pmu_get_msr
  KVM/x86: expose MSR_IA32_PERF_CAPABILITIES to the guest
  perf/x86: support to create a perf event without counter allocation
  perf/core: set the event->owner before event_init
  KVM/x86/vPMU: APIs to create/free lbr perf event for a vcpu thread
  perf/x86/lbr: don't share lbr for the vcpu usage case
  perf/x86: save/restore LBR_SELECT on vcpu switching
  KVM/x86/lbr: lbr emulation
  KVM/x86/vPMU: check the lbr feature before entering guest
  KVM/x86: remove the common handling of the debugctl msr

 Documentation/virt/kvm/api.txt    |  26 +++
 arch/x86/events/core.c            |  36 ++-
 arch/x86/events/intel/core.c      |   3 +
 arch/x86/events/intel/lbr.c       |  95 +++++++-
 arch/x86/events/perf_event.h      |   6 +-
 arch/x86/include/asm/kvm_host.h   |   5 +
 arch/x86/include/asm/perf_event.h |  17 ++
 arch/x86/kvm/cpuid.c              |   2 +-
 arch/x86/kvm/pmu.c                |  24 +-
 arch/x86/kvm/pmu.h                |  11 +-
 arch/x86/kvm/pmu_amd.c            |   7 +-
 arch/x86/kvm/vmx/pmu_intel.c      | 476 +++++++++++++++++++++++++++++++++++++-
 arch/x86/kvm/vmx/vmx.c            |   4 +-
 arch/x86/kvm/vmx/vmx.h            |   2 +
 arch/x86/kvm/x86.c                |  47 ++--
 include/linux/perf_event.h        |  18 ++
 include/uapi/linux/kvm.h          |   1 +
 kernel/events/core.c              |  19 +-
 18 files changed, 738 insertions(+), 61 deletions(-)