From patchwork Thu Oct 27 20:55:09 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Jitindar Singh, Suraj" X-Patchwork-Id: 13022823 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 934E8FA3740 for ; Thu, 27 Oct 2022 21:03:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236438AbiJ0VDj (ORCPT ); Thu, 27 Oct 2022 17:03:39 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46160 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236636AbiJ0VDD (ORCPT ); Thu, 27 Oct 2022 17:03:03 -0400 Received: from smtp-fw-9103.amazon.com (smtp-fw-9103.amazon.com [207.171.188.200]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id ECB7E4457C; Thu, 27 Oct 2022 13:55:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1666904123; x=1698440123; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version; bh=WnrVw/qJ+xW6uaQ7KumAASkcr57kUcTZanuevNFCbj4=; b=jFKm4zxnQy/4QVHnjSEZ5dfk+K0DpTLaNjhsL8fb8zVMOImwzL8Xfmfj DusMAJGlFmd/5aHRdSi5g1h97MlY/JPGQZRwkCgFQrP8ELslof/X9fpyl jvC6iw5QGul8AC0F6O23+qZ1MGZx846YEZDJ/ieAgDRA8/eNx0lTXtbNN 8=; X-IronPort-AV: E=Sophos;i="5.95,218,1661817600"; d="scan'208";a="1068508416" Received: from pdx4-co-svc-p1-lb2-vlan3.amazon.com (HELO email-inbound-relay-pdx-2c-m6i4x-5eae960a.us-west-2.amazon.com) ([10.25.36.214]) by smtp-border-fw-9103.sea19.amazon.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Oct 2022 20:55:23 +0000 Received: from EX13MTAUWB001.ant.amazon.com (pdx1-ws-svc-p6-lb9-vlan2.pdx.amazon.com [10.236.137.194]) by email-inbound-relay-pdx-2c-m6i4x-5eae960a.us-west-2.amazon.com (Postfix) with ESMTPS id AA96041740; Thu, 27 Oct 2022 20:55:22 +0000 (UTC) Received: from EX19D030UWB002.ant.amazon.com (10.13.139.182) by EX13MTAUWB001.ant.amazon.com (10.43.161.207) with Microsoft SMTP Server (TLS) id 15.0.1497.42; Thu, 27 Oct 2022 20:55:22 +0000 Received: from u3c3f5cfe23135f.ant.amazon.com (10.43.160.223) by EX19D030UWB002.ant.amazon.com (10.13.139.182) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384) id 15.2.1118.15; Thu, 27 Oct 2022 20:55:21 +0000 From: Suraj Jitindar Singh To: CC: , , , , , , , Subject: [PATCH 4.14 17/34] entel_idle: Disable IBRS during long idle Date: Thu, 27 Oct 2022 13:55:09 -0700 Message-ID: <20221027205512.17684-1-surajjs@amazon.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20221027204801.13146-1-surajjs@amazon.com> References: <20221027204801.13146-1-surajjs@amazon.com> MIME-Version: 1.0 X-Originating-IP: [10.43.160.223] X-ClientProxiedBy: EX13D11UWB002.ant.amazon.com (10.43.161.20) To EX19D030UWB002.ant.amazon.com (10.13.139.182) Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Peter Zijlstra commit bf5835bcdb9635c97f85120dba9bfa21e111130f upstream. Having IBRS enabled while the SMT sibling is idle unnecessarily slows down the running sibling. OTOH, disabling IBRS around idle takes two MSR writes, which will increase the idle latency. Therefore, only disable IBRS around deeper idle states. Shallow idle states are bounded by the tick in duration, since NOHZ is not allowed for them by virtue of their short target residency. Only do this for mwait-driven idle, since that keeps interrupts disabled across idle, which makes disabling IBRS vs IRQ-entry a non-issue. Note: C6 is a random threshold, most importantly C1 probably shouldn't disable IBRS, benchmarking needed. Suggested-by: Tim Chen Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Borislav Petkov Reviewed-by: Josh Poimboeuf Signed-off-by: Borislav Petkov [cascardo: no CPUIDLE_FLAG_IRQ_ENABLE] Signed-off-by: Thadeu Lima de Souza Cascardo Signed-off-by: Greg Kroah-Hartman [ bp: Adjust context ] Signed-off-by: Suraj Jitindar Singh --- arch/x86/include/asm/nospec-branch.h | 1 + arch/x86/kernel/cpu/bugs.c | 6 ++++ drivers/idle/intel_idle.c | 45 ++++++++++++++++++++++++---- 3 files changed, 46 insertions(+), 6 deletions(-) diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h index 73f592d28396..3a1e3062f244 100644 --- a/arch/x86/include/asm/nospec-branch.h +++ b/arch/x86/include/asm/nospec-branch.h @@ -293,6 +293,7 @@ static inline void indirect_branch_prediction_barrier(void) /* The Intel SPEC CTRL MSR base value cache */ extern u64 x86_spec_ctrl_base; extern void write_spec_ctrl_current(u64 val, bool force); +extern u64 spec_ctrl_current(void); /* * With retpoline, we must use IBRS to restrict branch prediction diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c index 01604ea478b0..43a258e592d0 100644 --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -77,6 +77,12 @@ void write_spec_ctrl_current(u64 val, bool force) wrmsrl(MSR_IA32_SPEC_CTRL, val); } +u64 spec_ctrl_current(void) +{ + return this_cpu_read(x86_spec_ctrl_current); +} +EXPORT_SYMBOL_GPL(spec_ctrl_current); + /* * The vendor and possibly platform specific bits which can be modified in * x86_spec_ctrl_base. diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c index 31f54a334b58..51cc492c2e35 100644 --- a/drivers/idle/intel_idle.c +++ b/drivers/idle/intel_idle.c @@ -58,11 +58,13 @@ #include #include #include +#include #include #include #include #include #include +#include #include #include @@ -97,6 +99,8 @@ static const struct idle_cpu *icpu; static struct cpuidle_device __percpu *intel_idle_cpuidle_devices; static int intel_idle(struct cpuidle_device *dev, struct cpuidle_driver *drv, int index); +static int intel_idle_ibrs(struct cpuidle_device *dev, + struct cpuidle_driver *drv, int index); static void intel_idle_s2idle(struct cpuidle_device *dev, struct cpuidle_driver *drv, int index); static struct cpuidle_state *cpuidle_state_table; @@ -109,6 +113,12 @@ static struct cpuidle_state *cpuidle_state_table; */ #define CPUIDLE_FLAG_TLB_FLUSHED 0x10000 +/* + * Disable IBRS across idle (when KERNEL_IBRS), is exclusive vs IRQ_ENABLE + * above. + */ +#define CPUIDLE_FLAG_IBRS BIT(16) + /* * MWAIT takes an 8-bit "hint" in EAX "suggesting" * the C-state (top nibble) and sub-state (bottom nibble) @@ -617,7 +627,7 @@ static struct cpuidle_state skl_cstates[] = { { .name = "C6", .desc = "MWAIT 0x20", - .flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TLB_FLUSHED, + .flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TLB_FLUSHED | CPUIDLE_FLAG_IBRS, .exit_latency = 85, .target_residency = 200, .enter = &intel_idle, @@ -625,7 +635,7 @@ static struct cpuidle_state skl_cstates[] = { { .name = "C7s", .desc = "MWAIT 0x33", - .flags = MWAIT2flg(0x33) | CPUIDLE_FLAG_TLB_FLUSHED, + .flags = MWAIT2flg(0x33) | CPUIDLE_FLAG_TLB_FLUSHED | CPUIDLE_FLAG_IBRS, .exit_latency = 124, .target_residency = 800, .enter = &intel_idle, @@ -633,7 +643,7 @@ static struct cpuidle_state skl_cstates[] = { { .name = "C8", .desc = "MWAIT 0x40", - .flags = MWAIT2flg(0x40) | CPUIDLE_FLAG_TLB_FLUSHED, + .flags = MWAIT2flg(0x40) | CPUIDLE_FLAG_TLB_FLUSHED | CPUIDLE_FLAG_IBRS, .exit_latency = 200, .target_residency = 800, .enter = &intel_idle, @@ -641,7 +651,7 @@ static struct cpuidle_state skl_cstates[] = { { .name = "C9", .desc = "MWAIT 0x50", - .flags = MWAIT2flg(0x50) | CPUIDLE_FLAG_TLB_FLUSHED, + .flags = MWAIT2flg(0x50) | CPUIDLE_FLAG_TLB_FLUSHED | CPUIDLE_FLAG_IBRS, .exit_latency = 480, .target_residency = 5000, .enter = &intel_idle, @@ -649,7 +659,7 @@ static struct cpuidle_state skl_cstates[] = { { .name = "C10", .desc = "MWAIT 0x60", - .flags = MWAIT2flg(0x60) | CPUIDLE_FLAG_TLB_FLUSHED, + .flags = MWAIT2flg(0x60) | CPUIDLE_FLAG_TLB_FLUSHED | CPUIDLE_FLAG_IBRS, .exit_latency = 890, .target_residency = 5000, .enter = &intel_idle, @@ -678,7 +688,7 @@ static struct cpuidle_state skx_cstates[] = { { .name = "C6", .desc = "MWAIT 0x20", - .flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TLB_FLUSHED, + .flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TLB_FLUSHED | CPUIDLE_FLAG_IBRS, .exit_latency = 133, .target_residency = 600, .enter = &intel_idle, @@ -935,6 +945,24 @@ static __cpuidle int intel_idle(struct cpuidle_device *dev, return index; } +static __cpuidle int intel_idle_ibrs(struct cpuidle_device *dev, + struct cpuidle_driver *drv, int index) +{ + bool smt_active = sched_smt_active(); + u64 spec_ctrl = spec_ctrl_current(); + int ret; + + if (smt_active) + wrmsrl(MSR_IA32_SPEC_CTRL, 0); + + ret = intel_idle(dev, drv, index); + + if (smt_active) + wrmsrl(MSR_IA32_SPEC_CTRL, spec_ctrl); + + return ret; +} + /** * intel_idle_s2idle - simplified "enter" callback routine for suspend-to-idle * @dev: cpuidle_device @@ -1375,6 +1403,11 @@ static void __init intel_idle_cpuidle_driver_init(void) mark_tsc_unstable("TSC halts in idle" " states deeper than C2"); + if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS) && + cpuidle_state_table[cstate].flags & CPUIDLE_FLAG_IBRS) { + drv->states[drv->state_count].enter = intel_idle_ibrs; + } + drv->states[drv->state_count] = /* structure copy */ cpuidle_state_table[cstate];