From patchwork Thu Nov 17 09:19:36 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Suleiman Souhlal
X-Patchwork-Id: 13046376
Date: Thu, 17 Nov 2022 18:19:36 +0900
In-Reply-To: <20221117091952.1940850-1-suleiman@google.com>
Message-Id: <20221117091952.1940850-19-suleiman@google.com>
References: <20221117091952.1940850-1-suleiman@google.com>
X-Mailer: git-send-email 2.38.1.431.g37b22c650d-goog
Subject: [PATCH 4.19 18/34] intel_idle: Disable IBRS during long idle
From: Suleiman Souhlal
To: stable@vger.kernel.org
Cc: x86@kernel.org, kvm@vger.kernel.org, bp@alien8.de, pbonzini@redhat.com,
    peterz@infradead.org, jpoimboe@kernel.org, cascardo@canonical.com,
    surajjs@amazon.com, ssouhlal@FreeBSD.org, suleiman@google.com
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org
From: Peter Zijlstra

commit bf5835bcdb9635c97f85120dba9bfa21e111130f upstream.

Having IBRS enabled while the SMT sibling is idle unnecessarily slows
down the running sibling. OTOH, disabling IBRS around idle takes two
MSR writes, which will increase the idle latency.

Therefore, only disable IBRS around deeper idle states. Shallow idle
states are bounded by the tick in duration, since NOHZ is not allowed
for them by virtue of their short target residency.

Only do this for mwait-driven idle, since that keeps interrupts disabled
across idle, which makes disabling IBRS vs IRQ-entry a non-issue.

Note: C6 is a random threshold, most importantly C1 probably shouldn't
disable IBRS, benchmarking needed.

Suggested-by: Tim Chen
Signed-off-by: Peter Zijlstra (Intel)
Signed-off-by: Borislav Petkov
Reviewed-by: Josh Poimboeuf
Signed-off-by: Borislav Petkov
[cascardo: no CPUIDLE_FLAG_IRQ_ENABLE]
Signed-off-by: Thadeu Lima de Souza Cascardo
Signed-off-by: Greg Kroah-Hartman
[cascardo: context adjustments]
Signed-off-by: Thadeu Lima de Souza Cascardo
Signed-off-by: Greg Kroah-Hartman
Signed-off-by: Suleiman Souhlal
---
 arch/x86/include/asm/nospec-branch.h |  1 +
 arch/x86/kernel/cpu/bugs.c           |  6 ++++
 drivers/idle/intel_idle.c            | 43 ++++++++++++++++++++++++----
 3 files changed, 44 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 43a1c7d69dbe..9311f0f9c392 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -309,6 +309,7 @@ static inline void indirect_branch_prediction_barrier(void)
 /* The Intel SPEC CTRL MSR base value cache */
 extern u64 x86_spec_ctrl_base;
 extern void write_spec_ctrl_current(u64 val, bool force);
+extern u64 spec_ctrl_current(void);
 
 /*
  * With retpoline, we must use IBRS to restrict branch prediction
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 63a59dbda780..0734f35d1af1 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -77,6 +77,12 @@ void write_spec_ctrl_current(u64 val, bool force)
 		wrmsrl(MSR_IA32_SPEC_CTRL, val);
 }
 
+u64 spec_ctrl_current(void)
+{
+	return this_cpu_read(x86_spec_ctrl_current);
+}
+EXPORT_SYMBOL_GPL(spec_ctrl_current);
+
 /*
  * The vendor and possibly platform specific bits which can be modified in
  * x86_spec_ctrl_base.
diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index c4bb67ed8da3..6360c045e3d0 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -58,11 +58,13 @@
 #include <linux/tick.h>
 #include <trace/events/power.h>
 #include <linux/sched.h>
+#include <linux/sched/smt.h>
 #include <linux/notifier.h>
 #include <linux/cpu.h>
 #include <linux/moduleparam.h>
 #include <asm/cpu_device_id.h>
 #include <asm/intel-family.h>
+#include <asm/nospec-branch.h>
 #include <asm/mwait.h>
 #include <asm/msr.h>
 
@@ -109,6 +111,12 @@ static struct cpuidle_state *cpuidle_state_table;
  */
 #define CPUIDLE_FLAG_TLB_FLUSHED	0x10000
 
+/*
+ * Disable IBRS across idle (when KERNEL_IBRS), is exclusive vs IRQ_ENABLE
+ * above.
+ */
+#define CPUIDLE_FLAG_IBRS		BIT(16)
+
 /*
  * MWAIT takes an 8-bit "hint" in EAX "suggesting"
  * the C-state (top nibble) and sub-state (bottom nibble)
@@ -119,6 +127,24 @@ static struct cpuidle_state *cpuidle_state_table;
 #define flg2MWAIT(flags) (((flags) >> 24) & 0xFF)
 #define MWAIT2flg(eax) ((eax & 0xFF) << 24)
 
+static __cpuidle int intel_idle_ibrs(struct cpuidle_device *dev,
+				     struct cpuidle_driver *drv, int index)
+{
+	bool smt_active = sched_smt_active();
+	u64 spec_ctrl = spec_ctrl_current();
+	int ret;
+
+	if (smt_active)
+		wrmsrl(MSR_IA32_SPEC_CTRL, 0);
+
+	ret = intel_idle(dev, drv, index);
+
+	if (smt_active)
+		wrmsrl(MSR_IA32_SPEC_CTRL, spec_ctrl);
+
+	return ret;
+}
+
 /*
  * States are indexed by the cstate number,
  * which is also the index into the MWAIT hint array.
@@ -617,7 +643,7 @@ static struct cpuidle_state skl_cstates[] = {
 	{
 		.name = "C6",
 		.desc = "MWAIT 0x20",
-		.flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TLB_FLUSHED,
+		.flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TLB_FLUSHED | CPUIDLE_FLAG_IBRS,
 		.exit_latency = 85,
 		.target_residency = 200,
 		.enter = &intel_idle,
@@ -625,7 +651,7 @@ static struct cpuidle_state skl_cstates[] = {
 	{
 		.name = "C7s",
 		.desc = "MWAIT 0x33",
-		.flags = MWAIT2flg(0x33) | CPUIDLE_FLAG_TLB_FLUSHED,
+		.flags = MWAIT2flg(0x33) | CPUIDLE_FLAG_TLB_FLUSHED | CPUIDLE_FLAG_IBRS,
 		.exit_latency = 124,
 		.target_residency = 800,
 		.enter = &intel_idle,
@@ -633,7 +659,7 @@ static struct cpuidle_state skl_cstates[] = {
 	{
 		.name = "C8",
 		.desc = "MWAIT 0x40",
-		.flags = MWAIT2flg(0x40) | CPUIDLE_FLAG_TLB_FLUSHED,
+		.flags = MWAIT2flg(0x40) | CPUIDLE_FLAG_TLB_FLUSHED | CPUIDLE_FLAG_IBRS,
 		.exit_latency = 200,
 		.target_residency = 800,
 		.enter = &intel_idle,
@@ -641,7 +667,7 @@ static struct cpuidle_state skl_cstates[] = {
 	{
 		.name = "C9",
 		.desc = "MWAIT 0x50",
-		.flags = MWAIT2flg(0x50) | CPUIDLE_FLAG_TLB_FLUSHED,
+		.flags = MWAIT2flg(0x50) | CPUIDLE_FLAG_TLB_FLUSHED | CPUIDLE_FLAG_IBRS,
 		.exit_latency = 480,
 		.target_residency = 5000,
 		.enter = &intel_idle,
@@ -649,7 +675,7 @@ static struct cpuidle_state skl_cstates[] = {
 	{
 		.name = "C10",
 		.desc = "MWAIT 0x60",
-		.flags = MWAIT2flg(0x60) | CPUIDLE_FLAG_TLB_FLUSHED,
+		.flags = MWAIT2flg(0x60) | CPUIDLE_FLAG_TLB_FLUSHED | CPUIDLE_FLAG_IBRS,
 		.exit_latency = 890,
 		.target_residency = 5000,
 		.enter = &intel_idle,
@@ -678,7 +704,7 @@ static struct cpuidle_state skx_cstates[] = {
 	{
 		.name = "C6",
 		.desc = "MWAIT 0x20",
-		.flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TLB_FLUSHED,
+		.flags = MWAIT2flg(0x20) | CPUIDLE_FLAG_TLB_FLUSHED | CPUIDLE_FLAG_IBRS,
 		.exit_latency = 133,
 		.target_residency = 600,
 		.enter = &intel_idle,
@@ -1384,6 +1410,11 @@ static void __init intel_idle_cpuidle_driver_init(void)
 		drv->states[drv->state_count] = /* structure copy */
 			cpuidle_state_table[cstate];
 
+		if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS) &&
+		    cpuidle_state_table[cstate].flags & CPUIDLE_FLAG_IBRS) {
+			drv->states[drv->state_count].enter = intel_idle_ibrs;
+		}
+
 		drv->state_count += 1;
 	}
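
Not part of the patch, just for reference while reviewing: the value the
new intel_idle_ibrs() wrapper saves and restores around deep idle is
MSR_IA32_SPEC_CTRL (0x48), whose bit 0 is IBRS. Below is a minimal
userspace sketch that reads that MSR on CPU 0 through the msr driver and
reports the current IBRS bit; it assumes the "msr" module is loaded and
root privileges, and the file name spec_ctrl_check.c is only an example.

/*
 * spec_ctrl_check.c: read MSR_IA32_SPEC_CTRL (0x48) on CPU 0 via
 * /dev/cpu/0/msr and report whether the IBRS bit (bit 0) is set.
 *
 * Build: gcc -o spec_ctrl_check spec_ctrl_check.c
 * Run:   sudo modprobe msr && sudo ./spec_ctrl_check
 */
#include <fcntl.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define MSR_IA32_SPEC_CTRL	0x48
#define SPEC_CTRL_IBRS		(1ULL << 0)

int main(void)
{
	uint64_t val;
	int fd = open("/dev/cpu/0/msr", O_RDONLY);

	if (fd < 0) {
		perror("open /dev/cpu/0/msr (is the msr module loaded?)");
		return EXIT_FAILURE;
	}

	/* The msr driver exposes each MSR at the file offset equal to its number. */
	if (pread(fd, &val, sizeof(val), MSR_IA32_SPEC_CTRL) != sizeof(val)) {
		perror("pread MSR_IA32_SPEC_CTRL");
		close(fd);
		return EXIT_FAILURE;
	}
	close(fd);

	printf("SPEC_CTRL = 0x%" PRIx64 " (IBRS %s)\n",
	       val, (val & SPEC_CTRL_IBRS) ? "set" : "clear");
	return EXIT_SUCCESS;
}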