From patchwork Wed Apr 6 06:35:38 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Smita Koralahalli
X-Patchwork-Id: 12803054
From: Smita Koralahalli
To: Tony Luck, Borislav Petkov
CC: Smita Koralahalli, Yazen Ghannam, Dave Hansen
Subject: [PATCH 1/5] x86/mce: Remove old CMCI storm mitigation code
Date: Wed, 6 Apr 2022 01:35:38 -0500
Message-ID: <20220406063542.183946-2-Smita.KoralahalliChannabasappa@amd.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20220406063542.183946-1-Smita.KoralahalliChannabasappa@amd.com>
References: <20220406063542.183946-1-Smita.KoralahalliChannabasappa@amd.com>
Precedence: bulk
X-Mailing-List: linux-edac@vger.kernel.org

From: Tony
Luck When a "storm" of CMCI is detected this code mitigates by disabling CMCI interrupt signalling from all of the banks owned by the CPU that saw the storm. There are problems with this approach: 1) It is very coarse grained. In all likelihood only one of the banks was generating the interrupts, but CMCI is disabled for all. This means Linux may delay seeing and processing errors logged from other banks. 2) Although CMCI stands for Corrected Machine Check Interrupt, it is also used to signal when an uncorrected error is logged. This is a problem because these errors should be handled in a timely manner. Delete all this code in preparation for a finer grained solution. Signed-off-by: Tony Luck --- arch/x86/kernel/cpu/mce/core.c | 20 +--- arch/x86/kernel/cpu/mce/intel.c | 145 ----------------------------- arch/x86/kernel/cpu/mce/internal.h | 6 -- 3 files changed, 1 insertion(+), 170 deletions(-) diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c index 981496e6bc0e..331d4f7cf5f2 100644 --- a/arch/x86/kernel/cpu/mce/core.c +++ b/arch/x86/kernel/cpu/mce/core.c @@ -1596,13 +1596,6 @@ static unsigned long check_interval = INITIAL_CHECK_INTERVAL; static DEFINE_PER_CPU(unsigned long, mce_next_interval); /* in jiffies */ static DEFINE_PER_CPU(struct timer_list, mce_timer); -static unsigned long mce_adjust_timer_default(unsigned long interval) -{ - return interval; -} - -static unsigned long (*mce_adjust_timer)(unsigned long interval) = mce_adjust_timer_default; - static void __start_timer(struct timer_list *t, unsigned long interval) { unsigned long when = jiffies + interval; @@ -1625,15 +1618,9 @@ static void mce_timer_fn(struct timer_list *t) iv = __this_cpu_read(mce_next_interval); - if (mce_available(this_cpu_ptr(&cpu_info))) { + if (mce_available(this_cpu_ptr(&cpu_info))) machine_check_poll(0, this_cpu_ptr(&mce_poll_banks)); - if (mce_intel_cmci_poll()) { - iv = mce_adjust_timer(iv); - goto done; - } - } - /* * Alert userspace if needed. 
If we logged an MCE, reduce the polling * interval, otherwise increase the polling interval. @@ -1643,7 +1630,6 @@ static void mce_timer_fn(struct timer_list *t) else iv = min(iv * 2, round_jiffies_relative(check_interval * HZ)); -done: __this_cpu_write(mce_next_interval, iv); __start_timer(t, iv); } @@ -1980,7 +1966,6 @@ static void mce_zhaoxin_feature_init(struct cpuinfo_x86 *c) intel_init_cmci(); intel_init_lmce(); - mce_adjust_timer = cmci_intel_adjust_timer; } static void mce_zhaoxin_feature_clear(struct cpuinfo_x86 *c) @@ -1993,7 +1978,6 @@ static void __mcheck_cpu_init_vendor(struct cpuinfo_x86 *c) switch (c->x86_vendor) { case X86_VENDOR_INTEL: mce_intel_feature_init(c); - mce_adjust_timer = cmci_intel_adjust_timer; break; case X86_VENDOR_AMD: { @@ -2649,8 +2633,6 @@ static void mce_reenable_cpu(void) static int mce_cpu_dead(unsigned int cpu) { - mce_intel_hcpu_update(cpu); - /* intentionally ignoring frozen here */ if (!cpuhp_tasks_frozen) cmci_rediscover(); diff --git a/arch/x86/kernel/cpu/mce/intel.c b/arch/x86/kernel/cpu/mce/intel.c index 95275a5e57e0..052bf2708391 100644 --- a/arch/x86/kernel/cpu/mce/intel.c +++ b/arch/x86/kernel/cpu/mce/intel.c @@ -41,15 +41,6 @@ */ static DEFINE_PER_CPU(mce_banks_t, mce_banks_owned); -/* - * CMCI storm detection backoff counter - * - * During storm, we reset this counter to INITIAL_CHECK_INTERVAL in case we've - * encountered an error. If not, we decrement it by one. We signal the end of - * the CMCI storm when it reaches 0. - */ -static DEFINE_PER_CPU(int, cmci_backoff_cnt); - /* * cmci_discover_lock protects against parallel discovery attempts * which could race against each other. 
@@ -57,21 +48,6 @@ static DEFINE_PER_CPU(int, cmci_backoff_cnt); static DEFINE_RAW_SPINLOCK(cmci_discover_lock); #define CMCI_THRESHOLD 1 -#define CMCI_POLL_INTERVAL (30 * HZ) -#define CMCI_STORM_INTERVAL (HZ) -#define CMCI_STORM_THRESHOLD 15 - -static DEFINE_PER_CPU(unsigned long, cmci_time_stamp); -static DEFINE_PER_CPU(unsigned int, cmci_storm_cnt); -static DEFINE_PER_CPU(unsigned int, cmci_storm_state); - -enum { - CMCI_STORM_NONE, - CMCI_STORM_ACTIVE, - CMCI_STORM_SUBSIDED, -}; - -static atomic_t cmci_storm_on_cpus; static int cmci_supported(int *banks) { @@ -127,124 +103,6 @@ static bool lmce_supported(void) return tmp & FEAT_CTL_LMCE_ENABLED; } -bool mce_intel_cmci_poll(void) -{ - if (__this_cpu_read(cmci_storm_state) == CMCI_STORM_NONE) - return false; - - /* - * Reset the counter if we've logged an error in the last poll - * during the storm. - */ - if (machine_check_poll(0, this_cpu_ptr(&mce_banks_owned))) - this_cpu_write(cmci_backoff_cnt, INITIAL_CHECK_INTERVAL); - else - this_cpu_dec(cmci_backoff_cnt); - - return true; -} - -void mce_intel_hcpu_update(unsigned long cpu) -{ - if (per_cpu(cmci_storm_state, cpu) == CMCI_STORM_ACTIVE) - atomic_dec(&cmci_storm_on_cpus); - - per_cpu(cmci_storm_state, cpu) = CMCI_STORM_NONE; -} - -static void cmci_toggle_interrupt_mode(bool on) -{ - unsigned long flags, *owned; - int bank; - u64 val; - - raw_spin_lock_irqsave(&cmci_discover_lock, flags); - owned = this_cpu_ptr(mce_banks_owned); - for_each_set_bit(bank, owned, MAX_NR_BANKS) { - rdmsrl(MSR_IA32_MCx_CTL2(bank), val); - - if (on) - val |= MCI_CTL2_CMCI_EN; - else - val &= ~MCI_CTL2_CMCI_EN; - - wrmsrl(MSR_IA32_MCx_CTL2(bank), val); - } - raw_spin_unlock_irqrestore(&cmci_discover_lock, flags); -} - -unsigned long cmci_intel_adjust_timer(unsigned long interval) -{ - if ((this_cpu_read(cmci_backoff_cnt) > 0) && - (__this_cpu_read(cmci_storm_state) == CMCI_STORM_ACTIVE)) { - mce_notify_irq(); - return CMCI_STORM_INTERVAL; - } - - switch 
(__this_cpu_read(cmci_storm_state)) { - case CMCI_STORM_ACTIVE: - - /* - * We switch back to interrupt mode once the poll timer has - * silenced itself. That means no events recorded and the timer - * interval is back to our poll interval. - */ - __this_cpu_write(cmci_storm_state, CMCI_STORM_SUBSIDED); - if (!atomic_sub_return(1, &cmci_storm_on_cpus)) - pr_notice("CMCI storm subsided: switching to interrupt mode\n"); - - fallthrough; - - case CMCI_STORM_SUBSIDED: - /* - * We wait for all CPUs to go back to SUBSIDED state. When that - * happens we switch back to interrupt mode. - */ - if (!atomic_read(&cmci_storm_on_cpus)) { - __this_cpu_write(cmci_storm_state, CMCI_STORM_NONE); - cmci_toggle_interrupt_mode(true); - cmci_recheck(); - } - return CMCI_POLL_INTERVAL; - default: - - /* We have shiny weather. Let the poll do whatever it thinks. */ - return interval; - } -} - -static bool cmci_storm_detect(void) -{ - unsigned int cnt = __this_cpu_read(cmci_storm_cnt); - unsigned long ts = __this_cpu_read(cmci_time_stamp); - unsigned long now = jiffies; - int r; - - if (__this_cpu_read(cmci_storm_state) != CMCI_STORM_NONE) - return true; - - if (time_before_eq(now, ts + CMCI_STORM_INTERVAL)) { - cnt++; - } else { - cnt = 1; - __this_cpu_write(cmci_time_stamp, now); - } - __this_cpu_write(cmci_storm_cnt, cnt); - - if (cnt <= CMCI_STORM_THRESHOLD) - return false; - - cmci_toggle_interrupt_mode(false); - __this_cpu_write(cmci_storm_state, CMCI_STORM_ACTIVE); - r = atomic_add_return(1, &cmci_storm_on_cpus); - mce_timer_kick(CMCI_STORM_INTERVAL); - this_cpu_write(cmci_backoff_cnt, INITIAL_CHECK_INTERVAL); - - if (r == 1) - pr_notice("CMCI storm detected: switching to poll mode\n"); - return true; -} - /* * The interrupt handler. This is called on every event. * Just call the poller directly to log any events. 
@@ -253,9 +111,6 @@ static bool cmci_storm_detect(void) */ static void intel_threshold_interrupt(void) { - if (cmci_storm_detect()) - return; - machine_check_poll(MCP_TIMESTAMP, this_cpu_ptr(&mce_banks_owned)); } diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h index 4ae0e603f7fa..17d313c9cc60 100644 --- a/arch/x86/kernel/cpu/mce/internal.h +++ b/arch/x86/kernel/cpu/mce/internal.h @@ -41,18 +41,12 @@ struct dentry *mce_get_debugfs_dir(void); extern mce_banks_t mce_banks_ce_disabled; #ifdef CONFIG_X86_MCE_INTEL -unsigned long cmci_intel_adjust_timer(unsigned long interval); -bool mce_intel_cmci_poll(void); -void mce_intel_hcpu_update(unsigned long cpu); void cmci_disable_bank(int bank); void intel_init_cmci(void); void intel_init_lmce(void); void intel_clear_lmce(void); bool intel_filter_mce(struct mce *m); #else -# define cmci_intel_adjust_timer mce_adjust_timer_default -static inline bool mce_intel_cmci_poll(void) { return false; } -static inline void mce_intel_hcpu_update(unsigned long cpu) { } static inline void cmci_disable_bank(int bank) { } static inline void intel_init_cmci(void) { } static inline void intel_init_lmce(void) { }

From patchwork Wed Apr 6 06:35:39 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Smita Koralahalli
X-Patchwork-Id: 12803055
From: Smita Koralahalli
To: Tony Luck, Borislav Petkov
CC: Smita Koralahalli, Yazen Ghannam, Dave Hansen
Subject: [PATCH 2/5] x86/mce: Add per-bank CMCI storm mitigation
Date: Wed, 6 Apr 2022 01:35:39 -0500
Message-ID: <20220406063542.183946-3-Smita.KoralahalliChannabasappa@amd.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20220406063542.183946-1-Smita.KoralahalliChannabasappa@amd.com>
References: <20220406063542.183946-1-Smita.KoralahalliChannabasappa@amd.com>
Precedence: bulk
X-Mailing-List: linux-edac@vger.kernel.org

From: Tony Luck

Add a hook into machine_check_poll() to keep track of per-CPU, per-bank corrected error logs.

Maintain a bitmap history for each bank showing whether the bank logged a corrected error or not each time it is polled.

In normal operation the interval between polls of this bank determines how far to shift the history. The 64 bit width corresponds to about one second.

When a storm is observed the rate of interrupts is reduced by setting a large threshold value for this bank in IA32_MCi_CTL2. This bank is added to the bitmap of banks for this CPU to poll. The polling rate is increased to once per second. During a storm each bit in the history indicates the status of the bank each time it is polled. Thus the history covers just over a minute.

Declare a storm for that bank if the number of corrected errors seen in that history reaches some threshold (5 in this RFC code for ease of testing, likely move to 15 for compatibility with previous storm detection).

A storm on a bank ends if enough consecutive polls of the bank show no corrected errors (currently 30, may also change). That resets the threshold in IA32_MCi_CTL2 back to its non-storm value (1 by default), removes the bank from the bitmap for polling, and changes the polling rate back to the default.

If a CPU with banks in storm mode is taken offline, the new CPU that inherits ownership of those banks takes over management of storm(s) in the inherited bank(s).
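The history bookkeeping described above can be sketched in plain userspace C. This is an illustration only, not the kernel code in this patch: struct bank_track, track_bank() and popcount64() are hypothetical names, the caller supplies the shift instead of deriving it from jiffies, and the thresholds are simply the RFC values of 5 and 30.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values; the RFC uses 5 and 30 and notes both may change. */
#define STORM_BEGIN_THRESHOLD    5
#define STORM_END_POLL_THRESHOLD 30

/* Hypothetical stand-in for the per-CPU, per-bank state kept by the patch. */
struct bank_track {
	uint64_t history;	/* bit 0 is the most recent observation */
	bool in_storm;
};

/* Portable population count (the kernel code uses hweight64()). */
static unsigned int popcount64(uint64_t v)
{
	unsigned int n = 0;

	for (; v; v &= v - 1)
		n++;
	return n;
}

/*
 * Record one observation of a bank.
 *   shift:     history slots elapsed since the previous observation (>= 1)
 *   corrected: true if the bank currently holds a corrected error
 * Returns true if the bank entered or left storm mode on this call.
 */
static bool track_bank(struct bank_track *b, unsigned int shift, bool corrected)
{
	const uint64_t end_mask = (UINT64_C(1) << STORM_END_POLL_THRESHOLD) - 1;

	assert(shift >= 1);

	/* Shifting a 64-bit value by 64 or more is undefined in C: clear instead. */
	b->history = (shift >= 64) ? 0 : b->history << shift;
	if (corrected)
		b->history |= 1;

	if (b->in_storm) {
		/* Storm continues while any of the last 30 polls saw an error. */
		if (b->history & end_mask)
			return false;
		b->in_storm = false;
		b->history = 0;
		return true;
	}

	/* Storm begins once enough errors are present in the history. */
	if (popcount64(b->history) < STORM_BEGIN_THRESHOLD)
		return false;
	b->in_storm = true;
	return true;
}
```

With these values, five consecutive corrected errors recorded with a shift of 1 put the bank into storm mode, and the storm then ends on the 30th consecutive clean poll.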
Signed-off-by: Tony Luck --- arch/x86/kernel/cpu/mce/core.c | 26 +++-- arch/x86/kernel/cpu/mce/intel.c | 146 ++++++++++++++++++++++++++++- arch/x86/kernel/cpu/mce/internal.h | 4 +- 3 files changed, 165 insertions(+), 11 deletions(-) diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c index 331d4f7cf5f2..13844a38aa2c 100644 --- a/arch/x86/kernel/cpu/mce/core.c +++ b/arch/x86/kernel/cpu/mce/core.c @@ -692,6 +692,8 @@ bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b) barrier(); m.status = mce_rdmsrl(mca_msr_reg(i, MCA_STATUS)); + track_cmci_storm(i, m.status); + /* If this entry is not valid, ignore it */ if (!(m.status & MCI_STATUS_VAL)) continue; @@ -1595,6 +1597,7 @@ static unsigned long check_interval = INITIAL_CHECK_INTERVAL; static DEFINE_PER_CPU(unsigned long, mce_next_interval); /* in jiffies */ static DEFINE_PER_CPU(struct timer_list, mce_timer); +static DEFINE_PER_CPU(bool, storm_poll_mode); static void __start_timer(struct timer_list *t, unsigned long interval) { @@ -1630,22 +1633,29 @@ static void mce_timer_fn(struct timer_list *t) else iv = min(iv * 2, round_jiffies_relative(check_interval * HZ)); - __this_cpu_write(mce_next_interval, iv); - __start_timer(t, iv); + if (__this_cpu_read(storm_poll_mode)) { + __start_timer(t, HZ); + } else { + __this_cpu_write(mce_next_interval, iv); + __start_timer(t, iv); + } } /* - * Ensure that the timer is firing in @interval from now. + * When a storm starts on any bank on this CPU, switch to polling + * once per second. When the storm ends, revert to the default + * polling interval. 
 */ -void mce_timer_kick(unsigned long interval) +void mce_timer_kick(bool storm) { struct timer_list *t = this_cpu_ptr(&mce_timer); - unsigned long iv = __this_cpu_read(mce_next_interval); - __start_timer(t, interval); + __this_cpu_write(storm_poll_mode, storm); - if (interval < iv) - __this_cpu_write(mce_next_interval, interval); + if (storm) + __start_timer(t, HZ); + else + __this_cpu_write(mce_next_interval, check_interval * HZ); } /* Must not be called in IRQ context where del_timer_sync() can deadlock */ diff --git a/arch/x86/kernel/cpu/mce/intel.c b/arch/x86/kernel/cpu/mce/intel.c index 052bf2708391..59cad4061e5a 100644 --- a/arch/x86/kernel/cpu/mce/intel.c +++ b/arch/x86/kernel/cpu/mce/intel.c @@ -47,8 +47,47 @@ static DEFINE_PER_CPU(mce_banks_t, mce_banks_owned); */ static DEFINE_RAW_SPINLOCK(cmci_discover_lock); +/* + * CMCI storm tracking state + * stormy_bank_count: per-cpu count of MC banks in storm state + * bank_history: bitmask tracking of corrected errors seen in each bank + * bank_time_stamp: last time (in jiffies) that each bank was polled + * cmci_threshold: MCi_CTL2 threshold for each bank when there is no storm + */ +static DEFINE_PER_CPU(int, stormy_bank_count); +static DEFINE_PER_CPU(u64 [MAX_NR_BANKS], bank_history); +static DEFINE_PER_CPU(bool [MAX_NR_BANKS], bank_storm); +static DEFINE_PER_CPU(unsigned long [MAX_NR_BANKS], bank_time_stamp); +static int cmci_threshold[MAX_NR_BANKS]; + +/* Linux non-storm CMCI threshold (may be overridden by BIOS) */ #define CMCI_THRESHOLD 1 +/* + * High threshold to limit CMCI rate during storms. Max supported is + * 0x7FFF. Use this slightly smaller value so it has a distinctive + * signature when someone asks "Why am I not seeing all corrected errors?" 
+ */ +#define CMCI_STORM_THRESHOLD 32749 + +/* + * How many errors within the history buffer mark the start of a storm + */ +#define STORM_BEGIN_THRESHOLD 5 + +/* + * How many polls of machine check bank without an error before declaring + * the storm is over + */ +#define STORM_END_POLL_THRESHOLD 30 + +/* + * When there is no storm each "bit" in the history represents + * this many jiffies. When there is a storm every poll() takes + * one history bit. + */ +#define HZBITS (HZ / 64) + static int cmci_supported(int *banks) { u64 cap; @@ -103,6 +142,93 @@ static bool lmce_supported(void) return tmp & FEAT_CTL_LMCE_ENABLED; } +/* + * Set a new CMCI threshold value. Preserve the state of the + * MCI_CTL2_CMCI_EN bit in case this happens during a + * cmci_rediscover() operation. + */ +static void cmci_set_threshold(int bank, int thresh) +{ + unsigned long flags; + u64 val; + + raw_spin_lock_irqsave(&cmci_discover_lock, flags); + rdmsrl(MSR_IA32_MCx_CTL2(bank), val); + val &= ~MCI_CTL2_CMCI_THRESHOLD_MASK; + wrmsrl(MSR_IA32_MCx_CTL2(bank), val | thresh); + raw_spin_unlock_irqrestore(&cmci_discover_lock, flags); +} + +static void cmci_storm_begin(int bank) +{ + __set_bit(bank, this_cpu_ptr(mce_poll_banks)); + this_cpu_write(bank_storm[bank], true); + + /* + * If this is the first bank on this CPU to enter storm mode + * start polling + */ + if (this_cpu_inc_return(stormy_bank_count) == 1) + mce_timer_kick(true); +} + +static void cmci_storm_end(int bank) +{ + __clear_bit(bank, this_cpu_ptr(mce_poll_banks)); + this_cpu_write(bank_history[bank], 0ull); + this_cpu_write(bank_storm[bank], false); + + /* If no banks left in storm mode, stop polling */ + if (!this_cpu_dec_return(stormy_bank_count)) + mce_timer_kick(false); +} + +void track_cmci_storm(int bank, u64 status) +{ + unsigned long now = jiffies, delta; + unsigned int shift = 1; + u64 history; + + /* + * When a bank is not in storm mode, the history mask covers about + * one second of elapsed time. 
Check how long it has been since + * this bank was last polled, and compute a shift value to update + * the history bitmask. When in storm mode, each consecutive + * poll of the bank is logged in the next history bit, so shift + * is kept at "1". + */ + if (!this_cpu_read(bank_storm[bank])) { + delta = now - this_cpu_read(bank_time_stamp[bank]); + shift = (delta + HZBITS) / HZBITS; + } + + /* If it has been a long time since the last poll, clear history */ + if (shift >= 64) + history = 0; + else + history = this_cpu_read(bank_history[bank]) << shift; + this_cpu_write(bank_time_stamp[bank], now); + + /* History keeps track of corrected errors. VAL=1 && UC=0 */ + if ((status & (MCI_STATUS_VAL | MCI_STATUS_UC)) == MCI_STATUS_VAL) + history |= 1; + this_cpu_write(bank_history[bank], history); + + if (this_cpu_read(bank_storm[bank])) { + if (history & GENMASK_ULL(STORM_END_POLL_THRESHOLD - 1, 0)) + return; + pr_notice("CPU%d BANK%d CMCI storm subsided\n", smp_processor_id(), bank); + cmci_set_threshold(bank, cmci_threshold[bank]); + cmci_storm_end(bank); + } else { + if (hweight64(history) < STORM_BEGIN_THRESHOLD) + return; + pr_notice("CPU%d BANK%d CMCI storm detected\n", smp_processor_id(), bank); + cmci_set_threshold(bank, CMCI_STORM_THRESHOLD); + cmci_storm_begin(bank); + } +} + /* * The interrupt handler. This is called on every event. * Just call the poller directly to log any events. @@ -147,6 +273,9 @@ static void cmci_discover(int banks) continue; } + if ((val & MCI_CTL2_CMCI_THRESHOLD_MASK) == CMCI_STORM_THRESHOLD) + goto storm; + if (!mca_cfg.bios_cmci_threshold) { val &= ~MCI_CTL2_CMCI_THRESHOLD_MASK; val |= CMCI_THRESHOLD; @@ -159,7 +288,7 @@ static void cmci_discover(int banks) bios_zero_thresh = 1; val |= CMCI_THRESHOLD; } - +storm: val |= MCI_CTL2_CMCI_EN; wrmsrl(MSR_IA32_MCx_CTL2(i), val); rdmsrl(MSR_IA32_MCx_CTL2(i), val); @@ -167,7 +296,14 @@ static void cmci_discover(int banks) /* Did the enable bit stick? 
-- the bank supports CMCI */ if (val & MCI_CTL2_CMCI_EN) { set_bit(i, owned); - __clear_bit(i, this_cpu_ptr(mce_poll_banks)); + if ((val & MCI_CTL2_CMCI_THRESHOLD_MASK) == CMCI_STORM_THRESHOLD) { + pr_notice("CPU%d BANK%d CMCI inherited storm\n", smp_processor_id(), i); + this_cpu_write(bank_history[i], ~0ull); + this_cpu_write(bank_time_stamp[i], jiffies); + cmci_storm_begin(i); + } else { + __clear_bit(i, this_cpu_ptr(mce_poll_banks)); + } /* * We are able to set thresholds for some banks that * had a threshold of 0. This means the BIOS has not @@ -177,6 +313,10 @@ static void cmci_discover(int banks) if (mca_cfg.bios_cmci_threshold && bios_zero_thresh && (val & MCI_CTL2_CMCI_THRESHOLD_MASK)) bios_wrong_thresh = 1; + + /* Save default threshold for each bank */ + if (cmci_threshold[i] == 0) + cmci_threshold[i] = val & MCI_CTL2_CMCI_THRESHOLD_MASK; } else { WARN_ON(!test_bit(i, this_cpu_ptr(mce_poll_banks))); } @@ -218,6 +358,8 @@ static void __cmci_disable_bank(int bank) val &= ~MCI_CTL2_CMCI_EN; wrmsrl(MSR_IA32_MCx_CTL2(bank), val); __clear_bit(bank, this_cpu_ptr(mce_banks_owned)); + if ((val & MCI_CTL2_CMCI_THRESHOLD_MASK) == CMCI_STORM_THRESHOLD) + cmci_storm_end(bank); } /* diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h index 17d313c9cc60..1ee8fc0d97fe 100644 --- a/arch/x86/kernel/cpu/mce/internal.h +++ b/arch/x86/kernel/cpu/mce/internal.h @@ -41,12 +41,14 @@ struct dentry *mce_get_debugfs_dir(void); extern mce_banks_t mce_banks_ce_disabled; #ifdef CONFIG_X86_MCE_INTEL +void track_cmci_storm(int bank, u64 status); void cmci_disable_bank(int bank); void intel_init_cmci(void); void intel_init_lmce(void); void intel_clear_lmce(void); bool intel_filter_mce(struct mce *m); #else +static inline void track_cmci_storm(int bank, u64 status) { } static inline void cmci_disable_bank(int bank) { } static inline void intel_init_cmci(void) { } static inline void intel_init_lmce(void) { } @@ -54,7 +56,7 @@ static inline void 
intel_clear_lmce(void) { }
 static inline bool intel_filter_mce(struct mce *m) { return false; }
 #endif
 
-void mce_timer_kick(unsigned long interval);
+void mce_timer_kick(bool storm);
 
 #ifdef CONFIG_ACPI_APEI
 int apei_write_mce(struct mce *m);

From patchwork Wed Apr 6 06:35:40 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Smita Koralahalli
X-Patchwork-Id: 12803056
From: Smita Koralahalli
To: Tony Luck, Borislav Petkov
CC: Smita Koralahalli, Yazen Ghannam, Dave Hansen
Subject: [RFC PATCH 3/5] x86/mce: Introduce a function pointer mce_handle_storm
Date: Wed, 6 Apr 2022 01:35:40 -0500
Message-ID: <20220406063542.183946-4-Smita.KoralahalliChannabasappa@amd.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20220406063542.183946-1-Smita.KoralahalliChannabasappa@amd.com>
References: <20220406063542.183946-1-Smita.KoralahalliChannabasappa@amd.com>
X-Mailing-List: linux-edac@vger.kernel.org

Introduce a function pointer, "mce_handle_storm", that performs the
vendor-specific storm handling. On Intel, it points to a routine that
sets different thresholds in IA32_MCi_CTL2.

No functional changes.

Signed-off-by: Smita Koralahalli
---
This patch is kept separate only to avoid changing Tony's initial code,
which could get confusing. These changes could be merged into Tony's
new CMCI storm mitigation patch.
---
 arch/x86/kernel/cpu/mce/core.c     |  5 +++++
 arch/x86/kernel/cpu/mce/intel.c    | 12 ++++++++++--
 arch/x86/kernel/cpu/mce/internal.h |  3 +++
 3 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 13844a38aa2c..db6d60825e77 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1599,6 +1599,10 @@ static DEFINE_PER_CPU(unsigned long, mce_next_interval); /* in jiffies */
 static DEFINE_PER_CPU(struct timer_list, mce_timer);
 static DEFINE_PER_CPU(bool, storm_poll_mode);
 
+void mce_handle_storm_default(int bank, bool on) { }
+
+void (*mce_handle_storm)(int bank, bool on) = mce_handle_storm_default;
+
 static void __start_timer(struct timer_list *t, unsigned long interval)
 {
 	unsigned long when = jiffies + interval;
@@ -1988,6 +1992,7 @@ static void __mcheck_cpu_init_vendor(struct cpuinfo_x86 *c)
 	switch (c->x86_vendor) {
 	case X86_VENDOR_INTEL:
 		mce_intel_feature_init(c);
+		mce_handle_storm = mce_intel_handle_storm;
 		break;
 
 	case X86_VENDOR_AMD: {
diff --git a/arch/x86/kernel/cpu/mce/intel.c b/arch/x86/kernel/cpu/mce/intel.c
index 59cad4061e5a..7edc31742fe0 100644
--- a/arch/x86/kernel/cpu/mce/intel.c
+++ b/arch/x86/kernel/cpu/mce/intel.c
@@ -159,6 +159,14 @@ static void cmci_set_threshold(int bank, int thresh)
 	raw_spin_unlock_irqrestore(&cmci_discover_lock, flags);
 }
 
+void mce_intel_handle_storm(int bank, bool on)
+{
+	if (on)
+		cmci_set_threshold(bank, cmci_threshold[bank]);
+	else
+		cmci_set_threshold(bank, CMCI_STORM_THRESHOLD);
+}
+
 static
void cmci_storm_begin(int bank)
 {
 	__set_bit(bank, this_cpu_ptr(mce_poll_banks));
@@ -218,13 +226,13 @@ void track_cmci_storm(int bank, u64 status)
 		if (history & GENMASK_ULL(STORM_END_POLL_THRESHOLD - 1, 0))
 			return;
 		pr_notice("CPU%d BANK%d CMCI storm subsided\n", smp_processor_id(), bank);
-		cmci_set_threshold(bank, cmci_threshold[bank]);
+		mce_handle_storm(bank, true);
 		cmci_storm_end(bank);
 	} else {
 		if (hweight64(history) < STORM_BEGIN_THRESHOLD)
 			return;
 		pr_notice("CPU%d BANK%d CMCI storm detected\n", smp_processor_id(), bank);
-		cmci_set_threshold(bank, CMCI_STORM_THRESHOLD);
+		mce_handle_storm(bank, false);
 		cmci_storm_begin(bank);
 	}
 }
diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h
index 1ee8fc0d97fe..c95802db9535 100644
--- a/arch/x86/kernel/cpu/mce/internal.h
+++ b/arch/x86/kernel/cpu/mce/internal.h
@@ -42,6 +42,7 @@ extern mce_banks_t mce_banks_ce_disabled;
 
 #ifdef CONFIG_X86_MCE_INTEL
 void track_cmci_storm(int bank, u64 status);
+void mce_intel_handle_storm(int bank, bool on);
 void cmci_disable_bank(int bank);
 void intel_init_cmci(void);
 void intel_init_lmce(void);
@@ -49,6 +50,7 @@ void intel_clear_lmce(void);
 bool intel_filter_mce(struct mce *m);
 #else
 static inline void track_cmci_storm(int bank, u64 status) { }
+# define mce_intel_handle_storm mce_handle_storm_default
 static inline void cmci_disable_bank(int bank) { }
 static inline void intel_init_cmci(void) { }
 static inline void intel_init_lmce(void) { }
@@ -57,6 +59,7 @@ static inline bool intel_filter_mce(struct mce *m) { return false; }
 #endif
 
 void mce_timer_kick(bool storm);
+extern void (*mce_handle_storm)(int bank, bool on);
 
 #ifdef CONFIG_ACPI_APEI
 int apei_write_mce(struct mce *m);

From patchwork Wed Apr 6 06:35:41 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Smita Koralahalli
X-Patchwork-Id: 12803059
From: Smita Koralahalli
To: Tony Luck, Borislav Petkov
CC: Smita Koralahalli, Yazen Ghannam, Dave Hansen
Subject: [RFC PATCH 4/5] x86/mce: Move storm handling to core.
Date: Wed, 6 Apr 2022 01:35:41 -0500
Message-ID: <20220406063542.183946-5-Smita.KoralahalliChannabasappa@amd.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20220406063542.183946-1-Smita.KoralahalliChannabasappa@amd.com>
References: <20220406063542.183946-1-Smita.KoralahalliChannabasappa@amd.com>
X-Mailing-List: linux-edac@vger.kernel.org

AMD's storm handling for threshold interrupts is similar to Intel's
CMCI storm handling. Hence, make the storm handling code common by
moving it to core and removing the vendor exclusivity.

In contrast, the code that sets different thresholds in the
IA32_MCi_CTL2 register to reduce the interrupt rate is kept
Intel-specific, because AMD handles storms slightly differently: it
turns the interrupts off.

No functional changes.

Signed-off-by: Smita Koralahalli
---
This is another patch which can be merged into Tony's per-CPU, per-bank
CMCI storm mitigation.
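
For readers following the thread, here is a minimal userspace sketch (not
part of this patch) of the history-bitmask logic in track_cmci_storm()
that this patch moves to core.c. SIM_HZ, struct bank_sim, popcount64()
and sim_track_storm() are hypothetical names made up for the
illustration; the kernel keeps this state in per-CPU arrays and uses
jiffies, hweight64() and GENMASK_ULL() instead. The three thresholds are
the values defined in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* SIM_HZ is an assumed tick rate for this sketch; the kernel uses HZ. */
#define SIM_HZ				1000
#define HZBITS				(SIM_HZ / 64)
#define STORM_BEGIN_THRESHOLD		5
#define STORM_END_POLL_THRESHOLD	30

/* Hypothetical per-bank state; the patch keeps these in per-CPU arrays. */
struct bank_sim {
	uint64_t history;	/* bit 0 is the most recent poll */
	unsigned long stamp;	/* simulated jiffies of the last poll */
	bool storm;		/* true while the bank is in storm mode */
};

/* Portable stand-in for the kernel's hweight64(). */
static unsigned int popcount64(uint64_t v)
{
	unsigned int n = 0;

	while (v) {
		v &= v - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/*
 * Record one poll of the bank at time "now". Returns true if the bank
 * is in storm mode after this poll.
 */
bool sim_track_storm(struct bank_sim *b, unsigned long now, bool corrected_error)
{
	unsigned int shift = 1;
	uint64_t history;

	assert(b != NULL);

	/* In storm mode the shift depends on the time since the last poll. */
	if (b->storm)
		shift = (now - b->stamp + HZBITS) / HZBITS;

	/* A very long gap clears the history entirely. */
	history = (shift >= 64) ? 0 : b->history << shift;
	b->stamp = now;

	if (corrected_error)
		history |= 1;
	b->history = history;

	if (b->storm) {
		/* Storm ends once the low 30 history bits are all clear. */
		if (!(history & ((1ULL << STORM_END_POLL_THRESHOLD) - 1))) {
			b->storm = false;
			b->history = 0;
		}
	} else if (popcount64(history) >= STORM_BEGIN_THRESHOLD) {
		/* Five or more errors in the history start a storm. */
		b->storm = true;
	}
	return b->storm;
}
```

Under these assumptions, five consecutive polls that each see a
corrected error put the bank into storm mode, and a later poll that
arrives about 30 history slots (30 * HZBITS ticks) afterwards without
an error ends the storm and clears the history.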
---
 arch/x86/kernel/cpu/mce/core.c     |  81 +++++++++++++++++++++++
 arch/x86/kernel/cpu/mce/intel.c    | 100 +----------------------------
 arch/x86/kernel/cpu/mce/internal.h |  25 ++++++++
 3 files changed, 107 insertions(+), 99 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index db6d60825e77..6caee488bf7d 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -611,6 +611,87 @@ static struct notifier_block mce_default_nb = {
 	.priority	= MCE_PRIO_LOWEST,
 };
 
+/*
+ * CMCI storm tracking state
+ *	stormy_bank_count: per-cpu count of MC banks in storm state
+ *	bank_history: bitmask tracking of corrected errors seen in each bank
+ *	bank_time_stamp: last time (in jiffies) that each bank was polled
+ */
+DEFINE_PER_CPU(int, stormy_bank_count);
+DEFINE_PER_CPU(u64 [MAX_NR_BANKS], bank_history);
+DEFINE_PER_CPU(bool [MAX_NR_BANKS], bank_storm);
+DEFINE_PER_CPU(unsigned long [MAX_NR_BANKS], bank_time_stamp);
+
+void cmci_storm_begin(int bank)
+{
+	__set_bit(bank, this_cpu_ptr(mce_poll_banks));
+	this_cpu_write(bank_storm[bank], true);
+
+	/*
+	 * If this is the first bank on this CPU to enter storm mode
+	 * start polling
+	 */
+	if (this_cpu_inc_return(stormy_bank_count) == 1)
+		mce_timer_kick(true);
+}
+
+void cmci_storm_end(int bank)
+{
+	__clear_bit(bank, this_cpu_ptr(mce_poll_banks));
+	this_cpu_write(bank_history[bank], 0ull);
+	this_cpu_write(bank_storm[bank], false);
+
+	/* If no banks left in storm mode, stop polling */
+	if (!this_cpu_dec_return(stormy_bank_count))
+		mce_timer_kick(false);
+}
+
+void track_cmci_storm(int bank, u64 status)
+{
+	unsigned long now = jiffies, delta;
+	unsigned int shift = 1;
+	u64 history;
+
+	/*
+	 * When a bank is in storm mode, the history mask covers about
+	 * one second of elapsed time. Check how long it has been since
+	 * this bank was last polled, and compute a shift value to update
+	 * the history bitmask. When not in storm mode, each consecutive
+	 * poll of the bank is logged in the next history bit, so shift
+	 * is kept at "1".
+	 */
+	if (this_cpu_read(bank_storm[bank])) {
+		delta = now - this_cpu_read(bank_time_stamp[bank]);
+		shift = (delta + HZBITS) / HZBITS;
+	}
+
+	/* If has been a long time since the last poll, clear history */
+	if (shift >= 64)
+		history = 0;
+	else
+		history = this_cpu_read(bank_history[bank]) << shift;
+	this_cpu_write(bank_time_stamp[bank], now);
+
+	/* History keeps track of corrected errors. VAL=1 && UC=0 */
+	if ((status & (MCI_STATUS_VAL | MCI_STATUS_UC)) == MCI_STATUS_VAL)
+		history |= 1;
+	this_cpu_write(bank_history[bank], history);
+
+	if (this_cpu_read(bank_storm[bank])) {
+		if (history & GENMASK_ULL(STORM_END_POLL_THRESHOLD - 1, 0))
+			return;
+		pr_notice("CPU%d BANK%d CMCI storm subsided\n", smp_processor_id(), bank);
+		mce_handle_storm(bank, true);
+		cmci_storm_end(bank);
+	} else {
+		if (hweight64(history) < STORM_BEGIN_THRESHOLD)
+			return;
+		pr_notice("CPU%d BANK%d CMCI storm detected\n", smp_processor_id(), bank);
+		mce_handle_storm(bank, false);
+		cmci_storm_begin(bank);
+	}
+}
+
 /*
  * Read ADDR and MISC registers.
  */
diff --git a/arch/x86/kernel/cpu/mce/intel.c b/arch/x86/kernel/cpu/mce/intel.c
index 7edc31742fe0..6cc9aa97c092 100644
--- a/arch/x86/kernel/cpu/mce/intel.c
+++ b/arch/x86/kernel/cpu/mce/intel.c
@@ -47,17 +47,7 @@ static DEFINE_PER_CPU(mce_banks_t, mce_banks_owned);
  */
 static DEFINE_RAW_SPINLOCK(cmci_discover_lock);
 
-/*
- * CMCI storm tracking state
- *	stormy_bank_count: per-cpu count of MC banks in storm state
- *	bank_history: bitmask tracking of corrected errors seen in each bank
- *	bank_time_stamp: last time (in jiffies) that each bank was polled
- *	cmci_threshold: MCi_CTL2 threshold for each bank when there is no storm
- */
-static DEFINE_PER_CPU(int, stormy_bank_count);
-static DEFINE_PER_CPU(u64 [MAX_NR_BANKS], bank_history);
-static DEFINE_PER_CPU(bool [MAX_NR_BANKS], bank_storm);
-static DEFINE_PER_CPU(unsigned long [MAX_NR_BANKS], bank_time_stamp);
+/* MCi_CTL2 threshold for each bank when there is no storm */
 static int cmci_threshold[MAX_NR_BANKS];
 
 /* Linux non-storm CMCI threshold (may be overridden by BIOS */
@@ -70,24 +60,6 @@ static int cmci_threshold[MAX_NR_BANKS];
  */
 #define CMCI_STORM_THRESHOLD	32749
 
-/*
- * How many errors within the history buffer mark the start of a storm
- */
-#define STORM_BEGIN_THRESHOLD	5
-
-/*
- * How many polls of machine check bank without an error before declaring
- * the storm is over
- */
-#define STORM_END_POLL_THRESHOLD	30
-
-/*
- * When there is no storm each "bit" in the history represents
- * this many jiffies. When there is a storm every poll() takes
- * one history bit.
- */
-#define HZBITS (HZ / 64)
-
 static int cmci_supported(int *banks)
 {
 	u64 cap;
@@ -167,76 +139,6 @@ void mce_intel_handle_storm(int bank, bool on)
 		cmci_set_threshold(bank, CMCI_STORM_THRESHOLD);
 }
 
-static void cmci_storm_begin(int bank)
-{
-	__set_bit(bank, this_cpu_ptr(mce_poll_banks));
-	this_cpu_write(bank_storm[bank], true);
-
-	/*
-	 * If this is the first bank on this CPU to enter storm mode
-	 * start polling
-	 */
-	if (this_cpu_inc_return(stormy_bank_count) == 1)
-		mce_timer_kick(true);
-}
-
-static void cmci_storm_end(int bank)
-{
-	__clear_bit(bank, this_cpu_ptr(mce_poll_banks));
-	this_cpu_write(bank_history[bank], 0ull);
-	this_cpu_write(bank_storm[bank], false);
-
-	/* If no banks left in storm mode, stop polling */
-	if (!this_cpu_dec_return(stormy_bank_count))
-		mce_timer_kick(false);
-}
-
-void track_cmci_storm(int bank, u64 status)
-{
-	unsigned long now = jiffies, delta;
-	unsigned int shift = 1;
-	u64 history;
-
-	/*
-	 * When a bank is in storm mode, the history mask covers about
-	 * one second of elapsed time. Check how long it has been since
-	 * this bank was last polled, and compute a shift value to update
-	 * the history bitmask. When not in storm mode, each consecutive
-	 * poll of the bank is logged in the next history bit, so shift
-	 * is kept at "1".
-	 */
-	if (this_cpu_read(bank_storm[bank])) {
-		delta = now - this_cpu_read(bank_time_stamp[bank]);
-		shift = (delta + HZBITS) / HZBITS;
-	}
-
-	/* If has been a long time since the last poll, clear history */
-	if (shift >= 64)
-		history = 0;
-	else
-		history = this_cpu_read(bank_history[bank]) << shift;
-	this_cpu_write(bank_time_stamp[bank], now);
-
-	/* History keeps track of corrected errors. VAL=1 && UC=0 */
-	if ((status & (MCI_STATUS_VAL | MCI_STATUS_UC)) == MCI_STATUS_VAL)
-		history |= 1;
-	this_cpu_write(bank_history[bank], history);
-
-	if (this_cpu_read(bank_storm[bank])) {
-		if (history & GENMASK_ULL(STORM_END_POLL_THRESHOLD - 1, 0))
-			return;
-		pr_notice("CPU%d BANK%d CMCI storm subsided\n", smp_processor_id(), bank);
-		mce_handle_storm(bank, true);
-		cmci_storm_end(bank);
-	} else {
-		if (hweight64(history) < STORM_BEGIN_THRESHOLD)
-			return;
-		pr_notice("CPU%d BANK%d CMCI storm detected\n", smp_processor_id(), bank);
-		mce_handle_storm(bank, false);
-		cmci_storm_begin(bank);
-	}
-}
-
 /*
  * The interrupt handler. This is called on every event.
  * Just call the poller directly to log any events.
diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h
index c95802db9535..49907cadf9ad 100644
--- a/arch/x86/kernel/cpu/mce/internal.h
+++ b/arch/x86/kernel/cpu/mce/internal.h
@@ -60,6 +60,31 @@ static inline bool intel_filter_mce(struct mce *m) { return false; }
 
 void mce_timer_kick(bool storm);
 extern void (*mce_handle_storm)(int bank, bool on);
+void cmci_storm_begin(int bank);
+void cmci_storm_end(int bank);
+
+DECLARE_PER_CPU(int, stormy_bank_count);
+DECLARE_PER_CPU(u64 [MAX_NR_BANKS], bank_history);
+DECLARE_PER_CPU(bool [MAX_NR_BANKS], bank_storm);
+DECLARE_PER_CPU(unsigned long [MAX_NR_BANKS], bank_time_stamp);
+
+/*
+ * How many errors within the history buffer mark the start of a storm
+ */
+#define STORM_BEGIN_THRESHOLD	5
+
+/*
+ * How many polls of machine check bank without an error before declaring
+ * the storm is over
+ */
+#define STORM_END_POLL_THRESHOLD	30
+
+/*
+ * When there is no storm each "bit" in the history represents
+ * this many jiffies. When there is a storm every poll() takes
+ * one history bit.
+ */ +#define HZBITS (HZ / 64) #ifdef CONFIG_ACPI_APEI int apei_write_mce(struct mce *m); From patchwork Wed Apr 6 06:35:42 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Smita Koralahalli X-Patchwork-Id: 12803058 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5D54AC433EF for ; Wed, 6 Apr 2022 10:06:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1351428AbiDFKIW (ORCPT ); Wed, 6 Apr 2022 06:08:22 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45946 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1343622AbiDFKHG (ORCPT ); Wed, 6 Apr 2022 06:07:06 -0400 Received: from NAM10-BN7-obe.outbound.protection.outlook.com (mail-bn7nam10on2046.outbound.protection.outlook.com [40.107.92.46]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E3E9FC6F3F; Tue, 5 Apr 2022 23:36:09 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=VEZE5j7+MS2yk5fYrCB6nVtVVjGk9MqnF3spTgrvct3mnIaYSY2ERWsY3n5pDw3AgaCOczWDdwhkMatnDzufFITlFk/s4xK/JkfYZ6U3+YldQltaJA85IWhdNQL8+3VY2YbWwBc256+a0U2oJlapoG88SXb0TuvaZuFmO0D+MMTjgqLO2n3en80D4Rnva2NnaqhmUqqFU5FB+Xkl7FASzkTGBMjeATk4Jj1frmrtXoL3RnYhfCXb4T3Ap7LrHmf/Zhf2kfipfXSj/zRUuF6zq0wpDNDiMSBLJkeSZUOJnPgUHXtYUjf90NiMUqQE2Npcj6JxY2TVTCFHaTcIWmljqg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=1VlTqkxDIxDU73UZf91V0PrK0Txhqggj600+WuSGKkc=; 
From: Smita Koralahalli
To: Tony Luck , Borislav Petkov
CC: Smita Koralahalli , , Yazen Ghannam , Dave Hansen , , ,
Subject: [RFC PATCH 5/5] x86/mce: Handle AMD threshold interrupt storms
Date: Wed, 6 Apr 2022 01:35:42 -0500
Message-ID: <20220406063542.183946-6-Smita.KoralahalliChannabasappa@amd.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20220406063542.183946-1-Smita.KoralahalliChannabasappa@amd.com>
References: <20220406063542.183946-1-Smita.KoralahalliChannabasappa@amd.com>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List:
linux-edac@vger.kernel.org

Extend the CMCI storm handling logic to AMD threshold interrupts.

Use the same approach as Intel's CMCI handling to mitigate storms per
CPU and per bank. Unlike CMCI, however, do not adjust thresholds to
reduce the interrupt rate during a storm. Instead, disable the threshold
interrupt on the affected CPU and bank, and re-enable it once enough
consecutive polls of the bank show no corrected errors (30, the same
value programmed for Intel).

Turning off the threshold interrupts is the better solution on AMD
systems because errors of other severities are still handled while the
threshold interrupts are disabled.

Signed-off-by: Smita Koralahalli
---
 arch/x86/kernel/cpu/mce/amd.c      | 49 ++++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/mce/core.c     |  1 +
 arch/x86/kernel/cpu/mce/internal.h |  4 +++
 3 files changed, 54 insertions(+)

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 1940d305db1c..941b09f4dac5 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -466,6 +466,47 @@ static void threshold_restart_bank(void *_tr)
 	wrmsr(tr->b->address, lo, hi);
 }
 
+static void _reset_block(struct threshold_block *block)
+{
+	struct thresh_restart tr;
+
+	memset(&tr, 0, sizeof(tr));
+	tr.b = block;
+	threshold_restart_bank(&tr);
+}
+
+static void toggle_interrupt_reset_block(struct threshold_block *block, bool on)
+{
+	if (!block)
+		return;
+
+	block->interrupt_enable = !!on;
+	_reset_block(block);
+}
+
+void mce_amd_handle_storm(int bank, bool on)
+{
+	struct threshold_block *first_block = NULL, *block = NULL, *tmp = NULL;
+	struct threshold_bank **bp = this_cpu_read(threshold_banks);
+	unsigned long flags;
+
+	if (!bp)
+		return;
+
+	local_irq_save(flags);
+
+	first_block = bp[bank]->blocks;
+	if (!first_block)
+		goto end;
+
+	toggle_interrupt_reset_block(first_block, on);
+
+	list_for_each_entry_safe(block, tmp, &first_block->miscj, miscj)
+		toggle_interrupt_reset_block(block, on);
+end:
+	local_irq_restore(flags);
+}
+
 static void mce_threshold_block_init(struct threshold_block *b, int offset)
 {
 	struct thresh_restart tr = {
@@ -867,6 +908,7 @@ static void amd_threshold_interrupt(void)
 	struct threshold_block *first_block = NULL, *block = NULL, *tmp = NULL;
 	struct threshold_bank **bp = this_cpu_read(threshold_banks);
 	unsigned int bank, cpu = smp_processor_id();
+	u64 status;
 
 	/*
 	 * Validate that the threshold bank has been initialized already. The
@@ -880,6 +922,13 @@ static void amd_threshold_interrupt(void)
 		if (!(per_cpu(bank_map, cpu) & (1 << bank)))
 			continue;
 
+		rdmsrl(mca_msr_reg(bank, MCA_STATUS), status);
+		track_cmci_storm(bank, status);
+
+		/* Return early on an interrupt storm */
+		if (this_cpu_read(bank_storm[bank]))
+			return;
+
 		first_block = bp[bank]->blocks;
 		if (!first_block)
 			continue;
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 6caee488bf7d..c510dd17f2c5 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -2078,6 +2078,7 @@ static void __mcheck_cpu_init_vendor(struct cpuinfo_x86 *c)
 
 	case X86_VENDOR_AMD: {
 		mce_amd_feature_init(c);
+		mce_handle_storm = mce_amd_handle_storm;
 		break;
 		}
 
diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h
index 49907cadf9ad..b9e8c8155c66 100644
--- a/arch/x86/kernel/cpu/mce/internal.h
+++ b/arch/x86/kernel/cpu/mce/internal.h
@@ -213,7 +213,11 @@ extern bool filter_mce(struct mce *m);
 
 #ifdef CONFIG_X86_MCE_AMD
 extern bool amd_filter_mce(struct mce *m);
+void track_cmci_storm(int bank, u64 status);
+void mce_amd_handle_storm(int bank, bool on);
 #else
+static inline void track_cmci_storm(int bank, u64 status) { }
+# define mce_amd_handle_storm mce_handle_storm_default
 static inline bool amd_filter_mce(struct mce *m) { return false; }
 #endif
 
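
For readers who want the storm lifecycle described in the commit message in one place, the following is a minimal userspace sketch of the per-bank state machine, not the kernel code from this series. The struct, the function name track_storm() and the entry threshold STORM_BEGIN_THRESHOLD are assumptions made up for illustration; this patch does not say how a storm is detected. Only the exit condition (30 consecutive clean polls) and the disable/re-enable behaviour come from the description above.

```c
#include <stdbool.h>

/* From the commit message: clean polls required before a storm ends. */
#define STORM_END_POLL_THRESHOLD 30
/* Hypothetical: the entry condition is not specified in this patch. */
#define STORM_BEGIN_THRESHOLD 5

/* Hypothetical per-CPU, per-bank bookkeeping. */
struct bank_state {
	bool in_storm;          /* bank is currently in storm mode */
	bool interrupt_enabled; /* models the threshold interrupt enable bit */
	int error_streak;       /* back-to-back observations with an error */
	int clean_polls;        /* consecutive polls with no corrected error */
};

/*
 * Record one observation of the bank. 'error_seen' is true when the
 * bank logged a corrected error since the previous observation.
 */
static void track_storm(struct bank_state *b, bool error_seen)
{
	if (b->in_storm) {
		if (error_seen) {
			/* Any error restarts the count of clean polls. */
			b->clean_polls = 0;
			return;
		}
		if (++b->clean_polls >= STORM_END_POLL_THRESHOLD) {
			/* Storm is over: turn the interrupt back on. */
			b->in_storm = false;
			b->interrupt_enabled = true;
			b->error_streak = 0;
			b->clean_polls = 0;
		}
		return;
	}

	if (!error_seen) {
		b->error_streak = 0;
		return;
	}

	if (++b->error_streak >= STORM_BEGIN_THRESHOLD) {
		/* Storm detected: disable the interrupt and rely on polling. */
		b->in_storm = true;
		b->interrupt_enabled = false;
		b->clean_polls = 0;
	}
}
```

Starting from a bank with interrupt_enabled set, feeding STORM_BEGIN_THRESHOLD errors in a row puts the bank into storm mode with the interrupt off. It stays there until STORM_END_POLL_THRESHOLD consecutive clean polls have been seen, and a single error in between restarts that count.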