From patchwork Tue Jul 30 18:54:03 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Avadhut Naik X-Patchwork-Id: 13747798 Received: from NAM12-MW2-obe.outbound.protection.outlook.com (mail-mw2nam12on2076.outbound.protection.outlook.com [40.107.244.76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 836531AA3C6; Tue, 30 Jul 2024 18:54:29 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=fail smtp.client-ip=40.107.244.76 ARC-Seal: i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1722365672; cv=fail; b=Q+KsFXwKUIgXrQ0D8JO6HsAcTRMFyRVavXFDaBEDiB0gLw8UO99lf5HdhsSANnsTFhlbeKA9k3ifRx/vkWSD9mLTx4tyDvfxnDzTplNopS3S88ceq2269grNkbDPTP4jyzQH8cPlShsunQ4Q335MD8vDukvZeXq+fGGBp/wSRVA= ARC-Message-Signature: i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1722365672; c=relaxed/simple; bh=41MgKxZi0pX4BRSzhS6qU7qSCQTr7mWqTIJ3AiZC8BA=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=cVMxwmJZflCkRYwc297+xl7k6KpJXHwFsOXCkoJEmSSAzIPmO7ErQ4ksKxNNrLkDLInjgXCLbdQWb3XSGJ9+Rda6+VQ0+7S+hhBpEPLJdtx8LtVfqNUmb5aQABpdaun2NQV6KyKzMviVPHrzZYHb6SK/L5M7REq9wqeAJOze+Ws= ARC-Authentication-Results: i=2; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amd.com; spf=fail smtp.mailfrom=amd.com; dkim=pass (1024-bit key) header.d=amd.com header.i=@amd.com header.b=3MH+juJZ; arc=fail smtp.client-ip=40.107.244.76 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amd.com Authentication-Results: smtp.subspace.kernel.org; spf=fail smtp.mailfrom=amd.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=amd.com header.i=@amd.com header.b="3MH+juJZ" ARC-Seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none; 
b=Ig1TCvbdhc25bJOyScBT1FhhfvrlY9Cv29lQal8BiKaRd0oLOfv1PmQT587SXl3Lz733rb1K4b9/I/oO3nIlqzld2JmNy/pUfQASo8ijJPvsFlXeu5JMuSupvqvtj4i7DnMqp2aLeJwlzMt7MKUMcc/BGzs8IjHZ+0T2RTGhsMVp10nPJaSn0MWvJcVyBR9wwrd3rLhMOLXos8MC2WA7J8yrUpRh6wUcpQXyaJclg7nhDqRhMvQ6TjS46SGlftKkdAqTkcqjESKHg6ZjA10qbqdr45ZG9CUYSG4W8JuZ5X0egkfMtgvhGUFpsEzQKa+z/D80cAQjjsgvVfejyAFSjA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=sWcdXOniSVGbcjjYbXDkJY3i2A2Oy9JKnftXjaZmXR8=; b=psHNgRCh2ZBjbZ5o+reKMtUoKBVwWYBrXWE5E28x7Ngf4rXvj5ljpraW3xRwromL1G5M2fi7K9ONfutQXR4W0vWeKnyCZ11YMdYzCPeXyhxsJHs61vf/8xaop0BAKKXGJweMj8X/UmI7EoybYpx4BS7H2yzUG9RkbJZfmWa56Vefa9FYNcGKC7ieXHOnBZKu1uzM4uxOZc/q1ZSGl8d2UT+h83n74MVuHs3zu4z3YSKxffSTVhDxF2h5uXZ3omH0k1EGS5AKPK2w4KLmipXb2iQ1L5l6nORAkG94tGIx3Jj6eYytQSJw2OpJFdVZL93csV0RDSpXOH+/i+XbyRTLlQ== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 165.204.84.17) smtp.rcpttodomain=kernel.org smtp.mailfrom=amd.com; dmarc=pass (p=quarantine sp=quarantine pct=100) action=none header.from=amd.com; dkim=none (message not signed); arc=none (0) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amd.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=sWcdXOniSVGbcjjYbXDkJY3i2A2Oy9JKnftXjaZmXR8=; b=3MH+juJZJYMQ/hJIh3FqSTMn3xCg2l23TrbxKH/FjT3+IK3qcqj454dMd5pMK3PcAQVgl/rtH4Ey++dVpaSZXEb3vLw921fIfH7QF7CkTAYYBV3dvCpGx2V2x5YNc61R7AcInaUN2ucx/8HoO16WdQx4Hd3qAxWEcV46N3e/ON8= Received: from CH5P223CA0018.NAMP223.PROD.OUTLOOK.COM (2603:10b6:610:1f3::25) by PH0PR12MB8797.namprd12.prod.outlook.com (2603:10b6:510:28d::12) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7807.23; Tue, 30 Jul 2024 18:54:25 +0000 Received: from 
CH1PEPF0000AD7B.namprd04.prod.outlook.com (2603:10b6:610:1f3:cafe::c0) by CH5P223CA0018.outlook.office365.com (2603:10b6:610:1f3::25) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7784.35 via Frontend Transport; Tue, 30 Jul 2024 18:54:25 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 165.204.84.17) smtp.mailfrom=amd.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=amd.com; Received-SPF: Pass (protection.outlook.com: domain of amd.com designates 165.204.84.17 as permitted sender) receiver=protection.outlook.com; client-ip=165.204.84.17; helo=SATLEXMB04.amd.com; pr=C Received: from SATLEXMB04.amd.com (165.204.84.17) by CH1PEPF0000AD7B.mail.protection.outlook.com (10.167.244.58) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.20.7828.19 via Frontend Transport; Tue, 30 Jul 2024 18:54:25 +0000 Received: from titanite-d354host.amd.com (10.180.168.240) by SATLEXMB04.amd.com (10.181.40.145) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.39; Tue, 30 Jul 2024 13:54:23 -0500 From: Avadhut Naik To: , , , CC: , , , , , , , , , , , , , Subject: [PATCH v3 1/4] x86/mce: Add wrapper for struct mce to export vendor specific info Date: Tue, 30 Jul 2024 13:54:03 -0500 Message-ID: <20240730185406.3709876-2-avadhut.naik@amd.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240730185406.3709876-1-avadhut.naik@amd.com> References: <20240730185406.3709876-1-avadhut.naik@amd.com> Precedence: bulk X-Mailing-List: linux-trace-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-ClientProxiedBy: SATLEXMB03.amd.com (10.181.40.144) To SATLEXMB04.amd.com (10.181.40.145) X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: CH1PEPF0000AD7B:EE_|PH0PR12MB8797:EE_ X-MS-Office365-Filtering-Correlation-Id: 
f1b16eb3-71c9-4158-7487-08dcb0c90837 X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0;ARA:13230040|1800799024|7416014|82310400026|376014|36860700013; X-Microsoft-Antispam-Message-Info: SeLFellpw48QhVUJgyOElqxqGMgqQ2ib9DruqnZpOwqQsdDb9m87bq+TrDFCbYnHQV7/IDveBw2U+0eYt1d+B5T8LWGDVyMSRav0dNVLY4UZ1rW6PLJ7nUZblF4jdO5jeN5PZEJc7IzNM0/OmPhfROTy0x2UOv5Cp9fSDkhmN2FncIFTbajS2zwMe3UM8zWr8OI3h4l3JeD3ehbX491sAwEWXeYQ//tuYvNUjJ3AoFlzJSxbSTRt3F73QwUYew24HLs2cBMvLfilRxMX/lIlISUeLZLx6AsnhXz6pBVjTJMNeoxTrGF2/pHfp43EC31LJLdrvqEHZWsYYRd91U4CjiAzXjDfhjPoKKUXnr6qhUT/ctYN/RhhN23qYvu0azgdfwyqzoqOP/gC4WMwBoBlFx58qDYomCAeS3Rmlg/a7xnUxaJqk+MaMnp92fgLMAbOLarogFP5DVYD8xlHSwYffEJR7hk+/hC3zDyyHRc5aIvrdL6pTKnBOhj8g6Q2NnJH+ksAkveG6XgapyeOlouDL8kKT/Dn3GJCY5RjJAL7BA7K/52rEuxZhz4Qt4X2wxp8wMQsOC8OCLTPe8hkGbHAQvOTMgTAQ4+wBnSOWVWRVy+dGXCBA78QIVWJMvSOd1DTGchHR9RYudjvVbHa4wT9AihekHupYyJzghQrtKpiN5AZdXWPe5tBlxwpyFQ1TcXO/6eGtkzE5lL+iRQXWiJS7CDi6Fv77Fj76adEpST+67VIK07MwEejiv0T6HJjR3002dFoEYEPfcOxQU6osA8OZcVu8/9yeMQ+ONzte3tO1jBDHnVBAs/YEdXSb8vg5Q8GJ2sl/81PVmqnQiqJsNhyyK/N9aGxo6wB6Sgwp6JayYvvdRZ1ewQ9DjAgD1e39OL/5yZJhQd1JerxptHcfKj6pZve7zLIzOitydLOOoWjPccgpIOD27XfQfnFiZBnM0X4oD9oKaU2nin1nkCcovH4cLDWEOt81o1wZr3d/W6iyvqjiuRWq0IFacDtHVZYWuH07of8PMHblOc8XOjX7q+tCTjJHnmRuruL0ZxLKVva/cHN9P05cT9gSUgZeGaZGwUU+BT3ynXBhqNmBYKNDMuPaeIX6h0ZjR+BY8YT2HHkkxK11RTV0QnXbnuNiRXTwcM5DGhLI7TUjK9BwHuj0rGGIcGn6x2lsJteZ3bMt2e49cB+0KIVuA8bwuNjILkBNXPxN85CHvzxb0k4urdKYgmYo4p0wqrNQLQf6pnkuWzOvAro1roOO3W/P73vZv78mgF3lMkIpLZGXUZOnhjDUtcOwxLzjx9WrWd2Ag/7TFHYYl1sLGrFBehIMvAvfvZZKuiTOH5r3A/7XIiWJyd5RxVBL7vNX0cbivd7EfjuQFpv+mG5mwAyz2HuSxEp0Qqhswmqt0cOaRSE65lXv4ikeAyhZQ== X-Forefront-Antispam-Report: CIP:165.204.84.17;CTRY:US;LANG:en;SCL:1;SRV:;IPV:CAL;SFV:NSPM;H:SATLEXMB04.amd.com;PTR:InfoDomainNonexistent;CAT:NONE;SFS:(13230040)(1800799024)(7416014)(82310400026)(376014)(36860700013);DIR:OUT;SFP:1101; X-OriginatorOrg: amd.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 30 Jul 2024 18:54:25.1597 (UTC) 
X-MS-Exchange-CrossTenant-Network-Message-Id: f1b16eb3-71c9-4158-7487-08dcb0c90837 X-MS-Exchange-CrossTenant-Id: 3dd8961f-e488-4e60-8e11-a82d994e183d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=3dd8961f-e488-4e60-8e11-a82d994e183d;Ip=[165.204.84.17];Helo=[SATLEXMB04.amd.com] X-MS-Exchange-CrossTenant-AuthSource: CH1PEPF0000AD7B.namprd04.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: PH0PR12MB8797

Currently, exporting additional machine check error information involves
adding new fields at the end of struct mce. This additional information
can then be consumed through mcelog or the tracepoint.

However, as CPU vendors add new MSRs (and will keep adding them) on their
newer CPUs with additional machine check error information to be exported,
the size of struct mce will balloon on some CPUs, unnecessarily, since
those fields are vendor-specific. Moreover, different CPU vendors may
export the additional information in varying sizes.

The problem is aggravated by the fact that struct mce is exposed to
userspace as part of UAPI. Its bloating through vendor-specific data
should be avoided to limit the information being sent out to userspace.

Add a new structure mce_hw_err to wrap the existing struct mce. This
prevents struct mce from ballooning, since vendor-specific data, if any,
can now be exported through a union within the wrapper structure and
through __dynamic_array in the mce_record tracepoint.

Furthermore, new internal kernel fields can be added to the wrapper
struct without impacting the user space API.

[Yazen: Add last commit message paragraph.]
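Illustration only, not part of the patch: a minimal user-space sketch of
the wrapper pattern described above. All names in it (rec_sketch,
hw_err_sketch, vendor_a, extra, hw_err_init, rec_from_data) are
hypothetical stand-ins, and the union member is an assumed example of
where vendor-specific data could live; this patch itself adds only the
wrapped record.

```c
/* User-space sketch only; every name here is hypothetical, not kernel API. */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the UAPI-visible record (struct mce in the patch). */
struct rec_sketch {
	uint64_t status;
	uint64_t addr;
	uint8_t bank;
};

/* Stand-in for the kernel-internal wrapper (struct mce_hw_err in the patch). */
struct hw_err_sketch {
	struct rec_sketch m;
	/* Assumed extension point: vendor data grows here, not in the record. */
	union {
		struct {
			uint64_t extra[2];
		} vendor_a;
	} vendor;
};

/* Same order as the converted call sites: zero the wrapper, then fill via m. */
static void hw_err_init(struct hw_err_sketch *err, uint8_t bank, uint64_t status)
{
	struct rec_sketch *m;

	assert(err != NULL);
	memset(err, 0, sizeof(*err));

	m = &err->m;
	m->bank = bank;
	m->status = status;
}

/* How a notifier-style callback recovers the record from an opaque pointer. */
static struct rec_sketch *rec_from_data(void *data)
{
	return &((struct hw_err_sketch *)data)->m;
}
```

In this sketch, code that only understands the record keeps working on the
pointer returned by rec_from_data(), while sizeof(struct rec_sketch) stays
the same no matter how much the union grows.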

Suggested-by: Borislav Petkov (AMD)
Signed-off-by: Avadhut Naik
Signed-off-by: Yazen Ghannam
Signed-off-by: Avadhut Naik
---
Changes in v2:
[1] https://lore.kernel.org/linux-edac/20240521125434.1555845-1-yazen.ghannam@amd.com/
[2] https://lore.kernel.org/linux-edac/20240523155641.2805411-1-yazen.ghannam@amd.com/

1. Drop dependencies on sets [1] and [2] above and rebase on top of
tip/master.

Changes in v3:
1. Move wrapper changes required in mce_read_aux() and mce_no_way_out()
to this patch from the second patch.
2. Fix SoB chain to properly reflect the patch path.
---
 arch/x86/include/asm/mce.h              |  10 +-
 arch/x86/kernel/cpu/mce/amd.c           |  29 ++--
 arch/x86/kernel/cpu/mce/apei.c          |  54 ++++---
 arch/x86/kernel/cpu/mce/core.c          | 189 ++++++++++++++----------
 arch/x86/kernel/cpu/mce/dev-mcelog.c    |   2 +-
 arch/x86/kernel/cpu/mce/genpool.c       |  20 +--
 arch/x86/kernel/cpu/mce/inject.c        |   4 +-
 arch/x86/kernel/cpu/mce/internal.h      |   4 +-
 drivers/acpi/acpi_extlog.c              |   2 +-
 drivers/acpi/nfit/mce.c                 |   2 +-
 drivers/edac/i7core_edac.c              |   2 +-
 drivers/edac/igen6_edac.c               |   2 +-
 drivers/edac/mce_amd.c                  |   2 +-
 drivers/edac/pnd2_edac.c                |   2 +-
 drivers/edac/sb_edac.c                  |   2 +-
 drivers/edac/skx_common.c               |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c |   2 +-
 drivers/ras/amd/fmpm.c                  |   2 +-
 drivers/ras/cec.c                       |   2 +-
 include/trace/events/mce.h              |  42 +++---
 20 files changed, 210 insertions(+), 166 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 3ad29b128943..ba2b3a5f999e 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -187,6 +187,14 @@ enum mce_notifier_prios {
 	MCE_PRIO_HIGHEST = MCE_PRIO_CEC
 };
 
+/**
+ * struct mce_hw_err - Hardware Error Record.
+ * @m: Machine Check record.
+ */ +struct mce_hw_err { + struct mce m; +}; + struct notifier_block; extern void mce_register_decode_chain(struct notifier_block *nb); extern void mce_unregister_decode_chain(struct notifier_block *nb); @@ -222,7 +230,7 @@ static inline int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, #endif void mce_setup(struct mce *m); -void mce_log(struct mce *m); +void mce_log(struct mce_hw_err *err); DECLARE_PER_CPU(struct device *, mce_device); /* Maximum number of MCA banks per CPU. */ diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c index 9a0133ef7e20..cb7dc0b1aa50 100644 --- a/arch/x86/kernel/cpu/mce/amd.c +++ b/arch/x86/kernel/cpu/mce/amd.c @@ -778,29 +778,32 @@ bool amd_mce_usable_address(struct mce *m) static void __log_error(unsigned int bank, u64 status, u64 addr, u64 misc) { - struct mce m; + struct mce_hw_err err; + struct mce *m = &err.m; - mce_setup(&m); + memset(&err, 0, sizeof(struct mce_hw_err)); - m.status = status; - m.misc = misc; - m.bank = bank; - m.tsc = rdtsc(); + mce_setup(m); - if (m.status & MCI_STATUS_ADDRV) { - m.addr = addr; + m->status = status; + m->misc = misc; + m->bank = bank; + m->tsc = rdtsc(); - smca_extract_err_addr(&m); + if (m->status & MCI_STATUS_ADDRV) { + m->addr = addr; + + smca_extract_err_addr(m); } if (mce_flags.smca) { - rdmsrl(MSR_AMD64_SMCA_MCx_IPID(bank), m.ipid); + rdmsrl(MSR_AMD64_SMCA_MCx_IPID(bank), m->ipid); - if (m.status & MCI_STATUS_SYNDV) - rdmsrl(MSR_AMD64_SMCA_MCx_SYND(bank), m.synd); + if (m->status & MCI_STATUS_SYNDV) + rdmsrl(MSR_AMD64_SMCA_MCx_SYND(bank), m->synd); } - mce_log(&m); + mce_log(&err); } DEFINE_IDTENTRY_SYSVEC(sysvec_deferred_error) diff --git a/arch/x86/kernel/cpu/mce/apei.c b/arch/x86/kernel/cpu/mce/apei.c index 7f7309ff67d0..b8f4e75fb8a7 100644 --- a/arch/x86/kernel/cpu/mce/apei.c +++ b/arch/x86/kernel/cpu/mce/apei.c @@ -28,9 +28,12 @@ void apei_mce_report_mem_error(int severity, struct cper_sec_mem_err *mem_err) { - struct mce m; + struct mce_hw_err 
err; + struct mce *m = &err.m; int lsb; + memset(&err, 0, sizeof(struct mce_hw_err)); + if (!(mem_err->validation_bits & CPER_MEM_VALID_PA)) return; @@ -44,30 +47,33 @@ void apei_mce_report_mem_error(int severity, struct cper_sec_mem_err *mem_err) else lsb = PAGE_SHIFT; - mce_setup(&m); - m.bank = -1; + mce_setup(m); + m->bank = -1; /* Fake a memory read error with unknown channel */ - m.status = MCI_STATUS_VAL | MCI_STATUS_EN | MCI_STATUS_ADDRV | MCI_STATUS_MISCV | 0x9f; - m.misc = (MCI_MISC_ADDR_PHYS << 6) | lsb; + m->status = MCI_STATUS_VAL | MCI_STATUS_EN | MCI_STATUS_ADDRV | MCI_STATUS_MISCV | 0x9f; + m->misc = (MCI_MISC_ADDR_PHYS << 6) | lsb; if (severity >= GHES_SEV_RECOVERABLE) - m.status |= MCI_STATUS_UC; + m->status |= MCI_STATUS_UC; if (severity >= GHES_SEV_PANIC) { - m.status |= MCI_STATUS_PCC; - m.tsc = rdtsc(); + m->status |= MCI_STATUS_PCC; + m->tsc = rdtsc(); } - m.addr = mem_err->physical_addr; - mce_log(&m); + m->addr = mem_err->physical_addr; + mce_log(&err); } EXPORT_SYMBOL_GPL(apei_mce_report_mem_error); int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id) { const u64 *i_mce = ((const u64 *) (ctx_info + 1)); + struct mce_hw_err err; + struct mce *m = &err.m; unsigned int cpu; - struct mce m; + + memset(&err, 0, sizeof(struct mce_hw_err)); if (!boot_cpu_has(X86_FEATURE_SMCA)) return -EINVAL; @@ -97,29 +103,29 @@ int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id) if (ctx_info->reg_arr_size < 48) return -EINVAL; - mce_setup(&m); + mce_setup(m); - m.extcpu = -1; - m.socketid = -1; + m->extcpu = -1; + m->socketid = -1; for_each_possible_cpu(cpu) { if (cpu_data(cpu).topo.initial_apicid == lapic_id) { - m.extcpu = cpu; - m.socketid = cpu_data(m.extcpu).topo.pkg_id; + m->extcpu = cpu; + m->socketid = cpu_data(m->extcpu).topo.pkg_id; break; } } - m.apicid = lapic_id; - m.bank = (ctx_info->msr_addr >> 4) & 0xFF; - m.status = *i_mce; - m.addr = *(i_mce + 1); - m.misc = *(i_mce + 2); + m->apicid = 
lapic_id; + m->bank = (ctx_info->msr_addr >> 4) & 0xFF; + m->status = *i_mce; + m->addr = *(i_mce + 1); + m->misc = *(i_mce + 2); /* Skipping MCA_CONFIG */ - m.ipid = *(i_mce + 4); - m.synd = *(i_mce + 5); + m->ipid = *(i_mce + 4); + m->synd = *(i_mce + 5); - mce_log(&m); + mce_log(&err); return 0; } diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c index b85ec7a4ec9e..ab9f1d606438 100644 --- a/arch/x86/kernel/cpu/mce/core.c +++ b/arch/x86/kernel/cpu/mce/core.c @@ -88,7 +88,7 @@ struct mca_config mca_cfg __read_mostly = { .monarch_timeout = -1 }; -static DEFINE_PER_CPU(struct mce, mces_seen); +static DEFINE_PER_CPU(struct mce_hw_err, hw_errs_seen); static unsigned long mce_need_notify; /* @@ -136,9 +136,9 @@ void mce_setup(struct mce *m) DEFINE_PER_CPU(struct mce, injectm); EXPORT_PER_CPU_SYMBOL_GPL(injectm); -void mce_log(struct mce *m) +void mce_log(struct mce_hw_err *err) { - if (!mce_gen_pool_add(m)) + if (!mce_gen_pool_add(err)) irq_work_queue(&mce_irq_work); } EXPORT_SYMBOL_GPL(mce_log); @@ -159,8 +159,10 @@ void mce_unregister_decode_chain(struct notifier_block *nb) } EXPORT_SYMBOL_GPL(mce_unregister_decode_chain); -static void __print_mce(struct mce *m) +static void __print_mce(struct mce_hw_err *err) { + struct mce *m = &err->m; + pr_emerg(HW_ERR "CPU %d: Machine Check%s: %Lx Bank %d: %016Lx\n", m->extcpu, (m->mcgstatus & MCG_STATUS_MCIP ? 
" Exception" : ""), @@ -202,9 +204,11 @@ static void __print_mce(struct mce *m) m->microcode); } -static void print_mce(struct mce *m) +static void print_mce(struct mce_hw_err *err) { - __print_mce(m); + struct mce *m = &err->m; + + __print_mce(err); if (m->cpuvendor != X86_VENDOR_AMD && m->cpuvendor != X86_VENDOR_HYGON) pr_emerg_ratelimited(HW_ERR "Run the above through 'mcelog --ascii'\n"); @@ -239,7 +243,7 @@ static const char *mce_dump_aux_info(struct mce *m) return NULL; } -static noinstr void mce_panic(const char *msg, struct mce *final, char *exp) +static noinstr void mce_panic(const char *msg, struct mce_hw_err *final, char *exp) { struct llist_node *pending; struct mce_evt_llist *l; @@ -270,20 +274,22 @@ static noinstr void mce_panic(const char *msg, struct mce *final, char *exp) pending = mce_gen_pool_prepare_records(); /* First print corrected ones that are still unlogged */ llist_for_each_entry(l, pending, llnode) { - struct mce *m = &l->mce; + struct mce_hw_err *err = &l->err; + struct mce *m = &err->m; if (!(m->status & MCI_STATUS_UC)) { - print_mce(m); + print_mce(err); if (!apei_err) apei_err = apei_write_mce(m); } } /* Now print uncorrected but with the final one last */ llist_for_each_entry(l, pending, llnode) { - struct mce *m = &l->mce; + struct mce_hw_err *err = &l->err; + struct mce *m = &err->m; if (!(m->status & MCI_STATUS_UC)) continue; - if (!final || mce_cmp(m, final)) { - print_mce(m); + if (!final || mce_cmp(m, &final->m)) { + print_mce(err); if (!apei_err) apei_err = apei_write_mce(m); } @@ -291,12 +297,12 @@ static noinstr void mce_panic(const char *msg, struct mce *final, char *exp) if (final) { print_mce(final); if (!apei_err) - apei_err = apei_write_mce(final); + apei_err = apei_write_mce(&final->m); } if (exp) pr_emerg(HW_ERR "Machine check: %s\n", exp); - memmsg = mce_dump_aux_info(final); + memmsg = mce_dump_aux_info(&final->m); if (memmsg) pr_emerg(HW_ERR "Machine check: %s\n", memmsg); @@ -311,9 +317,9 @@ static noinstr void 
mce_panic(const char *msg, struct mce *final, char *exp) * panic. */ if (kexec_crash_loaded()) { - if (final && (final->status & MCI_STATUS_ADDRV)) { + if (final && (final->m.status & MCI_STATUS_ADDRV)) { struct page *p; - p = pfn_to_online_page(final->addr >> PAGE_SHIFT); + p = pfn_to_online_page(final->m.addr >> PAGE_SHIFT); if (p) SetPageHWPoison(p); } @@ -562,13 +568,13 @@ EXPORT_SYMBOL_GPL(mce_is_correctable); static int mce_early_notifier(struct notifier_block *nb, unsigned long val, void *data) { - struct mce *m = (struct mce *)data; + struct mce_hw_err *err = (struct mce_hw_err *)data; - if (!m) + if (!err) return NOTIFY_DONE; /* Emit the trace record: */ - trace_mce_record(m); + trace_mce_record(err); set_bit(0, &mce_need_notify); @@ -585,7 +591,8 @@ static struct notifier_block early_nb = { static int uc_decode_notifier(struct notifier_block *nb, unsigned long val, void *data) { - struct mce *mce = (struct mce *)data; + struct mce_hw_err *err = (struct mce_hw_err *)data; + struct mce *mce = &err->m; unsigned long pfn; if (!mce || !mce_usable_address(mce)) @@ -612,13 +619,13 @@ static struct notifier_block mce_uc_nb = { static int mce_default_notifier(struct notifier_block *nb, unsigned long val, void *data) { - struct mce *m = (struct mce *)data; + struct mce_hw_err *err = (struct mce_hw_err *)data; - if (!m) + if (!err) return NOTIFY_DONE; - if (mca_cfg.print_all || !m->kflags) - __print_mce(m); + if (mca_cfg.print_all || !(err->m.kflags)) + __print_mce(err); return NOTIFY_DONE; } @@ -632,8 +639,10 @@ static struct notifier_block mce_default_nb = { /* * Read ADDR and MISC registers. 
*/ -static noinstr void mce_read_aux(struct mce *m, int i) +static noinstr void mce_read_aux(struct mce_hw_err *err, int i) { + struct mce *m = &err->m; + if (m->status & MCI_STATUS_MISCV) m->misc = mce_rdmsrl(mca_msr_reg(i, MCA_MISC)); @@ -680,26 +689,29 @@ DEFINE_PER_CPU(unsigned, mce_poll_count); void machine_check_poll(enum mcp_flags flags, mce_banks_t *b) { struct mce_bank *mce_banks = this_cpu_ptr(mce_banks_array); - struct mce m; + struct mce_hw_err err; + struct mce *m = &err.m; int i; + memset(&err, 0, sizeof(struct mce_hw_err)); + this_cpu_inc(mce_poll_count); - mce_gather_info(&m, NULL); + mce_gather_info(m, NULL); if (flags & MCP_TIMESTAMP) - m.tsc = rdtsc(); + m->tsc = rdtsc(); for (i = 0; i < this_cpu_read(mce_num_banks); i++) { if (!mce_banks[i].ctl || !test_bit(i, *b)) continue; - m.misc = 0; - m.addr = 0; - m.bank = i; + m->misc = 0; + m->addr = 0; + m->bank = i; barrier(); - m.status = mce_rdmsrl(mca_msr_reg(i, MCA_STATUS)); + m->status = mce_rdmsrl(mca_msr_reg(i, MCA_STATUS)); /* * Update storm tracking here, before checking for the @@ -709,17 +721,17 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b) * storm status. */ if (!mca_cfg.cmci_disabled) - mce_track_storm(&m); + mce_track_storm(m); /* If this entry is not valid, ignore it */ - if (!(m.status & MCI_STATUS_VAL)) + if (!(m->status & MCI_STATUS_VAL)) continue; /* * If we are logging everything (at CPU online) or this * is a corrected error, then we must log it. */ - if ((flags & MCP_UC) || !(m.status & MCI_STATUS_UC)) + if ((flags & MCP_UC) || !(m->status & MCI_STATUS_UC)) goto log_it; /* @@ -729,20 +741,20 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b) * everything else. 
*/ if (!mca_cfg.ser) { - if (m.status & MCI_STATUS_UC) + if (m->status & MCI_STATUS_UC) continue; goto log_it; } /* Log "not enabled" (speculative) errors */ - if (!(m.status & MCI_STATUS_EN)) + if (!(m->status & MCI_STATUS_EN)) goto log_it; /* * Log UCNA (SDM: 15.6.3 "UCR Error Classification") * UC == 1 && PCC == 0 && S == 0 */ - if (!(m.status & MCI_STATUS_PCC) && !(m.status & MCI_STATUS_S)) + if (!(m->status & MCI_STATUS_PCC) && !(m->status & MCI_STATUS_S)) goto log_it; /* @@ -756,20 +768,20 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b) if (flags & MCP_DONTLOG) goto clear_it; - mce_read_aux(&m, i); - m.severity = mce_severity(&m, NULL, NULL, false); + mce_read_aux(&err, i); + m->severity = mce_severity(m, NULL, NULL, false); /* * Don't get the IP here because it's unlikely to * have anything to do with the actual error location. */ - if (mca_cfg.dont_log_ce && !mce_usable_address(&m)) + if (mca_cfg.dont_log_ce && !mce_usable_address(m)) goto clear_it; if (flags & MCP_QUEUE_LOG) - mce_gen_pool_add(&m); + mce_gen_pool_add(&err); else - mce_log(&m); + mce_log(&err); clear_it: /* @@ -893,9 +905,10 @@ static __always_inline void quirk_zen_ifu(int bank, struct mce *m, struct pt_reg * Do a quick check if any of the events requires a panic. * This decides if we keep the events around or clear them. 
*/ -static __always_inline int mce_no_way_out(struct mce *m, char **msg, unsigned long *validp, +static __always_inline int mce_no_way_out(struct mce_hw_err *err, char **msg, unsigned long *validp, struct pt_regs *regs) { + struct mce *m = &err->m; char *tmp = *msg; int i; @@ -913,7 +926,7 @@ static __always_inline int mce_no_way_out(struct mce *m, char **msg, unsigned lo m->bank = i; if (mce_severity(m, regs, &tmp, true) >= MCE_PANIC_SEVERITY) { - mce_read_aux(m, i); + mce_read_aux(err, i); *msg = tmp; return 1; } @@ -1005,6 +1018,7 @@ static noinstr int mce_timed_out(u64 *t, const char *msg) static void mce_reign(void) { int cpu; + struct mce_hw_err *err = NULL; struct mce *m = NULL; int global_worst = 0; char *msg = NULL; @@ -1015,11 +1029,13 @@ static void mce_reign(void) * Grade the severity of the errors of all the CPUs. */ for_each_possible_cpu(cpu) { - struct mce *mtmp = &per_cpu(mces_seen, cpu); + struct mce_hw_err *etmp = &per_cpu(hw_errs_seen, cpu); + struct mce *mtmp = &etmp->m; if (mtmp->severity > global_worst) { global_worst = mtmp->severity; - m = &per_cpu(mces_seen, cpu); + err = &per_cpu(hw_errs_seen, cpu); + m = &err->m; } } @@ -1031,7 +1047,7 @@ static void mce_reign(void) if (m && global_worst >= MCE_PANIC_SEVERITY) { /* call mce_severity() to get "msg" for panic */ mce_severity(m, NULL, &msg, true); - mce_panic("Fatal machine check", m, msg); + mce_panic("Fatal machine check", err, msg); } /* @@ -1048,11 +1064,11 @@ static void mce_reign(void) mce_panic("Fatal machine check from unknown source", NULL, NULL); /* - * Now clear all the mces_seen so that they don't reappear on + * Now clear all the hw_errs_seen so that they don't reappear on * the next mce. 
*/ for_each_possible_cpu(cpu) - memset(&per_cpu(mces_seen, cpu), 0, sizeof(struct mce)); + memset(&per_cpu(hw_errs_seen, cpu), 0, sizeof(struct mce_hw_err)); } static atomic_t global_nwo; @@ -1256,12 +1272,13 @@ static noinstr bool mce_check_crashing_cpu(void) } static __always_inline int -__mc_scan_banks(struct mce *m, struct pt_regs *regs, struct mce *final, +__mc_scan_banks(struct mce_hw_err *err, struct pt_regs *regs, struct mce *final, unsigned long *toclear, unsigned long *valid_banks, int no_way_out, int *worst) { struct mce_bank *mce_banks = this_cpu_ptr(mce_banks_array); struct mca_config *cfg = &mca_cfg; + struct mce *m = &err->m; int severity, i, taint = 0; for (i = 0; i < this_cpu_read(mce_num_banks); i++) { @@ -1307,7 +1324,7 @@ __mc_scan_banks(struct mce *m, struct pt_regs *regs, struct mce *final, if (severity == MCE_NO_SEVERITY) continue; - mce_read_aux(m, i); + mce_read_aux(err, i); /* assuming valid severity level != 0 */ m->severity = severity; @@ -1317,7 +1334,7 @@ __mc_scan_banks(struct mce *m, struct pt_regs *regs, struct mce *final, * done in #MC context, where instrumentation is disabled. */ instrumentation_begin(); - mce_log(m); + mce_log(err); instrumentation_end(); if (severity > *worst) { @@ -1387,8 +1404,9 @@ static void kill_me_never(struct callback_head *cb) set_mce_nospec(pfn); } -static void queue_task_work(struct mce *m, char *msg, void (*func)(struct callback_head *)) +static void queue_task_work(struct mce_hw_err *err, char *msg, void (*func)(struct callback_head *)) { + struct mce *m = &err->m; int count = ++current->mce_count; /* First call, save all the details */ @@ -1402,11 +1420,12 @@ static void queue_task_work(struct mce *m, char *msg, void (*func)(struct callba /* Ten is likely overkill. 
Don't expect more than two faults before task_work() */ if (count > 10) - mce_panic("Too many consecutive machine checks while accessing user data", m, msg); + mce_panic("Too many consecutive machine checks while accessing user data", + err, msg); /* Second or later call, make sure page address matches the one from first call */ if (count > 1 && (current->mce_addr >> PAGE_SHIFT) != (m->addr >> PAGE_SHIFT)) - mce_panic("Consecutive machine checks to different user pages", m, msg); + mce_panic("Consecutive machine checks to different user pages", err, msg); /* Do not call task_work_add() more than once */ if (count > 1) @@ -1455,8 +1474,14 @@ noinstr void do_machine_check(struct pt_regs *regs) int worst = 0, order, no_way_out, kill_current_task, lmce, taint = 0; DECLARE_BITMAP(valid_banks, MAX_NR_BANKS) = { 0 }; DECLARE_BITMAP(toclear, MAX_NR_BANKS) = { 0 }; - struct mce m, *final; + struct mce_hw_err *final; + struct mce_hw_err err; char *msg = NULL; + struct mce *m; + + memset(&err, 0, sizeof(struct mce_hw_err)); + + m = &err.m; if (unlikely(mce_flags.p5)) return pentium_machine_check(regs); @@ -1494,13 +1519,13 @@ noinstr void do_machine_check(struct pt_regs *regs) this_cpu_inc(mce_exception_count); - mce_gather_info(&m, regs); - m.tsc = rdtsc(); + mce_gather_info(m, regs); + m->tsc = rdtsc(); - final = this_cpu_ptr(&mces_seen); - *final = m; + final = this_cpu_ptr(&hw_errs_seen); + final->m = *m; - no_way_out = mce_no_way_out(&m, &msg, valid_banks, regs); + no_way_out = mce_no_way_out(&err, &msg, valid_banks, regs); barrier(); @@ -1509,15 +1534,15 @@ noinstr void do_machine_check(struct pt_regs *regs) * Assume the worst for now, but if we find the * severity is MCE_AR_SEVERITY we have other options. */ - if (!(m.mcgstatus & MCG_STATUS_RIPV)) + if (!(m->mcgstatus & MCG_STATUS_RIPV)) kill_current_task = 1; /* * Check if this MCE is signaled to only this logical processor, * on Intel, Zhaoxin only. 
*/ - if (m.cpuvendor == X86_VENDOR_INTEL || - m.cpuvendor == X86_VENDOR_ZHAOXIN) - lmce = m.mcgstatus & MCG_STATUS_LMCES; + if (m->cpuvendor == X86_VENDOR_INTEL || + m->cpuvendor == X86_VENDOR_ZHAOXIN) + lmce = m->mcgstatus & MCG_STATUS_LMCES; /* * Local machine check may already know that we have to panic. @@ -1528,12 +1553,12 @@ noinstr void do_machine_check(struct pt_regs *regs) */ if (lmce) { if (no_way_out) - mce_panic("Fatal local machine check", &m, msg); + mce_panic("Fatal local machine check", &err, msg); } else { order = mce_start(&no_way_out); } - taint = __mc_scan_banks(&m, regs, final, toclear, valid_banks, no_way_out, &worst); + taint = __mc_scan_banks(&err, regs, &final->m, toclear, valid_banks, no_way_out, &worst); if (!no_way_out) mce_clear_state(toclear); @@ -1548,7 +1573,7 @@ noinstr void do_machine_check(struct pt_regs *regs) no_way_out = worst >= MCE_PANIC_SEVERITY; if (no_way_out) - mce_panic("Fatal machine check on current CPU", &m, msg); + mce_panic("Fatal machine check on current CPU", &err, msg); } } else { /* @@ -1560,8 +1585,8 @@ noinstr void do_machine_check(struct pt_regs *regs) * make sure we have the right "msg". */ if (worst >= MCE_PANIC_SEVERITY) { - mce_severity(&m, regs, &msg, true); - mce_panic("Local fatal machine check!", &m, msg); + mce_severity(m, regs, &msg, true); + mce_panic("Local fatal machine check!", &err, msg); } } @@ -1579,16 +1604,16 @@ noinstr void do_machine_check(struct pt_regs *regs) goto out; /* Fault was in user mode and we need to take some action */ - if ((m.cs & 3) == 3) { + if ((m->cs & 3) == 3) { /* If this triggers there is no way to recover. Die hard. 
*/ BUG_ON(!on_thread_stack() || !user_mode(regs)); - if (!mce_usable_address(&m)) - queue_task_work(&m, msg, kill_me_now); + if (!mce_usable_address(m)) + queue_task_work(&err, msg, kill_me_now); else - queue_task_work(&m, msg, kill_me_maybe); + queue_task_work(&err, msg, kill_me_maybe); - } else if (m.mcgstatus & MCG_STATUS_SEAM_NR) { + } else if (m->mcgstatus & MCG_STATUS_SEAM_NR) { /* * Saved RIP on stack makes it look like the machine check * was taken in the kernel on the instruction following @@ -1600,8 +1625,8 @@ noinstr void do_machine_check(struct pt_regs *regs) * not occur there. Mark the page as poisoned so it won't * be added to free list when the guest is terminated. */ - if (mce_usable_address(&m)) { - struct page *p = pfn_to_online_page(m.addr >> PAGE_SHIFT); + if (mce_usable_address(m)) { + struct page *p = pfn_to_online_page(m->addr >> PAGE_SHIFT); if (p) SetPageHWPoison(p); @@ -1616,13 +1641,13 @@ noinstr void do_machine_check(struct pt_regs *regs) * corresponding exception handler which would do that is the * proper one. 
 		 */
-		if (m.kflags & MCE_IN_KERNEL_RECOV) {
+		if (m->kflags & MCE_IN_KERNEL_RECOV) {
 			if (!fixup_exception(regs, X86_TRAP_MC, 0, 0))
-				mce_panic("Failed kernel mode recovery", &m, msg);
+				mce_panic("Failed kernel mode recovery", &err, msg);
 		}

-		if (m.kflags & MCE_IN_KERNEL_COPYIN)
-			queue_task_work(&m, msg, kill_me_never);
+		if (m->kflags & MCE_IN_KERNEL_COPYIN)
+			queue_task_work(&err, msg, kill_me_never);
 	}

 out:
diff --git a/arch/x86/kernel/cpu/mce/dev-mcelog.c b/arch/x86/kernel/cpu/mce/dev-mcelog.c
index a05ac0716ecf..4a0e3bb4a4fb 100644
--- a/arch/x86/kernel/cpu/mce/dev-mcelog.c
+++ b/arch/x86/kernel/cpu/mce/dev-mcelog.c
@@ -36,7 +36,7 @@ static DECLARE_WAIT_QUEUE_HEAD(mce_chrdev_wait);
 static int dev_mce_log(struct notifier_block *nb, unsigned long val,
 				void *data)
 {
-	struct mce *mce = (struct mce *)data;
+	struct mce *mce = &((struct mce_hw_err *)data)->m;
 	unsigned int entry;

 	if (mce->kflags & MCE_HANDLED_CEC)
diff --git a/arch/x86/kernel/cpu/mce/genpool.c b/arch/x86/kernel/cpu/mce/genpool.c
index 4284749ec803..3337ea5c428d 100644
--- a/arch/x86/kernel/cpu/mce/genpool.c
+++ b/arch/x86/kernel/cpu/mce/genpool.c
@@ -31,15 +31,15 @@ static LLIST_HEAD(mce_event_llist);
  */
 static bool is_duplicate_mce_record(struct mce_evt_llist *t, struct mce_evt_llist *l)
 {
+	struct mce_hw_err *err1, *err2;
 	struct mce_evt_llist *node;
-	struct mce *m1, *m2;

-	m1 = &t->mce;
+	err1 = &t->err;

 	llist_for_each_entry(node, &l->llnode, llnode) {
-		m2 = &node->mce;
+		err2 = &node->err;

-		if (!mce_cmp(m1, m2))
+		if (!mce_cmp(&err1->m, &err2->m))
 			return true;
 	}
 	return false;
@@ -73,9 +73,9 @@ struct llist_node *mce_gen_pool_prepare_records(void)

 void mce_gen_pool_process(struct work_struct *__unused)
 {
+	struct mce_hw_err *err;
 	struct llist_node *head;
 	struct mce_evt_llist *node, *tmp;
-	struct mce *mce;

 	head = llist_del_all(&mce_event_llist);
 	if (!head)
@@ -83,8 +83,8 @@ void mce_gen_pool_process(struct work_struct *__unused)

 	head = llist_reverse_order(head);
 	llist_for_each_entry_safe(node, tmp, head, llnode) {
-		mce = &node->mce;
-		blocking_notifier_call_chain(&x86_mce_decoder_chain, 0, mce);
+		err = &node->err;
+		blocking_notifier_call_chain(&x86_mce_decoder_chain, 0, err);
 		gen_pool_free(mce_evt_pool, (unsigned long)node, sizeof(*node));
 	}
 }
@@ -94,11 +94,11 @@ bool mce_gen_pool_empty(void)
 	return llist_empty(&mce_event_llist);
 }

-int mce_gen_pool_add(struct mce *mce)
+int mce_gen_pool_add(struct mce_hw_err *err)
 {
 	struct mce_evt_llist *node;

-	if (filter_mce(mce))
+	if (filter_mce(&err->m))
 		return -EINVAL;

 	if (!mce_evt_pool)
@@ -110,7 +110,7 @@ int mce_gen_pool_add(struct mce *mce)
 		return -ENOMEM;
 	}

-	memcpy(&node->mce, mce, sizeof(*mce));
+	memcpy(&node->err, err, sizeof(*err));
 	llist_add(&node->llnode, &mce_event_llist);

 	return 0;
diff --git a/arch/x86/kernel/cpu/mce/inject.c b/arch/x86/kernel/cpu/mce/inject.c
index 49ed3428785d..c65a5c4e2f22 100644
--- a/arch/x86/kernel/cpu/mce/inject.c
+++ b/arch/x86/kernel/cpu/mce/inject.c
@@ -502,6 +502,7 @@ static void prepare_msrs(void *info)

 static void do_inject(void)
 {
+	struct mce_hw_err err;
 	u64 mcg_status = 0;
 	unsigned int cpu = i_mce.extcpu;
 	u8 b = i_mce.bank;
@@ -517,7 +518,8 @@ static void do_inject(void)
 		i_mce.status |= MCI_STATUS_SYNDV;

 	if (inj_type == SW_INJ) {
-		mce_log(&i_mce);
+		err.m = i_mce;
+		mce_log(&err);
 		return;
 	}

diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h
index 01f8f03969e6..c79cb5b00e4c 100644
--- a/arch/x86/kernel/cpu/mce/internal.h
+++ b/arch/x86/kernel/cpu/mce/internal.h
@@ -26,12 +26,12 @@ extern struct blocking_notifier_head x86_mce_decoder_chain;

 struct mce_evt_llist {
 	struct llist_node llnode;
-	struct mce mce;
+	struct mce_hw_err err;
 };

 void mce_gen_pool_process(struct work_struct *__unused);
 bool mce_gen_pool_empty(void);
-int mce_gen_pool_add(struct mce *mce);
+int mce_gen_pool_add(struct mce_hw_err *err);
 int mce_gen_pool_init(void);
 struct llist_node *mce_gen_pool_prepare_records(void);

diff --git a/drivers/acpi/acpi_extlog.c b/drivers/acpi/acpi_extlog.c
index ca87a0939135..4864191918db 100644
--- a/drivers/acpi/acpi_extlog.c
+++ b/drivers/acpi/acpi_extlog.c
@@ -134,7 +134,7 @@ static int print_extlog_rcd(const char *pfx,
 static int extlog_print(struct notifier_block *nb, unsigned long val,
 			void *data)
 {
-	struct mce *mce = (struct mce *)data;
+	struct mce *mce = &((struct mce_hw_err *)data)->m;
 	int bank = mce->bank;
 	int cpu = mce->extcpu;
 	struct acpi_hest_generic_status *estatus, *tmp;
diff --git a/drivers/acpi/nfit/mce.c b/drivers/acpi/nfit/mce.c
index d48a388b796e..b917988db794 100644
--- a/drivers/acpi/nfit/mce.c
+++ b/drivers/acpi/nfit/mce.c
@@ -13,7 +13,7 @@
 static int nfit_handle_mce(struct notifier_block *nb, unsigned long val,
 			void *data)
 {
-	struct mce *mce = (struct mce *)data;
+	struct mce *mce = &((struct mce_hw_err *)data)->m;
 	struct acpi_nfit_desc *acpi_desc;
 	struct nfit_spa *nfit_spa;

diff --git a/drivers/edac/i7core_edac.c b/drivers/edac/i7core_edac.c
index 91e0a88ef904..d1e47cba0ff2 100644
--- a/drivers/edac/i7core_edac.c
+++ b/drivers/edac/i7core_edac.c
@@ -1810,7 +1810,7 @@ static void i7core_check_error(struct mem_ctl_info *mci, struct mce *m)
 static int i7core_mce_check_error(struct notifier_block *nb, unsigned long val,
 				  void *data)
 {
-	struct mce *mce = (struct mce *)data;
+	struct mce *mce = &((struct mce_hw_err *)data)->m;
 	struct i7core_dev *i7_dev;
 	struct mem_ctl_info *mci;

diff --git a/drivers/edac/igen6_edac.c b/drivers/edac/igen6_edac.c
index 0fe75eed8973..d73e9f0600ee 100644
--- a/drivers/edac/igen6_edac.c
+++ b/drivers/edac/igen6_edac.c
@@ -919,7 +919,7 @@ static int ecclog_nmi_handler(unsigned int cmd, struct pt_regs *regs)
 static int ecclog_mce_handler(struct notifier_block *nb, unsigned long val,
 			      void *data)
 {
-	struct mce *mce = (struct mce *)data;
+	struct mce *mce = &((struct mce_hw_err *)data)->m;
 	char *type;

 	if (mce->kflags & MCE_HANDLED_CEC)
diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
index 8130c3dc64da..c5fae99de781 100644
--- a/drivers/edac/mce_amd.c
+++ b/drivers/edac/mce_amd.c
@@ -792,7 +792,7 @@ static const char *decode_error_status(struct mce *m)
 static int
 amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
 {
-	struct mce *m = (struct mce *)data;
+	struct mce *m = &((struct mce_hw_err *)data)->m;
 	unsigned int fam = x86_family(m->cpuid);
 	int ecc;

diff --git a/drivers/edac/pnd2_edac.c b/drivers/edac/pnd2_edac.c
index f93f2f2b1cf2..a3008f6eb2b1 100644
--- a/drivers/edac/pnd2_edac.c
+++ b/drivers/edac/pnd2_edac.c
@@ -1366,7 +1366,7 @@ static void pnd2_unregister_mci(struct mem_ctl_info *mci)
  */
 static int pnd2_mce_check_error(struct notifier_block *nb, unsigned long val, void *data)
 {
-	struct mce *mce = (struct mce *)data;
+	struct mce *mce = &((struct mce_hw_err *)data)->m;
 	struct mem_ctl_info *mci;
 	struct dram_addr daddr;
 	char *type;
diff --git a/drivers/edac/sb_edac.c b/drivers/edac/sb_edac.c
index e5c05a876947..4dadb9975b23 100644
--- a/drivers/edac/sb_edac.c
+++ b/drivers/edac/sb_edac.c
@@ -3255,7 +3255,7 @@ static void sbridge_mce_output_error(struct mem_ctl_info *mci,
 static int sbridge_mce_check_error(struct notifier_block *nb, unsigned long val,
 				   void *data)
 {
-	struct mce *mce = (struct mce *)data;
+	struct mce *mce = &((struct mce_hw_err *)data)->m;
 	struct mem_ctl_info *mci;
 	char *type;

diff --git a/drivers/edac/skx_common.c b/drivers/edac/skx_common.c
index 8d18099fd528..21f9d2c22c81 100644
--- a/drivers/edac/skx_common.c
+++ b/drivers/edac/skx_common.c
@@ -644,7 +644,7 @@ static bool skx_error_in_mem(const struct mce *m)
 int skx_mce_check_error(struct notifier_block *nb, unsigned long val,
 			void *data)
 {
-	struct mce *mce = (struct mce *)data;
+	struct mce *mce = &((struct mce_hw_err *)data)->m;
 	struct decoded_addr res;
 	struct mem_ctl_info *mci;
 	char *type;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index d0307c55da50..fea085ef663e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -4158,7 +4158,7 @@ static struct amdgpu_device *find_adev(uint32_t node_id)
 static int amdgpu_bad_page_notifier(struct notifier_block *nb,
 				    unsigned long val, void *data)
 {
-	struct mce *m = (struct mce *)data;
+	struct mce *m = &((struct mce_hw_err *)data)->m;
 	struct amdgpu_device *adev = NULL;
 	uint32_t gpu_id = 0;
 	uint32_t umc_inst = 0, ch_inst = 0;
diff --git a/drivers/ras/amd/fmpm.c b/drivers/ras/amd/fmpm.c
index 90de737fbc90..78dd4b192992 100644
--- a/drivers/ras/amd/fmpm.c
+++ b/drivers/ras/amd/fmpm.c
@@ -400,7 +400,7 @@ static void retire_dram_row(u64 addr, u64 id, u32 cpu)

 static int fru_handle_mem_poison(struct notifier_block *nb, unsigned long val, void *data)
 {
-	struct mce *m = (struct mce *)data;
+	struct mce *m = &((struct mce_hw_err *)data)->m;
 	struct fru_rec *rec;

 	if (!mce_is_memory_error(m))
diff --git a/drivers/ras/cec.c b/drivers/ras/cec.c
index e440b15fbabc..be785746f587 100644
--- a/drivers/ras/cec.c
+++ b/drivers/ras/cec.c
@@ -534,7 +534,7 @@ static int __init create_debugfs_nodes(void)
 static int cec_notifier(struct notifier_block *nb, unsigned long val,
 			void *data)
 {
-	struct mce *m = (struct mce *)data;
+	struct mce *m = &((struct mce_hw_err *)data)->m;

 	if (!m)
 		return NOTIFY_DONE;
diff --git a/include/trace/events/mce.h b/include/trace/events/mce.h
index f0f7b3cb2041..65aba1afcd07 100644
--- a/include/trace/events/mce.h
+++ b/include/trace/events/mce.h
@@ -19,9 +19,9 @@

 TRACE_EVENT(mce_record,

-	TP_PROTO(struct mce *m),
+	TP_PROTO(struct mce_hw_err *err),

-	TP_ARGS(m),
+	TP_ARGS(err),

 	TP_STRUCT__entry(
 		__field(	u64,		mcgcap		)
@@ -46,25 +46,25 @@ TRACE_EVENT(mce_record,
 	),

 	TP_fast_assign(
-		__entry->mcgcap		= m->mcgcap;
-		__entry->mcgstatus	= m->mcgstatus;
-		__entry->status		= m->status;
-		__entry->addr		= m->addr;
-		__entry->misc		= m->misc;
-		__entry->synd		= m->synd;
-		__entry->ipid		= m->ipid;
-		__entry->ip		= m->ip;
-		__entry->tsc		= m->tsc;
-		__entry->ppin		= m->ppin;
-		__entry->walltime	= m->time;
-		__entry->cpu		= m->extcpu;
-		__entry->cpuid		= m->cpuid;
-		__entry->apicid		= m->apicid;
-		__entry->socketid	= m->socketid;
-		__entry->cs		= m->cs;
-		__entry->bank		= m->bank;
-		__entry->cpuvendor	= m->cpuvendor;
-		__entry->microcode	= m->microcode;
+		__entry->mcgcap		= err->m.mcgcap;
+		__entry->mcgstatus	= err->m.mcgstatus;
+		__entry->status		= err->m.status;
+		__entry->addr		= err->m.addr;
+		__entry->misc		= err->m.misc;
+		__entry->synd		= err->m.synd;
+		__entry->ipid		= err->m.ipid;
+		__entry->ip		= err->m.ip;
+		__entry->tsc		= err->m.tsc;
+		__entry->ppin		= err->m.ppin;
+		__entry->walltime	= err->m.time;
+		__entry->cpu		= err->m.extcpu;
+		__entry->cpuid		= err->m.cpuid;
+		__entry->apicid		= err->m.apicid;
+		__entry->socketid	= err->m.socketid;
+		__entry->cs		= err->m.cs;
+		__entry->bank		= err->m.bank;
+		__entry->cpuvendor	= err->m.cpuvendor;
+		__entry->microcode	= err->m.microcode;
 	),

 	TP_printk("CPU: %d, MCGc/s: %llx/%llx, MC%d: %016Lx, IPID: %016Lx, ADDR: %016Lx, MISC: %016Lx, SYND: %016Lx, RIP: %02x:<%016Lx>, TSC: %llx, PPIN: %llx, vendor: %u, CPUID: %x, time: %llu, socket: %u, APIC: %x, microcode: %x",

From patchwork Tue Jul 30 18:54:04 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Avadhut Naik
X-Patchwork-Id: 13747799
Received: from NAM12-MW2-obe.outbound.protection.outlook.com (mail-mw2nam12on2056.outbound.protection.outlook.com [40.107.244.56]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A44CA18C90D; Tue, 30 Jul 2024 18:54:40 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=fail smtp.client-ip=40.107.244.56
ARC-Seal: i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1722365682; cv=fail; b=se7vYBykYZl/woah6ABQa6mTNctHgpzVTLARzlKxKkP21GyHZa9WNn/KDPCI7tBgJJKb40kNxMQhaTIohKfOsOA4+2lYph6aMlIucok2yg59Va7U25x5NSqo1SjZtAj55/cDzGrORog2WK7uGIskfBQaXIA/LbVDFgLVHUk6NfE=
Received: from
SATLEXMB04.amd.com (165.204.84.17) by CH1PEPF0000AD75.mail.protection.outlook.com (10.167.244.54) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.20.7828.19 via Frontend Transport; Tue, 30 Jul 2024 18:54:36 +0000
Received: from titanite-d354host.amd.com (10.180.168.240) by SATLEXMB04.amd.com (10.181.40.145) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.39; Tue, 30 Jul 2024 13:54:35 -0500
From: Avadhut Naik
To: , , ,
CC: , , , , , , , , , , , , ,
Subject: [PATCH v3 2/4] x86/mce, EDAC/mce_amd: Add support for new MCA_SYND{1,2} registers
Date: Tue, 30 Jul 2024 13:54:04 -0500
Message-ID: <20240730185406.3709876-3-avadhut.naik@amd.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20240730185406.3709876-1-avadhut.naik@amd.com>
References: <20240730185406.3709876-1-avadhut.naik@amd.com>
Precedence: bulk
X-Mailing-List: linux-trace-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0

Starting with Zen4, AMD's Scalable MCA systems incorporate two new
registers: MCA_SYND1 and MCA_SYND2.

These registers hold supplemental error information in addition to that
in the existing MCA_SYND register. The data within these registers is
considered valid if MCA_STATUS[SyndV] is set.

Userspace error decoding tools like rasdaemon gather hardware error
information through tracepoints. Export these two registers through the
mce_record tracepoint so that such tools can parse them and output the
supplemental error information they contain, such as FRU Text.

[Yazen: Drop Yazen's Co-developed-by tag and move SoB tag.]

Signed-off-by: Avadhut Naik
Signed-off-by: Yazen Ghannam
Signed-off-by: Avadhut Naik
---
Changes in v2:
[1] https://lore.kernel.org/linux-edac/20240521125434.1555845-1-yazen.ghannam@amd.com/
[2] https://lore.kernel.org/linux-edac/20240523155641.2805411-1-yazen.ghannam@amd.com/

1. Drop dependencies on sets [1] and [2] above and rebase on top of
   tip/master.

Changes in v3:
1. Move wrapper changes required in mce_read_aux() and mce_no_way_out()
   from this patch to the first patch.
2. Add comments to explain the new wrapper's purpose.
3. Modify commit message per feedback received.
4. Fix SoB chain to properly reflect the patch path.
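
Illustration for readers of the description above: a minimal userspace
sketch in C of how the per-bank MSR addresses of the new registers are
derived, and of how a decoding tool might recover the two values from
the raw vendor bytes. The MC0 base addresses and the 0x10 per-bank
stride are taken from the mce.h hunk of this patch. Everything else is
an assumption made for the example: the names smca_synd1_msr(),
smca_synd2_msr(), struct amd_vendor_data and parse_amd_vendor_data() are
hypothetical, and the vendor bytes are assumed to be two consecutive
native-endian 64-bit values, mirroring the "amd" member of the vendor
union. It is not rasdaemon code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values copied from the arch/x86/include/asm/mce.h hunk of this patch. */
#define SMCA_MC0_SYND1		0xc000200eU
#define SMCA_MC0_SYND2		0xc000200fU
#define SMCA_BANK_STRIDE	0x10U	/* per-bank step used by the MCx macros */

/* Hypothetical helper: MSR address of MCA_SYND1 for a given bank. */
static inline uint32_t smca_synd1_msr(unsigned int bank)
{
	return SMCA_MC0_SYND1 + SMCA_BANK_STRIDE * bank;
}

/* Hypothetical helper: MSR address of MCA_SYND2 for a given bank. */
static inline uint32_t smca_synd2_msr(unsigned int bank)
{
	return SMCA_MC0_SYND2 + SMCA_BANK_STRIDE * bank;
}

/* Hypothetical userspace mirror of the "amd" member of the vendor union. */
struct amd_vendor_data {
	uint64_t synd1;	/* MCA_SYND1 */
	uint64_t synd2;	/* MCA_SYND2 */
};

/*
 * Hypothetical parser: copy raw vendor bytes into the struct above.
 * Assumes the bytes are laid out as two native-endian u64 values.
 * Returns 0 on success, -1 if an argument is NULL or the buffer is
 * too short to hold both registers.
 */
static inline int parse_amd_vendor_data(const uint8_t *buf, size_t len,
					struct amd_vendor_data *out)
{
	if (!buf || !out || len < sizeof(*out))
		return -1;

	memcpy(out, buf, sizeof(*out));
	return 0;
}
```

Because new fields may only be appended to AMD's vendor structure, a
parser written this way keeps working if the buffer later grows; it
only rejects buffers shorter than the two fields it knows about.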
---
 arch/x86/include/asm/mce.h      | 22 ++++++++++++++++++++++
 arch/x86/include/uapi/asm/mce.h |  3 ++-
 arch/x86/kernel/cpu/mce/amd.c   |  5 ++++-
 arch/x86/kernel/cpu/mce/core.c  |  9 ++++++++-
 drivers/edac/mce_amd.c          | 10 +++++++---
 include/trace/events/mce.h      |  9 +++++++--
 6 files changed, 50 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index ba2b3a5f999e..a5be7463c78a 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -122,6 +122,9 @@
 #define MSR_AMD64_SMCA_MC0_DESTAT	0xc0002008
 #define MSR_AMD64_SMCA_MC0_DEADDR	0xc0002009
 #define MSR_AMD64_SMCA_MC0_MISC1	0xc000200a
+/* Registers MISC2 to MISC4 are at offsets B to D. */
+#define MSR_AMD64_SMCA_MC0_SYND1	0xc000200e
+#define MSR_AMD64_SMCA_MC0_SYND2	0xc000200f
 #define MSR_AMD64_SMCA_MCx_CTL(x)	(MSR_AMD64_SMCA_MC0_CTL + 0x10*(x))
 #define MSR_AMD64_SMCA_MCx_STATUS(x)	(MSR_AMD64_SMCA_MC0_STATUS + 0x10*(x))
 #define MSR_AMD64_SMCA_MCx_ADDR(x)	(MSR_AMD64_SMCA_MC0_ADDR + 0x10*(x))
@@ -132,6 +135,8 @@
 #define MSR_AMD64_SMCA_MCx_DESTAT(x)	(MSR_AMD64_SMCA_MC0_DESTAT + 0x10*(x))
 #define MSR_AMD64_SMCA_MCx_DEADDR(x)	(MSR_AMD64_SMCA_MC0_DEADDR + 0x10*(x))
 #define MSR_AMD64_SMCA_MCx_MISCy(x, y)	((MSR_AMD64_SMCA_MC0_MISC1 + y) + (0x10*(x)))
+#define MSR_AMD64_SMCA_MCx_SYND1(x)	(MSR_AMD64_SMCA_MC0_SYND1 + 0x10*(x))
+#define MSR_AMD64_SMCA_MCx_SYND2(x)	(MSR_AMD64_SMCA_MC0_SYND2 + 0x10*(x))

 #define XEC(x, mask)			(((x) >> 16) & mask)

@@ -190,9 +195,26 @@ enum mce_notifier_prios {
 /**
  * struct mce_hw_err - Hardware Error Record.
  * @m:		Machine Check record.
+ * @vendor:	Vendor-specific error information.
+ *
+ * Vendor-specific fields should not be added to struct mce.
+ * Instead, vendors should export their vendor-specific data
+ * through their structure in the vendor union below.
+ *
+ * AMD's vendor data is parsed by error decoding tools for
+ * supplemental error information. Thus, current offsets of
+ * existing fields must be maintained.
+ * Only add new fields at the end of AMD's vendor structure.
  */
 struct mce_hw_err {
 	struct mce m;
+
+	union vendor_info {
+		struct {
+			u64 synd1;		/* MCA_SYND1 MSR */
+			u64 synd2;		/* MCA_SYND2 MSR */
+		} amd;
+	} vendor;
 };

 struct notifier_block;
diff --git a/arch/x86/include/uapi/asm/mce.h b/arch/x86/include/uapi/asm/mce.h
index db9adc081c5a..cb6b48a7c22b 100644
--- a/arch/x86/include/uapi/asm/mce.h
+++ b/arch/x86/include/uapi/asm/mce.h
@@ -8,7 +8,8 @@
 /*
  * Fields are zero when not available. Also, this struct is shared with
  * userspace mcelog and thus must keep existing fields at current offsets.
- * Only add new fields to the end of the structure
+ * Only add new, shared fields to the end of the structure.
+ * Do not add vendor-specific fields.
  */
 struct mce {
 	__u64 status;		/* Bank's MCi_STATUS MSR */
diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index cb7dc0b1aa50..a2a5fb940bb6 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -799,8 +799,11 @@ static void __log_error(unsigned int bank, u64 status, u64 addr, u64 misc)
 	if (mce_flags.smca) {
 		rdmsrl(MSR_AMD64_SMCA_MCx_IPID(bank), m->ipid);

-		if (m->status & MCI_STATUS_SYNDV)
+		if (m->status & MCI_STATUS_SYNDV) {
 			rdmsrl(MSR_AMD64_SMCA_MCx_SYND(bank), m->synd);
+			rdmsrl(MSR_AMD64_SMCA_MCx_SYND1(bank), err.vendor.amd.synd1);
+			rdmsrl(MSR_AMD64_SMCA_MCx_SYND2(bank), err.vendor.amd.synd2);
+		}
 	}

 	mce_log(&err);
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index ab9f1d606438..c8089d7a8e9b 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -189,6 +189,10 @@ static void __print_mce(struct mce_hw_err *err)
 	if (mce_flags.smca) {
 		if (m->synd)
 			pr_cont("SYND %llx ", m->synd);
+		if (err->vendor.amd.synd1)
+			pr_cont("SYND1 %llx ", err->vendor.amd.synd1);
+		if (err->vendor.amd.synd2)
+			pr_cont("SYND2 %llx ", err->vendor.amd.synd2);
 		if (m->ipid)
 			pr_cont("IPID %llx ", m->ipid);
 	}
@@ -664,8 +668,11 @@ static noinstr void mce_read_aux(struct mce_hw_err *err, int i)
 	if (mce_flags.smca) {
 		m->ipid = mce_rdmsrl(MSR_AMD64_SMCA_MCx_IPID(i));

-		if (m->status & MCI_STATUS_SYNDV)
+		if (m->status & MCI_STATUS_SYNDV) {
 			m->synd = mce_rdmsrl(MSR_AMD64_SMCA_MCx_SYND(i));
+			err->vendor.amd.synd1 = mce_rdmsrl(MSR_AMD64_SMCA_MCx_SYND1(i));
+			err->vendor.amd.synd2 = mce_rdmsrl(MSR_AMD64_SMCA_MCx_SYND2(i));
+		}
 	}
 }

diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
index c5fae99de781..aea68999c849 100644
--- a/drivers/edac/mce_amd.c
+++ b/drivers/edac/mce_amd.c
@@ -792,7 +792,8 @@ static const char *decode_error_status(struct mce *m)
 static int
 amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
 {
-	struct mce *m = &((struct mce_hw_err *)data)->m;
+	struct mce_hw_err *err = (struct mce_hw_err *)data;
+	struct mce *m = &err->m;
 	unsigned int fam = x86_family(m->cpuid);
 	int ecc;

@@ -850,8 +851,11 @@ amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
 	if (boot_cpu_has(X86_FEATURE_SMCA)) {
 		pr_emerg(HW_ERR "IPID: 0x%016llx", m->ipid);

-		if (m->status & MCI_STATUS_SYNDV)
-			pr_cont(", Syndrome: 0x%016llx", m->synd);
+		if (m->status & MCI_STATUS_SYNDV) {
+			pr_cont(", Syndrome: 0x%016llx\n", m->synd);
+			pr_emerg(HW_ERR "Syndrome1: 0x%016llx, Syndrome2: 0x%016llx",
+				 err->vendor.amd.synd1, err->vendor.amd.synd2);
+		}

 		pr_cont("\n");

diff --git a/include/trace/events/mce.h b/include/trace/events/mce.h
index 65aba1afcd07..1e7d5696b3ba 100644
--- a/include/trace/events/mce.h
+++ b/include/trace/events/mce.h
@@ -43,6 +43,8 @@ TRACE_EVENT(mce_record,
 		__field(	u8,		bank		)
 		__field(	u8,		cpuvendor	)
 		__field(	u32,		microcode	)
+		__field(	u8,		len		)
+		__dynamic_array(u8, v_data, sizeof(err->vendor))
 	),

 	TP_fast_assign(
@@ -65,9 +67,11 @@ TRACE_EVENT(mce_record,
 		__entry->bank		= err->m.bank;
 		__entry->cpuvendor	= err->m.cpuvendor;
 		__entry->microcode	= err->m.microcode;
+		__entry->len		= sizeof(err->vendor);
+		memcpy(__get_dynamic_array(v_data), &err->vendor, sizeof(err->vendor));
 	),

-	TP_printk("CPU: %d, MCGc/s: %llx/%llx, MC%d: %016Lx, IPID: %016Lx, ADDR: %016Lx, MISC: %016Lx, SYND: %016Lx, RIP: %02x:<%016Lx>, TSC: %llx, PPIN: %llx, vendor: %u, CPUID: %x, time: %llu, socket: %u, APIC: %x, microcode: %x",
+	TP_printk("CPU: %d, MCGc/s: %llx/%llx, MC%d: %016llx, IPID: %016llx, ADDR: %016llx, MISC: %016llx, SYND: %016llx, RIP: %02x:<%016llx>, TSC: %llx, PPIN: %llx, vendor: %u, CPUID: %x, time: %llu, socket: %u, APIC: %x, microcode: %x, vendor data: %s",
 		__entry->cpu,
 		__entry->mcgcap, __entry->mcgstatus,
 		__entry->bank, __entry->status,
@@ -83,7 +87,8 @@ TRACE_EVENT(mce_record,
 		__entry->walltime,
 		__entry->socketid,
 		__entry->apicid,
-		__entry->microcode)
+		__entry->microcode,
+		__print_array(__get_dynamic_array(v_data), __entry->len / 8, 8))
 );

 #endif /* _TRACE_MCE_H */

From patchwork Tue Jul 30 18:54:05 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Avadhut Naik
X-Patchwork-Id: 13747800
Received: from NAM11-DM6-obe.outbound.protection.outlook.com (mail-dm6nam11on2088.outbound.protection.outlook.com [40.107.223.88]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4D95E18C900; Tue, 30 Jul 2024 18:54:50 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=fail smtp.client-ip=40.107.223.88
ARC-Seal: i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1722365693; cv=fail; b=rUuKXLPxVCpc+vfvabNpXiLdgyK4C0MxwslUpHKFR3CFa+4x7gIyD+8Gd0vsAmpuoetxbLo2cv5ZPDmirCTIPfPhSOOTlF0ZuPw7apAi5BcUNKbjqmIjxs//cV3/YQdPHIuMYl6lVxekPWymwxbfys4EfLLtottqWq/JubEr8aU=
ARC-Message-Signature: i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1722365693; c=relaxed/simple; bh=A9AiBIhNB4emx/K6xc8akiVZcyQmssGw1Ja58JzbpJE=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type;
 b=F+m8PxgigKsiHqgsE4Ftv/XQTHw6qNzo/5iSGKmn4OfeRaTY6lSH1zhmlJz56voQ2VHY18jlXZpEE0f34EiTpaauzpUFU/Oc94ciLpYxPLqay9bo26L9p1gpU6V6ZN/NAzSIlQ7xKNLNUT39zZWjPxLz8URicORAv2P0b0YnaJQ=
From: Avadhut Naik
To: , , ,
CC: , , , , , , , , , , , , ,
Subject: [PATCH v3 3/4] x86/mce/apei: Handle variable register array size
Date: Tue, 30 Jul 2024 13:54:05 -0500
Message-ID: <20240730185406.3709876-4-avadhut.naik@amd.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20240730185406.3709876-1-avadhut.naik@amd.com>
References: <20240730185406.3709876-1-avadhut.naik@amd.com>
Precedence: bulk
X-Mailing-List: linux-trace-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0

From: Yazen Ghannam

The ACPI Boot Error Record Table (BERT) is used by the kernel to report
errors that occurred in a previous boot. On some modern AMD systems, the
errors within the BERT are reported through the x86 Common Platform
Error Record (CPER) format, which consists of one or more Processor
Context Information Structures. These context structures provide a
starting address and represent an x86 MSR range in which the data
constitutes a contiguous set of MSRs starting from, and including, the
starting address.

On AMD systems that implement this behavior, the MSR range commonly
represents the MCAX register space used for the Scalable MCA feature.
The apei_smca_report_x86_error() function decodes and passes this
information through the MCE notifier chain. However, this function
assumes a fixed register size based on the original HW/FW implementation.

This assumption breaks with the addition of two new MCAX registers viz.
MCA_SYND1 and MCA_SYND2. These registers are added at the end of the
MCAX register space, so they won't be included when decoding the CPER
data.

Rework apei_smca_report_x86_error() to support a variable register array
size. This covers any case where the MSR context information starts at
the MCAX address for MCA_STATUS and ends at any other register within
the MCAX register space.

Add code comments indicating the MCAX register at each offset.

[Yazen: Add Avadhut as co-developer for wrapper changes.]

Co-developed-by: Avadhut Naik
Signed-off-by: Avadhut Naik
Signed-off-by: Yazen Ghannam
Signed-off-by: Avadhut Naik
---
Changes in v2:
[1] https://lore.kernel.org/linux-edac/20240521125434.1555845-1-yazen.ghannam@amd.com/
[2] https://lore.kernel.org/linux-edac/20240523155641.2805411-1-yazen.ghannam@amd.com/

1. Drop dependencies on sets [1] and [2] above and rebase on top of
   tip/master.

Changes in v3:
1. Incorporate suggested touchup.
2. Fix SoB chain to properly reflect the patch path.
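The variable-size decode described above can also be tried outside the kernel. What follows is a minimal userspace sketch, not the patch's code: `struct smca_fields`, `smca_decode()` and the `SMCA_*` offset macros are hypothetical names made up for illustration, and the offsets are copied from the code comments in this patch. It replaces the fallthrough switch with explicit bounds checks, which should give the same result for the registers this patch extracts.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace mirror of the fields this patch fills in. */
struct smca_fields {
	uint64_t status, addr, misc, ipid, synd, synd1, synd2;
};

/* Offsets in 8-byte units, taken from the comments in the patch. */
#define SMCA_STATUS	0u
#define SMCA_ADDR	1u
#define SMCA_MISC0	2u
#define SMCA_IPID	4u
#define SMCA_SYND	5u
#define SMCA_SYND1	13u
#define SMCA_SYND2	14u
#define SMCA_MAX_REGS	15u	/* same cap as the patch */

/*
 * Decode as many registers as the array actually holds.
 * Returns the number of registers considered, or -1 if the array is
 * too small to hold even MCA_STATUS.
 */
static int smca_decode(const uint64_t *regs, uint32_t reg_arr_size,
		       struct smca_fields *out)
{
	uint32_t num_regs = reg_arr_size >> 3;	/* Register Array Size / 8 */

	assert(regs != NULL && out != NULL);
	memset(out, 0, sizeof(*out));

	if (!num_regs)
		return -1;
	if (num_regs > SMCA_MAX_REGS)
		num_regs = SMCA_MAX_REGS;

	/* A register at offset N is present only if num_regs > N. */
	if (num_regs > SMCA_SYND2)
		out->synd2 = regs[SMCA_SYND2];
	if (num_regs > SMCA_SYND1)
		out->synd1 = regs[SMCA_SYND1];
	if (num_regs > SMCA_SYND)
		out->synd = regs[SMCA_SYND];
	if (num_regs > SMCA_IPID)
		out->ipid = regs[SMCA_IPID];
	if (num_regs > SMCA_MISC0)
		out->misc = regs[SMCA_MISC0];
	if (num_regs > SMCA_ADDR)
		out->addr = regs[SMCA_ADDR];
	out->status = regs[SMCA_STATUS];

	return (int)num_regs;
}
```

With a 48-byte array (the old fixed size) this fills MCA_STATUS through MCA_SYND and leaves the two new syndrome fields zeroed; with 120 bytes it also picks up MCA_SYND1 and MCA_SYND2.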
---
 arch/x86/kernel/cpu/mce/apei.c | 72 +++++++++++++++++++++++++++-------
 1 file changed, 58 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/apei.c b/arch/x86/kernel/cpu/mce/apei.c
index b8f4e75fb8a7..5949fc103be4 100644
--- a/arch/x86/kernel/cpu/mce/apei.c
+++ b/arch/x86/kernel/cpu/mce/apei.c
@@ -69,9 +69,9 @@ EXPORT_SYMBOL_GPL(apei_mce_report_mem_error);
 int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id)
 {
 	const u64 *i_mce = ((const u64 *) (ctx_info + 1));
+	unsigned int cpu, num_regs;
 	struct mce_hw_err err;
 	struct mce *m = &err.m;
-	unsigned int cpu;

 	memset(&err, 0, sizeof(struct mce_hw_err));

@@ -91,16 +91,12 @@ int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id)
 		return -EINVAL;

 	/*
-	 * The register array size must be large enough to include all the
-	 * SMCA registers which need to be extracted.
-	 *
 	 * The number of registers in the register array is determined by
 	 * Register Array Size/8 as defined in UEFI spec v2.8, sec N.2.4.2.2.
-	 * The register layout is fixed and currently the raw data in the
-	 * register array includes 6 SMCA registers which the kernel can
-	 * extract.
+	 * Sanity-check registers array size.
 	 */
-	if (ctx_info->reg_arr_size < 48)
+	num_regs = ctx_info->reg_arr_size >> 3;
+	if (!num_regs)
 		return -EINVAL;

 	mce_setup(m);
@@ -118,12 +114,60 @@ int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id)

 	m->apicid = lapic_id;
 	m->bank = (ctx_info->msr_addr >> 4) & 0xFF;
-	m->status = *i_mce;
-	m->addr = *(i_mce + 1);
-	m->misc = *(i_mce + 2);
-	/* Skipping MCA_CONFIG */
-	m->ipid = *(i_mce + 4);
-	m->synd = *(i_mce + 5);
+
+	/*
+	 * The SMCA register layout is fixed and includes 16 registers.
+	 * The end of the array may be variable, but the beginning is known.
+	 * Cap the number of registers to expected max (15).
+	 */
+	if (num_regs > 15)
+		num_regs = 15;
+
+	switch (num_regs) {
+	/* MCA_SYND2 */
+	case 15:
+		err.vendor.amd.synd2 = *(i_mce + 14);
+		fallthrough;
+	/* MCA_SYND1 */
+	case 14:
+		err.vendor.amd.synd1 = *(i_mce + 13);
+		fallthrough;
+	/* MCA_MISC4 */
+	case 13:
+	/* MCA_MISC3 */
+	case 12:
+	/* MCA_MISC2 */
+	case 11:
+	/* MCA_MISC1 */
+	case 10:
+	/* MCA_DEADDR */
+	case 9:
+	/* MCA_DESTAT */
+	case 8:
+	/* reserved */
+	case 7:
+	/* MCA_SYND */
+	case 6:
+		m->synd = *(i_mce + 5);
+		fallthrough;
+	/* MCA_IPID */
+	case 5:
+		m->ipid = *(i_mce + 4);
+		fallthrough;
+	/* MCA_CONFIG */
+	case 4:
+	/* MCA_MISC0 */
+	case 3:
+		m->misc = *(i_mce + 2);
+		fallthrough;
+	/* MCA_ADDR */
+	case 2:
+		m->addr = *(i_mce + 1);
+		fallthrough;
+	/* MCA_STATUS */
+	case 1:
+		m->status = *i_mce;
+	}

 	mce_log(&err);

From patchwork Tue Jul 30 18:54:06 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Avadhut Naik
X-Patchwork-Id: 13747801
Received: from NAM10-DM6-obe.outbound.protection.outlook.com (mail-dm6nam10on2081.outbound.protection.outlook.com [40.107.93.81]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 062E518CC0B; Tue, 30 Jul 2024 18:55:05 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=fail smtp.client-ip=40.107.93.81
ARC-Seal: i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1722365707; cv=fail; b=oURwOg1O/4MXAhC0QHqm0n2dX17ArwN08XgVW1sVaPNiBx8VUUK34kSHJr3CQcD7f8J1/phxMD20Crfa2ybjOtNLC1/SCE+CEbULtv0alG2Dq8St3bSkGa7Ov46XS4nGLwJ13rx9hMDrzlKh+LhVpVmV8tDqyyI8ViPkgglPIog=
ARC-Message-Signature: i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1722365707; c=relaxed/simple; bh=0v0C+HGzIAxIvHr8Ni7NCkjeGpGOK7r392nLJnXCdxs=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type;
b=Nmcd/Eo8Wc5bvrVc2D6SMNMXjTFcXMS6wvY99YGk3U5WF7q1e+9lhByb7URdrKuYkoD36w7bTvaSYYiS8GTmuGdCJllgbZDsEEwXfEkjn0inCi0ckfr2dPrzBM8LMv9nII+4YZR4WCzir5pvRUKJqCR1feDUOOzoBzSzAQSh1XY= ARC-Authentication-Results: i=2; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amd.com; spf=fail smtp.mailfrom=amd.com; dkim=pass (1024-bit key) header.d=amd.com header.i=@amd.com header.b=3jD3241p; arc=fail smtp.client-ip=40.107.93.81 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amd.com Authentication-Results: smtp.subspace.kernel.org; spf=fail smtp.mailfrom=amd.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=amd.com header.i=@amd.com header.b="3jD3241p" ARC-Seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none; b=cTrQt5ZHWN1UCC8FOg+hkgW5ZoMGoyy29GbZzk/O+mkbCS1yHpEFkl7zOhgWyoIRkTAMp2fsDvY8p3LUiDOkseIV3V0/5UaTZsTRE8Z0XmLV6Bc5Svlg6XKnfjn+Z6lISKt8C2Kv5/rG6RHO7akZqAwTjReFPmjBFtFYTU91hShCHPyXtLaxHqlYwX0XCVflZYCO57n7b/QmqMTVPV+mUQkm8tQc8jHk/x4sPWlFJjiYtutcWOxdbzoPzWXoA+tJPxHZXownYCc5lh5VBPmEx8c8Bg8WUR8ITHN2yuhahojWRWj2m2+Lj7UWhlWP35omngbePQdiEvd9/+q0C1H7kw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=XmQRvxxMB3CItzDn44urORtpgb/R+cUfZE9EtJUONmY=; b=YKPDUK5hF0twc6z5tcAKnU3Nxfq09dBFwv9ncgwgrybx4cXLYrm8BVzI/EPOqEQysI0rqypLJ2IP5F8SUUh92KpBYQBctkKkwnBTswHmLz+0T/fvOFNasLGGmrTDEvut4yz1LHAINJwdq0ONxrW7QV+Q9JAzb4v4IKJ8xo9bCH333fzAhgnaWjEzjSZAg+jfrar12F4JpSNb7VkNUN07s9dWVnS3Lm5I4Qo58ugvcLcVd+1Pc9RNGgLbqs+EpscFTwU+YZKHo3j8cki1shjWb6XLezPvGg7igdP9AX6e0xkL/lKWlGn1gpBdYHiFKFqkMXlI2pbcHh9Sqo06yAYqPg== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 165.204.84.17) smtp.rcpttodomain=kernel.org smtp.mailfrom=amd.com; 
dmarc=pass (p=quarantine sp=quarantine pct=100) action=none header.from=amd.com; dkim=none (message not signed); arc=none (0) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amd.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=XmQRvxxMB3CItzDn44urORtpgb/R+cUfZE9EtJUONmY=; b=3jD3241pRCG0DG4HRMoGL8epStmPa7K3n9DwysJMsGjmgInvi5i7wXDYX/VgSiYaiUdB9Cw7FTG2/ESuigy6PznOOY5eb+qlkGqBVkBWY532ifIRQN2vh8jzuSXi5eERou8xoZUBgSI46boFJBhaOep1bN6Q/92hm7f8VxxIC+A= Received: from CH2PR17CA0022.namprd17.prod.outlook.com (2603:10b6:610:53::32) by CH3PR12MB7739.namprd12.prod.outlook.com (2603:10b6:610:151::22) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7807.28; Tue, 30 Jul 2024 18:54:59 +0000 Received: from CH1PEPF0000AD75.namprd04.prod.outlook.com (2603:10b6:610:53:cafe::8a) by CH2PR17CA0022.outlook.office365.com (2603:10b6:610:53::32) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7784.35 via Frontend Transport; Tue, 30 Jul 2024 18:54:59 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 165.204.84.17) smtp.mailfrom=amd.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=amd.com; Received-SPF: Pass (protection.outlook.com: domain of amd.com designates 165.204.84.17 as permitted sender) receiver=protection.outlook.com; client-ip=165.204.84.17; helo=SATLEXMB04.amd.com; pr=C Received: from SATLEXMB04.amd.com (165.204.84.17) by CH1PEPF0000AD75.mail.protection.outlook.com (10.167.244.54) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.20.7828.19 via Frontend Transport; Tue, 30 Jul 2024 18:54:59 +0000 Received: from titanite-d354host.amd.com (10.180.168.240) by SATLEXMB04.amd.com (10.181.40.145) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.39; Tue, 30 Jul 2024 13:54:58 
-0500 From: Avadhut Naik To: , , , CC: , , , , , , , , , , , , , Subject: [PATCH v3 4/4] EDAC/mce_amd: Add support for FRU Text in MCA Date: Tue, 30 Jul 2024 13:54:06 -0500 Message-ID: <20240730185406.3709876-5-avadhut.naik@amd.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240730185406.3709876-1-avadhut.naik@amd.com> References: <20240730185406.3709876-1-avadhut.naik@amd.com> Precedence: bulk X-Mailing-List: linux-trace-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-ClientProxiedBy: SATLEXMB03.amd.com (10.181.40.144) To SATLEXMB04.amd.com (10.181.40.145) X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: CH1PEPF0000AD75:EE_|CH3PR12MB7739:EE_ X-MS-Office365-Filtering-Correlation-Id: 0a012713-dee7-49e7-bd10-08dcb0c91ca2 X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0;ARA:13230040|1800799024|36860700013|82310400026|376014|7416014; X-Microsoft-Antispam-Message-Info: WoF4udGN1FvNWxdnbuEQxjEfDzPparIvYcGt1Ga2RIG+4L/IdVZ88tJBP+LzYV4gByedy+OfDUrX8oGtnGdf49enWLIeJvygMgxZbYDb0hgHoqHSOPM/eoAC+mcD40vFDwDnGTJ0BndlZOcYdVRUHSep4aIxgUtI1IZcUBMR6CdCZK0VrIjyRvk12KQNt++6dG7rGLjNaWpqcTveIeMeQTp36Dbd3yvVE0xUvyeox6Cqnh1ALcOPiRTw9XfheOXTOuudYrofVHBZC6cblkpM7RIDPCcRR4y7nEtrVqO85L+TcOSHrhDhMt+k95Mpt/B3ZQtOHApE0uxEgYParJ9gQA3LTzoKM0qDUWqqJontCgXFE+m8FJBTFmmzskGkHQsmKGlQ91N1qIwb9SyDKYzU8Z8cH+flzo3TbgpSZsR6LyF692QJbhV8/p9fq+pU75FqjQVb6YlQMcwPViulxwxfgdCnVz0I4LYGLEVTGiI4KiTrN2W5B3KEphi0dFTqjG1jlilPDKZn0DuQFxZFZj8KknSYAv2ipANvFWJLiqCZQn5H9vgiELmXMQnBdmMmvsl3jpGxWKcntStwFF160Z6laFgw6OJ+IAxt3mtMkOn1/UTZ1PNi1eQFthv/BpGHLuramMwjqcKQ1w92rmMPN7eKVWgY5EUSUqYrk0xEkjPTBAdPA5rBs3V8tA6dAvbwwOQAW3ngAvshiXYs0IuFSJcCV7w5t180Y0Anxo64/b2h1+LaxQva79dgagri4rQh5W75Mv5S8X0NjvdXGgz7yLSHzjkJNdKgv7nYF+FIdu+VrK2rGhc7YOUqvRxWwbHwiHeF+esOiMnmdmrQMBN3CjRhOpdrhG8Vv5aKuxG04mx7glFgfzJqFzuwEbbUdX2S3Ff/3m6nVqP+//PdqAhhZUctPKNPTT3bkeldQX8E8Dfwe7JeXE7UYay5AyElB6KsgOqKHEZ0EqCWIQQ/kf0uWxN0oY4xzlrZkq199onO5c9FtFD
CSm3MtQ094vVNmERQe9r9CAEPQlhxlIDCRVqYnIMBegYTdiUbC324zUpfnUGgSrNxRw8n9c0LzbL/9kNk+pLkGfISRkN87WyIgZhP9Yxy+8PnsQOeF/gPe7uTsqvV6MGQ9ILTsLOhGqZQK2BCqImy6Q9sFnCGsIZATDYvNWdKjRsShsD8B+C54pvp/ljWGMTil7ysoKgJs5jBeIe0tTXrZaeYUvBVeDLO/Q0nhXz5Pw8kfV+h6NIRp0hTQTPCO8m4HUUc+4PnpAigwPX6p5FwrsRHqOnEuhLIWwjQoNizTZgENk2xzAtURcqlFu7QXyeLkV1cM1EcUdBLHVMiQzq0l2aiklGCnZCvT2A7m0F+ssV12XcLtBzwZMBVE5e84+d+pJVsGD27jB06jvnjUNRG7exDVn0/jzKzVVrPCvz5jQ== X-Forefront-Antispam-Report: CIP:165.204.84.17;CTRY:US;LANG:en;SCL:1;SRV:;IPV:CAL;SFV:NSPM;H:SATLEXMB04.amd.com;PTR:InfoDomainNonexistent;CAT:NONE;SFS:(13230040)(1800799024)(36860700013)(82310400026)(376014)(7416014);DIR:OUT;SFP:1101; X-OriginatorOrg: amd.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 30 Jul 2024 18:54:59.4172 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 0a012713-dee7-49e7-bd10-08dcb0c91ca2 X-MS-Exchange-CrossTenant-Id: 3dd8961f-e488-4e60-8e11-a82d994e183d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=3dd8961f-e488-4e60-8e11-a82d994e183d;Ip=[165.204.84.17];Helo=[SATLEXMB04.amd.com] X-MS-Exchange-CrossTenant-AuthSource: CH1PEPF0000AD75.namprd04.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: CH3PR12MB7739 From: Yazen Ghannam A new "FRU Text in MCA" feature is defined where the Field Replaceable Unit (FRU) Text for a device is represented by a string in the new MCA_SYND1 and MCA_SYND2 registers. This feature is supported per MCA bank, and it is advertised by the McaFruTextInMca bit (MCA_CONFIG[9]). The FRU Text is populated dynamically for each individual error state (MCA_STATUS, MCA_ADDR, et al.). This handles the case where an MCA bank covers multiple devices, for example, a Unified Memory Controller (UMC) bank that manages two DIMMs. 
Since MCA_CONFIG[9] is instrumental in decoding FRU Text, it has to be
exported through the mce_record tracepoint so that userspace tools like
rasdaemon can determine if FRU Text has been reported through the
MCA_SYND1 and MCA_SYND2 registers and output it.

[Yazen: Add Avadhut as co-developer for wrapper changes.]

Co-developed-by: Avadhut Naik
Signed-off-by: Avadhut Naik
Signed-off-by: Yazen Ghannam
Signed-off-by: Avadhut Naik
---
Changes in v2:
[1] https://lore.kernel.org/linux-edac/20240521125434.1555845-1-yazen.ghannam@amd.com/
[2] https://lore.kernel.org/linux-edac/20240523155641.2805411-1-yazen.ghannam@amd.com/

1. Drop dependencies on sets [1] and [2] above and rebase on top of
   tip/master.

Changes in v3:
1. Modify commit message per feedback provided.
2. Remove call to memset() for the string frutext. Instead, just ensure
   that it is NUL-terminated.
3. Fix SoB chain to properly reflect the patch path.
---
 arch/x86/include/asm/mce.h     |  2 ++
 arch/x86/kernel/cpu/mce/amd.c  |  1 +
 arch/x86/kernel/cpu/mce/apei.c |  2 ++
 arch/x86/kernel/cpu/mce/core.c |  3 +++
 drivers/edac/mce_amd.c         | 21 ++++++++++++++-------
 5 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index a5be7463c78a..377a5469ed7e 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -61,6 +61,7 @@
  * - TCC bit is present in MCx_STATUS.
  */
 #define MCI_CONFIG_MCAX 0x1
+#define MCI_CONFIG_FRUTEXT BIT_ULL(9)
 #define MCI_IPID_MCATYPE 0xFFFF0000
 #define MCI_IPID_HWID 0xFFF

@@ -213,6 +214,7 @@ struct mce_hw_err {
 		struct {
 			u64 synd1; /* MCA_SYND1 MSR */
 			u64 synd2; /* MCA_SYND2 MSR */
+			u64 config; /* MCA_CONFIG MSR */
 		} amd;
 	} vendor;
 };
diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index a2a5fb940bb6..00b6fa987094 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -798,6 +798,7 @@ static void __log_error(unsigned int bank, u64 status, u64 addr, u64 misc)

 	if (mce_flags.smca) {
 		rdmsrl(MSR_AMD64_SMCA_MCx_IPID(bank), m->ipid);
+		rdmsrl(MSR_AMD64_SMCA_MCx_CONFIG(bank), err.vendor.amd.config);

 		if (m->status & MCI_STATUS_SYNDV) {
 			rdmsrl(MSR_AMD64_SMCA_MCx_SYND(bank), m->synd);
diff --git a/arch/x86/kernel/cpu/mce/apei.c b/arch/x86/kernel/cpu/mce/apei.c
index 5949fc103be4..db2677f3023a 100644
--- a/arch/x86/kernel/cpu/mce/apei.c
+++ b/arch/x86/kernel/cpu/mce/apei.c
@@ -156,6 +156,8 @@ int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id)
 		fallthrough;
 	/* MCA_CONFIG */
 	case 4:
+		err.vendor.amd.config = *(i_mce + 3);
+		fallthrough;
 	/* MCA_MISC0 */
 	case 3:
 		m->misc = *(i_mce + 2);
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index c8089d7a8e9b..054188aac2ee 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -195,6 +195,8 @@ static void __print_mce(struct mce_hw_err *err)
 			pr_cont("SYND2 %llx ", err->vendor.amd.synd2);
 		if (m->ipid)
 			pr_cont("IPID %llx ", m->ipid);
+		if (err->vendor.amd.config)
+			pr_cont("CONFIG %llx ", err->vendor.amd.config);
 	}

 	pr_cont("\n");
@@ -667,6 +669,7 @@ static noinstr void mce_read_aux(struct mce_hw_err *err, int i)

 	if (mce_flags.smca) {
 		m->ipid = mce_rdmsrl(MSR_AMD64_SMCA_MCx_IPID(i));
+		err->vendor.amd.config = mce_rdmsrl(MSR_AMD64_SMCA_MCx_CONFIG(i));

 		if (m->status & MCI_STATUS_SYNDV) {
 			m->synd = mce_rdmsrl(MSR_AMD64_SMCA_MCx_SYND(i));
diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
index aea68999c849..f7f7ea0a5292 100644
--- a/drivers/edac/mce_amd.c
+++ b/drivers/edac/mce_amd.c
@@ -795,6 +795,7 @@ amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
 	struct mce_hw_err *err = (struct mce_hw_err *)data;
 	struct mce *m = &err->m;
 	unsigned int fam = x86_family(m->cpuid);
+	u64 mca_config = err->vendor.amd.config;
 	int ecc;

 	if (m->kflags & MCE_HANDLED_CEC)
@@ -814,11 +815,7 @@ amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
 		((m->status & MCI_STATUS_PCC) ? "PCC" : "-"));

 	if (boot_cpu_has(X86_FEATURE_SMCA)) {
-		u32 low, high;
-		u32 addr = MSR_AMD64_SMCA_MCx_CONFIG(m->bank);
-
-		if (!rdmsr_safe(addr, &low, &high) &&
-		    (low & MCI_CONFIG_MCAX))
+		if (mca_config & MCI_CONFIG_MCAX)
 			pr_cont("|%s", ((m->status & MCI_STATUS_TCC) ? "TCC" : "-"));

 		pr_cont("|%s", ((m->status & MCI_STATUS_SYNDV) ? "SyndV" : "-"));
@@ -853,8 +850,18 @@ amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)

 		if (m->status & MCI_STATUS_SYNDV) {
 			pr_cont(", Syndrome: 0x%016llx\n", m->synd);
-			pr_emerg(HW_ERR "Syndrome1: 0x%016llx, Syndrome2: 0x%016llx",
-				 err->vendor.amd.synd1, err->vendor.amd.synd2);
+			if (mca_config & MCI_CONFIG_FRUTEXT) {
+				char frutext[17];
+				frutext[16] = '\0';
+
+				memcpy(&frutext[0], &err->vendor.amd.synd1, 8);
+				memcpy(&frutext[8], &err->vendor.amd.synd2, 8);
+
+				pr_emerg(HW_ERR "FRU Text: %s", frutext);
+			} else {
+				pr_emerg(HW_ERR "Syndrome1: 0x%016llx, Syndrome2: 0x%016llx",
+					 err->vendor.amd.synd1, err->vendor.amd.synd2);
+			}
 		}

 		pr_cont("\n");
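
A userspace consumer could apply the same FRU Text decode to values taken from a logged record. The sketch below is only an illustration under stated assumptions: `fru_text_decode()` and `MCA_CONFIG_FRUTEXT` are hypothetical names, not part of rasdaemon or the kernel, and the raw MCA_CONFIG, MCA_SYND1 and MCA_SYND2 values are assumed to be available to the caller already. As in the kernel hunk above, the bytes are copied in memory order, so the result matches the kernel's output only on a host with the same byte order as the reporting system.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name; mirrors MCI_CONFIG_FRUTEXT, i.e. MCA_CONFIG[9]. */
#define MCA_CONFIG_FRUTEXT	(1ULL << 9)

/*
 * Copy the 16 bytes of MCA_SYND1/MCA_SYND2 into out[] and terminate
 * the string. Returns 1 if FRU Text is advertised and was written,
 * 0 if the registers hold plain syndrome data and out[] is untouched.
 */
static int fru_text_decode(uint64_t config, uint64_t synd1, uint64_t synd2,
			   char out[17])
{
	assert(out != NULL);

	if (!(config & MCA_CONFIG_FRUTEXT))
		return 0;

	memcpy(&out[0], &synd1, 8);
	memcpy(&out[8], &synd2, 8);
	out[16] = '\0';

	return 1;
}
```

A caller would print `out` when the function returns 1 and fall back to the hexadecimal Syndrome1/Syndrome2 output otherwise.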