From patchwork Thu Oct 28 13:01:02 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Naveen Krishna Chatradhi
X-Patchwork-Id: 12590055
From: Naveen Krishna Chatradhi
CC: Muralidhara M K, Naveen Krishna Chatradhi
Subject: [PATCH v6 1/5] x86/amd_nb: Add support for northbridges on Aldebaran
Date: Thu, 28 Oct 2021 18:31:02 +0530
Message-ID: <20211028130106.15701-2-nchatrad@amd.com>
In-Reply-To: <20211028130106.15701-1-nchatrad@amd.com>
References: <20211028130106.15701-1-nchatrad@amd.com>
X-Mailing-List: linux-edac@vger.kernel.org

From: Muralidhara M K

On newer systems the CPUs manage MCA errors reported from the GPUs.
Enumerate the GPU nodes with the AMD NB framework to support EDAC.

GPU nodes are enumerated in sequential order based on the PCI hierarchy,
and the first GPU node is assumed to have an "AMD Node ID" value after
CPU Nodes are fully populated.

Aldebaran is an AMD GPU; GPU drivers are part of the DRM framework:
https://lists.freedesktop.org/archives/amd-gfx/2021-February/059694.html

Each Aldebaran GPU has 2 Data Fabrics, which are enumerated as 2 nodes.
With this implementation detail, the Data Fabric on the GPU nodes can be
accessed the same way as the Data Fabric on CPU nodes.

Special handling was necessary in northbridge enumeration as the
roots_per_misc value is different for GPU and CPU nodes.

Signed-off-by: Muralidhara M K
Co-developed-by: Naveen Krishna Chatradhi
Signed-off-by: Naveen Krishna Chatradhi
Link: https://lkml.kernel.org/r/20210823185437.94417-2-nchatrad@amd.com
---
Changes since v5:
Modified amd_get_node_map() and checking return value

Changes since v4:
1. renamed struct name from nmap to nodemap
2. split amd_get_node_map and addressed minor comments

Changes since v3:
1. Use word "gpu" instead of "noncpu" in the patch
2. Do not create pci_dev_ids arrays for gpu nodes
3. Identify the gpu node start index from DF18F1 registers on the GPU nodes.
   a. Export cpu node count and gpu start node id

Changes since v2:
1. Added Reviewed-by Yazen Ghannam

Changes since v1:
1. Modified the commit message and comments in the code
2. Squashed patch 1/7: "x86/amd_nb: Add Aldebaran device to PCI IDs"

 arch/x86/include/asm/amd_nb.h |   9 +++
 arch/x86/kernel/amd_nb.c      | 146 ++++++++++++++++++++++++++++------
 include/linux/pci_ids.h       |   1 +
 3 files changed, 133 insertions(+), 23 deletions(-)

diff --git a/arch/x86/include/asm/amd_nb.h b/arch/x86/include/asm/amd_nb.h
index 455066a06f60..a78d088dae40 100644
--- a/arch/x86/include/asm/amd_nb.h
+++ b/arch/x86/include/asm/amd_nb.h
@@ -68,10 +68,17 @@ struct amd_northbridge {
 	struct threshold_bank *bank4;
 };
 
+/* heterogeneous system node type map variables */
+struct amd_node_map {
+	u16 gpu_node_start_id;
+	u16 cpu_node_count;
+};
+
 struct amd_northbridge_info {
 	u16 num;
 	u64 flags;
 	struct amd_northbridge *nb;
+	struct amd_node_map *nodemap;
 };
 
 #define AMD_NB_GART			BIT(0)
@@ -83,6 +90,8 @@ struct amd_northbridge_info {
 u16 amd_nb_num(void);
 bool amd_nb_has_feature(unsigned int feature);
 struct amd_northbridge *node_to_amd_nb(int node);
+u16 amd_gpu_node_start_id(void);
+u16 amd_cpu_node_count(void);
 
 static inline u16 amd_pci_dev_to_node_id(struct pci_dev *pdev)
 {
diff --git a/arch/x86/kernel/amd_nb.c b/arch/x86/kernel/amd_nb.c
index c92c9c774c0e..199d9d79edfb 100644
--- a/arch/x86/kernel/amd_nb.c
+++ b/arch/x86/kernel/amd_nb.c
@@ -19,6 +19,7 @@
 #define PCI_DEVICE_ID_AMD_17H_M10H_ROOT	0x15d0
 #define PCI_DEVICE_ID_AMD_17H_M30H_ROOT	0x1480
 #define PCI_DEVICE_ID_AMD_17H_M60H_ROOT	0x1630
+#define PCI_DEVICE_ID_AMD_ALDEBARAN_ROOT	0x14bb
 #define PCI_DEVICE_ID_AMD_17H_DF_F4	0x1464
 #define PCI_DEVICE_ID_AMD_17H_M10H_DF_F4 0x15ec
 #define PCI_DEVICE_ID_AMD_17H_M30H_DF_F4 0x1494
@@ -28,6 +29,7 @@
 #define PCI_DEVICE_ID_AMD_19H_M40H_ROOT	0x14b5
 #define PCI_DEVICE_ID_AMD_19H_M40H_DF_F4 0x167d
 #define PCI_DEVICE_ID_AMD_19H_M50H_DF_F4 0x166e
+#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F4 0x14d4
 
 /* Protect the PCI config register pairs used for SMN and DF indirect access. */
 static DEFINE_MUTEX(smn_mutex);
@@ -40,6 +42,7 @@ static const struct pci_device_id amd_root_ids[] = {
 	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_17H_M30H_ROOT) },
 	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_17H_M60H_ROOT) },
 	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_19H_M40H_ROOT) },
+	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ALDEBARAN_ROOT) },
 	{}
 };
 
@@ -63,6 +66,7 @@ static const struct pci_device_id amd_nb_misc_ids[] = {
 	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_19H_DF_F3) },
 	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_19H_M40H_DF_F3) },
 	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_19H_M50H_DF_F3) },
+	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F3) },
 	{}
 };
 
@@ -81,6 +85,7 @@ static const struct pci_device_id amd_nb_link_ids[] = {
 	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_19H_M40H_DF_F4) },
 	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_19H_M50H_DF_F4) },
 	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_CNB17H_F4) },
+	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F4) },
 	{}
 };
 
@@ -126,6 +131,68 @@ struct amd_northbridge *node_to_amd_nb(int node)
 }
 EXPORT_SYMBOL_GPL(node_to_amd_nb);
 
+/*
+ * GPU start index and CPU count values on a heterogeneous system,
+ * these values will be used by the AMD EDAC and MCE modules.
+ */
+u16 amd_gpu_node_start_id(void)
+{
+	return (amd_northbridges.nodemap) ?
+		amd_northbridges.nodemap->gpu_node_start_id : 0;
+}
+EXPORT_SYMBOL_GPL(amd_gpu_node_start_id);
+
+u16 amd_cpu_node_count(void)
+{
+	return (amd_northbridges.nodemap) ?
+		amd_northbridges.nodemap->cpu_node_count : amd_northbridges.num;
+}
+EXPORT_SYMBOL_GPL(amd_cpu_node_count);
+
+/* GPU Data Fabric ID Device 24 Function 1 */
+#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F1 0x14d1
+
+/* DF18xF1 registers on Aldebaran GPU */
+#define REG_LOCAL_NODE_TYPE_MAP		0x144
+#define REG_RMT_NODE_TYPE_MAP		0x148
+
+/*
+ * Newer AMD CPUs and GPUs whose data fabrics can be connected via custom xGMI
+ * links, come with registers to gather local and remote node type map info.
+ *
+ * "Local Node Type" refers to nodes with the same type as that from which the
+ * register is read, and "Remote Node Type" refers to nodes with a different type.
+ *
+ * This function reads the registers from GPU DF function 1.
+ * Hence, local nodes are GPU and remote nodes are CPUs.
+ */
+static int amd_get_node_map(void)
+{
+	struct amd_node_map *nodemap;
+	struct pci_dev *pdev;
+	u32 tmp;
+
+	pdev = pci_get_device(PCI_VENDOR_ID_AMD,
+			      PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F1, NULL);
+	if (!pdev) {
+		pr_debug("DF Func1 PCI device not found on this node.\n");
+		return -ENODEV;
+	}
+
+	nodemap = kmalloc(sizeof(*nodemap), GFP_KERNEL);
+	if (!nodemap)
+		return -ENOMEM;
+
+	pci_read_config_dword(pdev, REG_LOCAL_NODE_TYPE_MAP, &tmp);
+	nodemap->gpu_node_start_id = tmp & 0xFFF;
+
+	pci_read_config_dword(pdev, REG_RMT_NODE_TYPE_MAP, &tmp);
+	nodemap->cpu_node_count = tmp >> 16 & 0xFFF;
+
+	amd_northbridges.nodemap = nodemap;
+	return 0;
+}
+
 static struct pci_dev *next_northbridge(struct pci_dev *dev,
 					const struct pci_device_id *ids)
 {
@@ -230,6 +297,27 @@ int amd_df_indirect_read(u16 node, u8 func, u16 reg, u8 instance_id, u32 *lo)
 }
 EXPORT_SYMBOL_GPL(amd_df_indirect_read);
 
+struct pci_dev *get_root_devs(struct pci_dev *root,
+			      const struct pci_device_id *root_ids,
+			      u16 roots_per_misc)
+{
+	u16 j;
+
+	/*
+	 * If there are more PCI root devices than data fabric/
+	 * system management network interfaces, then the (N)
+	 * PCI roots per DF/SMN interface are functionally the
+	 * same (for DF/SMN access) and N-1 are redundant. N-1
+	 * PCI roots should be skipped per DF/SMN interface so
+	 * the following DF/SMN interfaces get mapped to
+	 * correct PCI roots.
+	 */
+	for (j = 0; j < roots_per_misc; j++)
+		root = next_northbridge(root, root_ids);
+
+	return root;
+}
+
 int amd_cache_northbridges(void)
 {
 	const struct pci_device_id *misc_ids = amd_nb_misc_ids;
@@ -237,10 +325,10 @@ int amd_cache_northbridges(void)
 	const struct pci_device_id *root_ids = amd_root_ids;
 	struct pci_dev *root, *misc, *link;
 	struct amd_northbridge *nb;
-	u16 roots_per_misc = 0;
-	u16 misc_count = 0;
-	u16 root_count = 0;
-	u16 i, j;
+	u16 roots_per_misc = 0, gpu_roots_per_misc = 0;
+	u16 misc_count = 0, gpu_misc_count = 0;
+	u16 root_count = 0, gpu_root_count = 0;
+	u16 i;
 
 	if (amd_northbridges.num)
 		return 0;
@@ -252,15 +340,23 @@ int amd_cache_northbridges(void)
 	}
 
 	misc = NULL;
-	while ((misc = next_northbridge(misc, misc_ids)) != NULL)
-		misc_count++;
+	while ((misc = next_northbridge(misc, misc_ids)) != NULL) {
+		if (misc->device == PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F3)
+			gpu_misc_count++;
+		else
+			misc_count++;
+	}
 
 	if (!misc_count)
 		return -ENODEV;
 
 	root = NULL;
-	while ((root = next_northbridge(root, root_ids)) != NULL)
-		root_count++;
+	while ((root = next_northbridge(root, root_ids)) != NULL) {
+		if (root->device == PCI_DEVICE_ID_AMD_ALDEBARAN_ROOT)
+			gpu_root_count++;
+		else
+			root_count++;
+	}
 
 	if (root_count) {
 		roots_per_misc = root_count / misc_count;
@@ -275,33 +371,37 @@ int amd_cache_northbridges(void)
 		}
 	}
 
-	nb = kcalloc(misc_count, sizeof(struct amd_northbridge), GFP_KERNEL);
+	/*
+	 * The number of miscs, roots and roots_per_misc might vary on different
+	 * nodes of a heterogeneous system.
+	 * Calculate roots_per_misc accordingly in order to skip the redundant
+	 * roots and map the DF/SMN interfaces to correct PCI roots.
+	 */
+	if (gpu_root_count && gpu_misc_count) {
+		int ret = amd_get_node_map();
+
+		if (ret)
+			return ret;
+
+		gpu_roots_per_misc = gpu_root_count / gpu_misc_count;
+	}
+
+	amd_northbridges.num = misc_count + gpu_misc_count;
+	nb = kcalloc(amd_northbridges.num, sizeof(struct amd_northbridge), GFP_KERNEL);
 	if (!nb)
 		return -ENOMEM;
 
 	amd_northbridges.nb = nb;
-	amd_northbridges.num = misc_count;
 
 	link = misc = root = NULL;
 	for (i = 0; i < amd_northbridges.num; i++) {
+		u16 misc_roots = i < misc_count ? roots_per_misc : gpu_roots_per_misc;
 		node_to_amd_nb(i)->root = root =
-			next_northbridge(root, root_ids);
+			get_root_devs(root, root_ids, misc_roots);
 		node_to_amd_nb(i)->misc = misc =
 			next_northbridge(misc, misc_ids);
 		node_to_amd_nb(i)->link = link =
 			next_northbridge(link, link_ids);
-
-		/*
-		 * If there are more PCI root devices than data fabric/
-		 * system management network interfaces, then the (N)
-		 * PCI roots per DF/SMN interface are functionally the
-		 * same (for DF/SMN access) and N-1 are redundant. N-1
-		 * PCI roots should be skipped per DF/SMN interface so
-		 * the following DF/SMN interfaces get mapped to
-		 * correct PCI roots.
-		 */
-		for (j = 1; j < roots_per_misc; j++)
-			root = next_northbridge(root, root_ids);
 	}
 
 	if (amd_gart_present())
diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
index 011f2f1ea5bb..b3a0ec29dbd6 100644
--- a/include/linux/pci_ids.h
+++ b/include/linux/pci_ids.h
@@ -557,6 +557,7 @@
 #define PCI_DEVICE_ID_AMD_19H_DF_F3	0x1653
 #define PCI_DEVICE_ID_AMD_19H_M40H_DF_F3 0x167c
 #define PCI_DEVICE_ID_AMD_19H_M50H_DF_F3 0x166d
+#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F3 0x14d3
 #define PCI_DEVICE_ID_AMD_CNB17H_F3	0x1703
 #define PCI_DEVICE_ID_AMD_LANCE		0x2000
 #define PCI_DEVICE_ID_AMD_LANCE_HOME	0x2001

From patchwork Thu Oct 28 13:01:03 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Naveen Krishna Chatradhi
X-Patchwork-Id: 12590057
From: Naveen Krishna Chatradhi
CC: Naveen Krishna Chatradhi, Muralidhara M K
Subject: [PATCH v6 2/5] EDAC/mce_amd: Extract node id from MCA_IPID
Date: Thu, 28 Oct 2021 18:31:03 +0530
Message-ID: <20211028130106.15701-3-nchatrad@amd.com>
In-Reply-To: <20211028130106.15701-1-nchatrad@amd.com>
References: <20211028130106.15701-1-nchatrad@amd.com>
X-Mailing-List: linux-edac@vger.kernel.org

On SMCA banks of the GPU nodes, the node id information is
available in register MCA_IPID[47:44](InstanceIdHi).

Convert the hardware node ID to a value used by Linux
where GPU nodes are sequentially after the CPU nodes.

Co-developed-by: Muralidhara M K
Signed-off-by: Muralidhara M K
Signed-off-by: Naveen Krishna Chatradhi
Reviewed-by: Yazen Ghannam
Link: https://lkml.kernel.org/r/20210823185437.94417-3-nchatrad@amd.com
---
Changes since v5: None

Changes since v4:
Add reviewed by Yazen

Changes since v3:
1. Use APIs from amd_nb to identify the gpu_node_start_id and cpu_node_count.
   Which is required to map the hardware node id to node id enumerated by Linux.

Changes since v2:
1. Modified subject and commit message
2. Added Reviewed by Yazen Ghannam

Changes since v1:
1. Modified the commit message
2. rearranged the conditions before calling decode_dram_ecc()

 drivers/edac/mce_amd.c | 24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
index 67dbf4c31271..af6caa76adc7 100644
--- a/drivers/edac/mce_amd.c
+++ b/drivers/edac/mce_amd.c
@@ -2,6 +2,7 @@
 #include
 #include
 
+#include
 #include
 
 #include "mce_amd.h"
@@ -1072,8 +1073,27 @@ static void decode_smca_error(struct mce *m)
 	if (xec < smca_mce_descs[bank_type].num_descs)
 		pr_cont(", %s.\n", smca_mce_descs[bank_type].descs[xec]);
 
-	if (bank_type == SMCA_UMC && xec == 0 && decode_dram_ecc)
-		decode_dram_ecc(topology_die_id(m->extcpu), m);
+	if (xec == 0 && decode_dram_ecc) {
+		int node_id = 0;
+
+		if (bank_type == SMCA_UMC) {
+			node_id = topology_die_id(m->extcpu);
+		} else if (bank_type == SMCA_UMC_V2) {
+			/*
+			 * SMCA_UMC_V2 exists on GPU nodes, extract the node id
+			 * from register MCA_IPID[47:44](InstanceIdHi).
+			 * The InstanceIdHi field represents the instance ID of the GPU.
+			 * Which needs to be mapped to a value used by Linux,
+			 * where GPU nodes are simply numerically after the CPU nodes.
+			 */
+			node_id = ((m->ipid >> 44) & 0xF) -
+				   amd_gpu_node_start_id() + amd_cpu_node_count();
+		} else {
+			return;
+		}
+
+		decode_dram_ecc(node_id, m);
+	}
 }
 
 static inline void amd_decode_err_code(u16 ec)

From patchwork Thu Oct 28 13:01:04 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Naveen Krishna Chatradhi
X-Patchwork-Id: 12590059
h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=bJO2gyLOaneYxbxlQb8NrBnRLBIeATD4B/NEEwusg4I=; b=gH6es2K8Iz3VdS4zc2A8QuwPNXAn6Aqsmsog5fc4jqyfdseAbTG9k0922d9+4miHo9N3eYotd8Tw1RIcwBO6VxV4eq41WKcnglQaL3EAYxKLTxRyeNVasiwc6FYSHsjsFNR86774uizLaUscH/i4lY93/eJhz+O7cEgGnjUhH17kBAi7/KWx89fOrjlbjZ5ceBdTE6CjC1VvvIG+rAkV858qzZshEORXGMRx+RL4rSbPzys0ohpyVSYbeEvQY97XnUReWNuOQzYbDoHQ+z+YYUwKU10n9NAQTuMJ5cvkuJ56XolX1nHWiS0D34cL1ovqFqi6Z96rHe0iRlgm677rqg== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 165.204.84.17) smtp.rcpttodomain=vger.kernel.org smtp.mailfrom=amd.com; dmarc=pass (p=quarantine sp=quarantine pct=100) action=none header.from=amd.com; dkim=none (message not signed); arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amd.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=bJO2gyLOaneYxbxlQb8NrBnRLBIeATD4B/NEEwusg4I=; b=cb+lHWtFbPh4UEvskDuCq3d79/9SdhWMaFMjea23oNslI5mB+/8aWd5futZwu373FMbfE+xZRJuV+qJTK1Fjq1BaSdptKJdsfkVko22ZeweoObapoWJBKlxu+goIXXCKG0/p+X8ITtksIezkNolKPQDEURAWe234teLk04hC+8Y= Received: from DS7PR03CA0210.namprd03.prod.outlook.com (2603:10b6:5:3b6::35) by CH0PR12MB5313.namprd12.prod.outlook.com (2603:10b6:610:d4::18) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4628.16; Thu, 28 Oct 2021 13:01:54 +0000 Received: from DM6NAM11FT029.eop-nam11.prod.protection.outlook.com (2603:10b6:5:3b6:cafe::ca) by DS7PR03CA0210.outlook.office365.com (2603:10b6:5:3b6::35) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4649.14 via Frontend Transport; Thu, 28 Oct 2021 13:01:54 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 165.204.84.17) smtp.mailfrom=amd.com; vger.kernel.org; dkim=none (message not signed) 
header.d=none;vger.kernel.org; dmarc=pass action=none header.from=amd.com; Received-SPF: Pass (protection.outlook.com: domain of amd.com designates 165.204.84.17 as permitted sender) receiver=protection.outlook.com; client-ip=165.204.84.17; helo=SATLEXMB04.amd.com; Received: from SATLEXMB04.amd.com (165.204.84.17) by DM6NAM11FT029.mail.protection.outlook.com (10.13.173.23) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.20.4649.14 via Frontend Transport; Thu, 28 Oct 2021 13:01:53 +0000 Received: from milan-ETHANOL-X.amd.com (10.180.168.240) by SATLEXMB04.amd.com (10.181.40.145) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2308.15; Thu, 28 Oct 2021 08:01:50 -0500 From: Naveen Krishna Chatradhi To: , CC: , , , , , Muralidhara M K , Naveen Krishna Chatradhi Subject: [PATCH v6 3/5] EDAC/amd64: Extend family ops functions Date: Thu, 28 Oct 2021 18:31:04 +0530 Message-ID: <20211028130106.15701-4-nchatrad@amd.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20211028130106.15701-1-nchatrad@amd.com> References: <20211028130106.15701-1-nchatrad@amd.com> MIME-Version: 1.0 X-Originating-IP: [10.180.168.240] X-ClientProxiedBy: SATLEXMB04.amd.com (10.181.40.145) To SATLEXMB04.amd.com (10.181.40.145) X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id: 092abd0c-3b5f-4eeb-7715-08d99a131d89 X-MS-TrafficTypeDiagnostic: CH0PR12MB5313: X-Microsoft-Antispam-PRVS: X-MS-Oob-TLC-OOBClassifiers: OLM:2201; X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: 
u76hDsTrMxMGSF/aB3xcCBwa4w3A4dKdqiUtqH+hfawQNkxOSROKcjWZIHF55MTJC0/rOcgvOqvhKUm43fyhA4hvzllmKNA4PZ6/ABoxFAc1X+JIPReaAT6CBCl9lrtyMkfmyYRnSkcy3J+FP1QOgasIDPUaPlo4bdW201Re+2RVO/RbynNdgC6jpah6s2oxM4RpyqWBfb4KH/KHrcOYK6Dj4Imx4RKk8U4sAMKdGZxT2KkBANNlFHUi8tbVDm9N0G0c/WZiqryDU7WXDXbiXpFquL4R3wpF/1/+sxyJPSYPKMMIaSXvF+Rrg2sk3UHiyEEl/XMgrLVtGFQMapMpz+nx4/0b9fBAQLjqYDoHpg1hcgvAAJ9hi6MbcKsrX9sKOlILasQtjQkxpqsXqsxezsX66rM1wkBEtHBlcmfDmDZEDlO30LjbURxSw8I6UPF3SqRiFCuYDbJPdN3OANn5xqN3zAJJ+Kssc/KY1HYDKv5eRd4Zn2wkHdFgq52Wide11noWIJX/ug+Gp/heBDIgVCTOCsjP/fjNYn3GASGAfsFJIFNWFQCcn9ZdgWbPuFhSXp1Jm1/d0Mm46tcxMgNZ2HF8wlyVvflYhcg7BIi+uNjh3F7ugylysXjgA0kx1hh40ba9zOi+XK2M5xoG8E8Tyg5yMdrer1k/cJLeq8AuQTDNtmBSUxAnL1bNdBeJTwbKI0WdSZvlVfwmG0d2e6QqRODy/GIjTPx8t6o7TNKBTjzg/1yTa2s+UTc6LIGyPh//N4h0ogorKTxpF0qbLZjGpcCzM3DJ1whSPXWubXw6S7e2JEwiUQlCxwIm4p9eXzr3 X-Forefront-Antispam-Report: CIP:165.204.84.17;CTRY:US;LANG:en;SCL:1;SRV:;IPV:CAL;SFV:NSPM;H:SATLEXMB04.amd.com;PTR:InfoDomainNonexistent;CAT:NONE;SFS:(4636009)(36840700001)(46966006)(26005)(356005)(81166007)(2906002)(70586007)(47076005)(16526019)(2616005)(426003)(966005)(186003)(70206006)(83380400001)(336012)(8936002)(5660300002)(8676002)(7696005)(316002)(36860700001)(36756003)(82310400003)(508600001)(30864003)(1076003)(6666004)(110136005)(54906003)(4326008)(36900700001);DIR:OUT;SFP:1101; X-OriginatorOrg: amd.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 28 Oct 2021 13:01:53.9190 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 092abd0c-3b5f-4eeb-7715-08d99a131d89 X-MS-Exchange-CrossTenant-Id: 3dd8961f-e488-4e60-8e11-a82d994e183d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=3dd8961f-e488-4e60-8e11-a82d994e183d;Ip=[165.204.84.17];Helo=[SATLEXMB04.amd.com] X-MS-Exchange-CrossTenant-AuthSource: DM6NAM11FT029.eop-nam11.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: CH0PR12MB5313 
Precedence: bulk List-ID: X-Mailing-List: linux-edac@vger.kernel.org From: Muralidhara M K Create new family operation routines and define them respectively. This would simplify adding support for future platforms. Signed-off-by: Muralidhara M K Signed-off-by: Naveen Krishna Chatradhi Link: https://lkml.kernel.org/r/20211014185400.10451-4-nchatrad@amd.com --- Changes since v5: split read_mc_regs for per family ops Adjusted and Called dump_misc_regs for family ops Changes since v4: 1. Modified k8_prep_chip_selects for ext_model checks 2. Add read_dct_base_mask to ops 3. Renamed find_umc_channel and addressed minor comments Changes since v3: 1. Defined new family operation routines Changes since v2: 1. new patch drivers/edac/amd64_edac.c | 302 +++++++++++++++++++++++--------------- drivers/edac/amd64_edac.h | 10 +- 2 files changed, 188 insertions(+), 124 deletions(-) diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c index 4fce75013674..1029fe84ba2e 100644 --- a/drivers/edac/amd64_edac.c +++ b/drivers/edac/amd64_edac.c @@ -1204,10 +1204,7 @@ static void __dump_misc_regs(struct amd64_pvt *pvt) /* Display and decode various NB registers for debug purposes. */ static void dump_misc_regs(struct amd64_pvt *pvt) { - if (pvt->umc) - __dump_misc_regs_df(pvt); - else - __dump_misc_regs(pvt); + pvt->ops->get_misc_regs(pvt); edac_dbg(1, " DramHoleValid: %s\n", dhar_valid(pvt) ? 
"yes" : "no"); @@ -1217,28 +1214,39 @@ static void dump_misc_regs(struct amd64_pvt *pvt) /* * See BKDG, F2x[1,0][5C:40], F2[1,0][6C:60] */ -static void prep_chip_selects(struct amd64_pvt *pvt) +static void k8_prep_chip_selects(struct amd64_pvt *pvt) { if (pvt->fam == 0xf && pvt->ext_model < K8_REV_F) { pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 8; pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 8; - } else if (pvt->fam == 0x15 && pvt->model == 0x30) { - pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 4; - pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 2; - } else if (pvt->fam >= 0x17) { - int umc; - - for_each_umc(umc) { - pvt->csels[umc].b_cnt = 4; - pvt->csels[umc].m_cnt = 2; - } - - } else { + } else if (pvt->fam == 0xf && pvt->ext_model >= K8_REV_F) { pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 8; pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 4; } } +static void f15m30_prep_chip_selects(struct amd64_pvt *pvt) +{ + pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 4; + pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 2; +} + +static void default_prep_chip_selects(struct amd64_pvt *pvt) +{ + pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 8; + pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 4; +} + +static void f17_prep_chip_selects(struct amd64_pvt *pvt) +{ + int umc; + + for_each_umc(umc) { + pvt->csels[umc].b_cnt = 4; + pvt->csels[umc].m_cnt = 2; + } +} + static void read_umc_base_mask(struct amd64_pvt *pvt) { u32 umc_base_reg, umc_base_reg_sec; @@ -1297,11 +1305,6 @@ static void read_dct_base_mask(struct amd64_pvt *pvt) { int cs; - prep_chip_selects(pvt); - - if (pvt->umc) - return read_umc_base_mask(pvt); - for_each_chip_select(cs, 0, pvt) { int reg0 = DCSB0 + (cs * 4); int reg1 = DCSB1 + (cs * 4); @@ -2512,143 +2515,181 @@ static void debug_display_dimm_sizes(struct amd64_pvt *pvt, u8 ctrl) } } +/* Prototypes for family specific ops routines */ +static int init_csrows(struct mem_ctl_info *mci); +static int init_csrows_df(struct mem_ctl_info *mci); +static void read_mc_regs(struct 
amd64_pvt *pvt); +static void __read_mc_regs_df(struct amd64_pvt *pvt); +static void update_umc_err_info(struct mce *m, struct err_info *err); + +static const struct low_ops k8_ops = { + .early_channel_count = k8_early_channel_count, + .map_sysaddr_to_csrow = k8_map_sysaddr_to_csrow, + .dbam_to_cs = k8_dbam_to_chip_select, + .prep_chip_select = k8_prep_chip_selects, + .get_base_mask = read_dct_base_mask, + .get_misc_regs = __dump_misc_regs, + .get_mc_regs = read_mc_regs, + .populate_csrows = init_csrows, +}; + +static const struct low_ops f10_ops = { + .early_channel_count = f1x_early_channel_count, + .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow, + .dbam_to_cs = f10_dbam_to_chip_select, + .prep_chip_select = default_prep_chip_selects, + .get_base_mask = read_dct_base_mask, + .get_misc_regs = __dump_misc_regs, + .get_mc_regs = read_mc_regs, + .populate_csrows = init_csrows, +}; + +static const struct low_ops f15_ops = { + .early_channel_count = f1x_early_channel_count, + .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow, + .dbam_to_cs = f15_dbam_to_chip_select, + .prep_chip_select = default_prep_chip_selects, + .get_base_mask = read_dct_base_mask, + .get_misc_regs = __dump_misc_regs, + .get_mc_regs = read_mc_regs, + .populate_csrows = init_csrows, +}; + +static const struct low_ops f15m30_ops = { + .early_channel_count = f1x_early_channel_count, + .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow, + .dbam_to_cs = f16_dbam_to_chip_select, + .prep_chip_select = f15m30_prep_chip_selects, + .get_base_mask = read_dct_base_mask, + .get_misc_regs = __dump_misc_regs, + .get_mc_regs = read_mc_regs, + .populate_csrows = init_csrows, +}; + +static const struct low_ops f15m60_ops = { + .early_channel_count = f1x_early_channel_count, + .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow, + .dbam_to_cs = f15_m60h_dbam_to_chip_select, + .prep_chip_select = default_prep_chip_selects, + .get_base_mask = read_dct_base_mask, + .get_misc_regs = __dump_misc_regs, + .get_mc_regs = 
read_mc_regs, + .populate_csrows = init_csrows, +}; + +static const struct low_ops f16_ops = { + .early_channel_count = f1x_early_channel_count, + .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow, + .dbam_to_cs = f16_dbam_to_chip_select, + .prep_chip_select = default_prep_chip_selects, + .get_base_mask = read_dct_base_mask, + .get_misc_regs = __dump_misc_regs, + .get_mc_regs = read_mc_regs, + .populate_csrows = init_csrows, +}; + +static const struct low_ops f17_ops = { + .early_channel_count = f17_early_channel_count, + .dbam_to_cs = f17_addr_mask_to_cs_size, + .prep_chip_select = f17_prep_chip_selects, + .get_base_mask = read_umc_base_mask, + .get_misc_regs = __dump_misc_regs_df, + .get_mc_regs = __read_mc_regs_df, + .populate_csrows = init_csrows_df, + .get_umc_err_info = update_umc_err_info, +}; + static struct amd64_family_type family_types[] = { [K8_CPUS] = { .ctl_name = "K8", .f1_id = PCI_DEVICE_ID_AMD_K8_NB_ADDRMAP, .f2_id = PCI_DEVICE_ID_AMD_K8_NB_MEMCTL, .max_mcs = 2, - .ops = { - .early_channel_count = k8_early_channel_count, - .map_sysaddr_to_csrow = k8_map_sysaddr_to_csrow, - .dbam_to_cs = k8_dbam_to_chip_select, - } + .ops = k8_ops, }, [F10_CPUS] = { .ctl_name = "F10h", .f1_id = PCI_DEVICE_ID_AMD_10H_NB_MAP, .f2_id = PCI_DEVICE_ID_AMD_10H_NB_DRAM, .max_mcs = 2, - .ops = { - .early_channel_count = f1x_early_channel_count, - .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow, - .dbam_to_cs = f10_dbam_to_chip_select, - } + .ops = f10_ops, }, [F15_CPUS] = { .ctl_name = "F15h", .f1_id = PCI_DEVICE_ID_AMD_15H_NB_F1, .f2_id = PCI_DEVICE_ID_AMD_15H_NB_F2, .max_mcs = 2, - .ops = { - .early_channel_count = f1x_early_channel_count, - .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow, - .dbam_to_cs = f15_dbam_to_chip_select, - } + .ops = f15_ops, }, [F15_M30H_CPUS] = { .ctl_name = "F15h_M30h", .f1_id = PCI_DEVICE_ID_AMD_15H_M30H_NB_F1, .f2_id = PCI_DEVICE_ID_AMD_15H_M30H_NB_F2, .max_mcs = 2, - .ops = { - .early_channel_count = f1x_early_channel_count, - 
.map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow, - .dbam_to_cs = f16_dbam_to_chip_select, - } + .ops = f15m30_ops, }, [F15_M60H_CPUS] = { .ctl_name = "F15h_M60h", .f1_id = PCI_DEVICE_ID_AMD_15H_M60H_NB_F1, .f2_id = PCI_DEVICE_ID_AMD_15H_M60H_NB_F2, .max_mcs = 2, - .ops = { - .early_channel_count = f1x_early_channel_count, - .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow, - .dbam_to_cs = f15_m60h_dbam_to_chip_select, - } + .ops = f15m60_ops, }, [F16_CPUS] = { .ctl_name = "F16h", .f1_id = PCI_DEVICE_ID_AMD_16H_NB_F1, .f2_id = PCI_DEVICE_ID_AMD_16H_NB_F2, .max_mcs = 2, - .ops = { - .early_channel_count = f1x_early_channel_count, - .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow, - .dbam_to_cs = f16_dbam_to_chip_select, - } + .ops = f16_ops, }, [F16_M30H_CPUS] = { .ctl_name = "F16h_M30h", .f1_id = PCI_DEVICE_ID_AMD_16H_M30H_NB_F1, .f2_id = PCI_DEVICE_ID_AMD_16H_M30H_NB_F2, .max_mcs = 2, - .ops = { - .early_channel_count = f1x_early_channel_count, - .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow, - .dbam_to_cs = f16_dbam_to_chip_select, - } + .ops = f16_ops, }, [F17_CPUS] = { .ctl_name = "F17h", .f0_id = PCI_DEVICE_ID_AMD_17H_DF_F0, .f6_id = PCI_DEVICE_ID_AMD_17H_DF_F6, .max_mcs = 2, - .ops = { - .early_channel_count = f17_early_channel_count, - .dbam_to_cs = f17_addr_mask_to_cs_size, - } + .ops = f17_ops, }, [F17_M10H_CPUS] = { .ctl_name = "F17h_M10h", .f0_id = PCI_DEVICE_ID_AMD_17H_M10H_DF_F0, .f6_id = PCI_DEVICE_ID_AMD_17H_M10H_DF_F6, .max_mcs = 2, - .ops = { - .early_channel_count = f17_early_channel_count, - .dbam_to_cs = f17_addr_mask_to_cs_size, - } + .ops = f17_ops, }, [F17_M30H_CPUS] = { .ctl_name = "F17h_M30h", .f0_id = PCI_DEVICE_ID_AMD_17H_M30H_DF_F0, .f6_id = PCI_DEVICE_ID_AMD_17H_M30H_DF_F6, .max_mcs = 8, - .ops = { - .early_channel_count = f17_early_channel_count, - .dbam_to_cs = f17_addr_mask_to_cs_size, - } + .ops = f17_ops, }, [F17_M60H_CPUS] = { .ctl_name = "F17h_M60h", .f0_id = PCI_DEVICE_ID_AMD_17H_M60H_DF_F0, .f6_id = 
PCI_DEVICE_ID_AMD_17H_M60H_DF_F6, .max_mcs = 2, - .ops = { - .early_channel_count = f17_early_channel_count, - .dbam_to_cs = f17_addr_mask_to_cs_size, - } + .ops = f17_ops, }, [F17_M70H_CPUS] = { .ctl_name = "F17h_M70h", .f0_id = PCI_DEVICE_ID_AMD_17H_M70H_DF_F0, .f6_id = PCI_DEVICE_ID_AMD_17H_M70H_DF_F6, .max_mcs = 2, - .ops = { - .early_channel_count = f17_early_channel_count, - .dbam_to_cs = f17_addr_mask_to_cs_size, - } + .ops = f17_ops, }, [F19_CPUS] = { .ctl_name = "F19h", .f0_id = PCI_DEVICE_ID_AMD_19H_DF_F0, .f6_id = PCI_DEVICE_ID_AMD_19H_DF_F6, .max_mcs = 8, - .ops = { - .early_channel_count = f17_early_channel_count, - .dbam_to_cs = f17_addr_mask_to_cs_size, - } + .ops = f17_ops, }, }; @@ -2899,10 +2940,13 @@ static inline void decode_bus_error(int node_id, struct mce *m) * Currently, we can derive the channel number by looking at the 6th nibble in * the instance_id. For example, instance_id=0xYXXXXX where Y is the channel * number. + * + * csrow can be derived from the lower 3 bits of MCA_SYND value. 
*/ -static int find_umc_channel(struct mce *m) +static void update_umc_err_info(struct mce *m, struct err_info *err) { - return (m->ipid & GENMASK(31, 0)) >> 20; + err->channel = (m->ipid & GENMASK(31, 0)) >> 20; + err->csrow = m->synd & 0x7; } static void decode_umc_error(int node_id, struct mce *m) @@ -2924,8 +2968,6 @@ static void decode_umc_error(int node_id, struct mce *m) if (m->status & MCI_STATUS_DEFERRED) ecc_type = 3; - err.channel = find_umc_channel(m); - if (!(m->status & MCI_STATUS_SYNDV)) { err.err_code = ERR_SYND; goto log_error; @@ -2940,7 +2982,7 @@ static void decode_umc_error(int node_id, struct mce *m) err.err_code = ERR_CHANNEL; } - err.csrow = m->synd & 0x7; + pvt->ops->get_umc_err_info(m, &err); if (umc_normaddr_to_sysaddr(m->addr, pvt->mc_node_id, err.channel, &sys_addr)) { err.err_code = ERR_NORM_ADDR; @@ -3058,6 +3100,27 @@ static void determine_ecc_sym_sz(struct amd64_pvt *pvt) } } +static void read_top_mem_registers(struct amd64_pvt *pvt) +{ + u64 msr_val; + + /* + * Retrieve TOP_MEM and TOP_MEM2; no masking off of reserved bits since + * those are Read-As-Zero. + */ + rdmsrl(MSR_K8_TOP_MEM1, pvt->top_mem); + edac_dbg(0, " TOP_MEM: 0x%016llx\n", pvt->top_mem); + + /* Check first whether TOP_MEM2 is enabled: */ + rdmsrl(MSR_AMD64_SYSCFG, msr_val); + if (msr_val & BIT(21)) { + rdmsrl(MSR_K8_TOP_MEM2, pvt->top_mem2); + edac_dbg(0, " TOP_MEM2: 0x%016llx\n", pvt->top_mem2); + } else { + edac_dbg(0, " TOP_MEM2 disabled\n"); + } +} + /* * Retrieve the hardware registers of the memory controller. 
*/ @@ -3067,6 +3130,8 @@ static void __read_mc_regs_df(struct amd64_pvt *pvt) struct amd64_umc *umc; u32 i, umc_base; + read_top_mem_registers(pvt); + /* Read registers from each UMC */ for_each_umc(i) { @@ -3079,6 +3144,8 @@ static void __read_mc_regs_df(struct amd64_pvt *pvt) amd_smn_read(nid, umc_base + UMCCH_ECC_CTRL, &umc->ecc_ctrl); amd_smn_read(nid, umc_base + UMCCH_UMC_CAP_HI, &umc->umc_cap_hi); } + + amd64_read_pci_cfg(pvt->F0, DF_DHAR, &pvt->dhar); } /* @@ -3088,30 +3155,8 @@ static void __read_mc_regs_df(struct amd64_pvt *pvt) static void read_mc_regs(struct amd64_pvt *pvt) { unsigned int range; - u64 msr_val; - - /* - * Retrieve TOP_MEM and TOP_MEM2; no masking off of reserved bits since - * those are Read-As-Zero. - */ - rdmsrl(MSR_K8_TOP_MEM1, pvt->top_mem); - edac_dbg(0, " TOP_MEM: 0x%016llx\n", pvt->top_mem); - - /* Check first whether TOP_MEM2 is enabled: */ - rdmsrl(MSR_AMD64_SYSCFG, msr_val); - if (msr_val & BIT(21)) { - rdmsrl(MSR_K8_TOP_MEM2, pvt->top_mem2); - edac_dbg(0, " TOP_MEM2: 0x%016llx\n", pvt->top_mem2); - } else { - edac_dbg(0, " TOP_MEM2 disabled\n"); - } - if (pvt->umc) { - __read_mc_regs_df(pvt); - amd64_read_pci_cfg(pvt->F0, DF_DHAR, &pvt->dhar); - - goto skip; - } + read_top_mem_registers(pvt); amd64_read_pci_cfg(pvt->F3, NBCAP, &pvt->nbcap); @@ -3152,14 +3197,6 @@ static void read_mc_regs(struct amd64_pvt *pvt) amd64_read_dct_pci_cfg(pvt, 1, DCLR0, &pvt->dclr1); amd64_read_dct_pci_cfg(pvt, 1, DCHR0, &pvt->dchr1); } - -skip: - read_dct_base_mask(pvt); - - determine_memory_type(pvt); - edac_dbg(1, " DIMM type: %s\n", edac_mem_types[pvt->dram_type]); - - determine_ecc_sym_sz(pvt); } /* @@ -3277,9 +3314,6 @@ static int init_csrows(struct mem_ctl_info *mci) int nr_pages = 0; u32 val; - if (pvt->umc) - return init_csrows_df(mci); - amd64_read_pci_cfg(pvt->F3, NBCFG, &val); pvt->nbcfg = val; @@ -3703,6 +3737,21 @@ static struct amd64_family_type *per_family_init(struct amd64_pvt *pvt) return NULL; } + /* ops required for all the 
families */ + if (!pvt->ops->early_channel_count || !pvt->ops->dbam_to_cs || + !pvt->ops->prep_chip_select || !pvt->ops->get_base_mask || + !pvt->ops->get_misc_regs || !pvt->ops->get_mc_regs || + !pvt->ops->populate_csrows) { + edac_dbg(1, "Common helper routines not defined.\n"); + return NULL; + } + + /* ops required for families 17h and later */ + if (pvt->fam >= 0x17 && !pvt->ops->get_umc_err_info) { + edac_dbg(1, "Platform specific helper routines not defined.\n"); + return NULL; + } + return fam_type; } @@ -3735,7 +3784,16 @@ static int hw_info_get(struct amd64_pvt *pvt) if (ret) return ret; - read_mc_regs(pvt); + pvt->ops->get_mc_regs(pvt); + + pvt->ops->prep_chip_select(pvt); + + pvt->ops->get_base_mask(pvt); + + determine_memory_type(pvt); + edac_dbg(1, " DIMM type: %s\n", edac_mem_types[pvt->dram_type]); + + determine_ecc_sym_sz(pvt); return 0; } @@ -3786,7 +3844,7 @@ static int init_one_instance(struct amd64_pvt *pvt) setup_mci_misc_attrs(mci); - if (init_csrows(mci)) + if (pvt->ops->populate_csrows(mci)) mci->edac_cap = EDAC_FLAG_NONE; ret = -ENODEV; diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h index 85aa820bc165..881ff6322bc9 100644 --- a/drivers/edac/amd64_edac.h +++ b/drivers/edac/amd64_edac.h @@ -467,11 +467,17 @@ struct ecc_settings { * functions and per device encoding/decoding logic. 
*/ struct low_ops { - int (*early_channel_count) (struct amd64_pvt *pvt); + int (*early_channel_count) (struct amd64_pvt *pvt); void (*map_sysaddr_to_csrow) (struct mem_ctl_info *mci, u64 sys_addr, struct err_info *); - int (*dbam_to_cs) (struct amd64_pvt *pvt, u8 dct, + int (*dbam_to_cs) (struct amd64_pvt *pvt, u8 dct, unsigned cs_mode, int cs_mask_nr); + void (*prep_chip_select) (struct amd64_pvt *pvt); + void (*get_base_mask) (struct amd64_pvt *pvt); + void (*get_misc_regs) (struct amd64_pvt *pvt); + void (*get_mc_regs) (struct amd64_pvt *pvt); + int (*populate_csrows) (struct mem_ctl_info *mci); + void (*get_umc_err_info) (struct mce *m, struct err_info *err); }; struct amd64_family_type { From patchwork Thu Oct 28 13:01:05 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naveen Krishna Chatradhi X-Patchwork-Id: 12590061 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 55AB0C433EF for ; Thu, 28 Oct 2021 13:02:07 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 3DDB360E78 for ; Thu, 28 Oct 2021 13:02:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230409AbhJ1NE3 (ORCPT ); Thu, 28 Oct 2021 09:04:29 -0400 Received: from mail-sn1anam02on2048.outbound.protection.outlook.com ([40.107.96.48]:62329 "EHLO NAM02-SN1-obe.outbound.protection.outlook.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S230367AbhJ1NE2 (ORCPT ); Thu, 28 Oct 2021 09:04:28 -0400 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; 
b=UvFuAt7KajztsTuY4h2GGWpThDqFQLB4IuEyHbdm7Xp08UjnaGs0zxD6NsDcOr2VhRK3MYEXDkjctT7PCPSNDNUObwEypwvdsgpYKLwtRdy21uYlvsXMi42buVnzel0dGCsXmxCw6t3l+7hiSYE43mgDq6dcYSp1fB9FtXHv9joGOfFRFkjKmxpOeS6zMTKb2LnnQ7f7Lp/o6Z1u/pUlwOr4i1lzNWN0cKvGU2pAGlz1BySR4dzdxbLF+mmp9APisaXcK6Y8LpZi3Z87WD4cJ1HrGQ4n0UA6wEz5X4h1UA+9QfogxeNsOzEDCrbkznn0FubHlZMMPkUd1igcpsvilQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=2t4AcguaiLZlimooZ9MM2RsSUWJ444qGR9PtGTbvK/U=; b=mvYwrOGOF5k+c4ACpp5+pLsToO3UkaPxWIJc8C7wy/X2uLkUBj8nQm51QgVetyfE0BJt+0ekuJBJAPflbfUwy2TsxnIoECADkElP9a5E5hIOlmNxX5ezyxZy921FiYKUTJuZWnL0KV7jnVtqZ01NutbO491wnep+DRWF/IlOjUV1+Yol0isfMVY3LbDFs0tyUeCJFFdHkkBl26gs+kMFb8e7XBY4APj4or0J+r4wU9n6xW05nvmV8cj4o1CeG6OiucePB5h5ETEq9SQfstpyIo9GydSF8OKD62sDhAvVaK+lKQyQSTXCQqwKmF6t3x86GhbJOaH2DCxIZWpwYWp5fw== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 165.204.84.17) smtp.rcpttodomain=vger.kernel.org smtp.mailfrom=amd.com; dmarc=pass (p=quarantine sp=quarantine pct=100) action=none header.from=amd.com; dkim=none (message not signed); arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amd.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=2t4AcguaiLZlimooZ9MM2RsSUWJ444qGR9PtGTbvK/U=; b=uc/GOhpJI4xbV++3NKSHSKupKIpgbjY91pM1qiLmRvBUcD10nyxmTaJ6dwl7T7vclwovVqh6OXy8imb3+iW0bwDarxgBgvFABw2zgt1yqN20UiJHRKqFKZA5tofmwM+DS5i2QVBS/Aekocg9OGsSvwrO4AqpjWiTiaOMXs1SDuY= Received: from DM5PR06CA0085.namprd06.prod.outlook.com (2603:10b6:3:4::23) by DM6PR12MB4530.namprd12.prod.outlook.com (2603:10b6:5:2aa::19) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4649.14; Thu, 28 Oct 2021 13:01:59 +0000 Received: from 
DM6NAM11FT014.eop-nam11.prod.protection.outlook.com (2603:10b6:3:4:cafe::a7) by DM5PR06CA0085.outlook.office365.com (2603:10b6:3:4::23) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4649.15 via Frontend Transport; Thu, 28 Oct 2021 13:01:59 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 165.204.84.17) smtp.mailfrom=amd.com; vger.kernel.org; dkim=none (message not signed) header.d=none;vger.kernel.org; dmarc=pass action=none header.from=amd.com; Received-SPF: Pass (protection.outlook.com: domain of amd.com designates 165.204.84.17 as permitted sender) receiver=protection.outlook.com; client-ip=165.204.84.17; helo=SATLEXMB04.amd.com; Received: from SATLEXMB04.amd.com (165.204.84.17) by DM6NAM11FT014.mail.protection.outlook.com (10.13.173.132) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.20.4649.14 via Frontend Transport; Thu, 28 Oct 2021 13:01:58 +0000 Received: from milan-ETHANOL-X.amd.com (10.180.168.240) by SATLEXMB04.amd.com (10.181.40.145) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2308.15; Thu, 28 Oct 2021 08:01:55 -0500 From: Naveen Krishna Chatradhi To: , CC: , , , , , Muralidhara M K , Naveen Krishna Chatradhi Subject: [PATCH v6 4/5] EDAC/amd64: Move struct fam_type into amd64_pvt structure Date: Thu, 28 Oct 2021 18:31:05 +0530 Message-ID: <20211028130106.15701-5-nchatrad@amd.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20211028130106.15701-1-nchatrad@amd.com> References: <20211028130106.15701-1-nchatrad@amd.com> MIME-Version: 1.0 X-Originating-IP: [10.180.168.240] X-ClientProxiedBy: SATLEXMB04.amd.com (10.181.40.145) To SATLEXMB04.amd.com (10.181.40.145) X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id: a2777e88-f8c5-43f7-1526-08d99a132068 X-MS-TrafficTypeDiagnostic: DM6PR12MB4530: X-Microsoft-Antispam-PRVS: 
X-MS-Oob-TLC-OOBClassifiers: OLM:580; X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: qiWoOMQlUzpd4tChVe1N7+dka3vO5dAxcL4H4VObC2gtrl2VHljnVSndHc6l/kGkMRvI9C70GO8s/kfduCZrINV1OhbhmzJiHx3W/ch7uoL02r+egBh8V+PLfbXpU55vrfPHSitMu7nWmziCoaCpvT10RB5or6JHR8N4cO0HIh18ejSKQDVwMQywvVVvO1GphRWgW+7uV9aOK3jEhcrln5dUKDeu6ePKPWqagaldVKUnpVKkf0KjpXAAX98iujPVRFBbiH6+sHDbgxNX+lMjih7wBLlYYjtO7PVwoI8xr6TpOUABlbquEnNu39s0InbfNmoTR2dgpowEoLXXv/d/THtbiJPtZR2OnfIXPz35pwlJ4KqL+MmaDjQVXjoPNrH67uMRg10bmWYo4EG1UBiY3AMKjBEYUh+BGBulvlqfpncK+LrcNoaACoNGUU1+l0f7lLDji/h2ZQxGSP60jmpI2AhE1gcWKGe7j7xpT9L8OAFd8IUTHD2drLrAYBdpQITWXoYR98d2FjFQHIA3A9nRrEqs43MEw/zYuidsyohv9tZvkb5lXiEfLGhzosf1kXxTVzLHIdEdOCooaGRs+fbVaUxFBLwpD5HCWEdsJKuXftCxTjem5BRPhfsTQZRbPFrAHrNJUVrC7FYnqBDupxAZbicSJrS3pdngejxWLuW11tmZjOvaf3xkbnH2aMRgls7KmutawddsA5PEkWHo1zFFKNLPuOBtU8VBF12nHgQFS4uJW4Xkj6V+EVm4TfgOhd1UmWmGFA7Pcp84kgyaW8KhDdpC5Oe0m98vE397I/UWnf8ArQH/nVJ5OXg5taxVmCsr X-Forefront-Antispam-Report: CIP:165.204.84.17;CTRY:US;LANG:en;SCL:1;SRV:;IPV:CAL;SFV:NSPM;H:SATLEXMB04.amd.com;PTR:InfoDomainNonexistent;CAT:NONE;SFS:(4636009)(46966006)(36840700001)(70586007)(186003)(16526019)(26005)(36860700001)(336012)(110136005)(70206006)(36756003)(54906003)(2616005)(47076005)(8936002)(83380400001)(966005)(1076003)(426003)(5660300002)(508600001)(356005)(8676002)(81166007)(4326008)(82310400003)(6666004)(2906002)(316002)(7696005)(36900700001);DIR:OUT;SFP:1101; X-OriginatorOrg: amd.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 28 Oct 2021 13:01:58.7384 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: a2777e88-f8c5-43f7-1526-08d99a132068 X-MS-Exchange-CrossTenant-Id: 3dd8961f-e488-4e60-8e11-a82d994e183d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=3dd8961f-e488-4e60-8e11-a82d994e183d;Ip=[165.204.84.17];Helo=[SATLEXMB04.amd.com] X-MS-Exchange-CrossTenant-AuthSource: DM6NAM11FT014.eop-nam11.prod.protection.outlook.com 
X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: DM6PR12MB4530 Precedence: bulk List-ID: X-Mailing-List: linux-edac@vger.kernel.org From: Muralidhara M K On heterogeneous systems, the GPU nodes are probed after the CPU nodes and would overwrite the family type set by the CPU nodes. Move the fam_type pointer into struct amd64_pvt instead of keeping it as a global variable. Signed-off-by: Muralidhara M K Signed-off-by: Naveen Krishna Chatradhi Reviewed-by: Yazen Ghannam Link: https://lkml.kernel.org/r/20211025145018.29985-5-nchatrad@amd.com --- Changes since v5: Added reviewed by Yazen Changes since v4: New patch, created based on a comment. drivers/edac/amd64_edac.c | 58 +++++++++++++++++++-------------------- drivers/edac/amd64_edac.h | 2 ++ 2 files changed, 30 insertions(+), 30 deletions(-) diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c index 1029fe84ba2e..7953ffe9d547 100644 --- a/drivers/edac/amd64_edac.c +++ b/drivers/edac/amd64_edac.c @@ -13,8 +13,6 @@ module_param(ecc_enable_override, int, 0644); static struct msr __percpu *msrs; -static struct amd64_family_type *fam_type; - /* Per-node stuff */ static struct ecc_settings **ecc_stngs; @@ -448,7 +446,7 @@ static void get_cs_base_and_mask(struct amd64_pvt *pvt, int csrow, u8 dct, for (i = 0; i < pvt->csels[dct].m_cnt; i++) #define for_each_umc(i) \ - for (i = 0; i < fam_type->max_mcs; i++) + for (i = 0; i < pvt->fam_type->max_mcs; i++) /* * @input_addr is an InputAddr associated with the node given by mci. 
Return the @@ -3635,7 +3633,7 @@ static void setup_mci_misc_attrs(struct mem_ctl_info *mci) mci->edac_cap = determine_edac_cap(pvt); mci->mod_name = EDAC_MOD_STR; - mci->ctl_name = fam_type->ctl_name; + mci->ctl_name = pvt->fam_type->ctl_name; mci->dev_name = pci_name(pvt->F3); mci->ctl_page_to_phys = NULL; @@ -3656,64 +3654,64 @@ static struct amd64_family_type *per_family_init(struct amd64_pvt *pvt) switch (pvt->fam) { case 0xf: - fam_type = &family_types[K8_CPUS]; + pvt->fam_type = &family_types[K8_CPUS]; pvt->ops = &family_types[K8_CPUS].ops; break; case 0x10: - fam_type = &family_types[F10_CPUS]; + pvt->fam_type = &family_types[F10_CPUS]; pvt->ops = &family_types[F10_CPUS].ops; break; case 0x15: if (pvt->model == 0x30) { - fam_type = &family_types[F15_M30H_CPUS]; + pvt->fam_type = &family_types[F15_M30H_CPUS]; pvt->ops = &family_types[F15_M30H_CPUS].ops; break; } else if (pvt->model == 0x60) { - fam_type = &family_types[F15_M60H_CPUS]; + pvt->fam_type = &family_types[F15_M60H_CPUS]; pvt->ops = &family_types[F15_M60H_CPUS].ops; break; /* Richland is only client */ } else if (pvt->model == 0x13) { return NULL; } else { - fam_type = &family_types[F15_CPUS]; + pvt->fam_type = &family_types[F15_CPUS]; pvt->ops = &family_types[F15_CPUS].ops; } break; case 0x16: if (pvt->model == 0x30) { - fam_type = &family_types[F16_M30H_CPUS]; + pvt->fam_type = &family_types[F16_M30H_CPUS]; pvt->ops = &family_types[F16_M30H_CPUS].ops; break; } - fam_type = &family_types[F16_CPUS]; + pvt->fam_type = &family_types[F16_CPUS]; pvt->ops = &family_types[F16_CPUS].ops; break; case 0x17: if (pvt->model >= 0x10 && pvt->model <= 0x2f) { - fam_type = &family_types[F17_M10H_CPUS]; + pvt->fam_type = &family_types[F17_M10H_CPUS]; pvt->ops = &family_types[F17_M10H_CPUS].ops; break; } else if (pvt->model >= 0x30 && pvt->model <= 0x3f) { - fam_type = &family_types[F17_M30H_CPUS]; + pvt->fam_type = &family_types[F17_M30H_CPUS]; pvt->ops = &family_types[F17_M30H_CPUS].ops; break; } else if 
(pvt->model >= 0x60 && pvt->model <= 0x6f) { - fam_type = &family_types[F17_M60H_CPUS]; + pvt->fam_type = &family_types[F17_M60H_CPUS]; pvt->ops = &family_types[F17_M60H_CPUS].ops; break; } else if (pvt->model >= 0x70 && pvt->model <= 0x7f) { - fam_type = &family_types[F17_M70H_CPUS]; + pvt->fam_type = &family_types[F17_M70H_CPUS]; pvt->ops = &family_types[F17_M70H_CPUS].ops; break; } fallthrough; case 0x18: - fam_type = &family_types[F17_CPUS]; + pvt->fam_type = &family_types[F17_CPUS]; pvt->ops = &family_types[F17_CPUS].ops; if (pvt->fam == 0x18) @@ -3722,12 +3720,12 @@ static struct amd64_family_type *per_family_init(struct amd64_pvt *pvt) case 0x19: if (pvt->model >= 0x20 && pvt->model <= 0x2f) { - fam_type = &family_types[F17_M70H_CPUS]; + pvt->fam_type = &family_types[F17_M70H_CPUS]; pvt->ops = &family_types[F17_M70H_CPUS].ops; - fam_type->ctl_name = "F19h_M20h"; + pvt->fam_type->ctl_name = "F19h_M20h"; break; } - fam_type = &family_types[F19_CPUS]; + pvt->fam_type = &family_types[F19_CPUS]; pvt->ops = &family_types[F19_CPUS].ops; family_types[F19_CPUS].ctl_name = "F19h"; break; @@ -3752,7 +3750,7 @@ static struct amd64_family_type *per_family_init(struct amd64_pvt *pvt) return NULL; } - return fam_type; + return pvt->fam_type; } static const struct attribute_group *amd64_edac_attr_groups[] = { @@ -3769,15 +3767,15 @@ static int hw_info_get(struct amd64_pvt *pvt) int ret; if (pvt->fam >= 0x17) { - pvt->umc = kcalloc(fam_type->max_mcs, sizeof(struct amd64_umc), GFP_KERNEL); + pvt->umc = kcalloc(pvt->fam_type->max_mcs, sizeof(struct amd64_umc), GFP_KERNEL); if (!pvt->umc) return -ENOMEM; - pci_id1 = fam_type->f0_id; - pci_id2 = fam_type->f6_id; + pci_id1 = pvt->fam_type->f0_id; + pci_id2 = pvt->fam_type->f6_id; } else { - pci_id1 = fam_type->f1_id; - pci_id2 = fam_type->f2_id; + pci_id1 = pvt->fam_type->f1_id; + pci_id2 = pvt->fam_type->f2_id; } ret = reserve_mc_sibling_devs(pvt, pci_id1, pci_id2); @@ -3832,7 +3830,7 @@ static int init_one_instance(struct 
amd64_pvt *pvt)
 	 * only one channel. Also, this simplifies handling later for the price
 	 * of a couple of KBs tops.
 	 */
-	layers[1].size = fam_type->max_mcs;
+	layers[1].size = pvt->fam_type->max_mcs;
 	layers[1].is_virt_csrow = false;
 
 	mci = edac_mc_alloc(pvt->mc_node_id, ARRAY_SIZE(layers), layers, 0);
@@ -3862,7 +3860,7 @@ static bool instance_has_memory(struct amd64_pvt *pvt)
 	bool cs_enabled = false;
 	int cs = 0, dct = 0;
 
-	for (dct = 0; dct < fam_type->max_mcs; dct++) {
+	for (dct = 0; dct < pvt->fam_type->max_mcs; dct++) {
 		for_each_chip_select(cs, dct, pvt)
 			cs_enabled |= csrow_enabled(cs, dct, pvt);
 	}
@@ -3892,8 +3890,8 @@ static int probe_one_instance(unsigned int nid)
 	pvt->F3 = F3;
 
 	ret = -ENODEV;
-	fam_type = per_family_init(pvt);
-	if (!fam_type)
+	pvt->fam_type = per_family_init(pvt);
+	if (!pvt->fam_type)
 		goto err_enable;
 
 	ret = hw_info_get(pvt);
@@ -3932,7 +3930,7 @@ static int probe_one_instance(unsigned int nid)
 		goto err_enable;
 	}
 
-	amd64_info("%s %sdetected (node %d).\n", fam_type->ctl_name,
+	amd64_info("%s %sdetected (node %d).\n", pvt->fam_type->ctl_name,
 		     (pvt->fam == 0xf ?
 				(pvt->ext_model >= K8_REV_F ? "revF or later "
 							    : "revE or earlier ")
diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h
index 881ff6322bc9..82d9f64aa150 100644
--- a/drivers/edac/amd64_edac.h
+++ b/drivers/edac/amd64_edac.h
@@ -389,6 +389,8 @@ struct amd64_pvt {
 	enum mem_type dram_type;
 
 	struct amd64_umc *umc; /* UMC registers */
+
+	struct amd64_family_type *fam_type;
 };
 
 enum err_codes {

From patchwork Thu Oct 28 13:01:06 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Naveen Krishna Chatradhi
X-Patchwork-Id: 12590063
From: Naveen Krishna Chatradhi
Cc: Naveen Krishna Chatradhi, Muralidhara M K
Subject: [PATCH v6 5/5] EDAC/amd64: Enumerate memory on Aldebaran GPU nodes
Date: Thu, 28 Oct 2021 18:31:06 +0530
Message-ID: <20211028130106.15701-6-nchatrad@amd.com>
In-Reply-To: <20211028130106.15701-1-nchatrad@amd.com>
References: <20211028130106.15701-1-nchatrad@amd.com>
X-Mailing-List: linux-edac@vger.kernel.org

On newer heterogeneous systems with AMD CPUs, the data fabrics of the GPUs
are connected directly via custom links.

On one such system, where Aldebaran GPU nodes are connected to Family 19h,
Model 30h CPU nodes, the Aldebaran GPUs can report memory errors via SMCA
banks.

Aldebaran GPU support was added to the DRM framework:
https://lists.freedesktop.org/archives/amd-gfx/2021-February/059694.html

The GPU nodes come with built-in HBM2 memory. ECC support is enabled by
default, and the UMCs on GPU nodes are different from the UMCs on CPU nodes.

Define GPU-specific ops routines to extend the amd64_edac module to
enumerate HBM memory, leveraging the existing EDAC and amd64-specific data
structures.

Note: The UMC PHYs on GPU nodes are enumerated as csrows, and the UMC
channels connected to HBM banks are enumerated as ranks.

Cc: Yazen Ghannam
Co-developed-by: Muralidhara M K
Signed-off-by: Muralidhara M K
Signed-off-by: Naveen Krishna Chatradhi
Link: https://lkml.kernel.org/r/20210823185437.94417-4-nchatrad@amd.com
---
Changes since v5:
Removed else condition in per_family_init for 19h family

Changes since v4:
Split "f17_addr_mask_to_cs_size" instead, as was done in the 3rd patch earlier

Changes since v3:
1. Bifurcated the GPU code from v2

Changes since v2:
1. Restored line deletions and handled minor comments
2. Modified commit message and some of the function comments
3. Introduced the variable df_inst_id instead of umc_num

Changes since v1:
1. Modified the commit message
2. Changed the edac_cap
3. Kept sizes of both CPU and non-CPU together
4. Return success if the !F3 condition is true and remove unnecessary validation

 drivers/edac/amd64_edac.c | 298 +++++++++++++++++++++++++++++++++-----
 drivers/edac/amd64_edac.h |  27 ++++
 2 files changed, 292 insertions(+), 33 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 7953ffe9d547..b404fa5b03ce 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -1121,6 +1121,20 @@ static void debug_display_dimm_sizes_df(struct amd64_pvt *pvt, u8 ctrl)
 	}
 }
 
+static void debug_display_dimm_sizes_gpu(struct amd64_pvt *pvt, u8 ctrl)
+{
+	int size, cs = 0, cs_mode;
+
+	edac_printk(KERN_DEBUG, EDAC_MC, "UMC%d chip selects:\n", ctrl);
+
+	cs_mode = CS_EVEN_PRIMARY | CS_ODD_PRIMARY;
+
+	for_each_chip_select(cs, ctrl, pvt) {
+		size = pvt->ops->dbam_to_cs(pvt, ctrl, cs_mode, cs);
+		amd64_info(EDAC_MC ": %d: %5dMB\n", cs, size);
+	}
+}
+
 static void __dump_misc_regs_df(struct amd64_pvt *pvt)
 {
 	struct amd64_umc *umc;
@@ -1165,6 +1179,27 @@ static void __dump_misc_regs_df(struct amd64_pvt *pvt)
 		 pvt->dhar, dhar_base(pvt));
 }
 
+static void __dump_misc_regs_gpu(struct amd64_pvt *pvt)
+{
+	struct amd64_umc *umc;
+	u32 i, umc_base;
+
+	for_each_umc(i) {
+		umc_base = get_umc_base(i);
+		umc = &pvt->umc[i];
+
+		edac_dbg(1, "UMC%d UMC cfg: 0x%x\n", i, umc->umc_cfg);
+		edac_dbg(1, "UMC%d SDP ctrl: 0x%x\n", i, umc->sdp_ctrl);
+		edac_dbg(1, "UMC%d ECC ctrl: 0x%x\n", i, umc->ecc_ctrl);
+		edac_dbg(1, "UMC%d All HBMs support ECC: yes\n", i);
+
+		debug_display_dimm_sizes_gpu(pvt, i);
+	}
+
+	edac_dbg(1, "F0x104 (DRAM Hole Address): 0x%08x, base: 0x%08x\n",
+		 pvt->dhar, dhar_base(pvt));
+}
+
 /* Display and decode various NB registers for debug purposes. */
 static void __dump_misc_regs(struct amd64_pvt *pvt)
 {
@@ -1245,6 +1280,43 @@ static void f17_prep_chip_selects(struct amd64_pvt *pvt)
 	}
 }
 
+static void gpu_prep_chip_selects(struct amd64_pvt *pvt)
+{
+	int umc;
+
+	for_each_umc(umc) {
+		pvt->csels[umc].b_cnt = 8;
+		pvt->csels[umc].m_cnt = 8;
+	}
+}
+
+static void read_umc_base_mask_gpu(struct amd64_pvt *pvt)
+{
+	u32 base_reg, mask_reg;
+	u32 *base, *mask;
+	int umc, cs;
+
+	for_each_umc(umc) {
+		for_each_chip_select(cs, umc, pvt) {
+			base_reg = get_umc_base_gpu(umc, cs) + UMCCH_BASE_ADDR;
+			base = &pvt->csels[umc].csbases[cs];
+
+			if (!amd_smn_read(pvt->mc_node_id, base_reg, base)) {
+				edac_dbg(0, " DCSB%d[%d]=0x%08x reg: 0x%x\n",
+					 umc, cs, *base, base_reg);
+			}
+
+			mask_reg = get_umc_base_gpu(umc, cs) + UMCCH_ADDR_MASK;
+			mask = &pvt->csels[umc].csmasks[cs];
+
+			if (!amd_smn_read(pvt->mc_node_id, mask_reg, mask)) {
+				edac_dbg(0, " DCSM%d[%d]=0x%08x reg: 0x%x\n",
+					 umc, cs, *mask, mask_reg);
+			}
+		}
+	}
+}
+
 static void read_umc_base_mask(struct amd64_pvt *pvt)
 {
 	u32 umc_base_reg, umc_base_reg_sec;
@@ -1743,6 +1815,19 @@ static int f17_early_channel_count(struct amd64_pvt *pvt)
 	return channels;
 }
 
+static int gpu_early_channel_count(struct amd64_pvt *pvt)
+{
+	int i, channels = 0;
+
+	/* The memory channels in case of GPUs are fully populated */
+	for_each_umc(i)
+		channels += pvt->csels[i].b_cnt;
+
+	amd64_info("MCT channel count: %d\n", channels);
+
+	return channels;
+}
+
 static int ddr3_cs_size(unsigned i, bool dct_width)
 {
 	unsigned shift = 0;
@@ -1870,11 +1955,46 @@ static int f16_dbam_to_chip_select(struct amd64_pvt *pvt, u8 dct,
 	return ddr3_cs_size(cs_mode, false);
 }
 
+static int __addr_mask_to_cs_size(u32 addr_mask_orig, unsigned int cs_mode,
+				  int csrow_nr, int dimm)
+{
+	u32 msb, weight, num_zero_bits;
+	u32 addr_mask_deinterleaved;
+	int size = 0;
+
+	/*
+	 * The number of zero bits in the mask is equal to the number of bits
+	 * in a full mask minus the number of bits in the current mask.
+	 *
+	 * The MSB is the number of bits in the full mask because BIT[0] is
+	 * always 0.
+	 *
+	 * In the special 3 Rank interleaving case, a single bit is flipped
+	 * without swapping with the most significant bit. This can be handled
+	 * by keeping the MSB where it is and ignoring the single zero bit.
+	 */
+	msb = fls(addr_mask_orig) - 1;
+	weight = hweight_long(addr_mask_orig);
+	num_zero_bits = msb - weight - !!(cs_mode & CS_3R_INTERLEAVE);
+
+	/* Take the number of zero bits off from the top of the mask. */
+	addr_mask_deinterleaved = GENMASK_ULL(msb - num_zero_bits, 1);
+
+	edac_dbg(1, "CS%d DIMM%d AddrMasks:\n", csrow_nr, dimm);
+	edac_dbg(1, " Original AddrMask: 0x%x\n", addr_mask_orig);
+	edac_dbg(1, " Deinterleaved AddrMask: 0x%x\n", addr_mask_deinterleaved);
+
+	/* Register [31:1] = Address [39:9]. Size is in kBs here. */
+	size = (addr_mask_deinterleaved >> 2) + 1;
+
+	/* Return size in MBs. */
+	return size >> 10;
+}
+
 static int f17_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
 				    unsigned int cs_mode, int csrow_nr)
 {
-	u32 addr_mask_orig, addr_mask_deinterleaved;
-	u32 msb, weight, num_zero_bits;
+	u32 addr_mask_orig;
 	int dimm, size = 0;
 
 	/* No Chip Selects are enabled. */
@@ -1902,33 +2022,15 @@ static int f17_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
 	else
 		addr_mask_orig = pvt->csels[umc].csmasks[dimm];
 
-	/*
-	 * The number of zero bits in the mask is equal to the number of bits
-	 * in a full mask minus the number of bits in the current mask.
-	 *
-	 * The MSB is the number of bits in the full mask because BIT[0] is
-	 * always 0.
-	 *
-	 * In the special 3 Rank interleaving case, a single bit is flipped
-	 * without swapping with the most significant bit. This can be handled
-	 * by keeping the MSB where it is and ignoring the single zero bit.
-	 */
-	msb = fls(addr_mask_orig) - 1;
-	weight = hweight_long(addr_mask_orig);
-	num_zero_bits = msb - weight - !!(cs_mode & CS_3R_INTERLEAVE);
-
-	/* Take the number of zero bits off from the top of the mask. */
-	addr_mask_deinterleaved = GENMASK_ULL(msb - num_zero_bits, 1);
-
-	edac_dbg(1, "CS%d DIMM%d AddrMasks:\n", csrow_nr, dimm);
-	edac_dbg(1, " Original AddrMask: 0x%x\n", addr_mask_orig);
-	edac_dbg(1, " Deinterleaved AddrMask: 0x%x\n", addr_mask_deinterleaved);
+	return __addr_mask_to_cs_size(addr_mask_orig, cs_mode, csrow_nr, dimm);
+}
 
-	/* Register [31:1] = Address [39:9]. Size is in kBs here. */
-	size = (addr_mask_deinterleaved >> 2) + 1;
+static int gpu_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
+				    unsigned int cs_mode, int csrow_nr)
+{
+	u32 addr_mask_orig = pvt->csels[umc].csmasks[csrow_nr];
 
-	/* Return size in MBs. */
-	return size >> 10;
+	return __addr_mask_to_cs_size(addr_mask_orig, cs_mode, csrow_nr, csrow_nr >> 1);
 }
 
 static void read_dram_ctl_register(struct amd64_pvt *pvt)
@@ -2516,9 +2618,12 @@ static void debug_display_dimm_sizes(struct amd64_pvt *pvt, u8 ctrl)
 /* Prototypes for family specific ops routines */
 static int init_csrows(struct mem_ctl_info *mci);
 static int init_csrows_df(struct mem_ctl_info *mci);
+static int init_csrows_gpu(struct mem_ctl_info *mci);
 static void read_mc_regs(struct amd64_pvt *pvt);
 static void __read_mc_regs_df(struct amd64_pvt *pvt);
+static void __read_mc_regs_gpu(struct amd64_pvt *pvt);
 static void update_umc_err_info(struct mce *m, struct err_info *err);
+static void update_umc_err_info_gpu(struct mce *m, struct err_info *err);
 
 static const struct low_ops k8_ops = {
 	.early_channel_count = k8_early_channel_count,
@@ -2597,6 +2702,17 @@ static const struct low_ops f17_ops = {
 	.get_umc_err_info = update_umc_err_info,
 };
 
+static const struct low_ops gpu_ops = {
+	.early_channel_count = gpu_early_channel_count,
+	.dbam_to_cs = gpu_addr_mask_to_cs_size,
+	.prep_chip_select = gpu_prep_chip_selects,
+	.get_base_mask = read_umc_base_mask_gpu,
+	.get_misc_regs = __dump_misc_regs_gpu,
+	.get_mc_regs = __read_mc_regs_gpu,
+	.populate_csrows = init_csrows_gpu,
+	.get_umc_err_info = update_umc_err_info_gpu,
+};
+
 static struct amd64_family_type family_types[] = {
 	[K8_CPUS] = {
 		.ctl_name = "K8",
@@ -2689,6 +2805,14 @@ static struct amd64_family_type family_types[] = {
 		.max_mcs = 8,
 		.ops = f17_ops,
 	},
+	[ALDEBARAN_GPUS] = {
+		.ctl_name = "ALDEBARAN",
+		.f0_id = PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F0,
+		.f6_id = PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F6,
+		.max_mcs = 4,
+		.ops = gpu_ops,
+	},
+
 };
 
 /*
@@ -2947,12 +3071,38 @@ static void update_umc_err_info(struct mce *m, struct err_info *err)
 	err->csrow = m->synd & 0x7;
 }
 
+/*
+ * The CPUs have one channel per UMC, So UMC number is equivalent to a
+ * channel number. The GPUs have 8 channels per UMC, so the UMC number no
+ * longer works as a channel number.
+ * The channel number within a GPU UMC is given in MCA_IPID[15:12].
+ * However, the IDs are split such that two UMC values go to one UMC, and
+ * the channel numbers are split in two groups of four.
+ *
+ * Refer comment on get_umc_base_gpu() from amd64_edac.h
+ *
+ * For example,
+ * UMC0 CH[3:0] = 0x0005[3:0]000
+ * UMC0 CH[7:4] = 0x0015[3:0]000
+ * UMC1 CH[3:0] = 0x0025[3:0]000
+ * UMC1 CH[7:4] = 0x0035[3:0]000
+ */
+static void update_umc_err_info_gpu(struct mce *m, struct err_info *err)
+{
+	u8 ch = (m->ipid & GENMASK(31, 0)) >> 20;
+	u8 phy = ((m->ipid >> 12) & 0xf);
+
+	err->channel = ch % 2 ? phy + 4 : phy;
+	err->csrow = phy;
+}
+
 static void decode_umc_error(int node_id, struct mce *m)
 {
 	u8 ecc_type = (m->status >> 45) & 0x3;
 	struct mem_ctl_info *mci;
 	struct amd64_pvt *pvt;
 	struct err_info err;
+	u8 df_inst_id;
 	u64 sys_addr;
 
 	mci = edac_mc_find(node_id);
@@ -2982,7 +3132,17 @@ static void decode_umc_error(int node_id, struct mce *m)
 
 	pvt->ops->get_umc_err_info(m, &err);
 
-	if (umc_normaddr_to_sysaddr(m->addr, pvt->mc_node_id, err.channel, &sys_addr)) {
+	/*
+	 * GPU node has #phys[X] which has #channels[Y] each.
+	 * On GPUs, df_inst_id = [X] * num_ch_per_phy + [Y].
+	 * On CPUs, "Channel"="UMC Number"="DF Instance ID".
+	 */
+	if (pvt->is_gpu)
+		df_inst_id = (err.csrow * pvt->channel_count / mci->nr_csrows) + err.channel;
+	else
+		df_inst_id = err.channel;
+
+	if (umc_normaddr_to_sysaddr(m->addr, pvt->mc_node_id, df_inst_id, &sys_addr)) {
 		err.err_code = ERR_NORM_ADDR;
 		goto log_error;
 	}
@@ -3146,6 +3306,25 @@ static void __read_mc_regs_df(struct amd64_pvt *pvt)
 	amd64_read_pci_cfg(pvt->F0, DF_DHAR, &pvt->dhar);
 }
 
+static void __read_mc_regs_gpu(struct amd64_pvt *pvt)
+{
+	u8 nid = pvt->mc_node_id;
+	struct amd64_umc *umc;
+	u32 i, umc_base;
+
+	/* Read registers from each UMC */
+	for_each_umc(i) {
+		umc_base = get_umc_base_gpu(i, 0);
+		umc = &pvt->umc[i];
+
+		amd_smn_read(nid, umc_base + UMCCH_UMC_CFG, &umc->umc_cfg);
+		amd_smn_read(nid, umc_base + UMCCH_SDP_CTRL, &umc->sdp_ctrl);
+		amd_smn_read(nid, umc_base + UMCCH_ECC_CTRL, &umc->ecc_ctrl);
+	}
+
+	amd64_read_pci_cfg(pvt->F0, DF_DHAR, &pvt->dhar);
+}
+
 /*
  * Retrieve the hardware registers of the memory controller (this includes the
  * 'Address Map' and 'Misc' device regs)
@@ -3241,7 +3420,10 @@ static u32 get_csrow_nr_pages(struct amd64_pvt *pvt, u8 dct, int csrow_nr_orig)
 		csrow_nr >>= 1;
 		cs_mode = DBAM_DIMM(csrow_nr, dbam);
 	} else {
-		cs_mode = f17_get_cs_mode(csrow_nr >> 1, dct, pvt);
+		if (pvt->is_gpu)
+			cs_mode = CS_EVEN_PRIMARY | CS_ODD_PRIMARY;
+		else
+			cs_mode = f17_get_cs_mode(csrow_nr >> 1, dct, pvt);
 	}
 
 	nr_pages = pvt->ops->dbam_to_cs(pvt, dct, cs_mode, csrow_nr);
@@ -3298,6 +3480,35 @@ static int init_csrows_df(struct mem_ctl_info *mci)
 	return empty;
 }
 
+static int init_csrows_gpu(struct mem_ctl_info *mci)
+{
+	struct amd64_pvt *pvt = mci->pvt_info;
+	struct dimm_info *dimm;
+	int empty = 1;
+	u8 umc, cs;
+
+	for_each_umc(umc) {
+		for_each_chip_select(cs, umc, pvt) {
+			if (!csrow_enabled(cs, umc, pvt))
+				continue;
+
+			empty = 0;
+			dimm = mci->csrows[umc]->channels[cs]->dimm;
+
+			edac_dbg(1, "MC node: %d, csrow: %d\n",
+				 pvt->mc_node_id, cs);
+
+			dimm->nr_pages = get_csrow_nr_pages(pvt, umc, cs);
+			dimm->mtype = MEM_HBM2;
+			dimm->edac_mode = EDAC_SECDED;
+			dimm->dtype = DEV_X16;
+			dimm->grain = 64;
+		}
+	}
+
+	return empty;
+}
+
 /*
  * Initialize the array of csrow attribute instances, based on the values
  * from pci config hardware registers.
@@ -3539,6 +3750,10 @@ static bool ecc_enabled(struct amd64_pvt *pvt)
 	u8 ecc_en = 0, i;
 	u32 value;
 
+	/* ECC is enabled by default on GPU nodes */
+	if (pvt->is_gpu)
+		return true;
+
 	if (boot_cpu_data.x86 >= 0x17) {
 		u8 umc_en_mask = 0, ecc_en_mask = 0;
 		struct amd64_umc *umc;
@@ -3622,7 +3837,10 @@ static void setup_mci_misc_attrs(struct mem_ctl_info *mci)
 	mci->edac_ctl_cap = EDAC_FLAG_NONE;
 
 	if (pvt->umc) {
-		f17h_determine_edac_ctl_cap(mci, pvt);
+		if (pvt->is_gpu)
+			mci->edac_ctl_cap |= EDAC_FLAG_SECDED;
+		else
+			f17h_determine_edac_ctl_cap(mci, pvt);
 	} else {
 		if (pvt->nbcap & NBCAP_SECDED)
 			mci->edac_ctl_cap |= EDAC_FLAG_SECDED;
@@ -3724,6 +3942,17 @@ static struct amd64_family_type *per_family_init(struct amd64_pvt *pvt)
 			pvt->ops = &family_types[F17_M70H_CPUS].ops;
 			pvt->fam_type->ctl_name = "F19h_M20h";
 			break;
+		} else if (pvt->model >= 0x30 && pvt->model <= 0x3f) {
+			if (pvt->mc_node_id >= amd_cpu_node_count()) {
+				pvt->fam_type = &family_types[ALDEBARAN_GPUS];
+				pvt->ops = &family_types[ALDEBARAN_GPUS].ops;
+				pvt->is_gpu = true;
+			} else {
+				pvt->fam_type = &family_types[F19_CPUS];
+				pvt->ops = &family_types[F19_CPUS].ops;
+				family_types[F19_CPUS].ctl_name = "F19h_M30h";
+			}
+			break;
 		}
 		pvt->fam_type = &family_types[F19_CPUS];
 		pvt->ops = &family_types[F19_CPUS].ops;
@@ -3791,7 +4020,9 @@ static int hw_info_get(struct amd64_pvt *pvt)
 	determine_memory_type(pvt);
 	edac_dbg(1, " DIMM type: %s\n", edac_mem_types[pvt->dram_type]);
 
-	determine_ecc_sym_sz(pvt);
+	/* ECC symbol size is not available on GPU nodes */
+	if (!pvt->is_gpu)
+		determine_ecc_sym_sz(pvt);
 
 	return 0;
 }
@@ -3819,9 +4050,10 @@ static int init_one_instance(struct amd64_pvt *pvt)
 	if (pvt->channel_count < 0)
 		return ret;
 
+	/* Define layers for CPU and GPU nodes */
 	ret = -ENOMEM;
 	layers[0].type = EDAC_MC_LAYER_CHIP_SELECT;
-	layers[0].size = pvt->csels[0].b_cnt;
+	layers[0].size = pvt->is_gpu ? pvt->fam_type->max_mcs : pvt->csels[0].b_cnt;
 	layers[0].is_virt_csrow = true;
 	layers[1].type = EDAC_MC_LAYER_CHANNEL;
 
@@ -3830,7 +4062,7 @@ static int init_one_instance(struct amd64_pvt *pvt)
 	 * only one channel. Also, this simplifies handling later for the price
 	 * of a couple of KBs tops.
 	 */
-	layers[1].size = pvt->fam_type->max_mcs;
+	layers[1].size = pvt->is_gpu ? pvt->csels[0].b_cnt : pvt->fam_type->max_mcs;
 	layers[1].is_virt_csrow = false;
 
 	mci = edac_mc_alloc(pvt->mc_node_id, ARRAY_SIZE(layers), layers, 0);
diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h
index 82d9f64aa150..da2f6c79cccc 100644
--- a/drivers/edac/amd64_edac.h
+++ b/drivers/edac/amd64_edac.h
@@ -126,6 +126,8 @@
 #define PCI_DEVICE_ID_AMD_17H_M70H_DF_F6 0x1446
 #define PCI_DEVICE_ID_AMD_19H_DF_F0 0x1650
 #define PCI_DEVICE_ID_AMD_19H_DF_F6 0x1656
+#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F0 0x14d0
+#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F6 0x14d6
 
 /*
  * Function 1 - Address Map
@@ -298,6 +300,7 @@ enum amd_families {
 	F17_M60H_CPUS,
 	F17_M70H_CPUS,
 	F19_CPUS,
+	ALDEBARAN_GPUS,
 	NUM_FAMILIES,
 };
 
@@ -391,6 +394,8 @@ struct amd64_pvt {
 	struct amd64_umc *umc; /* UMC registers */
 
 	struct amd64_family_type *fam_type;
+
+	bool is_gpu;
 };
 
 enum err_codes {
@@ -412,6 +417,28 @@ struct err_info {
 	u32 offset;
 };
 
+static inline u32 get_umc_base_gpu(u8 umc, u8 channel)
+{
+	/*
+	 * On CPUs, there is one channel per UMC, so UMC numbering equals
+	 * channel numbering. On GPUs, there are eight channels per UMC,
+	 * so the channel numbering is different from UMC numbering.
+	 *
+	 * On CPU nodes channels are selected in 6th nibble
+	 * UMC chY[3:0]= [(chY*2 + 1) : (chY*2)]50000;
+	 *
+	 * On GPU nodes channels are selected in 3rd nibble
+	 * HBM chX[3:0]= [Y ]5X[3:0]000;
+	 * HBM chX[7:4]= [Y+1]5X[3:0]000
+	 */
+	umc *= 2;
+
+	if (channel >= 4)
+		umc++;
+
+	return 0x50000 + (umc << 20) + ((channel % 4) << 12);
+}
+
 static inline u32 get_umc_base(u8 channel)
 {
 	/* chY: 0xY50000 */