From patchwork Thu Oct 14 18:50:55 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Naveen Krishna Chatradhi
X-Patchwork-Id: 12559245
From: Naveen Krishna Chatradhi
To: ,
CC: , , , , , Muralidhara M K , Naveen Krishna Chatradhi
Subject: [PATCH v4 1/4] x86/amd_nb: Add support for northbridges on Aldebaran
Date: Fri, 15 Oct 2021 00:20:55 +0530
Message-ID: <20211014185058.9587-2-nchatrad@amd.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20211014185058.9587-1-nchatrad@amd.com>
References: <20210823185437.94417-1-nchatrad@amd.com> <20211014185058.9587-1-nchatrad@amd.com>
Precedence: bulk
List-ID:
X-Mailing-List: linux-edac@vger.kernel.org

From: Muralidhara M K

On newer systems the CPUs manage MCA errors reported from the GPUs.
Enumerate the GPU nodes with the AMD NB framework to support EDAC.

GPU nodes are enumerated in sequential order based on the PCI hierarchy,
and the first GPU node is assumed to have an "AMD Node ID" value after
CPU Nodes are fully populated.

Aldebaran is an AMD GPU; GPU drivers are part of the DRM framework:
https://lists.freedesktop.org/archives/amd-gfx/2021-February/059694.html

Each Aldebaran GPU has 2 Data Fabrics, which are enumerated as 2 nodes.
With this implementation detail, the Data Fabric on the GPU nodes can be
accessed the same way as the Data Fabric on CPU nodes.

Special handling was necessary in northbridge enumeration as the
roots_per_misc value is different for GPU and CPU nodes.

Signed-off-by: Muralidhara M K
Co-developed-by: Naveen Krishna Chatradhi
Signed-off-by: Naveen Krishna Chatradhi
Link: https://lkml.kernel.org/r/20210823185437.94417-2-nchatrad@amd.com
---
Changes since v3:
1. Use word "gpu" instead of "noncpu" in the patch
2. Do not create pci_dev_ids arrays for gpu nodes
3. Identify the gpu node start index from DF18F1 registers on the GPU nodes.
   a. Export cpu node count and gpu start node id

Changes since v2:
1. Added Reviewed-by Yazen Ghannam

Changes since v1:
1. Modified the commit message and comments in the code
2. Squashed patch 1/7: "x86/amd_nb: Add Aldebaran device to PCI IDs"

 arch/x86/include/asm/amd_nb.h |   9 +++
 arch/x86/kernel/amd_nb.c      | 131 ++++++++++++++++++++++++++++------
 include/linux/pci_ids.h       |   1 +
 3 files changed, 118 insertions(+), 23 deletions(-)

diff --git a/arch/x86/include/asm/amd_nb.h b/arch/x86/include/asm/amd_nb.h
index 455066a06f60..5898300f11ed 100644
--- a/arch/x86/include/asm/amd_nb.h
+++ b/arch/x86/include/asm/amd_nb.h
@@ -68,10 +68,17 @@ struct amd_northbridge {
 	struct threshold_bank *bank4;
 };
 
+/* heterogeneous system node type map variables */
+struct amd_node_map {
+	u16 gpu_node_start_id;
+	u16 cpu_node_count;
+};
+
 struct amd_northbridge_info {
 	u16 num;
 	u64 flags;
 	struct amd_northbridge *nb;
+	struct amd_node_map *nmap;
 };
 
 #define AMD_NB_GART			BIT(0)
@@ -83,6 +90,8 @@ struct amd_northbridge_info {
 u16 amd_nb_num(void);
 bool amd_nb_has_feature(unsigned int feature);
 struct amd_northbridge *node_to_amd_nb(int node);
+u16 amd_gpu_node_start_id(void);
+u16 amd_cpu_node_count(void);
 
 static inline u16 amd_pci_dev_to_node_id(struct pci_dev *pdev)
 {
diff --git a/arch/x86/kernel/amd_nb.c b/arch/x86/kernel/amd_nb.c
index c92c9c774c0e..54a6a7462f07 100644
--- a/arch/x86/kernel/amd_nb.c
+++ b/arch/x86/kernel/amd_nb.c
@@ -19,6 +19,7 @@
 #define PCI_DEVICE_ID_AMD_17H_M10H_ROOT	0x15d0
 #define PCI_DEVICE_ID_AMD_17H_M30H_ROOT	0x1480
 #define PCI_DEVICE_ID_AMD_17H_M60H_ROOT	0x1630
+#define PCI_DEVICE_ID_AMD_ALDEBARAN_ROOT	0x14bb
 #define PCI_DEVICE_ID_AMD_17H_DF_F4	0x1464
 #define PCI_DEVICE_ID_AMD_17H_M10H_DF_F4 0x15ec
 #define PCI_DEVICE_ID_AMD_17H_M30H_DF_F4 0x1494
@@ -28,6 +29,7 @@
 #define PCI_DEVICE_ID_AMD_19H_M40H_ROOT	0x14b5
 #define PCI_DEVICE_ID_AMD_19H_M40H_DF_F4 0x167d
 #define PCI_DEVICE_ID_AMD_19H_M50H_DF_F4 0x166e
+#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F4 0x14d4
 
 /* Protect the PCI config register pairs used for SMN and DF indirect access. */
 static DEFINE_MUTEX(smn_mutex);
@@ -40,6 +42,7 @@ static const struct pci_device_id amd_root_ids[] = {
 	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_17H_M30H_ROOT) },
 	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_17H_M60H_ROOT) },
 	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_19H_M40H_ROOT) },
+	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ALDEBARAN_ROOT) },
 	{}
 };
 
@@ -63,6 +66,7 @@ static const struct pci_device_id amd_nb_misc_ids[] = {
 	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_19H_DF_F3) },
 	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_19H_M40H_DF_F3) },
 	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_19H_M50H_DF_F3) },
+	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F3) },
 	{}
 };
 
@@ -81,6 +85,7 @@ static const struct pci_device_id amd_nb_link_ids[] = {
 	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_19H_M40H_DF_F4) },
 	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_19H_M50H_DF_F4) },
 	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_CNB17H_F4) },
+	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F4) },
 	{}
 };
 
@@ -126,6 +131,55 @@ struct amd_northbridge *node_to_amd_nb(int node)
 }
 EXPORT_SYMBOL_GPL(node_to_amd_nb);
 
+/*
+ * GPU start index and CPU count values on a heterogeneous system.
+ * These values are used by the AMD EDAC and MCE modules.
+ */
+u16 amd_gpu_node_start_id(void)
+{
+	return (amd_northbridges.nmap) ?
+		amd_northbridges.nmap->gpu_node_start_id : 0;
+}
+EXPORT_SYMBOL_GPL(amd_gpu_node_start_id);
+
+u16 amd_cpu_node_count(void)
+{
+	return (amd_northbridges.nmap) ?
+		amd_northbridges.nmap->cpu_node_count : amd_northbridges.num;
+}
+EXPORT_SYMBOL_GPL(amd_cpu_node_count);
+
+/* DF18xF1 registers on Aldebaran GPU */
+#define REG_LOCAL_NODE_TYPE_MAP		0x144
+#define REG_RMT_NODE_TYPE_MAP		0x148
+
+#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F1	0x14d1
+
+static int amd_get_node_map(void)
+{
+	struct amd_node_map *np;
+	struct pci_dev *pdev = NULL;
+	u32 tmp;
+
+	pdev = pci_get_device(PCI_VENDOR_ID_AMD,
+			      PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F1, pdev);
+	if (!pdev)
+		return -ENODEV;
+
+	np = kmalloc(sizeof(*np), GFP_KERNEL);
+	if (!np)
+		return -ENOMEM;
+
+	pci_read_config_dword(pdev, REG_LOCAL_NODE_TYPE_MAP, &tmp);
+	np->gpu_node_start_id = tmp & 0xFFF;
+
+	pci_read_config_dword(pdev, REG_RMT_NODE_TYPE_MAP, &tmp);
+	np->cpu_node_count = tmp >> 16 & 0xFFF;
+
+	amd_northbridges.nmap = np;
+	return 0;
+}
+
 static struct pci_dev *next_northbridge(struct pci_dev *dev,
 					const struct pci_device_id *ids)
 {
@@ -230,6 +284,27 @@ int amd_df_indirect_read(u16 node, u8 func, u16 reg, u8 instance_id, u32 *lo)
 }
 EXPORT_SYMBOL_GPL(amd_df_indirect_read);
 
+struct pci_dev *get_root_devs(struct pci_dev *root,
+			      const struct pci_device_id *root_ids,
+			      u16 roots_per_misc)
+{
+	u16 j;
+
+	/*
+	 * If there are more PCI root devices than data fabric/
+	 * system management network interfaces, then the (N)
+	 * PCI roots per DF/SMN interface are functionally the
+	 * same (for DF/SMN access) and N-1 are redundant. N-1
+	 * PCI roots should be skipped per DF/SMN interface so
+	 * the following DF/SMN interfaces get mapped to
+	 * correct PCI roots.
+	 */
+	for (j = 0; j < roots_per_misc; j++)
+		root = next_northbridge(root, root_ids);
+
+	return root;
+}
+
 int amd_cache_northbridges(void)
 {
 	const struct pci_device_id *misc_ids = amd_nb_misc_ids;
@@ -237,10 +312,10 @@ int amd_cache_northbridges(void)
 	const struct pci_device_id *root_ids = amd_root_ids;
 	struct pci_dev *root, *misc, *link;
 	struct amd_northbridge *nb;
-	u16 roots_per_misc = 0;
-	u16 misc_count = 0;
-	u16 root_count = 0;
-	u16 i, j;
+	u16 roots_per_misc = 0, gpu_roots_per_misc = 0;
+	u16 misc_count = 0, gpu_misc_count = 0;
+	u16 root_count = 0, gpu_root_count = 0;
+	u16 i;
 
 	if (amd_northbridges.num)
 		return 0;
@@ -252,15 +327,23 @@ int amd_cache_northbridges(void)
 	}
 
 	misc = NULL;
-	while ((misc = next_northbridge(misc, misc_ids)) != NULL)
-		misc_count++;
+	while ((misc = next_northbridge(misc, misc_ids)) != NULL) {
+		if (misc->device == PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F3)
+			gpu_misc_count++;
+		else
+			misc_count++;
+	}
 
 	if (!misc_count)
 		return -ENODEV;
 
 	root = NULL;
-	while ((root = next_northbridge(root, root_ids)) != NULL)
-		root_count++;
+	while ((root = next_northbridge(root, root_ids)) != NULL) {
+		if (root->device == PCI_DEVICE_ID_AMD_ALDEBARAN_ROOT)
+			gpu_root_count++;
+		else
+			root_count++;
+	}
 
 	if (root_count) {
 		roots_per_misc = root_count / misc_count;
@@ -275,33 +358,35 @@ int amd_cache_northbridges(void)
 		}
 	}
 
-	nb = kcalloc(misc_count, sizeof(struct amd_northbridge), GFP_KERNEL);
+	/*
+	 * The number of miscs, roots and roots_per_misc might vary on different
+	 * nodes of a heterogeneous system.
+	 * Calculate roots_per_misc accordingly in order to skip the redundant
+	 * roots and map the DF/SMN interfaces to correct PCI roots.
+	 */
+	if (gpu_root_count && gpu_misc_count) {
+		if (amd_get_node_map())
+			return -ENOMEM;
+
+		gpu_roots_per_misc = gpu_root_count / gpu_misc_count;
+	}
+
+	amd_northbridges.num = misc_count + gpu_misc_count;
+	nb = kcalloc(amd_northbridges.num, sizeof(struct amd_northbridge), GFP_KERNEL);
 	if (!nb)
 		return -ENOMEM;
 
 	amd_northbridges.nb = nb;
-	amd_northbridges.num = misc_count;
 
 	link = misc = root = NULL;
 	for (i = 0; i < amd_northbridges.num; i++) {
+		u16 misc_roots = i < misc_count ? roots_per_misc : gpu_roots_per_misc;
 		node_to_amd_nb(i)->root = root =
-			next_northbridge(root, root_ids);
+			get_root_devs(root, root_ids, misc_roots);
 		node_to_amd_nb(i)->misc = misc =
 			next_northbridge(misc, misc_ids);
 		node_to_amd_nb(i)->link = link =
 			next_northbridge(link, link_ids);
-
-		/*
-		 * If there are more PCI root devices than data fabric/
-		 * system management network interfaces, then the (N)
-		 * PCI roots per DF/SMN interface are functionally the
-		 * same (for DF/SMN access) and N-1 are redundant. N-1
-		 * PCI roots should be skipped per DF/SMN interface so
-		 * the following DF/SMN interfaces get mapped to
-		 * correct PCI roots.
-		 */
-		for (j = 1; j < roots_per_misc; j++)
-			root = next_northbridge(root, root_ids);
 	}
 
 	if (amd_gart_present())
diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
index 011f2f1ea5bb..b3a0ec29dbd6 100644
--- a/include/linux/pci_ids.h
+++ b/include/linux/pci_ids.h
@@ -557,6 +557,7 @@
 #define PCI_DEVICE_ID_AMD_19H_DF_F3	0x1653
 #define PCI_DEVICE_ID_AMD_19H_M40H_DF_F3 0x167c
 #define PCI_DEVICE_ID_AMD_19H_M50H_DF_F3 0x166d
+#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F3 0x14d3
 #define PCI_DEVICE_ID_AMD_CNB17H_F3	0x1703
 #define PCI_DEVICE_ID_AMD_LANCE		0x2000
 #define PCI_DEVICE_ID_AMD_LANCE_HOME	0x2001

From patchwork Thu Oct 14 18:50:56 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Naveen Krishna Chatradhi
X-Patchwork-Id: 12559247
From: Naveen Krishna Chatradhi
To: ,
CC: , , , , , Naveen Krishna Chatradhi , Muralidhara M K
Subject: [PATCH v4 2/4] EDAC/mce_amd: Extract node id from MCA_IPID
Date: Fri, 15 Oct 2021 00:20:56 +0530
Message-ID: <20211014185058.9587-3-nchatrad@amd.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20211014185058.9587-1-nchatrad@amd.com>
References: <20210823185437.94417-1-nchatrad@amd.com> <20211014185058.9587-1-nchatrad@amd.com>
Precedence: bulk
List-ID:
X-Mailing-List: linux-edac@vger.kernel.org

On SMCA banks of the GPU nodes, the node id information is available in
register MCA_IPID[47:44](InstanceIdHi).

Convert the hardware node ID to a value used by Linux, where GPU nodes
are numbered sequentially after the CPU nodes.

Co-developed-by: Muralidhara M K
Signed-off-by: Muralidhara M K
Signed-off-by: Naveen Krishna Chatradhi
Link: https://lkml.kernel.org/r/20210823185437.94417-3-nchatrad@amd.com
---
Changes since v3:
1. Use APIs from amd_nb to identify the gpu_node_start_id and
   cpu_node_count, which are required to map the hardware node id to
   the node id enumerated by Linux.

Changes since v2:
1. Modified subject and commit message
2. Added Reviewed-by Yazen Ghannam

Changes since v1:
1. Modified the commit message
2. Rearranged the conditions before calling decode_dram_ecc()

 drivers/edac/mce_amd.c | 24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
index 67dbf4c31271..af6caa76adc7 100644
--- a/drivers/edac/mce_amd.c
+++ b/drivers/edac/mce_amd.c
@@ -2,6 +2,7 @@
 #include
 #include
 
+#include
 #include
 
 #include "mce_amd.h"
@@ -1072,8 +1073,27 @@ static void decode_smca_error(struct mce *m)
 	if (xec < smca_mce_descs[bank_type].num_descs)
 		pr_cont(", %s.\n", smca_mce_descs[bank_type].descs[xec]);
 
-	if (bank_type == SMCA_UMC && xec == 0 && decode_dram_ecc)
-		decode_dram_ecc(topology_die_id(m->extcpu), m);
+	if (xec == 0 && decode_dram_ecc) {
+		int node_id = 0;
+
+		if (bank_type == SMCA_UMC) {
+			node_id = topology_die_id(m->extcpu);
+		} else if (bank_type == SMCA_UMC_V2) {
+			/*
+			 * SMCA_UMC_V2 exists on GPU nodes; extract the node id
+			 * from register MCA_IPID[47:44](InstanceIdHi).
+ * The InstanceIdHi field represents the instance ID of the GPU. + * Which needs to be mapped to a value used by Linux, + * where GPU nodes are simply numerically after the CPU nodes. + */ + node_id = ((m->ipid >> 44) & 0xF) - + amd_gpu_node_start_id() + amd_cpu_node_count(); + } else { + return; + } + + decode_dram_ecc(node_id, m); + } } static inline void amd_decode_err_code(u16 ec) From patchwork Thu Oct 14 18:50:57 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naveen Krishna Chatradhi X-Patchwork-Id: 12559249 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6C0ABC433FE for ; Thu, 14 Oct 2021 18:51:37 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 4FEA360E78 for ; Thu, 14 Oct 2021 18:51:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234030AbhJNSxk (ORCPT ); Thu, 14 Oct 2021 14:53:40 -0400 Received: from mail-dm6nam11on2040.outbound.protection.outlook.com ([40.107.223.40]:20417 "EHLO NAM11-DM6-obe.outbound.protection.outlook.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S234019AbhJNSxi (ORCPT ); Thu, 14 Oct 2021 14:53:38 -0400 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=Cd1/SqdT7ZWw/MtZANgO9vdxUdA5aWKFhvxzoSYZ8EaGLfH1Jio07o/gsup23AILgrNMK0HezxkGn+bmBuqHsydYfCm+TbdOf2Nm4QHD4o3/e47DMkpMHi5gz8rfDgndufuxQfZFhZ6bhvhodclVkdQQ0ArGRUfQyGyKEMUB/REtOPGB904vDdzAARL46mZzV3gQmXgHXxrJ9ORzWfb1Tt+aAFQGZkDpKa5SHi/TIE6wUbPhWdXxkjGqudKIkDhUn/ATO9k8W+fv1YwbeAFPByF5bp6r2LlDziqWkO1xf8XjbWNL86NppPgPm2rWlCWNArIxyFEsR8h0lKQd1SkIeA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; 
h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=14MHjlX3sRKE8ZFjcwF4Z/ofGJ6RUOy2udBtPpFjJxI=; b=P1sc9cnp4V24FrUscvKaTmRPkcTfR/vmeSE7/KW0UuLWkGNWvpbJADX74BxBgMvsXdopHNTQbGDUMN8J1aRQGrl50NJZRbI1PiTJ9u31OqNcrMCbQ1g5pTIm+7mT1Wi2OQAPrQNhvErM3bUezbxXn/Cr3DrVpoKDxawq76GEbU67DP3PHNMVFIATtcH5Lz53nupSFaQYNZLaba0YfTvAjVBc55WoihqClvUHZzs6RtlK3LGq8Kc91c3U9civ3lYkiMie7+06h6QNQS+IiIgQjku26kXpwqHmAmoKaZ1ZpEkC/eiuGrZuxojiSVMmB5l/x6CYlpmvbTMmizbLgwDMwA== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 165.204.84.17) smtp.rcpttodomain=vger.kernel.org smtp.mailfrom=amd.com; dmarc=pass (p=quarantine sp=quarantine pct=100) action=none header.from=amd.com; dkim=none (message not signed); arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amd.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=14MHjlX3sRKE8ZFjcwF4Z/ofGJ6RUOy2udBtPpFjJxI=; b=vrqNN3tZHi9Uxntxq4A1SLEWuHCDMliOQ84Jicxqrc5vek/H4kdI/iLrwv3V1qP3NUtYJ4G5J0DDxv7CVcWiRke1Vg4v6OnGoKGX7vKRT6c6X4kz2+KPEUExbZe3MqSM4DEPqywEB4tXT4kmeX0STcof4PZIgFzNZL5lTr0dHEM= Received: from MWHPR1201CA0019.namprd12.prod.outlook.com (2603:10b6:301:4a::29) by BN9PR12MB5179.namprd12.prod.outlook.com (2603:10b6:408:11c::18) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4608.16; Thu, 14 Oct 2021 18:51:30 +0000 Received: from CO1NAM11FT004.eop-nam11.prod.protection.outlook.com (2603:10b6:301:4a:cafe::b4) by MWHPR1201CA0019.outlook.office365.com (2603:10b6:301:4a::29) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4608.15 via Frontend Transport; Thu, 14 Oct 2021 18:51:30 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 165.204.84.17) smtp.mailfrom=amd.com; vger.kernel.org; dkim=none (message not signed) 
header.d=none;vger.kernel.org; dmarc=pass action=none header.from=amd.com; Received-SPF: Pass (protection.outlook.com: domain of amd.com designates 165.204.84.17 as permitted sender) receiver=protection.outlook.com; client-ip=165.204.84.17; helo=SATLEXMB04.amd.com; Received: from SATLEXMB04.amd.com (165.204.84.17) by CO1NAM11FT004.mail.protection.outlook.com (10.13.175.89) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.20.4608.15 via Frontend Transport; Thu, 14 Oct 2021 18:51:29 +0000 Received: from milan-ETHANOL-X.amd.com (10.180.168.240) by SATLEXMB04.amd.com (10.181.40.145) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2308.8; Thu, 14 Oct 2021 13:51:25 -0500 From: Naveen Krishna Chatradhi To: , CC: , , , , , Muralidhara M K , Naveen Krishna Chatradhi Subject: [PATCH 3/4] EDAC/amd64: Extend family ops functions Date: Fri, 15 Oct 2021 00:20:57 +0530 Message-ID: <20211014185058.9587-4-nchatrad@amd.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20211014185058.9587-1-nchatrad@amd.com> References: <20210823185437.94417-1-nchatrad@amd.com> <20211014185058.9587-1-nchatrad@amd.com> MIME-Version: 1.0 X-Originating-IP: [10.180.168.240] X-ClientProxiedBy: SATLEXMB03.amd.com (10.181.40.144) To SATLEXMB04.amd.com (10.181.40.145) X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id: c462c0c0-4fce-437d-d628-08d98f43a262 X-MS-TrafficTypeDiagnostic: BN9PR12MB5179: X-Microsoft-Antispam-PRVS: X-MS-Oob-TLC-OOBClassifiers: OLM:7691; X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: 
15N8nPK9Zmd7e8zX/KlqPnox5xITrFXP8+Ib1HEXNYG7vIioXguQYevGgMWjB48Joy9gI1YZZOia/vMizhpQJAL63vaxSyaZslK0wh7GGNQgaOkzox3z8nCmQj6c51p7Vyf5Zgjxd7uT+yq9SIMKU0CL+9Ww0pSN7TBBUyCOGjAbfbgokHCfafzBPlBEIc/uPSD5VHjq6/EY5xhfqKrsoZGT45D88UZ/1IR1EYw1GrbblOjWTb70klLgan9Bf+GAJR4nAZqajlj20MrqK0ghguQYyHLBgaSgN9dsdt4/51FjgaHLsZs8hlfqPB5EOxGyz4wpmwc+nEUAvax1Z1X39UJIQPdzcWgy8Yf+exJoZUiYT1o+buD4Ny4UPT1Rbos3T+K8jsxkiGFOae0EDDok62SD/V1qGzx4j33XGRNQ48+bZYZAs6fvCEjfojasi6p3fifqbM1urcSafI3Aqt8DDnTOkD1mQbvnjiNh4QlCuAH7aIgmUXFbuukGmZxliX7TSbqBKlg8UbkKIvhsvProHyV6oiTY1w0FLYwHNQVpFimp7cItuLR9snzHuklGaXOjti9/h8nKh7Zf3pcIxq+wemwPRf1OvzNOEWZaxzY3c5LHMaeGBcBqbz7/Jn/exa8xYrDxCRUnKQj1kYtL69xGO3Fo60ToqcA2s91UQ6pCnAeEpvMnUoxxfHeCi+0zSfpQ18yAwbMUqKKN2OPeq+emSBh1v+ORaOWQ5BVhYftgcfQ= X-Forefront-Antispam-Report: CIP:165.204.84.17;CTRY:US;LANG:en;SCL:1;SRV:;IPV:CAL;SFV:NSPM;H:SATLEXMB04.amd.com;PTR:InfoDomainNonexistent;CAT:NONE;SFS:(4636009)(46966006)(36840700001)(36756003)(336012)(81166007)(356005)(16526019)(36860700001)(26005)(7696005)(8676002)(4326008)(5660300002)(186003)(47076005)(30864003)(316002)(2906002)(508600001)(6666004)(8936002)(70206006)(83380400001)(70586007)(1076003)(426003)(82310400003)(2616005)(54906003)(110136005)(36900700001);DIR:OUT;SFP:1101; X-OriginatorOrg: amd.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 14 Oct 2021 18:51:29.7993 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: c462c0c0-4fce-437d-d628-08d98f43a262 X-MS-Exchange-CrossTenant-Id: 3dd8961f-e488-4e60-8e11-a82d994e183d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=3dd8961f-e488-4e60-8e11-a82d994e183d;Ip=[165.204.84.17];Helo=[SATLEXMB04.amd.com] X-MS-Exchange-CrossTenant-AuthSource: CO1NAM11FT004.eop-nam11.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: BN9PR12MB5179 Precedence: bulk List-ID: X-Mailing-List: linux-edac@vger.kernel.org From: Muralidhara M K Create 
new family-specific operation routines and define them for each
supported family. This simplifies adding support for future platforms.

Signed-off-by: Muralidhara M K
Signed-off-by: Naveen Krishna Chatradhi
---
Changes since v3:
1. Defined new family operation routines

Changes since v2:
1. New patch

 drivers/edac/amd64_edac.c | 291 ++++++++++++++++++++++----------------
 drivers/edac/amd64_edac.h |   6 +
 2 files changed, 174 insertions(+), 123 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c index 4fce75013674..131ed19f69dd 100644 --- a/drivers/edac/amd64_edac.c +++ b/drivers/edac/amd64_edac.c @@ -1204,10 +1204,7 @@ static void __dump_misc_regs(struct amd64_pvt *pvt) /* Display and decode various NB registers for debug purposes. */ static void dump_misc_regs(struct amd64_pvt *pvt) { - if (pvt->umc) - __dump_misc_regs_df(pvt); - else - __dump_misc_regs(pvt); + pvt->ops->display_misc_regs(pvt); edac_dbg(1, " DramHoleValid: %s\n", dhar_valid(pvt) ? "yes" : "no"); @@ -1217,25 +1214,31 @@ static void dump_misc_regs(struct amd64_pvt *pvt) /* * See BKDG, F2x[1,0][5C:40], F2[1,0][6C:60] */ -static void prep_chip_selects(struct amd64_pvt *pvt) +static void k8_prep_chip_selects(struct amd64_pvt *pvt) { - if (pvt->fam == 0xf && pvt->ext_model < K8_REV_F) { - pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 8; - pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 8; - } else if (pvt->fam == 0x15 && pvt->model == 0x30) { - pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 4; - pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 2; - } else if (pvt->fam >= 0x17) { - int umc; - - for_each_umc(umc) { - pvt->csels[umc].b_cnt = 4; - pvt->csels[umc].m_cnt = 2; - } + pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 8; + pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 8; +} - } else { - pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 8; - pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 4; +static void f15m30_prep_chip_selects(struct amd64_pvt *pvt) +{ + pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 4; + pvt->csels[0].m_cnt
= pvt->csels[1].m_cnt = 2; +} + +static void fmisc_prep_chip_selects(struct amd64_pvt *pvt) +{ + pvt->csels[0].b_cnt = pvt->csels[1].b_cnt = 8; + pvt->csels[0].m_cnt = pvt->csels[1].m_cnt = 4; +} + +static void f17_prep_chip_selects(struct amd64_pvt *pvt) +{ + int umc; + + for_each_umc(umc) { + pvt->csels[umc].b_cnt = 4; + pvt->csels[umc].m_cnt = 2; } } @@ -1297,10 +1300,10 @@ static void read_dct_base_mask(struct amd64_pvt *pvt) { int cs; - prep_chip_selects(pvt); + pvt->ops->prep_chip_select(pvt); - if (pvt->umc) - return read_umc_base_mask(pvt); + if (pvt->ops->get_base_mask) + return pvt->ops->get_base_mask(pvt); for_each_chip_select(cs, 0, pvt) { int reg0 = DCSB0 + (cs * 4); @@ -1869,37 +1872,12 @@ static int f16_dbam_to_chip_select(struct amd64_pvt *pvt, u8 dct, return ddr3_cs_size(cs_mode, false); } -static int f17_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc, - unsigned int cs_mode, int csrow_nr) +static int __addr_mask_to_cs_size(u32 addr_mask_orig, unsigned int cs_mode, + int csrow_nr, int dimm) { - u32 addr_mask_orig, addr_mask_deinterleaved; u32 msb, weight, num_zero_bits; - int dimm, size = 0; - - /* No Chip Selects are enabled. */ - if (!cs_mode) - return size; - - /* Requested size of an even CS but none are enabled. */ - if (!(cs_mode & CS_EVEN) && !(csrow_nr & 1)) - return size; - - /* Requested size of an odd CS but none are enabled. */ - if (!(cs_mode & CS_ODD) && (csrow_nr & 1)) - return size; - - /* - * There is one mask per DIMM, and two Chip Selects per DIMM. - * CS0 and CS1 -> DIMM0 - * CS2 and CS3 -> DIMM1 - */ - dimm = csrow_nr >> 1; - - /* Asymmetric dual-rank DIMM support. 
*/ - if ((csrow_nr & 1) && (cs_mode & CS_ODD_SECONDARY)) - addr_mask_orig = pvt->csels[umc].csmasks_sec[dimm]; - else - addr_mask_orig = pvt->csels[umc].csmasks[dimm]; + u32 addr_mask_deinterleaved; + int size = 0; /* * The number of zero bits in the mask is equal to the number of bits @@ -1930,6 +1908,40 @@ static int f17_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc, return size >> 10; } +static int f17_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc, + unsigned int cs_mode, int csrow_nr) +{ + u32 addr_mask_orig; + int dimm, size = 0; + + /* No Chip Selects are enabled. */ + if (!cs_mode) + return size; + + /* Requested size of an even CS but none are enabled. */ + if (!(cs_mode & CS_EVEN) && !(csrow_nr & 1)) + return size; + + /* Requested size of an odd CS but none are enabled. */ + if (!(cs_mode & CS_ODD) && (csrow_nr & 1)) + return size; + + /* + * There is one mask per DIMM, and two Chip Selects per DIMM. + * CS0 and CS1 -> DIMM0 + * CS2 and CS3 -> DIMM1 + */ + dimm = csrow_nr >> 1; + + /* Asymmetric dual-rank DIMM support. 
*/ + if ((csrow_nr & 1) && (cs_mode & CS_ODD_SECONDARY)) + addr_mask_orig = pvt->csels[umc].csmasks_sec[dimm]; + else + addr_mask_orig = pvt->csels[umc].csmasks[dimm]; + + return __addr_mask_to_cs_size(addr_mask_orig, cs_mode, csrow_nr, dimm); +} + static void read_dram_ctl_register(struct amd64_pvt *pvt) { @@ -2512,143 +2524,168 @@ static void debug_display_dimm_sizes(struct amd64_pvt *pvt, u8 ctrl) } } +/* Prototypes for family specific ops routines */ +static int init_csrows(struct mem_ctl_info *mci); +static int init_csrows_df(struct mem_ctl_info *mci); +static void __read_mc_regs_df(struct amd64_pvt *pvt); +static void find_umc_channel(struct mce *m, struct err_info *err); + +static const struct low_ops k8_ops = { + .early_channel_count = k8_early_channel_count, + .map_sysaddr_to_csrow = k8_map_sysaddr_to_csrow, + .dbam_to_cs = k8_dbam_to_chip_select, + .prep_chip_select = k8_prep_chip_selects, + .display_misc_regs = __dump_misc_regs, + .populate_csrows = init_csrows, +}; + +static const struct low_ops f10_ops = { + .early_channel_count = f1x_early_channel_count, + .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow, + .dbam_to_cs = f10_dbam_to_chip_select, + .prep_chip_select = fmisc_prep_chip_selects, + .display_misc_regs = __dump_misc_regs, + .populate_csrows = init_csrows, +}; + +static const struct low_ops f15_ops = { + .early_channel_count = f1x_early_channel_count, + .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow, + .dbam_to_cs = f15_dbam_to_chip_select, + .prep_chip_select = fmisc_prep_chip_selects, + .display_misc_regs = __dump_misc_regs, + .populate_csrows = init_csrows, +}; + +static const struct low_ops f15m30_ops = { + .early_channel_count = f1x_early_channel_count, + .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow, + .dbam_to_cs = f16_dbam_to_chip_select, + .prep_chip_select = f15m30_prep_chip_selects, + .display_misc_regs = __dump_misc_regs, + .populate_csrows = init_csrows, +}; + +static const struct low_ops f16_x_ops = { + .early_channel_count 
= f1x_early_channel_count, + .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow, + .dbam_to_cs = f15_m60h_dbam_to_chip_select, + .prep_chip_select = fmisc_prep_chip_selects, + .display_misc_regs = __dump_misc_regs, + .populate_csrows = init_csrows, +}; + +static const struct low_ops f16_ops = { + .early_channel_count = f1x_early_channel_count, + .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow, + .dbam_to_cs = f16_dbam_to_chip_select, + .prep_chip_select = fmisc_prep_chip_selects, + .display_misc_regs = __dump_misc_regs, + .populate_csrows = init_csrows, +}; + +static const struct low_ops f17_ops = { + .early_channel_count = f17_early_channel_count, + .dbam_to_cs = f17_addr_mask_to_cs_size, + .prep_chip_select = f17_prep_chip_selects, + .get_base_mask = read_umc_base_mask, + .display_misc_regs = __dump_misc_regs_df, + .get_mc_regs = __read_mc_regs_df, + .populate_csrows = init_csrows_df, + .get_umc_err_info = find_umc_channel, +}; + static struct amd64_family_type family_types[] = { [K8_CPUS] = { .ctl_name = "K8", .f1_id = PCI_DEVICE_ID_AMD_K8_NB_ADDRMAP, .f2_id = PCI_DEVICE_ID_AMD_K8_NB_MEMCTL, .max_mcs = 2, - .ops = { - .early_channel_count = k8_early_channel_count, - .map_sysaddr_to_csrow = k8_map_sysaddr_to_csrow, - .dbam_to_cs = k8_dbam_to_chip_select, - } + .ops = k8_ops, }, [F10_CPUS] = { .ctl_name = "F10h", .f1_id = PCI_DEVICE_ID_AMD_10H_NB_MAP, .f2_id = PCI_DEVICE_ID_AMD_10H_NB_DRAM, .max_mcs = 2, - .ops = { - .early_channel_count = f1x_early_channel_count, - .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow, - .dbam_to_cs = f10_dbam_to_chip_select, - } + .ops = f10_ops, }, [F15_CPUS] = { .ctl_name = "F15h", .f1_id = PCI_DEVICE_ID_AMD_15H_NB_F1, .f2_id = PCI_DEVICE_ID_AMD_15H_NB_F2, .max_mcs = 2, - .ops = { - .early_channel_count = f1x_early_channel_count, - .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow, - .dbam_to_cs = f15_dbam_to_chip_select, - } + .ops = f15_ops, }, [F15_M30H_CPUS] = { .ctl_name = "F15h_M30h", .f1_id = PCI_DEVICE_ID_AMD_15H_M30H_NB_F1, 
.f2_id = PCI_DEVICE_ID_AMD_15H_M30H_NB_F2, .max_mcs = 2, - .ops = { - .early_channel_count = f1x_early_channel_count, - .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow, - .dbam_to_cs = f16_dbam_to_chip_select, - } + .ops = f15m30_ops, }, [F15_M60H_CPUS] = { .ctl_name = "F15h_M60h", .f1_id = PCI_DEVICE_ID_AMD_15H_M60H_NB_F1, .f2_id = PCI_DEVICE_ID_AMD_15H_M60H_NB_F2, .max_mcs = 2, - .ops = { - .early_channel_count = f1x_early_channel_count, - .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow, - .dbam_to_cs = f15_m60h_dbam_to_chip_select, - } + .ops = f16_x_ops, }, [F16_CPUS] = { .ctl_name = "F16h", .f1_id = PCI_DEVICE_ID_AMD_16H_NB_F1, .f2_id = PCI_DEVICE_ID_AMD_16H_NB_F2, .max_mcs = 2, - .ops = { - .early_channel_count = f1x_early_channel_count, - .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow, - .dbam_to_cs = f16_dbam_to_chip_select, - } + .ops = f16_ops, }, [F16_M30H_CPUS] = { .ctl_name = "F16h_M30h", .f1_id = PCI_DEVICE_ID_AMD_16H_M30H_NB_F1, .f2_id = PCI_DEVICE_ID_AMD_16H_M30H_NB_F2, .max_mcs = 2, - .ops = { - .early_channel_count = f1x_early_channel_count, - .map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow, - .dbam_to_cs = f16_dbam_to_chip_select, - } + .ops = f16_ops, }, [F17_CPUS] = { .ctl_name = "F17h", .f0_id = PCI_DEVICE_ID_AMD_17H_DF_F0, .f6_id = PCI_DEVICE_ID_AMD_17H_DF_F6, .max_mcs = 2, - .ops = { - .early_channel_count = f17_early_channel_count, - .dbam_to_cs = f17_addr_mask_to_cs_size, - } + .ops = f17_ops, }, [F17_M10H_CPUS] = { .ctl_name = "F17h_M10h", .f0_id = PCI_DEVICE_ID_AMD_17H_M10H_DF_F0, .f6_id = PCI_DEVICE_ID_AMD_17H_M10H_DF_F6, .max_mcs = 2, - .ops = { - .early_channel_count = f17_early_channel_count, - .dbam_to_cs = f17_addr_mask_to_cs_size, - } + .ops = f17_ops, }, [F17_M30H_CPUS] = { .ctl_name = "F17h_M30h", .f0_id = PCI_DEVICE_ID_AMD_17H_M30H_DF_F0, .f6_id = PCI_DEVICE_ID_AMD_17H_M30H_DF_F6, .max_mcs = 8, - .ops = { - .early_channel_count = f17_early_channel_count, - .dbam_to_cs = f17_addr_mask_to_cs_size, - } + .ops = f17_ops, }, 
[F17_M60H_CPUS] = { .ctl_name = "F17h_M60h", .f0_id = PCI_DEVICE_ID_AMD_17H_M60H_DF_F0, .f6_id = PCI_DEVICE_ID_AMD_17H_M60H_DF_F6, .max_mcs = 2, - .ops = { - .early_channel_count = f17_early_channel_count, - .dbam_to_cs = f17_addr_mask_to_cs_size, - } + .ops = f17_ops, }, [F17_M70H_CPUS] = { .ctl_name = "F17h_M70h", .f0_id = PCI_DEVICE_ID_AMD_17H_M70H_DF_F0, .f6_id = PCI_DEVICE_ID_AMD_17H_M70H_DF_F6, .max_mcs = 2, - .ops = { - .early_channel_count = f17_early_channel_count, - .dbam_to_cs = f17_addr_mask_to_cs_size, - } + .ops = f17_ops, }, [F19_CPUS] = { .ctl_name = "F19h", .f0_id = PCI_DEVICE_ID_AMD_19H_DF_F0, .f6_id = PCI_DEVICE_ID_AMD_19H_DF_F6, .max_mcs = 8, - .ops = { - .early_channel_count = f17_early_channel_count, - .dbam_to_cs = f17_addr_mask_to_cs_size, - } + .ops = f17_ops, }, }; @@ -2900,9 +2937,10 @@ static inline void decode_bus_error(int node_id, struct mce *m) * the instance_id. For example, instance_id=0xYXXXXX where Y is the channel * number. */ -static int find_umc_channel(struct mce *m) +static void find_umc_channel(struct mce *m, struct err_info *err) { - return (m->ipid & GENMASK(31, 0)) >> 20; + err->channel = (m->ipid & GENMASK(31, 0)) >> 20; + err->csrow = m->synd & 0x7; } static void decode_umc_error(int node_id, struct mce *m) @@ -2924,7 +2962,7 @@ static void decode_umc_error(int node_id, struct mce *m) if (m->status & MCI_STATUS_DEFERRED) ecc_type = 3; - err.channel = find_umc_channel(m); + pvt->ops->get_umc_err_info(m, &err); if (!(m->status & MCI_STATUS_SYNDV)) { err.err_code = ERR_SYND; @@ -2940,8 +2978,6 @@ static void decode_umc_error(int node_id, struct mce *m) err.err_code = ERR_CHANNEL; } - err.csrow = m->synd & 0x7; - if (umc_normaddr_to_sysaddr(m->addr, pvt->mc_node_id, err.channel, &sys_addr)) { err.err_code = ERR_NORM_ADDR; goto log_error; @@ -3106,8 +3142,9 @@ static void read_mc_regs(struct amd64_pvt *pvt) edac_dbg(0, " TOP_MEM2 disabled\n"); } - if (pvt->umc) { - __read_mc_regs_df(pvt); + if (pvt->ops->get_mc_regs) { + 
pvt->ops->get_mc_regs(pvt); + amd64_read_pci_cfg(pvt->F0, DF_DHAR, &pvt->dhar); goto skip; @@ -3277,9 +3314,6 @@ static int init_csrows(struct mem_ctl_info *mci) int nr_pages = 0; u32 val; - if (pvt->umc) - return init_csrows_df(mci); - amd64_read_pci_cfg(pvt->F3, NBCFG, &val); pvt->nbcfg = val; @@ -3703,6 +3737,17 @@ static struct amd64_family_type *per_family_init(struct amd64_pvt *pvt) return NULL; } + /* ops required for all the families */ + if (!pvt->ops->early_channel_count | !pvt->ops->prep_chip_select | + !pvt->ops->display_misc_regs | !pvt->ops->dbam_to_cs | + !pvt->ops->populate_csrows) + return NULL; + + /* ops required for families 17h and later */ + if (pvt->fam >= 0x17 && (!pvt->ops->get_base_mask | + !pvt->ops->get_umc_err_info | !pvt->ops->get_mc_regs)) + return NULL; + return fam_type; } @@ -3786,7 +3831,7 @@ static int init_one_instance(struct amd64_pvt *pvt) setup_mci_misc_attrs(mci); - if (init_csrows(mci)) + if (pvt->ops->populate_csrows(mci)) mci->edac_cap = EDAC_FLAG_NONE; ret = -ENODEV; diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h index 85aa820bc165..ce21b3cf0825 100644 --- a/drivers/edac/amd64_edac.h +++ b/drivers/edac/amd64_edac.h @@ -472,6 +472,12 @@ struct low_ops { struct err_info *); int (*dbam_to_cs) (struct amd64_pvt *pvt, u8 dct, unsigned cs_mode, int cs_mask_nr); + void (*prep_chip_select)(struct amd64_pvt *pvt); + void (*get_base_mask)(struct amd64_pvt *pvt); + void (*display_misc_regs)(struct amd64_pvt *pvt); + void (*get_mc_regs)(struct amd64_pvt *pvt); + int (*populate_csrows)(struct mem_ctl_info *mci); + void (*get_umc_err_info)(struct mce *m, struct err_info *err); }; struct amd64_family_type { From patchwork Thu Oct 14 18:50:58 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naveen Krishna Chatradhi X-Patchwork-Id: 12559251 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on 
aws-us-west-2-korg-lkml-1.web.codeaurora.org
From: Naveen Krishna Chatradhi
To: ,
CC: , , , , , Naveen Krishna Chatradhi , Muralidhara M K
Subject: [PATCH 4/4] EDAC/amd64: Enumerate memory on Aldebaran GPU nodes
Date: Fri, 15 Oct 2021 00:20:58 +0530
Message-ID: <20211014185058.9587-5-nchatrad@amd.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20211014185058.9587-1-nchatrad@amd.com>
References: <20210823185437.94417-1-nchatrad@amd.com> <20211014185058.9587-1-nchatrad@amd.com>
MIME-Version: 1.0
Precedence: bulk
X-Mailing-List: linux-edac@vger.kernel.org

On newer heterogeneous systems with AMD CPUs, the data fabrics of the
GPUs are connected directly via custom links.

On one such system, where Aldebaran GPU nodes are connected to
Family 19h, Model 30h CPU nodes, the Aldebaran GPUs can report memory
errors via SMCA banks.

Aldebaran GPU support was added to the DRM framework:
https://lists.freedesktop.org/archives/amd-gfx/2021-February/059694.html

The GPU nodes come with built-in HBM2 memory, ECC support is enabled by
default, and the UMCs on GPU nodes are different from the UMCs on CPU
nodes.

GPU-specific ops routines are defined to extend the amd64_edac module to
enumerate HBM memory, leveraging the existing EDAC and amd64-specific
data structures.
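
For illustration only, and not part of this patch: a minimal standalone
sketch of how the UMC register base differs between CPU and GPU nodes,
assuming the layout described in the get_umc_base_gpu() comment that this
patch adds to amd64_edac.h. The names umc_base_cpu(), umc_base_gpu() and
gpu_channel_from_ipid() are hypothetical; the arithmetic is meant to
mirror get_umc_base() (per its "chY: 0xY50000" comment),
get_umc_base_gpu() and the channel decode in find_umc_channel_gpu().

```c
#include <stdint.h>

/* CPU nodes: one channel per UMC, base is 0xY50000 for channel Y. */
static inline uint32_t umc_base_cpu(uint8_t channel)
{
	return 0x50000 + ((uint32_t)channel << 20);
}

/*
 * GPU nodes: eight channels per UMC. Each UMC takes two instance values
 * (channels 0-3 and channels 4-7), and the channel within that group is
 * selected in bits [15:12].
 */
static inline uint32_t umc_base_gpu(uint8_t umc, uint8_t channel)
{
	uint32_t inst = (uint32_t)umc * 2;

	if (channel >= 4)
		inst++;

	return 0x50000 + (inst << 20) + ((uint32_t)(channel % 4) << 12);
}

/*
 * Reverse direction: recover the channel number within a GPU UMC from the
 * low 32 bits of MCA_IPID. An odd instance value means channels 4-7.
 */
static inline uint8_t gpu_channel_from_ipid(uint64_t ipid)
{
	uint8_t inst = (uint8_t)((ipid & 0xffffffffULL) >> 20);
	uint8_t phy = (uint8_t)((ipid >> 12) & 0xf);

	return (inst % 2) ? (uint8_t)(phy + 4) : phy;
}
```

Under these assumptions, umc_base_gpu(0, 7) evaluates to 0x153000, and
gpu_channel_from_ipid(0x153000) maps that instance ID back to channel 7.
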
Note: The UMC Phys on GPU nodes are enumerated as csrows and the UMC
channels connected to HBM banks are enumerated as ranks.

Cc: Yazen Ghannam
Co-developed-by: Muralidhara M K
Signed-off-by: Muralidhara M K
Signed-off-by: Naveen Krishna Chatradhi
Link: https://lkml.kernel.org/r/20210823185437.94417-4-nchatrad@amd.com
---
Changes since v3:
1. Bifurcated the GPU code from v2

Changes since v2:
1. Restored line deletions and handled minor comments
2. Modified commit message and some of the function comments
3. Introduced variable df_inst_id instead of umc_num

Changes since v1:
1. Modified the commit message
2. Changed the edac_cap
3. Kept sizes of both cpu and noncpu together
4. Return success if the !F3 condition is true and remove unnecessary validation

 drivers/edac/amd64_edac.c | 233 +++++++++++++++++++++++++++++++++++++-
 drivers/edac/amd64_edac.h |  27 +++++
 2 files changed, 254 insertions(+), 6 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c index 131ed19f69dd..7173310660a3 100644 --- a/drivers/edac/amd64_edac.c +++ b/drivers/edac/amd64_edac.c @@ -1123,6 +1123,20 @@ static void debug_display_dimm_sizes_df(struct amd64_pvt *pvt, u8 ctrl) } } +static void debug_display_dimm_sizes_gpu(struct amd64_pvt *pvt, u8 ctrl) +{ + int size, cs = 0, cs_mode; + + edac_printk(KERN_DEBUG, EDAC_MC, "UMC%d chip selects:\n", ctrl); + + cs_mode = CS_EVEN_PRIMARY | CS_ODD_PRIMARY; + + for_each_chip_select(cs, ctrl, pvt) { + size = pvt->ops->dbam_to_cs(pvt, ctrl, cs_mode, cs); + amd64_info(EDAC_MC ": %d: %5dMB\n", cs, size); + } +} + static void __dump_misc_regs_df(struct amd64_pvt *pvt) { struct amd64_umc *umc; @@ -1167,6 +1181,27 @@ static void __dump_misc_regs_df(struct amd64_pvt *pvt) pvt->dhar, dhar_base(pvt)); } +static void __dump_misc_regs_gpu(struct amd64_pvt *pvt) +{ + struct amd64_umc *umc; + u32 i, umc_base; + + for_each_umc(i) { + umc_base = get_umc_base(i); + umc = &pvt->umc[i]; + + edac_dbg(1, "UMC%d UMC cfg: 0x%x\n", i, umc->umc_cfg); +
edac_dbg(1, "UMC%d SDP ctrl: 0x%x\n", i, umc->sdp_ctrl); + edac_dbg(1, "UMC%d ECC ctrl: 0x%x\n", i, umc->ecc_ctrl); + edac_dbg(1, "UMC%d All HBMs support ECC: yes\n", i); + + debug_display_dimm_sizes_gpu(pvt, i); + } + + edac_dbg(1, "F0x104 (DRAM Hole Address): 0x%08x, base: 0x%08x\n", + pvt->dhar, dhar_base(pvt)); +} + /* Display and decode various NB registers for debug purposes. */ static void __dump_misc_regs(struct amd64_pvt *pvt) { @@ -1242,6 +1277,43 @@ static void f17_prep_chip_selects(struct amd64_pvt *pvt) } } +static void gpu_prep_chip_selects(struct amd64_pvt *pvt) +{ + int umc; + + for_each_umc(umc) { + pvt->csels[umc].b_cnt = 8; + pvt->csels[umc].m_cnt = 8; + } +} + +static void read_umc_base_mask_gpu(struct amd64_pvt *pvt) +{ + u32 base_reg, mask_reg; + u32 *base, *mask; + int umc, cs; + + for_each_umc(umc) { + for_each_chip_select(cs, umc, pvt) { + base_reg = get_umc_base_gpu(umc, cs) + UMCCH_BASE_ADDR; + base = &pvt->csels[umc].csbases[cs]; + + if (!amd_smn_read(pvt->mc_node_id, base_reg, base)) { + edac_dbg(0, " DCSB%d[%d]=0x%08x reg: 0x%x\n", + umc, cs, *base, base_reg); + } + + mask_reg = get_umc_base_gpu(umc, cs) + UMCCH_ADDR_MASK; + mask = &pvt->csels[umc].csmasks[cs]; + + if (!amd_smn_read(pvt->mc_node_id, mask_reg, mask)) { + edac_dbg(0, " DCSM%d[%d]=0x%08x reg: 0x%x\n", + umc, cs, *mask, mask_reg); + } + } + } +} + static void read_umc_base_mask(struct amd64_pvt *pvt) { u32 umc_base_reg, umc_base_reg_sec; @@ -1745,6 +1817,19 @@ static int f17_early_channel_count(struct amd64_pvt *pvt) return channels; } +static int gpu_early_channel_count(struct amd64_pvt *pvt) +{ + int i, channels = 0; + + /* The memory channels in case of GPUs are fully populated */ + for_each_umc(i) + channels += pvt->csels[i].b_cnt; + + amd64_info("MCT channel count: %d\n", channels); + + return channels; +} + static int ddr3_cs_size(unsigned i, bool dct_width) { unsigned shift = 0; @@ -1942,6 +2027,14 @@ static int f17_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 
umc, return __addr_mask_to_cs_size(addr_mask_orig, cs_mode, csrow_nr, dimm); } +static int gpu_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc, + unsigned int cs_mode, int csrow_nr) +{ + u32 addr_mask_orig = pvt->csels[umc].csmasks[csrow_nr]; + + return __addr_mask_to_cs_size(addr_mask_orig, cs_mode, csrow_nr, csrow_nr >> 1); +} + static void read_dram_ctl_register(struct amd64_pvt *pvt) { @@ -2527,8 +2620,11 @@ static void debug_display_dimm_sizes(struct amd64_pvt *pvt, u8 ctrl) /* Prototypes for family specific ops routines */ static int init_csrows(struct mem_ctl_info *mci); static int init_csrows_df(struct mem_ctl_info *mci); +static int init_csrows_gpu(struct mem_ctl_info *mci); static void __read_mc_regs_df(struct amd64_pvt *pvt); +static void __read_mc_regs_gpu(struct amd64_pvt *pvt); static void find_umc_channel(struct mce *m, struct err_info *err); +static void find_umc_channel_gpu(struct mce *m, struct err_info *err); static const struct low_ops k8_ops = { .early_channel_count = k8_early_channel_count, @@ -2595,6 +2691,17 @@ static const struct low_ops f17_ops = { .get_umc_err_info = find_umc_channel, }; +static const struct low_ops gpu_ops = { + .early_channel_count = gpu_early_channel_count, + .dbam_to_cs = gpu_addr_mask_to_cs_size, + .prep_chip_select = gpu_prep_chip_selects, + .get_base_mask = read_umc_base_mask_gpu, + .display_misc_regs = __dump_misc_regs_gpu, + .get_mc_regs = __read_mc_regs_gpu, + .populate_csrows = init_csrows_gpu, + .get_umc_err_info = find_umc_channel_gpu, +}; + static struct amd64_family_type family_types[] = { [K8_CPUS] = { .ctl_name = "K8", @@ -2687,6 +2794,14 @@ static struct amd64_family_type family_types[] = { .max_mcs = 8, .ops = f17_ops, }, + [ALDEBARAN_GPUS] = { + .ctl_name = "ALDEBARAN", + .f0_id = PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F0, + .f6_id = PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F6, + .max_mcs = 4, + .ops = gpu_ops, + }, + }; /* @@ -2943,12 +3058,38 @@ static void find_umc_channel(struct mce *m, struct err_info *err) 
err->csrow = m->synd & 0x7; } +/* + * The CPUs have one channel per UMC, So UMC number is equivalent to a + * channel number. The GPUs have 8 channels per UMC, so the UMC number no + * longer works as a channel number. + * The channel number within a GPU UMC is given in MCA_IPID[15:12]. + * However, the IDs are split such that two UMC values go to one UMC, and + * the channel numbers are split in two groups of four. + * + * Refer comment on get_umc_base_gpu() from amd64_edac.h + * + * For example, + * UMC0 CH[3:0] = 0x0005[3:0]000 + * UMC0 CH[7:4] = 0x0015[3:0]000 + * UMC1 CH[3:0] = 0x0025[3:0]000 + * UMC1 CH[7:4] = 0x0035[3:0]000 + */ +static void find_umc_channel_gpu(struct mce *m, struct err_info *err) +{ + u8 ch = (m->ipid & GENMASK(31, 0)) >> 20; + u8 phy = ((m->ipid >> 12) & 0xf); + + err->channel = ch % 2 ? phy + 4 : phy; + err->csrow = phy; +} + static void decode_umc_error(int node_id, struct mce *m) { u8 ecc_type = (m->status >> 45) & 0x3; struct mem_ctl_info *mci; struct amd64_pvt *pvt; struct err_info err; + u8 df_inst_id; u64 sys_addr; mci = edac_mc_find(node_id); @@ -2978,7 +3119,17 @@ static void decode_umc_error(int node_id, struct mce *m) err.err_code = ERR_CHANNEL; } - if (umc_normaddr_to_sysaddr(m->addr, pvt->mc_node_id, err.channel, &sys_addr)) { + /* + * GPU node has #phys[X] which has #channels[Y] each. + * On GPUs, df_inst_id = [X] * num_ch_per_phy + [Y]. + * On CPUs, "Channel"="UMC Number"="DF Instance ID". 
+ */ + if (pvt->is_gpu) + df_inst_id = (err.csrow * pvt->channel_count / mci->nr_csrows) + err.channel; + else + df_inst_id = err.channel; + + if (umc_normaddr_to_sysaddr(m->addr, pvt->mc_node_id, df_inst_id, &sys_addr)) { err.err_code = ERR_NORM_ADDR; goto log_error; } @@ -3117,6 +3268,23 @@ static void __read_mc_regs_df(struct amd64_pvt *pvt) } } +static void __read_mc_regs_gpu(struct amd64_pvt *pvt) +{ + u8 nid = pvt->mc_node_id; + struct amd64_umc *umc; + u32 i, umc_base; + + /* Read registers from each UMC */ + for_each_umc(i) { + umc_base = get_umc_base_gpu(i, 0); + umc = &pvt->umc[i]; + + amd_smn_read(nid, umc_base + UMCCH_UMC_CFG, &umc->umc_cfg); + amd_smn_read(nid, umc_base + UMCCH_SDP_CTRL, &umc->sdp_ctrl); + amd_smn_read(nid, umc_base + UMCCH_ECC_CTRL, &umc->ecc_ctrl); + } +} + /* * Retrieve the hardware registers of the memory controller (this includes the * 'Address Map' and 'Misc' device regs) @@ -3196,7 +3364,9 @@ static void read_mc_regs(struct amd64_pvt *pvt) determine_memory_type(pvt); edac_dbg(1, " DIMM type: %s\n", edac_mem_types[pvt->dram_type]); - determine_ecc_sym_sz(pvt); + /* ECC symbol size is not available on GPU nodes */ + if (!pvt->is_gpu) + determine_ecc_sym_sz(pvt); } /* @@ -3243,7 +3413,10 @@ static u32 get_csrow_nr_pages(struct amd64_pvt *pvt, u8 dct, int csrow_nr_orig) csrow_nr >>= 1; cs_mode = DBAM_DIMM(csrow_nr, dbam); } else { - cs_mode = f17_get_cs_mode(csrow_nr >> 1, dct, pvt); + if (pvt->is_gpu) + cs_mode = CS_EVEN_PRIMARY | CS_ODD_PRIMARY; + else + cs_mode = f17_get_cs_mode(csrow_nr >> 1, dct, pvt); } nr_pages = pvt->ops->dbam_to_cs(pvt, dct, cs_mode, csrow_nr); @@ -3300,6 +3473,35 @@ static int init_csrows_df(struct mem_ctl_info *mci) return empty; } +static int init_csrows_gpu(struct mem_ctl_info *mci) +{ + struct amd64_pvt *pvt = mci->pvt_info; + struct dimm_info *dimm; + int empty = 1; + u8 umc, cs; + + for_each_umc(umc) { + for_each_chip_select(cs, umc, pvt) { + if (!csrow_enabled(cs, umc, pvt)) + continue; + + empty = 
0; + dimm = mci->csrows[umc]->channels[cs]->dimm; + + edac_dbg(1, "MC node: %d, csrow: %d\n", + pvt->mc_node_id, cs); + + dimm->nr_pages = get_csrow_nr_pages(pvt, umc, cs); + dimm->mtype = MEM_HBM2; + dimm->edac_mode = EDAC_SECDED; + dimm->dtype = DEV_X16; + dimm->grain = 64; + } + } + + return empty; +} + /* * Initialize the array of csrow attribute instances, based on the values * from pci config hardware registers. @@ -3541,6 +3743,10 @@ static bool ecc_enabled(struct amd64_pvt *pvt) u8 ecc_en = 0, i; u32 value; + /* ECC is enabled by default on GPU nodes */ + if (pvt->is_gpu) + return true; + if (boot_cpu_data.x86 >= 0x17) { u8 umc_en_mask = 0, ecc_en_mask = 0; struct amd64_umc *umc; @@ -3624,7 +3830,10 @@ static void setup_mci_misc_attrs(struct mem_ctl_info *mci) mci->edac_ctl_cap = EDAC_FLAG_NONE; if (pvt->umc) { - f17h_determine_edac_ctl_cap(mci, pvt); + if (pvt->is_gpu) + mci->edac_ctl_cap |= EDAC_FLAG_SECDED; + else + f17h_determine_edac_ctl_cap(mci, pvt); } else { if (pvt->nbcap & NBCAP_SECDED) mci->edac_ctl_cap |= EDAC_FLAG_SECDED; @@ -3726,6 +3935,17 @@ static struct amd64_family_type *per_family_init(struct amd64_pvt *pvt) pvt->ops = &family_types[F17_M70H_CPUS].ops; fam_type->ctl_name = "F19h_M20h"; break; + } else if (pvt->model >= 0x30 && pvt->model <= 0x3f) { + if (pvt->mc_node_id >= amd_cpu_node_count()) { + fam_type = &family_types[ALDEBARAN_GPUS]; + pvt->ops = &family_types[ALDEBARAN_GPUS].ops; + pvt->is_gpu = true; + } else { + fam_type = &family_types[F19_CPUS]; + pvt->ops = &family_types[F19_CPUS].ops; + fam_type->ctl_name = "F19h_M30h"; + } + break; } fam_type = &family_types[F19_CPUS]; pvt->ops = &family_types[F19_CPUS].ops; @@ -3808,9 +4028,10 @@ static int init_one_instance(struct amd64_pvt *pvt) if (pvt->channel_count < 0) return ret; + /* Define layers for CPU and GPU nodes */ ret = -ENOMEM; layers[0].type = EDAC_MC_LAYER_CHIP_SELECT; - layers[0].size = pvt->csels[0].b_cnt; + layers[0].size = pvt->is_gpu ? 
fam_type->max_mcs : pvt->csels[0].b_cnt; layers[0].is_virt_csrow = true; layers[1].type = EDAC_MC_LAYER_CHANNEL; @@ -3819,7 +4040,7 @@ static int init_one_instance(struct amd64_pvt *pvt) * only one channel. Also, this simplifies handling later for the price * of a couple of KBs tops. */ - layers[1].size = fam_type->max_mcs; + layers[1].size = pvt->is_gpu ? pvt->csels[0].b_cnt : fam_type->max_mcs; layers[1].is_virt_csrow = false; mci = edac_mc_alloc(pvt->mc_node_id, ARRAY_SIZE(layers), layers, 0); diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h index ce21b3cf0825..2dbf6fe14a55 100644 --- a/drivers/edac/amd64_edac.h +++ b/drivers/edac/amd64_edac.h @@ -126,6 +126,8 @@ #define PCI_DEVICE_ID_AMD_17H_M70H_DF_F6 0x1446 #define PCI_DEVICE_ID_AMD_19H_DF_F0 0x1650 #define PCI_DEVICE_ID_AMD_19H_DF_F6 0x1656 +#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F0 0x14d0 +#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F6 0x14d6 /* * Function 1 - Address Map @@ -298,6 +300,7 @@ enum amd_families { F17_M60H_CPUS, F17_M70H_CPUS, F19_CPUS, + ALDEBARAN_GPUS, NUM_FAMILIES, }; @@ -389,6 +392,8 @@ struct amd64_pvt { enum mem_type dram_type; struct amd64_umc *umc; /* UMC registers */ + + bool is_gpu; }; enum err_codes { @@ -410,6 +415,28 @@ struct err_info { u32 offset; }; +static inline u32 get_umc_base_gpu(u8 umc, u8 channel) +{ + /* + * On CPUs, there is one channel per UMC, so UMC numbering equals + * channel numbering. On GPUs, there are eight channels per UMC, + * so the channel numbering is different from UMC numbering. + * + * On CPU nodes channels are selected in 6th nibble + * UMC chY[3:0]= [(chY*2 + 1) : (chY*2)]50000; + * + * On GPU nodes channels are selected in 3rd nibble + * HBM chX[3:0]= [Y ]5X[3:0]000; + * HBM chX[7:4]= [Y+1]5X[3:0]000 + */ + umc *= 2; + + if (channel >= 4) + umc++; + + return 0x50000 + (umc << 20) + ((channel % 4) << 12); +} + static inline u32 get_umc_base(u8 channel) { /* chY: 0xY50000 */