From patchwork Thu Nov 23 00:35:09 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ankit Agrawal
X-Patchwork-Id: 13465625
Subject: [PATCH v2 1/4] mm: handle poisoning of pfn without struct pages
Date: Thu, 23 Nov 2023 06:05:09 +0530
Message-ID: <20231123003513.24292-2-ankita@nvidia.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20231123003513.24292-1-ankita@nvidia.com>
References: <20231123003513.24292-1-ankita@nvidia.com>
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org

From: Ankit Agrawal

The kernel MM currently does not handle ECC errors / poison on a memory
region that is not backed by struct pages. If a memory region is mapped
using remap_pfn_range(), but not added to the kernel, MM will not have
associated struct pages. Add a new mechanism to handle memory failure
on such memory.

Make kernel MM expose a function to allow modules managing the device
memory to register a failure function and the physical address space
associated with the device memory. MM maintains this information as an
interval tree. The registered memory failure function is used by MM to
notify the kernel module managing the PFN, so that the module may take
any required action. The module may, for example, use the information
to track the poisoned pages.

In this implementation, kernel MM follows a sequence that is (mostly)
similar to the memory_failure() handler for struct page backed memory:
1. memory_failure() is triggered on reception of a poison error. An
   absence of struct page is detected and consequently memory_failure_pfn()
   is executed.
2. memory_failure_pfn() calls the newly introduced failure handler exposed
   by the module managing the poisoned memory to notify it of the
   problematic PFN.
3. memory_failure_pfn() unmaps the stage-2 mapping to the PFN.
4. memory_failure_pfn() collects the processes mapped to the PFN.
5. memory_failure_pfn() sends SIGBUS (BUS_MCEERR_AO) to all the processes
   mapping the faulty PFN using kill_procs().
6. An access to the faulty PFN by an operation in the VM at a later point
   is trapped and user_mem_abort() is called.
7. The vma ops fault function gets called due to the absence of the
   stage-2 mapping. It is expected to return VM_FAULT_HWPOISON on the PFN.
8. __gfn_to_pfn_memslot() then returns KVM_PFN_ERR_HWPOISON, which causes
   the poison with SIGBUS (BUS_MCEERR_AR) to be sent to the QEMU process
   through kvm_send_hwpoison_signal().

Signed-off-by: Ankit Agrawal
---
 include/linux/memory-failure.h |  22 +++++
 include/linux/mm.h             |   1 +
 include/ras/ras_event.h        |   1 +
 mm/Kconfig                     |   1 +
 mm/memory-failure.c            | 146 +++++++++++++++++++++++++++------
 5 files changed, 146 insertions(+), 25 deletions(-)
 create mode 100644 include/linux/memory-failure.h

diff --git a/include/linux/memory-failure.h b/include/linux/memory-failure.h
new file mode 100644
index 000000000000..9a579960972a
--- /dev/null
+++ b/include/linux/memory-failure.h
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_MEMORY_FAILURE_H
+#define _LINUX_MEMORY_FAILURE_H
+
+#include
+
+struct pfn_address_space;
+
+struct pfn_address_space_ops {
+	void (*failure)(struct pfn_address_space *pfn_space, unsigned long pfn);
+};
+
+struct pfn_address_space {
+	struct interval_tree_node node;
+	const struct pfn_address_space_ops *ops;
+	struct address_space *mapping;
+};
+
+int register_pfn_address_space(struct pfn_address_space *pfn_space);
+void unregister_pfn_address_space(struct pfn_address_space *pfn_space);
+
+#endif /* _LINUX_MEMORY_FAILURE_H */
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 418d26608ece..82b90e890b4b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -4007,6 +4007,7 @@ enum mf_action_page_type {
 	MF_MSG_BUDDY,
 	MF_MSG_DAX,
 	MF_MSG_UNSPLIT_THP,
+	MF_MSG_PFN_MAP,
 	MF_MSG_UNKNOWN,
 };
 
diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
index cbd3ddd7c33d..05c3e6f6bd02 100644
--- a/include/ras/ras_event.h
+++ b/include/ras/ras_event.h
@@ -373,6 +373,7 @@ TRACE_EVENT(aer_event,
 	EM ( MF_MSG_BUDDY, "free buddy page" ) \
 	EM ( MF_MSG_DAX, "dax page" ) \
 	EM ( MF_MSG_UNSPLIT_THP, "unsplit thp" ) \
+	EM ( MF_MSG_PFN_MAP, "non struct page pfn" ) \
 	EMe ( MF_MSG_UNKNOWN, "unknown page" )
 
 /*
diff --git a/mm/Kconfig b/mm/Kconfig
index 89971a894b60..4f9533422887 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -774,6 +774,7 @@ config MEMORY_FAILURE
 	depends on ARCH_SUPPORTS_MEMORY_FAILURE
 	bool "Enable recovery from hardware memory errors"
 	select MEMORY_ISOLATION
+	select INTERVAL_TREE
 	select RAS
 	help
 	  Enables code to recover from some memory failures on systems
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 660c21859118..4f7672775486 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -38,6 +38,7 @@
 
 #include
 #include
+#include
 #include
 #include
 #include
@@ -60,6 +61,7 @@
 #include
 #include
 #include
+#include
 #include "swap.h"
 #include "internal.h"
 #include "ras/ras_event.h"
@@ -144,6 +146,10 @@ static struct ctl_table memory_failure_table[] = {
 	{ }
 };
 
+static struct rb_root_cached pfn_space_itree = RB_ROOT_CACHED;
+
+static DEFINE_MUTEX(pfn_space_lock);
+
 /*
  * Return values:
  *   1:   the page is dissolved (if needed) and taken off from buddy,
@@ -422,15 +428,16 @@ static unsigned long dev_pagemap_mapping_shift(struct vm_area_struct *vma,
  * Schedule a process for later kill.
  * Uses GFP_ATOMIC allocations to avoid potential recursions in the VM.
  *
- * Note: @fsdax_pgoff is used only when @p is a fsdax page and a
- * filesystem with a memory failure handler has claimed the
- * memory_failure event. In all other cases, page->index and
- * page->mapping are sufficient for mapping the page back to its
- * corresponding user virtual address.
+ * Notice: @pgoff is used when:
+ * a. @p is a fsdax page and a filesystem with a memory failure handler
+ *    has claimed the memory_failure event.
+ * b. pgoff is not backed by struct page.
+ * In all other cases, page->index and page->mapping are sufficient
+ * for mapping the page back to its corresponding user virtual address.
  */
 static void __add_to_kill(struct task_struct *tsk, struct page *p,
 			  struct vm_area_struct *vma, struct list_head *to_kill,
-			  unsigned long ksm_addr, pgoff_t fsdax_pgoff)
+			  unsigned long ksm_addr, pgoff_t pgoff)
 {
 	struct to_kill *tk;
 
@@ -440,13 +447,19 @@ static void __add_to_kill(struct task_struct *tsk, struct page *p,
 		return;
 	}
 
-	tk->addr = ksm_addr ? ksm_addr : page_address_in_vma(p, vma);
-	if (is_zone_device_page(p)) {
-		if (fsdax_pgoff != FSDAX_INVALID_PGOFF)
-			tk->addr = vma_pgoff_address(fsdax_pgoff, 1, vma);
-		tk->size_shift = dev_pagemap_mapping_shift(vma, tk->addr);
-	} else
-		tk->size_shift = page_shift(compound_head(p));
+	/* Check for pgoff not backed by struct page */
+	if (!(pfn_valid(pgoff)) && (vma->vm_flags | PFN_MAP)) {
+		tk->addr = vma_pgoff_address(pgoff, 1, vma);
+		tk->size_shift = PAGE_SHIFT;
+	} else {
+		tk->addr = ksm_addr ? ksm_addr : page_address_in_vma(p, vma);
+		if (is_zone_device_page(p)) {
+			if (pgoff != FSDAX_INVALID_PGOFF)
+				tk->addr = vma_pgoff_address(pgoff, 1, vma);
+			tk->size_shift = dev_pagemap_mapping_shift(vma, tk->addr);
+		} else
+			tk->size_shift = page_shift(compound_head(p));
+	}
 
 	/*
 	 * Send SIGKILL if "tk->addr == -EFAULT". Also, as
@@ -459,8 +472,8 @@ static void __add_to_kill(struct task_struct *tsk, struct page *p,
 	 * has a mapping for the page.
 	 */
 	if (tk->addr == -EFAULT) {
-		pr_info("Unable to find user space address %lx in %s\n",
-			page_to_pfn(p), tsk->comm);
+		pr_info("Unable to find address %lx in %s\n",
+			pfn_valid(pgoff) ? page_to_pfn(p) : pgoff, tsk->comm);
 	} else if (tk->size_shift == 0) {
 		kfree(tk);
 		return;
@@ -666,8 +679,7 @@ static void collect_procs_file(struct page *page, struct list_head *to_kill,
 	i_mmap_unlock_read(mapping);
 }
 
-#ifdef CONFIG_FS_DAX
-static void add_to_kill_fsdax(struct task_struct *tsk, struct page *p,
+static void add_to_kill_pgoff(struct task_struct *tsk, struct page *p,
 			      struct vm_area_struct *vma,
 			      struct list_head *to_kill, pgoff_t pgoff)
 {
@@ -675,11 +687,12 @@ static void add_to_kill_fsdax(struct task_struct *tsk, struct page *p,
 }
 
 /*
- * Collect processes when the error hit a fsdax page.
+ * Collect processes when the error hit a fsdax page or a PFN not backed by
+ * struct page.
  */
-static void collect_procs_fsdax(struct page *page,
-		struct address_space *mapping, pgoff_t pgoff,
-		struct list_head *to_kill)
+static void collect_procs_pgoff(struct page *page,
+				struct address_space *mapping, pgoff_t pgoff,
+				struct list_head *to_kill)
 {
 	struct vm_area_struct *vma;
 	struct task_struct *tsk;
@@ -693,13 +706,12 @@ static void collect_procs_fsdax(struct page *page,
 			continue;
 		vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff) {
 			if (vma->vm_mm == t->mm)
-				add_to_kill_fsdax(t, page, vma, to_kill, pgoff);
+				add_to_kill_pgoff(t, page, vma, to_kill, pgoff);
 		}
 	}
 	rcu_read_unlock();
 	i_mmap_unlock_read(mapping);
 }
-#endif /* CONFIG_FS_DAX */
 
 /*
  * Collect the processes who have the corrupted page mapped to kill.
@@ -893,6 +905,7 @@ static const char * const action_page_types[] = {
 	[MF_MSG_BUDDY]			= "free buddy page",
 	[MF_MSG_DAX]			= "dax page",
 	[MF_MSG_UNSPLIT_THP]		= "unsplit thp",
+	[MF_MSG_PFN_MAP]		= "non struct page pfn",
 	[MF_MSG_UNKNOWN]		= "unknown page",
 };
 
@@ -1324,7 +1337,8 @@ static int action_result(unsigned long pfn, enum mf_action_page_type type,
 
 	num_poisoned_pages_inc(pfn);
 
-	update_per_node_mf_stats(pfn, result);
+	if (type != MF_MSG_PFN_MAP)
+		update_per_node_mf_stats(pfn, result);
 
 	pr_err("%#lx: recovery action for %s: %s\n",
 		pfn, action_page_types[type], action_name[result]);
@@ -1808,7 +1822,7 @@ int mf_dax_kill_procs(struct address_space *mapping, pgoff_t index,
 
 		SetPageHWPoison(page);
 
-		collect_procs_fsdax(page, mapping, index, &to_kill);
+		collect_procs_pgoff(page, mapping, index, &to_kill);
 		unmap_and_kill(&to_kill, page_to_pfn(page), mapping,
 				index, mf_flags);
 unlock:
@@ -2147,6 +2161,83 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
 	return rc;
 }
 
+int register_pfn_address_space(struct pfn_address_space *pfn_space)
+{
+	if (!pfn_space)
+		return -EINVAL;
+
+	if (!request_mem_region(pfn_space->node.start << PAGE_SHIFT,
+		(pfn_space->node.last - pfn_space->node.start + 1) << PAGE_SHIFT, ""))
+		return -EBUSY;
+
+	mutex_lock(&pfn_space_lock);
+	interval_tree_insert(&pfn_space->node, &pfn_space_itree);
+	mutex_unlock(&pfn_space_lock);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(register_pfn_address_space);
+
+void unregister_pfn_address_space(struct pfn_address_space *pfn_space)
+{
+	if (!pfn_space)
+		return;
+
+	mutex_lock(&pfn_space_lock);
+	interval_tree_remove(&pfn_space->node, &pfn_space_itree);
+	mutex_unlock(&pfn_space_lock);
+	release_mem_region(pfn_space->node.start << PAGE_SHIFT,
+		(pfn_space->node.last - pfn_space->node.start + 1) << PAGE_SHIFT);
+}
+EXPORT_SYMBOL_GPL(unregister_pfn_address_space);
+
+static int memory_failure_pfn(unsigned long pfn, int flags)
+{
+	struct interval_tree_node *node;
+	int res = MF_FAILED;
+	LIST_HEAD(tokill);
+
+	mutex_lock(&pfn_space_lock);
+	/*
+	 * Modules registers with MM the address space mapping to the device memory they
+	 * manage. Iterate to identify exactly which address space has mapped to this
+	 * failing PFN.
+	 */
+	for (node = interval_tree_iter_first(&pfn_space_itree, pfn, pfn); node;
+	     node = interval_tree_iter_next(node, pfn, pfn)) {
+		struct pfn_address_space *pfn_space =
+			container_of(node, struct pfn_address_space, node);
+		/*
+		 * Modules managing the device memory need to be conveyed about the
+		 * memory failure so that the poisoned PFN can be tracked.
+		 */
+		if (pfn_space->ops)
+			pfn_space->ops->failure(pfn_space, pfn);
+
+		collect_procs_pgoff(NULL, pfn_space->mapping, pfn, &tokill);
+
+		unmap_mapping_range(pfn_space->mapping, pfn << PAGE_SHIFT,
+				    PAGE_SIZE, 0);
+
+		res = MF_RECOVERED;
+	}
+	mutex_unlock(&pfn_space_lock);
+
+	if (res == MF_FAILED)
+		return action_result(pfn, MF_MSG_PFN_MAP, res);
+
+	/*
+	 * Unlike System-RAM there is no possibility to swap in a different
+	 * physical page at a given virtual address, so all userspace
+	 * consumption of direct PFN memory necessitates SIGBUS (i.e.
+	 * MF_MUST_KILL)
+	 */
+	flags |= MF_ACTION_REQUIRED | MF_MUST_KILL;
+	kill_procs(&tokill, true, false, pfn, flags);
+
+	return action_result(pfn, MF_MSG_PFN_MAP, MF_RECOVERED);
+}
+
 /**
  * memory_failure - Handle memory failure of a page.
  * @pfn: Page Number of the corrupted page
@@ -2186,6 +2277,11 @@ int memory_failure(unsigned long pfn, int flags)
 	if (!(flags & MF_SW_SIMULATED))
 		hw_memory_failure = true;
 
+	if (!pfn_valid(pfn) && !arch_is_platform_page(PFN_PHYS(pfn))) {
+		res = memory_failure_pfn(pfn, flags);
+		goto unlock_mutex;
+	}
+
 	p = pfn_to_online_page(pfn);
 	if (!p) {
 		res = arch_memory_failure(pfn, flags);

From patchwork Thu Nov 23 00:35:10 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ankit Agrawal
X-Patchwork-Id: 13465626
Subject: [PATCH v2 2/4] mm: Add poison error check in fixup_user_fault() for mapped pfn
Date: Thu, 23 Nov 2023 06:05:10 +0530
Message-ID: <20231123003513.24292-3-ankita@nvidia.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20231123003513.24292-1-ankita@nvidia.com>
References: <20231123003513.24292-1-ankita@nvidia.com>
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org

From: Ankit Agrawal

The fixup_user_fault() currently does not expect a VM_FAULT_HWPOISON and
hence does not check for it while calling vm_fault_to_errno(). Since we
now have a new code path which can trigger such a case, change
fixup_user_fault() to look for VM_FAULT_HWPOISON.

Also make hva_to_pfn() check the result of hva_to_pfn_remapped() for
-EHWPOISON and communicate the poison fault up to user_mem_abort().

Signed-off-by: Ankit Agrawal
---
 mm/gup.c            | 2 +-
 virt/kvm/kvm_main.c | 6 ++++++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/mm/gup.c b/mm/gup.c
index 231711efa390..b78af20a0f52 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1414,7 +1414,7 @@ int fixup_user_fault(struct mm_struct *mm,
 	}
 
 	if (ret & VM_FAULT_ERROR) {
-		int err = vm_fault_to_errno(ret, 0);
+		int err = vm_fault_to_errno(ret, FOLL_HWPOISON);
 
 		if (err)
 			return err;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 486800a7024b..2ff067f21a7c 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2731,6 +2731,12 @@ kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool interruptible,
 		r = hva_to_pfn_remapped(vma, addr, write_fault, writable, &pfn);
 		if (r == -EAGAIN)
 			goto retry;
+
+		if (r == -EHWPOISON) {
+			pfn = KVM_PFN_ERR_HWPOISON;
+			goto exit;
+		}
+
 		if (r < 0)
 			pfn = KVM_PFN_ERR_FAULT;
 	} else {

From patchwork Thu Nov 23 00:35:11 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ankit Agrawal
X-Patchwork-Id: 13465627
Subject: [PATCH v2 3/4] mm: Change ghes code to allow poison of non-struct pfn
Date: Thu, 23 Nov 2023 06:05:11 +0530
Message-ID: <20231123003513.24292-4-ankita@nvidia.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20231123003513.24292-1-ankita@nvidia.com>
References: <20231123003513.24292-1-ankita@nvidia.com>
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org
MIME-Version: 1.0

From: Ankit Agrawal

The GHES code calls memory_failure() only on PFNs that pass the
pfn_valid() check. This contract breaks for remapped PFNs: they fail the
check, so ghes_do_memory_failure() returns without triggering
memory_failure().

Update the code to allow the memory_failure() call on PFNs that fail
pfn_valid().

Signed-off-by: Ankit Agrawal
---
 drivers/acpi/apei/ghes.c | 12 +-----------
 1 file changed, 1 insertion(+), 11 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 63ad0541db38..0ca6ab9fccbe 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -471,20 +471,10 @@ static void ghes_kick_task_work(struct callback_head *head)
 
 static bool ghes_do_memory_failure(u64 physical_addr, int flags)
 {
-	unsigned long pfn;
-
 	if (!IS_ENABLED(CONFIG_ACPI_APEI_MEMORY_FAILURE))
 		return false;
 
-	pfn = PHYS_PFN(physical_addr);
-	if (!pfn_valid(pfn) && !arch_is_platform_page(physical_addr)) {
-		pr_warn_ratelimited(FW_WARN GHES_PFX
-		"Invalid address in generic error data: %#llx\n",
-		physical_addr);
-		return false;
-	}
-
-	memory_failure_queue(pfn, flags);
+	memory_failure_queue(PHYS_PFN(physical_addr), flags);
 	return true;
 }
 

From patchwork Thu Nov 23 00:35:12 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ankit Agrawal
X-Patchwork-Id: 13465628
Subject: [PATCH v2 4/4] vfio/nvgpu: register device memory for poison handling
Date: Thu, 23 Nov 2023 06:05:12 +0530
Message-ID: <20231123003513.24292-5-ankita@nvidia.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20231123003513.24292-1-ankita@nvidia.com>
References: <20231123003513.24292-1-ankita@nvidia.com>
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org

From: Ankit Agrawal

The nvgrace-gpu-vfio-pci module [1] maps the device memory to the user VA
(QEMU) using remap_pfn_range() without adding the memory to the kernel.
The device memory pages are not backed by struct page. Patches 1-3
implement the mechanism to handle ECC/poison on memory pages without
struct page and expose a registration function. This new mechanism is
leveraged here.

The module registers its memory region with the kernel MM for ECC
handling using the register_pfn_address_space() registration API exposed
by the kernel. It also defines a failure callback function,
pfn_memory_failure(), to get the poisoned PFN from the MM.

The module tracks poisoned PFNs using a hashtable. The kernel MM
communicates the PFN to the module through the failure function, which
pushes the appropriate memory offset to the hashtable.

The module also defines VMA fault ops. The fault handler returns
VM_FAULT_HWPOISON if the memory offset is found in the hashtable.

[1] https://lore.kernel.org/all/20231114081611.30550-1-ankita@nvidia.com/

Signed-off-by: Ankit Agrawal
---
 drivers/vfio/pci/nvgrace-gpu/main.c | 123 +++++++++++++++++++++++++++-
 drivers/vfio/vfio_main.c            |   3 +-
 2 files changed, 124 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/pci/nvgrace-gpu/main.c b/drivers/vfio/pci/nvgrace-gpu/main.c
index b8634974e5cc..5a567375bd14 100644
--- a/drivers/vfio/pci/nvgrace-gpu/main.c
+++ b/drivers/vfio/pci/nvgrace-gpu/main.c
@@ -6,6 +6,16 @@
 #include
 #include
 #include
+#ifdef CONFIG_MEMORY_FAILURE
+#include
+#include
+#include
+#endif
+
+struct h_node {
+	unsigned long mem_offset;
+	struct hlist_node node;
+};
 
 struct nvgrace_gpu_vfio_pci_core_device {
 	struct vfio_pci_core_device core_device;
@@ -13,8 +23,96 @@ struct nvgrace_gpu_vfio_pci_core_device {
 	size_t memlength;
 	void *memmap;
 	struct mutex memmap_lock;
+#ifdef CONFIG_MEMORY_FAILURE
+	struct pfn_address_space pfn_address_space;
+	DECLARE_HASHTABLE(htbl, 8);
+#endif
+};
+
+#ifdef CONFIG_MEMORY_FAILURE
+static void
+nvgrace_gpu_vfio_pci_pfn_memory_failure(struct pfn_address_space *pfn_space,
+					unsigned long pfn)
+{
+	struct nvgrace_gpu_vfio_pci_core_device *nvdev = container_of(
+		pfn_space, struct nvgrace_gpu_vfio_pci_core_device, pfn_address_space);
+	unsigned long mem_offset = pfn - pfn_space->node.start;
+	struct h_node *ecc;
+
+	if (mem_offset >= (nvdev->memlength >> PAGE_SHIFT))
+		return;
+
+	/*
+	 * MM has called to notify a poisoned page. Track that in the hashtable.
+	 */
+	ecc = (struct h_node *)(vzalloc(sizeof(struct h_node)));
+	ecc->mem_offset = mem_offset;
+	hash_add(nvdev->htbl, &(ecc->node), ecc->mem_offset);
+}
+
+struct pfn_address_space_ops nvgrace_gpu_vfio_pci_pas_ops = {
+	.failure = nvgrace_gpu_vfio_pci_pfn_memory_failure,
 };
 
+static int
+nvgrace_gpu_vfio_pci_register_pfn_range(struct nvgrace_gpu_vfio_pci_core_device *nvdev,
+					struct vm_area_struct *vma)
+{
+	unsigned long nr_pages;
+	int ret = 0;
+
+	nr_pages = nvdev->memlength >> PAGE_SHIFT;
+
+	nvdev->pfn_address_space.node.start = vma->vm_pgoff;
+	nvdev->pfn_address_space.node.last = vma->vm_pgoff + nr_pages - 1;
+	nvdev->pfn_address_space.ops = &nvgrace_gpu_vfio_pci_pas_ops;
+	nvdev->pfn_address_space.mapping = vma->vm_file->f_mapping;
+
+	ret = register_pfn_address_space(&(nvdev->pfn_address_space));
+
+	return ret;
+}
+
+extern struct vfio_device *vfio_device_from_file(struct file *file);
+
+static vm_fault_t nvgrace_gpu_vfio_pci_fault(struct vm_fault *vmf)
+{
+	unsigned long mem_offset = vmf->pgoff - vmf->vma->vm_pgoff;
+	struct vfio_device *core_vdev;
+	struct nvgrace_gpu_vfio_pci_core_device *nvdev;
+	bool found = false;
+	struct h_node *cur;
+
+	if (!(vmf->vma->vm_file))
+		goto error_exit;
+
+	core_vdev = vfio_device_from_file(vmf->vma->vm_file);
+
+	if (!core_vdev)
+		goto error_exit;
+
+	nvdev = container_of(core_vdev,
+			struct nvgrace_gpu_vfio_pci_core_device, core_device.vdev);
+
+	if (mem_offset < (nvdev->memlength >> PAGE_SHIFT)) {
+		/*
+		 * Check if the page is poisoned.
+		 */
+		hash_for_each_possible(nvdev->htbl, cur, node, mem_offset) {
+			if (cur->mem_offset == mem_offset)
+				return VM_FAULT_HWPOISON;
+		}
+	}
+
+error_exit:
+	return VM_FAULT_ERROR;
+}
+
+static const struct vm_operations_struct nvgrace_gpu_vfio_pci_mmap_ops = {
+	.fault = nvgrace_gpu_vfio_pci_fault,
+};
+#endif
+
 static int nvgrace_gpu_vfio_pci_open_device(struct vfio_device *core_vdev)
 {
 	struct vfio_pci_core_device *vdev =
@@ -46,6 +144,9 @@ static void nvgrace_gpu_vfio_pci_close_device(struct vfio_device *core_vdev)
 
 	mutex_destroy(&nvdev->memmap_lock);
 
+#ifdef CONFIG_MEMORY_FAILURE
+	unregister_pfn_address_space(&(nvdev->pfn_address_space));
+#endif
 	vfio_pci_core_close_device(core_vdev);
 }
 
@@ -103,8 +204,12 @@ static int nvgrace_gpu_vfio_pci_mmap(struct vfio_device *core_vdev,
 		return ret;
 
 	vma->vm_pgoff = start_pfn;
+#ifdef CONFIG_MEMORY_FAILURE
+	vma->vm_ops = &nvgrace_gpu_vfio_pci_mmap_ops;
 
-	return 0;
+	ret = nvgrace_gpu_vfio_pci_register_pfn_range(nvdev, vma);
+#endif
+	return ret;
 }
 
 static long
@@ -413,6 +518,12 @@ nvgrace_gpu_vfio_pci_fetch_memory_property(struct pci_dev *pdev,
 
 	nvdev->memlength = memlength;
 
+#ifdef CONFIG_MEMORY_FAILURE
+	/*
+	 * Initialize the hashtable tracking the poisoned pages.
+	 */
+	hash_init(nvdev->htbl);
+#endif
 	return ret;
 }
 
@@ -448,6 +559,16 @@ static void nvgrace_gpu_vfio_pci_remove(struct pci_dev *pdev)
 {
 	struct nvgrace_gpu_vfio_pci_core_device *nvdev = nvgrace_gpu_drvdata(pdev);
 	struct vfio_pci_core_device *vdev = &nvdev->core_device;
+#ifdef CONFIG_MEMORY_FAILURE
+	struct h_node *cur;
+	unsigned long bkt;
+	struct hlist_node *tmp_node;
+
+	hash_for_each_safe(nvdev->htbl, bkt, tmp_node, cur, node) {
+		hash_del(&cur->node);
+		vfree(cur);
+	}
+#endif
 
 	vfio_pci_core_unregister_device(vdev);
 	vfio_put_device(&vdev->vdev);
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 8d4995ada74a..290431ac2e00 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -1319,7 +1319,7 @@ const struct file_operations vfio_device_fops = {
 	.mmap = vfio_device_fops_mmap,
 };
 
-static struct vfio_device *vfio_device_from_file(struct file *file)
+struct vfio_device *vfio_device_from_file(struct file *file)
 {
 	struct vfio_device_file *df = file->private_data;
 
@@ -1327,6 +1327,7 @@ static struct vfio_device *vfio_device_from_file(struct file *file)
 		return NULL;
 	return df->device;
 }
+EXPORT_SYMBOL_GPL(vfio_device_from_file);
 
 /**
  * vfio_file_is_valid - True if the file is valid vfio file
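
For readers following the tracking scheme described in the 4/4 commit message, here is a minimal userspace sketch of the same idea: a fixed-size chained hash table keyed by the page offset into device memory, with an add path (the failure callback) and a lookup path (the fault handler). This is not the kernel code. The names poison_table, poison_node, poison_table_add(), poison_table_contains() and poison_table_free() are hypothetical, a plain modulo stands in for the kernel's hashing, and calloc()/free() stand in for vzalloc()/vfree(). Unlike the patch, which dereferences the vzalloc() result without a check, the sketch checks the allocation result before use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Assumption: 256 buckets, mirroring DECLARE_HASHTABLE(htbl, 8) (2^8). */
#define POISON_BUCKETS 256

/* Hypothetical stand-in for struct h_node in the patch. */
struct poison_node {
	unsigned long mem_offset;	/* page offset into device memory */
	struct poison_node *next;	/* bucket chain */
};

/* Hypothetical stand-in for the per-device state (htbl + memlength). */
struct poison_table {
	struct poison_node *buckets[POISON_BUCKETS];
	unsigned long nr_pages;		/* region size in pages */
};

/* Simple modulo hash; the kernel hashtable uses its own hash function. */
static unsigned int poison_hash(unsigned long mem_offset)
{
	return (unsigned int)(mem_offset % POISON_BUCKETS);
}

/*
 * Analogue of the failure callback: record a poisoned page offset.
 * Returns 0 on success, -1 if the offset is out of range or the
 * allocation fails.
 */
static int poison_table_add(struct poison_table *t, unsigned long mem_offset)
{
	struct poison_node *n;
	unsigned int b;

	assert(t != NULL);

	if (mem_offset >= t->nr_pages)
		return -1;

	n = calloc(1, sizeof(*n));
	if (!n)
		return -1;

	b = poison_hash(mem_offset);
	n->mem_offset = mem_offset;
	n->next = t->buckets[b];
	t->buckets[b] = n;
	return 0;
}

/*
 * Analogue of the fault-handler lookup: true if the offset was recorded
 * as poisoned (the case where the patch returns VM_FAULT_HWPOISON).
 */
static bool poison_table_contains(const struct poison_table *t,
				  unsigned long mem_offset)
{
	const struct poison_node *n;

	if (mem_offset >= t->nr_pages)
		return false;

	for (n = t->buckets[poison_hash(mem_offset)]; n; n = n->next) {
		if (n->mem_offset == mem_offset)
			return true;
	}
	return false;
}

/* Analogue of the teardown loop in the remove path: free every entry. */
static void poison_table_free(struct poison_table *t)
{
	unsigned int b;

	for (b = 0; b < POISON_BUCKETS; b++) {
		struct poison_node *n = t->buckets[b];

		while (n) {
			struct poison_node *next = n->next;

			free(n);
			n = next;
		}
		t->buckets[b] = NULL;
	}
}
```

A caller would zero-initialize a struct poison_table, set nr_pages, call poison_table_add() for each reported offset, and consult poison_table_contains() on each access. The sketch is single-threaded; it does not model any locking between the failure callback and the fault handler.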