From patchwork Fri Aug 2 15:55:14 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Hildenbrand
X-Patchwork-Id: 13751724
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, linux-doc@vger.kernel.org, kvm@vger.kernel.org,
 linux-s390@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 David Hildenbrand, Andrew Morton, "Matthew Wilcox (Oracle)",
 Jonathan Corbet, Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
 Heiko Carstens, Vasily Gorbik, Alexander Gordeev, Sven Schnelle,
 Gerald Schaefer
Subject: [PATCH v1 01/11] mm: provide vm_normal_(page|folio)_pmd() with CONFIG_PGTABLE_HAS_HUGE_LEAVES
Date: Fri, 2 Aug 2024 17:55:14 +0200
Message-ID: <20240802155524.517137-2-david@redhat.com>
In-Reply-To: <20240802155524.517137-1-david@redhat.com>
References: <20240802155524.517137-1-david@redhat.com>
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org

We want to make use of vm_normal_page_pmd() in generic page table
walking code where we might walk hugetlb folios that are mapped by PMDs
even without CONFIG_TRANSPARENT_HUGEPAGE.

So let's expose vm_normal_page_pmd() + vm_normal_folio_pmd() with
CONFIG_PGTABLE_HAS_HUGE_LEAVES.

Signed-off-by: David Hildenbrand
---
 mm/memory.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memory.c b/mm/memory.c
index 4c8716cb306c..29772beb3275 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -666,7 +666,7 @@ struct folio *vm_normal_folio(struct vm_area_struct *vma, unsigned long addr,
 	return NULL;
 }
 
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#ifdef CONFIG_PGTABLE_HAS_HUGE_LEAVES
 struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
 				pmd_t pmd)
 {

From patchwork Fri Aug 2 15:55:15 2024
X-Patchwork-Submitter: David Hildenbrand
X-Patchwork-Id: 13751725
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, linux-doc@vger.kernel.org, kvm@vger.kernel.org,
 linux-s390@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 David Hildenbrand, Andrew Morton, "Matthew Wilcox (Oracle)",
 Jonathan Corbet, Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
 Heiko Carstens, Vasily Gorbik, Alexander Gordeev, Sven Schnelle,
 Gerald Schaefer
Subject: [PATCH v1 02/11] mm/pagewalk: introduce folio_walk_start() + folio_walk_end()
Date: Fri, 2 Aug 2024 17:55:15 +0200
Message-ID: <20240802155524.517137-3-david@redhat.com>
In-Reply-To: <20240802155524.517137-1-david@redhat.com>
References: <20240802155524.517137-1-david@redhat.com>

We want to get rid of follow_page(), and have a more reasonable way to
just lookup a folio mapped at a certain address, perform some checks while
still under PTL, and then only conditionally grab a folio reference if
really required.

Further, we might want to get rid of some walk_page_range*() users that
really only want to temporarily lookup a single folio at a single address.

So let's add a new page table walker that does exactly that, similarly
to GUP also being able to walk hugetlb VMAs.

Add folio_walk_end() as a macro for now: the compiler is not easy to
please with the pte_unmap()->kunmap_local().

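As a rough userspace illustration (not kernel code, and not part of this
patch), the sketch below shows how a lookup at one address selects a single
page inside a PTE-, PMD- or PUD-sized mapping. It mirrors the
(addr & (entry_size - 1)) >> PAGE_SHIFT arithmetic that folio_walk_start()
uses to fill fw->page. All sk_* names are hypothetical, and the 4 KiB /
2 MiB / 1 GiB sizes are an assumption; real sizes are architecture-specific.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed geometry for illustration only; architectures differ. */
#define SK_PAGE_SHIFT	12
#define SK_PAGE_SIZE	(UINT64_C(1) << SK_PAGE_SHIFT)
#define SK_PMD_SIZE	(UINT64_C(1) << 21)
#define SK_PUD_SIZE	(UINT64_C(1) << 30)

static_assert(SK_PMD_SIZE > SK_PAGE_SIZE, "PMD entry must cover several pages");

/* Hypothetical stand-in for enum folio_walk_level. */
enum sk_level {
	SK_LEVEL_PTE,
	SK_LEVEL_PMD,
	SK_LEVEL_PUD,
};

/* Size of the address range covered by one entry at the given level. */
static uint64_t sk_entry_size(enum sk_level level)
{
	switch (level) {
	case SK_LEVEL_PMD:
		return SK_PMD_SIZE;
	case SK_LEVEL_PUD:
		return SK_PUD_SIZE;
	default:
		return SK_PAGE_SIZE;
	}
}

/*
 * Index of the page that addr hits, counted from the first page mapped
 * by the entry (not from the start of the folio).
 */
static uint64_t sk_page_index(uint64_t addr, enum sk_level level)
{
	return (addr & (sk_entry_size(level) - 1)) >> SK_PAGE_SHIFT;
}
```

For example, sk_page_index(0x20507b, SK_LEVEL_PMD) selects the sixth page
(index 5) of a 2 MiB mapping, while the same address at SK_LEVEL_PTE always
yields index 0 because a PTE maps exactly one page.
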
Note that one difference between follow_page() and get_user_pages(1) is
that follow_page() will not trigger faults to get something mapped. So
folio_walk is at least currently not a replacement for get_user_pages(1),
but could likely be extended/reused to achieve something similar in the
future.

Signed-off-by: David Hildenbrand
---
 include/linux/pagewalk.h |  58 +++++++++++
 mm/pagewalk.c            | 202 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 260 insertions(+)

diff --git a/include/linux/pagewalk.h b/include/linux/pagewalk.h
index 27cd1e59ccf7..f5eb5a32aeed 100644
--- a/include/linux/pagewalk.h
+++ b/include/linux/pagewalk.h
@@ -130,4 +130,62 @@ int walk_page_mapping(struct address_space *mapping, pgoff_t first_index,
 		      pgoff_t nr, const struct mm_walk_ops *ops,
 		      void *private);
 
+typedef int __bitwise folio_walk_flags_t;
+
+/*
+ * Walk migration entries as well. Careful: a large folio might get split
+ * concurrently.
+ */
+#define FW_MIGRATION			((__force folio_walk_flags_t)BIT(0))
+
+/* Walk shared zeropages (small + huge) as well. */
+#define FW_ZEROPAGE			((__force folio_walk_flags_t)BIT(1))
+
+enum folio_walk_level {
+	FW_LEVEL_PTE,
+	FW_LEVEL_PMD,
+	FW_LEVEL_PUD,
+};
+
+/**
+ * struct folio_walk - folio_walk_start() / folio_walk_end() data
+ * @page:	exact folio page referenced (if applicable)
+ * @level:	page table level identifying the entry type
+ * @ptep:	pointer to the page table entry (FW_LEVEL_PTE).
+ * @pmdp:	pointer to the page table entry (FW_LEVEL_PMD).
+ * @pudp:	pointer to the page table entry (FW_LEVEL_PUD).
+ * @ptl:	pointer to the page table lock.
+ *
+ * (see folio_walk_start() documentation for more details)
+ */
+struct folio_walk {
+	/* public */
+	struct page *page;
+	enum folio_walk_level level;
+	union {
+		pte_t *ptep;
+		pud_t *pudp;
+		pmd_t *pmdp;
+	};
+	union {
+		pte_t pte;
+		pud_t pud;
+		pmd_t pmd;
+	};
+	/* private */
+	struct vm_area_struct *vma;
+	spinlock_t *ptl;
+};
+
+struct folio *folio_walk_start(struct folio_walk *fw,
+		struct vm_area_struct *vma, unsigned long addr,
+		folio_walk_flags_t flags);
+
+#define folio_walk_end(__fw, __vma) do { \
+	spin_unlock((__fw)->ptl); \
+	if (likely((__fw)->level == FW_LEVEL_PTE)) \
+		pte_unmap((__fw)->ptep); \
+	vma_pgtable_walk_end(__vma); \
+} while (0)
+
 #endif /* _LINUX_PAGEWALK_H */
diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index ae2f08ce991b..cd79fb3b89e5 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -3,6 +3,8 @@
 #include
 #include
 #include
+#include
+#include
 
 /*
  * We want to know the real level where a entry is located ignoring any
@@ -654,3 +656,203 @@ int walk_page_mapping(struct address_space *mapping, pgoff_t first_index,
 
 	return err;
 }
+
+/**
+ * folio_walk_start - walk the page tables to a folio
+ * @fw: filled with information on success.
+ * @vma: the VMA.
+ * @addr: the virtual address to use for the page table walk.
+ * @flags: flags modifying which folios to walk to.
+ *
+ * Walk the page tables using @addr in a given @vma to a mapped folio and
+ * return the folio, making sure that the page table entry referenced by
+ * @addr cannot change until folio_walk_end() was called.
+ *
+ * As default, this function returns only folios that are not special (e.g., not
+ * the zeropage) and never returns folios that are supposed to be ignored by the
+ * VM as documented by vm_normal_page(). If requested, zeropages will be
+ * returned as well.
+ *
+ * As default, this function only considers present page table entries.
+ * If requested, it will also consider migration entries.
+ *
+ * If this function returns NULL it might either indicate "there is nothing" or
+ * "there is nothing suitable".
+ *
+ * On success, @fw is filled and the function returns the folio while the PTL
+ * is still held and folio_walk_end() must be called to clean up,
+ * releasing any held locks. The returned folio must *not* be used after the
+ * call to folio_walk_end(), unless a short-term folio reference is taken before
+ * that call.
+ *
+ * @fw->page will correspond to the page that is effectively referenced by
+ * @addr. However, for migration entries and shared zeropages @fw->page is
+ * set to NULL. Note that large folios might be mapped by multiple page table
+ * entries, and this function will always only lookup a single entry as
+ * specified by @addr, which might or might not cover more than a single page of
+ * the returned folio.
+ *
+ * This function must *not* be used as a naive replacement for
+ * get_user_pages() / pin_user_pages(), especially not to perform DMA or
+ * to carelessly modify page content. This function may *only* be used to grab
+ * short-term folio references, never to grab long-term folio references.
+ *
+ * Using the page table entry pointers in @fw for reading or modifying the
+ * entry should be avoided where possible: however, there might be valid
+ * use cases.
+ *
+ * WARNING: Modifying page table entries in hugetlb VMAs requires a lot of care.
+ * For example, PMD page table sharing might require prior unsharing. Also,
+ * logical hugetlb entries might span multiple physical page table entries,
+ * which *must* be modified in a single operation (set_huge_pte_at(),
+ * huge_ptep_set_*, ...). Note that the page table entry stored in @fw might
+ * not correspond to the first physical entry of a logical hugetlb entry.
+ *
+ * The mmap lock must be held in read mode.
+ *
+ * Return: folio pointer on success, otherwise NULL.
+ */
+struct folio *folio_walk_start(struct folio_walk *fw,
+		struct vm_area_struct *vma, unsigned long addr,
+		folio_walk_flags_t flags)
+{
+	unsigned long entry_size;
+	bool expose_page = true;
+	struct page *page;
+	pud_t *pudp, pud;
+	pmd_t *pmdp, pmd;
+	pte_t *ptep, pte;
+	spinlock_t *ptl;
+	pgd_t *pgdp;
+	p4d_t *p4dp;
+
+	mmap_assert_locked(vma->vm_mm);
+	vma_pgtable_walk_begin(vma);
+
+	if (WARN_ON_ONCE(addr < vma->vm_start || addr >= vma->vm_end))
+		goto not_found;
+
+	pgdp = pgd_offset(vma->vm_mm, addr);
+	if (pgd_none_or_clear_bad(pgdp))
+		goto not_found;
+
+	p4dp = p4d_offset(pgdp, addr);
+	if (p4d_none_or_clear_bad(p4dp))
+		goto not_found;
+
+	pudp = pud_offset(p4dp, addr);
+	pud = pudp_get(pudp);
+	if (pud_none(pud))
+		goto not_found;
+	if (IS_ENABLED(CONFIG_PGTABLE_HAS_HUGE_LEAVES) && pud_leaf(pud)) {
+		ptl = pud_lock(vma->vm_mm, pudp);
+		pud = pudp_get(pudp);
+
+		entry_size = PUD_SIZE;
+		fw->level = FW_LEVEL_PUD;
+		fw->pudp = pudp;
+		fw->pud = pud;
+
+		if (!pud_present(pud) || pud_devmap(pud)) {
+			spin_unlock(ptl);
+			goto not_found;
+		} else if (!pud_leaf(pud)) {
+			spin_unlock(ptl);
+			goto pmd_table;
+		}
+		/*
+		 * TODO: vm_normal_page_pud() will be handy once we want to
+		 * support PUD mappings in VM_PFNMAP|VM_MIXEDMAP VMAs.
+		 */
+		page = pud_page(pud);
+		goto found;
+	}
+
+pmd_table:
+	VM_WARN_ON_ONCE(pud_leaf(*pudp));
+	pmdp = pmd_offset(pudp, addr);
+	pmd = pmdp_get_lockless(pmdp);
+	if (pmd_none(pmd))
+		goto not_found;
+	if (IS_ENABLED(CONFIG_PGTABLE_HAS_HUGE_LEAVES) && pmd_leaf(pmd)) {
+		ptl = pmd_lock(vma->vm_mm, pmdp);
+		pmd = pmdp_get(pmdp);
+
+		entry_size = PMD_SIZE;
+		fw->level = FW_LEVEL_PMD;
+		fw->pmdp = pmdp;
+		fw->pmd = pmd;
+
+		if (pmd_none(pmd)) {
+			spin_unlock(ptl);
+			goto not_found;
+		} else if (!pmd_leaf(pmd)) {
+			spin_unlock(ptl);
+			goto pte_table;
+		} else if (pmd_present(pmd)) {
+			page = vm_normal_page_pmd(vma, addr, pmd);
+			if (page) {
+				goto found;
+			} else if ((flags & FW_ZEROPAGE) &&
+				    is_huge_zero_pmd(pmd)) {
+				page = pfn_to_page(pmd_pfn(pmd));
+				expose_page = false;
+				goto found;
+			}
+		} else if ((flags & FW_MIGRATION) &&
+			   is_pmd_migration_entry(pmd)) {
+			swp_entry_t entry = pmd_to_swp_entry(pmd);
+
+			page = pfn_swap_entry_to_page(entry);
+			expose_page = false;
+			goto found;
+		}
+		spin_unlock(ptl);
+		goto not_found;
+	}
+
+pte_table:
+	VM_WARN_ON_ONCE(pmd_leaf(pmdp_get_lockless(pmdp)));
+	ptep = pte_offset_map_lock(vma->vm_mm, pmdp, addr, &ptl);
+	if (!ptep)
+		goto not_found;
+	pte = ptep_get(ptep);
+
+	entry_size = PAGE_SIZE;
+	fw->level = FW_LEVEL_PTE;
+	fw->ptep = ptep;
+	fw->pte = pte;
+
+	if (pte_present(pte)) {
+		page = vm_normal_page(vma, addr, pte);
+		if (page)
+			goto found;
+		if ((flags & FW_ZEROPAGE) &&
+		    is_zero_pfn(pte_pfn(pte))) {
+			page = pfn_to_page(pte_pfn(pte));
+			expose_page = false;
+			goto found;
+		}
+	} else if (!pte_none(pte)) {
+		swp_entry_t entry = pte_to_swp_entry(pte);
+
+		if ((flags & FW_MIGRATION) &&
+		    is_migration_entry(entry)) {
+			page = pfn_swap_entry_to_page(entry);
+			expose_page = false;
+			goto found;
+		}
+	}
+	pte_unmap_unlock(ptep, ptl);
+not_found:
+	vma_pgtable_walk_end(vma);
+	return NULL;
+found:
+	if (expose_page)
+		/* Note: Offset from the mapped page, not the folio start. */
+		fw->page = nth_page(page, (addr & (entry_size - 1)) >> PAGE_SHIFT);
+	else
+		fw->page = NULL;
+	fw->ptl = ptl;
+	return page_folio(page);
+}

From patchwork Fri Aug 2 15:55:16 2024
X-Patchwork-Submitter: David Hildenbrand
X-Patchwork-Id: 13751726
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, linux-doc@vger.kernel.org, kvm@vger.kernel.org,
 linux-s390@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 David Hildenbrand, Andrew Morton, "Matthew Wilcox (Oracle)",
 Jonathan Corbet, Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
 Heiko Carstens, Vasily Gorbik, Alexander Gordeev, Sven Schnelle,
 Gerald Schaefer
Subject: [PATCH v1 03/11] mm/migrate: convert do_pages_stat_array() from follow_page() to folio_walk
Date: Fri, 2 Aug 2024 17:55:16 +0200
Message-ID: <20240802155524.517137-4-david@redhat.com>
In-Reply-To: <20240802155524.517137-1-david@redhat.com>
References: <20240802155524.517137-1-david@redhat.com>

Let's use folio_walk instead, so we can avoid taking a folio reference
just to read the nid and get rid of another follow_page()/FOLL_DUMP user.
Use FW_ZEROPAGE so we can return "-EFAULT" for it as documented.

The possible return values for follow_page() were confusing, especially
with FOLL_DUMP set. We'll handle it like documented in the man page:

* -EFAULT: This is a zero page or the memory area is not mapped by the
   process.
* -ENOENT: The page is not present.

We'll keep setting -ENOENT for ZONE_DEVICE. Maybe not the right thing to
do, but it likely doesn't really matter (just like for weird devmap,
whereby we fake "not present").

Note that the other errors (-EACCES, -EBUSY, -EIO, -EINVAL, -ENOMEM)
so far only applied when actually moving pages, not when only querying
stats.

We'll effectively drop the "secretmem" check we had in follow_page(), but
that shouldn't really matter here, we're not accessing folio/page content
after all.

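To make the status mapping above concrete, here is a small userspace model
(not kernel code, and not part of this patch) of the per-address decision
that do_pages_stat_array() makes after this change. struct sk_lookup and
sk_page_status() are hypothetical names; the booleans stand in for the
results of vma_lookup(), folio_walk_start(), the zero-folio checks and
folio_is_zone_device().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of what one lookup observed. */
struct sk_lookup {
	bool vma_found;		/* vma_lookup() returned a VMA */
	bool folio_found;	/* folio_walk_start() returned a folio */
	bool zero_folio;	/* small or huge shared zeropage */
	bool zone_device;	/* folio lives in ZONE_DEVICE */
	int nid;		/* NUMA node of the folio */
};

/* Status reported to user space for one queried address. */
static int sk_page_status(const struct sk_lookup *l)
{
	assert(l != NULL);

	if (!l->vma_found)
		return -EFAULT;		/* area not mapped by the process */
	if (!l->folio_found)
		return -ENOENT;		/* nothing (suitable) present */
	if (l->zero_folio)
		return -EFAULT;		/* zero page, as the man page says */
	if (l->zone_device)
		return -ENOENT;		/* we fake "not present" */
	return l->nid;
}
```

A lookup with vma_found and folio_found set and nid == 1 reports 1, while
the same lookup with zero_folio set reports -EFAULT regardless of nid.
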
Signed-off-by: David Hildenbrand
---
 mm/migrate.c | 31 +++++++++++++++----------------
 1 file changed, 15 insertions(+), 16 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index aa482c954cb0..b5365a434ba9 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -50,6 +50,7 @@
 #include
 #include
 #include
+#include
 
 #include
 
@@ -2331,28 +2332,26 @@ static void do_pages_stat_array(struct mm_struct *mm, unsigned long nr_pages,
 	for (i = 0; i < nr_pages; i++) {
 		unsigned long addr = (unsigned long)(*pages);
 		struct vm_area_struct *vma;
-		struct page *page;
+		struct folio_walk fw;
+		struct folio *folio;
 		int err = -EFAULT;
 
 		vma = vma_lookup(mm, addr);
 		if (!vma)
 			goto set_status;
 
-		/* FOLL_DUMP to ignore special (like zero) pages */
-		page = follow_page(vma, addr, FOLL_GET | FOLL_DUMP);
-
-		err = PTR_ERR(page);
-		if (IS_ERR(page))
-			goto set_status;
-
-		err = -ENOENT;
-		if (!page)
-			goto set_status;
-
-		if (!is_zone_device_page(page))
-			err = page_to_nid(page);
-
-		put_page(page);
+		folio = folio_walk_start(&fw, vma, addr, FW_ZEROPAGE);
+		if (folio) {
+			if (is_zero_folio(folio) || is_huge_zero_folio(folio))
+				err = -EFAULT;
+			else if (folio_is_zone_device(folio))
+				err = -ENOENT;
+			else
+				err = folio_nid(folio);
+			folio_walk_end(&fw, vma);
+		} else {
+			err = -ENOENT;
+		}
 set_status:
 		*status = err;
 

From patchwork Fri Aug 2 15:55:17 2024
X-Patchwork-Submitter: David Hildenbrand
X-Patchwork-Id: 13751727
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, linux-doc@vger.kernel.org, kvm@vger.kernel.org,
 linux-s390@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 David Hildenbrand, Andrew Morton, "Matthew Wilcox (Oracle)",
 Jonathan Corbet, Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
 Heiko Carstens, Vasily Gorbik, Alexander Gordeev, Sven Schnelle,
 Gerald Schaefer
Subject: [PATCH v1 04/11] mm/migrate: convert add_page_for_migration() from follow_page() to folio_walk
Date: Fri, 2 Aug 2024 17:55:17 +0200
Message-ID: <20240802155524.517137-5-david@redhat.com>
In-Reply-To: <20240802155524.517137-1-david@redhat.com>
References: <20240802155524.517137-1-david@redhat.com>

Let's use folio_walk instead, so we can avoid taking a folio reference
when we won't even be trying to migrate the folio and to get rid of
another follow_page()/FOLL_DUMP user. Use FW_ZEROPAGE so we can return
"-EFAULT" for it as documented.

We now perform the folio_likely_mapped_shared() check under PTL, which is
what we want: relying on the mapcount and friends after dropping the PTL
does not make too much sense, as the page can get unmapped concurrently
from this process.

Further, we perform the folio isolation under PTL, similar to how we
handle it for MADV_PAGEOUT.

The possible return values for follow_page() were confusing, especially
with FOLL_DUMP set. We'll handle it like documented in the man page:

* -EFAULT: This is a zero page or the memory area is not mapped by the
   process.
* -ENOENT: The page is not present.

We'll keep setting -ENOENT for ZONE_DEVICE. Maybe not the right thing to
do, but it likely doesn't really matter (just like for weird devmap,
whereby we fake "not present").

The other errors are left as is, and match the documentation in the man
page.

While at it, rename add_page_for_migration() to
add_folio_for_migration().

We'll lose the "secretmem" check, but that shouldn't really matter
because these folios cannot ever be migrated. Should vma_migratable()
refuse these VMAs? Maybe.

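The order of the checks that run under PTL can be modelled in userspace as
follows. This is only a sketch under stated assumptions, not kernel code and
not part of this patch: struct sk_folio and sk_add_folio_for_migration() are
hypothetical, and the isolate_ok flag stands in for whether
isolate_hugetlb() / folio_isolate_lru() would succeed.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the folio properties the checks look at. */
struct sk_folio {
	bool zero;			/* small or huge shared zeropage */
	bool zone_device;		/* folio lives in ZONE_DEVICE */
	bool likely_mapped_shared;	/* probably mapped by other processes */
	bool isolate_ok;		/* isolation from LRU/hugetlb would succeed */
	int nid;			/* node the folio currently resides on */
};

/*
 * Returns a negative errno if the folio is not queued, 0 if it is already
 * on the target node, and 1 if it would be queued for migration.
 */
static int sk_add_folio_for_migration(const struct sk_folio *f, int node,
		bool migrate_all)
{
	assert(f != NULL);

	if (f->zero)
		return -EFAULT;
	if (f->zone_device)
		return -ENOENT;
	if (f->nid == node)
		return 0;
	if (f->likely_mapped_shared && !migrate_all)
		return -EACCES;
	return f->isolate_ok ? 1 : -EBUSY;
}
```

Note how a shared folio is only refused with -EACCES when migrate_all is
false, which corresponds to the caller not passing MPOL_MF_MOVE_ALL, and how
the "already on the target node" case wins over the sharing check.
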
Signed-off-by: David Hildenbrand
---
 mm/migrate.c | 100 +++++++++++++++++++++++----------------------------
 1 file changed, 45 insertions(+), 55 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index b5365a434ba9..e1383d9cc944 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2112,76 +2112,66 @@ static int do_move_pages_to_node(struct list_head *pagelist, int node)
 	return err;
 }
 
+static int __add_folio_for_migration(struct folio *folio, int node,
+		struct list_head *pagelist, bool migrate_all)
+{
+	if (is_zero_folio(folio) || is_huge_zero_folio(folio))
+		return -EFAULT;
+
+	if (folio_is_zone_device(folio))
+		return -ENOENT;
+
+	if (folio_nid(folio) == node)
+		return 0;
+
+	if (folio_likely_mapped_shared(folio) && !migrate_all)
+		return -EACCES;
+
+	if (folio_test_hugetlb(folio)) {
+		if (isolate_hugetlb(folio, pagelist))
+			return 1;
+	} else if (folio_isolate_lru(folio)) {
+		list_add_tail(&folio->lru, pagelist);
+		node_stat_mod_folio(folio,
+			NR_ISOLATED_ANON + folio_is_file_lru(folio),
+			folio_nr_pages(folio));
+		return 1;
+	}
+	return -EBUSY;
+}
+
 /*
- * Resolves the given address to a struct page, isolates it from the LRU and
+ * Resolves the given address to a struct folio, isolates it from the LRU and
  * puts it to the given pagelist.
  * Returns:
- *     errno - if the page cannot be found/isolated
+ *     errno - if the folio cannot be found/isolated
  *     0 - when it doesn't have to be migrated because it is already on the
  *         target node
  *     1 - when it has been queued
  */
-static int add_page_for_migration(struct mm_struct *mm, const void __user *p,
+static int add_folio_for_migration(struct mm_struct *mm, const void __user *p,
 		int node, struct list_head *pagelist, bool migrate_all)
 {
 	struct vm_area_struct *vma;
-	unsigned long addr;
-	struct page *page;
+	struct folio_walk fw;
 	struct folio *folio;
-	int err;
+	unsigned long addr;
+	int err = -EFAULT;
 
 	mmap_read_lock(mm);
 	addr = (unsigned long)untagged_addr_remote(mm, p);
 
-	err = -EFAULT;
 	vma = vma_lookup(mm, addr);
-	if (!vma || !vma_migratable(vma))
-		goto out;
-
-	/* FOLL_DUMP to ignore special (like zero) pages */
-	page = follow_page(vma, addr, FOLL_GET | FOLL_DUMP);
-
-	err = PTR_ERR(page);
-	if (IS_ERR(page))
-		goto out;
-
-	err = -ENOENT;
-	if (!page)
-		goto out;
-
-	folio = page_folio(page);
-	if (folio_is_zone_device(folio))
-		goto out_putfolio;
-
-	err = 0;
-	if (folio_nid(folio) == node)
-		goto out_putfolio;
-
-	err = -EACCES;
-	if (folio_likely_mapped_shared(folio) && !migrate_all)
-		goto out_putfolio;
-
-	err = -EBUSY;
-	if (folio_test_hugetlb(folio)) {
-		if (isolate_hugetlb(folio, pagelist))
-			err = 1;
-	} else {
-		if (!folio_isolate_lru(folio))
-			goto out_putfolio;
-
-		err = 1;
-		list_add_tail(&folio->lru, pagelist);
-		node_stat_mod_folio(folio,
-			NR_ISOLATED_ANON + folio_is_file_lru(folio),
-			folio_nr_pages(folio));
+	if (vma && vma_migratable(vma)) {
+		folio = folio_walk_start(&fw, vma, addr, FW_ZEROPAGE);
+		if (folio) {
+			err = __add_folio_for_migration(folio, node, pagelist,
+							migrate_all);
+			folio_walk_end(&fw, vma);
+		} else {
+			err = -ENOENT;
+		}
 	}
-out_putfolio:
-	/*
-	 * Either remove the duplicate refcount from folio_isolate_lru()
-	 * or drop the folio ref if it was not isolated.
- */ - folio_put(folio); -out: mmap_read_unlock(mm); return err; } @@ -2275,8 +2265,8 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes, * Errors in the page lookup or isolation are not fatal and we simply * report them via status */ - err = add_page_for_migration(mm, p, current_node, &pagelist, - flags & MPOL_MF_MOVE_ALL); + err = add_folio_for_migration(mm, p, current_node, &pagelist, + flags & MPOL_MF_MOVE_ALL); if (err > 0) { /* The page is successfully queued for migration */ From patchwork Fri Aug 2 15:55:18 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Hildenbrand X-Patchwork-Id: 13751728 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D519B1E486C for ; Fri, 2 Aug 2024 15:56:14 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1722614176; cv=none; b=Jl/KHbbSTEXiQ7Am1cS73Odv9Mzxz3zR3m/lwvbPRgC3j140Msxiz2amRS9sbxuN4btZwiOmt/AEEMArVUFVHR8aisGu+DQPa9mXvpPkaQeonFxTaIWyEdQO7jOTje0AAS+f70f3gO9qFUvUyWAiaxRjP2FFKoGTYTdtU34uTCU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1722614176; c=relaxed/simple; bh=P92AymmJj/Ft8pl2A+91m7BSNQa3s/9+AJd/Q8AkboU=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=G7OK6gt9CI/qKCUFQ+9V1jhTm3ymEyoGjbsdAFXySCwlM3Jy7Yn/olKjLO0B0c3hyPES9QSvCA0GYBtkY4N2gKbaP3+16/bgrHOvNjUIK9gfcge75TkgfbmYEdHYW0iv+9KXEsLwowdXVzRJiFEfix6JDWNo7r5RhNKdKBRboVk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com 
header.b=OubIpU+N; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="OubIpU+N" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1722614174; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Dsokb2QZ1A8J7FAYTpU1vDg99kVkubfGaPDc1WAA3q4=; b=OubIpU+NhCmv6kWJXFG8SkcZ52HY1WtLa9Sh+cdgbI9vntamICNshmtNgdrMw0WXG1SAVb 0lT7FCOD+enmN4xzYXtBJeqk6tXnVFlLZ14gmlYHl6yCkALd3wsmhGcl4CW6eWKhwn7rbP R8CGFIsmc6Qmuko9WC6fzXHnZz8X1Kw= Received: from mx-prod-mc-03.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-318-NNjgzyEROca_nRik9KEcaw-1; Fri, 02 Aug 2024 11:56:08 -0400 X-MC-Unique: NNjgzyEROca_nRik9KEcaw-1 Received: from mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.4]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-03.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 4CD5B19560B1; Fri, 2 Aug 2024 15:56:06 +0000 (UTC) Received: from t14s.redhat.com (unknown [10.39.192.113]) by mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 241AD300018D; Fri, 2 Aug 2024 15:55:59 +0000 (UTC) From: David Hildenbrand To: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org, 
linux-doc@vger.kernel.org, kvm@vger.kernel.org, linux-s390@vger.kernel.org, linux-fsdevel@vger.kernel.org, David Hildenbrand , Andrew Morton , "Matthew Wilcox (Oracle)" , Jonathan Corbet , Christian Borntraeger , Janosch Frank , Claudio Imbrenda , Heiko Carstens , Vasily Gorbik , Alexander Gordeev , Sven Schnelle , Gerald Schaefer Subject: [PATCH v1 05/11] mm/ksm: convert get_mergeable_page() from follow_page() to folio_walk Date: Fri, 2 Aug 2024 17:55:18 +0200 Message-ID: <20240802155524.517137-6-david@redhat.com> In-Reply-To: <20240802155524.517137-1-david@redhat.com> References: <20240802155524.517137-1-david@redhat.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.4 Let's use folio_walk instead, for example avoiding taking temporary folio references if the folio does not even apply and getting rid of one more follow_page() user. Note that zeropages obviously don't apply: old code could just have specified FOLL_DUMP. Anon folios are never secretmem, so we don't care about losing the check in follow_page(). 
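For illustration only (not part of the patch): a small userspace model of the reference-count difference described above. struct mock_folio, lookup_old() and lookup_new() are hypothetical names; ref_ops merely counts get/put operations, and the PTL that folio_walk holds is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the folio state KSM looks at. */
struct mock_folio {
	int refcount;
	bool is_anon;		/* models folio_test_anon() */
	bool is_zone_device;	/* models folio_is_zone_device() */
};

/* Old pattern: FOLL_GET always takes a reference, dropped again if unused. */
struct mock_folio *lookup_old(struct mock_folio *folio, int *ref_ops)
{
	assert(ref_ops != NULL);
	if (!folio)
		return NULL;
	folio->refcount++;		/* reference taken unconditionally */
	(*ref_ops)++;
	if (folio->is_zone_device || !folio->is_anon) {
		folio->refcount--;	/* temporary reference dropped again */
		(*ref_ops)++;
		return NULL;
	}
	return folio;
}

/* New pattern: inspect first, take a reference only if the folio applies. */
struct mock_folio *lookup_new(struct mock_folio *folio, int *ref_ops)
{
	struct mock_folio *ret = NULL;

	assert(ref_ops != NULL);
	if (folio) {
		if (!folio->is_zone_device && folio->is_anon) {
			folio->refcount++;
			(*ref_ops)++;
			ret = folio;
		}
		/* folio_walk_end() would release the PTL here */
	}
	return ret;
}
```

For a folio that does not apply (for example a pagecache folio), the old pattern performs two refcount operations and the new pattern performs none.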
Signed-off-by: David Hildenbrand --- mm/ksm.c | 26 ++++++++++++++------------ 1 file changed, 14 insertions(+), 12 deletions(-) diff --git a/mm/ksm.c b/mm/ksm.c index 14d9e53b1ec2..742b005f3f77 100644 --- a/mm/ksm.c +++ b/mm/ksm.c @@ -767,26 +767,28 @@ static struct page *get_mergeable_page(struct ksm_rmap_item *rmap_item) struct mm_struct *mm = rmap_item->mm; unsigned long addr = rmap_item->address; struct vm_area_struct *vma; - struct page *page; + struct page *page = NULL; + struct folio_walk fw; + struct folio *folio; mmap_read_lock(mm); vma = find_mergeable_vma(mm, addr); if (!vma) goto out; - page = follow_page(vma, addr, FOLL_GET); - if (IS_ERR_OR_NULL(page)) - goto out; - if (is_zone_device_page(page)) - goto out_putpage; - if (PageAnon(page)) { + folio = folio_walk_start(&fw, vma, addr, 0); + if (folio) { + if (!folio_is_zone_device(folio) && + folio_test_anon(folio)) { + folio_get(folio); + page = fw.page; + } + folio_walk_end(&fw, vma); + } +out: + if (page) { flush_anon_page(vma, page, addr); flush_dcache_page(page); - } else { -out_putpage: - put_page(page); -out: - page = NULL; } mmap_read_unlock(mm); return page; From patchwork Fri Aug 2 15:55:19 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Hildenbrand X-Patchwork-Id: 13751729 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 01B1615C156 for ; Fri, 2 Aug 2024 15:56:22 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1722614184; cv=none; 
b=EPzaJZO8idMpWFz9SzopReP2A+3TVMSrYuixxySXLC9qeaUeZMs5ilYg5QaoShuw0ZNz1b0CV/B7UblyFrnseUnKh23PJQxT0IaaC33fMyEJJ2FrPPwLCqCTJJbJgUAYeQFQtuQbuzcdbdNeos8qErbqL4rGZFmzF1bfZPdScIM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1722614184; c=relaxed/simple; bh=zyobrQ+EEzE8yl0gNwh8+PYSEOv1I4PIQJWs9IzsPB8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=R/ozh5tOSvBmuBCXFYIIv9i4Z/ZC/TtF8VjwRFgDrXG7rdp0PwvTSJzc0ABExyxlhNGKV6zfVEDjF7+srKFl6dHOheJhDONKvQlsfhQ9fQ6jUvFkD2g67GnEknVS/hVcMiRIw1bbGd+T6nD9QfYLmg1XjxSwxU47M5Ul4w28/D8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=D7nyP25P; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="D7nyP25P" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1722614182; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=9RoWh69jXPnUan+groy9VerJmyWjUE6OshAG0jiN7k8=; b=D7nyP25PhUXVzcEdw4EiObXts/+n8391N31DpPckE9zPUMS6wzIQndlIttqCb4+TBiADTS 5k4Xi+cevQ9iRCaLDRS6ycyiiinK6ml3bSLPsaBd7AhZSpd1+6WewFtJi8FT6Qc478gkjP XnJLAACk77UxhSnBSh71W9zny7stB2w= Received: from mx-prod-mc-03.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-654-6rd6U-xePRWdI5xPe_dR3w-1; Fri, 02 
Aug 2024 11:56:15 -0400 X-MC-Unique: 6rd6U-xePRWdI5xPe_dR3w-1 Received: from mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.4]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-03.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 45CAA18B65ED; Fri, 2 Aug 2024 15:56:13 +0000 (UTC) Received: from t14s.redhat.com (unknown [10.39.192.113]) by mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 8A4CE300018D; Fri, 2 Aug 2024 15:56:06 +0000 (UTC) From: David Hildenbrand To: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org, linux-doc@vger.kernel.org, kvm@vger.kernel.org, linux-s390@vger.kernel.org, linux-fsdevel@vger.kernel.org, David Hildenbrand , Andrew Morton , "Matthew Wilcox (Oracle)" , Jonathan Corbet , Christian Borntraeger , Janosch Frank , Claudio Imbrenda , Heiko Carstens , Vasily Gorbik , Alexander Gordeev , Sven Schnelle , Gerald Schaefer Subject: [PATCH v1 06/11] mm/ksm: convert scan_get_next_rmap_item() from follow_page() to folio_walk Date: Fri, 2 Aug 2024 17:55:19 +0200 Message-ID: <20240802155524.517137-7-david@redhat.com> In-Reply-To: <20240802155524.517137-1-david@redhat.com> References: <20240802155524.517137-1-david@redhat.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.4 Let's use folio_walk instead, for example avoiding taking temporary folio references if the folio obviously does not even apply and getting rid of one more follow_page() user. We cannot move all handling under the PTL, so leave the rmap handling (which implies an allocation) out. Note that zeropages obviously don't apply: old code could just have specified FOLL_DUMP. 
Further, we don't care about losing the secretmem check in follow_page(): these are never anon pages and vma_ksm_compatible() would never consider secretmem vmas (VM_SHARED | VM_MAYSHARE must be set for secretmem, see secretmem_mmap()). Signed-off-by: David Hildenbrand --- mm/ksm.c | 38 ++++++++++++++++++++++++-------------- 1 file changed, 24 insertions(+), 14 deletions(-) diff --git a/mm/ksm.c b/mm/ksm.c index 742b005f3f77..0f5b2bba4ef0 100644 --- a/mm/ksm.c +++ b/mm/ksm.c @@ -2564,36 +2564,46 @@ static struct ksm_rmap_item *scan_get_next_rmap_item(struct page **page) ksm_scan.address = vma->vm_end; while (ksm_scan.address < vma->vm_end) { + struct page *tmp_page = NULL; + struct folio_walk fw; + struct folio *folio; + if (ksm_test_exit(mm)) break; - *page = follow_page(vma, ksm_scan.address, FOLL_GET); - if (IS_ERR_OR_NULL(*page)) { - ksm_scan.address += PAGE_SIZE; - cond_resched(); - continue; + + folio = folio_walk_start(&fw, vma, ksm_scan.address, 0); + if (folio) { + if (!folio_is_zone_device(folio) && + folio_test_anon(folio)) { + folio_get(folio); + tmp_page = fw.page; + } + folio_walk_end(&fw, vma); } - if (is_zone_device_page(*page)) - goto next_page; - if (PageAnon(*page)) { - flush_anon_page(vma, *page, ksm_scan.address); - flush_dcache_page(*page); + + if (tmp_page) { + flush_anon_page(vma, tmp_page, ksm_scan.address); + flush_dcache_page(tmp_page); rmap_item = get_next_rmap_item(mm_slot, ksm_scan.rmap_list, ksm_scan.address); if (rmap_item) { ksm_scan.rmap_list = &rmap_item->rmap_list; - if (should_skip_rmap_item(*page, rmap_item)) + if (should_skip_rmap_item(tmp_page, rmap_item)) { + folio_put(folio); goto next_page; + } ksm_scan.address += PAGE_SIZE; - } else - put_page(*page); + *page = tmp_page; + } else { + folio_put(folio); + } mmap_read_unlock(mm); return rmap_item; } next_page: - put_page(*page); ksm_scan.address += PAGE_SIZE; cond_resched(); } From patchwork Fri Aug 2 15:55:20 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 
Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Hildenbrand X-Patchwork-Id: 13751755 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 592F13DABE3 for ; Fri, 2 Aug 2024 15:56:28 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1722614190; cv=none; b=j7hMN3P3ttEKfPEcBhDMlPr3UdyqEKOQ8mZG3cwrTHv7z2mGU53hbI3zHy1sGq2BClYSHJ/WXhaez9lAznMYXGH8q7Tk8/Amof9JoeIBPxUUXUSh77xz6PiiG/WTSppDPOjkpFPePGoBuQdQTmwlnBSmxDNpPnDFl/tC4QL8InQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1722614190; c=relaxed/simple; bh=28p9D+KQFhZK/nXSJOf5//9mKCyB6fypztEnAu+mAsM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=EbORwLPOIQZqX65ynVDy+feGj8FPkocuXtg5RUhEGz0OAfab/b3ADyHouCGiOVBC0roHXnTsBRRYFGyzMUqC9IKVBFgzIdgxeEcaBF5wcqrFl1t/qd0u+OZeH2rSy0US6MWYXnR+KW61dDUSUHzRtjN/W9dQVuEarHldTHDiiwg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=WlH+biBV; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="WlH+biBV" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1722614187; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: 
content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Mv/si6kOfwIvneZw24kO6Mkvevsw9QN/WnWMpKXtofY=; b=WlH+biBVDdr75Zz0Fm3uyHg9DbHl92hUBaBjoaCmlmNZqvxIZl0U1f28U3Axs72dEVE4wm 1WjjtgjkmRBBx+PVPAGqzY+b+xajYmAhbwgrpHWHewKyqZYjQBo9Tg1vqzSqJJXvKb+kQY NxbA0+j+09SvzP7RRTFphB92xjOn0yY= Received: from mx-prod-mc-02.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-261-HVr8hLF5NxywldDisXzbUw-1; Fri, 02 Aug 2024 11:56:21 -0400 X-MC-Unique: HVr8hLF5NxywldDisXzbUw-1 Received: from mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.4]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-02.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 7675919B9A9B; Fri, 2 Aug 2024 15:56:19 +0000 (UTC) Received: from t14s.redhat.com (unknown [10.39.192.113]) by mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 7F6E1300019D; Fri, 2 Aug 2024 15:56:13 +0000 (UTC) From: David Hildenbrand To: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org, linux-doc@vger.kernel.org, kvm@vger.kernel.org, linux-s390@vger.kernel.org, linux-fsdevel@vger.kernel.org, David Hildenbrand , Andrew Morton , "Matthew Wilcox (Oracle)" , Jonathan Corbet , Christian Borntraeger , Janosch Frank , Claudio Imbrenda , Heiko Carstens , Vasily Gorbik , Alexander Gordeev , Sven Schnelle , Gerald Schaefer Subject: [PATCH v1 07/11] mm/huge_memory: convert split_huge_pages_pid() from follow_page() to folio_walk Date: Fri, 2 Aug 2024 17:55:20 +0200 Message-ID: <20240802155524.517137-8-david@redhat.com> In-Reply-To: <20240802155524.517137-1-david@redhat.com> References: 
<20240802155524.517137-1-david@redhat.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.4 Let's remove yet another follow_page() user. Note that we have to do the split without holding the PTL, after folio_walk_end(). We don't care about losing the secretmem check in follow_page(). Signed-off-by: David Hildenbrand Signed-off-by: David Hildenbrand Reviewed-by: Zi Yan --- mm/huge_memory.c | 18 +++++++++++------- 1 file changed, 11 insertions(+), 7 deletions(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 0167dc27e365..697fcf89f975 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -40,6 +40,7 @@ #include #include #include +#include #include #include @@ -3507,7 +3508,7 @@ static int split_huge_pages_pid(int pid, unsigned long vaddr_start, */ for (addr = vaddr_start; addr < vaddr_end; addr += PAGE_SIZE) { struct vm_area_struct *vma = vma_lookup(mm, addr); - struct page *page; + struct folio_walk fw; struct folio *folio; if (!vma) @@ -3519,13 +3520,10 @@ static int split_huge_pages_pid(int pid, unsigned long vaddr_start, continue; } - /* FOLL_DUMP to ignore special (like zero) pages */ - page = follow_page(vma, addr, FOLL_GET | FOLL_DUMP); - - if (IS_ERR_OR_NULL(page)) + folio = folio_walk_start(&fw, vma, addr, 0); + if (!folio) continue; - folio = page_folio(page); if (!is_transparent_hugepage(folio)) goto next; @@ -3544,13 +3542,19 @@ static int split_huge_pages_pid(int pid, unsigned long vaddr_start, if (!folio_trylock(folio)) goto next; + folio_get(folio); + folio_walk_end(&fw, vma); if (!split_folio_to_order(folio, new_order)) split++; folio_unlock(folio); -next: folio_put(folio); + + cond_resched(); + continue; +next: + folio_walk_end(&fw, vma); cond_resched(); } mmap_read_unlock(mm); From patchwork Fri Aug 2 15:55:21 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: 
David Hildenbrand X-Patchwork-Id: 13751756 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0E987166F0D for ; Fri, 2 Aug 2024 15:56:34 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1722614196; cv=none; b=fPLy566KYtSTgO49PwBdWwlA590xPpvulpQ39JaHaoRwmfkRM+QuBqyrx+nd+WiNyvlqgbxZjicB+Z/hOAz3fNheWimjxyACmh4kHAhMiz7cRMyFjx9bM/+9K0BhongVEEcYyImB03B8TjbOjO0XXYkDCzIcp3AdA+7q04wv7WI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1722614196; c=relaxed/simple; bh=2IhrE8Vmt4VK8rVWOEfE0I0BvwytyEl+f20lNv5wS/g=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=NL9/Ok1Y7aOOmujJamp3TJJVzEqyutgvYadgzAF68137Pq0AkKfxT7e+XtcBI5R+p99UrOZIWTjvES5SMgWP3OHxSxHy49hBj7XobUdYY7ieyO/JaxkgWUK3XXpepiYdpTnabp426KswcsNquwxugC+af5ug6/JTO4Srq93TWo8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=VMyO9V5j; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="VMyO9V5j" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1722614194; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: 
in-reply-to:in-reply-to:references:references; bh=pjcECCTgCmWZTwab+g/RP394PBHLh37fGlT1XvTATXs=; b=VMyO9V5jw5PBC7zQEqKVIsbk2jI0a1m2Gg/Wds7dtOoDh1R9rj2QYr2r8+7QeRzYpnN9Md 7O2qCW1wOJBlgbJbsYHirQEpBE0oWYrvJnCx8ekJ46z8wYroYyXN+l4JsFGB2fsp2u9zkw 5oSEJCf8dguEb20+c7zAk2S3ShXDkqg= Received: from mx-prod-mc-04.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-74-WFy2ItSoPW6bK_LvEE1k_Q-1; Fri, 02 Aug 2024 11:56:29 -0400 X-MC-Unique: WFy2ItSoPW6bK_LvEE1k_Q-1 Received: from mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.4]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-04.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 95BE31910417; Fri, 2 Aug 2024 15:56:26 +0000 (UTC) Received: from t14s.redhat.com (unknown [10.39.192.113]) by mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 11D163000198; Fri, 2 Aug 2024 15:56:19 +0000 (UTC) From: David Hildenbrand To: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org, linux-doc@vger.kernel.org, kvm@vger.kernel.org, linux-s390@vger.kernel.org, linux-fsdevel@vger.kernel.org, David Hildenbrand , Andrew Morton , "Matthew Wilcox (Oracle)" , Jonathan Corbet , Christian Borntraeger , Janosch Frank , Claudio Imbrenda , Heiko Carstens , Vasily Gorbik , Alexander Gordeev , Sven Schnelle , Gerald Schaefer Subject: [PATCH v1 08/11] s390/uv: convert gmap_destroy_page() from follow_page() to folio_walk Date: Fri, 2 Aug 2024 17:55:21 +0200 Message-ID: <20240802155524.517137-9-david@redhat.com> In-Reply-To: <20240802155524.517137-1-david@redhat.com> References: <20240802155524.517137-1-david@redhat.com> Precedence: bulk X-Mailing-List: 
kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.4 Let's get rid of another follow_page() user and perform the UV calls under PTL -- which likely should be fine. No need for an additional reference while holding the PTL: uv_destroy_folio() and uv_convert_from_secure_folio() raise the refcount, so any concurrent make_folio_secure() would see an unexpected reference and cannot set PG_arch_1 concurrently. Do we really need a writable PTE? Likely yes, because the "destroy" part is, in comparison to the export, a destructive operation. So we'll keep the writability check for now. We'll lose the secretmem check from follow_page(). Likely we don't care about that here. Signed-off-by: David Hildenbrand Reviewed-by: Claudio Imbrenda --- arch/s390/kernel/uv.c | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-) diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c index 35ed2aea8891..9646f773208a 100644 --- a/arch/s390/kernel/uv.c +++ b/arch/s390/kernel/uv.c @@ -14,6 +14,7 @@ #include #include #include +#include #include #include #include @@ -462,9 +463,9 @@ EXPORT_SYMBOL_GPL(gmap_convert_to_secure); int gmap_destroy_page(struct gmap *gmap, unsigned long gaddr) { struct vm_area_struct *vma; + struct folio_walk fw; unsigned long uaddr; struct folio *folio; - struct page *page; int rc; rc = -EFAULT; @@ -483,11 +484,15 @@ int gmap_destroy_page(struct gmap *gmap, unsigned long gaddr) goto out; rc = 0; - /* we take an extra reference here */ - page = follow_page(vma, uaddr, FOLL_WRITE | FOLL_GET); - if (IS_ERR_OR_NULL(page)) + folio = folio_walk_start(&fw, vma, uaddr, 0); + if (!folio) goto out; - folio = page_folio(page); + /* + * See gmap_make_secure(): large folios cannot be secure. Small + * folio implies FW_LEVEL_PTE. 
+ */ + if (folio_test_large(folio) || !pte_write(fw.pte)) + goto out_walk_end; rc = uv_destroy_folio(folio); /* * Fault handlers can race; it is possible that two CPUs will fault @@ -500,7 +505,8 @@ int gmap_destroy_page(struct gmap *gmap, unsigned long gaddr) */ if (rc) rc = uv_convert_from_secure_folio(folio); - folio_put(folio); +out_walk_end: + folio_walk_end(&fw, vma); out: mmap_read_unlock(gmap->mm); return rc; From patchwork Fri Aug 2 15:55:22 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Hildenbrand X-Patchwork-Id: 13751757 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8F23117E705 for ; Fri, 2 Aug 2024 15:56:44 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1722614207; cv=none; b=LYC/R7KwMyoTIEJLmYxFJOKQ2kY/yGj5hv+yMyT0ILglLMN+s3HpQmsglJys+T1sxFpc12ilUOMM9DBcL2BteTXnf9Khkm6bOHu2CDZ0RcLBpvcgZA5eclHKKck9UnlLxif34gY7aaREK1PviHevYzVQuVKMUJuRTnvPjRzg2dQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1722614207; c=relaxed/simple; bh=D7p0bladQLZdcwqcqDuCFmY1nVG1fLgP938/gYuixF0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=g/XVkxaYNykPaGaI1Q0oOmFGSalauIjm0R8W1Kwu6utK5tJ2Ni42qpfgVdWuD/wdm1FFjnOlmG7oZf/qH1CJ+M35myOmSC8Xz4fSgO69oUcGtJhSLVUonciwvVsOoO9dZQDq73uvBWpDwkizG0yhmQ7DJY/DcQnZd56qcsOZRNc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=f0TtLs3b; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: 
smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="f0TtLs3b" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1722614203; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Y8hDUFeX9csW/JE1owVpjHxLVtzwb2VhQ3RSyvg6DEc=; b=f0TtLs3bjsEhCvNLIivKYozUU5WnT2Pqj0C+4rbhCMIul/xABTHy1PHyRcTF8PpLBaWjD6 ew++LhdWIjzOju/JkkkCkIlU+wIfyP5aOFlJrYYN1uS88uQ95vHI57VYA4e+KKpZ5sfK6Q q1gj3VP1uQ5ks7NCNupmcLorSrutPb0= Received: from mx-prod-mc-02.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-314-xQRxca8hNvOMbFAn_7Ci0g-1; Fri, 02 Aug 2024 11:56:36 -0400 X-MC-Unique: xQRxca8hNvOMbFAn_7Ci0g-1 Received: from mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.4]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-02.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 9C68619776D7; Fri, 2 Aug 2024 15:56:32 +0000 (UTC) Received: from t14s.redhat.com (unknown [10.39.192.113]) by mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id CD843300018D; Fri, 2 Aug 2024 15:56:26 +0000 (UTC) From: David Hildenbrand To: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org, linux-doc@vger.kernel.org, kvm@vger.kernel.org, linux-s390@vger.kernel.org, 
linux-fsdevel@vger.kernel.org, David Hildenbrand , Andrew Morton , "Matthew Wilcox (Oracle)" , Jonathan Corbet , Christian Borntraeger , Janosch Frank , Claudio Imbrenda , Heiko Carstens , Vasily Gorbik , Alexander Gordeev , Sven Schnelle , Gerald Schaefer Subject: [PATCH v1 09/11] s390/mm/fault: convert do_secure_storage_access() from follow_page() to folio_walk Date: Fri, 2 Aug 2024 17:55:22 +0200 Message-ID: <20240802155524.517137-10-david@redhat.com> In-Reply-To: <20240802155524.517137-1-david@redhat.com> References: <20240802155524.517137-1-david@redhat.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.4 Let's get rid of another follow_page() user and perform the conversion under PTL: Note that this is also what follow_page_pte() ends up doing. Unfortunately we cannot currently optimize out the additional reference, because arch_make_folio_accessible() must be called with a raised refcount to protect against concurrent conversion to secure. We can just move the arch_make_folio_accessible() under the PTL, like follow_page_pte() would. We'll effectively drop the "writable" check implied by FOLL_WRITE: follow_page_pte() would also not check that when calling arch_make_folio_accessible(), so there is no good reason for doing that here. We'll lose the secretmem check from follow_page() as well, which we shouldn't really care about. 
Signed-off-by: David Hildenbrand Reviewed-by: Claudio Imbrenda --- arch/s390/mm/fault.c | 16 ++++++++++------ 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c index 8e149ef5e89b..ad8b0d6b77ea 100644 --- a/arch/s390/mm/fault.c +++ b/arch/s390/mm/fault.c @@ -34,6 +34,7 @@ #include #include #include +#include #include #include #include @@ -492,9 +493,9 @@ void do_secure_storage_access(struct pt_regs *regs) union teid teid = { .val = regs->int_parm_long }; unsigned long addr = get_fault_address(regs); struct vm_area_struct *vma; + struct folio_walk fw; struct mm_struct *mm; struct folio *folio; - struct page *page; struct gmap *gmap; int rc; @@ -536,15 +537,18 @@ void do_secure_storage_access(struct pt_regs *regs) vma = find_vma(mm, addr); if (!vma) return handle_fault_error(regs, SEGV_MAPERR); - page = follow_page(vma, addr, FOLL_WRITE | FOLL_GET); - if (IS_ERR_OR_NULL(page)) { + folio = folio_walk_start(&fw, vma, addr, 0); + if (!folio) { mmap_read_unlock(mm); break; } - folio = page_folio(page); - if (arch_make_folio_accessible(folio)) - send_sig(SIGSEGV, current, 0); + /* arch_make_folio_accessible() needs a raised refcount. 
*/ + folio_get(folio); + rc = arch_make_folio_accessible(folio); folio_put(folio); + folio_walk_end(&fw, vma); + if (rc) + send_sig(SIGSEGV, current, 0); mmap_read_unlock(mm); break; case KERNEL_FAULT: From patchwork Fri Aug 2 15:55:23 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Hildenbrand X-Patchwork-Id: 13751758 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id DA77B14EC57 for ; Fri, 2 Aug 2024 15:56:48 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1722614210; cv=none; b=liL++DM3F2jkBFUjDH8PDEl3BOSVZyrNB+1rzNF+6f4dLgD5qm9jnfeUCcqSWHdJNCM8O3rZsZE9oj9mYBiTMhEuXJ8DbKEdYbxRGM7MSSsN9+BXqWYL9cjPIq9ctUF7KrPQZOHtZpFpGoxl9+2BvH3o7hYboe2X1eTAdjWxlMs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1722614210; c=relaxed/simple; bh=BHfhJrsd92h4Q1KfRv5HAZdhBRATlXmf64EjgmZS4Bg=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Xcgrd/WKG/aaI4kkMJ3sZfmk6iFkzClmcr7DhNoYvOGq6326XNbdUTxS2aRkyWv6rq8W811dENACRLigTKwwYbpEVGJYhIFoBZdHrQtr1Wxrtde/2NHlBEJ9WL89fSFLdFhtNWgzy4eJYJRxFx18FtJlY3WLszfwiK5+g2I0lXo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=h+y+SmvA; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass 
(1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="h+y+SmvA" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1722614208; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=MPEnEoi5Ma31G0FuCm1jYE13UYiHj4w8jLpqqbcOSIc=; b=h+y+SmvAAiZkxG9NZqLM2wk40jadIEBjg+XrcZFZgx/YKFu5RDe49NNdvJ36yhhWJLB8t1 bxcptxFfbLl1s8032yQ3Pg3LOpLL4hNQu/XCGRm1rXrSlbvWMRTLHVoNoP47FP3VUjCcVU ZX6camtMkpd8N1M1aQKZn4UhpsLmJdU= Received: from mx-prod-mc-02.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-150-6ny0YSGpPsS9rY-jD1QXAg-1; Fri, 02 Aug 2024 11:56:42 -0400 X-MC-Unique: 6ny0YSGpPsS9rY-jD1QXAg-1 Received: from mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.4]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-02.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 420521944B27; Fri, 2 Aug 2024 15:56:39 +0000 (UTC) Received: from t14s.redhat.com (unknown [10.39.192.113]) by mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 13545300018D; Fri, 2 Aug 2024 15:56:32 +0000 (UTC) From: David Hildenbrand To: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org, linux-doc@vger.kernel.org, kvm@vger.kernel.org, linux-s390@vger.kernel.org, linux-fsdevel@vger.kernel.org, David Hildenbrand , Andrew Morton , "Matthew Wilcox (Oracle)" , Jonathan Corbet , Christian Borntraeger , Janosch Frank , Claudio Imbrenda , Heiko Carstens , Vasily Gorbik , Alexander Gordeev , Sven Schnelle , Gerald 
Schaefer Subject: [PATCH v1 10/11] mm: remove follow_page() Date: Fri, 2 Aug 2024 17:55:23 +0200 Message-ID: <20240802155524.517137-11-david@redhat.com> In-Reply-To: <20240802155524.517137-1-david@redhat.com> References: <20240802155524.517137-1-david@redhat.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.4 All users are gone, let's remove it and any leftovers in comments. We'll leave any FOLL/follow_page_() naming cleanups as future work. Signed-off-by: David Hildenbrand --- Documentation/mm/transhuge.rst | 6 +++--- include/linux/mm.h | 3 --- mm/filemap.c | 2 +- mm/gup.c | 24 +----------------------- mm/nommu.c | 6 ------ 5 files changed, 5 insertions(+), 36 deletions(-) diff --git a/Documentation/mm/transhuge.rst b/Documentation/mm/transhuge.rst index 1ba0ad63246c..a2cd8800d527 100644 --- a/Documentation/mm/transhuge.rst +++ b/Documentation/mm/transhuge.rst @@ -31,10 +31,10 @@ Design principles feature that applies to all dynamic high order allocations in the kernel) -get_user_pages and follow_page -============================== +get_user_pages and pin_user_pages +================================= -get_user_pages and follow_page if run on a hugepage, will return the +get_user_pages and pin_user_pages if run on a hugepage, will return the head or tail pages as usual (exactly as they would do on hugetlbfs). 
Most GUP users will only care about the actual physical address of the page and its temporary pinning to release after the I/O diff --git a/include/linux/mm.h b/include/linux/mm.h index 2f6c08b53e4f..ee8cea73d415 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3527,9 +3527,6 @@ static inline vm_fault_t vmf_fs_error(int err) return VM_FAULT_SIGBUS; } -struct page *follow_page(struct vm_area_struct *vma, unsigned long address, - unsigned int foll_flags); - static inline int vm_fault_to_errno(vm_fault_t vm_fault, int foll_flags) { if (vm_fault & VM_FAULT_OOM) diff --git a/mm/filemap.c b/mm/filemap.c index d62150418b91..4130be74f6fd 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -112,7 +112,7 @@ * ->swap_lock (try_to_unmap_one) * ->private_lock (try_to_unmap_one) * ->i_pages lock (try_to_unmap_one) - * ->lruvec->lru_lock (follow_page->mark_page_accessed) + * ->lruvec->lru_lock (follow_page_mask->mark_page_accessed) * ->lruvec->lru_lock (check_pte_range->isolate_lru_page) * ->private_lock (folio_remove_rmap_pte->set_page_dirty) * ->i_pages lock (folio_remove_rmap_pte->set_page_dirty) diff --git a/mm/gup.c b/mm/gup.c index 3e8484c893aa..d19884e097fd 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -1072,28 +1072,6 @@ static struct page *follow_page_mask(struct vm_area_struct *vma, return page; } -struct page *follow_page(struct vm_area_struct *vma, unsigned long address, - unsigned int foll_flags) -{ - struct follow_page_context ctx = { NULL }; - struct page *page; - - if (vma_is_secretmem(vma)) - return NULL; - - if (WARN_ON_ONCE(foll_flags & FOLL_PIN)) - return NULL; - - /* - * We never set FOLL_HONOR_NUMA_FAULT because callers don't expect - * to fail on PROT_NONE-mapped pages. 
- */ - page = follow_page_mask(vma, address, foll_flags, &ctx); - if (ctx.pgmap) - put_dev_pagemap(ctx.pgmap); - return page; -} - static int get_gate_page(struct mm_struct *mm, unsigned long address, unsigned int gup_flags, struct vm_area_struct **vma, struct page **page) @@ -2519,7 +2497,7 @@ static bool is_valid_gup_args(struct page **pages, int *locked, * These flags not allowed to be specified externally to the gup * interfaces: * - FOLL_TOUCH/FOLL_PIN/FOLL_TRIED/FOLL_FAST_ONLY are internal only - * - FOLL_REMOTE is internal only and used on follow_page() + * - FOLL_REMOTE is internal only, set in (get|pin)_user_pages_remote() * - FOLL_UNLOCKABLE is internal only and used if locked is !NULL */ if (WARN_ON_ONCE(gup_flags & INTERNAL_GUP_FLAGS)) diff --git a/mm/nommu.c b/mm/nommu.c index 40cac1348b40..385b0c15add8 100644 --- a/mm/nommu.c +++ b/mm/nommu.c @@ -1578,12 +1578,6 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len, return ret; } -struct page *follow_page(struct vm_area_struct *vma, unsigned long address, - unsigned int foll_flags) -{ - return NULL; -} - int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr, unsigned long pfn, unsigned long size, pgprot_t prot) { From patchwork Fri Aug 2 15:55:24 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Hildenbrand X-Patchwork-Id: 13751759 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6B5921A34AA for ; Fri, 2 Aug 2024 15:56:52 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1722614214; cv=none; 
b=Qhf08HwBsRtZD/2GxkgxSJVEh3lGZWoTCHBZYrtnFAp5fj+k9lVS1ERb2H16umxP5k62vr7KhSVE7ex9E0ipSqgLweIkhPCMl7M66I2V49CFeYbzPQ3xbqhUWp/VROyJFQQ3dNoeF9KC8ZpNukn+ShIHW/jKHKmCZcZsdhiCPxw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1722614214; c=relaxed/simple; bh=B9QpgUZqHBSgII4REwP9lD0/53JlrVZdILJpl6GeaCQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=GBv7747G1gASF+PgVUKa3Mm/YOSIr1mU60ZSsWl887xCx1SDevEjTYe/B3Zmdzb/oEQOqbn3YxQFn5vgwB9N9YGo9J4Bz2O/4M+ZDpn3wBEHSDD+OAwyVL+ojw/HcFak4tmpAKGbr/SiPbEbvnQ7zJtyjdPgXp/3ejGwrbuqfcQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=HhTudOSc; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="HhTudOSc" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1722614211; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=XisBOloUEP3qStGU5kEnvOleMULjhsDMvLI1Qb38ywI=; b=HhTudOSczoDCzNBQBnIpia1CQZtRqjMaQQvctcz8uWfl66KLPLUyn1ol+K/k191SmfJMPC dNrmVHXMqvC+ubNoXK0PsnTG0RTa8SC64T5iDnWwQ16YMJtictZtzwmkevfTyKqs+6yRnt Hl2lreAjSkmNVDvvDQGeRkcDoFO/0aE= Received: from mx-prod-mc-04.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-533-7fMx_I3IMvS20EQpewJYGQ-1; Fri, 02 
Aug 2024 11:56:48 -0400 X-MC-Unique: 7fMx_I3IMvS20EQpewJYGQ-1 Received: from mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.4]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-04.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 833C51955D4D; Fri, 2 Aug 2024 15:56:45 +0000 (UTC) Received: from t14s.redhat.com (unknown [10.39.192.113]) by mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id B30F4300018D; Fri, 2 Aug 2024 15:56:39 +0000 (UTC) From: David Hildenbrand To: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org, linux-doc@vger.kernel.org, kvm@vger.kernel.org, linux-s390@vger.kernel.org, linux-fsdevel@vger.kernel.org, David Hildenbrand , Andrew Morton , "Matthew Wilcox (Oracle)" , Jonathan Corbet , Christian Borntraeger , Janosch Frank , Claudio Imbrenda , Heiko Carstens , Vasily Gorbik , Alexander Gordeev , Sven Schnelle , Gerald Schaefer Subject: [PATCH v1 11/11] mm/ksm: convert break_ksm() from walk_page_range_vma() to folio_walk Date: Fri, 2 Aug 2024 17:55:24 +0200 Message-ID: <20240802155524.517137-12-david@redhat.com> In-Reply-To: <20240802155524.517137-1-david@redhat.com> References: <20240802155524.517137-1-david@redhat.com> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.4 Let's simplify by reusing folio_walk. Keep the existing behavior by handling migration entries and zeropages. 
Signed-off-by: David Hildenbrand --- mm/ksm.c | 63 ++++++++++++++------------------------------------------ 1 file changed, 16 insertions(+), 47 deletions(-) diff --git a/mm/ksm.c b/mm/ksm.c index 0f5b2bba4ef0..8e53666bc7b0 100644 --- a/mm/ksm.c +++ b/mm/ksm.c @@ -608,47 +608,6 @@ static inline bool ksm_test_exit(struct mm_struct *mm) return atomic_read(&mm->mm_users) == 0; } -static int break_ksm_pmd_entry(pmd_t *pmd, unsigned long addr, unsigned long next, - struct mm_walk *walk) -{ - struct page *page = NULL; - spinlock_t *ptl; - pte_t *pte; - pte_t ptent; - int ret; - - pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl); - if (!pte) - return 0; - ptent = ptep_get(pte); - if (pte_present(ptent)) { - page = vm_normal_page(walk->vma, addr, ptent); - } else if (!pte_none(ptent)) { - swp_entry_t entry = pte_to_swp_entry(ptent); - - /* - * As KSM pages remain KSM pages until freed, no need to wait - * here for migration to end. - */ - if (is_migration_entry(entry)) - page = pfn_swap_entry_to_page(entry); - } - /* return 1 if the page is an normal ksm page or KSM-placed zero page */ - ret = (page && PageKsm(page)) || is_ksm_zero_pte(ptent); - pte_unmap_unlock(pte, ptl); - return ret; -} - -static const struct mm_walk_ops break_ksm_ops = { - .pmd_entry = break_ksm_pmd_entry, - .walk_lock = PGWALK_RDLOCK, -}; - -static const struct mm_walk_ops break_ksm_lock_vma_ops = { - .pmd_entry = break_ksm_pmd_entry, - .walk_lock = PGWALK_WRLOCK, -}; - /* * We use break_ksm to break COW on a ksm page by triggering unsharing, * such that the ksm page will get replaced by an exclusive anonymous page. @@ -665,16 +624,26 @@ static const struct mm_walk_ops break_ksm_lock_vma_ops = { static int break_ksm(struct vm_area_struct *vma, unsigned long addr, bool lock_vma) { vm_fault_t ret = 0; - const struct mm_walk_ops *ops = lock_vma ? 
- &break_ksm_lock_vma_ops : &break_ksm_ops; + + if (lock_vma) + vma_start_write(vma); do { - int ksm_page; + bool ksm_page = false; + struct folio_walk fw; + struct folio *folio; cond_resched(); - ksm_page = walk_page_range_vma(vma, addr, addr + 1, ops, NULL); - if (WARN_ON_ONCE(ksm_page < 0)) - return ksm_page; + folio = folio_walk_start(&fw, vma, addr, + FW_MIGRATION | FW_ZEROPAGE); + if (folio) { + /* Small folio implies FW_LEVEL_PTE. */ + if (!folio_test_large(folio) && + (folio_test_ksm(folio) || is_ksm_zero_pte(fw.pte))) + ksm_page = true; + folio_walk_end(&fw, vma); + } + if (!ksm_page) return 0; ret = handle_mm_fault(vma, addr,