From patchwork Fri Apr 28 00:41:33 2023
X-Patchwork-Submitter: Jiaqi Yan
X-Patchwork-Id: 13225915
Subject: [RFC PATCH v1 1/7] hugetlb: add HugeTLB splitting functionality
From: Jiaqi Yan
To: mike.kravetz@oracle.com, peterx@redhat.com, naoya.horiguchi@nec.com
Cc: songmuchun@bytedance.com, duenwen@google.com, axelrasmussen@google.com,
    jthoughton@google.com, rientjes@google.com, linmiaohe@huawei.com,
    shy828301@gmail.com, baolin.wang@linux.alibaba.com,
    wangkefeng.wang@huawei.com, akpm@linux-foundation.org,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org, Jiaqi Yan
Date: Fri, 28 Apr 2023 00:41:33 +0000
Message-ID: <20230428004139.2899856-2-jiaqiyan@google.com>
In-Reply-To: <20230428004139.2899856-1-jiaqiyan@google.com>
References: <20230428004139.2899856-1-jiaqiyan@google.com>
The new function, hugetlb_split_to_shift, optimally splits the page table
to map a particular address at a particular granularity. This is useful
for punching a hole in the mapping and for mapping (and unmapping) small
sections of a HugeTLB page.

Splitting is for present leaf HugeTLB PTEs only. None PTEs and other
non-present HugeTLB PTE types are not supported, as they are better left
untouched:
* None PTEs
* Migration PTEs
* HWPOISON PTEs
* UFFD writeprotect PTEs

Signed-off-by: Jiaqi Yan
---
 include/linux/hugetlb.h |   9 ++
 mm/hugetlb.c            | 249 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 258 insertions(+)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 742e7f2cb170..d44bf6a794e5 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -1266,6 +1266,9 @@ int hugetlb_alloc_largest_pte(struct hugetlb_pte *hpte, struct mm_struct *mm,
         unsigned long end);
 int hugetlb_collapse(struct mm_struct *mm, unsigned long start,
         unsigned long end);
+int hugetlb_split_to_shift(struct mm_struct *mm, struct vm_area_struct *vma,
+        struct hugetlb_pte *hpte, unsigned long addr,
+        unsigned int desired_shift);
 #else
 static inline bool hugetlb_hgm_enabled(struct vm_area_struct *vma)
 {
@@ -1292,6 +1295,12 @@ int hugetlb_collapse(struct mm_struct *mm, unsigned long start,
 {
     return -EINVAL;
 }
+int hugetlb_split_to_shift(struct mm_struct *mm, struct vm_area_struct *vma,
+        const struct hugetlb_pte *hpte, unsigned long addr,
+        unsigned int desired_shift)
+{
+    return -EINVAL;
+}
 #endif
 
 static inline
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index df4c17164abb..d3f3f1c2d293 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -8203,6 +8203,255 @@ int hugetlb_collapse(struct mm_struct *mm, unsigned long start,
     return ret;
 }
 
+/*
+ * Find the optimal HugeTLB PTE shift that @desired_addr could be mapped at.
+ */
+static int hugetlb_find_shift(struct vm_area_struct *vma,
+                              unsigned long curr,
+                              unsigned long end,
+                              unsigned long desired_addr,
+                              unsigned long desired_shift,
+                              unsigned int *shift_found)
+{
+    struct hstate *h = hstate_vma(vma);
+    struct hstate *tmp_h;
+    unsigned int shift;
+    unsigned long sz;
+
+    for_each_hgm_shift(h, tmp_h, shift) {
+        sz = 1UL << shift;
+        /* This sz is not aligned or too large. */
+        if (!IS_ALIGNED(curr, sz) || curr + sz > end)
+            continue;
+        /*
+         * When desired_addr is in [curr, curr + sz), we want shift
+         * to be as close to desired_shift as possible.
+         */
+        if (curr <= desired_addr && desired_addr < curr + sz &&
+            shift > desired_shift)
+            continue;
+
+        *shift_found = shift;
+        return 0;
+    }
+
+    return -EINVAL;
+}
+
+/*
+ * Given a particular address @addr mapped by a present leaf HugeTLB PTE,
+ * split the mapping so that the PTE that maps @addr is at @desired_shift.
+ */
+static int hugetlb_split_to_shift_present_leaf(struct mm_struct *mm,
+                                               struct vm_area_struct *vma,
+                                               pte_t old_entry,
+                                               unsigned long start,
+                                               unsigned long end,
+                                               unsigned long addr,
+                                               unsigned int orig_shift,
+                                               unsigned int desired_shift)
+{
+    bool old_entry_dirty;
+    bool old_entry_write;
+    bool old_entry_uffd_wp;
+    pte_t new_entry;
+    unsigned long curr;
+    unsigned long sz;
+    unsigned int shift;
+    int ret = 0;
+    struct hugetlb_pte new_hpte;
+    struct page *subpage = NULL;
+    struct folio *folio = page_folio(compound_head(pte_page(old_entry)));
+    struct hstate *h = hstate_vma(vma);
+    spinlock_t *ptl;
+
+    /* Unmap original unsplit hugepage per huge_ptep_get_and_clear. */
+    hugetlb_remove_rmap(folio_page(folio, 0), orig_shift, h, vma);
+
+    old_entry_dirty = huge_pte_dirty(old_entry);
+    old_entry_write = huge_pte_write(old_entry);
+    old_entry_uffd_wp = huge_pte_uffd_wp(old_entry);
+
+    for (curr = start; curr < end; curr += sz) {
+        ret = hugetlb_find_shift(vma, curr, end, addr,
+                                 desired_shift, &shift);
+
+        /* Unable to find a shift that works. */
+        if (WARN_ON(ret))
+            goto abort;
+
+        /*
+         * Do HGM full walk and allocate new page table structures
+         * to continue to walk to the level we want.
+         */
+        sz = 1UL << shift;
+        ret = hugetlb_full_walk_alloc(&new_hpte, vma, curr, sz);
+        if (WARN_ON(ret))
+            goto abort;
+
+        BUG_ON(hugetlb_pte_size(&new_hpte) > sz);
+        /*
+         * When hugetlb_pte_size(new_hpte) is smaller than sz,
+         * increment curr by hugetlb_pte_size(new_hpte) to avoid
+         * skipping over some PTEs.
+         */
+        if (hugetlb_pte_size(&new_hpte) < sz)
+            sz = hugetlb_pte_size(&new_hpte);
+
+        subpage = hugetlb_find_subpage(h, folio, curr);
+        /*
+         * Create a new (finer granularity) PT entry and
+         * populate it with old_entry's bits.
+         */
+        new_entry = make_huge_pte(vma, subpage,
+                                  huge_pte_write(old_entry), shift);
+        if (old_entry_dirty)
+            new_entry = huge_pte_mkdirty(new_entry);
+        if (old_entry_write)
+            new_entry = huge_pte_mkwrite(new_entry);
+        if (old_entry_uffd_wp)
+            new_entry = huge_pte_mkuffd_wp(new_entry);
+        ptl = hugetlb_pte_lock(&new_hpte);
+        set_huge_pte_at(mm, curr, new_hpte.ptep, new_entry);
+        spin_unlock(ptl);
+        /* Increment ref/mapcount per set_huge_pte_at(). */
+        hugetlb_add_file_rmap(subpage, shift, h, vma);
+        folio_get(folio);
+    }
+    /*
+     * This refcount decrement is for the huge_ptep_get_and_clear
+     * on the hpte BEFORE splitting, for the same reason as
+     * hugetlb_remove_rmap(), but we cannot do it at that time.
+     * Now that splitting succeeded, the refcount can be decremented.
+     */
+    folio_put(folio);
+    return 0;
+abort:
+    /*
+     * Restore mapcount on the unsplit hugepage. No need to restore
+     * the refcount, as we won't folio_put() until splitting succeeds.
+     */
+    hugetlb_add_file_rmap(folio_page(folio, 0), orig_shift, h, vma);
+    return ret;
+}
+
+/*
+ * Given a particular address @addr, split the HugeTLB PTE that currently
+ * maps it so that, for the given @addr, the PTE that maps it is at
+ * @desired_shift. The splitting is always done optimally.
+ *
+ * Example: given a HugeTLB 1G page mapped from VA 0 to 1G, if the caller
+ * calls this API with addr=0 and desired_shift=PAGE_SHIFT, the page table
+ * is changed as follows:
+ *   1. The original PUD is split into 512 2M PMDs first.
+ *   2. The 1st PMD is further split into 512 4K PTEs.
+ *
+ * Callers are required to hold locks on the file mapping within vma.
+ */
+int hugetlb_split_to_shift(struct mm_struct *mm, struct vm_area_struct *vma,
+                           struct hugetlb_pte *hpte, unsigned long addr,
+                           unsigned int desired_shift)
+{
+    unsigned long start, end;
+    unsigned long desired_sz = 1UL << desired_shift;
+    int ret;
+    pte_t old_entry;
+    struct mmu_gather tlb;
+    struct mmu_notifier_range range;
+    spinlock_t *ptl;
+
+    BUG_ON(!hpte->ptep);
+
+    start = addr & hugetlb_pte_mask(hpte);
+    end = start + hugetlb_pte_size(hpte);
+    BUG_ON(!IS_ALIGNED(start, desired_sz));
+    BUG_ON(!IS_ALIGNED(end, desired_sz));
+    BUG_ON(addr < start || end <= addr);
+
+    if (hpte->shift == desired_shift)
+        return 0;
+
+    /*
+     * Non none-mostly hugetlb PTEs must be present leaf-level PTEs,
+     * i.e. not split before.
+     */
+    ptl = hugetlb_pte_lock(hpte);
+    BUG_ON(!huge_pte_none_mostly(huge_ptep_get(hpte->ptep)) &&
+           !hugetlb_pte_present_leaf(hpte, huge_ptep_get(hpte->ptep)));
+
+    i_mmap_assert_write_locked(vma->vm_file->f_mapping);
+
+    mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm, start, end);
+    mmu_notifier_invalidate_range_start(&range);
+
+    /*
+     * Get and clear the PTE. We will allocate new page table structures
+     * when walking the page table.
+     */
+    old_entry = huge_ptep_get_and_clear(mm, start, hpte->ptep);
+    spin_unlock(ptl);
+
+    /*
+     * From now on, any failure exit needs to go through "skip" to
+     * put old_entry back. If any form of hugetlb_split_to_shift_xxx
+     * is invoked, it also needs to go through "abort" to get rid of
+     * the allocated PTEs created before splitting fails.
+     */
+
+    if (unlikely(huge_pte_none_mostly(old_entry))) {
+        ret = -EAGAIN;
+        goto skip;
+    }
+    if (unlikely(!pte_present(old_entry))) {
+        if (is_hugetlb_entry_migration(old_entry))
+            ret = -EBUSY;
+        else if (is_hugetlb_entry_hwpoisoned(old_entry))
+            ret = -EHWPOISON;
+        else {
+            WARN_ONCE(1, "Unexpected case of non-present HugeTLB PTE\n");
+            ret = -EINVAL;
+        }
+        goto skip;
+    }
+
+    if (!hugetlb_pte_present_leaf(hpte, old_entry)) {
+        WARN_ONCE(1, "HugeTLB present PTE is not leaf\n");
+        ret = -EAGAIN;
+        goto skip;
+    }
+    /* From now on old_entry is a present leaf entry. */
+    ret = hugetlb_split_to_shift_present_leaf(mm, vma, old_entry,
+                                              start, end, addr,
+                                              hpte->shift,
+                                              desired_shift);
+    if (ret)
+        goto abort;
+
+    /* Splitting done, new page table entries successfully set up. */
+    mmu_notifier_invalidate_range_end(&range);
+    return 0;
+abort:
+    /* Splitting failed; restore the original page table state. */
+    tlb_gather_mmu(&tlb, mm);
+    /* Decrement mapcount for all the split PTEs. */
+    __unmap_hugepage_range(&tlb, vma, start, end, NULL,
+                           ZAP_FLAG_DROP_MARKER);
+    /*
+     * Free any newly allocated page table entries.
+     * OK if no new entries were allocated at all.
+     */
+    hugetlb_free_pgd_range(&tlb, start, end, start, end);
+    /* Decrement refcount for all the split PTEs. */
+    tlb_finish_mmu(&tlb);
+skip:
+    /* Restore the old entry. */
+    ptl = hugetlb_pte_lock(hpte);
+    set_huge_pte_at(mm, start, hpte->ptep, old_entry);
+    spin_unlock(ptl);
+    mmu_notifier_invalidate_range_end(&range);
+    return ret;
+}
+
 #endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */
 
 /*
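For reference, here is a minimal caller-side sketch of the new API (not part
of this patch): split a mapping so that one address is mapped by a 4K PTE,
e.g. before punching a hole. It assumes HGM is already enabled on the VMA,
that the caller holds the vma->vm_file->f_mapping write lock as required
above, and that hugetlb_full_walk() from the HGM series is available;
split_around_page() is a hypothetical name.

static int split_around_page(struct vm_area_struct *vma, unsigned long addr)
{
    struct hugetlb_pte hpte;
    int ret;

    /* Walk to the huge PTE that currently maps addr. */
    ret = hugetlb_full_walk(&hpte, vma, addr);
    if (ret)
        return ret;

    /* Remap addr at PAGE_SIZE granularity; the rest is split optimally. */
    return hugetlb_split_to_shift(vma->vm_mm, vma, &hpte, addr, PAGE_SHIFT);
}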
From patchwork Fri Apr 28 00:41:34 2023
X-Patchwork-Submitter: Jiaqi Yan
X-Patchwork-Id: 13225916
Subject: [RFC PATCH v1 2/7] hugetlb: create PTE level mapping when possible
From: Jiaqi Yan
To: mike.kravetz@oracle.com, peterx@redhat.com, naoya.horiguchi@nec.com
Cc: songmuchun@bytedance.com, duenwen@google.com, axelrasmussen@google.com,
    jthoughton@google.com, rientjes@google.com, linmiaohe@huawei.com,
    shy828301@gmail.com, baolin.wang@linux.alibaba.com,
    wangkefeng.wang@huawei.com, akpm@linux-foundation.org,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org, Jiaqi Yan
Date: Fri, 28 Apr 2023 00:41:34 +0000
Message-ID: <20230428004139.2899856-3-jiaqiyan@google.com>
In-Reply-To: <20230428004139.2899856-1-jiaqiyan@google.com>
References: <20230428004139.2899856-1-jiaqiyan@google.com>
In memory_failure handling, for each VMA that the HWPOISON HugeTLB page is
mapped into, enable HGM if eligible, then split the P*D-mapped hugepage
into smaller PTEs. try_to_unmap still unmaps the entire hugetlb page, one
PTE at a time, at levels smaller than the original P*D. For example, if a
hugepage was originally mapped at PUD size, it will be split into PMDs and
PTEs, and all of these PMDs and PTEs will be unmapped. The next commit will
only unmap the raw HWPOISON PTE.

For VMAs that are not HGM eligible, or where enabling HGM failed, or where
splitting the hugepage mapping failed, the hugepage is still mapped by its
original P*D and is then unmapped at that P*D.
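Condensed, the per-VMA flow that try_to_split_huge_mapping() (added in the
diff below) implements looks like this; error logging is elided and the
names are those used in the diff:

vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff_start, pgoff_end) {
    if (!hugetlb_hgm_eligible(vma))
        continue;    /* stays mapped, and unmapped, at original P*D */
    if (hugetlb_enable_hgm_vma(vma))
        continue;    /* enabling HGM failed; fall back to P*D */
    if (hugetlb_full_walk(&hpte, vma, head_addr))
        continue;    /* no huge PTE to split */
    hugetlb_split_to_shift(vma->vm_mm, vma, &hpte,
                           poisoned_addr, PAGE_SHIFT);
}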
Signed-off-by: Jiaqi Yan
---
 include/linux/hugetlb.h |  5 +++
 mm/hugetlb.c            | 27 ++++++++++++++++
 mm/memory-failure.c     | 68 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 100 insertions(+)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index d44bf6a794e5..03074b23c396 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -1266,6 +1266,7 @@ int hugetlb_alloc_largest_pte(struct hugetlb_pte *hpte, struct mm_struct *mm,
         unsigned long end);
 int hugetlb_collapse(struct mm_struct *mm, unsigned long start,
         unsigned long end);
+int hugetlb_enable_hgm_vma(struct vm_area_struct *vma);
 int hugetlb_split_to_shift(struct mm_struct *mm, struct vm_area_struct *vma,
         struct hugetlb_pte *hpte, unsigned long addr,
         unsigned int desired_shift);
@@ -1295,6 +1296,10 @@ int hugetlb_collapse(struct mm_struct *mm, unsigned long start,
 {
     return -EINVAL;
 }
+int hugetlb_enable_hgm_vma(struct vm_area_struct *vma)
+{
+    return -EINVAL;
+}
 int hugetlb_split_to_shift(struct mm_struct *mm, struct vm_area_struct *vma,
         const struct hugetlb_pte *hpte, unsigned long addr,
         unsigned int desired_shift)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index d3f3f1c2d293..1419176b7e51 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -8203,6 +8203,33 @@ int hugetlb_collapse(struct mm_struct *mm, unsigned long start,
     return ret;
 }
 
+int hugetlb_enable_hgm_vma(struct vm_area_struct *vma)
+{
+    if (hugetlb_hgm_enabled(vma))
+        return 0;
+
+    if (!is_vm_hugetlb_page(vma)) {
+        pr_warn("VMA=[%#lx, %#lx) is not HugeTLB\n",
+            vma->vm_start, vma->vm_end);
+        return -EINVAL;
+    }
+
+    if (!hugetlb_hgm_eligible(vma)) {
+        pr_warn("VMA=[%#lx, %#lx) is not HGM eligible\n",
+            vma->vm_start, vma->vm_end);
+        return -EINVAL;
+    }
+
+    hugetlb_unshare_all_pmds(vma);
+
+    /*
+     * TODO: add the ability to tell if HGM is enabled by kernel
+     * (for HWPOISON unmapping) or by userspace (via MADV_SPLIT).
+     */
+    vm_flags_set(vma, VM_HUGETLB_HGM);
+    return 0;
+}
+
 /*
  * Find the optimal HugeTLB PTE shift that @desired_addr could be mapped at.
  */
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 0b37cbc6e8ae..eb5579b6787e 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1479,6 +1479,73 @@ static int get_hwpoison_page(struct page *p, unsigned long flags)
     return ret;
 }
 
+#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
+/*
+ * For each HGM-eligible VMA that the poisoned page is mapped into, create
+ * a new HGM mapping for hugepage @folio and make sure @poisoned_page is
+ * mapped by a PAGESIZE-level PTE. The caller (hwpoison_user_mappings)
+ * must ensure that:
+ * 1. folio's address space (mapping) is locked in write mode.
+ * 2. folio is locked.
+ */
+static void try_to_split_huge_mapping(struct folio *folio,
+                                      struct page *poisoned_page)
+{
+    struct address_space *mapping = folio_mapping(folio);
+    pgoff_t pgoff_start;
+    pgoff_t pgoff_end;
+    struct vm_area_struct *vma;
+    unsigned long poisoned_addr;
+    unsigned long head_addr;
+    struct hugetlb_pte hpte;
+
+    if (WARN_ON(!mapping))
+        return;
+
+    VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
+
+    pgoff_start = folio_pgoff(folio);
+    pgoff_end = pgoff_start + folio_nr_pages(folio) - 1;
+
+    vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff_start, pgoff_end) {
+        /* Enable HGM on HGM-eligible VMAs. */
+        if (!hugetlb_hgm_eligible(vma))
+            continue;
+
+        i_mmap_assert_locked(vma->vm_file->f_mapping);
+        if (hugetlb_enable_hgm_vma(vma)) {
+            pr_err("Failed to enable HGM on eligible VMA=[%#lx, %#lx)\n",
+                   vma->vm_start, vma->vm_end);
+            continue;
+        }
+
+        poisoned_addr = vma_address(poisoned_page, vma);
+        head_addr = vma_address(folio_page(folio, 0), vma);
+        /*
+         * Get the hugetlb_pte of the PUD-mapped hugepage first,
+         * then split the PUD entry into PMD + PTE entries.
+         *
+         * Both getting the original huge PTE and splitting require
+         * the write lock on vma->vm_file->f_mapping, which the
+         * caller (e.g. hwpoison_user_mappings) should have already
+         * acquired.
+         */
+        if (hugetlb_full_walk(&hpte, vma, head_addr))
+            continue;
+
+        if (hugetlb_split_to_shift(vma->vm_mm, vma, &hpte,
+                                   poisoned_addr, PAGE_SHIFT)) {
+            pr_err("Failed to split huge mapping: pfn=%#lx, vaddr=%#lx in VMA=[%#lx, %#lx)\n",
+                   page_to_pfn(poisoned_page), poisoned_addr,
+                   vma->vm_start, vma->vm_end);
+        }
+    }
+}
+#else
+static void try_to_split_huge_mapping(struct folio *folio,
+                                      struct page *poisoned_page)
+{
+}
+#endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */
+
 /*
  * Do all that is necessary to remove user space mappings. Unmap
  * the pages and send SIGBUS to the processes if the data was dirty.
@@ -1555,6 +1622,7 @@ static bool hwpoison_user_mappings(struct page *p, unsigned long pfn,
      */
     mapping = hugetlb_page_mapping_lock_write(hpage);
     if (mapping) {
+        try_to_split_huge_mapping(folio, p);
         try_to_unmap(folio, ttu|TTU_RMAP_LOCKED);
         i_mmap_unlock_write(mapping);
     } else
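To make the resulting layout concrete, here is a hedged, standalone
userspace sketch (pure address arithmetic, no kernel interfaces) of what an
optimal split leaves behind for a 1G page with a poisoned byte at offset
0x3000, mirroring the PUD example in patch 1/7; the offset is illustrative:

#include <stdio.h>

int main(void)
{
    unsigned long pud = 1UL << 30, pmd = 1UL << 21, pte = 1UL << 12;
    unsigned long poison_off = 0x3000;              /* bad byte's offset */
    unsigned long pmd_lo = poison_off & ~(pmd - 1); /* PMD holding it */
    unsigned long raw = poison_off & ~(pte - 1);    /* raw 4K page */

    printf("split into 4K PTEs: [%#lx, %#lx)\n", pmd_lo, pmd_lo + pmd);
    printf("still 2M PMDs:      [%#lx, %#lx) and [%#lx, %#lx)\n",
           0UL, pmd_lo, pmd_lo + pmd, pud);
    printf("raw poisoned page:  [%#lx, %#lx)\n", raw, raw + pte);
    return 0;
}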
From patchwork Fri Apr 28 00:41:35 2023
X-Patchwork-Submitter: Jiaqi Yan
X-Patchwork-Id: 13225917
Subject: [RFC PATCH v1 3/7] mm: publish raw_hwp_page in mm.h
From: Jiaqi Yan
To: mike.kravetz@oracle.com, peterx@redhat.com, naoya.horiguchi@nec.com
Cc: songmuchun@bytedance.com, duenwen@google.com, axelrasmussen@google.com,
    jthoughton@google.com, rientjes@google.com, linmiaohe@huawei.com,
    shy828301@gmail.com, baolin.wang@linux.alibaba.com,
    wangkefeng.wang@huawei.com, akpm@linux-foundation.org,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org, Jiaqi Yan
Date: Fri, 28 Apr 2023 00:41:35 +0000
Message-ID: <20230428004139.2899856-4-jiaqiyan@google.com>
In-Reply-To: <20230428004139.2899856-1-jiaqiyan@google.com>
References: <20230428004139.2899856-1-jiaqiyan@google.com>
raw_hwp_page will be needed by HugeTLB to determine whether a raw subpage
in a hugepage is poisoned and should either be unmapped or not faulted in
at the PAGE_SIZE PTE level.

Signed-off-by: Jiaqi Yan
---
 include/linux/mm.h  | 16 ++++++++++++++++
 mm/memory-failure.c | 13 -------------
 2 files changed, 16 insertions(+), 13 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 9d3216b4284a..4496d7bdd3ea 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3522,6 +3522,22 @@ enum mf_action_page_type {
  */
 extern const struct attribute_group memory_failure_attr_group;
 
+#ifdef CONFIG_HUGETLB_PAGE
+/*
+ * Struct raw_hwp_page represents information about "raw error page",
+ * constructing singly linked list from ->_hugetlb_hwpoison field of folio.
+ */
+struct raw_hwp_page {
+    struct llist_node node;
+    struct page *page;
+};
+
+static inline struct llist_head *raw_hwp_list_head(struct folio *folio)
+{
+    return (struct llist_head *)&folio->_hugetlb_hwpoison;
+}
+#endif
+
 #if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_HUGETLBFS)
 extern void clear_huge_page(struct page *page,
                             unsigned long addr_hint,
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index eb5579b6787e..48e62d04af17 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1826,19 +1826,6 @@ EXPORT_SYMBOL_GPL(mf_dax_kill_procs);
 #endif /* CONFIG_FS_DAX */
 
 #ifdef CONFIG_HUGETLB_PAGE
-/*
- * Struct raw_hwp_page represents information about "raw error page",
- * constructing singly linked list from ->_hugetlb_hwpoison field of folio.
- */
-struct raw_hwp_page {
-    struct llist_node node;
-    struct page *page;
-};
-
-static inline struct llist_head *raw_hwp_list_head(struct folio *folio)
-{
-    return (struct llist_head *)&folio->_hugetlb_hwpoison;
-}
 
 static unsigned long __folio_free_raw_hwp(struct folio *folio, bool move_flag)
 {
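With the structure now public, a hugetlb-side caller can scan the list for
a given subpage. A hedged sketch of such a scan (essentially what the next
patch adds as find_in_raw_hwp_list(); subpage_is_raw_hwpoison() is a
hypothetical name):

static bool subpage_is_raw_hwpoison(struct folio *folio, struct page *subpage)
{
    struct llist_node *node;
    struct raw_hwp_page *p;

    /* Walk the singly linked list anchored at folio->_hugetlb_hwpoison. */
    llist_for_each(node, raw_hwp_list_head(folio)->first) {
        p = llist_entry(node, struct raw_hwp_page, node);
        if (p->page == subpage)
            return true;
    }
    return false;
}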
From patchwork Fri Apr 28 00:41:36 2023
X-Patchwork-Submitter: Jiaqi Yan
X-Patchwork-Id: 13225918
Subject: [RFC PATCH v1 4/7] mm/memory_failure: unmap raw HWPoison PTEs
 when possible
From: Jiaqi Yan
To: mike.kravetz@oracle.com, peterx@redhat.com, naoya.horiguchi@nec.com
Cc: songmuchun@bytedance.com, duenwen@google.com, axelrasmussen@google.com,
    jthoughton@google.com, rientjes@google.com, linmiaohe@huawei.com,
    shy828301@gmail.com, baolin.wang@linux.alibaba.com,
    wangkefeng.wang@huawei.com, akpm@linux-foundation.org,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org, Jiaqi Yan
Date: Fri, 28 Apr 2023 00:41:36 +0000
Message-ID: <20230428004139.2899856-5-jiaqiyan@google.com>
In-Reply-To: <20230428004139.2899856-1-jiaqiyan@google.com>
References: <20230428004139.2899856-1-jiaqiyan@google.com>
When a folio's VMA is HGM eligible, try_to_unmap_one now only unmaps the
raw HWPOISON page (previously split and mapped at PTE size). If HGM failed
to be enabled on an eligible VMA, or splitting failed, try_to_unmap_one
fails. For VMAs that are not HGM eligible, try_to_unmap_one still unmaps
the whole P*D.

When only the raw HWPOISON subpage is unmapped and the other subpages stay
mapped, the old way memory_failure checks whether unmapping succeeded no
longer works. So introduce is_unmapping_successful() to cover both the
existing and the new unmapping behavior. For the new behavior, store how
many times a raw HWPOISON page is expected to be unmapped, and how many
times it is actually unmapped in try_to_unmap_one(). A raw HWPOISON page
is expected to be unmapped from a VMA if splitting succeeded in
try_to_split_huge_mapping(), so unmap_success =
(nr_expected_unmaps == nr_actual_unmaps).

The old folio_set_hugetlb_hwpoison returns -EHWPOISON if a folio has any
raw HWPOISON subpage, and try_memory_failure_hugetlb won't attempt recovery
actions again, because recovery used to be done on the entire hugepage.
With the new unmapping behavior, this doesn't hold: more subpages in the
hugepage can become corrupted and need to be recovered (i.e. unmapped)
individually. The new folio_set_hugetlb_hwpoison returns 0 after adding a
new raw subpage to the raw_hwp_list. Unmapping a raw HWPOISON page requires
successfully allocating a raw_hwp_page in folio_set_hugetlb_hwpoison, so
try_memory_failure_hugetlb may now fail due to OOM.
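A hedged userspace sketch of the end-to-end behavior this enables (not part
of the series): it assumes CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING and
CONFIG_MEMORY_FAILURE are enabled, that a shared hugetlb mapping is HGM
eligible, and CAP_SYS_ADMIN for MADV_HWPOISON. After one 4K subpage is
poisoned, only that raw page should be unmapped; before this series the
whole hugepage would be.

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    size_t huge = 2UL << 20;    /* one 2M hugetlb page */
    char *p = mmap(NULL, huge, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);

    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    memset(p, 0xaa, huge);

    /* Poison the second 4K subpage (requires CAP_SYS_ADMIN). */
    if (madvise(p + 4096, 4096, MADV_HWPOISON)) {
        perror("madvise");
        return 1;
    }

    /*
     * Pre-HGM: touching any byte of the 2M page raises SIGBUS.
     * With this series: only p[4096..8191] should SIGBUS; the other
     * subpages stay mapped and readable.
     */
    printf("clean subpage still readable: %#x\n", p[0] & 0xff);
    return 0;
}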
Signed-off-by: Jiaqi Yan --- include/linux/mm.h | 20 ++++++- mm/memory-failure.c | 140 ++++++++++++++++++++++++++++++++++++++------ mm/rmap.c | 38 +++++++++++- 3 files changed, 175 insertions(+), 23 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 4496d7bdd3ea..dc192f98cb1d 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3522,20 +3522,38 @@ enum mf_action_page_type { */ extern const struct attribute_group memory_failure_attr_group; -#ifdef CONFIG_HUGETLB_PAGE /* * Struct raw_hwp_page represents information about "raw error page", * constructing singly linked list from ->_hugetlb_hwpoison field of folio. + * @node: the node in folio->_hugetlb_hwpoison list. + * @page: the raw HWPOISON page struct. + * @nr_vmas_mapped: the number of VMAs that map @page when detected. + * @nr_expected_unmaps: if a VMA that maps @page when detected is eligible + * for high granularity mapping, @page is expected to be unmapped. + * @nr_actual_unmaps: how many times the raw page is actually unmapped. */ struct raw_hwp_page { struct llist_node node; struct page *page; + int nr_vmas_mapped; + int nr_expected_unmaps; + int nr_actual_unmaps; }; +#ifdef CONFIG_HUGETLB_PAGE static inline struct llist_head *raw_hwp_list_head(struct folio *folio) { return (struct llist_head *)&folio->_hugetlb_hwpoison; } + +struct raw_hwp_page *find_in_raw_hwp_list(struct folio *folio, + struct page *subpage); +#else +static inline struct raw_hwp_page *find_in_raw_hwp_list(struct folio *folio, + struct page *subpage) +{ + return NULL; +} #endif #if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_HUGETLBFS) diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 48e62d04af17..47b935918ceb 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -1120,10 +1120,10 @@ static int me_swapcache_clean(struct page_state *ps, struct page *p) } /* - * Huge pages. Needs work. - * Issues: - * - Error on hugepage is contained in hugepage unit (not in raw page unit.) - * To narrow down kill region to one page, we need to break up pmd. + * Huge pages. + * - Without HGM: Error on hugepage is contained in hugepage unit (not in + * raw page unit). + * - With HGM: Kill region is narrowed down to just one page. */ static int me_huge_page(struct page_state *ps, struct page *p) { @@ -1131,6 +1131,7 @@ static int me_huge_page(struct page_state *ps, struct page *p) struct page *hpage = compound_head(p); struct address_space *mapping; bool extra_pins = false; + struct raw_hwp_page *hwp_page = find_in_raw_hwp_list(page_folio(p), p); if (!PageHuge(hpage)) return MF_DELAYED; @@ -1157,7 +1158,8 @@ static int me_huge_page(struct page_state *ps, struct page *p) } } - if (has_extra_refcount(ps, p, extra_pins)) + if (hwp_page->nr_expected_unmaps == 0 && + has_extra_refcount(ps, p, extra_pins)) res = MF_FAILED; return res; @@ -1497,24 +1499,30 @@ static void try_to_split_huge_mapping(struct folio *folio, unsigned long poisoned_addr; unsigned long head_addr; struct hugetlb_pte hpte; + struct raw_hwp_page *hwp_page = NULL; if (WARN_ON(!mapping)) return; VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio); + hwp_page = find_in_raw_hwp_list(folio, poisoned_page); + VM_BUG_ON_PAGE(!hwp_page, poisoned_page); + pgoff_start = folio_pgoff(folio); pgoff_end = pgoff_start + folio_nr_pages(folio) - 1; vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff_start, pgoff_end) { + ++hwp_page->nr_vmas_mapped; + /* Enable HGM on HGM-eligible VMAs. 
*/ if (!hugetlb_hgm_eligible(vma)) continue; i_mmap_assert_locked(vma->vm_file->f_mapping); if (hugetlb_enable_hgm_vma(vma)) { - pr_err("Failed to enable HGM on eligible VMA=[%#lx, %#lx)\n", - vma->vm_start, vma->vm_end); + pr_err("%#lx: failed to enable HGM on eligible VMA=[%#lx, %#lx)\n", + page_to_pfn(poisoned_page), vma->vm_start, vma->vm_end); continue; } @@ -1528,15 +1536,21 @@ static void try_to_split_huge_mapping(struct folio *folio, * lock on vma->vm_file->f_mapping, which caller * (e.g. hwpoison_user_mappings) should already acquired. */ - if (hugetlb_full_walk(&hpte, vma, head_addr)) + if (hugetlb_full_walk(&hpte, vma, head_addr)) { + pr_err("%#lx: failed to PT-walk with HGM on eligible VMA=[%#lx, %#lx)\n", + page_to_pfn(poisoned_page), vma->vm_start, vma->vm_end); continue; + } if (hugetlb_split_to_shift(vma->vm_mm, vma, &hpte, poisoned_addr, PAGE_SHIFT)) { - pr_err("Failed to split huge mapping: pfn=%#lx, vaddr=%#lx in VMA=[%#lx, %#lx)\n", + pr_err("%#lx: Failed to split huge mapping: vaddr=%#lx in VMA=[%#lx, %#lx)\n", page_to_pfn(poisoned_page), poisoned_addr, vma->vm_start, vma->vm_end); + continue; } + + ++hwp_page->nr_expected_unmaps; } } #else @@ -1546,6 +1560,47 @@ static void try_to_split_huge_mapping(struct folio *folio, } #endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */ +static bool is_unmapping_successful(struct folio *folio, + struct page *poisoned_page) +{ + bool unmap_success = false; + struct raw_hwp_page *hwp_page = find_in_raw_hwp_list(folio, poisoned_page); + + if (!folio_test_hugetlb(folio) || + folio_test_anon(folio) || + !IS_ENABLED(CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING)) { + unmap_success = folio_mapped(folio); + if (!unmap_success) + pr_err("%#lx: failed to unmap page (mapcount=%d)\n", + page_to_pfn(poisoned_page), + page_mapcount(folio_page(folio, 0))); + + return unmap_success; + } + + VM_BUG_ON_PAGE(!hwp_page, poisoned_page); + + /* + * Unmapping may not happen for some VMA: + * - HGM-eligible VMA but @poisoned_page is not faulted yet: nothing + * needs to be done at this point yet until page fault handling. + * - HGM-non-eliggible VMA: mapcount decreases by nr_subpages for each VMA, + * but not tracked so cannot tell if successfully unmapped from such VMA. + */ + if (hwp_page->nr_vmas_mapped != hwp_page->nr_expected_unmaps) + pr_info("%#lx: mapped by %d VMAs but %d unmappings are expected\n", + page_to_pfn(poisoned_page), hwp_page->nr_vmas_mapped, + hwp_page->nr_expected_unmaps); + + unmap_success = hwp_page->nr_expected_unmaps == hwp_page->nr_actual_unmaps; + + if (!unmap_success) + pr_err("%#lx: failed to unmap page (folio_mapcount=%d)\n", + page_to_pfn(poisoned_page), folio_mapcount(folio)); + + return unmap_success; +} + /* * Do all that is necessary to remove user space mappings. Unmap * the pages and send SIGBUS to the processes if the data was dirty. @@ -1631,10 +1686,7 @@ static bool hwpoison_user_mappings(struct page *p, unsigned long pfn, try_to_unmap(folio, ttu); } - unmap_success = !page_mapped(hpage); - if (!unmap_success) - pr_err("%#lx: failed to unmap page (mapcount=%d)\n", - pfn, page_mapcount(hpage)); + unmap_success = is_unmapping_successful(folio, p); /* * try_to_unmap() might put mlocked page in lru cache, so call @@ -1827,6 +1879,31 @@ EXPORT_SYMBOL_GPL(mf_dax_kill_procs); #ifdef CONFIG_HUGETLB_PAGE +/* + * Given a HWPOISON @subpage as raw page, find its location in @folio's + * _hugetlb_hwpoison. Return NULL if @subpage is not in the list. 
+ */ +struct raw_hwp_page *find_in_raw_hwp_list(struct folio *folio, + struct page *subpage) +{ + struct llist_node *t, *tnode; + struct llist_head *raw_hwp_head = raw_hwp_list_head(folio); + struct raw_hwp_page *hwp_page = NULL; + struct raw_hwp_page *p; + + VM_BUG_ON_PAGE(PageHWPoison(subpage), subpage); + + llist_for_each_safe(tnode, t, raw_hwp_head->first) { + p = container_of(tnode, struct raw_hwp_page, node); + if (subpage == p->page) { + hwp_page = p; + break; + } + } + + return hwp_page; +} + static unsigned long __folio_free_raw_hwp(struct folio *folio, bool move_flag) { struct llist_head *head; @@ -1837,6 +1914,9 @@ static unsigned long __folio_free_raw_hwp(struct folio *folio, bool move_flag) llist_for_each_safe(tnode, t, head->first) { struct raw_hwp_page *p = container_of(tnode, struct raw_hwp_page, node); + /* Ideally raw HWPoison pages are fully unmapped if possible. */ + WARN_ON(p->nr_expected_unmaps != p->nr_actual_unmaps); + if (move_flag) SetPageHWPoison(p->page); else @@ -1853,7 +1933,8 @@ static int folio_set_hugetlb_hwpoison(struct folio *folio, struct page *page) struct llist_head *head; struct raw_hwp_page *raw_hwp; struct llist_node *t, *tnode; - int ret = folio_test_set_hwpoison(folio) ? -EHWPOISON : 0; + bool has_hwpoison = folio_test_set_hwpoison(folio); + bool hgm_enabled = IS_ENABLED(CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING); /* * Once the hwpoison hugepage has lost reliable raw error info, @@ -1873,9 +1954,20 @@ static int folio_set_hugetlb_hwpoison(struct folio *folio, struct page *page) raw_hwp = kmalloc(sizeof(struct raw_hwp_page), GFP_ATOMIC); if (raw_hwp) { raw_hwp->page = page; + raw_hwp->nr_vmas_mapped = 0; + raw_hwp->nr_expected_unmaps = 0; + raw_hwp->nr_actual_unmaps = 0; llist_add(&raw_hwp->node, head); + if (hgm_enabled) + /* + * A new raw poisoned page. Don't return + * HWPOISON. Error event will be counted + * in action_result(). + */ + return 0; + /* the first error event will be counted in action_result(). */ - if (ret) + if (has_hwpoison) num_poisoned_pages_inc(page_to_pfn(page)); } else { /* @@ -1889,8 +1981,16 @@ static int folio_set_hugetlb_hwpoison(struct folio *folio, struct page *page) * used any more, so free it. */ __folio_free_raw_hwp(folio, false); + + /* + * HGM relies on raw_hwp allocated and inserted to raw_hwp_list. + */ + if (hgm_enabled) + return -ENOMEM; } - return ret; + + BUG_ON(hgm_enabled); + return has_hwpoison ? 
-EHWPOISON : 0; } static unsigned long folio_free_raw_hwp(struct folio *folio, bool move_flag) @@ -1936,6 +2036,7 @@ int __get_huge_page_for_hwpoison(unsigned long pfn, int flags, struct page *page = pfn_to_page(pfn); struct folio *folio = page_folio(page); int ret = 2; /* fallback to normal page handling */ + int set_page_hwpoison = 0; bool count_increased = false; if (!folio_test_hugetlb(folio)) @@ -1956,8 +2057,9 @@ int __get_huge_page_for_hwpoison(unsigned long pfn, int flags, goto out; } - if (folio_set_hugetlb_hwpoison(folio, page)) { - ret = -EHWPOISON; + set_page_hwpoison = folio_set_hugetlb_hwpoison(folio, page); + if (set_page_hwpoison) { + ret = set_page_hwpoison; goto out; } @@ -2004,7 +2106,7 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb res = kill_accessing_process(current, folio_pfn(folio), flags); } return res; - } else if (res == -EBUSY) { + } else if (res == -EBUSY || res == -ENOMEM) { if (!(flags & MF_NO_RETRY)) { flags |= MF_NO_RETRY; goto retry; diff --git a/mm/rmap.c b/mm/rmap.c index d3bc81466902..4cfaa34b001e 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1453,6 +1453,8 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, struct mmu_notifier_range range; enum ttu_flags flags = (enum ttu_flags)(long)arg; bool page_poisoned; + bool hgm_eligible = hugetlb_hgm_eligible(vma); + struct raw_hwp_page *hwp_page; /* * When racing against e.g. zap_pte_range() on another cpu, @@ -1525,6 +1527,29 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, * in the case where the hugetlb page is poisoned. */ VM_BUG_ON_FOLIO(!page_poisoned, folio); + + /* + * When VMA is not HGM eligible, unmap at hugepage's + * original P*D. + * + * When HGM is eligible: + * - if original P*D is split to smaller P*Ds and + * PTEs, we skip subpage if it is not raw HWPoison + * page, or it was but was already unmapped. + * - if original P*D is not split, skip unmapping + * and memory_failure result will be MF_IGNORED. + */ + if (hgm_eligible) { + if (pvmw.pte_order > 0) + continue; + hwp_page = find_in_raw_hwp_list(folio, subpage); + if (hwp_page == NULL) + continue; + if (hwp_page->nr_expected_unmaps == + hwp_page->nr_actual_unmaps) + continue; + } + /* * huge_pmd_unshare may unmap an entire PMD page. 
	 * There is no way of knowing exactly which PMDs may
@@ -1760,12 +1785,19 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
		 *
		 * See Documentation/mm/mmu_notifier.rst
		 */
-		if (folio_test_hugetlb(folio))
+		if (!folio_test_hugetlb(folio))
+			page_remove_rmap(subpage, vma, false);
+		else {
			hugetlb_remove_rmap(subpage,
					    pvmw.pte_order + PAGE_SHIFT,
					    hstate_vma(vma), vma);
-		else
-			page_remove_rmap(subpage, vma, false);
+			if (hgm_eligible) {
+				VM_BUG_ON_FOLIO(pvmw.pte_order > 0, folio);
+				VM_BUG_ON_FOLIO(!hwp_page, folio);
+				VM_BUG_ON_FOLIO(subpage != hwp_page->page, folio);
+				++hwp_page->nr_actual_unmaps;
+			}
+		}

		if (vma->vm_flags & VM_LOCKED)
			mlock_drain_local();

From patchwork Fri Apr 28 00:41:37 2023
Date: Fri, 28 Apr 2023 00:41:37 +0000
In-Reply-To: <20230428004139.2899856-1-jiaqiyan@google.com>
References: <20230428004139.2899856-1-jiaqiyan@google.com>
Message-ID: <20230428004139.2899856-6-jiaqiyan@google.com>
Subject: [RFC PATCH v1 5/7] hugetlb: only VM_FAULT_HWPOISON_LARGE raw page
From: Jiaqi Yan
To: mike.kravetz@oracle.com, peterx@redhat.com, naoya.horiguchi@nec.com
Cc: songmuchun@bytedance.com, duenwen@google.com, axelrasmussen@google.com,
 jthoughton@google.com, rientjes@google.com, linmiaohe@huawei.com,
 shy828301@gmail.com, baolin.wang@linux.alibaba.com, wangkefeng.wang@huawei.com,
 akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Jiaqi Yan
Memory raw pages can become HWPOISON between when userspace maps a hugepage
and when userspace faults in the hugepage. Today, when hugetlb faults
anywhere in a hugepage containing HWPOISON raw pages, the result is
VM_FAULT_HWPOISON_LARGE. This commit teaches the hugetlb page fault handler
to return VM_FAULT_HWPOISON_LARGE only if the faulting address is within a
HWPOISON raw page; otherwise the fault handler can continue to fault in
healthy raw pages.

Signed-off-by: Jiaqi Yan
---
 include/linux/mm.h  |   2 +
 mm/hugetlb.c        | 129 ++++++++++++++++++++++++++++++++++++++++++--
 mm/memory-failure.c |   1 +
 3 files changed, 127 insertions(+), 5 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index dc192f98cb1d..7caa4530953f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3531,6 +3531,7 @@ extern const struct attribute_group memory_failure_attr_group;
 * @nr_expected_unmaps: if a VMA that maps @page when detected is eligible
 *	for high granularity mapping, @page is expected to be unmapped.
 * @nr_actual_unmaps: how many times the raw page is actually unmapped.
+ * @index: index of the poisoned subpage in the folio.
 */
 struct raw_hwp_page {
	struct llist_node node;
@@ -3538,6 +3539,7 @@ struct raw_hwp_page {
	int nr_vmas_mapped;
	int nr_expected_unmaps;
	int nr_actual_unmaps;
+	unsigned long index;
 };

 #ifdef CONFIG_HUGETLB_PAGE
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 1419176b7e51..f8ddf04ae0c4 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6158,6 +6158,30 @@ static struct folio *hugetlb_try_find_lock_folio(struct address_space *mapping,
	return folio;
 }

+static vm_fault_t hugetlb_no_page_hwpoison(struct mm_struct *mm,
+					   struct vm_area_struct *vma,
+					   struct folio *folio,
+					   unsigned long address,
+					   struct hugetlb_pte *hpte,
+					   unsigned int flags);
+
+#ifndef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
+static vm_fault_t hugetlb_no_page_hwpoison(struct mm_struct *mm,
+					   struct vm_area_struct *vma,
+					   struct folio *folio,
+					   unsigned long address,
+					   struct hugetlb_pte *hpte,
+					   unsigned int flags)
+{
+	if (unlikely(folio_test_hwpoison(folio))) {
+		return VM_FAULT_HWPOISON_LARGE |
+		       VM_FAULT_SET_HINDEX(hstate_index(hstate_vma(vma)));
+	}
+
+	return 0;
+}
+#endif
+
 static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
			struct vm_area_struct *vma,
			struct address_space *mapping, pgoff_t idx,
@@ -6287,13 +6311,13 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
	/*
	 * If memory error occurs between mmap() and fault, some process
	 * don't have hwpoisoned swap entry for errored virtual address.
-	 * So we need to block hugepage fault by PG_hwpoison bit check.
+	 * So we need to block hugepage fault by hwpoison check:
+	 * - without HGM, the check is based on PG_hwpoison
+	 * - with HGM, check if the raw page for address is poisoned
	 */
-	if (unlikely(folio_test_hwpoison(folio))) {
-		ret = VM_FAULT_HWPOISON_LARGE |
-		      VM_FAULT_SET_HINDEX(hstate_index(h));
+	ret = hugetlb_no_page_hwpoison(mm, vma, folio, address, hpte, flags);
+	if (unlikely(ret))
		goto backout_unlocked;
-	}

	/* Check for page in userfault range. */
	if (userfaultfd_minor(vma)) {
@@ -8426,6 +8450,11 @@ int hugetlb_split_to_shift(struct mm_struct *mm, struct vm_area_struct *vma,
	 * the allocated PTEs created before splitting fails.
	 */

+	/*
+	 * Since try_to_unmap_one doesn't unmap none and UFFD_WP marker PTEs,
+	 * delay splitting them until the page fault happens. See the
+	 * hugetlb_no_page_hwpoison check in hugetlb_no_page.
+	 */
	if (unlikely(huge_pte_none_mostly(old_entry))) {
		ret = -EAGAIN;
		goto skip;
@@ -8479,6 +8508,96 @@ int hugetlb_split_to_shift(struct mm_struct *mm, struct vm_area_struct *vma,
	return ret;
 }

+/*
+ * Given a hugetlb PTE, if we want to split it into its next smaller level
+ * PTE, return what size we should use to do the HGM walk with allocations.
+ * If the given hugetlb PTE is already at the smallest PAGESIZE, return -EINVAL.
+ */
+static int hgm_next_size(struct vm_area_struct *vma, struct hugetlb_pte *hpte)
+{
+	struct hstate *h = hstate_vma(vma), *tmp_h;
+	unsigned int shift;
+	unsigned long curr_size = hugetlb_pte_size(hpte);
+	unsigned long next_size;
+
+	for_each_hgm_shift(h, tmp_h, shift) {
+		next_size = 1UL << shift;
+		if (next_size < curr_size)
+			return next_size;
+	}
+
+	return -EINVAL;
+}
+
+/*
+ * Check if the address is in the range of a HWPOISON raw page.
+ * During the check, the hugetlb PTE may be split into smaller hugetlb PTEs.
+ */
+static vm_fault_t hugetlb_no_page_hwpoison(struct mm_struct *mm,
+					   struct vm_area_struct *vma,
+					   struct folio *folio,
+					   unsigned long address,
+					   struct hugetlb_pte *hpte,
+					   unsigned int flags)
+{
+	unsigned long range_start, range_end;
+	unsigned long start_index, end_index;
+	unsigned long folio_start = vma_address(folio_page(folio, 0), vma);
+	struct llist_node *t, *tnode;
+	struct llist_head *raw_hwp_head = raw_hwp_list_head(folio);
+	struct raw_hwp_page *p = NULL;
+	bool contain_hwpoison = false;
+	int hgm_size;
+	int hgm_ret = 0;
+
+	if (likely(!folio_test_hwpoison(folio)))
+		return 0;
+
+	if (hugetlb_enable_hgm_vma(vma))
+		return VM_FAULT_HWPOISON_LARGE |
+		       VM_FAULT_SET_HINDEX(hstate_index(hstate_vma(vma)));
+
+recheck:
+	range_start = address & hugetlb_pte_mask(hpte);
+	range_end = range_start + hugetlb_pte_size(hpte);
+	start_index = (range_start - folio_start) / PAGE_SIZE;
+	end_index = start_index + hugetlb_pte_size(hpte) / PAGE_SIZE;
+
+	contain_hwpoison = false;
+	llist_for_each_safe(tnode, t, raw_hwp_head->first) {
+		p = container_of(tnode, struct raw_hwp_page, node);
+		if (start_index <= p->index && p->index < end_index) {
+			contain_hwpoison = true;
+			break;
+		}
+	}
+
+	if (!contain_hwpoison)
+		return 0;
+
+	if (hugetlb_pte_size(hpte) == PAGE_SIZE)
+		return VM_FAULT_HWPOISON;
+
+	/*
+	 * hugetlb_fault already ensured hugetlb_vma_lock_read.
+	 * We also checked hugetlb_pte_size(hpte) != PAGE_SIZE,
+	 * so hgm_size must be something meaningful to HGM.
+	 */
+	hgm_size = hgm_next_size(vma, hpte);
+	VM_BUG_ON(hgm_size == -EINVAL);
+	hgm_ret = hugetlb_full_walk_alloc(hpte, vma, address, hgm_size);
+	if (hgm_ret) {
+		WARN_ON_ONCE(hgm_ret);
+		/*
+		 * When splitting with HGM fails, return as if HGM were
+		 * not eligible or not enabled.
+		 */
+		return VM_FAULT_HWPOISON_LARGE |
+		       VM_FAULT_SET_HINDEX(hstate_index(hstate_vma(vma)));
+	}
+	goto recheck;
+}
+
 #endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */

 /*
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 47b935918ceb..9093ba53feed 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1957,6 +1957,7 @@ static int folio_set_hugetlb_hwpoison(struct folio *folio, struct page *page)
		raw_hwp->nr_vmas_mapped = 0;
		raw_hwp->nr_expected_unmaps = 0;
		raw_hwp->nr_actual_unmaps = 0;
+		raw_hwp->index = folio_page_idx(folio, page);
		llist_add(&raw_hwp->node, head);
		if (hgm_enabled)
			/*

From patchwork Fri Apr 28 00:41:38 2023
Date: Fri, 28 Apr 2023 00:41:38 +0000
In-Reply-To: <20230428004139.2899856-1-jiaqiyan@google.com>
References: <20230428004139.2899856-1-jiaqiyan@google.com>
Message-ID: <20230428004139.2899856-7-jiaqiyan@google.com>
Subject: [RFC PATCH v1 6/7] selftest/mm: test PAGESIZE unmapping HWPOISON pages
From: Jiaqi Yan
To: mike.kravetz@oracle.com, peterx@redhat.com, naoya.horiguchi@nec.com
Cc: songmuchun@bytedance.com, duenwen@google.com, axelrasmussen@google.com,
 jthoughton@google.com, rientjes@google.com, linmiaohe@huawei.com,
 shy828301@gmail.com, baolin.wang@linux.alibaba.com, wangkefeng.wang@huawei.com,
 akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Jiaqi Yan

After injecting memory errors at byte addresses inside a HugeTLB page, the
updated test checks that
1. only the poisoned raw page is unmapped, and userspace gets the correct
   SIGBUS from the kernel.
2. the other subpages in the same hugepage are still mapped and their data
   is not corrupted.
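For readers who want to reproduce the behavior being tested outside the
selftest harness, here is a minimal standalone sketch. It is not part of
this series; it assumes root privileges, CONFIG_MEMORY_FAILURE, free 2MiB
hugepages for anonymous MAP_HUGETLB, and a kernel with this HGM series
applied (on a stock kernel the neighbor read below would also SIGBUS).
The helper names (on_sigbus, read_byte) are hypothetical.

/* sketch.c: poison one raw page inside a faulted-in hugepage, then show
 * the poisoned raw page raises SIGBUS while its neighbor stays readable.
 */
#include <setjmp.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf env;

static void on_sigbus(int sig, siginfo_t *info, void *ctx)
{
	/* BUS_MCEERR_AR is expected when a poisoned page is consumed. */
	siglongjmp(env, info->si_code);
}

/* Returns 0 if the byte is readable, else the SIGBUS si_code. */
static int read_byte(volatile char *addr)
{
	int code = sigsetjmp(env, 1);

	if (code == 0) {
		(void)*addr;
		return 0;
	}
	return code;
}

int main(void)
{
	const size_t len = 2UL << 20;		/* one 2MiB hugepage */
	const long pagesize = sysconf(_SC_PAGESIZE);
	struct sigaction sa = { .sa_sigaction = on_sigbus,
				.sa_flags = SA_SIGINFO };
	char *map;
	int code;

	sigaction(SIGBUS, &sa, NULL);
	map = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_SHARED | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
	if (map == MAP_FAILED)
		return 1;
	memset(map, 0xaa, len);			/* fault the hugepage in */

	if (madvise(map, pagesize, MADV_HWPOISON))	/* poison raw page 0 */
		return 1;

	code = read_byte(map);
	printf("poisoned page: %s (si_code=%d)\n",
	       code ? "SIGBUS" : "readable", code);
	code = read_byte(map + pagesize);
	printf("neighbor page: %s (si_code=%d)\n",
	       code ? "SIGBUS" : "readable", code);
	munmap(map, len);
	return 0;
}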
Signed-off-by: Jiaqi Yan --- tools/testing/selftests/mm/hugetlb-hgm.c | 194 +++++++++++++++++++---- 1 file changed, 167 insertions(+), 27 deletions(-) diff --git a/tools/testing/selftests/mm/hugetlb-hgm.c b/tools/testing/selftests/mm/hugetlb-hgm.c index c0ba6ad44005..bc9529986b66 100644 --- a/tools/testing/selftests/mm/hugetlb-hgm.c +++ b/tools/testing/selftests/mm/hugetlb-hgm.c @@ -39,6 +39,10 @@ #define MADV_SPLIT 26 #endif +#ifndef NUM_HWPOISON_PAGES +#define NUM_HWPOISON_PAGES 3UL +#endif + #define PREFIX " ... " #define ERROR_PREFIX " !!! " @@ -241,6 +245,9 @@ static int test_sigbus(char *addr, bool poison) sigbus_addr, addr); else if (poison && !was_mceerr) printf(ERROR_PREFIX "didn't get an MCEERR?\n"); + else if (!poison && was_mceerr) + printf(ERROR_PREFIX "got BUS_MCEERR_AR sigbus on expected healthy address: %p\n", + sigbus_addr); else ret = 0; out: @@ -272,43 +279,176 @@ static int read_event_from_uffd(int *uffd, pthread_t *pthread) return 0; } -static int test_sigbus_range(char *primary_map, size_t len, bool hwpoison) +struct range_exclude_pages { + /* Starting address of the buffer. */ + char *mapping; + /* Length of the buffer in bytes. */ + size_t length; + /* The value that each byte in buffer should equal to. */ + char value; + /* + * PAGESIZE aligned addresses excluded from the checking, + * e.g. if PAGE_SIZE=4k, for each addr in excludes, + * skips checking on [addr, addr + 4096). + */ + unsigned long excluded[NUM_HWPOISON_PAGES]; +}; + +static int check_range_exclude_pages(struct range_exclude_pages *range) +{ + const unsigned long pagesize = getpagesize(); + unsigned long excluded_index; + unsigned long page_index; + bool should_skip; + size_t i = 0; + size_t j = 0; + + while (i < range->length) { + page_index = ((unsigned long)(range->mapping + i)) / pagesize; + should_skip = false; + for (j = 0; j < NUM_HWPOISON_PAGES; ++j) { + excluded_index = range->excluded[j] / pagesize; + if (page_index == excluded_index) { + should_skip = true; + break; + } + } + if (should_skip) { + printf(PREFIX "skip excluded addr range [%#lx, %#lx)\n", + (unsigned long)(range->mapping + i), + (unsigned long)(range->mapping + i + pagesize)); + i += pagesize; + continue; + } + if (range->mapping[i] != range->value) { + printf(ERROR_PREFIX "mismatch at %p (%d != %d)\n", + &range->mapping[i], range->mapping[i], range->value); + return -1; + } + ++i; + } + + return 0; +} + +enum test_status verify_raw_pages(char *map, size_t len, + unsigned long excluded[NUM_HWPOISON_PAGES]) { const unsigned long pagesize = getpagesize(); - const int num_checks = 512; - unsigned long bytes_per_check = len/num_checks; - int i; + unsigned long size, offset, value; + size_t j = 0; + + for (size = len / 2, offset = 0, value = 1; size > pagesize; + offset += size, size /= 2, ++value) { + struct range_exclude_pages range = { + .mapping = map + offset, + .length = size, + .value = value, + }; + for (j = 0; j < NUM_HWPOISON_PAGES; ++j) + range.excluded[j] = excluded[j]; + + printf(PREFIX "checking non-poisoned range [%p, %p) " + "(len=%#lx) per-byte value=%lu\n", + range.mapping, range.mapping + range.length, + range.length, value); + if (check_range_exclude_pages(&range)) + return TEST_FAILED; + + printf(PREFIX PREFIX "good\n"); + } - printf(PREFIX "checking that we can't access " - "(%d addresses within %p -> %p)\n", - num_checks, primary_map, primary_map + len); + return TEST_PASSED; +} - if (pagesize > bytes_per_check) - bytes_per_check = pagesize; +static int read_hwpoison_pages(unsigned long *nr_hwp_pages) +{ + 
const unsigned long pagesize = getpagesize(); + char buffer[256] = {0}; + char *cmd = "cat /proc/meminfo | grep -i HardwareCorrupted | grep -o '[0-9]*'"; + FILE *cmdfile = popen(cmd, "r"); - for (i = 0; i < len; i += bytes_per_check) - if (test_sigbus(primary_map + i, hwpoison) < 0) - return 1; - /* check very last byte, because we left it unmapped */ - if (test_sigbus(primary_map + len - 1, hwpoison)) - return 1; + if (!(fgets(buffer, sizeof(buffer), cmdfile))) { + perror("failed to read HardwareCorrupted from /proc/meminfo\n"); + return -1; + } + pclose(cmdfile); + *nr_hwp_pages = atoll(buffer) * 1024 / pagesize; return 0; } -static enum test_status test_hwpoison(char *primary_map, size_t len) +static enum test_status test_hwpoison_one_raw_page(char *hwpoison_addr) { - printf(PREFIX "poisoning %p -> %p\n", primary_map, primary_map + len); - if (madvise(primary_map, len, MADV_HWPOISON) < 0) { + const unsigned long pagesize = getpagesize(); + + printf(PREFIX "poisoning [%p, %p) (len=%#lx)\n", + hwpoison_addr, hwpoison_addr + pagesize, pagesize); + if (madvise(hwpoison_addr, pagesize, MADV_HWPOISON) < 0) { perror(ERROR_PREFIX "MADV_HWPOISON failed"); return TEST_SKIPPED; } - return test_sigbus_range(primary_map, len, true) - ? TEST_FAILED : TEST_PASSED; + printf(PREFIX "checking poisoned range [%p, %p) (len=%#lx)\n", + hwpoison_addr, hwpoison_addr + pagesize, pagesize); + if (test_sigbus(hwpoison_addr, true) < 0) + return TEST_FAILED; + + return TEST_PASSED; } -static int test_fork(int uffd, char *primary_map, size_t len) +static enum test_status test_hwpoison_present(char *map, size_t len, + bool already_injected) +{ + const unsigned long pagesize = getpagesize(); + const unsigned long hwpoison_next = 128; + unsigned long nr_hwpoison_pages_before, nr_hwpoison_pages_after; + enum test_status ret; + size_t i; + char *hwpoison_addr = map; + unsigned long hwpoison_addrs[NUM_HWPOISON_PAGES]; + + if (hwpoison_next * (NUM_HWPOISON_PAGES - 1) >= (len / pagesize)) { + printf(ERROR_PREFIX "max hwpoison_addr out of range"); + return TEST_SKIPPED; + } + + for (i = 0; i < NUM_HWPOISON_PAGES; ++i) { + hwpoison_addrs[i] = (unsigned long)hwpoison_addr; + hwpoison_addr += hwpoison_next * pagesize; + } + + if (already_injected) + return verify_raw_pages(map, len, hwpoison_addrs); + + if (read_hwpoison_pages(&nr_hwpoison_pages_before)) { + printf(ERROR_PREFIX "check #HWPOISON pages\n"); + return TEST_SKIPPED; + } + printf(PREFIX "Before injections, #HWPOISON pages = %ld\n", nr_hwpoison_pages_before); + + for (i = 0; i < NUM_HWPOISON_PAGES; ++i) { + ret = test_hwpoison_one_raw_page((char *)hwpoison_addrs[i]); + if (ret != TEST_PASSED) + return ret; + } + + if (read_hwpoison_pages(&nr_hwpoison_pages_after)) { + printf(ERROR_PREFIX "check #HWPOISON pages\n"); + return TEST_SKIPPED; + } + printf(PREFIX "After injections, #HWPOISON pages = %ld\n", nr_hwpoison_pages_after); + + if (nr_hwpoison_pages_after - nr_hwpoison_pages_before != NUM_HWPOISON_PAGES) { + printf(ERROR_PREFIX "delta #HWPOISON pages != %ld", + NUM_HWPOISON_PAGES); + return TEST_FAILED; + } + + return verify_raw_pages(map, len, hwpoison_addrs); +} + +int test_fork(int uffd, char *primary_map, size_t len) { int status; int ret = 0; @@ -360,7 +500,6 @@ static int test_fork(int uffd, char *primary_map, size_t len) pthread_join(uffd_thd, NULL); return ret; - } static int uffd_register(int uffd, char *primary_map, unsigned long len, @@ -394,6 +533,7 @@ test_hgm(int fd, size_t hugepagesize, size_t len, enum test_type type) bool uffd_wp = type == 
TEST_UFFDWP;
	bool verify = type == TEST_DEFAULT;
	int register_args;
+	enum test_status hwp_status = TEST_SKIPPED;

	if (ftruncate(fd, len) < 0) {
		perror(ERROR_PREFIX "ftruncate failed");
@@ -489,10 +629,10 @@ test_hgm(int fd, size_t hugepagesize, size_t len, enum test_type type)
	 * mapping.
	 */
	if (hwpoison) {
-		enum test_status new_status = test_hwpoison(primary_map, len);
-
-		if (new_status != TEST_PASSED) {
-			status = new_status;
+		/* test_hwpoison can fail with TEST_SKIPPED. */
+		hwp_status = test_hwpoison_present(primary_map, len, false);
+		if (hwp_status != TEST_PASSED) {
+			status = hwp_status;
			goto done;
		}
	}
@@ -539,7 +679,7 @@ test_hgm(int fd, size_t hugepagesize, size_t len, enum test_type type)
	/*
	 * Verify that memory is still poisoned.
	 */
-	if (hwpoison && test_sigbus_range(primary_map, len, true))
+	if (hwpoison && test_hwpoison_present(primary_map, len, true))
		goto done;

	status = TEST_PASSED;

From patchwork Fri Apr 28 00:41:39 2023
Date: Fri, 28 Apr 2023 00:41:39 +0000
In-Reply-To: <20230428004139.2899856-1-jiaqiyan@google.com>
References: <20230428004139.2899856-1-jiaqiyan@google.com>
Message-ID: <20230428004139.2899856-8-jiaqiyan@google.com>
Subject: [RFC PATCH v1 7/7] selftest/mm: test PAGESIZE unmapping UFFD WP marker HWPOISON pages
From: Jiaqi Yan
To: mike.kravetz@oracle.com, peterx@redhat.com, naoya.horiguchi@nec.com
Cc: songmuchun@bytedance.com, duenwen@google.com, axelrasmussen@google.com,
 jthoughton@google.com, rientjes@google.com, linmiaohe@huawei.com,
 shy828301@gmail.com, baolin.wang@linux.alibaba.com, wangkefeng.wang@huawei.com,
 akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Jiaqi Yan
For a not-yet-faulted hugepage containing HWPOISON raw pages, test that
1. only the HWPOISON raw page will not be faulted in, and a BUS_MCEERR_AR
   SIGBUS will be sent to userspace.
2. healthy raw pages are faulted in as normal. Since the hugepage has been
   write-protected by UFFD, a non-BUS_MCEERR_AR SIGBUS will be sent to
   userspace.
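To make the two SIGBUS flavors above concrete, here is a minimal handler
sketch. It is illustrative only and not taken from the selftest (whose own
sigbus_handler differs); only standard siginfo fields and BUS_MCEERR_AR
from <signal.h> are assumed, and install_sigbus_handler is a hypothetical
helper name.

#include <signal.h>
#include <stdio.h>
#include <string.h>

/* Distinguish the two SIGBUS flavors this test expects: BUS_MCEERR_AR
 * marks a consumed memory error on a poisoned raw page; any other
 * si_code on a UFFD_FEATURE_SIGBUS-registered range just means the page
 * is not faulted in (e.g. still behind a UFFD-WP marker).
 * (fprintf is fine for a test, though not async-signal-safe in general.)
 */
static void sigbus_handler(int sig, siginfo_t *info, void *ucontext)
{
	if (info->si_code == BUS_MCEERR_AR)
		fprintf(stderr, "memory error at %p (granule 2^%d bytes)\n",
			info->si_addr, info->si_addr_lsb);
	else
		fprintf(stderr, "non-MCE SIGBUS at %p (si_code=%d)\n",
			info->si_addr, info->si_code);
}

static int install_sigbus_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = sigbus_handler;
	sa.sa_flags = SA_SIGINFO;	/* needed to receive si_code/si_addr */
	return sigaction(SIGBUS, &sa, NULL);
}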
Signed-off-by: Jiaqi Yan
---
 tools/testing/selftests/mm/hugetlb-hgm.c | 170 +++++++++++++++++++++++
 1 file changed, 170 insertions(+)

diff --git a/tools/testing/selftests/mm/hugetlb-hgm.c b/tools/testing/selftests/mm/hugetlb-hgm.c
index bc9529986b66..81ee2d99fea8 100644
--- a/tools/testing/selftests/mm/hugetlb-hgm.c
+++ b/tools/testing/selftests/mm/hugetlb-hgm.c
@@ -515,6 +515,169 @@ static int uffd_register(int uffd, char *primary_map, unsigned long len,
	return ioctl(uffd, UFFDIO_REGISTER, &reg);
 }

+static int setup_present_map(char *present_map, size_t len)
+{
+	size_t offset = 0;
+	unsigned char iter = 0;
+	unsigned long pagesize = getpagesize();
+	uint64_t size;
+
+	for (size = len/2; size >= pagesize;
+	     offset += size, size /= 2) {
+		iter++;
+		memset(present_map + offset, iter, size);
+	}
+	return 0;
+}
+
+static enum test_status test_hwpoison_absent_uffd_wp(int fd, size_t hugepagesize, size_t len)
+{
+	int uffd;
+	char *absent_map, *present_map;
+	struct uffdio_api api;
+	int register_args;
+	struct sigaction new, old;
+	enum test_status status = TEST_SKIPPED;
+	const unsigned long pagesize = getpagesize();
+	const unsigned long hwpoison_index = 128;
+	char *hwpoison_addr;
+
+	if (hwpoison_index >= (len / pagesize)) {
+		printf(ERROR_PREFIX "hwpoison_index out of range\n");
+		return TEST_FAILED;
+	}
+
+	if (ftruncate(fd, len) < 0) {
+		perror(ERROR_PREFIX "ftruncate failed");
+		return TEST_FAILED;
+	}
+
+	uffd = userfaultfd(O_CLOEXEC);
+	if (uffd < 0) {
+		perror(ERROR_PREFIX "uffd not created");
+		return TEST_FAILED;
+	}
+
+	absent_map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+	if (absent_map == MAP_FAILED) {
+		perror(ERROR_PREFIX "mmap for ABSENT mapping failed");
+		goto close_uffd;
+	}
+	printf(PREFIX "ABSENT mapping: %p\n", absent_map);
+
+	api.api = UFFD_API;
+	api.features = UFFD_FEATURE_SIGBUS | UFFD_FEATURE_EXACT_ADDRESS |
+		       UFFD_FEATURE_EVENT_FORK;
+	if (ioctl(uffd, UFFDIO_API, &api) == -1) {
+		perror(ERROR_PREFIX "UFFDIO_API failed");
+		goto unmap_absent;
+	}
+
+	/*
+	 * Register with UFFDIO_REGISTER_MODE_WP to have the UFFD WP bit on
+	 * the HugeTLB page table entry.
+	 */
+	register_args = UFFDIO_REGISTER_MODE_MISSING | UFFDIO_REGISTER_MODE_WP;
+	if (uffd_register(uffd, absent_map, len, register_args)) {
+		perror(ERROR_PREFIX "UFFDIO_REGISTER failed");
+		goto unmap_absent;
+	}
+
+	new.sa_sigaction = &sigbus_handler;
+	new.sa_flags = SA_SIGINFO;
+	if (sigaction(SIGBUS, &new, &old) < 0) {
+		perror(ERROR_PREFIX "could not setup SIGBUS handler");
+		goto unmap_absent;
+	}
+
+	/*
+	 * Set WP markers on the absent huge mapping. With HGM enabled in the
+	 * kernel config, memory_failure() will enable HGM in the kernel, so
+	 * there is no need to enable HGM from userspace.
+	 */
+	if (userfaultfd_writeprotect(uffd, absent_map, len, true) < 0) {
+		status = TEST_FAILED;
+		goto unmap_absent;
+	}
+
+	status = TEST_PASSED;
+
+	/*
+	 * With MAP_SHARED hugetlb memory, we can inject a memory error into
+	 * the not-yet-faulted mapping (absent_map) by injecting the error
+	 * into an already faulted mapping (present_map).
+	 */
+	present_map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+	if (present_map == MAP_FAILED) {
+		perror(ERROR_PREFIX "mmap for PRESENT mapping failed");
+		status = TEST_FAILED;
+		goto unmap_absent;
+	}
+	printf(PREFIX "PRESENT mapping: %p\n", present_map);
+	setup_present_map(present_map, len);
+
+	hwpoison_addr = present_map + hwpoison_index * pagesize;
+	if (madvise(hwpoison_addr, pagesize, MADV_HWPOISON)) {
+		perror(PREFIX "MADV_HWPOISON a page in PRESENT mapping failed");
+		status = TEST_FAILED;
+		goto unmap_present;
+	}
+
+	printf(PREFIX "checking poisoned range [%p, %p) (len=%#lx) in PRESENT mapping\n",
+	       hwpoison_addr, hwpoison_addr + pagesize, pagesize);
+	if (test_sigbus(hwpoison_addr, true) < 0) {
+		status = TEST_FAILED;
+		goto done;
+	}
+	printf(PREFIX "checking healthy pages in PRESENT mapping\n");
+	unsigned long hwpoison_addrs[] = {
+		(unsigned long)hwpoison_addr,
+		(unsigned long)hwpoison_addr,
+		(unsigned long)hwpoison_addr
+	};
+	status = verify_raw_pages(present_map, len, hwpoison_addrs);
+	if (status != TEST_PASSED) {
+		printf(ERROR_PREFIX "checking healthy pages failed\n");
+		goto done;
+	}
+
+	for (int i = 0; i < len; i += pagesize) {
+		if (i == hwpoison_index * pagesize) {
+			printf(PREFIX "checking poisoned range [%p, %p) (len=%#lx) in ABSENT mapping\n",
+			       absent_map + i, absent_map + i + pagesize, pagesize);
+			if (test_sigbus(absent_map + i, true) < 0) {
+				status = TEST_FAILED;
+				break;
+			}
+		} else {
+			/*
+			 * With UFFD_FEATURE_SIGBUS, we should get a SIGBUS for
+			 * every not-yet-faulted (non-present) page/byte.
+			 */
+			if (test_sigbus(absent_map + i, false) < 0) {
+				printf(PREFIX "checking healthy range [%p, %p) (len=%#lx) in ABSENT mapping failed\n",
+				       absent_map + i, absent_map + i + pagesize, pagesize);
+				status = TEST_FAILED;
+				break;
+			}
+		}
+	}
+done:
+	if (ftruncate(fd, 0) < 0) {
+		perror(ERROR_PREFIX "ftruncate back to 0 failed");
+		status = TEST_FAILED;
+	}
+unmap_present:
+	printf(PREFIX "Unmap PRESENT mapping=%p\n", present_map);
+	munmap(present_map, len);
+unmap_absent:
+	printf(PREFIX "Unmap ABSENT mapping=%p\n", absent_map);
+	munmap(absent_map, len);
+close_uffd:
+	printf(PREFIX "Close UFFD\n");
+	close(uffd);
+	return status;
+}
+
 enum test_type {
	TEST_DEFAULT,
	TEST_UFFDWP,
@@ -744,6 +907,13 @@ int main(void)
	printf("HGM hwpoison test: %s\n", status_to_str(status));
	if (status == TEST_FAILED)
		ret = -1;
+
+	printf("HGM hwpoison UFFD-WP marker test...\n");
+	status = test_hwpoison_absent_uffd_wp(fd, hugepagesize, len);
+	printf("HGM hwpoison UFFD-WP marker test: %s\n",
+	       status_to_str(status));
+	if (status == TEST_FAILED)
+		ret = -1;
 close:
	close(fd);