From patchwork Fri Apr 28 00:41:34 2023
X-Patchwork-Submitter: Jiaqi Yan
X-Patchwork-Id: 13225916
Date: Fri, 28 Apr 2023 00:41:34 +0000
In-Reply-To: <20230428004139.2899856-1-jiaqiyan@google.com>
References: <20230428004139.2899856-1-jiaqiyan@google.com>
X-Mailer: git-send-email 2.40.1.495.gc816e09b53d-goog
Message-ID: <20230428004139.2899856-3-jiaqiyan@google.com>
Subject: [RFC PATCH v1 2/7] hugetlb: create PTE level mapping when possible
From: Jiaqi Yan
To: mike.kravetz@oracle.com, peterx@redhat.com, naoya.horiguchi@nec.com
Cc: songmuchun@bytedance.com, duenwen@google.com, axelrasmussen@google.com,
    jthoughton@google.com, rientjes@google.com, linmiaohe@huawei.com,
    shy828301@gmail.com, baolin.wang@linux.alibaba.com,
    wangkefeng.wang@huawei.com, akpm@linux-foundation.org,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org, Jiaqi Yan
In memory_failure handling, for each VMA that the HWPOISON HugeTLB page
is mapped into, enable HGM if the VMA is eligible, then split the
P*D-mapped hugepage into smaller PTEs.
try_to_unmap still unmaps the entire hugetlb page, one PTE at a time, at
levels smaller than the original P*D. For example, if a hugepage was
originally mapped at PUD size, it will be split into PMDs and PTEs, and
all of these PMDs and PTEs will be unmapped. The next commit will only
unmap the raw HWPOISON PTE.

For a VMA that is not HGM-eligible, or where enabling HGM or splitting
the hugepage mapping failed, the hugepage remains mapped by its original
P*D and is unmapped at that P*D.

Signed-off-by: Jiaqi Yan
---
 include/linux/hugetlb.h |  5 +++
 mm/hugetlb.c            | 27 ++++++++++++++++
 mm/memory-failure.c     | 68 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 100 insertions(+)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index d44bf6a794e5..03074b23c396 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -1266,6 +1266,7 @@ int hugetlb_alloc_largest_pte(struct hugetlb_pte *hpte, struct mm_struct *mm,
 		unsigned long end);
 int hugetlb_collapse(struct mm_struct *mm, unsigned long start,
 		unsigned long end);
+int hugetlb_enable_hgm_vma(struct vm_area_struct *vma);
 int hugetlb_split_to_shift(struct mm_struct *mm, struct vm_area_struct *vma,
 			   struct hugetlb_pte *hpte, unsigned long addr,
 			   unsigned int desired_shift);
@@ -1295,6 +1296,10 @@ int hugetlb_collapse(struct mm_struct *mm, unsigned long start,
 {
 	return -EINVAL;
 }
+int hugetlb_enable_hgm_vma(struct vm_area_struct *vma)
+{
+	return -EINVAL;
+}
 int hugetlb_split_to_shift(struct mm_struct *mm, struct vm_area_struct *vma,
 			  const struct hugetlb_pte *hpte, unsigned long addr,
 			  unsigned int desired_shift)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index d3f3f1c2d293..1419176b7e51 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -8203,6 +8203,33 @@ int hugetlb_collapse(struct mm_struct *mm, unsigned long start,
 	return ret;
 }
 
+int hugetlb_enable_hgm_vma(struct vm_area_struct *vma)
+{
+	if (hugetlb_hgm_enabled(vma))
+		return 0;
+
+	if (!is_vm_hugetlb_page(vma)) {
+		pr_warn("VMA=[%#lx, %#lx) is not HugeTLB\n",
+			vma->vm_start, vma->vm_end);
+		return -EINVAL;
+	}
+
+	if (!hugetlb_hgm_eligible(vma)) {
+		pr_warn("VMA=[%#lx, %#lx) is not HGM eligible\n",
+			vma->vm_start, vma->vm_end);
+		return -EINVAL;
+	}
+
+	hugetlb_unshare_all_pmds(vma);
+
+	/*
+	 * TODO: add the ability to tell if HGM is enabled by kernel
+	 * (for HWPOISON unmapping) or by userspace (via MADV_SPLIT).
+	 */
+	vm_flags_set(vma, VM_HUGETLB_HGM);
+	return 0;
+}
+
 /*
  * Find the optimal HugeTLB PTE shift that @desired_addr could be mapped at.
  */
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 0b37cbc6e8ae..eb5579b6787e 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1479,6 +1479,73 @@ static int get_hwpoison_page(struct page *p, unsigned long flags)
 	return ret;
 }
 
+#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
+/*
+ * For each HGM-eligible VMA that the poisoned page mapped to, create new
+ * HGM mapping for hugepage @folio and make sure @poisoned_page is mapped
+ * by a PAGESIZE level PTE. Caller (hwpoison_user_mappings) must ensure
+ * 1. folio's address space (mapping) is locked in write mode.
+ * 2. folio is locked.
+ */
+static void try_to_split_huge_mapping(struct folio *folio,
+				      struct page *poisoned_page)
+{
+	struct address_space *mapping = folio_mapping(folio);
+	pgoff_t pgoff_start;
+	pgoff_t pgoff_end;
+	struct vm_area_struct *vma;
+	unsigned long poisoned_addr;
+	unsigned long head_addr;
+	struct hugetlb_pte hpte;
+
+	if (WARN_ON(!mapping))
+		return;
+
+	VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
+
+	pgoff_start = folio_pgoff(folio);
+	pgoff_end = pgoff_start + folio_nr_pages(folio) - 1;
+
+	vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff_start, pgoff_end) {
+		/* Enable HGM on HGM-eligible VMAs. */
+		if (!hugetlb_hgm_eligible(vma))
+			continue;
+
+		i_mmap_assert_locked(vma->vm_file->f_mapping);
+		if (hugetlb_enable_hgm_vma(vma)) {
+			pr_err("Failed to enable HGM on eligible VMA=[%#lx, %#lx)\n",
+			       vma->vm_start, vma->vm_end);
+			continue;
+		}
+
+		poisoned_addr = vma_address(poisoned_page, vma);
+		head_addr = vma_address(folio_page(folio, 0), vma);
+		/*
+		 * Get the hugetlb_pte of the PUD-mapped hugepage first,
+		 * then split the PUD entry into PMD + PTE entries.
+		 *
+		 * Both getting original huge PTE and splitting requires write
+		 * lock on vma->vm_file->f_mapping, which caller
+		 * (e.g. hwpoison_user_mappings) should already acquired.
+		 */
+		if (hugetlb_full_walk(&hpte, vma, head_addr))
+			continue;
+
+		if (hugetlb_split_to_shift(vma->vm_mm, vma, &hpte,
+					   poisoned_addr, PAGE_SHIFT)) {
+			pr_err("Failed to split huge mapping: pfn=%#lx, vaddr=%#lx in VMA=[%#lx, %#lx)\n",
+			       page_to_pfn(poisoned_page), poisoned_addr,
+			       vma->vm_start, vma->vm_end);
+		}
+	}
+}
+#else
+static void try_to_split_huge_mapping(struct folio *folio,
+				      struct page *poisoned_page)
+{
+}
+#endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */
+
 /*
  * Do all that is necessary to remove user space mappings. Unmap
  * the pages and send SIGBUS to the processes if the data was dirty.
@@ -1555,6 +1622,7 @@ static bool hwpoison_user_mappings(struct page *p, unsigned long pfn,
 	 */
 	mapping = hugetlb_page_mapping_lock_write(hpage);
 	if (mapping) {
+		try_to_split_huge_mapping(folio, p);
 		try_to_unmap(folio, ttu|TTU_RMAP_LOCKED);
 		i_mmap_unlock_write(mapping);
 	} else