From patchwork Sat Aug 5 10:12:56 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Hildenbrand
X-Patchwork-Id: 13342560
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, David Hildenbrand, Ryan Roberts,
 stable@vger.kernel.org, Andrew Morton, Vlastimil Babka, John Hubbard,
 Jason Gunthorpe, Peter Xu, Mike Kravetz
Subject: [PATCH v1] mm/gup: handle cont-PTE hugetlb pages correctly in
 gup_must_unshare() via GUP-fast
Date: Sat, 5 Aug 2023 12:12:56 +0200
Message-ID: <20230805101256.87306-1-david@redhat.com>

In contrast to most other GUP code, GUP-fast common page table walking
code like gup_pte_range() also handles hugetlb pages. But in contrast to
other hugetlb page table walking code, it does not look at the hugetlb PTE
abstraction whereby we have only a single logical hugetlb PTE per hugetlb
page, even when using multiple cont-PTEs underneath -- which is for example
what huge_ptep_get() abstracts.

So when we have a hugetlb page that is mapped via cont-PTEs, GUP-fast
might stumble over a PTE that does not map the head page of a hugetlb
page -- not the first "head" PTE of such a cont mapping.

Logically, the whole hugetlb page is mapped (entire_mapcount == 1), but we
might end up calling gup_must_unshare() with a tail page of a hugetlb page.

We only maintain a single PageAnonExclusive flag per hugetlb page (as
hugetlb pages cannot get partially COW-shared), stored for the head page.
That flag is clear for all tail pages.
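
As a rough illustration of the flag placement described above, here is a
minimal userspace sketch. It is not kernel code: struct toy_page and all
toy_* helpers are hypothetical names invented for this example, and they
only model "one exclusive flag, stored on the head page". The order-4
(16 page) size is an assumption that matches a 64k hugetlb page on a 4k
kernel.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct page; hypothetical, not the kernel's layout. */
struct toy_page {
	struct toy_page *head;	/* points at itself for the head page */
	bool hugetlb;		/* page belongs to a hugetlb page */
	bool anon_exclusive;	/* only ever set on the head page */
};

static bool toy_is_head(const struct toy_page *page)
{
	return page->head == page;
}

/* Naive check: reads the flag from whatever page it is handed. */
static bool toy_exclusive_naive(const struct toy_page *page)
{
	return page->anon_exclusive;
}

/* Head-aware check: for a hugetlb tail page, consult the head page. */
static bool toy_exclusive_head_aware(const struct toy_page *page)
{
	if (page->hugetlb && !toy_is_head(page))
		page = page->head;
	return page->anon_exclusive;
}

/* Model one hugetlb page of nr base pages whose head is exclusive. */
static void toy_init_hugetlb(struct toy_page *pages, size_t nr)
{
	for (size_t i = 0; i < nr; i++) {
		pages[i].head = &pages[0];
		pages[i].hugetlb = true;
		pages[i].anon_exclusive = (i == 0);
	}
}
```

With 16 pages initialized via toy_init_hugetlb(), the naive check
reports "not exclusive" for every tail page, while the head-aware check
gives the same answer for every page of the hugetlb page.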

So when gup_must_unshare() ends up calling PageAnonExclusive() with a tail
page of a hugetlb page:

1) With CONFIG_DEBUG_VM_PGFLAGS

Stumbles over the:

	VM_BUG_ON_PGFLAGS(PageHuge(page) && !PageHead(page), page);

For example, when executing the COW selftests with 64k hugetlb pages on
arm64:

  [   61.082187] page:00000000829819ff refcount:3 mapcount:1 mapping:0000000000000000 index:0x1 pfn:0x11ee11
  [   61.082842] head:0000000080f79bf7 order:4 entire_mapcount:1 nr_pages_mapped:0 pincount:2
  [   61.083384] anon flags: 0x17ffff80003000e(referenced|uptodate|dirty|head|mappedtodisk|node=0|zone=2|lastcpupid=0xfffff)
  [   61.084101] page_type: 0xffffffff()
  [   61.084332] raw: 017ffff800000000 fffffc00037b8401 0000000000000402 0000000200000000
  [   61.084840] raw: 0000000000000010 0000000000000000 00000000ffffffff 0000000000000000
  [   61.085359] head: 017ffff80003000e ffffd9e95b09b788 ffffd9e95b09b788 ffff0007ff63cf71
  [   61.085885] head: 0000000000000000 0000000000000002 00000003ffffffff 0000000000000000
  [   61.086415] page dumped because: VM_BUG_ON_PAGE(PageHuge(page) && !PageHead(page))
  [   61.086914] ------------[ cut here ]------------
  [   61.087220] kernel BUG at include/linux/page-flags.h:990!
  [   61.087591] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
  [   61.087999] Modules linked in: ...
  [   61.089404] CPU: 0 PID: 4612 Comm: cow Kdump: loaded Not tainted 6.5.0-rc4+ #3
  [   61.089917] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
  [   61.090409] pstate: 604000c5 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
  [   61.090897] pc : gup_must_unshare.part.0+0x64/0x98
  [   61.091242] lr : gup_must_unshare.part.0+0x64/0x98
  [   61.091592] sp : ffff8000825eb940
  [   61.091826] x29: ffff8000825eb940 x28: 0000000000000000 x27: fffffc00037b8440
  [   61.092329] x26: 0400000000000001 x25: 0000000000080101 x24: 0000000000080000
  [   61.092835] x23: 0000000000080100 x22: ffff0000cffb9588 x21: ffff0000c8ec6b58
  [   61.093341] x20: 0000ffffad6b1000 x19: fffffc00037b8440 x18: ffffffffffffffff
  [   61.093850] x17: 2864616548656761 x16: 5021202626202965 x15: 6761702865677548
  [   61.094358] x14: 6567615028454741 x13: 2929656761702864 x12: 6165486567615021
  [   61.094858] x11: 00000000ffff7fff x10: 00000000ffff7fff x9 : ffffd9e958b7a1c0
  [   61.095359] x8 : 00000000000bffe8 x7 : c0000000ffff7fff x6 : 00000000002bffa8
  [   61.095873] x5 : ffff0008bb19e708 x4 : 0000000000000000 x3 : 0000000000000000
  [   61.096380] x2 : 0000000000000000 x1 : ffff0000cf6636c0 x0 : 0000000000000046
  [   61.096894] Call trace:
  [   61.097080]  gup_must_unshare.part.0+0x64/0x98
  [   61.097392]  gup_pte_range+0x3a8/0x3f0
  [   61.097662]  gup_pgd_range+0x1ec/0x280
  [   61.097942]  lockless_pages_from_mm+0x64/0x1a0
  [   61.098258]  internal_get_user_pages_fast+0xe4/0x1d0
  [   61.098612]  pin_user_pages_fast+0x58/0x78
  [   61.098917]  pin_longterm_test_start+0xf4/0x2b8
  [   61.099243]  gup_test_ioctl+0x170/0x3b0
  [   61.099528]  __arm64_sys_ioctl+0xa8/0xf0
  [   61.099822]  invoke_syscall.constprop.0+0x7c/0xd0
  [   61.100160]  el0_svc_common.constprop.0+0xe8/0x100
  [   61.100500]  do_el0_svc+0x38/0xa0
  [   61.100736]  el0_svc+0x3c/0x198
  [   61.100971]  el0t_64_sync_handler+0x134/0x150
  [   61.101280]  el0t_64_sync+0x17c/0x180
  [   61.101543] Code: aa1303e0 f00074c1 912b0021 97fffeb2 (d4210000)

2) Without CONFIG_DEBUG_VM_PGFLAGS

Always detects "not exclusive" for passed tail pages and
refuses to PIN the tail pages R/O, as gup_must_unshare() == true. GUP-fast
will fall back to ordinary GUP. As ordinary GUP properly considers the
logical hugetlb PTE abstraction in hugetlb_follow_page_mask(), pinning the
page will succeed when looking at PageAnonExclusive on the head page only.

So the only real effect of this is that with cont-PTE hugetlb pages, we'll
always fall back from GUP-fast to ordinary GUP when not working on the head
page, which ends up checking the head page and does the right thing.

Consequently, the cow selftests pass with cont-PTE hugetlb pages as well
without CONFIG_DEBUG_VM_PGFLAGS.

Note that this only applies to anon hugetlb pages that are mapped using
cont-PTEs: for example 64k hugetlb pages on a 4k arm64 kernel.

... and only when R/O-pinning (FOLL_PIN) such pages that are mapped into
the page table R/O using GUP-fast.

On production kernels (and even most debug kernels, which don't set
CONFIG_DEBUG_VM_PGFLAGS) this patch should theoretically not need to be
backported. But of course, it does not hurt.

Reported-by: Ryan Roberts
Fixes: a7f226604170 ("mm/gup: trigger FAULT_FLAG_UNSHARE when R/O-pinning a possibly shared anonymous page")
Cc:
Cc: Andrew Morton
Cc: Vlastimil Babka
Cc: John Hubbard
Cc: Jason Gunthorpe
Cc: Peter Xu
Cc: Mike Kravetz
Signed-off-by: David Hildenbrand
Reviewed-by: Ryan Roberts
Tested-by: Ryan Roberts
---
 mm/internal.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/mm/internal.h b/mm/internal.h
index a7d9e980429a..fe242dd0b72c 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -997,6 +997,16 @@ static inline bool gup_must_unshare(struct vm_area_struct *vma,
 	if (IS_ENABLED(CONFIG_HAVE_FAST_GUP))
 		smp_rmb();
 
+	/*
+	 * During GUP-fast we might not get called on the head page for a
+	 * hugetlb page that is mapped using cont-PTE, because GUP-fast does
+	 * not work with the abstracted hugetlb PTEs that always point at the
+	 * head page. For hugetlb, PageAnonExclusive only applies on the head
+	 * page (as it cannot be partially COW-shared), so lookup the head page.
+	 */
+	if (unlikely(!PageHead(page) && PageHuge(page)))
+		page = compound_head(page);
+
 	/*
 	 * Note that PageKsm() pages cannot be exclusive, and consequently,
 	 * cannot get pinned.