From patchwork Mon Feb 10 19:37:53 2025
X-Patchwork-Submitter: David Hildenbrand
X-Patchwork-Id: 13968888
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-doc@vger.kernel.org, dri-devel@lists.freedesktop.org, linux-mm@kvack.org,
    nouveau@lists.freedesktop.org, linux-trace-kernel@vger.kernel.org,
    linux-perf-users@vger.kernel.org, damon@lists.linux.dev, David Hildenbrand,
    Andrew Morton, Jérôme Glisse, Jonathan Corbet, Alex Shi, Yanteng Si,
    Karol Herbst, Lyude Paul, Danilo Krummrich, David Airlie, Simona Vetter,
    Masami Hiramatsu, Oleg Nesterov, Peter Zijlstra, SeongJae Park,
    "Liam R. Howlett", Lorenzo Stoakes, Vlastimil Babka, Jann Horn,
    Pasha Tatashin, Peter Xu, Alistair Popple, Jason Gunthorpe
Subject: [PATCH v2 11/17] mm/rmap: handle device-exclusive entries correctly in try_to_migrate_one()
Date: Mon, 10 Feb 2025 20:37:53 +0100
Message-ID: <20250210193801.781278-12-david@redhat.com>
In-Reply-To: <20250210193801.781278-1-david@redhat.com>
References: <20250210193801.781278-1-david@redhat.com>

Ever since commit b756a3b5e7ea ("mm: device exclusive memory access") we can
return with a device-exclusive entry from page_vma_mapped_walk().
try_to_migrate_one() is not prepared for that, so teach it about these PFN
swap PTEs.

We already handle device-private entries by specializing on the folio, so we
can reshuffle that code to make it work on the PFN swap PTEs instead. Get rid
of the folio_is_device_private() handling. Note that we never currently expect
device-private folios with HWPoison flag set at that point, so add a warning
in case that ever changes and we can figure out what the right thing to do is.

Note that we could currently only run into this case with device-exclusive
entries on THPs. We still adjust the mapcount on conversion to
device-exclusive; this makes the rmap walk abort early for small folios,
because we'll always have !folio_mapped() with a single device-exclusive
entry. We'll adjust the mapcount logic once all page_vma_mapped_walk() users
can properly handle device-exclusive entries.

Further note that try_to_migrate() calls MMU notifiers and holds the folio
lock, so any device-exclusive users should be properly prepared for a
device-exclusive PTE to "vanish".
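In short (an illustrative sketch only, condensing the hunks below using the
same helpers, not additional code to apply): the pfn is now derived from the
PTE value itself, whether it is present or a non-present PFN swap PTE:

	pteval = ptep_get(pvmw.pte);
	if (pte_present(pteval))
		pfn = pte_pfn(pteval);
	else
		/* e.g., a device-exclusive entry: non-present, but still maps a page */
		pfn = swp_offset_pfn(pte_to_swp_entry(pteval));
	subpage = folio_page(folio, pfn - folio_pfn(folio));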
Fixes: b756a3b5e7ea ("mm: device exclusive memory access")
Signed-off-by: David Hildenbrand
---
 mm/rmap.c | 124 ++++++++++++++++++++++--------------------------------
 1 file changed, 51 insertions(+), 73 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 47142a656ae51..7c471c3ea64c4 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -2039,9 +2039,9 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 {
 	struct mm_struct *mm = vma->vm_mm;
 	DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, 0);
+	bool anon_exclusive, writable, ret = true;
 	pte_t pteval;
 	struct page *subpage;
-	bool anon_exclusive, ret = true;
 	struct mmu_notifier_range range;
 	enum ttu_flags flags = (enum ttu_flags)(long)arg;
 	unsigned long pfn;
@@ -2108,24 +2108,19 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 		/* Unexpected PMD-mapped THP? */
 		VM_BUG_ON_FOLIO(!pvmw.pte, folio);
 
-		pfn = pte_pfn(ptep_get(pvmw.pte));
-
-		if (folio_is_zone_device(folio)) {
-			/*
-			 * Our PTE is a non-present device exclusive entry and
-			 * calculating the subpage as for the common case would
-			 * result in an invalid pointer.
-			 *
-			 * Since only PAGE_SIZE pages can currently be
-			 * migrated, just set it to page. This will need to be
-			 * changed when hugepage migrations to device private
-			 * memory are supported.
-			 */
-			VM_BUG_ON_FOLIO(folio_nr_pages(folio) > 1, folio);
-			subpage = &folio->page;
+		/*
+		 * Handle PFN swap PTEs, such as device-exclusive ones, that
+		 * actually map pages.
+		 */
+		pteval = ptep_get(pvmw.pte);
+		if (likely(pte_present(pteval))) {
+			pfn = pte_pfn(pteval);
 		} else {
-			subpage = folio_page(folio, pfn - folio_pfn(folio));
+			pfn = swp_offset_pfn(pte_to_swp_entry(pteval));
+			VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
 		}
+
+		subpage = folio_page(folio, pfn - folio_pfn(folio));
 		address = pvmw.address;
 		anon_exclusive = folio_test_anon(folio) &&
 				 PageAnonExclusive(subpage);
@@ -2181,7 +2176,10 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 			}
 			/* Nuke the hugetlb page table entry */
 			pteval = huge_ptep_clear_flush(vma, address, pvmw.pte);
-		} else {
+			if (pte_dirty(pteval))
+				folio_mark_dirty(folio);
+			writable = pte_write(pteval);
+		} else if (likely(pte_present(pteval))) {
 			flush_cache_page(vma, address, pfn);
 			/* Nuke the page table entry. */
 			if (should_defer_flush(mm, flags)) {
@@ -2199,54 +2197,23 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 			} else {
 				pteval = ptep_clear_flush(vma, address, pvmw.pte);
 			}
+			if (pte_dirty(pteval))
+				folio_mark_dirty(folio);
+			writable = pte_write(pteval);
+		} else {
+			pte_clear(mm, address, pvmw.pte);
+			writable = is_writable_device_private_entry(pte_to_swp_entry(pteval));
 		}
 
-		/* Set the dirty flag on the folio now the pte is gone. */
-		if (pte_dirty(pteval))
-			folio_mark_dirty(folio);
+		VM_WARN_ON_FOLIO(writable && folio_test_anon(folio) &&
+				 !anon_exclusive, folio);
 
 		/* Update high watermark before we lower rss */
 		update_hiwater_rss(mm);
 
-		if (folio_is_device_private(folio)) {
-			unsigned long pfn = folio_pfn(folio);
-			swp_entry_t entry;
-			pte_t swp_pte;
-
-			if (anon_exclusive)
-				WARN_ON_ONCE(folio_try_share_anon_rmap_pte(folio,
-									   subpage));
+		if (PageHWPoison(subpage)) {
+			VM_WARN_ON_FOLIO(folio_is_device_private(folio), folio);
 
-			/*
-			 * Store the pfn of the page in a special migration
-			 * pte. do_swap_page() will wait until the migration
-			 * pte is removed and then restart fault handling.
-			 */
-			entry = pte_to_swp_entry(pteval);
-			if (is_writable_device_private_entry(entry))
-				entry = make_writable_migration_entry(pfn);
-			else if (anon_exclusive)
-				entry = make_readable_exclusive_migration_entry(pfn);
-			else
-				entry = make_readable_migration_entry(pfn);
-			swp_pte = swp_entry_to_pte(entry);
-
-			/*
-			 * pteval maps a zone device page and is therefore
-			 * a swap pte.
-			 */
-			if (pte_swp_soft_dirty(pteval))
-				swp_pte = pte_swp_mksoft_dirty(swp_pte);
-			if (pte_swp_uffd_wp(pteval))
-				swp_pte = pte_swp_mkuffd_wp(swp_pte);
-			set_pte_at(mm, pvmw.address, pvmw.pte, swp_pte);
-			trace_set_migration_pte(pvmw.address, pte_val(swp_pte),
-						folio_order(folio));
-			/*
-			 * No need to invalidate here it will synchronize on
-			 * against the special swap migration pte.
-			 */
-		} else if (PageHWPoison(subpage)) {
 			pteval = swp_entry_to_pte(make_hwpoison_entry(subpage));
 			if (folio_test_hugetlb(folio)) {
 				hugetlb_count_sub(folio_nr_pages(folio), mm);
@@ -2256,8 +2223,8 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 				dec_mm_counter(mm, mm_counter(folio));
 				set_pte_at(mm, address, pvmw.pte, pteval);
 			}
-
-		} else if (pte_unused(pteval) && !userfaultfd_armed(vma)) {
+		} else if (likely(pte_present(pteval)) && pte_unused(pteval) &&
+			   !userfaultfd_armed(vma)) {
 			/*
 			 * The guest indicated that the page content is of no
 			 * interest anymore. Simply discard the pte, vmscan
@@ -2273,6 +2240,11 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 			swp_entry_t entry;
 			pte_t swp_pte;
 
+			/*
+			 * arch_unmap_one() is expected to be a NOP on
+			 * architectures where we could have PFN swap PTEs,
+			 * so we'll not check/care.
+			 */
 			if (arch_unmap_one(mm, vma, address, pteval) < 0) {
 				if (folio_test_hugetlb(folio))
 					set_huge_pte_at(mm, address, pvmw.pte,
@@ -2283,8 +2255,6 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 				page_vma_mapped_walk_done(&pvmw);
 				break;
 			}
-			VM_BUG_ON_PAGE(pte_write(pteval) && folio_test_anon(folio) &&
-				       !anon_exclusive, subpage);
 
 			/* See folio_try_share_anon_rmap_pte(): clear PTE first. */
 			if (folio_test_hugetlb(folio)) {
@@ -2309,7 +2279,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 			 * pte. do_swap_page() will wait until the migration
 			 * pte is removed and then restart fault handling.
 			 */
-			if (pte_write(pteval))
+			if (writable)
 				entry = make_writable_migration_entry(
 							page_to_pfn(subpage));
 			else if (anon_exclusive)
@@ -2318,15 +2288,23 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 			else
 				entry = make_readable_migration_entry(
 							page_to_pfn(subpage));
-			if (pte_young(pteval))
-				entry = make_migration_entry_young(entry);
-			if (pte_dirty(pteval))
-				entry = make_migration_entry_dirty(entry);
-			swp_pte = swp_entry_to_pte(entry);
-			if (pte_soft_dirty(pteval))
-				swp_pte = pte_swp_mksoft_dirty(swp_pte);
-			if (pte_uffd_wp(pteval))
-				swp_pte = pte_swp_mkuffd_wp(swp_pte);
+			if (likely(pte_present(pteval))) {
+				if (pte_young(pteval))
+					entry = make_migration_entry_young(entry);
+				if (pte_dirty(pteval))
+					entry = make_migration_entry_dirty(entry);
+				swp_pte = swp_entry_to_pte(entry);
+				if (pte_soft_dirty(pteval))
+					swp_pte = pte_swp_mksoft_dirty(swp_pte);
+				if (pte_uffd_wp(pteval))
+					swp_pte = pte_swp_mkuffd_wp(swp_pte);
+			} else {
+				swp_pte = swp_entry_to_pte(entry);
+				if (pte_swp_soft_dirty(pteval))
+					swp_pte = pte_swp_mksoft_dirty(swp_pte);
+				if (pte_swp_uffd_wp(pteval))
+					swp_pte = pte_swp_mkuffd_wp(swp_pte);
+			}
 			if (folio_test_hugetlb(folio))
 				set_huge_pte_at(mm, address, pvmw.pte, swp_pte,
 						hsz);
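As a rough restatement of the last hunk above (a sketch, not code to apply):
young/dirty can only be transferred to the migration entry from a present PTE,
and the soft-dirty/uffd-wp markers are read from the hardware PTE bits in the
present case, but from the swap PTE bits when the source already was a PFN
swap PTE:

	if (pte_present(pteval)) {
		/* markers live in the hardware PTE bits */
		if (pte_soft_dirty(pteval))
			swp_pte = pte_swp_mksoft_dirty(swp_pte);
		if (pte_uffd_wp(pteval))
			swp_pte = pte_swp_mkuffd_wp(swp_pte);
	} else {
		/* markers were already encoded in the swap PTE bits */
		if (pte_swp_soft_dirty(pteval))
			swp_pte = pte_swp_mksoft_dirty(swp_pte);
		if (pte_swp_uffd_wp(pteval))
			swp_pte = pte_swp_mkuffd_wp(swp_pte);
	}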