From patchwork Thu Jun 20 21:29:35 2024
X-Patchwork-Submitter: David Hildenbrand
X-Patchwork-Id: 13706417
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, David Hildenbrand, Andrew Morton
Subject: [PATCH v1 2/2] mm/migrate: move NUMA hinting fault folio isolation + checks under PTL
Date: Thu, 20 Jun 2024 23:29:35 +0200
Message-ID: <20240620212935.656243-3-david@redhat.com>
In-Reply-To: <20240620212935.656243-1-david@redhat.com>
References: <20240620212935.656243-1-david@redhat.com>
MIME-Version: 1.0

Currently we always take a folio reference even if migration will not
even be tried or isolation failed, requiring us to grab+drop an
additional reference. Further, we end up calling
folio_likely_mapped_shared() while the folio might have already been
unmapped, because after we dropped the PTL, that can easily happen.
We want to stop touching mapcounts and friends from such context, and
only call folio_likely_mapped_shared() while the folio is still mapped:
mapcount information is pretty much stale and unreliable otherwise.

So let's move checks into numamigrate_isolate_folio(), rename that
function to migrate_misplaced_folio_prepare(), and call that function
from callsites where we call migrate_misplaced_folio(), but still with
the PTL held.

We can now stop taking temporary folio references, and really only take
a reference if folio isolation succeeded. Doing the
folio_likely_mapped_shared() + folio isolation under PT lock is now
similar to how we handle MADV_PAGEOUT.

While at it, combine the folio_is_file_lru() checks.

Signed-off-by: David Hildenbrand
Reviewed-by: Baolin Wang
Tested-by: Donet Tom
---
 include/linux/migrate.h |  7 ++++
 mm/huge_memory.c        |  8 ++--
 mm/memory.c             |  9 +++--
 mm/migrate.c            | 81 +++++++++++++++++++----------------------
 4 files changed, 55 insertions(+), 50 deletions(-)

diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index f9d92482d117..644be30b69c8 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -139,9 +139,16 @@ const struct movable_operations *page_movable_ops(struct page *page)
 }
 
 #ifdef CONFIG_NUMA_BALANCING
+int migrate_misplaced_folio_prepare(struct folio *folio,
+		struct vm_area_struct *vma, int node);
 int migrate_misplaced_folio(struct folio *folio, struct vm_area_struct *vma,
 			    int node);
 #else
+static inline int migrate_misplaced_folio_prepare(struct folio *folio,
+		struct vm_area_struct *vma, int node)
+{
+	return -EAGAIN; /* can't migrate now */
+}
 static inline int migrate_misplaced_folio(struct folio *folio,
 					  struct vm_area_struct *vma, int node)
 {
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index fc27dabcd8e3..4b2817bb2c7d 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1688,11 +1688,13 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
 	if (node_is_toptier(nid))
 		last_cpupid = folio_last_cpupid(folio);
 	target_nid = numa_migrate_prep(folio, vmf, haddr, nid, &flags);
-	if (target_nid == NUMA_NO_NODE) {
-		folio_put(folio);
+	if (target_nid == NUMA_NO_NODE)
+		goto out_map;
+	if (migrate_misplaced_folio_prepare(folio, vma, target_nid)) {
+		flags |= TNF_MIGRATE_FAIL;
 		goto out_map;
 	}
-
+	/* The folio is isolated and isolation code holds a folio reference. */
 	spin_unlock(vmf->ptl);
 	writable = false;
diff --git a/mm/memory.c b/mm/memory.c
index 118660de5bcc..4fd1ecfced4d 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5207,8 +5207,6 @@ int numa_migrate_prep(struct folio *folio, struct vm_fault *vmf,
 {
 	struct vm_area_struct *vma = vmf->vma;
 
-	folio_get(folio);
-
 	/* Record the current PID acceesing VMA */
 	vma_set_access_pid_bit(vma);
 
@@ -5345,10 +5343,13 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
 	else
 		last_cpupid = folio_last_cpupid(folio);
 	target_nid = numa_migrate_prep(folio, vmf, vmf->address, nid, &flags);
-	if (target_nid == NUMA_NO_NODE) {
-		folio_put(folio);
+	if (target_nid == NUMA_NO_NODE)
+		goto out_map;
+	if (migrate_misplaced_folio_prepare(folio, vma, target_nid)) {
+		flags |= TNF_MIGRATE_FAIL;
 		goto out_map;
 	}
+	/* The folio is isolated and isolation code holds a folio reference. */
 	pte_unmap_unlock(vmf->pte, vmf->ptl);
 	writable = false;
 	ignore_writable = true;
diff --git a/mm/migrate.c b/mm/migrate.c
index 0307b54879a0..27f070f64f27 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2530,9 +2530,37 @@ static struct folio *alloc_misplaced_dst_folio(struct folio *src,
 	return __folio_alloc_node(gfp, order, nid);
 }
 
-static int numamigrate_isolate_folio(pg_data_t *pgdat, struct folio *folio)
+/*
+ * Prepare for calling migrate_misplaced_folio() by isolating the folio if
+ * permitted. Must be called with the PTL still held.
+ */
+int migrate_misplaced_folio_prepare(struct folio *folio,
+		struct vm_area_struct *vma, int node)
 {
 	int nr_pages = folio_nr_pages(folio);
+	pg_data_t *pgdat = NODE_DATA(node);
+
+	if (folio_is_file_lru(folio)) {
+		/*
+		 * Do not migrate file folios that are mapped in multiple
+		 * processes with execute permissions as they are probably
+		 * shared libraries.
+		 *
+		 * See folio_likely_mapped_shared() on possible imprecision
+		 * when we cannot easily detect if a folio is shared.
+		 */
+		if ((vma->vm_flags & VM_EXEC) &&
+		    folio_likely_mapped_shared(folio))
+			return -EACCES;
+
+		/*
+		 * Do not migrate dirty folios as not all filesystems can move
+		 * dirty folios in MIGRATE_ASYNC mode which is a waste of
+		 * cycles.
+		 */
+		if (folio_test_dirty(folio))
+			return -EAGAIN;
+	}
 
 	/* Avoid migrating to a node that is nearly full */
 	if (!migrate_balanced_pgdat(pgdat, nr_pages)) {
@@ -2550,65 +2578,37 @@ static int numamigrate_isolate_folio(pg_data_t *pgdat, struct folio *folio)
 		 * further.
 		 */
 		if (z < 0)
-			return 0;
+			return -EAGAIN;
 
 		wakeup_kswapd(pgdat->node_zones + z, 0, folio_order(folio),
 			      ZONE_MOVABLE);
-		return 0;
+		return -EAGAIN;
 	}
 
 	if (!folio_isolate_lru(folio))
-		return 0;
+		return -EAGAIN;
 
 	node_stat_mod_folio(folio, NR_ISOLATED_ANON + folio_is_file_lru(folio),
 			    nr_pages);
-
-	/*
-	 * Isolating the folio has taken another reference, so the
-	 * caller's reference can be safely dropped without the folio
-	 * disappearing underneath us during migration.
-	 */
-	folio_put(folio);
-	return 1;
+	return 0;
 }
 
 /*
  * Attempt to migrate a misplaced folio to the specified destination
- * node. Caller is expected to have an elevated reference count on
- * the folio that will be dropped by this function before returning.
+ * node. Caller is expected to have isolated the folio by calling
+ * migrate_misplaced_folio_prepare(), which will result in an
+ * elevated reference count on the folio. This function will un-isolate the
+ * folio, dereferencing the folio before returning.
  */
 int migrate_misplaced_folio(struct folio *folio, struct vm_area_struct *vma,
 			    int node)
 {
 	pg_data_t *pgdat = NODE_DATA(node);
-	int isolated;
 	int nr_remaining;
 	unsigned int nr_succeeded;
 	LIST_HEAD(migratepages);
 	int nr_pages = folio_nr_pages(folio);
 
-	/*
-	 * Don't migrate file folios that are mapped in multiple processes
-	 * with execute permissions as they are probably shared libraries.
-	 *
-	 * See folio_likely_mapped_shared() on possible imprecision when we
-	 * cannot easily detect if a folio is shared.
-	 */
-	if (folio_likely_mapped_shared(folio) && folio_is_file_lru(folio) &&
-	    (vma->vm_flags & VM_EXEC))
-		goto out;
-
-	/*
-	 * Also do not migrate dirty folios as not all filesystems can move
-	 * dirty folios in MIGRATE_ASYNC mode which is a waste of cycles.
-	 */
-	if (folio_is_file_lru(folio) && folio_test_dirty(folio))
-		goto out;
-
-	isolated = numamigrate_isolate_folio(pgdat, folio);
-	if (!isolated)
-		goto out;
-
 	list_add(&folio->lru, &migratepages);
 	nr_remaining = migrate_pages(&migratepages, alloc_misplaced_dst_folio,
 				     NULL, node, MIGRATE_ASYNC,
@@ -2620,7 +2620,6 @@ int migrate_misplaced_folio(struct folio *folio, struct vm_area_struct *vma,
 				folio_is_file_lru(folio), -nr_pages);
 			folio_putback_lru(folio);
 		}
-		isolated = 0;
 	}
 	if (nr_succeeded) {
 		count_vm_numa_events(NUMA_PAGE_MIGRATE, nr_succeeded);
@@ -2629,11 +2628,7 @@ int migrate_misplaced_folio(struct folio *folio, struct vm_area_struct *vma,
 			nr_succeeded);
 	}
 	BUG_ON(!list_empty(&migratepages));
-	return isolated ? 0 : -EAGAIN;
-
-out:
-	folio_put(folio);
-	return -EAGAIN;
+	return nr_remaining ? -EAGAIN : 0;
 }
 #endif /* CONFIG_NUMA_BALANCING */
 #endif /* CONFIG_NUMA */