From patchwork Mon Sep 25 08:21:10 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Hugh Dickins
X-Patchwork-Id: 13397475
Date: Mon, 25 Sep 2023 01:21:10 -0700 (PDT)
From: Hugh Dickins
To: Andrew Morton
cc: Andi Kleen, Christoph Lameter, Matthew Wilcox, Mike Kravetz,
    David Hildenbrand, Suren Baghdasaryan, Yang Shi, Sidhartha Kumar,
    Vishal Moola, Kefeng Wang, Greg Kroah-Hartman, Tejun Heo, Mel Gorman,
    Michal Hocko, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH 01/12] hugetlbfs: drop shared NUMA mempolicy pretence
In-Reply-To: <2d872cef-7787-a7ca-10e-9d45a64c80b4@google.com>
Message-ID: <47a562a-6998-4dc6-5df-3834d2f2f411@google.com>
References: <2d872cef-7787-a7ca-10e-9d45a64c80b4@google.com>

hugetlbfs_fallocate() goes through the motions of pasting a shared NUMA
mempolicy onto its pseudo-vma, but how could there ever be a shared NUMA
mempolicy for this file? hugetlb_vm_ops has never offered a set_policy
method, and hugetlbfs_parse_param() has never supported any mpol options
for a mount-wide default policy.

It's just an illusion: clean it away so as not to confuse others, giving
us more freedom to adjust shmem's set_policy/get_policy implementation.
But hugetlbfs_inode_info is still required, just to accommodate seals.

Yes, shared NUMA mempolicy support could be added to hugetlbfs, with a
set_policy method and/or mpol mount option (Andi's first posting did
include an admitted-unsatisfactory hugetlb_set_policy()); but it seems
that nobody has bothered to add that in the nineteen years since v2.6.7
made it possible, and there is at least one company that has invested
enough into hugetlbfs, that I guess they have learnt well enough how to
manage its NUMA, without needing shared mempolicy.

Signed-off-by: Hugh Dickins
Reviewed-by: Matthew Wilcox (Oracle)
---
 fs/hugetlbfs/inode.c    | 41 +----------------------------------------
 include/linux/hugetlb.h |  2 --
 2 files changed, 1 insertion(+), 42 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 316c4cebd3f3..ffee27b10d42 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -83,29 +83,6 @@ static const struct fs_parameter_spec hugetlb_fs_parameters[] = {
 	{}
 };
 
-#ifdef CONFIG_NUMA
-static inline void hugetlb_set_vma_policy(struct vm_area_struct *vma,
-					struct inode *inode, pgoff_t index)
-{
-	vma->vm_policy = mpol_shared_policy_lookup(&HUGETLBFS_I(inode)->policy,
-							index);
-}
-
-static inline void hugetlb_drop_vma_policy(struct vm_area_struct *vma)
-{
-	mpol_cond_put(vma->vm_policy);
-}
-#else
-static inline void hugetlb_set_vma_policy(struct vm_area_struct *vma,
-					struct inode *inode, pgoff_t index)
-{
-}
-
-static inline void hugetlb_drop_vma_policy(struct vm_area_struct *vma)
-{
-}
-#endif
-
 /*
  * Mask used when checking the page offset value passed in via system
  * calls. This value will be converted to a loff_t which is signed.
@@ -852,8 +829,7 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
 
 	/*
 	 * Initialize a pseudo vma as this is required by the huge page
-	 * allocation routines. If NUMA is configured, use page index
-	 * as input to create an allocation policy.
+	 * allocation routines.
 	 */
 	vma_init(&pseudo_vma, mm);
 	vm_flags_init(&pseudo_vma, VM_HUGETLB | VM_MAYSHARE | VM_SHARED);
@@ -901,9 +877,7 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
 		 * folios in these areas, we need to consume the reserves
 		 * to keep reservation accounting consistent.
 		 */
-		hugetlb_set_vma_policy(&pseudo_vma, inode, index);
 		folio = alloc_hugetlb_folio(&pseudo_vma, addr, 0);
-		hugetlb_drop_vma_policy(&pseudo_vma);
 		if (IS_ERR(folio)) {
 			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 			error = PTR_ERR(folio);
@@ -1282,18 +1256,6 @@ static struct inode *hugetlbfs_alloc_inode(struct super_block *sb)
 		hugetlbfs_inc_free_inodes(sbinfo);
 		return NULL;
 	}
-
-	/*
-	 * Any time after allocation, hugetlbfs_destroy_inode can be called
-	 * for the inode. mpol_free_shared_policy is unconditionally called
-	 * as part of hugetlbfs_destroy_inode. So, initialize policy here
-	 * in case of a quick call to destroy.
-	 *
-	 * Note that the policy is initialized even if we are creating a
-	 * private inode. This simplifies hugetlbfs_destroy_inode.
-	 */
-	mpol_shared_policy_init(&p->policy, NULL);
-
 	return &p->vfs_inode;
 }
 
@@ -1305,7 +1267,6 @@ static void hugetlbfs_free_inode(struct inode *inode)
 static void hugetlbfs_destroy_inode(struct inode *inode)
 {
 	hugetlbfs_inc_free_inodes(HUGETLBFS_SB(inode->i_sb));
-	mpol_free_shared_policy(&HUGETLBFS_I(inode)->policy);
 }
 
 static const struct address_space_operations hugetlbfs_aops = {
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 5b2626063f4f..6522eb3cd007 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -30,7 +30,6 @@ void free_huge_folio(struct folio *folio);
 
 #ifdef CONFIG_HUGETLB_PAGE
 
-#include
 #include
 #include
 
@@ -512,7 +511,6 @@ static inline struct hugetlbfs_sb_info *HUGETLBFS_SB(struct super_block *sb)
 }
 
 struct hugetlbfs_inode_info {
-	struct shared_policy policy;
 	struct inode vfs_inode;
 	unsigned int seals;
 };

From patchwork Mon Sep 25 08:22:27 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Hugh Dickins
X-Patchwork-Id: 13397476
Date: Mon, 25 Sep 2023 01:22:27 -0700 (PDT)
From: Hugh Dickins
To: Andrew Morton
cc: Andi Kleen, Christoph Lameter, Matthew Wilcox, Mike Kravetz,
    David Hildenbrand, Suren Baghdasaryan, Yang Shi, Sidhartha Kumar,
    Vishal Moola, Kefeng Wang, Greg Kroah-Hartman, Tejun Heo, Mel Gorman,
    Michal Hocko, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH 02/12] kernfs: drop shared NUMA mempolicy hooks
In-Reply-To: <2d872cef-7787-a7ca-10e-9d45a64c80b4@google.com>
References: <2d872cef-7787-a7ca-10e-9d45a64c80b4@google.com>

It seems strange that kernfs should be an outlier with a set_policy and
get_policy in its kernfs_vm_ops. Ah, it dates back to v2.6.30's commit
095160aee954 ("sysfs: fix some bin_vm_ops errors"), when I had crashed
on powerpc's pci_mmap_legacy_page_range() fallback to shmem_zero_setup().

Well, that was commendably thorough, to give sysfs-bin a set_policy and
get_policy, just to avoid the way it was coded resulting in EINVAL from
mmap when CONFIG_NUMA; but somehow feels a bit over-the-top to me now.
It's easier to say that nobody should expect to manage a shmem object's
shared NUMA mempolicy via some kernfs backdoor to that object: delete
that code (and there's no longer an EINVAL from mmap in the NUMA case).

This then leaves set_policy/get_policy as implemented only by shmem -
though importantly also by SysV SHM, which has to interface with shmem
which implements them, and with SHM_HUGETLB which does not.

Signed-off-by: Hugh Dickins
Reviewed-by: Matthew Wilcox (Oracle)
---
 fs/kernfs/file.c | 49 ------------------------------------------------
 1 file changed, 49 deletions(-)

diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
index 180906c36f51..aaa76410e550 100644
--- a/fs/kernfs/file.c
+++ b/fs/kernfs/file.c
@@ -429,60 +429,11 @@ static int kernfs_vma_access(struct vm_area_struct *vma, unsigned long addr,
 	return ret;
 }
 
-#ifdef CONFIG_NUMA
-static int kernfs_vma_set_policy(struct vm_area_struct *vma,
-				 struct mempolicy *new)
-{
-	struct file *file = vma->vm_file;
-	struct kernfs_open_file *of = kernfs_of(file);
-	int ret;
-
-	if (!of->vm_ops)
-		return 0;
-
-	if (!kernfs_get_active(of->kn))
-		return -EINVAL;
-
-	ret = 0;
-	if (of->vm_ops->set_policy)
-		ret = of->vm_ops->set_policy(vma, new);
-
-	kernfs_put_active(of->kn);
-	return ret;
-}
-
-static struct mempolicy *kernfs_vma_get_policy(struct vm_area_struct *vma,
-					       unsigned long addr)
-{
-	struct file *file = vma->vm_file;
-	struct kernfs_open_file *of = kernfs_of(file);
-	struct mempolicy *pol;
-
-	if (!of->vm_ops)
-		return vma->vm_policy;
-
-	if (!kernfs_get_active(of->kn))
-		return vma->vm_policy;
-
-	pol = vma->vm_policy;
-	if (of->vm_ops->get_policy)
-		pol = of->vm_ops->get_policy(vma, addr);
-
-	kernfs_put_active(of->kn);
-	return pol;
-}
-
-#endif
-
 static const struct vm_operations_struct kernfs_vm_ops = {
 	.open		= kernfs_vma_open,
 	.fault		= kernfs_vma_fault,
 	.page_mkwrite	= kernfs_vma_page_mkwrite,
 	.access		= kernfs_vma_access,
-#ifdef CONFIG_NUMA
-	.set_policy	= kernfs_vma_set_policy,
-	.get_policy	= kernfs_vma_get_policy,
-#endif
 };
 
 static int kernfs_fop_mmap(struct file *file, struct vm_area_struct *vma)

From patchwork Mon Sep 25 08:24:02 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Hugh Dickins
X-Patchwork-Id: 13397479
Date: Mon, 25 Sep 2023 01:24:02 -0700 (PDT)
From: Hugh Dickins
To: Andrew Morton
cc: Andi Kleen, Christoph Lameter, Matthew Wilcox, Mike Kravetz,
    David Hildenbrand, Suren Baghdasaryan, Yang Shi, Sidhartha Kumar,
    Vishal Moola, Kefeng Wang, Greg Kroah-Hartman, Tejun Heo, Mel Gorman,
    Michal Hocko, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH 03/12] mempolicy: fix migrate_pages(2) syscall return nr_failed
In-Reply-To: <2d872cef-7787-a7ca-10e-9d45a64c80b4@google.com>
References: <2d872cef-7787-a7ca-10e-9d45a64c80b4@google.com>

"man 2 migrate_pages" says "On success migrate_pages() returns the number
of pages that could not be moved".
Although 5.3 and 5.4 commits fixed mbind(MPOL_MF_STRICT|MPOL_MF_MOVE*)
to fail with EIO when not all pages could be moved (because some could
not be isolated for migration), migrate_pages(2) was left still
reporting only those pages failing at the migration stage, forgetting
those failing at the earlier isolation stage.

Fix that by accumulating a long nr_failed count in struct queue_pages,
returned by queue_pages_range() when it's not returning an error, for
adding on to the nr_failed count from migrate_pages() in mm/migrate.c.
A count of pages? It's more a count of folios, but changing it to pages
would entail more work (also in mm/migrate.c): does not seem justified.

queue_pages_range() itself should only return -EIO in the "strictly
unmovable" case (STRICT without any MOVEs): in that case it's best to
break out as soon as nr_failed gets set; but otherwise it should
continue to isolate pages for MOVing even when nr_failed - as the
mbind(2) manpage promises.

This fixes mbind(MPOL_MF_STRICT|MPOL_MF_MOVE*) behavior left over from
5.3 and 5.4, and the recent syzbot need for vma_start_write() before
mbind_range(): but both of those may be fixed by smaller patches.

There's a case when nr_failed should be incremented when it was missed:
queue_folios_pte_range() and queue_folios_hugetlb() count the transient
migration entries, like queue_folios_pmd() already did. And there's a
case when nr_failed should not be incremented when it would have been:
in meeting later PTEs of the same large folio, which can only be
isolated once: fixed by recording the current large folio in struct
queue_pages.

Clean up the affected functions, fixing or updating many comments. Bool
migrate_folio_add(), without -EIO: true if adding, or if skipping shared
(but its arguable folio_estimated_sharers() heuristic left unchanged).
Use MPOL_MF_WRLOCK flag to queue_pages_range(), instead of bool lock_vma.
Use explicit STRICT|MOVE* flags where queue_pages_test_walk() checks for
skipping, instead of hiding them behind MPOL_MF_VALID.

Signed-off-by: Hugh Dickins
Reviewed-by: Matthew Wilcox (Oracle)
---
 mm/mempolicy.c | 322 +++++++++++++++++++++++--------------------------
 1 file changed, 149 insertions(+), 173 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 42b5567e3773..937386409c28 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -111,7 +111,8 @@
 
 /* Internal flags */
 #define MPOL_MF_DISCONTIG_OK (MPOL_MF_INTERNAL << 0)	/* Skip checks for continuous vmas */
-#define MPOL_MF_INVERT (MPOL_MF_INTERNAL << 1)		/* Invert check for nodemask */
+#define MPOL_MF_INVERT       (MPOL_MF_INTERNAL << 1)	/* Invert check for nodemask */
+#define MPOL_MF_WRLOCK       (MPOL_MF_INTERNAL << 2)	/* Write-lock walked vmas */
 
 static struct kmem_cache *policy_cache;
 static struct kmem_cache *sn_cache;
@@ -416,9 +417,19 @@ static const struct mempolicy_operations mpol_ops[MPOL_MAX] = {
 	},
 };
 
-static int migrate_folio_add(struct folio *folio, struct list_head *foliolist,
+static bool migrate_folio_add(struct folio *folio, struct list_head *foliolist,
 				unsigned long flags);
 
+static bool strictly_unmovable(unsigned long flags)
+{
+	/*
+	 * STRICT without MOVE flags lets do_mbind() fail immediately with -EIO
+	 * if any misplaced page is found.
+ */ + return (flags & (MPOL_MF_STRICT | MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) == + MPOL_MF_STRICT; +} + struct queue_pages { struct list_head *pagelist; unsigned long flags; @@ -426,6 +437,8 @@ struct queue_pages { unsigned long start; unsigned long end; struct vm_area_struct *first; + struct folio *large; /* note last large folio on pagelist */ + long nr_failed; /* could not be isolated at this time */ }; /* @@ -443,50 +456,27 @@ static inline bool queue_folio_required(struct folio *folio, return node_isset(nid, *qp->nmask) == !(flags & MPOL_MF_INVERT); } -/* - * queue_folios_pmd() has three possible return values: - * 0 - folios are placed on the right node or queued successfully, or - * special page is met, i.e. huge zero page. - * 1 - there is unmovable folio, and MPOL_MF_MOVE* & MPOL_MF_STRICT were - * specified. - * -EIO - is migration entry or only MPOL_MF_STRICT was specified and an - * existing folio was already on a node that does not follow the - * policy. - */ -static int queue_folios_pmd(pmd_t *pmd, spinlock_t *ptl, unsigned long addr, +static void queue_folios_pmd(pmd_t *pmd, spinlock_t *ptl, unsigned long addr, unsigned long end, struct mm_walk *walk) - __releases(ptl) { - int ret = 0; struct folio *folio; struct queue_pages *qp = walk->private; - unsigned long flags; if (unlikely(is_pmd_migration_entry(*pmd))) { - ret = -EIO; - goto unlock; + qp->nr_failed++; + return; } folio = pfn_folio(pmd_pfn(*pmd)); if (is_huge_zero_page(&folio->page)) { walk->action = ACTION_CONTINUE; - goto unlock; + return; } if (!queue_folio_required(folio, qp)) - goto unlock; - - flags = qp->flags; - /* go to folio migration */ - if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) { - if (!vma_migratable(walk->vma) || - migrate_folio_add(folio, qp->pagelist, flags)) { - ret = 1; - goto unlock; - } - } else - ret = -EIO; -unlock: - spin_unlock(ptl); - return ret; + return; + if (!(qp->flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) || + !vma_migratable(walk->vma) || + 
!migrate_folio_add(folio, qp->pagelist, qp->flags)) + qp->nr_failed++; } /* @@ -496,8 +486,6 @@ static int queue_folios_pmd(pmd_t *pmd, spinlock_t *ptl, unsigned long addr, * queue_folios_pte_range() has three possible return values: * 0 - folios are placed on the right node or queued successfully, or * special page is met, i.e. zero page. - * 1 - there is unmovable folio, and MPOL_MF_MOVE* & MPOL_MF_STRICT were - * specified. * -EIO - only MPOL_MF_STRICT was specified and an existing folio was already * on a node that does not follow the policy. */ @@ -508,14 +496,16 @@ static int queue_folios_pte_range(pmd_t *pmd, unsigned long addr, struct folio *folio; struct queue_pages *qp = walk->private; unsigned long flags = qp->flags; - bool has_unmovable = false; pte_t *pte, *mapped_pte; pte_t ptent; spinlock_t *ptl; ptl = pmd_trans_huge_lock(pmd, vma); - if (ptl) - return queue_folios_pmd(pmd, ptl, addr, end, walk); + if (ptl) { + queue_folios_pmd(pmd, ptl, addr, end, walk); + spin_unlock(ptl); + goto out; + } mapped_pte = pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl); if (!pte) { @@ -524,8 +514,13 @@ static int queue_folios_pte_range(pmd_t *pmd, unsigned long addr, } for (; addr != end; pte++, addr += PAGE_SIZE) { ptent = ptep_get(pte); - if (!pte_present(ptent)) + if (pte_none(ptent)) continue; + if (!pte_present(ptent)) { + if (is_migration_entry(pte_to_swp_entry(ptent))) + qp->nr_failed++; + continue; + } folio = vm_normal_folio(vma, addr, ptent); if (!folio || folio_is_zone_device(folio)) continue; @@ -537,97 +532,82 @@ static int queue_folios_pte_range(pmd_t *pmd, unsigned long addr, continue; if (!queue_folio_required(folio, qp)) continue; - if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) { - /* MPOL_MF_STRICT must be specified if we get here */ - if (!vma_migratable(vma)) { - has_unmovable = true; + if (!(flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) || + !vma_migratable(vma)) { + qp->nr_failed++; + if (strictly_unmovable(flags)) break; - } - + } + if 
(migrate_folio_add(folio, qp->pagelist, flags)) { /* - * Do not abort immediately since there may be - * temporary off LRU pages in the range. Still - * need migrate other LRU pages. + * A large folio can only be isolated from LRU once, + * but may be mapped by many PTEs (and Copy-On-Write may + * intersperse PTEs of other folios). This is a common + * case, so don't mistake it for failure (but of course + * there can be other cases of multi-mapped pages which + * this quick check does not help to filter out - and a + * search of the pagelist might grow to be prohibitive). */ - if (migrate_folio_add(folio, qp->pagelist, flags)) - has_unmovable = true; - } else - break; + if (folio_test_large(folio)) + qp->large = folio; + } else if (folio != qp->large) { + qp->nr_failed++; + if (strictly_unmovable(flags)) + break; + } } pte_unmap_unlock(mapped_pte, ptl); cond_resched(); - - if (has_unmovable) - return 1; - - return addr != end ? -EIO : 0; +out: + if (qp->nr_failed && strictly_unmovable(flags)) + return -EIO; + return 0; } static int queue_folios_hugetlb(pte_t *pte, unsigned long hmask, unsigned long addr, unsigned long end, struct mm_walk *walk) { - int ret = 0; #ifdef CONFIG_HUGETLB_PAGE struct queue_pages *qp = walk->private; - unsigned long flags = (qp->flags & MPOL_MF_VALID); + unsigned long flags = qp->flags; struct folio *folio; spinlock_t *ptl; pte_t entry; ptl = huge_pte_lock(hstate_vma(walk->vma), walk->mm, pte); entry = huge_ptep_get(pte); - if (!pte_present(entry)) + if (!pte_present(entry)) { + if (unlikely(is_hugetlb_entry_migration(entry))) + qp->nr_failed++; goto unlock; + } folio = pfn_folio(pte_pfn(entry)); if (!queue_folio_required(folio, qp)) goto unlock; - - if (flags == MPOL_MF_STRICT) { - /* - * STRICT alone means only detecting misplaced folio and no - * need to further check other vma. 
- */ - ret = -EIO; + if (!(flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) || + !vma_migratable(walk->vma)) { + qp->nr_failed++; goto unlock; } - - if (!vma_migratable(walk->vma)) { - /* - * Must be STRICT with MOVE*, otherwise .test_walk() have - * stopped walking current vma. - * Detecting misplaced folio but allow migrating folios which - * have been queued. - */ - ret = 1; - goto unlock; - } - /* - * With MPOL_MF_MOVE, we try to migrate only unshared folios. If it - * is shared it is likely not worth migrating. + * Unless MPOL_MF_MOVE_ALL, we try to avoid migrating a shared folio. + * Choosing not to migrate a shared folio is not counted as a failure. * * To check if the folio is shared, ideally we want to make sure * every page is mapped to the same process. Doing that is very - * expensive, so check the estimated mapcount of the folio instead. + * expensive, so check the estimated sharers of the folio instead. */ - if (flags & (MPOL_MF_MOVE_ALL) || - (flags & MPOL_MF_MOVE && folio_estimated_sharers(folio) == 1 && - !hugetlb_pmd_shared(pte))) { - if (!isolate_hugetlb(folio, qp->pagelist) && - (flags & MPOL_MF_STRICT)) - /* - * Failed to isolate folio but allow migrating pages - * which have been queued. - */ - ret = 1; - } + if ((flags & MPOL_MF_MOVE_ALL) || + (folio_estimated_sharers(folio) == 1 && !hugetlb_pmd_shared(pte))) + if (!isolate_hugetlb(folio, qp->pagelist)) + qp->nr_failed++; unlock: spin_unlock(ptl); -#else - BUG(); + if (qp->nr_failed && strictly_unmovable(flags)) + return -EIO; #endif - return ret; + return 0; } #ifdef CONFIG_NUMA_BALANCING @@ -708,8 +688,11 @@ static int queue_pages_test_walk(unsigned long start, unsigned long end, return 1; } - /* queue pages from current vma */ - if (flags & MPOL_MF_VALID) + /* + * Check page nodes, and queue pages to move, in the current vma. + * But if no moving, and no strict checking, the scan can be skipped. 
+ */ + if (flags & (MPOL_MF_STRICT | MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) return 0; return 1; } @@ -731,22 +714,22 @@ static const struct mm_walk_ops queue_pages_lock_vma_walk_ops = { /* * Walk through page tables and collect pages to be migrated. * - * If pages found in a given range are on a set of nodes (determined by - * @nodes and @flags,) it's isolated and queued to the pagelist which is - * passed via @private. + * If pages found in a given range are not on the required set of @nodes, + * and migration is allowed, they are isolated and queued to the pagelist + * which is passed via @private. * - * queue_pages_range() has three possible return values: - * 1 - there is unmovable page, but MPOL_MF_MOVE* & MPOL_MF_STRICT were - * specified. - * 0 - queue pages successfully or no misplaced page. - * errno - i.e. misplaced pages with MPOL_MF_STRICT specified (-EIO) or - * memory range specified by nodemask and maxnode points outside - * your accessible address space (-EFAULT) + * queue_pages_range() may return: + * 0 - all pages already on the right node, or successfully queued for moving + * (or neither strict checking nor moving requested: only range checking). + * >0 - this number of misplaced folios could not be queued for moving + * (a hugetlbfs page or a transparent huge page being counted as 1). + * -EIO - a misplaced page found, when MPOL_MF_STRICT specified without MOVEs. + * -EFAULT - a hole in the memory range, when MPOL_MF_DISCONTIG_OK unspecified. */ -static int +static long queue_pages_range(struct mm_struct *mm, unsigned long start, unsigned long end, nodemask_t *nodes, unsigned long flags, - struct list_head *pagelist, bool lock_vma) + struct list_head *pagelist) { int err; struct queue_pages qp = { @@ -757,7 +740,7 @@ queue_pages_range(struct mm_struct *mm, unsigned long start, unsigned long end, .end = end, .first = NULL, }; - const struct mm_walk_ops *ops = lock_vma ? + const struct mm_walk_ops *ops = (flags & MPOL_MF_WRLOCK) ? 
&queue_pages_lock_vma_walk_ops : &queue_pages_walk_ops; err = walk_page_range(mm, start, end, ops, &qp); @@ -766,7 +749,7 @@ queue_pages_range(struct mm_struct *mm, unsigned long start, unsigned long end, /* whole range in hole */ err = -EFAULT; - return err; + return err ? : qp.nr_failed; } /* @@ -1029,16 +1012,16 @@ static long do_get_mempolicy(int *policy, nodemask_t *nmask, } #ifdef CONFIG_MIGRATION -static int migrate_folio_add(struct folio *folio, struct list_head *foliolist, +static bool migrate_folio_add(struct folio *folio, struct list_head *foliolist, unsigned long flags) { /* - * We try to migrate only unshared folios. If it is shared it - * is likely not worth migrating. + * Unless MPOL_MF_MOVE_ALL, we try to avoid migrating a shared folio. + * Choosing not to migrate a shared folio is not counted as a failure. * * To check if the folio is shared, ideally we want to make sure * every page is mapped to the same process. Doing that is very - * expensive, so check the estimated mapcount of the folio instead. + * expensive, so check the estimated sharers of the folio instead. */ if ((flags & MPOL_MF_MOVE_ALL) || folio_estimated_sharers(folio) == 1) { if (folio_isolate_lru(folio)) { @@ -1046,32 +1029,31 @@ static int migrate_folio_add(struct folio *folio, struct list_head *foliolist, node_stat_mod_folio(folio, NR_ISOLATED_ANON + folio_is_file_lru(folio), folio_nr_pages(folio)); - } else if (flags & MPOL_MF_STRICT) { + } else { /* * Non-movable folio may reach here. And, there may be * temporary off LRU folios or non-LRU movable folios. * Treat them as unmovable folios since they can't be - * isolated, so they can't be moved at the moment. It - * should return -EIO for this case too. + * isolated, so they can't be moved at the moment. */ - return -EIO; + return false; } } - - return 0; + return true; } /* * Migrate pages from one node to a target node. * Returns error or the number of pages not migrated. 
*/ -static int migrate_to_node(struct mm_struct *mm, int source, int dest, - int flags) +static long migrate_to_node(struct mm_struct *mm, int source, int dest, + int flags) { nodemask_t nmask; struct vm_area_struct *vma; LIST_HEAD(pagelist); - int err = 0; + long nr_failed; + long err = 0; struct migration_target_control mtc = { .nid = dest, .gfp_mask = GFP_HIGHUSER_MOVABLE | __GFP_THISNODE, @@ -1080,23 +1062,27 @@ static int migrate_to_node(struct mm_struct *mm, int source, int dest, nodes_clear(nmask); node_set(source, nmask); - /* - * This does not "check" the range but isolates all pages that - * need migration. Between passing in the full user address - * space range and MPOL_MF_DISCONTIG_OK, this call can not fail. - */ - vma = find_vma(mm, 0); VM_BUG_ON(!(flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL))); - queue_pages_range(mm, vma->vm_start, mm->task_size, &nmask, - flags | MPOL_MF_DISCONTIG_OK, &pagelist, false); + vma = find_vma(mm, 0); + + /* + * This does not migrate the range, but isolates all pages that + * need migration. Between passing in the full user address + * space range and MPOL_MF_DISCONTIG_OK, this call cannot fail, + * but passes back the count of pages which could not be isolated. 
+ */ + nr_failed = queue_pages_range(mm, vma->vm_start, mm->task_size, &nmask, + flags | MPOL_MF_DISCONTIG_OK, &pagelist); if (!list_empty(&pagelist)) { err = migrate_pages(&pagelist, alloc_migration_target, NULL, - (unsigned long)&mtc, MIGRATE_SYNC, MR_SYSCALL, NULL); + (unsigned long)&mtc, MIGRATE_SYNC, MR_SYSCALL, NULL); if (err) putback_movable_pages(&pagelist); } + if (err >= 0) + err += nr_failed; return err; } @@ -1109,8 +1095,8 @@ static int migrate_to_node(struct mm_struct *mm, int source, int dest, int do_migrate_pages(struct mm_struct *mm, const nodemask_t *from, const nodemask_t *to, int flags) { - int busy = 0; - int err = 0; + long nr_failed = 0; + long err = 0; nodemask_t tmp; lru_cache_disable(); @@ -1192,7 +1178,7 @@ int do_migrate_pages(struct mm_struct *mm, const nodemask_t *from, node_clear(source, tmp); err = migrate_to_node(mm, source, dest, flags); if (err > 0) - busy += err; + nr_failed += err; if (err < 0) break; } @@ -1201,8 +1187,7 @@ int do_migrate_pages(struct mm_struct *mm, const nodemask_t *from, lru_cache_enable(); if (err < 0) return err; - return busy; - + return (nr_failed < INT_MAX) ? nr_failed : INT_MAX; } /* @@ -1241,10 +1226,10 @@ static struct folio *new_folio(struct folio *src, unsigned long start) } #else -static int migrate_folio_add(struct folio *folio, struct list_head *foliolist, +static bool migrate_folio_add(struct folio *folio, struct list_head *foliolist, unsigned long flags) { - return -EIO; + return false; } int do_migrate_pages(struct mm_struct *mm, const nodemask_t *from, @@ -1268,8 +1253,8 @@ static long do_mbind(unsigned long start, unsigned long len, struct vma_iterator vmi; struct mempolicy *new; unsigned long end; - int err; - int ret; + long err; + long nr_failed; LIST_HEAD(pagelist); if (flags & ~(unsigned long)MPOL_MF_VALID) @@ -1309,10 +1294,8 @@ static long do_mbind(unsigned long start, unsigned long len, start, start + len, mode, mode_flags, nmask ? 
nodes_addr(*nmask)[0] : NUMA_NO_NODE); - if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) { - + if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) lru_cache_disable(); - } { NODEMASK_SCRATCH(scratch); if (scratch) { @@ -1328,44 +1311,37 @@ static long do_mbind(unsigned long start, unsigned long len, goto mpol_out; /* - * Lock the VMAs before scanning for pages to migrate, to ensure we don't - * miss a concurrently inserted page. + * Lock the VMAs before scanning for pages to migrate, + * to ensure we don't miss a concurrently inserted page. */ - ret = queue_pages_range(mm, start, end, nmask, - flags | MPOL_MF_INVERT, &pagelist, true); + nr_failed = queue_pages_range(mm, start, end, nmask, + flags | MPOL_MF_INVERT | MPOL_MF_WRLOCK, &pagelist); - if (ret < 0) { - err = ret; - goto up_out; - } - - vma_iter_init(&vmi, mm, start); - prev = vma_prev(&vmi); - for_each_vma_range(vmi, vma, end) { - err = mbind_range(&vmi, vma, &prev, start, end, new); - if (err) - break; + if (nr_failed < 0) { + err = nr_failed; + } else { + vma_iter_init(&vmi, mm, start); + prev = vma_prev(&vmi); + for_each_vma_range(vmi, vma, end) { + err = mbind_range(&vmi, vma, &prev, start, end, new); + if (err) + break; + } } if (!err) { - int nr_failed = 0; - if (!list_empty(&pagelist)) { WARN_ON_ONCE(flags & MPOL_MF_LAZY); - nr_failed = migrate_pages(&pagelist, new_folio, NULL, + nr_failed |= migrate_pages(&pagelist, new_folio, NULL, start, MIGRATE_SYNC, MR_MEMPOLICY_MBIND, NULL); - if (nr_failed) - putback_movable_pages(&pagelist); } - - if ((ret > 0) || (nr_failed && (flags & MPOL_MF_STRICT))) + if (nr_failed && (flags & MPOL_MF_STRICT)) err = -EIO; - } else { -up_out: - if (!list_empty(&pagelist)) - putback_movable_pages(&pagelist); } + if (!list_empty(&pagelist)) + putback_movable_pages(&pagelist); + mmap_write_unlock(mm); mpol_out: mpol_put(new); From patchwork Mon Sep 25 08:25:09 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: 
Hugh Dickins
X-Patchwork-Id: 13397480
Date: Mon, 25 Sep 2023 01:25:09 -0700 (PDT)
From: Hugh Dickins
To: Andrew Morton
cc: Andi Kleen, Christoph Lameter, Matthew Wilcox, Mike Kravetz, David Hildenbrand, Suren Baghdasaryan, Yang Shi, Sidhartha Kumar, Vishal Moola, Kefeng Wang, Greg Kroah-Hartman, Tejun Heo, Mel Gorman, Michal Hocko, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH 04/12] mempolicy trivia: delete those ancient pr_debug()s
In-Reply-To: <2d872cef-7787-a7ca-10e-9d45a64c80b4@google.com>
References: <2d872cef-7787-a7ca-10e-9d45a64c80b4@google.com>

Delete those ancient pr_debug()s - PDprintk()s in Andi Kleen's original submission of core NUMA API, and useful when debugging shared mempolicy lifetime back then, but not used recently.

Signed-off-by: Hugh Dickins
Reviewed-by: Matthew Wilcox (Oracle)
---
 mm/mempolicy.c | 21 ---------------------
 1 file changed, 21 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 937386409c28..b2573921b78f 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -264,9 +264,6 @@ static struct mempolicy *mpol_new(unsigned short mode, unsigned short flags,
 {
 	struct mempolicy *policy;
 
-	pr_debug("setting mode %d flags %d nodes[0] %lx\n",
-		 mode, flags, nodes ? nodes_addr(*nodes)[0] : NUMA_NO_NODE);
-
 	if (mode == MPOL_DEFAULT) {
 		if (nodes && !nodes_empty(*nodes))
 			return ERR_PTR(-EINVAL);
@@ -765,11 +762,6 @@ static int vma_replace_policy(struct vm_area_struct *vma,
 
 	vma_assert_write_locked(vma);
 
-	pr_debug("vma %lx-%lx/%lx vm_ops %p vm_file %p set_policy %p\n",
-		 vma->vm_start, vma->vm_end, vma->vm_pgoff,
-		 vma->vm_ops, vma->vm_file,
-		 vma->vm_ops ? vma->vm_ops->set_policy : NULL);
-
 	new = mpol_dup(pol);
 	if (IS_ERR(new))
 		return PTR_ERR(new);
@@ -1290,10 +1282,6 @@ static long do_mbind(unsigned long start, unsigned long len,
 	if (!new)
 		flags |= MPOL_MF_DISCONTIG_OK;
 
-	pr_debug("mbind %lx-%lx mode:%d flags:%d nodes:%lx\n",
-		 start, start + len, mode, mode_flags,
-		 nmask ? nodes_addr(*nmask)[0] : NUMA_NO_NODE);
-
 	if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL))
 		lru_cache_disable();
 	{
@@ -2511,8 +2499,6 @@ static void sp_insert(struct shared_policy *sp, struct sp_node *new)
 	}
 	rb_link_node(&new->nd, parent, p);
 	rb_insert_color(&new->nd, &sp->root);
-	pr_debug("inserting %lx-%lx: %d\n", new->start, new->end,
-		 new->policy ? new->policy->mode : 0);
 }
 
 /* Find shared policy intersecting idx */
@@ -2649,7 +2635,6 @@ void mpol_put_task_policy(struct task_struct *task)
 
 static void sp_delete(struct shared_policy *sp, struct sp_node *n)
 {
-	pr_debug("deleting %lx-l%lx\n", n->start, n->end);
 	rb_erase(&n->nd, &sp->root);
 	sp_free(n);
 }
@@ -2806,12 +2791,6 @@ int mpol_set_shared_policy(struct shared_policy *info,
 	struct sp_node *new = NULL;
 	unsigned long sz = vma_pages(vma);
 
-	pr_debug("set_shared_policy %lx sz %lu %d %d %lx\n",
-		 vma->vm_pgoff,
-		 sz, npol ? npol->mode : -1,
-		 npol ? npol->flags : -1,
-		 npol ? nodes_addr(npol->nodes)[0] : NUMA_NO_NODE);
-
 	if (npol) {
 		new = sp_alloc(vma->vm_pgoff, vma->vm_pgoff + sz, npol);
 		if (!new)

From patchwork Mon Sep 25 08:26:55 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Hugh Dickins
X-Patchwork-Id: 13397481
Received: from
mail-yw1-f173.google.com (mail-yw1-f173.google.com [209.85.128.173]) by imf22.hostedemail.com (Postfix) with ESMTP id A5648C0026; Mon, 25 Sep 2023 08:26:59 +0000 (UTC)
Date: Mon, 25 Sep 2023 01:26:55 -0700 (PDT)
From: Hugh Dickins
To: Andrew Morton
cc: Andi Kleen, Christoph Lameter, Matthew Wilcox, Mike Kravetz, David Hildenbrand, Suren Baghdasaryan, Yang Shi, Sidhartha Kumar, Vishal Moola, Kefeng Wang, Greg Kroah-Hartman, Tejun Heo, Mel Gorman, Michal Hocko, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH 05/12] mempolicy trivia: slightly more consistent naming
In-Reply-To: <2d872cef-7787-a7ca-10e-9d45a64c80b4@google.com>
Message-ID: <1a75d3dd-7fa-7a41-c76b-1232198a9a4a@google.com>
References: <2d872cef-7787-a7ca-10e-9d45a64c80b4@google.com>

Before getting down to work, do a little cleanup, mainly of inconsistent variable naming. I gave up trying to rationalize mpol versus pol versus policy, and node versus nid, but let's avoid p and nd.
Remove a few superfluous blank lines, but add one; and here prefer
vma->vm_policy to vma_policy(vma) - the latter being appropriate in other
sources, which have to allow for !CONFIG_NUMA. That intriguing line about
KERNEL_DS? should have gone in v2.6.15, when numa_policy_init() stopped
using set_mempolicy(2)'s system call handler.

Signed-off-by: Hugh Dickins
Reviewed-by: Matthew Wilcox (Oracle)
---
 include/linux/mempolicy.h | 11 +++---
 mm/mempolicy.c            | 73 ++++++++++++++++++---------------------
 2 files changed, 38 insertions(+), 46 deletions(-)

diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h
index d232de7cdc56..8013d716dc46 100644
--- a/include/linux/mempolicy.h
+++ b/include/linux/mempolicy.h
@@ -126,10 +126,9 @@ struct shared_policy {
 
 int vma_dup_policy(struct vm_area_struct *src, struct vm_area_struct *dst);
 void mpol_shared_policy_init(struct shared_policy *sp, struct mempolicy *mpol);
-int mpol_set_shared_policy(struct shared_policy *info,
-			   struct vm_area_struct *vma,
-			   struct mempolicy *new);
-void mpol_free_shared_policy(struct shared_policy *p);
+int mpol_set_shared_policy(struct shared_policy *sp,
+			   struct vm_area_struct *vma, struct mempolicy *mpol);
+void mpol_free_shared_policy(struct shared_policy *sp);
 struct mempolicy *mpol_shared_policy_lookup(struct shared_policy *sp,
 					    unsigned long idx);
 
@@ -193,7 +192,7 @@ static inline bool mpol_equal(struct mempolicy *a, struct mempolicy *b)
 	return true;
 }
 
-static inline void mpol_put(struct mempolicy *p)
+static inline void mpol_put(struct mempolicy *pol)
 {
 }
 
@@ -212,7 +211,7 @@ static inline void mpol_shared_policy_init(struct shared_policy *sp,
 {
 }
 
-static inline void mpol_free_shared_policy(struct shared_policy *p)
+static inline void mpol_free_shared_policy(struct shared_policy *sp)
 {
 }
 
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index b2573921b78f..121bb490481b 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -25,7 +25,7 @@
  *                to the last. It would be better if bind would truly restrict
  *                the allocation to memory nodes instead
  *
- * preferred       Try a specific node first before normal fallback.
+ * preferred      Try a specific node first before normal fallback.
  *                As a special case NUMA_NO_NODE here means do the allocation
  *                on the local CPU. This is normally identical to default,
  *                but useful to set in a VMA when you have a non default
@@ -52,7 +52,7 @@
  * on systems with highmem kernel lowmem allocation don't get policied.
  * Same with GFP_DMA allocations.
  *
- * For shmfs/tmpfs/hugetlbfs shared memory the policy is shared between
+ * For shmem/tmpfs shared memory the policy is shared between
  * all users and remembered even when nobody has memory mapped.
  */
 
@@ -291,6 +291,7 @@ static struct mempolicy *mpol_new(unsigned short mode, unsigned short flags,
 			return ERR_PTR(-EINVAL);
 	} else if (nodes_empty(*nodes))
 		return ERR_PTR(-EINVAL);
+
 	policy = kmem_cache_alloc(policy_cache, GFP_KERNEL);
 	if (!policy)
 		return ERR_PTR(-ENOMEM);
@@ -303,11 +304,11 @@ static struct mempolicy *mpol_new(unsigned short mode, unsigned short flags,
 }
 
 /* Slow path of a mpol destructor. */
-void __mpol_put(struct mempolicy *p)
+void __mpol_put(struct mempolicy *pol)
 {
-	if (!atomic_dec_and_test(&p->refcnt))
+	if (!atomic_dec_and_test(&pol->refcnt))
 		return;
-	kmem_cache_free(policy_cache, p);
+	kmem_cache_free(policy_cache, pol);
 }
 
 static void mpol_rebind_default(struct mempolicy *pol, const nodemask_t *nodes)
@@ -364,7 +365,6 @@ static void mpol_rebind_policy(struct mempolicy *pol, const nodemask_t *newmask)
  *
  * Called with task's alloc_lock held.
  */
-
 void mpol_rebind_task(struct task_struct *tsk, const nodemask_t *new)
 {
 	mpol_rebind_policy(tsk->mempolicy, new);
@@ -375,7 +375,6 @@ void mpol_rebind_task(struct task_struct *tsk, const nodemask_t *new)
  *
  * Call holding a reference to mm. Takes mm->mmap_lock during call.
  */
-
 void mpol_rebind_mm(struct mm_struct *mm, nodemask_t *new)
 {
 	struct vm_area_struct *vma;
@@ -754,7 +753,7 @@ queue_pages_range(struct mm_struct *mm, unsigned long start, unsigned long end,
  * This must be called with the mmap_lock held for writing.
  */
 static int vma_replace_policy(struct vm_area_struct *vma,
-						struct mempolicy *pol)
+				struct mempolicy *pol)
 {
 	int err;
 	struct mempolicy *old;
@@ -800,7 +799,7 @@ static int mbind_range(struct vma_iterator *vmi, struct vm_area_struct *vma,
 		vmstart = vma->vm_start;
 	}
 
-	if (mpol_equal(vma_policy(vma), new_pol)) {
+	if (mpol_equal(vma->vm_policy, new_pol)) {
 		*prev = vma;
 		return 0;
 	}
@@ -872,18 +871,18 @@ static long do_set_mempolicy(unsigned short mode, unsigned short flags,
  *
  * Called with task's alloc_lock held
  */
-static void get_policy_nodemask(struct mempolicy *p, nodemask_t *nodes)
+static void get_policy_nodemask(struct mempolicy *pol, nodemask_t *nodes)
 {
 	nodes_clear(*nodes);
-	if (p == &default_policy)
+	if (pol == &default_policy)
 		return;
 
-	switch (p->mode) {
+	switch (pol->mode) {
 	case MPOL_BIND:
 	case MPOL_INTERLEAVE:
 	case MPOL_PREFERRED:
 	case MPOL_PREFERRED_MANY:
-		*nodes = p->nodes;
+		*nodes = pol->nodes;
 		break;
 	case MPOL_LOCAL:
 		/* return empty node mask for local allocation */
@@ -1649,7 +1648,6 @@ static int kernel_migrate_pages(pid_t pid, unsigned long maxnode,
 out_put:
 	put_task_struct(task);
 	goto out;
-
 }
 
 SYSCALL_DEFINE4(migrate_pages, pid_t, pid, unsigned long, maxnode,
@@ -1659,7 +1657,6 @@ SYSCALL_DEFINE4(migrate_pages, pid_t, pid, unsigned long, maxnode,
 	return kernel_migrate_pages(pid, maxnode, old_nodes, new_nodes);
 }
 
-
 /* Retrieve NUMA policy */
 static int kernel_get_mempolicy(int __user *policy,
 				unsigned long __user *nmask,
@@ -1842,10 +1839,10 @@ nodemask_t *policy_nodemask(gfp_t gfp, struct mempolicy *policy)
  * policy_node() is always coupled with policy_nodemask(), which
  * secures the nodemask limit for 'bind' and 'prefer-many' policy.
  */
-static int policy_node(gfp_t gfp, struct mempolicy *policy, int nd)
+static int policy_node(gfp_t gfp, struct mempolicy *policy, int nid)
 {
 	if (policy->mode == MPOL_PREFERRED) {
-		nd = first_node(policy->nodes);
+		nid = first_node(policy->nodes);
 	} else {
 		/*
 		 * __GFP_THISNODE shouldn't even be used with the bind policy
@@ -1860,19 +1857,18 @@ static int policy_node(gfp_t gfp, struct mempolicy *policy, int nd)
 	    policy->home_node != NUMA_NO_NODE)
 		return policy->home_node;
 
-	return nd;
+	return nid;
 }
 
 /* Do dynamic interleaving for a process */
-static unsigned interleave_nodes(struct mempolicy *policy)
+static unsigned int interleave_nodes(struct mempolicy *policy)
 {
-	unsigned next;
-	struct task_struct *me = current;
+	unsigned int nid;
 
-	next = next_node_in(me->il_prev, policy->nodes);
-	if (next < MAX_NUMNODES)
-		me->il_prev = next;
-	return next;
+	nid = next_node_in(current->il_prev, policy->nodes);
+	if (nid < MAX_NUMNODES)
+		current->il_prev = nid;
+	return nid;
 }
 
 /*
@@ -2362,7 +2358,7 @@ unsigned long alloc_pages_bulk_array_mempolicy(gfp_t gfp,
 
 int vma_dup_policy(struct vm_area_struct *src, struct vm_area_struct *dst)
 {
-	struct mempolicy *pol = mpol_dup(vma_policy(src));
+	struct mempolicy *pol = mpol_dup(src->vm_policy);
 
 	if (IS_ERR(pol))
 		return PTR_ERR(pol);
@@ -2784,40 +2780,40 @@ void mpol_shared_policy_init(struct shared_policy *sp, struct mempolicy *mpol)
 	}
 }
 
-int mpol_set_shared_policy(struct shared_policy *info,
-			struct vm_area_struct *vma, struct mempolicy *npol)
+int mpol_set_shared_policy(struct shared_policy *sp,
+			struct vm_area_struct *vma, struct mempolicy *pol)
 {
 	int err;
 	struct sp_node *new = NULL;
 	unsigned long sz = vma_pages(vma);
 
-	if (npol) {
-		new = sp_alloc(vma->vm_pgoff, vma->vm_pgoff + sz, npol);
+	if (pol) {
+		new = sp_alloc(vma->vm_pgoff, vma->vm_pgoff + sz, pol);
 		if (!new)
 			return -ENOMEM;
 	}
-	err = shared_policy_replace(info, vma->vm_pgoff, vma->vm_pgoff+sz, new);
+	err = shared_policy_replace(sp, vma->vm_pgoff, vma->vm_pgoff + sz, new);
 	if (err && new)
 		sp_free(new);
 	return err;
 }
 
 /* Free a backing policy store on inode delete. */
-void mpol_free_shared_policy(struct shared_policy *p)
+void mpol_free_shared_policy(struct shared_policy *sp)
 {
 	struct sp_node *n;
 	struct rb_node *next;
 
-	if (!p->root.rb_node)
+	if (!sp->root.rb_node)
 		return;
-	write_lock(&p->lock);
-	next = rb_first(&p->root);
+	write_lock(&sp->lock);
+	next = rb_first(&sp->root);
 	while (next) {
 		n = rb_entry(next, struct sp_node, nd);
 		next = rb_next(&n->nd);
-		sp_delete(p, n);
+		sp_delete(sp, n);
 	}
-	write_unlock(&p->lock);
+	write_unlock(&sp->lock);
 }
 
 #ifdef CONFIG_NUMA_BALANCING
@@ -2867,7 +2863,6 @@ static inline void __init check_numabalancing_enable(void)
 }
 #endif /* CONFIG_NUMA_BALANCING */
 
-/* assumes fs == KERNEL_DS */
 void __init numa_policy_init(void)
 {
 	nodemask_t interleave_nodes;
@@ -2930,7 +2925,6 @@ void numa_default_policy(void)
 /*
  * Parse and format mempolicy from/to strings
  */
-
 static const char * const policy_modes[] =
 {
 	[MPOL_DEFAULT] = "default",
@@ -2941,7 +2935,6 @@ static const char * const policy_modes[] =
 	[MPOL_PREFERRED_MANY] = "prefer (many)",
 };
 
-
 #ifdef CONFIG_TMPFS
 /**
  * mpol_parse_str - parse string to mempolicy, for tmpfs mpol mount option.
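
Reader's note on the interleave_nodes() hunk above: the rename from
next/me to nid/current does not change behaviour. The function still
picks the next allowed node after the one used last time, wrapping
around, and remembers it. The following is a minimal userspace sketch of
that round-robin step, not kernel code. It assumes a nodemask fits in one
64-bit word and models current->il_prev as a plain struct field;
task_sketch, next_node_in_sketch(), interleave_nodes_sketch() and
MAX_NODES_SKETCH are hypothetical names introduced only for illustration.

```c
/*
 * Userspace sketch only. The nodemask is modelled as a 64-bit word and
 * the per-task il_prev as a struct field; every *_sketch / *_SKETCH
 * name is hypothetical and does not exist in the kernel.
 */
#include <assert.h>
#include <stdint.h>

#define MAX_NODES_SKETCH 64u

struct task_sketch {
	unsigned int il_prev;	/* node chosen by the previous call */
};

/*
 * Return the next set bit strictly after 'prev', wrapping around to the
 * lowest set bit; return MAX_NODES_SKETCH if the mask is empty.
 */
static unsigned int next_node_in_sketch(unsigned int prev, uint64_t nodes)
{
	unsigned int i, nid;

	for (i = 1; i <= MAX_NODES_SKETCH; i++) {
		nid = (prev + i) % MAX_NODES_SKETCH;
		if (nodes & (UINT64_C(1) << nid))
			return nid;
	}
	return MAX_NODES_SKETCH;
}

/* Same shape as the patched function: pick, remember if valid, return. */
static unsigned int interleave_nodes_sketch(struct task_sketch *tsk,
					    uint64_t nodes)
{
	unsigned int nid;

	assert(tsk);
	nid = next_node_in_sketch(tsk->il_prev, nodes);
	if (nid < MAX_NODES_SKETCH)
		tsk->il_prev = nid;
	return nid;
}
```

With a mask of nodes 1 and 3 and il_prev starting at 0, successive calls
alternate between the two allowed nodes; an empty mask leaves il_prev
untouched, which is why the patched function keeps the
"nid < MAX_NUMNODES" check before storing.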
From patchwork Mon Sep 25 08:28:14 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Hugh Dickins X-Patchwork-Id: 13397482 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id E31EBCE7A81 for ; Mon, 25 Sep 2023 08:28:20 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 7ED698D0013; Mon, 25 Sep 2023 04:28:20 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 79CDD8D0001; Mon, 25 Sep 2023 04:28:20 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 665648D0013; Mon, 25 Sep 2023 04:28:20 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 54E968D0001 for ; Mon, 25 Sep 2023 04:28:20 -0400 (EDT) Received: from smtpin06.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id 24034160B2A for ; Mon, 25 Sep 2023 08:28:20 +0000 (UTC) X-FDA: 81274442760.06.A08D000 Received: from mail-yw1-f175.google.com (mail-yw1-f175.google.com [209.85.128.175]) by imf13.hostedemail.com (Postfix) with ESMTP id 5FCCE2000F for ; Mon, 25 Sep 2023 08:28:18 +0000 (UTC) Authentication-Results: imf13.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=A7hHeAJi; spf=pass (imf13.hostedemail.com: domain of hughd@google.com designates 209.85.128.175 as permitted sender) smtp.mailfrom=hughd@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1695630498; a=rsa-sha256; cv=none; b=fAk3iRNUdD4JK3q3FTfYWgrdOou12xVz105OiLcEQiXshNBTqloagqbtQf994n/r38ri9N 5F2O89SGeZBtG17V+etipgw5QZC5VyoKherlTp7UpemHgDZ8RzbM4RoGJCFt/mQkOXqqe3 
VzpvbQMYmNZMaCt9E6pwyGPpNXulQwY= ARC-Authentication-Results: i=1; imf13.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=A7hHeAJi; spf=pass (imf13.hostedemail.com: domain of hughd@google.com designates 209.85.128.175 as permitted sender) smtp.mailfrom=hughd@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1695630498; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=ny+Jruw5kljLWGVDgFi/c0BN4bXVfRfudV3E3SL5ba0=; b=0iQzVJD+ToavqgnAw80EsJY48/gaQl2d9N4uX8IGwuX2HrSJM0yrFsN27tbipBtjATWDa9 eVnt5j2BAzDl4vciHPZgE/KkPfLpPaOUxnY0XjcYulhCJGPpOfZmn1Qpngw0G5Qvwtuo67 bnU8Y4SnjO8OaH3EGGuXWMgmDpJeCNI= Received: by mail-yw1-f175.google.com with SMTP id 00721157ae682-59f6763767dso23979647b3.2 for ; Mon, 25 Sep 2023 01:28:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1695630497; x=1696235297; darn=kvack.org; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=ny+Jruw5kljLWGVDgFi/c0BN4bXVfRfudV3E3SL5ba0=; b=A7hHeAJi2/lqEaWQfNQqLV+BMq5o7jotLND5giA7hVrVMkvLjbm8/v766EFVE6RJCt qDxLJz2L2sgUENjJ7TD6/LlaEqqZaFEJy/7A85AIuzBTqA6jScxRaBSN7ck0WmhZdWL0 bo1+WNoaFJQu1gxFOSbPoSXlR5eVSk+NkunAM5twxJDExRLhYKT7CoFCIksmms2sI07x oNxQZAxbY3yu1yu0oY6f8cdeKEBZHB4Qtg665xQTHPJvBvKTHnP8CfJSEUtDIDJkfFm3 nNUBGX1r/Eroazsp2O98/iqK4WpLF2SQ2D31niZ0iwn8b8BQbsEkMYYhPJ0W4G/UDK7Z 4qzQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1695630497; x=1696235297; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=ny+Jruw5kljLWGVDgFi/c0BN4bXVfRfudV3E3SL5ba0=; 
b=ks9aOF2qmXtZadNYbaFWQvnk+gzZRJXjGqDrXuXZXVNH605NDbOGRYcAXVOMLjhejP WEqEkmfQrVhYtSOU4x8ltCwJNzvojCiMoS3v3bge69TPk3fPiLo6rLoghlg1k7cKa+tU C1yvMdPVQs8c8xhovoi2TWIZKID/1RuMp/pN1uibrLzLrzUnoVwKWx6Bi8CoMQK0RRKB DzqhmQkjaEZZA+3SZcFj3F9QtIr708ism8P8OkIUsnvlz6Ebpc3nNgduhPaiTPukIHFO vr8JK9FNHPyoxTUc05/jEiQpizwgXkLKNA2stoaFwu2DQtLmagBtIUa0RUXpfD1hKZGl eBAQ== X-Gm-Message-State: AOJu0YxZxj/ErSDTwZ5ycDOz7p7q+oVwG/HOZr4zU2Az62hGkn0bWteH K44K2oye2goUHJQt6d0u+CvXqQ== X-Google-Smtp-Source: AGHT+IH7TByv+2j1g6DH4DUwI+5DujRbEJ2kgtwKlcyesipljkA02KuVspk813qm/mayffaiCSX9vQ== X-Received: by 2002:a0d:f2c7:0:b0:599:b59f:5280 with SMTP id b190-20020a0df2c7000000b00599b59f5280mr5636415ywf.28.1695630497424; Mon, 25 Sep 2023 01:28:17 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id x187-20020a81a0c4000000b0058c55d40765sm2270742ywg.106.2023.09.25.01.28.15 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 25 Sep 2023 01:28:16 -0700 (PDT) Date: Mon, 25 Sep 2023 01:28:14 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Andrew Morton cc: Andi Kleen , Christoph Lameter , Matthew Wilcox , Mike Kravetz , David Hildenbrand , Suren Baghdasaryan , Yang Shi , Sidhartha Kumar , Vishal Moola , Kefeng Wang , Greg Kroah-Hartman , Tejun Heo , Mel Gorman , Michal Hocko , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH 06/12] mempolicy trivia: use pgoff_t in shared mempolicy tree In-Reply-To: <2d872cef-7787-a7ca-10e-9d45a64c80b4@google.com> Message-ID: References: <2d872cef-7787-a7ca-10e-9d45a64c80b4@google.com> MIME-Version: 1.0 X-Rspamd-Server: rspam08 X-Rspamd-Queue-Id: 5FCCE2000F X-Stat-Signature: a5xn6sdbyydh9px4ixsyozktsf1urmxj X-Rspam-User: X-HE-Tag: 1695630498-792172 X-HE-Meta: 
U2FsdGVkX19HfZLteZQQWpn4TJeDtQ0e5qv/4tMxT+vWpXkHaXK4oq0uePY6ZhK1+jv529x0WgaX0me4WPpe+pchc+fR/vaDEZbAvJmbfyIkliKH4oIehQ7Uuy8J081rgZj1A7F16WqeQR1qs+VrBZiOv5LQvztHKdnhjWCDCcdAdZEpJMXegyh04exdn3+1s1UJwldDmUUZR15Rgd+2t1VT4Xwqm/7m4FF4H6uXtoIx9DQGiK5RjdC+fy8S+eIU3NekiXm5JX3XtOrcq6GUMQj7GzhyNV0vC5a7FHs4ykcRTATwd83RdLY+bbrNMZN6vSVqkO4HQjisZlOgnfQwdK+JbwMXWUlPKOpCJBpWU6MZEU0JvBMiFrwNmoYnnySGpm+FIpGZ0Kljhedhi3R3GrTl/wMSVg1KCxZvqy/PH1o//2LVIojSFiLhNMjwm7w6XWFoPU/dz62435cbNx67Er+7lNiophScHgNO2Gpg14cW3j0TWHkCzdepYQHoVaseSOI8HkGoSqoue7EjpeVc1sCNVVcaubOy/HpY1tXzr5pcb6Tmo5mqdpp658P6jGHMl88UzFRCS/4wVkIHrsue0+DkrunxzUmRrl5U5zere9HBfIiGCp3TazAlNunACQtPaHZMSfcAUD8u7wlPEkEma8V/9Et0dlwaG0YTIm2erOwSDUek2GHt42QECL2nY+FMribIYIswZgK9xSjqt/L4GMKHpRsNQMMm7ObW1G4rDzkADawupjlOpUqdThwi8VQWw4sc/JhpPqiJwFIMnRLmxxkHngWZWO7kkK5YloSnclAbK2y9OY7zZsZK6pZqTP8VrL00BQ4QdaDldvBZtEy+yhOvtjdVlqBD6K88QPaBhrLIOBW9erjtfFOb735Y4B1OXyMSzGRjUzCMueg+Cktq8FVaifqZIXcSXc5TNR4ohW5hMU0g/AJF9fbKkVjah0YzNhk9OPmxEtzzn1Aw2qH gvstPjDp keVOKF6nwsvdzHpefMWi4tTODLBg0pIyZdLHk7/dLvJwKn7NwCUGj3xoIYDxbADiSKwJ9YDBo0FCH3wKrR43O/YX40nPs77vPRsZeD0co0EaWL33A5qAsspGM0OseJoziaFA8NmT6jsKr3gDy2J9FNNcvO3CTNYWl/zv7peadWQOKx6FXt5A82+dcvziMQ+k/KF8u8lL5PQNr6DagY3lThDqi0JC2+Nq9dBaD0tgWArEE/0cTYK7V3L3I/abLwItI73c6fbxNhAPlTsy5cWeShW3A/QtkyRqYKQ2PvsneLIgXtb45A6AjjwqXYKIGNaSnc4zQdjzcnEvK9hO/EDOR6B7YX9j3R+dwVRjeJA+JZJK/IoOLNAg6mV0iZyhm6l7X9IxZ7Gj3b+3zgiXldV64hYMF84eJiE9tBE2Cemys0NmT4Y1TQgKM9J61rBLQ41+e4TYbPra6qedWnj04VtwPWZ5gymczI5x+9pesrV+cxv6EqIHkK1x7q5wn4d02nGcDwttu X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Prefer the more explicit "pgoff_t" to "unsigned long" when dealing with a shared mempolicy tree. Delete confusing comment about pseudo mm vmas. 

Signed-off-by: Hugh Dickins
---
 include/linux/mempolicy.h | 12 +++---------
 mm/mempolicy.c            |  8 ++++----
 2 files changed, 7 insertions(+), 13 deletions(-)

diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h
index 8013d716dc46..12f7dc74a457 100644
--- a/include/linux/mempolicy.h
+++ b/include/linux/mempolicy.h
@@ -107,18 +107,12 @@ static inline bool mpol_equal(struct mempolicy *a, struct mempolicy *b)
 
 /*
  * Tree of shared policies for a shared memory region.
- * Maintain the policies in a pseudo mm that contains vmas. The vmas
- * carry the policy. As a special twist the pseudo mm is indexed in pages, not
- * bytes, so that we can work with shared memory segments bigger than
- * unsigned long.
  */
-
 struct sp_node {
 	struct rb_node nd;
-	unsigned long start, end;
+	pgoff_t start, end;
 	struct mempolicy *policy;
 };
-
 struct shared_policy {
 	struct rb_root root;
 	rwlock_t lock;
@@ -130,7 +124,7 @@ int mpol_set_shared_policy(struct shared_policy *sp,
 			   struct vm_area_struct *vma, struct mempolicy *mpol);
 void mpol_free_shared_policy(struct shared_policy *sp);
 struct mempolicy *mpol_shared_policy_lookup(struct shared_policy *sp,
-					    unsigned long idx);
+					    pgoff_t idx);
 
 struct mempolicy *get_task_policy(struct task_struct *p);
 struct mempolicy *__get_vma_policy(struct vm_area_struct *vma,
@@ -216,7 +210,7 @@ static inline void mpol_free_shared_policy(struct shared_policy *sp)
 }
 
 static inline struct mempolicy *
-mpol_shared_policy_lookup(struct shared_policy *sp, unsigned long idx)
+mpol_shared_policy_lookup(struct shared_policy *sp, pgoff_t idx)
 {
 	return NULL;
 }
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 121bb490481b..065e886ec9b6 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2444,7 +2444,7 @@ bool __mpol_equal(struct mempolicy *a, struct mempolicy *b)
  * reading or for writing
  */
 static struct sp_node *
-sp_lookup(struct shared_policy *sp, unsigned long start, unsigned long end)
+sp_lookup(struct shared_policy *sp, pgoff_t start, pgoff_t end)
 {
 	struct rb_node *n = sp->root.rb_node;
 
@@ -2499,7 +2499,7 @@ static void sp_insert(struct shared_policy *sp, struct sp_node *new)
 
 /* Find shared policy intersecting idx */
 struct mempolicy *
-mpol_shared_policy_lookup(struct shared_policy *sp, unsigned long idx)
+mpol_shared_policy_lookup(struct shared_policy *sp, pgoff_t idx)
 {
 	struct mempolicy *pol = NULL;
 	struct sp_node *sn;
@@ -2665,8 +2665,8 @@ static struct sp_node *sp_alloc(unsigned long start, unsigned long end,
 }
 
 /* Replace a policy range. */
-static int shared_policy_replace(struct shared_policy *sp, unsigned long start,
-				 unsigned long end, struct sp_node *new)
+static int shared_policy_replace(struct shared_policy *sp, pgoff_t start,
+				 pgoff_t end, struct sp_node *new)
 {
 	struct sp_node *n;
 	struct sp_node *n_new = NULL;

From patchwork Mon Sep 25 08:29:28 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Hugh Dickins X-Patchwork-Id: 13397484 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3B86CCE7A81 for ; Mon, 25 Sep 2023 08:29:35 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id CA87E8D0014; Mon, 25 Sep 2023 04:29:34 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id C58DD8D0001; Mon, 25 Sep 2023 04:29:34 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id B215C8D0014; Mon, 25 Sep 2023 04:29:34 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id A411D8D0001 for ; Mon, 25 Sep 2023 04:29:34 -0400 (EDT) Received: from smtpin13.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id 7D1C3807DE for ; Mon, 25 Sep
2023 08:29:34 +0000 (UTC) X-FDA: 81274445868.13.8B592D6 Received: from mail-yw1-f176.google.com (mail-yw1-f176.google.com [209.85.128.176]) by imf13.hostedemail.com (Postfix) with ESMTP id B95872000B for ; Mon, 25 Sep 2023 08:29:32 +0000 (UTC) Authentication-Results: imf13.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=oNluEmAL; spf=pass (imf13.hostedemail.com: domain of hughd@google.com designates 209.85.128.176 as permitted sender) smtp.mailfrom=hughd@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1695630572; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=HPVWJplZZXmT6NBbA8O4nkrJWg/FlNHzJ5g3rgEWw70=; b=yUT5GEjXGR52aGLZFhj3CyvKiF5GoiLteCfXCFtbRaMUTgZ0EXdIRV4VA1t+ISQhES8mw0 iCRauQX+q3OX+XMhzZr98RpB+3ovkYLSQJZPHLC7shK/q7pG2iayoWhJjW3oRAwVM2H2Eu UilwZzm6iGCmdNUHfyZI2Gp4PqWHnoA= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1695630572; a=rsa-sha256; cv=none; b=DCFCfbLkR33L9GSbq9hZRUSC7epRVxvm0XBW5ApTbWHYCvQBgdwdO/fDhIfvAiAhUmqNpx ZqzPWLkgT+Svq83jRlWHGWu/i9xRIMtG5o/rADSTq9ozaPPgTLB98+2QcxwMsh1DE+hVzU PigbLp0M161L15b+NtgUIcpVWokiY/I= ARC-Authentication-Results: i=1; imf13.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=oNluEmAL; spf=pass (imf13.hostedemail.com: domain of hughd@google.com designates 209.85.128.176 as permitted sender) smtp.mailfrom=hughd@google.com; dmarc=pass (policy=reject) header.from=google.com Received: by mail-yw1-f176.google.com with SMTP id 00721157ae682-59be6bcf408so68552377b3.2 for ; Mon, 25 Sep 2023 01:29:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1695630572; x=1696235372; darn=kvack.org; 
h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=HPVWJplZZXmT6NBbA8O4nkrJWg/FlNHzJ5g3rgEWw70=; b=oNluEmALoq48QgFa5UIDVRYhWp8JGcee159wyC0aHbiVhaN1QciCMwqY8gGMEX9xex rDG8C6CXV3ZDGKr0Li4R8zViiPAehopRtApKpr0x0HzsGeaLwauLMJzTl3gQt4KLHZ+v GnVy90x5mbpQv2clBTLddnh2pCmqA99BJgDU0n5xX5Pm3XGsBdGus0p7Rb/Fzd2uYf1t Y01Y3iKFCssQb4n4GhQX6OeWyYUrzoBwDgkNwFVRYCtFdGnhhDvHJz5yU5Pq1RyijIVD UwUymS5fveBEeA+TiAByR2klHt4rcPwVibEk/DxaKcZEyezO6CvMpk88p3G89q9qUCNb 3CYw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1695630572; x=1696235372; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=HPVWJplZZXmT6NBbA8O4nkrJWg/FlNHzJ5g3rgEWw70=; b=nYnIckSmKKq3XDsgvQWWJuhzidnCL/mf/c9okXX+NxLnikUDML+CNUZjp27BMMaMVT klcVU3/kCwpaj9i69+7ctUqMNyWekcx/Pez+w5mrV3YK7OyXfu2+o+rDxUfRK1CR5gNj YuLvxV0V/JVM0+dEg2FVEhoqjqjVqWLrC4aaZPz0TnkhRihe/Ng2hA1HgrYKcN6O5fH9 qXxW4U2PADnTtwMZb/w5IOd3l+Pb9Yw1weCDfj5EPZrfjAzIlralSaPAdXTyMxSbe4iu 0YZEeMzYJys08ZrTvabSrlcrlkQLFZyvCFE+sQD7F5hkpv/djb7lGCCXHDOd5m0DACBP sDVA== X-Gm-Message-State: AOJu0YzTPAwlPR6+Ef4IdKKNm4gzL6W+jvUswTy9sAXPUsEFTWrKaMWK Puu0P0nS+EEr5OGJ0lwjCTOADw== X-Google-Smtp-Source: AGHT+IHogreXDslqhpyk+Ms3XFo/uXGtPLAXb4cvDc8C5nh9Ehy9UaqEagZCyCvQ6vw015lUH0Gb7A== X-Received: by 2002:a81:52c9:0:b0:59c:c79:eeee with SMTP id g192-20020a8152c9000000b0059c0c79eeeemr5778750ywb.44.1695630571804; Mon, 25 Sep 2023 01:29:31 -0700 (PDT) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. 
[172.10.233.147]) by smtp.gmail.com with ESMTPSA id c64-20020a0dc143000000b0059a34cfa2a5sm2259385ywd.67.2023.09.25.01.29.29 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 25 Sep 2023 01:29:30 -0700 (PDT) Date: Mon, 25 Sep 2023 01:29:28 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Andrew Morton cc: Andi Kleen , Christoph Lameter , Matthew Wilcox , Mike Kravetz , David Hildenbrand , Suren Baghdasaryan , Yang Shi , Sidhartha Kumar , Vishal Moola , Kefeng Wang , Greg Kroah-Hartman , Tejun Heo , Mel Gorman , Michal Hocko , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH 07/12] mempolicy: mpol_shared_policy_init() without pseudo-vma In-Reply-To: <2d872cef-7787-a7ca-10e-9d45a64c80b4@google.com> Message-ID: References: <2d872cef-7787-a7ca-10e-9d45a64c80b4@google.com> MIME-Version: 1.0 X-Rspamd-Queue-Id: B95872000B X-Rspam-User: X-Stat-Signature: 9e4uzw5shkbntswi89s55ww8iesa96ss X-Rspamd-Server: rspam03 X-HE-Tag: 1695630572-315139 X-HE-Meta: U2FsdGVkX19Sh2O8YJx9+dKhdICyknnvKwTCZnCi2YX3neGymp+WmE3nhtK/B81HiDvJ5HOnbtd2/kT553AqkxwBftjWp2niIFT+8KEvGMLoOIDgP4dwWjL5P2i2gK2FJf1xpqojiLofAiyBMQGWKTLeE+N+j8CQowHIx3UoQ3oULJI38Gh3HO7e0gE1ICrxK0N+KnGMoDeTtOJQe3rAmlsbpM3cNd1+giZKFHPvYgJMAxXEnWsM3U8oN4x/rUBiJOT5Ht4mr/EiPQ5LPZsRi/gAQPJOjbsLvFu5A/IdFVIXsEaByIkD4+aC8dx8XY1iSifzRWBD3Bb1+YGcGAWxca0o8UfwXuk2fhBO5iu6c2ho2rjCWsQe0LnoE1Wu9k3fdjPPcRoD6dz9WV3ekpasXf4/2x6+9H5cL3PkM774rHgLUWfIN4uqT4teRM/YWRXpQvP1pV4BEm4D6fjXShRKsiy9w8dn6yAYEhf8/5t/xDXgdyuibyWIcHkjAHFcjG0xGQ1EjQHCnyHEqC2ZhPrhrLbwAzVjuw7F+dRTUnXp/7fPUQZx24S0BhKsKnOLrwCHeqUqpn1u3vAEX/OZu6HlDPpRDSatphhrec8jPN3Nx0L0DjolMffJfiDsnWixpDPkSLD4h6Yx21sUyJbu6nNTfmqMdjyQG1pSo0HCdNfRV2EpAXJBHXdyK5zpsCxb375NRd3WFCPitdy/Hi5kyT9eEMHVxeOj37HhrjB+biGaUe6V05JtMqVVfiH5LGgnBHIU+QonQ0Dz0FHqItyOGJJVWCQi+XAeQjTTWPOv6jz6SfgHCZ9bFP4g03b58xXDhP/w+EdpdI8B3mIFdcjkEBCnPuitJZc6tX0kwYgW/vbuQVUfYKBzV+IVQlnrh/FbWunrSvstgfFbSbmCfdgLz+stgSxQz7YGkZ4nY9XJcmxNx2mVAxeZSJzzmprt3D7HfKte7ysQBx8np9c9q2gjv5F 4dPwt7bK 
Ro3qHf9X/Dbn/7GDqX8BDwLrAyH5sDJpNCxmYM9t1eCJ8JZkAuH6PoIYnSj0jduyxrdshqLK6x+4O7ZmfADyqIOswIpU//rXJSXSTzFLELcs/Rzjsguxa6lQEcqWweFpKJ5ML5S+OxMulCBep9rzimd8o0lYmu3CQ9dB7PKYuq9Lf7juhA9ZXWI+JzqpeOrojYKCsdsQ+HTA7JV136YSLLHHFW9xjwMJRGqpZQqvq12BxXvTO7tLvRwGHqrdDGCtMF22vzP696+bgj0at4RDQLdlHlzlSXMamWot2BONS1eWRAxEjOUKCKu6zYrFoTi7UNp89z+2YWlhuTEhNvfvxCK9xDacsUDVI1gYJzUTc0F1L7Z9mtGUWbaxzUcXcVlOJN9kDvoRP/eZU2r6ff/SRULot4WOg2233wIxjrxc06wDVtce2X1d4Bum5QJDIh8srSG2ZJGdU2vkCH/TYRLn7VAR1mymXYWSN6T6HeXDjOflkHhPl8TyVETcPrAsISL32SYCE X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID:

mpol_shared_policy_init() does not need to use a pseudo-vma: it can use
sp_alloc() and sp_insert() directly, since the object's shared policy tree
is empty and inaccessible (needing no lock) at get_inode() time.

Signed-off-by: Hugh Dickins
---
 mm/mempolicy.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 065e886ec9b6..a22b641cfd6b 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2749,7 +2749,7 @@ void mpol_shared_policy_init(struct shared_policy *sp, struct mempolicy *mpol)
 	rwlock_init(&sp->lock);
 
 	if (mpol) {
-		struct vm_area_struct pvma;
+		struct sp_node *n;
 		struct mempolicy *new;
 		NODEMASK_SCRATCH(scratch);
 
@@ -2766,11 +2766,10 @@ void mpol_shared_policy_init(struct shared_policy *sp, struct mempolicy *mpol)
 		if (ret)
 			goto put_new;
 
-		/* Create pseudo-vma that contains just the policy */
-		vma_init(&pvma, NULL);
-		pvma.vm_end = TASK_SIZE; /* policy covers entire file */
-		mpol_set_shared_policy(sp, &pvma, new); /* adds ref */
-
+		/* alloc node covering entire file; adds ref to new */
+		n = sp_alloc(0, MAX_LFS_FILESIZE >> PAGE_SHIFT, new);
+		if (n)
+			sp_insert(sp, n);
 put_new:
 		mpol_put(new); /* drop initial ref */
 free_scratch:

From patchwork Mon Sep 25 08:30:51 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Hugh Dickins
X-Patchwork-Id: 13397485
Date: Mon, 25 Sep 2023 01:30:51 -0700 (PDT)
From: Hugh Dickins
To: Andrew Morton
cc: Andi Kleen, Christoph Lameter, Matthew Wilcox, Mike Kravetz,
    David Hildenbrand, Suren Baghdasaryan, Yang Shi, Sidhartha Kumar,
    Vishal Moola, Kefeng Wang, Greg Kroah-Hartman, Tejun Heo, Mel Gorman,
    Michal Hocko, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH 08/12] mempolicy: remove confusing MPOL_MF_LAZY dead code
In-Reply-To: <2d872cef-7787-a7ca-10e-9d45a64c80b4@google.com>
Message-ID: <2cb8b08a-a96c-2a61-94dd-4cd51ad0605d@google.com>
References: <2d872cef-7787-a7ca-10e-9d45a64c80b4@google.com>
MIME-Version: 1.0

v3.8 commit b24f53a0bea3 ("mm: mempolicy: Add MPOL_MF_LAZY") introduced
MPOL_MF_LAZY, and included it in the MPOL_MF_VALID flags; but a720094ded8
("mm: mempolicy: Hide MPOL_NOOP and MPOL_MF_LAZY from userspace for now")
immediately removed it from MPOL_MF_VALID flags, pending further review.
"This will need to be revisited", but it has not been reinstated.

The present state is confusing: there is dead code in mm/mempolicy.c to
handle MPOL_MF_LAZY cases which can never occur. Remove that: it can be
resurrected later if necessary. But keep the definition of MPOL_MF_LAZY,
which must remain in the UAPI, even though it always fails with EINVAL.

https://lore.kernel.org/linux-mm/1553041659-46787-1-git-send-email-yang.shi@linux.alibaba.com/
links to a previous request to remove MPOL_MF_LAZY.

Signed-off-by: Hugh Dickins
Reviewed-by: Matthew Wilcox (Oracle)
---
 include/uapi/linux/mempolicy.h |  2 +-
 mm/mempolicy.c                 | 18 ------------------
 2 files changed, 1 insertion(+), 19 deletions(-)

diff --git a/include/uapi/linux/mempolicy.h b/include/uapi/linux/mempolicy.h
index 046d0ccba4cd..a8963f7ef4c2 100644
--- a/include/uapi/linux/mempolicy.h
+++ b/include/uapi/linux/mempolicy.h
@@ -48,7 +48,7 @@ enum {
 #define MPOL_MF_MOVE	 (1<<1)	/* Move pages owned by this process to conform
 				   to policy */
 #define MPOL_MF_MOVE_ALL (1<<2)	/* Move every page to conform to policy */
-#define MPOL_MF_LAZY	 (1<<3)	/* Modifies '_MOVE: lazy migrate on fault */
+#define MPOL_MF_LAZY	 (1<<3)	/* UNSUPPORTED FLAG: Lazy migrate on fault */
 #define MPOL_MF_INTERNAL (1<<4)	/* Internal flags start here */
 
 #define MPOL_MF_VALID	(MPOL_MF_STRICT | \
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index a22b641cfd6b..7ab6102d7da4 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -632,12 +632,6 @@ unsigned long change_prot_numa(struct vm_area_struct *vma,
 
 	return nr_updated;
 }
-#else
-static unsigned long change_prot_numa(struct vm_area_struct *vma,
-			unsigned long addr, unsigned long end)
-{
-	return 0;
-}
 #endif /* CONFIG_NUMA_BALANCING */
 
 static int queue_pages_test_walk(unsigned long start, unsigned long end,
@@ -676,14 +670,6 @@ static int queue_pages_test_walk(unsigned long start, unsigned long end,
 	if (endvma > end)
 		endvma = end;
 
-	if (flags & MPOL_MF_LAZY) {
-		/* Similar to task_numa_work, skip inaccessible VMAs */
-		if (!is_vm_hugetlb_page(vma) && vma_is_accessible(vma) &&
-			!(vma->vm_flags & VM_MIXEDMAP))
-			change_prot_numa(vma, start, endvma);
-		return 1;
-	}
-
 	/*
 	 * Check page nodes, and queue pages to move, in the current vma.
 	 * But if no moving, and no strict checking, the scan can be skipped.
@@ -1271,9 +1257,6 @@ static long do_mbind(unsigned long start, unsigned long len,
 	if (IS_ERR(new))
 		return PTR_ERR(new);
 
-	if (flags & MPOL_MF_LAZY)
-		new->flags |= MPOL_F_MOF;
-
 	/*
 	 * If we are using the default policy then operation
 	 * on discontinuous address spaces is okay after all
@@ -1318,7 +1301,6 @@ static long do_mbind(unsigned long start, unsigned long len,
 
 	if (!err) {
 		if (!list_empty(&pagelist)) {
-			WARN_ON_ONCE(flags & MPOL_MF_LAZY);
 			nr_failed |= migrate_pages(&pagelist, new_folio, NULL,
 				start, MIGRATE_SYNC, MR_MEMPOLICY_MBIND, NULL);
 		}

From patchwork Mon Sep 25 08:32:02 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Hugh Dickins
X-Patchwork-Id: 13397486
Date: Mon, 25 Sep 2023 01:32:02 -0700 (PDT)
From: Hugh Dickins
To: Andrew Morton
cc: Andi Kleen, Christoph Lameter, Matthew Wilcox, Mike Kravetz,
    David Hildenbrand, Suren Baghdasaryan, Yang Shi, Sidhartha Kumar,
    Vishal Moola, Kefeng Wang, Greg Kroah-Hartman, Tejun Heo, Mel Gorman,
    Michal Hocko, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH 09/12] mm: add page_rmappable_folio() wrapper
In-Reply-To: <2d872cef-7787-a7ca-10e-9d45a64c80b4@google.com>
Message-ID:
References: <2d872cef-7787-a7ca-10e-9d45a64c80b4@google.com>

folio_prep_large_rmappable() is being used repeatedly along with a
conversion from page to folio, a check non-NULL, a check order > 1:
wrap it all up into struct folio *page_rmappable_folio(struct page *).

Signed-off-by: Hugh Dickins
---
 include/linux/huge_mm.h | 13 +++++++++++++
 mm/mempolicy.c          | 17 +++--------------
 mm/page_alloc.c         |  8 ++------
 3 files changed, 18 insertions(+), 20 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index fa0350b0812a..58e7662a8a62 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -141,6 +141,15 @@ unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr,
 		unsigned long len, unsigned long pgoff, unsigned long flags);
 
 void folio_prep_large_rmappable(struct folio *folio);
+static inline struct folio *page_rmappable_folio(struct page *page)
+{
+	struct folio *folio = (struct folio *)page;
+
+	if (folio && folio_order(folio) > 1)
+		folio_prep_large_rmappable(folio);
+	return folio;
+}
+
 bool can_split_folio(struct folio *folio, int *pextra_pins);
 int split_huge_page_to_list(struct page *page, struct list_head *list);
 static inline int split_huge_page(struct page *page)
@@ -281,6 +290,10 @@ static inline bool hugepage_vma_check(struct vm_area_struct *vma,
 }
 
 static inline void folio_prep_large_rmappable(struct folio *folio) {}
+static inline struct folio *page_rmappable_folio(struct page *page)
+{
+	return (struct folio *)page;
+}
 
 #define transparent_hugepage_flags 0UL
 
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 7ab6102d7da4..4c3b3f535630 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2137,10 +2137,7 @@ struct folio *vma_alloc_folio(gfp_t gfp, int order, struct vm_area_struct *vma,
 		mpol_cond_put(pol);
 		gfp |= __GFP_COMP;
 		page = alloc_page_interleave(gfp, order, nid);
-		folio = (struct folio *)page;
-		if (folio && order > 1)
-			folio_prep_large_rmappable(folio);
-		goto out;
+		return page_rmappable_folio(page);
 	}
 
 	if (pol->mode == MPOL_PREFERRED_MANY) {
@@ -2150,10 +2147,7 @@ struct folio *vma_alloc_folio(gfp_t gfp, int order, struct vm_area_struct *vma,
 		gfp |= __GFP_COMP;
 		page = alloc_pages_preferred_many(gfp, order, node, pol);
 		mpol_cond_put(pol);
-		folio = (struct folio *)page;
-		if (folio && order > 1)
-			folio_prep_large_rmappable(folio);
-		goto out;
+		return page_rmappable_folio(page);
 	}
 
 	if (unlikely(IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) && hugepage)) {
@@ -2247,12 +2241,7 @@ EXPORT_SYMBOL(alloc_pages);
 
 struct folio *folio_alloc(gfp_t gfp, unsigned order)
 {
-	struct page *page = alloc_pages(gfp | __GFP_COMP, order);
-	struct folio *folio = (struct folio *)page;
-
-	if (folio && order > 1)
-		folio_prep_large_rmappable(folio);
-	return folio;
+	return page_rmappable_folio(alloc_pages(gfp | __GFP_COMP, order));
 }
 EXPORT_SYMBOL(folio_alloc);
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 95546f376302..5b1707d9025a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4456,12 +4456,8 @@ struct folio *__folio_alloc(gfp_t gfp, unsigned int order, int preferred_nid,
 		nodemask_t *nodemask)
 {
 	struct page *page = __alloc_pages(gfp | __GFP_COMP, order,
-			preferred_nid, nodemask);
-	struct folio *folio = (struct folio *)page;
-
-	if (folio && order > 1)
-		folio_prep_large_rmappable(folio);
-	return folio;
+					preferred_nid, nodemask);
+	return page_rmappable_folio(page);
 }
 EXPORT_SYMBOL(__folio_alloc);
 

From patchwork Mon Sep 25 08:33:30 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Hugh Dickins
X-Patchwork-Id: 13397487
Date: Mon, 25 Sep 2023 01:33:30 -0700 (PDT)
From: Hugh Dickins
To: Andrew Morton
cc: Andi Kleen, Christoph Lameter, Matthew Wilcox, Mike Kravetz,
    David Hildenbrand, Suren Baghdasaryan, Yang Shi, Sidhartha Kumar,
    Vishal Moola, Kefeng Wang, Greg Kroah-Hartman, Tejun Heo, Mel Gorman,
    Michal Hocko, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH 10/12] mempolicy: alloc_pages_mpol() for NUMA policy without vma
In-Reply-To: <2d872cef-7787-a7ca-10e-9d45a64c80b4@google.com>
Message-ID:
References: <2d872cef-7787-a7ca-10e-9d45a64c80b4@google.com>

Shrink shmem's stack usage by eliminating the pseudo-vma from its folio
allocation. alloc_pages_mpol(gfp, order, pol, ilx, nid) becomes the
principal actor for passing mempolicy choice down to __alloc_pages(),
rather than vma_alloc_folio(gfp, order, vma, addr, hugepage).

vma_alloc_folio() and alloc_pages() remain, but as wrappers around
alloc_pages_mpol(). alloc_pages_bulk_*() untouched, except to provide the
additional args to policy_nodemask(), which subsumes policy_node().
Cleanup throughout, cutting out some unhelpful "helpers".

It would all be much simpler without MPOL_INTERLEAVE, but that adds a
dynamic to the constant mpol: complicated by v3.6 commit 09c231cb8bfd
("tmpfs: distribute interleave better across nodes"), which added ino
bias to the interleave, hidden from mm/mempolicy.c until this commit.

Hence "ilx" throughout, the "interleave index". Originally I thought it
could be done just with nid, but that's wrong: the nodemask may come from
the shared policy layer below a shmem vma, or it may come from the task
layer above a shmem vma; and without the final nodemask then nodeid cannot
be decided. And how ilx is applied depends also on page order.

The interleave index is almost always irrelevant unless MPOL_INTERLEAVE:
with one exception in alloc_pages_mpol(), where the NO_INTERLEAVE_INDEX
passed down from vma-less alloc_pages() is also used as hint not to use
THP-style hugepage allocation - to avoid the overhead of a hugepage arg
(though I don't understand why we never just added a GFP bit for THP - if
it actually needs a different allocation strategy from other pages of the
same order). vma_alloc_folio() still carries its hugepage arg here, but
it is not used, and should be removed when agreed.

get_vma_policy() no longer allows a NULL vma: over time I believe we've
eradicated all the places which used to need it e.g. swapoff and madvise
used to pass NULL vma to read_swap_cache_async(), but now know the vma.
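
[Illustration, not part of the patch: a minimal userspace sketch of how
the interleave index appears to depend on page order, modelled on the
arithmetic that get_vma_policy() gains in the mm/mempolicy.c hunk of this
patch. struct demo_vma, demo_interleave_index(), the bias argument and
the PAGE_SHIFT value of 12 are all assumptions made for the example; the
bias stands in for whatever a ->get_policy() method stored in *ilx.]

```c
#define PAGE_SHIFT 12	/* assumption: 4KiB base pages */

/* Hypothetical stand-in for the two vma fields the calculation reads. */
struct demo_vma {
	unsigned long vm_start;	/* first address of the mapping */
	unsigned long vm_pgoff;	/* offset of vm_start, in base pages */
};

/*
 * Same two additions as the MPOL_INTERLEAVE branch of get_vma_policy()
 * in this patch: the index counts units of (1 << order) base pages, so
 * a larger order advances the index more slowly.
 */
static unsigned long demo_interleave_index(const struct demo_vma *vma,
		unsigned long addr, int order, unsigned long bias)
{
	unsigned long ilx = bias;	/* assumed to come from ->get_policy() */

	ilx += vma->vm_pgoff >> order;
	ilx += (addr - vma->vm_start) >> (PAGE_SHIFT + order);
	return ilx;
}
```

[With vm_pgoff 512, no bias, and an address 1024 base pages into the
vma, this gives index 1536 at order 0 and index 3 at order 9.]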

Signed-off-by: Hugh Dickins
---
 fs/proc/task_mmu.c        |   5 +-
 include/linux/gfp.h       |  10 +-
 include/linux/mempolicy.h |  13 +-
 include/linux/mm.h        |   2 +-
 ipc/shm.c                 |  21 +--
 mm/mempolicy.c            | 383 ++++++++++++++++----------------------
 mm/shmem.c                | 102 +++++-----
 mm/swap.h                 |   9 +-
 mm/swap_state.c           |  86 +++++----
 9 files changed, 304 insertions(+), 327 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 3dd5be96691b..b0955a20e95f 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1945,8 +1945,9 @@ static int show_numa_map(struct seq_file *m, void *v)
 	struct numa_maps *md = &numa_priv->md;
 	struct file *file = vma->vm_file;
 	struct mm_struct *mm = vma->vm_mm;
-	struct mempolicy *pol;
 	char buffer[64];
+	struct mempolicy *pol;
+	pgoff_t ilx;
 	int nid;
 
 	if (!mm)
@@ -1955,7 +1956,7 @@ static int show_numa_map(struct seq_file *m, void *v)
 	/* Ensure we start with an empty set of numa_maps statistics. */
 	memset(md, 0, sizeof(*md));
 
-	pol = __get_vma_policy(vma, vma->vm_start);
+	pol = __get_vma_policy(vma, vma->vm_start, &ilx);
 	if (pol) {
 		mpol_to_str(buffer, sizeof(buffer), pol);
 		mpol_cond_put(pol);
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 665f06675c83..f74f8d05b053 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -8,6 +8,7 @@
 #include
 
 struct vm_area_struct;
+struct mempolicy;
 
 /* Convert GFP flags to their corresponding migrate type */
 #define GFP_MOVABLE_MASK (__GFP_RECLAIMABLE|__GFP_MOVABLE)
@@ -262,7 +263,9 @@ static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
 
 #ifdef CONFIG_NUMA
 struct page *alloc_pages(gfp_t gfp, unsigned int order);
-struct folio *folio_alloc(gfp_t gfp, unsigned order);
+struct page *alloc_pages_mpol(gfp_t gfp, unsigned int order,
+		struct mempolicy *mpol, pgoff_t ilx, int nid);
+struct folio *folio_alloc(gfp_t gfp, unsigned int order);
 struct folio *vma_alloc_folio(gfp_t gfp, int order, struct vm_area_struct *vma,
 		unsigned long addr, bool hugepage);
 #else
@@ -270,6 +273,11 @@ static inline struct page *alloc_pages(gfp_t gfp_mask, unsigned int order)
 {
 	return alloc_pages_node(numa_node_id(), gfp_mask, order);
 }
+static inline struct page *alloc_pages_mpol(gfp_t gfp, unsigned int order,
+		struct mempolicy *mpol, pgoff_t ilx, int nid)
+{
+	return alloc_pages(gfp, order);
+}
 static inline struct folio *folio_alloc(gfp_t gfp, unsigned int order)
 {
 	return __folio_alloc_node(gfp, order, numa_node_id());
diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h
index 12f7dc74a457..ad93f23434bb 100644
--- a/include/linux/mempolicy.h
+++ b/include/linux/mempolicy.h
@@ -128,7 +128,9 @@ struct mempolicy *mpol_shared_policy_lookup(struct shared_policy *sp,
 
 struct mempolicy *get_task_policy(struct task_struct *p);
 struct mempolicy *__get_vma_policy(struct vm_area_struct *vma,
-		unsigned long addr);
+		unsigned long addr, pgoff_t *ilx);
+struct mempolicy *get_vma_policy(struct vm_area_struct *vma,
+		unsigned long addr, int order, pgoff_t *ilx);
 bool vma_policy_mof(struct vm_area_struct *vma);
 
 extern void numa_default_policy(void);
@@ -142,8 +144,6 @@ extern int huge_node(struct vm_area_struct *vma,
 extern bool init_nodemask_of_mempolicy(nodemask_t *mask);
 extern bool mempolicy_in_oom_domain(struct task_struct *tsk,
 				const nodemask_t *mask);
-extern nodemask_t *policy_nodemask(gfp_t gfp, struct mempolicy *policy);
-
 extern unsigned int mempolicy_slab_node(void);
 
 extern enum zone_type policy_zone;
@@ -215,6 +215,13 @@ mpol_shared_policy_lookup(struct shared_policy *sp, pgoff_t idx)
 	return NULL;
 }
 
+static inline struct mempolicy *get_vma_policy(struct vm_area_struct *vma,
+				unsigned long addr, int order, pgoff_t *ilx)
+{
+	*ilx = 0;
+	return NULL;
+}
+
 #define vma_policy(vma) NULL
 
 static inline int
diff --git a/include/linux/mm.h b/include/linux/mm.h
index bf5d0b1b16f4..456f060b2475 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -619,7 +619,7 @@ struct vm_operations_struct {
 	 * policy.
 	 */
 	struct mempolicy *(*get_policy)(struct vm_area_struct *vma,
-					unsigned long addr);
+					unsigned long addr, pgoff_t *ilx);
 #endif
 	/*
 	 * Called by vm_normal_page() for special PTEs to find the
diff --git a/ipc/shm.c b/ipc/shm.c
index 576a543b7cff..222aaf035afb 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -562,30 +562,25 @@ static unsigned long shm_pagesize(struct vm_area_struct *vma)
 }
 
 #ifdef CONFIG_NUMA
-static int shm_set_policy(struct vm_area_struct *vma, struct mempolicy *new)
+static int shm_set_policy(struct vm_area_struct *vma, struct mempolicy *mpol)
 {
-	struct file *file = vma->vm_file;
-	struct shm_file_data *sfd = shm_file_data(file);
+	struct shm_file_data *sfd = shm_file_data(vma->vm_file);
 	int err = 0;
 
 	if (sfd->vm_ops->set_policy)
-		err = sfd->vm_ops->set_policy(vma, new);
+		err = sfd->vm_ops->set_policy(vma, mpol);
 	return err;
 }
 
 static struct mempolicy *shm_get_policy(struct vm_area_struct *vma,
-					unsigned long addr)
+					unsigned long addr, pgoff_t *ilx)
 {
-	struct file *file = vma->vm_file;
-	struct shm_file_data *sfd = shm_file_data(file);
-	struct mempolicy *pol = NULL;
+	struct shm_file_data *sfd = shm_file_data(vma->vm_file);
+	struct mempolicy *mpol = vma->vm_policy;
 
 	if (sfd->vm_ops->get_policy)
-		pol = sfd->vm_ops->get_policy(vma, addr);
-	else if (vma->vm_policy)
-		pol = vma->vm_policy;
-
-	return pol;
+		mpol = sfd->vm_ops->get_policy(vma, addr, ilx);
+	return mpol;
 }
 #endif
 
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 4c3b3f535630..d74df1e1b14a 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -114,6 +114,8 @@
 #define MPOL_MF_INVERT (MPOL_MF_INTERNAL << 1)	/* Invert check for nodemask */
 #define MPOL_MF_WRLOCK (MPOL_MF_INTERNAL << 2)	/* Write-lock walked vmas */
 
+#define NO_INTERLEAVE_INDEX (-1UL)
+
 static struct kmem_cache *policy_cache;
 static struct kmem_cache *sn_cache;
 
@@ -915,6 +917,7 @@ static long do_get_mempolicy(int *policy, nodemask_t *nmask,
 	}
 
 	if (flags & MPOL_F_ADDR) {
+		pgoff_t ilx;		/* ignored here */
 		/*
 		 * Do NOT fall back to task policy if the
 		 * vma/shared policy at addr is NULL. We
@@ -926,10 +929,7 @@ static long do_get_mempolicy(int *policy, nodemask_t *nmask,
 			mmap_read_unlock(mm);
 			return -EFAULT;
 		}
-		if (vma->vm_ops && vma->vm_ops->get_policy)
-			pol = vma->vm_ops->get_policy(vma, addr);
-		else
-			pol = vma->vm_policy;
+		pol = __get_vma_policy(vma, addr, &ilx);
 	} else if (addr)
 		return -EINVAL;
 
@@ -1187,6 +1187,15 @@ static struct folio *new_folio(struct folio *src, unsigned long start)
 			break;
 	}
 
+	/*
+	 * __get_vma_policy() now expects a genuine non-NULL vma. Return NULL
+	 * when the page can no longer be located in a vma: that is not ideal
+	 * (migrate_pages() will give up early, presuming ENOMEM), but good
+	 * enough to avoid a crash by syzkaller or concurrent holepunch.
+	 */
+	if (!vma)
+		return NULL;
+
 	if (folio_test_hugetlb(src)) {
 		return alloc_hugetlb_folio_vma(folio_hstate(src),
 				vma, address);
@@ -1195,9 +1204,6 @@ static struct folio *new_folio(struct folio *src, unsigned long start)
 	if (folio_test_large(src))
 		gfp = GFP_TRANSHUGE;
 
-	/*
-	 * if !vma, vma_alloc_folio() will use task or system default policy
-	 */
 	return vma_alloc_folio(gfp, folio_order(src), vma, address,
 			folio_test_large(src));
 }
@@ -1705,34 +1711,19 @@ bool vma_migratable(struct vm_area_struct *vma)
 }
 
 struct mempolicy *__get_vma_policy(struct vm_area_struct *vma,
-						unsigned long addr)
+				   unsigned long addr, pgoff_t *ilx)
 {
-	struct mempolicy *pol = NULL;
-
-	if (vma) {
-		if (vma->vm_ops && vma->vm_ops->get_policy) {
-			pol = vma->vm_ops->get_policy(vma, addr);
-		} else if (vma->vm_policy) {
-			pol = vma->vm_policy;
-
-			/*
-			 * shmem_alloc_page() passes MPOL_F_SHARED policy with
-			 * a pseudo vma whose vma->vm_ops=NULL. Take a reference
-			 * count on these policies which will be dropped by
-			 * mpol_cond_put() later
-			 */
-			if (mpol_needs_cond_ref(pol))
-				mpol_get(pol);
-		}
-	}
-
-	return pol;
+	*ilx = 0;
+	return (vma->vm_ops && vma->vm_ops->get_policy) ?
+		vma->vm_ops->get_policy(vma, addr, ilx) : vma->vm_policy;
 }
 
 /*
- * get_vma_policy(@vma, @addr)
+ * get_vma_policy(@vma, @addr, @order, @ilx)
  * @vma: virtual memory area whose policy is sought
  * @addr: address in @vma for shared policy lookup
+ * @order: 0, or appropriate huge_page_order for interleaving
+ * @ilx: interleave index (output), for use only when MPOL_INTERLEAVE
  *
  * Returns effective policy for a VMA at specified address.
  * Falls back to current->mempolicy or system default policy, as necessary.
@@ -1741,14 +1732,18 @@ struct mempolicy *__get_vma_policy(struct vm_area_struct *vma,
  * freeing by another task. It is the caller's responsibility to free the
  * extra reference for shared policies.
  */
-static struct mempolicy *get_vma_policy(struct vm_area_struct *vma,
-						unsigned long addr)
+struct mempolicy *get_vma_policy(struct vm_area_struct *vma,
+				 unsigned long addr, int order, pgoff_t *ilx)
 {
-	struct mempolicy *pol = __get_vma_policy(vma, addr);
+	struct mempolicy *pol;
 
+	pol = __get_vma_policy(vma, addr, ilx);
 	if (!pol)
 		pol = get_task_policy(current);
-
+	if (pol->mode == MPOL_INTERLEAVE) {
+		*ilx += vma->vm_pgoff >> order;
+		*ilx += (addr - vma->vm_start) >> (PAGE_SHIFT + order);
+	}
 	return pol;
 }
 
@@ -1758,8 +1753,9 @@ bool vma_policy_mof(struct vm_area_struct *vma)
 
 	if (vma->vm_ops && vma->vm_ops->get_policy) {
 		bool ret = false;
+		pgoff_t ilx;		/* ignored here */
 
-		pol = vma->vm_ops->get_policy(vma, vma->vm_start);
+		pol = vma->vm_ops->get_policy(vma, vma->vm_start, &ilx);
 		if (pol && (pol->flags & MPOL_F_MOF))
 			ret = true;
 		mpol_cond_put(pol);
@@ -1794,54 +1790,6 @@ bool apply_policy_zone(struct mempolicy *policy, enum zone_type zone)
 	return zone >= dynamic_policy_zone;
 }
 
-/*
- * Return a nodemask representing a mempolicy for filtering nodes for
- * page allocation
- */
-nodemask_t *policy_nodemask(gfp_t gfp, struct mempolicy *policy)
-{
-	int mode = policy->mode;
-
-	/* Lower zones don't get a nodemask applied for MPOL_BIND */
-	if (unlikely(mode ==
MPOL_BIND) && - apply_policy_zone(policy, gfp_zone(gfp)) && - cpuset_nodemask_valid_mems_allowed(&policy->nodes)) - return &policy->nodes; - - if (mode == MPOL_PREFERRED_MANY) - return &policy->nodes; - - return NULL; -} - -/* - * Return the preferred node id for 'prefer' mempolicy, and return - * the given id for all other policies. - * - * policy_node() is always coupled with policy_nodemask(), which - * secures the nodemask limit for 'bind' and 'prefer-many' policy. - */ -static int policy_node(gfp_t gfp, struct mempolicy *policy, int nid) -{ - if (policy->mode == MPOL_PREFERRED) { - nid = first_node(policy->nodes); - } else { - /* - * __GFP_THISNODE shouldn't even be used with the bind policy - * because we might easily break the expectation to stay on the - * requested node and not break the policy. - */ - WARN_ON_ONCE(policy->mode == MPOL_BIND && (gfp & __GFP_THISNODE)); - } - - if ((policy->mode == MPOL_BIND || - policy->mode == MPOL_PREFERRED_MANY) && - policy->home_node != NUMA_NO_NODE) - return policy->home_node; - - return nid; -} - /* Do dynamic interleaving for a process */ static unsigned int interleave_nodes(struct mempolicy *policy) { @@ -1901,11 +1849,11 @@ unsigned int mempolicy_slab_node(void) } /* - * Do static interleaving for a VMA with known offset @n. Returns the n'th - * node in pol->nodes (starting from n=0), wrapping around if n exceeds the - * number of present nodes. + * Do static interleaving for interleave index @ilx. Returns the ilx'th + * node in pol->nodes (starting from ilx=0), wrapping around if ilx + * exceeds the number of present nodes. 
*/ -static unsigned offset_il_node(struct mempolicy *pol, unsigned long n) +static unsigned int interleave_nid(struct mempolicy *pol, pgoff_t ilx) { nodemask_t nodemask = pol->nodes; unsigned int target, nnodes; @@ -1923,33 +1871,54 @@ static unsigned offset_il_node(struct mempolicy *pol, unsigned long n) nnodes = nodes_weight(nodemask); if (!nnodes) return numa_node_id(); - target = (unsigned int)n % nnodes; + target = ilx % nnodes; nid = first_node(nodemask); for (i = 0; i < target; i++) nid = next_node(nid, nodemask); return nid; } -/* Determine a node number for interleave */ -static inline unsigned interleave_nid(struct mempolicy *pol, - struct vm_area_struct *vma, unsigned long addr, int shift) +/* + * Return a nodemask representing a mempolicy for filtering nodes for + * page allocation, together with preferred node id (or the input node id). + */ +static nodemask_t *policy_nodemask(gfp_t gfp, struct mempolicy *pol, + pgoff_t ilx, int *nid) { - if (vma) { - unsigned long off; + nodemask_t *nodemask = NULL; + switch (pol->mode) { + case MPOL_PREFERRED: + /* Override input node id */ + *nid = first_node(pol->nodes); + break; + case MPOL_PREFERRED_MANY: + nodemask = &pol->nodes; + if (pol->home_node != NUMA_NO_NODE) + *nid = pol->home_node; + break; + case MPOL_BIND: + /* Restrict to nodemask (but not on lower zones) */ + if (apply_policy_zone(pol, gfp_zone(gfp)) && + cpuset_nodemask_valid_mems_allowed(&pol->nodes)) + nodemask = &pol->nodes; + if (pol->home_node != NUMA_NO_NODE) + *nid = pol->home_node; /* - * for small pages, there is no difference between - * shift and PAGE_SHIFT, so the bit-shift is safe. - * for huge pages, since vm_pgoff is in units of small - * pages, we need to shift off the always 0 bits to get - * a useful offset. + * __GFP_THISNODE shouldn't even be used with the bind policy + * because we might easily break the expectation to stay on the + * requested node and not break the policy. 
*/ - BUG_ON(shift < PAGE_SHIFT); - off = vma->vm_pgoff >> (shift - PAGE_SHIFT); - off += (addr - vma->vm_start) >> shift; - return offset_il_node(pol, off); - } else - return interleave_nodes(pol); + WARN_ON_ONCE(gfp & __GFP_THISNODE); + break; + case MPOL_INTERLEAVE: + /* Override input node id */ + *nid = (ilx == NO_INTERLEAVE_INDEX) ? + interleave_nodes(pol) : interleave_nid(pol, ilx); + break; + } + + return nodemask; } #ifdef CONFIG_HUGETLBFS @@ -1965,27 +1934,16 @@ static inline unsigned interleave_nid(struct mempolicy *pol, * to the struct mempolicy for conditional unref after allocation. * If the effective policy is 'bind' or 'prefer-many', returns a pointer * to the mempolicy's @nodemask for filtering the zonelist. - * - * Must be protected by read_mems_allowed_begin() */ int huge_node(struct vm_area_struct *vma, unsigned long addr, gfp_t gfp_flags, - struct mempolicy **mpol, nodemask_t **nodemask) + struct mempolicy **mpol, nodemask_t **nodemask) { + pgoff_t ilx; int nid; - int mode; - *mpol = get_vma_policy(vma, addr); - *nodemask = NULL; - mode = (*mpol)->mode; - - if (unlikely(mode == MPOL_INTERLEAVE)) { - nid = interleave_nid(*mpol, vma, addr, - huge_page_shift(hstate_vma(vma))); - } else { - nid = policy_node(gfp_flags, *mpol, numa_node_id()); - if (mode == MPOL_BIND || mode == MPOL_PREFERRED_MANY) - *nodemask = &(*mpol)->nodes; - } + nid = numa_node_id(); + *mpol = get_vma_policy(vma, addr, hstate_vma(vma)->order, &ilx); + *nodemask = policy_nodemask(gfp_flags, *mpol, ilx, &nid); return nid; } @@ -2063,27 +2021,8 @@ bool mempolicy_in_oom_domain(struct task_struct *tsk, return ret; } -/* Allocate a page in interleaved policy. - Own path because it needs to do special accounting. 
*/ -static struct page *alloc_page_interleave(gfp_t gfp, unsigned order, - unsigned nid) -{ - struct page *page; - - page = __alloc_pages(gfp, order, nid, NULL); - /* skip NUMA_INTERLEAVE_HIT counter update if numa stats is disabled */ - if (!static_branch_likely(&vm_numa_stat_key)) - return page; - if (page && page_to_nid(page) == nid) { - preempt_disable(); - __count_numa_event(page_zone(page), NUMA_INTERLEAVE_HIT); - preempt_enable(); - } - return page; -} - static struct page *alloc_pages_preferred_many(gfp_t gfp, unsigned int order, - int nid, struct mempolicy *pol) + int nid, nodemask_t *nodemask) { struct page *page; gfp_t preferred_gfp; @@ -2096,7 +2035,7 @@ static struct page *alloc_pages_preferred_many(gfp_t gfp, unsigned int order, */ preferred_gfp = gfp | __GFP_NOWARN; preferred_gfp &= ~(__GFP_DIRECT_RECLAIM | __GFP_NOFAIL); - page = __alloc_pages(preferred_gfp, order, nid, &pol->nodes); + page = __alloc_pages(preferred_gfp, order, nid, nodemask); if (!page) page = __alloc_pages(gfp, order, nid, NULL); @@ -2104,55 +2043,29 @@ static struct page *alloc_pages_preferred_many(gfp_t gfp, unsigned int order, } /** - * vma_alloc_folio - Allocate a folio for a VMA. + * alloc_pages_mpol - Allocate pages according to NUMA mempolicy. * @gfp: GFP flags. - * @order: Order of the folio. - * @vma: Pointer to VMA or NULL if not available. - * @addr: Virtual address of the allocation. Must be inside @vma. - * @hugepage: For hugepages try only the preferred node if possible. + * @order: Order of the page allocation. + * @pol: Pointer to the NUMA mempolicy. + * @ilx: Index for interleave mempolicy (also distinguishes alloc_pages()). + * @nid: Preferred node (usually numa_node_id() but @pol may override it). * - * Allocate a folio for a specific address in @vma, using the appropriate - * NUMA policy. When @vma is not NULL the caller must hold the mmap_lock - * of the mm_struct of the VMA to prevent it from going away.
Should be - * used for all allocations for folios that will be mapped into user space. - * - * Return: The folio on success or NULL if allocation fails. + * Return: The page on success or NULL if allocation fails. */ -struct folio *vma_alloc_folio(gfp_t gfp, int order, struct vm_area_struct *vma, - unsigned long addr, bool hugepage) +struct page *alloc_pages_mpol(gfp_t gfp, unsigned int order, + struct mempolicy *pol, pgoff_t ilx, int nid) { - struct mempolicy *pol; - int node = numa_node_id(); - struct folio *folio; - int preferred_nid; - nodemask_t *nmask; + nodemask_t *nodemask; + struct page *page; - pol = get_vma_policy(vma, addr); + nodemask = policy_nodemask(gfp, pol, ilx, &nid); - if (pol->mode == MPOL_INTERLEAVE) { - struct page *page; - unsigned nid; - - nid = interleave_nid(pol, vma, addr, PAGE_SHIFT + order); - mpol_cond_put(pol); - gfp |= __GFP_COMP; - page = alloc_page_interleave(gfp, order, nid); - return page_rmappable_folio(page); - } - - if (pol->mode == MPOL_PREFERRED_MANY) { - struct page *page; - - node = policy_node(gfp, pol, node); - gfp |= __GFP_COMP; - page = alloc_pages_preferred_many(gfp, order, node, pol); - mpol_cond_put(pol); - return page_rmappable_folio(page); - } - - if (unlikely(IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) && hugepage)) { - int hpage_node = node; + if (pol->mode == MPOL_PREFERRED_MANY) + return alloc_pages_preferred_many(gfp, order, nid, nodemask); + if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) && + /* filter "hugepage" allocation, unless from alloc_pages() */ + order == HPAGE_PMD_ORDER && ilx != NO_INTERLEAVE_INDEX) { /* * For hugepage allocation and non-interleave policy which * allows the current node (or other explicitly preferred @@ -2163,39 +2076,68 @@ struct folio *vma_alloc_folio(gfp_t gfp, int order, struct vm_area_struct *vma, * If the policy is interleave or does not allow the current * node in its nodemask, we allocate the standard way. 
*/ - if (pol->mode == MPOL_PREFERRED) - hpage_node = first_node(pol->nodes); - - nmask = policy_nodemask(gfp, pol); - if (!nmask || node_isset(hpage_node, *nmask)) { - mpol_cond_put(pol); + if (pol->mode != MPOL_INTERLEAVE && + (!nodemask || node_isset(nid, *nodemask))) { /* * First, try to allocate THP only on local node, but * don't reclaim unnecessarily, just compact. */ - folio = __folio_alloc_node(gfp | __GFP_THISNODE | - __GFP_NORETRY, order, hpage_node); - + page = __alloc_pages_node(nid, + gfp | __GFP_THISNODE | __GFP_NORETRY, order); + if (page || !(gfp & __GFP_DIRECT_RECLAIM)) + return page; /* * If hugepage allocations are configured to always * synchronous compact or the vma has been madvised * to prefer hugepage backing, retry allowing remote * memory with both reclaim and compact as well. */ - if (!folio && (gfp & __GFP_DIRECT_RECLAIM)) - folio = __folio_alloc(gfp, order, hpage_node, - nmask); - - goto out; } } - nmask = policy_nodemask(gfp, pol); - preferred_nid = policy_node(gfp, pol, node); - folio = __folio_alloc(gfp, order, preferred_nid, nmask); + page = __alloc_pages(gfp, order, nid, nodemask); + + if (unlikely(pol->mode == MPOL_INTERLEAVE) && page) { + /* skip NUMA_INTERLEAVE_HIT update if numa stats is disabled */ + if (static_branch_likely(&vm_numa_stat_key) && + page_to_nid(page) == nid) { + preempt_disable(); + __count_numa_event(page_zone(page), NUMA_INTERLEAVE_HIT); + preempt_enable(); + } + } + + return page; +} + +/** + * vma_alloc_folio - Allocate a folio for a VMA. + * @gfp: GFP flags. + * @order: Order of the folio. + * @vma: Pointer to VMA. + * @addr: Virtual address of the allocation. Must be inside @vma. + * @hugepage: Unused (was: For hugepages try only preferred node if possible). + * + * Allocate a folio for a specific address in @vma, using the appropriate + * NUMA policy. The caller must hold the mmap_lock of the mm_struct of the + * VMA to prevent it from going away. 
Should be used for all allocations + * for folios that will be mapped into user space, excepting hugetlbfs, and + * excepting where direct use of alloc_pages_mpol() is more appropriate. + * + * Return: The folio on success or NULL if allocation fails. + */ +struct folio *vma_alloc_folio(gfp_t gfp, int order, struct vm_area_struct *vma, + unsigned long addr, bool hugepage) +{ + struct mempolicy *pol; + pgoff_t ilx; + struct page *page; + + pol = get_vma_policy(vma, addr, order, &ilx); + page = alloc_pages_mpol(gfp | __GFP_COMP, order, + pol, ilx, numa_node_id()); mpol_cond_put(pol); -out: - return folio; + return page_rmappable_folio(page); } EXPORT_SYMBOL(vma_alloc_folio); @@ -2213,33 +2155,23 @@ EXPORT_SYMBOL(vma_alloc_folio); * flags are used. * Return: The page on success or NULL if allocation fails. */ -struct page *alloc_pages(gfp_t gfp, unsigned order) +struct page *alloc_pages(gfp_t gfp, unsigned int order) { struct mempolicy *pol = &default_policy; - struct page *page; - - if (!in_interrupt() && !(gfp & __GFP_THISNODE)) - pol = get_task_policy(current); /* * No reference counting needed for current->mempolicy * nor system default_policy */ - if (pol->mode == MPOL_INTERLEAVE) - page = alloc_page_interleave(gfp, order, interleave_nodes(pol)); - else if (pol->mode == MPOL_PREFERRED_MANY) - page = alloc_pages_preferred_many(gfp, order, - policy_node(gfp, pol, numa_node_id()), pol); - else - page = __alloc_pages(gfp, order, - policy_node(gfp, pol, numa_node_id()), - policy_nodemask(gfp, pol)); + if (!in_interrupt() && !(gfp & __GFP_THISNODE)) + pol = get_task_policy(current); - return page; + return alloc_pages_mpol(gfp, order, + pol, NO_INTERLEAVE_INDEX, numa_node_id()); } EXPORT_SYMBOL(alloc_pages); -struct folio *folio_alloc(gfp_t gfp, unsigned order) +struct folio *folio_alloc(gfp_t gfp, unsigned int order) { return page_rmappable_folio(alloc_pages(gfp | __GFP_COMP, order)); } @@ -2310,6 +2242,8 @@ unsigned long alloc_pages_bulk_array_mempolicy(gfp_t gfp, 
unsigned long nr_pages, struct page **page_array) { struct mempolicy *pol = &default_policy; + nodemask_t *nodemask; + int nid; if (!in_interrupt() && !(gfp & __GFP_THISNODE)) pol = get_task_policy(current); @@ -2322,9 +2256,10 @@ unsigned long alloc_pages_bulk_array_mempolicy(gfp_t gfp, return alloc_pages_bulk_array_preferred_many(gfp, numa_node_id(), pol, nr_pages, page_array); - return __alloc_pages_bulk(gfp, policy_node(gfp, pol, numa_node_id()), - policy_nodemask(gfp, pol), nr_pages, NULL, - page_array); + nid = numa_node_id(); + nodemask = policy_nodemask(gfp, pol, NO_INTERLEAVE_INDEX, &nid); + return __alloc_pages_bulk(gfp, nid, nodemask, + nr_pages, NULL, page_array); } int vma_dup_policy(struct vm_area_struct *src, struct vm_area_struct *dst) @@ -2510,23 +2445,21 @@ static void sp_free(struct sp_node *n) int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long addr) { struct mempolicy *pol; + pgoff_t ilx; struct zoneref *z; int curnid = page_to_nid(page); - unsigned long pgoff; int thiscpu = raw_smp_processor_id(); int thisnid = cpu_to_node(thiscpu); int polnid = NUMA_NO_NODE; int ret = NUMA_NO_NODE; - pol = get_vma_policy(vma, addr); + pol = get_vma_policy(vma, addr, compound_order(page), &ilx); if (!(pol->flags & MPOL_F_MOF)) goto out; switch (pol->mode) { case MPOL_INTERLEAVE: - pgoff = vma->vm_pgoff; - pgoff += (addr - vma->vm_start) >> PAGE_SHIFT; - polnid = offset_il_node(pol, pgoff); + polnid = interleave_nid(pol, ilx); break; case MPOL_PREFERRED: diff --git a/mm/shmem.c b/mm/shmem.c index 69595d341882..aaf44aec2826 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -1565,38 +1565,20 @@ static inline struct mempolicy *shmem_get_sbmpol(struct shmem_sb_info *sbinfo) return NULL; } #endif /* CONFIG_NUMA && CONFIG_TMPFS */ -#ifndef CONFIG_NUMA -#define vm_policy vm_private_data -#endif -static void shmem_pseudo_vma_init(struct vm_area_struct *vma, - struct shmem_inode_info *info, pgoff_t index) -{ - /* Create a pseudo vma that just 
contains the policy */ - vma_init(vma, NULL); - /* Bias interleave by inode number to distribute better across nodes */ - vma->vm_pgoff = index + info->vfs_inode.i_ino; - vma->vm_policy = mpol_shared_policy_lookup(&info->policy, index); -} +static struct mempolicy *shmem_get_pgoff_policy(struct shmem_inode_info *info, + pgoff_t index, unsigned int order, pgoff_t *ilx); -static void shmem_pseudo_vma_destroy(struct vm_area_struct *vma) -{ - /* Drop reference taken by mpol_shared_policy_lookup() */ - mpol_cond_put(vma->vm_policy); -} - -static struct folio *shmem_swapin(swp_entry_t swap, gfp_t gfp, +static struct folio *shmem_swapin_cluster(swp_entry_t swap, gfp_t gfp, struct shmem_inode_info *info, pgoff_t index) { - struct vm_area_struct pvma; + struct mempolicy *mpol; + pgoff_t ilx; struct page *page; - struct vm_fault vmf = { - .vma = &pvma, - }; - shmem_pseudo_vma_init(&pvma, info, index); - page = swap_cluster_readahead(swap, gfp, &vmf); - shmem_pseudo_vma_destroy(&pvma); + mpol = shmem_get_pgoff_policy(info, index, 0, &ilx); + page = swap_cluster_readahead(swap, gfp, mpol, ilx); + mpol_cond_put(mpol); if (!page) return NULL; @@ -1630,35 +1612,37 @@ static gfp_t limit_gfp_mask(gfp_t huge_gfp, gfp_t limit_gfp) static struct folio *shmem_alloc_hugefolio(gfp_t gfp, struct shmem_inode_info *info, pgoff_t index) { - struct vm_area_struct pvma; struct address_space *mapping = info->vfs_inode.i_mapping; - pgoff_t hindex; - struct folio *folio; + struct mempolicy *mpol; + pgoff_t ilx; + struct page *page; - hindex = round_down(index, HPAGE_PMD_NR); - if (xa_find(&mapping->i_pages, &hindex, hindex + HPAGE_PMD_NR - 1, + index = round_down(index, HPAGE_PMD_NR); + if (xa_find(&mapping->i_pages, &index, index + HPAGE_PMD_NR - 1, XA_PRESENT)) return NULL; - shmem_pseudo_vma_init(&pvma, info, hindex); - folio = vma_alloc_folio(gfp, HPAGE_PMD_ORDER, &pvma, 0, true); - shmem_pseudo_vma_destroy(&pvma); - if (!folio) + mpol = shmem_get_pgoff_policy(info, index, HPAGE_PMD_ORDER, 
&ilx); + page = alloc_pages_mpol(gfp, HPAGE_PMD_ORDER, mpol, ilx, numa_node_id()); + mpol_cond_put(mpol); + + if (!page) count_vm_event(THP_FILE_FALLBACK); - return folio; + return page_rmappable_folio(page); } static struct folio *shmem_alloc_folio(gfp_t gfp, - struct shmem_inode_info *info, pgoff_t index) + struct shmem_inode_info *info, pgoff_t index) { - struct vm_area_struct pvma; - struct folio *folio; + struct mempolicy *mpol; + pgoff_t ilx; + struct page *page; - shmem_pseudo_vma_init(&pvma, info, index); - folio = vma_alloc_folio(gfp, 0, &pvma, 0, false); - shmem_pseudo_vma_destroy(&pvma); + mpol = shmem_get_pgoff_policy(info, index, 0, &ilx); + page = alloc_pages_mpol(gfp, 0, mpol, ilx, numa_node_id()); + mpol_cond_put(mpol); - return folio; + return (struct folio *)page; } static struct folio *shmem_alloc_and_acct_folio(gfp_t gfp, struct inode *inode, @@ -1848,7 +1832,7 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index, count_memcg_event_mm(charge_mm, PGMAJFAULT); } /* Here we actually start the io */ - folio = shmem_swapin(swap, gfp, info, index); + folio = shmem_swapin_cluster(swap, gfp, info, index); if (!folio) { error = -ENOMEM; goto failed; @@ -2330,15 +2314,41 @@ static int shmem_set_policy(struct vm_area_struct *vma, struct mempolicy *mpol) } static struct mempolicy *shmem_get_policy(struct vm_area_struct *vma, - unsigned long addr) + unsigned long addr, pgoff_t *ilx) { struct inode *inode = file_inode(vma->vm_file); pgoff_t index; + /* + * Bias interleave by inode number to distribute better across nodes; + * but this interface is independent of which page order is used, so + * supplies only that bias, letting caller apply the offset (adjusted + * by page order, as in shmem_get_pgoff_policy() and get_vma_policy()). 
+ */ + *ilx = inode->i_ino; index = ((addr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff; return mpol_shared_policy_lookup(&SHMEM_I(inode)->policy, index); } -#endif + +static struct mempolicy *shmem_get_pgoff_policy(struct shmem_inode_info *info, + pgoff_t index, unsigned int order, pgoff_t *ilx) +{ + struct mempolicy *mpol; + + /* Bias interleave by inode number to distribute better across nodes */ + *ilx = info->vfs_inode.i_ino + (index >> order); + + mpol = mpol_shared_policy_lookup(&info->policy, index); + return mpol ? mpol : get_task_policy(current); +} +#else +static struct mempolicy *shmem_get_pgoff_policy(struct shmem_inode_info *info, + pgoff_t index, unsigned int order, pgoff_t *ilx) +{ + *ilx = 0; + return NULL; +} +#endif /* CONFIG_NUMA */ int shmem_lock(struct file *file, int lock, struct ucounts *ucounts) { diff --git a/mm/swap.h b/mm/swap.h index 8a3c7a0ace4f..73c332ee4d91 100644 --- a/mm/swap.h +++ b/mm/swap.h @@ -2,6 +2,8 @@ #ifndef _MM_SWAP_H #define _MM_SWAP_H +struct mempolicy; + #ifdef CONFIG_SWAP #include /* for bio_end_io_t */ @@ -48,11 +50,10 @@ struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, unsigned long addr, struct swap_iocb **plug); struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, - struct vm_area_struct *vma, - unsigned long addr, + struct mempolicy *mpol, pgoff_t ilx, bool *new_page_allocated); struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t flag, - struct vm_fault *vmf); + struct mempolicy *mpol, pgoff_t ilx); struct page *swapin_readahead(swp_entry_t entry, gfp_t flag, struct vm_fault *vmf); @@ -80,7 +81,7 @@ static inline void show_swap_cache_info(void) } static inline struct page *swap_cluster_readahead(swp_entry_t entry, - gfp_t gfp_mask, struct vm_fault *vmf) + gfp_t gfp_mask, struct mempolicy *mpol, pgoff_t ilx) { return NULL; } diff --git a/mm/swap_state.c b/mm/swap_state.c index b3b14bd0dd64..116d4d8a930e 100644 --- a/mm/swap_state.c +++ b/mm/swap_state.c @@ -10,6 
+10,7 @@ #include #include #include +#include #include #include #include @@ -410,8 +411,8 @@ struct folio *filemap_get_incore_folio(struct address_space *mapping, } struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, - struct vm_area_struct *vma, unsigned long addr, - bool *new_page_allocated) + struct mempolicy *mpol, pgoff_t ilx, + bool *new_page_allocated) { struct swap_info_struct *si; struct folio *folio; @@ -453,7 +454,8 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, * before marking swap_map SWAP_HAS_CACHE, when -EEXIST will * cause any racers to loop around until we add it to cache. */ - folio = vma_alloc_folio(gfp_mask, 0, vma, addr, false); + folio = (struct folio *)alloc_pages_mpol(gfp_mask, 0, + mpol, ilx, numa_node_id()); if (!folio) goto fail_put_swap; @@ -528,14 +530,19 @@ struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, struct vm_area_struct *vma, unsigned long addr, struct swap_iocb **plug) { - bool page_was_allocated; - struct page *retpage = __read_swap_cache_async(entry, gfp_mask, - vma, addr, &page_was_allocated); + bool page_allocated; + struct mempolicy *mpol; + pgoff_t ilx; + struct page *page; - if (page_was_allocated) - swap_readpage(retpage, false, plug); + mpol = get_vma_policy(vma, addr, 0, &ilx); + page = __read_swap_cache_async(entry, gfp_mask, mpol, ilx, + &page_allocated); + mpol_cond_put(mpol); - return retpage; + if (page_allocated) + swap_readpage(page, false, plug); + return page; } static unsigned int __swapin_nr_pages(unsigned long prev_offset, @@ -603,7 +610,8 @@ static unsigned long swapin_nr_pages(unsigned long offset) * swap_cluster_readahead - swap in pages in hope we need them soon * @entry: swap entry of this memory * @gfp_mask: memory allocation flags - * @vmf: fault information + * @mpol: NUMA memory allocation policy to be applied + * @ilx: NUMA interleave index, for use only when MPOL_INTERLEAVE * * Returns the struct page for entry and addr, after 
queueing swapin. * @@ -612,13 +620,12 @@ static unsigned long swapin_nr_pages(unsigned long offset) * because it doesn't cost us any seek time. We also make sure to queue * the 'original' request together with the readahead ones... * - * This has been extended to use the NUMA policies from the mm triggering - * the readahead. - * - * Caller must hold read mmap_lock if vmf->vma is not NULL. + * Note: it is intentional that the same NUMA policy and interleave index + * are used for every page of the readahead: neighbouring pages on swap + * are fairly likely to have been swapped out from the same node. */ struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask, - struct vm_fault *vmf) + struct mempolicy *mpol, pgoff_t ilx) { struct page *page; unsigned long entry_offset = swp_offset(entry); @@ -629,8 +636,6 @@ struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask, struct blk_plug plug; struct swap_iocb *splug = NULL; bool page_allocated; - struct vm_area_struct *vma = vmf->vma; - unsigned long addr = vmf->address; mask = swapin_nr_pages(offset) - 1; if (!mask) @@ -648,8 +653,8 @@ struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask, for (offset = start_offset; offset <= end_offset ; offset++) { /* Ok, do the async read-ahead now */ page = __read_swap_cache_async( - swp_entry(swp_type(entry), offset), - gfp_mask, vma, addr, &page_allocated); + swp_entry(swp_type(entry), offset), + gfp_mask, mpol, ilx, &page_allocated); if (!page) continue; if (page_allocated) { @@ -663,11 +668,14 @@ struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask, } blk_finish_plug(&plug); swap_read_unplug(splug); - lru_add_drain(); /* Push any new pages onto the LRU now */ skip: /* The page was likely read above, so no need for plugging here */ - return read_swap_cache_async(entry, gfp_mask, vma, addr, NULL); + page = __read_swap_cache_async(entry, gfp_mask, mpol, ilx, + &page_allocated); + if (page_allocated) + swap_readpage(page, 
false, NULL); + return page; } int init_swap_address_space(unsigned int type, unsigned long nr_pages) @@ -765,8 +773,10 @@ static void swap_ra_info(struct vm_fault *vmf, /** * swap_vma_readahead - swap in pages in hope we need them soon - * @fentry: swap entry of this memory + * @targ_entry: swap entry of the targeted memory * @gfp_mask: memory allocation flags + * @mpol: NUMA memory allocation policy to be applied + * @targ_ilx: NUMA interleave index, for use only when MPOL_INTERLEAVE * @vmf: fault information * * Returns the struct page for entry and addr, after queueing swapin. @@ -777,16 +787,17 @@ static void swap_ra_info(struct vm_fault *vmf, * Caller must hold read mmap_lock if vmf->vma is not NULL. * */ -static struct page *swap_vma_readahead(swp_entry_t fentry, gfp_t gfp_mask, +static struct page *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask, + struct mempolicy *mpol, pgoff_t targ_ilx, struct vm_fault *vmf) { struct blk_plug plug; struct swap_iocb *splug = NULL; - struct vm_area_struct *vma = vmf->vma; struct page *page; pte_t *pte = NULL, pentry; unsigned long addr; swp_entry_t entry; + pgoff_t ilx; unsigned int i; bool page_allocated; struct vma_swap_readahead ra_info = { @@ -798,9 +809,10 @@ static struct page *swap_vma_readahead(swp_entry_t fentry, gfp_t gfp_mask, goto skip; addr = vmf->address - (ra_info.offset * PAGE_SIZE); + ilx = targ_ilx - ra_info.offset; blk_start_plug(&plug); - for (i = 0; i < ra_info.nr_pte; i++, addr += PAGE_SIZE) { + for (i = 0; i < ra_info.nr_pte; i++, ilx++, addr += PAGE_SIZE) { if (!pte++) { pte = pte_offset_map(vmf->pmd, addr); if (!pte) @@ -814,8 +826,8 @@ static struct page *swap_vma_readahead(swp_entry_t fentry, gfp_t gfp_mask, continue; pte_unmap(pte); pte = NULL; - page = __read_swap_cache_async(entry, gfp_mask, vma, - addr, &page_allocated); + page = __read_swap_cache_async(entry, gfp_mask, mpol, ilx, + &page_allocated); if (!page) continue; if (page_allocated) { @@ -834,8 +846,11 @@ static struct page 
*swap_vma_readahead(swp_entry_t fentry, gfp_t gfp_mask, lru_add_drain(); skip: /* The page was likely read above, so no need for plugging here */ - return read_swap_cache_async(fentry, gfp_mask, vma, vmf->address, - NULL); + page = __read_swap_cache_async(targ_entry, gfp_mask, mpol, targ_ilx, + &page_allocated); + if (page_allocated) + swap_readpage(page, false, NULL); + return page; } /** @@ -853,9 +868,16 @@ static struct page *swap_vma_readahead(swp_entry_t fentry, gfp_t gfp_mask, struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask, struct vm_fault *vmf) { - return swap_use_vma_readahead() ? - swap_vma_readahead(entry, gfp_mask, vmf) : - swap_cluster_readahead(entry, gfp_mask, vmf); + struct mempolicy *mpol; + pgoff_t ilx; + struct page *page; + + mpol = get_vma_policy(vmf->vma, vmf->address, 0, &ilx); + page = swap_use_vma_readahead() ? + swap_vma_readahead(entry, gfp_mask, mpol, ilx, vmf) : + swap_cluster_readahead(entry, gfp_mask, mpol, ilx); + mpol_cond_put(mpol); + return page; } #ifdef CONFIG_SYSFS
From patchwork Mon Sep 25 08:35:03 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Hugh Dickins X-Patchwork-Id: 13397490
[172.10.233.147]) by smtp.gmail.com with ESMTPSA id c188-20020a0df3c5000000b0059293c8d70csm2293994ywf.132.2023.09.25.01.35.04 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 25 Sep 2023 01:35:05 -0700 (PDT) Date: Mon, 25 Sep 2023 01:35:03 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Andrew Morton cc: Andi Kleen , Christoph Lameter , Matthew Wilcox , Mike Kravetz , David Hildenbrand , Suren Baghdasaryan , Yang Shi , Sidhartha Kumar , Vishal Moola , Kefeng Wang , Greg Kroah-Hartman , Tejun Heo , Mel Gorman , Michal Hocko , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH 11/12] mempolicy: mmap_lock is not needed while migrating folios In-Reply-To: <2d872cef-7787-a7ca-10e-9d45a64c80b4@google.com> Message-ID: <73183de1-6529-b146-f2cc-fcd5b812166@google.com> References: <2d872cef-7787-a7ca-10e-9d45a64c80b4@google.com> MIME-Version: 1.0 X-Rspamd-Queue-Id: AFE2512000E X-Rspam-User: X-Stat-Signature: qur6ccqtgmypo9j8tjz6or47adjmqnnw X-Rspamd-Server: rspam03 X-HE-Tag: 1695630907-354767 X-HE-Meta: 
U2FsdGVkX19DmypxcSdvzoNVax9hpxwd0EsGNiVLiz67EnCAh3OotTWsb4PyBB0dzdvrFG2iXF4Pf7fNVeFWje5mdZtejYJ80eeVnD8tFUnkcsoa8QksrvXSr00oVunlZ0xNUc9JzJ1ihjUm8YYK5aotUKqOdokVAgaw9GMdL/W/6P7GSTMa3x4lCb/0DSo8L2n0qVncUqb3dejOiAXXzHpRiCS0tgvXkMw11S++yE24X66weymjg368S2hDGEAWJPmWPwizCilX5oz4bQAX4yTK9PQf25GfveRcI8Iejwa17Xh6Ez9cNjt9Rt39bx8ZMGmp0fGSr7/O2fYh21W9qRy7bBVG1F4xgeVwCNdXiL2ASXKOeIHsJB6ajlorX7BdWAFuooV0dAL02GCJJ3/vuD2RuzXnHXFwDRGITIJB6z3U/7Skoegej9aMFrteKbMkKEpDHoXpD6kaWzhHMijsrgX1I4guFi/0PbpdceyMft/7OmdhNP2KIYKlPCkZ8WpNkMVt4JpJaL/z3Nfe4JLZ3xX4hpz+YI0vF+AXgRmHkIsI96d+1dy8Pg5EaBiH7Lrl9+43UX+QDPsU39PBF9DY695UwXA4K70oGlaPcqgbO5fUx55kXPZqeBp1TEINbAbIkaCuTHvnjABrFgJLQ57ZG0hUk6Qq5K5e2NlOvRvBrobDSPgNKJI6PtHm6GvQnaiFegDWz1a7FYGxgXMzlRk8bSbxJDoUYojFK3pfJEC/ajWQPzmA2cy2n/yjCmzG+TpGdzbhpJUEstHY/gAUzuegYuznhvkniaeK/3x6S3UYmgROxlztCzrCpbUwFccCJRPxM3qh8rKw8jmXKkHkjkX7hpC5bCvJc4uxbIkua9n9eNrQ/gSUrTRzVYP49JgIgWZw/eYK8MPVW4lOjJEk36oXBWnQKxLwaLgXxokUMgUfdZgv5S225Gqgx3XXMLmLrIgXbr5NmxJ8p6qTVPu2Y4D sk1KzLOb AMqnVeM0IqR3J0vMn4Wqa6MY+cerrHWDwxWBBFa5iJrawFd7VNZugaCa5QXw8j0TlFoCfhkfoQZrNxP45K8Z+iN/xj2Xw77wM3XE+VMcu77wvHIGZq+6mc6qDZrBTAcey+IOLgYwAJVPIIEMHakjCQKA0gVL+nRba+o1gocJV++WZXDGG9MK5zKwE2EOCgU1GJbLu+Ve7J//ECSSo9G2tLbNS2PWVjiA39D2jSAijQGlYWyPYsHSdjylkI2UXlc4DIr93Q6Uyg+igM+9V+4P5/5UZMGR2LekDKGUhydTLBLf9r/1jP7l/IvgekDsPl0MsCEKRUiydcMqKGptctuoZ8EuC/nObhGp/z5f3gNbf1ZMbXRKWLnmTYNbsWwPA3VZPhkqD76T08PUO7ZkfLGj5UV72wx1iQm86Yb6zgr3dJBM4KLOQJ5siWe7sTaoXy2gH6URBJ9BupPl/BRsyEvYBvXk4C9VZpnUFVe5KgWhO/SK4CsGynptsGyg6lppc0QGQRX2X X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: mbind(2) holds down_write of current task's mmap_lock throughout (exclusive because it needs to set the new mempolicy on the vmas); migrate_pages(2) holds down_read of pid's mmap_lock throughout. They both hold mmap_lock across the internal migrate_pages(), under which all new page allocations (huge or small) are made. 
I'm nervous about it; and migrate_pages() certainly does not need mmap_lock
itself. It's done this way for mbind(2), because its page allocator is
vma_alloc_folio() or alloc_hugetlb_folio_vma(), both of which depend on vma
and address.

Now that we have alloc_pages_mpol(), depending on (refcounted) memory
policy and interleave index, mbind(2) can be modified to use that or
alloc_hugetlb_folio_nodemask(), and then not need mmap_lock across the
internal migrate_pages() at all: add alloc_migration_target_by_mpol() to
replace mbind's new_folio(). (After that change, alloc_hugetlb_folio_vma()
is used by nothing but a userfaultfd function: move it out of hugetlb.h and
into the #ifdef.)

migrate_pages(2) has chosen its target node before migrating, so can
continue to use the standard alloc_migration_target(); but let it take and
drop mmap_lock just around migrate_to_node()'s queue_pages_range(): neither
the node-to-node calculations nor the page migrations need it.

It seems unlikely, but it is conceivable that some userspace depends on the
kernel's mmap_lock exclusion here, instead of doing its own locking: more
likely in a testsuite than in real life. It is also possible, of course,
that some pages on the list will be munmapped by another thread before they
are migrated, or a newer memory policy applied to the range by that time:
but such races could happen before, as soon as mmap_lock was dropped, so it
does not appear to be a concern.
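
The narrowed lock scope described above can be pictured with a small user-space sketch. It is an illustration only, under stated assumptions: `fake_mm`, `queue_items()` and `migrate_items()` are hypothetical stand-ins rather than kernel interfaces, and a plain counter takes the place of mmap_lock. What it tries to show is that the lock is held only while candidates are queued, and that the migration step runs with no lock held because its target was chosen beforehand.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an mm; read_locked models mmap_lock depth. */
struct fake_mm {
	int read_locked;
	const int *node_of;	/* node currently holding each page */
	size_t nr_pages;
};

static void fake_read_lock(struct fake_mm *mm)   { mm->read_locked++; }
static void fake_read_unlock(struct fake_mm *mm) { mm->read_locked--; }

/*
 * Phase 1: queue the indices of pages found on @source.  Only this walk
 * takes the lock, in the way that only queue_pages_range() is bracketed
 * by mmap_lock in the description above.
 */
static size_t queue_items(struct fake_mm *mm, int source,
			  size_t *queue, size_t max)
{
	size_t i, n = 0;

	fake_read_lock(mm);
	for (i = 0; i < mm->nr_pages && n < max; i++)
		if (mm->node_of[i] == source)
			queue[n++] = i;
	fake_read_unlock(mm);
	return n;
}

/*
 * Phase 2: move the queued pages to @dest.  The target was chosen
 * beforehand, so this runs (and asserts that it runs) without the lock.
 */
static size_t migrate_items(const struct fake_mm *mm, int *placement,
			    const size_t *queue, size_t n, int dest)
{
	size_t i;

	assert(mm->read_locked == 0);
	for (i = 0; i < n; i++)
		placement[queue[i]] = dest;
	return n;
}
```

A caller would invoke queue_items() first and pass its result to migrate_items(); anything that changes between the two phases is simply not seen, which is the kind of race the paragraph above argues was already possible.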
Signed-off-by: Hugh Dickins
---
 include/linux/hugetlb.h |  9 -----
 mm/hugetlb.c            | 38 +++++++++---------
 mm/mempolicy.c          | 85 +++++++++++++++++++++--------------------
 3 files changed, 64 insertions(+), 68 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 6522eb3cd007..9c4265c73f76 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -714,8 +714,6 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 				unsigned long addr, int avoid_reserve);
 struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
 				nodemask_t *nmask, gfp_t gfp_mask);
-struct folio *alloc_hugetlb_folio_vma(struct hstate *h, struct vm_area_struct *vma,
-				unsigned long address);
 int hugetlb_add_to_page_cache(struct folio *folio, struct address_space *mapping,
 			pgoff_t idx);
 void restore_reserve_on_error(struct hstate *h, struct vm_area_struct *vma,
@@ -1024,13 +1022,6 @@ alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
 	return NULL;
 }
 
-static inline struct folio *alloc_hugetlb_folio_vma(struct hstate *h,
-					       struct vm_area_struct *vma,
-					       unsigned long address)
-{
-	return NULL;
-}
-
 static inline int __alloc_bootmem_huge_page(struct hstate *h)
 {
 	return 0;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index ba6d39b71cb1..1af54dbbd7cc 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2479,24 +2479,6 @@ struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
 	return alloc_migrate_hugetlb_folio(h, gfp_mask, preferred_nid, nmask);
 }
 
-/* mempolicy aware migration callback */
-struct folio *alloc_hugetlb_folio_vma(struct hstate *h, struct vm_area_struct *vma,
-		unsigned long address)
-{
-	struct mempolicy *mpol;
-	nodemask_t *nodemask;
-	struct folio *folio;
-	gfp_t gfp_mask;
-	int node;
-
-	gfp_mask = htlb_alloc_mask(h);
-	node = huge_node(vma, address, gfp_mask, &mpol, &nodemask);
-	folio = alloc_hugetlb_folio_nodemask(h, node, nodemask, gfp_mask);
-	mpol_cond_put(mpol);
-
-	return folio;
-}
-
 /*
  * Increase the hugetlb pool such that it can accommodate a reservation
  * of size 'delta'.
@@ -6225,6 +6207,26 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 }
 
 #ifdef CONFIG_USERFAULTFD
+/*
+ * Can probably be eliminated, but still used by hugetlb_mfill_atomic_pte().
+ */
+static struct folio *alloc_hugetlb_folio_vma(struct hstate *h,
+		struct vm_area_struct *vma, unsigned long address)
+{
+	struct mempolicy *mpol;
+	nodemask_t *nodemask;
+	struct folio *folio;
+	gfp_t gfp_mask;
+	int node;
+
+	gfp_mask = htlb_alloc_mask(h);
+	node = huge_node(vma, address, gfp_mask, &mpol, &nodemask);
+	folio = alloc_hugetlb_folio_nodemask(h, node, nodemask, gfp_mask);
+	mpol_cond_put(mpol);
+
+	return folio;
+}
+
 /*
  * Used by userfaultfd UFFDIO_* ioctls. Based on userfaultfd's mfill_atomic_pte
  * with modifications for hugetlb pages.
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index d74df1e1b14a..74b1894d29c1 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -417,6 +417,8 @@ static const struct mempolicy_operations mpol_ops[MPOL_MAX] = {
 
 static bool migrate_folio_add(struct folio *folio, struct list_head *foliolist,
 				unsigned long flags);
+static nodemask_t *policy_nodemask(gfp_t gfp, struct mempolicy *pol,
+				pgoff_t ilx, int *nid);
 
 static bool strictly_unmovable(unsigned long flags)
 {
@@ -1040,6 +1042,8 @@ static long migrate_to_node(struct mm_struct *mm, int source, int dest,
 	node_set(source, nmask);
 
 	VM_BUG_ON(!(flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)));
+
+	mmap_read_lock(mm);
 	vma = find_vma(mm, 0);
 
 	/*
@@ -1050,6 +1054,7 @@ static long migrate_to_node(struct mm_struct *mm, int source, int dest,
 	 */
 	nr_failed = queue_pages_range(mm, vma->vm_start, mm->task_size, &nmask,
 				      flags | MPOL_MF_DISCONTIG_OK, &pagelist);
+	mmap_read_unlock(mm);
 
 	if (!list_empty(&pagelist)) {
 		err = migrate_pages(&pagelist, alloc_migration_target, NULL,
@@ -1078,8 +1083,6 @@ int do_migrate_pages(struct mm_struct *mm, const nodemask_t *from,
 
 	lru_cache_disable();
 
-	mmap_read_lock(mm);
-
 	/*
 	 * Find a 'source' bit set in 'tmp' whose corresponding 'dest'
 	 * bit in 'to' is not also set in 'tmp'. Clear the found 'source'
@@ -1159,7 +1162,6 @@ int do_migrate_pages(struct mm_struct *mm, const nodemask_t *from,
 		if (err < 0)
 			break;
 	}
-	mmap_read_unlock(mm);
 
 	lru_cache_enable();
 	if (err < 0)
@@ -1168,44 +1170,38 @@ int do_migrate_pages(struct mm_struct *mm, const nodemask_t *from,
 }
 
 /*
- * Allocate a new page for page migration based on vma policy.
- * Start by assuming the page is mapped by the same vma as contains @start.
- * Search forward from there, if not. N.B., this assumes that the
- * list of pages handed to migrate_pages()--which is how we get here--
- * is in virtual address order.
+ * Allocate a new folio for page migration, according to NUMA mempolicy.
  */
-static struct folio *new_folio(struct folio *src, unsigned long start)
+static struct folio *alloc_migration_target_by_mpol(struct folio *src,
+						    unsigned long private)
 {
-	struct vm_area_struct *vma;
-	unsigned long address;
-	VMA_ITERATOR(vmi, current->mm, start);
-	gfp_t gfp = GFP_HIGHUSER_MOVABLE | __GFP_RETRY_MAYFAIL;
-
-	for_each_vma(vmi, vma) {
-		address = page_address_in_vma(&src->page, vma);
-		if (address != -EFAULT)
-			break;
-	}
-
-	/*
-	 * __get_vma_policy() now expects a genuine non-NULL vma. Return NULL
-	 * when the page can no longer be located in a vma: that is not ideal
-	 * (migrate_pages() will give up early, presuming ENOMEM), but good
-	 * enough to avoid a crash by syzkaller or concurrent holepunch.
-	 */
-	if (!vma)
-		return NULL;
+	struct mempolicy *pol = (struct mempolicy *)private;
+	pgoff_t ilx = 0;	/* improve on this later */
+	struct page *page;
+	unsigned int order;
+	int nid = numa_node_id();
+	gfp_t gfp;
 
 	if (folio_test_hugetlb(src)) {
-		return alloc_hugetlb_folio_vma(folio_hstate(src),
-				vma, address);
+		nodemask_t *nodemask;
+		struct hstate *h;
+
+		ilx += src->index;	/* HugeTLBfs indexes in hpage_size */
+		h = folio_hstate(src);
+		gfp = htlb_alloc_mask(h);
+		nodemask = policy_nodemask(gfp, pol, ilx, &nid);
+		return alloc_hugetlb_folio_nodemask(h, nid, nodemask, gfp);
 	}
 
 	if (folio_test_large(src))
 		gfp = GFP_TRANSHUGE;
+	else
+		gfp = GFP_HIGHUSER_MOVABLE | __GFP_RETRY_MAYFAIL | __GFP_COMP;
 
-	return vma_alloc_folio(gfp, folio_order(src), vma, address,
-			folio_test_large(src));
+	order = folio_order(src);
+	ilx += src->index >> order;
+	page = alloc_pages_mpol(gfp, order, pol, ilx, nid);
+	return page_rmappable_folio(page);
 }
 
 #else
@@ -1221,7 +1217,8 @@ int do_migrate_pages(struct mm_struct *mm, const nodemask_t *from,
 	return -ENOSYS;
 }
 
-static struct folio *new_folio(struct folio *src, unsigned long start)
+static struct folio *alloc_migration_target_by_mpol(struct folio *src,
+						    unsigned long private)
 {
 	return NULL;
 }
@@ -1295,6 +1292,7 @@ static long do_mbind(unsigned long start, unsigned long len,
 
 	if (nr_failed < 0) {
 		err = nr_failed;
+		nr_failed = 0;
 	} else {
 		vma_iter_init(&vmi, mm, start);
 		prev = vma_prev(&vmi);
@@ -1305,19 +1303,24 @@ static long do_mbind(unsigned long start, unsigned long len,
 		}
 	}
 
-	if (!err) {
-		if (!list_empty(&pagelist)) {
-			nr_failed |= migrate_pages(&pagelist, new_folio, NULL,
-				start, MIGRATE_SYNC, MR_MEMPOLICY_MBIND, NULL);
+	mmap_write_unlock(mm);
+
+	if (!err && !list_empty(&pagelist)) {
+		/* Convert MPOL_DEFAULT's NULL to task or default policy */
+		if (!new) {
+			new = get_task_policy(current);
+			mpol_get(new);
 		}
-		if (nr_failed && (flags & MPOL_MF_STRICT))
-			err = -EIO;
+		nr_failed |= migrate_pages(&pagelist,
+				alloc_migration_target_by_mpol, NULL,
+				(unsigned long)new, MIGRATE_SYNC,
+				MR_MEMPOLICY_MBIND, NULL);
 	}
 
+	if (nr_failed && (flags & MPOL_MF_STRICT))
+		err = -EIO;
 	if (!list_empty(&pagelist))
 		putback_movable_pages(&pagelist);
-
-	mmap_write_unlock(mm);
 mpol_out:
 	mpol_put(new);
 	if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL))

From patchwork Mon Sep 25 08:36:21 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Hugh Dickins
X-Patchwork-Id: 13397491
Date: Mon, 25 Sep 2023 01:36:21 -0700 (PDT)
From: Hugh Dickins
To: Andrew Morton
cc: Andi Kleen, Christoph Lameter, Matthew Wilcox, Mike Kravetz,
    David Hildenbrand, Suren Baghdasaryan, Yang Shi, Sidhartha Kumar,
    Vishal Moola, Kefeng Wang, Greg Kroah-Hartman, Tejun Heo, Mel Gorman,
    Michal Hocko, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH 12/12] mempolicy: migration attempt to match interleave nodes
In-Reply-To: <2d872cef-7787-a7ca-10e-9d45a64c80b4@google.com>
Message-ID: <21a2895b-4590-a0f4-c81c-67e059494583@google.com>
References: <2d872cef-7787-a7ca-10e-9d45a64c80b4@google.com>

Improve alloc_migration_target_by_mpol()'s treatment of MPOL_INTERLEAVE.

Make an effort in do_mbind(), to identify the correct interleave index for
the first page to be migrated, so that it and all subsequent pages from the
same vma will be targeted to precisely their intended nodes. Pages from
following vmas will still be interleaved from the requested nodemask, but
perhaps starting from a different base.

Whether this is worth doing at all, or worth improving further, is
arguable: queue_folio_required() is right not to care about the precise
placement on interleaved nodes; but this little effort seems appropriate.

Signed-off-by: Hugh Dickins
---
 mm/mempolicy.c | 52 +++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 49 insertions(+), 3 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 74b1894d29c1..7bb9ff69879b 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -430,6 +430,11 @@ static bool strictly_unmovable(unsigned long flags)
 			 MPOL_MF_STRICT;
 }
 
+struct migration_mpol {		/* for alloc_migration_target_by_mpol() */
+	struct mempolicy *pol;
+	pgoff_t ilx;
+};
+
 struct queue_pages {
 	struct list_head *pagelist;
 	unsigned long flags;
@@ -1175,8 +1180,9 @@ int do_migrate_pages(struct mm_struct *mm, const nodemask_t *from,
 static struct folio *alloc_migration_target_by_mpol(struct folio *src,
 						    unsigned long private)
 {
-	struct mempolicy *pol = (struct mempolicy *)private;
-	pgoff_t ilx = 0;	/* improve on this later */
+	struct migration_mpol *mmpol = (struct migration_mpol *)private;
+	struct mempolicy *pol = mmpol->pol;
+	pgoff_t ilx = mmpol->ilx;
 	struct page *page;
 	unsigned int order;
 	int nid = numa_node_id();
@@ -1231,6 +1237,7 @@ static long do_mbind(unsigned long start, unsigned long len,
 	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma, *prev;
 	struct vma_iterator vmi;
+	struct migration_mpol mmpol;
 	struct mempolicy *new;
 	unsigned long end;
 	long err;
@@ -1311,9 +1318,48 @@ static long do_mbind(unsigned long start, unsigned long len,
 			new = get_task_policy(current);
 			mpol_get(new);
 		}
+		mmpol.pol = new;
+		mmpol.ilx = 0;
+
+		/*
+		 * In the interleaved case, attempt to allocate on exactly the
+		 * targeted nodes, for the first VMA to be migrated; for later
+		 * VMAs, the nodes will still be interleaved from the targeted
+		 * nodemask, but one by one may be selected differently.
+		 */
+		if (new->mode == MPOL_INTERLEAVE) {
+			struct page *page;
+			unsigned int order;
+			unsigned long addr = -EFAULT;
+
+			list_for_each_entry(page, &pagelist, lru) {
+				if (!PageKsm(page))
+					break;
+			}
+			if (!list_entry_is_head(page, &pagelist, lru)) {
+				vma_iter_init(&vmi, mm, start);
+				for_each_vma_range(vmi, vma, end) {
+					addr = page_address_in_vma(page, vma);
+					if (addr != -EFAULT)
+						break;
+				}
+			}
+			if (addr != -EFAULT) {
+				order = compound_order(page);
+				/* We already know the pol, but not the ilx */
+				mpol_cond_put(get_vma_policy(vma, addr, order,
+							     &mmpol.ilx));
+				/* HugeTLBfs indexes in hpage_size */
+				if (order && PageHuge(page))
+					order = 0;
+				/* Set base from which to increment by index */
+				mmpol.ilx -= page->index >> order;
+			}
+		}
+
 		nr_failed |= migrate_pages(&pagelist,
 				alloc_migration_target_by_mpol, NULL,
-				(unsigned long)new, MIGRATE_SYNC,
+				(unsigned long)&mmpol, MIGRATE_SYNC,
 				MR_MEMPOLICY_MBIND, NULL);
 	}
 
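
The base-and-offset arithmetic in the hunk above can be checked with a small user-space sketch. This is not the kernel code: `ilx_base()`, `ilx_for()` and `interleave_node()` are hypothetical helpers, the interleave index of the first page is taken as a given input rather than derived from a vma, and picking the node by a plain modulo over an array is an assumption standing in for the kernel's nodemask walk. It shows that subtracting the first page's (shifted) index yields a base which, when each page's own index is added back, reproduces the first page's index exactly and advances by one per page.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long pgoff_t;	/* same underlying type as the kernel's pgoff_t */

/*
 * Base such that base + (index >> order) gives first_ilx for the first
 * page.  The subtraction may wrap around; the later addition undoes it.
 */
static pgoff_t ilx_base(pgoff_t first_ilx, pgoff_t first_index,
			unsigned int order)
{
	return first_ilx - (first_index >> order);
}

/* Interleave index for a page at @index, given the base computed above. */
static pgoff_t ilx_for(pgoff_t base, pgoff_t index, unsigned int order)
{
	return base + (index >> order);
}

/* Assumption: choose the (ilx mod nr_nodes)-th entry of a node list. */
static int interleave_node(const int *nodes, size_t nr_nodes, pgoff_t ilx)
{
	assert(nr_nodes > 0);
	return nodes[ilx % nr_nodes];
}
```

For example, if the first page has interleave index 2 and file index 10 at order 0, the base wraps below zero, yet pages at indexes 10, 11 and 13 come out with interleave indexes 2, 3 and 5.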