[0/2] Fixes for hugetlb mapcount at most 1 for shared PMDs

Message ID: 20230126222721.222195-1-mike.kravetz@oracle.com

Return-Path: <owner-linux-mm@kvack.org>
From: Mike Kravetz <mike.kravetz@oracle.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>,
        James Houghton <jthoughton@google.com>, Peter Xu <peterx@redhat.com>,
        David Hildenbrand <david@redhat.com>, Michal Hocko <mhocko@suse.com>,
        Yang Shi <shy828301@gmail.com>, Vishal Moola <vishal.moola@gmail.com>,
        Matthew Wilcox <willy@infradead.org>,
        Muchun Song <songmuchun@bytedance.com>,
        Andrew Morton <akpm@linux-foundation.org>,
        Mike Kravetz <mike.kravetz@oracle.com>
Subject: [PATCH 0/2] Fixes for hugetlb mapcount at most 1 for shared PMDs
Date: Thu, 26 Jan 2023 14:27:19 -0800
Message-Id: <20230126222721.222195-1-mike.kravetz@oracle.com>
Content-Transfer-Encoding: 8bit
Content-Type: text/plain
MIME-Version: 1.0
Sender: owner-linux-mm@kvack.org
Precedence: bulk

Series: Fixes for hugetlb mapcount at most 1 for shared PMDs
  [0/2] Fixes for hugetlb mapcount at most 1 for shared PMDs
  [1/2] mm: hugetlb: proc: check for hugetlb shared PMD in /proc/PID/smaps
  [2/2] migrate: hugetlb: Check for hugetlb shared PMD in node migration


Message

Mike Kravetz Jan. 26, 2023, 10:27 p.m. UTC

This issue of mapcount in hugetlb pages referenced by shared PMDs was
discussed in [1].  The following two patches address user-visible
behavior caused by this issue.

Patches apply to mm-stable as they can also target stable backports.

Ongoing folio conversions cause context conflicts in the second patch
when applied to mm-unstable/linux-next.  I can create separate patch(es)
if people agree with these.

[1] https://lore.kernel.org/linux-mm/Y9BF+OCdWnCSilEu@monkey/

Mike Kravetz (2):
  mm: hugetlb: proc: check for hugetlb shared PMD in /proc/PID/smaps
  migrate: hugetlb: Check for hugetlb shared PMD in node migration

 fs/proc/task_mmu.c      | 10 ++++++++--
 include/linux/hugetlb.h | 12 ++++++++++++
 mm/mempolicy.c          |  3 ++-
 3 files changed, 22 insertions(+), 3 deletions(-)

Comments

Andrew Morton Jan. 26, 2023, 10:47 p.m. UTC | #1

On Thu, 26 Jan 2023 14:27:19 -0800 Mike Kravetz <mike.kravetz@oracle.com> wrote:

> Ongoing folio conversions cause context conflicts in the second patch
> when applied to mm-unstable/linux-next.  I can create separate patch(es)
> if people agree with these.

I fixed things up.  queue_folios_hugetlb() is now

static int queue_folios_hugetlb(pte_t *pte, unsigned long hmask,
			       unsigned long addr, unsigned long end,
			       struct mm_walk *walk)
{
	int ret = 0;
#ifdef CONFIG_HUGETLB_PAGE
	struct queue_pages *qp = walk->private;
	unsigned long flags = (qp->flags & MPOL_MF_VALID);
	struct folio *folio;
	spinlock_t *ptl;
	pte_t entry;

	ptl = huge_pte_lock(hstate_vma(walk->vma), walk->mm, pte);
	entry = huge_ptep_get(pte);
	if (!pte_present(entry))
		goto unlock;
	folio = pfn_folio(pte_pfn(entry));
	if (!queue_folio_required(folio, qp))
		goto unlock;

	if (flags == MPOL_MF_STRICT) {
		/*
		 * STRICT alone means only detecting misplaced folio and no
		 * need to further check other vma.
		 */
		ret = -EIO;
		goto unlock;
	}

	if (!vma_migratable(walk->vma)) {
		/*
		 * Must be STRICT with MOVE*, otherwise .test_walk() have
		 * stopped walking current vma.
		 * Detecting misplaced folio but allow migrating folios which
		 * have been queued.
		 */
		ret = 1;
		goto unlock;
	}

	/*
	 * With MPOL_MF_MOVE, we try to migrate only unshared folios. If it
	 * is shared it is likely not worth migrating.
	 *
	 * To check if the folio is shared, ideally we want to make sure
	 * every page is mapped to the same process. Doing that is very
	 * expensive, so check the estimated mapcount of the folio instead.
	 */
	if (flags & (MPOL_MF_MOVE_ALL) ||
	    (flags & MPOL_MF_MOVE && folio_estimated_mapcount(folio) == 1 &&
	     !hugetlb_pmd_shared(pte))) {
		if (isolate_hugetlb(folio, qp->pagelist) &&
			(flags & MPOL_MF_STRICT))
			/*
			 * Failed to isolate folio but allow migrating pages
			 * which have been queued.
			 */
			ret = 1;
	}
unlock:
	spin_unlock(ptl);
#else
	BUG();
#endif
	return ret;
}
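
To illustrate why the extra hugetlb_pmd_shared() test matters in the
condition above, here is a minimal user-space sketch of that decision.
It is not kernel code.  The SKETCH_* flag values, struct
sketch_hugetlb_mapping and both helper functions are hypothetical names
made up for this illustration.  The sketch assumes that a shared PMD
shows up as more than one reference on the page table page holding it,
while the folio's mapcount stays at 1, which is the situation the
subject line describes.

```c
#include <stdbool.h>

/*
 * Hypothetical flag values for this sketch only; the real MPOL_MF_*
 * constants are defined by the kernel's mempolicy UAPI header.
 */
#define SKETCH_MF_MOVE      (1u << 1)
#define SKETCH_MF_MOVE_ALL  (1u << 2)

/* Toy stand-in for the state the kernel code inspects. */
struct sketch_hugetlb_mapping {
	int folio_mapcount;     /* value folio_estimated_mapcount() would see */
	int pmd_page_refcount;  /* references on the page table page holding the PMD */
};

/*
 * Assumption: a PMD page referenced more than once is shared by
 * several processes.
 */
static bool sketch_pmd_shared(const struct sketch_hugetlb_mapping *m)
{
	return m->pmd_page_refcount > 1;
}

/*
 * Mirrors the shape of the final condition in queue_folios_hugetlb():
 * MOVE_ALL always queues; MOVE queues only when the folio looks
 * unshared by mapcount AND its PMD is not shared.
 */
static bool sketch_should_queue(unsigned int flags,
				const struct sketch_hugetlb_mapping *m)
{
	if (flags & SKETCH_MF_MOVE_ALL)
		return true;
	return (flags & SKETCH_MF_MOVE) &&
	       m->folio_mapcount == 1 &&
	       !sketch_pmd_shared(m);
}
```

With mapcount alone, a folio reached through a shared PMD would look
private (mapcount == 1) and be queued under the MOVE flag.  In this
sketch the second check rejects it, while MOVE_ALL still queues it
regardless of sharing.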

Peter Xu Jan. 26, 2023, 10:48 p.m. UTC | #2

On Thu, Jan 26, 2023 at 02:27:19PM -0800, Mike Kravetz wrote:
> This issue of mapcount in hugetlb pages referenced by shared PMDs was
> discussed in [1].  The following two patches address user-visible
> behavior caused by this issue.
> 
> Patches apply to mm-stable as they can also target stable backports.
> 
> Ongoing folio conversions cause context conflicts in the second patch
> when applied to mm-unstable/linux-next.  I can create separate patch(es)
> if people agree with these.
> 
> [1] https://lore.kernel.org/linux-mm/Y9BF+OCdWnCSilEu@monkey/
> Mike Kravetz (2):
>   mm: hugetlb: proc: check for hugetlb shared PMD in /proc/PID/smaps
>   migrate: hugetlb: Check for hugetlb shared PMD in node migration

Acked-by: Peter Xu <peterx@redhat.com>