From patchwork Sun Oct 27 19:59:34 2024
X-Patchwork-Submitter: Hugh Dickins
X-Patchwork-Id: 13852728
Date: Sun, 27 Oct 2024 12:59:34 -0700 (PDT)
From: Hugh Dickins
To: Andrew Morton
Cc: Usama Arif, Yang Shi, Wei Yang, "Kirill A. Shutemov", Matthew Wilcox,
    David Hildenbrand, Johannes Weiner, Baolin Wang, Barry Song,
    Kefeng Wang, Ryan Roberts, Nhat Pham, Zi Yan, Chris Li, Shakeel Butt,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH hotfix v2 1/2] mm/thp: fix deferred split queue not
 partially_mapped
Message-ID: <81e34a8b-113a-0701-740e-2135c97eb1d7@google.com>
MIME-Version: 1.0
Recent changes are putting more pressure on THP deferred split queues:
under load revealing long-standing races, causing list_del corruptions,
"Bad page state"s and worse (I keep BUGs in both of those, so usually
don't get to see how badly they end up without). The relevant recent
changes being 6.8's mTHP, 6.10's mTHP swapout, and 6.12's mTHP swapin,
improved swap allocation, and underused THP splitting.

The new unlocked list_del_init() in deferred_split_scan() is buggy. I
gave bad advice, it looks plausible since that's a local on-stack list,
but the fact is that it can race with a third party freeing or migrating
the preceding folio (properly unqueueing it with refcount 0 while holding
split_queue_lock), thereby corrupting the list linkage.

The obvious answer would be to take split_queue_lock there: but it has a
long history of contention, so I'm reluctant to add to that. Instead,
make sure that there is always one safe (raised refcount) folio before,
by delaying its folio_put(). (And of course I was wrong to suggest
updating split_queue_len without the lock: leave that until the splice.)
And remove two over-eager partially_mapped checks, restoring those tests
to how they were before: if uncharge_folio() or free_tail_page_prepare()
finds _deferred_list non-empty, it's in trouble whether or not that folio
is partially_mapped (and the flag was already cleared in the latter case).

Fixes: dafff3f4c850 ("mm: split underused THPs")
Signed-off-by: Hugh Dickins
Acked-by: Usama Arif
Reviewed-by: David Hildenbrand
Reviewed-by: Baolin Wang
Acked-by: Zi Yan
---
Based on 6.12-rc4
v2: added ack and reviewed-bys

 mm/huge_memory.c | 21 +++++++++++++++++----
 mm/memcontrol.c  |  3 +--
 mm/page_alloc.c  |  5 ++---
 3 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 2fb328880b50..a1d345f1680c 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3718,8 +3718,8 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
 	struct deferred_split *ds_queue = &pgdata->deferred_split_queue;
 	unsigned long flags;
 	LIST_HEAD(list);
-	struct folio *folio, *next;
-	int split = 0;
+	struct folio *folio, *next, *prev = NULL;
+	int split = 0, removed = 0;
 
 #ifdef CONFIG_MEMCG
 	if (sc->memcg)
@@ -3775,15 +3775,28 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
 		 */
 		if (!did_split && !folio_test_partially_mapped(folio)) {
 			list_del_init(&folio->_deferred_list);
-			ds_queue->split_queue_len--;
+			removed++;
+		} else {
+			/*
+			 * That unlocked list_del_init() above would be unsafe,
+			 * unless its folio is separated from any earlier folios
+			 * left on the list (which may be concurrently unqueued)
+			 * by one safe folio with refcount still raised.
+			 */
+			swap(folio, prev);
 		}
-		folio_put(folio);
+		if (folio)
+			folio_put(folio);
 	}
 
 	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
 	list_splice_tail(&list, &ds_queue->split_queue);
+	ds_queue->split_queue_len -= removed;
 	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
 
+	if (prev)
+		folio_put(prev);
+
 	/*
 	 * Stop shrinker if we didn't split any page, but the queue is empty.
 	 * This can happen if pages were freed under us.

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 7845c64a2c57..2703227cce88 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4631,8 +4631,7 @@ static void uncharge_folio(struct folio *folio, struct uncharge_gather *ug)
 	VM_BUG_ON_FOLIO(folio_test_lru(folio), folio);
 	VM_BUG_ON_FOLIO(folio_order(folio) > 1 &&
 			!folio_test_hugetlb(folio) &&
-			!list_empty(&folio->_deferred_list) &&
-			folio_test_partially_mapped(folio), folio);
+			!list_empty(&folio->_deferred_list), folio);
 
 	/*
 	 * Nobody should be changing or seriously looking at

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8afab64814dc..4b21a368b4e2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -961,9 +961,8 @@ static int free_tail_page_prepare(struct page *head_page, struct page *page)
 		break;
 	case 2:
 		/* the second tail page: deferred_list overlaps ->mapping */
-		if (unlikely(!list_empty(&folio->_deferred_list) &&
-			     folio_test_partially_mapped(folio))) {
-			bad_page(page, "partially mapped folio on deferred list");
+		if (unlikely(!list_empty(&folio->_deferred_list))) {
+			bad_page(page, "on deferred list");
 			goto out;
 		}
 		break;