From patchwork Tue Apr 15 02:45:11 2025
X-Patchwork-Submitter: Muchun Song <songmuchun@bytedance.com>
X-Patchwork-Id: 14051360
From: Muchun Song <songmuchun@bytedance.com>
To: hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev,
	shakeel.butt@linux.dev, muchun.song@linux.dev,
	akpm@linux-foundation.org, david@fromorbit.com,
	zhengqi.arch@bytedance.com, yosry.ahmed@linux.dev,
	nphamcs@gmail.com, chengming.zhou@linux.dev
Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
	linux-mm@kvack.org, hamzamahfooz@linux.microsoft.com,
	apais@linux.microsoft.com, Muchun Song <songmuchun@bytedance.com>
Subject: [PATCH RFC 07/28] mm: thp: use folio_batch to handle THP splitting
 in deferred_split_scan()
Date: Tue, 15 Apr 2025 10:45:11 +0800
Message-Id: <20250415024532.26632-8-songmuchun@bytedance.com>
In-Reply-To: <20250415024532.26632-1-songmuchun@bytedance.com>
References: <20250415024532.26632-1-songmuchun@bytedance.com>
MIME-Version: 1.0
The maintenance of the folio->_deferred_list is intricate because it's
reused in a local list. Here are some peculiarities:

1) When a folio is removed from its split queue and added to a local
   on-stack list in deferred_split_scan(), the ->split_queue_len isn't
   updated, leading to an inconsistency between it and the actual
   number of folios in the split queue.

2) When the folio is split via split_folio() later, it's removed from
   the local list while holding the split queue lock. At this time,
   this lock protects the local list, not the split queue.

3) To handle the race condition with a third-party freeing or
   migrating the preceding folio, we must ensure there's always one
   safe (with raised refcount) folio before by delaying its
   folio_put(). More details can be found in commit e66f3185fa04. It's
   rather tricky.

We can use the folio_batch infrastructure to handle this clearly. In
this case, ->split_queue_len will be consistent with the real number
of folios in the split queue. If list_empty(&folio->_deferred_list)
returns false, it's clear the folio must be in its split queue (not in
a local list anymore).

In the future, we aim to reparent LRU folios during memcg offline to
eliminate dying memory cgroups. This patch prepares for using
folio_split_queue_lock_irqsave() as folio memcg may change then.
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 mm/huge_memory.c | 69 +++++++++++++++++++++---------------------
 1 file changed, 30 insertions(+), 39 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 70820fa75c1f..d2bc943a40e8 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -4220,40 +4220,47 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
 	struct pglist_data *pgdata = NODE_DATA(sc->nid);
 	struct deferred_split *ds_queue = &pgdata->deferred_split_queue;
 	unsigned long flags;
-	LIST_HEAD(list);
-	struct folio *folio, *next, *prev = NULL;
-	int split = 0, removed = 0;
+	struct folio *folio, *next;
+	int split = 0, i;
+	struct folio_batch fbatch;
+	bool done;
 
 #ifdef CONFIG_MEMCG
 	if (sc->memcg)
 		ds_queue = &sc->memcg->deferred_split_queue;
 #endif
-
+	folio_batch_init(&fbatch);
+retry:
+	done = true;
 	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
 	/* Take pin on all head pages to avoid freeing them under us */
 	list_for_each_entry_safe(folio, next, &ds_queue->split_queue,
 				 _deferred_list) {
 		if (folio_try_get(folio)) {
-			list_move(&folio->_deferred_list, &list);
-		} else {
+			folio_batch_add(&fbatch, folio);
+		} else if (folio_test_partially_mapped(folio)) {
 			/* We lost race with folio_put() */
-			if (folio_test_partially_mapped(folio)) {
-				folio_clear_partially_mapped(folio);
-				mod_mthp_stat(folio_order(folio),
-					      MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1);
-			}
-			list_del_init(&folio->_deferred_list);
-			ds_queue->split_queue_len--;
+			folio_clear_partially_mapped(folio);
+			mod_mthp_stat(folio_order(folio),
+				      MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1);
 		}
+		list_del_init(&folio->_deferred_list);
+		ds_queue->split_queue_len--;
 		if (!--sc->nr_to_scan)
 			break;
+		if (folio_batch_space(&fbatch) == 0) {
+			done = false;
+			break;
+		}
 	}
 	split_queue_unlock_irqrestore(ds_queue, flags);
 
-	list_for_each_entry_safe(folio, next, &list, _deferred_list) {
+	for (i = 0; i < folio_batch_count(&fbatch); i++) {
 		bool did_split = false;
 		bool underused = false;
+		struct deferred_split *fqueue;
 
+		folio = fbatch.folios[i];
 		if (!folio_test_partially_mapped(folio)) {
 			underused = thp_underused(folio);
 			if (!underused)
@@ -4269,39 +4276,23 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
 		}
 		folio_unlock(folio);
 next:
+		if (did_split || !folio_test_partially_mapped(folio))
+			continue;
 		/*
-		 * split_folio() removes folio from list on success.
 		 * Only add back to the queue if folio is partially mapped.
 		 * If thp_underused returns false, or if split_folio fails
 		 * in the case it was underused, then consider it used and
 		 * don't add it back to split_queue.
 		 */
-		if (did_split) {
-			; /* folio already removed from list */
-		} else if (!folio_test_partially_mapped(folio)) {
-			list_del_init(&folio->_deferred_list);
-			removed++;
-		} else {
-			/*
-			 * That unlocked list_del_init() above would be unsafe,
-			 * unless its folio is separated from any earlier folios
-			 * left on the list (which may be concurrently unqueued)
-			 * by one safe folio with refcount still raised.
-			 */
-			swap(folio, prev);
-		}
-		if (folio)
-			folio_put(folio);
+		fqueue = folio_split_queue_lock_irqsave(folio, &flags);
+		list_add_tail(&folio->_deferred_list, &fqueue->split_queue);
+		fqueue->split_queue_len++;
+		split_queue_unlock_irqrestore(fqueue, flags);
 	}
+	folios_put(&fbatch);
 
-	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
-	list_splice_tail(&list, &ds_queue->split_queue);
-	ds_queue->split_queue_len -= removed;
-	split_queue_unlock_irqrestore(ds_queue, flags);
-
-	if (prev)
-		folio_put(prev);
-
+	if (!done)
+		goto retry;
 	/*
 	 * Stop shrinker if we didn't split any page, but the queue is empty.
 	 * This can happen if pages were freed under us.