From patchwork Wed Jul 3 12:02:33 2024
X-Patchwork-Submitter: Ge Yang
X-Patchwork-Id: 13722144
From: yangge1116@126.com
To: akpm@linux-foundation.org
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, stable@vger.kernel.org,
 21cnbao@gmail.com, david@redhat.com, baolin.wang@linux.alibaba.com,
 aneesh.kumar@linux.ibm.com, liuzixing@hygon.cn, yangge
Subject: [PATCH V3] mm/gup: Clear the LRU flag of a page before adding to LRU batch
Date: Wed, 3 Jul 2024 20:02:33 +0800
Message-Id: <1720008153-16035-1-git-send-email-yangge1116@126.com>
X-Mailer: git-send-email 2.7.4

From: yangge

If a large amount of CMA memory is configured in the system (for
example, CMA memory accounts for 50% of system memory), starting a
virtual machine with device passthrough will call
pin_user_pages_remote(..., FOLL_LONGTERM, ...) to pin memory.
Normally, if a page is present and in a CMA area,
pin_user_pages_remote() will migrate it from the CMA area to a
non-CMA area because of the FOLL_LONGTERM flag. But the current code
causes the migration to fail due to unexpected page refcounts, which
eventually causes the virtual machine to fail to start.

Adding a page to an LRU batch increases its refcount by one, and
removing it from the batch decreases it by one. Page migration
requires that the page not be referenced by anything other than its
page mappings. Before migrating a page, we should therefore try to
drain it from the LRU batch in case it sits in one. However,
folio_test_lru() is not sufficient to tell whether the page is in an
LRU batch, and if it is, the migration will fail.
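(Illustration only, not part of this patch: a rough model of the
accounting described above. refs_allow_migration() and its
expected_refs argument are hypothetical stand-ins for the real checks
in mm/migrate.c.)

	#include <linux/mm.h>

	/*
	 * While a folio sits in a per-CPU LRU batch, the batch's
	 * folio_get() holds one extra reference that no mapping
	 * accounts for, so a check of this shape keeps failing until
	 * the batch is drained and that reference is dropped.
	 */
	static bool refs_allow_migration(struct folio *folio, long expected_refs)
	{
		return folio_ref_count(folio) == expected_refs;
	}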
To solve the problem above, we modify the logic of adding to an LRU
batch: before adding a page to an LRU batch, we clear its LRU flag,
so that folio_test_lru() can be used to check whether the page is in
an LRU batch. Keeping the LRU flag invisible for a longer time should
not be a problem, because when a new page is allocated from the buddy
allocator and added to an LRU batch, its LRU flag is likewise not
visible for a long time.

Fixes: 9a4e9f3b2d73 ("mm: update get_user_pages_longterm to migrate pages allocated from CMA region")
Cc: 
Signed-off-by: yangge
---
 mm/swap.c | 43 +++++++++++++++++++++++++++++++------------
 1 file changed, 31 insertions(+), 12 deletions(-)

V3: Add fixes tag
V2: Adjust code and commit message according to David's comments
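For readability, here is the idiom the diff below applies at each
batching site, extracted from the hunks (the fbatch field and move
function shown are one example; they vary per call site):

	folio_get(folio);
	if (!folio_test_clear_lru(folio)) {
		/* flag was already clear: off the LRU, or already batched */
		folio_put(folio);
		return;
	}
	/* LRU flag stays clear while the folio sits in the batch */
	local_lock(&cpu_fbatches.lock);
	fbatch = this_cpu_ptr(&cpu_fbatches.lru_deactivate);
	folio_batch_add_and_move(fbatch, folio, lru_deactivate_fn);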
diff --git a/mm/swap.c b/mm/swap.c
index dc205bd..9caf6b0 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -211,10 +211,6 @@ static void folio_batch_move_lru(struct folio_batch *fbatch, move_fn_t move_fn)
 	for (i = 0; i < folio_batch_count(fbatch); i++) {
 		struct folio *folio = fbatch->folios[i];
 
-		/* block memcg migration while the folio moves between lru */
-		if (move_fn != lru_add_fn && !folio_test_clear_lru(folio))
-			continue;
-
 		folio_lruvec_relock_irqsave(folio, &lruvec, &flags);
 		move_fn(lruvec, folio);
 
@@ -255,11 +251,16 @@ static void lru_move_tail_fn(struct lruvec *lruvec, struct folio *folio)
 void folio_rotate_reclaimable(struct folio *folio)
 {
 	if (!folio_test_locked(folio) && !folio_test_dirty(folio) &&
-	    !folio_test_unevictable(folio) && folio_test_lru(folio)) {
+	    !folio_test_unevictable(folio)) {
 		struct folio_batch *fbatch;
 		unsigned long flags;
 
 		folio_get(folio);
+		if (!folio_test_clear_lru(folio)) {
+			folio_put(folio);
+			return;
+		}
+
 		local_lock_irqsave(&lru_rotate.lock, flags);
 		fbatch = this_cpu_ptr(&lru_rotate.fbatch);
 		folio_batch_add_and_move(fbatch, folio, lru_move_tail_fn);
@@ -352,11 +353,15 @@ static void folio_activate_drain(int cpu)
 
 void folio_activate(struct folio *folio)
 {
-	if (folio_test_lru(folio) && !folio_test_active(folio) &&
-	    !folio_test_unevictable(folio)) {
+	if (!folio_test_active(folio) && !folio_test_unevictable(folio)) {
 		struct folio_batch *fbatch;
 
 		folio_get(folio);
+		if (!folio_test_clear_lru(folio)) {
+			folio_put(folio);
+			return;
+		}
+
 		local_lock(&cpu_fbatches.lock);
 		fbatch = this_cpu_ptr(&cpu_fbatches.activate);
 		folio_batch_add_and_move(fbatch, folio, folio_activate_fn);
@@ -700,6 +705,11 @@ void deactivate_file_folio(struct folio *folio)
 		return;
 
 	folio_get(folio);
+	if (!folio_test_clear_lru(folio)) {
+		folio_put(folio);
+		return;
+	}
+
 	local_lock(&cpu_fbatches.lock);
 	fbatch = this_cpu_ptr(&cpu_fbatches.lru_deactivate_file);
 	folio_batch_add_and_move(fbatch, folio, lru_deactivate_file_fn);
@@ -716,11 +726,16 @@ void deactivate_file_folio(struct folio *folio)
  */
 void folio_deactivate(struct folio *folio)
 {
-	if (folio_test_lru(folio) && !folio_test_unevictable(folio) &&
-	    (folio_test_active(folio) || lru_gen_enabled())) {
+	if (!folio_test_unevictable(folio) && (folio_test_active(folio) ||
+	    lru_gen_enabled())) {
 		struct folio_batch *fbatch;
 
 		folio_get(folio);
+		if (!folio_test_clear_lru(folio)) {
+			folio_put(folio);
+			return;
+		}
+
 		local_lock(&cpu_fbatches.lock);
 		fbatch = this_cpu_ptr(&cpu_fbatches.lru_deactivate);
 		folio_batch_add_and_move(fbatch, folio, lru_deactivate_fn);
@@ -737,12 +752,16 @@ void folio_deactivate(struct folio *folio)
  */
 void folio_mark_lazyfree(struct folio *folio)
 {
-	if (folio_test_lru(folio) && folio_test_anon(folio) &&
-	    folio_test_swapbacked(folio) && !folio_test_swapcache(folio) &&
-	    !folio_test_unevictable(folio)) {
+	if (folio_test_anon(folio) && folio_test_swapbacked(folio) &&
+	    !folio_test_swapcache(folio) && !folio_test_unevictable(folio)) {
 		struct folio_batch *fbatch;
 
 		folio_get(folio);
+		if (!folio_test_clear_lru(folio)) {
+			folio_put(folio);
+			return;
+		}
+
 		local_lock(&cpu_fbatches.lock);
 		fbatch = this_cpu_ptr(&cpu_fbatches.lru_lazyfree);
 		folio_batch_add_and_move(fbatch, folio, lru_lazyfree_fn);
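(Illustration only, not part of the diff: with every batching site now
clearing the flag first, a migration path can treat a clear LRU flag
as "possibly parked in some CPU's LRU batch" and drain before checking
refcounts; this mirrors the pattern the longterm-pin path in mm/gup.c
relies on.)

	#include <linux/mm.h>
	#include <linux/swap.h>

	/*
	 * Sketch: flush per-CPU LRU batches when the folio we want to
	 * migrate is not visibly on the LRU, so any batch-held
	 * reference is dropped before migration's refcount check runs.
	 */
	static void drain_if_possibly_batched(struct folio *folio)
	{
		if (!folio_test_lru(folio))
			lru_add_drain_all();
	}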