From patchwork Sun Jan 28 13:28:49 2024
X-Patchwork-Submitter: Chengming Zhou
X-Patchwork-Id: 13534468
From: Chengming Zhou
Date: Sun, 28 Jan 2024 13:28:49 +0000
Subject: [PATCH v2 1/3] mm/zswap: don't return LRU_SKIP if we have dropped lru lock
Message-Id: <20240126-zswap-writeback-race-v2-1-b10479847099@bytedance.com>
References: <20240126-zswap-writeback-race-v2-0-b10479847099@bytedance.com>
In-Reply-To: <20240126-zswap-writeback-race-v2-0-b10479847099@bytedance.com>
To: Johannes Weiner, Yosry Ahmed, Nhat Pham, Andrew Morton, Chris Li
Cc: Chengming Zhou, Johannes Weiner, Nhat Pham, linux-kernel@vger.kernel.org, linux-mm@kvack.org

LRU_SKIP can only be returned if we have never dropped the lru lock;
otherwise we need to return LRU_RETRY to restart from the head of the
lru list. If we returned LRU_SKIP after dropping the lock, the
iteration could continue from a cursor position that was freed while
the lock was dropped.

We may actually need to introduce another status, LRU_STOP, to really
terminate the ongoing shrinking scan when we encounter a warm page
already in the swap cache, but the current list_lru implementation has
no way to break out of __list_lru_walk_one() early.

Fixes: b5ba474f3f51 ("zswap: shrink zswap pool based on memory pressure")
Acked-by: Johannes Weiner
Reviewed-by: Nhat Pham
Signed-off-by: Chengming Zhou
Acked-by: Yosry Ahmed
---
 mm/zswap.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index 00e90b9b5417..81cb3790e0dd 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -901,10 +901,8 @@ static enum lru_status shrink_memcg_cb(struct list_head *item, struct list_lru_o
		 * into the warmer region. We should terminate shrinking (if we're in the dynamic
		 * shrinker context).
		 */
-		if (writeback_result == -EEXIST && encountered_page_in_swapcache) {
-			ret = LRU_SKIP;
+		if (writeback_result == -EEXIST && encountered_page_in_swapcache)
			*encountered_page_in_swapcache = true;
-		}

		goto put_unlock;
	}

From patchwork Sun Jan 28 13:28:50 2024
X-Patchwork-Submitter: Chengming Zhou
X-Patchwork-Id: 13534469
From: Chengming Zhou
Date: Sun, 28 Jan 2024 13:28:50 +0000
Subject: [PATCH v2 2/3] mm/zswap: fix race between lru writeback and swapoff
Message-Id: <20240126-zswap-writeback-race-v2-2-b10479847099@bytedance.com>
References: <20240126-zswap-writeback-race-v2-0-b10479847099@bytedance.com>
In-Reply-To: <20240126-zswap-writeback-race-v2-0-b10479847099@bytedance.com>
To: Johannes Weiner, Yosry Ahmed, Nhat Pham, Andrew Morton, Chris Li
Cc: Chengming Zhou, Johannes Weiner, Nhat Pham, linux-kernel@vger.kernel.org, linux-mm@kvack.org

LRU writeback races with swapoff, as spotted by Yosry [1]:

CPU1                      CPU2
shrink_memcg_cb           swap_off
  list_lru_isolate          zswap_invalidate
                            zswap_swapoff
                              kfree(tree)
  // UAF
  spin_lock(&tree->lock)

The problem is that the entry on the lru list can't protect the tree
from being swapped off and freed, and the entry can also be invalidated
and freed concurrently after we unlock the lru lock. Fix it by moving
the swap cache allocation ahead, before referencing the tree; then
check for the invalidate race under the tree lock, and only after that
safely dereference the entry. Note that we can't dereference the entry
or the tree anymore once we unlock the folio, since we depend on the
locked folio to hold off swapoff.

So this patch moves all tree and entry usage into
zswap_writeback_entry(): we only use the swpentry copied to the stack
to allocate the swap cache, and once we get back a locked folio we can
reference the tree safely. Then we check for the invalidate race under
the tree lock; the rest is much the same as in zswap_load().

Since we can't dereference the entry after zswap_writeback_entry(), we
can't use zswap_lru_putback() anymore; instead we rotate the entry to
the tail of the lru at the beginning. It will be unlinked and freed by
invalidation if writeback succeeds.
Another change is that we don't update the memcg nr_zswap_protected in
the -ENOMEM and -EEXIST cases anymore. -EEXIST means we raced with
swapin or with a concurrent shrinker action: swapin has already updated
memcg nr_zswap_protected, so there is no need to double count here, and
in the concurrent shrinker case the folio will be written back and
freed anyway. -ENOMEM is extremely rare and doesn't happen spuriously
either, so don't bother distinguishing that case.

[1] https://lore.kernel.org/all/CAJD7tkasHsRnT_75-TXsEe58V9_OW6m3g6CF7Kmsvz8CKRG_EA@mail.gmail.com/

Acked-by: Johannes Weiner
Acked-by: Nhat Pham
Signed-off-by: Chengming Zhou
---
 mm/zswap.c | 114 ++++++++++++++++++++++++++-----------------------------------
 1 file changed, 49 insertions(+), 65 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index 81cb3790e0dd..f5cb5a46e4d7 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -277,7 +277,7 @@
	 zpool_get_type((p)->zpools[0]))

 static int zswap_writeback_entry(struct zswap_entry *entry,
-				 struct zswap_tree *tree);
+				 swp_entry_t swpentry);
 static int zswap_pool_get(struct zswap_pool *pool);
 static void zswap_pool_put(struct zswap_pool *pool);

@@ -445,27 +445,6 @@ static void zswap_lru_del(struct list_lru *list_lru, struct zswap_entry *entry)
	rcu_read_unlock();
 }

-static void zswap_lru_putback(struct list_lru *list_lru,
-		struct zswap_entry *entry)
-{
-	int nid = entry_to_nid(entry);
-	spinlock_t *lock = &list_lru->node[nid].lock;
-	struct mem_cgroup *memcg;
-	struct lruvec *lruvec;
-
-	rcu_read_lock();
-	memcg = mem_cgroup_from_entry(entry);
-	spin_lock(lock);
-	/* we cannot use list_lru_add here, because it increments node's lru count */
-	list_lru_putback(list_lru, &entry->lru, nid, memcg);
-	spin_unlock(lock);
-
-	lruvec = mem_cgroup_lruvec(memcg, NODE_DATA(entry_to_nid(entry)));
-	/* increment the protection area to account for the LRU rotation. */
-	atomic_long_inc(&lruvec->zswap_lruvec_state.nr_zswap_protected);
-	rcu_read_unlock();
-}
-
 /*********************************
 * rbtree functions
 **********************************/
@@ -860,40 +839,47 @@ static enum lru_status shrink_memcg_cb(struct list_head *item, struct list_lru_o
 {
	struct zswap_entry *entry = container_of(item, struct zswap_entry, lru);
	bool *encountered_page_in_swapcache = (bool *)arg;
-	struct zswap_tree *tree;
-	pgoff_t swpoffset;
+	swp_entry_t swpentry;
	enum lru_status ret = LRU_REMOVED_RETRY;
	int writeback_result;

+	/*
+	 * Rotate the entry to the tail before unlocking the LRU,
+	 * so that in case of an invalidation race concurrent
+	 * reclaimers don't waste their time on it.
+	 *
+	 * If writeback succeeds, or failure is due to the entry
+	 * being invalidated by the swap subsystem, the invalidation
+	 * will unlink and free it.
+	 *
+	 * Temporary failures, where the same entry should be tried
+	 * again immediately, almost never happen for this shrinker.
+	 * We don't do any trylocking; -ENOMEM comes closest,
+	 * but that's extremely rare and doesn't happen spuriously
+	 * either. Don't bother distinguishing this case.
+	 *
+	 * But since they do exist in theory, the entry cannot just
+	 * be unlinked, or we could leak it. Hence, rotate.
+	 */
+	list_move_tail(item, &l->list);
+
	/*
	 * Once the lru lock is dropped, the entry might get freed. The
-	 * swpoffset is copied to the stack, and entry isn't deref'd again
+	 * swpentry is copied to the stack, and entry isn't deref'd again
	 * until the entry is verified to still be alive in the tree.
	 */
-	swpoffset = swp_offset(entry->swpentry);
-	tree = swap_zswap_tree(entry->swpentry);
-	list_lru_isolate(l, item);
+	swpentry = entry->swpentry;
+
	/*
	 * It's safe to drop the lock here because we return either
	 * LRU_REMOVED_RETRY or LRU_RETRY.
	 */
	spin_unlock(lock);

-	/* Check for invalidate() race */
-	spin_lock(&tree->lock);
-	if (entry != zswap_rb_search(&tree->rbroot, swpoffset))
-		goto unlock;
-
-	/* Hold a reference to prevent a free during writeback */
-	zswap_entry_get(entry);
-	spin_unlock(&tree->lock);
-
-	writeback_result = zswap_writeback_entry(entry, tree);
+	writeback_result = zswap_writeback_entry(entry, swpentry);

-	spin_lock(&tree->lock);
	if (writeback_result) {
		zswap_reject_reclaim_fail++;
-		zswap_lru_putback(&entry->pool->list_lru, entry);
		ret = LRU_RETRY;

		/*
@@ -903,27 +889,10 @@ static enum lru_status shrink_memcg_cb(struct list_head *item, struct list_lru_o
		 */
		if (writeback_result == -EEXIST && encountered_page_in_swapcache)
			*encountered_page_in_swapcache = true;
-
-		goto put_unlock;
+	} else {
+		zswap_written_back_pages++;
	}
-	zswap_written_back_pages++;
-
-	if (entry->objcg)
-		count_objcg_event(entry->objcg, ZSWPWB);
-	count_vm_event(ZSWPWB);
-	/*
-	 * Writeback started successfully, the page now belongs to the
-	 * swapcache. Drop the entry from zswap - unless invalidate already
-	 * took it out while we had the tree->lock released for IO.
-	 */
-	zswap_invalidate_entry(tree, entry);
-
-put_unlock:
-	/* Drop local reference */
-	zswap_entry_put(entry);
-unlock:
-	spin_unlock(&tree->lock);
	spin_lock(lock);
	return ret;
 }
@@ -1408,9 +1377,9 @@ static void __zswap_load(struct zswap_entry *entry, struct page *page)
 * freed.
 */
 static int zswap_writeback_entry(struct zswap_entry *entry,
-				 struct zswap_tree *tree)
+				 swp_entry_t swpentry)
 {
-	swp_entry_t swpentry = entry->swpentry;
+	struct zswap_tree *tree;
	struct folio *folio;
	struct mempolicy *mpol;
	bool folio_was_allocated;
@@ -1426,9 +1395,11 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
		return -ENOMEM;

	/*
-	 * Found an existing folio, we raced with load/swapin. We generally
-	 * writeback cold folios from zswap, and swapin means the folio just
-	 * became hot. Skip this folio and let the caller find another one.
+	 * Found an existing folio, we raced with swapin or concurrent
+	 * shrinker. We generally writeback cold folios from zswap, and
+	 * swapin means the folio just became hot, so skip this folio.
+	 * For unlikely concurrent shrinker case, it will be unlinked
+	 * and freed when invalidated by the concurrent shrinker anyway.
	 */
	if (!folio_was_allocated) {
		folio_put(folio);
@@ -1442,18 +1413,31 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
	 * backs (our zswap_entry reference doesn't prevent that), to
	 * avoid overwriting a new swap folio with old compressed data.
	 */
+	tree = swap_zswap_tree(swpentry);
	spin_lock(&tree->lock);
-	if (zswap_rb_search(&tree->rbroot, swp_offset(entry->swpentry)) != entry) {
+	if (zswap_rb_search(&tree->rbroot, swp_offset(swpentry)) != entry) {
		spin_unlock(&tree->lock);
		delete_from_swap_cache(folio);
		folio_unlock(folio);
		folio_put(folio);
		return -ENOMEM;
	}
+
+	/* Safe to deref entry after the entry is verified above. */
+	zswap_entry_get(entry);
	spin_unlock(&tree->lock);

	__zswap_load(entry, &folio->page);

+	count_vm_event(ZSWPWB);
+	if (entry->objcg)
+		count_objcg_event(entry->objcg, ZSWPWB);
+
+	spin_lock(&tree->lock);
+	zswap_invalidate_entry(tree, entry);
+	zswap_entry_put(entry);
+	spin_unlock(&tree->lock);
+
	/* folio is up to date */
	folio_mark_uptodate(folio);

From patchwork Sun Jan 28 13:28:51 2024
X-Patchwork-Submitter: Chengming Zhou
X-Patchwork-Id: 13534470
From: Chengming Zhou
Date: Sun, 28 Jan 2024 13:28:51 +0000
Subject: [PATCH v2 3/3] mm/list_lru: remove list_lru_putback()
Message-Id: <20240126-zswap-writeback-race-v2-3-b10479847099@bytedance.com>
References: <20240126-zswap-writeback-race-v2-0-b10479847099@bytedance.com>
In-Reply-To: <20240126-zswap-writeback-race-v2-0-b10479847099@bytedance.com>
To: Johannes Weiner, Yosry Ahmed, Nhat Pham, Andrew Morton, Chris Li
Cc: Chengming Zhou, Johannes Weiner, Nhat Pham, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Since the only user zswap_lru_putback() has gone, remove
list_lru_putback() too.
Signed-off-by: Chengming Zhou
Acked-by: Johannes Weiner
Acked-by: Nhat Pham
Acked-by: Yosry Ahmed
---
 include/linux/list_lru.h | 16 ----------------
 mm/list_lru.c            | 14 --------------
 mm/zswap.c               |  2 +-
 3 files changed, 1 insertion(+), 31 deletions(-)

diff --git a/include/linux/list_lru.h b/include/linux/list_lru.h
index c679e6b293c4..f2882a820690 100644
--- a/include/linux/list_lru.h
+++ b/include/linux/list_lru.h
@@ -168,22 +168,6 @@ static inline unsigned long list_lru_count(struct list_lru *lru)
 void list_lru_isolate(struct list_lru_one *list, struct list_head *item);
 void list_lru_isolate_move(struct list_lru_one *list, struct list_head *item,
			   struct list_head *head);
-/**
- * list_lru_putback: undo list_lru_isolate
- * @lru: the lru pointer.
- * @item: the item to put back.
- * @nid: the node id of the sublist to put the item back to.
- * @memcg: the cgroup of the sublist to put the item back to.
- *
- * Put back an isolated item into its original LRU. Note that unlike
- * list_lru_add, this does not increment the node LRU count (as
- * list_lru_isolate does not originally decrement this count).
- *
- * Since we might have dropped the LRU lock in between, recompute list_lru_one
- * from the node's id and memcg.
- */
-void list_lru_putback(struct list_lru *lru, struct list_head *item, int nid,
-		      struct mem_cgroup *memcg);

 typedef enum lru_status (*list_lru_walk_cb)(struct list_head *item,
		struct list_lru_one *list, spinlock_t *lock, void *cb_arg);

diff --git a/mm/list_lru.c b/mm/list_lru.c
index 158781d1d3c2..61f3b6b1134f 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -162,20 +162,6 @@ void list_lru_isolate_move(struct list_lru_one *list, struct list_head *item,
 }
 EXPORT_SYMBOL_GPL(list_lru_isolate_move);

-void list_lru_putback(struct list_lru *lru, struct list_head *item, int nid,
-		      struct mem_cgroup *memcg)
-{
-	struct list_lru_one *list =
-		list_lru_from_memcg_idx(lru, nid, memcg_kmem_id(memcg));
-
-	if (list_empty(item)) {
-		list_add_tail(item, &list->list);
-		if (!list->nr_items++)
-			set_shrinker_bit(memcg, nid, lru_shrinker_id(lru));
-	}
-}
-EXPORT_SYMBOL_GPL(list_lru_putback);
-
 unsigned long list_lru_count_one(struct list_lru *lru,
				 int nid, struct mem_cgroup *memcg)
 {

diff --git a/mm/zswap.c b/mm/zswap.c
index f5cb5a46e4d7..de68a5928527 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -411,7 +411,7 @@ static void zswap_lru_add(struct list_lru *list_lru, struct zswap_entry *entry)
	 * 2. list_lru_add() is called after memcg->kmemcg_id is updated. The
	 *    new entry will be added directly to memcg's parent's list_lru.
	 *
-	 * Similar reasoning holds for list_lru_del() and list_lru_putback().
+	 * Similar reasoning holds for list_lru_del().
	 */
	rcu_read_lock();
	memcg = mem_cgroup_from_entry(entry);