From patchwork Tue Feb 25 21:32:00 2025
X-Patchwork-Submitter: Nhat Pham
X-Patchwork-Id: 13991006
From: Nhat Pham <nphamcs@gmail.com>
To: akpm@linux-foundation.org
Cc: hannes@cmpxchg.org, yosryahmed@google.com, chengming.zhou@linux.dev,
 linux-mm@kvack.org, kernel-team@meta.com, linux-kernel@vger.kernel.org
Subject: [PATCH] zswap: do not crash the kernel on decompression failure
Date: Tue, 25 Feb 2025 13:32:00 -0800
Message-ID: <20250225213200.729056-1-nphamcs@gmail.com>

Currently, we crash the kernel when a decompression failure occurs in zswap
(either because of memory corruption, or a bug in the compression
algorithm). This is overkill.
We should only SIGBUS the unfortunate process asking for the zswap entry
on zswap load, and skip the corrupted entry in zswap writeback.

See [1] for a recent upstream discussion about this.

[1]: https://lore.kernel.org/all/ZsiLElTykamcYZ6J@casper.infradead.org/

Suggested-by: Matthew Wilcox
Suggested-by: Yosry Ahmed
Signed-off-by: Nhat Pham
---
 mm/zswap.c | 85 +++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 58 insertions(+), 27 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index f6316b66fb23..31d4397eed61 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -62,6 +62,8 @@ static u64 zswap_reject_reclaim_fail;
 static u64 zswap_reject_compress_fail;
 /* Compressed page was too big for the allocator to (optimally) store */
 static u64 zswap_reject_compress_poor;
+/* Load and writeback failed due to decompression failure */
+static u64 zswap_reject_decompress_fail;
 /* Store failed because underlying allocator could not get memory */
 static u64 zswap_reject_alloc_fail;
 /* Store failed because the entry metadata could not be allocated (rare) */
@@ -953,11 +955,12 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
 	return comp_ret == 0 && alloc_ret == 0;
 }
 
-static void zswap_decompress(struct zswap_entry *entry, struct folio *folio)
+static bool zswap_decompress(struct zswap_entry *entry, struct folio *folio)
 {
 	struct zpool *zpool = entry->pool->zpool;
 	struct scatterlist input, output;
 	struct crypto_acomp_ctx *acomp_ctx;
+	bool ret = true;
 	u8 *src;
 
 	acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
@@ -984,12 +987,19 @@ static void zswap_decompress(struct zswap_entry *entry, struct folio *folio)
 	sg_init_table(&output, 1);
 	sg_set_folio(&output, folio, PAGE_SIZE, 0);
 	acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, PAGE_SIZE);
-	BUG_ON(crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait));
-	BUG_ON(acomp_ctx->req->dlen != PAGE_SIZE);
+	if (crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait) ||
+			acomp_ctx->req->dlen != PAGE_SIZE) {
+		ret = false;
+		zswap_reject_decompress_fail++;
+		pr_alert_ratelimited(
+			"decompression failed on zswap entry with offset %08lx\n",
+			entry->swpentry.val);
+	}
 	mutex_unlock(&acomp_ctx->mutex);
 
 	if (src != acomp_ctx->buffer)
 		zpool_unmap_handle(zpool, entry->handle);
+	return ret;
 }
 
 /*********************************
@@ -1018,6 +1028,7 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
 	struct writeback_control wbc = {
 		.sync_mode = WB_SYNC_NONE,
 	};
+	int ret = 0;
 
 	/* try to allocate swap cache folio */
 	mpol = get_task_policy(current);
@@ -1034,8 +1045,8 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
 	 * and freed when invalidated by the concurrent shrinker anyway.
 	 */
 	if (!folio_was_allocated) {
-		folio_put(folio);
-		return -EEXIST;
+		ret = -EEXIST;
+		goto put_folio;
 	}
 
 	/*
@@ -1048,14 +1059,17 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
 	 * be dereferenced.
 	 */
 	tree = swap_zswap_tree(swpentry);
-	if (entry != xa_cmpxchg(tree, offset, entry, NULL, GFP_KERNEL)) {
-		delete_from_swap_cache(folio);
-		folio_unlock(folio);
-		folio_put(folio);
-		return -ENOMEM;
+	if (entry != xa_load(tree, offset)) {
+		ret = -ENOMEM;
+		goto fail;
 	}
 
-	zswap_decompress(entry, folio);
+	if (!zswap_decompress(entry, folio)) {
+		ret = -EIO;
+		goto fail;
+	}
+
+	xa_erase(tree, offset);
 
 	count_vm_event(ZSWPWB);
 	if (entry->objcg)
@@ -1071,9 +1085,14 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
 
 	/* start writeback */
 	__swap_writepage(folio, &wbc);
-	folio_put(folio);
+	goto put_folio;
 
-	return 0;
+fail:
+	delete_from_swap_cache(folio);
+	folio_unlock(folio);
+put_folio:
+	folio_put(folio);
+	return ret;
 }
 
 /*********************************
@@ -1600,6 +1619,29 @@ bool zswap_load(struct folio *folio)
 	if (WARN_ON_ONCE(folio_test_large(folio)))
 		return true;
 
+	/*
+	 * We cannot invalidate the zswap entry before decompressing it. If
+	 * decompression fails, we must keep the entry in the tree so that
+	 * a future read by another process on the same swap entry will also
+	 * have to go through zswap. Otherwise, we risk silently reading
+	 * corrupted data for the other process.
+	 */
+	entry = xa_load(tree, offset);
+	if (!entry)
+		return false;
+
+	/*
+	 * If decompression fails, we return true to notify the caller that the
+	 * folio's data were in zswap, but do not mark the folio as up-to-date.
+	 * This will effectively SIGBUS the calling process.
+	 */
+	if (!zswap_decompress(entry, folio))
+		return true;
+
+	count_vm_event(ZSWPIN);
+	if (entry->objcg)
+		count_objcg_events(entry->objcg, ZSWPIN, 1);
+
 	/*
 	 * When reading into the swapcache, invalidate our entry. The
 	 * swapcache can be the authoritative owner of the page and
@@ -1612,21 +1654,8 @@ bool zswap_load(struct folio *folio)
 	 * files, which reads into a private page and may free it if
 	 * the fault fails. We remain the primary owner of the entry.)
 	 */
-	if (swapcache)
-		entry = xa_erase(tree, offset);
-	else
-		entry = xa_load(tree, offset);
-
-	if (!entry)
-		return false;
-
-	zswap_decompress(entry, folio);
-
-	count_vm_event(ZSWPIN);
-	if (entry->objcg)
-		count_objcg_events(entry->objcg, ZSWPIN, 1);
-
 	if (swapcache) {
+		xa_erase(tree, offset);
 		zswap_entry_free(entry);
 		folio_mark_dirty(folio);
 	}
@@ -1727,6 +1756,8 @@ static int zswap_debugfs_init(void)
 			   zswap_debugfs_root, &zswap_reject_compress_fail);
 	debugfs_create_u64("reject_compress_poor", 0444,
 			   zswap_debugfs_root, &zswap_reject_compress_poor);
+	debugfs_create_u64("reject_decompress_fail", 0444,
+			   zswap_debugfs_root, &zswap_reject_decompress_fail);
 	debugfs_create_u64("written_back_pages", 0444,
 			   zswap_debugfs_root, &zswap_written_back_pages);
 	debugfs_create_file("pool_total_size", 0444,
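
For illustration only, here is a minimal userspace sketch of the ordering the zswap_load() change relies on: look the entry up, try to decompress it, and erase it from the tree only after decompression has succeeded. Every name in it (demo_entry, demo_tree, demo_store, demo_load, demo_decompress_fail) is hypothetical and not kernel API. The "corrupted" flag stands in for a failing crypto_acomp_decompress(), and the three-way result is a simplification of the real bool return value combined with the folio's up-to-date state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_PAGE_SIZE 16
#define DEMO_SLOTS 4

/* Hypothetical stand-in for a zswap entry: a payload plus a corruption flag. */
struct demo_entry {
	bool present;
	bool corrupted;
	unsigned char data[DEMO_PAGE_SIZE];
};

/* Hypothetical stand-in for the per-swapfile tree, indexed by offset. */
static struct demo_entry demo_tree[DEMO_SLOTS];

/* Models the new reject_decompress_fail debugfs counter. */
static unsigned long demo_decompress_fail;

enum demo_load_result {
	DEMO_MISS,	/* no entry for this offset */
	DEMO_LOADED,	/* data copied out, entry invalidated */
	DEMO_FAILED,	/* entry exists but could not be decompressed */
};

static void demo_store(size_t slot, unsigned char fill, bool corrupted)
{
	assert(slot < DEMO_SLOTS);
	demo_tree[slot].present = true;
	demo_tree[slot].corrupted = corrupted;
	memset(demo_tree[slot].data, fill, DEMO_PAGE_SIZE);
}

/* Report failure to the caller instead of aborting, and count it. */
static bool demo_decompress(const struct demo_entry *e, unsigned char *out)
{
	if (e->corrupted) {
		demo_decompress_fail++;
		return false;
	}
	memcpy(out, e->data, DEMO_PAGE_SIZE);
	return true;
}

static enum demo_load_result demo_load(size_t slot, unsigned char *out)
{
	struct demo_entry *e;

	assert(slot < DEMO_SLOTS);
	e = &demo_tree[slot];

	/* Step 1: peek at the entry without removing it. */
	if (!e->present)
		return DEMO_MISS;

	/* Step 2: on failure, leave the entry in place for later readers. */
	if (!demo_decompress(e, out))
		return DEMO_FAILED;

	/* Step 3: invalidate only after the data has been recovered. */
	e->present = false;
	return DEMO_LOADED;
}
```

In this model, loading a corrupted slot twice returns DEMO_FAILED both times and bumps the counter twice, because the failure path never drops the entry. A healthy slot loads once and then reads as a miss.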