From: Nhat Pham
To: akpm@linux-foundation.org
Cc: hannes@cmpxchg.org, yosryahmed@google.com, yosry.ahmed@linux.dev, chengming.zhou@linux.dev, linux-mm@kvack.org, kernel-team@meta.com, linux-kernel@vger.kernel.org
Subject: [PATCH v2] zswap: do not crash the kernel on decompression failure
Date: Wed, 26 Feb 2025 16:14:45 -0800
Message-ID: <20250227001445.1099203-1-nphamcs@gmail.com>

Currently, we crash the kernel when a decompression failure occurs in
zswap (either because of memory corruption, or a bug in the compression
algorithm). This is overkill.
We should only SIGBUS the unfortunate process asking for the zswap entry
on zswap load, and skip the corrupted entry in zswap writeback.

The former is accomplished by returning true from zswap_load(),
indicating that zswap owns the swapped out content, but without flagging
the folio as up-to-date. The process trying to swap in the page will
check for the uptodate folio flag and SIGBUS (see do_swap_page() in
mm/memory.c for more details).

See [1] for a recent upstream discussion about this.

[1]: https://lore.kernel.org/all/ZsiLElTykamcYZ6J@casper.infradead.org/

Suggested-by: Matthew Wilcox
Suggested-by: Yosry Ahmed
Signed-off-by: Nhat Pham
---
 mm/zswap.c | 94 ++++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 67 insertions(+), 27 deletions(-)

base-commit: 598d34afeca6bb10554846cf157a3ded8729516c

diff --git a/mm/zswap.c b/mm/zswap.c
index 6dbf31bd2218..e4a2157bbc64 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -62,6 +62,8 @@ static u64 zswap_reject_reclaim_fail;
 static u64 zswap_reject_compress_fail;
 /* Compressed page was too big for the allocator to (optimally) store */
 static u64 zswap_reject_compress_poor;
+/* Load or writeback failed due to decompression failure */
+static u64 zswap_decompress_fail;
 /* Store failed because underlying allocator could not get memory */
 static u64 zswap_reject_alloc_fail;
 /* Store failed because the entry metadata could not be allocated (rare) */
@@ -996,11 +998,13 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
 	return comp_ret == 0 && alloc_ret == 0;
 }
 
-static void zswap_decompress(struct zswap_entry *entry, struct folio *folio)
+static bool zswap_decompress(struct zswap_entry *entry, struct folio *folio)
 {
 	struct zpool *zpool = entry->pool->zpool;
 	struct scatterlist input, output;
 	struct crypto_acomp_ctx *acomp_ctx;
+	int decomp_ret;
+	bool ret = true;
 	u8 *src;
 
 	acomp_ctx = acomp_ctx_get_cpu_lock(entry->pool);
@@ -1025,12 +1029,25 @@ static void zswap_decompress(struct zswap_entry *entry, struct folio *folio)
 	sg_init_table(&output, 1);
 	sg_set_folio(&output, folio, PAGE_SIZE, 0);
 	acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, PAGE_SIZE);
-	BUG_ON(crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait));
-	BUG_ON(acomp_ctx->req->dlen != PAGE_SIZE);
+	decomp_ret = crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait);
+	if (decomp_ret || acomp_ctx->req->dlen != PAGE_SIZE) {
+		ret = false;
+		zswap_decompress_fail++;
+		pr_alert_ratelimited(
+			"decompression failed with returned value %d on zswap entry with swap entry value %08lx, swap type %d, and swap offset %lu. compression algorithm is %s. compressed size is %u bytes, and decompressed size is %u bytes.\n",
+			decomp_ret,
+			entry->swpentry.val,
+			swp_type(entry->swpentry),
+			swp_offset(entry->swpentry),
+			entry->pool->tfm_name,
+			entry->length,
+			acomp_ctx->req->dlen);
+	}
 
 	if (src != acomp_ctx->buffer)
 		zpool_unmap_handle(zpool, entry->handle);
 	acomp_ctx_put_unlock(acomp_ctx);
+	return ret;
 }
 
 /*********************************
@@ -1060,6 +1077,7 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
 	struct writeback_control wbc = {
 		.sync_mode = WB_SYNC_NONE,
 	};
+	int ret = 0;
 
 	/* try to allocate swap cache folio */
 	si = get_swap_device(swpentry);
@@ -1081,8 +1099,8 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
 	 * and freed when invalidated by the concurrent shrinker anyway.
 	 */
 	if (!folio_was_allocated) {
-		folio_put(folio);
-		return -EEXIST;
+		ret = -EEXIST;
+		goto put_folio;
 	}
 
 	/*
@@ -1095,14 +1113,17 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
 	 * be dereferenced.
 	 */
 	tree = swap_zswap_tree(swpentry);
-	if (entry != xa_cmpxchg(tree, offset, entry, NULL, GFP_KERNEL)) {
-		delete_from_swap_cache(folio);
-		folio_unlock(folio);
-		folio_put(folio);
-		return -ENOMEM;
+	if (entry != xa_load(tree, offset)) {
+		ret = -ENOMEM;
+		goto delete_unlock;
+	}
+
+	if (!zswap_decompress(entry, folio)) {
+		ret = -EIO;
+		goto delete_unlock;
 	}
 
-	zswap_decompress(entry, folio);
+	xa_erase(tree, offset);
 
 	count_vm_event(ZSWPWB);
 	if (entry->objcg)
@@ -1118,9 +1139,14 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
 
 	/* start writeback */
 	__swap_writepage(folio, &wbc);
-	folio_put(folio);
 
-	return 0;
+put_folio:
+	folio_put(folio);
+	return ret;
+delete_unlock:
+	delete_from_swap_cache(folio);
+	folio_unlock(folio);
+	goto put_folio;
 }
 
 /*********************************
@@ -1620,6 +1646,20 @@ bool zswap_store(struct folio *folio)
 	return ret;
 }
 
+/**
+ * zswap_load() - load a page from zswap
+ * @folio: folio to load
+ *
+ * Returns: true if zswap owns the swapped out contents, false otherwise.
+ *
+ * Note that the zswap_load() return value doesn't indicate success or failure,
+ * but whether zswap owns the swapped out contents. This MUST return true if
+ * zswap does own the swapped out contents, even if it fails to write the
+ * contents to the folio. Otherwise, the caller will try to read garbage from
+ * the backend.
+ *
+ * Success is signaled by marking the folio uptodate.
+ */
 bool zswap_load(struct folio *folio)
 {
 	swp_entry_t swp = folio->swap;
@@ -1644,6 +1684,17 @@ bool zswap_load(struct folio *folio)
 	if (WARN_ON_ONCE(folio_test_large(folio)))
 		return true;
 
+	entry = xa_load(tree, offset);
+	if (!entry)
+		return false;
+
+	if (!zswap_decompress(entry, folio))
+		return true;
+
+	count_vm_event(ZSWPIN);
+	if (entry->objcg)
+		count_objcg_events(entry->objcg, ZSWPIN, 1);
+
 	/*
 	 * When reading into the swapcache, invalidate our entry. The
 	 * swapcache can be the authoritative owner of the page and
@@ -1656,21 +1707,8 @@ bool zswap_load(struct folio *folio)
 	 * files, which reads into a private page and may free it if
 	 * the fault fails. We remain the primary owner of the entry.)
 	 */
-	if (swapcache)
-		entry = xa_erase(tree, offset);
-	else
-		entry = xa_load(tree, offset);
-
-	if (!entry)
-		return false;
-
-	zswap_decompress(entry, folio);
-
-	count_vm_event(ZSWPIN);
-	if (entry->objcg)
-		count_objcg_events(entry->objcg, ZSWPIN, 1);
-
 	if (swapcache) {
+		xa_erase(tree, offset);
 		zswap_entry_free(entry);
 		folio_mark_dirty(folio);
 	}
@@ -1771,6 +1809,8 @@ static int zswap_debugfs_init(void)
 			   zswap_debugfs_root, &zswap_reject_compress_fail);
 	debugfs_create_u64("reject_compress_poor", 0444,
 			   zswap_debugfs_root, &zswap_reject_compress_poor);
+	debugfs_create_u64("decompress_fail", 0444,
+			   zswap_debugfs_root, &zswap_decompress_fail);
 	debugfs_create_u64("written_back_pages", 0444,
 			   zswap_debugfs_root, &zswap_written_back_pages);
 	debugfs_create_file("pool_total_size", 0444,