From patchwork Sat Jun 8 02:36:54 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yosry Ahmed
X-Patchwork-Id: 13690822
Date: Sat, 8 Jun 2024 02:36:54 +0000
X-Mailer: git-send-email 2.45.2.505.gda0bf45e8d-goog
Message-ID: <20240608023654.3513385-1-yosryahmed@google.com>
Subject: [PATCH v2] mm: zswap: handle incorrect attempts to load of large folios
From: Yosry Ahmed
To: Andrew Morton
Cc: Johannes Weiner, Nhat Pham, Chengming Zhou, Baolin Wang,
 Barry Song <21cnbao@gmail.com>, Chris Li, Ryan Roberts,
 David Hildenbrand, Matthew Wilcox, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, Yosry Ahmed
Sender: owner-linux-mm@kvack.org
Precedence: bulk

Zswap does not support storing or loading large folios.
Until proper support is added, attempts to load large folios from zswap
are a bug.

For example, if a swapin fault observes that contiguous PTEs are
pointing to contiguous swap entries and tries to swap them in as a large
folio, swap_read_folio() will pass in a large folio to zswap_load(), but
zswap_load() will only effectively load the first page in the folio. If
the first page is not in zswap, the folio will be read from disk, even
though other pages may be in zswap.

In both cases, this will lead to silent data corruption. Proper support
needs to be added before large folio swapins and zswap can work
together.

Looking at callers of swap_read_folio(), it seems like the folios they
pass in are allocated either in __read_swap_cache_async() or in
do_swap_page() in the SWP_SYNCHRONOUS_IO path, both of which allocate
order-0 folios, so everything is fine for now.

However, there is ongoing work to add support for large folio swapins
[1]. To make sure new development does not break zswap (or get broken by
zswap), add minimal handling of incorrect loads of large folios to
zswap.

First, move the call to folio_mark_uptodate() inside zswap_load().

If a large folio load is attempted, and any page in that folio is in
zswap, return 'true' without calling folio_mark_uptodate(). This will
prevent the folio from being read from disk, and will emit an IO error
because the folio is not uptodate (e.g. do_swap_page() will return
VM_FAULT_SIGBUS). If no page in the folio is in zswap, return 'false' so
that the folio is read from disk as usual. It may not be reliable
recovery in all cases, but it is better than nothing.

This was tested by hacking the allocation in __read_swap_cache_async()
to use order 2 and __GFP_COMP.

In the future, to handle this correctly, the swapin code should:
(a) Fall back to order-0 swapins if zswap was ever used on the machine,
    because compressed pages remain in zswap after it is disabled.
(b) Add proper support to swap in large folios from zswap (fully or
    partially).

Probably start with (a) then follow up with (b).

[1]https://lore.kernel.org/linux-mm/20240304081348.197341-6-21cnbao@gmail.com/

Signed-off-by: Yosry Ahmed
---

v1: https://lore.kernel.org/lkml/20240606184818.1566920-1-yosryahmed@google.com/

v1 -> v2:
- Instead of using VM_BUG_ON() use WARN_ON_ONCE() and add some recovery
  handling (David Hildenbrand).

---
 mm/page_io.c |  1 -
 mm/zswap.c   | 22 +++++++++++++++++++++-
 2 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/mm/page_io.c b/mm/page_io.c
index f1a9cfab6e748..8f441dd8e109f 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -517,7 +517,6 @@ void swap_read_folio(struct folio *folio, struct swap_iocb **plug)
 	delayacct_swapin_start();
 
 	if (zswap_load(folio)) {
-		folio_mark_uptodate(folio);
 		folio_unlock(folio);
 	} else if (data_race(sis->flags & SWP_FS_OPS)) {
 		swap_read_folio_fs(folio, plug);
diff --git a/mm/zswap.c b/mm/zswap.c
index b9b35ef86d9be..ebb878d3e7865 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1557,6 +1557,26 @@ bool zswap_load(struct folio *folio)
 
 	VM_WARN_ON_ONCE(!folio_test_locked(folio));
 
+	/*
+	 * Large folios should not be swapped in while zswap is being used, as
+	 * they are not properly handled. Zswap does not properly load large
+	 * folios, and a large folio may only be partially in zswap.
+	 *
+	 * If any of the subpages are in zswap, reading from disk would result
+	 * in data corruption, so return true without marking the folio uptodate
+	 * so that an IO error is emitted (e.g. do_swap_page() will SIGBUS).
+	 *
+	 * Otherwise, return false and read the folio from disk.
+	 */
+	if (folio_test_large(folio)) {
+		if (xa_find(tree, &offset,
+			    offset + folio_nr_pages(folio) - 1, XA_PRESENT)) {
+			WARN_ON_ONCE(1);
+			return true;
+		}
+		return false;
+	}
+
 	/*
 	 * When reading into the swapcache, invalidate our entry. The
 	 * swapcache can be the authoritative owner of the page and
@@ -1590,7 +1610,7 @@ bool zswap_load(struct folio *folio)
 		zswap_entry_free(entry);
 		folio_mark_dirty(folio);
 	}
-
+	folio_mark_uptodate(folio);
 	return true;
 }
 
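
For readers who want to see the three outcomes of the large folio check
side by side, here is a minimal userspace sketch of the decision logic
only. It is not kernel code: struct model_tree, enum load_result,
model_range_present() and model_zswap_load() are hypothetical names made
up for illustration, and a plain bool array stands in for the zswap
xarray that the patch queries with xa_find().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the zswap tree: present[i] is true when
 * swap offset i has a compressed copy in zswap.
 */
struct model_tree {
	const bool *present;
	size_t nr_slots;
};

/* Outcomes as seen by the caller of zswap_load() in this model. */
enum load_result {
	LOAD_FROM_DISK,	/* returned false: caller reads from disk */
	LOAD_OK,	/* returned true, folio marked uptodate */
	LOAD_IO_ERROR,	/* returned true, folio left !uptodate */
};

/* Models "is any entry present in [offset, offset + nr_pages - 1]?" */
bool model_range_present(const struct model_tree *t, size_t offset,
			 size_t nr_pages)
{
	for (size_t i = offset; i < offset + nr_pages && i < t->nr_slots; i++)
		if (t->present[i])
			return true;
	return false;
}

enum load_result model_zswap_load(const struct model_tree *t, size_t offset,
				  size_t nr_pages)
{
	if (nr_pages > 1) {
		/* Large folio: never load it, only decide how to fail. */
		if (model_range_present(t, offset, nr_pages))
			return LOAD_IO_ERROR;
		return LOAD_FROM_DISK;
	}

	/* Order-0 folio: the normal single-entry lookup. */
	if (offset < t->nr_slots && t->present[offset])
		return LOAD_OK;
	return LOAD_FROM_DISK;
}
```

With only offset 2 present in the model tree, an order-0 load at offset
2 succeeds, a 4-page load starting at offset 0 ends in the IO error
path, and a 4-page load starting at offset 4 falls through to disk.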