From patchwork Tue Feb 25 09:52:55 2025
X-Patchwork-Submitter: Baolin Wang
X-Patchwork-Id: 13989677
From: Baolin Wang
To: akpm@linux-foundation.org, hughd@google.com
Cc: willy@infradead.org, david@redhat.com, ioworker0@gmail.com, alex_y_xu@yahoo.ca, ryncsn@gmail.com, baolin.wang@linux.alibaba.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH v2] mm: shmem: fix potential data corruption during shmem swapin
Date: Tue, 25 Feb 2025 17:52:55 +0800
Message-ID: <2fe47c557e74e9df5fe2437ccdc6c9115fa1bf70.1740476943.git.baolin.wang@linux.alibaba.com>
X-Mailer: git-send-email 2.43.5

Alex and Kairui reported some issues (system hang or data corruption) when
swapping out or swapping in large shmem folios.
This is especially easy to reproduce when the tmpfs is mounted with the
'huge=within_size' parameter. Thanks to Kairui's reproducer, the issue can
be easily replicated.

The root cause of the problem is that swap readahead may asynchronously
swap in order 0 folios into the swap cache, while the shmem mapping can
still store large swap entries. Then an order 0 folio is inserted into the
shmem mapping without splitting the large swap entry, which overwrites the
original large swap entry, leading to data corruption.

When getting a folio from the swap cache, we should split the large swap
entry stored in the shmem mapping if the orders do not match, to fix this
issue.

Fixes: 809bc86517cc ("mm: shmem: support large folio swap out")
Reported-by: Alex Xu (Hello71)
Reported-by: Kairui Song
Closes: https://lore.kernel.org/all/1738717785.im3r5g2vxc.none@localhost/
Tested-by: Kairui Song
Signed-off-by: Baolin Wang
---
Changes from v1:
 - Add the tested tag from Kairui. Thanks Kairui.
 - Add Closes information to the commit message.
---
 mm/shmem.c | 31 +++++++++++++++++++++++++++----
 1 file changed, 27 insertions(+), 4 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 4ea6109a8043..cebbac97a221 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2253,7 +2253,7 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 	struct folio *folio = NULL;
 	bool skip_swapcache = false;
 	swp_entry_t swap;
-	int error, nr_pages;
+	int error, nr_pages, order, split_order;
 
 	VM_BUG_ON(!*foliop || !xa_is_value(*foliop));
 	swap = radix_to_swp_entry(*foliop);
@@ -2272,10 +2272,9 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 
 	/* Look it up and read it in.. */
 	folio = swap_cache_get_folio(swap, NULL, 0);
+	order = xa_get_order(&mapping->i_pages, index);
 	if (!folio) {
-		int order = xa_get_order(&mapping->i_pages, index);
 		bool fallback_order0 = false;
-		int split_order;
 
 		/* Or update major stats only when swapin succeeds?? */
 		if (fault_type) {
@@ -2339,6 +2338,29 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 			error = -ENOMEM;
 			goto failed;
 		}
+	} else if (order != folio_order(folio)) {
+		/*
+		 * Swap readahead may swap in order 0 folios into swapcache
+		 * asynchronously, while the shmem mapping can still store
+		 * large swap entries. In such cases, we should split the
+		 * large swap entry to prevent possible data corruption.
+		 */
+		split_order = shmem_split_large_entry(inode, index, swap, gfp);
+		if (split_order < 0) {
+			error = split_order;
+			goto failed;
+		}
+
+		/*
+		 * If the large swap entry has already been split, it is
+		 * necessary to recalculate the new swap entry based on
+		 * the old order alignment.
+		 */
+		if (split_order > 0) {
+			pgoff_t offset = index - round_down(index, 1 << split_order);
+
+			swap = swp_entry(swp_type(swap), swp_offset(swap) + offset);
+		}
 	}
 
 alloced:
@@ -2346,7 +2368,8 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 	folio_lock(folio);
 	if ((!skip_swapcache && !folio_test_swapcache(folio)) ||
 	    folio->swap.val != swap.val ||
-	    !shmem_confirm_swap(mapping, index, swap)) {
+	    !shmem_confirm_swap(mapping, index, swap) ||
+	    xa_get_order(&mapping->i_pages, index) != folio_order(folio)) {
 		error = -EEXIST;
 		goto unlock;
 	}
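
For readers following the swap entry recalculation in the third hunk, here is a
minimal user-space sketch of the arithmetic. It is an illustration only, not
kernel code: the struct and every model_* name are hypothetical stand-ins, and
it assumes the large entry's swap offset refers to the first page of the
naturally aligned range that the entry covered before the split.

```c
#include <assert.h>

/* User-space stand-in for the kernel's pgoff_t (assumption for this sketch). */
typedef unsigned long pgoff_t;

/* Hypothetical model of a swap entry: a swap type plus an offset in that area. */
struct model_swp_entry {
	unsigned int type;
	unsigned long offset;
};

/* Round x down to a multiple of align; align must be a power of two. */
static unsigned long model_round_down(unsigned long x, unsigned long align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return x & ~(align - 1);
}

/*
 * Return the entry for the single page at index after a large entry of
 * order split_order has been split. With split_order == 0 nothing was
 * split, so the entry is returned unchanged.
 */
static struct model_swp_entry
model_entry_after_split(struct model_swp_entry large, pgoff_t index,
			int split_order)
{
	struct model_swp_entry entry = large;

	if (split_order > 0) {
		/* Distance of index from the start of its aligned range. */
		pgoff_t offset = index - model_round_down(index, 1UL << split_order);

		entry.offset += offset;
	}
	return entry;
}
```

As a worked case under those assumptions: for index 19 and split_order 4, the
aligned range starts at index 16, so the page sits 3 pages into the range and a
large entry at swap offset 4096 maps to offset 4099 for that page.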