[10/40] lustre: llite: check truncated page in ->readpage()

Message ID	1681042400-15491-11-git-send-email-jsimmons@infradead.org (mailing list archive)
State	New, archived
Headers	show Return-Path: <lustre-devel-bounces@lists.lustre.org> From: James Simmons <jsimmons@infradead.org> To: Andreas Dilger <adilger@whamcloud.com>, Oleg Drokin <green@whamcloud.com>, NeilBrown <neilb@suse.de> Date: Sun, 9 Apr 2023 08:12:50 -0400 Message-Id: <1681042400-15491-11-git-send-email-jsimmons@infradead.org> In-Reply-To: <1681042400-15491-1-git-send-email-jsimmons@infradead.org> References: <1681042400-15491-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 10/40] lustre: llite: check truncated page in ->readpage() Precedence: list Cc: Lustre Development List <lustre-devel@lists.lustre.org> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" <lustre-devel-bounces@lists.lustre.org>
Series	lustre: backport OpenSFS changes from March XX, 2023 \| expand [00/40] lustre: backport OpenSFS changes from March XX, 2023 [01/40] lustre: protocol: basic batching processing framework [02/40] lustre: lov: fiemap improperly handles fm_extent_count=0 [03/40] lustre: llite: SIGBUS is possible on a race with page reclaim [04/40] lustre: osc: page fault in osc_release_bounce_pages() [05/40] lustre: readahead: add stats for read-ahead page count [06/40] lustre: quota: enforce project quota for root [07/40] lustre: ldlm: send the cancel RPC asap [08/40] lustre: enc: align Base64 encoding with RFC 4648 base64url [09/40] lustre: quota: fix insane grant quota [10/40] lustre: llite: check truncated page in ->readpage() [11/40] lnet: o2iblnd: Fix key mismatch issue [12/40] lustre: sec: fid2path for encrypted files [13/40] lustre: sec: Lustre/HSM on enc file with enc key [14/40] lustre: llite: check read page past requested [15/40] lustre: llite: fix relatime support [16/40] lustre: ptlrpc: clarify AT error message [17/40] lustre: update version to 2.15.54 [18/40] lustre: tgt: skip free inodes in OST weights [19/40] lustre: fileset: check fileset for operations by fid [20/40] lustre: clio: Remove cl_page_size() [21/40] lustre: fid: clean up OBIF_MAX_OID and IDIF_MAX_OID [22/40] lustre: llog: fix processing of a wrapped catalog [23/40] lustre: llite: replace lld_nfs_dentry flag with opencache handling [24/40] lustre: llite: match lock in corresponding namespace [25/40] lnet: libcfs: remove unused hash code [26/40] lustre: client: -o network needs add_conn processing [27/40] lnet: Lock primary NID logic [28/40] lnet: Peers added via kernel API should be permanent [29/40] lnet: don't delete peer created by Lustre [30/40] lnet: memory leak in copy_ioc_udsp_descr [31/40] lnet: remove crash with UDSP [32/40] lustre: ptlrpc: fix clang build errors [33/40] lustre: ldlm: remove client_import_find_conn() [34/40] lnet: add 'force' option to lnetctl peer del [35/40] lustre: ldlm: BL_AST lock cancel still can be batched [36/40] lnet: lnet_parse_route uses wrong loop var [37/40] lustre: tgt: add qos debug [38/40] lustre: enc: file names encryption when using secure boot [39/40] lustre: uapi: add DMV_IMP_INHERIT connect flag [40/40] lustre: llite: dir layout inheritance fixes

Message ID

1681042400-15491-11-git-send-email-jsimmons@infradead.org (mailing list archive)

State

New, archived

Headers

From: James Simmons <jsimmons@infradead.org>
To: Andreas Dilger <adilger@whamcloud.com>, Oleg Drokin <green@whamcloud.com>,
 NeilBrown <neilb@suse.de>
Date: Sun,  9 Apr 2023 08:12:50 -0400
Message-Id: <1681042400-15491-11-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1681042400-15491-1-git-send-email-jsimmons@infradead.org>
References: <1681042400-15491-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 10/40] lustre: llite: check truncated page in
 ->readpage()
Precedence: list
Cc: Lustre Development List <lustre-devel@lists.lustre.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: lustre-devel-bounces@lists.lustre.org
Sender: "lustre-devel" <lustre-devel-bounces@lists.lustre.org>

Series

lustre: backport OpenSFS changes from March XX, 2023 | expand

Commit Message

James Simmons April 9, 2023, 12:12 p.m. UTC

From: Qian Yingjin <qian@ddn.com>

The page end offset calculation in filemap_get_read_batch() was
off by one. This bug was introduced in commit v5.11-10234-gcbd59c48ae
("mm/filemap: use head pages in generic_file_buffered_read")

When a read is submitted with end offset 1048575, it calculates
the end page index for read of 256 where it should be 255. This
results in the readpage() call for the page with index 256 is over
stripe boundary and may not be covered by a DLM extent lock.

This happens in a corner race case: filemap_get_read_batch()
batches the page with index 256 for read, but later this page is
removed from page cache due to the lock protected it being revoked,
but has a reference count due to the batch.  This results in this
page in the read path is not covered by any DLM lock.

The solution is simple. We can check whether the page was
truncated and removed from page cache in ->readpage() by the
address_sapce pointer of the page. If it was truncated, return
AOP_TRUNCATED_PAGE to the upper caller.  This will cause the
kernel to retry to batch pages and the truncated page will not
be added as it was already removed from page cache of the file.

WC-bug-id: https://jira.whamcloud.com/browse/LU-16412
Lustre-commit: 209afbe28b5f164bd ("LU-16412 llite: check truncated page in ->readpage()")
Signed-off-by: Qian Yingjin <qian@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49433
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Zhenyu Xu <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/obd_support.h |  6 ++++--
 fs/lustre/llite/rw.c            | 35 +++++++++++++++++++++++++++++++++++
 fs/lustre/llite/rw26.c          |  7 +++++++
 3 files changed, 46 insertions(+), 2 deletions(-)

diff --git a/fs/lustre/include/obd_support.h b/fs/lustre/include/obd_support.h
index a2930c8..4ef5c61 100644
--- a/fs/lustre/include/obd_support.h
+++ b/fs/lustre/include/obd_support.h
@@ -458,8 +458,8 @@ 
 /* was	OBD_FAIL_LLOG_CATINFO_NET			0x1309 until 2.3 */
 #define OBD_FAIL_MDS_SYNC_CAPA_SL			0x1310
 #define OBD_FAIL_SEQ_ALLOC				0x1311
-#define OBD_FAIL_PLAIN_RECORDS			    0x1319
-#define OBD_FAIL_CATALOG_FULL_CHECK		    0x131a
+#define OBD_FAIL_PLAIN_RECORDS				0x1319
+#define OBD_FAIL_CATALOG_FULL_CHECK			0x131a
 
 #define OBD_FAIL_LLITE					0x1400
 #define OBD_FAIL_LLITE_FAULT_TRUNC_RACE			0x1401
@@ -488,6 +488,8 @@ 
 #define OBD_FAIL_LLITE_PAGE_ALLOC			0x1418
 #define OBD_FAIL_LLITE_OPEN_DELAY			0x1419
 #define OBD_FAIL_LLITE_XATTR_PAUSE			0x1420
+#define OBD_FAIL_LLITE_PAGE_INVALIDATE_PAUSE		0x1421
+#define OBD_FAIL_LLITE_READPAGE_PAUSE			0x1422
 
 #define OBD_FAIL_FID_INDIR				0x1501
 #define OBD_FAIL_FID_INLMA				0x1502
diff --git a/fs/lustre/llite/rw.c b/fs/lustre/llite/rw.c
index 0b14ea6..dea2af1 100644
--- a/fs/lustre/llite/rw.c
+++ b/fs/lustre/llite/rw.c
@@ -1865,6 +1865,41 @@  int ll_readpage(struct file *file, struct page *vmpage)
 	struct ll_sb_info *sbi = ll_i2sbi(inode);
 	int result;
 
+	if (OBD_FAIL_PRECHECK(OBD_FAIL_LLITE_READPAGE_PAUSE)) {
+		unlock_page(vmpage);
+		OBD_FAIL_TIMEOUT(OBD_FAIL_LLITE_READPAGE_PAUSE, cfs_fail_val);
+		lock_page(vmpage);
+	}
+
+	/*
+	 * The @vmpage got truncated.
+	 * This is a kernel bug introduced since kernel 5.12:
+	 * comment: cbd59c48ae2bcadc4a7599c29cf32fd3f9b78251
+	 * ("mm/filemap: use head pages in generic_file_buffered_read")
+	 *
+	 * The page end offset calculation in filemap_get_read_batch() was off
+	 * by one.  When a read is submitted with end offset 1048575, then it
+	 * calculates the end page for read of 256 where it should be 255. This
+	 * results in the readpage() for the page with index 256 is over stripe
+	 * boundary and may not covered by a DLM extent lock.
+	 *
+	 * This happens in a corner race case: filemap_get_read_batch() adds
+	 * the page with index 256 for read which is not in the current read
+	 * I/O context, and this page is being invalidated and will be removed
+	 * from page cache due to the lock protected it being revoken. This
+	 * results in this page in the read path not covered by any DLM lock.
+	 *
+	 * The solution is simple. Check whether the page was truncated in
+	 * ->readpage(). If so, just return AOP_TRUNCATED_PAGE to the upper
+	 * caller. Then the kernel will retry to batch pages, and it will not
+	 * add the truncated page into batches as it was removed from page
+	 * cache of the file.
+	 */
+	if (vmpage->mapping != inode->i_mapping) {
+		unlock_page(vmpage);
+		return AOP_TRUNCATED_PAGE;
+	}
+
 	lcc = ll_cl_find(inode);
 	if (lcc) {
 		env = lcc->lcc_env;
diff --git a/fs/lustre/llite/rw26.c b/fs/lustre/llite/rw26.c
index cadded4..6700717 100644
--- a/fs/lustre/llite/rw26.c
+++ b/fs/lustre/llite/rw26.c
@@ -96,6 +96,13 @@  static void ll_invalidatepage(struct page *vmpage, unsigned int offset,
 		}
 		cl_env_percpu_put(env);
 	}
+
+	if (OBD_FAIL_PRECHECK(OBD_FAIL_LLITE_PAGE_INVALIDATE_PAUSE)) {
+		unlock_page(vmpage);
+		OBD_FAIL_TIMEOUT(OBD_FAIL_LLITE_PAGE_INVALIDATE_PAUSE,
+				 cfs_fail_val);
+		lock_page(vmpage);
+	}
 }
 
 static int ll_releasepage(struct page *vmpage, gfp_t gfp_mask)

[10/40] lustre: llite: check truncated page in ->readpage()

Commit Message

Patch