From patchwork Thu Dec 16 23:48:22 2021
X-Patchwork-Submitter: NeilBrown
X-Patchwork-Id: 12683093
Subject: [PATCH 01/18] Structural cleanup for filesystem-based swap
From: NeilBrown
To: Trond Myklebust, Anna Schumaker, Chuck Lever, Andrew Morton, Mel Gorman,
    Christoph Hellwig, David Howells
Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Date: Fri, 17 Dec 2021 10:48:22 +1100
Message-ID: <163969850251.20885.10819272484905153807.stgit@noble.brown>
In-Reply-To: <163969801519.20885.3977673503103544412.stgit@noble.brown>
References: <163969801519.20885.3977673503103544412.stgit@noble.brown>

Linux primarily uses block-device IO for swap, but it can also send the IO
requests to a filesystem.  This has only ever worked for NFS, and even that
has not worked for a while due to a lack of testing.  This seems like a good
time for some tidy-up before restoring swap-over-NFS functionality.

This patch:

 - updates the documentation (both copies!) for swap_activate, which is
   woefully out of date.
 - introduces a new address_space operation "swap_rw" for swap IO.  The code
   currently uses ->readpage for reads and ->direct_IO for writes.  The
   former imposes a limit of one page at a time; the latter means that
   direct writes and swap writes are encouraged to use the same path.  While
   similar, swap IO can often be simpler, as it can assume that no
   allocation is needed and that coherence with the page cache is
   irrelevant.
 - moves the responsibility for setting SWP_FS_OPS into ->swap_activate(),
   and requires it to always call add_swap_extent().  This makes it much
   easier to find filesystems that require SWP_FS_OPS.
 - drops the call to the filesystem's ->set_page_dirty().  These pages do
   not belong to the filesystem, and it has no interest in their dirty
   status.

Writeout is switched to ->swap_rw, but read-in is not, as that requires too
much change for this patch.

Both cifs and nfs set SWP_FS_OPS but neither provides a swap_rw, so both
will now fail to activate swap.  cifs never really provided working swap
support, as its ->direct_IO always returns an error.  NFS will be fixed up
by the following patches.
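For illustration only (this sketch is not part of the patch), a filesystem's
->swap_activate() under the new contract would look roughly like the
following; the structure mirrors the nfs/cifs changes below, and
example_swap_activate() is a made-up name:

/*
 * Hypothetical sketch of the new ->swap_activate() contract: validate the
 * file, register the swap extent(s), opt in to ->swap_rw() by setting
 * SWP_FS_OPS, and return the number of extents added.  Filesystem-specific
 * validation and preparation are elided.
 */
static int example_swap_activate(struct swap_info_struct *sis,
				 struct file *file, sector_t *span)
{
	int ret;

	/* Without a ->swap_rw() implementation, swap cannot be supported. */
	if (!file->f_mapping->a_ops->swap_rw)
		return -EINVAL;

	ret = add_swap_extent(sis, 0, sis->max, 0);
	if (ret < 0)
		return ret;

	*span = sis->pages;
	sis->flags |= SWP_FS_OPS;	/* swap IO will go through ->swap_rw() */
	return ret;			/* number of extents added */
}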
Signed-off-by: NeilBrown Reported-by: kernel test robot --- Documentation/filesystems/locking.rst | 18 ++++++++++++------ Documentation/filesystems/vfs.rst | 17 ++++++++++++----- fs/cifs/file.c | 7 ++++++- fs/nfs/file.c | 17 +++++++++++++++-- include/linux/fs.h | 1 + include/linux/swap.h | 1 - mm/page_io.c | 26 ++++++-------------------- mm/swap_state.c | 2 +- mm/swapfile.c | 10 +++------- 9 files changed, 56 insertions(+), 43 deletions(-) diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst index d36fe79167b3..c2bb753bf688 100644 --- a/Documentation/filesystems/locking.rst +++ b/Documentation/filesystems/locking.rst @@ -265,8 +265,9 @@ prototypes:: int (*launder_page)(struct page *); int (*is_partially_uptodate)(struct page *, unsigned long, unsigned long); int (*error_remove_page)(struct address_space *, struct page *); - int (*swap_activate)(struct file *); + int (*swap_activate)(struct swap_info_struct *sis, struct file *f, sector_t *span) int (*swap_deactivate)(struct file *); + int (*swap_rw)(struct kiocb *iocb, struct iov_iter *iter); locking rules: All except set_page_dirty and freepage may block @@ -295,6 +296,7 @@ is_partially_uptodate: yes error_remove_page: yes swap_activate: no swap_deactivate: no +swap_rw: yes, unlocks ====================== ======================== ========= =============== ->write_begin(), ->write_end() and ->readpage() may be called from @@ -397,15 +399,19 @@ cleaned, or an error value if not. Note that in order to prevent the page getting mapped back in and redirtied, it needs to be kept locked across the entire operation. -->swap_activate will be called with a non-zero argument on -files backing (non block device backed) swapfiles. A return value -of zero indicates success, in which case this file can be used for -backing swapspace. The swapspace operations will be proxied to the -address space operations. +->swap_activate() will be called to prepare the given file for swap. It +should perform any validation and preparation necessary to ensure that +writes can be performed with minimal memory allocation. It should call +add_swap_extent(), or the helper iomap_swapfile_activate(), and return +the number of extents added. If IO should be submitted through +->swap_rw(), it should set SWP_FS_OPS, otherwise IO will be submitted +directly to the block device ``sis->bdev``. ->swap_deactivate() will be called in the sys_swapoff() path after ->swap_activate() returned success. +->swap_rw will be called for swap IO if ->swap_activate() set SWP_FS_OPS. + file_lock_operations ==================== diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst index bf5c48066fac..70d7ce335565 100644 --- a/Documentation/filesystems/vfs.rst +++ b/Documentation/filesystems/vfs.rst @@ -751,8 +751,9 @@ cache in your filesystem. The following members are defined: unsigned long); void (*is_dirty_writeback) (struct page *, bool *, bool *); int (*error_remove_page) (struct mapping *mapping, struct page *page); - int (*swap_activate)(struct file *); + int (*swap_activate)(struct swap_info_struct *sis, struct file *f, sector_t *span) int (*swap_deactivate)(struct file *); + int (*swap_rw)(struct kiocb *iocb, struct iov_iter *iter); }; ``writepage`` @@ -959,15 +960,21 @@ cache in your filesystem. The following members are defined: unless you have them locked or reference counts increased. ``swap_activate`` - Called when swapon is used on a file to allocate space if - necessary and pin the block lookup information in memory. 
A - return value of zero indicates success, in which case this file - can be used to back swapspace. + + Called to prepare the given file for swap. It should perform + any validation and preparation necessary to ensure that writes + can be performed with minimal memory allocation. It should call + add_swap_extent(), or the helper iomap_swapfile_activate(), and + return the number of extents added. If IO should be submitted + through ->swap_rw(), it should set SWP_FS_OPS, otherwise IO will + be submitted directly to the block device ``sis->bdev``. ``swap_deactivate`` Called during swapoff on files where swap_activate was successful. +``swap_rw`` + Called to read or write swap pages when swap_activate() set SWP_FS_OPS. The File Object =============== diff --git a/fs/cifs/file.c b/fs/cifs/file.c index 9fee3af83a73..50bebf5f15cc 100644 --- a/fs/cifs/file.c +++ b/fs/cifs/file.c @@ -4943,6 +4943,10 @@ static int cifs_swap_activate(struct swap_info_struct *sis, cifs_dbg(FYI, "swap activate\n"); + if (!swap_file->f_mapping->a_ops->swap_rw) + /* Cannot support swap */ + return -EINVAL; + spin_lock(&inode->i_lock); blocks = inode->i_blocks; isize = inode->i_size; @@ -4971,7 +4975,8 @@ static int cifs_swap_activate(struct swap_info_struct *sis, * from reading or writing the file */ - return 0; + sis->flags |= SWP_FS_OPS; + return add_swap_extent(sis, 0, sis->max, 0); } static void cifs_swap_deactivate(struct file *file) diff --git a/fs/nfs/file.c b/fs/nfs/file.c index 24e7dccce355..0d33c95eefb6 100644 --- a/fs/nfs/file.c +++ b/fs/nfs/file.c @@ -489,9 +489,14 @@ static int nfs_swap_activate(struct swap_info_struct *sis, struct file *file, { unsigned long blocks; long long isize; + int ret; struct rpc_clnt *clnt = NFS_CLIENT(file->f_mapping->host); struct inode *inode = file->f_mapping->host; + if (!file->f_mapping->a_ops->swap_rw) + /* Cannot support swap */ + return -EINVAL; + spin_lock(&inode->i_lock); blocks = inode->i_blocks; isize = inode->i_size; @@ -501,9 +506,17 @@ static int nfs_swap_activate(struct swap_info_struct *sis, struct file *file, return -EINVAL; } + ret = rpc_clnt_swap_activate(clnt); + if (ret) + return ret; + ret = add_swap_extent(sis, 0, sis->max, 0); + if (ret < 0) { + rpc_clnt_swap_deactivate(clnt); + return ret; + } *span = sis->pages; - - return rpc_clnt_swap_activate(clnt); + sis->flags |= SWP_FS_OPS; + return ret; } static void nfs_swap_deactivate(struct file *file) diff --git a/include/linux/fs.h b/include/linux/fs.h index bbf812ce89a8..deaaf359cc49 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -415,6 +415,7 @@ struct address_space_operations { int (*swap_activate)(struct swap_info_struct *sis, struct file *file, sector_t *span); void (*swap_deactivate)(struct file *file); + int (*swap_rw)(struct kiocb *iocb, struct iov_iter *iter); }; extern const struct address_space_operations empty_aops; diff --git a/include/linux/swap.h b/include/linux/swap.h index d1ea44b31f19..10b2a92c1aa1 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -427,7 +427,6 @@ extern int swap_writepage(struct page *page, struct writeback_control *wbc); extern void end_swap_bio_write(struct bio *bio); extern int __swap_writepage(struct page *page, struct writeback_control *wbc, bio_end_io_t end_write_func); -extern int swap_set_page_dirty(struct page *page); int add_swap_extent(struct swap_info_struct *sis, unsigned long start_page, unsigned long nr_pages, sector_t start_block); diff --git a/mm/page_io.c b/mm/page_io.c index 9725c7e1eeea..cb617a4f59df 100644 --- a/mm/page_io.c 
+++ b/mm/page_io.c @@ -307,10 +307,9 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc, set_page_writeback(page); unlock_page(page); - ret = mapping->a_ops->direct_IO(&kiocb, &from); - if (ret == PAGE_SIZE) { + ret = mapping->a_ops->swap_rw(&kiocb, &from); + if (ret == 0) { count_vm_event(PSWPOUT); - ret = 0; } else { /* * In the case of swap-over-nfs, this can be a @@ -378,10 +377,11 @@ int swap_readpage(struct page *page, bool synchronous) } if (data_race(sis->flags & SWP_FS_OPS)) { - struct file *swap_file = sis->swap_file; - struct address_space *mapping = swap_file->f_mapping; + //struct file *swap_file = sis->swap_file; + //struct address_space *mapping = swap_file->f_mapping; - ret = mapping->a_ops->readpage(swap_file, page); + /* This needs to use ->swap_rw() */ + ret = -EINVAL; if (!ret) count_vm_event(PSWPIN); goto out; @@ -434,17 +434,3 @@ int swap_readpage(struct page *page, bool synchronous) psi_memstall_leave(&pflags); return ret; } - -int swap_set_page_dirty(struct page *page) -{ - struct swap_info_struct *sis = page_swap_info(page); - - if (data_race(sis->flags & SWP_FS_OPS)) { - struct address_space *mapping = sis->swap_file->f_mapping; - - VM_BUG_ON_PAGE(!PageSwapCache(page), page); - return mapping->a_ops->set_page_dirty(page); - } else { - return __set_page_dirty_no_writeback(page); - } -} diff --git a/mm/swap_state.c b/mm/swap_state.c index 8d4104242100..616eb1d75b35 100644 --- a/mm/swap_state.c +++ b/mm/swap_state.c @@ -30,7 +30,7 @@ */ static const struct address_space_operations swap_aops = { .writepage = swap_writepage, - .set_page_dirty = swap_set_page_dirty, + .set_page_dirty = __set_page_dirty_no_writeback, #ifdef CONFIG_MIGRATION .migratepage = migrate_page, #endif diff --git a/mm/swapfile.c b/mm/swapfile.c index e59e08ef46e1..419eacf474c5 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -2397,13 +2397,9 @@ static int setup_swap_extents(struct swap_info_struct *sis, sector_t *span) if (mapping->a_ops->swap_activate) { ret = mapping->a_ops->swap_activate(sis, swap_file, span); - if (ret >= 0) - sis->flags |= SWP_ACTIVATED; - if (!ret) { - sis->flags |= SWP_FS_OPS; - ret = add_swap_extent(sis, 0, sis->max, 0); - *span = sis->pages; - } + if (ret < 0) + return ret; + sis->flags |= SWP_ACTIVATED; return ret; } From patchwork Thu Dec 16 23:48:22 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12683095 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 76AE9C433EF for ; Thu, 16 Dec 2021 23:53:28 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 219AC6B0074; Thu, 16 Dec 2021 18:52:39 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 1A1526B0075; Thu, 16 Dec 2021 18:52:39 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id F36FB6B0078; Thu, 16 Dec 2021 18:52:38 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0038.hostedemail.com [216.40.44.38]) by kanga.kvack.org (Postfix) with ESMTP id E0D446B0074 for ; Thu, 16 Dec 2021 18:52:38 -0500 (EST) Received: from smtpin08.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 9D224181AEF09 for ; Thu, 16 Dec 2021 23:52:28 +0000 (UTC) X-FDA: 
Subject: [PATCH 02/18] MM: create new mm/swap.h header file.
From: NeilBrown
To: Trond Myklebust, Anna Schumaker, Chuck Lever, Andrew Morton, Mel Gorman,
    Christoph Hellwig, David Howells
Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Date: Fri, 17 Dec 2021 10:48:22 +1100
Message-ID: <163969850279.20885.7172996032577523902.stgit@noble.brown>
In-Reply-To: <163969801519.20885.3977673503103544412.stgit@noble.brown>
References: <163969801519.20885.3977673503103544412.stgit@noble.brown>

Many functions declared in include/linux/swap.h are only used within mm/.
Create a new "mm/swap.h" and move some of these declarations there.
Remove the redundant 'extern' from the function declarations.
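As a purely illustrative before/after (the complete list of moved
declarations is in the diff below), the mechanical pattern applied to each
declaration is:

/* Before: declared for the whole kernel in include/linux/swap.h */
extern int swap_readpage(struct page *page, bool do_poll);

/*
 * After: declared in mm/swap.h, visible only to files under mm/ that
 * do #include "swap.h"; the redundant 'extern' is dropped.
 */
int swap_readpage(struct page *page, bool do_poll);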
Signed-off-by: NeilBrown Reported-by: kernel test robot Reviewed-by: Christoph Hellwig --- include/linux/swap.h | 121 ----------------------------------------------- mm/madvise.c | 1 mm/memory.c | 1 mm/mincore.c | 1 mm/page_alloc.c | 1 mm/page_io.c | 1 mm/shmem.c | 1 mm/swap.h | 129 ++++++++++++++++++++++++++++++++++++++++++++++++++ mm/swap_state.c | 1 mm/swapfile.c | 1 mm/util.c | 1 mm/vmscan.c | 1 12 files changed, 139 insertions(+), 121 deletions(-) create mode 100644 mm/swap.h diff --git a/include/linux/swap.h b/include/linux/swap.h index 10b2a92c1aa1..6a0c25c0bb95 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -419,61 +419,18 @@ extern void kswapd_stop(int nid); #ifdef CONFIG_SWAP -#include /* for bio_end_io_t */ - -/* linux/mm/page_io.c */ -extern int swap_readpage(struct page *page, bool do_poll); -extern int swap_writepage(struct page *page, struct writeback_control *wbc); -extern void end_swap_bio_write(struct bio *bio); -extern int __swap_writepage(struct page *page, struct writeback_control *wbc, - bio_end_io_t end_write_func); - int add_swap_extent(struct swap_info_struct *sis, unsigned long start_page, unsigned long nr_pages, sector_t start_block); int generic_swapfile_activate(struct swap_info_struct *, struct file *, sector_t *); -/* linux/mm/swap_state.c */ -/* One swap address space for each 64M swap space */ -#define SWAP_ADDRESS_SPACE_SHIFT 14 -#define SWAP_ADDRESS_SPACE_PAGES (1 << SWAP_ADDRESS_SPACE_SHIFT) -extern struct address_space *swapper_spaces[]; -#define swap_address_space(entry) \ - (&swapper_spaces[swp_type(entry)][swp_offset(entry) \ - >> SWAP_ADDRESS_SPACE_SHIFT]) static inline unsigned long total_swapcache_pages(void) { return global_node_page_state(NR_SWAPCACHE); } -extern void show_swap_cache_info(void); -extern int add_to_swap(struct page *page); -extern void *get_shadow_from_swap_cache(swp_entry_t entry); -extern int add_to_swap_cache(struct page *page, swp_entry_t entry, - gfp_t gfp, void **shadowp); -extern void __delete_from_swap_cache(struct page *page, - swp_entry_t entry, void *shadow); -extern void delete_from_swap_cache(struct page *); -extern void clear_shadow_from_swap_cache(int type, unsigned long begin, - unsigned long end); -extern void free_swap_cache(struct page *); extern void free_page_and_swap_cache(struct page *); extern void free_pages_and_swap_cache(struct page **, int); -extern struct page *lookup_swap_cache(swp_entry_t entry, - struct vm_area_struct *vma, - unsigned long addr); -struct page *find_get_incore_page(struct address_space *mapping, pgoff_t index); -extern struct page *read_swap_cache_async(swp_entry_t, gfp_t, - struct vm_area_struct *vma, unsigned long addr, - bool do_poll); -extern struct page *__read_swap_cache_async(swp_entry_t, gfp_t, - struct vm_area_struct *vma, unsigned long addr, - bool *new_page_allocated); -extern struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t flag, - struct vm_fault *vmf); -extern struct page *swapin_readahead(swp_entry_t entry, gfp_t flag, - struct vm_fault *vmf); - /* linux/mm/swapfile.c */ extern atomic_long_t nr_swap_pages; extern long total_swap_pages; @@ -527,12 +484,6 @@ static inline void put_swap_device(struct swap_info_struct *si) } #else /* CONFIG_SWAP */ - -static inline int swap_readpage(struct page *page, bool do_poll) -{ - return 0; -} - static inline struct swap_info_struct *swp_swap_info(swp_entry_t entry) { return NULL; @@ -547,11 +498,6 @@ static inline void put_swap_device(struct swap_info_struct *si) { } -static inline struct address_space 
*swap_address_space(swp_entry_t entry) -{ - return NULL; -} - #define get_nr_swap_pages() 0L #define total_swap_pages 0L #define total_swapcache_pages() 0UL @@ -566,14 +512,6 @@ static inline struct address_space *swap_address_space(swp_entry_t entry) #define free_pages_and_swap_cache(pages, nr) \ release_pages((pages), (nr)); -static inline void free_swap_cache(struct page *page) -{ -} - -static inline void show_swap_cache_info(void) -{ -} - /* used to sanity check ptes in zap_pte_range when CONFIG_SWAP=0 */ #define free_swap_and_cache(e) is_pfn_swap_entry(e) @@ -599,65 +537,6 @@ static inline void put_swap_page(struct page *page, swp_entry_t swp) { } -static inline struct page *swap_cluster_readahead(swp_entry_t entry, - gfp_t gfp_mask, struct vm_fault *vmf) -{ - return NULL; -} - -static inline struct page *swapin_readahead(swp_entry_t swp, gfp_t gfp_mask, - struct vm_fault *vmf) -{ - return NULL; -} - -static inline int swap_writepage(struct page *p, struct writeback_control *wbc) -{ - return 0; -} - -static inline struct page *lookup_swap_cache(swp_entry_t swp, - struct vm_area_struct *vma, - unsigned long addr) -{ - return NULL; -} - -static inline -struct page *find_get_incore_page(struct address_space *mapping, pgoff_t index) -{ - return find_get_page(mapping, index); -} - -static inline int add_to_swap(struct page *page) -{ - return 0; -} - -static inline void *get_shadow_from_swap_cache(swp_entry_t entry) -{ - return NULL; -} - -static inline int add_to_swap_cache(struct page *page, swp_entry_t entry, - gfp_t gfp_mask, void **shadowp) -{ - return -1; -} - -static inline void __delete_from_swap_cache(struct page *page, - swp_entry_t entry, void *shadow) -{ -} - -static inline void delete_from_swap_cache(struct page *page) -{ -} - -static inline void clear_shadow_from_swap_cache(int type, unsigned long begin, - unsigned long end) -{ -} static inline int page_swapcount(struct page *page) { diff --git a/mm/madvise.c b/mm/madvise.c index 8c927202bbe6..724470773582 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -33,6 +33,7 @@ #include #include "internal.h" +#include "swap.h" struct madvise_walk_private { struct mmu_gather *tlb; diff --git a/mm/memory.c b/mm/memory.c index 8f1de811a1dc..80bbfd449b40 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -85,6 +85,7 @@ #include "pgalloc-track.h" #include "internal.h" +#include "swap.h" #if defined(LAST_CPUPID_NOT_IN_PAGE_FLAGS) && !defined(CONFIG_COMPILE_TEST) #warning Unfortunate NUMA and NUMA Balancing config, growing page-frame for last_cpupid. diff --git a/mm/mincore.c b/mm/mincore.c index 9122676b54d6..f4f627325e12 100644 --- a/mm/mincore.c +++ b/mm/mincore.c @@ -20,6 +20,7 @@ #include #include +#include "swap.h" static int mincore_hugetlb(pte_t *pte, unsigned long hmask, unsigned long addr, unsigned long end, struct mm_walk *walk) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index c5952749ad40..c0b7a3878801 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -78,6 +78,7 @@ #include "internal.h" #include "shuffle.h" #include "page_reporting.h" +#include "swap.h" /* Free Page Internal flags: for internal, non-pcp variants of free_pages(). 
*/ typedef int __bitwise fpi_t; diff --git a/mm/page_io.c b/mm/page_io.c index cb617a4f59df..a9fe5de5dc32 100644 --- a/mm/page_io.c +++ b/mm/page_io.c @@ -25,6 +25,7 @@ #include #include #include +#include "swap.h" void end_swap_bio_write(struct bio *bio) { diff --git a/mm/shmem.c b/mm/shmem.c index 18f93c2d68f1..993b6b3ca20f 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -39,6 +39,7 @@ #include #include #include +#include "swap.h" static struct vfsmount *shm_mnt; diff --git a/mm/swap.h b/mm/swap.h new file mode 100644 index 000000000000..13e72a5023aa --- /dev/null +++ b/mm/swap.h @@ -0,0 +1,129 @@ + +#ifdef CONFIG_SWAP +#include /* for bio_end_io_t */ + +/* linux/mm/page_io.c */ +int swap_readpage(struct page *page, bool do_poll); +int swap_writepage(struct page *page, struct writeback_control *wbc); +void end_swap_bio_write(struct bio *bio); +int __swap_writepage(struct page *page, struct writeback_control *wbc, + bio_end_io_t end_write_func); + +/* linux/mm/swap_state.c */ +/* One swap address space for each 64M swap space */ +#define SWAP_ADDRESS_SPACE_SHIFT 14 +#define SWAP_ADDRESS_SPACE_PAGES (1 << SWAP_ADDRESS_SPACE_SHIFT) +extern struct address_space *swapper_spaces[]; +#define swap_address_space(entry) \ + (&swapper_spaces[swp_type(entry)][swp_offset(entry) \ + >> SWAP_ADDRESS_SPACE_SHIFT]) + +void show_swap_cache_info(void); +int add_to_swap(struct page *page); +void *get_shadow_from_swap_cache(swp_entry_t entry); +int add_to_swap_cache(struct page *page, swp_entry_t entry, + gfp_t gfp, void **shadowp); +void __delete_from_swap_cache(struct page *page, + swp_entry_t entry, void *shadow); +void delete_from_swap_cache(struct page *); +void clear_shadow_from_swap_cache(int type, unsigned long begin, + unsigned long end); +void free_swap_cache(struct page *); +struct page *lookup_swap_cache(swp_entry_t entry, + struct vm_area_struct *vma, + unsigned long addr); +struct page *find_get_incore_page(struct address_space *mapping, pgoff_t index); + +struct page *read_swap_cache_async(swp_entry_t, gfp_t, + struct vm_area_struct *vma, + unsigned long addr, + bool do_poll); +struct page *__read_swap_cache_async(swp_entry_t, gfp_t, + struct vm_area_struct *vma, + unsigned long addr, + bool *new_page_allocated); +struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t flag, + struct vm_fault *vmf); +struct page *swapin_readahead(swp_entry_t entry, gfp_t flag, + struct vm_fault *vmf); + +#else /* CONFIG_SWAP */ +static inline int swap_readpage(struct page *page, bool do_poll) +{ + return 0; +} + +static inline struct address_space *swap_address_space(swp_entry_t entry) +{ + return NULL; +} + +static inline void free_swap_cache(struct page *page) +{ +} + +static inline void show_swap_cache_info(void) +{ +} + +static inline struct page *swap_cluster_readahead(swp_entry_t entry, + gfp_t gfp_mask, struct vm_fault *vmf) +{ + return NULL; +} + +static inline struct page *swapin_readahead(swp_entry_t swp, gfp_t gfp_mask, + struct vm_fault *vmf) +{ + return NULL; +} + +static inline int swap_writepage(struct page *p, struct writeback_control *wbc) +{ + return 0; +} + +static inline struct page *lookup_swap_cache(swp_entry_t swp, + struct vm_area_struct *vma, + unsigned long addr) +{ + return NULL; +} + +static inline +struct page *find_get_incore_page(struct address_space *mapping, pgoff_t index) +{ + return find_get_page(mapping, index); +} + +static inline int add_to_swap(struct page *page) +{ + return 0; +} + +static inline void *get_shadow_from_swap_cache(swp_entry_t entry) +{ + return NULL; +} + 
+static inline int add_to_swap_cache(struct page *page, swp_entry_t entry, + gfp_t gfp_mask, void **shadowp) +{ + return -1; +} + +static inline void __delete_from_swap_cache(struct page *page, + swp_entry_t entry, void *shadow) +{ +} + +static inline void delete_from_swap_cache(struct page *page) +{ +} + +static inline void clear_shadow_from_swap_cache(int type, unsigned long begin, + unsigned long end) +{ +} + +#endif /* CONFIG_SWAP */ diff --git a/mm/swap_state.c b/mm/swap_state.c index 616eb1d75b35..514b86b05488 100644 --- a/mm/swap_state.c +++ b/mm/swap_state.c @@ -23,6 +23,7 @@ #include #include #include "internal.h" +#include "swap.h" /* * swapper_space is a fiction, retained to simplify the path through diff --git a/mm/swapfile.c b/mm/swapfile.c index 419eacf474c5..f23d9ff21cf8 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -44,6 +44,7 @@ #include #include #include +#include "swap.h" static bool swap_count_continued(struct swap_info_struct *, pgoff_t, unsigned char); diff --git a/mm/util.c b/mm/util.c index 741ba32a43ac..7e26387be090 100644 --- a/mm/util.c +++ b/mm/util.c @@ -27,6 +27,7 @@ #include #include "internal.h" +#include "swap.h" /** * kfree_const - conditionally free memory diff --git a/mm/vmscan.c b/mm/vmscan.c index fb9584641ac7..969bcdb4ca80 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -58,6 +58,7 @@ #include #include "internal.h" +#include "swap.h" #define CREATE_TRACE_POINTS #include From patchwork Thu Dec 16 23:48:22 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12683097 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 60E10C433F5 for ; Thu, 16 Dec 2021 23:54:01 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 7BEE16B0075; Thu, 16 Dec 2021 18:52:46 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 77BDD6B0078; Thu, 16 Dec 2021 18:52:46 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 5E69E6B007B; Thu, 16 Dec 2021 18:52:46 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0247.hostedemail.com [216.40.44.247]) by kanga.kvack.org (Postfix) with ESMTP id 4BCD26B0075 for ; Thu, 16 Dec 2021 18:52:46 -0500 (EST) Received: from smtpin06.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 0B0C77F170 for ; Thu, 16 Dec 2021 23:52:36 +0000 (UTC) X-FDA: 78925309470.06.FBD7604 Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29]) by imf04.hostedemail.com (Postfix) with ESMTP id 844BF40010 for ; Thu, 16 Dec 2021 23:52:35 +0000 (UTC) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id 89E8A1F3A1; Thu, 16 Dec 2021 23:52:34 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1639698754; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; 
Subject: [PATCH 03/18] MM: use ->swap_rw for reads from SWP_FS_OPS swap-space
From: NeilBrown
To: Trond Myklebust, Anna Schumaker, Chuck Lever, Andrew Morton, Mel Gorman,
    Christoph Hellwig, David Howells
Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Date: Fri, 17 Dec 2021 10:48:22 +1100
Message-ID: <163969850289.20885.1044395970457169316.stgit@noble.brown>
In-Reply-To: <163969801519.20885.3977673503103544412.stgit@noble.brown>
References: <163969801519.20885.3977673503103544412.stgit@noble.brown>

To submit an async read with ->swap_rw() we need to allocate a structure
to hold the kiocb and other details.  swap_readpage() cannot handle
transient failure, so create a mempool to provide the structures.
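Condensed for illustration only (the complete version appears in the
page_io.c hunk below), the async read submission this patch adds amounts
to:

/*
 * Sketch of submitting one swap page read through ->swap_rw().
 * A mempool is used because mempool_alloc() with GFP_KERNEL waits for an
 * element to become free rather than failing, so swap_readpage() never
 * sees a transient allocation failure.
 */
struct swap_iocb *sio = mempool_alloc(sio_pool, GFP_KERNEL);
struct iov_iter from;

init_sync_kiocb(&sio->iocb, sis->swap_file);
sio->iocb.ki_pos = page_file_offset(page);
sio->iocb.ki_complete = sio_read_complete;	/* unlocks the page, frees sio */
sio->bvec.bv_page = page;
sio->bvec.bv_len = PAGE_SIZE;
sio->bvec.bv_offset = 0;

iov_iter_bvec(&from, READ, &sio->bvec, 1, PAGE_SIZE);
ret = mapping->a_ops->swap_rw(&sio->iocb, &from);
if (ret != -EIOCBQUEUED)
	sio_read_complete(&sio->iocb, ret);	/* completed (or failed) synchronously */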
Signed-off-by: NeilBrown --- mm/page_io.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++++++------ mm/swap.h | 1 + mm/swapfile.c | 5 +++++ 3 files changed, 58 insertions(+), 6 deletions(-) diff --git a/mm/page_io.c b/mm/page_io.c index a9fe5de5dc32..47d7e7866e33 100644 --- a/mm/page_io.c +++ b/mm/page_io.c @@ -283,6 +283,23 @@ static void bio_associate_blkg_from_page(struct bio *bio, struct page *page) #define bio_associate_blkg_from_page(bio, page) do { } while (0) #endif /* CONFIG_MEMCG && CONFIG_BLK_CGROUP */ +struct swap_iocb { + struct kiocb iocb; + struct bio_vec bvec; +}; +static mempool_t *sio_pool; + +int sio_pool_init(void) +{ + if (!sio_pool) + sio_pool = mempool_create_kmalloc_pool( + SWAP_CLUSTER_MAX, sizeof(struct swap_iocb)); + if (sio_pool) + return 0; + else + return -ENOMEM; +} + int __swap_writepage(struct page *page, struct writeback_control *wbc, bio_end_io_t end_write_func) { @@ -353,6 +370,23 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc, return 0; } +static void sio_read_complete(struct kiocb *iocb, long ret) +{ + struct swap_iocb *sio = container_of(iocb, struct swap_iocb, iocb); + struct page *page = sio->bvec.bv_page; + + if (ret != 0 && ret != PAGE_SIZE) { + SetPageError(page); + ClearPageUptodate(page); + pr_alert_ratelimited("Read-error on swap-device\n"); + } else { + SetPageUptodate(page); + count_vm_event(PSWPIN); + } + unlock_page(page); + mempool_free(sio, sio_pool); +} + int swap_readpage(struct page *page, bool synchronous) { struct bio *bio; @@ -378,13 +412,25 @@ int swap_readpage(struct page *page, bool synchronous) } if (data_race(sis->flags & SWP_FS_OPS)) { - //struct file *swap_file = sis->swap_file; - //struct address_space *mapping = swap_file->f_mapping; + struct file *swap_file = sis->swap_file; + struct address_space *mapping = swap_file->f_mapping; + struct iov_iter from; + struct swap_iocb *sio; + loff_t pos = page_file_offset(page); + + sio = mempool_alloc(sio_pool, GFP_KERNEL); + init_sync_kiocb(&sio->iocb, swap_file); + sio->iocb.ki_pos = pos; + sio->iocb.ki_complete = sio_read_complete; + sio->bvec.bv_page = page; + sio->bvec.bv_len = PAGE_SIZE; + sio->bvec.bv_offset = 0; + + iov_iter_bvec(&from, READ, &sio->bvec, 1, PAGE_SIZE); + ret = mapping->a_ops->swap_rw(&sio->iocb, &from); + if (ret != -EIOCBQUEUED) + sio_read_complete(&sio->iocb, ret); - /* This needs to use ->swap_rw() */ - ret = -EINVAL; - if (!ret) - count_vm_event(PSWPIN); goto out; } diff --git a/mm/swap.h b/mm/swap.h index 13e72a5023aa..128a1d3e5558 100644 --- a/mm/swap.h +++ b/mm/swap.h @@ -3,6 +3,7 @@ #include /* for bio_end_io_t */ /* linux/mm/page_io.c */ +int sio_pool_init(void); int swap_readpage(struct page *page, bool do_poll); int swap_writepage(struct page *page, struct writeback_control *wbc); void end_swap_bio_write(struct bio *bio); diff --git a/mm/swapfile.c b/mm/swapfile.c index f23d9ff21cf8..43539be38e68 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -2401,6 +2401,11 @@ static int setup_swap_extents(struct swap_info_struct *sis, sector_t *span) if (ret < 0) return ret; sis->flags |= SWP_ACTIVATED; + if ((sis->flags & SWP_FS_OPS) && + sio_pool_init() != 0) { + destroy_swap_extents(sis); + return -ENOMEM; + } return ret; } From patchwork Thu Dec 16 23:48:22 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12683099 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org 
Subject: [PATCH 04/18] MM: perform async writes to SWP_FS_OPS swap-space
From: NeilBrown
To: Trond Myklebust, Anna Schumaker, Chuck Lever, Andrew Morton, Mel Gorman,
    Christoph Hellwig, David Howells
Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Date: Fri, 17 Dec 2021 10:48:22 +1100
Message-ID: <163969850292.20885.16191050558510542930.stgit@noble.brown>
In-Reply-To: <163969801519.20885.3977673503103544412.stgit@noble.brown>
References:
<163969801519.20885.3977673503103544412.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 Authentication-Results: imf09.hostedemail.com; dkim=pass header.d=suse.de header.s=susede2_rsa header.b=MuDDyFqH; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=RVrKsQ2i; dmarc=pass (policy=none) header.from=suse.de; spf=pass (imf09.hostedemail.com: domain of neilb@suse.de designates 195.135.220.28 as permitted sender) smtp.mailfrom=neilb@suse.de X-Rspamd-Server: rspam12 X-Rspamd-Queue-Id: 9D9A7140015 X-Stat-Signature: bcspbz5tubbrrqugm77yqxpt5y1333yp X-HE-Tag: 1639698759-220826 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Writes to SWP_FS_OPS swapspace is currently synchronous. To make it async we need to allocate the kiocb struct which may block, but won't block as long as waiting for the write to complete would block. Signed-off-by: NeilBrown --- mm/page_io.c | 69 +++++++++++++++++++++++++++++++++------------------------- 1 file changed, 39 insertions(+), 30 deletions(-) diff --git a/mm/page_io.c b/mm/page_io.c index 47d7e7866e33..84859132c9c6 100644 --- a/mm/page_io.c +++ b/mm/page_io.c @@ -300,6 +300,32 @@ int sio_pool_init(void) return -ENOMEM; } +static void sio_write_complete(struct kiocb *iocb, long ret) +{ + struct swap_iocb *sio = container_of(iocb, struct swap_iocb, iocb); + struct page *page = sio->bvec.bv_page; + + if (ret != 0 && ret != PAGE_SIZE) { + /* + * In the case of swap-over-nfs, this can be a + * temporary failure if the system has limited + * memory for allocating transmit buffers. + * Mark the page dirty and avoid + * folio_rotate_reclaimable but rate-limit the + * messages but do not flag PageError like + * the normal direct-to-bio case as it could + * be temporary. + */ + set_page_dirty(page); + ClearPageReclaim(page); + pr_err_ratelimited("Write error %ld on dio swapfile (%llu)\n", + ret, page_file_offset(page)); + } else + count_vm_event(PSWPOUT); + end_page_writeback(page); + mempool_free(sio, sio_pool); +} + int __swap_writepage(struct page *page, struct writeback_control *wbc, bio_end_io_t end_write_func) { @@ -309,42 +335,25 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc, VM_BUG_ON_PAGE(!PageSwapCache(page), page); if (data_race(sis->flags & SWP_FS_OPS)) { - struct kiocb kiocb; + struct swap_iocb *sio; struct file *swap_file = sis->swap_file; struct address_space *mapping = swap_file->f_mapping; - struct bio_vec bv = { - .bv_page = page, - .bv_len = PAGE_SIZE, - .bv_offset = 0 - }; struct iov_iter from; - iov_iter_bvec(&from, WRITE, &bv, 1, PAGE_SIZE); - init_sync_kiocb(&kiocb, swap_file); - kiocb.ki_pos = page_file_offset(page); - set_page_writeback(page); unlock_page(page); - ret = mapping->a_ops->swap_rw(&kiocb, &from); - if (ret == 0) { - count_vm_event(PSWPOUT); - } else { - /* - * In the case of swap-over-nfs, this can be a - * temporary failure if the system has limited - * memory for allocating transmit buffers. - * Mark the page dirty and avoid - * folio_rotate_reclaimable but rate-limit the - * messages but do not flag PageError like - * the normal direct-to-bio case as it could - * be temporary. 
- */ - set_page_dirty(page); - ClearPageReclaim(page); - pr_err_ratelimited("Write error on dio swapfile (%llu)\n", - page_file_offset(page)); - } - end_page_writeback(page); + sio = mempool_alloc(sio_pool, GFP_NOIO); + init_sync_kiocb(&sio->iocb, swap_file); + sio->iocb.ki_complete = sio_write_complete; + sio->iocb.ki_pos = page_file_offset(page); + sio->bvec.bv_page = page; + sio->bvec.bv_len = PAGE_SIZE; + sio->bvec.bv_offset = 0; + iov_iter_bvec(&from, WRITE, &sio->bvec, 1, PAGE_SIZE); + ret = mapping->a_ops->swap_rw(&sio->iocb, &from); + if (ret != -EIOCBQUEUED) + sio_write_complete(&sio->iocb, ret); + return ret; } From patchwork Thu Dec 16 23:48:22 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12683101 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id C205AC433F5 for ; Thu, 16 Dec 2021 23:55:07 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id C513B6B007B; Thu, 16 Dec 2021 18:53:01 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id BD9536B007D; Thu, 16 Dec 2021 18:53:01 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id A54946B007E; Thu, 16 Dec 2021 18:53:01 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0198.hostedemail.com [216.40.44.198]) by kanga.kvack.org (Postfix) with ESMTP id 9526B6B007B for ; Thu, 16 Dec 2021 18:53:01 -0500 (EST) Received: from smtpin13.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 5ED90180E1E3D for ; Thu, 16 Dec 2021 23:52:51 +0000 (UTC) X-FDA: 78925310142.13.E835E37 Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29]) by imf27.hostedemail.com (Postfix) with ESMTP id 8CDBB40018 for ; Thu, 16 Dec 2021 23:52:50 +0000 (UTC) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id 964391F3A1; Thu, 16 Dec 2021 23:52:49 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1639698769; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=7ousG1C9Pxwr7ZMS3mSr07mvDQMxv5ZghnPgeMf5y5g=; b=LZTBmx+YstZRDskbnMwV6zkDjYe93SBkIGqjHStCAnwzjkMxskosNMqvaccsA+Kajs/Y3e 4TyAjP1udslAhzgZqWgznoYCzWqn0Y95H78dR0yIqzarh8rdmbErnaoKXTUpSNXPgyKvT/ G+IUkpEySr3DlAty65nYefYOE+JC1kI= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1639698769; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=7ousG1C9Pxwr7ZMS3mSr07mvDQMxv5ZghnPgeMf5y5g=; b=IxGPaeBY0t2YOmBoUfLGHbsyDNeAXuMNUW8TD3DA5PWYhO6cDHdZji2FUTqXBz5ZTOh7Vx 5QIeulAD/grm24Dg== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher 
TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 87F9513EFD; Thu, 16 Dec 2021 23:52:46 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id vcfFEE7Ru2FlWwAAMHmgww (envelope-from ); Thu, 16 Dec 2021 23:52:46 +0000 Subject: [PATCH 05/18] MM: reclaim mustn't enter FS for SWP_FS_OPS swap-space From: NeilBrown To: Trond Myklebust , Anna Schumaker , Chuck Lever , Andrew Morton , Mel Gorman , Christoph Hellwig , David Howells Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Fri, 17 Dec 2021 10:48:22 +1100 Message-ID: <163969850295.20885.4255989535187500085.stgit@noble.brown> In-Reply-To: <163969801519.20885.3977673503103544412.stgit@noble.brown> References: <163969801519.20885.3977673503103544412.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: 8CDBB40018 X-Stat-Signature: zojuaw391dbd71jm5usgtbfefoocjb63 Authentication-Results: imf27.hostedemail.com; dkim=pass header.d=suse.de header.s=susede2_rsa header.b=LZTBmx+Y; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=IxGPaeBY; spf=pass (imf27.hostedemail.com: domain of neilb@suse.de designates 195.135.220.29 as permitted sender) smtp.mailfrom=neilb@suse.de; dmarc=pass (policy=none) header.from=suse.de X-HE-Tag: 1639698770-885560 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: If swap-out is using filesystem operations (SWP_FS_OPS), then it is not safe to enter the FS for reclaim. So only down-grade the requirement for swap pages to __GFP_IO after checking that SWP_FS_OPS are not being used. Signed-off-by: NeilBrown Reported-by: kernel test robot --- mm/vmscan.c | 21 ++++++++++++++++++--- 1 file changed, 18 insertions(+), 3 deletions(-) diff --git a/mm/vmscan.c b/mm/vmscan.c index 969bcdb4ca80..5f460d174b1b 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1465,6 +1465,21 @@ static unsigned int demote_page_list(struct list_head *demote_pages, return nr_succeeded; } +static bool test_may_enter_fs(struct page *page, gfp_t gfp_mask) +{ + if (gfp_mask & __GFP_FS) + return true; + if (!PageSwapCache(page) || !(gfp_mask & __GFP_IO)) + return false; + /* We can "enter_fs" for swap-cache with only __GFP_IO + * providing this isn't SWP_FS_OPS. + * ->flags can be updated non-atomicially (scan_swap_map_slots), + * but that will never affect SWP_FS_OPS, so the data_race + * is safe. 
+ */ + return !data_race(page_swap_info(page)->flags & SWP_FS_OPS); +} + /* * shrink_page_list() returns the number of reclaimed pages */ @@ -1514,8 +1529,7 @@ static unsigned int shrink_page_list(struct list_head *page_list, if (!sc->may_unmap && page_mapped(page)) goto keep_locked; - may_enter_fs = (sc->gfp_mask & __GFP_FS) || - (PageSwapCache(page) && (sc->gfp_mask & __GFP_IO)); + may_enter_fs = test_may_enter_fs(page, sc->gfp_mask); /* * The number of dirty pages determines if a node is marked @@ -1683,7 +1697,8 @@ static unsigned int shrink_page_list(struct list_head *page_list, goto activate_locked_split; } - may_enter_fs = true; + may_enter_fs = test_may_enter_fs(page, + sc->gfp_mask); /* Adding to swap updated mapping */ mapping = page_mapping(page); From patchwork Thu Dec 16 23:48:22 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12683103 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id D8C2AC433EF for ; Thu, 16 Dec 2021 23:55:40 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id B3F4F6B007D; Thu, 16 Dec 2021 18:53:08 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id AC85E6B007E; Thu, 16 Dec 2021 18:53:08 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 91B9E6B0080; Thu, 16 Dec 2021 18:53:08 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0026.hostedemail.com [216.40.44.26]) by kanga.kvack.org (Postfix) with ESMTP id 7F0AE6B007D for ; Thu, 16 Dec 2021 18:53:08 -0500 (EST) Received: from smtpin05.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 4D401844FA for ; Thu, 16 Dec 2021 23:52:58 +0000 (UTC) X-FDA: 78925310436.05.F7135C7 Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by imf04.hostedemail.com (Postfix) with ESMTP id A0A8440017 for ; Thu, 16 Dec 2021 23:52:57 +0000 (UTC) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id A2DE7210F3; Thu, 16 Dec 2021 23:52:56 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1639698776; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=67KP7naJ3z7jU42SdmmKmZdeiUNB4x0cSuduIliqibk=; b=nq5IUzzHkhy9FvG8LMFqnuULQenxkPt3HSQ+PMaHf/CyZYj7tAcyVYT+ZpvjJ2iMVzzv7Q dvf6IDowMCdLs75Rzcq6HFsD1jt6hsnjwGvhSj2piud2DoVod2sDrsKFYgyjNm5+7td6OD FEeuiHIEBdI/dY9eFciXfOvfQ7uL+MI= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1639698776; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=67KP7naJ3z7jU42SdmmKmZdeiUNB4x0cSuduIliqibk=; b=NIFK0xMtiWHeN6asvz18UmH/0zQq9jvQzE495dU0e7PeoFtGDQF/Y7NziupPqyQfG5x90N 
Subject: [PATCH 06/18] MM: submit multipage reads for SWP_FS_OPS swap-space
From: NeilBrown
To: Trond Myklebust, Anna Schumaker, Chuck Lever, Andrew Morton, Mel Gorman,
    Christoph Hellwig, David Howells
Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Date: Fri, 17 Dec 2021 10:48:22 +1100
Message-ID: <163969850296.20885.16043920355602134308.stgit@noble.brown>
In-Reply-To: <163969801519.20885.3977673503103544412.stgit@noble.brown>
References: <163969801519.20885.3977673503103544412.stgit@noble.brown>

swap_readpage() is given one page at a time, but may be called repeatedly
in succession.  For block-device swapspace, the blk_plug functionality
allows the multiple pages to be combined together at lower layers.  That
cannot be used for SWP_FS_OPS as blk_plug may not exist - it is only active
when CONFIG_BLOCK=y.  Consequently all swap reads over NFS are single-page
reads.

With this patch we pass in a pointer-to-pointer in which swap_readpage()
can store state between calls - much like the effect of blk_plug.  After
calling swap_readpage() some number of times, the state is passed to
swap_read_unplug(), which can submit the combined request.

Some callers currently call blk_finish_plug() *before* the final call to
swap_readpage(), so the last page cannot be included.  This patch moves
blk_finish_plug() to after the last call, and calls swap_read_unplug()
there too.
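For illustration, a caller of the new interface follows the pattern below;
it mirrors the madvise.c change in this patch (entry, vma and the loop
bounds stand in for whatever the real caller derives from its page-table or
page-cache walk):

/*
 * Sketch of the read-"plug" pattern: pass &splug to successive
 * read_swap_cache_async()/swap_readpage() calls so adjacent pages can be
 * batched into a single ->swap_rw() submission, then flush whatever is
 * still pending once the loop is done.
 */
struct swap_iocb *splug = NULL;
struct page *page;
unsigned long index;

for (index = start; index < end; index++) {
	/* ... look up the swp_entry_t 'entry' for this index ... */
	page = read_swap_cache_async(entry, GFP_HIGHUSER_MOVABLE,
				     vma, index, false, &splug);
	if (page)
		put_page(page);
}
swap_read_unplug(splug);	/* submit any reads still batched in splug */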
Signed-off-by: NeilBrown Reported-by: kernel test robot --- mm/madvise.c | 8 +++-- mm/memory.c | 2 + mm/page_io.c | 95 ++++++++++++++++++++++++++++++++++++------------------- mm/swap.h | 13 ++++++-- mm/swap_state.c | 31 ++++++++++++------ 5 files changed, 100 insertions(+), 49 deletions(-) diff --git a/mm/madvise.c b/mm/madvise.c index 724470773582..a90870c7a2df 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -191,6 +191,7 @@ static int swapin_walk_pmd_entry(pmd_t *pmd, unsigned long start, pte_t *orig_pte; struct vm_area_struct *vma = walk->private; unsigned long index; + struct swap_iocb *splug = NULL; if (pmd_none_or_trans_huge_or_clear_bad(pmd)) return 0; @@ -212,10 +213,11 @@ static int swapin_walk_pmd_entry(pmd_t *pmd, unsigned long start, continue; page = read_swap_cache_async(entry, GFP_HIGHUSER_MOVABLE, - vma, index, false); + vma, index, false, &splug); if (page) put_page(page); } + swap_read_unplug(splug); return 0; } @@ -231,6 +233,7 @@ static void force_shm_swapin_readahead(struct vm_area_struct *vma, XA_STATE(xas, &mapping->i_pages, linear_page_index(vma, start)); pgoff_t end_index = linear_page_index(vma, end + PAGE_SIZE - 1); struct page *page; + struct swap_iocb *splug = NULL; rcu_read_lock(); xas_for_each(&xas, page, end_index) { @@ -243,13 +246,14 @@ static void force_shm_swapin_readahead(struct vm_area_struct *vma, swap = radix_to_swp_entry(page); page = read_swap_cache_async(swap, GFP_HIGHUSER_MOVABLE, - NULL, 0, false); + NULL, 0, false, &splug); if (page) put_page(page); rcu_read_lock(); } rcu_read_unlock(); + swap_read_unplug(splug); lru_add_drain(); /* Push any new pages onto the LRU now */ } diff --git a/mm/memory.c b/mm/memory.c index 80bbfd449b40..0ca00f2a6890 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3538,7 +3538,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) /* To provide entry to swap_readpage() */ set_page_private(page, entry.val); - swap_readpage(page, true); + swap_readpage(page, true, NULL); set_page_private(page, 0); } } else { diff --git a/mm/page_io.c b/mm/page_io.c index 84859132c9c6..03fbf9463081 100644 --- a/mm/page_io.c +++ b/mm/page_io.c @@ -285,7 +285,8 @@ static void bio_associate_blkg_from_page(struct bio *bio, struct page *page) struct swap_iocb { struct kiocb iocb; - struct bio_vec bvec; + struct bio_vec bvec[SWAP_CLUSTER_MAX]; + int pages; }; static mempool_t *sio_pool; @@ -303,7 +304,7 @@ int sio_pool_init(void) static void sio_write_complete(struct kiocb *iocb, long ret) { struct swap_iocb *sio = container_of(iocb, struct swap_iocb, iocb); - struct page *page = sio->bvec.bv_page; + struct page *page = sio->bvec[0].bv_page; if (ret != 0 && ret != PAGE_SIZE) { /* @@ -346,10 +347,10 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc, init_sync_kiocb(&sio->iocb, swap_file); sio->iocb.ki_complete = sio_write_complete; sio->iocb.ki_pos = page_file_offset(page); - sio->bvec.bv_page = page; - sio->bvec.bv_len = PAGE_SIZE; - sio->bvec.bv_offset = 0; - iov_iter_bvec(&from, WRITE, &sio->bvec, 1, PAGE_SIZE); + sio->bvec[0].bv_page = page; + sio->bvec[0].bv_len = PAGE_SIZE; + sio->bvec[0].bv_offset = 0; + iov_iter_bvec(&from, WRITE, &sio->bvec[0], 1, PAGE_SIZE); ret = mapping->a_ops->swap_rw(&sio->iocb, &from); if (ret != -EIOCBQUEUED) sio_write_complete(&sio->iocb, ret); @@ -382,21 +383,25 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc, static void sio_read_complete(struct kiocb *iocb, long ret) { struct swap_iocb *sio = container_of(iocb, struct swap_iocb, iocb); - struct page *page = 
sio->bvec.bv_page; - - if (ret != 0 && ret != PAGE_SIZE) { - SetPageError(page); - ClearPageUptodate(page); - pr_alert_ratelimited("Read-error on swap-device\n"); - } else { - SetPageUptodate(page); - count_vm_event(PSWPIN); + int p; + + for (p = 0; p < sio->pages; p++) { + struct page *page = sio->bvec[p].bv_page; + if (ret != 0 && ret != PAGE_SIZE * sio->pages) { + SetPageError(page); + ClearPageUptodate(page); + pr_alert_ratelimited("Read-error on swap-device\n"); + } else { + SetPageUptodate(page); + count_vm_event(PSWPIN); + } + unlock_page(page); } - unlock_page(page); mempool_free(sio, sio_pool); } -int swap_readpage(struct page *page, bool synchronous) +int swap_readpage(struct page *page, bool synchronous, + struct swap_iocb **plug) { struct bio *bio; int ret = 0; @@ -421,24 +426,35 @@ int swap_readpage(struct page *page, bool synchronous) } if (data_race(sis->flags & SWP_FS_OPS)) { - struct file *swap_file = sis->swap_file; - struct address_space *mapping = swap_file->f_mapping; - struct iov_iter from; - struct swap_iocb *sio; + struct swap_iocb *sio = NULL; loff_t pos = page_file_offset(page); - sio = mempool_alloc(sio_pool, GFP_KERNEL); - init_sync_kiocb(&sio->iocb, swap_file); - sio->iocb.ki_pos = pos; - sio->iocb.ki_complete = sio_read_complete; - sio->bvec.bv_page = page; - sio->bvec.bv_len = PAGE_SIZE; - sio->bvec.bv_offset = 0; - - iov_iter_bvec(&from, READ, &sio->bvec, 1, PAGE_SIZE); - ret = mapping->a_ops->swap_rw(&sio->iocb, &from); - if (ret != -EIOCBQUEUED) - sio_read_complete(&sio->iocb, ret); + if (*plug) + sio = *plug; + if (sio) { + if (sio->iocb.ki_filp != sis->swap_file || + sio->iocb.ki_pos + sio->pages * PAGE_SIZE != pos) { + swap_read_unplug(sio); + sio = NULL; + } + } + if (!sio) { + sio = mempool_alloc(sio_pool, GFP_KERNEL); + init_sync_kiocb(&sio->iocb, sis->swap_file); + sio->iocb.ki_pos = pos; + sio->iocb.ki_complete = sio_read_complete; + sio->pages = 0; + } + sio->bvec[sio->pages].bv_page = page; + sio->bvec[sio->pages].bv_len = PAGE_SIZE; + sio->bvec[sio->pages].bv_offset = 0; + sio->pages += 1; + if (sio->pages == ARRAY_SIZE(sio->bvec) || !plug) { + swap_read_unplug(sio); + sio = NULL; + } + if (plug) + *plug = sio; goto out; } @@ -490,3 +506,16 @@ int swap_readpage(struct page *page, bool synchronous) psi_memstall_leave(&pflags); return ret; } + +void __swap_read_unplug(struct swap_iocb *sio) +{ + struct iov_iter from; + struct address_space *mapping = sio->iocb.ki_filp->f_mapping; + int ret; + + iov_iter_bvec(&from, READ, sio->bvec, sio->pages, + PAGE_SIZE * sio->pages); + ret = mapping->a_ops->swap_rw(&sio->iocb, &from); + if (ret != -EIOCBQUEUED) + sio_read_complete(&sio->iocb, ret); +} diff --git a/mm/swap.h b/mm/swap.h index 128a1d3e5558..ce967abc5f46 100644 --- a/mm/swap.h +++ b/mm/swap.h @@ -4,7 +4,15 @@ /* linux/mm/page_io.c */ int sio_pool_init(void); -int swap_readpage(struct page *page, bool do_poll); +struct swap_iocb; +int swap_readpage(struct page *page, bool do_poll, + struct swap_iocb **plug); +void __swap_read_unplug(struct swap_iocb *plug); +static inline void swap_read_unplug(struct swap_iocb *plug) +{ + if (unlikely(plug)) + __swap_read_unplug(plug); +} int swap_writepage(struct page *page, struct writeback_control *wbc); void end_swap_bio_write(struct bio *bio); int __swap_writepage(struct page *page, struct writeback_control *wbc, @@ -38,7 +46,8 @@ struct page *find_get_incore_page(struct address_space *mapping, pgoff_t index); struct page *read_swap_cache_async(swp_entry_t, gfp_t, struct vm_area_struct *vma, unsigned long 
addr, - bool do_poll); + bool do_poll, + struct swap_iocb **plug); struct page *__read_swap_cache_async(swp_entry_t, gfp_t, struct vm_area_struct *vma, unsigned long addr, diff --git a/mm/swap_state.c b/mm/swap_state.c index 514b86b05488..5cb2c75fa247 100644 --- a/mm/swap_state.c +++ b/mm/swap_state.c @@ -520,14 +520,16 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, * the swap entry is no longer in use. */ struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, - struct vm_area_struct *vma, unsigned long addr, bool do_poll) + struct vm_area_struct *vma, + unsigned long addr, bool do_poll, + struct swap_iocb **plug) { bool page_was_allocated; struct page *retpage = __read_swap_cache_async(entry, gfp_mask, vma, addr, &page_was_allocated); if (page_was_allocated) - swap_readpage(retpage, do_poll); + swap_readpage(retpage, do_poll, plug); return retpage; } @@ -621,10 +623,12 @@ struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask, unsigned long mask; struct swap_info_struct *si = swp_swap_info(entry); struct blk_plug plug; + struct swap_iocb *splug = NULL; bool do_poll = true, page_allocated; struct vm_area_struct *vma = vmf->vma; unsigned long addr = vmf->address; + blk_start_plug(&plug); mask = swapin_nr_pages(offset) - 1; if (!mask) goto skip; @@ -638,7 +642,6 @@ struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask, if (end_offset >= si->max) end_offset = si->max - 1; - blk_start_plug(&plug); for (offset = start_offset; offset <= end_offset ; offset++) { /* Ok, do the async read-ahead now */ page = __read_swap_cache_async( @@ -647,7 +650,7 @@ struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask, if (!page) continue; if (page_allocated) { - swap_readpage(page, false); + swap_readpage(page, false, &splug); if (offset != entry_offset) { SetPageReadahead(page); count_vm_event(SWAP_RA); @@ -655,11 +658,14 @@ struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask, } put_page(page); } - blk_finish_plug(&plug); lru_add_drain(); /* Push any new pages onto the LRU now */ skip: - return read_swap_cache_async(entry, gfp_mask, vma, addr, do_poll); + page = read_swap_cache_async(entry, gfp_mask, vma, addr, do_poll, + &splug); + blk_finish_plug(&plug); + swap_read_unplug(splug); + return page; } int init_swap_address_space(unsigned int type, unsigned long nr_pages) @@ -790,6 +796,7 @@ static struct page *swap_vma_readahead(swp_entry_t fentry, gfp_t gfp_mask, struct vm_fault *vmf) { struct blk_plug plug; + struct swap_iocb *splug = NULL; struct vm_area_struct *vma = vmf->vma; struct page *page; pte_t *pte, pentry; @@ -800,11 +807,11 @@ static struct page *swap_vma_readahead(swp_entry_t fentry, gfp_t gfp_mask, .win = 1, }; + blk_start_plug(&plug); swap_ra_info(vmf, &ra_info); if (ra_info.win == 1) goto skip; - blk_start_plug(&plug); for (i = 0, pte = ra_info.ptes; i < ra_info.nr_pte; i++, pte++) { pentry = *pte; @@ -820,7 +827,7 @@ static struct page *swap_vma_readahead(swp_entry_t fentry, gfp_t gfp_mask, if (!page) continue; if (page_allocated) { - swap_readpage(page, false); + swap_readpage(page, false, &splug); if (i != ra_info.offset) { SetPageReadahead(page); count_vm_event(SWAP_RA); @@ -828,11 +835,13 @@ static struct page *swap_vma_readahead(swp_entry_t fentry, gfp_t gfp_mask, } put_page(page); } - blk_finish_plug(&plug); lru_add_drain(); skip: - return read_swap_cache_async(fentry, gfp_mask, vma, vmf->address, - ra_info.win == 1); + page = read_swap_cache_async(fentry, gfp_mask, vma, 
vmf->address, + ra_info.win == 1, &splug); + blk_finish_plug(&plug); + swap_read_unplug(splug); + return page; } /**

From patchwork Thu Dec 16 23:48:23 2021
X-Patchwork-Submitter: NeilBrown
X-Patchwork-Id: 12683105
Subject: [PATCH 07/18] MM: submit multipage write for SWP_FS_OPS swap-space
From: NeilBrown
To: Trond Myklebust,
    Anna Schumaker, Chuck Lever, Andrew Morton, Mel Gorman, Christoph Hellwig, David Howells
Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Date: Fri, 17 Dec 2021 10:48:23 +1100
Message-ID: <163969850299.20885.11549845125423716814.stgit@noble.brown>
In-Reply-To: <163969801519.20885.3977673503103544412.stgit@noble.brown>
References: <163969801519.20885.3977673503103544412.stgit@noble.brown>
User-Agent: StGit/0.23

swap_writepage() is given one page at a time, but may be called repeatedly in succession. For block-device swapspace, the blk_plug functionality allows the multiple pages to be combined together at lower layers. That cannot be used for SWP_FS_OPS as blk_plug may not exist - it is only active when CONFIG_BLOCK=y. Consequently all swap writes over NFS are single page writes.

With this patch we pass a pointer-to-pointer via the wbc, where swap_writepage() can store state between calls - much like the pointer passed explicitly to swap_readpage(). After calling swap_writepage() some number of times, the state will be passed to swap_write_unplug() which can submit the combined request.

Signed-off-by: NeilBrown
--- include/linux/writeback.h | 7 +++ mm/page_io.c | 98 ++++++++++++++++++++++++++++++--------------- mm/swap.h | 1 mm/vmscan.c | 9 +++- 4 files changed, 80 insertions(+), 35 deletions(-)

diff --git a/include/linux/writeback.h b/include/linux/writeback.h index 3bfd487d1dd2..16f780b618d2 100644 --- a/include/linux/writeback.h +++ b/include/linux/writeback.h @@ -79,6 +79,13 @@ struct writeback_control { unsigned punt_to_cgroup:1; /* cgrp punting, see __REQ_CGROUP_PUNT */ + /* To enable batching of swap writes to non-block-device backends, + * "plug" can be set to point to a 'struct swap_iocb *'. When all swap + * writes have been submitted, if the swap_iocb is not NULL, + * swap_write_unplug() should be called. + */ + struct swap_iocb **plug; + #ifdef CONFIG_CGROUP_WRITEBACK struct bdi_writeback *wb; /* wb this writeback is issued under */ struct inode *inode; /* inode being written out */

diff --git a/mm/page_io.c b/mm/page_io.c index 03fbf9463081..92a31df467a2 100644 --- a/mm/page_io.c +++ b/mm/page_io.c @@ -304,26 +304,30 @@ int sio_pool_init(void) static void sio_write_complete(struct kiocb *iocb, long ret) { struct swap_iocb *sio = container_of(iocb, struct swap_iocb, iocb); - struct page *page = sio->bvec[0].bv_page; + int p; - if (ret != 0 && ret != PAGE_SIZE) { - /* - * In the case of swap-over-nfs, this can be a - * temporary failure if the system has limited - * memory for allocating transmit buffers. - * Mark the page dirty and avoid - * folio_rotate_reclaimable but rate-limit the - * messages but do not flag PageError like - * the normal direct-to-bio case as it could - * be temporary.
- */ - set_page_dirty(page); - ClearPageReclaim(page); - pr_err_ratelimited("Write error %ld on dio swapfile (%llu)\n", - ret, page_file_offset(page)); - } else - count_vm_event(PSWPOUT); - end_page_writeback(page); + for (p = 0; p < sio->pages; p++) { + struct page *page = sio->bvec[p].bv_page; + + if (ret != 0 && ret != PAGE_SIZE * sio->pages) { + /* + * In the case of swap-over-nfs, this can be a + * temporary failure if the system has limited + * memory for allocating transmit buffers. + * Mark the page dirty and avoid + * folio_rotate_reclaimable but rate-limit the + * messages but do not flag PageError like + * the normal direct-to-bio case as it could + * be temporary. + */ + set_page_dirty(page); + ClearPageReclaim(page); + pr_err_ratelimited("Write error %ld on dio swapfile (%llu)\n", + ret, page_file_offset(page)); + } else + count_vm_event(PSWPOUT); + end_page_writeback(page); + } mempool_free(sio, sio_pool); } @@ -336,24 +340,39 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc, VM_BUG_ON_PAGE(!PageSwapCache(page), page); if (data_race(sis->flags & SWP_FS_OPS)) { - struct swap_iocb *sio; + struct swap_iocb *sio = NULL; struct file *swap_file = sis->swap_file; - struct address_space *mapping = swap_file->f_mapping; - struct iov_iter from; + loff_t pos = page_file_offset(page); set_page_writeback(page); unlock_page(page); - sio = mempool_alloc(sio_pool, GFP_NOIO); - init_sync_kiocb(&sio->iocb, swap_file); - sio->iocb.ki_complete = sio_write_complete; - sio->iocb.ki_pos = page_file_offset(page); - sio->bvec[0].bv_page = page; - sio->bvec[0].bv_len = PAGE_SIZE; - sio->bvec[0].bv_offset = 0; - iov_iter_bvec(&from, WRITE, &sio->bvec[0], 1, PAGE_SIZE); - ret = mapping->a_ops->swap_rw(&sio->iocb, &from); - if (ret != -EIOCBQUEUED) - sio_write_complete(&sio->iocb, ret); + + if (wbc->plug) + sio = *wbc->plug; + if (sio) { + if (sio->iocb.ki_filp != swap_file || + sio->iocb.ki_pos + sio->pages * PAGE_SIZE != pos) { + swap_write_unplug(sio); + sio = NULL; + } + } + if (!sio) { + sio = mempool_alloc(sio_pool, GFP_NOIO); + init_sync_kiocb(&sio->iocb, swap_file); + sio->iocb.ki_complete = sio_write_complete; + sio->iocb.ki_pos = pos; + sio->pages = 0; + } + sio->bvec[sio->pages].bv_page = page; + sio->bvec[sio->pages].bv_len = PAGE_SIZE; + sio->bvec[sio->pages].bv_offset = 0; + sio->pages += 1; + if (sio->pages == ARRAY_SIZE(sio->bvec) || !wbc->plug) { + swap_write_unplug(sio); + sio = NULL; + } + if (wbc->plug) + *wbc->plug = sio; return ret; } @@ -380,6 +399,19 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc, return 0; } +void swap_write_unplug(struct swap_iocb *sio) +{ + struct iov_iter from; + struct address_space *mapping = sio->iocb.ki_filp->f_mapping; + int ret; + + iov_iter_bvec(&from, WRITE, sio->bvec, sio->pages, + PAGE_SIZE * sio->pages); + ret = mapping->a_ops->swap_rw(&sio->iocb, &from); + if (ret != -EIOCBQUEUED) + sio_write_complete(&sio->iocb, ret); +} + static void sio_read_complete(struct kiocb *iocb, long ret) { struct swap_iocb *sio = container_of(iocb, struct swap_iocb, iocb); diff --git a/mm/swap.h b/mm/swap.h index ce967abc5f46..f4d0edda6e59 100644 --- a/mm/swap.h +++ b/mm/swap.h @@ -13,6 +13,7 @@ static inline void swap_read_unplug(struct swap_iocb *plug) if (unlikely(plug)) __swap_read_unplug(plug); } +void swap_write_unplug(struct swap_iocb *sio); int swap_writepage(struct page *page, struct writeback_control *wbc); void end_swap_bio_write(struct bio *bio); int __swap_writepage(struct page *page, struct 
writeback_control *wbc, diff --git a/mm/vmscan.c b/mm/vmscan.c index 5f460d174b1b..50a363e63102 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1123,7 +1123,8 @@ typedef enum { * pageout is called by shrink_page_list() for each dirty page. * Calls ->writepage(). */ -static pageout_t pageout(struct page *page, struct address_space *mapping) +static pageout_t pageout(struct page *page, struct address_space *mapping, + struct swap_iocb **plug) { /* * If the page is dirty, only perform writeback if that write @@ -1170,6 +1171,7 @@ static pageout_t pageout(struct page *page, struct address_space *mapping) .range_start = 0, .range_end = LLONG_MAX, .for_reclaim = 1, + .plug = plug, }; SetPageReclaim(page); @@ -1495,6 +1497,7 @@ static unsigned int shrink_page_list(struct list_head *page_list, unsigned int nr_reclaimed = 0; unsigned int pgactivate = 0; bool do_demote_pass; + struct swap_iocb *plug = NULL; memset(stat, 0, sizeof(*stat)); cond_resched(); @@ -1780,7 +1783,7 @@ static unsigned int shrink_page_list(struct list_head *page_list, * starts and then write it out here. */ try_to_unmap_flush_dirty(); - switch (pageout(page, mapping)) { + switch (pageout(page, mapping, &plug)) { case PAGE_KEEP: goto keep_locked; case PAGE_ACTIVATE: @@ -1934,6 +1937,8 @@ static unsigned int shrink_page_list(struct list_head *page_list, list_splice(&ret_pages, page_list); count_vm_events(PGACTIVATE, pgactivate); + if (plug) + swap_write_unplug(plug); return nr_reclaimed; } From patchwork Thu Dec 16 23:48:23 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12683107 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3A0F4C433EF for ; Thu, 16 Dec 2021 23:56:46 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id C9ED16B0080; Thu, 16 Dec 2021 18:53:25 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id C27CF6B0081; Thu, 16 Dec 2021 18:53:25 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id AA19A6B0082; Thu, 16 Dec 2021 18:53:25 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0038.hostedemail.com [216.40.44.38]) by kanga.kvack.org (Postfix) with ESMTP id 986FE6B0080 for ; Thu, 16 Dec 2021 18:53:25 -0500 (EST) Received: from smtpin22.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 6614D8632A for ; Thu, 16 Dec 2021 23:53:15 +0000 (UTC) X-FDA: 78925311150.22.ADF7F0E Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by imf16.hostedemail.com (Postfix) with ESMTP id CFA9618001B for ; Thu, 16 Dec 2021 23:53:14 +0000 (UTC) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 9C16F210F3; Thu, 16 Dec 2021 23:53:13 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1639698793; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: 
in-reply-to:in-reply-to:references:references; bh=qSBaI+WJMPR2UaOVoOuh6lltzGMEJfvZoUMkmtYG78A=; b=ISsxbwes6/ZB/yDvYIHG7lMdDFOR58vto/MC+nAnO0T0IEStc9DU+lw7/Zrr0uFsIo0a/c ZYMhmIKSSywGCGIkj1abWRi7ezaABsnRXOW4aCfUGsJn0dLz5WlsjUDqnUe0EYz+gu5f0B jjRTXQtcyc/6vssYAAzRWl6awaswjHI= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1639698793; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=qSBaI+WJMPR2UaOVoOuh6lltzGMEJfvZoUMkmtYG78A=; b=A4J6D6XG/va34XBqqJDjRBVuozm+NertzXGyBIO4/iqbF1cRjdXFKRayPv7sIlEhRX0bqA Ij9DczSGyXjcZbCg== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 9A54B13EFD; Thu, 16 Dec 2021 23:53:10 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id 2/YkFWbRu2GJWwAAMHmgww (envelope-from ); Thu, 16 Dec 2021 23:53:10 +0000 Subject: [PATCH 08/18] MM: Add AS_CAN_DIO mapping flag From: NeilBrown To: Trond Myklebust , Anna Schumaker , Chuck Lever , Andrew Morton , Mel Gorman , Christoph Hellwig , David Howells Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Fri, 17 Dec 2021 10:48:23 +1100 Message-ID: <163969850302.20885.17124747377211907111.stgit@noble.brown> In-Reply-To: <163969801519.20885.3977673503103544412.stgit@noble.brown> References: <163969801519.20885.3977673503103544412.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: CFA9618001B X-Stat-Signature: go1djjmwgcqwkqjep6gnqihdjgq734kz Authentication-Results: imf16.hostedemail.com; dkim=pass header.d=suse.de header.s=susede2_rsa header.b=ISsxbwes; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=A4J6D6XG; spf=pass (imf16.hostedemail.com: domain of neilb@suse.de designates 195.135.220.28 as permitted sender) smtp.mailfrom=neilb@suse.de; dmarc=pass (policy=none) header.from=suse.de X-HE-Tag: 1639698794-559943 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Currently various places test if direct IO is possible on a file by checking for the existence of the direct_IO address space operation. This is a poor choice, as the direct_IO operation may not be used - it is only used if the generic_file_*_iter functions are called for direct IO and some filesystems - particularly NFS - don't do this. Instead, introduce a new mapping flag: AS_CAN_DIO and change the various places to check this (avoiding a pointer dereference). unlock_new_inode() will set this flag if ->direct_IO is present, so filesystems do not need to be changed. NFS *is* changed, to set the flag explicitly and discard the direct_IO entry in the address_space_operations for files. 
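
(Illustration only - not part of the patch.) A small user-space sketch of the idea: record "direct IO works here" once as a bit in the mapping's flags word, and test that bit instead of dereferencing a_ops->direct_IO at every open()/fcntl(). The struct and helper names below are invented; the kernel uses set_bit()/test_bit() on address_space->flags.

#include <stdio.h>

enum mapping_flags {			/* values are illustrative only */
	AS_CAN_DIO = 7,			/* direct IO is supported on this mapping */
};

struct mapping {
	unsigned long flags;
	int (*direct_IO)(void);		/* may be NULL even when DIO is supported */
};

static int demo_dio(void) { return 0; }

static int mapping_can_dio(const struct mapping *m)
{
	/* a single flag test, no a_ops pointer chase */
	return (m->flags >> AS_CAN_DIO) & 1UL;
}

int main(void)
{
	struct mapping generic = { .direct_IO = demo_dio };
	struct mapping nfs_like = { 0 };	/* no ->direct_IO, but DIO still works */

	/* what unlock_new_inode() would do for filesystems providing ->direct_IO */
	if (generic.direct_IO)
		generic.flags |= 1UL << AS_CAN_DIO;
	/* what NFS would do explicitly when setting up a regular file's mapping */
	nfs_like.flags |= 1UL << AS_CAN_DIO;

	printf("generic: %d, nfs-like: %d\n",
	       mapping_can_dio(&generic), mapping_can_dio(&nfs_like));
	return 0;
}
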
Signed-off-by: NeilBrown --- drivers/block/loop.c | 4 ++-- fs/fcntl.c | 5 +++-- fs/inode.c | 3 +++ fs/nfs/file.c | 1 - fs/nfs/inode.c | 1 + fs/open.c | 2 +- fs/overlayfs/file.c | 10 ++++------ include/linux/fs.h | 2 +- include/linux/pagemap.h | 3 ++- 9 files changed, 17 insertions(+), 14 deletions(-) diff --git a/drivers/block/loop.c b/drivers/block/loop.c index c3a36cfaa855..ab4dee6c0fc3 100644 --- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -184,8 +184,8 @@ static void __loop_update_dio(struct loop_device *lo, bool dio) */ if (dio) { if (queue_logical_block_size(lo->lo_queue) >= sb_bsize && - !(lo->lo_offset & dio_align) && - mapping->a_ops->direct_IO) + !(lo->lo_offset & dio_align) && + test_bit(AS_CAN_DIO, &mapping->flags)) use_dio = true; else use_dio = false; diff --git a/fs/fcntl.c b/fs/fcntl.c index 9c6c6a3e2de5..fcbf2dc44273 100644 --- a/fs/fcntl.c +++ b/fs/fcntl.c @@ -26,6 +26,7 @@ #include #include #include +#include #include #include @@ -57,9 +58,9 @@ static int setfl(int fd, struct file * filp, unsigned long arg) /* Pipe packetized mode is controlled by O_DIRECT flag */ if (!S_ISFIFO(inode->i_mode) && (arg & O_DIRECT)) { - if (!filp->f_mapping || !filp->f_mapping->a_ops || - !filp->f_mapping->a_ops->direct_IO) - return -EINVAL; + if (!filp->f_mapping || + !test_bit(AS_CAN_DIO, &filp->f_mapping->flags)) + return -EINVAL; } if (filp->f_op->check_flags) diff --git a/fs/inode.c b/fs/inode.c index 6b80a51129d5..bae65ccecdb1 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -1008,6 +1008,9 @@ EXPORT_SYMBOL(lockdep_annotate_inode_mutex_key); void unlock_new_inode(struct inode *inode) { lockdep_annotate_inode_mutex_key(inode); + if (inode->i_mapping->a_ops && + inode->i_mapping->a_ops->direct_IO) + set_bit(AS_CAN_DIO, &inode->i_mapping->flags); spin_lock(&inode->i_lock); WARN_ON(!(inode->i_state & I_NEW)); inode->i_state &= ~I_NEW & ~I_CREATING; diff --git a/fs/nfs/file.c b/fs/nfs/file.c index 0d33c95eefb6..60842b774b56 100644 --- a/fs/nfs/file.c +++ b/fs/nfs/file.c @@ -536,7 +536,6 @@ const struct address_space_operations nfs_file_aops = { .write_end = nfs_write_end, .invalidatepage = nfs_invalidate_page, .releasepage = nfs_release_page, - .direct_IO = nfs_direct_IO, #ifdef CONFIG_MIGRATION .migratepage = nfs_migrate_page, #endif diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c index fda530d5e764..e9d1097170b1 100644 --- a/fs/nfs/inode.c +++ b/fs/nfs/inode.c @@ -496,6 +496,7 @@ nfs_fhget(struct super_block *sb, struct nfs_fh *fh, struct nfs_fattr *fattr) if (S_ISREG(inode->i_mode)) { inode->i_fop = NFS_SB(sb)->nfs_client->rpc_ops->file_ops; inode->i_data.a_ops = &nfs_file_aops; + set_bit(AS_CAN_DIO, &inode->i_data.flags); nfs_inode_init_regular(nfsi); } else if (S_ISDIR(inode->i_mode)) { inode->i_op = NFS_SB(sb)->nfs_client->rpc_ops->dir_inode_ops; diff --git a/fs/open.c b/fs/open.c index f732fb94600c..ff58874acd10 100644 --- a/fs/open.c +++ b/fs/open.c @@ -840,7 +840,7 @@ static int do_dentry_open(struct file *f, /* NB: we're sure to have correct a_ops only after f_op->open */ if (f->f_flags & O_DIRECT) { - if (!f->f_mapping->a_ops || !f->f_mapping->a_ops->direct_IO) + if (!test_bit(AS_CAN_DIO, &f->f_mapping->flags)) return -EINVAL; } diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c index fa125feed0ff..21754edf5b62 100644 --- a/fs/overlayfs/file.c +++ b/fs/overlayfs/file.c @@ -13,6 +13,7 @@ #include #include #include +#include #include "overlayfs.h" struct ovl_aio_req { @@ -83,8 +84,7 @@ static int ovl_change_flags(struct file *file, unsigned int flags) return -EPERM; if (flags 
& O_DIRECT) { - if (!file->f_mapping->a_ops || - !file->f_mapping->a_ops->direct_IO) + if (!test_bit(AS_CAN_DIO, &file->f_mapping->flags)) return -EINVAL; } @@ -306,8 +306,7 @@ static ssize_t ovl_read_iter(struct kiocb *iocb, struct iov_iter *iter) ret = -EINVAL; if (iocb->ki_flags & IOCB_DIRECT && - (!real.file->f_mapping->a_ops || - !real.file->f_mapping->a_ops->direct_IO)) + !test_bit(AS_CAN_DIO, &real.file->f_mapping->flags)) goto out_fdput; old_cred = ovl_override_creds(file_inode(file)->i_sb); @@ -367,8 +366,7 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter) ret = -EINVAL; if (iocb->ki_flags & IOCB_DIRECT && - (!real.file->f_mapping->a_ops || - !real.file->f_mapping->a_ops->direct_IO)) + !test_bit(AS_CAN_DIO, &real.file->f_mapping->flags)) goto out_fdput; if (!ovl_should_sync(OVL_FS(inode->i_sb))) diff --git a/include/linux/fs.h b/include/linux/fs.h index deaaf359cc49..1e954756b093 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -448,7 +448,7 @@ int pagecache_write_end(struct file *, struct address_space *mapping, * @nrpages: Number of page entries, protected by the i_pages lock. * @writeback_index: Writeback starts here. * @a_ops: Methods. - * @flags: Error bits and flags (AS_*). + * @flags: Error bits and flags (AS_*). (enum mapping_flags) * @wb_err: The most recent error which has occurred. * @private_lock: For use by the owner of the address_space. * @private_list: For use by the owner of the address_space. diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index 605246452305..ceb599b6ba8b 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -81,10 +81,11 @@ enum mapping_flags { AS_ENOSPC = 1, /* ENOSPC on async write */ AS_MM_ALL_LOCKS = 2, /* under mm_take_all_locks() */ AS_UNEVICTABLE = 3, /* e.g., ramdisk, SHM_LOCK */ - AS_EXITING = 4, /* final truncate in progress */ + AS_EXITING = 4, /* final truncate in progress */ /* writeback related tags are not used */ AS_NO_WRITEBACK_TAGS = 5, AS_LARGE_FOLIO_SUPPORT = 6, + AS_CAN_DIO = 7, /* DIO is supported */ }; /** From patchwork Thu Dec 16 23:48:23 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12683109 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id BDF99C433EF for ; Thu, 16 Dec 2021 23:57:19 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id DFF036B0081; Thu, 16 Dec 2021 18:53:34 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id D60076B0082; Thu, 16 Dec 2021 18:53:34 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id BD98A6B0083; Thu, 16 Dec 2021 18:53:34 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0197.hostedemail.com [216.40.44.197]) by kanga.kvack.org (Postfix) with ESMTP id A976E6B0081 for ; Thu, 16 Dec 2021 18:53:34 -0500 (EST) Received: from smtpin25.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 7160F8249980 for ; Thu, 16 Dec 2021 23:53:24 +0000 (UTC) X-FDA: 78925311528.25.C602667 Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by imf07.hostedemail.com (Postfix) with ESMTP id ED4CC40019 for ; Thu, 16 Dec 2021 23:53:23 +0000 (UTC) Received: from 
imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id E93BD210F3; Thu, 16 Dec 2021 23:53:22 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1639698802; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=5kpl2jhqFYtBPH8hemOIlub9wC0sJNCcBfsh7RJkLRQ=; b=iOl4wvM/gflYZn4MNaCgunUkf12J/Gfmgfwn4fBrZINfWum99Qs1+squ/b9R6LCBvkJ8M7 WCs2rwbCBIZ6vzWkpxQcAg/kZYdRtriwwA2I4Q/vdFY9EmH/r0KXZpnYPA0zM1ugeHIy1s xqR+A2jSbvRfP2QfPxFZHPdMaYxRjM4= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1639698802; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=5kpl2jhqFYtBPH8hemOIlub9wC0sJNCcBfsh7RJkLRQ=; b=SegEJcOQ20Ku1P67FToIP20sJeTSt7+DFqh0f122yYH9LOYiWr8r7coVtW24i1Aqd93gw+ KumwwDO5BNv6D4Cg== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 16FD413EFD; Thu, 16 Dec 2021 23:53:19 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id LFfJMW/Ru2GTWwAAMHmgww (envelope-from ); Thu, 16 Dec 2021 23:53:19 +0000 Subject: [PATCH 09/18] NFS: rename nfs_direct_IO and use as ->swap_rw From: NeilBrown To: Trond Myklebust , Anna Schumaker , Chuck Lever , Andrew Morton , Mel Gorman , Christoph Hellwig , David Howells Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Fri, 17 Dec 2021 10:48:23 +1100 Message-ID: <163969850310.20885.11800649164871080105.stgit@noble.brown> In-Reply-To: <163969801519.20885.3977673503103544412.stgit@noble.brown> References: <163969801519.20885.3977673503103544412.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Rspamd-Server: rspam09 X-Rspamd-Queue-Id: ED4CC40019 X-Stat-Signature: gkrix46s3s19d8hxsurq8zresqu99mt1 Authentication-Results: imf07.hostedemail.com; dkim=pass header.d=suse.de header.s=susede2_rsa header.b="iOl4wvM/"; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=SegEJcOQ; spf=pass (imf07.hostedemail.com: domain of neilb@suse.de designates 195.135.220.28 as permitted sender) smtp.mailfrom=neilb@suse.de; dmarc=pass (policy=none) header.from=suse.de X-HE-Tag: 1639698803-816931 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: The nfs_direct_IO() exists to support SWAP IO, but hasn't worked for a while. We now need a ->swap_rw function which behaves slightly differently, returning zero for success rather than a byte count. So modify nfs_direct_IO accordingly, rename it, and use it as the ->swap_rw function. Note: it still won't work - that will be fixed in later patches. 
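
(Illustration only - not part of the patch.) The return-value conversion described above, modelled in plain C: a direct-IO style helper reports bytes transferred (or a negative errno), while a swap_rw-style wrapper collapses any byte count to 0 on success and passes errors through unchanged. All names are invented stand-ins.

#include <stdio.h>
#include <errno.h>

/* stand-in for nfs_file_direct_read()/nfs_file_direct_write():
 * returns bytes transferred, or a negative errno. */
static long direct_io(long nbytes)
{
	return nbytes >= 0 ? nbytes : -EIO;
}

/* swap_rw convention: 0 on success, negative errno on failure */
static int swap_rw(long nbytes)
{
	long ret = direct_io(nbytes);

	if (ret < 0)
		return (int)ret;
	return 0;
}

int main(void)
{
	printf("ok=%d err=%d\n", swap_rw(4096), swap_rw(-1));
	return 0;
}
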
Signed-off-by: NeilBrown --- fs/nfs/direct.c | 23 ++++++++++------------- fs/nfs/file.c | 5 +---- include/linux/nfs_fs.h | 2 +- 3 files changed, 12 insertions(+), 18 deletions(-) diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c index 9cff8709c80a..f1e169f3050a 100644 --- a/fs/nfs/direct.c +++ b/fs/nfs/direct.c @@ -152,28 +152,25 @@ nfs_direct_count_bytes(struct nfs_direct_req *dreq, } /** - * nfs_direct_IO - NFS address space operation for direct I/O + * nfs_swap_rw - NFS address space operation for swap I/O * @iocb: target I/O control block * @iter: I/O buffer * - * The presence of this routine in the address space ops vector means - * the NFS client supports direct I/O. However, for most direct IO, we - * shunt off direct read and write requests before the VFS gets them, - * so this method is only ever called for swap. + * Perform IO to the swap-file. This is much like direct IO. */ -ssize_t nfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter) +int nfs_swap_rw(struct kiocb *iocb, struct iov_iter *iter) { - struct inode *inode = iocb->ki_filp->f_mapping->host; - - /* we only support swap file calling nfs_direct_IO */ - if (!IS_SWAPFILE(inode)) - return 0; + ssize_t ret; VM_BUG_ON(iov_iter_count(iter) != PAGE_SIZE); if (iov_iter_rw(iter) == READ) - return nfs_file_direct_read(iocb, iter); - return nfs_file_direct_write(iocb, iter); + ret = nfs_file_direct_read(iocb, iter); + else + ret = nfs_file_direct_write(iocb, iter); + if (ret < 0) + return ret; + return 0; } static void nfs_direct_release_pages(struct page **pages, unsigned int npages) diff --git a/fs/nfs/file.c b/fs/nfs/file.c index 60842b774b56..b620fe697158 100644 --- a/fs/nfs/file.c +++ b/fs/nfs/file.c @@ -493,10 +493,6 @@ static int nfs_swap_activate(struct swap_info_struct *sis, struct file *file, struct rpc_clnt *clnt = NFS_CLIENT(file->f_mapping->host); struct inode *inode = file->f_mapping->host; - if (!file->f_mapping->a_ops->swap_rw) - /* Cannot support swap */ - return -EINVAL; - spin_lock(&inode->i_lock); blocks = inode->i_blocks; isize = inode->i_size; @@ -544,6 +540,7 @@ const struct address_space_operations nfs_file_aops = { .error_remove_page = generic_error_remove_page, .swap_activate = nfs_swap_activate, .swap_deactivate = nfs_swap_deactivate, + .swap_rw = nfs_swap_rw, }; /* diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h index 05f249f20f55..6329e6958718 100644 --- a/include/linux/nfs_fs.h +++ b/include/linux/nfs_fs.h @@ -510,7 +510,7 @@ static inline const struct cred *nfs_file_cred(struct file *file) /* * linux/fs/nfs/direct.c */ -extern ssize_t nfs_direct_IO(struct kiocb *, struct iov_iter *); +extern int nfs_swap_rw(struct kiocb *, struct iov_iter *); extern ssize_t nfs_file_direct_read(struct kiocb *iocb, struct iov_iter *iter); extern ssize_t nfs_file_direct_write(struct kiocb *iocb, From patchwork Thu Dec 16 23:48:23 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12683111 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 381BEC433F5 for ; Thu, 16 Dec 2021 23:57:52 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 302976B0082; Thu, 16 Dec 2021 18:53:42 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 28A356B0083; Thu, 16 Dec 2021 18:53:42 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org 
Received: by kanga.kvack.org (Postfix, from userid 63042) id 0DD306B0085; Thu, 16 Dec 2021 18:53:42 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0068.hostedemail.com [216.40.44.68]) by kanga.kvack.org (Postfix) with ESMTP id F1E2F6B0082 for ; Thu, 16 Dec 2021 18:53:41 -0500 (EST) Received: from smtpin22.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id BCC7C8698E for ; Thu, 16 Dec 2021 23:53:31 +0000 (UTC) X-FDA: 78925311822.22.F269EF1 Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29]) by imf28.hostedemail.com (Postfix) with ESMTP id 3BE83C000C for ; Thu, 16 Dec 2021 23:53:31 +0000 (UTC) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id 3BF111F37F; Thu, 16 Dec 2021 23:53:30 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1639698810; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=oGTqKoZtDbecvGBIGtaMZLZto4uBvaDYc0Rr0u4Yk7Y=; b=F/0SodB8h/JFf0rkMNTpd82YsL5klsKADLO6pIw1l5DpFt1p9ujNX8ueg7w5b/qmq2Nnc6 qwRxiMTEpgnXS5XO5NK2eFxn1rpKkD5PmErLwPz1iKgWFynGXvJGVg2UlaoBv8jd0IPXde aILXgvTfZrYoGYh0JI2QfHyPfEIQgu4= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1639698810; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=oGTqKoZtDbecvGBIGtaMZLZto4uBvaDYc0Rr0u4Yk7Y=; b=xzsyfzA9n55B/z+zyLwZLTkmRlrd0ulkqRD2eAjVT4r01nfuDLGfVF5Lp/QtywROKLXJez cCoFecD39KUGYRCA== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 30DC313EFD; Thu, 16 Dec 2021 23:53:26 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id haErN3bRu2GiWwAAMHmgww (envelope-from ); Thu, 16 Dec 2021 23:53:26 +0000 Subject: [PATCH 10/18] NFS: swap IO handling is slightly different for O_DIRECT IO From: NeilBrown To: Trond Myklebust , Anna Schumaker , Chuck Lever , Andrew Morton , Mel Gorman , Christoph Hellwig , David Howells Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Fri, 17 Dec 2021 10:48:23 +1100 Message-ID: <163969850314.20885.13214679186436457787.stgit@noble.brown> In-Reply-To: <163969801519.20885.3977673503103544412.stgit@noble.brown> References: <163969801519.20885.3977673503103544412.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Rspamd-Server: rspam03 X-Rspamd-Queue-Id: 3BE83C000C X-Stat-Signature: n6utg8uaf57thhuutnonmq79o9udgjqd Authentication-Results: imf28.hostedemail.com; dkim=pass header.d=suse.de header.s=susede2_rsa header.b="F/0SodB8"; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=xzsyfzA9; dmarc=pass (policy=none) header.from=suse.de; spf=pass 
(imf28.hostedemail.com: domain of neilb@suse.de designates 195.135.220.29 as permitted sender) smtp.mailfrom=neilb@suse.de X-HE-Tag: 1639698811-527353 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: 1/ Taking the i_rwsem for swap IO triggers lockdep warnings regarding possible deadlocks with "fs_reclaim". These deadlocks could, I believe, eventuate if a buffered read on the swapfile was attempted. We don't need coherence with the page cache for a swap file, and buffered writes are forbidden anyway. There is no other need for i_rwsem during direct IO. So never take it for swap_rw() 2/ generic_write_checks() explicitly forbids writes to swap, and performs checks that are not needed for swap. So bypass it for swap_rw(). Signed-off-by: NeilBrown --- fs/nfs/direct.c | 30 +++++++++++++++++++++--------- fs/nfs/file.c | 4 ++-- include/linux/nfs_fs.h | 4 ++-- 3 files changed, 25 insertions(+), 13 deletions(-) diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c index f1e169f3050a..eeff1b4e1a7c 100644 --- a/fs/nfs/direct.c +++ b/fs/nfs/direct.c @@ -165,9 +165,9 @@ int nfs_swap_rw(struct kiocb *iocb, struct iov_iter *iter) VM_BUG_ON(iov_iter_count(iter) != PAGE_SIZE); if (iov_iter_rw(iter) == READ) - ret = nfs_file_direct_read(iocb, iter); + ret = nfs_file_direct_read(iocb, iter, true); else - ret = nfs_file_direct_write(iocb, iter); + ret = nfs_file_direct_write(iocb, iter, true); if (ret < 0) return ret; return 0; @@ -421,6 +421,7 @@ static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq, * nfs_file_direct_read - file direct read operation for NFS files * @iocb: target I/O control block * @iter: vector of user buffers into which to read data + * @swap: flag indicating this is swap IO, not O_DIRECT IO * * We use this function for direct reads instead of calling * generic_file_aio_read() in order to avoid gfar's check to see if @@ -436,7 +437,8 @@ static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq, * client must read the updated atime from the server back into its * cache. */ -ssize_t nfs_file_direct_read(struct kiocb *iocb, struct iov_iter *iter) +ssize_t nfs_file_direct_read(struct kiocb *iocb, struct iov_iter *iter, + bool swap) { struct file *file = iocb->ki_filp; struct address_space *mapping = file->f_mapping; @@ -478,12 +480,14 @@ ssize_t nfs_file_direct_read(struct kiocb *iocb, struct iov_iter *iter) if (iter_is_iovec(iter)) dreq->flags = NFS_ODIRECT_SHOULD_DIRTY; - nfs_start_io_direct(inode); + if (!swap) + nfs_start_io_direct(inode); NFS_I(inode)->read_io += count; requested = nfs_direct_read_schedule_iovec(dreq, iter, iocb->ki_pos); - nfs_end_io_direct(inode); + if (!swap) + nfs_end_io_direct(inode); if (requested > 0) { result = nfs_direct_wait(dreq); @@ -872,6 +876,7 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq, * nfs_file_direct_write - file direct write operation for NFS files * @iocb: target I/O control block * @iter: vector of user buffers from which to write data + * @swap: flag indicating this is swap IO, not O_DIRECT IO * * We use this function for direct writes instead of calling * generic_file_aio_write() in order to avoid taking the inode @@ -888,7 +893,8 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq, * Note that O_APPEND is not supported for NFS direct writes, as there * is no atomic O_APPEND write facility in the NFS protocol. 
*/ -ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter) +ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter, + bool swap) { ssize_t result, requested; size_t count; @@ -902,7 +908,11 @@ ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter) dfprintk(FILE, "NFS: direct write(%pD2, %zd@%Ld)\n", file, iov_iter_count(iter), (long long) iocb->ki_pos); - result = generic_write_checks(iocb, iter); + if (!swap) + result = generic_write_checks(iocb, iter); + else + /* bypass generic checks */ + result = iov_iter_count(iter); if (result <= 0) return result; count = result; @@ -933,7 +943,8 @@ ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter) dreq->iocb = iocb; pnfs_init_ds_commit_info_ops(&dreq->ds_cinfo, inode); - nfs_start_io_direct(inode); + if (!swap) + nfs_start_io_direct(inode); requested = nfs_direct_write_schedule_iovec(dreq, iter, pos); @@ -942,7 +953,8 @@ ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter) pos >> PAGE_SHIFT, end); } - nfs_end_io_direct(inode); + if (!swap) + nfs_end_io_direct(inode); if (requested > 0) { result = nfs_direct_wait(dreq); diff --git a/fs/nfs/file.c b/fs/nfs/file.c index b620fe697158..996dfb3c74b2 100644 --- a/fs/nfs/file.c +++ b/fs/nfs/file.c @@ -161,7 +161,7 @@ nfs_file_read(struct kiocb *iocb, struct iov_iter *to) ssize_t result; if (iocb->ki_flags & IOCB_DIRECT) - return nfs_file_direct_read(iocb, to); + return nfs_file_direct_read(iocb, to, false); dprintk("NFS: read(%pD2, %zu@%lu)\n", iocb->ki_filp, @@ -625,7 +625,7 @@ ssize_t nfs_file_write(struct kiocb *iocb, struct iov_iter *from) return result; if (iocb->ki_flags & IOCB_DIRECT) - return nfs_file_direct_write(iocb, from); + return nfs_file_direct_write(iocb, from, false); dprintk("NFS: write(%pD2, %zu@%Ld)\n", file, iov_iter_count(from), (long long) iocb->ki_pos); diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h index 6329e6958718..3a210478f665 100644 --- a/include/linux/nfs_fs.h +++ b/include/linux/nfs_fs.h @@ -512,9 +512,9 @@ static inline const struct cred *nfs_file_cred(struct file *file) */ extern int nfs_swap_rw(struct kiocb *, struct iov_iter *); extern ssize_t nfs_file_direct_read(struct kiocb *iocb, - struct iov_iter *iter); + struct iov_iter *iter, bool swap); extern ssize_t nfs_file_direct_write(struct kiocb *iocb, - struct iov_iter *iter); + struct iov_iter *iter, bool swap); /* * linux/fs/nfs/dir.c From patchwork Thu Dec 16 23:48:23 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12683113 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id C827CC433F5 for ; Thu, 16 Dec 2021 23:58:24 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 5994D6B0083; Thu, 16 Dec 2021 18:53:50 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 522B76B0085; Thu, 16 Dec 2021 18:53:50 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 39CDB6B0087; Thu, 16 Dec 2021 18:53:50 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0111.hostedemail.com [216.40.44.111]) by kanga.kvack.org (Postfix) with ESMTP id 28F556B0083 for ; Thu, 16 Dec 2021 18:53:50 -0500 (EST) Received: from smtpin03.hostedemail.com 
(10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id E3FE4181AC9CC for ; Thu, 16 Dec 2021 23:53:39 +0000 (UTC) X-FDA: 78925312158.03.1B645AC Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by imf03.hostedemail.com (Postfix) with ESMTP id 6C9C72000F for ; Thu, 16 Dec 2021 23:53:39 +0000 (UTC) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 4D7EA212C8; Thu, 16 Dec 2021 23:53:38 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1639698818; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=uJLgBSr+4ac2HtnFPf6KLC0wSgPyRK3wpQcub+LxsjE=; b=B5BZGrGhS1sLx+jn04Rr1MuE1ChyurPGtdI+nwDT/Hr/0+QXFKuZRtGCvuCRNoIkwlwyKP 9BfV/qhMuMbcQqDMBat4fq1Wo0xQWckKo0jCjlYNMmPl0tfn2JBp9hUPp/dnjydtCifmiB 8RRyHATryW0Fh+YgaDAS7VV2qTL3pAI= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1639698818; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=uJLgBSr+4ac2HtnFPf6KLC0wSgPyRK3wpQcub+LxsjE=; b=+QQyfMGX2CvZGYolzHHLcWwYdfeq3Mk7+x940MXnO+mysULhS3FXrt8AFWVVKGMrMf+WAT O4p/XR+Idp/HN7CA== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 564E413EFD; Thu, 16 Dec 2021 23:53:35 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id qkUhBX/Ru2GpWwAAMHmgww (envelope-from ); Thu, 16 Dec 2021 23:53:35 +0000 Subject: [PATCH 11/18] SUNRPC/call_alloc: async tasks mustn't block waiting for memory From: NeilBrown To: Trond Myklebust , Anna Schumaker , Chuck Lever , Andrew Morton , Mel Gorman , Christoph Hellwig , David Howells Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Fri, 17 Dec 2021 10:48:23 +1100 Message-ID: <163969850317.20885.3872948365489724355.stgit@noble.brown> In-Reply-To: <163969801519.20885.3977673503103544412.stgit@noble.brown> References: <163969801519.20885.3977673503103544412.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 Authentication-Results: imf03.hostedemail.com; dkim=pass header.d=suse.de header.s=susede2_rsa header.b=B5BZGrGh; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=+QQyfMGX; spf=pass (imf03.hostedemail.com: domain of neilb@suse.de designates 195.135.220.28 as permitted sender) smtp.mailfrom=neilb@suse.de; dmarc=pass (policy=none) header.from=suse.de X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: 6C9C72000F X-Stat-Signature: 4u653njguka7dsggs6ocecend9nrc1ie X-HE-Tag: 1639698819-716789 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: When memory is short, new worker threads cannot be created and we 
depend on the minimum one rpciod thread to be able to handle everything. So it must not block waiting for memory. mempools are particularly a problem as memory can only be released back to the mempool by an async rpc task running. If all available workqueue threads are waiting on the mempool, no thread is available to return anything. rpc_malloc() can block, and this might cause deadlocks. So check RPC_IS_ASYNC(), rather than RPC_IS_SWAPPER() to determine if blocking is acceptable. Signed-off-by: NeilBrown --- net/sunrpc/sched.c | 4 +++- net/sunrpc/xprtrdma/transport.c | 4 +++- 2 files changed, 6 insertions(+), 2 deletions(-) diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c index e2c835482791..d5b6e897f5a5 100644 --- a/net/sunrpc/sched.c +++ b/net/sunrpc/sched.c @@ -1023,8 +1023,10 @@ int rpc_malloc(struct rpc_task *task) struct rpc_buffer *buf; gfp_t gfp = GFP_NOFS; + if (RPC_IS_ASYNC(task)) + gfp = GFP_NOWAIT | __GFP_NOWARN; if (RPC_IS_SWAPPER(task)) - gfp = __GFP_MEMALLOC | GFP_NOWAIT | __GFP_NOWARN; + gfp |= __GFP_MEMALLOC; size += sizeof(struct rpc_buffer); if (size <= RPC_BUFFER_MAXSIZE) diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c index 16e5696314a4..a52277115500 100644 --- a/net/sunrpc/xprtrdma/transport.c +++ b/net/sunrpc/xprtrdma/transport.c @@ -574,8 +574,10 @@ xprt_rdma_allocate(struct rpc_task *task) gfp_t flags; flags = RPCRDMA_DEF_GFP; + if (RPC_IS_ASYNC(task)) + flags = GFP_NOWAIT | __GFP_NOWARN; if (RPC_IS_SWAPPER(task)) - flags = __GFP_MEMALLOC | GFP_NOWAIT | __GFP_NOWARN; + flags |= __GFP_MEMALLOC; if (!rpcrdma_check_regbuf(r_xprt, req->rl_sendbuf, rqst->rq_callsize, flags)) From patchwork Thu Dec 16 23:48:23 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12683115 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id F2783C433EF for ; Thu, 16 Dec 2021 23:58:57 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 2657F6B0085; Thu, 16 Dec 2021 18:53:57 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 1EED76B0087; Thu, 16 Dec 2021 18:53:57 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 0696C6B0088; Thu, 16 Dec 2021 18:53:57 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0216.hostedemail.com [216.40.44.216]) by kanga.kvack.org (Postfix) with ESMTP id E97496B0085 for ; Thu, 16 Dec 2021 18:53:56 -0500 (EST) Received: from smtpin10.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id A9C5C8632A for ; Thu, 16 Dec 2021 23:53:46 +0000 (UTC) X-FDA: 78925312452.10.F36239B Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by imf30.hostedemail.com (Postfix) with ESMTP id 15C0080011 for ; Thu, 16 Dec 2021 23:53:45 +0000 (UTC) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 21B742111A; Thu, 16 Dec 2021 23:53:45 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1639698825; 
h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=/Xf0wQNVm3xiMoEph9xuiX0w4MC4SZqJ4flfzelN2po=; b=HvkZQeCMFZCPSVQFUK/PUY7B7v+NLZpUF/m3Yj4xJKsbEa2leGoz9gqvwpHJBag+WefWI6 0VBcdl08xnIcY9vXn22pP/IeYzRGO28wyAsDZqf87G24s8KrbkeUDwXMNDYgmtPjvbHcDh qpdO8NC1Q9cEMZN1/U1R1NMXfhuVzC0= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1639698825; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=/Xf0wQNVm3xiMoEph9xuiX0w4MC4SZqJ4flfzelN2po=; b=vZy0FnDUXHJwQKOG4m/0/p1GY8oOwPXQ5ScsZz3bmlnHvp6t/QmB/UuBmIoCdt7YEvO6Gw uqEETfQGhz4AvEAg== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 2C84913EFD; Thu, 16 Dec 2021 23:53:41 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id RrWdNoXRu2GtWwAAMHmgww (envelope-from ); Thu, 16 Dec 2021 23:53:41 +0000 Subject: [PATCH 12/18] SUNRPC/auth: async tasks mustn't block waiting for memory From: NeilBrown To: Trond Myklebust , Anna Schumaker , Chuck Lever , Andrew Morton , Mel Gorman , Christoph Hellwig , David Howells Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Fri, 17 Dec 2021 10:48:23 +1100 Message-ID: <163969850320.20885.16756746953092069326.stgit@noble.brown> In-Reply-To: <163969801519.20885.3977673503103544412.stgit@noble.brown> References: <163969801519.20885.3977673503103544412.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 Authentication-Results: imf30.hostedemail.com; dkim=pass header.d=suse.de header.s=susede2_rsa header.b=HvkZQeCM; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=vZy0FnDU; spf=pass (imf30.hostedemail.com: domain of neilb@suse.de designates 195.135.220.28 as permitted sender) smtp.mailfrom=neilb@suse.de; dmarc=pass (policy=none) header.from=suse.de X-Rspamd-Queue-Id: 15C0080011 X-Stat-Signature: weu1upywazetanqz5pkeartdju7spi9m X-Rspamd-Server: rspam04 X-HE-Tag: 1639698825-765768 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: When memory is short, new worker threads cannot be created and we depend on the minimum one rpciod thread to be able to handle everything. So it must not block waiting for memory. mempools are particularly a problem as memory can only be released back to the mempool by an async rpc task running. If all available workqueue threads are waiting on the mempool, no thread is available to return anything. lookup_cred() can block on a mempool or kmalloc - and this can cause deadlocks. So add a new RPCAUTH_LOOKUP flag for async lookups and don't block on memory. If the -ENOMEM gets back to call_refreshresult(), wait a short while and try again. HZ>>4 is chosen as it is used elsewhere for -ENOMEM retries. 
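The same non-blocking allocation pattern recurs in several hunks below, so a minimal sketch may help when reading them. rpcauth_lookup_gfp() is a hypothetical helper used only for illustration here; the patch itself open-codes this choice in gss_lookup_cred() and unx_lookup_cred().

/*
 * Illustration only: how RPCAUTH_LOOKUP_ASYNC is intended to map to
 * allocation flags.  An async lookup must never sleep waiting for
 * memory, so it makes a non-blocking request and lets the caller see
 * -ENOMEM; call_refreshresult() then delays HZ>>4 and retries.
 * Assumes the RPCAUTH_LOOKUP_ASYNC definition added by this patch.
 */
static inline gfp_t rpcauth_lookup_gfp(int lookupflags)
{
	if (lookupflags & RPCAUTH_LOOKUP_ASYNC)
		return GFP_NOWAIT | __GFP_NOWARN;	/* fail fast, caller retries */
	return GFP_NOFS;				/* synchronous path may block */
}
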
Signed-off-by: NeilBrown --- include/linux/sunrpc/auth.h | 1 + net/sunrpc/auth.c | 6 +++++- net/sunrpc/auth_gss/auth_gss.c | 6 +++++- net/sunrpc/auth_unix.c | 10 ++++++++-- net/sunrpc/clnt.c | 3 +++ 5 files changed, 22 insertions(+), 4 deletions(-) diff --git a/include/linux/sunrpc/auth.h b/include/linux/sunrpc/auth.h index 98da816b5fc2..3e6ce288a7fc 100644 --- a/include/linux/sunrpc/auth.h +++ b/include/linux/sunrpc/auth.h @@ -99,6 +99,7 @@ struct rpc_auth_create_args { /* Flags for rpcauth_lookupcred() */ #define RPCAUTH_LOOKUP_NEW 0x01 /* Accept an uninitialised cred */ +#define RPCAUTH_LOOKUP_ASYNC 0x02 /* Don't block waiting for memory */ /* * Client authentication ops diff --git a/net/sunrpc/auth.c b/net/sunrpc/auth.c index a9f0d17fdb0d..6bfa19f9fa6a 100644 --- a/net/sunrpc/auth.c +++ b/net/sunrpc/auth.c @@ -615,6 +615,8 @@ rpcauth_bind_root_cred(struct rpc_task *task, int lookupflags) }; struct rpc_cred *ret; + if (RPC_IS_ASYNC(task)) + lookupflags |= RPCAUTH_LOOKUP_ASYNC; ret = auth->au_ops->lookup_cred(auth, &acred, lookupflags); put_cred(acred.cred); return ret; @@ -631,6 +633,8 @@ rpcauth_bind_machine_cred(struct rpc_task *task, int lookupflags) if (!acred.principal) return NULL; + if (RPC_IS_ASYNC(task)) + lookupflags |= RPCAUTH_LOOKUP_ASYNC; return auth->au_ops->lookup_cred(auth, &acred, lookupflags); } @@ -654,7 +658,7 @@ rpcauth_bindcred(struct rpc_task *task, const struct cred *cred, int flags) }; if (flags & RPC_TASK_ASYNC) - lookupflags |= RPCAUTH_LOOKUP_NEW; + lookupflags |= RPCAUTH_LOOKUP_NEW | RPCAUTH_LOOKUP_ASYNC; if (task->tk_op_cred) /* Task must use exactly this rpc_cred */ new = get_rpccred(task->tk_op_cred); diff --git a/net/sunrpc/auth_gss/auth_gss.c b/net/sunrpc/auth_gss/auth_gss.c index 5f42aa5fc612..df72d6301f78 100644 --- a/net/sunrpc/auth_gss/auth_gss.c +++ b/net/sunrpc/auth_gss/auth_gss.c @@ -1341,7 +1341,11 @@ gss_hash_cred(struct auth_cred *acred, unsigned int hashbits) static struct rpc_cred * gss_lookup_cred(struct rpc_auth *auth, struct auth_cred *acred, int flags) { - return rpcauth_lookup_credcache(auth, acred, flags, GFP_NOFS); + gfp_t gfp = GFP_NOFS; + + if (flags & RPCAUTH_LOOKUP_ASYNC) + gfp = GFP_NOWAIT | __GFP_NOWARN; + return rpcauth_lookup_credcache(auth, acred, flags, gfp); } static struct rpc_cred * diff --git a/net/sunrpc/auth_unix.c b/net/sunrpc/auth_unix.c index e7df1f782b2e..e5819265dd1b 100644 --- a/net/sunrpc/auth_unix.c +++ b/net/sunrpc/auth_unix.c @@ -43,8 +43,14 @@ unx_destroy(struct rpc_auth *auth) static struct rpc_cred * unx_lookup_cred(struct rpc_auth *auth, struct auth_cred *acred, int flags) { - struct rpc_cred *ret = mempool_alloc(unix_pool, GFP_NOFS); - + gfp_t gfp = GFP_NOFS; + struct rpc_cred *ret; + + if (flags & RPCAUTH_LOOKUP_ASYNC) + gfp = GFP_NOWAIT | __GFP_NOWARN; + ret = mempool_alloc(unix_pool, gfp); + if (!ret) + return ERR_PTR(-ENOMEM); rpcauth_init_cred(ret, acred, auth, &unix_credops); ret->cr_flags = 1UL << RPCAUTH_CRED_UPTODATE; return ret; diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c index a312ea2bc440..238b2ef5491f 100644 --- a/net/sunrpc/clnt.c +++ b/net/sunrpc/clnt.c @@ -1745,6 +1745,9 @@ call_refreshresult(struct rpc_task *task) task->tk_cred_retry--; trace_rpc_retry_refresh_status(task); return; + case -ENOMEM: + rpc_delay(task, HZ >> 4); + return; } trace_rpc_refresh_status(task); rpc_call_rpcerror(task, status); From patchwork Thu Dec 16 23:48:23 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 
12683117 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9B03BC433F5 for ; Thu, 16 Dec 2021 23:59:30 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 1F0566B0087; Thu, 16 Dec 2021 18:54:07 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 19F1C6B0088; Thu, 16 Dec 2021 18:54:07 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 067776B0089; Thu, 16 Dec 2021 18:54:07 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0201.hostedemail.com [216.40.44.201]) by kanga.kvack.org (Postfix) with ESMTP id EC1476B0087 for ; Thu, 16 Dec 2021 18:54:06 -0500 (EST) Received: from smtpin27.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id B78EC8249980 for ; Thu, 16 Dec 2021 23:53:56 +0000 (UTC) X-FDA: 78925312872.27.52A1A02 Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29]) by imf08.hostedemail.com (Postfix) with ESMTP id 1CB6016000E for ; Thu, 16 Dec 2021 23:53:51 +0000 (UTC) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id 4C5251F37F; Thu, 16 Dec 2021 23:53:55 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1639698835; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=f9JPZARu0omLin5bRQVSgWDU6wAXTb1AVpCm2PGq0QA=; b=XkyI4ce5LB7cLg0dZlvq/6cmAkIAlrOduZ3YaFyQlqABD8YnjmxuQ3tqORjbkTXll3mDxg VBgynpVdMJrYSmT5w3wMlCSXbxjFyO1A2mzVAn+2715rUj2Aj9DqoeIs5bP3zOyGIz8aEL zZXfps1UomDTorCv+vMH9ZfEBjFwLFI= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1639698835; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=f9JPZARu0omLin5bRQVSgWDU6wAXTb1AVpCm2PGq0QA=; b=zyNmjZO9NIndVyf1Mh426Eme8KWpjWDzaouhaq4ppJ7DxpOjj59Xgx/6m8mwAIDmFL8t0G xicSDE4ks/ug8BCA== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 35EDA13EFD; Thu, 16 Dec 2021 23:53:51 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id auDHOI/Ru2G8WwAAMHmgww (envelope-from ); Thu, 16 Dec 2021 23:53:51 +0000 Subject: [PATCH 13/18] SUNRPC/xprt: async tasks mustn't block waiting for memory From: NeilBrown To: Trond Myklebust , Anna Schumaker , Chuck Lever , Andrew Morton , Mel Gorman , Christoph Hellwig , David Howells Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Fri, 17 Dec 2021 10:48:23 +1100 Message-ID: <163969850327.20885.823088186596498420.stgit@noble.brown> 
In-Reply-To: <163969801519.20885.3977673503103544412.stgit@noble.brown> References: <163969801519.20885.3977673503103544412.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Rspamd-Server: rspam03 X-Rspamd-Queue-Id: 1CB6016000E X-Stat-Signature: toxkkpk1kzctaysgiwihh5c3qxdweiki Authentication-Results: imf08.hostedemail.com; dkim=pass header.d=suse.de header.s=susede2_rsa header.b=XkyI4ce5; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=zyNmjZO9; dmarc=pass (policy=none) header.from=suse.de; spf=pass (imf08.hostedemail.com: domain of neilb@suse.de designates 195.135.220.29 as permitted sender) smtp.mailfrom=neilb@suse.de X-HE-Tag: 1639698831-999038 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: When memory is short, new worker threads cannot be created and we depend on the minimum one rpciod thread to be able to handle everything. So it must not block waiting for memory. xprt_dynamic_alloc_slot can block indefinitely. This can tie up all workqueue threads and NFS can deadlock. So when called from a workqueue, set __GFP_NORETRY. The rdma alloc_slot already does not block. However it sets the error to -EAGAIN suggesting this will trigger a sleep. It does not. As we can see in call_reserveresult(), only -ENOMEM causes a sleep. -EAGAIN causes immediate retry. Signed-off-by: NeilBrown --- net/sunrpc/xprt.c | 5 ++++- net/sunrpc/xprtrdma/transport.c | 2 +- 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c index a02de2bddb28..47d207e416ab 100644 --- a/net/sunrpc/xprt.c +++ b/net/sunrpc/xprt.c @@ -1687,12 +1687,15 @@ static bool xprt_throttle_congested(struct rpc_xprt *xprt, struct rpc_task *task static struct rpc_rqst *xprt_dynamic_alloc_slot(struct rpc_xprt *xprt) { struct rpc_rqst *req = ERR_PTR(-EAGAIN); + gfp_t gfp_mask = GFP_NOFS; if (xprt->num_reqs >= xprt->max_reqs) goto out; ++xprt->num_reqs; spin_unlock(&xprt->reserve_lock); - req = kzalloc(sizeof(struct rpc_rqst), GFP_NOFS); + if (current->flags & PF_WQ_WORKER) + gfp_mask |= __GFP_NORETRY | __GFP_NOWARN; + req = kzalloc(sizeof(struct rpc_rqst), gfp_mask); spin_lock(&xprt->reserve_lock); if (req != NULL) goto out; diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c index a52277115500..32df23796747 100644 --- a/net/sunrpc/xprtrdma/transport.c +++ b/net/sunrpc/xprtrdma/transport.c @@ -521,7 +521,7 @@ xprt_rdma_alloc_slot(struct rpc_xprt *xprt, struct rpc_task *task) return; out_sleep: - task->tk_status = -EAGAIN; + task->tk_status = -ENOMEM; xprt_add_backlog(xprt, task); } From patchwork Thu Dec 16 23:48:23 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12683119 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 37607C433F5 for ; Fri, 17 Dec 2021 00:00:04 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 6AA4B6B0071; Thu, 16 Dec 2021 18:54:13 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 633846B0088; Thu, 16 Dec 2021 18:54:13 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 4D4A06B0089; Thu, 16 Dec 2021 18:54:13 -0500 (EST) X-Delivered-To: linux-mm@kvack.org 
Received: from forelay.hostedemail.com (smtprelay0188.hostedemail.com [216.40.44.188]) by kanga.kvack.org (Postfix) with ESMTP id 4034A6B0071 for ; Thu, 16 Dec 2021 18:54:13 -0500 (EST) Received: from smtpin24.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 0F993844FA for ; Thu, 16 Dec 2021 23:54:03 +0000 (UTC) X-FDA: 78925313166.24.9FC467C Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29]) by imf25.hostedemail.com (Postfix) with ESMTP id 41000A0018 for ; Thu, 16 Dec 2021 23:53:57 +0000 (UTC) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id 901371F37F; Thu, 16 Dec 2021 23:54:01 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1639698841; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ca/QMA9b8aSP2XqDNpzMoh70WOqFOvfMXBDfQmXLg8g=; b=19YUlHlfjO2QpfANVsRzxJPbr0h1bUHEseqyROPWtZ/VjA5U6gv0cvqAOrj+gttp94HTBs V5rtgIC8rAPe3iyroPRtPsl0NcdpundBZIEPRhSfqhadZo1UOpl7iYBZrIywKwBFfaOwiG 9wTAFDdx96KBwjcXLZwYYTzJauGvr10= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1639698841; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ca/QMA9b8aSP2XqDNpzMoh70WOqFOvfMXBDfQmXLg8g=; b=M8fxrnSDJclwnSF/zdJmRf2hyHg3cUDn49eSKtK9IZmabY2ypSbW+uGwm8NPjUsGtLUAAI uTp02/fM6dkiJVBA== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id B129213EFD; Thu, 16 Dec 2021 23:53:58 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id mCvWG5bRu2HBWwAAMHmgww (envelope-from ); Thu, 16 Dec 2021 23:53:58 +0000 Subject: [PATCH 14/18] SUNRPC: remove scheduling boost for "SWAPPER" tasks. 
From: NeilBrown To: Trond Myklebust , Anna Schumaker , Chuck Lever , Andrew Morton , Mel Gorman , Christoph Hellwig , David Howells Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Fri, 17 Dec 2021 10:48:23 +1100 Message-ID: <163969850330.20885.12926885798126849542.stgit@noble.brown> In-Reply-To: <163969801519.20885.3977673503103544412.stgit@noble.brown> References: <163969801519.20885.3977673503103544412.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Rspamd-Queue-Id: 41000A0018 X-Stat-Signature: tta6gejgrny6rfpq5gjoxkn3eg1ye4dn Authentication-Results: imf25.hostedemail.com; dkim=pass header.d=suse.de header.s=susede2_rsa header.b=19YUlHlf; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=M8fxrnSD; dmarc=pass (policy=none) header.from=suse.de; spf=pass (imf25.hostedemail.com: domain of neilb@suse.de designates 195.135.220.29 as permitted sender) smtp.mailfrom=neilb@suse.de X-Rspamd-Server: rspam02 X-HE-Tag: 1639698837-640410 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Currently, tasks marked as "swapper" tasks get put to the front of non-priority rpc_queues, and are sorted earlier than non-swapper tasks on the transport's ->xmit_queue. This is pointless as currently *all* tasks for a mount that has swap enabled on *any* file are marked as "swapper" tasks. So the net result is that the non-priority rpc_queues are reverse-ordered (LIFO). This scheduling boost is not necessary to avoid deadlocks, and hurts fairness, so remove it. If there were a need to expedite some requests, the tk_priority mechanism is a more appropriate tool. Signed-off-by: NeilBrown --- net/sunrpc/sched.c | 7 ------- net/sunrpc/xprt.c | 11 ----------- 2 files changed, 18 deletions(-) diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c index d5b6e897f5a5..256302bf6557 100644 --- a/net/sunrpc/sched.c +++ b/net/sunrpc/sched.c @@ -186,11 +186,6 @@ static void __rpc_add_wait_queue_priority(struct rpc_wait_queue *queue, /* * Add new request to wait queue. - * - * Swapper tasks always get inserted at the head of the queue. - * This should avoid many nasty memory deadlocks and hopefully - * improve overall performance. - * Everyone else gets appended to the queue to ensure proper FIFO behavior. 
*/ static void __rpc_add_wait_queue(struct rpc_wait_queue *queue, struct rpc_task *task, @@ -199,8 +194,6 @@ static void __rpc_add_wait_queue(struct rpc_wait_queue *queue, INIT_LIST_HEAD(&task->u.tk_wait.timer_list); if (RPC_IS_PRIORITY(queue)) __rpc_add_wait_queue_priority(queue, task, queue_priority); - else if (RPC_IS_SWAPPER(task)) - list_add(&task->u.tk_wait.list, &queue->tasks[0]); else list_add_tail(&task->u.tk_wait.list, &queue->tasks[0]); task->tk_waitqueue = queue; diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c index 47d207e416ab..a0a2583fe941 100644 --- a/net/sunrpc/xprt.c +++ b/net/sunrpc/xprt.c @@ -1354,17 +1354,6 @@ xprt_request_enqueue_transmit(struct rpc_task *task) INIT_LIST_HEAD(&req->rq_xmit2); goto out; } - } else if (RPC_IS_SWAPPER(task)) { - list_for_each_entry(pos, &xprt->xmit_queue, rq_xmit) { - if (pos->rq_cong || pos->rq_bytes_sent) - continue; - if (RPC_IS_SWAPPER(pos->rq_task)) - continue; - /* Note: req is added _before_ pos */ - list_add_tail(&req->rq_xmit, &pos->rq_xmit); - INIT_LIST_HEAD(&req->rq_xmit2); - goto out; - } } else if (!req->rq_seqno) { list_for_each_entry(pos, &xprt->xmit_queue, rq_xmit) { if (pos->rq_task->tk_owner != task->tk_owner) From patchwork Thu Dec 16 23:48:23 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12683123 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id BE7E2C433EF for ; Fri, 17 Dec 2021 00:00:37 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id AA1076B0088; Thu, 16 Dec 2021 18:54:21 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id A2B006B0089; Thu, 16 Dec 2021 18:54:21 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 8A5376B008A; Thu, 16 Dec 2021 18:54:21 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0124.hostedemail.com [216.40.44.124]) by kanga.kvack.org (Postfix) with ESMTP id 7CBA76B0088 for ; Thu, 16 Dec 2021 18:54:21 -0500 (EST) Received: from smtpin11.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 4C1E58698E for ; Thu, 16 Dec 2021 23:54:11 +0000 (UTC) X-FDA: 78925313502.11.DDA9CFE Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by imf30.hostedemail.com (Postfix) with ESMTP id BCB3080015 for ; Thu, 16 Dec 2021 23:54:10 +0000 (UTC) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id BB5B42111A; Thu, 16 Dec 2021 23:54:09 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1639698849; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=CDOSHbb3cCHR/Wvsf3Dv/Q3+pqM/Z1sApkeWL3yPdmA=; b=iZeNAv29PjQn6MX3D30qEOwbbGRbKdljvGl7O2FDh/D1Ag/7JsP/cNzRO8ED1jhDtvylBz 1BR6p151QwShGB5UAXtTFlx7nME3ze7DAxBS964FzJCfwjplb3UmVzILM3jHwMnSkT7ddN vY1HFMPXTPsVRa9Wv6KIxwlcodTSXyc= DKIM-Signature: 
v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1639698849; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=CDOSHbb3cCHR/Wvsf3Dv/Q3+pqM/Z1sApkeWL3yPdmA=; b=R9FXTSaA1m7h0cmZRDVZmblZ/AwfUnuQZJZKPm6EJ960BqMU94yab6AoBrD6cPJnUX3h64 S8cvKTw5ET5ojfDQ== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id A0B0713EFD; Thu, 16 Dec 2021 23:54:06 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id tzqYFp7Ru2HNWwAAMHmgww (envelope-from ); Thu, 16 Dec 2021 23:54:06 +0000 Subject: [PATCH 15/18] NFS: discard NFS_RPC_SWAPFLAGS and RPC_TASK_ROOTCREDS From: NeilBrown To: Trond Myklebust , Anna Schumaker , Chuck Lever , Andrew Morton , Mel Gorman , Christoph Hellwig , David Howells Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Fri, 17 Dec 2021 10:48:23 +1100 Message-ID: <163969850333.20885.17464715871688147966.stgit@noble.brown> In-Reply-To: <163969801519.20885.3977673503103544412.stgit@noble.brown> References: <163969801519.20885.3977673503103544412.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Rspamd-Server: rspam09 X-Rspamd-Queue-Id: BCB3080015 X-Stat-Signature: jz1ya1yxwbs7h41dqms49gdwic8hc896 Authentication-Results: imf30.hostedemail.com; dkim=pass header.d=suse.de header.s=susede2_rsa header.b=iZeNAv29; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=R9FXTSaA; spf=pass (imf30.hostedemail.com: domain of neilb@suse.de designates 195.135.220.28 as permitted sender) smtp.mailfrom=neilb@suse.de; dmarc=pass (policy=none) header.from=suse.de X-HE-Tag: 1639698850-583275 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: NFS_RPC_SWAPFLAGS is only used for READ requests. It sets RPC_TASK_SWAPPER which gives some memory-allocation priority to requests. This is not needed for swap READ - though it is for writes where it is set via a different mechanism. RPC_TASK_ROOTCREDS causes the 'machine' credential to be used. This is not needed as the root credential is saved when the swap file is opened, and this is used for all IO. So NFS_RPC_SWAPFLAGS isn't needed, and as it is the only user of RPC_TASK_ROOTCREDS, that isn't needed either. Remove both. Signed-off-by: NeilBrown --- fs/nfs/read.c | 4 ---- include/linux/nfs_fs.h | 5 ----- include/linux/sunrpc/sched.h | 1 - include/trace/events/sunrpc.h | 1 - net/sunrpc/auth.c | 2 +- 5 files changed, 1 insertion(+), 12 deletions(-) diff --git a/fs/nfs/read.c b/fs/nfs/read.c index d11af2a9299c..a8f2b884306c 100644 --- a/fs/nfs/read.c +++ b/fs/nfs/read.c @@ -194,10 +194,6 @@ static void nfs_initiate_read(struct nfs_pgio_header *hdr, const struct nfs_rpc_ops *rpc_ops, struct rpc_task_setup *task_setup_data, int how) { - struct inode *inode = hdr->inode; - int swap_flags = IS_SWAPFILE(inode) ? 
NFS_RPC_SWAPFLAGS : 0; - - task_setup_data->flags |= swap_flags; rpc_ops->read_setup(hdr, msg); trace_nfs_initiate_read(hdr); } diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h index 3a210478f665..5dce9129fe64 100644 --- a/include/linux/nfs_fs.h +++ b/include/linux/nfs_fs.h @@ -45,11 +45,6 @@ */ #define NFS_MAX_TRANSPORTS 16 -/* - * These are the default flags for swap requests - */ -#define NFS_RPC_SWAPFLAGS (RPC_TASK_SWAPPER|RPC_TASK_ROOTCREDS) - /* * Size of the NFS directory verifier */ diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h index db964bb63912..56710f8056d3 100644 --- a/include/linux/sunrpc/sched.h +++ b/include/linux/sunrpc/sched.h @@ -124,7 +124,6 @@ struct rpc_task_setup { #define RPC_TASK_MOVEABLE 0x0004 /* nfs4.1+ rpc tasks */ #define RPC_TASK_NULLCREDS 0x0010 /* Use AUTH_NULL credential */ #define RPC_CALL_MAJORSEEN 0x0020 /* major timeout seen */ -#define RPC_TASK_ROOTCREDS 0x0040 /* force root creds */ #define RPC_TASK_DYNAMIC 0x0080 /* task was kmalloc'ed */ #define RPC_TASK_NO_ROUND_ROBIN 0x0100 /* send requests on "main" xprt */ #define RPC_TASK_SOFT 0x0200 /* Use soft timeouts */ diff --git a/include/trace/events/sunrpc.h b/include/trace/events/sunrpc.h index 3a99358c262b..141dc34a450e 100644 --- a/include/trace/events/sunrpc.h +++ b/include/trace/events/sunrpc.h @@ -311,7 +311,6 @@ TRACE_EVENT(rpc_request, { RPC_TASK_MOVEABLE, "MOVEABLE" }, \ { RPC_TASK_NULLCREDS, "NULLCREDS" }, \ { RPC_CALL_MAJORSEEN, "MAJORSEEN" }, \ - { RPC_TASK_ROOTCREDS, "ROOTCREDS" }, \ { RPC_TASK_DYNAMIC, "DYNAMIC" }, \ { RPC_TASK_NO_ROUND_ROBIN, "NO_ROUND_ROBIN" }, \ { RPC_TASK_SOFT, "SOFT" }, \ diff --git a/net/sunrpc/auth.c b/net/sunrpc/auth.c index 6bfa19f9fa6a..682fcd24bf43 100644 --- a/net/sunrpc/auth.c +++ b/net/sunrpc/auth.c @@ -670,7 +670,7 @@ rpcauth_bindcred(struct rpc_task *task, const struct cred *cred, int flags) /* If machine cred couldn't be bound, try a root cred */ if (new) ; - else if (cred == &machine_cred || (flags & RPC_TASK_ROOTCREDS)) + else if (cred == &machine_cred) new = rpcauth_bind_root_cred(task, lookupflags); else if (flags & RPC_TASK_NULLCREDS) new = authnull_ops.lookup_cred(NULL, NULL, 0); From patchwork Thu Dec 16 23:48:23 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12683125 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id DA835C433F5 for ; Fri, 17 Dec 2021 00:01:10 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id B8C3E6B0089; Thu, 16 Dec 2021 18:54:30 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id B3CE46B008A; Thu, 16 Dec 2021 18:54:30 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 9DC736B008C; Thu, 16 Dec 2021 18:54:30 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id 8F6E66B0089 for ; Thu, 16 Dec 2021 18:54:30 -0500 (EST) Received: from smtpin27.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 58FD08249980 for ; Thu, 16 Dec 2021 23:54:20 +0000 (UTC) X-FDA: 78925313880.27.C08AB30 Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29]) by 
imf19.hostedemail.com (Postfix) with ESMTP id DFC731A000C for ; Thu, 16 Dec 2021 23:54:19 +0000 (UTC) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id E5FDE1F37F; Thu, 16 Dec 2021 23:54:18 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1639698858; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=zySS5YHW/pQK0eZ7E/VfPEvlY3ztdGB5WZY/NUYMSXA=; b=yD3IvavaTcZy+deSrVAPk9u8AOWe+Y+5tQkIRAb2XpeADL+wQ+R9ePvecF8CthIizRKLeP H0RJjDJznBLvnzedGHUx9+rlvnhFyM9KnhETFPFJ5dPbu8vk6MnRE1fkusFAUG2+iwym/Y Lr0SwNvL523PXS5BvdNxLp76xrBNTHs= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1639698858; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=zySS5YHW/pQK0eZ7E/VfPEvlY3ztdGB5WZY/NUYMSXA=; b=QLbEL0Hs7fX8NS1AHSw9HQZumyUYsw3+CWzaySMLLa2wyhJoUkOA84tC7aDuCqQ6hvapqC r4QPOrJi/xhY7SBw== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 5011713EFD; Thu, 16 Dec 2021 23:54:15 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id KyfUAqfRu2HTWwAAMHmgww (envelope-from ); Thu, 16 Dec 2021 23:54:15 +0000 Subject: [PATCH 16/18] SUNRPC: improve 'swap' handling: scheduling and PF_MEMALLOC From: NeilBrown To: Trond Myklebust , Anna Schumaker , Chuck Lever , Andrew Morton , Mel Gorman , Christoph Hellwig , David Howells Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Fri, 17 Dec 2021 10:48:23 +1100 Message-ID: <163969850340.20885.8131473201767138346.stgit@noble.brown> In-Reply-To: <163969801519.20885.3977673503103544412.stgit@noble.brown> References: <163969801519.20885.3977673503103544412.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: DFC731A000C X-Stat-Signature: bfhkwf3nngbt1s8u7g1x7s3z8yrkhz6i Authentication-Results: imf19.hostedemail.com; dkim=pass header.d=suse.de header.s=susede2_rsa header.b=yD3Ivava; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=QLbEL0Hs; spf=pass (imf19.hostedemail.com: domain of neilb@suse.de designates 195.135.220.29 as permitted sender) smtp.mailfrom=neilb@suse.de; dmarc=pass (policy=none) header.from=suse.de X-HE-Tag: 1639698859-46366 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: rpc tasks can be marked as RPC_TASK_SWAPPER. This causes GFP_MEMALLOC to be used for some allocations. This is needed in some cases, but not in all where it is currently provided, and in some where it isn't provided. 
Currently *all* tasks associated with a rpc_client on which swap is enabled get the flag and hence some GFP_MEMALLOC support. GFP_MEMALLOC is provided for ->buf_alloc() but only swap-writes need it. However xdr_alloc_bvec does not get GFP_MEMALLOC - though it often does need it. xdr_alloc_bvec is called while the XPRT_LOCK is held. If this blocks, then it blocks all other queued tasks. So this allocation needs GFP_MEMALLOC for *all* requests, not just writes, when the xprt is used for any swap writes. Similarly, if the transport is not connected, that will block all requests including swap writes, so memory allocations should get GFP_MEMALLOC if swap writes are possible. So with this patch: 1/ we ONLY set RPC_TASK_SWAPPER for swap writes. 2/ __rpc_execute() sets PF_MEMALLOC while handling any task with RPC_TASK_SWAPPER set, or when handling any task that holds the XPRT_LOCKED lock on an xprt used for swap. This removes the need for the RPC_IS_SWAPPER() test in ->buf_alloc handlers. 3/ xprt_prepare_transmit() sets PF_MEMALLOC after locking any task to a swapper xprt. __rpc_execute() will clear it. 3/ PF_MEMALLOC is set for all the connect workers. Signed-off-by: NeilBrown --- fs/nfs/write.c | 2 ++ net/sunrpc/clnt.c | 2 -- net/sunrpc/sched.c | 20 +++++++++++++++++--- net/sunrpc/xprt.c | 3 +++ net/sunrpc/xprtrdma/transport.c | 6 ++++-- net/sunrpc/xprtsock.c | 8 ++++++++ 6 files changed, 34 insertions(+), 7 deletions(-) diff --git a/fs/nfs/write.c b/fs/nfs/write.c index 9b7619ce17a7..0c7a304c9e74 100644 --- a/fs/nfs/write.c +++ b/fs/nfs/write.c @@ -1408,6 +1408,8 @@ static void nfs_initiate_write(struct nfs_pgio_header *hdr, { int priority = flush_task_priority(how); + if (IS_SWAPFILE(hdr->inode)) + task_setup_data->flags |= RPC_TASK_SWAPPER; task_setup_data->priority = priority; rpc_ops->write_setup(hdr, msg, &task_setup_data->rpc_client); trace_nfs_initiate_write(hdr); diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c index 238b2ef5491f..cb76fbea3ed5 100644 --- a/net/sunrpc/clnt.c +++ b/net/sunrpc/clnt.c @@ -1085,8 +1085,6 @@ void rpc_task_set_client(struct rpc_task *task, struct rpc_clnt *clnt) task->tk_flags |= RPC_TASK_TIMEOUT; if (clnt->cl_noretranstimeo) task->tk_flags |= RPC_TASK_NO_RETRANS_TIMEOUT; - if (atomic_read(&clnt->cl_swapper)) - task->tk_flags |= RPC_TASK_SWAPPER; /* Add to the client's list of all tasks */ spin_lock(&clnt->cl_lock); list_add_tail(&task->tk_task, &clnt->cl_tasks); diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c index 256302bf6557..9020cedb7c95 100644 --- a/net/sunrpc/sched.c +++ b/net/sunrpc/sched.c @@ -869,6 +869,15 @@ void rpc_release_calldata(const struct rpc_call_ops *ops, void *calldata) ops->rpc_release(calldata); } +static bool xprt_needs_memalloc(struct rpc_xprt *xprt, struct rpc_task *tk) +{ + if (!xprt) + return false; + if (!atomic_read(&xprt->swapper)) + return false; + return test_bit(XPRT_LOCKED, &xprt->state) && xprt->snd_task == tk; +} + /* * This is the RPC `scheduler' (or rather, the finite state machine). 
*/ @@ -877,6 +886,7 @@ static void __rpc_execute(struct rpc_task *task) struct rpc_wait_queue *queue; int task_is_async = RPC_IS_ASYNC(task); int status = 0; + unsigned long pflags = current->flags; WARN_ON_ONCE(RPC_IS_QUEUED(task)); if (RPC_IS_QUEUED(task)) @@ -899,6 +909,10 @@ static void __rpc_execute(struct rpc_task *task) } if (!do_action) break; + if (RPC_IS_SWAPPER(task) || + xprt_needs_memalloc(task->tk_xprt, task)) + current->flags |= PF_MEMALLOC; + trace_rpc_task_run_action(task, do_action); do_action(task); @@ -936,7 +950,7 @@ static void __rpc_execute(struct rpc_task *task) rpc_clear_running(task); spin_unlock(&queue->lock); if (task_is_async) - return; + goto out; /* sync task: sleep here */ trace_rpc_task_sync_sleep(task, task->tk_action); @@ -960,6 +974,8 @@ static void __rpc_execute(struct rpc_task *task) /* Release all resources associated with the task */ rpc_release_task(task); +out: + current_restore_flags(pflags, PF_MEMALLOC); } /* @@ -1018,8 +1034,6 @@ int rpc_malloc(struct rpc_task *task) if (RPC_IS_ASYNC(task)) gfp = GFP_NOWAIT | __GFP_NOWARN; - if (RPC_IS_SWAPPER(task)) - gfp |= __GFP_MEMALLOC; size += sizeof(struct rpc_buffer); if (size <= RPC_BUFFER_MAXSIZE) diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c index a0a2583fe941..0614e7463d4b 100644 --- a/net/sunrpc/xprt.c +++ b/net/sunrpc/xprt.c @@ -1492,6 +1492,9 @@ bool xprt_prepare_transmit(struct rpc_task *task) return false; } + if (atomic_read(&xprt->swapper)) + /* This will be clear in __rpc_execute */ + current->flags |= PF_MEMALLOC; return true; } diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c index 32df23796747..256b06a92391 100644 --- a/net/sunrpc/xprtrdma/transport.c +++ b/net/sunrpc/xprtrdma/transport.c @@ -239,8 +239,11 @@ xprt_rdma_connect_worker(struct work_struct *work) struct rpcrdma_xprt *r_xprt = container_of(work, struct rpcrdma_xprt, rx_connect_worker.work); struct rpc_xprt *xprt = &r_xprt->rx_xprt; + unsigned int pflags = current->flags; int rc; + if (atomic_read(&xprt->swapper)) + current->flags |= PF_MEMALLOC; rc = rpcrdma_xprt_connect(r_xprt); xprt_clear_connecting(xprt); if (!rc) { @@ -254,6 +257,7 @@ xprt_rdma_connect_worker(struct work_struct *work) rpcrdma_xprt_disconnect(r_xprt); xprt_unlock_connect(xprt, r_xprt); xprt_wake_pending_tasks(xprt, rc); + current_restore_flags(pflags, PF_MEMALLOC); } /** @@ -576,8 +580,6 @@ xprt_rdma_allocate(struct rpc_task *task) flags = RPCRDMA_DEF_GFP; if (RPC_IS_ASYNC(task)) flags = GFP_NOWAIT | __GFP_NOWARN; - if (RPC_IS_SWAPPER(task)) - flags |= __GFP_MEMALLOC; if (!rpcrdma_check_regbuf(r_xprt, req->rl_sendbuf, rqst->rq_callsize, flags)) diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c index d8ee06a9650a..9d34c71004fa 100644 --- a/net/sunrpc/xprtsock.c +++ b/net/sunrpc/xprtsock.c @@ -2047,7 +2047,10 @@ static void xs_udp_setup_socket(struct work_struct *work) struct rpc_xprt *xprt = &transport->xprt; struct socket *sock; int status = -EIO; + unsigned int pflags = current->flags; + if (atomic_read(&xprt->swapper)) + current->flags |= PF_MEMALLOC; sock = xs_create_sock(xprt, transport, xs_addr(xprt)->sa_family, SOCK_DGRAM, IPPROTO_UDP, false); @@ -2067,6 +2070,7 @@ static void xs_udp_setup_socket(struct work_struct *work) xprt_clear_connecting(xprt); xprt_unlock_connect(xprt, transport); xprt_wake_pending_tasks(xprt, status); + current_restore_flags(pflags, PF_MEMALLOC); } /** @@ -2226,7 +2230,10 @@ static void xs_tcp_setup_socket(struct work_struct *work) struct socket *sock = transport->sock; struct 
rpc_xprt *xprt = &transport->xprt; int status; + unsigned int pflags = current->flags; + if (atomic_read(&xprt->swapper)) + current->flags |= PF_MEMALLOC; if (!sock) { sock = xs_create_sock(xprt, transport, xs_addr(xprt)->sa_family, SOCK_STREAM, @@ -2291,6 +2298,7 @@ static void xs_tcp_setup_socket(struct work_struct *work) xprt_clear_connecting(xprt); out_unlock: xprt_unlock_connect(xprt, transport); + current_restore_flags(pflags, PF_MEMALLOC); } /** From patchwork Thu Dec 16 23:48:23 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12683127 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 79181C433EF for ; Fri, 17 Dec 2021 00:01:44 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id EE7846B0073; Thu, 16 Dec 2021 18:54:39 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id E97F66B008A; Thu, 16 Dec 2021 18:54:39 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D38416B008C; Thu, 16 Dec 2021 18:54:39 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0221.hostedemail.com [216.40.44.221]) by kanga.kvack.org (Postfix) with ESMTP id C495F6B0073 for ; Thu, 16 Dec 2021 18:54:39 -0500 (EST) Received: from smtpin18.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 8A8F38698E for ; Thu, 16 Dec 2021 23:54:29 +0000 (UTC) X-FDA: 78925314258.18.072230B Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by imf21.hostedemail.com (Postfix) with ESMTP id 525AC1C000E for ; Thu, 16 Dec 2021 23:54:26 +0000 (UTC) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 0FD6C2111A; Thu, 16 Dec 2021 23:54:28 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1639698868; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=0YlWWmo2jWTqt7N81UWi0KQAD+FCdGohmTtrWNDINy4=; b=inDo9UjNmQEg+QtkTjpBTGCaytQv2VjTF0/XFJNQ9cRjYRy3ooi7OAXHAgyQz6JSpB4Sbg 0DZBq8dUII3svHW7oyN468XGTabuoN3YN9+/VrEl2SH2VQxyXIzM6kKaKmLp7mPo0lOpkR 6zw3mB04GX6QNxf9RLkyerhcaiCEucE= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1639698868; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=0YlWWmo2jWTqt7N81UWi0KQAD+FCdGohmTtrWNDINy4=; b=9QiAhPNwjhDf314YUAVOc6SMTJT+15xL2fGJ3N7fHjTm85zT/VMCIYDMmtGDvbKG4w8MpE 7BIW8t4MurqtFSAA== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS 
id 04A4F13EFD; Thu, 16 Dec 2021 23:54:24 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id XeORLLDRu2EQXAAAMHmgww (envelope-from ); Thu, 16 Dec 2021 23:54:24 +0000 Subject: [PATCH 17/18] NFSv4: keep state manager thread active if swap is enabled From: NeilBrown To: Trond Myklebust , Anna Schumaker , Chuck Lever , Andrew Morton , Mel Gorman , Christoph Hellwig , David Howells Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Fri, 17 Dec 2021 10:48:23 +1100 Message-ID: <163969850343.20885.13733170689644192942.stgit@noble.brown> In-Reply-To: <163969801519.20885.3977673503103544412.stgit@noble.brown> References: <163969801519.20885.3977673503103544412.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: 525AC1C000E X-Stat-Signature: w94a8uf1983k787qddoep79jbz4uzej1 Authentication-Results: imf21.hostedemail.com; dkim=pass header.d=suse.de header.s=susede2_rsa header.b=inDo9UjN; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=9QiAhPNw; spf=pass (imf21.hostedemail.com: domain of neilb@suse.de designates 195.135.220.28 as permitted sender) smtp.mailfrom=neilb@suse.de; dmarc=pass (policy=none) header.from=suse.de X-HE-Tag: 1639698866-201430 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: If we are swapping over NFSv4, we may not be able to allocate memory to start the state-manager thread at the time when we need it. So keep it always running when swap is enabled, and just signal it to start. This requires updating and testing the cl_swapper count on the root rpc_clnt after following all ->cl_parent links. 
Signed-off-by: NeilBrown --- fs/nfs/file.c | 15 ++++++++++++--- fs/nfs/nfs4_fs.h | 1 + fs/nfs/nfs4proc.c | 20 ++++++++++++++++++++ fs/nfs/nfs4state.c | 39 +++++++++++++++++++++++++++++++++------ include/linux/nfs_xdr.h | 2 ++ net/sunrpc/clnt.c | 2 ++ 6 files changed, 70 insertions(+), 9 deletions(-) diff --git a/fs/nfs/file.c b/fs/nfs/file.c index 996dfb3c74b2..6ad054b9bbd0 100644 --- a/fs/nfs/file.c +++ b/fs/nfs/file.c @@ -490,8 +490,9 @@ static int nfs_swap_activate(struct swap_info_struct *sis, struct file *file, unsigned long blocks; long long isize; int ret; - struct rpc_clnt *clnt = NFS_CLIENT(file->f_mapping->host); - struct inode *inode = file->f_mapping->host; + struct inode *inode = file_inode(file); + struct rpc_clnt *clnt = NFS_CLIENT(inode); + struct nfs_client *cl = NFS_SERVER(inode)->nfs_client; spin_lock(&inode->i_lock); blocks = inode->i_blocks; @@ -512,14 +513,22 @@ static int nfs_swap_activate(struct swap_info_struct *sis, struct file *file, } *span = sis->pages; sis->flags |= SWP_FS_OPS; + + if (cl->rpc_ops->enable_swap) + cl->rpc_ops->enable_swap(inode); + return ret; } static void nfs_swap_deactivate(struct file *file) { - struct rpc_clnt *clnt = NFS_CLIENT(file->f_mapping->host); + struct inode *inode = file_inode(file); + struct rpc_clnt *clnt = NFS_CLIENT(inode); + struct nfs_client *cl = NFS_SERVER(inode)->nfs_client; rpc_clnt_swap_deactivate(clnt); + if (cl->rpc_ops->disable_swap) + cl->rpc_ops->disable_swap(file_inode(file)); } const struct address_space_operations nfs_file_aops = { diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h index ed5eaca6801e..8a9ce0f42efd 100644 --- a/fs/nfs/nfs4_fs.h +++ b/fs/nfs/nfs4_fs.h @@ -42,6 +42,7 @@ enum nfs4_client_state { NFS4CLNT_LEASE_MOVED, NFS4CLNT_DELEGATION_EXPIRED, NFS4CLNT_RUN_MANAGER, + NFS4CLNT_MANAGER_AVAILABLE, NFS4CLNT_RECALL_RUNNING, NFS4CLNT_RECALL_ANY_LAYOUT_READ, NFS4CLNT_RECALL_ANY_LAYOUT_RW, diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c index ee3bc79f6ca3..ab6382f9cbf0 100644 --- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c @@ -10347,6 +10347,24 @@ static ssize_t nfs4_listxattr(struct dentry *dentry, char *list, size_t size) return error + error2 + error3; } +static void nfs4_enable_swap(struct inode *inode) +{ + /* The state manager thread must always be running. + * It will notice the client is a swapper, and stay put. + */ + struct nfs_client *clp = NFS_SERVER(inode)->nfs_client; + + nfs4_schedule_state_manager(clp); +} + +static void nfs4_disable_swap(struct inode *inode) +{ + /* The state manager thread will now exit once it is + * woken. 
+ */ + wake_up_var(&NFS_SERVER(inode)->nfs_client->cl_state); +} + static const struct inode_operations nfs4_dir_inode_operations = { .create = nfs_create, .lookup = nfs_lookup, @@ -10423,6 +10441,8 @@ const struct nfs_rpc_ops nfs_v4_clientops = { .free_client = nfs4_free_client, .create_server = nfs4_create_server, .clone_server = nfs_clone_server, + .enable_swap = nfs4_enable_swap, + .disable_swap = nfs4_disable_swap, }; static const struct xattr_handler nfs4_xattr_nfs4_acl_handler = { diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c index f63dfa01001c..ebe470e6aa8f 100644 --- a/fs/nfs/nfs4state.c +++ b/fs/nfs/nfs4state.c @@ -1205,10 +1205,17 @@ void nfs4_schedule_state_manager(struct nfs_client *clp) { struct task_struct *task; char buf[INET6_ADDRSTRLEN + sizeof("-manager") + 1]; + struct rpc_clnt *cl = clp->cl_rpcclient; + + while (cl != cl->cl_parent) + cl = cl->cl_parent; set_bit(NFS4CLNT_RUN_MANAGER, &clp->cl_state); - if (test_and_set_bit(NFS4CLNT_MANAGER_RUNNING, &clp->cl_state) != 0) + if (test_and_set_bit(NFS4CLNT_MANAGER_AVAILABLE, &clp->cl_state) != 0) { + wake_up_var(&clp->cl_state); return; + } + set_bit(NFS4CLNT_MANAGER_RUNNING, &clp->cl_state); __module_get(THIS_MODULE); refcount_inc(&clp->cl_count); @@ -1224,6 +1231,7 @@ void nfs4_schedule_state_manager(struct nfs_client *clp) printk(KERN_ERR "%s: kthread_run: %ld\n", __func__, PTR_ERR(task)); nfs4_clear_state_manager_bit(clp); + clear_bit(NFS4CLNT_MANAGER_AVAILABLE, &clp->cl_state); nfs_put_client(clp); module_put(THIS_MODULE); } @@ -2665,11 +2673,8 @@ static void nfs4_state_manager(struct nfs_client *clp) clear_bit(NFS4CLNT_RECALL_RUNNING, &clp->cl_state); } - /* Did we race with an attempt to give us more work? */ - if (!test_bit(NFS4CLNT_RUN_MANAGER, &clp->cl_state)) - return; - if (test_and_set_bit(NFS4CLNT_MANAGER_RUNNING, &clp->cl_state) != 0) - return; + return; + } while (refcount_read(&clp->cl_count) > 1 && !signalled()); goto out_drain; @@ -2689,9 +2694,31 @@ static void nfs4_state_manager(struct nfs_client *clp) static int nfs4_run_state_manager(void *ptr) { struct nfs_client *clp = ptr; + struct rpc_clnt *cl = clp->cl_rpcclient; + + while (cl != cl->cl_parent) + cl = cl->cl_parent; allow_signal(SIGKILL); +again: + set_bit(NFS4CLNT_MANAGER_RUNNING, &clp->cl_state); nfs4_state_manager(clp); + if (atomic_read(&cl->cl_swapper)) { + wait_var_event_interruptible(&clp->cl_state, + test_bit(NFS4CLNT_RUN_MANAGER, + &clp->cl_state)); + if (atomic_read(&cl->cl_swapper) && + test_bit(NFS4CLNT_RUN_MANAGER, &clp->cl_state)) + goto again; + /* Either no longer a swapper, or were signalled */ + } + clear_bit(NFS4CLNT_MANAGER_AVAILABLE, &clp->cl_state); + + if (refcount_read(&clp->cl_count) > 1 && !signalled() && + test_bit(NFS4CLNT_RUN_MANAGER, &clp->cl_state) && + !test_and_set_bit(NFS4CLNT_MANAGER_AVAILABLE, &clp->cl_state)) + goto again; + nfs_put_client(clp); module_put_and_exit(0); return 0; diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h index 967a0098f0a9..04cf3a8fb949 100644 --- a/include/linux/nfs_xdr.h +++ b/include/linux/nfs_xdr.h @@ -1795,6 +1795,8 @@ struct nfs_rpc_ops { struct nfs_server *(*create_server)(struct fs_context *); struct nfs_server *(*clone_server)(struct nfs_server *, struct nfs_fh *, struct nfs_fattr *, rpc_authflavor_t); + void (*enable_swap)(struct inode *inode); + void (*disable_swap)(struct inode *inode); }; /* diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c index cb76fbea3ed5..4cb403a0f334 100644 --- a/net/sunrpc/clnt.c +++ b/net/sunrpc/clnt.c @@ -3066,6 +3066,8 @@ 
rpc_clnt_swap_activate_callback(struct rpc_clnt *clnt, int rpc_clnt_swap_activate(struct rpc_clnt *clnt) { + while (clnt != clnt->cl_parent) + clnt = clnt->cl_parent; if (atomic_inc_return(&clnt->cl_swapper) == 1) return rpc_clnt_iterate_for_each_xprt(clnt, rpc_clnt_swap_activate_callback, NULL); From patchwork Thu Dec 16 23:48:23 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12683129 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 24327C433F5 for ; Fri, 17 Dec 2021 00:02:18 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id A04E06B008A; Thu, 16 Dec 2021 18:54:47 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 9DC546B008C; Thu, 16 Dec 2021 18:54:47 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 87DEF6B0092; Thu, 16 Dec 2021 18:54:47 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0211.hostedemail.com [216.40.44.211]) by kanga.kvack.org (Postfix) with ESMTP id 790D16B008A for ; Thu, 16 Dec 2021 18:54:47 -0500 (EST) Received: from smtpin04.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 39482180B146D for ; Thu, 16 Dec 2021 23:54:37 +0000 (UTC) X-FDA: 78925314594.04.FD9239D Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29]) by imf02.hostedemail.com (Postfix) with ESMTP id 5FEB280012 for ; Thu, 16 Dec 2021 23:54:34 +0000 (UTC) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id C4E1B1F37F; Thu, 16 Dec 2021 23:54:35 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1639698875; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=UN33ikAoXajq5CI0zBofFyR8XobeBM2LQz3kaqJLOi8=; b=MMSyLL6EF220adA5CA7TM22/UFSkJZQp7lQ587G2eUT03U+MWSVZSJyHVvp8XRqgvDsAtw XA+OKQsqNAtxoMxbnWkGTNbrzvSlwjaLb3+NhHC8ivAno2ISM0cBHfsW0lUnJ7McAof59g VvXnaleXzF5dNuLkqYvN4cgISEnzPDQ= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1639698875; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=UN33ikAoXajq5CI0zBofFyR8XobeBM2LQz3kaqJLOi8=; b=typf5Kg/zxn17ZRHbU5b5gPIy954GXSE/v+Cgz6k3Ej6zydN26iNuVGaNYCcRxa8wo7Pu+ 3xBdXpDQJ8itDmBQ== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 6180A13EFD; Thu, 16 Dec 2021 23:54:32 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id 
0EWWB7jRu2EXXAAAMHmgww (envelope-from ); Thu, 16 Dec 2021 23:54:32 +0000 Subject: [PATCH 18/18] NFS: swap-out must always use STABLE writes. From: NeilBrown To: Trond Myklebust , Anna Schumaker , Chuck Lever , Andrew Morton , Mel Gorman , Christoph Hellwig , David Howells Cc: linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Fri, 17 Dec 2021 10:48:23 +1100 Message-ID: <163969850347.20885.5566025433915169963.stgit@noble.brown> In-Reply-To: <163969801519.20885.3977673503103544412.stgit@noble.brown> References: <163969801519.20885.3977673503103544412.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 Authentication-Results: imf02.hostedemail.com; dkim=pass header.d=suse.de header.s=susede2_rsa header.b=MMSyLL6E; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b="typf5Kg/"; spf=pass (imf02.hostedemail.com: domain of neilb@suse.de designates 195.135.220.29 as permitted sender) smtp.mailfrom=neilb@suse.de; dmarc=pass (policy=none) header.from=suse.de X-Rspamd-Queue-Id: 5FEB280012 X-Stat-Signature: g8xijfewy6krwj1w6kxqpu5iw8wi7f9w X-Rspamd-Server: rspam04 X-HE-Tag: 1639698874-390915 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: The commit handling code is not safe against memory-pressure deadlocks when writing to swap. In particular, nfs_commitdata_alloc() blocks indefinitely waiting for memory, and this can consume all available workqueue threads. swap-out most likely uses STABLE writes anyway as COND_STABLE indicates that a stable write should be used if the write fits in a single request, and it normally does. However if we ever swap with a small wsize, or gather unusually large numbers of pages for a single write, this might change. For safety, make it explicit in the code that direct writes used for swap must always use FLUSH_COND_STABLE. Signed-off-by: NeilBrown --- fs/nfs/direct.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c index eeff1b4e1a7c..1317465150a6 100644 --- a/fs/nfs/direct.c +++ b/fs/nfs/direct.c @@ -790,7 +790,7 @@ static const struct nfs_pgio_completion_ops nfs_direct_write_completion_ops = { */ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq, struct iov_iter *iter, - loff_t pos) + loff_t pos, int ioflags) { struct nfs_pageio_descriptor desc; struct inode *inode = dreq->inode; @@ -798,7 +798,7 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq, size_t requested_bytes = 0; size_t wsize = max_t(size_t, NFS_SERVER(inode)->wsize, PAGE_SIZE); - nfs_pageio_init_write(&desc, inode, FLUSH_COND_STABLE, false, + nfs_pageio_init_write(&desc, inode, ioflags, false, &nfs_direct_write_completion_ops); desc.pg_dreq = dreq; get_dreq(dreq); @@ -904,6 +904,7 @@ ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter, struct nfs_direct_req *dreq; struct nfs_lock_context *l_ctx; loff_t pos, end; + int ioflags = swap ? FLUSH_COND_STABLE : FLUSH_STABLE; dfprintk(FILE, "NFS: direct write(%pD2, %zd@%Ld)\n", file, iov_iter_count(iter), (long long) iocb->ki_pos); @@ -946,7 +947,7 @@ ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter, if (!swap) nfs_start_io_direct(inode); - requested = nfs_direct_write_schedule_iovec(dreq, iter, pos); + requested = nfs_direct_write_schedule_iovec(dreq, iter, pos, ioflags); if (mapping->nrpages) { invalidate_inode_pages2_range(mapping,