From patchwork Tue Mar 29 23:49:41 2022
X-Patchwork-Submitter: NeilBrown
X-Patchwork-Id: 12795341
Subject: [PATCH 01/10] MM: create new mm/swap.h header file.
From: NeilBrown To: Andrew Morton Cc: Christoph Hellwig , David Howells , linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Wed, 30 Mar 2022 10:49:41 +1100 Message-ID: <164859778120.29473.11725907882296224053.stgit@noble.brown> In-Reply-To: <164859751830.29473.5309689752169286816.stgit@noble.brown> References: <164859751830.29473.5309689752169286816.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: C2316160006 X-Rspam-User: Authentication-Results: imf08.hostedemail.com; dkim=pass header.d=suse.de header.s=susede2_rsa header.b=N3QeMRRw; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=CjHR4+0j; dmarc=pass (policy=none) header.from=suse.de; spf=pass (imf08.hostedemail.com: domain of neilb@suse.de designates 195.135.220.28 as permitted sender) smtp.mailfrom=neilb@suse.de X-Stat-Signature: tnig9bahufuz4rf6kf5fjzzjbzathzrr X-HE-Tag: 1648597881-629424 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Many functions declared in include/linux/swap.h are only used within mm/ Create a new "mm/swap.h" and move some of these declarations there. Remove the redundant 'extern' from the function declarations. Reviewed-by: Christoph Hellwig Signed-off-by: NeilBrown --- include/linux/swap.h | 121 --------------------------------------------- mm/huge_memory.c | 1 mm/madvise.c | 1 mm/memcontrol.c | 1 mm/memory.c | 1 mm/mincore.c | 1 mm/page_alloc.c | 1 mm/page_io.c | 1 mm/shmem.c | 1 mm/swap.h | 133 ++++++++++++++++++++++++++++++++++++++++++++++++++ mm/swap_state.c | 1 mm/swapfile.c | 1 mm/util.c | 1 mm/vmscan.c | 1 mm/zswap.c | 2 + 15 files changed, 147 insertions(+), 121 deletions(-) create mode 100644 mm/swap.h diff --git a/include/linux/swap.h b/include/linux/swap.h index 27093b477c5f..11390dde5a6c 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -420,62 +420,19 @@ extern void kswapd_stop(int nid); #ifdef CONFIG_SWAP -#include /* for bio_end_io_t */ - -/* linux/mm/page_io.c */ -extern int swap_readpage(struct page *page, bool do_poll); -extern int swap_writepage(struct page *page, struct writeback_control *wbc); -extern void end_swap_bio_write(struct bio *bio); -extern int __swap_writepage(struct page *page, struct writeback_control *wbc, - bio_end_io_t end_write_func); bool swap_dirty_folio(struct address_space *mapping, struct folio *folio); - int add_swap_extent(struct swap_info_struct *sis, unsigned long start_page, unsigned long nr_pages, sector_t start_block); int generic_swapfile_activate(struct swap_info_struct *, struct file *, sector_t *); -/* linux/mm/swap_state.c */ -/* One swap address space for each 64M swap space */ -#define SWAP_ADDRESS_SPACE_SHIFT 14 -#define SWAP_ADDRESS_SPACE_PAGES (1 << SWAP_ADDRESS_SPACE_SHIFT) -extern struct address_space *swapper_spaces[]; -#define swap_address_space(entry) \ - (&swapper_spaces[swp_type(entry)][swp_offset(entry) \ - >> SWAP_ADDRESS_SPACE_SHIFT]) static inline unsigned long total_swapcache_pages(void) { return global_node_page_state(NR_SWAPCACHE); } -extern void show_swap_cache_info(void); -extern int add_to_swap(struct page *page); -extern void *get_shadow_from_swap_cache(swp_entry_t entry); -extern int add_to_swap_cache(struct page *page, swp_entry_t entry, - gfp_t gfp, void **shadowp); -extern void __delete_from_swap_cache(struct page *page, - swp_entry_t entry, void *shadow); -extern void delete_from_swap_cache(struct page 
*); -extern void clear_shadow_from_swap_cache(int type, unsigned long begin, - unsigned long end); -extern void free_swap_cache(struct page *); extern void free_page_and_swap_cache(struct page *); extern void free_pages_and_swap_cache(struct page **, int); -extern struct page *lookup_swap_cache(swp_entry_t entry, - struct vm_area_struct *vma, - unsigned long addr); -struct page *find_get_incore_page(struct address_space *mapping, pgoff_t index); -extern struct page *read_swap_cache_async(swp_entry_t, gfp_t, - struct vm_area_struct *vma, unsigned long addr, - bool do_poll); -extern struct page *__read_swap_cache_async(swp_entry_t, gfp_t, - struct vm_area_struct *vma, unsigned long addr, - bool *new_page_allocated); -extern struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t flag, - struct vm_fault *vmf); -extern struct page *swapin_readahead(swp_entry_t entry, gfp_t flag, - struct vm_fault *vmf); - /* linux/mm/swapfile.c */ extern atomic_long_t nr_swap_pages; extern long total_swap_pages; @@ -528,12 +485,6 @@ static inline void put_swap_device(struct swap_info_struct *si) } #else /* CONFIG_SWAP */ - -static inline int swap_readpage(struct page *page, bool do_poll) -{ - return 0; -} - static inline struct swap_info_struct *swp_swap_info(swp_entry_t entry) { return NULL; @@ -548,11 +499,6 @@ static inline void put_swap_device(struct swap_info_struct *si) { } -static inline struct address_space *swap_address_space(swp_entry_t entry) -{ - return NULL; -} - #define get_nr_swap_pages() 0L #define total_swap_pages 0L #define total_swapcache_pages() 0UL @@ -567,14 +513,6 @@ static inline struct address_space *swap_address_space(swp_entry_t entry) #define free_pages_and_swap_cache(pages, nr) \ release_pages((pages), (nr)); -static inline void free_swap_cache(struct page *page) -{ -} - -static inline void show_swap_cache_info(void) -{ -} - /* used to sanity check ptes in zap_pte_range when CONFIG_SWAP=0 */ #define free_swap_and_cache(e) is_pfn_swap_entry(e) @@ -600,65 +538,6 @@ static inline void put_swap_page(struct page *page, swp_entry_t swp) { } -static inline struct page *swap_cluster_readahead(swp_entry_t entry, - gfp_t gfp_mask, struct vm_fault *vmf) -{ - return NULL; -} - -static inline struct page *swapin_readahead(swp_entry_t swp, gfp_t gfp_mask, - struct vm_fault *vmf) -{ - return NULL; -} - -static inline int swap_writepage(struct page *p, struct writeback_control *wbc) -{ - return 0; -} - -static inline struct page *lookup_swap_cache(swp_entry_t swp, - struct vm_area_struct *vma, - unsigned long addr) -{ - return NULL; -} - -static inline -struct page *find_get_incore_page(struct address_space *mapping, pgoff_t index) -{ - return find_get_page(mapping, index); -} - -static inline int add_to_swap(struct page *page) -{ - return 0; -} - -static inline void *get_shadow_from_swap_cache(swp_entry_t entry) -{ - return NULL; -} - -static inline int add_to_swap_cache(struct page *page, swp_entry_t entry, - gfp_t gfp_mask, void **shadowp) -{ - return -1; -} - -static inline void __delete_from_swap_cache(struct page *page, - swp_entry_t entry, void *shadow) -{ -} - -static inline void delete_from_swap_cache(struct page *page) -{ -} - -static inline void clear_shadow_from_swap_cache(int type, unsigned long begin, - unsigned long end) -{ -} static inline int page_swapcount(struct page *page) { diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 2fe38212e07c..2b433920726d 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -39,6 +39,7 @@ #include #include #include "internal.h" 
+#include "swap.h" #define CREATE_TRACE_POINTS #include diff --git a/mm/madvise.c b/mm/madvise.c index b41858ee937b..4f48e48432e8 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -35,6 +35,7 @@ #include #include "internal.h" +#include "swap.h" struct madvise_walk_private { struct mmu_gather *tlb; diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 725f76723220..4f4cb6a464fb 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -67,6 +67,7 @@ #include #include #include "slab.h" +#include "swap.h" #include diff --git a/mm/memory.c b/mm/memory.c index be44d0b36b18..92ea8ac374a4 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -86,6 +86,7 @@ #include "pgalloc-track.h" #include "internal.h" +#include "swap.h" #if defined(LAST_CPUPID_NOT_IN_PAGE_FLAGS) && !defined(CONFIG_COMPILE_TEST) #warning Unfortunate NUMA and NUMA Balancing config, growing page-frame for last_cpupid. diff --git a/mm/mincore.c b/mm/mincore.c index 9122676b54d6..f4f627325e12 100644 --- a/mm/mincore.c +++ b/mm/mincore.c @@ -20,6 +20,7 @@ #include #include +#include "swap.h" static int mincore_hugetlb(pte_t *pte, unsigned long hmask, unsigned long addr, unsigned long end, struct mm_walk *walk) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index bdc8f60ae462..82bfcd23d0eb 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -81,6 +81,7 @@ #include "internal.h" #include "shuffle.h" #include "page_reporting.h" +#include "swap.h" /* Free Page Internal flags: for internal, non-pcp variants of free_pages(). */ typedef int __bitwise fpi_t; diff --git a/mm/page_io.c b/mm/page_io.c index b417f000b49e..d01ab9d5410a 100644 --- a/mm/page_io.c +++ b/mm/page_io.c @@ -26,6 +26,7 @@ #include #include #include +#include "swap.h" void end_swap_bio_write(struct bio *bio) { diff --git a/mm/shmem.c b/mm/shmem.c index 529c9ad3e926..31db146f15ec 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -38,6 +38,7 @@ #include #include #include +#include "swap.h" static struct vfsmount *shm_mnt; diff --git a/mm/swap.h b/mm/swap.h new file mode 100644 index 000000000000..f8265bf0ce00 --- /dev/null +++ b/mm/swap.h @@ -0,0 +1,133 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _MM_SWAP_H +#define _MM_SWAP_H + +#ifdef CONFIG_SWAP +#include /* for bio_end_io_t */ + +/* linux/mm/page_io.c */ +int swap_readpage(struct page *page, bool do_poll); +int swap_writepage(struct page *page, struct writeback_control *wbc); +void end_swap_bio_write(struct bio *bio); +int __swap_writepage(struct page *page, struct writeback_control *wbc, + bio_end_io_t end_write_func); + +/* linux/mm/swap_state.c */ +/* One swap address space for each 64M swap space */ +#define SWAP_ADDRESS_SPACE_SHIFT 14 +#define SWAP_ADDRESS_SPACE_PAGES (1 << SWAP_ADDRESS_SPACE_SHIFT) +extern struct address_space *swapper_spaces[]; +#define swap_address_space(entry) \ + (&swapper_spaces[swp_type(entry)][swp_offset(entry) \ + >> SWAP_ADDRESS_SPACE_SHIFT]) + +void show_swap_cache_info(void); +int add_to_swap(struct page *page); +void *get_shadow_from_swap_cache(swp_entry_t entry); +int add_to_swap_cache(struct page *page, swp_entry_t entry, + gfp_t gfp, void **shadowp); +void __delete_from_swap_cache(struct page *page, + swp_entry_t entry, void *shadow); +void delete_from_swap_cache(struct page *page); +void clear_shadow_from_swap_cache(int type, unsigned long begin, + unsigned long end); +void free_swap_cache(struct page *page); +struct page *lookup_swap_cache(swp_entry_t entry, + struct vm_area_struct *vma, + unsigned long addr); +struct page *find_get_incore_page(struct address_space *mapping, pgoff_t index); 
+ +struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, + struct vm_area_struct *vma, + unsigned long addr, + bool do_poll); +struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, + struct vm_area_struct *vma, + unsigned long addr, + bool *new_page_allocated); +struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t flag, + struct vm_fault *vmf); +struct page *swapin_readahead(swp_entry_t entry, gfp_t flag, + struct vm_fault *vmf); + +#else /* CONFIG_SWAP */ +static inline int swap_readpage(struct page *page, bool do_poll) +{ + return 0; +} + +static inline struct address_space *swap_address_space(swp_entry_t entry) +{ + return NULL; +} + +static inline void free_swap_cache(struct page *page) +{ +} + +static inline void show_swap_cache_info(void) +{ +} + +static inline struct page *swap_cluster_readahead(swp_entry_t entry, + gfp_t gfp_mask, struct vm_fault *vmf) +{ + return NULL; +} + +static inline struct page *swapin_readahead(swp_entry_t swp, gfp_t gfp_mask, + struct vm_fault *vmf) +{ + return NULL; +} + +static inline int swap_writepage(struct page *p, struct writeback_control *wbc) +{ + return 0; +} + +static inline struct page *lookup_swap_cache(swp_entry_t swp, + struct vm_area_struct *vma, + unsigned long addr) +{ + return NULL; +} + +static inline +struct page *find_get_incore_page(struct address_space *mapping, pgoff_t index) +{ + return find_get_page(mapping, index); +} + +static inline int add_to_swap(struct page *page) +{ + return 0; +} + +static inline void *get_shadow_from_swap_cache(swp_entry_t entry) +{ + return NULL; +} + +static inline int add_to_swap_cache(struct page *page, swp_entry_t entry, + gfp_t gfp_mask, void **shadowp) +{ + return -1; +} + +static inline void __delete_from_swap_cache(struct page *page, + swp_entry_t entry, void *shadow) +{ +} + +static inline void delete_from_swap_cache(struct page *page) +{ +} + +static inline void clear_shadow_from_swap_cache(int type, unsigned long begin, + unsigned long end) +{ +} + +#endif /* CONFIG_SWAP */ +#endif /* _MM_SWAP_H */ diff --git a/mm/swap_state.c b/mm/swap_state.c index 013856004825..5437dd317cf3 100644 --- a/mm/swap_state.c +++ b/mm/swap_state.c @@ -23,6 +23,7 @@ #include #include #include "internal.h" +#include "swap.h" /* * swapper_space is a fiction, retained to simplify the path through diff --git a/mm/swapfile.c b/mm/swapfile.c index 63c61f8b2611..2650927a009b 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -44,6 +44,7 @@ #include #include #include +#include "swap.h" static bool swap_count_continued(struct swap_info_struct *, pgoff_t, unsigned char); diff --git a/mm/util.c b/mm/util.c index 54e5e761a9a9..e8f59c0ef90f 100644 --- a/mm/util.c +++ b/mm/util.c @@ -27,6 +27,7 @@ #include #include "internal.h" +#include "swap.h" /** * kfree_const - conditionally free memory diff --git a/mm/vmscan.c b/mm/vmscan.c index 1678802e03e7..60378d36ec77 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -59,6 +59,7 @@ #include #include "internal.h" +#include "swap.h" #define CREATE_TRACE_POINTS #include diff --git a/mm/zswap.c b/mm/zswap.c index 3efd8cae315e..2c5db4cbedea 100644 --- a/mm/zswap.c +++ b/mm/zswap.c @@ -36,6 +36,8 @@ #include #include +#include "swap.h" + /********************************* * statistics **********************************/ From patchwork Tue Mar 29 23:49:41 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12795342 Return-Path: X-Spam-Checker-Version: 
Subject: [PATCH 02/10] MM: drop swap_dirty_folio
From: NeilBrown
To: Andrew Morton
Cc: Christoph Hellwig, David Howells, linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Date: Wed, 30 Mar 2022 10:49:41 +1100

Folios that are written to swap are owned by the MM subsystem - not any filesystem. When such a folio is passed to a filesystem to be written out to a swap-file, the filesystem handles the data, but the folio itself does not belong to the filesystem. So calling the filesystem's ->dirty_folio() address_space operation makes no sense. That operation is for folios in the given address space, and a folio to be written to swap does not exist in the given address space.

So drop swap_dirty_folio(), which calls the address-space's ->dirty_folio(), and always use noop_dirty_folio(), which is appropriate for folios being swapped out.

Reviewed-by: Christoph Hellwig
Signed-off-by: NeilBrown
--- include/linux/swap.h | 1 - mm/page_io.c | 17 ----------------- mm/swap_state.c | 2 +- 3 files changed, 1 insertion(+), 19 deletions(-) diff --git a/include/linux/swap.h b/include/linux/swap.h index 11390dde5a6c..6bc9e21262de 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -420,7 +420,6 @@ extern void kswapd_stop(int nid); #ifdef CONFIG_SWAP -bool swap_dirty_folio(struct address_space *mapping, struct folio *folio); int add_swap_extent(struct swap_info_struct *sis, unsigned long start_page, unsigned long nr_pages, sector_t start_block); int generic_swapfile_activate(struct swap_info_struct *, struct file *, diff --git a/mm/page_io.c b/mm/page_io.c index d01ab9d5410a..5ffdbda31a16 100644 --- a/mm/page_io.c +++ b/mm/page_io.c @@ -439,20 +439,3 @@ int swap_readpage(struct page *page, bool synchronous) delayacct_swapin_end(); return ret; } - -bool swap_dirty_folio(struct address_space *mapping, struct folio *folio) -{ - struct swap_info_struct *sis = swp_swap_info(folio_swap_entry(folio)); - - if (data_race(sis->flags & SWP_FS_OPS)) { - const struct address_space_operations *aops; - - mapping = sis->swap_file->f_mapping; - aops = mapping->a_ops; - - VM_BUG_ON_FOLIO(!folio_test_swapcache(folio), folio); - return aops->dirty_folio(mapping, folio); - } else { - return noop_dirty_folio(mapping, folio); - } -} diff --git a/mm/swap_state.c b/mm/swap_state.c index 5437dd317cf3..f3ab01801629 100644 --- a/mm/swap_state.c +++ b/mm/swap_state.c @@ -31,7 +31,7 @@ */ static const struct address_space_operations swap_aops = { .writepage = swap_writepage, - .dirty_folio = swap_dirty_folio, + .dirty_folio = noop_dirty_folio, #ifdef CONFIG_MIGRATION .migratepage = migrate_page, #endif

From patchwork Tue Mar 29 23:49:41 2022
X-Patchwork-Submitter: NeilBrown
X-Patchwork-Id: 12795343
Subject: [PATCH 03/10] MM: move responsibility for setting SWP_FS_OPS to ->swap_activate
From: NeilBrown
To: Andrew Morton
Cc: Christoph Hellwig, David Howells, linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Date: Wed, 30 Mar 2022 10:49:41 +1100

If a filesystem wishes to handle all swap IO itself (via ->direct_IO and ->readpage), rather than just providing device addresses for submit_bio(), SWP_FS_OPS must be set.

Currently the protocol for setting this is to have ->swap_activate return zero. In that case SWP_FS_OPS is set, and add_swap_extent() is called for the entire file. This is a little clumsy as different return values for ->swap_activate have quite different meanings, and it makes it hard to search for which filesystems require SWP_FS_OPS to be set.

So remove the special meaning of a zero return, and require the filesystem to set SWP_FS_OPS if it so desires, and to always call add_swap_extent() as required. Currently only NFS and CIFS return zero from ->swap_activate().

Reviewed-by: Christoph Hellwig
Signed-off-by: NeilBrown
--- fs/cifs/file.c | 3 ++- fs/nfs/file.c | 13 +++++++++++-- include/linux/swap.h | 6 ++++++ mm/swapfile.c | 10 +++------- 4 files changed, 22 insertions(+), 10 deletions(-) diff --git a/fs/cifs/file.c b/fs/cifs/file.c index 60f43bff7ccb..050f463580f3 100644 --- a/fs/cifs/file.c +++ b/fs/cifs/file.c @@ -4927,7 +4927,8 @@ static int cifs_swap_activate(struct swap_info_struct *sis, * from reading or writing the file */ - return 0; + sis->flags |= SWP_FS_OPS; + return add_swap_extent(sis, 0, sis->max, 0); } static void cifs_swap_deactivate(struct file *file) diff --git a/fs/nfs/file.c b/fs/nfs/file.c index 2df2a5392737..66136dca0ad5 100644 --- a/fs/nfs/file.c +++ b/fs/nfs/file.c @@ -488,6 +488,7 @@ static int nfs_swap_activate(struct swap_info_struct *sis, struct file *file, { unsigned long blocks; long long isize; + int ret; struct rpc_clnt *clnt = NFS_CLIENT(file->f_mapping->host); struct inode *inode = file->f_mapping->host; @@ -500,9 +501,17 @@ static int nfs_swap_activate(struct swap_info_struct *sis, struct file *file, return -EINVAL; } + ret = rpc_clnt_swap_activate(clnt); + if (ret) + return ret; + ret = add_swap_extent(sis, 0, sis->max, 0); + if (ret < 0) { + rpc_clnt_swap_deactivate(clnt); + return ret; + } *span = sis->pages; - - return rpc_clnt_swap_activate(clnt); + sis->flags |= SWP_FS_OPS; + return ret; } static void nfs_swap_deactivate(struct file *file) diff --git a/include/linux/swap.h b/include/linux/swap.h index 6bc9e21262de..e18b7edccc1d 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -570,6 +570,12 @@ static inline swp_entry_t get_swap_page(struct page *page) return entry; } +static inline int add_swap_extent(struct swap_info_struct *sis, + unsigned long start_page, + unsigned long nr_pages, sector_t start_block) +{ + return -EINVAL; +} #endif /* CONFIG_SWAP */ #ifdef CONFIG_THP_SWAP diff --git a/mm/swapfile.c b/mm/swapfile.c index 2650927a009b..8710c9c29862 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -2244,13 +2244,9 @@ static int setup_swap_extents(struct swap_info_struct *sis, sector_t *span) if (mapping->a_ops->swap_activate) { ret = mapping->a_ops->swap_activate(sis, swap_file, span); - if (ret >= 0) - sis->flags |= SWP_ACTIVATED; - if (!ret) { - sis->flags |= SWP_FS_OPS; - ret = add_swap_extent(sis, 0, sis->max, 0); - *span = sis->pages; - } + if (ret < 0) + return ret; + sis->flags |= SWP_ACTIVATED; return ret; }
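
To make the new contract concrete: a filesystem that wants to handle its own swap IO now sets SWP_FS_OPS itself and registers its extents from ->swap_activate(). The following is a minimal sketch for a hypothetical filesystem (the myfs_ name is illustrative only and not part of this series; per-filesystem validation and error handling are omitted), mirroring the cifs and nfs changes above:

/* Hypothetical illustration -- not taken from this patch set. */
static int myfs_swap_activate(struct swap_info_struct *sis,
			      struct file *swap_file, sector_t *span)
{
	int ret;

	/* Route swap IO through the filesystem rather than submitting
	 * bios directly to the backing block device. */
	sis->flags |= SWP_FS_OPS;

	/* Describe the whole file as a single extent, as the cifs and
	 * nfs implementations above do, and propagate its return value
	 * (the number of extents added, or a negative errno). */
	ret = add_swap_extent(sis, 0, sis->max, 0);
	if (ret < 0)
		return ret;

	*span = sis->pages;
	return ret;
}
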
From patchwork Tue Mar 29 23:49:41 2022
X-Patchwork-Submitter: NeilBrown
X-Patchwork-Id: 12795344
Subject: [PATCH 04/10] MM: reclaim mustn't enter
FS for SWP_FS_OPS swap-space From: NeilBrown To: Andrew Morton Cc: Christoph Hellwig , David Howells , linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Wed, 30 Mar 2022 10:49:41 +1100 Message-ID: <164859778124.29473.16176717935781721855.stgit@noble.brown> In-Reply-To: <164859751830.29473.5309689752169286816.stgit@noble.brown> References: <164859751830.29473.5309689752169286816.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: D540A40003 X-Stat-Signature: hnpkos786jiuab5rx9wxkkfqwas7r1zm X-Rspam-User: Authentication-Results: imf01.hostedemail.com; dkim=pass header.d=suse.de header.s=susede2_rsa header.b=1UD9aAQ8; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=36zu8erF; dmarc=pass (policy=none) header.from=suse.de; spf=pass (imf01.hostedemail.com: domain of neilb@suse.de designates 195.135.220.28 as permitted sender) smtp.mailfrom=neilb@suse.de X-HE-Tag: 1648597907-805771 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: If swap-out is using filesystem operations (SWP_FS_OPS), then it is not safe to enter the FS for reclaim. So only down-grade the requirement for swap pages to __GFP_IO after checking that SWP_FS_OPS are not being used. This makes the calculation of "may_enter_fs" slightly more complex, so move it into a separate function. with that done, there is little value in maintaining the bool variable any more. So replace the may_enter_fs variable with a may_enter_fs() function. This removes any risk for the variable becoming out-of-date. Reviewed-by: Christoph Hellwig Signed-off-by: NeilBrown --- mm/swap.h | 8 ++++++++ mm/vmscan.c | 29 ++++++++++++++++++++--------- 2 files changed, 28 insertions(+), 9 deletions(-) diff --git a/mm/swap.h b/mm/swap.h index f8265bf0ce00..e19f185df5e2 100644 --- a/mm/swap.h +++ b/mm/swap.h @@ -50,6 +50,10 @@ struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t flag, struct page *swapin_readahead(swp_entry_t entry, gfp_t flag, struct vm_fault *vmf); +static inline unsigned int page_swap_flags(struct page *page) +{ + return page_swap_info(page)->flags; +} #else /* CONFIG_SWAP */ static inline int swap_readpage(struct page *page, bool do_poll) { @@ -129,5 +133,9 @@ static inline void clear_shadow_from_swap_cache(int type, unsigned long begin, { } +static inline unsigned int page_swap_flags(struct page *page) +{ + return 0; +} #endif /* CONFIG_SWAP */ #endif /* _MM_SWAP_H */ diff --git a/mm/vmscan.c b/mm/vmscan.c index 60378d36ec77..9150754bf2b8 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1502,6 +1502,22 @@ static unsigned int demote_page_list(struct list_head *demote_pages, return nr_succeeded; } +static bool may_enter_fs(struct page *page, gfp_t gfp_mask) +{ + if (gfp_mask & __GFP_FS) + return true; + if (!PageSwapCache(page) || !(gfp_mask & __GFP_IO)) + return false; + /* + * We can "enter_fs" for swap-cache with only __GFP_IO + * providing this isn't SWP_FS_OPS. + * ->flags can be updated non-atomicially (scan_swap_map_slots), + * but that will never affect SWP_FS_OPS, so the data_race + * is safe. 
+ */ + return !data_race(page_swap_flags(page) & SWP_FS_OPS); +} + /* * shrink_page_list() returns the number of reclaimed pages */ @@ -1528,7 +1544,7 @@ static unsigned int shrink_page_list(struct list_head *page_list, struct page *page; struct folio *folio; enum page_references references = PAGEREF_RECLAIM; - bool dirty, writeback, may_enter_fs; + bool dirty, writeback; unsigned int nr_pages; cond_resched(); @@ -1553,9 +1569,6 @@ static unsigned int shrink_page_list(struct list_head *page_list, if (!sc->may_unmap && page_mapped(page)) goto keep_locked; - may_enter_fs = (sc->gfp_mask & __GFP_FS) || - (PageSwapCache(page) && (sc->gfp_mask & __GFP_IO)); - /* * The number of dirty pages determines if a node is marked * reclaim_congested. kswapd will stall and start writing @@ -1598,7 +1611,7 @@ static unsigned int shrink_page_list(struct list_head *page_list, * not to fs). In this case mark the page for immediate * reclaim and continue scanning. * - * Require may_enter_fs because we would wait on fs, which + * Require may_enter_fs() because we would wait on fs, which * may not have submitted IO yet. And the loop driver might * enter reclaim, and deadlock if it waits on a page for * which it is needed to do the write (loop masks off @@ -1630,7 +1643,7 @@ static unsigned int shrink_page_list(struct list_head *page_list, /* Case 2 above */ } else if (writeback_throttling_sane(sc) || - !PageReclaim(page) || !may_enter_fs) { + !PageReclaim(page) || !may_enter_fs(page, sc->gfp_mask)) { /* * This is slightly racy - end_page_writeback() * might have just cleared PageReclaim, then @@ -1720,8 +1733,6 @@ static unsigned int shrink_page_list(struct list_head *page_list, goto activate_locked_split; } - may_enter_fs = true; - /* Adding to swap updated mapping */ mapping = page_mapping(page); } @@ -1792,7 +1803,7 @@ static unsigned int shrink_page_list(struct list_head *page_list, if (references == PAGEREF_RECLAIM_CLEAN) goto keep_locked; - if (!may_enter_fs) + if (!may_enter_fs(page, sc->gfp_mask)) goto keep_locked; if (!sc->may_writepage) goto keep_locked; From patchwork Tue Mar 29 23:49:41 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12795345 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 30A35C433F5 for ; Tue, 29 Mar 2022 23:51:57 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id B4B288D0008; Tue, 29 Mar 2022 19:51:56 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id AD4838D0001; Tue, 29 Mar 2022 19:51:56 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 94E8B8D0008; Tue, 29 Mar 2022 19:51:56 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.hostedemail.com [64.99.140.28]) by kanga.kvack.org (Postfix) with ESMTP id 7C48A8D0001 for ; Tue, 29 Mar 2022 19:51:56 -0400 (EDT) Received: from smtpin13.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id 562B24A7 for ; Tue, 29 Mar 2022 23:51:56 +0000 (UTC) X-FDA: 79299074232.13.42E759F Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29]) by imf17.hostedemail.com (Postfix) with ESMTP id B877E4001B for ; Tue, 29 Mar 2022 23:51:55 +0000 (UTC) Received: from imap2.suse-dmz.suse.de 
Subject: [PATCH 05/10] MM: introduce ->swap_rw and use it for reads from SWP_FS_OPS swap-space
From: NeilBrown
To: Andrew Morton
Cc: Christoph Hellwig, David Howells, linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Date: Wed, 30 Mar 2022 10:49:41 +1100

Swap currently uses ->readpage to read swap pages. This can only request one page at a time from the filesystem, which is not the most efficient approach.

Swap uses ->direct_IO for writes. While this is adequate, it is an inappropriate overloading: ->direct_IO may need to handle allocating space for holes or other details that are not relevant for swap.

So this patch introduces a new address_space operation: ->swap_rw. In this patch it is used for reads, and a subsequent patch will switch writes to use it.
No filesystem yet supports ->swap_rw, but that is not a problem because no filesystem actually works with filesystem-based swap. Only two filesystems set SWP_FS_OPS: - cifs sets the flag, but ->direct_IO always fails so swap cannot work. - nfs sets the flag, but ->direct_IO calls generic_write_checks() which has failed on swap files for several releases. To ensure that a NULL ->swap_rw isn't called, ->activate_swap() for both NFS and cifs are changed to fail if ->swap_rw is not set. This can be removed if/when the function is added. Future patches will restore swap-over-NFS functionality. To submit an async read with ->swap_rw() we need to allocate a structure to hold the kiocb and other details. swap_readpage() cannot handle transient failure, so we create a mempool to provide the structures. Reviewed-by: Christoph Hellwig Signed-off-by: NeilBrown --- fs/cifs/file.c | 4 +++ fs/nfs/file.c | 4 +++ include/linux/fs.h | 1 + mm/page_io.c | 68 +++++++++++++++++++++++++++++++++++++++++++++++----- mm/swap.h | 1 + mm/swapfile.c | 5 ++++ 6 files changed, 77 insertions(+), 6 deletions(-) diff --git a/fs/cifs/file.c b/fs/cifs/file.c index 050f463580f3..cde8466f260b 100644 --- a/fs/cifs/file.c +++ b/fs/cifs/file.c @@ -4899,6 +4899,10 @@ static int cifs_swap_activate(struct swap_info_struct *sis, cifs_dbg(FYI, "swap activate\n"); + if (!swap_file->f_mapping->a_ops->swap_rw) + /* Cannot support swap */ + return -EINVAL; + spin_lock(&inode->i_lock); blocks = inode->i_blocks; isize = inode->i_size; diff --git a/fs/nfs/file.c b/fs/nfs/file.c index 66136dca0ad5..6da81a4f3bff 100644 --- a/fs/nfs/file.c +++ b/fs/nfs/file.c @@ -492,6 +492,10 @@ static int nfs_swap_activate(struct swap_info_struct *sis, struct file *file, struct rpc_clnt *clnt = NFS_CLIENT(file->f_mapping->host); struct inode *inode = file->f_mapping->host; + if (!file->f_mapping->a_ops->swap_rw) + /* Cannot support swap */ + return -EINVAL; + spin_lock(&inode->i_lock); blocks = inode->i_blocks; isize = inode->i_size; diff --git a/include/linux/fs.h b/include/linux/fs.h index 183160872133..7c65e09c09a6 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -409,6 +409,7 @@ struct address_space_operations { int (*swap_activate)(struct swap_info_struct *sis, struct file *file, sector_t *span); void (*swap_deactivate)(struct file *file); + int (*swap_rw)(struct kiocb *iocb, struct iov_iter *iter); }; extern const struct address_space_operations empty_aops; diff --git a/mm/page_io.c b/mm/page_io.c index 5ffdbda31a16..52d423c9962b 100644 --- a/mm/page_io.c +++ b/mm/page_io.c @@ -284,6 +284,25 @@ static void bio_associate_blkg_from_page(struct bio *bio, struct page *page) #define bio_associate_blkg_from_page(bio, page) do { } while (0) #endif /* CONFIG_MEMCG && CONFIG_BLK_CGROUP */ +struct swap_iocb { + struct kiocb iocb; + struct bio_vec bvec; +}; +static mempool_t *sio_pool; + +int sio_pool_init(void) +{ + if (!sio_pool) { + mempool_t *pool = mempool_create_kmalloc_pool( + SWAP_CLUSTER_MAX, sizeof(struct swap_iocb)); + if (cmpxchg(&sio_pool, NULL, pool)) + mempool_destroy(pool); + } + if (!sio_pool) + return -ENOMEM; + return 0; +} + int __swap_writepage(struct page *page, struct writeback_control *wbc, bio_end_io_t end_write_func) { @@ -355,6 +374,48 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc, return 0; } +static void sio_read_complete(struct kiocb *iocb, long ret) +{ + struct swap_iocb *sio = container_of(iocb, struct swap_iocb, iocb); + struct page *page = sio->bvec.bv_page; + + if (ret != 0 && ret != 
PAGE_SIZE) { + SetPageError(page); + ClearPageUptodate(page); + pr_alert_ratelimited("Read-error on swap-device\n"); + } else { + SetPageUptodate(page); + count_vm_event(PSWPIN); + } + unlock_page(page); + mempool_free(sio, sio_pool); +} + +static int swap_readpage_fs(struct page *page) +{ + struct swap_info_struct *sis = page_swap_info(page); + struct file *swap_file = sis->swap_file; + struct address_space *mapping = swap_file->f_mapping; + struct iov_iter from; + struct swap_iocb *sio; + loff_t pos = page_file_offset(page); + int ret; + + sio = mempool_alloc(sio_pool, GFP_KERNEL); + init_sync_kiocb(&sio->iocb, swap_file); + sio->iocb.ki_pos = pos; + sio->iocb.ki_complete = sio_read_complete; + sio->bvec.bv_page = page; + sio->bvec.bv_len = PAGE_SIZE; + sio->bvec.bv_offset = 0; + + iov_iter_bvec(&from, READ, &sio->bvec, 1, PAGE_SIZE); + ret = mapping->a_ops->swap_rw(&sio->iocb, &from); + if (ret != -EIOCBQUEUED) + sio_read_complete(&sio->iocb, ret); + return ret; +} + int swap_readpage(struct page *page, bool synchronous) { struct bio *bio; @@ -383,12 +444,7 @@ int swap_readpage(struct page *page, bool synchronous) } if (data_race(sis->flags & SWP_FS_OPS)) { - struct file *swap_file = sis->swap_file; - struct address_space *mapping = swap_file->f_mapping; - - ret = mapping->a_ops->readpage(swap_file, page); - if (!ret) - count_vm_event(PSWPIN); + ret = swap_readpage_fs(page); goto out; } diff --git a/mm/swap.h b/mm/swap.h index e19f185df5e2..eafac80b18d9 100644 --- a/mm/swap.h +++ b/mm/swap.h @@ -6,6 +6,7 @@ #include /* for bio_end_io_t */ /* linux/mm/page_io.c */ +int sio_pool_init(void); int swap_readpage(struct page *page, bool do_poll); int swap_writepage(struct page *page, struct writeback_control *wbc); void end_swap_bio_write(struct bio *bio); diff --git a/mm/swapfile.c b/mm/swapfile.c index 8710c9c29862..2c9b4a7aecb0 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -2247,6 +2247,11 @@ static int setup_swap_extents(struct swap_info_struct *sis, sector_t *span) if (ret < 0) return ret; sis->flags |= SWP_ACTIVATED; + if ((sis->flags & SWP_FS_OPS) && + sio_pool_init() != 0) { + destroy_swap_extents(sis); + return -ENOMEM; + } return ret; }
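
The commit message notes that no filesystem supports ->swap_rw yet. As a purely illustrative sketch of the shape of the new hook (the myfs_ names, including both helper functions, are hypothetical and not part of this series), a filesystem could simply hand the prepared kiocb and iov_iter to its existing direct-IO machinery:

/* Hypothetical illustration -- not taken from this patch set. */
static int myfs_swap_rw(struct kiocb *iocb, struct iov_iter *iter)
{
	ssize_t ret;

	/* The MM core passes a ready-made kiocb plus a single-bvec
	 * iov_iter (built in swap_readpage_fs() above). */
	if (iov_iter_rw(iter) == WRITE)
		ret = myfs_direct_write(iocb, iter);	/* hypothetical helper */
	else
		ret = myfs_direct_read(iocb, iter);	/* hypothetical helper */

	/* Complete synchronously and return the byte count, or return
	 * -EIOCBQUEUED and invoke iocb->ki_complete() when the IO
	 * finishes, as the sio completion handlers expect. */
	return ret;
}

The swap-over-NFS support restored later in this series provides a real implementation along these lines, built on the filesystem's existing direct-IO path.
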
From patchwork Tue Mar 29 23:49:41 2022
X-Patchwork-Submitter: NeilBrown
X-Patchwork-Id: 12795346
Subject: [PATCH 06/10] MM: perform async writes to SWP_FS_OPS swap-space using ->swap_rw
From: NeilBrown
To: Andrew Morton
Cc: Christoph Hellwig, David Howells, linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Date: Wed, 30 Mar 2022 10:49:41 +1100

This patch switches swap-out to SWP_FS_OPS swap-spaces to use ->swap_rw and makes the writes asynchronous, like they are for other swap spaces.

To make the writes async we need to allocate the kiocb struct from a mempool. This may block, but it won't block for as long as waiting for the write to complete would: at most it will wait for some previous swap IO to complete.
Reviewed-by: Christoph Hellwig Signed-off-by: NeilBrown --- mm/page_io.c | 98 ++++++++++++++++++++++++++++++++++------------------------ 1 file changed, 58 insertions(+), 40 deletions(-) diff --git a/mm/page_io.c b/mm/page_io.c index 52d423c9962b..a01cc273bb00 100644 --- a/mm/page_io.c +++ b/mm/page_io.c @@ -303,6 +303,57 @@ int sio_pool_init(void) return 0; } +static void sio_write_complete(struct kiocb *iocb, long ret) +{ + struct swap_iocb *sio = container_of(iocb, struct swap_iocb, iocb); + struct page *page = sio->bvec.bv_page; + + if (ret != PAGE_SIZE) { + /* + * In the case of swap-over-nfs, this can be a + * temporary failure if the system has limited + * memory for allocating transmit buffers. + * Mark the page dirty and avoid + * folio_rotate_reclaimable but rate-limit the + * messages but do not flag PageError like + * the normal direct-to-bio case as it could + * be temporary. + */ + set_page_dirty(page); + ClearPageReclaim(page); + pr_err_ratelimited("Write error %ld on dio swapfile (%llu)\n", + ret, page_file_offset(page)); + } else + count_vm_event(PSWPOUT); + end_page_writeback(page); + mempool_free(sio, sio_pool); +} + +static int swap_writepage_fs(struct page *page, struct writeback_control *wbc) +{ + struct swap_iocb *sio; + struct swap_info_struct *sis = page_swap_info(page); + struct file *swap_file = sis->swap_file; + struct address_space *mapping = swap_file->f_mapping; + struct iov_iter from; + int ret; + + set_page_writeback(page); + unlock_page(page); + sio = mempool_alloc(sio_pool, GFP_NOIO); + init_sync_kiocb(&sio->iocb, swap_file); + sio->iocb.ki_complete = sio_write_complete; + sio->iocb.ki_pos = page_file_offset(page); + sio->bvec.bv_page = page; + sio->bvec.bv_len = PAGE_SIZE; + sio->bvec.bv_offset = 0; + iov_iter_bvec(&from, WRITE, &sio->bvec, 1, PAGE_SIZE); + ret = mapping->a_ops->swap_rw(&sio->iocb, &from); + if (ret != -EIOCBQUEUED) + sio_write_complete(&sio->iocb, ret); + return ret; +} + int __swap_writepage(struct page *page, struct writeback_control *wbc, bio_end_io_t end_write_func) { @@ -311,46 +362,13 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc, struct swap_info_struct *sis = page_swap_info(page); VM_BUG_ON_PAGE(!PageSwapCache(page), page); - if (data_race(sis->flags & SWP_FS_OPS)) { - struct kiocb kiocb; - struct file *swap_file = sis->swap_file; - struct address_space *mapping = swap_file->f_mapping; - struct bio_vec bv = { - .bv_page = page, - .bv_len = PAGE_SIZE, - .bv_offset = 0 - }; - struct iov_iter from; - - iov_iter_bvec(&from, WRITE, &bv, 1, PAGE_SIZE); - init_sync_kiocb(&kiocb, swap_file); - kiocb.ki_pos = page_file_offset(page); - - set_page_writeback(page); - unlock_page(page); - ret = mapping->a_ops->direct_IO(&kiocb, &from); - if (ret == PAGE_SIZE) { - count_vm_event(PSWPOUT); - ret = 0; - } else { - /* - * In the case of swap-over-nfs, this can be a - * temporary failure if the system has limited - * memory for allocating transmit buffers. - * Mark the page dirty and avoid - * folio_rotate_reclaimable but rate-limit the - * messages but do not flag PageError like - * the normal direct-to-bio case as it could - * be temporary. - */ - set_page_dirty(page); - ClearPageReclaim(page); - pr_err_ratelimited("Write error on dio swapfile (%llu)\n", - page_file_offset(page)); - } - end_page_writeback(page); - return ret; - } + /* + * ->flags can be updated non-atomicially (scan_swap_map_slots), + * but that will never affect SWP_FS_OPS, so the data_race + * is safe. 
+ */ + if (data_race(sis->flags & SWP_FS_OPS)) + return swap_writepage_fs(page, wbc); ret = bdev_write_page(sis->bdev, swap_page_sector(page), page, wbc); if (!ret) { From patchwork Tue Mar 29 23:49:41 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12795347 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id D36B8C433F5 for ; Tue, 29 Mar 2022 23:52:21 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 494858D0005; Tue, 29 Mar 2022 19:52:21 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 41C788D0001; Tue, 29 Mar 2022 19:52:21 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 297098D0005; Tue, 29 Mar 2022 19:52:21 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0229.hostedemail.com [216.40.44.229]) by kanga.kvack.org (Postfix) with ESMTP id 161EB8D0001 for ; Tue, 29 Mar 2022 19:52:21 -0400 (EDT) Received: from smtpin22.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id C3AA8A4A4A for ; Tue, 29 Mar 2022 23:52:20 +0000 (UTC) X-FDA: 79299075240.22.D3CBBBC Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by imf07.hostedemail.com (Postfix) with ESMTP id 3F4E140004 for ; Tue, 29 Mar 2022 23:52:20 +0000 (UTC) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 2F6AF218F9; Tue, 29 Mar 2022 23:52:19 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1648597939; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=tE32eedhCbqKNV/qzY0QU0j1sD/lbIXfPgDh2Z7fRTs=; b=npmS5Rtdt/vjDg9rTITOYuf0u/0wOAROyuYict0Yx8Rlg5aAG7crOrIyOIn5n7L9jpR+l3 xRm3PhCNFMQsJ0ujiDy2L6oYKm1FM94lty/8pH2WSGM5CMs0tIlhDERpH4jI7U4LC8JfcK joM92dZM0iUmJ2/asLXy8NyVeEpzxLY= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1648597939; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=tE32eedhCbqKNV/qzY0QU0j1sD/lbIXfPgDh2Z7fRTs=; b=3wC0HgxOOEeD/3T3+oCgunZQv+J+1MqE6Bd8mMwtFdIWZsqNC9klDTo+16h0XLvnTbFdk4 MQLIjNp7nPKQUXAQ== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 4922A13A7E; Tue, 29 Mar 2022 23:52:17 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id nFx+ArGbQ2JILwAAMHmgww (envelope-from ); Tue, 29 Mar 2022 23:52:17 +0000 Subject: [PATCH 07/10] DOC: update documentation for 
swap_activate and swap_rw From: NeilBrown To: Andrew Morton Cc: Christoph Hellwig , David Howells , linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Wed, 30 Mar 2022 10:49:41 +1100 Message-ID: <164859778126.29473.6778751233552859461.stgit@noble.brown> In-Reply-To: <164859751830.29473.5309689752169286816.stgit@noble.brown> References: <164859751830.29473.5309689752169286816.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Stat-Signature: iufj9eumxog4npyh3k9jkcd8nji5hiim X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: 3F4E140004 Authentication-Results: imf07.hostedemail.com; dkim=pass header.d=suse.de header.s=susede2_rsa header.b=npmS5Rtd; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=3wC0HgxO; dmarc=pass (policy=none) header.from=suse.de; spf=pass (imf07.hostedemail.com: domain of neilb@suse.de designates 195.135.220.28 as permitted sender) smtp.mailfrom=neilb@suse.de X-Rspam-User: X-HE-Tag: 1648597940-230352 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: This documentation for ->swap_activate() has been out-of-date for a long time. This patch updates it to match recent changes, and adds documentation for the associated ->swap_rw() Reviewed-by: Christoph Hellwig Signed-off-by: NeilBrown --- Documentation/filesystems/locking.rst | 18 ++++++++++++------ Documentation/filesystems/vfs.rst | 17 ++++++++++++----- 2 files changed, 24 insertions(+), 11 deletions(-) diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst index 2998cec9af4b..009d855c9be5 100644 --- a/Documentation/filesystems/locking.rst +++ b/Documentation/filesystems/locking.rst @@ -260,8 +260,9 @@ prototypes:: int (*launder_folio)(struct folio *); bool (*is_partially_uptodate)(struct folio *, size_t from, size_t count); int (*error_remove_page)(struct address_space *, struct page *); - int (*swap_activate)(struct file *); + int (*swap_activate)(struct swap_info_struct *sis, struct file *f, sector_t *span) int (*swap_deactivate)(struct file *); + int (*swap_rw)(struct kiocb *iocb, struct iov_iter *iter); locking rules: All except dirty_folio and freepage may block @@ -290,6 +291,7 @@ is_partially_uptodate: yes error_remove_page: yes swap_activate: no swap_deactivate: no +swap_rw: yes, unlocks ====================== ======================== ========= =============== ->write_begin(), ->write_end() and ->readpage() may be called from @@ -392,15 +394,19 @@ cleaned, or an error value if not. Note that in order to prevent the folio getting mapped back in and redirtied, it needs to be kept locked across the entire operation. -->swap_activate will be called with a non-zero argument on -files backing (non block device backed) swapfiles. A return value -of zero indicates success, in which case this file can be used for -backing swapspace. The swapspace operations will be proxied to the -address space operations. +->swap_activate() will be called to prepare the given file for swap. It +should perform any validation and preparation necessary to ensure that +writes can be performed with minimal memory allocation. It should call +add_swap_extent(), or the helper iomap_swapfile_activate(), and return +the number of extents added. If IO should be submitted through +->swap_rw(), it should set SWP_FS_OPS, otherwise IO will be submitted +directly to the block device ``sis->bdev``. 
->swap_deactivate() will be called in the sys_swapoff() path after ->swap_activate() returned success. +->swap_rw will be called for swap IO if SWP_FS_OPS was set by ->swap_activate(). + file_lock_operations ==================== diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst index 4f14edf93941..9d3480e089f6 100644 --- a/Documentation/filesystems/vfs.rst +++ b/Documentation/filesystems/vfs.rst @@ -751,8 +751,9 @@ cache in your filesystem. The following members are defined: size_t count); void (*is_dirty_writeback) (struct page *, bool *, bool *); int (*error_remove_page) (struct mapping *mapping, struct page *page); - int (*swap_activate)(struct file *); + int (*swap_activate)(struct swap_info_struct *sis, struct file *f, sector_t *span) int (*swap_deactivate)(struct file *); + int (*swap_rw)(struct kiocb *iocb, struct iov_iter *iter); }; ``writepage`` @@ -963,15 +964,21 @@ cache in your filesystem. The following members are defined: unless you have them locked or reference counts increased. ``swap_activate`` - Called when swapon is used on a file to allocate space if - necessary and pin the block lookup information in memory. A - return value of zero indicates success, in which case this file - can be used to back swapspace. + + Called to prepare the given file for swap. It should perform + any validation and preparation necessary to ensure that writes + can be performed with minimal memory allocation. It should call + add_swap_extent(), or the helper iomap_swapfile_activate(), and + return the number of extents added. If IO should be submitted + through ->swap_rw(), it should set SWP_FS_OPS, otherwise IO will + be submitted directly to the block device ``sis->bdev``. ``swap_deactivate`` Called during swapoff on files where swap_activate was successful. +``swap_rw`` + Called to read or write swap pages when SWP_FS_OPS is set. 
The File Object =============== From patchwork Tue Mar 29 23:49:41 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12795348 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 48AB7C433EF for ; Tue, 29 Mar 2022 23:52:36 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id B24D08D0006; Tue, 29 Mar 2022 19:52:35 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id AAC8D8D0001; Tue, 29 Mar 2022 19:52:35 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 925F58D0006; Tue, 29 Mar 2022 19:52:35 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0253.hostedemail.com [216.40.44.253]) by kanga.kvack.org (Postfix) with ESMTP id 7D1FA8D0001 for ; Tue, 29 Mar 2022 19:52:35 -0400 (EDT) Received: from smtpin31.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 2AD068249980 for ; Tue, 29 Mar 2022 23:52:35 +0000 (UTC) X-FDA: 79299075870.31.184E71E Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by imf23.hostedemail.com (Postfix) with ESMTP id 6D721140007 for ; Tue, 29 Mar 2022 23:52:34 +0000 (UTC) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 6CEF2218F9; Tue, 29 Mar 2022 23:52:33 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1648597953; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=tSXrVaD5DiGK3crhV7jd+q1J1Uq8L2v4rlku+hP3up4=; b=Cur9229XKJzV+Tc5YNCvmaC+c6cHdL4bc4tUgccnyLIh4kWVp5RHN8HYLIY5gA9+RM97NI qd5IcYIt/JhdPHZgcS9wHo12fnYRcByeyEHc0j9JoJgu0imsqZRRSgsOWLvRfRuDnUuwNT 7kgfaLgfCNfMIXu1tlXjK0p2S5sfDeQ= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1648597953; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=tSXrVaD5DiGK3crhV7jd+q1J1Uq8L2v4rlku+hP3up4=; b=n90YQwjYGZAwzjwqYZKu3znA7VOr4WwdlNZVly/gooQogv42zYyeXdhKMkXzXVtBIVK3VV TZZJuLHYeS3ytiBg== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 3415313A7E; Tue, 29 Mar 2022 23:52:30 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id +53PN76bQ2JTLwAAMHmgww (envelope-from ); Tue, 29 Mar 2022 23:52:30 +0000 Subject: [PATCH 08/10] MM: submit multipage reads for SWP_FS_OPS swap-space From: NeilBrown To: Andrew Morton Cc: Christoph Hellwig , David Howells , linux-nfs@vger.kernel.org, 
linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Wed, 30 Mar 2022 10:49:41 +1100 Message-ID: <164859778127.29473.14059420492644907783.stgit@noble.brown> In-Reply-To: <164859751830.29473.5309689752169286816.stgit@noble.brown> References: <164859751830.29473.5309689752169286816.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Rspam-User: Authentication-Results: imf23.hostedemail.com; dkim=pass header.d=suse.de header.s=susede2_rsa header.b=Cur9229X; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=n90YQwjY; spf=pass (imf23.hostedemail.com: domain of neilb@suse.de designates 195.135.220.28 as permitted sender) smtp.mailfrom=neilb@suse.de; dmarc=pass (policy=none) header.from=suse.de X-Rspamd-Server: rspam03 X-Rspamd-Queue-Id: 6D721140007 X-Stat-Signature: tbnbgszumwp8o1tofc8fxgnau1oxx1c1 X-HE-Tag: 1648597954-122658 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: swap_readpage() is given one page at a time, but may be called repeatedly in succession. For block-device swap-space, the blk_plug functionality allows the multiple pages to be combined together at lower layers. That cannot be used for SWP_FS_OPS as blk_plug may not exist - it is only active when CONFIG_BLOCK=y. Consequently all swap reads over NFS are single page reads. With this patch we pass in a pointer-to-pointer when swap_readpage can store state between calls - much like the effect of blk_plug. After calling swap_readpage() some number of times, the state will be passed to swap_read_unplug() which can submit the combined request. Reviewed-by: Christoph Hellwig Signed-off-by: NeilBrown --- mm/madvise.c | 8 +++- mm/memory.c | 2 + mm/page_io.c | 104 ++++++++++++++++++++++++++++++++++++------------------- mm/swap.h | 17 +++++++-- mm/swap_state.c | 20 +++++++---- 5 files changed, 104 insertions(+), 47 deletions(-) diff --git a/mm/madvise.c b/mm/madvise.c index 4f48e48432e8..297de11f73d6 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -198,6 +198,7 @@ static int swapin_walk_pmd_entry(pmd_t *pmd, unsigned long start, pte_t *orig_pte; struct vm_area_struct *vma = walk->private; unsigned long index; + struct swap_iocb *splug = NULL; if (pmd_none_or_trans_huge_or_clear_bad(pmd)) return 0; @@ -219,10 +220,11 @@ static int swapin_walk_pmd_entry(pmd_t *pmd, unsigned long start, continue; page = read_swap_cache_async(entry, GFP_HIGHUSER_MOVABLE, - vma, index, false); + vma, index, false, &splug); if (page) put_page(page); } + swap_read_unplug(splug); return 0; } @@ -238,6 +240,7 @@ static void force_shm_swapin_readahead(struct vm_area_struct *vma, XA_STATE(xas, &mapping->i_pages, linear_page_index(vma, start)); pgoff_t end_index = linear_page_index(vma, end + PAGE_SIZE - 1); struct page *page; + struct swap_iocb *splug = NULL; rcu_read_lock(); xas_for_each(&xas, page, end_index) { @@ -250,13 +253,14 @@ static void force_shm_swapin_readahead(struct vm_area_struct *vma, swap = radix_to_swp_entry(page); page = read_swap_cache_async(swap, GFP_HIGHUSER_MOVABLE, - NULL, 0, false); + NULL, 0, false, &splug); if (page) put_page(page); rcu_read_lock(); } rcu_read_unlock(); + swap_read_unplug(splug); lru_add_drain(); /* Push any new pages onto the LRU now */ } diff --git a/mm/memory.c b/mm/memory.c index 92ea8ac374a4..8de0ad307cb2 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3586,7 +3586,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) /* To provide entry to swap_readpage() */ set_page_private(page, 
entry.val); - swap_readpage(page, true); + swap_readpage(page, true, NULL); set_page_private(page, 0); } } else { diff --git a/mm/page_io.c b/mm/page_io.c index a01cc273bb00..8735707ea349 100644 --- a/mm/page_io.c +++ b/mm/page_io.c @@ -286,7 +286,8 @@ static void bio_associate_blkg_from_page(struct bio *bio, struct page *page) struct swap_iocb { struct kiocb iocb; - struct bio_vec bvec; + struct bio_vec bvec[SWAP_CLUSTER_MAX]; + int pages; }; static mempool_t *sio_pool; @@ -306,7 +307,7 @@ int sio_pool_init(void) static void sio_write_complete(struct kiocb *iocb, long ret) { struct swap_iocb *sio = container_of(iocb, struct swap_iocb, iocb); - struct page *page = sio->bvec.bv_page; + struct page *page = sio->bvec[0].bv_page; if (ret != PAGE_SIZE) { /* @@ -344,10 +345,10 @@ static int swap_writepage_fs(struct page *page, struct writeback_control *wbc) init_sync_kiocb(&sio->iocb, swap_file); sio->iocb.ki_complete = sio_write_complete; sio->iocb.ki_pos = page_file_offset(page); - sio->bvec.bv_page = page; - sio->bvec.bv_len = PAGE_SIZE; - sio->bvec.bv_offset = 0; - iov_iter_bvec(&from, WRITE, &sio->bvec, 1, PAGE_SIZE); + sio->bvec[0].bv_page = page; + sio->bvec[0].bv_len = PAGE_SIZE; + sio->bvec[0].bv_offset = 0; + iov_iter_bvec(&from, WRITE, &sio->bvec[0], 1, PAGE_SIZE); ret = mapping->a_ops->swap_rw(&sio->iocb, &from); if (ret != -EIOCBQUEUED) sio_write_complete(&sio->iocb, ret); @@ -395,46 +396,66 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc, static void sio_read_complete(struct kiocb *iocb, long ret) { struct swap_iocb *sio = container_of(iocb, struct swap_iocb, iocb); - struct page *page = sio->bvec.bv_page; + int p; - if (ret != 0 && ret != PAGE_SIZE) { - SetPageError(page); - ClearPageUptodate(page); - pr_alert_ratelimited("Read-error on swap-device\n"); + if (ret == PAGE_SIZE * sio->pages) { + for (p = 0; p < sio->pages; p++) { + struct page *page = sio->bvec[p].bv_page; + + SetPageUptodate(page); + unlock_page(page); + } + count_vm_events(PSWPIN, sio->pages); } else { - SetPageUptodate(page); - count_vm_event(PSWPIN); + for (p = 0; p < sio->pages; p++) { + struct page *page = sio->bvec[p].bv_page; + + SetPageError(page); + ClearPageUptodate(page); + unlock_page(page); + } + pr_alert_ratelimited("Read-error on swap-device\n"); } - unlock_page(page); mempool_free(sio, sio_pool); } -static int swap_readpage_fs(struct page *page) +static void swap_readpage_fs(struct page *page, + struct swap_iocb **plug) { struct swap_info_struct *sis = page_swap_info(page); - struct file *swap_file = sis->swap_file; - struct address_space *mapping = swap_file->f_mapping; - struct iov_iter from; - struct swap_iocb *sio; + struct swap_iocb *sio = NULL; loff_t pos = page_file_offset(page); - int ret; - - sio = mempool_alloc(sio_pool, GFP_KERNEL); - init_sync_kiocb(&sio->iocb, swap_file); - sio->iocb.ki_pos = pos; - sio->iocb.ki_complete = sio_read_complete; - sio->bvec.bv_page = page; - sio->bvec.bv_len = PAGE_SIZE; - sio->bvec.bv_offset = 0; - iov_iter_bvec(&from, READ, &sio->bvec, 1, PAGE_SIZE); - ret = mapping->a_ops->swap_rw(&sio->iocb, &from); - if (ret != -EIOCBQUEUED) - sio_read_complete(&sio->iocb, ret); - return ret; + if (plug) + sio = *plug; + if (sio) { + if (sio->iocb.ki_filp != sis->swap_file || + sio->iocb.ki_pos + sio->pages * PAGE_SIZE != pos) { + swap_read_unplug(sio); + sio = NULL; + } + } + if (!sio) { + sio = mempool_alloc(sio_pool, GFP_KERNEL); + init_sync_kiocb(&sio->iocb, sis->swap_file); + sio->iocb.ki_pos = pos; + sio->iocb.ki_complete = 
sio_read_complete; + sio->pages = 0; + } + sio->bvec[sio->pages].bv_page = page; + sio->bvec[sio->pages].bv_len = PAGE_SIZE; + sio->bvec[sio->pages].bv_offset = 0; + sio->pages += 1; + if (sio->pages == ARRAY_SIZE(sio->bvec) || !plug) { + swap_read_unplug(sio); + sio = NULL; + } + if (plug) + *plug = sio; } -int swap_readpage(struct page *page, bool synchronous) +int swap_readpage(struct page *page, bool synchronous, + struct swap_iocb **plug) { struct bio *bio; int ret = 0; @@ -462,7 +483,7 @@ int swap_readpage(struct page *page, bool synchronous) } if (data_race(sis->flags & SWP_FS_OPS)) { - ret = swap_readpage_fs(page); + swap_readpage_fs(page, plug); goto out; } @@ -513,3 +534,16 @@ int swap_readpage(struct page *page, bool synchronous) delayacct_swapin_end(); return ret; } + +void __swap_read_unplug(struct swap_iocb *sio) +{ + struct iov_iter from; + struct address_space *mapping = sio->iocb.ki_filp->f_mapping; + int ret; + + iov_iter_bvec(&from, READ, sio->bvec, sio->pages, + PAGE_SIZE * sio->pages); + ret = mapping->a_ops->swap_rw(&sio->iocb, &from); + if (ret != -EIOCBQUEUED) + sio_read_complete(&sio->iocb, ret); +} diff --git a/mm/swap.h b/mm/swap.h index eafac80b18d9..0389ab147837 100644 --- a/mm/swap.h +++ b/mm/swap.h @@ -7,7 +7,15 @@ /* linux/mm/page_io.c */ int sio_pool_init(void); -int swap_readpage(struct page *page, bool do_poll); +struct swap_iocb; +int swap_readpage(struct page *page, bool do_poll, + struct swap_iocb **plug); +void __swap_read_unplug(struct swap_iocb *plug); +static inline void swap_read_unplug(struct swap_iocb *plug) +{ + if (unlikely(plug)) + __swap_read_unplug(plug); +} int swap_writepage(struct page *page, struct writeback_control *wbc); void end_swap_bio_write(struct bio *bio); int __swap_writepage(struct page *page, struct writeback_control *wbc, @@ -41,7 +49,8 @@ struct page *find_get_incore_page(struct address_space *mapping, pgoff_t index); struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, struct vm_area_struct *vma, unsigned long addr, - bool do_poll); + bool do_poll, + struct swap_iocb **plug); struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, struct vm_area_struct *vma, unsigned long addr, @@ -56,7 +65,9 @@ static inline unsigned int page_swap_flags(struct page *page) return page_swap_info(page)->flags; } #else /* CONFIG_SWAP */ -static inline int swap_readpage(struct page *page, bool do_poll) +struct swap_iocb; +static inline int swap_readpage(struct page *page, bool do_poll, + struct swap_iocb **plug) { return 0; } diff --git a/mm/swap_state.c b/mm/swap_state.c index f3ab01801629..d41746a572a2 100644 --- a/mm/swap_state.c +++ b/mm/swap_state.c @@ -520,14 +520,16 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, * the swap entry is no longer in use. 
*/ struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, - struct vm_area_struct *vma, unsigned long addr, bool do_poll) + struct vm_area_struct *vma, + unsigned long addr, bool do_poll, + struct swap_iocb **plug) { bool page_was_allocated; struct page *retpage = __read_swap_cache_async(entry, gfp_mask, vma, addr, &page_was_allocated); if (page_was_allocated) - swap_readpage(retpage, do_poll); + swap_readpage(retpage, do_poll, plug); return retpage; } @@ -621,6 +623,7 @@ struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask, unsigned long mask; struct swap_info_struct *si = swp_swap_info(entry); struct blk_plug plug; + struct swap_iocb *splug = NULL; bool do_poll = true, page_allocated; struct vm_area_struct *vma = vmf->vma; unsigned long addr = vmf->address; @@ -647,7 +650,7 @@ struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask, if (!page) continue; if (page_allocated) { - swap_readpage(page, false); + swap_readpage(page, false, &splug); if (offset != entry_offset) { SetPageReadahead(page); count_vm_event(SWAP_RA); @@ -656,10 +659,12 @@ struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask, put_page(page); } blk_finish_plug(&plug); + swap_read_unplug(splug); lru_add_drain(); /* Push any new pages onto the LRU now */ skip: - return read_swap_cache_async(entry, gfp_mask, vma, addr, do_poll); + /* The page was likely read above, so no need for plugging here */ + return read_swap_cache_async(entry, gfp_mask, vma, addr, do_poll, NULL); } int init_swap_address_space(unsigned int type, unsigned long nr_pages) @@ -790,6 +795,7 @@ static struct page *swap_vma_readahead(swp_entry_t fentry, gfp_t gfp_mask, struct vm_fault *vmf) { struct blk_plug plug; + struct swap_iocb *splug = NULL; struct vm_area_struct *vma = vmf->vma; struct page *page; pte_t *pte, pentry; @@ -820,7 +826,7 @@ static struct page *swap_vma_readahead(swp_entry_t fentry, gfp_t gfp_mask, if (!page) continue; if (page_allocated) { - swap_readpage(page, false); + swap_readpage(page, false, &splug); if (i != ra_info.offset) { SetPageReadahead(page); count_vm_event(SWAP_RA); @@ -829,10 +835,12 @@ static struct page *swap_vma_readahead(swp_entry_t fentry, gfp_t gfp_mask, put_page(page); } blk_finish_plug(&plug); + swap_read_unplug(splug); lru_add_drain(); skip: + /* The page was likely read above, so no need for plugging here */ return read_swap_cache_async(fentry, gfp_mask, vma, vmf->address, - ra_info.win == 1); + ra_info.win == 1, NULL); } /** From patchwork Tue Mar 29 23:49:41 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12795349 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1FFACC433F5 for ; Tue, 29 Mar 2022 23:52:48 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id AAC978D0002; Tue, 29 Mar 2022 19:52:47 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id A36518D0001; Tue, 29 Mar 2022 19:52:47 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 8B02F8D0002; Tue, 29 Mar 2022 19:52:47 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.hostedemail.com [64.99.140.26]) by kanga.kvack.org (Postfix) with ESMTP id 793B38D0001 for ; Tue, 29 Mar 2022 19:52:47 -0400 (EDT) 
Received: from smtpin02.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id 54F9760D17 for ; Tue, 29 Mar 2022 23:52:47 +0000 (UTC) X-FDA: 79299076374.02.13A18D7 Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by imf18.hostedemail.com (Postfix) with ESMTP id B96881C0006 for ; Tue, 29 Mar 2022 23:52:46 +0000 (UTC) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id BE0DE210E4; Tue, 29 Mar 2022 23:52:45 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1648597965; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=tHfmQRfWWj/9ppymrH9i9ehDpTX6TvmocJ0Wdcak+HI=; b=jkbRIbS0OmZYiNdLIG34MIFqRkHP23EAVlbkeTqwiBB3sSCM0ymRBp5zpXKzMvf4gWlGCa o6ghCGue4lz6cfeD27CKmntzmaLQWdGxkzPKzGOyG4qnSL7Zah5vYPelH+rfkkbYa+gQsW +8GFH2Ep/ezL/5tqAETThTQdZdU/EvQ= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1648597965; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=tHfmQRfWWj/9ppymrH9i9ehDpTX6TvmocJ0Wdcak+HI=; b=Qgs0CQ06rhSx7pqLgQ7LKjYM4pKBmy8EuO37JcJM5pR5Dm2W1ISycp8VNpeU9tohrdHJ22 CkmLLUx89LALOWCQ== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 957E613A7E; Tue, 29 Mar 2022 23:52:43 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id YkUOFMubQ2JXLwAAMHmgww (envelope-from ); Tue, 29 Mar 2022 23:52:43 +0000 Subject: [PATCH 09/10] MM: submit multipage write for SWP_FS_OPS swap-space From: NeilBrown To: Andrew Morton Cc: Christoph Hellwig , David Howells , linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Wed, 30 Mar 2022 10:49:41 +1100 Message-ID: <164859778128.29473.5191868522654408537.stgit@noble.brown> In-Reply-To: <164859751830.29473.5309689752169286816.stgit@noble.brown> References: <164859751830.29473.5309689752169286816.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Rspam-User: X-Stat-Signature: qtogxpfki6w1dguqopt7hs4afbdon6ma Authentication-Results: imf18.hostedemail.com; dkim=pass header.d=suse.de header.s=susede2_rsa header.b=jkbRIbS0; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=Qgs0CQ06; spf=pass (imf18.hostedemail.com: domain of neilb@suse.de designates 195.135.220.28 as permitted sender) smtp.mailfrom=neilb@suse.de; dmarc=pass (policy=none) header.from=suse.de X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: B96881C0006 X-HE-Tag: 1648597966-457700 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: swap_writepage() is given one page at a time, but may be called repeatedly in succession. 
For block-device swapspace, the blk_plug functionality allows the multiple pages to be combined together at lower layers. That cannot be used for SWP_FS_OPS as blk_plug may not exist - it is only active when CONFIG_BLOCK=y. Consequently all swap reads over NFS are single page reads. With this patch we pass a pointer-to-pointer via the wbc. swap_writepage can store state between calls - much like the pointer passed explicitly to swap_readpage. After calling swap_writepage() some number of times, the state will be passed to swap_write_unplug() which can submit the combined request. Reviewed-by: Christoph Hellwig Signed-off-by: NeilBrown --- include/linux/writeback.h | 7 ++++ mm/page_io.c | 78 ++++++++++++++++++++++++++++++++------------- mm/swap.h | 4 ++ mm/vmscan.c | 9 ++++- 4 files changed, 74 insertions(+), 24 deletions(-) diff --git a/include/linux/writeback.h b/include/linux/writeback.h index fec248ab1fec..32b35f21cb97 100644 --- a/include/linux/writeback.h +++ b/include/linux/writeback.h @@ -80,6 +80,13 @@ struct writeback_control { unsigned punt_to_cgroup:1; /* cgrp punting, see __REQ_CGROUP_PUNT */ + /* To enable batching of swap writes to non-block-device backends, + * "plug" can be set point to a 'struct swap_iocb *'. When all swap + * writes have been submitted, if with swap_iocb is not NULL, + * swap_write_unplug() should be called. + */ + struct swap_iocb **swap_plug; + #ifdef CONFIG_CGROUP_WRITEBACK struct bdi_writeback *wb; /* wb this writeback is issued under */ struct inode *inode; /* inode being written out */ diff --git a/mm/page_io.c b/mm/page_io.c index 8735707ea349..6eeec8692a29 100644 --- a/mm/page_io.c +++ b/mm/page_io.c @@ -308,8 +308,9 @@ static void sio_write_complete(struct kiocb *iocb, long ret) { struct swap_iocb *sio = container_of(iocb, struct swap_iocb, iocb); struct page *page = sio->bvec[0].bv_page; + int p; - if (ret != PAGE_SIZE) { + if (ret != PAGE_SIZE * sio->pages) { /* * In the case of swap-over-nfs, this can be a * temporary failure if the system has limited @@ -320,43 +321,63 @@ static void sio_write_complete(struct kiocb *iocb, long ret) * the normal direct-to-bio case as it could * be temporary. 
*/ - set_page_dirty(page); - ClearPageReclaim(page); pr_err_ratelimited("Write error %ld on dio swapfile (%llu)\n", ret, page_file_offset(page)); + for (p = 0; p < sio->pages; p++) { + page = sio->bvec[p].bv_page; + set_page_dirty(page); + ClearPageReclaim(page); + } } else - count_vm_event(PSWPOUT); - end_page_writeback(page); + count_vm_events(PSWPOUT, sio->pages); + + for (p = 0; p < sio->pages; p++) + end_page_writeback(sio->bvec[p].bv_page); + mempool_free(sio, sio_pool); } static int swap_writepage_fs(struct page *page, struct writeback_control *wbc) { - struct swap_iocb *sio; + struct swap_iocb *sio = NULL; struct swap_info_struct *sis = page_swap_info(page); struct file *swap_file = sis->swap_file; - struct address_space *mapping = swap_file->f_mapping; - struct iov_iter from; - int ret; + loff_t pos = page_file_offset(page); set_page_writeback(page); unlock_page(page); - sio = mempool_alloc(sio_pool, GFP_NOIO); - init_sync_kiocb(&sio->iocb, swap_file); - sio->iocb.ki_complete = sio_write_complete; - sio->iocb.ki_pos = page_file_offset(page); - sio->bvec[0].bv_page = page; - sio->bvec[0].bv_len = PAGE_SIZE; - sio->bvec[0].bv_offset = 0; - iov_iter_bvec(&from, WRITE, &sio->bvec[0], 1, PAGE_SIZE); - ret = mapping->a_ops->swap_rw(&sio->iocb, &from); - if (ret != -EIOCBQUEUED) - sio_write_complete(&sio->iocb, ret); - return ret; + if (wbc->swap_plug) + sio = *wbc->swap_plug; + if (sio) { + if (sio->iocb.ki_filp != swap_file || + sio->iocb.ki_pos + sio->pages * PAGE_SIZE != pos) { + swap_write_unplug(sio); + sio = NULL; + } + } + if (!sio) { + sio = mempool_alloc(sio_pool, GFP_NOIO); + init_sync_kiocb(&sio->iocb, swap_file); + sio->iocb.ki_complete = sio_write_complete; + sio->iocb.ki_pos = pos; + sio->pages = 0; + } + sio->bvec[sio->pages].bv_page = page; + sio->bvec[sio->pages].bv_len = PAGE_SIZE; + sio->bvec[sio->pages].bv_offset = 0; + sio->pages += 1; + if (sio->pages == ARRAY_SIZE(sio->bvec) || !wbc->swap_plug) { + swap_write_unplug(sio); + sio = NULL; + } + if (wbc->swap_plug) + *wbc->swap_plug = sio; + + return 0; } int __swap_writepage(struct page *page, struct writeback_control *wbc, - bio_end_io_t end_write_func) + bio_end_io_t end_write_func) { struct bio *bio; int ret; @@ -393,6 +414,19 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc, return 0; } +void swap_write_unplug(struct swap_iocb *sio) +{ + struct iov_iter from; + struct address_space *mapping = sio->iocb.ki_filp->f_mapping; + int ret; + + iov_iter_bvec(&from, WRITE, sio->bvec, sio->pages, + PAGE_SIZE * sio->pages); + ret = mapping->a_ops->swap_rw(&sio->iocb, &from); + if (ret != -EIOCBQUEUED) + sio_write_complete(&sio->iocb, ret); +} + static void sio_read_complete(struct kiocb *iocb, long ret) { struct swap_iocb *sio = container_of(iocb, struct swap_iocb, iocb); diff --git a/mm/swap.h b/mm/swap.h index 0389ab147837..a6da8f612904 100644 --- a/mm/swap.h +++ b/mm/swap.h @@ -16,6 +16,7 @@ static inline void swap_read_unplug(struct swap_iocb *plug) if (unlikely(plug)) __swap_read_unplug(plug); } +void swap_write_unplug(struct swap_iocb *sio); int swap_writepage(struct page *page, struct writeback_control *wbc); void end_swap_bio_write(struct bio *bio); int __swap_writepage(struct page *page, struct writeback_control *wbc, @@ -71,6 +72,9 @@ static inline int swap_readpage(struct page *page, bool do_poll, { return 0; } +static inline void swap_write_unplug(struct swap_iocb *sio) +{ +} static inline struct address_space *swap_address_space(swp_entry_t entry) { diff --git a/mm/vmscan.c b/mm/vmscan.c 
index 9150754bf2b8..658724af15c7 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1156,7 +1156,8 @@ typedef enum { * pageout is called by shrink_page_list() for each dirty page. * Calls ->writepage(). */ -static pageout_t pageout(struct folio *folio, struct address_space *mapping) +static pageout_t pageout(struct folio *folio, struct address_space *mapping, + struct swap_iocb **plug) { /* * If the folio is dirty, only perform writeback if that write @@ -1201,6 +1202,7 @@ static pageout_t pageout(struct folio *folio, struct address_space *mapping) .range_start = 0, .range_end = LLONG_MAX, .for_reclaim = 1, + .swap_plug = plug, }; folio_set_reclaim(folio); @@ -1533,6 +1535,7 @@ static unsigned int shrink_page_list(struct list_head *page_list, unsigned int nr_reclaimed = 0; unsigned int pgactivate = 0; bool do_demote_pass; + struct swap_iocb *plug = NULL; memset(stat, 0, sizeof(*stat)); cond_resched(); @@ -1814,7 +1817,7 @@ static unsigned int shrink_page_list(struct list_head *page_list, * starts and then write it out here. */ try_to_unmap_flush_dirty(); - switch (pageout(folio, mapping)) { + switch (pageout(folio, mapping, &plug)) { case PAGE_KEEP: goto keep_locked; case PAGE_ACTIVATE: @@ -1968,6 +1971,8 @@ static unsigned int shrink_page_list(struct list_head *page_list, list_splice(&ret_pages, page_list); count_vm_events(PGACTIVATE, pgactivate); + if (plug) + swap_write_unplug(plug); return nr_reclaimed; } From patchwork Tue Mar 29 23:49:41 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12795350 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id D5A67C433F5 for ; Tue, 29 Mar 2022 23:52:53 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 6DCAA8D0005; Tue, 29 Mar 2022 19:52:53 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 666E88D0001; Tue, 29 Mar 2022 19:52:53 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 505DE8D0005; Tue, 29 Mar 2022 19:52:53 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.hostedemail.com [64.99.140.26]) by kanga.kvack.org (Postfix) with ESMTP id 41E768D0001 for ; Tue, 29 Mar 2022 19:52:53 -0400 (EDT) Received: from smtpin11.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id 0FC4C3FA for ; Tue, 29 Mar 2022 23:52:53 +0000 (UTC) X-FDA: 79299076626.11.02CCE8A Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by imf03.hostedemail.com (Postfix) with ESMTP id 77FCF20003 for ; Tue, 29 Mar 2022 23:52:52 +0000 (UTC) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 70BF9218F9; Tue, 29 Mar 2022 23:52:51 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1648597971; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; 
bh=4Nhl/gv+xJp+ByGDT5YVjTNRimHHTfVAVx6ENwNFDnA=; b=MZjJOPQZUFCfkGoWPznlD46FTghqaNHvn9Mqg5X5+zMFGTdMOOEI6thyMr1E5EyQ4IljTb oTSSt81OOTQWH4rXDHtkwrHe0NDzIzUi0rJ5rm8Fa8MF9IdUvhVZiYWMrO1qC+GsGHOCTn Dmb7lfyTLShn241RqjzAgAs3eT7/Y+0= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1648597971; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=4Nhl/gv+xJp+ByGDT5YVjTNRimHHTfVAVx6ENwNFDnA=; b=YFdItVT8i4IyKH/d06xEQvLP3mRbqho5lYqyulNC1s8hwHT9QQgIL+Mb/9ikn3DyhYLt5J VjxaHKAn+FzUX2BA== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 57A4D13A7E; Tue, 29 Mar 2022 23:52:49 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id 9xIDBdGbQ2JfLwAAMHmgww (envelope-from ); Tue, 29 Mar 2022 23:52:49 +0000 Subject: [PATCH 10/10] VFS: Add FMODE_CAN_ODIRECT file flag From: NeilBrown To: Andrew Morton Cc: Christoph Hellwig , David Howells , linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Wed, 30 Mar 2022 10:49:41 +1100 Message-ID: <164859778128.29473.15189737957277399416.stgit@noble.brown> In-Reply-To: <164859751830.29473.5309689752169286816.stgit@noble.brown> References: <164859751830.29473.5309689752169286816.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Stat-Signature: yb3h4q9ho6zbayk9tjhjmaeynd6i9rns Authentication-Results: imf03.hostedemail.com; dkim=pass header.d=suse.de header.s=susede2_rsa header.b=MZjJOPQZ; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=YFdItVT8; spf=pass (imf03.hostedemail.com: domain of neilb@suse.de designates 195.135.220.28 as permitted sender) smtp.mailfrom=neilb@suse.de; dmarc=pass (policy=none) header.from=suse.de X-Rspam-User: X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: 77FCF20003 X-HE-Tag: 1648597972-32596 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Currently various places test if direct IO is possible on a file by checking for the existence of the direct_IO address space operation. This is a poor choice, as the direct_IO operation may not be used - it is only used if the generic_file_*_iter functions are called for direct IO and some filesystems - particularly NFS - don't do this. Instead, introduce a new f_mode flag: FMODE_CAN_ODIRECT and change the various places to check this (avoiding pointer dereferences). do_dentry_open() will set this flag if ->direct_IO is present, so filesystems do not need to be changed. NFS *is* changed, to set the flag explicitly and discard the direct_IO entry in the address_space_operations for files. Other filesystems which currently use noop_direct_IO could usefully be changed to set this flag instead. 
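To illustrate that last point, a filesystem that currently provides noop_direct_IO only so that O_DIRECT opens succeed could instead set the new flag from its ->open method. A minimal hypothetical sketch (examplefs_file_open is an invented name, not part of this series):

static int examplefs_file_open(struct inode *inode, struct file *filp)
{
	/* Advertise O_DIRECT support without supplying a ->direct_IO stub. */
	filp->f_mode |= FMODE_CAN_ODIRECT;
	return generic_file_open(inode, filp);
}

This mirrors what the NFS change in this patch does: nfs_file_open() sets FMODE_CAN_ODIRECT after a successful nfs_open(), and the direct_IO entry is dropped from nfs_file_aops.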
Reviewed-by: Christoph Hellwig Signed-off-by: NeilBrown --- drivers/block/loop.c | 4 ++-- fs/fcntl.c | 9 ++++----- fs/nfs/file.c | 3 ++- fs/open.c | 9 ++++----- fs/overlayfs/file.c | 13 ++++--------- include/linux/fs.h | 3 +++ 6 files changed, 19 insertions(+), 22 deletions(-) diff --git a/drivers/block/loop.c b/drivers/block/loop.c index 3e636a75c83a..74cd550a8952 100644 --- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -186,8 +186,8 @@ static void __loop_update_dio(struct loop_device *lo, bool dio) */ if (dio) { if (queue_logical_block_size(lo->lo_queue) >= sb_bsize && - !(lo->lo_offset & dio_align) && - mapping->a_ops->direct_IO) + !(lo->lo_offset & dio_align) && + (file->f_mode & FMODE_CAN_ODIRECT)) use_dio = true; else use_dio = false; diff --git a/fs/fcntl.c b/fs/fcntl.c index f15d885b9796..34a3faa4886d 100644 --- a/fs/fcntl.c +++ b/fs/fcntl.c @@ -56,11 +56,10 @@ static int setfl(int fd, struct file * filp, unsigned long arg) arg |= O_NONBLOCK; /* Pipe packetized mode is controlled by O_DIRECT flag */ - if (!S_ISFIFO(inode->i_mode) && (arg & O_DIRECT)) { - if (!filp->f_mapping || !filp->f_mapping->a_ops || - !filp->f_mapping->a_ops->direct_IO) - return -EINVAL; - } + if (!S_ISFIFO(inode->i_mode) && + (arg & O_DIRECT) && + !(filp->f_mode & FMODE_CAN_ODIRECT)) + return -EINVAL; if (filp->f_op->check_flags) error = filp->f_op->check_flags(arg); diff --git a/fs/nfs/file.c b/fs/nfs/file.c index 6da81a4f3bff..143412226bab 100644 --- a/fs/nfs/file.c +++ b/fs/nfs/file.c @@ -74,6 +74,8 @@ nfs_file_open(struct inode *inode, struct file *filp) return res; res = nfs_open(inode, filp); + if (res == 0) + filp->f_mode |= FMODE_CAN_ODIRECT; return res; } @@ -535,7 +537,6 @@ const struct address_space_operations nfs_file_aops = { .write_end = nfs_write_end, .invalidate_folio = nfs_invalidate_folio, .releasepage = nfs_release_page, - .direct_IO = nfs_direct_IO, #ifdef CONFIG_MIGRATION .migratepage = nfs_migrate_page, #endif diff --git a/fs/open.c b/fs/open.c index 1315253e0247..7b50d7a2f51d 100644 --- a/fs/open.c +++ b/fs/open.c @@ -834,16 +834,15 @@ static int do_dentry_open(struct file *f, if ((f->f_mode & FMODE_WRITE) && likely(f->f_op->write || f->f_op->write_iter)) f->f_mode |= FMODE_CAN_WRITE; + if (f->f_mapping->a_ops && f->f_mapping->a_ops->direct_IO) + f->f_mode |= FMODE_CAN_ODIRECT; f->f_flags &= ~(O_CREAT | O_EXCL | O_NOCTTY | O_TRUNC); file_ra_state_init(&f->f_ra, f->f_mapping->host->i_mapping); - /* NB: we're sure to have correct a_ops only after f_op->open */ - if (f->f_flags & O_DIRECT) { - if (!f->f_mapping->a_ops || !f->f_mapping->a_ops->direct_IO) - return -EINVAL; - } + if ((f->f_flags & O_DIRECT) && !(f->f_mode & FMODE_CAN_ODIRECT)) + return -EINVAL; /* * XXX: Huge page cache doesn't support writing yet. 
Drop all page diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c index fa125feed0ff..9d69b4dbb8c4 100644 --- a/fs/overlayfs/file.c +++ b/fs/overlayfs/file.c @@ -82,11 +82,8 @@ static int ovl_change_flags(struct file *file, unsigned int flags) if (((flags ^ file->f_flags) & O_APPEND) && IS_APPEND(inode)) return -EPERM; - if (flags & O_DIRECT) { - if (!file->f_mapping->a_ops || - !file->f_mapping->a_ops->direct_IO) - return -EINVAL; - } + if ((flags & O_DIRECT) && !(file->f_mode & FMODE_CAN_ODIRECT)) + return -EINVAL; if (file->f_op->check_flags) { err = file->f_op->check_flags(flags); @@ -306,8 +303,7 @@ static ssize_t ovl_read_iter(struct kiocb *iocb, struct iov_iter *iter) ret = -EINVAL; if (iocb->ki_flags & IOCB_DIRECT && - (!real.file->f_mapping->a_ops || - !real.file->f_mapping->a_ops->direct_IO)) + !(real.file->f_mode & FMODE_CAN_ODIRECT)) goto out_fdput; old_cred = ovl_override_creds(file_inode(file)->i_sb); @@ -367,8 +363,7 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter) ret = -EINVAL; if (iocb->ki_flags & IOCB_DIRECT && - (!real.file->f_mapping->a_ops || - !real.file->f_mapping->a_ops->direct_IO)) + !(real.file->f_mode & FMODE_CAN_ODIRECT)) goto out_fdput; if (!ovl_should_sync(OVL_FS(inode->i_sb))) diff --git a/include/linux/fs.h b/include/linux/fs.h index 7c65e09c09a6..781361562a27 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -162,6 +162,9 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset, /* File is stream-like */ #define FMODE_STREAM ((__force fmode_t)0x200000) +/* File supports DIRECT IO */ +#define FMODE_CAN_ODIRECT ((__force fmode_t)0x400000) + /* File was opened by fanotify and shouldn't generate fanotify events */ #define FMODE_NONOTIFY ((__force fmode_t)0x4000000)