From patchwork Tue Apr 30 14:00:38 2024
X-Patchwork-Submitter: David Howells <dhowells@redhat.com>
X-Patchwork-Id: 13649080
From: David Howells <dhowells@redhat.com>
To: Christian Brauner, Jeff Layton, Gao Xiang, Dominique Martinet
Cc: David Howells, Matthew Wilcox, Steve French, Marc Dionne,
    Paulo Alcantara, Shyam Prasad N, Tom Talpey, Eric Van Hensbergen,
    Ilya Dryomov, netfs@lists.linux.dev, linux-cachefs@redhat.com,
    linux-afs@lists.infradead.org, linux-cifs@vger.kernel.org,
    linux-nfs@vger.kernel.org, ceph-devel@vger.kernel.org,
    v9fs@lists.linux.dev, linux-erofs@lists.ozlabs.org,
    linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
    netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
    Miklos Szeredi, Trond Myklebust, Christoph Hellwig, Andrew Morton,
    Alexander Viro, Christian Brauner, devel@lists.orangefs.org
Subject: [PATCH v2 07/22] mm: Provide a means of invalidation without using launder_folio
Date: Tue, 30 Apr 2024 15:00:38 +0100
Message-ID: <20240430140056.261997-8-dhowells@redhat.com>
In-Reply-To: <20240430140056.261997-1-dhowells@redhat.com>
References: <20240430140056.261997-1-dhowells@redhat.com>
Implement a replacement for launder_folio.  The key feature of
invalidate_inode_pages2() is that it locks each folio individually, unmaps
it to prevent mmap'd accesses interfering and calls the ->launder_folio()
address_space op to flush it.  This has problems: firstly, each folio is
written individually as one or more small writes; secondly, adjacent folios
cannot be added so easily into the laundry; thirdly, it's yet another op to
implement.

Instead, use the invalidate lock to cause anyone wanting to add a folio to
the inode to wait, then unmap all the folios if we have mmaps, then,
conditionally, use ->writepages() to flush any dirty data back and then
discard all pages.

The invalidate lock prevents ->read_iter(), ->write_iter() and faulting
through mmap all from adding pages for the duration.

This is then used from netfslib to handle the flushing in unbuffered and
direct writes.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Matthew Wilcox
cc: Miklos Szeredi
cc: Trond Myklebust
cc: Christoph Hellwig
cc: Andrew Morton
cc: Alexander Viro
cc: Christian Brauner
cc: Jeff Layton
cc: linux-mm@kvack.org
cc: linux-fsdevel@vger.kernel.org
cc: netfs@lists.linux.dev
cc: v9fs@lists.linux.dev
cc: linux-afs@lists.infradead.org
cc: ceph-devel@vger.kernel.org
cc: linux-cifs@vger.kernel.org
cc: linux-nfs@vger.kernel.org
cc: devel@lists.orangefs.org
---
Notes:
    Changes
    =======
    ver #2)
     - Make filemap_invalidate_inode() take a range.
     - Make netfs_unbuffered_write_iter() use filemap_invalidate_inode().
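As an illustration of the intended usage (a sketch only, not part of the
diff below): a filesystem that implements ->launder_folio() purely so that
invalidate_inode_pages2() can flush its folios could instead call the new
helper over the affected byte range.  The wrapper name here is
hypothetical:

/* Sketch of a hypothetical caller of the new helper. */
#include <linux/fs.h>
#include <linux/pagemap.h>

/*
 * Write any dirty folios in [start, end] back and then discard the
 * cached folios.  filemap_invalidate_inode() takes the mapping's
 * invalidate lock itself, so no new folio can be added to the inode's
 * pagecache while this runs.
 */
static int my_netfs_flush_and_discard(struct inode *inode,
				      loff_t start, loff_t end)
{
	/* flush = true: write dirty data back before discarding */
	return filemap_invalidate_inode(inode, true, start, end);
}

Passing flush = false discards without writing back first, and
end = LLONG_MAX covers everything from start onwards.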
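The exclusion works because the folio-instantiating paths (buffered read,
buffered write and page faults) take mapping->invalidate_lock shared
around pagecache insertion, whilst the new helper takes it exclusively.
Schematically (both functions are illustrative; the lock helpers are the
existing pagemap.h API):

/* Sketch: why no new folio can appear during the invalidation. */
#include <linux/pagemap.h>

static void instantiating_path(struct address_space *mapping)
{
	/* Held shared while a folio is looked up or added. */
	filemap_invalidate_lock_shared(mapping);
	/* ... add the folio to the pagecache ... */
	filemap_invalidate_unlock_shared(mapping);
}

static void invalidating_path(struct address_space *mapping)
{
	/* Held exclusively across unmap, optional writeback and
	 * truncate, so instantiation cannot interleave with it. */
	filemap_invalidate_lock(mapping);
	/* unmap_mapping_pages(), ->writepages(), truncate ... */
	filemap_invalidate_unlock(mapping);
}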
 fs/netfs/direct_write.c | 28 ++++++++++++++++++---
 include/linux/pagemap.h |  2 ++
 mm/filemap.c            | 54 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 80 insertions(+), 4 deletions(-)

diff --git a/fs/netfs/direct_write.c b/fs/netfs/direct_write.c
index bee047e20f5d..2b81cd4aae6e 100644
--- a/fs/netfs/direct_write.c
+++ b/fs/netfs/direct_write.c
@@ -132,12 +132,14 @@ static ssize_t netfs_unbuffered_write_iter_locked(struct kiocb *iocb, struct iov
 ssize_t netfs_unbuffered_write_iter(struct kiocb *iocb, struct iov_iter *from)
 {
 	struct file *file = iocb->ki_filp;
-	struct inode *inode = file->f_mapping->host;
+	struct address_space *mapping = file->f_mapping;
+	struct inode *inode = mapping->host;
 	struct netfs_inode *ictx = netfs_inode(inode);
-	unsigned long long end;
 	ssize_t ret;
+	loff_t pos = iocb->ki_pos;
+	unsigned long long end = pos + iov_iter_count(from) - 1;
 
-	_enter("%llx,%zx,%llx", iocb->ki_pos, iov_iter_count(from), i_size_read(inode));
+	_enter("%llx,%zx,%llx", pos, iov_iter_count(from), i_size_read(inode));
 
 	if (!iov_iter_count(from))
 		return 0;
@@ -157,7 +159,25 @@ ssize_t netfs_unbuffered_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	ret = file_update_time(file);
 	if (ret < 0)
 		goto out;
-	ret = kiocb_invalidate_pages(iocb, iov_iter_count(from));
+	if (iocb->ki_flags & IOCB_NOWAIT) {
+		/* We could block if there are any pages in the range. */
+		ret = -EAGAIN;
+		if (filemap_range_has_page(mapping, pos, end))
+			if (filemap_invalidate_inode(inode, true, pos, end))
+				goto out;
+	} else {
+		ret = filemap_write_and_wait_range(mapping, pos, end);
+		if (ret < 0)
+			goto out;
+	}
+
+	/*
+	 * After a write we want buffered reads to be sure to go to disk to
+	 * get the new data.  We invalidate clean cached pages from the
+	 * region we're about to write.  We do this *before* the write so
+	 * that we can return without clobbering -EIOCBQUEUED from
+	 * ->direct_IO().
+	 */
+	ret = filemap_invalidate_inode(inode, true, pos, end);
 	if (ret < 0)
 		goto out;
 	end = iocb->ki_pos + iov_iter_count(from);
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 2df35e65557d..c5e33e2ca48a 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -40,6 +40,8 @@ int filemap_fdatawait_keep_errors(struct address_space *mapping);
 int filemap_fdatawait_range(struct address_space *, loff_t lstart, loff_t lend);
 int filemap_fdatawait_range_keep_errors(struct address_space *mapping,
 		loff_t start_byte, loff_t end_byte);
+int filemap_invalidate_inode(struct inode *inode, bool flush,
+			     loff_t start, loff_t end);
 
 static inline int filemap_fdatawait(struct address_space *mapping)
 {
diff --git a/mm/filemap.c b/mm/filemap.c
index 9a2e28bf298a..53516305b4b4 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -4134,6 +4134,60 @@ bool filemap_release_folio(struct folio *folio, gfp_t gfp)
 }
 EXPORT_SYMBOL(filemap_release_folio);
 
+/**
+ * filemap_invalidate_inode - Invalidate/forcibly write back a range of an inode's pagecache
+ * @inode: The inode to flush
+ * @flush: Set to write back rather than simply invalidate.
+ * @start: First byte in the range.
+ * @end: Last byte in range (inclusive), or LLONG_MAX for everything from start
+ *	onwards.
+ *
+ * Invalidate all the folios on an inode that contribute to the specified
+ * range, possibly writing them back first.  Whilst the operation is
+ * undertaken, the invalidate lock is held to prevent new folios from being
+ * installed.
+ */
+int filemap_invalidate_inode(struct inode *inode, bool flush,
+			     loff_t start, loff_t end)
+{
+	struct address_space *mapping = inode->i_mapping;
+	pgoff_t first = start >> PAGE_SHIFT;
+	pgoff_t last = end >> PAGE_SHIFT;
+	pgoff_t nr = end == LLONG_MAX ? ULONG_MAX : last - first + 1;
+
+	if (!mapping || !mapping->nrpages || end < start)
+		goto out;
+
+	/* Prevent new folios from being added to the inode. */
+	filemap_invalidate_lock(mapping);
+
+	if (!mapping->nrpages)
+		goto unlock;
+
+	unmap_mapping_pages(mapping, first, nr, false);
+
+	/* Write back the data if we're asked to. */
+	if (flush) {
+		struct writeback_control wbc = {
+			.sync_mode	= WB_SYNC_ALL,
+			.nr_to_write	= LONG_MAX,
+			.range_start	= start,
+			.range_end	= end,
+		};
+
+		filemap_fdatawrite_wbc(mapping, &wbc);
+	}
+
+	/* Wait for writeback to complete on all folios and discard. */
+	truncate_inode_pages_range(mapping, start, end);
+
+unlock:
+	filemap_invalidate_unlock(mapping);
+out:
+	return filemap_check_errors(mapping);
+}
+EXPORT_SYMBOL(filemap_invalidate_inode);
+
 #ifdef CONFIG_CACHESTAT_SYSCALL
 /**
  * filemap_cachestat() - compute the page cache statistics of a mapping