From patchwork Wed Jul 21 13:44:40 2021
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 12390941
Subject: [RFC PATCH 01/12] afs: Sort out symlink reading
From: David Howells
To: linux-fsdevel@vger.kernel.org
Cc: dhowells@redhat.com, Jeff Layton, "Matthew Wilcox (Oracle)",
    Anna Schumaker, Steve French, Dominique Martinet, Mike Marshall,
    David Wysochanski, Shyam Prasad N, Miklos Szeredi, Linus Torvalds,
    linux-cachefs@redhat.com, linux-afs@lists.infradead.org,
    linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org,
    ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net,
    devel@lists.orangefs.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Date: Wed, 21 Jul 2021 14:44:40 +0100
Message-ID: <162687508008.276387.6418924257569297305.stgit@warthog.procyon.org.uk>
In-Reply-To: <162687506932.276387.14456718890524355509.stgit@warthog.procyon.org.uk>
References: <162687506932.276387.14456718890524355509.stgit@warthog.procyon.org.uk>

afs_readpage() doesn't get a file pointer when it is called for a symlink,
so split symlink reading off from the regular-file path and give symlinks
their own address_space operations.

Signed-off-by: David Howells
Reviewed-by: Jeff Layton
---
 fs/afs/file.c     |   14 +++++++++-----
 fs/afs/inode.c    |    6 +++---
 fs/afs/internal.h |    3 ++-
 3 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/fs/afs/file.c b/fs/afs/file.c
index ca0d993add65..c9c21ad0e7c9 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -19,6 +19,7 @@ static int afs_file_mmap(struct file *file, struct vm_area_struct *vma);
 static int afs_readpage(struct file *file, struct page *page);
+static int afs_symlink_readpage(struct file *file, struct page *page);
 static void afs_invalidatepage(struct page *page, unsigned int offset,
 			       unsigned int length);
 static int afs_releasepage(struct page *page, gfp_t gfp_flags);
@@ -46,7 +47,7 @@ const struct inode_operations afs_file_inode_operations = {
 	.permission	= afs_permission,
 };
 
-const struct address_space_operations afs_fs_aops = {
+const struct address_space_operations afs_file_aops = {
 	.readpage	= afs_readpage,
 	.readahead	= afs_readahead,
 	.set_page_dirty	= afs_set_page_dirty,
@@ -60,6 +61,12 @@ const struct address_space_operations afs_fs_aops = {
 	.writepages	= afs_writepages,
 };
 
+const struct address_space_operations afs_symlink_aops = {
+	.readpage	= afs_symlink_readpage,
+	.releasepage	= afs_releasepage,
+	.invalidatepage	= afs_invalidatepage,
+};
+
 static const struct vm_operations_struct afs_vm_ops = {
 	.fault		= filemap_fault,
 	.map_pages	= filemap_map_pages,
@@ -321,7 +328,7 @@ static void afs_req_issue_op(struct netfs_read_subrequest *subreq)
 	afs_fetch_data(fsreq->vnode, fsreq);
 }
 
-static int afs_symlink_readpage(struct page *page)
+static int afs_symlink_readpage(struct file *file, struct page *page)
 {
 	struct afs_vnode *vnode = AFS_FS_I(page->mapping->host);
 	struct afs_read *fsreq;
@@ -386,9 +393,6 @@ const struct netfs_read_request_ops afs_req_ops = {
 
 static int afs_readpage(struct file *file, struct page *page)
 {
-	if (!file)
-		return afs_symlink_readpage(page);
-
 	return netfs_readpage(file, page, &afs_req_ops, NULL);
 }
 
diff --git a/fs/afs/inode.c b/fs/afs/inode.c
index bef6f5ccfb09..cf7b66957c6f 100644
--- a/fs/afs/inode.c
+++ b/fs/afs/inode.c
@@ -105,7 +105,7 @@ static int afs_inode_init_from_status(struct afs_operation *op,
 		inode->i_mode	= S_IFREG | (status->mode & S_IALLUGO);
 		inode->i_op	= &afs_file_inode_operations;
 		inode->i_fop	= &afs_file_operations;
-		inode->i_mapping->a_ops	= &afs_fs_aops;
+		inode->i_mapping->a_ops	= &afs_file_aops;
 		break;
 	case AFS_FTYPE_DIR:
 		inode->i_mode	= S_IFDIR | (status->mode & S_IALLUGO);
@@ -123,11 +123,11 @@ static int afs_inode_init_from_status(struct afs_operation *op,
 			inode->i_mode	= S_IFDIR | 0555;
 			inode->i_op	= &afs_mntpt_inode_operations;
 			inode->i_fop	= &afs_mntpt_file_operations;
-			inode->i_mapping->a_ops	= &afs_fs_aops;
+			inode->i_mapping->a_ops	= &afs_symlink_aops;
 		} else {
 			inode->i_mode	= S_IFLNK | status->mode;
 			inode->i_op	= &afs_symlink_inode_operations;
-			inode->i_mapping->a_ops	= &afs_fs_aops;
+			inode->i_mapping->a_ops	= &afs_symlink_aops;
 		}
 		inode_nohighmem(inode);
 		break;
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index 791cf02e5696..ccdde00ada8a 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -1050,7 +1050,8 @@ extern void afs_dynroot_depopulate(struct super_block *);
 /*
  * file.c
  */
-extern const struct address_space_operations afs_fs_aops;
+extern const struct address_space_operations afs_file_aops;
+extern const struct address_space_operations afs_symlink_aops;
 extern const struct inode_operations afs_file_inode_operations;
 extern const struct file_operations afs_file_operations;
 extern const struct netfs_read_request_ops afs_req_ops;
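For context (an aside, not part of the patch): the reason ->readpage() sees
a NULL file pointer for a symlink is that the symlink body is pulled in
through the page cache without any struct file ever being opened; afs
symlinks are read via the generic page_get_link() path, roughly as sketched
below.  The helper name is invented for illustration; only
read_mapping_page() and page_address() are real APIs.

	#include <linux/pagemap.h>
	#include <linux/err.h>

	/* Illustrative only: how a symlink body reaches ->readpage().
	 * read_mapping_page() passes NULL as the file pointer, so a
	 * symlink readpage op cannot rely on having a struct file.
	 */
	static const char *example_read_symlink_body(struct inode *inode)
	{
		struct page *page;

		page = read_mapping_page(inode->i_mapping, 0, NULL);
		if (IS_ERR(page))
			return ERR_CAST(page);
		return page_address(page);	/* caller must put_page() when done */
	}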
From patchwork Wed Jul 21 13:44:53 2021
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 12390939
Subject: [RFC PATCH 02/12] netfs: Add an iov_iter to the read subreq for the network fs/cache to use
From: David Howells
To: linux-fsdevel@vger.kernel.org
Date: Wed, 21 Jul 2021 14:44:53 +0100
Message-ID: <162687509306.276387.7579641363406546284.stgit@warthog.procyon.org.uk>
In-Reply-To: <162687506932.276387.14456718890524355509.stgit@warthog.procyon.org.uk>
References: <162687506932.276387.14456718890524355509.stgit@warthog.procyon.org.uk>

Add an iov_iter to the read subrequest and set it up to define the
destination buffer to write into.  This will allow future patches to point
to a bounce buffer instead for purposes of handling oversize writes,
decryption (where we want to save the encrypted data to the cache) and
decompression.
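To illustrate the intended use (a sketch, not code from this series): a
filesystem's ->issue_op() can now aim its wire read directly at the
iterator that the helper primed, instead of building its own
iov_iter_xarray().  The my_fs_* names below are invented for the example;
only the netfs fields and netfs_subreq_terminated() are real.

	#include <linux/netfs.h>

	struct my_fs_read_req {			/* hypothetical per-RPC state */
		loff_t		pos;
		size_t		len;
		struct iov_iter	*iter;		/* where the payload lands */
	};

	static void my_fs_issue_op(struct netfs_read_subrequest *subreq)
	{
		struct my_fs_read_req req = {
			.pos	= subreq->start,
			.len	= subreq->len,
			.iter	= &subreq->iter,	/* preset by the helper */
		};

		/* ... queue req with the transport; it fills *req.iter and
		 * calls netfs_subreq_terminated() when the read completes ...
		 */
	}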
Signed-off-by: David Howells
---
 fs/afs/file.c          |    6 +-----
 fs/netfs/read_helper.c |    5 ++++-
 include/linux/netfs.h  |    2 ++
 3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/fs/afs/file.c b/fs/afs/file.c
index c9c21ad0e7c9..ca529f23515a 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -319,11 +319,7 @@ static void afs_req_issue_op(struct netfs_read_subrequest *subreq)
 	fsreq->len	= subreq->len - subreq->transferred;
 	fsreq->key	= subreq->rreq->netfs_priv;
 	fsreq->vnode	= vnode;
-	fsreq->iter	= &fsreq->def_iter;
-
-	iov_iter_xarray(&fsreq->def_iter, READ,
-			&fsreq->vnode->vfs_inode.i_mapping->i_pages,
-			fsreq->pos, fsreq->len);
+	fsreq->iter	= &subreq->iter;
 
 	afs_fetch_data(fsreq->vnode, fsreq);
 }
diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c
index 0b6cd3b8734c..715f3e9c380d 100644
--- a/fs/netfs/read_helper.c
+++ b/fs/netfs/read_helper.c
@@ -150,7 +150,7 @@ static void netfs_clear_unread(struct netfs_read_subrequest *subreq)
 {
 	struct iov_iter iter;
 
-	iov_iter_xarray(&iter, WRITE, &subreq->rreq->mapping->i_pages,
+	iov_iter_xarray(&iter, READ, &subreq->rreq->mapping->i_pages,
 			subreq->start + subreq->transferred,
 			subreq->len - subreq->transferred);
 	iov_iter_zero(iov_iter_count(&iter), &iter);
@@ -745,6 +745,9 @@ netfs_rreq_prepare_read(struct netfs_read_request *rreq,
 	if (WARN_ON(subreq->len == 0))
 		source = NETFS_INVALID_READ;
 
+	iov_iter_xarray(&subreq->iter, READ, &rreq->mapping->i_pages,
+			subreq->start, subreq->len);
+
 out:
 	subreq->source = source;
 	trace_netfs_sreq(subreq, netfs_sreq_trace_prepare);
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index fe9887768292..5e4fafcc9480 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * Overload PG_private_2 to give us PG_fscache - this is used to indicate that
@@ -112,6 +113,7 @@ struct netfs_cache_resources {
 struct netfs_read_subrequest {
 	struct netfs_read_request *rreq;	/* Supervising read request */
 	struct list_head rreq_link;		/* Link in rreq->subrequests */
+	struct iov_iter iter;			/* Iterator for this subrequest */
 	loff_t start;				/* Where to start the I/O */
 	size_t len;				/* Size of the I/O */
 	size_t transferred;			/* Amount of data transferred */
From patchwork Wed Jul 21 13:45:11 2021
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 12390943
Subject: [RFC PATCH 03/12] netfs: Remove netfs_read_subrequest::transferred
From: David Howells
To: linux-fsdevel@vger.kernel.org
Date: Wed, 21 Jul 2021 14:45:11 +0100
Message-ID: <162687511125.276387.15493860267582539643.stgit@warthog.procyon.org.uk>
In-Reply-To: <162687506932.276387.14456718890524355509.stgit@warthog.procyon.org.uk>
References: <162687506932.276387.14456718890524355509.stgit@warthog.procyon.org.uk>

Remove netfs_read_subrequest::transferred as it's now redundant: the count
remaining on the iterator added to the subrequest can be used instead.
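Put another way (illustrative, using only the fields as they stand after
this patch): the amount transferred is implied by how far the preset
iterator has been advanced, and the amount still outstanding is whatever
remains on it, which is exactly the calculation the afs hunk below performs
for fsreq->pos and fsreq->len.

	static size_t my_subreq_transferred(const struct netfs_read_subrequest *subreq)
	{
		/* Bytes already written into the buffer by the netfs/cache. */
		return subreq->len - iov_iter_count(&subreq->iter);
	}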
Signed-off-by: David Howells
---
 fs/afs/file.c                |    4 ++--
 fs/netfs/read_helper.c       |   26 ++----------------------
 include/linux/netfs.h        |    1 -
 include/trace/events/netfs.h |   12 ++++++------
 4 files changed, 12 insertions(+), 31 deletions(-)

diff --git a/fs/afs/file.c b/fs/afs/file.c
index ca529f23515a..82e945dbe379 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -315,8 +315,8 @@ static void afs_req_issue_op(struct netfs_read_subrequest *subreq)
 		return netfs_subreq_terminated(subreq, -ENOMEM, false);
 
 	fsreq->subreq	= subreq;
-	fsreq->pos	= subreq->start + subreq->transferred;
-	fsreq->len	= subreq->len - subreq->transferred;
+	fsreq->pos	= subreq->start + subreq->len - iov_iter_count(&subreq->iter);
+	fsreq->len	= iov_iter_count(&subreq->iter);
 	fsreq->key	= subreq->rreq->netfs_priv;
 	fsreq->vnode	= vnode;
 	fsreq->iter	= &subreq->iter;
diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c
index 715f3e9c380d..5e1a9be48130 100644
--- a/fs/netfs/read_helper.c
+++ b/fs/netfs/read_helper.c
@@ -148,12 +148,7 @@ static void __netfs_put_subrequest(struct netfs_read_subrequest *subreq,
  */
 static void netfs_clear_unread(struct netfs_read_subrequest *subreq)
 {
-	struct iov_iter iter;
-
-	iov_iter_xarray(&iter, READ, &subreq->rreq->mapping->i_pages,
-			subreq->start + subreq->transferred,
-			subreq->len - subreq->transferred);
-	iov_iter_zero(iov_iter_count(&iter), &iter);
+	iov_iter_zero(iov_iter_count(&subreq->iter), &subreq->iter);
 }
 
 static void netfs_cache_read_terminated(void *priv, ssize_t transferred_or_error,
@@ -173,14 +168,9 @@ static void netfs_read_from_cache(struct netfs_read_request *rreq,
 				  bool seek_data)
 {
 	struct netfs_cache_resources *cres = &rreq->cache_resources;
-	struct iov_iter iter;
 
 	netfs_stat(&netfs_n_rh_read);
-	iov_iter_xarray(&iter, READ, &rreq->mapping->i_pages,
-			subreq->start + subreq->transferred,
-			subreq->len - subreq->transferred);
-
-	cres->ops->read(cres, subreq->start, &iter, seek_data,
+	cres->ops->read(cres, subreq->start, &subreq->iter, seek_data,
 			netfs_cache_read_terminated, subreq);
 }
 
@@ -419,7 +409,7 @@ static void netfs_rreq_unlock(struct netfs_read_request *rreq)
 		if (pgend < iopos + subreq->len)
 			break;
 
-		account += subreq->transferred;
+		account += subreq->len - iov_iter_count(&subreq->iter);
 		iopos += subreq->len;
 		if (!list_is_last(&subreq->rreq_link, &rreq->subrequests)) {
 			subreq = list_next_entry(subreq, rreq_link);
@@ -635,15 +625,8 @@ void netfs_subreq_terminated(struct netfs_read_subrequest *subreq,
 		goto failed;
 	}
 
-	if (WARN(transferred_or_error > subreq->len - subreq->transferred,
-		 "Subreq overread: R%x[%x] %zd > %zu - %zu",
-		 rreq->debug_id, subreq->debug_index,
-		 transferred_or_error, subreq->len, subreq->transferred))
-		transferred_or_error = subreq->len - subreq->transferred;
-
 	subreq->error = 0;
-	subreq->transferred += transferred_or_error;
-	if (subreq->transferred < subreq->len)
+	if (iov_iter_count(&subreq->iter))
 		goto incomplete;
 
 complete:
@@ -667,7 +650,6 @@ void netfs_subreq_terminated(struct netfs_read_subrequest *subreq,
 incomplete:
 	if (test_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags)) {
 		netfs_clear_unread(subreq);
-		subreq->transferred = subreq->len;
 		goto complete;
 	}
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index 5e4fafcc9480..45d40c622205 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -116,7 +116,6 @@ struct netfs_read_subrequest {
 	struct iov_iter iter;			/* Iterator for this subrequest */
 	loff_t start;				/* Where to start the I/O */
 	size_t len;				/* Size of the I/O */
-	size_t transferred;			/* Amount of data transferred */
 	refcount_t usage;
 	short error;				/* 0 or error that occurred */
 	unsigned short debug_index;		/* Index in list (for debugging output) */
diff --git a/include/trace/events/netfs.h b/include/trace/events/netfs.h
index 4d470bffd9f1..04ac29fc700f 100644
--- a/include/trace/events/netfs.h
+++ b/include/trace/events/netfs.h
@@ -190,7 +190,7 @@ TRACE_EVENT(netfs_sreq,
 		    __field(enum netfs_read_source,	source		)
 		    __field(enum netfs_sreq_trace,	what		)
 		    __field(size_t,			len		)
-		    __field(size_t,			transferred	)
+		    __field(size_t,			remain		)
 		    __field(loff_t,			start		)
 		    ),
 
@@ -202,7 +202,7 @@ TRACE_EVENT(netfs_sreq,
 		    __entry->source	= sreq->source;
 		    __entry->what	= what;
 		    __entry->len	= sreq->len;
-		    __entry->transferred = sreq->transferred;
+		    __entry->remain	= iov_iter_count(&sreq->iter);
 		    __entry->start	= sreq->start;
 		    ),
 
@@ -211,7 +211,7 @@ TRACE_EVENT(netfs_sreq,
 		      __print_symbolic(__entry->what, netfs_sreq_traces),
 		      __print_symbolic(__entry->source, netfs_sreq_sources),
 		      __entry->flags,
-		      __entry->start, __entry->transferred, __entry->len,
+		      __entry->start, __entry->len - __entry->remain, __entry->len,
 		      __entry->error)
 	    );
 
@@ -230,7 +230,7 @@ TRACE_EVENT(netfs_failure,
 		    __field(enum netfs_read_source,	source		)
 		    __field(enum netfs_failure,		what		)
 		    __field(size_t,			len		)
-		    __field(size_t,			transferred	)
+		    __field(size_t,			remain		)
 		    __field(loff_t,			start		)
 		    ),
 
@@ -242,7 +242,7 @@ TRACE_EVENT(netfs_failure,
 		    __entry->source	= sreq ? sreq->source : NETFS_INVALID_READ;
 		    __entry->what	= what;
 		    __entry->len	= sreq ? sreq->len : 0;
-		    __entry->transferred = sreq ? sreq->transferred : 0;
+		    __entry->remain	= sreq ? iov_iter_count(&sreq->iter) : 0;
 		    __entry->start	= sreq ? sreq->start : 0;
 		    ),
 
@@ -250,7 +250,7 @@ TRACE_EVENT(netfs_failure,
 		      __entry->rreq, __entry->index,
 		      __print_symbolic(__entry->source, netfs_sreq_sources),
 		      __entry->flags,
-		      __entry->start, __entry->transferred, __entry->len,
+		      __entry->start, __entry->len - __entry->remain, __entry->len,
 		      __print_symbolic(__entry->what, netfs_failures),
 		      __entry->error)
 	    );
From patchwork Wed Jul 21 13:45:24 2021
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 12390945
Subject: [RFC PATCH 04/12] netfs: Use a buffer in netfs_read_request and add pages to it
From: David Howells
To: linux-fsdevel@vger.kernel.org
Date: Wed, 21 Jul 2021 14:45:24 +0100
Message-ID: <162687512469.276387.15723958695928327041.stgit@warthog.procyon.org.uk>
In-Reply-To: <162687506932.276387.14456718890524355509.stgit@warthog.procyon.org.uk>
References: <162687506932.276387.14456718890524355509.stgit@warthog.procyon.org.uk>

Add an "output" buffer to the netfs_read_request struct.  This is an xarray
to which the intended destination pages can be added, supplemented by
additional pages to pad the buffer out to a sufficient size for it to act
as the output of an overlarge read, decryption and/or decompression.

The readahead_expand() function will only expand the requested pageset up
to the point where it runs into an already extant page at either end, which
means that the resulting buffer might not be large enough, or might be
misaligned, for our purposes.  With this, we can make sure we have a useful
buffer, and we can splice the extra pages from it into the pagecache if
there are holes we can plug.

The read buffer could also be useful in the future to perform RMW cycles
when fixing up after disconnected operation or direct I/O with
smaller-than-preferred granularity.
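The bookkeeping rests on xarray marks: each page whose reference is owned
by the read request's buffer is tagged XA_MARK_0 so that teardown can find
and release exactly those entries.  A minimal, generic sketch of that
pattern follows (the patch itself performs the insert and the mark
atomically under xa_lock(); the helper name here is invented):

	#include <linux/pagemap.h>
	#include <linux/xarray.h>

	/* Add a filler page to a detached buffer and tag it as "owned". */
	static int example_add_filler(struct xarray *buffer, pgoff_t index, gfp_t gfp)
	{
		struct page *page = __page_cache_alloc(gfp);
		int ret;

		if (!page)
			return -ENOMEM;
		page->index = index;
		ret = xa_insert(buffer, index, page, gfp);
		if (ret < 0) {
			__free_page(page);
			return ret;
		}
		xa_set_mark(buffer, index, XA_MARK_0);
		return 0;
	}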
Signed-off-by: David Howells --- fs/netfs/read_helper.c | 166 ++++++++++++++++++++++++++++++++++++++++++++---- include/linux/netfs.h | 1 2 files changed, 154 insertions(+), 13 deletions(-) diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c index 5e1a9be48130..b03bc5b0da5a 100644 --- a/fs/netfs/read_helper.c +++ b/fs/netfs/read_helper.c @@ -28,6 +28,7 @@ module_param_named(debug, netfs_debug, uint, S_IWUSR | S_IRUGO); MODULE_PARM_DESC(netfs_debug, "Netfs support debugging mask"); static void netfs_rreq_work(struct work_struct *); +static void netfs_rreq_clear_buffer(struct netfs_read_request *); static void __netfs_put_subrequest(struct netfs_read_subrequest *, bool); static void netfs_put_subrequest(struct netfs_read_subrequest *subreq, @@ -51,6 +52,7 @@ static struct netfs_read_request *netfs_alloc_read_request( rreq->inode = file_inode(file); rreq->i_size = i_size_read(rreq->inode); rreq->debug_id = atomic_inc_return(&debug_ids); + xa_init(&rreq->buffer); INIT_LIST_HEAD(&rreq->subrequests); INIT_WORK(&rreq->work, netfs_rreq_work); refcount_set(&rreq->usage, 1); @@ -90,6 +92,7 @@ static void netfs_free_read_request(struct work_struct *work) trace_netfs_rreq(rreq, netfs_rreq_trace_free); if (rreq->cache_resources.ops) rreq->cache_resources.ops->end_operation(&rreq->cache_resources); + netfs_rreq_clear_buffer(rreq); kfree(rreq); netfs_stat_d(&netfs_n_rh_rreq); } @@ -727,7 +730,7 @@ netfs_rreq_prepare_read(struct netfs_read_request *rreq, if (WARN_ON(subreq->len == 0)) source = NETFS_INVALID_READ; - iov_iter_xarray(&subreq->iter, READ, &rreq->mapping->i_pages, + iov_iter_xarray(&subreq->iter, READ, &rreq->buffer, subreq->start, subreq->len); out: @@ -838,6 +841,133 @@ static void netfs_rreq_expand(struct netfs_read_request *rreq, } } +/* + * Clear a read buffer, discarding the pages which have XA_MARK_0 set. + */ +static void netfs_rreq_clear_buffer(struct netfs_read_request *rreq) +{ + struct page *page; + XA_STATE(xas, &rreq->buffer, 0); + + rcu_read_lock(); + xas_for_each_marked(&xas, page, ULONG_MAX, XA_MARK_0) { + put_page(page); + } + rcu_read_unlock(); + xa_destroy(&rreq->buffer); +} + +static int xa_insert_set_mark(struct xarray *xa, unsigned long index, + void *entry, xa_mark_t mark, gfp_t gfp_mask) +{ + int ret; + + xa_lock(xa); + ret = __xa_insert(xa, index, entry, gfp_mask); + if (ret == 0) + __xa_set_mark(xa, index, mark); + xa_unlock(xa); + return ret; +} + +/* + * Create the specified range of pages in the buffer attached to the read + * request. The pages are marked with XA_MARK_0 so that we know that these + * need freeing later. + */ +static int netfs_rreq_add_pages_to_buffer(struct netfs_read_request *rreq, + pgoff_t index, pgoff_t to, gfp_t gfp_mask) +{ + struct page *page; + int ret; + + if (to + 1 == index) /* Page range is inclusive */ + return 0; + + do { + page = __page_cache_alloc(gfp_mask); + if (!page) + return -ENOMEM; + page->index = index; + ret = xa_insert_set_mark(&rreq->buffer, index, page, XA_MARK_0, + gfp_mask); + if (ret < 0) { + __free_page(page); + return ret; + } + + index += thp_nr_pages(page); + } while (index < to); + + return 0; +} + +/* + * Set up a buffer into which to data will be read or decrypted/decompressed. + * The pages to be read into are attached to this buffer and the gaps filled in + * to form a continuous region. 
+ */ +static int netfs_rreq_set_up_buffer(struct netfs_read_request *rreq, + struct readahead_control *ractl, + struct page *keep, + pgoff_t have_index, unsigned int have_pages) +{ + struct page *page; + gfp_t gfp_mask = readahead_gfp_mask(rreq->mapping); + unsigned int want_pages = have_pages; + pgoff_t want_index = have_index; + int ret; + +#if 0 + want_index = round_down(want_index, 256 * 1024 / PAGE_SIZE); + want_pages += have_index - want_index; + want_pages = round_up(want_pages, 256 * 1024 / PAGE_SIZE); + + kdebug("setup %lx-%lx -> %lx-%lx", + have_index, have_index + have_pages - 1, + want_index, want_index + want_pages - 1); +#endif + + ret = netfs_rreq_add_pages_to_buffer(rreq, want_index, have_index - 1, + gfp_mask); + if (ret < 0) + return ret; + have_pages += have_index - want_index; + + ret = netfs_rreq_add_pages_to_buffer(rreq, have_index + have_pages, + want_index + want_pages - 1, + gfp_mask); + if (ret < 0) + return ret; + + /* Transfer the pages proposed by the VM into the buffer along with + * their page refs. The locks will be dropped in netfs_rreq_unlock(). + */ + if (ractl) { + while ((page = readahead_page(ractl))) { + if (page == keep) + get_page(page); + ret = xa_insert_set_mark(&rreq->buffer, page->index, page, + XA_MARK_0, gfp_mask); + if (ret < 0) { + if (page != keep) + unlock_page(page); + put_page(page); + return ret; + } + } + } else { + get_page(keep); + ret = xa_insert_set_mark(&rreq->buffer, keep->index, keep, + XA_MARK_0, gfp_mask); + if (ret < 0) { + put_page(keep); + return ret; + } + } + return 0; +} + /** * netfs_readahead - Helper to manage a read request * @ractl: The description of the readahead request @@ -861,7 +991,6 @@ void netfs_readahead(struct readahead_control *ractl, void *netfs_priv) { struct netfs_read_request *rreq; - struct page *page; unsigned int debug_index = 0; int ret; @@ -889,6 +1018,12 @@ void netfs_readahead(struct readahead_control *ractl, netfs_rreq_expand(rreq, ractl); + /* Set up the output buffer */ + ret = netfs_rreq_set_up_buffer(rreq, ractl, NULL, + readahead_index(ractl), readahead_count(ractl)); + if (ret < 0) + goto cleanup_free; + atomic_set(&rreq->nr_rd_ops, 1); do { if (!netfs_rreq_submit_slice(rreq, &debug_index)) @@ -896,12 +1031,6 @@ void netfs_readahead(struct readahead_control *ractl, } while (rreq->submitted < rreq->len); - /* Drop the refs on the pages here rather than in the cache or - * filesystem. The locks will be dropped in netfs_rreq_unlock(). - */ - while ((page = readahead_page(ractl))) - put_page(page); - /* If we decrement nr_rd_ops to 0, the ref belongs to us. 
 	 */
 	if (atomic_dec_and_test(&rreq->nr_rd_ops))
 		netfs_rreq_assess(rreq, false);
 
@@ -967,6 +1096,12 @@ int netfs_readpage(struct file *file,
 	netfs_stat(&netfs_n_rh_readpage);
 	trace_netfs_read(rreq, rreq->start, rreq->len, netfs_read_trace_readpage);
 
+	/* Set up the output buffer */
+	ret = netfs_rreq_set_up_buffer(rreq, NULL, page,
+				       page_index(page), thp_nr_pages(page));
+	if (ret < 0)
+		goto out;
+
 	netfs_get_read_request(rreq);
 	atomic_set(&rreq->nr_rd_ops, 1);
 
@@ -1134,13 +1269,18 @@ int netfs_write_begin(struct file *file, struct address_space *mapping,
 	 */
 	ractl._nr_pages = thp_nr_pages(page);
 	netfs_rreq_expand(rreq, &ractl);
-	netfs_get_read_request(rreq);
 
-	/* We hold the page locks, so we can drop the references */
-	while ((xpage = readahead_page(&ractl)))
-		if (xpage != page)
-			put_page(xpage);
+	/* Set up the output buffer */
+	ret = netfs_rreq_set_up_buffer(rreq, &ractl, page,
+				       readahead_index(&ractl), readahead_count(&ractl));
+	if (ret < 0) {
+		while ((xpage = readahead_page(&ractl)))
+			if (xpage != page)
+				put_page(xpage);
+		goto error_put;
+	}
 
+	netfs_get_read_request(rreq);
 	atomic_set(&rreq->nr_rd_ops, 1);
 	do {
 		if (!netfs_rreq_submit_slice(rreq, &debug_index))
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index 45d40c622205..815001fe7a76 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -138,6 +138,7 @@ struct netfs_read_request {
 	struct address_space *mapping;	/* The mapping being accessed */
 	struct netfs_cache_resources cache_resources;
 	struct list_head subrequests;	/* Requests to fetch I/O from disk or net */
+	struct xarray buffer;		/* Decryption/decompression buffer */
 	void *netfs_priv;		/* Private data for the netfs */
 	unsigned int debug_id;
 	atomic_t nr_rd_ops;		/* Number of read ops in progress */
From patchwork Wed Jul 21 13:45:52 2021
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 12390947
Subject: [RFC PATCH 05/12] netfs: Add a netfs inode context
From: David Howells
To: linux-fsdevel@vger.kernel.org
Date: Wed, 21 Jul 2021 14:45:52 +0100
Message-ID: <162687515266.276387.1299416976214634692.stgit@warthog.procyon.org.uk>
In-Reply-To: <162687506932.276387.14456718890524355509.stgit@warthog.procyon.org.uk>
References: <162687506932.276387.14456718890524355509.stgit@warthog.procyon.org.uk>

Add a netfs_i_context struct that should be included in the network
filesystem's own inode struct wrapper, directly after the VFS's inode
struct, e.g.:

	struct my_inode {
		struct {
			struct inode		vfs_inode;
			struct netfs_i_context	netfs_ctx;
		};
	};

The netfs_i_context struct contains two fields for the network filesystem
to use:

	struct netfs_i_context {
		...
		struct fscache_cookie	*cache;
		unsigned long		flags;
	#define NETFS_ICTX_NEW_CONTENT	0
	};

There's a pointer to the cache cookie and a flag to indicate that the
content in the file is locally generated and entirely new (ie. the file was
just created locally or was truncated to nothing).

Two functions are provided to help with this:

 (1) void netfs_i_context_init(struct inode *inode,
			       const struct netfs_request_ops *ops);

     Initialise the netfs context and set the operations.

 (2) struct netfs_i_context *netfs_i_context(struct inode *inode);

     Find the netfs context from the inode struct.
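Putting the two helpers together, a hypothetical filesystem (called "myfs"
here purely for illustration; only the netfs_i_context API above is real)
would wire the context up roughly like this:

	#include <linux/netfs.h>

	struct myfs_inode {
		struct {
			struct inode		vfs_inode;
			struct netfs_i_context	netfs_ctx;	/* must follow directly */
		};
		/* ... myfs-private fields ... */
	};

	static const struct netfs_request_ops myfs_req_ops;	/* defined elsewhere */

	static void myfs_init_netfs(struct inode *inode, bool truncated_to_zero)
	{
		netfs_i_context_init(inode, &myfs_req_ops);

		/* Locally created/truncated content never needs fetching. */
		if (truncated_to_zero)
			set_bit(NETFS_ICTX_NEW_CONTENT,
				&netfs_i_context(inode)->flags);
	}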
Signed-off-by: David Howells --- fs/afs/callback.c | 2 - fs/afs/dir.c | 2 - fs/afs/dynroot.c | 1 fs/afs/file.c | 29 ++--------- fs/afs/inode.c | 10 ++-- fs/afs/internal.h | 13 ++--- fs/afs/super.c | 2 - fs/afs/write.c | 7 +-- fs/ceph/addr.c | 2 - fs/netfs/internal.h | 11 ++++ fs/netfs/read_helper.c | 124 ++++++++++++++++++++++-------------------------- fs/netfs/stats.c | 1 include/linux/netfs.h | 66 +++++++++++++++++++++----- 13 files changed, 146 insertions(+), 124 deletions(-) diff --git a/fs/afs/callback.c b/fs/afs/callback.c index 7d9b23d981bf..0d4b9678ad22 100644 --- a/fs/afs/callback.c +++ b/fs/afs/callback.c @@ -41,7 +41,7 @@ void __afs_break_callback(struct afs_vnode *vnode, enum afs_cb_break_reason reas { _enter(""); - clear_bit(AFS_VNODE_NEW_CONTENT, &vnode->flags); + clear_bit(NETFS_ICTX_NEW_CONTENT, &netfs_i_context(&vnode->vfs_inode)->flags); if (test_and_clear_bit(AFS_VNODE_CB_PROMISED, &vnode->flags)) { vnode->cb_break++; afs_clear_permits(vnode); diff --git a/fs/afs/dir.c b/fs/afs/dir.c index ac829e63c570..a4c9cd6de622 100644 --- a/fs/afs/dir.c +++ b/fs/afs/dir.c @@ -1350,7 +1350,7 @@ static void afs_vnode_new_inode(struct afs_operation *op) } vnode = AFS_FS_I(inode); - set_bit(AFS_VNODE_NEW_CONTENT, &vnode->flags); + set_bit(NETFS_ICTX_NEW_CONTENT, &netfs_i_context(&vnode->vfs_inode)->flags); if (!op->error) afs_cache_permit(vnode, op->key, vnode->cb_break, &vp->scb); d_instantiate(op->dentry, inode); diff --git a/fs/afs/dynroot.c b/fs/afs/dynroot.c index db832cc931c8..f120bcb8bf73 100644 --- a/fs/afs/dynroot.c +++ b/fs/afs/dynroot.c @@ -76,6 +76,7 @@ struct inode *afs_iget_pseudo_dir(struct super_block *sb, bool root) /* there shouldn't be an existing inode */ BUG_ON(!(inode->i_state & I_NEW)); + netfs_i_context_init(inode, NULL); inode->i_size = 0; inode->i_mode = S_IFDIR | S_IRUGO | S_IXUGO; if (root) { diff --git a/fs/afs/file.c b/fs/afs/file.c index 82e945dbe379..1861e4ecc2ce 100644 --- a/fs/afs/file.c +++ b/fs/afs/file.c @@ -18,13 +18,11 @@ #include "internal.h" static int afs_file_mmap(struct file *file, struct vm_area_struct *vma); -static int afs_readpage(struct file *file, struct page *page); static int afs_symlink_readpage(struct file *file, struct page *page); static void afs_invalidatepage(struct page *page, unsigned int offset, unsigned int length); static int afs_releasepage(struct page *page, gfp_t gfp_flags); -static void afs_readahead(struct readahead_control *ractl); static ssize_t afs_direct_IO(struct kiocb *iocb, struct iov_iter *iter); const struct file_operations afs_file_operations = { @@ -48,8 +46,8 @@ const struct inode_operations afs_file_inode_operations = { }; const struct address_space_operations afs_file_aops = { - .readpage = afs_readpage, - .readahead = afs_readahead, + .readpage = netfs_readpage, + .readahead = netfs_readahead, .set_page_dirty = afs_set_page_dirty, .launder_page = afs_launder_page, .releasepage = afs_releasepage, @@ -153,7 +151,8 @@ int afs_open(struct inode *inode, struct file *file) } if (file->f_flags & O_TRUNC) - set_bit(AFS_VNODE_NEW_CONTENT, &vnode->flags); + set_bit(NETFS_ICTX_NEW_CONTENT, + &netfs_i_context(&vnode->vfs_inode)->flags); fscache_use_cookie(afs_vnode_cache(vnode), file->f_mode & FMODE_WRITE); @@ -351,13 +350,6 @@ static void afs_init_rreq(struct netfs_read_request *rreq, struct file *file) rreq->netfs_priv = key_get(afs_file_key(file)); } -static bool afs_is_cache_enabled(struct inode *inode) -{ - struct fscache_cookie *cookie = afs_vnode_cache(AFS_FS_I(inode)); - - return 
fscache_cookie_enabled(cookie) && cookie->cache_priv; -} - static int afs_begin_cache_operation(struct netfs_read_request *rreq) { struct afs_vnode *vnode = AFS_FS_I(rreq->inode); @@ -378,25 +370,14 @@ static void afs_priv_cleanup(struct address_space *mapping, void *netfs_priv) key_put(netfs_priv); } -const struct netfs_read_request_ops afs_req_ops = { +const struct netfs_request_ops afs_req_ops = { .init_rreq = afs_init_rreq, - .is_cache_enabled = afs_is_cache_enabled, .begin_cache_operation = afs_begin_cache_operation, .check_write_begin = afs_check_write_begin, .issue_op = afs_req_issue_op, .cleanup = afs_priv_cleanup, }; -static int afs_readpage(struct file *file, struct page *page) -{ - return netfs_readpage(file, page, &afs_req_ops, NULL); -} - -static void afs_readahead(struct readahead_control *ractl) -{ - netfs_readahead(ractl, &afs_req_ops, NULL); -} - int afs_write_inode(struct inode *inode, struct writeback_control *wbc) { fscache_unpin_writeback(wbc, afs_vnode_cache(AFS_FS_I(inode))); diff --git a/fs/afs/inode.c b/fs/afs/inode.c index cf7b66957c6f..3e9e388245a1 100644 --- a/fs/afs/inode.c +++ b/fs/afs/inode.c @@ -430,7 +430,7 @@ static void afs_get_inode_cache(struct afs_vnode *vnode) struct afs_vnode_cache_aux aux; if (vnode->status.type != AFS_FTYPE_FILE) { - vnode->cache = NULL; + vnode->netfs_ctx.cache = NULL; return; } @@ -440,7 +440,7 @@ static void afs_get_inode_cache(struct afs_vnode *vnode) key.vnode_id_ext[1] = htonl(vnode->fid.vnode_hi); afs_set_cache_aux(vnode, &aux); - vnode->cache = fscache_acquire_cookie( + vnode->netfs_ctx.cache = fscache_acquire_cookie( vnode->volume->cache, vnode->status.type == AFS_FTYPE_FILE ? 0 : FSCACHE_ADV_SINGLE_CHUNK, &key, sizeof(key), @@ -479,6 +479,7 @@ struct inode *afs_iget(struct afs_operation *op, struct afs_vnode_param *vp) return inode; } + netfs_i_context_init(inode, &afs_req_ops); ret = afs_inode_init_from_status(op, vp, vnode); if (ret < 0) goto bad_inode; @@ -535,6 +536,7 @@ struct inode *afs_root_iget(struct super_block *sb, struct key *key) _debug("GOT ROOT INODE %p { vl=%llx }", inode, as->volume->vid); BUG_ON(!(inode->i_state & I_NEW)); + netfs_i_context_init(inode, &afs_req_ops); vnode = AFS_FS_I(inode); vnode->cb_v_break = as->volume->cb_v_break, @@ -803,9 +805,9 @@ void afs_evict_inode(struct inode *inode) } #ifdef CONFIG_AFS_FSCACHE - fscache_relinquish_cookie(vnode->cache, + fscache_relinquish_cookie(vnode->netfs_ctx.cache, test_bit(AFS_VNODE_DELETED, &vnode->flags)); - vnode->cache = NULL; + vnode->netfs_ctx.cache = NULL; #endif afs_prune_wb_keys(vnode); diff --git a/fs/afs/internal.h b/fs/afs/internal.h index ccdde00ada8a..e0204dde4b50 100644 --- a/fs/afs/internal.h +++ b/fs/afs/internal.h @@ -615,15 +615,15 @@ enum afs_lock_state { * leak from one inode to another. 
*/ struct afs_vnode { - struct inode vfs_inode; /* the VFS's inode record */ + struct { + struct inode vfs_inode; /* the VFS's inode record */ + struct netfs_i_context netfs_ctx; /* Netfslib context */ + }; struct afs_volume *volume; /* volume on which vnode resides */ struct afs_fid fid; /* the file identifier for this inode */ struct afs_file_status status; /* AFS status info for this file */ afs_dataversion_t invalid_before; /* Child dentries are invalid before this */ -#ifdef CONFIG_AFS_FSCACHE - struct fscache_cookie *cache; /* caching cookie */ -#endif struct afs_permits __rcu *permit_cache; /* cache of permits so far obtained */ struct mutex io_lock; /* Lock for serialising I/O on this mutex */ struct rw_semaphore validate_lock; /* lock for validating this vnode */ @@ -640,7 +640,6 @@ struct afs_vnode { #define AFS_VNODE_MOUNTPOINT 5 /* set if vnode is a mountpoint symlink */ #define AFS_VNODE_AUTOCELL 6 /* set if Vnode is an auto mount point */ #define AFS_VNODE_PSEUDODIR 7 /* set if Vnode is a pseudo directory */ -#define AFS_VNODE_NEW_CONTENT 8 /* Set if file has new content (create/trunc-0) */ #define AFS_VNODE_SILLY_DELETED 9 /* Set if file has been silly-deleted */ #define AFS_VNODE_MODIFYING 10 /* Set if we're performing a modification op */ @@ -666,7 +665,7 @@ struct afs_vnode { static inline struct fscache_cookie *afs_vnode_cache(struct afs_vnode *vnode) { #ifdef CONFIG_AFS_FSCACHE - return vnode->cache; + return vnode->netfs_ctx.cache; #else return NULL; #endif @@ -1054,7 +1053,7 @@ extern const struct address_space_operations afs_file_aops; extern const struct address_space_operations afs_symlink_aops; extern const struct inode_operations afs_file_inode_operations; extern const struct file_operations afs_file_operations; -extern const struct netfs_read_request_ops afs_req_ops; +extern const struct netfs_request_ops afs_req_ops; extern int afs_cache_wb_key(struct afs_vnode *, struct afs_file *); extern void afs_put_wb_key(struct afs_wb_key *); diff --git a/fs/afs/super.c b/fs/afs/super.c index 85e52c78f44f..29c1178beb72 100644 --- a/fs/afs/super.c +++ b/fs/afs/super.c @@ -692,7 +692,7 @@ static struct inode *afs_alloc_inode(struct super_block *sb) vnode->lock_key = NULL; vnode->permit_cache = NULL; #ifdef CONFIG_AFS_FSCACHE - vnode->cache = NULL; + vnode->netfs_ctx.cache = NULL; #endif vnode->flags = 1 << AFS_VNODE_UNSET; diff --git a/fs/afs/write.c b/fs/afs/write.c index 3be3a594124c..a244187f3503 100644 --- a/fs/afs/write.c +++ b/fs/afs/write.c @@ -49,8 +49,7 @@ int afs_write_begin(struct file *file, struct address_space *mapping, * file. We need to do this before we get a lock on the page in case * there's more than one writer competing for the same cache block. */ - ret = netfs_write_begin(file, mapping, pos, len, flags, &page, fsdata, - &afs_req_ops, NULL); + ret = netfs_write_begin(file, mapping, pos, len, flags, &page, fsdata); if (ret < 0) return ret; @@ -76,7 +75,7 @@ int afs_write_begin(struct file *file, struct address_space *mapping, * spaces to be merged into writes. If it's not, only write * back what the user gives us. 
*/ - if (!test_bit(AFS_VNODE_NEW_CONTENT, &vnode->flags) && + if (!test_bit(NETFS_ICTX_NEW_CONTENT, &vnode->netfs_ctx.flags) && (to < f || from > t)) goto flush_conflicting_write; } @@ -557,7 +556,7 @@ static ssize_t afs_write_back_from_locked_page(struct address_space *mapping, unsigned long priv; unsigned int offset, to, len, max_len; loff_t i_size = i_size_read(&vnode->vfs_inode); - bool new_content = test_bit(AFS_VNODE_NEW_CONTENT, &vnode->flags); + bool new_content = test_bit(NETFS_ICTX_NEW_CONTENT, &vnode->netfs_ctx.flags); long count = wbc->nr_to_write; int ret; diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c index a1e2813731d1..a8a41254e691 100644 --- a/fs/ceph/addr.c +++ b/fs/ceph/addr.c @@ -305,7 +305,7 @@ static void ceph_readahead_cleanup(struct address_space *mapping, void *priv) ceph_put_cap_refs(ci, got); } -static const struct netfs_read_request_ops ceph_netfs_read_ops = { +static const struct netfs_request_ops ceph_netfs_read_ops = { .init_rreq = ceph_init_rreq, .is_cache_enabled = ceph_is_cache_enabled, .begin_cache_operation = ceph_begin_cache_operation, diff --git a/fs/netfs/internal.h b/fs/netfs/internal.h index b7f2c4459f33..4805d9fc8808 100644 --- a/fs/netfs/internal.h +++ b/fs/netfs/internal.h @@ -5,6 +5,10 @@ * Written by David Howells (dhowells@redhat.com) */ +#include +#include +#include + #ifdef pr_fmt #undef pr_fmt #endif @@ -50,6 +54,13 @@ static inline void netfs_stat_d(atomic_t *stat) atomic_dec(stat); } +static inline bool netfs_is_cache_enabled(struct inode *inode) +{ + struct fscache_cookie *cookie = netfs_i_cookie(inode); + + return fscache_cookie_enabled(cookie) && cookie->cache_priv; +} + #else #define netfs_stat(x) do {} while(0) #define netfs_stat_d(x) do {} while(0) diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c index b03bc5b0da5a..aa98ecf6df6b 100644 --- a/fs/netfs/read_helper.c +++ b/fs/netfs/read_helper.c @@ -14,7 +14,6 @@ #include #include #include -#include #include "internal.h" #define CREATE_TRACE_POINTS #include @@ -38,26 +37,27 @@ static void netfs_put_subrequest(struct netfs_read_subrequest *subreq, __netfs_put_subrequest(subreq, was_async); } -static struct netfs_read_request *netfs_alloc_read_request( - const struct netfs_read_request_ops *ops, void *netfs_priv, - struct file *file) +static struct netfs_read_request *netfs_alloc_read_request(struct address_space *mapping, + struct file *file) { static atomic_t debug_ids; + struct inode *inode = file ? 
file_inode(file) : mapping->host; + struct netfs_i_context *ctx = netfs_i_context(inode); struct netfs_read_request *rreq; rreq = kzalloc(sizeof(struct netfs_read_request), GFP_KERNEL); if (rreq) { - rreq->netfs_ops = ops; - rreq->netfs_priv = netfs_priv; - rreq->inode = file_inode(file); - rreq->i_size = i_size_read(rreq->inode); + rreq->mapping = mapping; + rreq->inode = inode; + rreq->netfs_ops = ctx->ops; + rreq->i_size = i_size_read(inode); rreq->debug_id = atomic_inc_return(&debug_ids); xa_init(&rreq->buffer); INIT_LIST_HEAD(&rreq->subrequests); INIT_WORK(&rreq->work, netfs_rreq_work); refcount_set(&rreq->usage, 1); __set_bit(NETFS_RREQ_IN_PROGRESS, &rreq->flags); - ops->init_rreq(rreq, file); + ctx->ops->init_rreq(rreq, file); netfs_stat(&netfs_n_rh_rreq); } @@ -971,8 +971,6 @@ static int netfs_rreq_set_up_buffer(struct netfs_read_request *rreq, /** * netfs_readahead - Helper to manage a read request * @ractl: The description of the readahead request - * @ops: The network filesystem's operations for the helper to use - * @netfs_priv: Private netfs data to be retained in the request * * Fulfil a readahead request by drawing data from the cache if possible, or * the netfs if not. Space beyond the EOF is zero-filled. Multiple I/O @@ -980,34 +978,31 @@ static int netfs_rreq_set_up_buffer(struct netfs_read_request *rreq, * readahead window can be expanded in either direction to a more convenient * alighment for RPC efficiency or to make storage in the cache feasible. * - * The calling netfs must provide a table of operations, only one of which, - * issue_op, is mandatory. It may also be passed a private token, which will - * be retained in rreq->netfs_priv and will be cleaned up by ops->cleanup(). + * The calling netfs must initialise a netfs context contiguous to the vfs + * inode before calling this. * * This is usable whether or not caching is enabled. */ -void netfs_readahead(struct readahead_control *ractl, - const struct netfs_read_request_ops *ops, - void *netfs_priv) +void netfs_readahead(struct readahead_control *ractl) { struct netfs_read_request *rreq; + struct netfs_i_context *ctx = netfs_i_context(ractl->mapping->host); unsigned int debug_index = 0; int ret; _enter("%lx,%x", readahead_index(ractl), readahead_count(ractl)); if (readahead_count(ractl) == 0) - goto cleanup; + return; - rreq = netfs_alloc_read_request(ops, netfs_priv, ractl->file); + rreq = netfs_alloc_read_request(ractl->mapping, ractl->file); if (!rreq) - goto cleanup; - rreq->mapping = ractl->mapping; + return; rreq->start = readahead_pos(ractl); rreq->len = readahead_length(ractl); - if (ops->begin_cache_operation) { - ret = ops->begin_cache_operation(rreq); + if (ctx->ops->begin_cache_operation) { + ret = ctx->ops->begin_cache_operation(rreq); if (ret == -ENOMEM || ret == -EINTR || ret == -ERESTARTSYS) goto cleanup_free; } @@ -1039,10 +1034,6 @@ void netfs_readahead(struct readahead_control *ractl, cleanup_free: netfs_put_read_request(rreq, false); return; -cleanup: - if (netfs_priv) - ops->cleanup(ractl->mapping, netfs_priv); - return; } EXPORT_SYMBOL(netfs_readahead); @@ -1050,43 +1041,34 @@ EXPORT_SYMBOL(netfs_readahead); * netfs_readpage - Helper to manage a readpage request * @file: The file to read from * @page: The page to read - * @ops: The network filesystem's operations for the helper to use - * @netfs_priv: Private netfs data to be retained in the request * * Fulfil a readpage request by drawing data from the cache if possible, or the * netfs if not. Space beyond the EOF is zero-filled. 
Multiple I/O requests * from different sources will get munged together. * - * The calling netfs must provide a table of operations, only one of which, - * issue_op, is mandatory. It may also be passed a private token, which will - * be retained in rreq->netfs_priv and will be cleaned up by ops->cleanup(). + * The calling netfs must initialise a netfs context contiguous to the vfs + * inode before calling this. * * This is usable whether or not caching is enabled. */ -int netfs_readpage(struct file *file, - struct page *page, - const struct netfs_read_request_ops *ops, - void *netfs_priv) +int netfs_readpage(struct file *file, struct page *page) { + struct address_space *mapping = page_file_mapping(page); struct netfs_read_request *rreq; + struct netfs_i_context *ctx = netfs_i_context(mapping->host); unsigned int debug_index = 0; int ret; _enter("%lx", page_index(page)); - rreq = netfs_alloc_read_request(ops, netfs_priv, file); - if (!rreq) { - if (netfs_priv) - ops->cleanup(netfs_priv, page_file_mapping(page)); - unlock_page(page); - return -ENOMEM; - } - rreq->mapping = page_file_mapping(page); + rreq = netfs_alloc_read_request(mapping, file); + if (!rreq) + goto nomem; rreq->start = page_file_offset(page); rreq->len = thp_size(page); - if (ops->begin_cache_operation) { - ret = ops->begin_cache_operation(rreq); + if (ctx->ops->begin_cache_operation) { + ret = ctx->ops->begin_cache_operation(rreq); if (ret == -ENOMEM || ret == -EINTR || ret == -ERESTARTSYS) { unlock_page(page); goto out; @@ -1128,6 +1110,9 @@ int netfs_readpage(struct file *file, out: netfs_put_read_request(rreq, false); return ret; +nomem: + unlock_page(page); + return -ENOMEM; } EXPORT_SYMBOL(netfs_readpage); @@ -1136,6 +1121,7 @@ EXPORT_SYMBOL(netfs_readpage); * @page: page being prepared * @pos: starting position for the write * @len: length of write + * @always_fill: T if the page should always be completely filled/cleared * * In some cases, write_begin doesn't need to read at all: * - full page write @@ -1145,14 +1131,24 @@ EXPORT_SYMBOL(netfs_readpage); * If any of these criteria are met, then zero out the unwritten parts * of the page and return true. Otherwise, return false. 
*/ -static bool netfs_skip_page_read(struct page *page, loff_t pos, size_t len) +static bool netfs_skip_page_read(struct page *page, loff_t pos, size_t len, + bool always_fill) { struct inode *inode = page->mapping->host; loff_t i_size = i_size_read(inode); size_t offset = offset_in_thp(page, pos); + size_t plen = thp_size(page); + + if (unlikely(always_fill)) { + if (pos - offset + len <= i_size) + return false; /* Page entirely before EOF */ + zero_user_segment(page, 0, plen); + SetPageUptodate(page); + return true; + } /* Full page write */ - if (offset == 0 && len >= thp_size(page)) + if (offset == 0 && len >= plen) return true; /* pos beyond last page in the file */ @@ -1165,7 +1161,7 @@ static bool netfs_skip_page_read(struct page *page, loff_t pos, size_t len) return false; zero_out: - zero_user_segments(page, 0, offset, offset + len, thp_size(page)); + zero_user_segments(page, 0, offset, offset + len, plen); return true; } @@ -1178,8 +1174,6 @@ static bool netfs_skip_page_read(struct page *page, loff_t pos, size_t len) * @flags: AOP_* flags * @_page: Where to put the resultant page * @_fsdata: Place for the netfs to store a cookie - * @ops: The network filesystem's operations for the helper to use - * @netfs_priv: Private netfs data to be retained in the request * * Pre-read data for a write-begin request by drawing data from the cache if * possible, or the netfs if not. Space beyond the EOF is zero-filled. @@ -1198,17 +1192,19 @@ static bool netfs_skip_page_read(struct page *page, loff_t pos, size_t len) * should go ahead; unlock the page and return -EAGAIN to cause the page to be * regot; or return an error. * + * The calling netfs must initialise a netfs context contiguous to the vfs + * inode before calling this. + * * This is usable whether or not caching is enabled. */ int netfs_write_begin(struct file *file, struct address_space *mapping, loff_t pos, unsigned int len, unsigned int flags, - struct page **_page, void **_fsdata, - const struct netfs_read_request_ops *ops, - void *netfs_priv) + struct page **_page, void **_fsdata) { struct netfs_read_request *rreq; struct page *page, *xpage; struct inode *inode = file_inode(file); + struct netfs_i_context *ctx = netfs_i_context(inode); unsigned int debug_index = 0; pgoff_t index = pos >> PAGE_SHIFT; int ret; @@ -1220,9 +1216,9 @@ int netfs_write_begin(struct file *file, struct address_space *mapping, if (!page) return -ENOMEM; - if (ops->check_write_begin) { + if (ctx->ops->check_write_begin) { /* Allow the netfs (eg. ceph) to flush conflicts. */ - ret = ops->check_write_begin(file, pos, len, page, _fsdata); + ret = ctx->ops->check_write_begin(file, pos, len, page, _fsdata); if (ret < 0) { trace_netfs_failure(NULL, NULL, ret, netfs_fail_check_write_begin); if (ret == -EAGAIN) @@ -1238,25 +1234,23 @@ int netfs_write_begin(struct file *file, struct address_space *mapping, * within the cache granule containing the EOF, in which case we need * to preload the granule. 
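As a sketch of what a caller looks like after the argument reduction, a netfs's ->write_begin can now hand straight off to the helper; the ops table and any private data come from the inode's netfs context. myfs_write_begin is an assumed name, not code from this series.

	static int myfs_write_begin(struct file *file, struct address_space *mapping,
				    loff_t pos, unsigned int len, unsigned int flags,
				    struct page **pagep, void **fsdata)
	{
		/* No ops/netfs_priv arguments any more; netfs_write_begin()
		 * finds everything via netfs_i_context(file_inode(file)).
		 */
		return netfs_write_begin(file, mapping, pos, len, flags, pagep, fsdata);
	}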
*/ - if (!ops->is_cache_enabled(inode) && - netfs_skip_page_read(page, pos, len)) { + if (!netfs_is_cache_enabled(inode) && + netfs_skip_page_read(page, pos, len, false)) { netfs_stat(&netfs_n_rh_write_zskip); goto have_page_no_wait; } ret = -ENOMEM; - rreq = netfs_alloc_read_request(ops, netfs_priv, file); + rreq = netfs_alloc_read_request(mapping, file); if (!rreq) goto error; - rreq->mapping = page->mapping; rreq->start = page_offset(page); rreq->len = thp_size(page); rreq->no_unlock_page = page->index; __set_bit(NETFS_RREQ_NO_UNLOCK_PAGE, &rreq->flags); - netfs_priv = NULL; - if (ops->begin_cache_operation) { - ret = ops->begin_cache_operation(rreq); + if (ctx->ops->begin_cache_operation) { + ret = ctx->ops->begin_cache_operation(rreq); if (ret == -ENOMEM || ret == -EINTR || ret == -ERESTARTSYS) goto error_put; } @@ -1314,8 +1308,6 @@ int netfs_write_begin(struct file *file, struct address_space *mapping, if (ret < 0) goto error; have_page_no_wait: - if (netfs_priv) - ops->cleanup(netfs_priv, mapping); *_page = page; _leave(" = 0"); return 0; @@ -1325,8 +1317,6 @@ int netfs_write_begin(struct file *file, struct address_space *mapping, error: unlock_page(page); put_page(page); - if (netfs_priv) - ops->cleanup(netfs_priv, mapping); _leave(" = %d", ret); return ret; } diff --git a/fs/netfs/stats.c b/fs/netfs/stats.c index 9ae538c85378..5510a7a14a40 100644 --- a/fs/netfs/stats.c +++ b/fs/netfs/stats.c @@ -7,7 +7,6 @@ #include #include -#include #include "internal.h" atomic_t netfs_n_rh_readahead; diff --git a/include/linux/netfs.h b/include/linux/netfs.h index 815001fe7a76..35bcd916c3a0 100644 --- a/include/linux/netfs.h +++ b/include/linux/netfs.h @@ -157,14 +157,25 @@ struct netfs_read_request { #define NETFS_RREQ_DONT_UNLOCK_PAGES 3 /* Don't unlock the pages on completion */ #define NETFS_RREQ_FAILED 4 /* The request failed */ #define NETFS_RREQ_IN_PROGRESS 5 /* Unlocked when the request completes */ - const struct netfs_read_request_ops *netfs_ops; + const struct netfs_request_ops *netfs_ops; +}; + +/* + * Per-inode description. This must be directly after the inode struct. + */ +struct netfs_i_context { + const struct netfs_request_ops *ops; +#ifdef CONFIG_FSCACHE + struct fscache_cookie *cache; +#endif + unsigned long flags; +#define NETFS_ICTX_NEW_CONTENT 0 /* Set if file has new content (create/trunc-0) */ }; /* * Operations the network filesystem can/must provide to the helpers. 
*/ -struct netfs_read_request_ops { - bool (*is_cache_enabled)(struct inode *inode); +struct netfs_request_ops { void (*init_rreq)(struct netfs_read_request *rreq, struct file *file); int (*begin_cache_operation)(struct netfs_read_request *rreq); void (*expand_readahead)(struct netfs_read_request *rreq); @@ -218,20 +229,49 @@ struct netfs_cache_ops { }; struct readahead_control; -extern void netfs_readahead(struct readahead_control *, - const struct netfs_read_request_ops *, - void *); -extern int netfs_readpage(struct file *, - struct page *, - const struct netfs_read_request_ops *, - void *); +extern void netfs_readahead(struct readahead_control *); +extern int netfs_readpage(struct file *, struct page *); extern int netfs_write_begin(struct file *, struct address_space *, loff_t, unsigned int, unsigned int, struct page **, - void **, - const struct netfs_read_request_ops *, - void *); + void **); extern void netfs_subreq_terminated(struct netfs_read_subrequest *, ssize_t, bool); extern void netfs_stats_show(struct seq_file *); +/** + * netfs_i_context - Get the netfs inode context from the inode + * @inode: The inode to query + * + * This function gets the netfs lib inode context from the network filesystem's + * inode. It expects it to follow on directly from the VFS inode struct. + */ +static inline struct netfs_i_context *netfs_i_context(struct inode *inode) +{ + return (struct netfs_i_context *)(inode + 1); +} + +static inline void netfs_i_context_init(struct inode *inode, + const struct netfs_request_ops *ops) +{ + struct netfs_i_context *ctx = netfs_i_context(inode); + + ctx->ops = ops; +} + +/** + * netfs_i_cookie - Get the cache cookie from the inode + * @inode: The inode to query + * + * Get the caching cookie (if enabled) from the network filesystem's inode. 
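To make the context lookup concrete: netfs_i_context() simply steps over the VFS inode, so the filesystem must lay out its inode with the context immediately following it and initialise it at inode set-up time. A minimal sketch, with myfs_inode, myfs_init_inode and myfs_req_ops as assumed names:

	struct myfs_inode {
		struct inode		vfs_inode;
		struct netfs_i_context	netfs_ctx;	/* must directly follow vfs_inode */
		/* ... filesystem-private fields ... */
	};

	static void myfs_init_inode(struct inode *inode)
	{
		/* Point the context at the fs's ops table once; the helpers
		 * then find it via netfs_i_context(inode), i.e. (inode + 1).
		 */
		netfs_i_context_init(inode, &myfs_req_ops);
	}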
+ */ +static inline struct fscache_cookie *netfs_i_cookie(struct inode *inode) +{ +#ifdef CONFIG_FSCACHE + struct netfs_i_context *ctx = netfs_i_context(inode); + return ctx->cache; +#else + return NULL; +#endif +} + #endif /* _LINUX_NETFS_H */ From patchwork Wed Jul 21 13:46:08 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 12390953 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.2 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B09C5C6377A for ; Wed, 21 Jul 2021 13:47:10 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 9176561241 for ; Wed, 21 Jul 2021 13:47:10 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238648AbhGUNGc (ORCPT ); Wed, 21 Jul 2021 09:06:32 -0400 Received: from us-smtp-delivery-124.mimecast.com ([216.205.24.124]:27679 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238763AbhGUNFn (ORCPT ); Wed, 21 Jul 2021 09:05:43 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1626875179; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=i721gV0aIu0rvLexN0pBqMbwBbkKzEhWVB3KEf7BlKk=; b=fjSJTgr714BYvtBpuw/fi8+jODXPkrOa2yvTDy1xsbURRnrCGe1gCtSGPHsPP5uFe++sOM uQl8LlvAzuyQzEDvxrLHisPh16DS6p1xEjfFHcSA9Iz6s653BnhTjKhayeovdNvGmQYf6X wAaN995DfHofJUBm3OsrcPAArEd5I8Q= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-199-KrkAGfI-N4-Yedd1kx3LBQ-1; Wed, 21 Jul 2021 09:46:15 -0400 X-MC-Unique: KrkAGfI-N4-Yedd1kx3LBQ-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 1B59C93920; Wed, 21 Jul 2021 13:46:13 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-112-62.rdu2.redhat.com [10.10.112.62]) by smtp.corp.redhat.com (Postfix) with ESMTP id E905A6EF4F; Wed, 21 Jul 2021 13:46:08 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 
3798903 Subject: [RFC PATCH 06/12] netfs: Keep lists of pending, active, dirty and flushed regions From: David Howells To: linux-fsdevel@vger.kernel.org Cc: dhowells@redhat.com, Jeff Layton , "Matthew Wilcox (Oracle)" , Anna Schumaker , Steve French , Dominique Martinet , Mike Marshall , David Wysochanski , Shyam Prasad N , Miklos Szeredi , Linus Torvalds , linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, devel@lists.orangefs.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Wed, 21 Jul 2021 14:46:08 +0100 Message-ID: <162687516812.276387.504081062999158040.stgit@warthog.procyon.org.uk> In-Reply-To: <162687506932.276387.14456718890524355509.stgit@warthog.procyon.org.uk> References: <162687506932.276387.14456718890524355509.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13 Precedence: bulk List-ID: X-Mailing-List: linux-cifs@vger.kernel.org This looks nice, in theory, and has the following features: (*) Things are managed with write records. (-) A WRITE is a region defined by an outer bounding box that spans the pages that are involved and an inner region that contains the actual modifications. (-) The bounding box must encompass all the data that will be necessary to perform a write operation to the server (for example, if we want to encrypt with a 64K block size when we have 4K pages). (*) There are four list of write records: (-) The PENDING LIST holds writes that are blocked by another active write. This list is in order of submission to avoid starvation and may overlap. (-) The ACTIVE LIST holds writes that have been granted exclusive access to a patch. This is in order of starting position and regions held therein may not overlap. (-) The DIRTY LIST holds a list of regions that have been modified. This is also in order of starting position and regions may not overlap, though they can be merged. (-) The FLUSH LIST holds a list of regions that require writing. This is in order of grouping. (*) An active region acts as an exclusion zone on part of the range, allowing the inode sem to be dropped once the region is on a list. (-) A DIO write creates its own exclusive region that must not overlap with any other dirty region. (-) An active write may overlap one or more dirty regions. (-) A dirty region may be overlapped by one or more writes. (-) If an active write overlaps with an incompatible dirty region, that region gets flushed, the active write has to wait for it to complete. (*) When an active write completes, the region is inserted or merged into the dirty list. (-) Merging can only happen between compatible regions. (-) Contiguous dirty regions can be merged. (-) If an inode has all new content, generated locally, dirty regions that have contiguous/ovelapping bounding boxes can be merged, bridging any gaps with zeros. (-) O_DSYNC causes the region to be flushed immediately. (*) There's a queue of groups of regions and those regions must be flushed in order. (-) If a region in a group needs flushing, then all prior groups must be flushed first. TRICKY BITS =========== (*) The active and dirty lists are O(n) search time. An interval tree might be a better option. (*) Having four list_heads is a lot of memory per inode. (*) Activating pending writes. (-) The pending list can contain a bunch of writes that can overlap. 
(-) When an active write completes, it is removed from the active queue and usually added to the dirty queue (except DIO, DSYNC). This makes a hole. (-) One or more pending writes can then be moved over, but care has to be taken not to misorder them to avoid starvation. (-) When a pending write is added to the active list, it may require part of the dirty list to be flushed. (*) A write that has been put onto the active queue may have to wait for flushing to complete. (*) How should an active write interact with a dirty region? (-) A dirty region may get flushed even whilst it is being modified on the assumption that the active write record will get added to the dirty list and cause a follow up write to the server. (*) RAM pinning. (-) An active write could pin a lot of pages, thereby causing a large write to run the system out of RAM. (-) Allow active writes to start being flushed whilst still being modified. (-) Use a scheduler hook to decant the modified portion into the dirty list when the modifying task is switched away from? (*) Bounding box and variably-sized pages/folios. (-) The bounding box needs to be rounded out to the page boundaries so that DIO writes can claim exclusivity on a series of pages so that they can be invalidated. (-) Allocation of higher-order folios could be limited in scope so that they don't escape the requested bounding box. (-) Bounding boxes could be enlarged to allow for larger folios. (-) Overlarge bounding boxes can be shrunk later, possibly on merging into the dirty list. (-) Ordinary writes can have overlapping bounding boxes, even if they're otherwise incompatible. --- fs/afs/file.c | 30 + fs/afs/internal.h | 7 fs/afs/write.c | 166 -------- fs/netfs/Makefile | 8 fs/netfs/dio_helper.c | 140 ++++++ fs/netfs/internal.h | 32 + fs/netfs/objects.c | 113 +++++ fs/netfs/read_helper.c | 94 ++++ fs/netfs/stats.c | 5 fs/netfs/write_helper.c | 908 ++++++++++++++++++++++++++++++++++++++++++ include/linux/netfs.h | 98 +++++ include/trace/events/netfs.h | 180 ++++++++ 12 files changed, 1604 insertions(+), 177 deletions(-) create mode 100644 fs/netfs/dio_helper.c create mode 100644 fs/netfs/objects.c create mode 100644 fs/netfs/write_helper.c diff --git a/fs/afs/file.c b/fs/afs/file.c index 1861e4ecc2ce..8400cdf086b6 100644 --- a/fs/afs/file.c +++ b/fs/afs/file.c @@ -30,7 +30,7 @@ const struct file_operations afs_file_operations = { .release = afs_release, .llseek = generic_file_llseek, .read_iter = generic_file_read_iter, - .write_iter = afs_file_write, + .write_iter = netfs_file_write_iter, .mmap = afs_file_mmap, .splice_read = generic_file_splice_read, .splice_write = iter_file_splice_write, @@ -53,8 +53,6 @@ const struct address_space_operations afs_file_aops = { .releasepage = afs_releasepage, .invalidatepage = afs_invalidatepage, .direct_IO = afs_direct_IO, - .write_begin = afs_write_begin, - .write_end = afs_write_end, .writepage = afs_writepage, .writepages = afs_writepages, }; @@ -370,12 +368,38 @@ static void afs_priv_cleanup(struct address_space *mapping, void *netfs_priv) key_put(netfs_priv); } +static void afs_init_dirty_region(struct netfs_dirty_region *region, struct file *file) +{ + region->netfs_priv = key_get(afs_file_key(file)); +} + +static void afs_free_dirty_region(struct netfs_dirty_region *region) +{ + key_put(region->netfs_priv); +} + +static void afs_update_i_size(struct file *file, loff_t new_i_size) +{ + struct afs_vnode *vnode = AFS_FS_I(file_inode(file)); + loff_t i_size; + + write_seqlock(&vnode->cb_lock); + i_size = 
i_size_read(&vnode->vfs_inode); + if (new_i_size > i_size) + i_size_write(&vnode->vfs_inode, new_i_size); + write_sequnlock(&vnode->cb_lock); + fscache_update_cookie(afs_vnode_cache(vnode), NULL, &new_i_size); +} + const struct netfs_request_ops afs_req_ops = { .init_rreq = afs_init_rreq, .begin_cache_operation = afs_begin_cache_operation, .check_write_begin = afs_check_write_begin, .issue_op = afs_req_issue_op, .cleanup = afs_priv_cleanup, + .init_dirty_region = afs_init_dirty_region, + .free_dirty_region = afs_free_dirty_region, + .update_i_size = afs_update_i_size, }; int afs_write_inode(struct inode *inode, struct writeback_control *wbc) diff --git a/fs/afs/internal.h b/fs/afs/internal.h index e0204dde4b50..0d01ed2fe8fa 100644 --- a/fs/afs/internal.h +++ b/fs/afs/internal.h @@ -1511,15 +1511,8 @@ extern int afs_check_volume_status(struct afs_volume *, struct afs_operation *); * write.c */ extern int afs_set_page_dirty(struct page *); -extern int afs_write_begin(struct file *file, struct address_space *mapping, - loff_t pos, unsigned len, unsigned flags, - struct page **pagep, void **fsdata); -extern int afs_write_end(struct file *file, struct address_space *mapping, - loff_t pos, unsigned len, unsigned copied, - struct page *page, void *fsdata); extern int afs_writepage(struct page *, struct writeback_control *); extern int afs_writepages(struct address_space *, struct writeback_control *); -extern ssize_t afs_file_write(struct kiocb *, struct iov_iter *); extern int afs_fsync(struct file *, loff_t, loff_t, int); extern vm_fault_t afs_page_mkwrite(struct vm_fault *vmf); extern void afs_prune_wb_keys(struct afs_vnode *); diff --git a/fs/afs/write.c b/fs/afs/write.c index a244187f3503..e6e2e924c8ae 100644 --- a/fs/afs/write.c +++ b/fs/afs/write.c @@ -27,152 +27,6 @@ int afs_set_page_dirty(struct page *page) return fscache_set_page_dirty(page, afs_vnode_cache(AFS_FS_I(page->mapping->host))); } -/* - * Prepare to perform part of a write to a page. Note that len may extend - * beyond the end of the page. - */ -int afs_write_begin(struct file *file, struct address_space *mapping, - loff_t pos, unsigned len, unsigned flags, - struct page **_page, void **fsdata) -{ - struct afs_vnode *vnode = AFS_FS_I(file_inode(file)); - struct page *page; - unsigned long priv; - unsigned f, from; - unsigned t, to; - int ret; - - _enter("{%llx:%llu},%llx,%x", - vnode->fid.vid, vnode->fid.vnode, pos, len); - - /* Prefetch area to be written into the cache if we're caching this - * file. We need to do this before we get a lock on the page in case - * there's more than one writer competing for the same cache block. - */ - ret = netfs_write_begin(file, mapping, pos, len, flags, &page, fsdata); - if (ret < 0) - return ret; - - from = offset_in_thp(page, pos); - len = min_t(size_t, len, thp_size(page) - from); - to = from + len; - -try_again: - /* See if this page is already partially written in a way that we can - * merge the new write with. - */ - if (PagePrivate(page)) { - priv = page_private(page); - f = afs_page_dirty_from(page, priv); - t = afs_page_dirty_to(page, priv); - ASSERTCMP(f, <=, t); - - if (PageWriteback(page)) { - trace_afs_page_dirty(vnode, tracepoint_string("alrdy"), page); - goto flush_conflicting_write; - } - /* If the file is being filled locally, allow inter-write - * spaces to be merged into writes. If it's not, only write - * back what the user gives us. 
- */ - if (!test_bit(NETFS_ICTX_NEW_CONTENT, &vnode->netfs_ctx.flags) && - (to < f || from > t)) - goto flush_conflicting_write; - } - - *_page = find_subpage(page, pos / PAGE_SIZE); - _leave(" = 0"); - return 0; - - /* The previous write and this write aren't adjacent or overlapping, so - * flush the page out. - */ -flush_conflicting_write: - _debug("flush conflict"); - ret = write_one_page(page); - if (ret < 0) - goto error; - - ret = lock_page_killable(page); - if (ret < 0) - goto error; - goto try_again; - -error: - put_page(page); - _leave(" = %d", ret); - return ret; -} - -/* - * Finalise part of a write to a page. Note that len may extend beyond the end - * of the page. - */ -int afs_write_end(struct file *file, struct address_space *mapping, - loff_t pos, unsigned len, unsigned copied, - struct page *subpage, void *fsdata) -{ - struct afs_vnode *vnode = AFS_FS_I(file_inode(file)); - struct page *page = thp_head(subpage); - unsigned long priv; - unsigned int f, from = offset_in_thp(page, pos); - unsigned int t, to = from + copied; - loff_t i_size, write_end_pos; - - _enter("{%llx:%llu},{%lx}", - vnode->fid.vid, vnode->fid.vnode, page->index); - - len = min_t(size_t, len, thp_size(page) - from); - if (!PageUptodate(page)) { - if (copied < len) { - copied = 0; - goto out; - } - - SetPageUptodate(page); - } - - if (copied == 0) - goto out; - - write_end_pos = pos + copied; - - i_size = i_size_read(&vnode->vfs_inode); - if (write_end_pos > i_size) { - write_seqlock(&vnode->cb_lock); - i_size = i_size_read(&vnode->vfs_inode); - if (write_end_pos > i_size) - i_size_write(&vnode->vfs_inode, write_end_pos); - write_sequnlock(&vnode->cb_lock); - fscache_update_cookie(afs_vnode_cache(vnode), NULL, &write_end_pos); - } - - if (PagePrivate(page)) { - priv = page_private(page); - f = afs_page_dirty_from(page, priv); - t = afs_page_dirty_to(page, priv); - if (from < f) - f = from; - if (to > t) - t = to; - priv = afs_page_dirty(page, f, t); - set_page_private(page, priv); - trace_afs_page_dirty(vnode, tracepoint_string("dirty+"), page); - } else { - priv = afs_page_dirty(page, from, to); - attach_page_private(page, (void *)priv); - trace_afs_page_dirty(vnode, tracepoint_string("dirty"), page); - } - - if (set_page_dirty(page)) - _debug("dirtied %lx", page->index); - -out: - unlock_page(page); - put_page(page); - return copied; -} - /* * kill all the pages in the given range */ @@ -812,26 +666,6 @@ int afs_writepages(struct address_space *mapping, return ret; } -/* - * write to an AFS file - */ -ssize_t afs_file_write(struct kiocb *iocb, struct iov_iter *from) -{ - struct afs_vnode *vnode = AFS_FS_I(file_inode(iocb->ki_filp)); - size_t count = iov_iter_count(from); - - _enter("{%llx:%llu},{%zu},", - vnode->fid.vid, vnode->fid.vnode, count); - - if (IS_SWAPFILE(&vnode->vfs_inode)) { - printk(KERN_INFO - "AFS: Attempt to write to active swap file!\n"); - return -EBUSY; - } - - return generic_file_write_iter(iocb, from); -} - /* * flush any dirty pages for this process, and check for write errors. 
* - the return status from this call provides a reliable indication of diff --git a/fs/netfs/Makefile b/fs/netfs/Makefile index c15bfc966d96..3e11453ad2c5 100644 --- a/fs/netfs/Makefile +++ b/fs/netfs/Makefile @@ -1,5 +1,11 @@ # SPDX-License-Identifier: GPL-2.0 -netfs-y := read_helper.o stats.o +netfs-y := \ + objects.o \ + read_helper.o \ + write_helper.o +# dio_helper.o + +netfs-$(CONFIG_NETFS_STATS) += stats.o obj-$(CONFIG_NETFS_SUPPORT) := netfs.o diff --git a/fs/netfs/dio_helper.c b/fs/netfs/dio_helper.c new file mode 100644 index 000000000000..3072de344601 --- /dev/null +++ b/fs/netfs/dio_helper.c @@ -0,0 +1,140 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* Network filesystem high-level DIO support. + * + * Copyright (C) 2021 Red Hat, Inc. All Rights Reserved. + * Written by David Howells (dhowells@redhat.com) + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "internal.h" +#include + +/* + * Perform a direct I/O write to a netfs server. + */ +ssize_t netfs_file_direct_write(struct netfs_dirty_region *region, + struct kiocb *iocb, struct iov_iter *from) +{ + struct file *file = iocb->ki_filp; + struct address_space *mapping = file->f_mapping; + struct inode *inode = mapping->host; + loff_t pos = iocb->ki_pos, last; + ssize_t written; + size_t write_len; + pgoff_t end; + int ret; + + write_len = iov_iter_count(from); + last = pos + write_len - 1; + end = to >> PAGE_SHIFT; + + if (iocb->ki_flags & IOCB_NOWAIT) { + /* If there are pages to writeback, return */ + if (filemap_range_has_page(file->f_mapping, pos, last)) + return -EAGAIN; + } else { + ret = filemap_write_and_wait_range(mapping, pos, last); + if (ret) + return ret; + } + + /* After a write we want buffered reads to be sure to go to disk to get + * the new data. We invalidate clean cached page from the region we're + * about to write. We do this *before* the write so that we can return + * without clobbering -EIOCBQUEUED from ->direct_IO(). + */ + ret = invalidate_inode_pages2_range(mapping, pos >> PAGE_SHIFT, end); + if (ret) { + /* If the page can't be invalidated, return 0 to fall back to + * buffered write. + */ + return ret == -EBUSY ? 0 : ret; + } + + written = mapping->a_ops->direct_IO(iocb, from); + + /* Finally, try again to invalidate clean pages which might have been + * cached by non-direct readahead, or faulted in by get_user_pages() + * if the source of the write was an mmap'ed region of the file + * we're writing. Either one is a pretty crazy thing to do, + * so we don't support it 100%. If this invalidation + * fails, tough, the write still worked... + * + * Most of the time we do not need this since dio_complete() will do + * the invalidation for us. However there are some file systems that + * do not end up with dio_complete() being called, so let's not break + * them by removing it completely. + * + * Noticeable example is a blkdev_direct_IO(). + * + * Skip invalidation for async writes or if mapping has no pages. + */ + if (written > 0 && mapping->nrpages && + invalidate_inode_pages2_range(mapping, pos >> PAGE_SHIFT, end)) + dio_warn_stale_pagecache(file); + + if (written > 0) { + pos += written; + write_len -= written; + if (pos > i_size_read(inode) && !S_ISBLK(inode->i_mode)) { + i_size_write(inode, pos); + mark_inode_dirty(inode); + } + iocb->ki_pos = pos; + } + if (written != -EIOCBQUEUED) + iov_iter_revert(from, write_len - iov_iter_count(from)); +out: +#if 0 + /* + * If the write stopped short of completing, fall back to + * buffered writes. 
Some filesystems do this for writes to + * holes, for example. For DAX files, a buffered write will + * not succeed (even if it did, DAX does not handle dirty + * page-cache pages correctly). + */ + if (written < 0 || !iov_iter_count(from) || IS_DAX(inode)) + goto out; + + status = netfs_perform_write(region, file, from, pos = iocb->ki_pos); + /* + * If generic_perform_write() returned a synchronous error + * then we want to return the number of bytes which were + * direct-written, or the error code if that was zero. Note + * that this differs from normal direct-io semantics, which + * will return -EFOO even if some bytes were written. + */ + if (unlikely(status < 0)) { + err = status; + goto out; + } + /* + * We need to ensure that the page cache pages are written to + * disk and invalidated to preserve the expected O_DIRECT + * semantics. + */ + endbyte = pos + status - 1; + err = filemap_write_and_wait_range(mapping, pos, endbyte); + if (err == 0) { + iocb->ki_pos = endbyte + 1; + written += status; + invalidate_mapping_pages(mapping, + pos >> PAGE_SHIFT, + endbyte >> PAGE_SHIFT); + } else { + /* + * We don't know how much we wrote, so just return + * the number of bytes which were direct-written + */ + } +#endif + return written; +} diff --git a/fs/netfs/internal.h b/fs/netfs/internal.h index 4805d9fc8808..77ceab694348 100644 --- a/fs/netfs/internal.h +++ b/fs/netfs/internal.h @@ -15,11 +15,41 @@ #define pr_fmt(fmt) "netfs: " fmt +/* + * dio_helper.c + */ +ssize_t netfs_file_direct_write(struct netfs_dirty_region *region, + struct kiocb *iocb, struct iov_iter *from); + +/* + * objects.c + */ +struct netfs_flush_group *netfs_get_flush_group(struct netfs_flush_group *group); +void netfs_put_flush_group(struct netfs_flush_group *group); +struct netfs_dirty_region *netfs_alloc_dirty_region(void); +struct netfs_dirty_region *netfs_get_dirty_region(struct netfs_i_context *ctx, + struct netfs_dirty_region *region, + enum netfs_region_trace what); +void netfs_free_dirty_region(struct netfs_i_context *ctx, struct netfs_dirty_region *region); +void netfs_put_dirty_region(struct netfs_i_context *ctx, + struct netfs_dirty_region *region, + enum netfs_region_trace what); + /* * read_helper.c */ extern unsigned int netfs_debug; +int netfs_prefetch_for_write(struct file *file, struct page *page, loff_t pos, size_t len, + bool always_fill); + +/* + * write_helper.c + */ +void netfs_flush_region(struct netfs_i_context *ctx, + struct netfs_dirty_region *region, + enum netfs_dirty_trace why); + /* * stats.c */ @@ -42,6 +72,8 @@ extern atomic_t netfs_n_rh_write_begin; extern atomic_t netfs_n_rh_write_done; extern atomic_t netfs_n_rh_write_failed; extern atomic_t netfs_n_rh_write_zskip; +extern atomic_t netfs_n_wh_region; +extern atomic_t netfs_n_wh_flush_group; static inline void netfs_stat(atomic_t *stat) diff --git a/fs/netfs/objects.c b/fs/netfs/objects.c new file mode 100644 index 000000000000..ba1e052aa352 --- /dev/null +++ b/fs/netfs/objects.c @@ -0,0 +1,113 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* Object lifetime handling and tracing. + * + * Copyright (C) 2021 Red Hat, Inc. All Rights Reserved. + * Written by David Howells (dhowells@redhat.com) + */ + +#include +#include +#include +#include +#include +#include +#include "internal.h" + +/** + * netfs_new_flush_group - Create a new write flush group + * @inode: The inode for which this is a flush group. 
+ * @netfs_priv: Netfs private data to include in the new group + * + * Create a new flush group and add it to the tail of the inode's group list. + * Flush groups are used to control the order in which dirty data is written + * back to the server. + * + * The caller must hold ctx->lock. + */ +struct netfs_flush_group *netfs_new_flush_group(struct inode *inode, void *netfs_priv) +{ + struct netfs_flush_group *group; + struct netfs_i_context *ctx = netfs_i_context(inode); + + group = kzalloc(sizeof(*group), GFP_KERNEL); + if (group) { + group->netfs_priv = netfs_priv; + INIT_LIST_HEAD(&group->region_list); + refcount_set(&group->ref, 1); + netfs_stat(&netfs_n_wh_flush_group); + list_add_tail(&group->group_link, &ctx->flush_groups); + } + return group; +} +EXPORT_SYMBOL(netfs_new_flush_group); + +struct netfs_flush_group *netfs_get_flush_group(struct netfs_flush_group *group) +{ + refcount_inc(&group->ref); + return group; +} + +void netfs_put_flush_group(struct netfs_flush_group *group) +{ + if (group && refcount_dec_and_test(&group->ref)) { + netfs_stat_d(&netfs_n_wh_flush_group); + kfree(group); + } +} + +struct netfs_dirty_region *netfs_alloc_dirty_region(void) +{ + struct netfs_dirty_region *region; + + region = kzalloc(sizeof(struct netfs_dirty_region), GFP_KERNEL); + if (region) + netfs_stat(&netfs_n_wh_region); + return region; +} + +struct netfs_dirty_region *netfs_get_dirty_region(struct netfs_i_context *ctx, + struct netfs_dirty_region *region, + enum netfs_region_trace what) +{ + int ref; + + __refcount_inc(®ion->ref, &ref); + trace_netfs_ref_region(region->debug_id, ref + 1, what); + return region; +} + +void netfs_free_dirty_region(struct netfs_i_context *ctx, + struct netfs_dirty_region *region) +{ + if (region) { + trace_netfs_ref_region(region->debug_id, 0, netfs_region_trace_free); + if (ctx->ops->free_dirty_region) + ctx->ops->free_dirty_region(region); + netfs_put_flush_group(region->group); + netfs_stat_d(&netfs_n_wh_region); + kfree(region); + } +} + +void netfs_put_dirty_region(struct netfs_i_context *ctx, + struct netfs_dirty_region *region, + enum netfs_region_trace what) +{ + bool dead; + int ref; + + if (!region) + return; + dead = __refcount_dec_and_test(®ion->ref, &ref); + trace_netfs_ref_region(region->debug_id, ref - 1, what); + if (dead) { + if (!list_empty(®ion->active_link) || + !list_empty(®ion->dirty_link)) { + spin_lock(&ctx->lock); + list_del_init(®ion->active_link); + list_del_init(®ion->dirty_link); + spin_unlock(&ctx->lock); + } + netfs_free_dirty_region(ctx, region); + } +} diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c index aa98ecf6df6b..bfcdbbd32f4c 100644 --- a/fs/netfs/read_helper.c +++ b/fs/netfs/read_helper.c @@ -1321,3 +1321,97 @@ int netfs_write_begin(struct file *file, struct address_space *mapping, return ret; } EXPORT_SYMBOL(netfs_write_begin); + +/* + * Preload the data into a page we're proposing to write into. + */ +int netfs_prefetch_for_write(struct file *file, struct page *page, + loff_t pos, size_t len, bool always_fill) +{ + struct address_space *mapping = page_file_mapping(page); + struct netfs_read_request *rreq; + struct netfs_i_context *ctx = netfs_i_context(mapping->host); + struct page *xpage; + unsigned int debug_index = 0; + int ret; + + DEFINE_READAHEAD(ractl, file, NULL, mapping, page_index(page)); + + /* If the page is beyond the EOF, we want to clear it - unless it's + * within the cache granule containing the EOF, in which case we need + * to preload the granule. 
+ */ + if (!netfs_is_cache_enabled(mapping->host)) { + if (netfs_skip_page_read(page, pos, len, always_fill)) { + netfs_stat(&netfs_n_rh_write_zskip); + ret = 0; + goto error; + } + } + + ret = -ENOMEM; + rreq = netfs_alloc_read_request(mapping, file); + if (!rreq) + goto error; + rreq->start = page_offset(page); + rreq->len = thp_size(page); + rreq->no_unlock_page = page_file_offset(page); + __set_bit(NETFS_RREQ_NO_UNLOCK_PAGE, &rreq->flags); + + if (ctx->ops->begin_cache_operation) { + ret = ctx->ops->begin_cache_operation(rreq); + if (ret == -ENOMEM || ret == -EINTR || ret == -ERESTARTSYS) + goto error_put; + } + + netfs_stat(&netfs_n_rh_write_begin); + trace_netfs_read(rreq, pos, len, netfs_read_trace_prefetch_for_write); + + /* Expand the request to meet caching requirements and download + * preferences. + */ + ractl._nr_pages = thp_nr_pages(page); + netfs_rreq_expand(rreq, &ractl); + + /* Set up the output buffer */ + ret = netfs_rreq_set_up_buffer(rreq, &ractl, page, + readahead_index(&ractl), readahead_count(&ractl)); + if (ret < 0) { + while ((xpage = readahead_page(&ractl))) + if (xpage != page) + put_page(xpage); + goto error_put; + } + + netfs_get_read_request(rreq); + atomic_set(&rreq->nr_rd_ops, 1); + do { + if (!netfs_rreq_submit_slice(rreq, &debug_index)) + break; + + } while (rreq->submitted < rreq->len); + + /* Keep nr_rd_ops incremented so that the ref always belongs to us, and + * the service code isn't punted off to a random thread pool to + * process. + */ + for (;;) { + wait_var_event(&rreq->nr_rd_ops, atomic_read(&rreq->nr_rd_ops) == 1); + netfs_rreq_assess(rreq, false); + if (!test_bit(NETFS_RREQ_IN_PROGRESS, &rreq->flags)) + break; + cond_resched(); + } + + ret = rreq->error; + if (ret == 0 && rreq->submitted < rreq->len) { + trace_netfs_failure(rreq, NULL, ret, netfs_fail_short_write_begin); + ret = -EIO; + } + +error_put: + netfs_put_read_request(rreq, false); +error: + _leave(" = %d", ret); + return ret; +} diff --git a/fs/netfs/stats.c b/fs/netfs/stats.c index 5510a7a14a40..7c079ca47b5b 100644 --- a/fs/netfs/stats.c +++ b/fs/netfs/stats.c @@ -27,6 +27,8 @@ atomic_t netfs_n_rh_write_begin; atomic_t netfs_n_rh_write_done; atomic_t netfs_n_rh_write_failed; atomic_t netfs_n_rh_write_zskip; +atomic_t netfs_n_wh_region; +atomic_t netfs_n_wh_flush_group; void netfs_stats_show(struct seq_file *m) { @@ -54,5 +56,8 @@ void netfs_stats_show(struct seq_file *m) atomic_read(&netfs_n_rh_write), atomic_read(&netfs_n_rh_write_done), atomic_read(&netfs_n_rh_write_failed)); + seq_printf(m, "WrHelp : R=%u F=%u\n", + atomic_read(&netfs_n_wh_region), + atomic_read(&netfs_n_wh_flush_group)); } EXPORT_SYMBOL(netfs_stats_show); diff --git a/fs/netfs/write_helper.c b/fs/netfs/write_helper.c new file mode 100644 index 000000000000..a8c58eaa84d0 --- /dev/null +++ b/fs/netfs/write_helper.c @@ -0,0 +1,908 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* Network filesystem high-level write support. + * + * Copyright (C) 2021 Red Hat, Inc. All Rights Reserved. + * Written by David Howells (dhowells@redhat.com) + */ + +#include +#include +#include +#include +#include +#include +#include "internal.h" + +static atomic_t netfs_region_debug_ids; + +static bool __overlaps(loff_t start1, loff_t end1, loff_t start2, loff_t end2) +{ + return (start1 < start2) ? 
end1 > start2 : end2 > start1; +} + +static bool overlaps(struct netfs_range *a, struct netfs_range *b) +{ + return __overlaps(a->start, a->end, b->start, b->end); +} + +static int wait_on_region(struct netfs_dirty_region *region, + enum netfs_region_state state) +{ + return wait_var_event_interruptible(®ion->state, + READ_ONCE(region->state) >= state); +} + +/* + * Grab a page for writing. We don't lock it at this point as we have yet to + * preemptively trigger a fault-in - but we need to know how large the page + * will be before we try that. + */ +static struct page *netfs_grab_page_for_write(struct address_space *mapping, + loff_t pos, size_t len_remaining) +{ + struct page *page; + int fgp_flags = FGP_LOCK | FGP_WRITE | FGP_CREAT; + + page = pagecache_get_page(mapping, pos >> PAGE_SHIFT, fgp_flags, + mapping_gfp_mask(mapping)); + if (!page) + return ERR_PTR(-ENOMEM); + wait_for_stable_page(page); + return page; +} + +/* + * Initialise a new dirty page group. The caller is responsible for setting + * the type and any flags that they want. + */ +static void netfs_init_dirty_region(struct netfs_dirty_region *region, + struct inode *inode, struct file *file, + enum netfs_region_type type, + unsigned long flags, + loff_t start, loff_t end) +{ + struct netfs_flush_group *group; + struct netfs_i_context *ctx = netfs_i_context(inode); + + region->state = NETFS_REGION_IS_PENDING; + region->type = type; + region->flags = flags; + region->reserved.start = start; + region->reserved.end = end; + region->dirty.start = start; + region->dirty.end = start; + region->bounds.start = round_down(start, ctx->bsize); + region->bounds.end = round_up(end, ctx->bsize); + region->i_size = i_size_read(inode); + region->debug_id = atomic_inc_return(&netfs_region_debug_ids); + INIT_LIST_HEAD(®ion->active_link); + INIT_LIST_HEAD(®ion->dirty_link); + INIT_LIST_HEAD(®ion->flush_link); + refcount_set(®ion->ref, 1); + spin_lock_init(®ion->lock); + if (file && ctx->ops->init_dirty_region) + ctx->ops->init_dirty_region(region, file); + if (!region->group) { + group = list_last_entry(&ctx->flush_groups, + struct netfs_flush_group, group_link); + region->group = netfs_get_flush_group(group); + list_add_tail(®ion->flush_link, &group->region_list); + } + trace_netfs_ref_region(region->debug_id, 1, netfs_region_trace_new); + trace_netfs_dirty(ctx, region, NULL, netfs_dirty_trace_new); +} + +/* + * Queue a region for flushing. Regions may need to be flushed in the right + * order (e.g. ceph snaps) and so we may need to chuck other regions onto the + * flush queue first. + * + * The caller must hold ctx->lock. + */ +void netfs_flush_region(struct netfs_i_context *ctx, + struct netfs_dirty_region *region, + enum netfs_dirty_trace why) +{ + struct netfs_flush_group *group; + + LIST_HEAD(flush_queue); + + kenter("%x", region->debug_id); + + if (test_bit(NETFS_REGION_FLUSH_Q, ®ion->flags) || + region->group->flush) + return; + + trace_netfs_dirty(ctx, region, NULL, why); + + /* If the region isn't in the bottom flush group, we need to flush out + * all of the flush groups below it. 
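To put numbers on the bounding-box rounding done in netfs_init_dirty_region() above, assume ctx->bsize is 64KiB (one content-encryption block, say) and a 3-byte write lands at position 70000:

	reserved = [70000, 70003)	exclusion range claimed for the write
	dirty    = [70000, 70000)	initially empty; dirty.end advances as data is copied in
	bounds   = [round_down(70000, 65536), round_up(70003, 65536))
	         = [65536, 131072)	one whole 64KiB block

The bounding box is widened to the block containing the write so that the whole block can be read, modified and pushed back out as a unit.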
+ */ + while (!list_is_first(®ion->group->group_link, &ctx->flush_groups)) { + group = list_first_entry(&ctx->flush_groups, + struct netfs_flush_group, group_link); + group->flush = true; + list_del_init(&group->group_link); + list_splice_tail_init(&group->region_list, &ctx->flush_queue); + netfs_put_flush_group(group); + } + + set_bit(NETFS_REGION_FLUSH_Q, ®ion->flags); + list_move_tail(®ion->flush_link, &ctx->flush_queue); +} + +/* + * Decide if/how a write can be merged with a dirty region. + */ +static enum netfs_write_compatibility netfs_write_compatibility( + struct netfs_i_context *ctx, + struct netfs_dirty_region *old, + struct netfs_dirty_region *candidate) +{ + if (old->type == NETFS_REGION_DIO || + old->type == NETFS_REGION_DSYNC || + old->state >= NETFS_REGION_IS_FLUSHING || + /* The bounding boxes of DSYNC writes can overlap with those of + * other DSYNC writes and ordinary writes. + */ + candidate->group != old->group || + old->group->flush) + return NETFS_WRITES_INCOMPATIBLE; + if (!ctx->ops->is_write_compatible) { + if (candidate->type == NETFS_REGION_DSYNC) + return NETFS_WRITES_SUPERSEDE; + return NETFS_WRITES_COMPATIBLE; + } + return ctx->ops->is_write_compatible(ctx, old, candidate); +} + +/* + * Split a dirty region. + */ +static struct netfs_dirty_region *netfs_split_dirty_region( + struct netfs_i_context *ctx, + struct netfs_dirty_region *region, + struct netfs_dirty_region **spare, + unsigned long long pos) +{ + struct netfs_dirty_region *tail = *spare; + + *spare = NULL; + *tail = *region; + region->dirty.end = pos; + tail->dirty.start = pos; + tail->debug_id = atomic_inc_return(&netfs_region_debug_ids); + + refcount_set(&tail->ref, 1); + INIT_LIST_HEAD(&tail->active_link); + netfs_get_flush_group(tail->group); + spin_lock_init(&tail->lock); + // TODO: grab cache resources + + // need to split the bounding box? + __set_bit(NETFS_REGION_SUPERSEDED, &tail->flags); + if (ctx->ops->split_dirty_region) + ctx->ops->split_dirty_region(tail); + list_add(&tail->dirty_link, ®ion->dirty_link); + list_add(&tail->flush_link, ®ion->flush_link); + trace_netfs_dirty(ctx, tail, region, netfs_dirty_trace_split); + return tail; +} + +/* + * Queue a write for access to the pagecache. The caller must hold ctx->lock. + * The NETFS_REGION_PENDING flag will be cleared when it's possible to proceed. + */ +static void netfs_queue_write(struct netfs_i_context *ctx, + struct netfs_dirty_region *candidate) +{ + struct netfs_dirty_region *r; + struct list_head *p; + + /* We must wait for any overlapping pending writes */ + list_for_each_entry(r, &ctx->pending_writes, active_link) { + if (overlaps(&candidate->bounds, &r->bounds)) { + if (overlaps(&candidate->reserved, &r->reserved) || + netfs_write_compatibility(ctx, r, candidate) == + NETFS_WRITES_INCOMPATIBLE) + goto add_to_pending_queue; + } + } + + /* We mustn't let the request overlap with the reservation of any other + * active writes, though it can overlap with a bounding box if the + * writes are compatible. 
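Where the default compatibility rules in netfs_write_compatibility() above are not sufficient, the filesystem can supply ->is_write_compatible(). A purely hypothetical hook, refusing to merge writes made with different credentials (using the netfs_priv that ->init_dirty_region() stashed in each region), might look like this:

	static enum netfs_write_compatibility
	myfs_is_write_compatible(struct netfs_i_context *ctx,
				 struct netfs_dirty_region *old,
				 struct netfs_dirty_region *candidate)
	{
		/* Writes made under different credentials must go to the
		 * server separately.
		 */
		if (old->netfs_priv != candidate->netfs_priv)
			return NETFS_WRITES_INCOMPATIBLE;
		/* Mirror the default: a DSYNC write supersedes what it overlaps */
		if (candidate->type == NETFS_REGION_DSYNC)
			return NETFS_WRITES_SUPERSEDE;
		return NETFS_WRITES_COMPATIBLE;
	}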
+ */ + list_for_each(p, &ctx->active_writes) { + r = list_entry(p, struct netfs_dirty_region, active_link); + if (r->bounds.end <= candidate->bounds.start) + continue; + if (r->bounds.start >= candidate->bounds.end) + break; + if (overlaps(&candidate->bounds, &r->bounds)) { + if (overlaps(&candidate->reserved, &r->reserved) || + netfs_write_compatibility(ctx, r, candidate) == + NETFS_WRITES_INCOMPATIBLE) + goto add_to_pending_queue; + } + } + + /* We can install the record in the active list to reserve our slot */ + list_add(&candidate->active_link, p); + + /* Okay, we've reserved our slot in the active queue */ + smp_store_release(&candidate->state, NETFS_REGION_IS_RESERVED); + trace_netfs_dirty(ctx, candidate, NULL, netfs_dirty_trace_reserved); + wake_up_var(&candidate->state); + kleave(" [go]"); + return; + +add_to_pending_queue: + /* We get added to the pending list and then we have to wait */ + list_add(&candidate->active_link, &ctx->pending_writes); + trace_netfs_dirty(ctx, candidate, NULL, netfs_dirty_trace_wait_pend); + kleave(" [wait pend]"); +} + +/* + * Make sure there's a flush group. + */ +static int netfs_require_flush_group(struct inode *inode) +{ + struct netfs_flush_group *group; + struct netfs_i_context *ctx = netfs_i_context(inode); + + if (list_empty(&ctx->flush_groups)) { + kdebug("new flush group"); + group = netfs_new_flush_group(inode, NULL); + if (!group) + return -ENOMEM; + } + return 0; +} + +/* + * Create a dirty region record for the write we're about to do and add it to + * the list of regions. We may need to wait for conflicting writes to + * complete. + */ +static struct netfs_dirty_region *netfs_prepare_region(struct inode *inode, + struct file *file, + loff_t start, size_t len, + enum netfs_region_type type, + unsigned long flags) +{ + struct netfs_dirty_region *candidate; + struct netfs_i_context *ctx = netfs_i_context(inode); + loff_t end = start + len; + int ret; + + ret = netfs_require_flush_group(inode); + if (ret < 0) + return ERR_PTR(ret); + + candidate = netfs_alloc_dirty_region(); + if (!candidate) + return ERR_PTR(-ENOMEM); + + netfs_init_dirty_region(candidate, inode, file, type, flags, start, end); + + spin_lock(&ctx->lock); + netfs_queue_write(ctx, candidate); + spin_unlock(&ctx->lock); + return candidate; +} + +/* + * Activate a write. This adds it to the dirty list and does any necessary + * flushing and superceding there. The caller must provide a spare region + * record so that we can split a dirty record if we need to supersede it. + */ +static void __netfs_activate_write(struct netfs_i_context *ctx, + struct netfs_dirty_region *candidate, + struct netfs_dirty_region **spare) +{ + struct netfs_dirty_region *r; + struct list_head *p; + enum netfs_write_compatibility comp; + bool conflicts = false; + + /* See if there are any dirty regions that need flushing first. 
*/ + list_for_each(p, &ctx->dirty_regions) { + r = list_entry(p, struct netfs_dirty_region, dirty_link); + if (r->bounds.end <= candidate->bounds.start) + continue; + if (r->bounds.start >= candidate->bounds.end) + break; + + if (list_empty(&candidate->dirty_link) && + r->dirty.start > candidate->dirty.start) + list_add_tail(&candidate->dirty_link, p); + + comp = netfs_write_compatibility(ctx, r, candidate); + switch (comp) { + case NETFS_WRITES_INCOMPATIBLE: + netfs_flush_region(ctx, r, netfs_dirty_trace_flush_conflict); + conflicts = true; + continue; + + case NETFS_WRITES_SUPERSEDE: + if (!overlaps(&candidate->reserved, &r->dirty)) + continue; + if (r->dirty.start < candidate->dirty.start) { + /* The region overlaps the beginning of our + * region, we split it and mark the overlapping + * part as superseded. We insert ourself + * between. + */ + r = netfs_split_dirty_region(ctx, r, spare, + candidate->reserved.start); + list_add_tail(&candidate->dirty_link, &r->dirty_link); + p = &r->dirty_link; /* Advance the for-loop */ + } else { + /* The region is after ours, so make sure we're + * inserted before it. + */ + if (list_empty(&candidate->dirty_link)) + list_add_tail(&candidate->dirty_link, &r->dirty_link); + set_bit(NETFS_REGION_SUPERSEDED, &r->flags); + trace_netfs_dirty(ctx, candidate, r, netfs_dirty_trace_supersedes); + } + continue; + + case NETFS_WRITES_COMPATIBLE: + continue; + } + } + + if (list_empty(&candidate->dirty_link)) + list_add_tail(&candidate->dirty_link, p); + netfs_get_dirty_region(ctx, candidate, netfs_region_trace_get_dirty); + + if (conflicts) { + /* The caller must wait for the flushes to complete. */ + trace_netfs_dirty(ctx, candidate, NULL, netfs_dirty_trace_wait_active); + kleave(" [wait flush]"); + return; + } + + /* Okay, we're cleared to proceed. */ + smp_store_release(&candidate->state, NETFS_REGION_IS_ACTIVE); + trace_netfs_dirty(ctx, candidate, NULL, netfs_dirty_trace_active); + wake_up_var(&candidate->state); + kleave(" [go]"); + return; +} + +static int netfs_activate_write(struct netfs_i_context *ctx, + struct netfs_dirty_region *region) +{ + struct netfs_dirty_region *spare; + + spare = netfs_alloc_dirty_region(); + if (!spare) + return -ENOMEM; + + spin_lock(&ctx->lock); + __netfs_activate_write(ctx, region, &spare); + spin_unlock(&ctx->lock); + netfs_free_dirty_region(ctx, spare); + return 0; +} + +/* + * Merge a completed active write into the list of dirty regions. The region + * can be in one of a number of states: + * + * - Ordinary write, error, no data copied. Discard. + * - Ordinary write, unflushed. Dirty + * - Ordinary write, flush started. Dirty + * - Ordinary write, completed/failed. Discard. + * - DIO write, completed/failed. Discard. + * - DSYNC write, error before flush. As ordinary. + * - DSYNC write, flushed in progress, EINTR. Dirty (supersede). + * - DSYNC write, written to server and cache. Dirty (supersede)/Discard. + * - DSYNC write, written to server but not yet cache. Dirty. + * + * Once we've dealt with this record, we see about activating some other writes + * to fill the activity hole. + * + * This eats the caller's ref on the region. 
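The >= state comparisons here and in wait_on_region() rely on the region states being ordered; as far as this file goes, the progression appears to be PENDING -> RESERVED -> ACTIVE -> DIRTY -> FLUSHING -> COMPLETE. A region is PENDING while queued behind a conflicting write, RESERVED once it owns a slot on the active list, ACTIVE while data is being copied into the pagecache, DIRTY once committed to the dirty list, FLUSHING while write-back is in progress, and COMPLETE when it can be discarded and its waiters woken.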
+ */ +static void netfs_merge_dirty_region(struct netfs_i_context *ctx, + struct netfs_dirty_region *region) +{ + struct netfs_dirty_region *p, *q, *front; + bool new_content = test_bit(NETFS_ICTX_NEW_CONTENT, &ctx->flags); + LIST_HEAD(graveyard); + + list_del_init(®ion->active_link); + + switch (region->type) { + case NETFS_REGION_DIO: + list_move_tail(®ion->dirty_link, &graveyard); + goto discard; + + case NETFS_REGION_DSYNC: + /* A DSYNC write may have overwritten some dirty data + * and caused the writeback of other dirty data. + */ + goto scan_forwards; + + case NETFS_REGION_ORDINARY: + if (region->dirty.end == region->dirty.start) { + list_move_tail(®ion->dirty_link, &graveyard); + goto discard; + } + goto scan_backwards; + } + +scan_backwards: + kdebug("scan_backwards"); + /* Search backwards for a preceding record that we might be able to + * merge with. We skip over any intervening flush-in-progress records. + */ + p = front = region; + list_for_each_entry_continue_reverse(p, &ctx->dirty_regions, dirty_link) { + kdebug("- back %x", p->debug_id); + if (p->state >= NETFS_REGION_IS_FLUSHING) + continue; + if (p->state == NETFS_REGION_IS_ACTIVE) + break; + if (p->bounds.end < region->bounds.start) + break; + if (p->dirty.end >= region->dirty.start || new_content) + goto merge_backwards; + } + goto scan_forwards; + +merge_backwards: + kdebug("merge_backwards"); + if (test_bit(NETFS_REGION_SUPERSEDED, &p->flags) || + netfs_write_compatibility(ctx, p, region) != NETFS_WRITES_COMPATIBLE) + goto scan_forwards; + + front = p; + front->bounds.end = max(front->bounds.end, region->bounds.end); + front->dirty.end = max(front->dirty.end, region->dirty.end); + set_bit(NETFS_REGION_SUPERSEDED, ®ion->flags); + list_del_init(®ion->flush_link); + trace_netfs_dirty(ctx, front, region, netfs_dirty_trace_merged_back); + +scan_forwards: + /* Subsume forwards any records this one covers. There should be no + * non-supersedeable incompatible regions in our range as we would have + * flushed and waited for them before permitting this write to start. + * + * There can, however, be regions undergoing flushing which we need to + * skip over and not merge with. + */ + kdebug("scan_forwards"); + p = region; + list_for_each_entry_safe_continue(p, q, &ctx->dirty_regions, dirty_link) { + kdebug("- forw %x", p->debug_id); + if (p->state >= NETFS_REGION_IS_FLUSHING) + continue; + if (p->state == NETFS_REGION_IS_ACTIVE) + break; + if (p->dirty.start > region->dirty.end && + (!new_content || p->bounds.start > p->bounds.end)) + break; + + if (region->dirty.end >= p->dirty.end) { + /* Entirely subsumed */ + list_move_tail(&p->dirty_link, &graveyard); + list_del_init(&p->flush_link); + trace_netfs_dirty(ctx, front, p, netfs_dirty_trace_merged_sub); + continue; + } + + goto merge_forwards; + } + goto merge_complete; + +merge_forwards: + kdebug("merge_forwards"); + if (test_bit(NETFS_REGION_SUPERSEDED, &p->flags) || + netfs_write_compatibility(ctx, p, front) == NETFS_WRITES_SUPERSEDE) { + /* If a region was partially superseded by us, we need to roll + * it forwards and remove the superseded flag. + */ + if (p->dirty.start < front->dirty.end) { + p->dirty.start = front->dirty.end; + clear_bit(NETFS_REGION_SUPERSEDED, &p->flags); + } + trace_netfs_dirty(ctx, p, front, netfs_dirty_trace_superseded); + goto merge_complete; + } + + /* Simply merge overlapping/contiguous ordinary areas together. 
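As a concrete case of the forward merge: two compatible ordinary writes in the same flush group with dirty ranges of, say, [0, 100) and [80, 300) collapse into a single region with dirty [0, 300) and the union of the two bounding boxes. If NETFS_ICTX_NEW_CONTENT is set on the inode, even a gap between the two ranges could be bridged, since the missing bytes are known to be zero-filled locally.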
*/ + front->bounds.end = max(front->bounds.end, p->bounds.end); + front->dirty.end = max(front->dirty.end, p->dirty.end); + list_move_tail(&p->dirty_link, &graveyard); + list_del_init(&p->flush_link); + trace_netfs_dirty(ctx, front, p, netfs_dirty_trace_merged_forw); + +merge_complete: + if (test_bit(NETFS_REGION_SUPERSEDED, ®ion->flags)) { + list_move_tail(®ion->dirty_link, &graveyard); + } +discard: + while (!list_empty(&graveyard)) { + p = list_first_entry(&graveyard, struct netfs_dirty_region, dirty_link); + list_del_init(&p->dirty_link); + smp_store_release(&p->state, NETFS_REGION_IS_COMPLETE); + trace_netfs_dirty(ctx, p, NULL, netfs_dirty_trace_complete); + wake_up_var(&p->state); + netfs_put_dirty_region(ctx, p, netfs_region_trace_put_merged); + } +} + +/* + * Start pending writes in a window we've created by the removal of an active + * write. The writes are bundled onto the given queue and it's left as an + * exercise for the caller to actually start them. + */ +static void netfs_start_pending_writes(struct netfs_i_context *ctx, + struct list_head *prev_p, + struct list_head *queue) +{ + struct netfs_dirty_region *prev = NULL, *next = NULL, *p, *q; + struct netfs_range window = { 0, ULLONG_MAX }; + + if (prev_p != &ctx->active_writes) { + prev = list_entry(prev_p, struct netfs_dirty_region, active_link); + window.start = prev->reserved.end; + if (!list_is_last(prev_p, &ctx->active_writes)) { + next = list_next_entry(prev, active_link); + window.end = next->reserved.start; + } + } else if (!list_empty(&ctx->active_writes)) { + next = list_last_entry(&ctx->active_writes, + struct netfs_dirty_region, active_link); + window.end = next->reserved.start; + } + + list_for_each_entry_safe(p, q, &ctx->pending_writes, active_link) { + bool skip = false; + + if (!overlaps(&p->reserved, &window)) + continue; + + /* Narrow the window when we find a region that requires more + * than we can immediately provide. The queue is in submission + * order and we need to prevent starvation. + */ + if (p->type == NETFS_REGION_DIO) { + if (p->bounds.start < window.start) { + window.start = p->bounds.start; + skip = true; + } + if (p->bounds.end > window.end) { + window.end = p->bounds.end; + skip = true; + } + } else { + if (p->reserved.start < window.start) { + window.start = p->reserved.start; + skip = true; + } + if (p->reserved.end > window.end) { + window.end = p->reserved.end; + skip = true; + } + } + if (window.start >= window.end) + break; + if (skip) + continue; + + /* Okay, we have a gap that's large enough to start this write + * in. Make sure it's compatible with any region its bounds + * overlap. + */ + if (prev && + p->bounds.start < prev->bounds.end && + netfs_write_compatibility(ctx, prev, p) == NETFS_WRITES_INCOMPATIBLE) { + window.start = max(window.start, p->bounds.end); + skip = true; + } + + if (next && + p->bounds.end > next->bounds.start && + netfs_write_compatibility(ctx, next, p) == NETFS_WRITES_INCOMPATIBLE) { + window.end = min(window.end, p->bounds.start); + skip = true; + } + if (window.start >= window.end) + break; + if (skip) + continue; + + /* Okay, we can start this write. */ + trace_netfs_dirty(ctx, p, NULL, netfs_dirty_trace_start_pending); + list_move(&p->active_link, + prev ? &prev->active_link : &ctx->pending_writes); + list_add_tail(&p->dirty_link, queue); + if (p->type == NETFS_REGION_DIO) + window.start = p->bounds.end; + else + window.start = p->reserved.end; + prev = p; + } +} + +/* + * We completed the modification phase of a write. 
We need to fix up the dirty + * list, remove this region from the active list and start waiters. + */ +static void netfs_commit_write(struct netfs_i_context *ctx, + struct netfs_dirty_region *region) +{ + struct netfs_dirty_region *p; + struct list_head *prev; + LIST_HEAD(queue); + + spin_lock(&ctx->lock); + smp_store_release(®ion->state, NETFS_REGION_IS_DIRTY); + trace_netfs_dirty(ctx, region, NULL, netfs_dirty_trace_commit); + wake_up_var(®ion->state); + + prev = region->active_link.prev; + netfs_merge_dirty_region(ctx, region); + if (!list_empty(&ctx->pending_writes)) + netfs_start_pending_writes(ctx, prev, &queue); + spin_unlock(&ctx->lock); + + while (!list_empty(&queue)) { + p = list_first_entry(&queue, struct netfs_dirty_region, dirty_link); + list_del_init(&p->dirty_link); + smp_store_release(&p->state, NETFS_REGION_IS_DIRTY); + wake_up_var(&p->state); + } +} + +/* + * Write data into a prereserved region of the pagecache attached to a netfs + * inode. + */ +static ssize_t netfs_perform_write(struct netfs_dirty_region *region, + struct kiocb *iocb, struct iov_iter *i) +{ + struct file *file = iocb->ki_filp; + struct netfs_i_context *ctx = netfs_i_context(file_inode(file)); + struct page *page; + ssize_t written = 0, ret; + loff_t new_pos, i_size; + bool always_fill = false; + + do { + size_t plen; + size_t offset; /* Offset into pagecache page */ + size_t bytes; /* Bytes to write to page */ + size_t copied; /* Bytes copied from user */ + bool relock = false; + + page = netfs_grab_page_for_write(file->f_mapping, region->dirty.end, + iov_iter_count(i)); + if (!page) + return -ENOMEM; + + plen = thp_size(page); + offset = region->dirty.end - page_file_offset(page); + bytes = min_t(size_t, plen - offset, iov_iter_count(i)); + + kdebug("segment %zx @%zx", bytes, offset); + + if (!PageUptodate(page)) { + unlock_page(page); /* Avoid deadlocking fault-in */ + relock = true; + } + + /* Bring in the user page that we will copy from _first_. + * Otherwise there's a nasty deadlock on copying from the + * same page as we're writing to, without it being marked + * up-to-date. + * + * Not only is this an optimisation, but it is also required + * to check that the address is actually valid, when atomic + * usercopies are used, below. + */ + if (unlikely(iov_iter_fault_in_readable(i, bytes))) { + kdebug("fault-in"); + ret = -EFAULT; + goto error_page; + } + + if (fatal_signal_pending(current)) { + ret = -EINTR; + goto error_page; + } + + if (relock) { + ret = lock_page_killable(page); + if (ret < 0) + goto error_page; + } + +redo_prefetch: + /* Prefetch area to be written into the cache if we're caching + * this file. We need to do this before we get a lock on the + * page in case there's more than one writer competing for the + * same cache block. 
+ */ + if (!PageUptodate(page)) { + ret = netfs_prefetch_for_write(file, page, region->dirty.end, + bytes, always_fill); + kdebug("prefetch %zx", ret); + if (ret < 0) + goto error_page; + } + + if (mapping_writably_mapped(page->mapping)) + flush_dcache_page(page); + copied = copy_page_from_iter_atomic(page, offset, bytes, i); + flush_dcache_page(page); + kdebug("copied %zx", copied); + + /* Deal with a (partially) failed copy */ + if (!PageUptodate(page)) { + if (copied == 0) { + ret = -EFAULT; + goto error_page; + } + if (copied < bytes) { + iov_iter_revert(i, copied); + always_fill = true; + goto redo_prefetch; + } + SetPageUptodate(page); + } + + /* Update the inode size if we moved the EOF marker */ + new_pos = region->dirty.end + copied; + i_size = i_size_read(file_inode(file)); + if (new_pos > i_size) { + if (ctx->ops->update_i_size) { + ctx->ops->update_i_size(file, new_pos); + } else { + i_size_write(file_inode(file), new_pos); + fscache_update_cookie(ctx->cache, NULL, &new_pos); + } + } + + /* Update the region appropriately */ + if (i_size > region->i_size) + region->i_size = i_size; + smp_store_release(®ion->dirty.end, new_pos); + + trace_netfs_dirty(ctx, region, NULL, netfs_dirty_trace_modified); + set_page_dirty(page); + unlock_page(page); + put_page(page); + page = NULL; + + cond_resched(); + + written += copied; + + balance_dirty_pages_ratelimited(file->f_mapping); + } while (iov_iter_count(i)); + +out: + if (likely(written)) { + kdebug("written"); + iocb->ki_pos += written; + + /* Flush and wait for a write that requires immediate synchronisation. */ + if (region->type == NETFS_REGION_DSYNC) { + kdebug("dsync"); + spin_lock(&ctx->lock); + netfs_flush_region(ctx, region, netfs_dirty_trace_flush_dsync); + spin_unlock(&ctx->lock); + + ret = wait_on_region(region, NETFS_REGION_IS_COMPLETE); + if (ret < 0) + written = ret; + } + } + + netfs_commit_write(ctx, region); + return written ? written : ret; + +error_page: + unlock_page(page); + put_page(page); + goto out; +} + +/** + * netfs_file_write_iter - write data to a file + * @iocb: IO state structure + * @from: iov_iter with data to write + * + * This is a wrapper around __generic_file_write_iter() to be used by most + * filesystems. It takes care of syncing the file in case of O_SYNC file + * and acquires i_mutex as needed. 
+ * Return: + * * negative error code if no data has been written at all or + * vfs_fsync_range() failed for a synchronous write + * * number of bytes written, even for truncated writes + */ +ssize_t netfs_file_write_iter(struct kiocb *iocb, struct iov_iter *from) +{ + struct netfs_dirty_region *region = NULL; + struct file *file = iocb->ki_filp; + struct inode *inode = file->f_mapping->host; + struct netfs_i_context *ctx = netfs_i_context(inode); + enum netfs_region_type type; + unsigned long flags = 0; + ssize_t ret; + + printk("\n"); + kenter("%llx,%zx,%llx", iocb->ki_pos, iov_iter_count(from), i_size_read(inode)); + + inode_lock(inode); + ret = generic_write_checks(iocb, from); + if (ret <= 0) + goto error_unlock; + + if (iocb->ki_flags & IOCB_DIRECT) + type = NETFS_REGION_DIO; + else if (iocb->ki_flags & IOCB_DSYNC) + type = NETFS_REGION_DSYNC; + else + type = NETFS_REGION_ORDINARY; + if (iocb->ki_flags & IOCB_SYNC) + __set_bit(NETFS_REGION_SYNC, &flags); + + region = netfs_prepare_region(inode, file, iocb->ki_pos, + iov_iter_count(from), type, flags); + if (IS_ERR(region)) { + ret = PTR_ERR(region); + goto error_unlock; + } + + trace_netfs_write_iter(region, iocb, from); + + /* We can write back this queue in page reclaim */ + current->backing_dev_info = inode_to_bdi(inode); + ret = file_remove_privs(file); + if (ret) + goto error_unlock; + + ret = file_update_time(file); + if (ret) + goto error_unlock; + + inode_unlock(inode); + + ret = wait_on_region(region, NETFS_REGION_IS_RESERVED); + if (ret < 0) + goto error; + + ret = netfs_activate_write(ctx, region); + if (ret < 0) + goto error; + + /* The region excludes overlapping writes and is used to synchronise + * versus flushes. + */ + if (iocb->ki_flags & IOCB_DIRECT) + ret = -EOPNOTSUPP; //netfs_file_direct_write(region, iocb, from); + else + ret = netfs_perform_write(region, iocb, from); + +out: + netfs_put_dirty_region(ctx, region, netfs_region_trace_put_write_iter); + current->backing_dev_info = NULL; + return ret; + +error_unlock: + inode_unlock(inode); +error: + if (region) + netfs_commit_write(ctx, region); + goto out; +} +EXPORT_SYMBOL(netfs_file_write_iter); diff --git a/include/linux/netfs.h b/include/linux/netfs.h index 35bcd916c3a0..fc91711d3178 100644 --- a/include/linux/netfs.h +++ b/include/linux/netfs.h @@ -165,17 +165,95 @@ struct netfs_read_request { */ struct netfs_i_context { const struct netfs_request_ops *ops; + struct list_head pending_writes; /* List of writes waiting to begin */ + struct list_head active_writes; /* List of writes being applied */ + struct list_head dirty_regions; /* List of dirty regions in the pagecache */ + struct list_head flush_groups; /* Writeable region ordering queue */ + struct list_head flush_queue; /* Regions that need to be flushed */ #ifdef CONFIG_FSCACHE struct fscache_cookie *cache; #endif unsigned long flags; #define NETFS_ICTX_NEW_CONTENT 0 /* Set if file has new content (create/trunc-0) */ + spinlock_t lock; + unsigned int rsize; /* Maximum read size */ + unsigned int wsize; /* Maximum write size */ + unsigned int bsize; /* Min block size for bounding box */ + unsigned int inval_counter; /* Number of invalidations made */ +}; + +/* + * Descriptor for a set of writes that will need to be flushed together. 
+ */ +struct netfs_flush_group { + struct list_head group_link; /* Link in i_context->flush_groups */ + struct list_head region_list; /* List of regions in this group */ + void *netfs_priv; + refcount_t ref; + bool flush; +}; + +struct netfs_range { + unsigned long long start; /* Start of region */ + unsigned long long end; /* End of region */ +}; + +/* State of a netfs_dirty_region */ +enum netfs_region_state { + NETFS_REGION_IS_PENDING, /* Proposed write is waiting on an active write */ + NETFS_REGION_IS_RESERVED, /* Writable region is reserved, waiting on flushes */ + NETFS_REGION_IS_ACTIVE, /* Write is actively modifying the pagecache */ + NETFS_REGION_IS_DIRTY, /* Region is dirty */ + NETFS_REGION_IS_FLUSHING, /* Region is being flushed */ + NETFS_REGION_IS_COMPLETE, /* Region has been completed (stored/invalidated) */ +} __attribute__((mode(byte))); + +enum netfs_region_type { + NETFS_REGION_ORDINARY, /* Ordinary write */ + NETFS_REGION_DIO, /* Direct I/O write */ + NETFS_REGION_DSYNC, /* O_DSYNC/RWF_DSYNC write */ +} __attribute__((mode(byte))); + +/* + * Descriptor for a dirty region that has a common set of parameters and can + * feasibly be written back in one go. These are held in an ordered list. + * + * Regions are not allowed to overlap, though they may be merged. + */ +struct netfs_dirty_region { + struct netfs_flush_group *group; + struct list_head active_link; /* Link in i_context->pending/active_writes */ + struct list_head dirty_link; /* Link in i_context->dirty_regions */ + struct list_head flush_link; /* Link in group->region_list or + * i_context->flush_queue */ + spinlock_t lock; + void *netfs_priv; /* Private data for the netfs */ + struct netfs_range bounds; /* Bounding box including all affected pages */ + struct netfs_range reserved; /* The region reserved against other writes */ + struct netfs_range dirty; /* The region that has been modified */ + loff_t i_size; /* Size of the file */ + enum netfs_region_type type; + enum netfs_region_state state; + unsigned long flags; +#define NETFS_REGION_SYNC 0 /* Set if metadata sync required (RWF_SYNC) */ +#define NETFS_REGION_FLUSH_Q 1 /* Set if region is on flush queue */ +#define NETFS_REGION_SUPERSEDED 2 /* Set if region is being superseded */ + unsigned int debug_id; + refcount_t ref; +}; + +enum netfs_write_compatibility { + NETFS_WRITES_COMPATIBLE, /* Dirty regions can be directly merged */ + NETFS_WRITES_SUPERSEDE, /* Second write can supersede the first without first + * having to be flushed (eg. authentication, DSYNC) */ + NETFS_WRITES_INCOMPATIBLE, /* Second write must wait for first (eg. DIO, ceph snap) */ }; /* * Operations the network filesystem can/must provide to the helpers. 
*/ struct netfs_request_ops { + /* Read request handling */ void (*init_rreq)(struct netfs_read_request *rreq, struct file *file); int (*begin_cache_operation)(struct netfs_read_request *rreq); void (*expand_readahead)(struct netfs_read_request *rreq); @@ -186,6 +264,17 @@ struct netfs_request_ops { struct page *page, void **_fsdata); void (*done)(struct netfs_read_request *rreq); void (*cleanup)(struct address_space *mapping, void *netfs_priv); + + /* Dirty region handling */ + void (*init_dirty_region)(struct netfs_dirty_region *region, struct file *file); + void (*split_dirty_region)(struct netfs_dirty_region *region); + void (*free_dirty_region)(struct netfs_dirty_region *region); + enum netfs_write_compatibility (*is_write_compatible)( + struct netfs_i_context *ctx, + struct netfs_dirty_region *old_region, + struct netfs_dirty_region *candidate); + bool (*check_compatible_write)(struct netfs_dirty_region *region, struct file *file); + void (*update_i_size)(struct file *file, loff_t i_size); }; /* @@ -234,9 +323,11 @@ extern int netfs_readpage(struct file *, struct page *); extern int netfs_write_begin(struct file *, struct address_space *, loff_t, unsigned int, unsigned int, struct page **, void **); +extern ssize_t netfs_file_write_iter(struct kiocb *iocb, struct iov_iter *from); extern void netfs_subreq_terminated(struct netfs_read_subrequest *, ssize_t, bool); extern void netfs_stats_show(struct seq_file *); +extern struct netfs_flush_group *netfs_new_flush_group(struct inode *, void *); /** * netfs_i_context - Get the netfs inode context from the inode @@ -256,6 +347,13 @@ static inline void netfs_i_context_init(struct inode *inode, struct netfs_i_context *ctx = netfs_i_context(inode); ctx->ops = ops; + ctx->bsize = PAGE_SIZE; + INIT_LIST_HEAD(&ctx->pending_writes); + INIT_LIST_HEAD(&ctx->active_writes); + INIT_LIST_HEAD(&ctx->dirty_regions); + INIT_LIST_HEAD(&ctx->flush_groups); + INIT_LIST_HEAD(&ctx->flush_queue); + spin_lock_init(&ctx->lock); } /** diff --git a/include/trace/events/netfs.h b/include/trace/events/netfs.h index 04ac29fc700f..808433e6ddd3 100644 --- a/include/trace/events/netfs.h +++ b/include/trace/events/netfs.h @@ -23,6 +23,7 @@ enum netfs_read_trace { netfs_read_trace_readahead, netfs_read_trace_readpage, netfs_read_trace_write_begin, + netfs_read_trace_prefetch_for_write, }; enum netfs_rreq_trace { @@ -56,12 +57,43 @@ enum netfs_failure { netfs_fail_prepare_write, }; +enum netfs_dirty_trace { + netfs_dirty_trace_active, + netfs_dirty_trace_commit, + netfs_dirty_trace_complete, + netfs_dirty_trace_flush_conflict, + netfs_dirty_trace_flush_dsync, + netfs_dirty_trace_merged_back, + netfs_dirty_trace_merged_forw, + netfs_dirty_trace_merged_sub, + netfs_dirty_trace_modified, + netfs_dirty_trace_new, + netfs_dirty_trace_reserved, + netfs_dirty_trace_split, + netfs_dirty_trace_start_pending, + netfs_dirty_trace_superseded, + netfs_dirty_trace_supersedes, + netfs_dirty_trace_wait_active, + netfs_dirty_trace_wait_pend, +}; + +enum netfs_region_trace { + netfs_region_trace_get_dirty, + netfs_region_trace_get_wreq, + netfs_region_trace_put_discard, + netfs_region_trace_put_merged, + netfs_region_trace_put_write_iter, + netfs_region_trace_free, + netfs_region_trace_new, +}; + #endif #define netfs_read_traces \ EM(netfs_read_trace_expanded, "EXPANDED ") \ EM(netfs_read_trace_readahead, "READAHEAD") \ EM(netfs_read_trace_readpage, "READPAGE ") \ + EM(netfs_read_trace_prefetch_for_write, "PREFETCHW") \ E_(netfs_read_trace_write_begin, "WRITEBEGN") #define netfs_rreq_traces 
\ @@ -98,6 +130,46 @@ enum netfs_failure { EM(netfs_fail_short_write_begin, "short-write-begin") \ E_(netfs_fail_prepare_write, "prep-write") +#define netfs_region_types \ + EM(NETFS_REGION_ORDINARY, "ORD") \ + EM(NETFS_REGION_DIO, "DIO") \ + E_(NETFS_REGION_DSYNC, "DSY") + +#define netfs_region_states \ + EM(NETFS_REGION_IS_PENDING, "pend") \ + EM(NETFS_REGION_IS_RESERVED, "resv") \ + EM(NETFS_REGION_IS_ACTIVE, "actv") \ + EM(NETFS_REGION_IS_DIRTY, "drty") \ + EM(NETFS_REGION_IS_FLUSHING, "flsh") \ + E_(NETFS_REGION_IS_COMPLETE, "done") + +#define netfs_dirty_traces \ + EM(netfs_dirty_trace_active, "ACTIVE ") \ + EM(netfs_dirty_trace_commit, "COMMIT ") \ + EM(netfs_dirty_trace_complete, "COMPLETE ") \ + EM(netfs_dirty_trace_flush_conflict, "FLSH CONFL") \ + EM(netfs_dirty_trace_flush_dsync, "FLSH DSYNC") \ + EM(netfs_dirty_trace_merged_back, "MERGE BACK") \ + EM(netfs_dirty_trace_merged_forw, "MERGE FORW") \ + EM(netfs_dirty_trace_merged_sub, "SUBSUMED ") \ + EM(netfs_dirty_trace_modified, "MODIFIED ") \ + EM(netfs_dirty_trace_new, "NEW ") \ + EM(netfs_dirty_trace_reserved, "RESERVED ") \ + EM(netfs_dirty_trace_split, "SPLIT ") \ + EM(netfs_dirty_trace_start_pending, "START PEND") \ + EM(netfs_dirty_trace_superseded, "SUPERSEDED") \ + EM(netfs_dirty_trace_supersedes, "SUPERSEDES") \ + EM(netfs_dirty_trace_wait_active, "WAIT ACTV ") \ + E_(netfs_dirty_trace_wait_pend, "WAIT PEND ") + +#define netfs_region_traces \ + EM(netfs_region_trace_get_dirty, "GET DIRTY ") \ + EM(netfs_region_trace_get_wreq, "GET WREQ ") \ + EM(netfs_region_trace_put_discard, "PUT DISCARD") \ + EM(netfs_region_trace_put_merged, "PUT MERGED ") \ + EM(netfs_region_trace_put_write_iter, "PUT WRITER ") \ + EM(netfs_region_trace_free, "FREE ") \ + E_(netfs_region_trace_new, "NEW ") /* * Export enum symbols via userspace. 
@@ -112,6 +184,9 @@ netfs_rreq_traces; netfs_sreq_sources; netfs_sreq_traces; netfs_failures; +netfs_region_types; +netfs_region_states; +netfs_dirty_traces; /* * Now redefine the EM() and E_() macros to map the enums to the strings that @@ -255,6 +330,111 @@ TRACE_EVENT(netfs_failure, __entry->error) ); +TRACE_EVENT(netfs_write_iter, + TP_PROTO(struct netfs_dirty_region *region, struct kiocb *iocb, + struct iov_iter *from), + + TP_ARGS(region, iocb, from), + + TP_STRUCT__entry( + __field(unsigned int, region ) + __field(unsigned long long, start ) + __field(size_t, len ) + __field(unsigned int, flags ) + ), + + TP_fast_assign( + __entry->region = region->debug_id; + __entry->start = iocb->ki_pos; + __entry->len = iov_iter_count(from); + __entry->flags = iocb->ki_flags; + ), + + TP_printk("D=%x WRITE-ITER s=%llx l=%zx f=%x", + __entry->region, __entry->start, __entry->len, __entry->flags) + ); + +TRACE_EVENT(netfs_ref_region, + TP_PROTO(unsigned int region_debug_id, int ref, + enum netfs_region_trace what), + + TP_ARGS(region_debug_id, ref, what), + + TP_STRUCT__entry( + __field(unsigned int, region ) + __field(int, ref ) + __field(enum netfs_region_trace, what ) + ), + + TP_fast_assign( + __entry->region = region_debug_id; + __entry->ref = ref; + __entry->what = what; + ), + + TP_printk("D=%x %s r=%u", + __entry->region, + __print_symbolic(__entry->what, netfs_region_traces), + __entry->ref) + ); + +TRACE_EVENT(netfs_dirty, + TP_PROTO(struct netfs_i_context *ctx, + struct netfs_dirty_region *region, + struct netfs_dirty_region *region2, + enum netfs_dirty_trace why), + + TP_ARGS(ctx, region, region2, why), + + TP_STRUCT__entry( + __field(ino_t, ino ) + __field(unsigned long long, bounds_start ) + __field(unsigned long long, bounds_end ) + __field(unsigned long long, reserved_start ) + __field(unsigned long long, reserved_end ) + __field(unsigned long long, dirty_start ) + __field(unsigned long long, dirty_end ) + __field(unsigned int, debug_id ) + __field(unsigned int, debug_id2 ) + __field(enum netfs_region_type, type ) + __field(enum netfs_region_state, state ) + __field(unsigned short, flags ) + __field(unsigned int, ref ) + __field(enum netfs_dirty_trace, why ) + ), + + TP_fast_assign( + __entry->ino = (((struct inode *)ctx) - 1)->i_ino; + __entry->why = why; + __entry->bounds_start = region->bounds.start; + __entry->bounds_end = region->bounds.end; + __entry->reserved_start = region->reserved.start; + __entry->reserved_end = region->reserved.end; + __entry->dirty_start = region->dirty.start; + __entry->dirty_end = region->dirty.end; + __entry->debug_id = region->debug_id; + __entry->type = region->type; + __entry->state = region->state; + __entry->flags = region->flags; + __entry->debug_id2 = region2 ? 
region2->debug_id : 0; + ), + + TP_printk("i=%lx D=%x %s %s dt=%04llx-%04llx bb=%04llx-%04llx rs=%04llx-%04llx %s f=%x XD=%x", + __entry->ino, __entry->debug_id, + __print_symbolic(__entry->why, netfs_dirty_traces), + __print_symbolic(__entry->type, netfs_region_types), + __entry->dirty_start, + __entry->dirty_end, + __entry->bounds_start, + __entry->bounds_end, + __entry->reserved_start, + __entry->reserved_end, + __print_symbolic(__entry->state, netfs_region_states), + __entry->flags, + __entry->debug_id2 + ) + ); + #endif /* _TRACE_NETFS_H */ /* This part must be outside protection */ From patchwork Wed Jul 21 13:46:18 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 12390951 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-17.2 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 339B4C636CA for ; Wed, 21 Jul 2021 13:47:10 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 1CE9F60FF1 for ; Wed, 21 Jul 2021 13:47:10 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238640AbhGUNGa (ORCPT ); Wed, 21 Jul 2021 09:06:30 -0400 Received: from us-smtp-delivery-124.mimecast.com ([216.205.24.124]:52962 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238779AbhGUNFw (ORCPT ); Wed, 21 Jul 2021 09:05:52 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1626875187; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=YIMMVaqdVhebN2c0WSSt5y4n7OTol1pb5VWKxV3dgW0=; b=cFaDkSjwBoUpP1t3bd7kV/xlAdKt/uKl5Os6WgvX40k/eppSUaTYNkHuYJanmEha0h84le NRyvCuMtnTgL+Pj4+Ri/+70OWF5zvS8w/x4qYzIEzotifq5p96Y/ti7stD8W80Git/cIBQ 6MpXZQpoGkaD126w6eyBhJGVcs/tWQc= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-455-2dRCJwMHMCuYSt_yp_EJGw-1; Wed, 21 Jul 2021 09:46:25 -0400 X-MC-Unique: 2dRCJwMHMCuYSt_yp_EJGw-1 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 5FEE993921; Wed, 21 Jul 2021 13:46:23 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-112-62.rdu2.redhat.com [10.10.112.62]) by smtp.corp.redhat.com (Postfix) with ESMTP id 45A5360583; Wed, 21 Jul 2021 13:46:19 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 
3798903 Subject: [RFC PATCH 07/12] netfs: Initiate write request from a dirty region From: David Howells To: linux-fsdevel@vger.kernel.org Cc: dhowells@redhat.com, Jeff Layton , "Matthew Wilcox (Oracle)" , Anna Schumaker , Steve French , Dominique Martinet , Mike Marshall , David Wysochanski , Shyam Prasad N , Miklos Szeredi , Linus Torvalds , linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, devel@lists.orangefs.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Wed, 21 Jul 2021 14:46:18 +0100 Message-ID: <162687517832.276387.10765642135364197990.stgit@warthog.procyon.org.uk> In-Reply-To: <162687506932.276387.14456718890524355509.stgit@warthog.procyon.org.uk> References: <162687506932.276387.14456718890524355509.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 Precedence: bulk List-ID: X-Mailing-List: linux-cifs@vger.kernel.org Handle the initiation of writeback of a piece of the dirty list. The first region on the flush list is extracted and a write request is set up to manage it. The pages in the affected region are flipped from dirty to writeback-in-progress. The writeback is then dispatched (which currently just logs a "--- WRITE ---" message to dmesg and then abandons it). Notes: (*) A page may host multiple disjoint dirty regions, each with its own netfs_dirty_region, and a region may span multiple pages. Dirty regions are not permitted to overlap, though they may be merged if they would otherwise overlap. (*) A page may be involved in multiple simultaneous writebacks. Each one is managed by a separate netfs_dirty_region and netfs_write_request. (*) Multiple pages may be required to form a write (for crypto/compression purposes) and so adjacent non-dirty pages may also get marked for writeback. 
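(*) Purely as an illustration of the first note above (this sketch is not part of the patch; EXAMPLE_PAGE_SIZE, example_range and range_to_pages are invented names), the mapping from a dirty byte range to the pages that must be flipped to writeback, and the way two disjoint regions can share a page, can be modelled in userspace C:

	/* Minimal sketch: how [dirty.start, dirty.end) maps onto page indices,
	 * mirroring the first/last calculation used when a region is taken
	 * off the flush queue.
	 */
	#include <stdio.h>

	#define EXAMPLE_PAGE_SIZE 4096ULL

	struct example_range {
		unsigned long long start;	/* inclusive */
		unsigned long long end;		/* exclusive */
	};

	static void range_to_pages(const struct example_range *dirty,
				   unsigned long long *first,
				   unsigned long long *last)
	{
		*first = dirty->start / EXAMPLE_PAGE_SIZE;
		*last  = (dirty->end - 1) / EXAMPLE_PAGE_SIZE;
	}

	int main(void)
	{
		/* Two disjoint dirty regions that both touch page 1 (bytes 4096-8191). */
		struct example_range a = { .start = 5000, .end = 6000 };
		struct example_range b = { .start = 7000, .end = 9000 };
		unsigned long long first, last;

		range_to_pages(&a, &first, &last);
		printf("region A covers pages %llu-%llu\n", first, last); /* 1-1 */

		range_to_pages(&b, &first, &last);
		printf("region B covers pages %llu-%llu\n", first, last); /* 1-2 */

		/* Page 1 hosts parts of both regions, so flushing A must not
		 * clear page 1's dirty state while B is still merely dirty.
		 */
		return 0;
	}
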
Signed-off-by: David Howells --- fs/afs/file.c | 128 ++---------------- fs/netfs/Makefile | 1 fs/netfs/internal.h | 16 ++ fs/netfs/objects.c | 78 +++++++++++ fs/netfs/read_helper.c | 34 +++++ fs/netfs/stats.c | 6 + fs/netfs/write_back.c | 306 ++++++++++++++++++++++++++++++++++++++++++ fs/netfs/xa_iterator.h | 85 ++++++++++++ include/linux/netfs.h | 35 +++++ include/trace/events/netfs.h | 72 ++++++++++ 10 files changed, 642 insertions(+), 119 deletions(-) create mode 100644 fs/netfs/write_back.c create mode 100644 fs/netfs/xa_iterator.h diff --git a/fs/afs/file.c b/fs/afs/file.c index 8400cdf086b6..a6d483fe4e74 100644 --- a/fs/afs/file.c +++ b/fs/afs/file.c @@ -19,9 +19,6 @@ static int afs_file_mmap(struct file *file, struct vm_area_struct *vma); static int afs_symlink_readpage(struct file *file, struct page *page); -static void afs_invalidatepage(struct page *page, unsigned int offset, - unsigned int length); -static int afs_releasepage(struct page *page, gfp_t gfp_flags); static ssize_t afs_direct_IO(struct kiocb *iocb, struct iov_iter *iter); @@ -50,17 +47,17 @@ const struct address_space_operations afs_file_aops = { .readahead = netfs_readahead, .set_page_dirty = afs_set_page_dirty, .launder_page = afs_launder_page, - .releasepage = afs_releasepage, - .invalidatepage = afs_invalidatepage, + .releasepage = netfs_releasepage, + .invalidatepage = netfs_invalidatepage, .direct_IO = afs_direct_IO, .writepage = afs_writepage, - .writepages = afs_writepages, + .writepages = netfs_writepages, }; const struct address_space_operations afs_symlink_aops = { .readpage = afs_symlink_readpage, - .releasepage = afs_releasepage, - .invalidatepage = afs_invalidatepage, + .releasepage = netfs_releasepage, + .invalidatepage = netfs_invalidatepage, }; static const struct vm_operations_struct afs_vm_ops = { @@ -378,6 +375,11 @@ static void afs_free_dirty_region(struct netfs_dirty_region *region) key_put(region->netfs_priv); } +static void afs_init_wreq(struct netfs_write_request *wreq) +{ + //wreq->netfs_priv = key_get(afs_file_key(file)); +} + static void afs_update_i_size(struct file *file, loff_t new_i_size) { struct afs_vnode *vnode = AFS_FS_I(file_inode(file)); @@ -400,6 +402,7 @@ const struct netfs_request_ops afs_req_ops = { .init_dirty_region = afs_init_dirty_region, .free_dirty_region = afs_free_dirty_region, .update_i_size = afs_update_i_size, + .init_wreq = afs_init_wreq, }; int afs_write_inode(struct inode *inode, struct writeback_control *wbc) @@ -408,115 +411,6 @@ int afs_write_inode(struct inode *inode, struct writeback_control *wbc) return 0; } -/* - * Adjust the dirty region of the page on truncation or full invalidation, - * getting rid of the markers altogether if the region is entirely invalidated. - */ -static void afs_invalidate_dirty(struct page *page, unsigned int offset, - unsigned int length) -{ - struct afs_vnode *vnode = AFS_FS_I(page->mapping->host); - unsigned long priv; - unsigned int f, t, end = offset + length; - - priv = page_private(page); - - /* we clean up only if the entire page is being invalidated */ - if (offset == 0 && length == thp_size(page)) - goto full_invalidate; - - /* If the page was dirtied by page_mkwrite(), the PTE stays writable - * and we don't get another notification to tell us to expand it - * again. 
- */ - if (afs_is_page_dirty_mmapped(priv)) - return; - - /* We may need to shorten the dirty region */ - f = afs_page_dirty_from(page, priv); - t = afs_page_dirty_to(page, priv); - - if (t <= offset || f >= end) - return; /* Doesn't overlap */ - - if (f < offset && t > end) - return; /* Splits the dirty region - just absorb it */ - - if (f >= offset && t <= end) - goto undirty; - - if (f < offset) - t = offset; - else - f = end; - if (f == t) - goto undirty; - - priv = afs_page_dirty(page, f, t); - set_page_private(page, priv); - trace_afs_page_dirty(vnode, tracepoint_string("trunc"), page); - return; - -undirty: - trace_afs_page_dirty(vnode, tracepoint_string("undirty"), page); - clear_page_dirty_for_io(page); -full_invalidate: - trace_afs_page_dirty(vnode, tracepoint_string("inval"), page); - detach_page_private(page); -} - -/* - * invalidate part or all of a page - * - release a page and clean up its private data if offset is 0 (indicating - * the entire page) - */ -static void afs_invalidatepage(struct page *page, unsigned int offset, - unsigned int length) -{ - _enter("{%lu},%u,%u", page->index, offset, length); - - BUG_ON(!PageLocked(page)); - - if (PagePrivate(page)) - afs_invalidate_dirty(page, offset, length); - - wait_on_page_fscache(page); - _leave(""); -} - -/* - * release a page and clean up its private state if it's not busy - * - return true if the page can now be released, false if not - */ -static int afs_releasepage(struct page *page, gfp_t gfp_flags) -{ - struct afs_vnode *vnode = AFS_FS_I(page->mapping->host); - - _enter("{{%llx:%llu}[%lu],%lx},%x", - vnode->fid.vid, vnode->fid.vnode, page->index, page->flags, - gfp_flags); - - /* deny if page is being written to the cache and the caller hasn't - * elected to wait */ -#ifdef CONFIG_AFS_FSCACHE - if (PageFsCache(page)) { - if (!(gfp_flags & __GFP_DIRECT_RECLAIM) || !(gfp_flags & __GFP_FS)) - return false; - wait_on_page_fscache(page); - fscache_note_page_release(afs_vnode_cache(vnode)); - } -#endif - - if (PagePrivate(page)) { - trace_afs_page_dirty(vnode, tracepoint_string("rel"), page); - detach_page_private(page); - } - - /* indicate that the page can be released */ - _leave(" = T"); - return 1; -} - /* * Handle setting up a memory mapping on an AFS file. 
*/ diff --git a/fs/netfs/Makefile b/fs/netfs/Makefile index 3e11453ad2c5..a201fd7b22cf 100644 --- a/fs/netfs/Makefile +++ b/fs/netfs/Makefile @@ -3,6 +3,7 @@ netfs-y := \ objects.o \ read_helper.o \ + write_back.o \ write_helper.o # dio_helper.o diff --git a/fs/netfs/internal.h b/fs/netfs/internal.h index 77ceab694348..fe85581d8ac0 100644 --- a/fs/netfs/internal.h +++ b/fs/netfs/internal.h @@ -8,6 +8,7 @@ #include #include #include +#include "xa_iterator.h" #ifdef pr_fmt #undef pr_fmt @@ -34,6 +35,19 @@ void netfs_free_dirty_region(struct netfs_i_context *ctx, struct netfs_dirty_reg void netfs_put_dirty_region(struct netfs_i_context *ctx, struct netfs_dirty_region *region, enum netfs_region_trace what); +struct netfs_write_request *netfs_alloc_write_request(struct address_space *mapping, + bool is_dio); +void netfs_get_write_request(struct netfs_write_request *wreq, + enum netfs_wreq_trace what); +void netfs_free_write_request(struct work_struct *work); +void netfs_put_write_request(struct netfs_write_request *wreq, + bool was_async, enum netfs_wreq_trace what); + +static inline void netfs_see_write_request(struct netfs_write_request *wreq, + enum netfs_wreq_trace what) +{ + trace_netfs_ref_wreq(wreq->debug_id, refcount_read(&wreq->usage), what); +} /* * read_helper.c @@ -46,6 +60,7 @@ int netfs_prefetch_for_write(struct file *file, struct page *page, loff_t pos, s /* * write_helper.c */ +void netfs_writeback_worker(struct work_struct *work); void netfs_flush_region(struct netfs_i_context *ctx, struct netfs_dirty_region *region, enum netfs_dirty_trace why); @@ -74,6 +89,7 @@ extern atomic_t netfs_n_rh_write_failed; extern atomic_t netfs_n_rh_write_zskip; extern atomic_t netfs_n_wh_region; extern atomic_t netfs_n_wh_flush_group; +extern atomic_t netfs_n_wh_wreq; static inline void netfs_stat(atomic_t *stat) diff --git a/fs/netfs/objects.c b/fs/netfs/objects.c index ba1e052aa352..6e9b2a00076d 100644 --- a/fs/netfs/objects.c +++ b/fs/netfs/objects.c @@ -111,3 +111,81 @@ void netfs_put_dirty_region(struct netfs_i_context *ctx, netfs_free_dirty_region(ctx, region); } } + +struct netfs_write_request *netfs_alloc_write_request(struct address_space *mapping, + bool is_dio) +{ + static atomic_t debug_ids; + struct inode *inode = mapping->host; + struct netfs_i_context *ctx = netfs_i_context(inode); + struct netfs_write_request *wreq; + + wreq = kzalloc(sizeof(struct netfs_write_request), GFP_KERNEL); + if (wreq) { + wreq->mapping = mapping; + wreq->inode = inode; + wreq->netfs_ops = ctx->ops; + wreq->debug_id = atomic_inc_return(&debug_ids); + xa_init(&wreq->buffer); + INIT_WORK(&wreq->work, netfs_writeback_worker); + refcount_set(&wreq->usage, 1); + ctx->ops->init_wreq(wreq); + netfs_stat(&netfs_n_wh_wreq); + trace_netfs_ref_wreq(wreq->debug_id, 1, netfs_wreq_trace_new); + } + + return wreq; +} + +void netfs_get_write_request(struct netfs_write_request *wreq, + enum netfs_wreq_trace what) +{ + int ref; + + __refcount_inc(&wreq->usage, &ref); + trace_netfs_ref_wreq(wreq->debug_id, ref + 1, what); +} + +void netfs_free_write_request(struct work_struct *work) +{ + struct netfs_write_request *wreq = + container_of(work, struct netfs_write_request, work); + struct netfs_i_context *ctx = netfs_i_context(wreq->inode); + struct page *page; + pgoff_t index; + + if (wreq->netfs_priv) + wreq->netfs_ops->cleanup(wreq->mapping, wreq->netfs_priv); + trace_netfs_ref_wreq(wreq->debug_id, 0, netfs_wreq_trace_free); + if (wreq->cache_resources.ops) + wreq->cache_resources.ops->end_operation(&wreq->cache_resources); 
+ if (wreq->region) + netfs_put_dirty_region(ctx, wreq->region, + netfs_region_trace_put_wreq); + xa_for_each(&wreq->buffer, index, page) { + __free_page(page); + } + xa_destroy(&wreq->buffer); + kfree(wreq); + netfs_stat_d(&netfs_n_wh_wreq); +} + +void netfs_put_write_request(struct netfs_write_request *wreq, + bool was_async, enum netfs_wreq_trace what) +{ + unsigned int debug_id = wreq->debug_id; + bool dead; + int ref; + + dead = __refcount_dec_and_test(&wreq->usage, &ref); + trace_netfs_ref_wreq(debug_id, ref - 1, what); + if (dead) { + if (was_async) { + wreq->work.func = netfs_free_write_request; + if (!queue_work(system_unbound_wq, &wreq->work)) + BUG(); + } else { + netfs_free_write_request(&wreq->work); + } + } +} diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c index bfcdbbd32f4c..0b771f2f5449 100644 --- a/fs/netfs/read_helper.c +++ b/fs/netfs/read_helper.c @@ -1415,3 +1415,37 @@ int netfs_prefetch_for_write(struct file *file, struct page *page, _leave(" = %d", ret); return ret; } + +/* + * Invalidate part or all of a page + * - release a page and clean up its private data if offset is 0 (indicating + * the entire page) + */ +void netfs_invalidatepage(struct page *page, unsigned int offset, unsigned int length) +{ + _enter("{%lu},%u,%u", page->index, offset, length); + + wait_on_page_fscache(page); +} +EXPORT_SYMBOL(netfs_invalidatepage); + +/* + * Release a page and clean up its private state if it's not busy + * - return true if the page can now be released, false if not + */ +int netfs_releasepage(struct page *page, gfp_t gfp_flags) +{ + struct netfs_i_context *ctx = netfs_i_context(page->mapping->host); + + kenter(""); + + if (PageFsCache(page)) { + if (!(gfp_flags & __GFP_DIRECT_RECLAIM) || !(gfp_flags & __GFP_FS)) + return false; + wait_on_page_fscache(page); + fscache_note_page_release(ctx->cache); + } + + return true; +} +EXPORT_SYMBOL(netfs_releasepage); diff --git a/fs/netfs/stats.c b/fs/netfs/stats.c index 7c079ca47b5b..ac2510f8cab0 100644 --- a/fs/netfs/stats.c +++ b/fs/netfs/stats.c @@ -29,6 +29,7 @@ atomic_t netfs_n_rh_write_failed; atomic_t netfs_n_rh_write_zskip; atomic_t netfs_n_wh_region; atomic_t netfs_n_wh_flush_group; +atomic_t netfs_n_wh_wreq; void netfs_stats_show(struct seq_file *m) { @@ -56,8 +57,9 @@ void netfs_stats_show(struct seq_file *m) atomic_read(&netfs_n_rh_write), atomic_read(&netfs_n_rh_write_done), atomic_read(&netfs_n_rh_write_failed)); - seq_printf(m, "WrHelp : R=%u F=%u\n", + seq_printf(m, "WrHelp : R=%u F=%u wr=%u\n", atomic_read(&netfs_n_wh_region), - atomic_read(&netfs_n_wh_flush_group)); + atomic_read(&netfs_n_wh_flush_group), + atomic_read(&netfs_n_wh_wreq)); } EXPORT_SYMBOL(netfs_stats_show); diff --git a/fs/netfs/write_back.c b/fs/netfs/write_back.c new file mode 100644 index 000000000000..9fcb2ac50ebb --- /dev/null +++ b/fs/netfs/write_back.c @@ -0,0 +1,306 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* Network filesystem high-level write support. + * + * Copyright (C) 2021 Red Hat, Inc. All Rights Reserved. + * Written by David Howells (dhowells@redhat.com) + */ + +#include +#include +#include +#include +#include "internal.h" + +/* + * Process a write request. 
+ */ +static void netfs_writeback(struct netfs_write_request *wreq) +{ + kdebug("--- WRITE ---"); +} + +void netfs_writeback_worker(struct work_struct *work) +{ + struct netfs_write_request *wreq = + container_of(work, struct netfs_write_request, work); + + netfs_see_write_request(wreq, netfs_wreq_trace_see_work); + netfs_writeback(wreq); + netfs_put_write_request(wreq, false, netfs_wreq_trace_put_work); +} + +/* + * Flush some of the dirty queue. + */ +static int netfs_flush_dirty(struct address_space *mapping, + struct writeback_control *wbc, + struct netfs_range *range, + loff_t *next) +{ + struct netfs_dirty_region *p, *q; + struct netfs_i_context *ctx = netfs_i_context(mapping->host); + + kenter("%llx-%llx", range->start, range->end); + + spin_lock(&ctx->lock); + + /* Scan forwards to find dirty regions containing the suggested start + * point. + */ + list_for_each_entry_safe(p, q, &ctx->dirty_regions, dirty_link) { + _debug("D=%x %llx-%llx", p->debug_id, p->dirty.start, p->dirty.end); + if (p->dirty.end <= range->start) + continue; + if (p->dirty.start >= range->end) + break; + if (p->state != NETFS_REGION_IS_DIRTY) + continue; + if (test_bit(NETFS_REGION_FLUSH_Q, &p->flags)) + continue; + + netfs_flush_region(ctx, p, netfs_dirty_trace_flush_writepages); + } + + spin_unlock(&ctx->lock); + return 0; +} + +static int netfs_unlock_pages_iterator(struct page *page) +{ + unlock_page(page); + put_page(page); + return 0; +} + +/* + * Unlock all the pages in a range. + */ +static void netfs_unlock_pages(struct address_space *mapping, + pgoff_t start, pgoff_t end) +{ + netfs_iterate_pages(mapping, start, end, netfs_unlock_pages_iterator); +} + +static int netfs_lock_pages_iterator(struct xa_state *xas, + struct page *page, + struct netfs_write_request *wreq, + struct writeback_control *wbc) +{ + int ret; + + /* At this point we hold neither the i_pages lock nor the + * page lock: the page may be truncated or invalidated + * (changing page->mapping to NULL), or even swizzled + * back from swapper_space to tmpfs file mapping + */ + if (wbc->sync_mode != WB_SYNC_NONE) { + xas_pause(xas); + rcu_read_unlock(); + ret = lock_page_killable(page); + rcu_read_lock(); + } else { + if (!trylock_page(page)) + ret = -EBUSY; + } + + return ret; +} + +/* + * Lock all the pages in a range and add them to the write request. + */ +static int netfs_lock_pages(struct address_space *mapping, + struct writeback_control *wbc, + struct netfs_write_request *wreq) +{ + pgoff_t last = wreq->last; + int ret; + + kenter("%lx-%lx", wreq->first, wreq->last); + ret = netfs_iterate_get_pages(mapping, wreq->first, wreq->last, + netfs_lock_pages_iterator, wreq, wbc); + if (ret < 0) + goto failed; + + if (wreq->last < last) { + kdebug("Some pages missing %lx < %lx", wreq->last, last); + ret = -EIO; + goto failed; + } + + return 0; + +failed: + netfs_unlock_pages(mapping, wreq->first, wreq->last); + return ret; +} + +static int netfs_set_page_writeback(struct page *page) +{ + /* Now we need to clear the dirty flags on any page that's not shared + * with any other dirty region. + */ + if (!clear_page_dirty_for_io(page)) + BUG(); + + /* We set writeback unconditionally because a page may participate in + * more than one simultaneous writeback. + */ + set_page_writeback(page); + return 0; +} + +/* + * Extract a region to write back. 
+ */ +static struct netfs_dirty_region *netfs_extract_dirty_region( + struct netfs_i_context *ctx, + struct netfs_write_request *wreq) +{ + struct netfs_dirty_region *region = NULL, *spare; + + spare = netfs_alloc_dirty_region(); + if (!spare) + return NULL; + + spin_lock(&ctx->lock); + + if (list_empty(&ctx->flush_queue)) + goto out; + + region = list_first_entry(&ctx->flush_queue, + struct netfs_dirty_region, flush_link); + + wreq->region = netfs_get_dirty_region(ctx, region, netfs_region_trace_get_wreq); + wreq->start = region->dirty.start; + wreq->len = region->dirty.end - region->dirty.start; + wreq->first = region->dirty.start / PAGE_SIZE; + wreq->last = (region->dirty.end - 1) / PAGE_SIZE; + + /* TODO: Split the region if it's larger than a certain size. This is + * tricky as we need to observe page, crypto and compression block + * boundaries. The crypto/comp bounds are defined by ctx->bsize, but + * we don't know where the page boundaries are. + * + * All of these boundaries, however, must be pow-of-2 sized and + * pow-of-2 aligned, so they never partially overlap + */ + + smp_store_release(®ion->state, NETFS_REGION_IS_FLUSHING); + trace_netfs_dirty(ctx, region, NULL, netfs_dirty_trace_flushing); + wake_up_var(®ion->state); + list_del_init(®ion->flush_link); + +out: + spin_unlock(&ctx->lock); + netfs_free_dirty_region(ctx, spare); + kleave(" = D=%x", region ? region->debug_id : 0); + return region; +} + +/* + * Schedule a write for the first region on the flush queue. + */ +static int netfs_begin_write(struct address_space *mapping, + struct writeback_control *wbc) +{ + struct netfs_write_request *wreq; + struct netfs_dirty_region *region; + struct netfs_i_context *ctx = netfs_i_context(mapping->host); + int ret; + + wreq = netfs_alloc_write_request(mapping, false); + if (!wreq) + return -ENOMEM; + + ret = 0; + region = netfs_extract_dirty_region(ctx, wreq); + if (!region) + goto error; + + ret = netfs_lock_pages(mapping, wbc, wreq); + if (ret < 0) + goto error; + + trace_netfs_wreq(wreq); + + netfs_iterate_pages(mapping, wreq->first, wreq->last, + netfs_set_page_writeback); + netfs_unlock_pages(mapping, wreq->first, wreq->last); + iov_iter_xarray(&wreq->source, WRITE, &wreq->mapping->i_pages, + wreq->start, wreq->len); + + if (!queue_work(system_unbound_wq, &wreq->work)) + BUG(); + + kleave(" = %lu", wreq->last - wreq->first + 1); + return wreq->last - wreq->first + 1; + +error: + netfs_put_write_request(wreq, wbc->sync_mode != WB_SYNC_NONE, + netfs_wreq_trace_put_discard); + kleave(" = %d", ret); + return ret; +} + +/** + * netfs_writepages - Initiate writeback to the server and cache + * @mapping: The pagecache to write from + * @wbc: Hints from the VM as to what to write + * + * This is a helper intended to be called directly from a network filesystem's + * address space operations table to perform writeback to the server and the + * cache. + * + * We have to be careful as we can end up racing with setattr() truncating the + * pagecache since the caller doesn't take a lock here to prevent it. + */ +int netfs_writepages(struct address_space *mapping, + struct writeback_control *wbc) +{ + struct netfs_range range; + loff_t next; + int ret; + + kenter("%lx,%llx-%llx,%u,%c%c%c%c,%u,%u", + wbc->nr_to_write, + wbc->range_start, wbc->range_end, + wbc->sync_mode, + wbc->for_kupdate ? 'k' : '-', + wbc->for_background ? 'b' : '-', + wbc->for_reclaim ? 'r' : '-', + wbc->for_sync ? 
's' : '-', + wbc->tagged_writepages, + wbc->range_cyclic); + + //dump_stack(); + + if (wbc->range_cyclic) { + range.start = mapping->writeback_index * PAGE_SIZE; + range.end = ULLONG_MAX; + ret = netfs_flush_dirty(mapping, wbc, &range, &next); + if (range.start > 0 && wbc->nr_to_write > 0 && ret == 0) { + range.start = 0; + range.end = mapping->writeback_index * PAGE_SIZE; + ret = netfs_flush_dirty(mapping, wbc, &range, &next); + } + mapping->writeback_index = next / PAGE_SIZE; + } else if (wbc->range_start == 0 && wbc->range_end == LLONG_MAX) { + range.start = 0; + range.end = ULLONG_MAX; + ret = netfs_flush_dirty(mapping, wbc, &range, &next); + if (wbc->nr_to_write > 0 && ret == 0) + mapping->writeback_index = next; + } else { + range.start = wbc->range_start; + range.end = wbc->range_end + 1; + ret = netfs_flush_dirty(mapping, wbc, &range, &next); + } + + if (ret == 0) + ret = netfs_begin_write(mapping, wbc); + + _leave(" = %d", ret); + return ret; +} +EXPORT_SYMBOL(netfs_writepages); diff --git a/fs/netfs/xa_iterator.h b/fs/netfs/xa_iterator.h new file mode 100644 index 000000000000..3f37827f0f99 --- /dev/null +++ b/fs/netfs/xa_iterator.h @@ -0,0 +1,85 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* xarray iterator macros for netfslib. + * + * Copyright (C) 2021 Red Hat, Inc. All Rights Reserved. + * Written by David Howells (dhowells@redhat.com) + */ + +/* + * Iterate over a range of pages. xarray locks are not held over the iterator + * function, so it can sleep if necessary. The start and end positions are + * updated to indicate the span of pages actually processed. + */ +#define netfs_iterate_pages(MAPPING, START, END, ITERATOR, ...) \ + ({ \ + unsigned long __it_index; \ + struct page *page; \ + pgoff_t __it_start = (START); \ + pgoff_t __it_end = (END); \ + pgoff_t __it_tmp; \ + int ret = 0; \ + \ + (END) = __it_start; \ + xa_for_each_range(&(MAPPING)->i_pages, __it_index, page, \ + __it_start, __it_end) { \ + if (xa_is_value(page)) { \ + ret = -EIO; /* Not a real page. */ \ + break; \ + } \ + if (__it_index < (START)) \ + (START) = __it_index; \ + ret = ITERATOR(page, ##__VA_ARGS__); \ + if (ret < 0) \ + break; \ + __it_tmp = __it_index + thp_nr_pages(page) - 1; \ + if (__it_tmp > (END)) \ + (END) = __it_tmp; \ + } \ + ret; \ + }) + +/* + * Iterate over a set of pages, getting each one before calling the iteration + * function. The iteration function may drop the RCU read lock, but should + * call xas_pause() before it does so. The start and end positions are updated + * to indicate the span of pages actually processed. + */ +#define netfs_iterate_get_pages(MAPPING, START, END, ITERATOR, ...) 
\ + ({ \ + unsigned long __it_index; \ + struct page *page; \ + pgoff_t __it_start = (START); \ + pgoff_t __it_end = (END); \ + pgoff_t __it_tmp; \ + int ret = 0; \ + \ + XA_STATE(xas, &(MAPPING)->i_pages, __it_start); \ + (END) = __it_start; \ + rcu_read_lock(); \ + for (page = xas_load(&xas); page; page = xas_next_entry(&xas, __it_end)) { \ + if (xas_retry(&xas, page)) \ + continue; \ + if (xa_is_value(page)) \ + break; \ + if (!page_cache_get_speculative(page)) { \ + xas_reset(&xas); \ + continue; \ + } \ + if (unlikely(page != xas_reload(&xas))) { \ + put_page(page); \ + xas_reset(&xas); \ + continue; \ + } \ + __it_index = page_index(page); \ + if (__it_index < (START)) \ + (START) = __it_index; \ + ret = ITERATOR(&xas, page, ##__VA_ARGS__); \ + if (ret < 0) \ + break; \ + __it_tmp = __it_index + thp_nr_pages(page) - 1; \ + if (__it_tmp > (END)) \ + (END) = __it_tmp; \ + } \ + rcu_read_unlock(); \ + ret; \ + }) diff --git a/include/linux/netfs.h b/include/linux/netfs.h index fc91711d3178..9f874e7ed45a 100644 --- a/include/linux/netfs.h +++ b/include/linux/netfs.h @@ -242,6 +242,35 @@ struct netfs_dirty_region { refcount_t ref; }; +/* + * Descriptor for a write request. This is used to manage the preparation and + * storage of a sequence of dirty data - its compression/encryption and its + * writing to one or more servers and the cache. + * + * The prepared data is buffered here. + */ +struct netfs_write_request { + struct work_struct work; + struct inode *inode; /* The file being accessed */ + struct address_space *mapping; /* The mapping being accessed */ + struct netfs_dirty_region *region; /* The region we're writing back */ + struct netfs_cache_resources cache_resources; + struct xarray buffer; /* Buffer for encrypted/compressed data */ + struct iov_iter source; /* The iterator to be used */ + struct list_head write_link; /* Link in i_context->write_requests */ + void *netfs_priv; /* Private data for the netfs */ + unsigned int debug_id; + short error; /* 0 or error that occurred */ + loff_t i_size; /* Size of the file */ + loff_t start; /* Start position */ + size_t len; /* Length of the request */ + pgoff_t first; /* First page included */ + pgoff_t last; /* Last page included */ + refcount_t usage; + unsigned long flags; + const struct netfs_request_ops *netfs_ops; +}; + enum netfs_write_compatibility { NETFS_WRITES_COMPATIBLE, /* Dirty regions can be directly merged */ NETFS_WRITES_SUPERSEDE, /* Second write can supersede the first without first @@ -275,6 +304,9 @@ struct netfs_request_ops { struct netfs_dirty_region *candidate); bool (*check_compatible_write)(struct netfs_dirty_region *region, struct file *file); void (*update_i_size)(struct file *file, loff_t i_size); + + /* Write request handling */ + void (*init_wreq)(struct netfs_write_request *wreq); }; /* @@ -324,6 +356,9 @@ extern int netfs_write_begin(struct file *, struct address_space *, loff_t, unsigned int, unsigned int, struct page **, void **); extern ssize_t netfs_file_write_iter(struct kiocb *iocb, struct iov_iter *from); +extern int netfs_writepages(struct address_space *mapping, struct writeback_control *wbc); +extern void netfs_invalidatepage(struct page *page, unsigned int offset, unsigned int length); +extern int netfs_releasepage(struct page *page, gfp_t gfp_flags); extern void netfs_subreq_terminated(struct netfs_read_subrequest *, ssize_t, bool); extern void netfs_stats_show(struct seq_file *); diff --git a/include/trace/events/netfs.h b/include/trace/events/netfs.h index 808433e6ddd3..e70abb5033e6 
100644 --- a/include/trace/events/netfs.h +++ b/include/trace/events/netfs.h @@ -63,6 +63,8 @@ enum netfs_dirty_trace { netfs_dirty_trace_complete, netfs_dirty_trace_flush_conflict, netfs_dirty_trace_flush_dsync, + netfs_dirty_trace_flush_writepages, + netfs_dirty_trace_flushing, netfs_dirty_trace_merged_back, netfs_dirty_trace_merged_forw, netfs_dirty_trace_merged_sub, @@ -82,11 +84,20 @@ enum netfs_region_trace { netfs_region_trace_get_wreq, netfs_region_trace_put_discard, netfs_region_trace_put_merged, + netfs_region_trace_put_wreq, netfs_region_trace_put_write_iter, netfs_region_trace_free, netfs_region_trace_new, }; +enum netfs_wreq_trace { + netfs_wreq_trace_free, + netfs_wreq_trace_put_discard, + netfs_wreq_trace_put_work, + netfs_wreq_trace_see_work, + netfs_wreq_trace_new, +}; + #endif #define netfs_read_traces \ @@ -149,6 +160,8 @@ enum netfs_region_trace { EM(netfs_dirty_trace_complete, "COMPLETE ") \ EM(netfs_dirty_trace_flush_conflict, "FLSH CONFL") \ EM(netfs_dirty_trace_flush_dsync, "FLSH DSYNC") \ + EM(netfs_dirty_trace_flush_writepages, "WRITEPAGES") \ + EM(netfs_dirty_trace_flushing, "FLUSHING ") \ EM(netfs_dirty_trace_merged_back, "MERGE BACK") \ EM(netfs_dirty_trace_merged_forw, "MERGE FORW") \ EM(netfs_dirty_trace_merged_sub, "SUBSUMED ") \ @@ -167,10 +180,19 @@ enum netfs_region_trace { EM(netfs_region_trace_get_wreq, "GET WREQ ") \ EM(netfs_region_trace_put_discard, "PUT DISCARD") \ EM(netfs_region_trace_put_merged, "PUT MERGED ") \ + EM(netfs_region_trace_put_wreq, "PUT WREQ ") \ EM(netfs_region_trace_put_write_iter, "PUT WRITER ") \ EM(netfs_region_trace_free, "FREE ") \ E_(netfs_region_trace_new, "NEW ") +#define netfs_wreq_traces \ + EM(netfs_wreq_trace_free, "FREE ") \ + EM(netfs_wreq_trace_put_discard, "PUT DISCARD") \ + EM(netfs_wreq_trace_put_work, "PUT WORK ") \ + EM(netfs_wreq_trace_see_work, "SEE WORK ") \ + E_(netfs_wreq_trace_new, "NEW ") + + /* * Export enum symbols via userspace. 
*/ @@ -187,6 +209,7 @@ netfs_failures; netfs_region_types; netfs_region_states; netfs_dirty_traces; +netfs_wreq_traces; /* * Now redefine the EM() and E_() macros to map the enums to the strings that @@ -435,6 +458,55 @@ TRACE_EVENT(netfs_dirty, ) ); +TRACE_EVENT(netfs_wreq, + TP_PROTO(struct netfs_write_request *wreq), + + TP_ARGS(wreq), + + TP_STRUCT__entry( + __field(unsigned int, wreq ) + __field(unsigned int, cookie ) + __field(loff_t, start ) + __field(size_t, len ) + ), + + TP_fast_assign( + __entry->wreq = wreq->debug_id; + __entry->cookie = wreq->cache_resources.debug_id; + __entry->start = wreq->start; + __entry->len = wreq->len; + ), + + TP_printk("W=%08x c=%08x s=%llx %zx", + __entry->wreq, + __entry->cookie, + __entry->start, __entry->len) + ); + +TRACE_EVENT(netfs_ref_wreq, + TP_PROTO(unsigned int wreq_debug_id, int ref, + enum netfs_wreq_trace what), + + TP_ARGS(wreq_debug_id, ref, what), + + TP_STRUCT__entry( + __field(unsigned int, wreq ) + __field(int, ref ) + __field(enum netfs_wreq_trace, what ) + ), + + TP_fast_assign( + __entry->wreq = wreq_debug_id; + __entry->ref = ref; + __entry->what = what; + ), + + TP_printk("W=%08x %s r=%u", + __entry->wreq, + __print_symbolic(__entry->what, netfs_wreq_traces), + __entry->ref) + ); + #endif /* _TRACE_NETFS_H */ /* This part must be outside protection */ From patchwork Wed Jul 21 13:46:28 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 12390949 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-17.2 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 093FFC636C9 for ; Wed, 21 Jul 2021 13:47:09 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id E35D860FF1 for ; Wed, 21 Jul 2021 13:47:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238560AbhGUNGX (ORCPT ); Wed, 21 Jul 2021 09:06:23 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]:49521 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238722AbhGUNGD (ORCPT ); Wed, 21 Jul 2021 09:06:03 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1626875196; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=eiYfBTzqBLwjxFjmd1wf3xJRqWQqlFAJpRRIEev0yQ8=; b=SjgX77vvFYy6OLBH7FPl+SqOlRaWhllHyvAx9oBDtkGrYOF/OI7ah2xbKGQjjC6pEyGgbp 3STJ2MHK/RRzZwjbE+N/rmOKg16ZPQFYzGfreFyW5MuALZekh+laWm9zO3wPEuE8ZdJQVk Ze2i824PMq0h597pgkWVoOOIKrOaW8Q= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-238-BDsfvdSzMLKJ9mscsg0FJA-1; Wed, 21 Jul 2021 09:46:35 -0400 X-MC-Unique: BDsfvdSzMLKJ9mscsg0FJA-1 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16]) (using TLSv1.2 with cipher 
AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 18D8C18C8C0C; Wed, 21 Jul 2021 13:46:33 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-112-62.rdu2.redhat.com [10.10.112.62]) by smtp.corp.redhat.com (Postfix) with ESMTP id 75AF75C225; Wed, 21 Jul 2021 13:46:29 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [RFC PATCH 08/12] netfs: Keep dirty mark for pages with more than one dirty region From: David Howells To: linux-fsdevel@vger.kernel.org Cc: dhowells@redhat.com, Jeff Layton , "Matthew Wilcox (Oracle)" , Anna Schumaker , Steve French , Dominique Martinet , Mike Marshall , David Wysochanski , Shyam Prasad N , Miklos Szeredi , Linus Torvalds , linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, devel@lists.orangefs.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Wed, 21 Jul 2021 14:46:28 +0100 Message-ID: <162687518862.276387.262991356873597293.stgit@warthog.procyon.org.uk> In-Reply-To: <162687506932.276387.14456718890524355509.stgit@warthog.procyon.org.uk> References: <162687506932.276387.14456718890524355509.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.16 Precedence: bulk List-ID: X-Mailing-List: linux-cifs@vger.kernel.org If a page has more than one dirty region overlapping it, then we mustn't clear the dirty mark when we want to flush one of them. Make netfs_set_page_writeback() check the adjacent dirty regions to see if they overlap the page(s) the region we're interested in, and if they do, leave the page marked dirty. NOTES: (1) Might want to discount the overlapping regions if they're being flushed (in which case they wouldn't normally want to hold the dirty bit). (2) Similarly, the writeback mark should not be cleared if the page is still being written back by another, overlapping region. Signed-off-by: David Howells --- fs/netfs/write_back.c | 41 ++++++++++++++++++++++++++++++++++++++--- 1 file changed, 38 insertions(+), 3 deletions(-) diff --git a/fs/netfs/write_back.c b/fs/netfs/write_back.c index 9fcb2ac50ebb..5c779cb12345 100644 --- a/fs/netfs/write_back.c +++ b/fs/netfs/write_back.c @@ -135,12 +135,47 @@ static int netfs_lock_pages(struct address_space *mapping, return ret; } -static int netfs_set_page_writeback(struct page *page) +static int netfs_set_page_writeback(struct page *page, + struct netfs_i_context *ctx, + struct netfs_write_request *wreq) { + struct netfs_dirty_region *region = wreq->region, *r; + loff_t pos = page_offset(page); + bool clear_dirty = true; + /* Now we need to clear the dirty flags on any page that's not shared * with any other dirty region. 
*/ - if (!clear_page_dirty_for_io(page)) + spin_lock(&ctx->lock); + if (pos < region->dirty.start) { + r = region; + list_for_each_entry_continue_reverse(r, &ctx->dirty_regions, dirty_link) { + if (r->dirty.end <= pos) + break; + if (r->state < NETFS_REGION_IS_DIRTY) + continue; + kdebug("keep-dirty-b %lx reg=%x r=%x", + page->index, region->debug_id, r->debug_id); + clear_dirty = false; + } + } + + pos += thp_size(page); + if (pos > region->dirty.end) { + r = region; + list_for_each_entry_continue(r, &ctx->dirty_regions, dirty_link) { + if (r->dirty.start >= pos) + break; + if (r->state < NETFS_REGION_IS_DIRTY) + continue; + kdebug("keep-dirty-f %lx reg=%x r=%x", + page->index, region->debug_id, r->debug_id); + clear_dirty = false; + } + } + spin_unlock(&ctx->lock); + + if (clear_dirty && !clear_page_dirty_for_io(page)) BUG(); /* We set writeback unconditionally because a page may participate in @@ -225,7 +260,7 @@ static int netfs_begin_write(struct address_space *mapping, trace_netfs_wreq(wreq); netfs_iterate_pages(mapping, wreq->first, wreq->last, - netfs_set_page_writeback); + netfs_set_page_writeback, ctx, wreq); netfs_unlock_pages(mapping, wreq->first, wreq->last); iov_iter_xarray(&wreq->source, WRITE, &wreq->mapping->i_pages, wreq->start, wreq->len); From patchwork Wed Jul 21 13:46:38 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 12390957 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-17.2 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 60BFEC12002 for ; Wed, 21 Jul 2021 13:47:28 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 4617160FF1 for ; Wed, 21 Jul 2021 13:47:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238629AbhGUNGn (ORCPT ); Wed, 21 Jul 2021 09:06:43 -0400 Received: from us-smtp-delivery-124.mimecast.com ([216.205.24.124]:52124 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238793AbhGUNGL (ORCPT ); Wed, 21 Jul 2021 09:06:11 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1626875207; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=PoKprYNGtQBdlM+LSdI4bPE2v5xsXclxnPzqIdG9rOU=; b=EoAHttHX5EtHzyDe9cv++5bmhhth98vlSrITbWEYfcYQTs6ryK6x3RS0o8gliwC6phLvJm Bx2ug2RYiD6A6hUNn9mOZ7RjRcAQoDYCfigAkW3HwXeu6IGuEyW+NuO5K/ZltKKWW0f/nU NnCD2SUcNufsU0VXiYqdUVjQtZRam/Y= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-231-Sw-cwihGMLmVlWaby_ARfw-1; Wed, 21 Jul 2021 09:46:45 -0400 X-MC-Unique: Sw-cwihGMLmVlWaby_ARfw-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate 
requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 47C96107ACF5; Wed, 21 Jul 2021 13:46:43 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-112-62.rdu2.redhat.com [10.10.112.62]) by smtp.corp.redhat.com (Postfix) with ESMTP id 2F6726091B; Wed, 21 Jul 2021 13:46:39 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [RFC PATCH 09/12] netfs: Send write request to multiple destinations From: David Howells To: linux-fsdevel@vger.kernel.org Cc: dhowells@redhat.com, Jeff Layton , "Matthew Wilcox (Oracle)" , Anna Schumaker , Steve French , Dominique Martinet , Mike Marshall , David Wysochanski , Shyam Prasad N , Miklos Szeredi , Linus Torvalds , linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, devel@lists.orangefs.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Wed, 21 Jul 2021 14:46:38 +0100 Message-ID: <162687519833.276387.1376700874310007511.stgit@warthog.procyon.org.uk> In-Reply-To: <162687506932.276387.14456718890524355509.stgit@warthog.procyon.org.uk> References: <162687506932.276387.14456718890524355509.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13 Precedence: bulk List-ID: X-Mailing-List: linux-cifs@vger.kernel.org Write requests are set up to have a number of "write streams", whereby each stream writes the entire request to a different destination. Destination types include server uploads and cache writes. Each stream may be segmented into a series of writes that can be issued consecutively, for example uploading to an AFS server, writing to a cache or both. A request has, or will have, a number of phases: (1) Preparation. The data may need to be copied into a buffer and compressed or encrypted. The modified data would then be stored to the cache or the server. (2) Writing. Each stream writes the data. (3) Completion. The pages are cleaned or redirtied as appropriate and the dirty list is updated to remove the now flushed region. Waiting write requests that are wholly within the range now made available are woken up. 
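To illustrate the completion accounting the streams rely on, here is a minimal, self-contained userspace sketch (the toy_* names are invented for illustration only; in the patch itself this is wreq->outstanding, which holds one count for the setup phase plus one per stream, with atomic_dec_and_test() driving netfs_write_completed()):

#include <stdatomic.h>
#include <stdio.h>

struct toy_wreq {
	atomic_int outstanding;		/* 1 for setup + 1 per write stream */
	int error;
};

static void toy_write_completed(struct toy_wreq *w)
{
	printf("all streams finished, error=%d\n", w->error);
}

static void toy_stream_done(struct toy_wreq *w, int error)
{
	if (error && !w->error)
		w->error = error;
	/* The last decrement (old value 1) runs the overall completion. */
	if (atomic_fetch_sub(&w->outstanding, 1) == 1)
		toy_write_completed(w);
}

int main(void)
{
	struct toy_wreq w;
	int nr_streams = 2, i;		/* e.g. one server upload + one cache write */

	w.error = 0;
	atomic_init(&w.outstanding, 1);	/* the setup phase holds one count */
	for (i = 0; i < nr_streams; i++)
		atomic_fetch_add(&w.outstanding, 1);

	/* The real streams complete asynchronously from workqueue workers. */
	for (i = 0; i < nr_streams; i++)
		toy_stream_done(&w, 0);

	toy_stream_done(&w, 0);		/* drop the setup phase's count */
	return 0;
}
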
Signed-off-by: David Howells --- fs/afs/file.c | 1 fs/afs/inode.c | 13 ++ fs/afs/internal.h | 2 fs/afs/write.c | 179 ++++++------------------------ fs/netfs/internal.h | 6 + fs/netfs/objects.c | 25 ++++ fs/netfs/stats.c | 14 ++ fs/netfs/write_back.c | 249 ++++++++++++++++++++++++++++++++++++++++++ fs/netfs/write_helper.c | 28 +++-- fs/netfs/xa_iterator.h | 31 +++++ include/linux/netfs.h | 65 +++++++++++ include/trace/events/netfs.h | 61 ++++++++++ 12 files changed, 515 insertions(+), 159 deletions(-) diff --git a/fs/afs/file.c b/fs/afs/file.c index a6d483fe4e74..22030d5191cd 100644 --- a/fs/afs/file.c +++ b/fs/afs/file.c @@ -403,6 +403,7 @@ const struct netfs_request_ops afs_req_ops = { .free_dirty_region = afs_free_dirty_region, .update_i_size = afs_update_i_size, .init_wreq = afs_init_wreq, + .add_write_streams = afs_add_write_streams, }; int afs_write_inode(struct inode *inode, struct writeback_control *wbc) diff --git a/fs/afs/inode.c b/fs/afs/inode.c index 3e9e388245a1..a6ae031461c7 100644 --- a/fs/afs/inode.c +++ b/fs/afs/inode.c @@ -449,6 +449,15 @@ static void afs_get_inode_cache(struct afs_vnode *vnode) #endif } +static void afs_set_netfs_context(struct afs_vnode *vnode) +{ + struct netfs_i_context *ctx = netfs_i_context(&vnode->vfs_inode); + + netfs_i_context_init(&vnode->vfs_inode, &afs_req_ops); + ctx->n_wstreams = 1; + ctx->bsize = PAGE_SIZE; +} + /* * inode retrieval */ @@ -479,7 +488,7 @@ struct inode *afs_iget(struct afs_operation *op, struct afs_vnode_param *vp) return inode; } - netfs_i_context_init(inode, &afs_req_ops); + afs_set_netfs_context(vnode); ret = afs_inode_init_from_status(op, vp, vnode); if (ret < 0) goto bad_inode; @@ -536,10 +545,10 @@ struct inode *afs_root_iget(struct super_block *sb, struct key *key) _debug("GOT ROOT INODE %p { vl=%llx }", inode, as->volume->vid); BUG_ON(!(inode->i_state & I_NEW)); - netfs_i_context_init(inode, &afs_req_ops); vnode = AFS_FS_I(inode); vnode->cb_v_break = as->volume->cb_v_break, + afs_set_netfs_context(vnode); op = afs_alloc_operation(key, as->volume); if (IS_ERR(op)) { diff --git a/fs/afs/internal.h b/fs/afs/internal.h index 0d01ed2fe8fa..32a36b96cc9b 100644 --- a/fs/afs/internal.h +++ b/fs/afs/internal.h @@ -1512,12 +1512,12 @@ extern int afs_check_volume_status(struct afs_volume *, struct afs_operation *); */ extern int afs_set_page_dirty(struct page *); extern int afs_writepage(struct page *, struct writeback_control *); -extern int afs_writepages(struct address_space *, struct writeback_control *); extern int afs_fsync(struct file *, loff_t, loff_t, int); extern vm_fault_t afs_page_mkwrite(struct vm_fault *vmf); extern void afs_prune_wb_keys(struct afs_vnode *); extern int afs_launder_page(struct page *); extern ssize_t afs_file_direct_write(struct kiocb *, struct iov_iter *); +extern void afs_add_write_streams(struct netfs_write_request *); /* * xattr.c diff --git a/fs/afs/write.c b/fs/afs/write.c index e6e2e924c8ae..0668389f3466 100644 --- a/fs/afs/write.c +++ b/fs/afs/write.c @@ -13,6 +13,7 @@ #include #include #include +#include #include "internal.h" static void afs_write_to_cache(struct afs_vnode *vnode, loff_t start, size_t len, @@ -120,31 +121,9 @@ static void afs_redirty_pages(struct writeback_control *wbc, */ static void afs_pages_written_back(struct afs_vnode *vnode, loff_t start, unsigned int len) { - struct address_space *mapping = vnode->vfs_inode.i_mapping; - struct page *page; - pgoff_t end; - - XA_STATE(xas, &mapping->i_pages, start / PAGE_SIZE); - _enter("{%llx:%llu},{%x @%llx}", vnode->fid.vid, 
vnode->fid.vnode, len, start); - rcu_read_lock(); - - end = (start + len - 1) / PAGE_SIZE; - xas_for_each(&xas, page, end) { - if (!PageWriteback(page)) { - kdebug("bad %x @%llx page %lx %lx", len, start, page->index, end); - ASSERT(PageWriteback(page)); - } - - trace_afs_page_dirty(vnode, tracepoint_string("clear"), page); - detach_page_private(page); - page_endio(page, true, 0); - } - - rcu_read_unlock(); - afs_prune_wb_keys(vnode); _leave(""); } @@ -281,6 +260,39 @@ static int afs_store_data(struct afs_vnode *vnode, struct iov_iter *iter, loff_t return afs_put_operation(op); } +static void afs_upload_to_server(struct netfs_write_stream *stream, + struct netfs_write_request *wreq) +{ + struct afs_vnode *vnode = AFS_FS_I(wreq->inode); + ssize_t ret; + + kenter("%u", stream->index); + + trace_netfs_wstr(stream, netfs_write_stream_submit); + ret = afs_store_data(vnode, &wreq->source, wreq->start, false); + netfs_write_stream_completed(stream, ret, false); +} + +static void afs_upload_to_server_worker(struct work_struct *work) +{ + struct netfs_write_stream *stream = container_of(work, struct netfs_write_stream, work); + struct netfs_write_request *wreq = netfs_stream_to_wreq(stream); + + afs_upload_to_server(stream, wreq); + netfs_put_write_request(wreq, false, netfs_wreq_trace_put_stream_work); +} + +/* + * Add write streams to a write request. We need to add a single stream for + * the server we're writing to. + */ +void afs_add_write_streams(struct netfs_write_request *wreq) +{ + kenter(""); + netfs_set_up_write_stream(wreq, NETFS_UPLOAD_TO_SERVER, + afs_upload_to_server_worker); +} + /* * Extend the region to be written back to include subsequent contiguously * dirty pages if possible, but don't sleep while doing so. @@ -543,129 +555,6 @@ int afs_writepage(struct page *page, struct writeback_control *wbc) return 0; } -/* - * write a region of pages back to the server - */ -static int afs_writepages_region(struct address_space *mapping, - struct writeback_control *wbc, - loff_t start, loff_t end, loff_t *_next) -{ - struct page *page; - ssize_t ret; - int n; - - _enter("%llx,%llx,", start, end); - - do { - pgoff_t index = start / PAGE_SIZE; - - n = find_get_pages_range_tag(mapping, &index, end / PAGE_SIZE, - PAGECACHE_TAG_DIRTY, 1, &page); - if (!n) - break; - - start = (loff_t)page->index * PAGE_SIZE; /* May regress with THPs */ - - _debug("wback %lx", page->index); - - /* At this point we hold neither the i_pages lock nor the - * page lock: the page may be truncated or invalidated - * (changing page->mapping to NULL), or even swizzled - * back from swapper_space to tmpfs file mapping - */ - if (wbc->sync_mode != WB_SYNC_NONE) { - ret = lock_page_killable(page); - if (ret < 0) { - put_page(page); - return ret; - } - } else { - if (!trylock_page(page)) { - put_page(page); - return 0; - } - } - - if (page->mapping != mapping || !PageDirty(page)) { - start += thp_size(page); - unlock_page(page); - put_page(page); - continue; - } - - if (PageWriteback(page) || PageFsCache(page)) { - unlock_page(page); - if (wbc->sync_mode != WB_SYNC_NONE) { - wait_on_page_writeback(page); -#ifdef CONFIG_AFS_FSCACHE - wait_on_page_fscache(page); -#endif - } - put_page(page); - continue; - } - - if (!clear_page_dirty_for_io(page)) - BUG(); - ret = afs_write_back_from_locked_page(mapping, wbc, page, start, end); - put_page(page); - if (ret < 0) { - _leave(" = %zd", ret); - return ret; - } - - start += ret; - - cond_resched(); - } while (wbc->nr_to_write > 0); - - *_next = start; - _leave(" = 0 [%llx]", 
*_next); - return 0; -} - -/* - * write some of the pending data back to the server - */ -int afs_writepages(struct address_space *mapping, - struct writeback_control *wbc) -{ - struct afs_vnode *vnode = AFS_FS_I(mapping->host); - loff_t start, next; - int ret; - - _enter(""); - - /* We have to be careful as we can end up racing with setattr() - * truncating the pagecache since the caller doesn't take a lock here - * to prevent it. - */ - if (wbc->sync_mode == WB_SYNC_ALL) - down_read(&vnode->validate_lock); - else if (!down_read_trylock(&vnode->validate_lock)) - return 0; - - if (wbc->range_cyclic) { - start = mapping->writeback_index * PAGE_SIZE; - ret = afs_writepages_region(mapping, wbc, start, LLONG_MAX, &next); - if (start > 0 && wbc->nr_to_write > 0 && ret == 0) - ret = afs_writepages_region(mapping, wbc, 0, start, - &next); - mapping->writeback_index = next / PAGE_SIZE; - } else if (wbc->range_start == 0 && wbc->range_end == LLONG_MAX) { - ret = afs_writepages_region(mapping, wbc, 0, LLONG_MAX, &next); - if (wbc->nr_to_write > 0 && ret == 0) - mapping->writeback_index = next; - } else { - ret = afs_writepages_region(mapping, wbc, - wbc->range_start, wbc->range_end, &next); - } - - up_read(&vnode->validate_lock); - _leave(" = %d", ret); - return ret; -} - /* * flush any dirty pages for this process, and check for write errors. * - the return status from this call provides a reliable indication of diff --git a/fs/netfs/internal.h b/fs/netfs/internal.h index fe85581d8ac0..6fdf9e5663f7 100644 --- a/fs/netfs/internal.h +++ b/fs/netfs/internal.h @@ -89,7 +89,13 @@ extern atomic_t netfs_n_rh_write_failed; extern atomic_t netfs_n_rh_write_zskip; extern atomic_t netfs_n_wh_region; extern atomic_t netfs_n_wh_flush_group; +extern atomic_t netfs_n_wh_upload; +extern atomic_t netfs_n_wh_upload_done; +extern atomic_t netfs_n_wh_upload_failed; extern atomic_t netfs_n_wh_wreq; +extern atomic_t netfs_n_wh_write; +extern atomic_t netfs_n_wh_write_done; +extern atomic_t netfs_n_wh_write_failed; static inline void netfs_stat(atomic_t *stat) diff --git a/fs/netfs/objects.c b/fs/netfs/objects.c index 6e9b2a00076d..8926b4230d91 100644 --- a/fs/netfs/objects.c +++ b/fs/netfs/objects.c @@ -119,16 +119,29 @@ struct netfs_write_request *netfs_alloc_write_request(struct address_space *mapp struct inode *inode = mapping->host; struct netfs_i_context *ctx = netfs_i_context(inode); struct netfs_write_request *wreq; + unsigned int n_streams = ctx->n_wstreams, i; + bool cached; - wreq = kzalloc(sizeof(struct netfs_write_request), GFP_KERNEL); + if (!is_dio && netfs_is_cache_enabled(inode)) { + n_streams++; + cached = true; + } + + wreq = kzalloc(struct_size(wreq, streams, n_streams), GFP_KERNEL); if (wreq) { wreq->mapping = mapping; wreq->inode = inode; wreq->netfs_ops = ctx->ops; + wreq->max_streams = n_streams; wreq->debug_id = atomic_inc_return(&debug_ids); + if (cached) + __set_bit(NETFS_WREQ_WRITE_TO_CACHE, &wreq->flags); xa_init(&wreq->buffer); INIT_WORK(&wreq->work, netfs_writeback_worker); + for (i = 0; i < n_streams; i++) + INIT_LIST_HEAD(&wreq->streams[i].subrequests); refcount_set(&wreq->usage, 1); + atomic_set(&wreq->outstanding, 1); ctx->ops->init_wreq(wreq); netfs_stat(&netfs_n_wh_wreq); trace_netfs_ref_wreq(wreq->debug_id, 1, netfs_wreq_trace_new); @@ -170,6 +183,15 @@ void netfs_free_write_request(struct work_struct *work) netfs_stat_d(&netfs_n_wh_wreq); } +/** + * netfs_put_write_request - Drop a reference on a write request descriptor. 
+ * @wreq: The write request to drop + * @was_async: True if being called in a non-sleeping context + * @what: Reason code, to be displayed in trace line + * + * Drop a reference on a write request and schedule it for destruction + * after the last ref is gone. + */ void netfs_put_write_request(struct netfs_write_request *wreq, bool was_async, enum netfs_wreq_trace what) { @@ -189,3 +211,4 @@ void netfs_put_write_request(struct netfs_write_request *wreq, } } } +EXPORT_SYMBOL(netfs_put_write_request); diff --git a/fs/netfs/stats.c b/fs/netfs/stats.c index ac2510f8cab0..a02d95bba158 100644 --- a/fs/netfs/stats.c +++ b/fs/netfs/stats.c @@ -30,6 +30,12 @@ atomic_t netfs_n_rh_write_zskip; atomic_t netfs_n_wh_region; atomic_t netfs_n_wh_flush_group; atomic_t netfs_n_wh_wreq; +atomic_t netfs_n_wh_upload; +atomic_t netfs_n_wh_upload_done; +atomic_t netfs_n_wh_upload_failed; +atomic_t netfs_n_wh_write; +atomic_t netfs_n_wh_write_done; +atomic_t netfs_n_wh_write_failed; void netfs_stats_show(struct seq_file *m) { @@ -61,5 +67,13 @@ void netfs_stats_show(struct seq_file *m) atomic_read(&netfs_n_wh_region), atomic_read(&netfs_n_wh_flush_group), atomic_read(&netfs_n_wh_wreq)); + seq_printf(m, "WrHelp : UL=%u us=%u uf=%u\n", + atomic_read(&netfs_n_wh_upload), + atomic_read(&netfs_n_wh_upload_done), + atomic_read(&netfs_n_wh_upload_failed)); + seq_printf(m, "WrHelp : WR=%u ws=%u wf=%u\n", + atomic_read(&netfs_n_wh_write), + atomic_read(&netfs_n_wh_write_done), + atomic_read(&netfs_n_wh_write_failed)); } EXPORT_SYMBOL(netfs_stats_show); diff --git a/fs/netfs/write_back.c b/fs/netfs/write_back.c index 5c779cb12345..15cc0e1b9acf 100644 --- a/fs/netfs/write_back.c +++ b/fs/netfs/write_back.c @@ -11,12 +11,259 @@ #include #include "internal.h" +static int netfs_redirty_iterator(struct xa_state *xas, struct page *page) +{ + __set_page_dirty_nobuffers(page); + account_page_redirty(page); + end_page_writeback(page); + return 0; +} + +/* + * Redirty all the pages in a given range. + */ +static void netfs_redirty_pages(struct netfs_write_request *wreq) +{ + _enter("%lx-%lx", wreq->first, wreq->last); + + netfs_iterate_pinned_pages(wreq->mapping, wreq->first, wreq->last, + netfs_redirty_iterator); + _leave(""); +} + +static int netfs_end_writeback_iterator(struct xa_state *xas, struct page *page) +{ + end_page_writeback(page); + return 0; +} + +/* + * Fix up the dirty list upon completion of write. + */ +static void netfs_fix_up_dirty_list(struct netfs_write_request *wreq) +{ + struct netfs_dirty_region *region = wreq->region, *r; + struct netfs_i_context *ctx = netfs_i_context(wreq->inode); + unsigned long long available_to; + struct list_head *lower, *upper, *p; + + netfs_iterate_pinned_pages(wreq->mapping, wreq->first, wreq->last, + netfs_end_writeback_iterator); + + spin_lock(&ctx->lock); + + /* Find the bounds of the region we're going to make available. */ + lower = &ctx->dirty_regions; + r = region; + list_for_each_entry_continue_reverse(r, &ctx->dirty_regions, dirty_link) { + _debug("- back %x", r->debug_id); + if (r->state >= NETFS_REGION_IS_DIRTY) { + lower = &r->dirty_link; + break; + } + } + + available_to = ULLONG_MAX; + upper = &ctx->dirty_regions; + r = region; + list_for_each_entry_continue(r, &ctx->dirty_regions, dirty_link) { + _debug("- forw %x", r->debug_id); + if (r->state >= NETFS_REGION_IS_DIRTY) { + available_to = r->dirty.start; + upper = &r->dirty_link; + break; + } + } + + /* Remove this region and we can start any waiters that are wholly + * inside of the now-available region. 
+ */ + list_del_init(®ion->dirty_link); + + for (p = lower->next; p != upper; p = p->next) { + r = list_entry(p, struct netfs_dirty_region, dirty_link); + if (r->reserved.end <= available_to) { + smp_store_release(&r->state, NETFS_REGION_IS_ACTIVE); + trace_netfs_dirty(ctx, r, NULL, netfs_dirty_trace_activate); + wake_up_var(&r->state); + } + } + + spin_unlock(&ctx->lock); + netfs_put_dirty_region(ctx, region, netfs_region_trace_put_dirty); +} + +/* + * Process a completed write request once all the component streams have been + * completed. + */ +static void netfs_write_completed(struct netfs_write_request *wreq, bool was_async) +{ + struct netfs_i_context *ctx = netfs_i_context(wreq->inode); + unsigned int s; + + for (s = 0; s < wreq->n_streams; s++) { + struct netfs_write_stream *stream = &wreq->streams[s]; + if (!stream->error) + continue; + switch (stream->dest) { + case NETFS_UPLOAD_TO_SERVER: + /* Depending on the type of failure, this may prevent + * writeback completion unless we're in disconnected + * mode. + */ + if (!wreq->error) + wreq->error = stream->error; + break; + + case NETFS_WRITE_TO_CACHE: + /* Failure doesn't prevent writeback completion unless + * we're in disconnected mode. + */ + if (stream->error != -ENOBUFS) + ctx->ops->invalidate_cache(wreq); + break; + + default: + WARN_ON_ONCE(1); + if (!wreq->error) + wreq->error = -EIO; + return; + } + } + + if (wreq->error) + netfs_redirty_pages(wreq); + else + netfs_fix_up_dirty_list(wreq); + netfs_put_write_request(wreq, was_async, netfs_wreq_trace_put_for_outstanding); +} + +/* + * Deal with the completion of writing the data to the cache. + */ +void netfs_write_stream_completed(void *_stream, ssize_t transferred_or_error, + bool was_async) +{ + struct netfs_write_stream *stream = _stream; + struct netfs_write_request *wreq = netfs_stream_to_wreq(stream); + + if (IS_ERR_VALUE(transferred_or_error)) + stream->error = transferred_or_error; + switch (stream->dest) { + case NETFS_UPLOAD_TO_SERVER: + if (stream->error) + netfs_stat(&netfs_n_wh_upload_failed); + else + netfs_stat(&netfs_n_wh_upload_done); + break; + case NETFS_WRITE_TO_CACHE: + if (stream->error) + netfs_stat(&netfs_n_wh_write_failed); + else + netfs_stat(&netfs_n_wh_write_done); + break; + case NETFS_INVALID_WRITE: + break; + } + + trace_netfs_wstr(stream, netfs_write_stream_complete); + if (atomic_dec_and_test(&wreq->outstanding)) + netfs_write_completed(wreq, was_async); +} +EXPORT_SYMBOL(netfs_write_stream_completed); + +static void netfs_write_to_cache_stream(struct netfs_write_stream *stream, + struct netfs_write_request *wreq) +{ + trace_netfs_wstr(stream, netfs_write_stream_submit); + fscache_write_to_cache(netfs_i_cookie(wreq->inode), wreq->mapping, + wreq->start, wreq->len, wreq->region->i_size, + netfs_write_stream_completed, stream); +} + +static void netfs_write_to_cache_stream_worker(struct work_struct *work) +{ + struct netfs_write_stream *stream = container_of(work, struct netfs_write_stream, work); + struct netfs_write_request *wreq = netfs_stream_to_wreq(stream); + + netfs_write_to_cache_stream(stream, wreq); + netfs_put_write_request(wreq, false, netfs_wreq_trace_put_stream_work); +} + +/** + * netfs_set_up_write_stream - Allocate, set up and launch a write stream. + * @wreq: The write request this is storing from. + * @dest: The destination type + * @worker: The worker function to handle the write(s) + * + * Allocate the next write stream from a write request and queue the worker to + * make it happen. 
+ */ +void netfs_set_up_write_stream(struct netfs_write_request *wreq, + enum netfs_write_dest dest, work_func_t worker) +{ + struct netfs_write_stream *stream; + unsigned int s = wreq->n_streams++; + + kenter("%u,%u", s, dest); + + stream = &wreq->streams[s]; + stream->dest = dest; + stream->index = s; + INIT_WORK(&stream->work, worker); + atomic_inc(&wreq->outstanding); + trace_netfs_wstr(stream, netfs_write_stream_setup); + + switch (stream->dest) { + case NETFS_UPLOAD_TO_SERVER: + netfs_stat(&netfs_n_wh_upload); + break; + case NETFS_WRITE_TO_CACHE: + netfs_stat(&netfs_n_wh_write); + break; + case NETFS_INVALID_WRITE: + BUG(); + } + + netfs_get_write_request(wreq, netfs_wreq_trace_get_stream_work); + if (!queue_work(system_unbound_wq, &stream->work)) + netfs_put_write_request(wreq, false, netfs_wreq_trace_put_discard); +} +EXPORT_SYMBOL(netfs_set_up_write_stream); + +/* + * Set up a stream for writing to the cache. + */ +static void netfs_set_up_write_to_cache(struct netfs_write_request *wreq) +{ + netfs_set_up_write_stream(wreq, NETFS_WRITE_TO_CACHE, + netfs_write_to_cache_stream_worker); +} + /* * Process a write request. + * + * All the pages in the bounding box have had a ref taken on them and those + * covering the dirty region have been marked as being written back and their + * dirty bits provisionally cleared. */ static void netfs_writeback(struct netfs_write_request *wreq) { - kdebug("--- WRITE ---"); + struct netfs_i_context *ctx = netfs_i_context(wreq->inode); + + kenter(""); + + /* TODO: Encrypt or compress the region as appropriate */ + + /* ->outstanding > 0 carries a ref */ + netfs_get_write_request(wreq, netfs_wreq_trace_get_for_outstanding); + + if (test_bit(NETFS_WREQ_WRITE_TO_CACHE, &wreq->flags)) + netfs_set_up_write_to_cache(wreq); + ctx->ops->add_write_streams(wreq); + if (atomic_dec_and_test(&wreq->outstanding)) + netfs_write_completed(wreq, false); } void netfs_writeback_worker(struct work_struct *work) diff --git a/fs/netfs/write_helper.c b/fs/netfs/write_helper.c index a8c58eaa84d0..fa048e3882ea 100644 --- a/fs/netfs/write_helper.c +++ b/fs/netfs/write_helper.c @@ -139,18 +139,30 @@ static enum netfs_write_compatibility netfs_write_compatibility( struct netfs_dirty_region *old, struct netfs_dirty_region *candidate) { - if (old->type == NETFS_REGION_DIO || - old->type == NETFS_REGION_DSYNC || - old->state >= NETFS_REGION_IS_FLUSHING || - /* The bounding boxes of DSYNC writes can overlap with those of - * other DSYNC writes and ordinary writes. - */ + /* Regions being actively flushed can't be merged with */ + if (old->state >= NETFS_REGION_IS_FLUSHING || candidate->group != old->group || - old->group->flush) + old->group->flush) { + kleave(" = INCOM [flush]"); return NETFS_WRITES_INCOMPATIBLE; + } + + /* The bounding boxes of DSYNC writes can overlap with those of other + * DSYNC writes and ordinary writes. DIO writes cannot overlap at all. 
+ */ + if (candidate->type == NETFS_REGION_DIO || + old->type == NETFS_REGION_DIO || + old->type == NETFS_REGION_DSYNC) { + kleave(" = INCOM [dio/dsy]"); + return NETFS_WRITES_INCOMPATIBLE; + } + if (!ctx->ops->is_write_compatible) { - if (candidate->type == NETFS_REGION_DSYNC) + if (candidate->type == NETFS_REGION_DSYNC) { + kleave(" = SUPER [dsync]"); return NETFS_WRITES_SUPERSEDE; + } + kleave(" = COMPT"); return NETFS_WRITES_COMPATIBLE; } return ctx->ops->is_write_compatible(ctx, old, candidate); diff --git a/fs/netfs/xa_iterator.h b/fs/netfs/xa_iterator.h index 3f37827f0f99..67e1daa964ab 100644 --- a/fs/netfs/xa_iterator.h +++ b/fs/netfs/xa_iterator.h @@ -5,6 +5,37 @@ * Written by David Howells (dhowells@redhat.com) */ +/* + * Iterate over a set of pages that we hold pinned with the writeback flag. + * The iteration function may drop the RCU read lock, but should call + * xas_pause() before it does so. + */ +#define netfs_iterate_pinned_pages(MAPPING, START, END, ITERATOR, ...) \ + ({ \ + struct page *page; \ + pgoff_t __it_start = (START); \ + pgoff_t __it_end = (END); \ + int ret = 0; \ + \ + XA_STATE(xas, &(MAPPING)->i_pages, __it_start); \ + rcu_read_lock(); \ + for (page = xas_load(&xas); page; page = xas_next_entry(&xas, __it_end)) { \ + if (xas_retry(&xas, page)) \ + continue; \ + if (xa_is_value(page)) \ + break; \ + if (unlikely(page != xas_reload(&xas))) { \ + xas_reset(&xas); \ + continue; \ + } \ + ret = ITERATOR(&xas, page, ##__VA_ARGS__); \ + if (ret < 0) \ + break; \ + } \ + rcu_read_unlock(); \ + ret; \ + }) + /* * Iterate over a range of pages. xarray locks are not held over the iterator * function, so it can sleep if necessary. The start and end positions are diff --git a/include/linux/netfs.h b/include/linux/netfs.h index 9f874e7ed45a..9d50c2933863 100644 --- a/include/linux/netfs.h +++ b/include/linux/netfs.h @@ -19,6 +19,8 @@ #include #include +enum netfs_wreq_trace; + /* * Overload PG_private_2 to give us PG_fscache - this is used to indicate that * a page is currently backed by a local disk cache @@ -180,6 +182,7 @@ struct netfs_i_context { unsigned int wsize; /* Maximum write size */ unsigned int bsize; /* Min block size for bounding box */ unsigned int inval_counter; /* Number of invalidations made */ + unsigned char n_wstreams; /* Number of write streams to allocate */ }; /* @@ -242,12 +245,53 @@ struct netfs_dirty_region { refcount_t ref; }; +enum netfs_write_dest { + NETFS_UPLOAD_TO_SERVER, + NETFS_WRITE_TO_CACHE, + NETFS_INVALID_WRITE, +} __mode(byte); + +/* + * Descriptor for a write subrequest. Each subrequest represents an individual + * write to a server or a cache. + */ +struct netfs_write_subrequest { + struct netfs_write_request *wreq; /* Supervising write request */ + struct list_head stream_link; /* Link in stream->subrequests */ + loff_t start; /* Where to start the I/O */ + size_t len; /* Size of the I/O */ + size_t transferred; /* Amount of data transferred */ + refcount_t usage; + short error; /* 0 or error that occurred */ + unsigned short debug_index; /* Index in list (for debugging output) */ + unsigned char stream_index; /* Which stream we're part of */ + enum netfs_write_dest dest; /* Where to write to */ +}; + +/* + * Descriptor for a write stream. Each stream represents a sequence of writes + * to a destination, where a stream covers the entirety of the write request. + * All of a stream goes to the same destination - and that destination might be + * a server, a cache, a journal. 
+ * + * Each stream may be split up into separate subrequests according to different + * rules. + */ +struct netfs_write_stream { + struct work_struct work; + struct list_head subrequests; /* The subrequests comprising this stream */ + enum netfs_write_dest dest; /* Where to write to */ + unsigned char index; /* Index in wreq->streams[] */ + short error; /* 0 or error that occurred */ +}; + /* * Descriptor for a write request. This is used to manage the preparation and * storage of a sequence of dirty data - its compression/encryption and its * writing to one or more servers and the cache. * - * The prepared data is buffered here. + * The prepared data is buffered here, and then the streams are used to + * distribute the buffer to various destinations (servers, caches, etc.). */ struct netfs_write_request { struct work_struct work; @@ -260,15 +304,20 @@ struct netfs_write_request { struct list_head write_link; /* Link in i_context->write_requests */ void *netfs_priv; /* Private data for the netfs */ unsigned int debug_id; + unsigned char max_streams; /* Number of streams allocated */ + unsigned char n_streams; /* Number of streams in use */ short error; /* 0 or error that occurred */ loff_t i_size; /* Size of the file */ loff_t start; /* Start position */ size_t len; /* Length of the request */ pgoff_t first; /* First page included */ pgoff_t last; /* Last page included */ + atomic_t outstanding; /* Number of outstanding writes */ refcount_t usage; unsigned long flags; +#define NETFS_WREQ_WRITE_TO_CACHE 0 /* Need to write to the cache */ const struct netfs_request_ops *netfs_ops; + struct netfs_write_stream streams[]; /* Individual write streams */ }; enum netfs_write_compatibility { @@ -307,6 +356,8 @@ struct netfs_request_ops { /* Write request handling */ void (*init_wreq)(struct netfs_write_request *wreq); + void (*add_write_streams)(struct netfs_write_request *wreq); + void (*invalidate_cache)(struct netfs_write_request *wreq); }; /* @@ -363,6 +414,12 @@ extern int netfs_releasepage(struct page *page, gfp_t gfp_flags); extern void netfs_subreq_terminated(struct netfs_read_subrequest *, ssize_t, bool); extern void netfs_stats_show(struct seq_file *); extern struct netfs_flush_group *netfs_new_flush_group(struct inode *, void *); +extern void netfs_set_up_write_stream(struct netfs_write_request *wreq, + enum netfs_write_dest dest, work_func_t worker); +extern void netfs_put_write_request(struct netfs_write_request *wreq, + bool was_async, enum netfs_wreq_trace what); +extern void netfs_write_stream_completed(void *_stream, ssize_t transferred_or_error, + bool was_async); /** * netfs_i_context - Get the netfs inode context from the inode @@ -407,4 +464,10 @@ static inline struct fscache_cookie *netfs_i_cookie(struct inode *inode) #endif } +static inline +struct netfs_write_request *netfs_stream_to_wreq(struct netfs_write_stream *stream) +{ + return container_of(stream, struct netfs_write_request, streams[stream->index]); +} + #endif /* _LINUX_NETFS_H */ diff --git a/include/trace/events/netfs.h b/include/trace/events/netfs.h index e70abb5033e6..aa002725b209 100644 --- a/include/trace/events/netfs.h +++ b/include/trace/events/netfs.h @@ -59,6 +59,7 @@ enum netfs_failure { enum netfs_dirty_trace { netfs_dirty_trace_active, + netfs_dirty_trace_activate, netfs_dirty_trace_commit, netfs_dirty_trace_complete, netfs_dirty_trace_flush_conflict, @@ -82,6 +83,7 @@ enum netfs_dirty_trace { enum netfs_region_trace { netfs_region_trace_get_dirty, netfs_region_trace_get_wreq, + 
netfs_region_trace_put_dirty, netfs_region_trace_put_discard, netfs_region_trace_put_merged, netfs_region_trace_put_wreq, @@ -92,12 +94,22 @@ enum netfs_region_trace { enum netfs_wreq_trace { netfs_wreq_trace_free, + netfs_wreq_trace_get_for_outstanding, + netfs_wreq_trace_get_stream_work, netfs_wreq_trace_put_discard, + netfs_wreq_trace_put_for_outstanding, + netfs_wreq_trace_put_stream_work, netfs_wreq_trace_put_work, netfs_wreq_trace_see_work, netfs_wreq_trace_new, }; +enum netfs_write_stream_trace { + netfs_write_stream_complete, + netfs_write_stream_setup, + netfs_write_stream_submit, +}; + #endif #define netfs_read_traces \ @@ -156,6 +168,7 @@ enum netfs_wreq_trace { #define netfs_dirty_traces \ EM(netfs_dirty_trace_active, "ACTIVE ") \ + EM(netfs_dirty_trace_activate, "ACTIVATE ") \ EM(netfs_dirty_trace_commit, "COMMIT ") \ EM(netfs_dirty_trace_complete, "COMPLETE ") \ EM(netfs_dirty_trace_flush_conflict, "FLSH CONFL") \ @@ -178,6 +191,7 @@ enum netfs_wreq_trace { #define netfs_region_traces \ EM(netfs_region_trace_get_dirty, "GET DIRTY ") \ EM(netfs_region_trace_get_wreq, "GET WREQ ") \ + EM(netfs_region_trace_put_dirty, "PUT DIRTY ") \ EM(netfs_region_trace_put_discard, "PUT DISCARD") \ EM(netfs_region_trace_put_merged, "PUT MERGED ") \ EM(netfs_region_trace_put_wreq, "PUT WREQ ") \ @@ -187,11 +201,24 @@ enum netfs_wreq_trace { #define netfs_wreq_traces \ EM(netfs_wreq_trace_free, "FREE ") \ + EM(netfs_wreq_trace_get_for_outstanding,"GET OUTSTND") \ + EM(netfs_wreq_trace_get_stream_work, "GET S-WORK ") \ EM(netfs_wreq_trace_put_discard, "PUT DISCARD") \ + EM(netfs_wreq_trace_put_for_outstanding,"PUT OUTSTND") \ + EM(netfs_wreq_trace_put_stream_work, "PUT S-WORK ") \ EM(netfs_wreq_trace_put_work, "PUT WORK ") \ EM(netfs_wreq_trace_see_work, "SEE WORK ") \ E_(netfs_wreq_trace_new, "NEW ") +#define netfs_write_destinations \ + EM(NETFS_UPLOAD_TO_SERVER, "UPLD") \ + EM(NETFS_WRITE_TO_CACHE, "WRIT") \ + E_(NETFS_INVALID_WRITE, "INVL") + +#define netfs_write_stream_traces \ + EM(netfs_write_stream_complete, "DONE ") \ + EM(netfs_write_stream_setup, "SETUP") \ + E_(netfs_write_stream_submit, "SUBMT") /* * Export enum symbols via userspace. 
@@ -210,6 +237,8 @@ netfs_region_types; netfs_region_states; netfs_dirty_traces; netfs_wreq_traces; +netfs_write_destinations; +netfs_write_stream_traces; /* * Now redefine the EM() and E_() macros to map the enums to the strings that @@ -507,6 +536,38 @@ TRACE_EVENT(netfs_ref_wreq, __entry->ref) ); +TRACE_EVENT(netfs_wstr, + TP_PROTO(struct netfs_write_stream *stream, + enum netfs_write_stream_trace what), + + TP_ARGS(stream, what), + + TP_STRUCT__entry( + __field(unsigned int, wreq ) + __field(unsigned char, stream ) + __field(short, error ) + __field(unsigned short, flags ) + __field(enum netfs_write_dest, dest ) + __field(enum netfs_write_stream_trace, what ) + ), + + TP_fast_assign( + struct netfs_write_request *wreq = + container_of(stream, struct netfs_write_request, streams[stream->index]); + __entry->wreq = wreq->debug_id; + __entry->stream = stream->index; + __entry->error = stream->error; + __entry->dest = stream->dest; + __entry->what = what; + ), + + TP_printk("W=%08x[%u] %s %s e=%d", + __entry->wreq, __entry->stream, + __print_symbolic(__entry->what, netfs_write_stream_traces), + __print_symbolic(__entry->dest, netfs_write_destinations), + __entry->error) + ); + #endif /* _TRACE_NETFS_H */ /* This part must be outside protection */ From patchwork Wed Jul 21 13:46:48 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 12390955 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-17.2 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 80EBEC6377A for ; Wed, 21 Jul 2021 13:47:20 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 6C0AF60FF4 for ; Wed, 21 Jul 2021 13:47:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238691AbhGUNGm (ORCPT ); Wed, 21 Jul 2021 09:06:42 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]:30534 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238815AbhGUNGX (ORCPT ); Wed, 21 Jul 2021 09:06:23 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1626875220; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=btFU5FSVk+iL4SOITrWVCSBkmK7PP/wEIEgBG7XWJ4I=; b=G/xbttuFOQ4mk8TAdlHLdqLJNXdKw/CjiGGws2oEEJvRywckEnNimENPM8d1cTuUPi9h2B Ku+1VWVLsn7Su99z7kYu9tujujuojhyV8ZNiJulxhFstO8jggJvb7zMD/54gQP7R55r0bP WsM4d8IxBb6ByGRxXAoRWVhR7TOIUao= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-234-GzAJ7hTqMgKNefHi6qc6oQ-1; Wed, 21 Jul 2021 09:46:58 -0400 X-MC-Unique: GzAJ7hTqMgKNefHi6qc6oQ-1 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by 
mimecast-mx01.redhat.com (Postfix) with ESMTPS id ABE1D10150A0; Wed, 21 Jul 2021 13:46:56 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-112-62.rdu2.redhat.com [10.10.112.62]) by smtp.corp.redhat.com (Postfix) with ESMTP id D66555D9DD; Wed, 21 Jul 2021 13:46:49 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [RFC PATCH 10/12] netfs: Do encryption in write preparatory phase From: David Howells To: linux-fsdevel@vger.kernel.org Cc: dhowells@redhat.com, Jeff Layton , "Matthew Wilcox (Oracle)" , Anna Schumaker , Steve French , Dominique Martinet , Mike Marshall , David Wysochanski , Shyam Prasad N , Miklos Szeredi , Linus Torvalds , linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, devel@lists.orangefs.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Wed, 21 Jul 2021 14:46:48 +0100 Message-ID: <162687520852.276387.2868702028972631448.stgit@warthog.procyon.org.uk> In-Reply-To: <162687506932.276387.14456718890524355509.stgit@warthog.procyon.org.uk> References: <162687506932.276387.14456718890524355509.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 Precedence: bulk List-ID: X-Mailing-List: linux-cifs@vger.kernel.org When dealing with an encrypted or compressed file, we gather together sufficient pages from the pagecache to constitute a logical crypto/compression block, allocate a bounce buffer and then ask the filesystem to encrypt/compress between the buffers. The bounce buffer is then passed to the filesystem to upload. The network filesystem must set a flag to indicate what service is desired and when the logical blocksize will be. The netfs library iterates through each block to be processed, providing a pair of scatterlists to describe the start and end buffers. Note that it should be possible in future to encrypt/compress DIO writes also by this same mechanism. A mock-up block-encryption function for afs is included for illustration. 
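For further illustration (not part of the patch itself), the sketch below shows how the preparation phase rounds the dirty range out to crypto-block boundaries and then walks it block by block; whole enclosing blocks are processed, not just the dirty bytes. The helper names are invented; in the patch the corresponding loop lives in netfs_prepare_encrypt() and builds a pair of scatterlists per block before calling ->encrypt_block():

#include <stdio.h>

/* Power-of-two rounding, as used for the crypto block size. */
static long long round_down_block(long long x, long long a) { return x & ~(a - 1); }
static long long round_up_block(long long x, long long a)   { return (x + a - 1) & ~(a - 1); }

int main(void)
{
	long long bsize = 1LL << 12;		/* ctx->crypto_bsize = ilog2(4096) */
	long long start = 5000, len = 10000;	/* an example dirty region */
	long long pos = round_down_block(start, bsize);
	long long n = round_up_block(start + len, bsize) - pos;

	for (; n > 0; n -= bsize, pos += bsize)
		printf("encrypt block %lld..%lld\n", pos, pos + bsize - 1);
	return 0;
}

With the example values this visits three 4096-byte blocks (4096-8191, 8192-12287, 12288-16383), which is why the bounce buffer must cover the block-aligned bounding box rather than only the dirty extent.
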
Signed-off-by: David Howells --- fs/afs/file.c | 1 fs/afs/inode.c | 6 ++ fs/afs/internal.h | 5 ++ fs/afs/super.c | 7 ++ fs/afs/write.c | 49 +++++++++++++++ fs/netfs/Makefile | 3 + fs/netfs/internal.h | 5 ++ fs/netfs/write_back.c | 6 ++ fs/netfs/write_prep.c | 160 +++++++++++++++++++++++++++++++++++++++++++++++++ include/linux/netfs.h | 6 ++ 10 files changed, 246 insertions(+), 2 deletions(-) create mode 100644 fs/netfs/write_prep.c diff --git a/fs/afs/file.c b/fs/afs/file.c index 22030d5191cd..8a6be8d2b426 100644 --- a/fs/afs/file.c +++ b/fs/afs/file.c @@ -404,6 +404,7 @@ const struct netfs_request_ops afs_req_ops = { .update_i_size = afs_update_i_size, .init_wreq = afs_init_wreq, .add_write_streams = afs_add_write_streams, + .encrypt_block = afs_encrypt_block, }; int afs_write_inode(struct inode *inode, struct writeback_control *wbc) diff --git a/fs/afs/inode.c b/fs/afs/inode.c index a6ae031461c7..7cad099c3bb1 100644 --- a/fs/afs/inode.c +++ b/fs/afs/inode.c @@ -452,10 +452,16 @@ static void afs_get_inode_cache(struct afs_vnode *vnode) static void afs_set_netfs_context(struct afs_vnode *vnode) { struct netfs_i_context *ctx = netfs_i_context(&vnode->vfs_inode); + struct afs_super_info *as = AFS_FS_S(vnode->vfs_inode.i_sb); netfs_i_context_init(&vnode->vfs_inode, &afs_req_ops); ctx->n_wstreams = 1; ctx->bsize = PAGE_SIZE; + if (as->fscrypt) { + kdebug("ENCRYPT!"); + ctx->crypto_bsize = ilog2(4096); + __set_bit(NETFS_ICTX_ENCRYPTED, &ctx->flags); + } } /* diff --git a/fs/afs/internal.h b/fs/afs/internal.h index 32a36b96cc9b..b5f7c3659a0a 100644 --- a/fs/afs/internal.h +++ b/fs/afs/internal.h @@ -51,6 +51,7 @@ struct afs_fs_context { bool autocell; /* T if set auto mount operation */ bool dyn_root; /* T if dynamic root */ bool no_cell; /* T if the source is "none" (for dynroot) */ + bool fscrypt; /* T if content encryption is engaged */ enum afs_flock_mode flock_mode; /* Partial file-locking emulation mode */ afs_voltype_t type; /* type of volume requested */ unsigned int volnamesz; /* size of volume name */ @@ -230,6 +231,7 @@ struct afs_super_info { struct afs_volume *volume; /* volume record */ enum afs_flock_mode flock_mode:8; /* File locking emulation mode */ bool dyn_root; /* True if dynamic root */ + bool fscrypt; /* T if content encryption is engaged */ }; static inline struct afs_super_info *AFS_FS_S(struct super_block *sb) @@ -1518,6 +1520,9 @@ extern void afs_prune_wb_keys(struct afs_vnode *); extern int afs_launder_page(struct page *); extern ssize_t afs_file_direct_write(struct kiocb *, struct iov_iter *); extern void afs_add_write_streams(struct netfs_write_request *); +extern bool afs_encrypt_block(struct netfs_write_request *, loff_t, size_t, + struct scatterlist *, unsigned int, + struct scatterlist *, unsigned int); /* * xattr.c diff --git a/fs/afs/super.c b/fs/afs/super.c index 29c1178beb72..53f35ec7b17b 100644 --- a/fs/afs/super.c +++ b/fs/afs/super.c @@ -71,6 +71,7 @@ enum afs_param { Opt_autocell, Opt_dyn, Opt_flock, + Opt_fscrypt, Opt_source, }; @@ -86,6 +87,7 @@ static const struct fs_parameter_spec afs_fs_parameters[] = { fsparam_flag ("autocell", Opt_autocell), fsparam_flag ("dyn", Opt_dyn), fsparam_enum ("flock", Opt_flock, afs_param_flock), + fsparam_flag ("fscrypt", Opt_fscrypt), fsparam_string("source", Opt_source), {} }; @@ -342,6 +344,10 @@ static int afs_parse_param(struct fs_context *fc, struct fs_parameter *param) ctx->flock_mode = result.uint_32; break; + case Opt_fscrypt: + ctx->fscrypt = true; + break; + default: return -EINVAL; } @@ -516,6 +522,7 @@ 
static struct afs_super_info *afs_alloc_sbi(struct fs_context *fc) as->cell = afs_use_cell(ctx->cell, afs_cell_trace_use_sbi); as->volume = afs_get_volume(ctx->volume, afs_volume_trace_get_alloc_sbi); + as->fscrypt = ctx->fscrypt; } } return as; diff --git a/fs/afs/write.c b/fs/afs/write.c index 0668389f3466..d2b7cb1a4668 100644 --- a/fs/afs/write.c +++ b/fs/afs/write.c @@ -13,6 +13,7 @@ #include #include #include +#include #include #include "internal.h" @@ -293,6 +294,54 @@ void afs_add_write_streams(struct netfs_write_request *wreq) afs_upload_to_server_worker); } +/* + * Encrypt part of a write for fscrypt. + */ +bool afs_encrypt_block(struct netfs_write_request *wreq, loff_t pos, size_t len, + struct scatterlist *source_sg, unsigned int n_source, + struct scatterlist *dest_sg, unsigned int n_dest) +{ + struct crypto_sync_skcipher *ci; + struct crypto_skcipher *tfm; + struct skcipher_request *req; + u8 session_key[8], iv[8]; + int ret; + + kenter("%llx", pos); + + ci = crypto_alloc_sync_skcipher("pcbc(fcrypt)", 0, 0); + if (IS_ERR(ci)) { + _debug("no cipher"); + ret = PTR_ERR(ci); + goto error; + } + tfm= &ci->base; + + ret = crypto_sync_skcipher_setkey(ci, session_key, sizeof(session_key)); + if (ret < 0) + goto error_ci; + + ret = -ENOMEM; + req = skcipher_request_alloc(tfm, GFP_NOFS); + if (!req) + goto error_ci; + + memset(iv, 0, sizeof(iv)); + skcipher_request_set_sync_tfm(req, ci); + skcipher_request_set_callback(req, 0, NULL, NULL); + skcipher_request_set_crypt(req, source_sg, dest_sg, len, iv); + ret = crypto_skcipher_encrypt(req); + + skcipher_request_free(req); +error_ci: + crypto_free_sync_skcipher(ci); +error: + if (ret < 0) + wreq->error = ret; + kleave(" = %d", ret); + return ret == 0; +} + /* * Extend the region to be written back to include subsequent contiguously * dirty pages if possible, but don't sleep while doing so. 
diff --git a/fs/netfs/Makefile b/fs/netfs/Makefile index a201fd7b22cf..a7c3a9173ac0 100644 --- a/fs/netfs/Makefile +++ b/fs/netfs/Makefile @@ -4,7 +4,8 @@ netfs-y := \ objects.o \ read_helper.o \ write_back.o \ - write_helper.o + write_helper.o \ + write_prep.o # dio_helper.o netfs-$(CONFIG_NETFS_STATS) += stats.o diff --git a/fs/netfs/internal.h b/fs/netfs/internal.h index 6fdf9e5663f7..381ca64062eb 100644 --- a/fs/netfs/internal.h +++ b/fs/netfs/internal.h @@ -65,6 +65,11 @@ void netfs_flush_region(struct netfs_i_context *ctx, struct netfs_dirty_region *region, enum netfs_dirty_trace why); +/* + * write_prep.c + */ +bool netfs_prepare_wreq(struct netfs_write_request *wreq); + /* * stats.c */ diff --git a/fs/netfs/write_back.c b/fs/netfs/write_back.c index 15cc0e1b9acf..7363c3324602 100644 --- a/fs/netfs/write_back.c +++ b/fs/netfs/write_back.c @@ -254,7 +254,9 @@ static void netfs_writeback(struct netfs_write_request *wreq) kenter(""); - /* TODO: Encrypt or compress the region as appropriate */ + if (test_bit(NETFS_ICTX_ENCRYPTED, &ctx->flags) && + !netfs_prepare_wreq(wreq)) + goto out; /* ->outstanding > 0 carries a ref */ netfs_get_write_request(wreq, netfs_wreq_trace_get_for_outstanding); @@ -262,6 +264,8 @@ static void netfs_writeback(struct netfs_write_request *wreq) if (test_bit(NETFS_WREQ_WRITE_TO_CACHE, &wreq->flags)) netfs_set_up_write_to_cache(wreq); ctx->ops->add_write_streams(wreq); + +out: if (atomic_dec_and_test(&wreq->outstanding)) netfs_write_completed(wreq, false); } diff --git a/fs/netfs/write_prep.c b/fs/netfs/write_prep.c new file mode 100644 index 000000000000..f0a9dfd92a18 --- /dev/null +++ b/fs/netfs/write_prep.c @@ -0,0 +1,160 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* Network filesystem high-level write support. + * + * Copyright (C) 2021 Red Hat, Inc. All Rights Reserved. + * Written by David Howells (dhowells@redhat.com) + */ + +#include +#include +#include +#include +#include "internal.h" + +/* + * Allocate a bunch of pages and add them into the xarray buffer starting at + * the given index. + */ +static int netfs_alloc_buffer(struct xarray *xa, pgoff_t index, unsigned int nr_pages) +{ + struct page *page; + unsigned int n; + int ret; + LIST_HEAD(list); + + kenter(""); + + n = alloc_pages_bulk_list(GFP_NOIO, nr_pages, &list); + if (n < nr_pages) { + ret = -ENOMEM; + } + + while ((page = list_first_entry_or_null(&list, struct page, lru))) { + list_del(&page->lru); + ret = xa_insert(xa, index++, page, GFP_NOIO); + if (ret < 0) + break; + } + + while ((page = list_first_entry_or_null(&list, struct page, lru))) { + list_del(&page->lru); + __free_page(page); + } + return ret; +} + +/* + * Populate a scatterlist from pages in an xarray. + */ +static int netfs_xarray_to_sglist(struct xarray *xa, loff_t pos, size_t len, + struct scatterlist *sg, unsigned int n_sg) +{ + struct scatterlist *p = sg; + struct page *head = NULL; + size_t seg, offset, skip = 0; + loff_t start = pos; + pgoff_t index = start >> PAGE_SHIFT; + int j; + + XA_STATE(xas, xa, index); + + sg_init_table(sg, n_sg); + + rcu_read_lock(); + + xas_for_each(&xas, head, ULONG_MAX) { + kdebug("LOAD %lx %px", head->index, head); + if (xas_retry(&xas, head)) + continue; + if (WARN_ON(xa_is_value(head)) || WARN_ON(PageHuge(head))) + break; + for (j = (head->index < index) ? 
index - head->index : 0; + j < thp_nr_pages(head); j++ + ) { + offset = (pos + skip) & ~PAGE_MASK; + seg = min(len, PAGE_SIZE - offset); + + kdebug("[%zx] %lx %zx @%zx", p - sg, (head + j)->index, seg, offset); + sg_set_page(p++, head + j, seg, offset); + + len -= seg; + skip += seg; + if (len == 0) + break; + } + if (len == 0) + break; + } + + rcu_read_unlock(); + if (len > 0) { + WARN_ON(len > 0); + return -EIO; + } + + sg_mark_end(p - 1); + kleave(" = %zd", p - sg); + return p - sg; +} + +/* + * Perform content encryption on the data to be written before we write it to + * the server and the cache. + */ +static bool netfs_prepare_encrypt(struct netfs_write_request *wreq) +{ + struct netfs_i_context *ctx = netfs_i_context(wreq->inode); + struct scatterlist source_sg[16], dest_sg[16]; + unsigned int bsize = 1 << ctx->crypto_bsize, n_source, n_dest; + loff_t pos; + size_t n; + int ret; + + ret = netfs_alloc_buffer(&wreq->buffer, wreq->first, wreq->last - wreq->first + 1); + if (ret < 0) + goto error; + + pos = round_down(wreq->start, bsize); + n = round_up(wreq->start + wreq->len, bsize) - pos; + for (; n > 0; n -= bsize, pos += bsize) { + ret = netfs_xarray_to_sglist(&wreq->mapping->i_pages, pos, bsize, + source_sg, ARRAY_SIZE(source_sg)); + if (ret < 0) + goto error; + n_source = ret; + + ret = netfs_xarray_to_sglist(&wreq->buffer, pos, bsize, + dest_sg, ARRAY_SIZE(dest_sg)); + if (ret < 0) + goto error; + n_dest = ret; + + ret = ctx->ops->encrypt_block(wreq, pos, bsize, + source_sg, n_source, dest_sg, n_dest); + if (ret < 0) + goto error; + } + + iov_iter_xarray(&wreq->source, WRITE, &wreq->buffer, wreq->start, wreq->len); + kleave(" = t"); + return true; + +error: + wreq->error = ret; + kleave(" = f [%d]", ret); + return false; +} + +/* + * Prepare a write request for writing. All the pages in the bounding box have + * had a ref taken on them and those covering the dirty region have been marked + * as being written back and their dirty bits provisionally cleared. 
+ */ +bool netfs_prepare_wreq(struct netfs_write_request *wreq) +{ + struct netfs_i_context *ctx = netfs_i_context(wreq->inode); + + if (test_bit(NETFS_ICTX_ENCRYPTED, &ctx->flags)) + return netfs_prepare_encrypt(wreq); + return true; +} diff --git a/include/linux/netfs.h b/include/linux/netfs.h index 9d50c2933863..6acf3fb170c3 100644 --- a/include/linux/netfs.h +++ b/include/linux/netfs.h @@ -19,6 +19,7 @@ #include #include +struct scatterlist; enum netfs_wreq_trace; /* @@ -177,12 +178,14 @@ struct netfs_i_context { #endif unsigned long flags; #define NETFS_ICTX_NEW_CONTENT 0 /* Set if file has new content (create/trunc-0) */ +#define NETFS_ICTX_ENCRYPTED 1 /* The file contents are encrypted */ spinlock_t lock; unsigned int rsize; /* Maximum read size */ unsigned int wsize; /* Maximum write size */ unsigned int bsize; /* Min block size for bounding box */ unsigned int inval_counter; /* Number of invalidations made */ unsigned char n_wstreams; /* Number of write streams to allocate */ + unsigned char crypto_bsize; /* log2 of crypto block size */ }; /* @@ -358,6 +361,9 @@ struct netfs_request_ops { void (*init_wreq)(struct netfs_write_request *wreq); void (*add_write_streams)(struct netfs_write_request *wreq); void (*invalidate_cache)(struct netfs_write_request *wreq); + bool (*encrypt_block)(struct netfs_write_request *wreq, loff_t pos, size_t len, + struct scatterlist *source_sg, unsigned int n_source, + struct scatterlist *dest_sg, unsigned int n_dest); }; /* From patchwork Wed Jul 21 13:47:01 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 12390959 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.2 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id CF1DCC636CE for ; Wed, 21 Jul 2021 13:47:31 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id BF1A660FF4 for ; Wed, 21 Jul 2021 13:47:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238874AbhGUNGy (ORCPT ); Wed, 21 Jul 2021 09:06:54 -0400 Received: from us-smtp-delivery-124.mimecast.com ([216.205.24.124]:34256 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238756AbhGUNGl (ORCPT ); Wed, 21 Jul 2021 09:06:41 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1626875237; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=7b17LEAqphmyTt8gMhglRAPvylMkn7ZjLScOd5X+QhQ=; b=Wxv7U+SA5zdyaYpDf3JI5AScgxmQzKLgVVyanv/oBjfIfyUhIT5CB0XqUUFxVzX5nAU4rM oBaZyC113J0/IWBCFSXqqlHtpZf5yM6r38uPpH44bjNMbcjG43V7P8OFNtieby/EZRaPyK GOLa2iRmYJmNLrVfGG7oGULG8QeKR5o= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-479-LqENnw8hMdOglizr9-Q89w-1; Wed, 21 Jul 2021 09:47:13 -0400 X-MC-Unique: 
LqENnw8hMdOglizr9-Q89w-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 1C0C1800581; Wed, 21 Jul 2021 13:47:10 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-112-62.rdu2.redhat.com [10.10.112.62]) by smtp.corp.redhat.com (Postfix) with ESMTP id 675D85D6D1; Wed, 21 Jul 2021 13:47:02 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [RFC PATCH 11/12] netfs: Put a list of regions in /proc/fs/netfs/regions From: David Howells To: linux-fsdevel@vger.kernel.org Cc: dhowells@redhat.com, Jeff Layton , "Matthew Wilcox (Oracle)" , Anna Schumaker , Steve French , Dominique Martinet , Mike Marshall , David Wysochanski , Shyam Prasad N , Miklos Szeredi , Linus Torvalds , linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, devel@lists.orangefs.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Wed, 21 Jul 2021 14:47:01 +0100 Message-ID: <162687522190.276387.10953470388038836276.stgit@warthog.procyon.org.uk> In-Reply-To: <162687506932.276387.14456718890524355509.stgit@warthog.procyon.org.uk> References: <162687506932.276387.14456718890524355509.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15 Precedence: bulk List-ID: X-Mailing-List: linux-cifs@vger.kernel.org --- fs/netfs/Makefile | 1 fs/netfs/internal.h | 24 +++++++++++ fs/netfs/main.c | 104 +++++++++++++++++++++++++++++++++++++++++++++++ fs/netfs/objects.c | 6 ++- fs/netfs/write_helper.c | 4 ++ include/linux/netfs.h | 1 6 files changed, 139 insertions(+), 1 deletion(-) create mode 100644 fs/netfs/main.c diff --git a/fs/netfs/Makefile b/fs/netfs/Makefile index a7c3a9173ac0..62dad3d7bea0 100644 --- a/fs/netfs/Makefile +++ b/fs/netfs/Makefile @@ -1,6 +1,7 @@ # SPDX-License-Identifier: GPL-2.0 netfs-y := \ + main.o \ objects.o \ read_helper.o \ write_back.o \ diff --git a/fs/netfs/internal.h b/fs/netfs/internal.h index 381ca64062eb..a9ec6591f90a 100644 --- a/fs/netfs/internal.h +++ b/fs/netfs/internal.h @@ -22,6 +22,30 @@ ssize_t netfs_file_direct_write(struct netfs_dirty_region *region, struct kiocb *iocb, struct iov_iter *from); +/* + * main.c + */ +extern struct list_head netfs_regions; +extern spinlock_t netfs_regions_lock; + +#ifdef CONFIG_PROC_FS +static inline void netfs_proc_add_region(struct netfs_dirty_region *region) +{ + spin_lock(&netfs_regions_lock); + list_add_tail_rcu(®ion->proc_link, &netfs_regions); + spin_unlock(&netfs_regions_lock); +} +static inline void netfs_proc_del_region(struct netfs_dirty_region *region) +{ + spin_lock(&netfs_regions_lock); + list_del_rcu(®ion->proc_link); + spin_unlock(&netfs_regions_lock); +} +#else +static inline void netfs_proc_add_region(struct netfs_dirty_region *region) {} +static inline void netfs_proc_del_region(struct netfs_dirty_region *region) {} +#endif + /* * objects.c */ diff --git a/fs/netfs/main.c b/fs/netfs/main.c new file mode 100644 index 000000000000..125b570efefd --- /dev/null +++ b/fs/netfs/main.c @@ -0,0 +1,104 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* Network filesystem library. 
+ * + * Copyright (C) 2021 Red Hat, Inc. All Rights Reserved. + * Written by David Howells (dhowells@redhat.com) + */ + +#include +#include +#include +#include +#include +#include "internal.h" + +#ifdef CONFIG_PROC_FS +LIST_HEAD(netfs_regions); +DEFINE_SPINLOCK(netfs_regions_lock); + +static const char netfs_proc_region_states[] = "PRADFC"; +static const char *netfs_proc_region_types[] = { + [NETFS_REGION_ORDINARY] = "ORD ", + [NETFS_REGION_DIO] = "DIOW", + [NETFS_REGION_DSYNC] = "DSYN", +}; + +/* + * Generate a list of regions in /proc/fs/netfs/regions + */ +static int netfs_regions_seq_show(struct seq_file *m, void *v) +{ + struct netfs_dirty_region *region; + + if (v == &netfs_regions) { + seq_puts(m, + "REGION REF TYPE S FL DEV INODE DIRTY, BOUNDS, RESV\n" + "======== === ==== = == ===== ======== ==============================\n" + ); + return 0; + } + + region = list_entry(v, struct netfs_dirty_region, proc_link); + seq_printf(m, + "%08x %3d %s %c %2lx %02x:%02x %8x %04llx-%04llx %04llx-%04llx %04llx-%04llx\n", + region->debug_id, + refcount_read(®ion->ref), + netfs_proc_region_types[region->type], + netfs_proc_region_states[region->state], + region->flags, + 0, 0, 0, + region->dirty.start, region->dirty.end, + region->bounds.start, region->bounds.end, + region->reserved.start, region->reserved.end); + return 0; +} + +static void *netfs_regions_seq_start(struct seq_file *m, loff_t *_pos) + __acquires(rcu) +{ + rcu_read_lock(); + return seq_list_start_head(&netfs_regions, *_pos); +} + +static void *netfs_regions_seq_next(struct seq_file *m, void *v, loff_t *_pos) +{ + return seq_list_next(v, &netfs_regions, _pos); +} + +static void netfs_regions_seq_stop(struct seq_file *m, void *v) + __releases(rcu) +{ + rcu_read_unlock(); +} + +const struct seq_operations netfs_regions_seq_ops = { + .start = netfs_regions_seq_start, + .next = netfs_regions_seq_next, + .stop = netfs_regions_seq_stop, + .show = netfs_regions_seq_show, +}; +#endif /* CONFIG_PROC_FS */ + +static int __init netfs_init(void) +{ + if (!proc_mkdir("fs/netfs", NULL)) + goto error; + + if (!proc_create_seq("fs/netfs/regions", S_IFREG | 0444, NULL, + &netfs_regions_seq_ops)) + goto error_proc; + + return 0; + +error_proc: + remove_proc_entry("fs/netfs", NULL); +error: + return -ENOMEM; +} +fs_initcall(netfs_init); + +static void __exit netfs_exit(void) +{ + remove_proc_entry("fs/netfs", NULL); +} +module_exit(netfs_exit); diff --git a/fs/netfs/objects.c b/fs/netfs/objects.c index 8926b4230d91..1149f12ca8c9 100644 --- a/fs/netfs/objects.c +++ b/fs/netfs/objects.c @@ -60,8 +60,10 @@ struct netfs_dirty_region *netfs_alloc_dirty_region(void) struct netfs_dirty_region *region; region = kzalloc(sizeof(struct netfs_dirty_region), GFP_KERNEL); - if (region) + if (region) { + INIT_LIST_HEAD(®ion->proc_link); netfs_stat(&netfs_n_wh_region); + } return region; } @@ -81,6 +83,8 @@ void netfs_free_dirty_region(struct netfs_i_context *ctx, { if (region) { trace_netfs_ref_region(region->debug_id, 0, netfs_region_trace_free); + if (!list_empty(®ion->proc_link)) + netfs_proc_del_region(region); if (ctx->ops->free_dirty_region) ctx->ops->free_dirty_region(region); netfs_put_flush_group(region->group); diff --git a/fs/netfs/write_helper.c b/fs/netfs/write_helper.c index fa048e3882ea..b1fe2d4c0df6 100644 --- a/fs/netfs/write_helper.c +++ b/fs/netfs/write_helper.c @@ -86,10 +86,13 @@ static void netfs_init_dirty_region(struct netfs_dirty_region *region, group = list_last_entry(&ctx->flush_groups, struct netfs_flush_group, group_link); region->group 
= netfs_get_flush_group(group); + spin_lock(&ctx->lock); list_add_tail(®ion->flush_link, &group->region_list); + spin_unlock(&ctx->lock); } trace_netfs_ref_region(region->debug_id, 1, netfs_region_trace_new); trace_netfs_dirty(ctx, region, NULL, netfs_dirty_trace_new); + netfs_proc_add_region(region); } /* @@ -198,6 +201,7 @@ static struct netfs_dirty_region *netfs_split_dirty_region( list_add(&tail->dirty_link, ®ion->dirty_link); list_add(&tail->flush_link, ®ion->flush_link); trace_netfs_dirty(ctx, tail, region, netfs_dirty_trace_split); + netfs_proc_add_region(tail); return tail; } diff --git a/include/linux/netfs.h b/include/linux/netfs.h index 6acf3fb170c3..43d195badb0d 100644 --- a/include/linux/netfs.h +++ b/include/linux/netfs.h @@ -228,6 +228,7 @@ enum netfs_region_type { */ struct netfs_dirty_region { struct netfs_flush_group *group; + struct list_head proc_link; /* Link in /proc/fs/netfs/regions */ struct list_head active_link; /* Link in i_context->pending/active_writes */ struct list_head dirty_link; /* Link in i_context->dirty_regions */ struct list_head flush_link; /* Link in group->region_list or From patchwork Wed Jul 21 13:47:15 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 12390961 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-17.2 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6E00CC636C9 for ; Wed, 21 Jul 2021 13:47:34 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 58ACD60FF4 for ; Wed, 21 Jul 2021 13:47:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238698AbhGUNGy (ORCPT ); Wed, 21 Jul 2021 09:06:54 -0400 Received: from us-smtp-delivery-124.mimecast.com ([216.205.24.124]:32696 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238840AbhGUNGv (ORCPT ); Wed, 21 Jul 2021 09:06:51 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1626875247; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=cux4vK7FasEebT+e6GSkGlqilieGd9BwE6MltPwMMpQ=; b=O8RzBlPLWQayBbsh+mkmVaGBS9JNuBR0R/x2Bz+0aa+0raBPskdFhCbVg0HMoOSuADBVRr FuE0ImWyNoc1zy63LzObjO3z0sZZTbb3gl3fBqqbjg1Tm/dGa9zPLZvFGKNI5ReVSXDYed /EDFLTXJhWfYRrZOrMqHv2JzVXUyBOg= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-347-Q8UWrdvhOY-FMC_HmRYBkg-1; Wed, 21 Jul 2021 09:47:26 -0400 X-MC-Unique: Q8UWrdvhOY-FMC_HmRYBkg-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 4F80F107ACF5; Wed, 21 Jul 2021 13:47:23 +0000 (UTC) Received: from warthog.procyon.org.uk 
(ovpn-112-62.rdu2.redhat.com [10.10.112.62]) by smtp.corp.redhat.com (Postfix) with ESMTP id 99F0C19C79; Wed, 21 Jul 2021 13:47:16 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [RFC PATCH 12/12] netfs: Export some read-request ref functions From: David Howells To: linux-fsdevel@vger.kernel.org Cc: dhowells@redhat.com, Jeff Layton , "Matthew Wilcox (Oracle)" , Anna Schumaker , Steve French , Dominique Martinet , Mike Marshall , David Wysochanski , Shyam Prasad N , Miklos Szeredi , Linus Torvalds , linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, devel@lists.orangefs.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Date: Wed, 21 Jul 2021 14:47:15 +0100 Message-ID: <162687523532.276387.15449857111016442696.stgit@warthog.procyon.org.uk> In-Reply-To: <162687506932.276387.14456718890524355509.stgit@warthog.procyon.org.uk> References: <162687506932.276387.14456718890524355509.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 Precedence: bulk List-ID: X-Mailing-List: linux-cifs@vger.kernel.org Export some functions for getting/putting read-request structures for use in later patches. Signed-off-by: David Howells --- fs/netfs/internal.h | 10 ++++++++++ fs/netfs/read_helper.c | 15 +++------------ 2 files changed, 13 insertions(+), 12 deletions(-) diff --git a/fs/netfs/internal.h b/fs/netfs/internal.h index a9ec6591f90a..6ae1eb55093a 100644 --- a/fs/netfs/internal.h +++ b/fs/netfs/internal.h @@ -78,9 +78,19 @@ static inline void netfs_see_write_request(struct netfs_write_request *wreq, */ extern unsigned int netfs_debug; +void __netfs_put_subrequest(struct netfs_read_subrequest *subreq, bool was_async); +void netfs_put_read_request(struct netfs_read_request *rreq, bool was_async); +void netfs_rreq_completed(struct netfs_read_request *rreq, bool was_async); int netfs_prefetch_for_write(struct file *file, struct page *page, loff_t pos, size_t len, bool always_fill); +static inline void netfs_put_subrequest(struct netfs_read_subrequest *subreq, + bool was_async) +{ + if (refcount_dec_and_test(&subreq->usage)) + __netfs_put_subrequest(subreq, was_async); +} + /* * write_helper.c */ diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c index 0b771f2f5449..e5c636acc756 100644 --- a/fs/netfs/read_helper.c +++ b/fs/netfs/read_helper.c @@ -28,14 +28,6 @@ MODULE_PARM_DESC(netfs_debug, "Netfs support debugging mask"); static void netfs_rreq_work(struct work_struct *); static void netfs_rreq_clear_buffer(struct netfs_read_request *); -static void __netfs_put_subrequest(struct netfs_read_subrequest *, bool); - -static void netfs_put_subrequest(struct netfs_read_subrequest *subreq, - bool was_async) -{ - if (refcount_dec_and_test(&subreq->usage)) - __netfs_put_subrequest(subreq, was_async); -} static struct netfs_read_request *netfs_alloc_read_request(struct address_space *mapping, struct file *file) @@ -97,7 +89,7 @@ static void netfs_free_read_request(struct work_struct *work) netfs_stat_d(&netfs_n_rh_rreq); } -static void netfs_put_read_request(struct netfs_read_request *rreq, bool was_async) +void netfs_put_read_request(struct netfs_read_request *rreq, bool was_async) { if (refcount_dec_and_test(&rreq->usage)) { if 
(was_async) { @@ -135,8 +127,7 @@ static void netfs_get_read_subrequest(struct netfs_read_subrequest *subreq) refcount_inc(&subreq->usage); } -static void __netfs_put_subrequest(struct netfs_read_subrequest *subreq, - bool was_async) +void __netfs_put_subrequest(struct netfs_read_subrequest *subreq, bool was_async) { struct netfs_read_request *rreq = subreq->rreq; @@ -214,7 +205,7 @@ static void netfs_read_from_server(struct netfs_read_request *rreq, /* * Release those waiting. */ -static void netfs_rreq_completed(struct netfs_read_request *rreq, bool was_async) +void netfs_rreq_completed(struct netfs_read_request *rreq, bool was_async) { trace_netfs_rreq(rreq, netfs_rreq_trace_done); netfs_rreq_clear_subreqs(rreq, was_async); From patchwork Wed Jul 21 18:42:03 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 12391937 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-17.2 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E0DE7C12002 for ; Wed, 21 Jul 2021 18:42:19 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id BD1A560FF1 for ; Wed, 21 Jul 2021 18:42:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238978AbhGUSBm (ORCPT ); Wed, 21 Jul 2021 14:01:42 -0400 Received: from us-smtp-delivery-124.mimecast.com ([216.205.24.124]:41036 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236799AbhGUSBk (ORCPT ); Wed, 21 Jul 2021 14:01:40 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1626892936; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ev/WKl0u0g6blajjg/aIie6Q9S6Qwmj0bAxvObPEahw=; b=QOKrJvMzf3MHO+bvMvJNzwzyCqzTR5TFU34RfpSUGAacXoDFqoeS+oFrp53aD6bIKY6Gv0 IXRwxeLoD0sKJDyEFRpJtjmkOzvz9sNSsCFCR2GFj5v8YNlnVgjMlSsvy3iZvsaKK3JHiw k5gb6IwTaHKKSyLQAHOO5dTaKZei+RQ= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-543-buZuGVf6OgCQ8QBChvYCsA-1; Wed, 21 Jul 2021 14:42:14 -0400 X-MC-Unique: buZuGVf6OgCQ8QBChvYCsA-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 2C47B801B0A; Wed, 21 Jul 2021 18:42:12 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-112-62.rdu2.redhat.com [10.10.112.62]) by smtp.corp.redhat.com (Postfix) with ESMTP id 843381970E; Wed, 21 Jul 2021 18:42:04 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 
3798903
From: David Howells
In-Reply-To: <162687506932.276387.14456718890524355509.stgit@warthog.procyon.org.uk>
References: <162687506932.276387.14456718890524355509.stgit@warthog.procyon.org.uk>
To: linux-fsdevel@vger.kernel.org
Cc: dhowells@redhat.com, Jeff Layton , "Matthew Wilcox (Oracle)" , Anna Schumaker , Steve French , Dominique Martinet , Mike Marshall , David Wysochanski , Shyam Prasad N , Miklos Szeredi , Linus Torvalds , linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, devel@lists.orangefs.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [RFC PATCH 13/12] netfs: Do copy-to-cache-on-read through VM writeback
MIME-Version: 1.0
Content-ID: <297201.1626892923.1@warthog.procyon.org.uk>
Date: Wed, 21 Jul 2021 19:42:03 +0100
Message-ID: <297202.1626892923@warthog.procyon.org.uk>
X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23
Precedence: bulk
List-ID:
X-Mailing-List: linux-cifs@vger.kernel.org

When data is read from the server and is intended to be copied to the cache, offload the cache write to the VM writeback mechanism rather than scheduling it immediately. This allows the downloaded data to be superseded by local changes before it is written to the cache and means that we no longer need to use the PG_fscache flag. This is done by the following means:

 (1) The pages into which the data has just been downloaded are marked dirty in netfs_rreq_unlock().

 (2) A region of NETFS_REGION_CACHE_COPY type is added to the dirty region list.

 (3) If a region that is about to be modified overlaps the cache-copy region, the modifications supersede the download, moving the end marker over in netfs_merge_dirty_region().

 (4) We don't really want to supersede in the middle of a region, so we may split a pristine region so that we can supersede forwards only.

 (5) We mark regions we're going to supersede with NETFS_REGION_SUPERSEDED to prevent them from being merged whilst we're superseding them. This flag is cleared when we're done and we may merge afterwards.

 (6) Adjacent download regions are potentially mergeable.

 (7) When being flushed, CACHE_COPY regions are intended only to be written to the cache, not to the server, though they may contribute data to a cross-page chunk that has to be encrypted or compressed and sent to the server.
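
To make the flow easier to follow, here is a condensed sketch of the two halves of the mechanism. It is illustrative only and not the patch's literal call structure (the real code lives in netfs_rreq_unlock(), netfs_rreq_do_write_to_cache() and netfs_writeback() below); locking, refcounting, flag tests and region merging are omitted, and sketch_after_read()/sketch_writeback() are invented names, while all other identifiers are taken from the patch itself.

/* Illustrative sketch only; condensed from the changes in this patch. */

/* (a) On read completion: instead of writing to the cache at once, mark
 *     the freshly downloaded pages dirty and record the span as a
 *     cache-only dirty region.
 */
static void sketch_after_read(struct netfs_read_request *rreq,
			      struct netfs_read_subrequest *subreq,
			      struct page *page)
{
	if (test_bit(NETFS_SREQ_WRITE_TO_CACHE, &subreq->flags)) {
		set_page_dirty(page);		   /* step (1) */
		netfs_copy_to_cache(rreq, subreq); /* step (2): queue a
						    * NETFS_REGION_CACHE_COPY
						    * dirty region */
	}
}

/* (b) At writeback time: a CACHE_COPY region is written to the cache
 *     only; any other region type is also handed to the server write
 *     streams.
 */
static void sketch_writeback(struct netfs_i_context *ctx,
			     struct netfs_write_request *wreq)
{
	netfs_set_up_write_to_cache(wreq);	/* cache side (flag test omitted) */
	if (wreq->region->type != NETFS_REGION_CACHE_COPY)
		ctx->ops->add_write_streams(wreq); /* server side, step (7) */
}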
Signed-off-by: David Howells --- fs/netfs/internal.h | 4 -- fs/netfs/main.c | 1 fs/netfs/read_helper.c | 126 ++-------------------------------------------------------------- fs/netfs/stats.c | 7 --- fs/netfs/write_back.c | 3 + fs/netfs/write_helper.c | 112 +++++++++++++++++++++++++++++++++++++++++++++++++++++++- include/linux/netfs.h | 2 - include/trace/events/netfs.h | 3 + mm/filemap.c | 4 +- 9 files changed, 125 insertions(+), 137 deletions(-) diff --git a/fs/netfs/internal.h b/fs/netfs/internal.h index 6ae1eb55093a..ee83b81e4682 100644 --- a/fs/netfs/internal.h +++ b/fs/netfs/internal.h @@ -98,6 +98,7 @@ void netfs_writeback_worker(struct work_struct *work); void netfs_flush_region(struct netfs_i_context *ctx, struct netfs_dirty_region *region, enum netfs_dirty_trace why); +void netfs_rreq_do_write_to_cache(struct netfs_read_request *rreq); /* * write_prep.c @@ -121,10 +122,7 @@ extern atomic_t netfs_n_rh_read_done; extern atomic_t netfs_n_rh_read_failed; extern atomic_t netfs_n_rh_zero; extern atomic_t netfs_n_rh_short_read; -extern atomic_t netfs_n_rh_write; extern atomic_t netfs_n_rh_write_begin; -extern atomic_t netfs_n_rh_write_done; -extern atomic_t netfs_n_rh_write_failed; extern atomic_t netfs_n_rh_write_zskip; extern atomic_t netfs_n_wh_region; extern atomic_t netfs_n_wh_flush_group; diff --git a/fs/netfs/main.c b/fs/netfs/main.c index 125b570efefd..ad204dcbb5f7 100644 --- a/fs/netfs/main.c +++ b/fs/netfs/main.c @@ -21,6 +21,7 @@ static const char *netfs_proc_region_types[] = { [NETFS_REGION_ORDINARY] = "ORD ", [NETFS_REGION_DIO] = "DIOW", [NETFS_REGION_DSYNC] = "DSYN", + [NETFS_REGION_CACHE_COPY] = "CCPY", }; /* diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c index e5c636acc756..7fa677d4c9ca 100644 --- a/fs/netfs/read_helper.c +++ b/fs/netfs/read_helper.c @@ -212,124 +212,6 @@ void netfs_rreq_completed(struct netfs_read_request *rreq, bool was_async) netfs_put_read_request(rreq, was_async); } -/* - * Deal with the completion of writing the data to the cache. We have to clear - * the PG_fscache bits on the pages involved and release the caller's ref. - * - * May be called in softirq mode and we inherit a ref from the caller. - */ -static void netfs_rreq_unmark_after_write(struct netfs_read_request *rreq, - bool was_async) -{ - struct netfs_read_subrequest *subreq; - struct page *page; - pgoff_t unlocked = 0; - bool have_unlocked = false; - - rcu_read_lock(); - - list_for_each_entry(subreq, &rreq->subrequests, rreq_link) { - XA_STATE(xas, &rreq->mapping->i_pages, subreq->start / PAGE_SIZE); - - xas_for_each(&xas, page, (subreq->start + subreq->len - 1) / PAGE_SIZE) { - /* We might have multiple writes from the same huge - * page, but we mustn't unlock a page more than once. - */ - if (have_unlocked && page->index <= unlocked) - continue; - unlocked = page->index; - end_page_fscache(page); - have_unlocked = true; - } - } - - rcu_read_unlock(); - netfs_rreq_completed(rreq, was_async); -} - -static void netfs_rreq_copy_terminated(void *priv, ssize_t transferred_or_error, - bool was_async) -{ - struct netfs_read_subrequest *subreq = priv; - struct netfs_read_request *rreq = subreq->rreq; - - if (IS_ERR_VALUE(transferred_or_error)) { - netfs_stat(&netfs_n_rh_write_failed); - trace_netfs_failure(rreq, subreq, transferred_or_error, - netfs_fail_copy_to_cache); - } else { - netfs_stat(&netfs_n_rh_write_done); - } - - trace_netfs_sreq(subreq, netfs_sreq_trace_write_term); - - /* If we decrement nr_wr_ops to 0, the ref belongs to us. 
*/ - if (atomic_dec_and_test(&rreq->nr_wr_ops)) - netfs_rreq_unmark_after_write(rreq, was_async); - - netfs_put_subrequest(subreq, was_async); -} - -/* - * Perform any outstanding writes to the cache. We inherit a ref from the - * caller. - */ -static void netfs_rreq_do_write_to_cache(struct netfs_read_request *rreq) -{ - struct netfs_cache_resources *cres = &rreq->cache_resources; - struct netfs_read_subrequest *subreq, *next, *p; - struct iov_iter iter; - int ret; - - trace_netfs_rreq(rreq, netfs_rreq_trace_write); - - /* We don't want terminating writes trying to wake us up whilst we're - * still going through the list. - */ - atomic_inc(&rreq->nr_wr_ops); - - list_for_each_entry_safe(subreq, p, &rreq->subrequests, rreq_link) { - if (!test_bit(NETFS_SREQ_WRITE_TO_CACHE, &subreq->flags)) { - list_del_init(&subreq->rreq_link); - netfs_put_subrequest(subreq, false); - } - } - - list_for_each_entry(subreq, &rreq->subrequests, rreq_link) { - /* Amalgamate adjacent writes */ - while (!list_is_last(&subreq->rreq_link, &rreq->subrequests)) { - next = list_next_entry(subreq, rreq_link); - if (next->start != subreq->start + subreq->len) - break; - subreq->len += next->len; - list_del_init(&next->rreq_link); - netfs_put_subrequest(next, false); - } - - ret = cres->ops->prepare_write(cres, &subreq->start, &subreq->len, - rreq->i_size); - if (ret < 0) { - trace_netfs_failure(rreq, subreq, ret, netfs_fail_prepare_write); - trace_netfs_sreq(subreq, netfs_sreq_trace_write_skip); - continue; - } - - iov_iter_xarray(&iter, WRITE, &rreq->mapping->i_pages, - subreq->start, subreq->len); - - atomic_inc(&rreq->nr_wr_ops); - netfs_stat(&netfs_n_rh_write); - netfs_get_read_subrequest(subreq); - trace_netfs_sreq(subreq, netfs_sreq_trace_write); - cres->ops->write(cres, subreq->start, &iter, - netfs_rreq_copy_terminated, subreq); - } - - /* If we decrement nr_wr_ops to 0, the usage ref belongs to us. 
*/ - if (atomic_dec_and_test(&rreq->nr_wr_ops)) - netfs_rreq_unmark_after_write(rreq, false); -} - static void netfs_rreq_write_to_cache_work(struct work_struct *work) { struct netfs_read_request *rreq = @@ -390,19 +272,19 @@ static void netfs_rreq_unlock(struct netfs_read_request *rreq) xas_for_each(&xas, page, last_page) { unsigned int pgpos = (page->index - start_page) * PAGE_SIZE; unsigned int pgend = pgpos + thp_size(page); - bool pg_failed = false; + bool pg_failed = false, caching; for (;;) { if (!subreq) { pg_failed = true; break; } - if (test_bit(NETFS_SREQ_WRITE_TO_CACHE, &subreq->flags)) - set_page_fscache(page); pg_failed |= subreq_failed; if (pgend < iopos + subreq->len) break; + caching = test_bit(NETFS_SREQ_WRITE_TO_CACHE, &subreq->flags); + account += subreq->len - iov_iter_count(&subreq->iter); iopos += subreq->len; if (!list_is_last(&subreq->rreq_link, &rreq->subrequests)) { @@ -420,6 +302,8 @@ static void netfs_rreq_unlock(struct netfs_read_request *rreq) for (i = 0; i < thp_nr_pages(page); i++) flush_dcache_page(page); SetPageUptodate(page); + if (caching) + set_page_dirty(page); } if (!test_bit(NETFS_RREQ_DONT_UNLOCK_PAGES, &rreq->flags)) { diff --git a/fs/netfs/stats.c b/fs/netfs/stats.c index a02d95bba158..414c2fca6b23 100644 --- a/fs/netfs/stats.c +++ b/fs/netfs/stats.c @@ -22,10 +22,7 @@ atomic_t netfs_n_rh_read_done; atomic_t netfs_n_rh_read_failed; atomic_t netfs_n_rh_zero; atomic_t netfs_n_rh_short_read; -atomic_t netfs_n_rh_write; atomic_t netfs_n_rh_write_begin; -atomic_t netfs_n_rh_write_done; -atomic_t netfs_n_rh_write_failed; atomic_t netfs_n_rh_write_zskip; atomic_t netfs_n_wh_region; atomic_t netfs_n_wh_flush_group; @@ -59,10 +56,6 @@ void netfs_stats_show(struct seq_file *m) atomic_read(&netfs_n_rh_read), atomic_read(&netfs_n_rh_read_done), atomic_read(&netfs_n_rh_read_failed)); - seq_printf(m, "RdHelp : WR=%u ws=%u wf=%u\n", - atomic_read(&netfs_n_rh_write), - atomic_read(&netfs_n_rh_write_done), - atomic_read(&netfs_n_rh_write_failed)); seq_printf(m, "WrHelp : R=%u F=%u wr=%u\n", atomic_read(&netfs_n_wh_region), atomic_read(&netfs_n_wh_flush_group), diff --git a/fs/netfs/write_back.c b/fs/netfs/write_back.c index 7363c3324602..4433c3121435 100644 --- a/fs/netfs/write_back.c +++ b/fs/netfs/write_back.c @@ -263,7 +263,8 @@ static void netfs_writeback(struct netfs_write_request *wreq) if (test_bit(NETFS_WREQ_WRITE_TO_CACHE, &wreq->flags)) netfs_set_up_write_to_cache(wreq); - ctx->ops->add_write_streams(wreq); + if (wreq->region->type != NETFS_REGION_CACHE_COPY) + ctx->ops->add_write_streams(wreq); out: if (atomic_dec_and_test(&wreq->outstanding)) diff --git a/fs/netfs/write_helper.c b/fs/netfs/write_helper.c index b1fe2d4c0df6..5e50b01527fb 100644 --- a/fs/netfs/write_helper.c +++ b/fs/netfs/write_helper.c @@ -80,6 +80,11 @@ static void netfs_init_dirty_region(struct netfs_dirty_region *region, INIT_LIST_HEAD(®ion->flush_link); refcount_set(®ion->ref, 1); spin_lock_init(®ion->lock); + if (type == NETFS_REGION_CACHE_COPY) { + region->state = NETFS_REGION_IS_DIRTY; + region->dirty.end = end; + } + if (file && ctx->ops->init_dirty_region) ctx->ops->init_dirty_region(region, file); if (!region->group) { @@ -160,6 +165,19 @@ static enum netfs_write_compatibility netfs_write_compatibility( return NETFS_WRITES_INCOMPATIBLE; } + /* Pending writes to the cache alone (ie. copy from a read) can be + * merged or superseded by a modification that will require writing to + * the server too. 
+ */ + if (old->type == NETFS_REGION_CACHE_COPY) { + if (candidate->type == NETFS_REGION_CACHE_COPY) { + kleave(" = COMPT [ccopy]"); + return NETFS_WRITES_COMPATIBLE; + } + kleave(" = SUPER [ccopy]"); + return NETFS_WRITES_SUPERSEDE; + } + if (!ctx->ops->is_write_compatible) { if (candidate->type == NETFS_REGION_DSYNC) { kleave(" = SUPER [dsync]"); @@ -220,8 +238,11 @@ static void netfs_queue_write(struct netfs_i_context *ctx, if (overlaps(&candidate->bounds, &r->bounds)) { if (overlaps(&candidate->reserved, &r->reserved) || netfs_write_compatibility(ctx, r, candidate) == - NETFS_WRITES_INCOMPATIBLE) + NETFS_WRITES_INCOMPATIBLE) { + kdebug("conflict %x with pend %x", + candidate->debug_id, r->debug_id); goto add_to_pending_queue; + } } } @@ -238,8 +259,11 @@ static void netfs_queue_write(struct netfs_i_context *ctx, if (overlaps(&candidate->bounds, &r->bounds)) { if (overlaps(&candidate->reserved, &r->reserved) || netfs_write_compatibility(ctx, r, candidate) == - NETFS_WRITES_INCOMPATIBLE) + NETFS_WRITES_INCOMPATIBLE) { + kdebug("conflict %x with actv %x", + candidate->debug_id, r->debug_id); goto add_to_pending_queue; + } } } @@ -451,6 +475,9 @@ static void netfs_merge_dirty_region(struct netfs_i_context *ctx, goto discard; } goto scan_backwards; + + case NETFS_REGION_CACHE_COPY: + goto scan_backwards; } scan_backwards: @@ -922,3 +949,84 @@ ssize_t netfs_file_write_iter(struct kiocb *iocb, struct iov_iter *from) goto out; } EXPORT_SYMBOL(netfs_file_write_iter); + +/* + * Add a region that's just been read as a region on the dirty list to + * schedule a write to the cache. + */ +static bool netfs_copy_to_cache(struct netfs_read_request *rreq, + struct netfs_read_subrequest *subreq) +{ + struct netfs_dirty_region *candidate, *r; + struct netfs_i_context *ctx = netfs_i_context(rreq->inode); + struct list_head *p; + loff_t end = subreq->start + subreq->len; + int ret; + + ret = netfs_require_flush_group(rreq->inode); + if (ret < 0) + return false; + + candidate = netfs_alloc_dirty_region(); + if (!candidate) + return false; + + netfs_init_dirty_region(candidate, rreq->inode, NULL, + NETFS_REGION_CACHE_COPY, 0, subreq->start, end); + + spin_lock(&ctx->lock); + + /* Find a place to insert. There can't be any dirty regions + * overlapping with the region we're adding. + */ + list_for_each(p, &ctx->dirty_regions) { + r = list_entry(p, struct netfs_dirty_region, dirty_link); + if (r->bounds.end <= candidate->bounds.start) + continue; + if (r->bounds.start >= candidate->bounds.end) + break; + } + + list_add_tail(&candidate->dirty_link, p); + netfs_merge_dirty_region(ctx, candidate); + + spin_unlock(&ctx->lock); + return true; +} + +/* + * If we downloaded some data and it now needs writing to the cache, we add it + * to the dirty region list and let that flush it. This way it can get merged + * with writes. + * + * We inherit a ref from the caller. 
+ */ +void netfs_rreq_do_write_to_cache(struct netfs_read_request *rreq) +{ + struct netfs_read_subrequest *subreq, *next, *p; + + trace_netfs_rreq(rreq, netfs_rreq_trace_write); + + list_for_each_entry_safe(subreq, p, &rreq->subrequests, rreq_link) { + if (!test_bit(NETFS_SREQ_WRITE_TO_CACHE, &subreq->flags)) { + list_del_init(&subreq->rreq_link); + netfs_put_subrequest(subreq, false); + } + } + + list_for_each_entry(subreq, &rreq->subrequests, rreq_link) { + /* Amalgamate adjacent writes */ + while (!list_is_last(&subreq->rreq_link, &rreq->subrequests)) { + next = list_next_entry(subreq, rreq_link); + if (next->start != subreq->start + subreq->len) + break; + subreq->len += next->len; + list_del_init(&next->rreq_link); + netfs_put_subrequest(next, false); + } + + netfs_copy_to_cache(rreq, subreq); + } + + netfs_rreq_completed(rreq, false); +} diff --git a/include/linux/netfs.h b/include/linux/netfs.h index 43d195badb0d..527f08eb4898 100644 --- a/include/linux/netfs.h +++ b/include/linux/netfs.h @@ -145,7 +145,6 @@ struct netfs_read_request { void *netfs_priv; /* Private data for the netfs */ unsigned int debug_id; atomic_t nr_rd_ops; /* Number of read ops in progress */ - atomic_t nr_wr_ops; /* Number of write ops in progress */ size_t submitted; /* Amount submitted for I/O so far */ size_t len; /* Length of the request */ short error; /* 0 or error that occurred */ @@ -218,6 +217,7 @@ enum netfs_region_type { NETFS_REGION_ORDINARY, /* Ordinary write */ NETFS_REGION_DIO, /* Direct I/O write */ NETFS_REGION_DSYNC, /* O_DSYNC/RWF_DSYNC write */ + NETFS_REGION_CACHE_COPY, /* Data to be written to cache only */ } __attribute__((mode(byte))); /* diff --git a/include/trace/events/netfs.h b/include/trace/events/netfs.h index aa002725b209..136cc42263f9 100644 --- a/include/trace/events/netfs.h +++ b/include/trace/events/netfs.h @@ -156,7 +156,8 @@ enum netfs_write_stream_trace { #define netfs_region_types \ EM(NETFS_REGION_ORDINARY, "ORD") \ EM(NETFS_REGION_DIO, "DIO") \ - E_(NETFS_REGION_DSYNC, "DSY") + EM(NETFS_REGION_DSYNC, "DSY") \ + E_(NETFS_REGION_CACHE_COPY, "CCP") #define netfs_region_states \ EM(NETFS_REGION_IS_PENDING, "pend") \ diff --git a/mm/filemap.c b/mm/filemap.c index d1458ecf2f51..442cd767a047 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -1545,8 +1545,10 @@ void end_page_writeback(struct page *page) * reused before the wake_up_page(). */ get_page(page); - if (!test_clear_page_writeback(page)) + if (!test_clear_page_writeback(page)) { + pr_err("Page %lx doesn't have wb set\n", page->index); BUG(); + } smp_mb__after_atomic(); wake_up_page(page, PG_writeback);