From patchwork Wed Jun 9 08:49:37 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ian Kent
X-Patchwork-Id: 12309303
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=unavailable
	autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 49C93C48BCD
	for ; Wed, 9 Jun 2021 08:50:00 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id 2744B6139A
	for ; Wed, 9 Jun 2021 08:50:00 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S237736AbhFIIvx convert rfc822-to-8bit (ORCPT );
	Wed, 9 Jun 2021 04:51:53 -0400
Received: from us-smtp-delivery-44.mimecast.com ([205.139.111.44]:28055
	"EHLO us-smtp-delivery-44.mimecast.com" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S229740AbhFIIvw (ORCPT );
	Wed, 9 Jun 2021 04:51:52 -0400
Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com
	[209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id
	us-mta-266-NhSfD_6EPAGczitIUQQIhg-1; Wed, 09 Jun 2021 04:49:54 -0400
X-MC-Unique: NhSfD_6EPAGczitIUQQIhg-1
Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com
	[10.5.11.22])
	(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 6C478100C670;
	Wed, 9 Jun 2021 08:49:53 +0000 (UTC)
Received: from web.messagingengine.com (ovpn-116-20.sin2.redhat.com
	[10.67.116.20])
	by smtp.corp.redhat.com (Postfix) with ESMTP id BFEF710016FB;
	Wed, 9 Jun 2021 08:49:40 +0000 (UTC)
Subject: [PATCH v6 1/7] kernfs: move revalidate to be near lookup
From: Ian Kent
To: Greg Kroah-Hartman , Tejun Heo
Cc: Eric Sandeen , Fox Chen , Brice Goglin , Al Viro , Rick Lindsley ,
	David Howells , Miklos Szeredi , Marcelo Tosatti ,
	"Eric W. Biederman" , Carlos Maiolino , linux-fsdevel ,
	Kernel Mailing List
Date: Wed, 09 Jun 2021 16:49:37 +0800
Message-ID: <162322857790.361452.16044356399148573870.stgit@web.messagingengine.com>
In-Reply-To: <162322846765.361452.17051755721944717990.stgit@web.messagingengine.com>
References: <162322846765.361452.17051755721944717990.stgit@web.messagingengine.com>
User-Agent: StGit/0.23
MIME-Version: 1.0
X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22
Authentication-Results: relay.mimecast.com; auth=pass
	smtp.auth=CUSA124A263 smtp.mailfrom=raven@themaw.net
X-Mimecast-Spam-Score: 0
X-Mimecast-Originator: themaw.net
Precedence: bulk
List-ID:
X-Mailing-List: linux-fsdevel@vger.kernel.org

While the dentry operation kernfs_dop_revalidate() is grouped with
dentry type functions it also has a strong affinity to the inode
operation ->lookup().

It makes sense to locate this function near to kernfs_iop_lookup()
because we will be adding VFS negative dentry caching to reduce path
lookup overhead for non-existent paths.

There's no functional change from this patch.

Signed-off-by: Ian Kent
Reviewed-by: Miklos Szeredi
---
 fs/kernfs/dir.c | 86 ++++++++++++++++++++++++++++---------------------------
 1 file changed, 43 insertions(+), 43 deletions(-)

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 7e0e62deab53c..33166ec90a112 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -548,49 +548,6 @@ void kernfs_put(struct kernfs_node *kn)
 }
 EXPORT_SYMBOL_GPL(kernfs_put);
 
-static int kernfs_dop_revalidate(struct dentry *dentry, unsigned int flags)
-{
-	struct kernfs_node *kn;
-
-	if (flags & LOOKUP_RCU)
-		return -ECHILD;
-
-	/* Always perform fresh lookup for negatives */
-	if (d_really_is_negative(dentry))
-		goto out_bad_unlocked;
-
-	kn = kernfs_dentry_node(dentry);
-	mutex_lock(&kernfs_mutex);
-
-	/* The kernfs node has been deactivated */
-	if (!kernfs_active(kn))
-		goto out_bad;
-
-	/* The kernfs node has been moved? */
-	if (kernfs_dentry_node(dentry->d_parent) != kn->parent)
-		goto out_bad;
-
-	/* The kernfs node has been renamed */
-	if (strcmp(dentry->d_name.name, kn->name) != 0)
-		goto out_bad;
-
-	/* The kernfs node has been moved to a different namespace */
-	if (kn->parent && kernfs_ns_enabled(kn->parent) &&
-	    kernfs_info(dentry->d_sb)->ns != kn->ns)
-		goto out_bad;
-
-	mutex_unlock(&kernfs_mutex);
-	return 1;
-out_bad:
-	mutex_unlock(&kernfs_mutex);
-out_bad_unlocked:
-	return 0;
-}
-
-const struct dentry_operations kernfs_dops = {
-	.d_revalidate	= kernfs_dop_revalidate,
-};
-
 /**
  * kernfs_node_from_dentry - determine kernfs_node associated with a dentry
  * @dentry: the dentry in question
@@ -1073,6 +1030,49 @@ struct kernfs_node *kernfs_create_empty_dir(struct kernfs_node *parent,
 	return ERR_PTR(rc);
 }
 
+static int kernfs_dop_revalidate(struct dentry *dentry, unsigned int flags)
+{
+	struct kernfs_node *kn;
+
+	if (flags & LOOKUP_RCU)
+		return -ECHILD;
+
+	/* Always perform fresh lookup for negatives */
+	if (d_really_is_negative(dentry))
+		goto out_bad_unlocked;
+
+	kn = kernfs_dentry_node(dentry);
+	mutex_lock(&kernfs_mutex);
+
+	/* The kernfs node has been deactivated */
+	if (!kernfs_active(kn))
+		goto out_bad;
+
+	/* The kernfs node has been moved? */
+	if (kernfs_dentry_node(dentry->d_parent) != kn->parent)
+		goto out_bad;
+
+	/* The kernfs node has been renamed */
+	if (strcmp(dentry->d_name.name, kn->name) != 0)
+		goto out_bad;
+
+	/* The kernfs node has been moved to a different namespace */
+	if (kn->parent && kernfs_ns_enabled(kn->parent) &&
+	    kernfs_info(dentry->d_sb)->ns != kn->ns)
+		goto out_bad;
+
+	mutex_unlock(&kernfs_mutex);
+	return 1;
+out_bad:
+	mutex_unlock(&kernfs_mutex);
+out_bad_unlocked:
+	return 0;
+}
+
+const struct dentry_operations kernfs_dops = {
+	.d_revalidate	= kernfs_dop_revalidate,
+};
+
 static struct dentry *kernfs_iop_lookup(struct inode *dir,
 					struct dentry *dentry,
 					unsigned int flags)

From patchwork Wed Jun 9 08:49:59 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ian Kent
X-Patchwork-Id: 12309305
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham
	autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id EB287C47095
	for ; Wed, 9 Jun 2021 08:50:26 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id D652F6139A
	for ; Wed, 9 Jun 2021 08:50:26 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S237482AbhFIIwT convert rfc822-to-8bit (ORCPT );
	Wed, 9 Jun 2021 04:52:19 -0400
Received: from us-smtp-delivery-44.mimecast.com ([205.139.111.44]:39450
	"EHLO us-smtp-delivery-44.mimecast.com" rhost-flags-OK-OK-OK-OK) by
	vger.kernel.org with ESMTP id S237740AbhFIIwS (ORCPT );
	Wed, 9 Jun 2021 04:52:18 -0400
Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com
	[209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id
	us-mta-312-qhieEAuNNmGzp1Y2k73bTw-1; Wed, 09 Jun 2021 04:50:23 -0400
X-MC-Unique: qhieEAuNNmGzp1Y2k73bTw-1
Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com
	[10.5.11.13])
	(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by mimecast-mx01.redhat.com (Postfix) with ESMTPS id D1757801B15;
	Wed, 9 Jun 2021 08:50:21 +0000 (UTC)
Received: from web.messagingengine.com (ovpn-116-20.sin2.redhat.com
	[10.67.116.20])
	by smtp.corp.redhat.com (Postfix) with ESMTP id CAD1D60918;
	Wed, 9 Jun 2021 08:50:02 +0000 (UTC)
Subject: [PATCH v6 2/7] kernfs: add a revision to identify directory node
 changes
From: Ian Kent
To: Greg Kroah-Hartman , Tejun Heo
Cc: Eric Sandeen , Fox Chen , Brice Goglin , Al Viro , Rick Lindsley ,
	David Howells , Miklos Szeredi , Marcelo Tosatti ,
	"Eric W. Biederman" , Carlos Maiolino , linux-fsdevel ,
	Kernel Mailing List
Date: Wed, 09 Jun 2021 16:49:59 +0800
Message-ID: <162322859985.361452.14110524195807923374.stgit@web.messagingengine.com>
In-Reply-To: <162322846765.361452.17051755721944717990.stgit@web.messagingengine.com>
References: <162322846765.361452.17051755721944717990.stgit@web.messagingengine.com>
User-Agent: StGit/0.23
MIME-Version: 1.0
X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13
Authentication-Results: relay.mimecast.com; auth=pass
	smtp.auth=CUSA124A263 smtp.mailfrom=raven@themaw.net
X-Mimecast-Spam-Score: 0
X-Mimecast-Originator: themaw.net
Precedence: bulk
List-ID:
X-Mailing-List: linux-fsdevel@vger.kernel.org

Add a revision counter to kernfs directory nodes so it can be used
to detect if a directory node has changed during negative dentry
revalidation.

There's an assumption that sizeof(unsigned long) <= sizeof(pointer)
on all architectures and as far as I know that assumption holds.

So adding a revision counter to the struct kernfs_elem_dir variant of
the kernfs_node type union won't increase the size of the kernfs_node
struct. This is because struct kernfs_elem_dir is at least
sizeof(pointer) smaller than the largest union variant. It's tempting
to make the revision counter a u64 but that would increase the size of
kernfs_node on archs where sizeof(pointer) is smaller than the revision
counter.

Signed-off-by: Ian Kent
---
 fs/kernfs/dir.c             |  2 ++
 fs/kernfs/kernfs-internal.h | 23 +++++++++++++++++++++++
 include/linux/kernfs.h      |  5 +++++
 3 files changed, 30 insertions(+)

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 33166ec90a112..b3d1bc0f317d0 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -372,6 +372,7 @@ static int kernfs_link_sibling(struct kernfs_node *kn)
 	/* successfully added, account subdir number */
 	if (kernfs_type(kn) == KERNFS_DIR)
 		kn->parent->dir.subdirs++;
+	kernfs_inc_rev(kn->parent);
 
 	return 0;
 }
@@ -394,6 +395,7 @@ static bool kernfs_unlink_sibling(struct kernfs_node *kn)
 
 	if (kernfs_type(kn) == KERNFS_DIR)
 		kn->parent->dir.subdirs--;
+	kernfs_inc_rev(kn->parent);
 
 	rb_erase(&kn->rb, &kn->parent->dir.children);
 	RB_CLEAR_NODE(&kn->rb);
diff --git a/fs/kernfs/kernfs-internal.h b/fs/kernfs/kernfs-internal.h
index ccc3b44f6306f..b4e7579e04799 100644
--- a/fs/kernfs/kernfs-internal.h
+++ b/fs/kernfs/kernfs-internal.h
@@ -81,6 +81,29 @@ static inline struct kernfs_node *kernfs_dentry_node(struct dentry *dentry)
 	return d_inode(dentry)->i_private;
 }
 
+static inline void kernfs_set_rev(struct kernfs_node *kn,
+				  struct dentry *dentry)
+{
+	if (kernfs_type(kn) == KERNFS_DIR)
+		dentry->d_time = kn->dir.rev;
+}
+
+static inline void kernfs_inc_rev(struct kernfs_node *kn)
+{
+	if (kernfs_type(kn) == KERNFS_DIR)
+		kn->dir.rev++;
+}
+
+static inline bool kernfs_dir_changed(struct kernfs_node *kn,
+				      struct dentry *dentry)
+{
+	if (kernfs_type(kn) == KERNFS_DIR) {
+		if (kn->dir.rev != dentry->d_time)
+			return true;
+	}
+	return false;
+}
+
 extern const struct super_operations kernfs_sops;
 extern struct kmem_cache *kernfs_node_cache, *kernfs_iattrs_cache;
 
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index 9e8ca8743c268..d7e0160fce6df 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -98,6 +98,11 @@ struct kernfs_elem_dir {
 	 * better directly in kernfs_node but is here to save space.
 	 */
 	struct kernfs_root *root;
+	/*
+	 * Monotonic revision counter, used to identify if a directory
+	 * node has changed during negative dentry revalidation.
+	 */
+	unsigned long rev;
 };
 
 struct kernfs_elem_symlink {

From patchwork Wed Jun 9 08:50:27 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ian Kent
X-Patchwork-Id: 12309307
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham
	autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id D32A5C47095
	for ; Wed, 9 Jun 2021 08:50:54 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id C035E613AE
	for ; Wed, 9 Jun 2021 08:50:54 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S237769AbhFIIwr convert rfc822-to-8bit (ORCPT );
	Wed, 9 Jun 2021 04:52:47 -0400
Received: from us-smtp-delivery-44.mimecast.com ([205.139.111.44]:34034
	"EHLO us-smtp-delivery-44.mimecast.com" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S237762AbhFIIwq (ORCPT );
	Wed, 9 Jun 2021 04:52:46 -0400
Received: from
	mimecast-mx01.redhat.com (mimecast-mx01.redhat.com
	[209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id
	us-mta-562-90EOlIFCNmSFizuN1V2kiw-1; Wed, 09 Jun 2021 04:50:48 -0400
X-MC-Unique: 90EOlIFCNmSFizuN1V2kiw-1
Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com
	[10.5.11.11])
	(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by mimecast-mx01.redhat.com (Postfix) with ESMTPS id D4DB810C1ADC;
	Wed, 9 Jun 2021 08:50:46 +0000 (UTC)
Received: from web.messagingengine.com (ovpn-116-20.sin2.redhat.com
	[10.67.116.20])
	by smtp.corp.redhat.com (Postfix) with ESMTP id 2B56E17C5F;
	Wed, 9 Jun 2021 08:50:31 +0000 (UTC)
Subject: [PATCH v6 3/7] kernfs: use VFS negative dentry caching
From: Ian Kent
To: Greg Kroah-Hartman , Tejun Heo
Cc: Eric Sandeen , Fox Chen , Brice Goglin , Al Viro , Rick Lindsley ,
	David Howells , Miklos Szeredi , Marcelo Tosatti ,
	"Eric W. Biederman" , Carlos Maiolino , linux-fsdevel ,
	Kernel Mailing List
Date: Wed, 09 Jun 2021 16:50:27 +0800
Message-ID: <162322862726.361452.10114120072438540655.stgit@web.messagingengine.com>
In-Reply-To: <162322846765.361452.17051755721944717990.stgit@web.messagingengine.com>
References: <162322846765.361452.17051755721944717990.stgit@web.messagingengine.com>
User-Agent: StGit/0.23
MIME-Version: 1.0
X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11
Authentication-Results: relay.mimecast.com; auth=pass
	smtp.auth=CUSA124A263 smtp.mailfrom=raven@themaw.net
X-Mimecast-Spam-Score: 0
X-Mimecast-Originator: themaw.net
Precedence: bulk
List-ID:
X-Mailing-List: linux-fsdevel@vger.kernel.org

If there are many lookups for non-existent paths these negative lookups
can lead to a lot of overhead during path walks.

The VFS allows dentries to be created as negative and hashed, and caches
them so they can be used to reduce the fairly high overhead alloc/free
cycle that occurs during these lookups.

Use the kernfs node parent revision to identify if a change has been
made to the containing directory so that the negative dentry can be
discarded and the lookup redone.

Signed-off-by: Ian Kent
---
 fs/kernfs/dir.c | 52 ++++++++++++++++++++++++++++++++--------------------
 1 file changed, 32 insertions(+), 20 deletions(-)

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index b3d1bc0f317d0..4f037456a8e17 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -1039,9 +1039,28 @@ static int kernfs_dop_revalidate(struct dentry *dentry, unsigned int flags)
 	if (flags & LOOKUP_RCU)
 		return -ECHILD;
 
-	/* Always perform fresh lookup for negatives */
-	if (d_really_is_negative(dentry))
-		goto out_bad_unlocked;
+	/* Negative hashed dentry? */
+	if (d_really_is_negative(dentry)) {
+		struct dentry *d_parent = dget_parent(dentry);
+		struct kernfs_node *parent;
+
+		/* If the kernfs parent node has changed discard and
+		 * proceed to ->lookup.
+		 */
+		parent = kernfs_dentry_node(d_parent);
+		if (parent) {
+			if (kernfs_dir_changed(parent, dentry)) {
+				dput(d_parent);
+				return 0;
+			}
+		}
+		dput(d_parent);
+
+		/* The kernfs node doesn't exist, leave the dentry
+		 * negative and return success.
+		 */
+		return 1;
+	}
 
 	kn = kernfs_dentry_node(dentry);
 	mutex_lock(&kernfs_mutex);
@@ -1067,7 +1086,6 @@ static int kernfs_dop_revalidate(struct dentry *dentry, unsigned int flags)
 	return 1;
 out_bad:
 	mutex_unlock(&kernfs_mutex);
-out_bad_unlocked:
 	return 0;
 }
 
@@ -1082,33 +1100,27 @@ static struct dentry *kernfs_iop_lookup(struct inode *dir,
 	struct dentry *ret;
 	struct kernfs_node *parent = dir->i_private;
 	struct kernfs_node *kn;
-	struct inode *inode;
+	struct inode *inode = NULL;
 	const void *ns = NULL;
 
 	mutex_lock(&kernfs_mutex);
-
 	if (kernfs_ns_enabled(parent))
 		ns = kernfs_info(dir->i_sb)->ns;
 
 	kn = kernfs_find_ns(parent, dentry->d_name.name, ns);
-
-	/* no such entry */
-	if (!kn || !kernfs_active(kn)) {
-		ret = NULL;
-		goto out_unlock;
-	}
-
 	/* attach dentry and inode */
-	inode = kernfs_get_inode(dir->i_sb, kn);
-	if (!inode) {
-		ret = ERR_PTR(-ENOMEM);
-		goto out_unlock;
+	if (kn && kernfs_active(kn)) {
+		inode = kernfs_get_inode(dir->i_sb, kn);
+		if (!inode)
+			inode = ERR_PTR(-ENOMEM);
 	}
-
-	/* instantiate and hash dentry */
+	/* Needed only for negative dentry validation */
+	if (!inode)
+		kernfs_set_rev(parent, dentry);
+	/* instantiate and hash (possibly negative) dentry */
 	ret = d_splice_alias(inode, dentry);
- out_unlock:
 	mutex_unlock(&kernfs_mutex);
+
 	return ret;
 }
 

From patchwork Wed Jun 9 08:50:52 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ian Kent
X-Patchwork-Id: 12309309
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham
	autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 9CA96C48BCD
	for ; Wed, 9 Jun 2021 08:51:24 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id 7FDB06139A
	for ; Wed, 9 Jun 2021 08:51:24 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S237782AbhFIIxR convert rfc822-to-8bit (ORCPT );
	Wed, 9 Jun 2021 04:53:17 -0400
Received: from us-smtp-delivery-44.mimecast.com ([205.139.111.44]:37014
	"EHLO us-smtp-delivery-44.mimecast.com" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S237783AbhFIIxQ (ORCPT );
	Wed, 9 Jun 2021 04:53:16 -0400
Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com
	[209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id
	us-mta-586-bp4ecx4OO0Oy-nHuFD0Qvg-1; Wed, 09 Jun 2021 04:51:18 -0400
X-MC-Unique: bp4ecx4OO0Oy-nHuFD0Qvg-1
Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com
	[10.5.11.12])
	(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 4CC98800C60;
	Wed, 9 Jun 2021 08:51:17 +0000 (UTC)
Received: from web.messagingengine.com (ovpn-116-20.sin2.redhat.com
	[10.67.116.20])
	by smtp.corp.redhat.com (Postfix) with ESMTP id 2C39060C04;
	Wed, 9 Jun 2021 08:50:59 +0000 (UTC)
Subject: [PATCH v6 4/7] kernfs: switch kernfs to use an rwsem
From: Ian Kent
To: Greg Kroah-Hartman , Tejun Heo
Cc: Eric Sandeen , Fox Chen , Brice Goglin , Al Viro , Rick Lindsley ,
	David Howells , Miklos Szeredi , Marcelo Tosatti ,
	"Eric W. Biederman" , Carlos Maiolino , linux-fsdevel ,
	Kernel Mailing List
Date: Wed, 09 Jun 2021 16:50:52 +0800
Message-ID: <162322865230.361452.5882168567975703664.stgit@web.messagingengine.com>
In-Reply-To: <162322846765.361452.17051755721944717990.stgit@web.messagingengine.com>
References: <162322846765.361452.17051755721944717990.stgit@web.messagingengine.com>
User-Agent: StGit/0.23
MIME-Version: 1.0
X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12
Authentication-Results: relay.mimecast.com; auth=pass
	smtp.auth=CUSA124A263 smtp.mailfrom=raven@themaw.net
X-Mimecast-Spam-Score: 0
X-Mimecast-Originator: themaw.net
Precedence: bulk
List-ID:
X-Mailing-List: linux-fsdevel@vger.kernel.org

The kernfs global lock restricts the ability to perform kernfs node
lookup operations in parallel during path walks.

Change the kernfs mutex to an rwsem so that, when opportunity arises,
node searches can be done in parallel with path walk lookups.

Signed-off-by: Ian Kent
Reviewed-by: Miklos Szeredi
---
 fs/kernfs/dir.c             | 94 ++++++++++++++++++++++---------------------
 fs/kernfs/file.c            |  4 +-
 fs/kernfs/inode.c           | 16 ++++---
 fs/kernfs/kernfs-internal.h |  5 +-
 fs/kernfs/mount.c           | 12 +++--
 fs/kernfs/symlink.c         |  4 +-
 include/linux/kernfs.h      |  2 -
 7 files changed, 69 insertions(+), 68 deletions(-)

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 4f037456a8e17..195561c08439a 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -17,7 +17,7 @@
 
 #include "kernfs-internal.h"
 
-DEFINE_MUTEX(kernfs_mutex);
+DECLARE_RWSEM(kernfs_rwsem);
 static DEFINE_SPINLOCK(kernfs_rename_lock); /* kn->parent and ->name */
 static char kernfs_pr_cont_buf[PATH_MAX]; /* protected by rename_lock */
 static DEFINE_SPINLOCK(kernfs_idr_lock); /* root->ino_idr */
@@ -26,7 +26,7 @@ static DEFINE_SPINLOCK(kernfs_idr_lock); /* root->ino_idr */
 
 static bool kernfs_active(struct kernfs_node *kn)
 {
-	lockdep_assert_held(&kernfs_mutex);
+	lockdep_assert_held(&kernfs_rwsem);
 	return atomic_read(&kn->active) >= 0;
 }
 
@@ -340,7 +340,7 @@ static int kernfs_sd_compare(const struct kernfs_node *left,
  *	@kn->parent->dir.children.
  *
  *	Locking:
- *	mutex_lock(kernfs_mutex)
+ *	kernfs_rwsem held exclusive
  *
  *	RETURNS:
  *	0 on susccess -EEXIST on failure.
@@ -386,7 +386,7 @@ static int kernfs_link_sibling(struct kernfs_node *kn)
  *	removed, %false if @kn wasn't on the rbtree.
  *
  *	Locking:
- *	mutex_lock(kernfs_mutex)
+ *	kernfs_rwsem held exclusive
  */
 static bool kernfs_unlink_sibling(struct kernfs_node *kn)
 {
@@ -457,14 +457,14 @@ void kernfs_put_active(struct kernfs_node *kn)
  * return after draining is complete.
  */
 static void kernfs_drain(struct kernfs_node *kn)
-	__releases(&kernfs_mutex) __acquires(&kernfs_mutex)
+	__releases(&kernfs_rwsem) __acquires(&kernfs_rwsem)
 {
 	struct kernfs_root *root = kernfs_root(kn);
 
-	lockdep_assert_held(&kernfs_mutex);
+	lockdep_assert_held_write(&kernfs_rwsem);
 	WARN_ON_ONCE(kernfs_active(kn));
 
-	mutex_unlock(&kernfs_mutex);
+	up_write(&kernfs_rwsem);
 
 	if (kernfs_lockdep(kn)) {
 		rwsem_acquire(&kn->dep_map, 0, 0, _RET_IP_);
@@ -483,7 +483,7 @@ static void kernfs_drain(struct kernfs_node *kn)
 
 	kernfs_drain_open_files(kn);
 
-	mutex_lock(&kernfs_mutex);
+	down_write(&kernfs_rwsem);
 }
 
 /**
@@ -722,7 +722,7 @@ int kernfs_add_one(struct kernfs_node *kn)
 	bool has_ns;
 	int ret;
 
-	mutex_lock(&kernfs_mutex);
+	down_write(&kernfs_rwsem);
 
 	ret = -EINVAL;
 	has_ns = kernfs_ns_enabled(parent);
@@ -753,7 +753,7 @@ int kernfs_add_one(struct kernfs_node *kn)
 		ps_iattr->ia_mtime = ps_iattr->ia_ctime;
 	}
 
-	mutex_unlock(&kernfs_mutex);
+	up_write(&kernfs_rwsem);
 
 	/*
 	 * Activate the new node unless CREATE_DEACTIVATED is requested.
@@ -767,7 +767,7 @@ int kernfs_add_one(struct kernfs_node *kn)
 	return 0;
 
 out_unlock:
-	mutex_unlock(&kernfs_mutex);
+	up_write(&kernfs_rwsem);
 	return ret;
 }
 
@@ -788,7 +788,7 @@ static struct kernfs_node *kernfs_find_ns(struct kernfs_node *parent,
 	bool has_ns = kernfs_ns_enabled(parent);
 	unsigned int hash;
 
-	lockdep_assert_held(&kernfs_mutex);
+	lockdep_assert_held(&kernfs_rwsem);
 
 	if (has_ns != (bool)ns) {
 		WARN(1, KERN_WARNING "kernfs: ns %s in '%s' for '%s'\n",
@@ -820,7 +820,7 @@ static struct kernfs_node *kernfs_walk_ns(struct kernfs_node *parent,
 	size_t len;
 	char *p, *name;
 
-	lockdep_assert_held(&kernfs_mutex);
+	lockdep_assert_held_read(&kernfs_rwsem);
 
 	/* grab kernfs_rename_lock to piggy back on kernfs_pr_cont_buf */
 	spin_lock_irq(&kernfs_rename_lock);
@@ -860,10 +860,10 @@ struct kernfs_node *kernfs_find_and_get_ns(struct kernfs_node *parent,
 {
 	struct kernfs_node *kn;
 
-	mutex_lock(&kernfs_mutex);
+	down_read(&kernfs_rwsem);
 	kn = kernfs_find_ns(parent, name, ns);
 	kernfs_get(kn);
-	mutex_unlock(&kernfs_mutex);
+	up_read(&kernfs_rwsem);
 
 	return kn;
 }
@@ -884,10 +884,10 @@ struct kernfs_node *kernfs_walk_and_get_ns(struct kernfs_node *parent,
 {
 	struct kernfs_node *kn;
 
-	mutex_lock(&kernfs_mutex);
+	down_read(&kernfs_rwsem);
 	kn = kernfs_walk_ns(parent, path, ns);
 	kernfs_get(kn);
-	mutex_unlock(&kernfs_mutex);
+	up_read(&kernfs_rwsem);
 
 	return kn;
 }
@@ -1063,7 +1063,7 @@ static int kernfs_dop_revalidate(struct dentry *dentry, unsigned int flags)
 	}
 
 	kn = kernfs_dentry_node(dentry);
-	mutex_lock(&kernfs_mutex);
+	down_read(&kernfs_rwsem);
 
 	/* The kernfs node has been deactivated */
 	if (!kernfs_active(kn))
@@ -1082,10 +1082,10 @@ static int kernfs_dop_revalidate(struct dentry *dentry, unsigned int flags)
 	    kernfs_info(dentry->d_sb)->ns != kn->ns)
 		goto out_bad;
 
-	mutex_unlock(&kernfs_mutex);
+	up_read(&kernfs_rwsem);
 	return 1;
 out_bad:
-	mutex_unlock(&kernfs_mutex);
+	up_read(&kernfs_rwsem);
 	return 0;
 }
 
@@ -1103,7 +1103,7 @@ static struct dentry *kernfs_iop_lookup(struct inode *dir,
 	struct inode *inode = NULL;
 	const void *ns = NULL;
 
-	mutex_lock(&kernfs_mutex);
+	down_read(&kernfs_rwsem);
 	if (kernfs_ns_enabled(parent))
 		ns = kernfs_info(dir->i_sb)->ns;
 
@@ -1119,7 +1119,7 @@ static struct dentry *kernfs_iop_lookup(struct inode *dir,
 		kernfs_set_rev(parent, dentry);
 	/* instantiate and hash (possibly negative) dentry */
 	ret = d_splice_alias(inode, dentry);
-	mutex_unlock(&kernfs_mutex);
+	up_read(&kernfs_rwsem);
 
 	return ret;
 }
@@ -1241,7 +1241,7 @@ static struct kernfs_node *kernfs_next_descendant_post(struct kernfs_node *pos,
 {
 	struct rb_node *rbn;
 
-	lockdep_assert_held(&kernfs_mutex);
+	lockdep_assert_held_write(&kernfs_rwsem);
 
 	/* if first iteration, visit leftmost descendant which may be root */
 	if (!pos)
@@ -1277,7 +1277,7 @@ void kernfs_activate(struct kernfs_node *kn)
 {
 	struct kernfs_node *pos;
 
-	mutex_lock(&kernfs_mutex);
+	down_write(&kernfs_rwsem);
 
 	pos = NULL;
 	while ((pos = kernfs_next_descendant_post(pos, kn))) {
@@ -1291,14 +1291,14 @@ void kernfs_activate(struct kernfs_node *kn)
 		pos->flags |= KERNFS_ACTIVATED;
 	}
 
-	mutex_unlock(&kernfs_mutex);
+	up_write(&kernfs_rwsem);
 }
 
 static void __kernfs_remove(struct kernfs_node *kn)
 {
 	struct kernfs_node *pos;
 
-	lockdep_assert_held(&kernfs_mutex);
+	lockdep_assert_held_write(&kernfs_rwsem);
 
 	/*
 	 * Short-circuit if non-root @kn has already finished removal.
@@ -1321,7 +1321,7 @@ static void __kernfs_remove(struct kernfs_node *kn)
 		pos = kernfs_leftmost_descendant(kn);
 
 		/*
-		 * kernfs_drain() drops kernfs_mutex temporarily and @pos's
+		 * kernfs_drain() drops kernfs_rwsem temporarily and @pos's
 		 * base ref could have been put by someone else by the time
 		 * the function returns. Make sure it doesn't go away
 		 * underneath us.
@@ -1368,9 +1368,9 @@ static void __kernfs_remove(struct kernfs_node *kn)
  */
 void kernfs_remove(struct kernfs_node *kn)
 {
-	mutex_lock(&kernfs_mutex);
+	down_write(&kernfs_rwsem);
 	__kernfs_remove(kn);
-	mutex_unlock(&kernfs_mutex);
+	up_write(&kernfs_rwsem);
 }
 
 /**
@@ -1457,17 +1457,17 @@ bool kernfs_remove_self(struct kernfs_node *kn)
 {
 	bool ret;
 
-	mutex_lock(&kernfs_mutex);
+	down_write(&kernfs_rwsem);
 	kernfs_break_active_protection(kn);
 
 	/*
 	 * SUICIDAL is used to arbitrate among competing invocations. Only
 	 * the first one will actually perform removal. When the removal
 	 * is complete, SUICIDED is set and the active ref is restored
-	 * while holding kernfs_mutex. The ones which lost arbitration
-	 * waits for SUICDED && drained which can happen only after the
-	 * enclosing kernfs operation which executed the winning instance
-	 * of kernfs_remove_self() finished.
+	 * while kernfs_rwsem for held exclusive. The ones which lost
+	 * arbitration waits for SUICIDED && drained which can happen only
+	 * after the enclosing kernfs operation which executed the winning
+	 * instance of kernfs_remove_self() finished.
 	 */
 	if (!(kn->flags & KERNFS_SUICIDAL)) {
 		kn->flags |= KERNFS_SUICIDAL;
@@ -1485,9 +1485,9 @@ bool kernfs_remove_self(struct kernfs_node *kn)
 			    atomic_read(&kn->active) == KN_DEACTIVATED_BIAS)
 				break;
 
-			mutex_unlock(&kernfs_mutex);
+			up_write(&kernfs_rwsem);
 			schedule();
-			mutex_lock(&kernfs_mutex);
+			down_write(&kernfs_rwsem);
 		}
 		finish_wait(waitq, &wait);
 		WARN_ON_ONCE(!RB_EMPTY_NODE(&kn->rb));
@@ -1495,12 +1495,12 @@ bool kernfs_remove_self(struct kernfs_node *kn)
 	}
 
 	/*
-	 * This must be done while holding kernfs_mutex; otherwise, waiting
-	 * for SUICIDED && deactivated could finish prematurely.
+	 * This must be done while kernfs_rwsem held exclusive; otherwise,
+	 * waiting for SUICIDED && deactivated could finish prematurely.
 	 */
 	kernfs_unbreak_active_protection(kn);
 
-	mutex_unlock(&kernfs_mutex);
+	up_write(&kernfs_rwsem);
 	return ret;
 }
 
@@ -1524,13 +1524,13 @@ int kernfs_remove_by_name_ns(struct kernfs_node *parent, const char *name,
 		return -ENOENT;
 	}
 
-	mutex_lock(&kernfs_mutex);
+	down_write(&kernfs_rwsem);
 
 	kn = kernfs_find_ns(parent, name, ns);
 	if (kn)
 		__kernfs_remove(kn);
 
-	mutex_unlock(&kernfs_mutex);
+	up_write(&kernfs_rwsem);
 
 	if (kn)
 		return 0;
@@ -1556,7 +1556,7 @@ int kernfs_rename_ns(struct kernfs_node *kn, struct kernfs_node *new_parent,
 	if (!kn->parent)
 		return -EINVAL;
 
-	mutex_lock(&kernfs_mutex);
+	down_write(&kernfs_rwsem);
 
 	error = -ENOENT;
 	if (!kernfs_active(kn) || !kernfs_active(new_parent) ||
@@ -1610,7 +1610,7 @@ int kernfs_rename_ns(struct kernfs_node *kn, struct kernfs_node *new_parent,
 
 	error = 0;
  out:
-	mutex_unlock(&kernfs_mutex);
+	up_write(&kernfs_rwsem);
 	return error;
 }
 
@@ -1685,7 +1685,7 @@ static int kernfs_fop_readdir(struct file *file, struct dir_context *ctx)
 
 	if (!dir_emit_dots(file, ctx))
 		return 0;
-	mutex_lock(&kernfs_mutex);
+	down_read(&kernfs_rwsem);
 
 	if (kernfs_ns_enabled(parent))
 		ns = kernfs_info(dentry->d_sb)->ns;
@@ -1702,12 +1702,12 @@ static int kernfs_fop_readdir(struct file *file, struct dir_context *ctx)
 		file->private_data = pos;
 		kernfs_get(pos);
 
-		mutex_unlock(&kernfs_mutex);
+		up_read(&kernfs_rwsem);
 		if (!dir_emit(ctx, name, len, ino, type))
 			return 0;
-		mutex_lock(&kernfs_mutex);
+		down_read(&kernfs_rwsem);
 	}
-	mutex_unlock(&kernfs_mutex);
+	up_read(&kernfs_rwsem);
 	file->private_data = NULL;
 	ctx->pos = INT_MAX;
 	return 0;
diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
index c757193121475..60e2a86c535eb 100644
--- a/fs/kernfs/file.c
+++ b/fs/kernfs/file.c
@@ -860,7 +860,7 @@ static void kernfs_notify_workfn(struct work_struct *work)
 	spin_unlock_irq(&kernfs_notify_lock);
 
 	/* kick fsnotify */
-	mutex_lock(&kernfs_mutex);
+	down_write(&kernfs_rwsem);
 
 	list_for_each_entry(info, &kernfs_root(kn)->supers, node) {
 		struct kernfs_node *parent;
@@ -898,7 +898,7 @@ static void kernfs_notify_workfn(struct work_struct *work)
 		iput(inode);
 	}
 
-	mutex_unlock(&kernfs_mutex);
+	up_write(&kernfs_rwsem);
 	kernfs_put(kn);
 	goto repeat;
 }
diff --git a/fs/kernfs/inode.c b/fs/kernfs/inode.c
index d73950fc3d57d..3b01e9e61f14e 100644
--- a/fs/kernfs/inode.c
+++ b/fs/kernfs/inode.c
@@ -106,9 +106,9 @@ int kernfs_setattr(struct kernfs_node *kn, const struct iattr *iattr)
 {
 	int ret;
 
-	mutex_lock(&kernfs_mutex);
+	down_write(&kernfs_rwsem);
 	ret = __kernfs_setattr(kn, iattr);
-	mutex_unlock(&kernfs_mutex);
+	up_write(&kernfs_rwsem);
 	return ret;
 }
 
@@ -122,7 +122,7 @@ int kernfs_iop_setattr(struct user_namespace *mnt_userns, struct dentry *dentry,
 	if (!kn)
 		return -EINVAL;
 
-	mutex_lock(&kernfs_mutex);
+	down_write(&kernfs_rwsem);
 	error = setattr_prepare(&init_user_ns, dentry, iattr);
 	if (error)
 		goto out;
@@ -135,7 +135,7 @@ int kernfs_iop_setattr(struct user_namespace *mnt_userns, struct dentry *dentry,
 	setattr_copy(&init_user_ns, inode, iattr);
 
 out:
-	mutex_unlock(&kernfs_mutex);
+	up_write(&kernfs_rwsem);
 	return error;
 }
 
@@ -191,9 +191,9 @@ int kernfs_iop_getattr(struct user_namespace *mnt_userns,
 	struct inode *inode = d_inode(path->dentry);
 	struct kernfs_node *kn = inode->i_private;
 
-	mutex_lock(&kernfs_mutex);
+	down_write(&kernfs_rwsem);
 	kernfs_refresh_inode(kn, inode);
-	mutex_unlock(&kernfs_mutex);
+	up_write(&kernfs_rwsem);
 
 	generic_fillattr(&init_user_ns, inode, stat);
 	return 0;
@@ -284,9 +284,9 @@ int kernfs_iop_permission(struct user_namespace *mnt_userns,
 
 	kn = inode->i_private;
 
-	mutex_lock(&kernfs_mutex);
+	down_write(&kernfs_rwsem);
 	kernfs_refresh_inode(kn, inode);
-	mutex_unlock(&kernfs_mutex);
+	up_write(&kernfs_rwsem);
 
 	return generic_permission(&init_user_ns, inode, mask);
 }
diff --git a/fs/kernfs/kernfs-internal.h b/fs/kernfs/kernfs-internal.h
index b4e7579e04799..8a067609f63ba 100644
--- a/fs/kernfs/kernfs-internal.h
+++ b/fs/kernfs/kernfs-internal.h
@@ -13,6 +13,7 @@
 #include
 #include
 #include
+#include
 #include
 
 #include
@@ -69,7 +70,7 @@ struct kernfs_super_info {
 	 */
 	const void *ns;
 
-	/* anchored at kernfs_root->supers, protected by kernfs_mutex */
+	/* anchored at kernfs_root->supers, protected by kernfs_rwsem */
 	struct list_head node;
 };
 #define kernfs_info(SB) ((struct kernfs_super_info *)(SB->s_fs_info))
@@ -125,7 +126,7 @@ int __kernfs_setattr(struct kernfs_node *kn, const struct iattr *iattr);
 /*
  * dir.c
  */
-extern struct mutex kernfs_mutex;
+extern struct rw_semaphore kernfs_rwsem;
 extern const struct dentry_operations kernfs_dops;
 extern const struct file_operations kernfs_dir_fops;
 extern const struct inode_operations kernfs_dir_iops;
diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c
index 9dc7e7a64e10f..baa4155ba2edf 100644
--- a/fs/kernfs/mount.c
+++ b/fs/kernfs/mount.c
@@ -255,9 +255,9 @@ static int kernfs_fill_super(struct super_block *sb, struct kernfs_fs_context *k
 	sb->s_shrink.seeks = 0;
 
 	/* get root inode, initialize and unlock it */
-	mutex_lock(&kernfs_mutex);
+	down_write(&kernfs_rwsem);
 	inode = kernfs_get_inode(sb, info->root->kn);
-	mutex_unlock(&kernfs_mutex);
+	up_write(&kernfs_rwsem);
 	if (!inode) {
 		pr_debug("kernfs: could not get root inode\n");
 		return -ENOMEM;
@@ -344,9 +344,9 @@ int kernfs_get_tree(struct fs_context *fc)
 		}
 		sb->s_flags |= SB_ACTIVE;
 
-		mutex_lock(&kernfs_mutex);
+		down_write(&kernfs_rwsem);
 		list_add(&info->node, &info->root->supers);
-		mutex_unlock(&kernfs_mutex);
+		up_write(&kernfs_rwsem);
 	}
 
 	fc->root = dget(sb->s_root);
@@ -372,9 +372,9 @@ void kernfs_kill_sb(struct super_block *sb)
 {
 	struct kernfs_super_info *info = kernfs_info(sb);
 
-	mutex_lock(&kernfs_mutex);
+	down_write(&kernfs_rwsem);
 	list_del(&info->node);
-	mutex_unlock(&kernfs_mutex);
+	up_write(&kernfs_rwsem);
 
 	/*
 	 * Remove the superblock from fs_supers/s_instances
diff --git a/fs/kernfs/symlink.c b/fs/kernfs/symlink.c
index 5432883d819f2..c8f8e41b84110 100644
--- a/fs/kernfs/symlink.c
+++ b/fs/kernfs/symlink.c
@@ -116,9 +116,9 @@ static int kernfs_getlink(struct inode *inode, char *path)
 	struct kernfs_node *target = kn->symlink.target_kn;
 	int error;
 
-	mutex_lock(&kernfs_mutex);
+	down_read(&kernfs_rwsem);
 	error = kernfs_get_target_path(parent, target, path);
-	mutex_unlock(&kernfs_mutex);
+	up_read(&kernfs_rwsem);
 
 	return error;
 }
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index d7e0160fce6df..6df8cec4af51d 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -193,7 +193,7 @@ struct kernfs_root {
 	u32 id_highbits;
 	struct kernfs_syscall_ops *syscall_ops;
 
-	/* list of kernfs_super_info of this root, protected by kernfs_mutex */
+	/* list of kernfs_super_info of this root, protected by kernfs_rwsem */
 	struct list_head supers;
 
 	wait_queue_head_t deactivate_waitq;

From patchwork Wed Jun 9 08:51:22 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ian Kent
X-Patchwork-Id: 12309311
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham
	autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 25D02C47095
	for ; Wed, 9 Jun 2021 08:52:09 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id EE5156101A
	for ; Wed, 9 Jun 2021 08:52:08 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S237754AbhFIIyB convert rfc822-to-8bit (ORCPT );
	Wed, 9 Jun 2021 04:54:01 -0400
Received: from us-smtp-delivery-44.mimecast.com ([205.139.111.44]:22648
	"EHLO us-smtp-delivery-44.mimecast.com" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S231474AbhFIIyB (ORCPT );
	Wed, 9 Jun 2021 04:54:01 -0400
Received: from
mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-412-PTv6_32ZMuai4EYYRrW8Ww-1; Wed, 09 Jun 2021 04:52:01 -0400 X-MC-Unique: PTv6_32ZMuai4EYYRrW8Ww-1 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id E0689C7400; Wed, 9 Jun 2021 08:51:59 +0000 (UTC) Received: from web.messagingengine.com (ovpn-116-20.sin2.redhat.com [10.67.116.20]) by smtp.corp.redhat.com (Postfix) with ESMTP id 2F93A5D9DE; Wed, 9 Jun 2021 08:51:40 +0000 (UTC) Subject: [PATCH v6 5/7] kernfs: use i_lock to protect concurrent inode updates From: Ian Kent To: Greg Kroah-Hartman , Tejun Heo Cc: Eric Sandeen , Fox Chen , Brice Goglin , Al Viro , Rick Lindsley , David Howells , Miklos Szeredi , Marcelo Tosatti , "Eric W. Biederman" , Carlos Maiolino , linux-fsdevel , Kernel Mailing List Date: Wed, 09 Jun 2021 16:51:22 +0800 Message-ID: <162322868275.361452.17585267026652222121.stgit@web.messagingengine.com> In-Reply-To: <162322846765.361452.17051755721944717990.stgit@web.messagingengine.com> References: <162322846765.361452.17051755721944717990.stgit@web.messagingengine.com> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=raven@themaw.net X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: themaw.net Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org The inode operations .permission() and .getattr() use the kernfs node write lock but all that's needed is to keep the rb tree stable while updating the inode attributes as well as protecting the update itself against concurrent changes. 

And .permission() is called frequently during path walks and can cause
quite a bit of contention between kernfs node operations and path
walks when the number of concurrent walks is high.

To change kernfs_iop_getattr() and kernfs_iop_permission() to take
the rw sem read lock instead of the write lock an additional lock is
needed to protect against multiple processes concurrently updating the
inode attributes and link count in kernfs_refresh_inode().

The inode i_lock seems like the sensible thing to use to protect these
inode attribute updates so use it in kernfs_refresh_inode().

The last hunk in the patch, applied to kernfs_fill_super(), is possibly
not needed but taking the lock was present originally and I prefer to
continue to take it so the rb tree is held stable during the call to
kernfs_refresh_inode() made by kernfs_get_inode().

Signed-off-by: Ian Kent
Reviewed-by: Miklos Szeredi
---
 fs/kernfs/inode.c | 10 ++++++----
 fs/kernfs/mount.c |  4 ++--
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/fs/kernfs/inode.c b/fs/kernfs/inode.c
index 3b01e9e61f14e..6728ecd81eb37 100644
--- a/fs/kernfs/inode.c
+++ b/fs/kernfs/inode.c
@@ -172,6 +172,7 @@ static void kernfs_refresh_inode(struct kernfs_node *kn, struct inode *inode)
 {
 	struct kernfs_iattrs *attrs = kn->iattr;
 
+	spin_lock(&inode->i_lock);
 	inode->i_mode = kn->mode;
 	if (attrs)
 		/*
@@ -182,6 +183,7 @@ static void kernfs_refresh_inode(struct kernfs_node *kn, struct inode *inode)
 
 	if (kernfs_type(kn) == KERNFS_DIR)
 		set_nlink(inode, kn->dir.subdirs + 2);
+	spin_unlock(&inode->i_lock);
 }
 
 int kernfs_iop_getattr(struct user_namespace *mnt_userns,
@@ -191,9 +193,9 @@ int kernfs_iop_getattr(struct user_namespace *mnt_userns,
 	struct inode *inode = d_inode(path->dentry);
 	struct kernfs_node *kn = inode->i_private;
 
-	down_write(&kernfs_rwsem);
+	down_read(&kernfs_rwsem);
 	kernfs_refresh_inode(kn, inode);
-	up_write(&kernfs_rwsem);
+	up_read(&kernfs_rwsem);
 
 	generic_fillattr(&init_user_ns, inode, stat);
 	return 0;
@@ -284,9 +286,9 @@ int kernfs_iop_permission(struct user_namespace *mnt_userns,
 
 	kn = inode->i_private;
 
-	down_write(&kernfs_rwsem);
+	down_read(&kernfs_rwsem);
 	kernfs_refresh_inode(kn, inode);
-	up_write(&kernfs_rwsem);
+	up_read(&kernfs_rwsem);
 
 	return generic_permission(&init_user_ns, inode, mask);
 }
diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c
index baa4155ba2edf..f2f909d09f522 100644
--- a/fs/kernfs/mount.c
+++ b/fs/kernfs/mount.c
@@ -255,9 +255,9 @@ static int kernfs_fill_super(struct super_block *sb, struct kernfs_fs_context *k
 	sb->s_shrink.seeks = 0;
 
 	/* get root inode, initialize and unlock it */
-	down_write(&kernfs_rwsem);
+	down_read(&kernfs_rwsem);
 	inode = kernfs_get_inode(sb, info->root->kn);
-	up_write(&kernfs_rwsem);
+	up_read(&kernfs_rwsem);
 	if (!inode) {
 		pr_debug("kernfs: could not get root inode\n");
 		return -ENOMEM;

From patchwork Wed Jun 9 08:52:05 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ian Kent
X-Patchwork-Id: 12309315
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A5049C48BCF for ; Wed, 9 Jun 2021 08:52:27 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 8EA006101A for ; Wed, 9 Jun 2021 08:52:27 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237810AbhFIIyU convert rfc822-to-8bit (ORCPT ); Wed, 9 Jun 2021 04:54:20 -0400
Received: from us-smtp-delivery-44.mimecast.com ([207.211.30.44]:25203 "EHLO us-smtp-delivery-44.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237793AbhFIIyS (ORCPT ); Wed, 9 Jun 2021 04:54:18 -0400
Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-561-ZeIgQlZ8P3qvM7_yvvXxQg-1; Wed, 09 Jun 2021 04:52:20 -0400
X-MC-Unique: ZeIgQlZ8P3qvM7_yvvXxQg-1
Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id AE929C7400; Wed, 9 Jun 2021 08:52:18 +0000 (UTC)
Received: from web.messagingengine.com (ovpn-116-20.sin2.redhat.com [10.67.116.20]) by smtp.corp.redhat.com (Postfix) with ESMTP id A1E865D9DE; Wed, 9 Jun 2021 08:52:07 +0000 (UTC)
Subject: [PATCH v6 6/7] kernfs: add kernfs_need_inode_refresh()
From: Ian Kent
To: Greg Kroah-Hartman, Tejun Heo
Cc: Eric Sandeen, Fox Chen, Brice Goglin, Al Viro, Rick Lindsley, David Howells, Miklos Szeredi, Marcelo Tosatti, "Eric W. Biederman", Carlos Maiolino, linux-fsdevel, Kernel Mailing List
Date: Wed, 09 Jun 2021 16:52:05 +0800
Message-ID: <162322872534.361452.17619177755627322271.stgit@web.messagingengine.com>
In-Reply-To: <162322846765.361452.17051755721944717990.stgit@web.messagingengine.com>
References: <162322846765.361452.17051755721944717990.stgit@web.messagingengine.com>
User-Agent: StGit/0.23
MIME-Version: 1.0
X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14
Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=raven@themaw.net
X-Mimecast-Spam-Score: 0
X-Mimecast-Originator: themaw.net
Precedence: bulk
List-ID:
X-Mailing-List: linux-fsdevel@vger.kernel.org

Now the kernfs_rwsem read lock is held for kernfs_refresh_inode() and
the i_lock taken to protect inode updates there can be some contention
introduced when .permission() is called with concurrent path walks in
progress.

Since .permission() is called frequently during path walks it's worth
checking if the update is actually needed before taking the lock and
performing the update.

Signed-off-by: Ian Kent
Reviewed-by: Miklos Szeredi
---
 fs/kernfs/inode.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/fs/kernfs/inode.c b/fs/kernfs/inode.c
index 6728ecd81eb37..67fb1289c51dc 100644
--- a/fs/kernfs/inode.c
+++ b/fs/kernfs/inode.c
@@ -158,6 +158,30 @@ static inline void set_default_inode_attr(struct inode *inode, umode_t mode)
 		inode->i_ctime = current_time(inode);
 }
 
+static bool kernfs_need_inode_refresh(struct kernfs_node *kn,
+				      struct inode *inode,
+				      struct kernfs_iattrs *attrs)
+{
+	if (kernfs_type(kn) == KERNFS_DIR) {
+		if (inode->i_nlink != kn->dir.subdirs + 2)
+			return true;
+	}
+
+	if (inode->i_mode != kn->mode)
+		return true;
+
+	if (attrs) {
+		if (!timespec64_equal(&inode->i_atime, &attrs->ia_atime) ||
+		    !timespec64_equal(&inode->i_mtime, &attrs->ia_mtime) ||
+		    !timespec64_equal(&inode->i_ctime, &attrs->ia_ctime) ||
+		    !uid_eq(inode->i_uid, attrs->ia_uid) ||
+		    !gid_eq(inode->i_gid, attrs->ia_gid))
+			return true;
+	}
+
+	return false;
+}
+
 static inline void set_inode_attr(struct inode *inode,
 				  struct kernfs_iattrs *attrs)
 {
@@ -172,6 +196,9 @@ static void kernfs_refresh_inode(struct kernfs_node *kn, struct inode *inode)
 {
 	struct kernfs_iattrs *attrs = kn->iattr;
 
+	if (!kernfs_need_inode_refresh(kn, inode, attrs))
+		return;
+
 	spin_lock(&inode->i_lock);
 	inode->i_mode = kn->mode;
 	if (attrs)

From patchwork Wed Jun 9 08:52:25 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ian Kent
X-Patchwork-Id: 12309317
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B95DBC47095 for ; Wed, 9 Jun 2021 08:52:56 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id A4F4061263 for ; Wed, 9 Jun 2021 08:52:56 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237826AbhFIIyt convert rfc822-to-8bit (ORCPT ); Wed, 9 Jun 2021 04:54:49 -0400
Received: from us-smtp-delivery-44.mimecast.com ([205.139.111.44]:52286 "EHLO us-smtp-delivery-44.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237818AbhFIIyr (ORCPT ); Wed, 9 Jun 2021 04:54:47 -0400
Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-280-8y1QuQ0oMTu7V5NPd0r9VA-1; Wed, 09 Jun 2021 04:52:49 -0400
X-MC-Unique: 8y1QuQ0oMTu7V5NPd0r9VA-1
Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 1D299801B16; Wed, 9 Jun 2021 08:52:48 +0000 (UTC)
Received: from web.messagingengine.com (ovpn-116-20.sin2.redhat.com [10.67.116.20]) by smtp.corp.redhat.com (Postfix) with ESMTP id B73145D9E2; Wed, 9 Jun 2021 08:52:26 +0000 (UTC)
Subject: [PATCH v6 7/7] kernfs: dont call d_splice_alias() under kernfs node lock
From: Ian Kent
To: Greg Kroah-Hartman, Tejun Heo
Cc: Eric Sandeen, Fox Chen, Brice Goglin, Al Viro, Rick Lindsley, David Howells, Miklos Szeredi, Marcelo Tosatti, "Eric W. Biederman", Carlos Maiolino, linux-fsdevel, Kernel Mailing List
Date: Wed, 09 Jun 2021 16:52:25 +0800
Message-ID: <162322874509.361452.3143376113190093370.stgit@web.messagingengine.com>
In-Reply-To: <162322846765.361452.17051755721944717990.stgit@web.messagingengine.com>
References: <162322846765.361452.17051755721944717990.stgit@web.messagingengine.com>
User-Agent: StGit/0.23
MIME-Version: 1.0
X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14
Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=raven@themaw.net
X-Mimecast-Spam-Score: 0
X-Mimecast-Originator: themaw.net
Precedence: bulk
List-ID:
X-Mailing-List: linux-fsdevel@vger.kernel.org

The call to d_splice_alias() in kernfs_iop_lookup() doesn't depend on
any kernfs node so there's no reason to hold the kernfs node lock when
calling it.

Signed-off-by: Ian Kent
Reviewed-by: Miklos Szeredi
---
 fs/kernfs/dir.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 195561c08439a..a5820a8c139a2 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -1097,7 +1097,6 @@ static struct dentry *kernfs_iop_lookup(struct inode *dir,
 					struct dentry *dentry,
 					unsigned int flags)
 {
-	struct dentry *ret;
 	struct kernfs_node *parent = dir->i_private;
 	struct kernfs_node *kn;
 	struct inode *inode = NULL;
@@ -1117,11 +1116,10 @@ static struct dentry *kernfs_iop_lookup(struct inode *dir,
 	/* Needed only for negative dentry validation */
 	if (!inode)
 		kernfs_set_rev(parent, dentry);
-	/* instantiate and hash (possibly negative) dentry */
-	ret = d_splice_alias(inode, dentry);
 	up_read(&kernfs_rwsem);
 
-	return ret;
+	/* instantiate and hash (possibly negative) dentry */
+	return d_splice_alias(inode, dentry);
 }
 
 static int kernfs_iop_mkdir(struct user_namespace *mnt_userns,