From patchwork Wed Feb 26 16:13:54 2020
X-Patchwork-Submitter: Waiman Long
X-Patchwork-Id: 11406711
From: Waiman Long
To: Alexander Viro, Jonathan Corbet, Luis Chamberlain, Kees Cook, Iurii Zaikin
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org, Mauro Carvalho Chehab, Eric Biggers, Dave Chinner, Eric Sandeen, Waiman Long
Subject: [PATCH 01/11] fs/dcache: Fix incorrect accounting of negative dentries
Date: Wed, 26 Feb 2020 11:13:54 -0500
Message-Id: <20200226161404.14136-2-longman@redhat.com>
In-Reply-To: <20200226161404.14136-1-longman@redhat.com>
References: <20200226161404.14136-1-longman@redhat.com>

The nr_dentry_negative counter tracks only the number of negative
dentries in LRU lists, not those in shrink lists. However, both
__d_clear_type_and_inode() and __d_instantiate() check only the
DCACHE_LRU_LIST flag. Though it is highly unlikely that the
DCACHE_SHRINK_LIST flag is also set at that point, it is still
possible. Fix that by checking the DCACHE_SHRINK_LIST flag as well to
make sure that the accounting is correct.

The negative dentry test is also moved from __d_instantiate() to
__d_set_inode_and_type() to cover more cases.

Fixes: af0c9af1b3f6 ("fs/dcache: Track & report number of negative dentries")
Signed-off-by: Waiman Long
---
 fs/dcache.c            | 13 +++++++------
 include/linux/dcache.h |  9 +++++++++
 2 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index b280e07e162b..c17b538bf41c 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -315,6 +315,12 @@ static inline void __d_set_inode_and_type(struct dentry *dentry,
 {
 	unsigned flags;
 
+	/*
+	 * Decrement negative dentry count if it was in the LRU list.
+	 */
+	if (unlikely(d_in_lru(dentry) && d_is_negative(dentry)))
+		this_cpu_dec(nr_dentry_negative);
+
 	dentry->d_inode = inode;
 	flags = READ_ONCE(dentry->d_flags);
 	flags &= ~(DCACHE_ENTRY_TYPE | DCACHE_FALLTHRU);
@@ -329,7 +335,7 @@ static inline void __d_clear_type_and_inode(struct dentry *dentry)
 	flags &= ~(DCACHE_ENTRY_TYPE | DCACHE_FALLTHRU);
 	WRITE_ONCE(dentry->d_flags, flags);
 	dentry->d_inode = NULL;
-	if (dentry->d_flags & DCACHE_LRU_LIST)
+	if (d_in_lru(dentry))
 		this_cpu_inc(nr_dentry_negative);
 }
 
@@ -1919,11 +1925,6 @@ static void __d_instantiate(struct dentry *dentry, struct inode *inode)
 	WARN_ON(d_in_lookup(dentry));
 
 	spin_lock(&dentry->d_lock);
-	/*
-	 * Decrement negative dentry count if it was in the LRU list.
-	 */
-	if (dentry->d_flags & DCACHE_LRU_LIST)
-		this_cpu_dec(nr_dentry_negative);
 	hlist_add_head(&dentry->d_u.d_alias, &inode->i_dentry);
 	raw_write_seqcount_begin(&dentry->d_seq);
 	__d_set_inode_and_type(dentry, inode, add_flags);
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index c1488cc84fd9..2762ca2508f9 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -369,6 +369,15 @@ static inline void d_lookup_done(struct dentry *dentry)
 	}
 }
 
+/*
+ * Dentry is in an LRU list, not a shrink list.
+ */
+static inline bool d_in_lru(struct dentry *dentry)
+{
+	return (dentry->d_flags & (DCACHE_SHRINK_LIST | DCACHE_LRU_LIST))
+		== DCACHE_LRU_LIST;
+}
+
 extern void dput(struct dentry *);
 
 static inline bool d_managed(const struct dentry *dentry)

From patchwork Wed Feb 26 16:13:55 2020
X-Patchwork-Submitter: Waiman Long
X-Patchwork-Id: 11406719
From: Waiman Long
To: Alexander Viro, Jonathan Corbet, Luis Chamberlain, Kees Cook, Iurii Zaikin
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org, Mauro Carvalho Chehab, Eric Biggers, Dave Chinner, Eric Sandeen, Waiman Long
Subject: [PATCH 02/11] fs/dcache: Simplify __dentry_kill()
Date: Wed, 26 Feb 2020 11:13:55 -0500
Message-Id: <20200226161404.14136-3-longman@redhat.com>
In-Reply-To: <20200226161404.14136-1-longman@redhat.com>
References: <20200226161404.14136-1-longman@redhat.com>

Use the new d_in_lru() helper function to simplify __dentry_kill() a bit.
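
To check that the helper preserves the old behaviour of __dentry_kill(), here is a minimal userspace sketch that compares the two predicates over every combination of the two list flags. The DEMO_* values and demo_* functions are hypothetical stand-ins for DCACHE_LRU_LIST, DCACHE_SHRINK_LIST and d_in_lru(); only the mask-and-compare logic is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values standing in for DCACHE_LRU_LIST/SHRINK_LIST. */
#define DEMO_LRU_LIST		0x1u
#define DEMO_SHRINK_LIST	0x2u

/* Same mask-and-compare idiom as d_in_lru(): LRU set, shrink clear. */
static bool demo_in_lru(unsigned int flags)
{
	return (flags & (DEMO_SHRINK_LIST | DEMO_LRU_LIST)) == DEMO_LRU_LIST;
}

/* The nested test that __dentry_kill() used before this patch. */
static bool demo_old_check(unsigned int flags)
{
	if (flags & DEMO_LRU_LIST) {
		if (!(flags & DEMO_SHRINK_LIST))
			return true;
	}
	return false;
}
```

The two predicates agree for all four flag combinations, and only "LRU set, shrink clear" yields true, which is why the nested if can collapse into a single d_in_lru() call.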

Signed-off-by: Waiman Long
---
 fs/dcache.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index c17b538bf41c..a977f9e05840 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -572,10 +572,9 @@ static void __dentry_kill(struct dentry *dentry)
 	if (dentry->d_flags & DCACHE_OP_PRUNE)
 		dentry->d_op->d_prune(dentry);
 
-	if (dentry->d_flags & DCACHE_LRU_LIST) {
-		if (!(dentry->d_flags & DCACHE_SHRINK_LIST))
-			d_lru_del(dentry);
-	}
+	if (d_in_lru(dentry))
+		d_lru_del(dentry);
+
 	/* if it was on the hash then remove it */
 	__d_drop(dentry);
 	dentry_unlist(dentry, parent);

From patchwork Wed Feb 26 16:13:56 2020
X-Patchwork-Submitter: Waiman Long
X-Patchwork-Id: 11406713
From: Waiman Long
To: Alexander Viro, Jonathan Corbet, Luis Chamberlain, Kees Cook, Iurii Zaikin
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org, Mauro Carvalho Chehab, Eric Biggers, Dave Chinner, Eric Sandeen, Waiman Long
Subject: [PATCH 03/11] fs/dcache: Add a counter to track number of children
Date: Wed, 26 Feb 2020 11:13:56 -0500
Message-Id: <20200226161404.14136-4-longman@redhat.com>
In-Reply-To: <20200226161404.14136-1-longman@redhat.com>
References: <20200226161404.14136-1-longman@redhat.com>

Add a new field d_nchildren to struct dentry to track the number of
children in a directory.

Theoretically, we could use the reference count (d_lockref.count) as a
proxy for the number of children. Normally the reference count should
be quite close to the number of children. However, when the directory
dentry is heavily contended, the reference count can differ from the
number of children by quite a bit.
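
As an illustration of the bookkeeping this patch adds to d_alloc(), dentry_unlist() and the non-exchange branch of __d_move(), the counter can be modelled in userspace as follows. This is a sketch under stated assumptions: struct demo_dentry and the demo_* helpers are hypothetical, and locking is left out because in the kernel these updates happen with the relevant d_lock already held.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct dentry; only what the count needs. */
struct demo_dentry {
	struct demo_dentry *parent;
	unsigned int nchildren;	/* models d_nchildren */
};

/* Models d_alloc(): link the child and bump the parent's count. */
static void demo_attach(struct demo_dentry *child, struct demo_dentry *parent)
{
	child->parent = parent;
	child->nchildren = 0;
	parent->nchildren++;
}

/* Models dentry_unlist(): the parent loses one child. */
static void demo_detach(struct demo_dentry *child)
{
	assert(child->parent != NULL && child->parent->nchildren > 0);
	child->parent->nchildren--;
	child->parent = NULL;
}

/* Models the non-exchange branch of __d_move(): reparent one child. */
static void demo_move(struct demo_dentry *child, struct demo_dentry *new_parent)
{
	struct demo_dentry *old_parent = child->parent;

	assert(old_parent != NULL);
	child->parent = new_parent;
	new_parent->nchildren++;
	old_parent->nchildren--;
}
```

Unlike d_lockref.count, which other tasks can bump transiently while the directory is contended, this counter changes only when a child is actually linked, unlinked or moved.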

The d_nchildren field is updated only when d_lock is already held, so
the performance cost of this tracking should be negligible.

Signed-off-by: Waiman Long
---
 fs/dcache.c            | 16 ++++++++++++----
 include/linux/dcache.h |  7 ++++---
 2 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index a977f9e05840..0ee5aa2c31cf 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -526,6 +526,8 @@ static inline void dentry_unlist(struct dentry *dentry, struct dentry *parent)
 	if (unlikely(list_empty(&dentry->d_child)))
 		return;
 	__list_del_entry(&dentry->d_child);
+	parent->d_nchildren--;
+
 	/*
 	 * Cursors can move around the list of children. While we'd been
 	 * a normal list member, it didn't matter - ->d_child.next would've
@@ -1738,6 +1740,7 @@ static struct dentry *__d_alloc(struct super_block *sb, const struct qstr *name)
 	dentry->d_sb = sb;
 	dentry->d_op = NULL;
 	dentry->d_fsdata = NULL;
+	dentry->d_nchildren = 0;
 	INIT_HLIST_BL_NODE(&dentry->d_hash);
 	INIT_LIST_HEAD(&dentry->d_lru);
 	INIT_LIST_HEAD(&dentry->d_subdirs);
@@ -1782,6 +1785,7 @@ struct dentry *d_alloc(struct dentry * parent, const struct qstr *name)
 	__dget_dlock(parent);
 	dentry->d_parent = parent;
 	list_add(&dentry->d_child, &parent->d_subdirs);
+	parent->d_nchildren++;
 	spin_unlock(&parent->d_lock);
 
 	return dentry;
@@ -2762,10 +2766,10 @@ static void swap_names(struct dentry *dentry, struct dentry *target)
 			 * Both are internal.
 			 */
 			unsigned int i;
-			BUILD_BUG_ON(!IS_ALIGNED(DNAME_INLINE_LEN, sizeof(long)));
-			for (i = 0; i < DNAME_INLINE_LEN / sizeof(long); i++) {
-				swap(((long *) &dentry->d_iname)[i],
-				     ((long *) &target->d_iname)[i]);
+			BUILD_BUG_ON(!IS_ALIGNED(DNAME_INLINE_LEN, sizeof(int)));
+			for (i = 0; i < DNAME_INLINE_LEN / sizeof(int); i++) {
+				swap(((int *) &dentry->d_iname)[i],
+				     ((int *) &target->d_iname)[i]);
 			}
 		}
 	}
@@ -2855,6 +2859,10 @@ static void __d_move(struct dentry *dentry, struct dentry *target,
 		dentry->d_parent->d_lockref.count++;
 		if (dentry != old_parent) /* wasn't IS_ROOT */
 			WARN_ON(!--old_parent->d_lockref.count);
+
+		/* Adjust d_nchildren */
+		dentry->d_parent->d_nchildren++;
+		old_parent->d_nchildren--;
 	} else {
 		target->d_parent = old_parent;
 		swap_names(dentry, target);
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 2762ca2508f9..e9e66eb50d1a 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -75,12 +75,12 @@ extern struct dentry_stat_t dentry_stat;
  * large memory footprint increase).
  */
 #ifdef CONFIG_64BIT
-# define DNAME_INLINE_LEN 32 /* 192 bytes */
+# define DNAME_INLINE_LEN 28 /* 192 bytes */
 #else
 # ifdef CONFIG_SMP
-#  define DNAME_INLINE_LEN 36 /* 128 bytes */
+#  define DNAME_INLINE_LEN 32 /* 128 bytes */
 # else
-#  define DNAME_INLINE_LEN 40 /* 128 bytes */
+#  define DNAME_INLINE_LEN 36 /* 128 bytes */
 # endif
 #endif
 
@@ -96,6 +96,7 @@ struct dentry {
 	struct inode *d_inode;		/* Where the name belongs to - NULL is
 					 * negative */
 	unsigned char d_iname[DNAME_INLINE_LEN];	/* small names */
+	unsigned int d_nchildren;	/* # of children (directory only) */
 
 	/* Ref lookup also touches following */
 	struct lockref d_lockref;	/* per-dentry lock and refcount */

From patchwork Wed Feb 26 16:13:57 2020
X-Patchwork-Submitter: Waiman Long
X-Patchwork-Id: 11406715
From: Waiman Long
To: Alexander Viro, Jonathan Corbet, Luis Chamberlain, Kees Cook, Iurii Zaikin
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org, Mauro Carvalho Chehab, Eric Biggers, Dave Chinner, Eric Sandeen, Waiman Long
Subject: [PATCH 04/11] fs/dcache: Add sysctl parameter dentry-dir-max
Date: Wed, 26 Feb 2020 11:13:57 -0500
Message-Id: <20200226161404.14136-5-longman@redhat.com>
In-Reply-To: <20200226161404.14136-1-longman@redhat.com>
References: <20200226161404.14136-1-longman@redhat.com>

The number of positive dentries in a system is limited by the actual
number of files in the system. The number of negative dentries,
however, has no such limit. As a result, a system can end up with a
significant amount of memory tied up in negative dentries.

For example, when running a negative dentry generator on a 4-socket
256GB x86-64 system, the system can use up more than 150GB of memory
in dentries and more than 200GB in slabs, almost running out of free
memory before memory reclaim kicks in.

There are two major problems with having too many negative dentries:

 1) When a filesystem with too many negative dentries is being
    unmounted, the process of draining the dentries associated with
    the filesystem can take some time. To users, the system may seem
    to hang for a while. The long wait may also cause unexpected
    timeout errors or other warnings. This can happen, for instance,
    when a long-running container with many negative dentries is
    being destroyed.

 2) Tying up too much memory in unused negative dentries means there
    is less memory available for other uses. Even though the kernel is
    able to reclaim unused dentries when running out of free memory,
    doing so still introduces additional latency to the application,
    reducing its performance.

This patch introduces a new sysctl parameter "dentry-dir-max" in
/proc/sys/fs. This sysctl parameter represents a soft limit on the
total number of negative dentries allowable under any given directory.
Any non-negative integer is allowed. The default is 0, which means
there is no limit.

The actual negative dentry reclaim process to enforce the limit will
be done in a later patch.
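
The intended semantics of the new knob, together with the reclaim target that the later reclaim patch in this series computes in reclaim_negative_dentry(), can be summarised in a small userspace sketch. The demo_* names are hypothetical; the arithmetic mirrors that patch's "cnt -= limit; cnt += (limit >> 3);" and the "0 means no limit" rule described above.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * True when a directory with nchildren dentries should be considered
 * for negative dentry reclaim. A limit of 0 (the default) disables it.
 */
static bool demo_over_limit(int limit, int nchildren)
{
	assert(limit >= 0);	/* the sysctl only accepts values >= 0 */
	if (limit == 0)
		return false;
	return nchildren > limit;
}

/*
 * How many negative dentries one reclaim pass aims to free: the excess
 * over the limit plus 1/8 of the limit as slack.
 */
static int demo_reclaim_target(int limit, int nchildren)
{
	if (nchildren <= limit)
		return 0;
	return nchildren - limit + (limit >> 3);
}
```

With dentry-dir-max set to 1000 and 1200 children, for example, a pass aims to reclaim 200 + 125 = 325 negative dentries, leaving the directory somewhat below the limit rather than exactly at it. Presumably the 1/8 slack is there to avoid re-queueing the work on the very next negative dput().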

Signed-off-by: Waiman Long
---
 Documentation/admin-guide/sysctl/fs.rst | 18 ++++++++++++++++++
 fs/dcache.c                             | 10 ++++++++++
 kernel/sysctl.c                         | 10 ++++++++++
 3 files changed, 38 insertions(+)

diff --git a/Documentation/admin-guide/sysctl/fs.rst b/Documentation/admin-guide/sysctl/fs.rst
index 2a45119e3331..7274a7b34ee4 100644
--- a/Documentation/admin-guide/sysctl/fs.rst
+++ b/Documentation/admin-guide/sysctl/fs.rst
@@ -28,6 +28,7 @@ Currently, these files are in /proc/sys/fs:
 
 - aio-max-nr
 - aio-nr
+- dentry-dir-max
 - dentry-state
 - dquot-max
 - dquot-nr
@@ -60,6 +61,23 @@ raising aio-max-nr does not result in the
 pre-allocation or re-sizing of any kernel data structures.
 
 
+dentry-dir-max
+--------------
+
+This integer value specifies a soft limit on the maximum number
+of negative dentries that are allowed under any given directory.
+A negative dentry contains a filename that is known to be nonexistent
+in the directory. No restriction is placed on the number of positive
+dentries as it is naturally limited by the number of files in the
+directory.
+
+The default value is 0, which means there is no limit. Any non-negative
+value is allowed. However, internal tracking is done on all dentry
+types. So the value given, if non-zero, should be larger than the
+number of files in a typical large directory in order to reduce the
+tracking overhead.
+
+
 dentry-state
 ------------
 
diff --git a/fs/dcache.c b/fs/dcache.c
index 0ee5aa2c31cf..8f3ac31a582b 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -123,6 +123,16 @@ static DEFINE_PER_CPU(long, nr_dentry);
 static DEFINE_PER_CPU(long, nr_dentry_unused);
 static DEFINE_PER_CPU(long, nr_dentry_negative);
 
+/*
+ * dcache_dentry_dir_max_sysctl:
+ *
+ * This is the sysctl parameter "dentry-dir-max" which specifies a limit on
+ * the maximum number of negative dentries that are allowed under any
+ * given directory.
+ */
+int dcache_dentry_dir_max_sysctl __read_mostly;
+EXPORT_SYMBOL_GPL(dcache_dentry_dir_max_sysctl);
+
 #if defined(CONFIG_SYSCTL) && defined(CONFIG_PROC_FS)
 
 /*
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index d396aaaf19a3..cd0a83ff1029 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -118,6 +118,7 @@ extern unsigned int sysctl_nr_open_min, sysctl_nr_open_max;
 #ifndef CONFIG_MMU
 extern int sysctl_nr_trim_pages;
 #endif
+extern int dcache_dentry_dir_max_sysctl;
 
 /* Constants used for minimum and maximum */
 #ifdef CONFIG_LOCKUP_DETECTOR
@@ -127,6 +128,7 @@ static int sixty = 60;
 static int __maybe_unused neg_one = -1;
 static int __maybe_unused two = 2;
 static int __maybe_unused four = 4;
+static int __maybe_unused zero;
 static unsigned long zero_ul;
 static unsigned long one_ul = 1;
 static unsigned long long_max = LONG_MAX;
@@ -1949,6 +1951,14 @@ static struct ctl_table fs_table[] = {
 		.proc_handler	= proc_dointvec_minmax,
 		.extra1		= SYSCTL_ONE,
 	},
+	{
+		.procname	= "dentry-dir-max",
+		.data		= &dcache_dentry_dir_max_sysctl,
+		.maxlen		= sizeof(dcache_dentry_dir_max_sysctl),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+	},
 	{ }
 };
 

From patchwork Wed Feb 26 16:13:58 2020
X-Patchwork-Submitter: Waiman Long
X-Patchwork-Id: 11406717
From: Waiman Long
To: Alexander Viro, Jonathan Corbet, Luis Chamberlain, Kees Cook, Iurii Zaikin
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org, Mauro Carvalho Chehab, Eric Biggers, Dave Chinner, Eric Sandeen, Waiman Long
Subject: [PATCH 05/11] fs/dcache: Reclaim excessive negative dentries in directories
Date: Wed, 26 Feb 2020 11:13:58 -0500
Message-Id: <20200226161404.14136-6-longman@redhat.com>
In-Reply-To: <20200226161404.14136-1-longman@redhat.com>
References: <20200226161404.14136-1-longman@redhat.com>

When the "dentry-dir-max" sysctl parameter is set, it enables checking
of the dentry count in the parent directory whenever a negative dentry
is being retained. If the count exceeds the given limit, a work
function is scheduled to scan the children of that parent directory
for negative dentries to be reclaimed. Positive dentries will not be
touched.

Signed-off-by: Waiman Long
---
 fs/dcache.c            | 207 +++++++++++++++++++++++++++++++++++++++++
 include/linux/dcache.h |   2 +
 2 files changed, 209 insertions(+)

diff --git a/fs/dcache.c b/fs/dcache.c
index 8f3ac31a582b..01c6d7277244 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -133,6 +133,11 @@ static DEFINE_PER_CPU(long, nr_dentry_negative);
 int dcache_dentry_dir_max_sysctl __read_mostly;
 EXPORT_SYMBOL_GPL(dcache_dentry_dir_max_sysctl);
 
+static LLIST_HEAD(negative_reclaim_list);
+static void negative_reclaim_check(struct dentry *parent);
+static void negative_reclaim_workfn(struct work_struct *work);
+static DECLARE_WORK(negative_reclaim_work, negative_reclaim_workfn);
+
 #if defined(CONFIG_SYSCTL) && defined(CONFIG_PROC_FS)
 
 /*
@@ -869,7 +874,20 @@ void dput(struct dentry *dentry)
 		rcu_read_unlock();
 
 		if (likely(retain_dentry(dentry))) {
+			struct dentry *neg_parent = NULL;
+
+			if (d_is_negative(dentry)) {
+				neg_parent = dentry->d_parent;
+				rcu_read_lock();
+			}
 			spin_unlock(&dentry->d_lock);
+
+			/*
+			 * Negative dentry reclaim check is only done when
+			 * a negative dentry is being put into an LRU list.
+			 */
+			if (neg_parent)
+				negative_reclaim_check(neg_parent);
 			return;
 		}
 
@@ -1261,6 +1279,195 @@ void shrink_dcache_sb(struct super_block *sb)
 }
 EXPORT_SYMBOL(shrink_dcache_sb);
 
+/*
+ * Return true if reclaiming negative dentry can happen.
+ */
+static inline bool can_reclaim_dentry(unsigned int flags)
+{
+	return !(flags & (DCACHE_SHRINK_LIST | DCACHE_GENOCIDE |
+			  DCACHE_DENTRY_KILLED));
+}
+
+struct reclaim_dentry
+{
+	struct llist_node	reclaim_node;
+	struct dentry		*parent_dir;
+};
+
+/*
+ * Reclaim excess negative dentries in a directory
+ */
+static void reclaim_negative_dentry(struct dentry *parent,
+				    struct list_head *dispose)
+{
+	struct dentry *child;
+	int limit = READ_ONCE(dcache_dentry_dir_max_sysctl);
+	int cnt;
+
+	lockdep_assert_held(&parent->d_lock);
+
+	cnt = parent->d_nchildren;
+
+	/*
+	 * Compute # of negative dentries to be reclaimed
+	 * An extra 1/8 of dcache_dentry_dir_max_sysctl is added.
+	 */
+	if (cnt <= limit)
+		return;
+	cnt -= limit;
+	cnt += (limit >> 3);
+
+	/*
+	 * The d_subdirs is treated like a kind of LRU where
+	 * non-negative dentries are skipped. Negative dentries
+	 * with DCACHE_REFERENCED bit set are also skipped but
+	 * with DCACHE_REFERENCED cleared.
+	 */
+	list_for_each_entry(child, &parent->d_subdirs, d_child) {
+		/*
+		 * This check is racy and so it may not be accurate.
+		 */
+		if (!d_is_negative(child))
+			continue;
+
+		if (!spin_trylock(&child->d_lock))
+			continue;
+
+		/*
+		 * Only reclaim zero-refcnt negative dentries in the
+		 * LRU, but not in shrink list.
+		 */
+		if (can_reclaim_dentry(child->d_flags) &&
+		    d_is_negative(child) && d_in_lru(child) &&
+		    !child->d_lockref.count) {
+			if (child->d_flags & DCACHE_REFERENCED) {
+				child->d_flags &= ~DCACHE_REFERENCED;
+			} else {
+				cnt--;
+				d_lru_del(child);
+				d_shrink_add(child, dispose);
+			}
+		}
+		spin_unlock(&child->d_lock);
+		if (!cnt) {
+			child = list_next_entry(child, d_child);
+			break;
+		}
+	}
+
+	if (&child->d_child != &parent->d_subdirs) {
+		/*
+		 * Split out the children from the head up to just
+		 * before the current entry into a separate list and
+		 * then splice it to the end of the child list so
+		 * that the unscanned entries will be in the front.
+		 * This simulates LRU.
+		 */
+		struct list_head list;
+
+		list_cut_before(&list, &parent->d_subdirs,
+				&child->d_child);
+		list_splice_tail(&list, &parent->d_subdirs);
+	}
+}
+
+/*
+ * Excess negative dentry reclaim work function.
+ */
+static void negative_reclaim_workfn(struct work_struct *work)
+{
+	struct llist_node *nodes, *next;
+	struct dentry *parent;
+	struct reclaim_dentry *dentry_node;
+
+	/*
+	 * Collect excess negative dentries in dispose list & shrink them.
+	 */
+	for (nodes = llist_del_all(&negative_reclaim_list);
+	     nodes; nodes = next) {
+		LIST_HEAD(dispose);
+
+		next = llist_next(nodes);
+		dentry_node = container_of(nodes, struct reclaim_dentry,
+					   reclaim_node);
+		parent = dentry_node->parent_dir;
+		spin_lock(&parent->d_lock);
+
+		if (d_is_dir(parent) &&
+		    can_reclaim_dentry(parent->d_flags) &&
+		    (parent->d_flags & DCACHE_RECLAIMING))
+			reclaim_negative_dentry(parent, &dispose);
+
+		if (!list_empty(&dispose)) {
+			spin_unlock(&parent->d_lock);
+			shrink_dentry_list(&dispose);
+			spin_lock(&parent->d_lock);
+		}
+
+		parent->d_flags &= ~DCACHE_RECLAIMING;
+		spin_unlock(&parent->d_lock);
+		dput(parent);
+		kfree(dentry_node);
+		cond_resched();
+	}
+}
+
+/*
+ * Check the parent to see if it has too many negative dentries and queue
+ * it up for the negative dentry reclaim work function to handle it.
+ */
+static void negative_reclaim_check(struct dentry *parent)
+	__releases(rcu)
+{
+	int limit = dcache_dentry_dir_max_sysctl;
+	struct reclaim_dentry *dentry_node;
+
+	if (!limit)
+		goto rcu_unlock_out;
+
+	/*
+	 * These checks are racy before spin_lock().
+	 */
+	if (!can_reclaim_dentry(parent->d_flags) ||
+	    (parent->d_flags & DCACHE_RECLAIMING) ||
+	    (READ_ONCE(parent->d_nchildren) <= limit))
+		goto rcu_unlock_out;
+
+	spin_lock(&parent->d_lock);
+	if (!can_reclaim_dentry(parent->d_flags) ||
+	    (parent->d_flags & DCACHE_RECLAIMING) ||
+	    (READ_ONCE(parent->d_nchildren) <= limit))
+		goto unlock_out;
+
+	if (!d_is_dir(parent))
+		goto unlock_out;
+
+	dentry_node = kzalloc(sizeof(*dentry_node), GFP_NOWAIT);
+	if (!dentry_node)
+		goto unlock_out;
+
+	rcu_read_unlock();
+	__dget_dlock(parent);
+	dentry_node->parent_dir = parent;
+	parent->d_flags |= DCACHE_RECLAIMING;
+	spin_unlock(&parent->d_lock);
+
+	/*
+	 * Only call schedule_work() if negative_reclaim_list is previously
+	 * empty. Otherwise, schedule_work() has already been called but the
+	 * workfn hasn't retrieved the list yet.
+	 */
+	if (llist_add(&dentry_node->reclaim_node, &negative_reclaim_list))
+		schedule_work(&negative_reclaim_work);
+	return;
+
+unlock_out:
+	spin_unlock(&parent->d_lock);
+rcu_unlock_out:
+	rcu_read_unlock();
+	return;
+}
+
 /**
  * enum d_walk_ret - action to talke during tree walk
  * @D_WALK_CONTINUE:	contrinue walk
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index e9e66eb50d1a..f14d72738903 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -13,6 +13,7 @@
 #include
 #include
 #include
+#include
 
 struct path;
 struct vfsmount;
@@ -214,6 +215,7 @@ struct dentry_operations {
 #define DCACHE_FALLTHRU			0x01000000 /* Fall through to lower layer */
 #define DCACHE_ENCRYPTED_NAME		0x02000000 /* Encrypted name (dir key was unavailable) */
 #define DCACHE_OP_REAL			0x04000000
+#define DCACHE_RECLAIMING		0x08000000 /* Reclaiming negative dentries */
 
 #define DCACHE_PAR_LOOKUP		0x10000000 /* being looked up (with parent locked shared) */
 #define DCACHE_DENTRY_CURSOR		0x20000000

From patchwork Wed Feb 26 16:13:59 2020
X-Patchwork-Submitter: Waiman Long
X-Patchwork-Id: 11406721
3DB3F60BE1; Wed, 26 Feb 2020 16:15:13 +0000 (UTC) From: Waiman Long To: Alexander Viro , Jonathan Corbet , Luis Chamberlain , Kees Cook , Iurii Zaikin Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org, Mauro Carvalho Chehab , Eric Biggers , Dave Chinner , Eric Sandeen , Waiman Long Subject: [PATCH 06/11] fs/dcache: directory opportunistically stores # of positive dentries Date: Wed, 26 Feb 2020 11:13:59 -0500 Message-Id: <20200226161404.14136-7-longman@redhat.com> In-Reply-To: <20200226161404.14136-1-longman@redhat.com> References: <20200226161404.14136-1-longman@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org For directories that contain a large number of files (e.g. thousands), the number of positive dentries that will never be reclaimed by the negative dentry reclaiming process may approach or even exceed "dentry-dir-max". That will unnecessarily trigger the reclaim process even if there aren't that many negative dentries that can be reclaimed. This can impact system performance. One possible way to solve this problem is to somehow store the number of positive or negative dentries in the directory dentry itself. The negative dentry count can change frequently, whereas the positive dentry count is relatively more stable. Keeping an accurate count of positive or negative dentries can be costly. Instead, an estimate of the positive dentry count is computed in the scan loop of the negative dentry reclaim work function. That computed value is then stored in the trailing end of the d_iname[] array if there is enough space for it. This value is then used to estimate the number of negative dentries in the directory to be compared against the "dentry-dir-max" value.
Signed-off-by: Waiman Long --- fs/dcache.c | 90 +++++++++++++++++++++++++++++++++++++++++++++++------ 1 file changed, 81 insertions(+), 9 deletions(-) diff --git a/fs/dcache.c b/fs/dcache.c index 01c6d7277244..c5364c2ed5d8 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -1294,6 +1294,60 @@ struct reclaim_dentry struct dentry *parent_dir; }; +/* + * If there is enough space at the end of d_iname[] of a directory dentry + * to hold an integer, that space is used to hold an estimate of the + * number of positive dentries in the directory. That number will be + * subtracted from d_nchildren to see if the limit has been exceeded and + * to compute the number of excess negative dentries to be reclaimed. + */ +struct d_iname_count { + unsigned char d_dummy[DNAME_INLINE_LEN - sizeof(int)]; + unsigned int d_npositive; +}; + +static inline bool dentry_has_npositive(struct dentry *dentry) +{ + int len = dentry->d_name.len; + + BUILD_BUG_ON(sizeof(struct d_iname_count) != sizeof(dentry->d_iname)); + + return (len >= DNAME_INLINE_LEN) || + (len < DNAME_INLINE_LEN - sizeof(int)); +} + +static inline unsigned int read_dentry_npositive(struct dentry *dentry) +{ + struct d_iname_count *p = (struct d_iname_count *)dentry->d_iname; + + return p->d_npositive; +} + +static inline void set_dentry_npositive(struct dentry *dentry, + unsigned int npositive) +{ + struct d_iname_count *p = (struct d_iname_count *)dentry->d_iname; + + p->d_npositive = npositive; +} + +/* + * Get an estimated negative dentry count + */ +static inline unsigned int read_dentry_nnegative(struct dentry *dentry) +{ + return dentry->d_nchildren - (dentry_has_npositive(dentry) + ? read_dentry_npositive(dentry) : 0); +} + +/* + * Initialize d_iname[] to have null bytes at the end of the array.
+ */ +static inline void init_dentry_iname(struct dentry *dentry) +{ + set_dentry_npositive(dentry, 0); +} + /* * Reclaim excess negative dentries in a directory */ @@ -1302,11 +1356,11 @@ static void reclaim_negative_dentry(struct dentry *parent, { struct dentry *child; int limit = READ_ONCE(dcache_dentry_dir_max_sysctl); - int cnt; + int cnt, npositive; lockdep_assert_held(&parent->d_lock); - cnt = parent->d_nchildren; + cnt = read_dentry_nnegative(parent); /* * Compute # of negative dentries to be reclaimed @@ -1316,6 +1370,7 @@ static void reclaim_negative_dentry(struct dentry *parent, return; cnt -= limit; cnt += (limit >> 3); + npositive = 0; /* * The d_subdirs is treated like a kind of LRU where @@ -1327,8 +1382,10 @@ static void reclaim_negative_dentry(struct dentry *parent, /* * This check is racy and so it may not be accurate. */ - if (!d_is_negative(child)) + if (!d_is_negative(child)) { + npositive++; continue; + } if (!spin_trylock(&child->d_lock)) continue; @@ -1368,7 +1425,17 @@ static void reclaim_negative_dentry(struct dentry *parent, list_cut_before(&list, &parent->d_subdirs, &child->d_child); list_splice_tail(&list, &parent->d_subdirs); + + /* + * Update positive dentry count estimate + * Don't allow npositive to decay by more than 1/2. 
+ */ + if (dentry_has_npositive(parent) && + (read_dentry_npositive(parent) > 2 * npositive)) + npositive = read_dentry_npositive(parent) / 2; } + if (dentry_has_npositive(parent)) + set_dentry_npositive(parent, npositive); } /* @@ -1430,16 +1497,14 @@ static void negative_reclaim_check(struct dentry *parent) */ if (!can_reclaim_dentry(parent->d_flags) || (parent->d_flags & DCACHE_RECLAIMING) || - (READ_ONCE(parent->d_nchildren) <= limit)) + (read_dentry_nnegative(parent) <= limit)) goto rcu_unlock_out; spin_lock(&parent->d_lock); if (!can_reclaim_dentry(parent->d_flags) || (parent->d_flags & DCACHE_RECLAIMING) || - (READ_ONCE(parent->d_nchildren) <= limit)) - goto unlock_out; - - if (!d_is_dir(parent)) + (read_dentry_nnegative(parent) <= limit) || + !d_is_dir(parent)) goto unlock_out; dentry_node = kzalloc(sizeof(*dentry_node), GFP_NOWAIT); @@ -1921,7 +1986,7 @@ static struct dentry *__d_alloc(struct super_block *sb, const struct qstr *name) * will still always have a NUL at the end, even if we might * be overwriting an internal NUL character */ - dentry->d_iname[DNAME_INLINE_LEN-1] = 0; + init_dentry_iname(dentry); if (unlikely(!name)) { name = &slash_name; dname = dentry->d_iname; @@ -2991,11 +3056,18 @@ static void swap_names(struct dentry *dentry, struct dentry *target) } } swap(dentry->d_name.hash_len, target->d_name.hash_len); + + if (dentry_has_npositive(dentry)) + set_dentry_npositive(dentry, 0); + if (dentry_has_npositive(target)) + set_dentry_npositive(target, 0); } static void copy_name(struct dentry *dentry, struct dentry *target) { struct external_name *old_name = NULL; + + init_dentry_iname(dentry); if (unlikely(dname_external(dentry))) old_name = external_name(dentry); if (unlikely(dname_external(target))) { From patchwork Wed Feb 26 16:14:00 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Waiman Long X-Patchwork-Id: 11406723 Return-Path: Received: from mail.kernel.org 
(pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A890192A for ; Wed, 26 Feb 2020 16:15:39 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 87BC824680 for ; Wed, 26 Feb 2020 16:15:39 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="E5B24mbs" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728610AbgBZQPi (ORCPT ); Wed, 26 Feb 2020 11:15:38 -0500 Received: from us-smtp-delivery-1.mimecast.com ([205.139.110.120]:51956 "EHLO us-smtp-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1728373AbgBZQPh (ORCPT ); Wed, 26 Feb 2020 11:15:37 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1582733737; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:in-reply-to:in-reply-to:references:references; bh=plSY3/ieBBIb+uqAgd5rkUQMsog86fQPx4J35UFU/bc=; b=E5B24mbs4NAB4uNAp+xMnYWDt2c/RQyY3O1OtCOwDUvym0yILB+SVahps7umRKAptHFGTE M95aDP1cMFMdn4yLqeIZ/OQrctHVDCcqnthYJjolz9+egTWDUlQKuZvF2Db91jQKOIPY4H Gu/BfcLmqQrSUYDVA6JLIm1vzsk/5Po= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-182-Hj-vPNxaMieEVX_Boex0rg-1; Wed, 26 Feb 2020 11:15:33 -0500 X-MC-Unique: Hj-vPNxaMieEVX_Boex0rg-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 289DADBF2; Wed, 26 Feb 2020 16:15:31 +0000 (UTC) Received: from llong.com (dhcp-17-59.bos.redhat.com [10.18.17.59]) by smtp.corp.redhat.com (Postfix) with ESMTP id 536B860BE1; Wed, 26 Feb 2020 16:15:25 +0000 (UTC) From: Waiman Long To: 
Alexander Viro , Jonathan Corbet , Luis Chamberlain , Kees Cook , Iurii Zaikin Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org, Mauro Carvalho Chehab , Eric Biggers , Dave Chinner , Eric Sandeen , Waiman Long Subject: [PATCH 07/11] fs/dcache: Add static key negative_reclaim_enable Date: Wed, 26 Feb 2020 11:14:00 -0500 Message-Id: <20200226161404.14136-8-longman@redhat.com> In-Reply-To: <20200226161404.14136-1-longman@redhat.com> References: <20200226161404.14136-1-longman@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Add a static_key negative_reclaim_enable to optimize the default case where negative dentry directory limit "dentry-dir-max" is not set. Signed-off-by: Waiman Long --- fs/dcache.c | 30 ++++++++++++++++++++++++++++-- kernel/sysctl.c | 3 ++- 2 files changed, 30 insertions(+), 3 deletions(-) diff --git a/fs/dcache.c b/fs/dcache.c index c5364c2ed5d8..149c0a6c1a6e 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -32,6 +32,7 @@ #include #include #include +#include #include "internal.h" #include "mount.h" @@ -134,11 +135,13 @@ int dcache_dentry_dir_max_sysctl __read_mostly; EXPORT_SYMBOL_GPL(dcache_dentry_dir_max_sysctl); static LLIST_HEAD(negative_reclaim_list); +static DEFINE_STATIC_KEY_FALSE(negative_reclaim_enable); static void negative_reclaim_check(struct dentry *parent); static void negative_reclaim_workfn(struct work_struct *work); static DECLARE_WORK(negative_reclaim_work, negative_reclaim_workfn); #if defined(CONFIG_SYSCTL) && defined(CONFIG_PROC_FS) +proc_handler proc_dcache_dentry_dir_max; /* * Here we resort to our own counters instead of using generic per-cpu counters @@ -188,6 +191,27 @@ int proc_nr_dentry(struct ctl_table *table, int write, void __user *buffer, dentry_stat.nr_negative = get_nr_dentry_negative(); return proc_doulongvec_minmax(table, write, buffer, lenp, ppos); } + 
+/* + * Sysctl proc handler for dcache_dentry_dir_max_sysctl + */ +int proc_dcache_dentry_dir_max(struct ctl_table *ctl, int write, + void __user *buffer, size_t *lenp, loff_t *ppos) +{ + int old = dcache_dentry_dir_max_sysctl; + int ret; + + ret = proc_dointvec_minmax(ctl, write, buffer, lenp, ppos); + + if (!write || ret || (dcache_dentry_dir_max_sysctl == old)) + return ret; + + if (!old && dcache_dentry_dir_max_sysctl) + static_branch_enable(&negative_reclaim_enable); + else if (old && !dcache_dentry_dir_max_sysctl) + static_branch_disable(&negative_reclaim_enable); + return 0; +} #endif /* @@ -876,7 +900,8 @@ void dput(struct dentry *dentry) if (likely(retain_dentry(dentry))) { struct dentry *neg_parent = NULL; - if (d_is_negative(dentry)) { + if (static_branch_unlikely(&negative_reclaim_enable) && + d_is_negative(dentry)) { neg_parent = dentry->d_parent; rcu_read_lock(); } @@ -886,7 +911,8 @@ void dput(struct dentry *dentry) * Negative dentry reclaim check is only done when * a negative dentry is being put into a LRU list. 
*/ - if (neg_parent) + if (static_branch_unlikely(&negative_reclaim_enable) && + neg_parent) negative_reclaim_check(neg_parent); return; } diff --git a/kernel/sysctl.c b/kernel/sysctl.c index cd0a83ff1029..9a4b0a1376e8 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -119,6 +119,7 @@ extern unsigned int sysctl_nr_open_min, sysctl_nr_open_max; extern int sysctl_nr_trim_pages; #endif extern int dcache_dentry_dir_max_sysctl; +extern proc_handler proc_dcache_dentry_dir_max; /* Constants used for minimum and maximum */ #ifdef CONFIG_LOCKUP_DETECTOR @@ -1956,7 +1957,7 @@ static struct ctl_table fs_table[] = { .data = &dcache_dentry_dir_max_sysctl, .maxlen = sizeof(dcache_dentry_dir_max_sysctl), .mode = 0644, - .proc_handler = proc_dointvec_minmax, + .proc_handler = proc_dcache_dentry_dir_max, .extra1 = &zero, }, { } From patchwork Wed Feb 26 16:14:01 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Waiman Long X-Patchwork-Id: 11406725 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id CE06914BC for ; Wed, 26 Feb 2020 16:15:41 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id AD0E624689 for ; Wed, 26 Feb 2020 16:15:41 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="A8XQj9g2" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728656AbgBZQPk (ORCPT ); Wed, 26 Feb 2020 11:15:40 -0500 Received: from us-smtp-2.mimecast.com ([207.211.31.81]:49066 "EHLO us-smtp-delivery-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1727916AbgBZQPj (ORCPT ); Wed, 26 Feb 2020 11:15:39 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1582733738; 
h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:in-reply-to:in-reply-to:references:references; bh=jFlp1Ai79InLDRWAlyPIGd+U3BTEfGYkEUUnbfg479Y=; b=A8XQj9g20ne5V0KqXfhAklyUdKJU46KKyWh6vCSUNzqN+TlT9tlZHM4/+h3mM+lLKA6LqQ jCo7s8VdH8kjBPySY5xLa7x+goHX4sOe8++18Oje8dKq2wrCKQGf6br2IaVQMeX/y492/O qv9ecEhv439jo8r6XTEW2w1ij80ts4w= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-161-CpvYF8T_O5Go3acqcBCJNA-1; Wed, 26 Feb 2020 11:15:36 -0500 X-MC-Unique: CpvYF8T_O5Go3acqcBCJNA-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 08A6510883B9; Wed, 26 Feb 2020 16:15:34 +0000 (UTC) Received: from llong.com (dhcp-17-59.bos.redhat.com [10.18.17.59]) by smtp.corp.redhat.com (Postfix) with ESMTP id 55CEE60BE1; Wed, 26 Feb 2020 16:15:31 +0000 (UTC) From: Waiman Long To: Alexander Viro , Jonathan Corbet , Luis Chamberlain , Kees Cook , Iurii Zaikin Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org, Mauro Carvalho Chehab , Eric Biggers , Dave Chinner , Eric Sandeen , Waiman Long Subject: [PATCH 08/11] fs/dcache: Limit dentry reclaim count in negative_reclaim_workfn() Date: Wed, 26 Feb 2020 11:14:01 -0500 Message-Id: <20200226161404.14136-9-longman@redhat.com> In-Reply-To: <20200226161404.14136-1-longman@redhat.com> References: <20200226161404.14136-1-longman@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org To limit the d_lock hold time of directory dentry in the negative dentry reclaim process, a quota (currently 64k) is added to limit the amount of work that the work function can do and hence its execution time. 
This is done to protect other processes that depend on that d_lock, as well as other work functions in the same work queue, from excessive delay. Signed-off-by: Waiman Long --- fs/dcache.c | 33 +++++++++++++++++++++++++++++---- 1 file changed, 29 insertions(+), 4 deletions(-) diff --git a/fs/dcache.c b/fs/dcache.c index 149c0a6c1a6e..0bd5d6974f75 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -1374,10 +1374,25 @@ static inline void init_dentry_iname(struct dentry *dentry) set_dentry_npositive(dentry, 0); } +/* + * In the pathological case where a large number of negative dentries are + * generated in a short time in a given directory, there is a possibility + * that the negative dentry reclaiming process will have many dentries to + * be disposed of. Thus the d_lock can be held for too long, impacting + * other running processes that need it. + * + * There is also the consideration that a long runtime will impact other + * work functions that need to be run in the same work queue. As a result, + * we have to limit the number of dentries that can be reclaimed in each + * invocation of the work function.
+ */ +#define MAX_DENTRY_RECLAIM (1 << 16) + /* * Reclaim excess negative dentries in a directory + * The number of dentries reclaimed is limited by *quota, which is updated on return. */ -static void reclaim_negative_dentry(struct dentry *parent, +static void reclaim_negative_dentry(struct dentry *parent, int *quota, struct list_head *dispose) { struct dentry *child; @@ -1394,9 +1409,16 @@ static void reclaim_negative_dentry(struct dentry *parent, */ if (cnt <= limit) return; + + npositive = 0; cnt -= limit; cnt += (limit >> 3); - npositive = 0; + if (cnt >= *quota) { + cnt = *quota; + *quota = 0; + } else { + *quota -= cnt; + } /* * The d_subdirs is treated like a kind of LRU where @@ -1462,6 +1484,8 @@ static void reclaim_negative_dentry(struct dentry *parent, } if (dentry_has_npositive(parent)) set_dentry_npositive(parent, npositive); + + *quota += cnt; } /* @@ -1472,6 +1496,7 @@ static void negative_reclaim_workfn(struct work_struct *work) struct llist_node *nodes, *next; struct dentry *parent; struct reclaim_dentry *dentry_node; + int quota = MAX_DENTRY_RECLAIM; /* * Collect excess negative dentries in dispose list & shrink them.
@@ -1486,10 +1511,10 @@ static void negative_reclaim_workfn(struct work_struct *work) parent = dentry_node->parent_dir; spin_lock(&parent->d_lock); - if (d_is_dir(parent) && + if (d_is_dir(parent) && quota && can_reclaim_dentry(parent->d_flags) && (parent->d_flags & DCACHE_RECLAIMING)) - reclaim_negative_dentry(parent, &dispose); + reclaim_negative_dentry(parent, &quota, &dispose); if (!list_empty(&dispose)) { spin_unlock(&parent->d_lock); From patchwork Wed Feb 26 16:14:02 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Waiman Long X-Patchwork-Id: 11406727 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 233F414BC for ; Wed, 26 Feb 2020 16:15:56 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id EDDEB2468C for ; Wed, 26 Feb 2020 16:15:55 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="ieHfa08j" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727709AbgBZQPv (ORCPT ); Wed, 26 Feb 2020 11:15:51 -0500 Received: from us-smtp-2.mimecast.com ([207.211.31.81]:60715 "EHLO us-smtp-delivery-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1728231AbgBZQPv (ORCPT ); Wed, 26 Feb 2020 11:15:51 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1582733750; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:in-reply-to:in-reply-to:references:references; bh=WAJNmARdDy6U3AnNzgF1Y6hbeCE51bqcuKxWLiiANy0=; b=ieHfa08jsQvjNPSW/slmP7AE9Sjg9A0qxqbHZh61ub2c6WehybEWhSL7rYJ4JjftkQ8eal VPVsHTY/LxJEKjILtv+tgwUHdAMZ9GEfnZGdkKTyodZZlLYRHvv9x8yYhozXiIygBSFw+q lbBmwvIkbFOq9S7FETS5dvmNzRJub8A= Received: from mimecast-mx01.redhat.com
(mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-187-1xEJLjCZMlqeDisfGRA45A-1; Wed, 26 Feb 2020 11:15:46 -0500 X-MC-Unique: 1xEJLjCZMlqeDisfGRA45A-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id D6BFF13E5; Wed, 26 Feb 2020 16:15:44 +0000 (UTC) Received: from llong.com (dhcp-17-59.bos.redhat.com [10.18.17.59]) by smtp.corp.redhat.com (Postfix) with ESMTP id 3814260BE1; Wed, 26 Feb 2020 16:15:34 +0000 (UTC) From: Waiman Long To: Alexander Viro , Jonathan Corbet , Luis Chamberlain , Kees Cook , Iurii Zaikin Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org, Mauro Carvalho Chehab , Eric Biggers , Dave Chinner , Eric Sandeen , Waiman Long Subject: [PATCH 09/11] fs/dcache: Don't allow small values for dentry-dir-max Date: Wed, 26 Feb 2020 11:14:02 -0500 Message-Id: <20200226161404.14136-10-longman@redhat.com> In-Reply-To: <20200226161404.14136-1-longman@redhat.com> References: <20200226161404.14136-1-longman@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org A small value for "dentry-dir-max", e.g. < 10, will cause excessive dentry count checking leading to noticeable performance degradation. In order to make this sysctl parameter more foolproof, we are not going to allow any positive integer value less than 256. 
Signed-off-by: Waiman Long --- Documentation/admin-guide/sysctl/fs.rst | 10 +++++----- fs/dcache.c | 24 +++++++++++++++++++----- 2 files changed, 24 insertions(+), 10 deletions(-) diff --git a/Documentation/admin-guide/sysctl/fs.rst b/Documentation/admin-guide/sysctl/fs.rst index 7274a7b34ee4..e09d851f9d42 100644 --- a/Documentation/admin-guide/sysctl/fs.rst +++ b/Documentation/admin-guide/sysctl/fs.rst @@ -71,11 +71,11 @@ in the directory. No restriction is placed on the number of positive dentries as it is naturally limited by the number of files in the directory. -The default value is 0 which means there is no limit. Any non-negative -value is allowed. However, internal tracking is done on all dentry -types. So the value given, if non-zero, should be larger than the -number of files in a typical large directory in order to reduce the -tracking overhead. +The default value is 0 which means there is no limit. Any positive +integer value not less than 256 is also allowed. However, internal +tracking is done on all dentry types. So the value given, if non-zero, +should be larger than the number of files in a typical large directory +in order to reduce the tracking overhead. dentry-state diff --git a/fs/dcache.c b/fs/dcache.c index 0bd5d6974f75..f470763e7fb8 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -129,10 +129,14 @@ static DEFINE_PER_CPU(long, nr_dentry_negative); * * This is sysctl parameter "dentry-dir-max" which specifies a limit on * the maximum number of negative dentries that are allowed under any - * given directory. + * given directory. The allowable value of "dentry-dir-max" is either + * 0, which means no limit, or 256 and up. A low value of "dentry-dir-max" + * will cause excessive dentry count checking affecting system performance. 
*/ -int dcache_dentry_dir_max_sysctl __read_mostly; +int dcache_dentry_dir_max_sysctl; EXPORT_SYMBOL_GPL(dcache_dentry_dir_max_sysctl); +static int negative_dentry_dir_max __read_mostly; +#define DENTRY_DIR_MAX_MIN 0x100 static LLIST_HEAD(negative_reclaim_list); static DEFINE_STATIC_KEY_FALSE(negative_reclaim_enable); @@ -206,6 +210,16 @@ int proc_dcache_dentry_dir_max(struct ctl_table *ctl, int write, if (!write || ret || (dcache_dentry_dir_max_sysctl == old)) return ret; + /* + * A non-zero value must be >= DENTRY_DIR_MAX_MIN. + */ + if (dcache_dentry_dir_max_sysctl && + (dcache_dentry_dir_max_sysctl < DENTRY_DIR_MAX_MIN)) { + dcache_dentry_dir_max_sysctl = old; + return -EINVAL; + } + + negative_dentry_dir_max = dcache_dentry_dir_max_sysctl; if (!old && dcache_dentry_dir_max_sysctl) static_branch_enable(&negative_reclaim_enable); else if (old && !dcache_dentry_dir_max_sysctl) @@ -1396,7 +1410,7 @@ static void reclaim_negative_dentry(struct dentry *parent, int *quota, struct list_head *dispose) { struct dentry *child; - int limit = READ_ONCE(dcache_dentry_dir_max_sysctl); + int limit = READ_ONCE(negative_dentry_dir_max); int cnt, npositive; lockdep_assert_held(&parent->d_lock); @@ -1405,7 +1419,7 @@ static void reclaim_negative_dentry(struct dentry *parent, int *quota, /* * Compute # of negative dentries to be reclaimed - * An extra 1/8 of dcache_dentry_dir_max_sysctl is added. + * An extra 1/8 of negative_dentry_dir_max is added. 
*/ if (cnt <= limit) return; @@ -1537,7 +1551,7 @@ static void negative_reclaim_workfn(struct work_struct *work) static void negative_reclaim_check(struct dentry *parent) __releases(rcu) { - int limit = dcache_dentry_dir_max_sysctl; + int limit = negative_dentry_dir_max; struct reclaim_dentry *dentry_node; if (!limit) From patchwork Wed Feb 26 16:14:03 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Waiman Long X-Patchwork-Id: 11406729 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 746E792A for ; Wed, 26 Feb 2020 16:15:56 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 55D9E2468C for ; Wed, 26 Feb 2020 16:15:56 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="NMUAcpxG" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727989AbgBZQPz (ORCPT ); Wed, 26 Feb 2020 11:15:55 -0500 Received: from us-smtp-delivery-1.mimecast.com ([207.211.31.120]:28200 "EHLO us-smtp-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1727113AbgBZQPy (ORCPT ); Wed, 26 Feb 2020 11:15:54 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1582733754; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:in-reply-to:in-reply-to:references:references; bh=GUkVDGXeY3ZWzjteqVo5pUlFA8rYYrLY++MrI/DOw0Y=; b=NMUAcpxG/Sag8zN1j6HB3Ly6ZmbnpD8kKQFLIlZXJlH11s1xaLSO5/BVj6wiDvcWN4wm6S LZX0lgOCxQfA+ZHHWF6Lg9SOzod+1iX3YUxelWHcweSNa8rTNHIBiUJBq78q+BxfgCQPJi apMB5dghZPXtGlq6G6oidFrSRcxxNGM= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-93-SqNXyJ2kPl6nm_SNBbhjtQ-1; 
Wed, 26 Feb 2020 11:15:49 -0500 X-MC-Unique: SqNXyJ2kPl6nm_SNBbhjtQ-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id A2287802573; Wed, 26 Feb 2020 16:15:47 +0000 (UTC) Received: from llong.com (dhcp-17-59.bos.redhat.com [10.18.17.59]) by smtp.corp.redhat.com (Postfix) with ESMTP id 0CCA160BE1; Wed, 26 Feb 2020 16:15:44 +0000 (UTC) From: Waiman Long To: Alexander Viro , Jonathan Corbet , Luis Chamberlain , Kees Cook , Iurii Zaikin Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org, Mauro Carvalho Chehab , Eric Biggers , Dave Chinner , Eric Sandeen , Waiman Long Subject: [PATCH 10/11] fs/dcache: Kill off dentry as last resort Date: Wed, 26 Feb 2020 11:14:03 -0500 Message-Id: <20200226161404.14136-11-longman@redhat.com> In-Reply-To: <20200226161404.14136-1-longman@redhat.com> References: <20200226161404.14136-1-longman@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org In the unlikely case that an out-of-control application is generating negative dentries faster than the negative dentry reclaim process can get rid of them, we will have to kill the negative dentry directly as a last resort. The current threshold for killing negative dentries is the larger of 4 times "dentry-dir-max" and 10,000 within a directory. On a 32-vCPU VM, a 30-thread parallel negative dentry generation program was run. Without this patch, the negative dentry reclaim process was overwhelmed by the negative dentry generator and the number of negative dentries kept growing. With this patch applied and "dentry-dir-max" set to 10,000, the number of negative dentries never went beyond 40,000.
Signed-off-by: Waiman Long
---
 fs/dcache.c | 37 +++++++++++++++++++++++++++++--------
 1 file changed, 29 insertions(+), 8 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index f470763e7fb8..fe48e00349c9 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -140,7 +140,7 @@ static int negative_dentry_dir_max __read_mostly;
 
 static LLIST_HEAD(negative_reclaim_list);
 static DEFINE_STATIC_KEY_FALSE(negative_reclaim_enable);
-static void negative_reclaim_check(struct dentry *parent);
+static void negative_reclaim_check(struct dentry *parent, struct dentry *child);
 static void negative_reclaim_workfn(struct work_struct *work);
 static DECLARE_WORK(negative_reclaim_work, negative_reclaim_workfn);
 
@@ -927,7 +927,7 @@ void dput(struct dentry *dentry)
 			 */
 			if (static_branch_unlikely(&negative_reclaim_enable) &&
 			    neg_parent)
-				negative_reclaim_check(neg_parent);
+				negative_reclaim_check(neg_parent, dentry);
 			return;
 		}
 
@@ -1548,10 +1548,12 @@ static void negative_reclaim_workfn(struct work_struct *work)
  * Check the parent to see if it has too many negative dentries and queue
  * it up for the negative dentry reclaim work function to handle it.
  */
-static void negative_reclaim_check(struct dentry *parent)
+static void negative_reclaim_check(struct dentry *parent, struct dentry *child)
 	__releases(rcu)
 {
 	int limit = negative_dentry_dir_max;
+	int kill_threshold = max(4 * limit, 10000);
+	int ncnt = read_dentry_nnegative(parent);
 	struct reclaim_dentry *dentry_node;
 
 	if (!limit)
@@ -1560,16 +1562,16 @@ static void negative_reclaim_check(struct dentry *parent)
 	/*
 	 * These checks are racy before spin_lock().
 	 */
-	if (!can_reclaim_dentry(parent->d_flags) ||
-	    (parent->d_flags & DCACHE_RECLAIMING) ||
-	    (read_dentry_nnegative(parent) <= limit))
+	if ((!can_reclaim_dentry(parent->d_flags) ||
+	     (parent->d_flags & DCACHE_RECLAIMING) || (ncnt <= limit)) &&
+	    (ncnt < kill_threshold))
 		goto rcu_unlock_out;
 
 	spin_lock(&parent->d_lock);
+	ncnt = read_dentry_nnegative(parent);
 	if (!can_reclaim_dentry(parent->d_flags) ||
 	    (parent->d_flags & DCACHE_RECLAIMING) ||
-	    (read_dentry_nnegative(parent) <= limit) ||
-	    !d_is_dir(parent))
+	    (ncnt <= limit) || !d_is_dir(parent))
 		goto unlock_out;
 
 	dentry_node = kzalloc(sizeof(*dentry_node), GFP_NOWAIT);
@@ -1592,6 +1594,25 @@ static void negative_reclaim_check(struct dentry *parent)
 	return;
 
 unlock_out:
+	/*
+	 * In the unlikely case that an out-of-control application is
+	 * generating negative dentries faster than what the negative
+	 * dentry reclaim process can get rid of, we will have to kill
+	 * the negative dentry directly as the last resort.
+	 *
+	 * N.B. __dentry_kill() releases both the parent and child's d_lock.
+	 */
+	if (unlikely(ncnt >= kill_threshold)) {
+		spin_lock_nested(&child->d_lock, DENTRY_D_LOCK_NESTED);
+		if (can_reclaim_dentry(child->d_flags) &&
+		    !child->d_lockref.count && (child->d_parent == parent)) {
+			rcu_read_unlock();
+			__dentry_kill(child);
+			dput(parent);
+			return;
+		}
+		spin_unlock(&child->d_lock);
+	}
 	spin_unlock(&parent->d_lock);
 rcu_unlock_out:
 	rcu_read_unlock();

From patchwork Wed Feb 26 16:14:04 2020
X-Patchwork-Submitter: Waiman Long
X-Patchwork-Id: 11406731
From: Waiman Long
To: Alexander Viro, Jonathan Corbet, Luis Chamberlain, Kees Cook, Iurii Zaikin
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
    linux-doc@vger.kernel.org, Mauro Carvalho Chehab, Eric Biggers,
    Dave Chinner, Eric Sandeen, Waiman Long
Subject: [PATCH 11/11] fs/dcache: Track # of negative dentries reclaimed & killed
Date: Wed, 26 Feb 2020 11:14:04 -0500
Message-Id: <20200226161404.14136-12-longman@redhat.com>
In-Reply-To: <20200226161404.14136-1-longman@redhat.com>
References: <20200226161404.14136-1-longman@redhat.com>

The negative dentry reclaim process gave no visible indication that it
was being activated. To allow a system administrator to see whether it
is being activated as expected, two new debugfs variables,
"negative_dentry_reclaimed" and "negative_dentry_killed", are added to
report the total number of negative dentries that have been reclaimed
and killed. These debugfs variables are created only after the negative
dentry reclaim mechanism is activated for the first time.

In reality, the actual number may be slightly less than the reported
number, as not all the negative dentries passed to shrink_dentry_list()
and __dentry_kill() can be successfully reclaimed.

Signed-off-by: Waiman Long
---
 fs/dcache.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/fs/dcache.c b/fs/dcache.c
index fe48e00349c9..471b51316506 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -33,6 +33,7 @@
 #include
 #include
 #include
+#include
 #include "internal.h"
 #include "mount.h"
 
@@ -136,6 +137,8 @@ static DEFINE_PER_CPU(long, nr_dentry_negative);
 int dcache_dentry_dir_max_sysctl;
 EXPORT_SYMBOL_GPL(dcache_dentry_dir_max_sysctl);
 static int negative_dentry_dir_max __read_mostly;
+static unsigned long negative_dentry_reclaim_count;
+static atomic_t negative_dentry_kill_count;
 #define DENTRY_DIR_MAX_MIN	0x100
 
 static LLIST_HEAD(negative_reclaim_list);
@@ -204,6 +207,7 @@ int proc_dcache_dentry_dir_max(struct ctl_table *ctl, int write,
 {
 	int old = dcache_dentry_dir_max_sysctl;
 	int ret;
+	static bool debugfs_file_created;
 
 	ret = proc_dointvec_minmax(ctl, write, buffer, lenp, ppos);
 
@@ -219,6 +223,14 @@ int proc_dcache_dentry_dir_max(struct ctl_table *ctl, int write,
 		return -EINVAL;
 	}
 
+	if (!debugfs_file_created) {
+		debugfs_create_ulong("negative_dentry_reclaimed", 0400, NULL,
+				     &negative_dentry_reclaim_count);
+		debugfs_create_u32("negative_dentry_killed", 0400, NULL,
+				   (u32 *)&negative_dentry_kill_count.counter);
+		debugfs_file_created = true;
+	}
+
 	negative_dentry_dir_max = dcache_dentry_dir_max_sysctl;
 	if (!old && dcache_dentry_dir_max_sysctl)
 		static_branch_enable(&negative_reclaim_enable);
@@ -1542,6 +1554,8 @@ static void negative_reclaim_workfn(struct work_struct *work)
 		kfree(dentry_node);
 		cond_resched();
 	}
+	if (quota < MAX_DENTRY_RECLAIM)
+		negative_dentry_reclaim_count += MAX_DENTRY_RECLAIM - quota;
 }
 
 /*
@@ -1609,6 +1623,7 @@ static void negative_reclaim_check(struct dentry *parent, struct dentry *child)
 			rcu_read_unlock();
 			__dentry_kill(child);
 			dput(parent);
+			atomic_inc(&negative_dentry_kill_count);
 			return;
 		}
 		spin_unlock(&child->d_lock);