From patchwork Mon Jul 17 13:39:32 2017
X-Patchwork-Submitter: Waiman Long
X-Patchwork-Id: 9845099
From: Waiman Long <longman@redhat.com>
To: Alexander Viro, Jonathan Corbet
Cc: linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
    linux-fsdevel@vger.kernel.org, "Paul E. McKenney", Andrew Morton,
    Ingo Molnar, Miklos Szeredi, Waiman Long
Subject: [PATCH 3/4] fs/dcache: Enable automatic pruning of negative dentries
Date: Mon, 17 Jul 2017 09:39:32 -0400
Message-Id: <1500298773-7510-4-git-send-email-longman@redhat.com>
In-Reply-To: <1500298773-7510-1-git-send-email-longman@redhat.com>
References: <1500298773-7510-1-git-send-email-longman@redhat.com>

Having a limit on the number of negative dentries has the undesirable
side effect that no new negative dentries can be created once the limit
is reached. This has performance implications for some types of
workloads, so we need a way to prune negative dentries and make room
for new ones.

This is done by using the workqueue API to perform the pruning
gradually when a threshold is reached, minimizing the performance
impact on other running tasks. The pruning is done at a rate of 10 runs
per second. Each run scans at most 256 LRU dentries on each per-node
LRU list of the target superblock. Some non-negative dentries that
happen to be at the front of the LRU lists will be pruned as well.

Signed-off-by: Waiman Long <longman@redhat.com>
---
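For reviewers who want the delayed-work mechanics in isolation, here is a
minimal, self-contained sketch of the pattern the patch relies on. It is
not part of the patch: every my_* name and the my_backlog counter are made
up for illustration; only DECLARE_DELAYED_WORK(), schedule_delayed_work()
and the HZ/10 period mirror what the patch actually does.

// SPDX-License-Identifier: GPL-2.0
/* Sketch: periodic, self-rearming delayed work, 10 runs per second. */
#include <linux/module.h>
#include <linux/workqueue.h>
#include <linux/jiffies.h>
#include <linux/atomic.h>

#define MY_PRUNING_DELAY	(HZ / 10)	/* 100ms => 10 runs/second */

static atomic_t my_backlog = ATOMIC_INIT(32);	/* stand-in for "work left" */

static void my_prune_fn(struct work_struct *work);
static DECLARE_DELAYED_WORK(my_prune_work, my_prune_fn);

static void my_prune_fn(struct work_struct *work)
{
	/* Process one small batch per invocation ... */
	int left = atomic_dec_return(&my_backlog);

	/*
	 * Re-arm while there is more to do, otherwise let the work lapse.
	 * schedule_delayed_work() is a no-op if the work is already queued,
	 * so repeated kicks from producers cannot pile up extra runs.
	 */
	if (left > 0)
		schedule_delayed_work(&my_prune_work, MY_PRUNING_DELAY);
}

static int __init my_init(void)
{
	/* A producer only has to kick the work item once. */
	schedule_delayed_work(&my_prune_work, MY_PRUNING_DELAY);
	return 0;
}

static void __exit my_exit(void)
{
	cancel_delayed_work_sync(&my_prune_work);
}

module_init(my_init);
module_exit(my_exit);
MODULE_LICENSE("GPL");

This is the same shape as __neg_dentry_inc() kicking prune_neg_dentry_work
and prune_negative_dentry() conditionally re-arming itself below.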
 fs/dcache.c              | 109 +++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/list_lru.h |   1 +
 mm/list_lru.c            |   4 +-
 3 files changed, 113 insertions(+), 1 deletion(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index bb0a519..6c7d86f 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -134,13 +134,19 @@ struct dentry_stat_t dentry_stat = {
  * Macros and variables to manage and count negative dentries.
  */
 #define NEG_DENTRY_BATCH	(1 << 8)
+#define NEG_PRUNING_DELAY	(HZ/10)
 static long neg_dentry_percpu_limit __read_mostly;
 static long neg_dentry_nfree_init __read_mostly; /* Free pool initial value */
 static struct {
 	raw_spinlock_t nfree_lock;
 	long nfree;			/* Negative dentry free pool */
+	struct super_block *prune_sb;	/* Super_block for pruning */
+	int neg_count, prune_count;	/* Pruning counts */
 } ndblk ____cacheline_aligned_in_smp;
 
+static void prune_negative_dentry(struct work_struct *work);
+static DECLARE_DELAYED_WORK(prune_neg_dentry_work, prune_negative_dentry);
+
 static DEFINE_PER_CPU(long, nr_dentry);
 static DEFINE_PER_CPU(long, nr_dentry_unused);
 static DEFINE_PER_CPU(long, nr_dentry_neg);
@@ -323,6 +329,16 @@ static void __neg_dentry_inc(struct dentry *dentry)
 	 */
 	if (!cnt)
 		dentry->d_flags |= DCACHE_KILL_NEGATIVE;
+
+	/*
+	 * Initiate negative dentry pruning if free pool has less than
+	 * 1/4 of its initial value.
+	 */
+	if (READ_ONCE(ndblk.nfree) < neg_dentry_nfree_init/4) {
+		WRITE_ONCE(ndblk.prune_sb, dentry->d_sb);
+		schedule_delayed_work(&prune_neg_dentry_work,
+				      NEG_PRUNING_DELAY);
+	}
 }
 
 static inline void neg_dentry_inc(struct dentry *dentry)
@@ -1291,6 +1307,99 @@ void shrink_dcache_sb(struct super_block *sb)
 }
 EXPORT_SYMBOL(shrink_dcache_sb);
 
+/*
+ * A modified version that attempts to remove a limited number of negative
+ * dentries as well as some other non-negative dentries at the front.
+ */
+static enum lru_status dentry_negative_lru_isolate(struct list_head *item,
+		struct list_lru_one *lru, spinlock_t *lru_lock, void *arg)
+{
+	struct list_head *freeable = arg;
+	struct dentry	*dentry = container_of(item, struct dentry, d_lru);
+	enum lru_status	status = LRU_SKIP;
+
+	/*
+	 * Stop further list walking for the current node list to limit
+	 * performance impact, but allow further walking in the next node
+	 * list.
+	 */
+	if ((ndblk.neg_count >= NEG_DENTRY_BATCH) ||
+	    (ndblk.prune_count >= NEG_DENTRY_BATCH)) {
+		ndblk.prune_count = 0;
+		return LRU_STOP;
+	}
+
+	/*
+	 * we are inverting the lru lock/dentry->d_lock here,
+	 * so use a trylock. If we fail to get the lock, just skip
+	 * it
+	 */
+	if (!spin_trylock(&dentry->d_lock)) {
+		ndblk.prune_count++;
+		return LRU_SKIP;
+	}
+
+	/*
+	 * Referenced dentries are still in use. If they have active
+	 * counts, just remove them from the LRU. Otherwise give them
+	 * another pass through the LRU.
+	 */
+	if (dentry->d_lockref.count) {
+		d_lru_isolate(lru, dentry);
+		status = LRU_REMOVED;
+		goto out;
+	}
+
+	if (dentry->d_flags & DCACHE_REFERENCED) {
+		dentry->d_flags &= ~DCACHE_REFERENCED;
+		status = LRU_ROTATE;
+		goto out;
+	}
+
+	status = LRU_REMOVED;
+	d_lru_shrink_move(lru, dentry, freeable);
+	if (d_is_negative(dentry))
+		ndblk.neg_count++;
+out:
+	spin_unlock(&dentry->d_lock);
+	ndblk.prune_count++;
+	return status;
+}
+
+/*
+ * A workqueue function to prune negative dentries.
+ *
+ * The pruning is done gradually over time so as not to have noticeable
+ * performance impact.
+ */
+static void prune_negative_dentry(struct work_struct *work)
+{
+	int freed;
+	struct super_block *sb = READ_ONCE(ndblk.prune_sb);
+	LIST_HEAD(dispose);
+
+	if (!sb)
+		return;
+
+	ndblk.neg_count = ndblk.prune_count = 0;
+	freed = list_lru_walk(&sb->s_dentry_lru, dentry_negative_lru_isolate,
+			      &dispose, NEG_DENTRY_BATCH);
+
+	if (freed)
+		shrink_dentry_list(&dispose);
+	/*
+	 * Continue delayed pruning, 10 runs per second, until the negative
+	 * dentry free pool is at least 1/2 of the initial value or the
+	 * super_block has no more negative dentries left at the front.
+	 */
+	if (ndblk.neg_count &&
+	    (READ_ONCE(ndblk.nfree) < neg_dentry_nfree_init/2))
+		schedule_delayed_work(&prune_neg_dentry_work,
+				      NEG_PRUNING_DELAY);
+	else
+		WRITE_ONCE(ndblk.prune_sb, NULL);
+}
+
 /**
  * enum d_walk_ret - action to talke during tree walk
  * @D_WALK_CONTINUE:	contrinue walk
diff --git a/include/linux/list_lru.h b/include/linux/list_lru.h
index fa7fd03..06c9d15 100644
--- a/include/linux/list_lru.h
+++ b/include/linux/list_lru.h
@@ -22,6 +22,7 @@ enum lru_status {
 	LRU_SKIP,		/* item cannot be locked, skip */
 	LRU_RETRY,		/* item not freeable. May drop the lock
 				   internally, but has to return locked. */
+	LRU_STOP,		/* stop walking the list */
 };
 
 struct list_lru_one {
diff --git a/mm/list_lru.c b/mm/list_lru.c
index 7a40fa2..f6e7796 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -244,11 +244,13 @@ unsigned long list_lru_count_node(struct list_lru *lru, int nid)
 			 */
 			assert_spin_locked(&nlru->lock);
 			goto restart;
+		case LRU_STOP:
+			goto out;
 		default:
 			BUG();
 		}
 	}
-
+out:
 	spin_unlock(&nlru->lock);
 	return isolated;
 }
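Also not part of the patch: a sketch of how a list_lru walker callback
would use the new LRU_STOP return value. The callback name my_isolate and
passing the batch counter through the cb_arg pointer are hypothetical; the
signature is the list_lru_walk_cb type that dentry_negative_lru_isolate()
implements as well.

static enum lru_status my_isolate(struct list_head *item,
		struct list_lru_one *lru, spinlock_t *lru_lock, void *arg)
{
	int *scanned = arg;	/* per-walk batch counter from the caller */

	/*
	 * LRU_STOP makes the walker in mm/list_lru.c jump to the new
	 * "out" label: the current node list is abandoned immediately,
	 * while the walk may still proceed on other nodes' lists.
	 */
	if (++(*scanned) > 256)
		return LRU_STOP;

	/* ... try to isolate *item onto a dispose list here ... */
	return LRU_SKIP;
}

Note that LRU_STOP only breaks out of the per-node loop; a caller that
wants a cap across all node lists, as this patch does, still has to keep
shared counters such as ndblk.neg_count and ndblk.prune_count.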