From patchwork Fri May 8 12:23:14 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Konstantin Khlebnikov
X-Patchwork-Id: 11536415
Return-Path:
Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8CCFB912 for ; Fri, 8 May 2020 12:23:20 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 74EA0208D6 for ; Fri, 8 May 2020 12:23:20 +0000 (UTC)
Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=yandex-team.ru header.i=@yandex-team.ru header.b="kA2ADiyG"
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726937AbgEHMXT (ORCPT ); Fri, 8 May 2020 08:23:19 -0400
Received: from forwardcorp1p.mail.yandex.net ([77.88.29.217]:34534 "EHLO forwardcorp1p.mail.yandex.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726636AbgEHMXS (ORCPT ); Fri, 8 May 2020 08:23:18 -0400
Received: from mxbackcorp1j.mail.yandex.net (mxbackcorp1j.mail.yandex.net [IPv6:2a02:6b8:0:1619::162]) by forwardcorp1p.mail.yandex.net (Yandex) with ESMTP id E6E0A2E153D; Fri, 8 May 2020 15:23:15 +0300 (MSK)
Received: from myt4-18a966dbd9be.qloud-c.yandex.net (myt4-18a966dbd9be.qloud-c.yandex.net [2a02:6b8:c00:12ad:0:640:18a9:66db]) by mxbackcorp1j.mail.yandex.net (mxbackcorp/Yandex) with ESMTP id noNF78ywCD-NEWOIxfj; Fri, 08 May 2020 15:23:15 +0300
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yandex-team.ru; s=default; t=1588940595; bh=cr1cI1dstvPRsp7XjzJcaiOlA2cPzInYUH9vTttVd4k=; h=In-Reply-To:Message-ID:References:Date:To:From:Subject:Cc; b=kA2ADiyGv3YlEq7Je3o/FUKJKD4JCDDIB8HN+FNhiNMp9ZG180MbJuT2p+YaspCMr vW3pdAs7bqW7Ua+RgFnelbu2NyZwd+NA269zuPSff8Gb9opYnM7x+KREHlRSy3UYHC BnLGZL9ulqOIuhhVIVq+1eO8wctpWFLECfUW9Q6M=
Authentication-Results: mxbackcorp1j.mail.yandex.net; dkim=pass header.i=@yandex-team.ru
Received: from dynamic-vpn.dhcp.yndx.net (dynamic-vpn.dhcp.yndx.net [2a02:6b8:b080:7008::1:4]) by myt4-18a966dbd9be.qloud-c.yandex.net (smtpcorp/Yandex) with ESMTPSA id sKsCfMvhoh-NEWqunQK; Fri, 08 May 2020 15:23:14 +0300 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) (Client certificate not present)
Subject: [PATCH RFC 1/8] dcache: show count of hash buckets in sysctl fs.dentry-state
From: Konstantin Khlebnikov
To: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, Alexander Viro
Cc: Waiman Long
Date: Fri, 08 May 2020 15:23:14 +0300
Message-ID: <158894059427.200862.341530589978120554.stgit@buzz>
In-Reply-To: <158893941613.200862.4094521350329937435.stgit@buzz>
References: <158893941613.200862.4094521350329937435.stgit@buzz>
User-Agent: StGit/0.22-32-g6a05
MIME-Version: 1.0
Sender: linux-fsdevel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-fsdevel@vger.kernel.org

The bucket count is needed to estimate the average length of hash chains.
The size of the hash table depends on memory size and is printed only once,
at boot.
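
For illustration only, the estimate mentioned above could be computed in
userspace roughly as follows. This is a minimal sketch and not part of the
patch: struct dentry_state, parse_dentry_state() and avg_chain_length() are
hypothetical names, and the code assumes a kernel that reports nr_buckets as
the sixth field of fs.dentry-state, as this patch proposes.

```c
#include <stdio.h>

/* Hypothetical userspace mirror of the six fs.dentry-state fields. */
struct dentry_state {
	long nr_dentry;
	long nr_unused;
	long age_limit;
	long want_pages;
	long nr_negative;
	long nr_buckets;	/* assumed to be the sixth field */
};

/* Parse one line of fs.dentry-state; returns 0 on success, -1 otherwise. */
static int parse_dentry_state(const char *line, struct dentry_state *st)
{
	int n = sscanf(line, "%ld %ld %ld %ld %ld %ld",
		       &st->nr_dentry, &st->nr_unused, &st->age_limit,
		       &st->want_pages, &st->nr_negative, &st->nr_buckets);

	return n == 6 ? 0 : -1;
}

/*
 * Average hash chain length. Returns a negative value when nr_buckets is
 * not positive, e.g. on a kernel that still reports 0 in the sixth field.
 */
static double avg_chain_length(const struct dentry_state *st)
{
	if (st->nr_buckets <= 0)
		return -1.0;
	return (double)st->nr_dentry / (double)st->nr_buckets;
}
```

A caller would read the single line from /proc/sys/fs/dentry-state, pass it
to parse_dentry_state(), and treat a negative result from avg_chain_length()
as "bucket count not available".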

Let's expose nr_buckets as the sixth number in sysctl fs.dentry-state.

Signed-off-by: Konstantin Khlebnikov
---
 Documentation/admin-guide/sysctl/fs.rst | 12 ++++++------
 fs/dcache.c | 12 ++++++++++--
 include/linux/dcache.h | 2 +-
 3 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/Documentation/admin-guide/sysctl/fs.rst b/Documentation/admin-guide/sysctl/fs.rst
index 2a45119e3331..b74df4714ddd 100644
--- a/Documentation/admin-guide/sysctl/fs.rst
+++ b/Documentation/admin-guide/sysctl/fs.rst
@@ -66,12 +66,12 @@ dentry-state
 From linux/include/linux/dcache.h::
 
   struct dentry_stat_t dentry_stat {
-        int nr_dentry;
-        int nr_unused;
-        int age_limit;         /* age in seconds */
-        int want_pages;        /* pages requested by system */
-        int nr_negative;       /* # of unused negative dentries */
-        int dummy;             /* Reserved for future use */
+        long nr_dentry;
+        long nr_unused;
+        long age_limit;         /* age in seconds */
+        long want_pages;        /* pages requested by system */
+        long nr_negative;       /* # of unused negative dentries */
+        long nr_buckets;        /* count of dcache hash buckets */
   };
 
 Dentries are dynamically allocated and deallocated.
diff --git a/fs/dcache.c b/fs/dcache.c
index b280e07e162b..386f97eaf2ff 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -3139,6 +3139,14 @@ static int __init set_dhash_entries(char *str)
 }
 __setup("dhash_entries=", set_dhash_entries);
 
+static void __init dcache_init_hash(void)
+{
+	dentry_stat.nr_buckets = 1l << d_hash_shift;
+
+	/* shift to use higher bits of 32 bit hash value */
+	d_hash_shift = 32 - d_hash_shift;
+}
+
 static void __init dcache_init_early(void)
 {
 	/* If hashes are distributed across NUMA nodes, defer
@@ -3157,7 +3165,7 @@ static void __init dcache_init_early(void)
 					NULL,
 					0,
 					0);
-	d_hash_shift = 32 - d_hash_shift;
+	dcache_init_hash();
 }
 
 static void __init dcache_init(void)
@@ -3185,7 +3193,7 @@ static void __init dcache_init(void)
 					NULL,
 					0,
 					0);
-	d_hash_shift = 32 - d_hash_shift;
+	dcache_init_hash();
 }
 
 /* SLAB cache for __getname() consumers */
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index c1488cc84fd9..082b55068e4d 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -65,7 +65,7 @@ struct dentry_stat_t {
 	long age_limit;		/* age in seconds */
 	long want_pages;	/* pages requested by system */
 	long nr_negative;	/* # of unused negative dentries */
-	long dummy;		/* Reserved for future use */
+	long nr_buckets;	/* count of dcache hash buckets */
 };
 extern struct dentry_stat_t dentry_stat;
 

From patchwork Fri May 8 12:23:17 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Konstantin Khlebnikov
X-Patchwork-Id: 11536425
Return-Path:
Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B06931668 for ; Fri, 8 May 2020 12:23:29 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 94BC521974 for ; Fri, 8 May 2020 12:23:29 +0000 (UTC)
Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key)
	header.d=yandex-team.ru header.i=@yandex-team.ru header.b="Ph/BhQVD"
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727116AbgEHMX2 (ORCPT ); Fri, 8 May 2020 08:23:28 -0400
Received: from forwardcorp1j.mail.yandex.net ([5.45.199.163]:34578 "EHLO forwardcorp1j.mail.yandex.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727100AbgEHMX1 (ORCPT ); Fri, 8 May 2020 08:23:27 -0400
Received: from mxbackcorp1o.mail.yandex.net (mxbackcorp1o.mail.yandex.net [IPv6:2a02:6b8:0:1a2d::301]) by forwardcorp1j.mail.yandex.net (Yandex) with ESMTP id ED8512E151F; Fri, 8 May 2020 15:23:18 +0300 (MSK)
Received: from myt4-18a966dbd9be.qloud-c.yandex.net (myt4-18a966dbd9be.qloud-c.yandex.net [2a02:6b8:c00:12ad:0:640:18a9:66db]) by mxbackcorp1o.mail.yandex.net (mxbackcorp/Yandex) with ESMTP id 7NK2742NLq-NHb8lZG3; Fri, 08 May 2020 15:23:18 +0300
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yandex-team.ru; s=default; t=1588940598; bh=KU2g+Q0yPMTYZlBSSwWKdYtFznVsnPkB2sauyejITt4=; h=In-Reply-To:Message-ID:References:Date:To:From:Subject:Cc; b=Ph/BhQVD66wYtc1XAoeAQ8rPXfDRFEjS5ND7g8MKREO1d+1S3FOOlB09mrxBbycic ilB0e9MG7CuVf+MUwol8NI2wO8wjpSzQwLsV75E8K39ZyTg+gAnxFL0e2MCEm06cOD b0CaYwDrDTyPOgd42IzPNlN4Du43O30PJAgj/Vps=
Authentication-Results: mxbackcorp1o.mail.yandex.net; dkim=pass header.i=@yandex-team.ru
Received: from dynamic-vpn.dhcp.yndx.net (dynamic-vpn.dhcp.yndx.net [2a02:6b8:b080:7008::1:4]) by myt4-18a966dbd9be.qloud-c.yandex.net (smtpcorp/Yandex) with ESMTPSA id ZuTxA7ymDv-NHWCQXMj; Fri, 08 May 2020 15:23:17 +0300 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) (Client certificate not present)
Subject: [PATCH RFC 2/8] selftests: add stress testing tool for dcache
From: Konstantin Khlebnikov
To: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, Alexander Viro
Cc: Waiman Long
Date: Fri, 08 May 2020 15:23:17 +0300
Message-ID: <158894059714.200862.11121403612367981747.stgit@buzz>
In-Reply-To: <158893941613.200862.4094521350329937435.stgit@buzz>
References: <158893941613.200862.4094521350329937435.stgit@buzz>
User-Agent: StGit/0.22-32-g6a05
MIME-Version: 1.0
Sender: linux-fsdevel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-fsdevel@vger.kernel.org

This tool fills the dcache with negative dentries. Between iterations it
prints statistics and measures the time of an inotify operation, which might
degrade.

Signed-off-by: Konstantin Khlebnikov
---
 tools/testing/selftests/filesystems/Makefile | 1
 .../testing/selftests/filesystems/dcache_stress.c | 210 ++++++++++++++++++++
 2 files changed, 211 insertions(+)
 create mode 100644 tools/testing/selftests/filesystems/dcache_stress.c

diff --git a/tools/testing/selftests/filesystems/Makefile b/tools/testing/selftests/filesystems/Makefile
index 129880fb42d3..6b5e08617d11 100644
--- a/tools/testing/selftests/filesystems/Makefile
+++ b/tools/testing/selftests/filesystems/Makefile
@@ -3,5 +3,6 @@
 CFLAGS += -I../../../../usr/include/
 TEST_GEN_PROGS := devpts_pts
 TEST_GEN_PROGS_EXTENDED := dnotify_test
+TEST_GEN_FILES += dcache_stress
 
 include ../lib.mk
diff --git a/tools/testing/selftests/filesystems/dcache_stress.c b/tools/testing/selftests/filesystems/dcache_stress.c
new file mode 100644
index 000000000000..770e8876629e
--- /dev/null
+++ b/tools/testing/selftests/filesystems/dcache_stress.c
@@ -0,0 +1,210 @@
+// SPDX-License-Identifier: GPL-2.0
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+double now(void)
+{
+	struct timespec ts;
+
+	clock_gettime(CLOCK_MONOTONIC, &ts);
+	return ts.tv_sec + ts.tv_nsec * 1e-9;
+}
+
+struct dentry_stat {
+	long nr_dentry;
+	long nr_unused;
+	long age_limit;		/* age in seconds */
+	long want_pages;	/* pages requested by system */
+	long nr_negative;	/* # of unused negative dentries */
+	long nr_buckets;	/* count of dcache hash buckets */
+};
+
+void show_dentry_state(void)
+{
+	struct dentry_stat stat;
+	ssize_t len;
+	FILE *f;
+
+	f = fopen("/proc/sys/fs/dentry-state", "r");
+	if (!f)
+		err(2, "open fs.dentry-state");
+
+	if (fscanf(f, "%ld %ld %ld %ld %ld %ld",
+		   &stat.nr_dentry,
+		   &stat.nr_unused,
+		   &stat.age_limit,
+		   &stat.want_pages,
+		   &stat.nr_negative,
+		   &stat.nr_buckets) != 6)
+		err(2, "read fs.dentry-state");
+	fclose(f);
+
+	if (!stat.nr_buckets)
+		stat.nr_buckets = 1 << 20; // for 8Gb ram
+
+	printf("nr_dentry = %ld\t%.1fM\n", stat.nr_dentry, stat.nr_dentry / 1e6);
+	printf("nr_buckets = %ld\t%.1f avg\n", stat.nr_buckets, (double)stat.nr_dentry / stat.nr_buckets);
+	printf("nr_unused = %ld\t%.1f%%\n", stat.nr_unused, stat.nr_unused * 100. / stat.nr_dentry);
+	printf("nr_negative = %ld\t%.1f%%\n\n", stat.nr_negative, stat.nr_negative * 100. / stat.nr_dentry);
+}
+
+void test_inotify(const char *path)
+{
+	double tm;
+	int fd;
+
+	fd = inotify_init1(0);
+
+	tm = now();
+	inotify_add_watch(fd, path, IN_OPEN);
+	tm = now() - tm;
+
+	printf("inotify time: %f seconds\n\n", tm);
+
+	close(fd);
+}
+
+int main(int argc, char **argv)
+{
+	char dir_name[] = "dcache_stress.XXXXXX";
+	char name[4096];
+	char *suffix = name;
+	int nr_iterations = 10;
+	int nr_names = 1 << 20;
+	int iteration, index;
+	int other_dir = -1;
+	int mknod_unlink = 0;
+	int mkdir_chdir = 0;
+	int second_access = 0;
+	long long total_names = 0;
+	double tm;
+	int opt;
+
+	while ((opt = getopt(argc, argv, "i:n:p:o:usdh")) != -1) {
+		switch (opt) {
+		case 'i':
+			nr_iterations = atoi(optarg);
+			break;
+		case 'n':
+			nr_names = atoi(optarg);
+			break;
+		case 'p':
+			strcpy(suffix, optarg);
+			suffix += strlen(suffix);
+			break;
+		case 'o':
+			other_dir = open(optarg, O_RDONLY | O_DIRECTORY);
+			if (other_dir < 0)
+				err(2, "open %s", optarg);
+			break;
+		case 'u':
+			mknod_unlink = 1;
+			break;
+		case 'd':
+			mkdir_chdir = 1;
+			break;
+		case 's':
+			second_access = 1;
+			break;
+		case '?':
+		case 'h':
+			printf("usage: %s [-i ] [-n ] [-p ] [-o ] [-u] [-s] [-d]\n"
+			       " -i test iterations, default %d\n"
+			       " -n names at each iteration, default %d\n"
+			       " -p prefix for names\n"
+			       " -o interleave with other dir\n"
+			       " -s touch twice\n"
+			       " -u mknod-unlink sequence\n"
+			       " -d mkdir-chdir sequence (leaves garbage)\n",
+			       argv[0], nr_iterations, nr_names);
+			return 1;
+		}
+	}
+
+
+	if (!mkdtemp(dir_name))
+		err(2, "mkdtemp");
+
+	if (chdir(dir_name))
+		err(2, "chdir");
+
+	show_dentry_state();
+
+	if (!mkdir_chdir)
+		test_inotify(".");
+
+	printf("working in temporary directory %s\n\n", dir_name);
+
+	for (iteration = 1; iteration <= nr_iterations; iteration++) {
+
+		printf("start iteration %d, %d names\n", iteration, nr_names);
+
+		tm = now();
+
+		sprintf(suffix, "%08x", iteration);
+
+		for (index = 0; index < nr_names; index++) {
+			sprintf(suffix + 8, "%08x", index);
+
+			if (mknod_unlink) {
+				if (mknod(name, S_IFREG, 0))
+					err(2, "mknod %s", name);
+				if (unlink(name))
+					err(2, "unlink %s", name);
+			} else if (mkdir_chdir) {
+				if (mkdir(name, 0775))
+					err(2, "mkdir %s", name);
+				if (chdir(name))
+					err(2, "chdir %s", name);
+			} else
+				access(name, 0);
+
+			if (second_access)
+				access(name, 0);
+
+			if (other_dir >= 0) {
+				faccessat(other_dir, name, 0, 0);
+				if (second_access)
+					faccessat(other_dir, name, 0, 0);
+			}
+		}
+
+		total_names += nr_names;
+
+		tm = now() - tm;
+		printf("iteration %d complete in %f seconds, total names %lld\n\n", iteration, tm, total_names);
+
+		show_dentry_state();
+
+		if (!mkdir_chdir)
+			test_inotify(".");
+	}
+
+	if (chdir(".."))
+		err(2, "chdir");
+
+	if (mkdir_chdir) {
+		printf("leave temporary directory %s\n", dir_name);
+		return 0;
+	}
+
+	printf("removing temporary directory %s\n", dir_name);
+	tm = now();
+	if (rmdir(dir_name))
+		err(2, "rmdir");
+	tm = now() - tm;
+	printf("remove complete in %f seconds\n\n", tm);
+
+	show_dentry_state();
+
+	return 0;
+}

From patchwork Fri May 8 12:23:20 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Konstantin Khlebnikov
X-Patchwork-Id:
	11536449
Return-Path:
Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id BAF1F912 for ; Fri, 8 May 2020 12:23:55 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 9FEAB215A4 for ; Fri, 8 May 2020 12:23:55 +0000 (UTC)
Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=yandex-team.ru header.i=@yandex-team.ru header.b="JK32nvVH"
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727849AbgEHMXy (ORCPT ); Fri, 8 May 2020 08:23:54 -0400
Received: from forwardcorp1o.mail.yandex.net ([95.108.205.193]:57082 "EHLO forwardcorp1o.mail.yandex.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727095AbgEHMX0 (ORCPT ); Fri, 8 May 2020 08:23:26 -0400
Received: from mxbackcorp1o.mail.yandex.net (mxbackcorp1o.mail.yandex.net [IPv6:2a02:6b8:0:1a2d::301]) by forwardcorp1o.mail.yandex.net (Yandex) with ESMTP id EF52E2E15F8; Fri, 8 May 2020 15:23:21 +0300 (MSK)
Received: from myt5-70c90f7d6d7d.qloud-c.yandex.net (myt5-70c90f7d6d7d.qloud-c.yandex.net [2a02:6b8:c12:3e2c:0:640:70c9:f7d]) by mxbackcorp1o.mail.yandex.net (mxbackcorp/Yandex) with ESMTP id XfLVNoDfkP-NKbWfVBj; Fri, 08 May 2020 15:23:21 +0300
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yandex-team.ru; s=default; t=1588940601; bh=AnWcPTizG7xjvhjIOhOb0nEiHMXwPY48XzAP4e4IE9k=; h=In-Reply-To:Message-ID:References:Date:To:From:Subject:Cc; b=JK32nvVH7Xs+gWmfCIoDEGbduc8eAKEkWAZ5F3eBADjN0kX06/bxQkCcCw6GAWyI7 +UgI2GTHxrqE1QpVcR8DU+vfBnLImz+ooc1uTTUbWPDWqXGwg13DOxDWsC0BVtf02/ IzZcLiZEP7UaDctHjaGHwZdV/PIshD4bEHaQGU3Y=
Authentication-Results: mxbackcorp1o.mail.yandex.net; dkim=pass header.i=@yandex-team.ru
Received: from dynamic-vpn.dhcp.yndx.net (dynamic-vpn.dhcp.yndx.net [2a02:6b8:b080:7008::1:4]) by myt5-70c90f7d6d7d.qloud-c.yandex.net (smtpcorp/Yandex) with ESMTPSA id vH1ERg2UWZ-NKWeoUXV; Fri,
	08 May 2020 15:23:20 +0300 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) (Client certificate not present)
Subject: [PATCH RFC 3/8] dcache: sweep cached negative dentries to the end of list of siblings
From: Konstantin Khlebnikov
To: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, Alexander Viro
Cc: Waiman Long
Date: Fri, 08 May 2020 15:23:20 +0300
Message-ID: <158894060021.200862.15936671684100629802.stgit@buzz>
In-Reply-To: <158893941613.200862.4094521350329937435.stgit@buzz>
References: <158893941613.200862.4094521350329937435.stgit@buzz>
User-Agent: StGit/0.22-32-g6a05
MIME-Version: 1.0
Sender: linux-fsdevel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-fsdevel@vger.kernel.org

For disk filesystems the result of every negative lookup is cached, and the
content of directories is usually cached too. Production of negative dentries
isn't limited by disk speed. It's really easy to generate millions of them if
the system has enough memory.

Negative dentries are linked into the siblings list along with normal
positive dentries. Some operations walk the dcache tree but look only for
positive dentries: the most important is fsnotify/inotify.

This patch moves negative dentries to the end of the list at final dput() and
marks them with a flag which tells that all following dentries are negative
too. The reverse operation is required before instantiating a negative dentry.

Signed-off-by: Konstantin Khlebnikov
---
 fs/dcache.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++--
 include/linux/dcache.h | 6 +++++
 2 files changed, 66 insertions(+), 3 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 386f97eaf2ff..743255773cc7 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -632,6 +632,48 @@ static inline struct dentry *lock_parent(struct dentry *dentry)
 	return __lock_parent(dentry);
 }
 
+/*
+ * Move cached negative dentry to the tail of parent->d_subdirs.
+ * This lets walkers skip them altogether at first sight.
+ * Must be called at dput of negative dentry.
+ */
+static void sweep_negative(struct dentry *dentry)
+{
+	struct dentry *parent;
+
+	if (!d_is_tail_negative(dentry)) {
+		parent = lock_parent(dentry);
+		if (!parent)
+			return;
+
+		if (!d_count(dentry) && d_is_negative(dentry) &&
+		    !d_is_tail_negative(dentry)) {
+			dentry->d_flags |= DCACHE_TAIL_NEGATIVE;
+			list_move_tail(&dentry->d_child, &parent->d_subdirs);
+		}
+
+		spin_unlock(&parent->d_lock);
+	}
+}
+
+/*
+ * Undo sweep_negative() and move to the head of parent->d_subdirs.
+ * Must be called before converting negative dentry into positive.
+ */
+static void recycle_negative(struct dentry *dentry)
+{
+	struct dentry *parent;
+
+	spin_lock(&dentry->d_lock);
+	parent = lock_parent(dentry);
+	if (parent) {
+		list_move(&dentry->d_child, &parent->d_subdirs);
+		spin_unlock(&parent->d_lock);
+	}
+	dentry->d_flags &= ~DCACHE_TAIL_NEGATIVE;
+	spin_unlock(&dentry->d_lock);
+}
+
 static inline bool retain_dentry(struct dentry *dentry)
 {
 	WARN_ON(d_in_lookup(dentry));
@@ -703,6 +745,8 @@ static struct dentry *dentry_kill(struct dentry *dentry)
 		spin_unlock(&inode->i_lock);
 	if (parent)
 		spin_unlock(&parent->d_lock);
+	if (d_is_negative(dentry))
+		sweep_negative(dentry);
 	spin_unlock(&dentry->d_lock);
 	return NULL;
 }
@@ -718,7 +762,7 @@ static struct dentry *dentry_kill(struct dentry *dentry)
 static inline bool fast_dput(struct dentry *dentry)
 {
 	int ret;
-	unsigned int d_flags;
+	unsigned int d_flags, required;
 
 	/*
 	 * If we have a d_op->d_delete() operation, we sould not
@@ -766,6 +810,8 @@ static inline bool fast_dput(struct dentry *dentry)
 	 * a 'delete' op, and it's referenced and already on
 	 * the LRU list.
 	 *
+	 * Cached negative dentry must be swept to the tail.
+	 *
 	 * NOTE! Since we aren't locked, these values are
 	 * not "stable". However, it is sufficient that at
 	 * some point after we dropped the reference the
@@ -777,10 +823,15 @@ static inline bool fast_dput(struct dentry *dentry)
 	 */
 	smp_rmb();
 	d_flags = READ_ONCE(dentry->d_flags);
-	d_flags &= DCACHE_REFERENCED | DCACHE_LRU_LIST | DCACHE_DISCONNECTED;
+
+	required = DCACHE_REFERENCED | DCACHE_LRU_LIST |
+		(d_flags_negative(d_flags) ? DCACHE_TAIL_NEGATIVE : 0);
+
+	d_flags &= DCACHE_REFERENCED | DCACHE_LRU_LIST |
+		DCACHE_DISCONNECTED | DCACHE_TAIL_NEGATIVE;
 
 	/* Nothing to do? Dropping the reference was all we needed? */
-	if (d_flags == (DCACHE_REFERENCED | DCACHE_LRU_LIST) && !d_unhashed(dentry))
+	if (d_flags == required && !d_unhashed(dentry))
 		return true;
 
 	/*
@@ -852,6 +903,8 @@ void dput(struct dentry *dentry)
 		rcu_read_unlock();
 
 		if (likely(retain_dentry(dentry))) {
+			if (d_is_negative(dentry))
+				sweep_negative(dentry);
 			spin_unlock(&dentry->d_lock);
 			return;
 		}
@@ -1951,6 +2004,8 @@ void d_instantiate(struct dentry *entry, struct inode * inode)
 {
 	BUG_ON(!hlist_unhashed(&entry->d_u.d_alias));
 	if (inode) {
+		if (d_is_tail_negative(entry))
+			recycle_negative(entry);
 		security_d_instantiate(entry, inode);
 		spin_lock(&inode->i_lock);
 		__d_instantiate(entry, inode);
@@ -1970,6 +2025,8 @@ void d_instantiate_new(struct dentry *entry, struct inode *inode)
 	BUG_ON(!hlist_unhashed(&entry->d_u.d_alias));
 	BUG_ON(!inode);
 	lockdep_annotate_inode_mutex_key(inode);
+	if (d_is_tail_negative(entry))
+		recycle_negative(entry);
 	security_d_instantiate(entry, inode);
 	spin_lock(&inode->i_lock);
 	__d_instantiate(entry, inode);
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 082b55068e4d..1127a394b931 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -217,6 +217,7 @@ struct dentry_operations {
 #define DCACHE_PAR_LOOKUP		0x10000000 /* being looked up (with parent locked shared) */
 #define DCACHE_DENTRY_CURSOR		0x20000000
 #define DCACHE_NORCU			0x40000000 /* No RCU delay for freeing */
+#define DCACHE_TAIL_NEGATIVE		0x80000000 /* All following siblings are negative */
 
 extern seqlock_t rename_lock;
 
@@ -493,6 +494,11 @@ static inline int simple_positive(const struct dentry *dentry)
 	return d_really_is_positive(dentry) && !d_unhashed(dentry);
 }
 
+static inline bool d_is_tail_negative(const struct dentry *dentry)
+{
+	return unlikely(dentry->d_flags & DCACHE_TAIL_NEGATIVE);
+}
+
 extern void d_set_fallthru(struct dentry *dentry);
 
 static inline bool d_is_fallthru(const struct dentry *dentry)

From patchwork Fri May 8 12:23:23 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Konstantin Khlebnikov
X-Patchwork-Id: 11536447
Return-Path:
Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id F3946912 for ; Fri, 8 May 2020 12:23:54 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id D1C15216FD for ; Fri, 8 May 2020 12:23:54 +0000 (UTC)
Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=yandex-team.ru header.i=@yandex-team.ru header.b="H0viADmU"
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727823AbgEHMXy (ORCPT ); Fri, 8 May 2020 08:23:54 -0400
Received: from forwardcorp1j.mail.yandex.net ([5.45.199.163]:34642 "EHLO forwardcorp1j.mail.yandex.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727110AbgEHMX2 (ORCPT ); Fri, 8 May 2020 08:23:28 -0400
Received: from mxbackcorp1g.mail.yandex.net (mxbackcorp1g.mail.yandex.net [IPv6:2a02:6b8:0:1402::301]) by forwardcorp1j.mail.yandex.net (Yandex) with ESMTP id 38F662E0DD7; Fri, 8 May 2020 15:23:24 +0300 (MSK)
Received: from myt5-70c90f7d6d7d.qloud-c.yandex.net (myt5-70c90f7d6d7d.qloud-c.yandex.net [2a02:6b8:c12:3e2c:0:640:70c9:f7d]) by mxbackcorp1g.mail.yandex.net (mxbackcorp/Yandex) with ESMTP id CwfcYZS86f-NNAWclZ0; Fri, 08 May 2020 15:23:24 +0300
DKIM-Signature:
	v=1; a=rsa-sha256; c=relaxed/relaxed; d=yandex-team.ru; s=default; t=1588940604; bh=fE5H5nxfbTVPsDYqy95Pi0qrqxG0n44r6YkokVlYuxs=; h=In-Reply-To:Message-ID:References:Date:To:From:Subject:Cc; b=H0viADmU8Q1pZpdoq3gUEj+1k51lST+hEaXaQzaspjNPreBUobGD5qFJJFWjDTZGp qlxY2mfOF2tOueZPJ4/OWPOMlUWZTHdSiJAtumhD0dek8Lrtd2BO77fiuVy2fciQPS vEFgofTlKQcoOtSbRFNqyibCu/ALHX0go6vjdkiE=
Authentication-Results: mxbackcorp1g.mail.yandex.net; dkim=pass header.i=@yandex-team.ru
Received: from dynamic-vpn.dhcp.yndx.net (dynamic-vpn.dhcp.yndx.net [2a02:6b8:b080:7008::1:4]) by myt5-70c90f7d6d7d.qloud-c.yandex.net (smtpcorp/Yandex) with ESMTPSA id fhizLjzH9z-NNWe0AWK; Fri, 08 May 2020 15:23:23 +0300 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) (Client certificate not present)
Subject: [PATCH RFC 4/8] fsnotify: stop walking child dentries if remaining tail is negative
From: Konstantin Khlebnikov
To: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, Alexander Viro
Cc: Waiman Long
Date: Fri, 08 May 2020 15:23:23 +0300
Message-ID: <158894060308.200862.2000400345829882905.stgit@buzz>
In-Reply-To: <158893941613.200862.4094521350329937435.stgit@buzz>
References: <158893941613.200862.4094521350329937435.stgit@buzz>
User-Agent: StGit/0.22-32-g6a05
MIME-Version: 1.0
Sender: linux-fsdevel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-fsdevel@vger.kernel.org

When notification starts or stops listening for events from an inode's
children, it has to update dentry->d_flags of all positive child dentries.
Scanning may take a long time if the directory has a lot of negative child
dentries.

This is the main beneficiary of sweeping cached negative dentries to the end.
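
As an illustration of why the scan gets cheaper, here is a minimal userspace
model of the idea, not kernel code. The sibling list is modelled as a plain
array, and struct toy_child, toy_sweep_negative() and
toy_update_child_flags() are hypothetical names invented for this sketch; the
real implementation works on parent->d_subdirs under d_lock.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a child dentry; not the kernel's struct dentry. */
struct toy_child {
	bool positive;		/* has an inode */
	bool tail_negative;	/* models DCACHE_TAIL_NEGATIVE */
	bool watched;		/* flag the walker wants to update */
};

/*
 * Move children[idx] to the tail and flag it, modelling what the series
 * does for a cached negative dentry. Positive or already swept entries
 * are left alone.
 */
static void toy_sweep_negative(struct toy_child *children, size_t n, size_t idx)
{
	struct toy_child c = children[idx];

	if (c.positive || c.tail_negative)
		return;
	c.tail_negative = true;
	memmove(&children[idx], &children[idx + 1],
		(n - idx - 1) * sizeof(children[0]));
	children[n - 1] = c;
}

/*
 * Walk the siblings and update the flag on positive children.
 * Returns how many entries had to be examined.
 */
static size_t toy_update_child_flags(struct toy_child *children, size_t n,
				     bool watched)
{
	size_t i, examined = 0;

	for (i = 0; i < n; i++) {
		examined++;
		if (!children[i].positive) {
			/* all remaining children are negative */
			if (children[i].tail_negative)
				break;
			continue;
		}
		children[i].watched = watched;
	}
	return examined;
}
```

In this model the cost of the walk after sweeping is the number of positive
children plus one, regardless of how many negative entries sit in the tail.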

Before patch:

nr_dentry = 24172597	24.2M
nr_buckets = 8388608	2.9 avg
nr_unused = 24158110	99.9%
nr_negative = 24142810	99.9%
inotify time: 0.507182 seconds

After patch:

nr_dentry = 24562747	24.6M
nr_buckets = 8388608	2.9 avg
nr_unused = 24548714	99.9%
nr_negative = 24543867	99.9%
inotify time: 0.000010 seconds

Negative dentries no longer slow down the inotify operation on the parent
directory.

Signed-off-by: Konstantin Khlebnikov
---
 fs/notify/fsnotify.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
index 72d332ce8e12..072974302950 100644
--- a/fs/notify/fsnotify.c
+++ b/fs/notify/fsnotify.c
@@ -127,8 +127,12 @@ void __fsnotify_update_child_dentry_flags(struct inode *inode)
 		 * original inode) */
 		spin_lock(&alias->d_lock);
 		list_for_each_entry(child, &alias->d_subdirs, d_child) {
-			if (!child->d_inode)
+			if (!child->d_inode) {
+				/* all remaining children are negative */
+				if (d_is_tail_negative(child))
+					break;
 				continue;
+			}
 
 			spin_lock_nested(&child->d_lock, DENTRY_D_LOCK_NESTED);
 			if (watched)

From patchwork Fri May 8 12:23:25 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Konstantin Khlebnikov
X-Patchwork-Id: 11536433
Return-Path:
Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B195B912 for ; Fri, 8 May 2020 12:23:35 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 9745E215A4 for ; Fri, 8 May 2020 12:23:35 +0000 (UTC)
Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=yandex-team.ru header.i=@yandex-team.ru header.b="Dudi4cnw"
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727785AbgEHMXd (ORCPT ); Fri, 8 May 2020 08:23:33 -0400
Received: from forwardcorp1p.mail.yandex.net ([77.88.29.217]:34772 "EHLO
	forwardcorp1p.mail.yandex.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727124AbgEHMXb (ORCPT ); Fri, 8 May 2020 08:23:31 -0400
Received: from mxbackcorp1g.mail.yandex.net (mxbackcorp1g.mail.yandex.net [IPv6:2a02:6b8:0:1402::301]) by forwardcorp1p.mail.yandex.net (Yandex) with ESMTP id C19C22E158B; Fri, 8 May 2020 15:23:26 +0300 (MSK)
Received: from myt4-18a966dbd9be.qloud-c.yandex.net (myt4-18a966dbd9be.qloud-c.yandex.net [2a02:6b8:c00:12ad:0:640:18a9:66db]) by mxbackcorp1g.mail.yandex.net (mxbackcorp/Yandex) with ESMTP id CMbOpsqaf5-NPAKSseD; Fri, 08 May 2020 15:23:26 +0300
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yandex-team.ru; s=default; t=1588940606; bh=gAc8zimgDuqoVSCzjYmfsuvcNfaJdwfxgsEnXz43eRQ=; h=In-Reply-To:Message-ID:References:Date:To:From:Subject:Cc; b=Dudi4cnwJ7t5xj12iPVVfiBK1+NBPLvmfgEDa3uLI21UMZu+g4jxgwiqUQzMkFq2L 2yptU+ynirEEliWdMzQntV7Y9Uj6YtqlJQnZGHLdge2EEphAu+qaE/aZodpzCOKE8z PfrjiH4Ms99VevQrQltHSx3Ij7B4tUi8LDePTgvM=
Authentication-Results: mxbackcorp1g.mail.yandex.net; dkim=pass header.i=@yandex-team.ru
Received: from dynamic-vpn.dhcp.yndx.net (dynamic-vpn.dhcp.yndx.net [2a02:6b8:b080:7008::1:4]) by myt4-18a966dbd9be.qloud-c.yandex.net (smtpcorp/Yandex) with ESMTPSA id rgqNvxq2Fn-NPWiN7wm; Fri, 08 May 2020 15:23:25 +0300 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) (Client certificate not present)
Subject: [PATCH RFC 5/8] dcache: add action D_WALK_SKIP_SIBLINGS to d_walk()
From: Konstantin Khlebnikov
To: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, Alexander Viro
Cc: Waiman Long
Date: Fri, 08 May 2020 15:23:25 +0300
Message-ID: <158894060525.200862.12478833917149869939.stgit@buzz>
In-Reply-To: <158893941613.200862.4094521350329937435.stgit@buzz>
References: <158893941613.200862.4094521350329937435.stgit@buzz>
User-Agent: StGit/0.22-32-g6a05
MIME-Version: 1.0
Sender: linux-fsdevel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List:
	linux-fsdevel@vger.kernel.org

This lets d_walk() callbacks skip the remaining siblings upon seeing
d_is_tail_negative().

Signed-off-by: Konstantin Khlebnikov
---
 fs/dcache.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/fs/dcache.c b/fs/dcache.c
index 743255773cc7..44c6832d21d6 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1303,12 +1303,14 @@ EXPORT_SYMBOL(shrink_dcache_sb);
  * @D_WALK_QUIT:	quit walk
  * @D_WALK_NORETRY:	quit when retry is needed
  * @D_WALK_SKIP:	skip this dentry and its children
+ * @D_WALK_SKIP_SIBLINGS: skip siblings and their children
  */
 enum d_walk_ret {
 	D_WALK_CONTINUE,
 	D_WALK_QUIT,
 	D_WALK_NORETRY,
 	D_WALK_SKIP,
+	D_WALK_SKIP_SIBLINGS,
 };
 
 /**
@@ -1339,6 +1341,7 @@ static void d_walk(struct dentry *parent, void *data,
 		break;
 	case D_WALK_QUIT:
 	case D_WALK_SKIP:
+	case D_WALK_SKIP_SIBLINGS:
 		goto out_unlock;
 	case D_WALK_NORETRY:
 		retry = false;
@@ -1370,6 +1373,9 @@ static void d_walk(struct dentry *parent, void *data,
 		case D_WALK_SKIP:
 			spin_unlock(&dentry->d_lock);
 			continue;
+		case D_WALK_SKIP_SIBLINGS:
+			spin_unlock(&dentry->d_lock);
+			goto skip_siblings;
 		}
 
 		if (!list_empty(&dentry->d_subdirs)) {
@@ -1381,6 +1387,7 @@ static void d_walk(struct dentry *parent, void *data,
 		}
 		spin_unlock(&dentry->d_lock);
 	}
+skip_siblings:
 	/*
 	 * All done at this level ... ascend and resume the search.
 	 */

From patchwork Fri May 8 12:23:28 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Konstantin Khlebnikov
X-Patchwork-Id: 11536441
Return-Path:
Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 09EED912 for ; Fri, 8 May 2020 12:23:51 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id E0176216FD for ; Fri, 8 May 2020 12:23:50 +0000 (UTC)
Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=yandex-team.ru header.i=@yandex-team.ru header.b="OOLeXBCf"
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727822AbgEHMXj (ORCPT ); Fri, 8 May 2020 08:23:39 -0400
Received: from forwardcorp1o.mail.yandex.net ([95.108.205.193]:57296 "EHLO forwardcorp1o.mail.yandex.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727772AbgEHMXe (ORCPT ); Fri, 8 May 2020 08:23:34 -0400
Received: from mxbackcorp2j.mail.yandex.net (mxbackcorp2j.mail.yandex.net [IPv6:2a02:6b8:0:1619::119]) by forwardcorp1o.mail.yandex.net (Yandex) with ESMTP id 389F62E1574; Fri, 8 May 2020 15:23:29 +0300 (MSK)
Received: from myt4-18a966dbd9be.qloud-c.yandex.net (myt4-18a966dbd9be.qloud-c.yandex.net [2a02:6b8:c00:12ad:0:640:18a9:66db]) by mxbackcorp2j.mail.yandex.net (mxbackcorp/Yandex) with ESMTP id ar427vyawi-NSX4ptWY; Fri, 08 May 2020 15:23:29 +0300
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yandex-team.ru; s=default; t=1588940609; bh=liJknFRYDA0nUUv0ro+iQXHRLau6qws9t8QCP/DSyVg=; h=In-Reply-To:Message-ID:References:Date:To:From:Subject:Cc; b=OOLeXBCfwoE6ZufUWCYkDj83HXktllePJsnMsIx9vr5wx0QX2UAsr1HG8rSOu8Omg hCskXJGWC8zlMJECTFy+TE+rgXbOXgjyzOM2bJUMmsZLj+koG8MwVHwzsFZVqOPa1c kb/l7qjGID09yskyKryJGtYdxXnUU4e2bJEkShxc=
Authentication-Results: mxbackcorp2j.mail.yandex.net; dkim=pass header.i=@yandex-team.ru
Received: from dynamic-vpn.dhcp.yndx.net (dynamic-vpn.dhcp.yndx.net [2a02:6b8:b080:7008::1:4]) by myt4-18a966dbd9be.qloud-c.yandex.net (smtpcorp/Yandex) with ESMTPSA id uFJknMXhLs-NSWiskNQ; Fri, 08 May 2020 15:23:28 +0300 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) (Client certificate not present)
Subject: [PATCH RFC 6/8] dcache: stop walking siblings if remaining dentries all negative
From: Konstantin Khlebnikov
To: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, Alexander Viro
Cc: Waiman Long
Date: Fri, 08 May 2020 15:23:28 +0300
Message-ID: <158894060799.200862.477468763047350875.stgit@buzz>
In-Reply-To: <158893941613.200862.4094521350329937435.stgit@buzz>
References: <158893941613.200862.4094521350329937435.stgit@buzz>
User-Agent: StGit/0.22-32-g6a05
MIME-Version: 1.0
Sender: linux-fsdevel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-fsdevel@vger.kernel.org

Most walkers are interested only in positive dentries, so they can stop
walking siblings once all remaining dentries are negative.

Changes in the simple_* libfs helpers are mostly cosmetic: a filesystem built
on them shouldn't cache negative dentries unless it uses a d_delete other
than always_delete_dentry().

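For readers who want to see the early-exit idea in isolation, here is a
minimal userspace sketch, not kernel code. It assumes, as the sweep_negative()
comment elsewhere in this series describes, that cached negative dentries are
moved to the tail of the parent's child list and marked, so that a walker
which only cares about positive entries can stop at the first marked one.
`struct child`, `tail_negative` and `count_positive()` are hypothetical names
used only for this illustration; locking is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a dentry on its parent's child list. */
struct child {
	bool positive;		/* has an inode */
	bool tail_negative;	/* negative and already swept to the tail */
	struct child *next;
};

/*
 * Count positive children, stopping at the first tail-negative entry.
 * The number of entries actually examined is stored in *visited.
 */
static size_t count_positive(const struct child *head, size_t *visited)
{
	size_t n = 0;

	assert(visited != NULL);
	*visited = 0;
	for (const struct child *c = head; c != NULL; c = c->next) {
		if (c->tail_negative)
			break;	/* everything after this one is negative too */
		(*visited)++;
		if (c->positive)
			n++;
	}
	return n;
}
```

With a list of one positive entry, one not-yet-swept negative entry and any
number of tail-negative entries, the walk examines only the first two.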
Signed-off-by: Konstantin Khlebnikov
---
 fs/dcache.c | 10 ++++++++++
 fs/libfs.c  | 10 +++++++++-
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 44c6832d21d6..0fd2e02e507b 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1442,6 +1442,9 @@ static enum d_walk_ret path_check_mount(void *data, struct dentry *dentry)
 	struct check_mount *info = data;
 	struct path path = { .mnt = info->mnt, .dentry = dentry };
 
+	if (d_is_tail_negative(dentry))
+		return D_WALK_SKIP_SIBLINGS;
+
 	if (likely(!d_mountpoint(dentry)))
 		return D_WALK_CONTINUE;
 	if (__path_is_mountpoint(&path)) {
@@ -1688,6 +1691,10 @@ void shrink_dcache_for_umount(struct super_block *sb)
 static enum d_walk_ret find_submount(void *_data, struct dentry *dentry)
 {
 	struct dentry **victim = _data;
+
+	if (d_is_tail_negative(dentry))
+		return D_WALK_SKIP_SIBLINGS;
+
 	if (d_mountpoint(dentry)) {
 		__dget_dlock(dentry);
 		*victim = dentry;
@@ -3159,6 +3166,9 @@ static enum d_walk_ret d_genocide_kill(void *data, struct dentry *dentry)
 {
 	struct dentry *root = data;
 	if (dentry != root) {
+		if (d_is_tail_negative(dentry))
+			return D_WALK_SKIP_SIBLINGS;
+
 		if (d_unhashed(dentry) || !dentry->d_inode)
 			return D_WALK_SKIP;
 
diff --git a/fs/libfs.c b/fs/libfs.c
index 3759fbacf522..de944c241cf0 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -106,6 +106,10 @@ static struct dentry *scan_positives(struct dentry *cursor,
 	spin_lock(&dentry->d_lock);
 	while ((p = p->next) != &dentry->d_subdirs) {
 		struct dentry *d = list_entry(p, struct dentry, d_child);
+
+		if (d_is_tail_negative(d))
+			break;
+
 		// we must at least skip cursors, to avoid livelocks
 		if (d->d_flags & DCACHE_DENTRY_CURSOR)
 			continue;
@@ -255,7 +259,8 @@ static struct dentry *find_next_child(struct dentry *parent, struct dentry *prev
 			spin_unlock(&d->d_lock);
 			if (likely(child))
 				break;
-		}
+		} else if (d_is_tail_negative(d))
+			break;
 	}
 	spin_unlock(&parent->d_lock);
 	dput(prev);
@@ -408,6 +413,9 @@ int simple_empty(struct dentry *dentry)
spin_lock(&dentry->d_lock); list_for_each_entry(child, &dentry->d_subdirs, d_child) { + if (d_is_tail_negative(child)) + break; + spin_lock_nested(&child->d_lock, DENTRY_D_LOCK_NESTED); if (simple_positive(child)) { spin_unlock(&child->d_lock); From patchwork Fri May 8 12:23:30 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Konstantin Khlebnikov X-Patchwork-Id: 11536443 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6A2D51668 for ; Fri, 8 May 2020 12:23:52 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 52242215A4 for ; Fri, 8 May 2020 12:23:52 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=yandex-team.ru header.i=@yandex-team.ru header.b="GB1sHnQL" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727815AbgEHMXj (ORCPT ); Fri, 8 May 2020 08:23:39 -0400 Received: from forwardcorp1o.mail.yandex.net ([95.108.205.193]:57310 "EHLO forwardcorp1o.mail.yandex.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727774AbgEHMXe (ORCPT ); Fri, 8 May 2020 08:23:34 -0400 Received: from mxbackcorp1o.mail.yandex.net (mxbackcorp1o.mail.yandex.net [IPv6:2a02:6b8:0:1a2d::301]) by forwardcorp1o.mail.yandex.net (Yandex) with ESMTP id 20EE72E1612; Fri, 8 May 2020 15:23:32 +0300 (MSK) Received: from myt5-70c90f7d6d7d.qloud-c.yandex.net (myt5-70c90f7d6d7d.qloud-c.yandex.net [2a02:6b8:c12:3e2c:0:640:70c9:f7d]) by mxbackcorp1o.mail.yandex.net (mxbackcorp/Yandex) with ESMTP id ca7YkIQC43-NVbiCTJc; Fri, 08 May 2020 15:23:32 +0300 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yandex-team.ru; s=default; t=1588940612; bh=JCfV9a+6LV7lO3l6vs+TYqJodxqNVXpsp3UM6EFrsaU=; h=In-Reply-To:Message-ID:References:Date:To:From:Subject:Cc; 
b=GB1sHnQLxmHeY5kYs6x5t0JS4qxbIgMT4npqmTRhL4nq211peBqHxULbblvWc6RNI NdcAdlZYpnAl6sRV19ayiqoCglrGK2rryvjg4ifVoJ3sWLaN1CX9jDEqejWApa7iuQ CNvETOlNi/7mABcXwm9qNZIZ8bTp4XnhRf4kk8MQ= Authentication-Results: mxbackcorp1o.mail.yandex.net; dkim=pass header.i=@yandex-team.ru Received: from dynamic-vpn.dhcp.yndx.net (dynamic-vpn.dhcp.yndx.net [2a02:6b8:b080:7008::1:4]) by myt5-70c90f7d6d7d.qloud-c.yandex.net (smtpcorp/Yandex) with ESMTPSA id 74uZNHjqL4-NVWKuJN9; Fri, 08 May 2020 15:23:31 +0300 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) (Client certificate not present) Subject: [PATCH RFC 7/8] dcache: push releasing dentry lock into sweep_negative From: Konstantin Khlebnikov To: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, Alexander Viro Cc: Waiman Long Date: Fri, 08 May 2020 15:23:30 +0300 Message-ID: <158894061026.200862.15846101347037556126.stgit@buzz> In-Reply-To: <158893941613.200862.4094521350329937435.stgit@buzz> References: <158893941613.200862.4094521350329937435.stgit@buzz> User-Agent: StGit/0.22-32-g6a05 MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org This is preparation for the next patch. Signed-off-by: Konstantin Khlebnikov --- fs/dcache.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/fs/dcache.c b/fs/dcache.c index 0fd2e02e507b..60158065891e 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -636,15 +636,17 @@ static inline struct dentry *lock_parent(struct dentry *dentry) * Move cached negative dentry to the tail of parent->d_subdirs. * This lets walkers skip them all together at first sight. * Must be called at dput of negative dentry. + * dentry->d_lock must be held, returns with it unlocked. 
*/ static void sweep_negative(struct dentry *dentry) + __releases(dentry->d_lock) { struct dentry *parent; if (!d_is_tail_negative(dentry)) { parent = lock_parent(dentry); if (!parent) - return; + goto out; if (!d_count(dentry) && d_is_negative(dentry) && !d_is_tail_negative(dentry)) { @@ -654,6 +656,8 @@ static void sweep_negative(struct dentry *dentry) spin_unlock(&parent->d_lock); } +out: + spin_unlock(&dentry->d_lock); } /* @@ -747,7 +751,8 @@ static struct dentry *dentry_kill(struct dentry *dentry) spin_unlock(&parent->d_lock); if (d_is_negative(dentry)) sweep_negative(dentry); - spin_unlock(&dentry->d_lock); + else + spin_unlock(&dentry->d_lock); return NULL; } @@ -905,7 +910,8 @@ void dput(struct dentry *dentry) if (likely(retain_dentry(dentry))) { if (d_is_negative(dentry)) sweep_negative(dentry); - spin_unlock(&dentry->d_lock); + else + spin_unlock(&dentry->d_lock); return; } From patchwork Fri May 8 12:23:33 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Konstantin Khlebnikov X-Patchwork-Id: 11536445 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1D9A51668 for ; Fri, 8 May 2020 12:23:53 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 05DB6215A4 for ; Fri, 8 May 2020 12:23:53 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=yandex-team.ru header.i=@yandex-team.ru header.b="cCeGDeU1" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727805AbgEHMXi (ORCPT ); Fri, 8 May 2020 08:23:38 -0400 Received: from forwardcorp1j.mail.yandex.net ([5.45.199.163]:34866 "EHLO forwardcorp1j.mail.yandex.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727790AbgEHMXh (ORCPT ); Fri, 8 May 2020 08:23:37 -0400 Received: from 
mxbackcorp1g.mail.yandex.net (mxbackcorp1g.mail.yandex.net [IPv6:2a02:6b8:0:1402::301]) by forwardcorp1j.mail.yandex.net (Yandex) with ESMTP id AB6952E0DD7; Fri, 8 May 2020 15:23:34 +0300 (MSK) Received: from myt4-18a966dbd9be.qloud-c.yandex.net (myt4-18a966dbd9be.qloud-c.yandex.net [2a02:6b8:c00:12ad:0:640:18a9:66db]) by mxbackcorp1g.mail.yandex.net (mxbackcorp/Yandex) with ESMTP id kkiK2Ut3yM-NXA0BVW6; Fri, 08 May 2020 15:23:34 +0300 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yandex-team.ru; s=default; t=1588940614; bh=wIdyxNf74ULZ5uIf1iPET3/wYp7NaFja75AOK6nS8TA=; h=In-Reply-To:Message-ID:References:Date:To:From:Subject:Cc; b=cCeGDeU1SZs0abVyG+qpdEg6kcAoSWrXwGBjf6MoT93mpYKw3EvgqzwLrYsLA+43F Mn9NRbrdWZ6cFhq5J6zw7eZuubM+697YXX4BX1Mh7Q4Z+vL6gZm/wEaw6XMN3VaGJr 4KgKiIpDgGUrk8h4pUnPFQWqWe6McSw090z9ny1A= Authentication-Results: mxbackcorp1g.mail.yandex.net; dkim=pass header.i=@yandex-team.ru Received: from dynamic-vpn.dhcp.yndx.net (dynamic-vpn.dhcp.yndx.net [2a02:6b8:b080:7008::1:4]) by myt4-18a966dbd9be.qloud-c.yandex.net (smtpcorp/Yandex) with ESMTPSA id Y6BD9yFlOA-NXWipiUC; Fri, 08 May 2020 15:23:33 +0300 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) (Client certificate not present) Subject: [PATCH RFC 8/8] dcache: prevent flooding with negative dentries From: Konstantin Khlebnikov To: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, Alexander Viro Cc: Waiman Long Date: Fri, 08 May 2020 15:23:33 +0300 Message-ID: <158894061332.200862.9812452563558764287.stgit@buzz> In-Reply-To: <158893941613.200862.4094521350329937435.stgit@buzz> References: <158893941613.200862.4094521350329937435.stgit@buzz> User-Agent: StGit/0.22-32-g6a05 MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Without memory pressure count of negative dentries isn't bounded. They could consume all memory and drain all other inactive caches. 

A typical scenario is an idle system where some process periodically creates
temporary files and removes them. After some time, memory fills up with
negative dentries for these random file names. Reclaiming them takes a while
because slab frees a page only when all objects on it are gone.

Dentry lookup time is usually unaffected because the hash table size scales
with the amount of memory, unless somebody deliberately crafts hash
collisions. Simple lookups of random names also generate negative dentries
very quickly.

This patch implements a heuristic which detects such scenarios and prevents
unbounded growth of completely unneeded negative dentries. It keeps up to
three of the most recent negative dentries in each bucket unless they were
referenced.

At the first dput of a negative dentry, when it is swept to the tail of its
siblings, we also clear its reference flag and look at the next dentries in
the hash chain. We then kill the third in a series of negative, unused and
unreferenced dentries.

This way each hash bucket preserves three negative dentries, which gives them
a chance to get referenced and survive. Adding a positive or used dentry to a
hash chain also protects a few recent negative dentries. As a result, the
total size of the dcache is asymptotically limited by the count of buckets
plus the count of positive or used dentries.

Before this patch, the tool 'dcache_stress' could fill the entire memory with
dentries:

nr_dentry   = 104913261   104.9M
nr_buckets  =   8388608    12.5 avg
nr_unused   = 104898729   100.0%
nr_negative = 104883218   100.0%

After this patch, the count of dentries saturates at around 3 per bucket:

nr_dentry   =  24619259    24.6M
nr_buckets  =   8388608     2.9 avg
nr_unused   =  24605226    99.9%
nr_negative =  24600351    99.9%

This heuristic isn't bulletproof and solves only the most common practical
case. It's easy to deceive: just touch the same random name twice.

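The heuristic described above can be modelled in userspace to see why a
bucket saturates at about three entries. The following is only a sketch under
stated assumptions, not the kernel implementation: it ignores locking and RCU,
it assumes new entries are added at the head of the hash chain, and
`struct entry`, `find_victim()`, `unlink_entry()` and `simulate()` are
hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a dentry in one hash chain. */
struct entry {
	bool negative;		/* no inode attached */
	bool referenced;	/* models DCACHE_REFERENCED */
	int count;		/* models the dentry reference count */
	struct entry *next;
};

/*
 * Models the first dput of negative entry e: clear its reference flag,
 * then scan the entries that follow it. Keep two unused, unreferenced
 * negative entries and return the third as the victim, or NULL if the
 * series is interrupted or too short.
 */
static struct entry *find_victim(struct entry *e)
{
	int keep = 2;

	e->referenced = false;
	for (struct entry *victim = e->next; ; victim = victim->next) {
		if (!victim || victim->count || !victim->negative ||
		    victim->referenced)
			return NULL;
		if (!keep--)
			return victim;
	}
}

/* Remove victim from the chain starting at *head and free it. */
static void unlink_entry(struct entry **head, struct entry *victim)
{
	struct entry **pp = head;

	while (*pp != victim)
		pp = &(*pp)->next;
	*pp = victim->next;
	free(victim);
}

/* Perform `lookups` failed lookups in one bucket; return the chain length. */
static size_t simulate(int lookups)
{
	struct entry *head = NULL;
	size_t len = 0;

	for (int i = 0; i < lookups; i++) {
		struct entry *e = calloc(1, sizeof(*e));
		struct entry *victim;

		assert(e != NULL);
		e->negative = true;
		e->next = head;		/* assumption: insert at chain head */
		head = e;

		victim = find_victim(e);
		if (victim)
			unlink_entry(&head, victim);
	}

	while (head) {
		struct entry *next = head->next;

		free(head);
		head = next;
		len++;
	}
	return len;
}
```

In this model the chain length grows to three and then stays there, however
many never-reused names are looked up. A referenced, used or positive entry
anywhere among the next three stops the scan, so nothing is killed in that
case.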
Signed-off-by: Konstantin Khlebnikov
---
 fs/dcache.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 54 insertions(+)

diff --git a/fs/dcache.c b/fs/dcache.c
index 60158065891e..9f3d331b4978 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -632,6 +632,58 @@ static inline struct dentry *lock_parent(struct dentry *dentry)
 	return __lock_parent(dentry);
 }
 
+/*
+ * Called at first dput of each negative dentry.
+ * Prevents filling cache with never reused negative dentries.
+ *
+ * This clears reference and then looks at following dentries in hash chain.
+ * If they are negative, unused and unreferenced then keep two and kill third.
+ */
+static void trim_negative(struct dentry *dentry)
+	__releases(dentry->d_lock)
+{
+	struct dentry *victim, *parent;
+	struct hlist_bl_node *next;
+	int keep = 2;
+
+	rcu_read_lock();
+
+	dentry->d_flags &= ~DCACHE_REFERENCED;
+	spin_unlock(&dentry->d_lock);
+
+	next = rcu_dereference_raw(dentry->d_hash.next);
+	while (1) {
+		victim = hlist_bl_entry(next, struct dentry, d_hash);
+
+		if (!next || d_count(victim) || !d_is_negative(victim) ||
+		    (victim->d_flags & DCACHE_REFERENCED)) {
+			rcu_read_unlock();
+			return;
+		}
+
+		if (!keep--)
+			break;
+
+		next = rcu_dereference_raw(next->next);
+	}
+
+	spin_lock(&victim->d_lock);
+	parent = lock_parent(victim);
+
+	rcu_read_unlock();
+
+	if (d_count(victim) || !d_is_negative(victim) ||
+	    (victim->d_flags & DCACHE_REFERENCED)) {
+		if (parent)
+			spin_unlock(&parent->d_lock);
+		spin_unlock(&victim->d_lock);
+		return;
+	}
+
+	__dentry_kill(victim);
+	dput(parent);
+}
+
 /*
  * Move cached negative dentry to the tail of parent->d_subdirs.
  * This lets walkers skip them all together at first sight.
@@ -655,6 +707,8 @@ static void sweep_negative(struct dentry *dentry)
 		}
 
 		spin_unlock(&parent->d_lock);
+
+		return trim_negative(dentry);
 	}
 out:
 	spin_unlock(&dentry->d_lock);