Message ID: 87y3lxj9wr.fsf@notabene.neil.brown.name (mailing list archive)
State: New, archived

On Thu, Dec 21, 2017 at 09:45:40AM +1100, NeilBrown wrote:
> -c/ Helper routines to allocate anonymous dentries, and to help attach
> +   prefix. If the refcount on a dentry with this flag set
> +   becomes zero, the dentry is immediately discarded, rather than being
> +   kept in the dcache. If a dentry that is not already in the dcache
> +   is repeatedly accessed by filehandle (as NFSD might do), an new dentry
> +   will be a allocated for each access, and discarded at the end of

s/a //

> +   the access. As there is no parent, children, or name in the dentry
> +   is it unlikely that there will be any useful information to lose,
> +   and allocating a new dentry should normally be fast.

How about:

As the dentry is completely unattached, there is little information to
lose, and allocating a new dentry is normally fast.

On Wed, Dec 20 2017, Matthew Wilcox wrote:

> On Thu, Dec 21, 2017 at 09:45:40AM +1100, NeilBrown wrote:
>> -c/ Helper routines to allocate anonymous dentries, and to help attach
>> +   prefix. If the refcount on a dentry with this flag set
>> +   becomes zero, the dentry is immediately discarded, rather than being
>> +   kept in the dcache. If a dentry that is not already in the dcache
>> +   is repeatedly accessed by filehandle (as NFSD might do), an new dentry
>> +   will be a allocated for each access, and discarded at the end of
>
> s/a //
>
>> +   the access. As there is no parent, children, or name in the dentry
>> +   is it unlikely that there will be any useful information to lose,
>> +   and allocating a new dentry should normally be fast.
>
> How about:
>
> As the dentry is completely unattached, there is little information to
> lose, and allocating a new dentry is normally fast.

Yes, that is less verbose, and probably just as informative.

Every time I rehearse this argument I wonder about d_fsdata. I don't
think it is actually relevant, but I feel that by not mentioning it,
I'm being dishonest... Maybe I should think of it as being "less
distracting".

Thanks - if this gets far enough to deserve a resend, I'll add those
changes.

NeilBrown

On Thu, Dec 21, 2017 at 09:45:40AM +1100, NeilBrown wrote:
> -c/ Helper routines to allocate anonymous dentries, and to help attach
> +   prefix. If the refcount on a dentry with this flag set
> +   becomes zero, the dentry is immediately discarded, rather than being
> +   kept in the dcache. If a dentry that is not already in the dcache
> +   is repeatedly accessed by filehandle (as NFSD might do), an new dentry
> +   will be a allocated for each access, and discarded at the end of
> +   the access. As there is no parent, children, or name in the dentry
                                         ^^^^^^^^

That part is where I have a problem with it. Consider nfsd failing to
reconnect a growing subtree with the root. It has managed to get to some
point, but then failed to get the parent for some reason (IO error, OOM,
anything). Now we have a non-trivial subtree; its root does have children,
but it's not connected to anything. It has been created by
d_obtain_alias(); in __d_obtain_alias() IS_ROOT() had been true, and so
was the 'disconnected' argument.

The question is not whether they carry any valuable information - it's
whether we are guaranteed that they won't need pruning on umount. And
they will - the invariant we maintain is that all descendants will have
DCACHE_DISCONNECTED set until the sucker is reconnected to root. That's
why they won't stick around - nothing in such a subtree will be retained
in dcache once the refcount hits 0.

I believe that the actual changes are OK, but your explanation above is
wrong, and the logic there is convoluted enough that this needs to be
written accurately.

BTW, I would like comments from Lustre folks - the situation with dcache
in there is rather unusual.

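For readers not steeped in exportfs: the disconnected dentries being argued
about here are the ones a filesystem's export operations hand back for an
NFS filehandle. The sketch below is purely illustrative - "myfs" and
myfs_iget() are made-up names, and only d_obtain_alias(), the exportfs
types, and FILEID_INO32_GEN are real - but it shows where such a dentry
enters the picture: ->fh_to_dentry() looks the inode up from the handle and
ends in d_obtain_alias().

#include <linux/exportfs.h>
#include <linux/dcache.h>
#include <linux/err.h>

/* Hypothetical helper: look the inode up by its stable on-disk number. */
extern struct inode *myfs_iget(struct super_block *sb, u32 ino);

/*
 * Illustrative only - not code from this patch.  A typical ->fh_to_dentry()
 * ends in d_obtain_alias(): if the inode already has a suitable dentry in
 * the dcache, that one is returned; otherwise an anonymous dentry is
 * allocated with IS_ROOT() true and DCACHE_DISCONNECTED set.
 */
static struct dentry *myfs_fh_to_dentry(struct super_block *sb,
					struct fid *fid,
					int fh_len, int fh_type)
{
	struct inode *inode;

	if (fh_len < 2 || fh_type != FILEID_INO32_GEN)
		return NULL;

	inode = myfs_iget(sb, fid->i32.ino);
	if (IS_ERR(inode))
		return ERR_CAST(inode);

	return d_obtain_alias(inode);
}

The partially reconnected subtrees Al describes grow underneath a dentry
obtained this way while exportfs tries to splice it back to the root.
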
On Wed, Dec 20, 2017 at 2:45 PM, NeilBrown <neilb@suse.com> wrote:
>
> We could just leave the code unchanged, but apart from that being
> potentially confusing, the (unfair) bit-spin-lock which protects
> s_anon can become a bottleneck when lots of disconnected dentries are
> being created.
>
> So this patch renames s_anon to s_roots, and stops storing
> disconnected dentries on the list. Only dentries obtained with
> d_obtain_root() are now stored on this list. There are many fewer of
> these (only NFS and NILFS2 use the call, and only during filesystem
> mount) so contention on the bit-lock will not be a problem.

Thanks, Neil. This is much nicer than the magical special case patch
for the s_anon bitlock.

Al, I'm going to assume I'll get this through your vfs tree (with
whatever edits from the comments people made).

             Linus

On Wed, Dec 20, 2017 at 04:57:28PM -0800, Linus Torvalds wrote:
> On Wed, Dec 20, 2017 at 2:45 PM, NeilBrown <neilb@suse.com> wrote:
> >
> > We could just leave the code unchanged, but apart from that being
> > potentially confusing, the (unfair) bit-spin-lock which protects
> > s_anon can become a bottleneck when lots of disconnected dentries are
> > being created.
> >
> > So this patch renames s_anon to s_roots, and stops storing
> > disconnected dentries on the list. Only dentries obtained with
> > d_obtain_root() are now stored on this list. There are many fewer of
> > these (only NFS and NILFS2 use the call, and only during filesystem
> > mount) so contention on the bit-lock will not be a problem.
>
> Thanks, Neil. This is much nicer than the magical special case patch
> for the s_anon bitlock.
>
> Al, I'm going to assume I'll get this through your vfs tree (with
> whatever edits from the comments people made).

*nod*

I've got sidetracked digging through the Lustre mess with dcache
(revalidation stuff, again), will finish once I get some sleep. In any
case, I like that variant; it's definitely going to be applied.

diff --git a/Documentation/filesystems/nfs/Exporting b/Documentation/filesystems/nfs/Exporting
index 520a4becb75c..269ff98f1a0f 100644
--- a/Documentation/filesystems/nfs/Exporting
+++ b/Documentation/filesystems/nfs/Exporting
@@ -56,15 +56,16 @@ a/ A dentry flag DCACHE_DISCONNECTED which is set on
    any dentry that might not be part of the proper prefix.
    This is set when anonymous dentries are created, and cleared when a
    dentry is noticed to be a child of a dentry which is in the proper
-   prefix.
-
-b/ A per-superblock list "s_anon" of dentries which are the roots of
-   subtrees that are not in the proper prefix. These dentries, as
-   well as the proper prefix, need to be released at unmount time. As
-   these dentries will not be hashed, they are linked together on the
-   d_hash list_head.
-
-c/ Helper routines to allocate anonymous dentries, and to help attach
+   prefix. If the refcount on a dentry with this flag set
+   becomes zero, the dentry is immediately discarded, rather than being
+   kept in the dcache. If a dentry that is not already in the dcache
+   is repeatedly accessed by filehandle (as NFSD might do), an new dentry
+   will be a allocated for each access, and discarded at the end of
+   the access. As there is no parent, children, or name in the dentry
+   is it unlikely that there will be any useful information to lose,
+   and allocating a new dentry should normally be fast.
+
+b/ Helper routines to allocate anonymous dentries, and to help attach
    loose directory dentries at lookup time. They are:
     d_obtain_alias(inode) will return a dentry for the given inode.
       If the inode already has a dentry, one of those is returned.
diff --git a/drivers/staging/lustre/lustre/llite/llite_internal.h b/drivers/staging/lustre/lustre/llite/llite_internal.h
index b133fd00c08c..0d62fcf016dc 100644
--- a/drivers/staging/lustre/lustre/llite/llite_internal.h
+++ b/drivers/staging/lustre/lustre/llite/llite_internal.h
@@ -1296,15 +1296,7 @@ static inline void d_lustre_invalidate(struct dentry *dentry, int nested)
 	spin_lock_nested(&dentry->d_lock,
 			 nested ? DENTRY_D_LOCK_NESTED : DENTRY_D_LOCK_NORMAL);
 	ll_d2d(dentry)->lld_invalid = 1;
-	/*
-	 * We should be careful about dentries created by d_obtain_alias().
-	 * These dentries are not put in the dentry tree, instead they are
-	 * linked to sb->s_anon through dentry->d_hash.
-	 * shrink_dcache_for_umount() shrinks the tree and sb->s_anon list.
-	 * If we unhashed such a dentry, unmount would not be able to find
-	 * it and busy inodes would be reported.
-	 */
-	if (d_count(dentry) == 0 && !(dentry->d_flags & DCACHE_DISCONNECTED))
+	if (d_count(dentry) == 0)
 		__d_drop(dentry);
 	spin_unlock(&dentry->d_lock);
 }
diff --git a/fs/dcache.c b/fs/dcache.c
index 5c7df1df81ff..2c0db70a92fa 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -49,8 +49,8 @@
  *   - i_dentry, d_u.d_alias, d_inode of aliases
  * dcache_hash_bucket lock protects:
  *   - the dcache hash table
- * s_anon bl list spinlock protects:
- *   - the s_anon list (see __d_drop)
+ * s_roots bl list spinlock protects:
+ *   - the s_roots list (see __d_drop)
  * dentry->d_sb->s_dentry_lru_lock protects:
  *   - the dcache lru lists and counters
  * d_lock protects:
@@ -68,7 +68,7 @@
  *   dentry->d_lock
  *     dentry->d_sb->s_dentry_lru_lock
  *     dcache_hash_bucket lock
- *     s_anon lock
+ *     s_roots lock
  *
  * If there is an ancestor relationship:
  * dentry->d_parent->...->d_parent->d_lock
@@ -477,10 +477,10 @@ void __d_drop(struct dentry *dentry)
 		/*
 		 * Hashed dentries are normally on the dentry hashtable,
 		 * with the exception of those newly allocated by
-		 * d_obtain_alias, which are always IS_ROOT:
+		 * d_obtain_root, which are always IS_ROOT:
 		 */
 		if (unlikely(IS_ROOT(dentry)))
-			b = &dentry->d_sb->s_anon;
+			b = &dentry->d_sb->s_roots;
 		else
 			b = d_hash(dentry->d_name.hash);
 
@@ -1500,8 +1500,8 @@ void shrink_dcache_for_umount(struct super_block *sb)
 	sb->s_root = NULL;
 	do_one_tree(dentry);
 
-	while (!hlist_bl_empty(&sb->s_anon)) {
-		dentry = dget(hlist_bl_entry(hlist_bl_first(&sb->s_anon), struct dentry, d_hash));
+	while (!hlist_bl_empty(&sb->s_roots)) {
+		dentry = dget(hlist_bl_entry(hlist_bl_first(&sb->s_roots), struct dentry, d_hash));
 		do_one_tree(dentry);
 	}
 }
@@ -1965,9 +1965,11 @@ static struct dentry *__d_obtain_alias(struct inode *inode, int disconnected)
 	spin_lock(&tmp->d_lock);
 	__d_set_inode_and_type(tmp, inode, add_flags);
 	hlist_add_head(&tmp->d_u.d_alias, &inode->i_dentry);
-	hlist_bl_lock(&tmp->d_sb->s_anon);
-	hlist_bl_add_head(&tmp->d_hash, &tmp->d_sb->s_anon);
-	hlist_bl_unlock(&tmp->d_sb->s_anon);
+	if (!disconnected) {
+		hlist_bl_lock(&tmp->d_sb->s_roots);
+		hlist_bl_add_head(&tmp->d_hash, &tmp->d_sb->s_roots);
+		hlist_bl_unlock(&tmp->d_sb->s_roots);
+	}
 	spin_unlock(&tmp->d_lock);
 	spin_unlock(&inode->i_lock);
 
diff --git a/fs/super.c b/fs/super.c
index 7ff1349609e4..14c2ba3a1c28 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -225,7 +225,7 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags,
 	if (s->s_user_ns != &init_user_ns)
 		s->s_iflags |= SB_I_NODEV;
 	INIT_HLIST_NODE(&s->s_instances);
-	INIT_HLIST_BL_HEAD(&s->s_anon);
+	INIT_HLIST_BL_HEAD(&s->s_roots);
 	mutex_init(&s->s_sync_lock);
 	INIT_LIST_HEAD(&s->s_inodes);
 	spin_lock_init(&s->s_inode_list_lock);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 511fbaabf624..cfe88cc65a52 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1359,7 +1359,7 @@ struct super_block {
 	const struct fscrypt_operations	*s_cop;
 
-	struct hlist_bl_head	s_anon;		/* anonymous dentries for (nfs) exporting */
+	struct hlist_bl_head	s_roots;	/* alternate root dentries for NFS */
 	struct list_head	s_mounts;	/* list of mounts; _not_ for fs use */
 	struct block_device	*s_bdev;
 	struct backing_dev_info *s_bdi;
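
As a usage note on the hunks above (a sketch under assumptions, not nfsd's
actual code): the "repeatedly accessed by filehandle (as NFSD might do)"
case from the documentation change corresponds roughly to
exportfs_decode_fh() being called for each incoming filehandle. The
function and callback names below are invented; only exportfs_decode_fh()
and its signature are real. Note that freeing the disconnected dentry on
final dput() comes from commit 75a6f82a0d10, not from this patch - this
patch only stops hashing it on the per-superblock list while it is alive.

#include <linux/exportfs.h>
#include <linux/mount.h>

/* Made-up acceptance callback; nfsd applies real export checks here. */
static int accept_any(void *context, struct dentry *dentry)
{
	return 1;
}

/*
 * Sketch of the consumer side: decode one filehandle into a dentry.
 * With this patch, the dentry returned for a disconnected inode is no
 * longer added to sb->s_roots (formerly s_anon), so umount has nothing
 * to prune from that list on its behalf.
 */
static struct dentry *decode_one_filehandle(struct vfsmount *mnt,
					    struct fid *fid, int fh_len,
					    int fileid_type)
{
	return exportfs_decode_fh(mnt, fid, fh_len, fileid_type,
				  accept_any, NULL);
}
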
The original purpose of the per-superblock d_anon list was to keep
disconnected dentries in the cache between consecutive requests to the
NFS server. Dentries can be disconnected if a client holds a file open
and repeatedly performs IO on it, and if the server drops the dentry,
whether due to memory pressure, server restart, or
"echo 3 > /proc/sys/vm/drop_caches".

This purpose was thwarted by commit 75a6f82a0d10 ("freeing unlinked
file indefinitely delayed") which caused disconnected dentries to be
freed as soon as their refcount reached zero.

This means that, when a dentry being used by nfsd gets disconnected, a
new one needs to be allocated for every request (unless requests
overlap). As the dentry has no name, no parent, and no children, there
is little of value to cache. As small memory allocations are typically
fast (from per-cpu free lists) this likely has little cost.

This means that the original purpose of s_anon is no longer relevant:
there is no longer any need to keep disconnected dentries on a list so
they appear to be hashed.

However, s_anon now has a new use. When you mount an NFS filesystem,
the dentry stored in s_root is just a placebo. The "real" root dentry
is allocated using d_obtain_root() and so is kept on the s_anon list.
I don't know the reason for this, but suspect it is related to NFSv4,
where a mount of "server:/some/path" requires NFS to look up the root
filehandle on the server, then walk down "/some" and "/path" to get
the filehandle to mount.

Whatever the reason, NFS depends on the s_anon list and on
shrink_dcache_for_umount() pruning all dentries on this list. So we
cannot simply remove s_anon.

We could just leave the code unchanged, but apart from that being
potentially confusing, the (unfair) bit-spin-lock which protects
s_anon can become a bottleneck when lots of disconnected dentries are
being created.

So this patch renames s_anon to s_roots, and stops storing
disconnected dentries on the list. Only dentries obtained with
d_obtain_root() are now stored on this list. There are many fewer of
these (only NFS and NILFS2 use the call, and only during filesystem
mount) so contention on the bit-lock will not be a problem.

Possibly an alternate solution should be found for NFS and NILFS2, but
that would require understanding their needs first.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 Documentation/filesystems/nfs/Exporting           | 19 ++++++++++---------
 .../staging/lustre/lustre/llite/llite_internal.h  | 10 +---------
 fs/dcache.c                                       | 22 ++++++++++++----------
 fs/super.c                                        |  2 +-
 include/linux/fs.h                                |  2 +-
 5 files changed, 25 insertions(+), 30 deletions(-)
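
For completeness, a sketch (with invented "myfs" names; only
d_obtain_root() is the real API) of the one pattern that still lands
dentries on the renamed s_roots list: obtaining the "real" root during
mount, as NFS and NILFS2 do. Since this happens only a handful of times
per mount, contention on the bit-lock is not a concern on this path.

#include <linux/dcache.h>
#include <linux/fs.h>
#include <linux/err.h>

/* Hypothetical helper: resolve the filesystem's real root inode at mount. */
extern struct inode *myfs_get_root_inode(struct super_block *sb);

/*
 * Illustrative only.  d_obtain_root() calls __d_obtain_alias() with
 * disconnected == 0, so after this patch it is the only path that still
 * adds a dentry to sb->s_roots; shrink_dcache_for_umount() prunes the
 * trees rooted there when the filesystem is unmounted.
 */
static struct dentry *myfs_obtain_real_root(struct super_block *sb)
{
	struct inode *inode = myfs_get_root_inode(sb);

	if (IS_ERR(inode))
		return ERR_CAST(inode);

	return d_obtain_root(inode);
}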