Message ID | 20190403042127.18755-15-tobin@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Slab Movable Objects (SMO) | expand |
On Wed, Apr 03, 2019 at 03:21:27PM +1100, Tobin C. Harding wrote: > The dentry slab cache is susceptible to internal fragmentation. Now > that we have Slab Movable Objects we can defragment the dcache. Object > migration is only possible for dentry objects that are not currently > referenced by anyone, i.e. we are using the object migration > infrastructure to free unused dentries. > > Implement isolate and migrate functions for the dentry slab cache. > + /* > + * Three sorts of dentries cannot be reclaimed: > + * > + * 1. dentries that are in the process of being allocated > + * or being freed. In that case the dentry is neither > + * on the LRU nor hashed. > + * > + * 2. Fake hashed entries as used for anonymous dentries > + * and pipe I/O. The fake hashed entries have d_flags > + * set to indicate a hashed entry. However, the > + * d_hash field indicates that the entry is not hashed. > + * > + * 3. dentries that have a backing store that is not > + * writable. This is true for tmpsfs and other in > + * memory filesystems. Removing dentries from them > + * would loose dentries for good. > + */ > + if ((d_unhashed(dentry) && list_empty(&dentry->d_lru)) || > + (!d_unhashed(dentry) && hlist_bl_unhashed(&dentry->d_hash)) || > + (dentry->d_inode && > + !mapping_cap_writeback_dirty(dentry->d_inode->i_mapping))) { > + /* Ignore this dentry */ > + v[i] = NULL; > + } else { > + __dget_dlock(dentry); > + } > + spin_unlock(&dentry->d_lock); > + } > + return NULL; /* No need for private data */ > +} > + > +/* > + * d_migrate() - Dentry migration callback function. > + * @s: The dentry cache. > + * @v: Vector of pointers to the objects to migrate. > + * @nr: Number of objects in @v. > + * @node: The NUMA node where new object should be allocated. > + * @private: Returned by d_isolate() (currently %NULL). > + * > + * Slab has dropped all the locks. Get rid of the refcount obtained > + * earlier and also free the object. > + */ > +static void d_migrate(struct kmem_cache *s, void **v, int nr, > + int node, void *_unused) > +{ > + struct dentry *dentry; > + int i; > + > + for (i = 0; i < nr; i++) { > + dentry = v[i]; > + if (dentry) > + d_invalidate(dentry); Oh, *brilliant* Let's do d_invalidate() on random dentries and hope they go away. With convoluted and brittle logics for deciding which ones to spare, which is actually wrong. This will pick mountpoints and tear them out, to start with. NAKed-by: Al Viro <viro@zeniv.linux.org.uk> And this is a NAK for the entire approach; if it has a positive refcount, LEAVE IT ALONE. Period. Don't play this kind of games, they are wrong. d_invalidate() is not something that can be done to an arbitrary dentry.
On Wed, Apr 03, 2019 at 06:08:11PM +0100, Al Viro wrote: > Oh, *brilliant* > > Let's do d_invalidate() on random dentries and hope they go away. > With convoluted and brittle logics for deciding which ones to > spare, which is actually wrong. This will pick mountpoints > and tear them out, to start with. > > NAKed-by: Al Viro <viro@zeniv.linux.org.uk> > > And this is a NAK for the entire approach; if it has a positive refcount, > LEAVE IT ALONE. Period. Don't play this kind of games, they are wrong. > d_invalidate() is not something that can be done to an arbitrary dentry. PS: "try to evict what can be evicted out of this set" can be done, but you want something like start with empty list go through your array of references grab dentry->d_lock if dentry->d_lockref.count is not zero unlock and continue if dentry->d_flags & DCACHE_SHRINK_LIST ditto, it's not for us to play with if (dentry->d_flags & DCACHE_LRU_LIST) d_lru_del(dentry); d_shrink_add(dentry, &list); unlock on the collection phase and if the list is not empty by the end of that loop shrink_dentry_list(&list); on the disposal.
On Wed, Apr 03, 2019 at 06:19:21PM +0100, Al Viro wrote: > On Wed, Apr 03, 2019 at 06:08:11PM +0100, Al Viro wrote: > > > Oh, *brilliant* > > > > Let's do d_invalidate() on random dentries and hope they go away. > > With convoluted and brittle logics for deciding which ones to > > spare, which is actually wrong. This will pick mountpoints > > and tear them out, to start with. > > > > NAKed-by: Al Viro <viro@zeniv.linux.org.uk> > > > > And this is a NAK for the entire approach; if it has a positive refcount, > > LEAVE IT ALONE. Period. Don't play this kind of games, they are wrong. > > d_invalidate() is not something that can be done to an arbitrary dentry. > > PS: "try to evict what can be evicted out of this set" can be done, but > you want something like > start with empty list > go through your array of references > grab dentry->d_lock > if dentry->d_lockref.count is not zero > unlock and continue > if dentry->d_flags & DCACHE_SHRINK_LIST > ditto, it's not for us to play with > if (dentry->d_flags & DCACHE_LRU_LIST) > d_lru_del(dentry); > d_shrink_add(dentry, &list); > unlock > > on the collection phase and > if the list is not empty by the end of that loop > shrink_dentry_list(&list); > on the disposal. Note, BTW, that your constructor is wrong - all it really needs to do is spin_lock_init() and setting ->d_lockref.count same as lockref_mark_dead() does, to match the state of dentries being torn down. __d_alloc() is not holding ->d_lock, since the object is not visible to anybody else yet; with your changes it *is* visible. However, if the assignment to ->d_lockref.count in __d_alloc() is guaranteed to be non-zero to non-zero, the above should be safe.
On Wed, 3 Apr 2019, Al Viro wrote: > Let's do d_invalidate() on random dentries and hope they go away. > With convoluted and brittle logics for deciding which ones to > spare, which is actually wrong. This will pick mountpoints > and tear them out, to start with. > > NAKed-by: Al Viro <viro@zeniv.linux.org.uk> > > And this is a NAK for the entire approach; if it has a positive refcount, > LEAVE IT ALONE. Period. Don't play this kind of games, they are wrong. > d_invalidate() is not something that can be done to an arbitrary dentry. Well could you help us figure out how to do it the right way? We (the MM guys) are having a hard time not being familiar with the filesystem stuff. This is an RFC and we want to know how to do this right.
On Wed, Apr 03, 2019 at 05:56:27PM +0000, Christopher Lameter wrote: > On Wed, 3 Apr 2019, Al Viro wrote: > > > Let's do d_invalidate() on random dentries and hope they go away. > > With convoluted and brittle logics for deciding which ones to > > spare, which is actually wrong. This will pick mountpoints > > and tear them out, to start with. > > > > NAKed-by: Al Viro <viro@zeniv.linux.org.uk> > > > > And this is a NAK for the entire approach; if it has a positive refcount, > > LEAVE IT ALONE. Period. Don't play this kind of games, they are wrong. > > d_invalidate() is not something that can be done to an arbitrary dentry. > > Well could you help us figure out how to do it the right way? We (the MM > guys) are having a hard time not being familiar with the filesystem stuff. > > This is an RFC and we want to know how to do this right. If by "how to do it right" you mean "expedit kicking out something with non-zero refcount" - there's no way to do that. Nothing even remotely sane. If you mean "kick out everything in this page with zero refcount" - that can be done (see further in the thread). Look, dentries and inodes are really, really not relocatable. If they can be evicted by memory pressure - sure, we can do that for a given set (e.g. "everything in that page"). But that's it - if memory pressure would _not_ get rid of that one, there's nothing to be done. Again, all VM can do is to simulate shrinker hitting hard on given bunch (rather than buggering the entire cache). If filesystem (or something in VFS) says "it's busy", it bloody well _is_ busy and won't be going away until it ceases to be such.
On Wed, Apr 03, 2019 at 07:24:54PM +0100, Al Viro wrote: > If by "how to do it right" you mean "expedit kicking out something with > non-zero refcount" - there's no way to do that. Nothing even remotely > sane. > > If you mean "kick out everything in this page with zero refcount" - that > can be done (see further in the thread). > > Look, dentries and inodes are really, really not relocatable. If they > can be evicted by memory pressure - sure, we can do that for a given > set (e.g. "everything in that page"). But that's it - if memory > pressure would _not_ get rid of that one, there's nothing to be done. > Again, all VM can do is to simulate shrinker hitting hard on given > bunch (rather than buggering the entire cache). If filesystem (or > something in VFS) says "it's busy", it bloody well _is_ busy and > won't be going away until it ceases to be such. FWIW, some theory: the only kind of long-term reference that can be killed off by memory pressure is that from child to parent. Anything else (e.g. an opened file, current directory, mountpoint, etc.) is out of limits - it either won't be going away until the thing is not pinned anymore (close, chdir, etc.) *or* it really shouldn't be ("VM wants this mountpoint dentry freed, so just dissolve the mount" is a bloody bad idea for obvious reasons). Stuff in somebody's shrink list is none of our business - somebody else is going to try and evict it anyway; if it can be evicted, it will be. Anything with zero refcount that isn't in somebody else's shrink list is fair game. Directories with children could, in principle, be helped along - we could try shrink_dcache_parent() on them, which might end up leaving them with zero refcount. However, it's not cheap and if you pick the root dentry of a filesystem, it'll try to evict everything on it that can be evicted, be it in this page or not. And there's no promise that it will end up evictable after that. So from the correctness POV * you can kick out everything with zero refcount not on shrink lists. * you _might_ try shrink_dcache_parent() on directory dentries, in hope to drive their refcount to zero. However, that's almost certainly going to hit too hard and be too costly. * d_invalidate() is no-go; if anything, you want something weaker than shrink_dcache_parent(), not stronger. For anything beyond "just kick out everything in that page that happens to have zero refcount" I would really like to see the stats - how much does it help, how costly it is _and_ how much of the cache does it throw away (see above re running into a root dentry of some filesystem and essentially trimming dcache for that fs down to the unevictable stuff).
On Wed, Apr 3, 2019 at 9:05 PM Al Viro <viro@zeniv.linux.org.uk> wrote: > > On Wed, Apr 03, 2019 at 07:24:54PM +0100, Al Viro wrote: > > > If by "how to do it right" you mean "expedit kicking out something with > > non-zero refcount" - there's no way to do that. Nothing even remotely > > sane. > > > > If you mean "kick out everything in this page with zero refcount" - that > > can be done (see further in the thread). > > > > Look, dentries and inodes are really, really not relocatable. If they > > can be evicted by memory pressure - sure, we can do that for a given > > set (e.g. "everything in that page"). But that's it - if memory > > pressure would _not_ get rid of that one, there's nothing to be done. > > Again, all VM can do is to simulate shrinker hitting hard on given > > bunch (rather than buggering the entire cache). If filesystem (or > > something in VFS) says "it's busy", it bloody well _is_ busy and > > won't be going away until it ceases to be such. > > FWIW, some theory: the only kind of long-term reference that can > be killed off by memory pressure is that from child to parent. > Anything else (e.g. an opened file, current directory, mountpoint, > etc.) is out of limits - it either won't be going away until > the thing is not pinned anymore (close, chdir, etc.) *or* > it really shouldn't be ("VM wants this mountpoint dentry freed, > so just dissolve the mount" is a bloody bad idea for obvious > reasons). Well, theoretically we could do two levels of references, where the long term reference is stable and contains an rcu protected unstable reference to the real object. In the likely case when only read-only access to the object is needed (d_lookup) then the cost is an extra dereference and the associated additional cache usage. If read-write access is needed to object, then extra locking is needed to protect against concurrent migration. So there's non-trivial cost in addition to the added complexity, and I don't see it actually making sense in practice. But maybe someone can expand this idea to something practicable... Thanks, Miklos
On Wed, 3 Apr 2019, Al Viro wrote: > > This is an RFC and we want to know how to do this right. > > If by "how to do it right" you mean "expedit kicking out something with > non-zero refcount" - there's no way to do that. Nothing even remotely > sane. Sure we know that. > If you mean "kick out everything in this page with zero refcount" - that > can be done (see further in the thread). Ok that would already be progress. If we can use this to liberate some slab pages with just a few dentry object then it may be worthwhile. > Look, dentries and inodes are really, really not relocatable. If they > can be evicted by memory pressure - sure, we can do that for a given > set (e.g. "everything in that page"). But that's it - if memory > pressure would _not_ get rid of that one, there's nothing to be done. > Again, all VM can do is to simulate shrinker hitting hard on given > bunch (rather than buggering the entire cache). If filesystem (or > something in VFS) says "it's busy", it bloody well _is_ busy and > won't be going away until it ceases to be such. Right. Thats why the patch attempted to check for these things to avoid touching such objects.
On Wed, Apr 03, 2019 at 06:48:55PM +0100, Al Viro wrote: > On Wed, Apr 03, 2019 at 06:19:21PM +0100, Al Viro wrote: > > On Wed, Apr 03, 2019 at 06:08:11PM +0100, Al Viro wrote: > > > > > Oh, *brilliant* > > > > > > Let's do d_invalidate() on random dentries and hope they go away. > > > With convoluted and brittle logics for deciding which ones to > > > spare, which is actually wrong. This will pick mountpoints > > > and tear them out, to start with. > > > > > > NAKed-by: Al Viro <viro@zeniv.linux.org.uk> > > > > > > And this is a NAK for the entire approach; if it has a positive refcount, > > > LEAVE IT ALONE. Period. Don't play this kind of games, they are wrong. > > > d_invalidate() is not something that can be done to an arbitrary dentry. > > > > PS: "try to evict what can be evicted out of this set" can be done, but > > you want something like > > start with empty list > > go through your array of references > > grab dentry->d_lock > > if dentry->d_lockref.count is not zero > > unlock and continue > > if dentry->d_flags & DCACHE_SHRINK_LIST > > ditto, it's not for us to play with > > if (dentry->d_flags & DCACHE_LRU_LIST) > > d_lru_del(dentry); > > d_shrink_add(dentry, &list); > > unlock > > > > on the collection phase and > > if the list is not empty by the end of that loop > > shrink_dentry_list(&list); > > on the disposal. > > Note, BTW, that your constructor is wrong - all it really needs to do > is spin_lock_init() and setting ->d_lockref.count same as lockref_mark_dead() > does, to match the state of dentries being torn down. Thanks for looking at this Al. > __d_alloc() is not holding ->d_lock, since the object is not visible to > anybody else yet; with your changes it *is* visible. I don't quite understand this comment. How is the object visible? The constructor is only called when allocating a new page to the slab and this is done with interrupts disabled. > However, if the > assignment to ->d_lockref.count in __d_alloc() is guaranteed to be > non-zero to non-zero, the above should be safe. I've done as you suggest and set it to -128 Thanks for schooling me on the VFS stuff. Tobin
On Wed, Apr 03, 2019 at 06:19:21PM +0100, Al Viro wrote: > On Wed, Apr 03, 2019 at 06:08:11PM +0100, Al Viro wrote: > > > Oh, *brilliant* > > > > Let's do d_invalidate() on random dentries and hope they go away. > > With convoluted and brittle logics for deciding which ones to > > spare, which is actually wrong. This will pick mountpoints > > and tear them out, to start with. > > > > NAKed-by: Al Viro <viro@zeniv.linux.org.uk> > > > > And this is a NAK for the entire approach; if it has a positive refcount, > > LEAVE IT ALONE. Period. Don't play this kind of games, they are wrong. > > d_invalidate() is not something that can be done to an arbitrary dentry. > > PS: "try to evict what can be evicted out of this set" can be done, but > you want something like > start with empty list > go through your array of references > grab dentry->d_lock > if dentry->d_lockref.count is not zero > unlock and continue > if dentry->d_flags & DCACHE_SHRINK_LIST > ditto, it's not for us to play with > if (dentry->d_flags & DCACHE_LRU_LIST) > d_lru_del(dentry); > d_shrink_add(dentry, &list); > unlock > > on the collection phase and > if the list is not empty by the end of that loop > shrink_dentry_list(&list); > on the disposal. Implemented as suggested, thanks. RFCv3 to come when we have some stats for you :) thanks, Tobin.
On Fri, Apr 05, 2019 at 07:29:14AM +1100, Tobin C. Harding wrote: > > __d_alloc() is not holding ->d_lock, since the object is not visible to > > anybody else yet; with your changes it *is* visible. > > I don't quite understand this comment. How is the object visible? The > constructor is only called when allocating a new page to the slab and > this is done with interrupts disabled. Not to constructor, to your try-to-evict code...
diff --git a/fs/dcache.c b/fs/dcache.c index 606844ad5171..4387715b7ebb 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -30,6 +30,7 @@ #include <linux/bit_spinlock.h> #include <linux/rculist_bl.h> #include <linux/list_lru.h> +#include <linux/backing-dev.h> #include "internal.h" #include "mount.h" @@ -3074,6 +3075,90 @@ void d_tmpfile(struct dentry *dentry, struct inode *inode) } EXPORT_SYMBOL(d_tmpfile); +/* + * d_isolate() - Dentry isolation callback function. + * @s: The dentry cache. + * @v: Vector of pointers to the objects to migrate. + * @nr: Number of objects in @v. + * + * The slab allocator is holding off frees. We can safely examine + * the object without the danger of it vanishing from under us. + */ +static void *d_isolate(struct kmem_cache *s, void **v, int nr) +{ + struct dentry *dentry; + int i; + + for (i = 0; i < nr; i++) { + dentry = v[i]; + spin_lock(&dentry->d_lock); + /* + * Three sorts of dentries cannot be reclaimed: + * + * 1. dentries that are in the process of being allocated + * or being freed. In that case the dentry is neither + * on the LRU nor hashed. + * + * 2. Fake hashed entries as used for anonymous dentries + * and pipe I/O. The fake hashed entries have d_flags + * set to indicate a hashed entry. However, the + * d_hash field indicates that the entry is not hashed. + * + * 3. dentries that have a backing store that is not + * writable. This is true for tmpsfs and other in + * memory filesystems. Removing dentries from them + * would loose dentries for good. + */ + if ((d_unhashed(dentry) && list_empty(&dentry->d_lru)) || + (!d_unhashed(dentry) && hlist_bl_unhashed(&dentry->d_hash)) || + (dentry->d_inode && + !mapping_cap_writeback_dirty(dentry->d_inode->i_mapping))) { + /* Ignore this dentry */ + v[i] = NULL; + } else { + __dget_dlock(dentry); + } + spin_unlock(&dentry->d_lock); + } + return NULL; /* No need for private data */ +} + +/* + * d_migrate() - Dentry migration callback function. + * @s: The dentry cache. + * @v: Vector of pointers to the objects to migrate. + * @nr: Number of objects in @v. + * @node: The NUMA node where new object should be allocated. + * @private: Returned by d_isolate() (currently %NULL). + * + * Slab has dropped all the locks. Get rid of the refcount obtained + * earlier and also free the object. + */ +static void d_migrate(struct kmem_cache *s, void **v, int nr, + int node, void *_unused) +{ + struct dentry *dentry; + int i; + + for (i = 0; i < nr; i++) { + dentry = v[i]; + if (dentry) + d_invalidate(dentry); + } + + for (i = 0; i < nr; i++) { + dentry = v[i]; + if (dentry) + dput(dentry); + } + + /* + * dentries are freed using RCU so we need to wait until RCU + * operations are complete. + */ + synchronize_rcu(); +} + static __initdata unsigned long dhash_entries; static int __init set_dhash_entries(char *str) { @@ -3119,6 +3204,8 @@ static void __init dcache_init(void) sizeof_field(struct dentry, d_iname), dcache_ctor); + kmem_cache_setup_mobility(dentry_cache, d_isolate, d_migrate); + /* Hash may have been set up in dcache_init_early */ if (!hashdist) return;
The dentry slab cache is susceptible to internal fragmentation. Now that we have Slab Movable Objects we can defragment the dcache. Object migration is only possible for dentry objects that are not currently referenced by anyone, i.e. we are using the object migration infrastructure to free unused dentries. Implement isolate and migrate functions for the dentry slab cache. Signed-off-by: Tobin C. Harding <tobin@kernel.org> --- fs/dcache.c | 87 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 87 insertions(+)