Message ID | 1573788472-87426-2-git-send-email-yukuai3@huawei.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | fix potential infinite loop in debugfs_remove_recursive | expand |
On Fri, Nov 15, 2019 at 11:27:50AM +0800, yu kuai wrote: > 'dentry_d_lock_class' can be used for spin_lock_nested in case lockdep > confused about two different dentry take the 'd_lock'. > > However, a single 'DENTRY_D_LOCK_NESTED' may not be enough if more than > two dentry are involed. So, and in 'DENTRY_D_LOCK_NESTED_2' > > Signed-off-by: yu kuai <yukuai3@huawei.com> > --- > include/linux/dcache.h | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/include/linux/dcache.h b/include/linux/dcache.h > index 10090f1..8eb84ef 100644 > --- a/include/linux/dcache.h > +++ b/include/linux/dcache.h > @@ -129,7 +129,8 @@ struct dentry { > enum dentry_d_lock_class > { > DENTRY_D_LOCK_NORMAL, /* implicitly used by plain spin_lock() APIs. */ > - DENTRY_D_LOCK_NESTED > + DENTRY_D_LOCK_NESTED, > + DENTRY_D_LOCK_NESTED_2 You should document this, as "_2" does not make much sense to anyone only looking at the code :( Or rename it better. thanks, greg k-h
On Fri, Nov 15, 2019 at 11:27:59AM +0800, Greg KH wrote: > On Fri, Nov 15, 2019 at 11:27:50AM +0800, yu kuai wrote: > > 'dentry_d_lock_class' can be used for spin_lock_nested in case lockdep > > confused about two different dentry take the 'd_lock'. > > > > However, a single 'DENTRY_D_LOCK_NESTED' may not be enough if more than > > two dentry are involed. So, and in 'DENTRY_D_LOCK_NESTED_2' > > > > Signed-off-by: yu kuai <yukuai3@huawei.com> > > --- > > include/linux/dcache.h | 3 ++- > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > diff --git a/include/linux/dcache.h b/include/linux/dcache.h > > index 10090f1..8eb84ef 100644 > > --- a/include/linux/dcache.h > > +++ b/include/linux/dcache.h > > @@ -129,7 +129,8 @@ struct dentry { > > enum dentry_d_lock_class > > { > > DENTRY_D_LOCK_NORMAL, /* implicitly used by plain spin_lock() APIs. */ > > - DENTRY_D_LOCK_NESTED > > + DENTRY_D_LOCK_NESTED, > > + DENTRY_D_LOCK_NESTED_2 > > You should document this, as "_2" does not make much sense to anyone > only looking at the code :( > > Or rename it better. FWIW, I'm not sure it's a good solution. What are the rules for callers of that thing, anyway? If it can be called when somebody is creating more files in that subtree, we almost certainly will have massive problems with the lifetimes of underlying objects... Could somebody familiar with debugfs explain how is that thing actually used and what is required from/promised to its callers? I can try and grep through the tree and guess what the rules are, but I've way too much on my platter right now and I don't want to get sidetracked into yet another tree-wide search and analysis session ;-/
On Fri, Nov 15, 2019 at 04:12:43AM +0000, Al Viro wrote: > On Fri, Nov 15, 2019 at 11:27:59AM +0800, Greg KH wrote: > > On Fri, Nov 15, 2019 at 11:27:50AM +0800, yu kuai wrote: > > > 'dentry_d_lock_class' can be used for spin_lock_nested in case lockdep > > > confused about two different dentry take the 'd_lock'. > > > > > > However, a single 'DENTRY_D_LOCK_NESTED' may not be enough if more than > > > two dentry are involed. So, and in 'DENTRY_D_LOCK_NESTED_2' > > > > > > Signed-off-by: yu kuai <yukuai3@huawei.com> > > > --- > > > include/linux/dcache.h | 3 ++- > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > diff --git a/include/linux/dcache.h b/include/linux/dcache.h > > > index 10090f1..8eb84ef 100644 > > > --- a/include/linux/dcache.h > > > +++ b/include/linux/dcache.h > > > @@ -129,7 +129,8 @@ struct dentry { > > > enum dentry_d_lock_class > > > { > > > DENTRY_D_LOCK_NORMAL, /* implicitly used by plain spin_lock() APIs. */ > > > - DENTRY_D_LOCK_NESTED > > > + DENTRY_D_LOCK_NESTED, > > > + DENTRY_D_LOCK_NESTED_2 > > > > You should document this, as "_2" does not make much sense to anyone > > only looking at the code :( > > > > Or rename it better. > > FWIW, I'm not sure it's a good solution. What are the rules for callers > of that thing, anyway? If it can be called when somebody is creating > more files in that subtree, we almost certainly will have massive > problems with the lifetimes of underlying objects... > > Could somebody familiar with debugfs explain how is that thing actually > used and what is required from/promised to its callers? I can try and > grep through the tree and guess what the rules are, but I've way too > much on my platter right now and I don't want to get sidetracked into yet > another tree-wide search and analysis session ;-/ Yu wants to use simple_empty() in debugfs_remove_recursive() instead of manually checking: if (!list_empty(&child->d_subdirs)) { See patch 3 of this series for that change and why they feel it is needed: https://lore.kernel.org/lkml/1573788472-87426-4-git-send-email-yukuai3@huawei.com/ As to if patch 3 really is needed, I'll leave that up to Yu given that I thought we had resolved these types of issues already a year or so ago. thanks, greg k-h
On 2019/11/15 11:27, Greg KH wrote: > On Fri, Nov 15, 2019 at 11:27:50AM +0800, yu kuai wrote: >> 'dentry_d_lock_class' can be used for spin_lock_nested in case lockdep >> confused about two different dentry take the 'd_lock'. >> >> However, a single 'DENTRY_D_LOCK_NESTED' may not be enough if more than >> two dentry are involed. So, and in 'DENTRY_D_LOCK_NESTED_2' >> >> Signed-off-by: yu kuai <yukuai3@huawei.com> >> --- >> include/linux/dcache.h | 3 ++- >> 1 file changed, 2 insertions(+), 1 deletion(-) >> >> diff --git a/include/linux/dcache.h b/include/linux/dcache.h >> index 10090f1..8eb84ef 100644 >> --- a/include/linux/dcache.h >> +++ b/include/linux/dcache.h >> @@ -129,7 +129,8 @@ struct dentry { >> enum dentry_d_lock_class >> { >> DENTRY_D_LOCK_NORMAL, /* implicitly used by plain spin_lock() APIs. */ >> - DENTRY_D_LOCK_NESTED >> + DENTRY_D_LOCK_NESTED, >> + DENTRY_D_LOCK_NESTED_2 > > You should document this, as "_2" does not make much sense to anyone > only looking at the code :( > > Or rename it better. > Thank you for your advise, I'll try to rename it better with comments. Yu Kuai > thanks, > > greg k-h > > . >
On 2019/11/15 15:20, Greg KH wrote: > On Fri, Nov 15, 2019 at 04:12:43AM +0000, Al Viro wrote: >> On Fri, Nov 15, 2019 at 11:27:59AM +0800, Greg KH wrote: >>> On Fri, Nov 15, 2019 at 11:27:50AM +0800, yu kuai wrote: >>>> 'dentry_d_lock_class' can be used for spin_lock_nested in case lockdep >>>> confused about two different dentry take the 'd_lock'. >>>> >>>> However, a single 'DENTRY_D_LOCK_NESTED' may not be enough if more than >>>> two dentry are involed. So, and in 'DENTRY_D_LOCK_NESTED_2' >>>> >>>> Signed-off-by: yu kuai <yukuai3@huawei.com> >>>> --- >>>> include/linux/dcache.h | 3 ++- >>>> 1 file changed, 2 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/include/linux/dcache.h b/include/linux/dcache.h >>>> index 10090f1..8eb84ef 100644 >>>> --- a/include/linux/dcache.h >>>> +++ b/include/linux/dcache.h >>>> @@ -129,7 +129,8 @@ struct dentry { >>>> enum dentry_d_lock_class >>>> { >>>> DENTRY_D_LOCK_NORMAL, /* implicitly used by plain spin_lock() APIs. */ >>>> - DENTRY_D_LOCK_NESTED >>>> + DENTRY_D_LOCK_NESTED, >>>> + DENTRY_D_LOCK_NESTED_2 >>> >>> You should document this, as "_2" does not make much sense to anyone >>> only looking at the code :( >>> >>> Or rename it better. >> >> FWIW, I'm not sure it's a good solution. What are the rules for callers >> of that thing, anyway? If it can be called when somebody is creating >> more files in that subtree, we almost certainly will have massive >> problems with the lifetimes of underlying objects... >> >> Could somebody familiar with debugfs explain how is that thing actually >> used and what is required from/promised to its callers? I can try and >> grep through the tree and guess what the rules are, but I've way too >> much on my platter right now and I don't want to get sidetracked into yet >> another tree-wide search and analysis session ;-/ > > Yu wants to use simple_empty() in debugfs_remove_recursive() instead of > manually checking: > if (!list_empty(&child->d_subdirs)) { > > See patch 3 of this series for that change and why they feel it is > needed: > https://lore.kernel.org/lkml/1573788472-87426-4-git-send-email-yukuai3@huawei.com/ > > As to if patch 3 really is needed, I'll leave that up to Yu given that I > thought we had resolved these types of issues already a year or so ago. > > thanks, > > greg k-h > > . > The main purpose of this patchset is to fix the infinite loop in debugfs_remove_recursive. Steven point out that simple replace list_empty with simple_empty will cause a splat with lockdep enabled. We try to fix it with the first two patch, do you think it's appropriate? Thanks, Yu Kuai
On Fri, Nov 15, 2019 at 03:20:11PM +0800, Greg KH wrote: > > FWIW, I'm not sure it's a good solution. What are the rules for callers > > of that thing, anyway? If it can be called when somebody is creating > > more files in that subtree, we almost certainly will have massive > > problems with the lifetimes of underlying objects... > > > > Could somebody familiar with debugfs explain how is that thing actually > > used and what is required from/promised to its callers? I can try and > > grep through the tree and guess what the rules are, but I've way too > > much on my platter right now and I don't want to get sidetracked into yet > > another tree-wide search and analysis session ;-/ > > Yu wants to use simple_empty() in debugfs_remove_recursive() instead of > manually checking: > if (!list_empty(&child->d_subdirs)) { > > See patch 3 of this series for that change and why they feel it is > needed: > https://lore.kernel.org/lkml/1573788472-87426-4-git-send-email-yukuai3@huawei.com/ > > As to if patch 3 really is needed, I'll leave that up to Yu given that I > thought we had resolved these types of issues already a year or so ago. What I'm asking is what concurrency warranties does the whole thing (debugfs_remove_recursive()) have to deal with. IMO the overall structure of the walk-and-remove the tree algorithm in there is Not Nice(tm) and I'd like to understand if it needs to be kept that way. And the locking is confused in there - it either locks too much, or not enough. So... can debugfs_remove_recursive() rely upon the lack of attempts to create new entries inside the subtree it's trying to kill? If it can, the things can be made simpler; if it can't, it's not locking enough; e.g. results of simple_empty() on child won't be guaranteed to remain unchanged just as it returns to caller. What's more, the uses of simple_unlink()/simple_rmdir() there are not imitiating the normal locking environment for ->unlink() and ->rmdir() resp. - the victim's inode is not locked, so just for starters the call of simple_empty() from simple_rmdir() itself is not guaranteed to give valid result. I want to understand the overall situation. No argument, list_empty() in there is BS, for many reasons. But I wonder if trying to keep the current structure of the iterator _and_ the use of simple_rmdir()/simple_unlink() is the right approach.
On Fri, 15 Nov 2019 13:16:25 +0000 Al Viro <viro@zeniv.linux.org.uk> wrote: > I want to understand the overall situation. No argument, list_empty() > in there is BS, for many reasons. But I wonder if trying to keep the > current structure of the iterator _and_ the use of simple_rmdir()/simple_unlink() > is the right approach. My guess is that debugfs was written to be as simple as possible. Nothing too complex. And in doing so, may have issues as you are pointing out. Just a way to allow communications between user space and kernel space (as tracefs started out). BTW, what do you mean by "can debugfs_remove_recursive() rely upon the lack of attempts to create new entries inside the subtree it's trying to kill?" -- Steve
On Fri, 15 Nov 2019 08:38:13 -0500 Steven Rostedt <rostedt@goodmis.org> wrote: > My guess is that debugfs was written to be as simple as possible. > Nothing too complex. And in doing so, may have issues as you are > pointing out. Just a way to allow communications between user space and > kernel space (as tracefs started out). And speaking of tracefs, as it was basically a cut and paste copy of debugfs, it too has the same issues. Thus, I'm very much interested in how this should be done. -- Steve
On Fri, Nov 15, 2019 at 08:38:13AM -0500, Steven Rostedt wrote: > On Fri, 15 Nov 2019 13:16:25 +0000 > Al Viro <viro@zeniv.linux.org.uk> wrote: > > > I want to understand the overall situation. No argument, list_empty() > > in there is BS, for many reasons. But I wonder if trying to keep the > > current structure of the iterator _and_ the use of simple_rmdir()/simple_unlink() > > is the right approach. > > My guess is that debugfs was written to be as simple as possible. > Nothing too complex. And in doing so, may have issues as you are > pointing out. Just a way to allow communications between user space and > kernel space (as tracefs started out). > > BTW, what do you mean by "can debugfs_remove_recursive() rely upon the > lack of attempts to create new entries inside the subtree it's trying > to kill?" Is it possible for something to call e.g. debugfs_create_dir() (or any similar primitive) with parent inside the subtree that has been passed to debugfs_remove_recursive() call that is still in progress? If debugfs needs to cope with that, debugfs_remove_recursive() needs considerably heavier locking, to start with.
On Fri, 15 Nov 2019 13:48:23 +0000 Al Viro <viro@zeniv.linux.org.uk> wrote: > > BTW, what do you mean by "can debugfs_remove_recursive() rely upon the > > lack of attempts to create new entries inside the subtree it's trying > > to kill?" > > Is it possible for something to call e.g. debugfs_create_dir() (or any > similar primitive) with parent inside the subtree that has been > passed to debugfs_remove_recursive() call that is still in progress? > > If debugfs needs to cope with that, debugfs_remove_recursive() needs > considerably heavier locking, to start with. I don't know about debugfs, but at least tracefs (which cut and pasted from debugfs) does not allow that. At least in theory it doesn't allow that (and if it does, it's a bug in the locking at the higher levels). And perhaps debugfs shouldn't allow that either. As it is only suppose to be a light weight way to interact with the kernel, hence the name "debugfs". Yu, do you have a test case for the "infinite loop" case? -- Steve
On Fri, Nov 15, 2019 at 08:58:05AM -0500, Steven Rostedt wrote: > On Fri, 15 Nov 2019 13:48:23 +0000 > Al Viro <viro@zeniv.linux.org.uk> wrote: > > > > BTW, what do you mean by "can debugfs_remove_recursive() rely upon the > > > lack of attempts to create new entries inside the subtree it's trying > > > to kill?" > > > > Is it possible for something to call e.g. debugfs_create_dir() (or any > > similar primitive) with parent inside the subtree that has been > > passed to debugfs_remove_recursive() call that is still in progress? > > > > If debugfs needs to cope with that, debugfs_remove_recursive() needs > > considerably heavier locking, to start with. > > I don't know about debugfs, but at least tracefs (which cut and pasted > from debugfs) does not allow that. At least in theory it doesn't allow > that (and if it does, it's a bug in the locking at the higher levels). > > And perhaps debugfs shouldn't allow that either. As it is only suppose > to be a light weight way to interact with the kernel, hence the name > "debugfs". > > Yu, do you have a test case for the "infinite loop" case? Infinite loop, AFAICS, is reasonably easy to trigger - just open a non-empty subdirectory and lseek to e.g. next-to-last element in it. Again, list_empty() use in there is quite wrong - it can give false negatives just on the cursors. No arguments about that part...
On Fri, Nov 15, 2019 at 02:17:54PM +0000, Al Viro wrote: > On Fri, Nov 15, 2019 at 08:58:05AM -0500, Steven Rostedt wrote: > > On Fri, 15 Nov 2019 13:48:23 +0000 > > Al Viro <viro@zeniv.linux.org.uk> wrote: > > > > > > BTW, what do you mean by "can debugfs_remove_recursive() rely upon the > > > > lack of attempts to create new entries inside the subtree it's trying > > > > to kill?" > > > > > > Is it possible for something to call e.g. debugfs_create_dir() (or any > > > similar primitive) with parent inside the subtree that has been > > > passed to debugfs_remove_recursive() call that is still in progress? > > > > > > If debugfs needs to cope with that, debugfs_remove_recursive() needs > > > considerably heavier locking, to start with. > > > > I don't know about debugfs, but at least tracefs (which cut and pasted > > from debugfs) does not allow that. At least in theory it doesn't allow > > that (and if it does, it's a bug in the locking at the higher levels). > > > > And perhaps debugfs shouldn't allow that either. As it is only suppose > > to be a light weight way to interact with the kernel, hence the name > > "debugfs". > > > > Yu, do you have a test case for the "infinite loop" case? > > Infinite loop, AFAICS, is reasonably easy to trigger - just open > a non-empty subdirectory and lseek to e.g. next-to-last element > in it. Again, list_empty() use in there is quite wrong - it can > give false negatives just on the cursors. No arguments about > that part... FWIW, given that debugfs_remove_recursive() has no way to report an error *and* that we have a single chokepoint for all entry creations (start_creating()) we could make sure nothing gets added pretty easily - just mark the victim dentry as "don't allow any creations here" when we first reach it and have start_creating check that, using e.g. inode_lock() for serialization. Marking could be done e.g. by setting ->d_fsdata to (void *)DEBUGFS_FSDATA_IS_REAL_FOPS_BIT, so that ->d_release() doesn't need any changes. BTW, this if (!ret) d_delete(dentry); if (d_is_reg(dentry)) __debugfs_file_removed(dentry); dput(dentry); in __debugfs_remove() is both subtle and bogus. If we get here with refcount > 1, d_delete() is just a d_drop() - dentry can't be made negative, so it gets unhashed instead. If we *do* get here with refcount 1, it will be made negative, all right. Only to be killed off by immediately following dput(), since debugfs doesn't retain unpinned dentries. Why immediate? Because d_is_reg() is obviously false on negative dentries, so __debugfs_file_removed() is not called. That's where the subtle part begins: there should've been nobody for __debugfs_file_removed() to wait for, since they would've had to hold additional references to dentry and refcount wouldn't have been 1. So that's actually not a bug. However, it's too subtle to introduce without having bothered to even comment the damn thing. As for the "bogus" part - that d_delete() is bollocks. First of all, it is fully equivalent to d_drop(). Always had been. What's more, that sucker should've been d_invalidate() instead. Anyway, AFAICS removal could be done this way: // parent is held exclusive, after is NULL or a child of parent find_next_child(parent, prev) child = NULL node = prev ? &prev->d_child : &parent->d_subdirs; grab parent->d_lock for each entry in the list starting at node->next d = container_of(entry, struct dentry, d_child) grab d->d_lock if simple_positive(d) bump d->d_count child = d drop d->d_lock if child break drop parent->d_lock dput(prev); return child kill_it(victim, parent) if simple_positive(victim) d_invalidate(victim); // needed to avoid lost mounts if victim is a directory fsnotify_rmdir else fsnotify_unlink if victim is regular __debugfs_file_removed(victim) dput(victim) // unpin it recursive_removal(dentry) this = dentry; while (true) { victim = NULL; inode = this->d_inode; inode_lock(inode); if (d_is_dir(this)) mark this doomed while ((child = find_next_child(this, victim)) == NULL) { // no children (left); kill and ascend // update metadata while it's still locked inode->i_ctime = current_time(inode); clear_nlink(inode); inode_unlock(inode); victim = this; this = this->d_parent; inode = this->d_inode; inode_lock(inode); kill_it(victim, this); if (victim == dentry) { inode->i_ctime = inode->i_mtime = current_time(inode); if (d_is_dir(dentry)) drop_nlink(inode); inode_unlock(inode); dput(dentry); return; } } inode_unlock(inode); this = child; }
On Fri, Nov 15, 2019 at 05:54:23PM +0000, Al Viro wrote: > Anyway, AFAICS removal could be done this way: > > // parent is held exclusive, after is NULL or a child of parent > find_next_child(parent, prev) > child = NULL > node = prev ? &prev->d_child : &parent->d_subdirs; > grab parent->d_lock > for each entry in the list starting at node->next > d = container_of(entry, struct dentry, d_child) > grab d->d_lock > if simple_positive(d) > bump d->d_count > child = d > drop d->d_lock > if child > break > drop parent->d_lock > dput(prev); > return child > > kill_it(victim, parent) > if simple_positive(victim) > d_invalidate(victim); // needed to avoid lost mounts > if victim is a directory > fsnotify_rmdir > else > fsnotify_unlink > if victim is regular > __debugfs_file_removed(victim) > dput(victim) // unpin it > > recursive_removal(dentry) > this = dentry; > while (true) { > victim = NULL; > inode = this->d_inode; > inode_lock(inode); > if (d_is_dir(this)) > mark this doomed > while ((child = find_next_child(this, victim)) == NULL) { > // no children (left); kill and ascend > // update metadata while it's still locked > inode->i_ctime = current_time(inode); > clear_nlink(inode); > inode_unlock(inode); > victim = this; > this = this->d_parent; > inode = this->d_inode; > inode_lock(inode); > kill_it(victim, this); > if (victim == dentry) { > inode->i_ctime = inode->i_mtime = current_time(inode); > if (d_is_dir(dentry)) > drop_nlink(inode); > inode_unlock(inode); > dput(dentry); > return; > } > } > inode_unlock(inode); > this = child; > } Come to think of that, if we use IS_DEADDIR as "no more additions" marking, that looks like a good candidate for all in-kernel rm -rf on ramfs-style filesystems without cross-directory renames. This bit in kill_it() above if victim is regular __debugfs_file_removed(victim) would be an fs-specific callback passed by the caller, turning the whole thing into this: void simple_recursive_removal(struct dentry *dentry, void (*callback)(struct dentry *)) { struct dentry *this = dentry; while (true) { struct dentry *victim = NULL, *child; struct inode *inode = this->d_inode; inode_lock(inode); if (d_is_dir(this)) inode->i_flags |= S_DEAD; while ((child = find_next_child(this, victim)) == NULL) { // kill and ascend // update metadata while it's still locked inode->i_ctime = current_time(inode); clear_nlink(inode); inode_unlock(inode); victim = this; this = this->d_parent; inode = this->d_inode; inode_lock(inode); if (simple_positive(victim)) { d_invalidate(victim); // avoid lost mounts if (is_dir(victim)) fsnotify_rmdir(inode, victim); else fsnotify_unlink(inode, victim); if (callback) callback(victim); dput(victim) // unpin it } if (victim == dentry) { inode->i_ctime = inode->i_mtime = current_time(inode); if (d_is_dir(dentry)) drop_nlink(inode); inode_unlock(inode); dput(dentry); return; } } inode_unlock(inode); this = child; } } with find_next_child() easily implemented via scan_positives() already in libfs.c... Objections? The above is obviously completely untested, and I've got nowhere near enough sleep, so there may be any number of brown paperbag bugs in it. Review would be very welcome...
On Fri, Nov 15, 2019 at 06:42:09PM +0000, Al Viro wrote: > Come to think of that, if we use IS_DEADDIR as "no more additions" marking, > that looks like a good candidate for all in-kernel rm -rf on ramfs-style > filesystems without cross-directory renames. This bit in kill_it() above > if victim is regular > __debugfs_file_removed(victim) > would be an fs-specific callback passed by the caller, turning the whole > thing into this: Umm... A bit more than that, actually - the callback would be void remove_one(struct dentry *victim) { if (d_is_reg(victim)) __debugfs_file_removed(victim); simple_release_fs(&debugfs_mount, &debugfs_mount_count); } and the caller would do simple_pin_fs(&debug_fs_type, &debugfs_mount, &debugfs_mount_count); simple_recursive_removal(dentry, remove_one); simple_release_fs(&debugfs_mount, &debugfs_mount_count);
On Fri, Nov 15, 2019 at 07:41:38PM +0000, Al Viro wrote: > On Fri, Nov 15, 2019 at 06:42:09PM +0000, Al Viro wrote: > > Come to think of that, if we use IS_DEADDIR as "no more additions" marking, > > that looks like a good candidate for all in-kernel rm -rf on ramfs-style > > filesystems without cross-directory renames. This bit in kill_it() above > > if victim is regular > > __debugfs_file_removed(victim) > > would be an fs-specific callback passed by the caller, turning the whole > > thing into this: > > Umm... A bit more than that, actually - the callback would be > void remove_one(struct dentry *victim) > { > if (d_is_reg(victim)) > __debugfs_file_removed(victim); > simple_release_fs(&debugfs_mount, &debugfs_mount_count); > } > and the caller would do > simple_pin_fs(&debug_fs_type, &debugfs_mount, &debugfs_mount_count); > simple_recursive_removal(dentry, remove_one); > simple_release_fs(&debugfs_mount, &debugfs_mount_count); OK... debugfs and tracefs definitely convert to that; so do, AFAICS, spufs and selinuxfs, and I wouldn't be surprised if it could be used in a few more places... securityfs, almost certainly qibfs, gadgetfs looks like it could make use of that. Maybe subrpc as well, but I'll need to look in details. configfs won't, unfortunately...
On Fri, 15 Nov 2019 21:18:20 +0000 Al Viro <viro@zeniv.linux.org.uk> wrote: > OK... debugfs and tracefs definitely convert to that; so do, AFAICS, > spufs and selinuxfs, and I wouldn't be surprised if it could be > used in a few more places... securityfs, almost certainly qibfs, > gadgetfs looks like it could make use of that. Maybe subrpc > as well, but I'll need to look in details. configfs won't, > unfortunately... Thanks Al for looking into this. I'll try to test it in tracefs, and see if anything breaks. But probably wont get to it till next week. -- Steve
On Fri, Nov 15, 2019 at 04:26:09PM -0500, Steven Rostedt wrote: > On Fri, 15 Nov 2019 21:18:20 +0000 > Al Viro <viro@zeniv.linux.org.uk> wrote: > > > OK... debugfs and tracefs definitely convert to that; so do, AFAICS, > > spufs and selinuxfs, and I wouldn't be surprised if it could be > > used in a few more places... securityfs, almost certainly qibfs, > > gadgetfs looks like it could make use of that. Maybe subrpc > > as well, but I'll need to look in details. configfs won't, > > unfortunately... > > Thanks Al for looking into this. > > I'll try to test it in tracefs, and see if anything breaks. But > probably wont get to it till next week. I'll probably throw that into #next.dcache - if nothing else, that cuts down on the size of patch converting d_subdirs/d_child from list to hlist... Need to get some sleep first, though - only 5 hours today, so I want to take another look at that thing tomorrow morning - I don't trust my ability to spot obvious bugs right now... ;-/ Oh, well - that at least might finally push the old "kernel-side rm -rf done right" pile of half-baked patches into more useful state, probably superseding most of them.
On Fri, Nov 15, 2019 at 10:10:37PM +0000, Al Viro wrote: > On Fri, Nov 15, 2019 at 04:26:09PM -0500, Steven Rostedt wrote: > > On Fri, 15 Nov 2019 21:18:20 +0000 > > Al Viro <viro@zeniv.linux.org.uk> wrote: > > > > > OK... debugfs and tracefs definitely convert to that; so do, AFAICS, > > > spufs and selinuxfs, and I wouldn't be surprised if it could be > > > used in a few more places... securityfs, almost certainly qibfs, > > > gadgetfs looks like it could make use of that. Maybe subrpc > > > as well, but I'll need to look in details. configfs won't, > > > unfortunately... > > > > Thanks Al for looking into this. > > > > I'll try to test it in tracefs, and see if anything breaks. But > > probably wont get to it till next week. > > I'll probably throw that into #next.dcache - if nothing else, > that cuts down on the size of patch converting d_subdirs/d_child > from list to hlist... > > Need to get some sleep first, though - only 5 hours today, so > I want to take another look at that thing tomorrow morning - > I don't trust my ability to spot obvious bugs right now... ;-/ > > Oh, well - that at least might finally push the old "kernel-side > rm -rf done right" pile of half-baked patches into more useful > state, probably superseding most of them. Thanks for doing this. Sorry for the delay in getting back to this, was on a long-haul flight... Anyway, this looks sane to me. debugfs "should" not be having a file added while a directory is being removed at the same time, but I really can't guarantee that someone is trying to do something crazy like that. So "heavy" locking is fine with me, this never has to be a "fast" operation, it's much more important to get it "correct". thanks, greg k-h
On Fri, Nov 15, 2019 at 10:10:37PM +0000, Al Viro wrote: > I'll probably throw that into #next.dcache - if nothing else, > that cuts down on the size of patch converting d_subdirs/d_child > from list to hlist... > > Need to get some sleep first, though - only 5 hours today, so > I want to take another look at that thing tomorrow morning - > I don't trust my ability to spot obvious bugs right now... ;-/ > > Oh, well - that at least might finally push the old "kernel-side > rm -rf done right" pile of half-baked patches into more useful > state, probably superseding most of them. Curious... Is there any point keeping debugfs_remove() and debugfs_remove_recursive() separate? The thing is, the only case when their behaviours differ is when the victim is non-empty. In that case the former quietly does nothing; the latter (also quietly) removes the entire subtree. And the caller has no way to tell if that case has happened - they can't even look at the dentry they'd passed, since in the normal case it's already pointing to freed (and possibly reused) memory by that point. The same goes for tracefs, except that there we have only one caller of tracefs_remove(), and it's guaranteed to be a non-directory. So there we definitely can fold them together. Greg, could we declare debufs_remove() to be an alias for debugfs_remove_recursive()?
On Sun, Nov 17, 2019 at 10:24:22PM +0000, Al Viro wrote: > On Fri, Nov 15, 2019 at 10:10:37PM +0000, Al Viro wrote: > > > I'll probably throw that into #next.dcache - if nothing else, > > that cuts down on the size of patch converting d_subdirs/d_child > > from list to hlist... > > > > Need to get some sleep first, though - only 5 hours today, so > > I want to take another look at that thing tomorrow morning - > > I don't trust my ability to spot obvious bugs right now... ;-/ > > > > Oh, well - that at least might finally push the old "kernel-side > > rm -rf done right" pile of half-baked patches into more useful > > state, probably superseding most of them. > > Curious... Is there any point keeping debugfs_remove() and > debugfs_remove_recursive() separate? The thing is, the only case > when their behaviours differ is when the victim is non-empty. In that > case the former quietly does nothing; the latter (also quietly) removes > the entire subtree. And the caller has no way to tell if that case has > happened - they can't even look at the dentry they'd passed, since > in the normal case it's already pointing to freed (and possibly reused) > memory by that point. > > The same goes for tracefs, except that there we have only > one caller of tracefs_remove(), and it's guaranteed to be a non-directory. > So there we definitely can fold them together. > > Greg, could we declare debufs_remove() to be an alias for > debugfs_remove_recursive()? Yes, we can do that there's no reason to keep those separate at all. Especially if it makes things easier overall. thanks, greg k-h
diff --git a/include/linux/dcache.h b/include/linux/dcache.h index 10090f1..8eb84ef 100644 --- a/include/linux/dcache.h +++ b/include/linux/dcache.h @@ -129,7 +129,8 @@ struct dentry { enum dentry_d_lock_class { DENTRY_D_LOCK_NORMAL, /* implicitly used by plain spin_lock() APIs. */ - DENTRY_D_LOCK_NESTED + DENTRY_D_LOCK_NESTED, + DENTRY_D_LOCK_NESTED_2 }; struct dentry_operations {
'dentry_d_lock_class' can be used for spin_lock_nested in case lockdep confused about two different dentry take the 'd_lock'. However, a single 'DENTRY_D_LOCK_NESTED' may not be enough if more than two dentry are involed. So, and in 'DENTRY_D_LOCK_NESTED_2' Signed-off-by: yu kuai <yukuai3@huawei.com> --- include/linux/dcache.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)