Message ID | 20210715103600.3570667-6-dkadashev@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | namei: clean up retry logic in various do_* functions | expand |
On Thu, Jul 15, 2021 at 05:35:51PM +0700, Dmitry Kadashev wrote: > This is just a preparation for the move of the main mkdirat logic to a > separate function to make the logic easier to follow. This change > contains the flow changes so that the actual change to move the main > logic to a separate function does no change the flow at all. > > Just like the similar patches for rmdir and unlink a few commits before, > there two changes here: > > 1. Previously on filename_create() error the function used to exit > immediately, and now it will check the return code to see if ESTALE > retry is appropriate. The filename_create() does its own retries on > ESTALE (at least via filename_parentat() used inside), but this extra > check should be completely fine. This is the wrong way to go. Really. Look at it that way - LOOKUP_REVAL is the final stage of escalation; if we had to go there, there's no point being optimistic about the last dcache lookup, nevermind trying to retry the parent pathwalk if we fail with -ESTALE doing it. I'm not saying that it's something worth optimizing for; the problem is different - the logics makes no sense whatsoever that way. It's a matter of reader's cycles wasted on "what the fuck are we trying to do here?", not the CPU cycles wasted on execution. While we are at it, it makes no sense for filename_parentat() and its ilk to go for RCU and normal if it's been given LOOKUP_REVAL - I mean, look at the sequence of calls in there. And try to make sense of it. Especially of the "OK, RCU attempt told us to sod off and try normal; here, let's call path_parentat() with LOOKUP_REVAL for flags and if it says -ESTALE, call it again with exact same arguments" part. Seriously, look at that from the point of view of somebody who tries to make sense of the entire thing.
On Fri, Jul 16, 2021 at 3:17 AM Al Viro <viro@zeniv.linux.org.uk> wrote: > > On Thu, Jul 15, 2021 at 05:35:51PM +0700, Dmitry Kadashev wrote: > > This is just a preparation for the move of the main mkdirat logic to a > > separate function to make the logic easier to follow. This change > > contains the flow changes so that the actual change to move the main > > logic to a separate function does no change the flow at all. > > > > Just like the similar patches for rmdir and unlink a few commits before, > > there two changes here: > > > > 1. Previously on filename_create() error the function used to exit > > immediately, and now it will check the return code to see if ESTALE > > retry is appropriate. The filename_create() does its own retries on > > ESTALE (at least via filename_parentat() used inside), but this extra > > check should be completely fine. > > This is the wrong way to go. Really. Look at it that way - LOOKUP_REVAL > is the final stage of escalation; if we had to go there, there's no > point being optimistic about the last dcache lookup, nevermind trying > to retry the parent pathwalk if we fail with -ESTALE doing it. > > I'm not saying that it's something worth optimizing for; the problem > is different - the logics makes no sense whatsoever that way. It's > a matter of reader's cycles wasted on "what the fuck are we trying > to do here?", not the CPU cycles wasted on execution. > > While we are at it, it makes no sense for filename_parentat() and its > ilk to go for RCU and normal if it's been given LOOKUP_REVAL - I mean, > look at the sequence of calls in there. And try to make sense of > it. Especially of the "OK, RCU attempt told us to sod off and try normal; > here, let's call path_parentat() with LOOKUP_REVAL for flags and if it > says -ESTALE, call it again with exact same arguments" part. > > Seriously, look at that from the point of view of somebody who tries > to make sense of the entire thing OK, let me try to venture down that "change the way ESTALE retries are done completely" path. The problem here is I'm not familiar with the code enough to be sure the conversion is 1-to-1 (i.e. that we can't get ESTALE from somewhere unexpected), and that retries are open-coded in quite a few places it seems. Anyway, I'll try and dig in and come back with either an RFC patch or some questions. Thanks for the feedback, Al.
On Tue, Jul 20, 2021 at 01:59:29PM +0700, Dmitry Kadashev wrote: > > This is the wrong way to go. Really. Look at it that way - LOOKUP_REVAL > > is the final stage of escalation; if we had to go there, there's no > > point being optimistic about the last dcache lookup, nevermind trying > > to retry the parent pathwalk if we fail with -ESTALE doing it. > > > > I'm not saying that it's something worth optimizing for; the problem > > is different - the logics makes no sense whatsoever that way. It's > > a matter of reader's cycles wasted on "what the fuck are we trying > > to do here?", not the CPU cycles wasted on execution. > > > > While we are at it, it makes no sense for filename_parentat() and its > > ilk to go for RCU and normal if it's been given LOOKUP_REVAL - I mean, > > look at the sequence of calls in there. And try to make sense of > > it. Especially of the "OK, RCU attempt told us to sod off and try normal; > > here, let's call path_parentat() with LOOKUP_REVAL for flags and if it > > says -ESTALE, call it again with exact same arguments" part. > > > > Seriously, look at that from the point of view of somebody who tries > > to make sense of the entire thing > > OK, let me try to venture down that "change the way ESTALE retries are > done completely" path. The problem here is I'm not familiar with the > code enough to be sure the conversion is 1-to-1 (i.e. that we can't get > ESTALE from somewhere unexpected), and that retries are open-coded in > quite a few places it seems. Anyway, I'll try and dig in and come back > with either an RFC patch or some questions. Thanks for the feedback, Al. I'd try to look at the primitives that go through RCU/normal/REVAL series. They are all in fs/namei.c; filename_lookup(), filename_parentat(), do_filp_open() and do_filo_open_root(). The latter pair almost certainly is fine as-is. retry_estale() crap is limited to user_path_at/user_path_at_empty users, along with some filename_parentat() ones. There we follow that series with something that might give us ESTALE, and if it does, we want to repeat the whole thing in REVAL mode. OTOH, there are callers (and fairly similar ones, at that - look at e.g. AF_UNIX bind doing mknod) where we don't have that kind of logics. Question 1: which of those are lacking retry_estale(), even though they might arguably need it? Note that e.g. AF_UNIX bind uses kern_path_create(), so we need to look at all callchains leading to those, not just the ones in fs/namei.c guts. If most of those really want retry_estale, we'd be better off if we took the REVAL fallback out of filename_lookup() and filename_parentat() and turned massaged the users from do rcu/normal/reval lookups if failed, fuck off do other work if it fails with ESTALE do rcu/reval/reval (yes, really) if failed, fuck off do other work into do rcu/normal lookups if not failed do other work if something (including initial lookup) failed with ESTALE repeat the entire thing with LOOKUP_REVAL in the mix possibly with a helper function involved. For the ones that need retry_estale that's a win; for the rest it'd be boilerplate (that's basically the ones where "do other work" never fails with ESTALE). Question 2: how are "need retry_estale"/"fine just with ESTALE fallback in filename_{lookup,parentat}()" cases are distributed? If the majority is in "need retry_estale" class, then something similar to what's been outlined above would probably be a decent solution. Otherwise we'll need wrappers equivalent to current behaviour, and that's where it can get unpleasant - at which level in call chain do we put that wrapper? Sure, we can add filename_lookup_as_it_fucking_used_to_be(). Except that it's not called directly by those "don't need retry_estale" users, so we'd need to provide such counterparts for them as well ;-/ IOW, we need the call tree for filename_lookup()/filename_parentat(), with leaves (originators of call chain) marked with "does that user do retry_estale?" (and tracked far back for the answer to depend only upon the call site - if an intermediate can come from both kinds of places, we need to track back to its callers). Then we'll be able to see at which levels do we want those "as it used to behave" wrappers... If you want to dig around, that would probably be a reasonable place to start.
.On Tue, Jul 20, 2021 at 8:58 PM Al Viro <viro@zeniv.linux.org.uk> wrote: > > On Tue, Jul 20, 2021 at 01:59:29PM +0700, Dmitry Kadashev wrote: > > > > This is the wrong way to go. Really. Look at it that way - LOOKUP_REVAL > > > is the final stage of escalation; if we had to go there, there's no > > > point being optimistic about the last dcache lookup, nevermind trying > > > to retry the parent pathwalk if we fail with -ESTALE doing it. > > > > > > I'm not saying that it's something worth optimizing for; the problem > > > is different - the logics makes no sense whatsoever that way. It's > > > a matter of reader's cycles wasted on "what the fuck are we trying > > > to do here?", not the CPU cycles wasted on execution. > > > > > > While we are at it, it makes no sense for filename_parentat() and its > > > ilk to go for RCU and normal if it's been given LOOKUP_REVAL - I mean, > > > look at the sequence of calls in there. And try to make sense of > > > it. Especially of the "OK, RCU attempt told us to sod off and try normal; > > > here, let's call path_parentat() with LOOKUP_REVAL for flags and if it > > > says -ESTALE, call it again with exact same arguments" part. > > > > > > Seriously, look at that from the point of view of somebody who tries > > > to make sense of the entire thing > > > > OK, let me try to venture down that "change the way ESTALE retries are > > done completely" path. The problem here is I'm not familiar with the > > code enough to be sure the conversion is 1-to-1 (i.e. that we can't get > > ESTALE from somewhere unexpected), and that retries are open-coded in > > quite a few places it seems. Anyway, I'll try and dig in and come back > > with either an RFC patch or some questions. Thanks for the feedback, Al. > > I'd try to look at the primitives that go through RCU/normal/REVAL series. > They are all in fs/namei.c; filename_lookup(), filename_parentat(), > do_filp_open() and do_filo_open_root(). The latter pair almost certainly > is fine as-is. > > retry_estale() crap is limited to user_path_at/user_path_at_empty users, > along with some filename_parentat() ones. > > There we follow that series with something that might give us ESTALE, > and if it does, we want to repeat the whole thing in REVAL mode. > > OTOH, there are callers (and fairly similar ones, at that - look at e.g. > AF_UNIX bind doing mknod) where we don't have that kind of logics. > > Question 1: which of those are lacking retry_estale(), even though they > might arguably need it? Note that e.g. AF_UNIX bind uses kern_path_create(), > so we need to look at all callchains leading to those, not just the ones > in fs/namei.c guts. > > If most of those really want retry_estale, we'd be better off if we took > the REVAL fallback out of filename_lookup() and filename_parentat() > and turned massaged the users from > do rcu/normal/reval lookups > if failed, fuck off > do other work > if it fails with ESTALE > do rcu/reval/reval (yes, really) > if failed, fuck off > do other work > into > do rcu/normal lookups > if not failed > do other work > if something (including initial lookup) failed with ESTALE > repeat the entire thing with LOOKUP_REVAL in the mix > possibly with a helper function involved. > For the ones that need retry_estale that's a win; for the rest it'd > be boilerplate (that's basically the ones where "do other work" never > fails with ESTALE). > > Question 2: how are "need retry_estale"/"fine just with ESTALE fallback > in filename_{lookup,parentat}()" cases are distributed? > > If the majority is in "need retry_estale" class, then something similar > to what's been outlined above would probably be a decent solution. > > Otherwise we'll need wrappers equivalent to current behaviour, and that's > where it can get unpleasant - at which level in call chain do we put > that wrapper? Sure, we can add filename_lookup_as_it_fucking_used_to_be(). > Except that it's not called directly by those "don't need retry_estale" > users, so we'd need to provide such counterparts for them as well ;-/ > > IOW, we need the call tree for filename_lookup()/filename_parentat(), > with leaves (originators of call chain) marked with "does that user > do retry_estale?" (and tracked far back for the answer to depend only > upon the call site - if an intermediate can come from both kinds > of places, we need to track back to its callers). > > Then we'll be able to see at which levels do we want those "as it used > to behave" wrappers... > > If you want to dig around, that would probably be a reasonable place to > start. Thanks for the pointers! I am happy to dig into this. I can't spend much time per week on this though, so it will take some time.
diff --git a/fs/namei.c b/fs/namei.c index 703cce40d597..54dbd1e38298 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -3861,7 +3861,7 @@ int do_mkdirat(int dfd, struct filename *name, umode_t mode) dentry = __filename_create(dfd, name, &path, lookup_flags); error = PTR_ERR(dentry); if (IS_ERR(dentry)) - goto out_putname; + goto out; if (!IS_POSIXACL(path.dentry->d_inode)) mode &= ~current_umask(); @@ -3873,11 +3873,11 @@ int do_mkdirat(int dfd, struct filename *name, umode_t mode) mode); } done_path_create(&path, dentry); - if (retry_estale(error, lookup_flags)) { +out: + if (unlikely(retry_estale(error, lookup_flags))) { lookup_flags |= LOOKUP_REVAL; goto retry; } -out_putname: putname(name); return error; }
This is just a preparation for the move of the main mkdirat logic to a separate function to make the logic easier to follow. This change contains the flow changes so that the actual change to move the main logic to a separate function does no change the flow at all. Just like the similar patches for rmdir and unlink a few commits before, there two changes here: 1. Previously on filename_create() error the function used to exit immediately, and now it will check the return code to see if ESTALE retry is appropriate. The filename_create() does its own retries on ESTALE (at least via filename_parentat() used inside), but this extra check should be completely fine. 2. The retry_estale() check is wrapped in unlikely(). Some other places already have that and overall it seems to make sense Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <christian.brauner@ubuntu.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/io-uring/CAHk-=wijsw1QSsQHFu_6dEoZEr_zvT7++WJWohcuEkLqqXBGrQ@mail.gmail.com/ Link: https://lore.kernel.org/io-uring/CAHk-=wjFd0qn6asio=zg7zUTRmSty_TpAEhnwym1Qb=wFgCKzA@mail.gmail.com/ Signed-off-by: Dmitry Kadashev <dkadashev@gmail.com> --- fs/namei.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)