mbox series

[0/7] namei: clean up retry logic in various do_* functions

Message ID 20210712123649.1102392-1-dkadashev@gmail.com (mailing list archive)
Headers show
Series namei: clean up retry logic in various do_* functions | expand

Message

Dmitry Kadashev July 12, 2021, 12:36 p.m. UTC
Suggested by Linus in https://lore.kernel.org/io-uring/CAHk-=wh=cpt_tQCirzFZRPawRpbuFTZ2MxNpXiyUF+eBXF=+sw@mail.gmail.com/

This patchset does all the do_* functions one by one. The idea is to
move the main logic to a helper function and handle stale retries /
struct filename cleanups outside, which makes the logic easier to
follow.

There is one minor change in the behavior: filename_lookup() /
filename_parentat() / filename_create() do their own retries on ESTALE
(regardless of flags), and previously they were exempt from retries in
the do_* functions (but they *were* called on retry - it's just the
return code wasn't checked for ESTALE). And now the retry is done on
the upper level, and so technically it could be called a behavior
change. Hopefully it's an edge case where an additional check does not
matter.

On top of https://lore.kernel.org/io-uring/20210708063447.3556403-1-dkadashev@gmail.com/

Dmitry Kadashev (7):
  namei: clean up do_rmdir retry logic
  namei: clean up do_unlinkat retry logic
  namei: clean up do_mkdirat retry logic
  namei: clean up do_mknodat retry logic
  namei: clean up do_symlinkat retry logic
  namei: clean up do_linkat retry logic
  namei: clean up do_renameat retry logic

 fs/namei.c | 283 ++++++++++++++++++++++++++++++-----------------------
 1 file changed, 163 insertions(+), 120 deletions(-)

Comments

Dmitry Kadashev July 12, 2021, 12:41 p.m. UTC | #1
On Mon, Jul 12, 2021 at 7:37 PM Dmitry Kadashev <dkadashev@gmail.com> wrote:
>
> Suggested by Linus in https://lore.kernel.org/io-uring/CAHk-=wh=cpt_tQCirzFZRPawRpbuFTZ2MxNpXiyUF+eBXF=+sw@mail.gmail.com/
>
> This patchset does all the do_* functions one by one. The idea is to
> move the main logic to a helper function and handle stale retries /
> struct filename cleanups outside, which makes the logic easier to
> follow.
>
> There is one minor change in the behavior: filename_lookup() /
> filename_parentat() / filename_create() do their own retries on ESTALE
> (regardless of flags), and previously they were exempt from retries in
> the do_* functions (but they *were* called on retry - it's just the
> return code wasn't checked for ESTALE). And now the retry is done on
> the upper level, and so technically it could be called a behavior
> change. Hopefully it's an edge case where an additional check does not
> matter.
>
> On top of https://lore.kernel.org/io-uring/20210708063447.3556403-1-dkadashev@gmail.com/

Since this is on top of the stuff that is going to be in the Jens' tree
only until the 5.15 merge window, I'm assuming this series should go
there as well.
Linus Torvalds July 12, 2021, 7:01 p.m. UTC | #2
On Mon, Jul 12, 2021 at 5:41 AM Dmitry Kadashev <dkadashev@gmail.com> wrote:
>
> Since this is on top of the stuff that is going to be in the Jens' tree
> only until the 5.15 merge window, I'm assuming this series should go
> there as well.

Yeah. Unless Al wants to pick this whole series up.

See my comments about the individual patches - some of them change
code flow, others do. And I think changing code flow as part of
cleanup is ok, but it at the very least needs to be mentioned (and it
might be good to do the "move code that is idempotent inside the
retry" as a separate patch from documentation purposes)

           Linus
Al Viro July 12, 2021, 8:25 p.m. UTC | #3
On Mon, Jul 12, 2021 at 12:01:52PM -0700, Linus Torvalds wrote:
> On Mon, Jul 12, 2021 at 5:41 AM Dmitry Kadashev <dkadashev@gmail.com> wrote:
> >
> > Since this is on top of the stuff that is going to be in the Jens' tree
> > only until the 5.15 merge window, I'm assuming this series should go
> > there as well.
> 
> Yeah. Unless Al wants to pick this whole series up.
> 
> See my comments about the individual patches - some of them change
> code flow, others do. And I think changing code flow as part of
> cleanup is ok, but it at the very least needs to be mentioned (and it
> might be good to do the "move code that is idempotent inside the
> retry" as a separate patch from documentation purposes)

TBH, my main problem with this is that ESTALE retry logics had never
felt right.  We ask e.g. filename_create() to get us the parent.  We
tell it whether we want it to be maximally suspicious or not.  It
still does the same RCU-normal-LOOKUP_REVAL sequence, only for "trust
no one" variant it's RCU-LOOKUP_REVAL-LOOKUP_REVAL instead.  We are
*not* told how far in that sequence did it have to get.  What's more,
even if we had to get all way up to LOOKUP_REVAL, we ignore that
when we do dcache lookup for the last component - only the argument
of filename_create() is looked at.

It really smells like the calling conventions are wrong.  I agree that
all of that is, by definition, a very slow path - it's just that the
logics makes me go "WTF?" every time I see it... ;-/

Hell knows - perhaps the lookup_flags thing wants to be passed by
reference (all the way into path_parentat()) and have the "we had
to go for LOOKUP_REVAL" returned that way.  Not sure...

Al, still crawling out of the bloody ptrace/asm glue horrors at the moment...
Dmitry Kadashev July 13, 2021, 10:22 a.m. UTC | #4
On Tue, Jul 13, 2021 at 2:02 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> See my comments about the individual patches - some of them change
> code flow, others do. And I think changing code flow as part of
> cleanup is ok, but it at the very least needs to be mentioned (and it
> might be good to do the "move code that is idempotent inside the
> retry" as a separate patch from documentation purposes)

Indeed, I should have at least included the flow changes into the commit
messages.

I'll send v2 with those comments addressed.
Dmitry Kadashev July 13, 2021, 12:28 p.m. UTC | #5
On Tue, Jul 13, 2021 at 3:28 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Mon, Jul 12, 2021 at 12:01:52PM -0700, Linus Torvalds wrote:
> > On Mon, Jul 12, 2021 at 5:41 AM Dmitry Kadashev <dkadashev@gmail.com> wrote:
> > >
> > > Since this is on top of the stuff that is going to be in the Jens' tree
> > > only until the 5.15 merge window, I'm assuming this series should go
> > > there as well.
> >
> > Yeah. Unless Al wants to pick this whole series up.
> >
> > See my comments about the individual patches - some of them change
> > code flow, others do. And I think changing code flow as part of
> > cleanup is ok, but it at the very least needs to be mentioned (and it
> > might be good to do the "move code that is idempotent inside the
> > retry" as a separate patch from documentation purposes)
>
> TBH, my main problem with this is that ESTALE retry logics had never
> felt right.  We ask e.g. filename_create() to get us the parent.  We
> tell it whether we want it to be maximally suspicious or not.  It
> still does the same RCU-normal-LOOKUP_REVAL sequence, only for "trust
> no one" variant it's RCU-LOOKUP_REVAL-LOOKUP_REVAL instead.

Regardless of the bigger changes discussed below, should we change
direct comparison to ESTALE to retry_estale(retval, lookup_flags) in
filename_lookup() and filename_parentat() (and probably also
do_filp_open() and do_file_open_root())? At least it won't do two
consecutive LOOKUP_REVAL lookups and the change is trivial.

> We are
> *not* told how far in that sequence did it have to get.  What's more,
> even if we had to get all way up to LOOKUP_REVAL, we ignore that
> when we do dcache lookup for the last component - only the argument
> of filename_create() is looked at.
>
> It really smells like the calling conventions are wrong.  I agree that
> all of that is, by definition, a very slow path - it's just that the
> logics makes me go "WTF?" every time I see it... ;-/

The current series does not make it worse though. I'm happy to work on
further improvements with some guidance, but hopefully in a separate
patchset?

> Hell knows - perhaps the lookup_flags thing wants to be passed by
> reference (all the way into path_parentat()) and have the "we had
> to go for LOOKUP_REVAL" returned that way.  Not sure...

Will that allow to get rid of the retries completely? I'm not sure I
understand all the code paths that can return ESTALE, I'd assume we'd
still have to keep the whole retry logic.