diff mbox series

fs: Provide helpers for manipulating sb->s_readonly_remount

Message ID 20230616163827.19377-1-jack@suse.cz (mailing list archive)
State New, archived
Headers show
Series fs: Provide helpers for manipulating sb->s_readonly_remount | expand

Commit Message

Jan Kara June 16, 2023, 4:38 p.m. UTC
Provide helpers to set and clear sb->s_readonly_remount including
appropriate memory barriers. Also use this opportunity to document what
the barriers pair with and why they are needed.

Suggested-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/internal.h      | 26 ++++++++++++++++++++++++++
 fs/namespace.c     | 10 ++++------
 fs/super.c         | 17 ++++++-----------
 include/linux/fs.h |  2 +-
 4 files changed, 37 insertions(+), 18 deletions(-)

Comments

Dave Chinner June 16, 2023, 11:33 p.m. UTC | #1
On Fri, Jun 16, 2023 at 06:38:27PM +0200, Jan Kara wrote:
> Provide helpers to set and clear sb->s_readonly_remount including
> appropriate memory barriers. Also use this opportunity to document what
> the barriers pair with and why they are needed.
> 
> Suggested-by: Dave Chinner <david@fromorbit.com>
> Signed-off-by: Jan Kara <jack@suse.cz>

The helper conversion looks fine so from that perspective the patch
looks good.

However, I'm not sure the use of memory barriers is correct, though.

IIUC, we want mnt_is_readonly() to return true when ever
s_readonly_remount is set. Is that the behaviour we are trying to
acheive for both ro->rw and rw->ro transactions?

> ---
>  fs/internal.h      | 26 ++++++++++++++++++++++++++
>  fs/namespace.c     | 10 ++++------
>  fs/super.c         | 17 ++++++-----------
>  include/linux/fs.h |  2 +-
>  4 files changed, 37 insertions(+), 18 deletions(-)
> 
> diff --git a/fs/internal.h b/fs/internal.h
> index bd3b2810a36b..01bff3f6db79 100644
> --- a/fs/internal.h
> +++ b/fs/internal.h
> @@ -120,6 +120,32 @@ void put_super(struct super_block *sb);
>  extern bool mount_capable(struct fs_context *);
>  int sb_init_dio_done_wq(struct super_block *sb);
>  
> +/*
> + * Prepare superblock for changing its read-only state (i.e., either remount
> + * read-write superblock read-only or vice versa). After this function returns
> + * mnt_is_readonly() will return true for any mount of the superblock if its
> + * caller is able to observe any changes done by the remount. This holds until
> + * sb_end_ro_state_change() is called.
> + */
> +static inline void sb_start_ro_state_change(struct super_block *sb)
> +{
> +	WRITE_ONCE(sb->s_readonly_remount, 1);
> +	/* The barrier pairs with the barrier in mnt_is_readonly() */
> +	smp_wmb();
> +}

I'm not sure how this wmb pairs with the memory barrier in
mnt_is_readonly() to provide the correct behavior. The barrier in
mnt_is_readonly() happens after it checks s_readonly_remount, so
the s_readonly_remount in mnt_is_readonly is not ordered in any way
against this barrier.

The barrier in mnt_is_readonly() ensures that the loads of SB_RDONLY
and MNT_READONLY are ordered after s_readonly_remount(), but we
don't change those flags until a long way after s_readonly_remount
is set.

Hence if this is a ro->rw transistion, then I can see that racing on
s_readonly_remount being isn't an issue, because the mount/sb
flags will have SB_RDONLY/MNT_READONLY set and the correct thing
will be done (i.e. consider code between sb_start_ro_state_change()
and sb_end_ro_state_change() is RO).

However, it's not obvious (to me, anyway) how this works at all for
a rw->ro transition - if we race on s_readonly_remount being set
then we'll consider the fs to still be read-write regardless of the
smp_rmb() in mnt_is_readonly() because neither SB_RDONLY or
MNT_READONLY are set at this point.

So I can't see what the memory barrier is actually doing for
us here...

What am I missing?

> +/*
> + * Ends section changing read-only state of the superblock. After this function
> + * returns if mnt_is_readonly() returns false, the caller will be able to
> + * observe all the changes remount did to the superblock.
> + */
> +static inline void sb_end_ro_state_change(struct super_block *sb)
> +{
> +	/* The barrier pairs with the barrier in mnt_is_readonly() */
> +	smp_wmb();
> +	WRITE_ONCE(sb->s_readonly_remount, 0);
> +}

This one looks fine - it is providing release semantics,
ensuring that if s_readonly_remount is seen as zero, then the prior
sb/mnt flag changes will be seen by __mnt_is_readonly(mnt). i.e the
smp_rmb() in mnt_is_readonly() is providing acquire side
semantics on the s_readonly_remount access if it returns 0....

Cheers,

Dave.
Christian Brauner June 17, 2023, 3:05 p.m. UTC | #2
On Sat, Jun 17, 2023 at 09:33:42AM +1000, Dave Chinner wrote:
> On Fri, Jun 16, 2023 at 06:38:27PM +0200, Jan Kara wrote:
> > Provide helpers to set and clear sb->s_readonly_remount including
> > appropriate memory barriers. Also use this opportunity to document what
> > the barriers pair with and why they are needed.
> > 
> > Suggested-by: Dave Chinner <david@fromorbit.com>
> > Signed-off-by: Jan Kara <jack@suse.cz>
> 
> The helper conversion looks fine so from that perspective the patch
> looks good.
> 
> However, I'm not sure the use of memory barriers is correct, though.
> 
> IIUC, we want mnt_is_readonly() to return true when ever
> s_readonly_remount is set. Is that the behaviour we are trying to
> acheive for both ro->rw and rw->ro transactions?
> 
> > ---
> >  fs/internal.h      | 26 ++++++++++++++++++++++++++
> >  fs/namespace.c     | 10 ++++------
> >  fs/super.c         | 17 ++++++-----------
> >  include/linux/fs.h |  2 +-
> >  4 files changed, 37 insertions(+), 18 deletions(-)
> > 
> > diff --git a/fs/internal.h b/fs/internal.h
> > index bd3b2810a36b..01bff3f6db79 100644
> > --- a/fs/internal.h
> > +++ b/fs/internal.h
> > @@ -120,6 +120,32 @@ void put_super(struct super_block *sb);
> >  extern bool mount_capable(struct fs_context *);
> >  int sb_init_dio_done_wq(struct super_block *sb);
> >  
> > +/*
> > + * Prepare superblock for changing its read-only state (i.e., either remount
> > + * read-write superblock read-only or vice versa). After this function returns
> > + * mnt_is_readonly() will return true for any mount of the superblock if its
> > + * caller is able to observe any changes done by the remount. This holds until
> > + * sb_end_ro_state_change() is called.
> > + */
> > +static inline void sb_start_ro_state_change(struct super_block *sb)
> > +{
> > +	WRITE_ONCE(sb->s_readonly_remount, 1);
> > +	/* The barrier pairs with the barrier in mnt_is_readonly() */
> > +	smp_wmb();
> > +}
> 
> I'm not sure how this wmb pairs with the memory barrier in
> mnt_is_readonly() to provide the correct behavior. The barrier in
> mnt_is_readonly() happens after it checks s_readonly_remount, so
> the s_readonly_remount in mnt_is_readonly is not ordered in any way
> against this barrier.
> 
> The barrier in mnt_is_readonly() ensures that the loads of SB_RDONLY
> and MNT_READONLY are ordered after s_readonly_remount(), but we
> don't change those flags until a long way after s_readonly_remount
> is set.
> 
> Hence if this is a ro->rw transistion, then I can see that racing on
> s_readonly_remount being isn't an issue, because the mount/sb
> flags will have SB_RDONLY/MNT_READONLY set and the correct thing
> will be done (i.e. consider code between sb_start_ro_state_change()
> and sb_end_ro_state_change() is RO).
> 
> However, it's not obvious (to me, anyway) how this works at all for
> a rw->ro transition - if we race on s_readonly_remount being set
> then we'll consider the fs to still be read-write regardless of the
> smp_rmb() in mnt_is_readonly() because neither SB_RDONLY or
> MNT_READONLY are set at this point.

Let me try and remember it all. I've documented a good portion of this
in the relevant functions but I should probably upstream some more
longer documentation blurb as well.

A rw->ro transition happen in two ways.

(1) A mount or mount tree is made read-only via
    mount_setattr(MNT_ATTR_READONLY) or
    mount(MS_BIND|MS_RDONLY|MS_REMOUNT).
(2) The filesystems/superblock is made read-only via fspick()+fsconfig()
    or mount(MS_REMOUNT|MS_RDONLY).

For both (1) and (2) we grab lock_mount_hash() in relevant codepaths
(because that's required for any vfsmount->mnt_flags changes) and then
call mnt_hold_writers().

mnt_hold_writers() will first raise MNT_WRITE_HOLD in @mnt->mnt_flags
before checking the write counter of that mount to see whether there are
any active writers on that mount. If there are any active writers we'll
fail mnt_hold_writers() and the whole rw->ro transition.

A memory barrier is used to order raising MNT_WRITE_HOLD against the
increment of the write counter of that mount in __mnt_want_write().
If __mnt_want_write() detects that MNT_WRITE_HOLD has been set after
it incremented the write counter it will spin until MNT_WRITE_HOLD is
cleared via mnt_unhold_writers(). This uses another memory barrier to
ensure ordering with the mnt_is_readonly() check in __mnt_want_write().

__mnt_want_write() doesn't know about the ro/rw state of the mount at
all until MNT_WRITE_HOLD has cleared. Then it calls mnt_is_readonly().

If the mount did indeed transition from rw->ro after MNT_WRITE_HOLD was
cleared __mnt_want_write() will back off. If not write access to the
mount is granted.

A superblock rw->ro transition is done the same way. It also requires
mnt_hold_writers() to be done. This is done in
sb_prepare_remount_readonly() which is called in reconfigure_super().

Only after mnt_hold_writers() has been raised successfully on every
mount of that filesystem (i.e., all bind mounts) will
sb->s_readonly_remount be set. After MNT_WRITE_HOLD is cleared and
mnt_is_readonly() is called sb->s_readonly_remount is guaranteed to be
visible or MNT_READONLY or SB_RDONLY are visible. The memory barrier in
sb->s_readonly_remount orders it against reading sb->s_flags. It doesn't
protect/order the rw->ro transition itself.

(The only exception is an emergency read-only remount where we don't
know what state the fs is in and don't care for any active writers on
that superblock so omit wading through all the mounts of that
filesystem. But that's only doable from withing the kernel via
SB_FORCE.)

Provided I understand your question/concern correctly.
Jan Kara June 19, 2023, 11:05 a.m. UTC | #3
On Sat 17-06-23 09:33:42, Dave Chinner wrote:
> On Fri, Jun 16, 2023 at 06:38:27PM +0200, Jan Kara wrote:
> > Provide helpers to set and clear sb->s_readonly_remount including
> > appropriate memory barriers. Also use this opportunity to document what
> > the barriers pair with and why they are needed.
> > 
> > Suggested-by: Dave Chinner <david@fromorbit.com>
> > Signed-off-by: Jan Kara <jack@suse.cz>
> 
> The helper conversion looks fine so from that perspective the patch
> looks good.
> 
> However, I'm not sure the use of memory barriers is correct, though.

AFAICS, the barriers are correct but my documentation was not ;)
Christian's reply has all the details but maybe let me attempt a bit more
targetted reply here.

> IIUC, we want mnt_is_readonly() to return true when ever
> s_readonly_remount is set. Is that the behaviour we are trying to
> acheive for both ro->rw and rw->ro transactions?

Yes. But what matters is the ordering of s_readonly_remount checking wrt
other flags. See below.

> > ---
> >  fs/internal.h      | 26 ++++++++++++++++++++++++++
> >  fs/namespace.c     | 10 ++++------
> >  fs/super.c         | 17 ++++++-----------
> >  include/linux/fs.h |  2 +-
> >  4 files changed, 37 insertions(+), 18 deletions(-)
> > 
> > diff --git a/fs/internal.h b/fs/internal.h
> > index bd3b2810a36b..01bff3f6db79 100644
> > --- a/fs/internal.h
> > +++ b/fs/internal.h
> > @@ -120,6 +120,32 @@ void put_super(struct super_block *sb);
> >  extern bool mount_capable(struct fs_context *);
> >  int sb_init_dio_done_wq(struct super_block *sb);
> >  
> > +/*
> > + * Prepare superblock for changing its read-only state (i.e., either remount
> > + * read-write superblock read-only or vice versa). After this function returns
> > + * mnt_is_readonly() will return true for any mount of the superblock if its
> > + * caller is able to observe any changes done by the remount. This holds until
> > + * sb_end_ro_state_change() is called.
> > + */
> > +static inline void sb_start_ro_state_change(struct super_block *sb)
> > +{
> > +	WRITE_ONCE(sb->s_readonly_remount, 1);
> > +	/* The barrier pairs with the barrier in mnt_is_readonly() */
> > +	smp_wmb();
> > +}
> 
> I'm not sure how this wmb pairs with the memory barrier in
> mnt_is_readonly() to provide the correct behavior. The barrier in
> mnt_is_readonly() happens after it checks s_readonly_remount, so
> the s_readonly_remount in mnt_is_readonly is not ordered in any way
> against this barrier.
> 
> The barrier in mnt_is_readonly() ensures that the loads of SB_RDONLY
> and MNT_READONLY are ordered after s_readonly_remount(), but we
> don't change those flags until a long way after s_readonly_remount
> is set.

You are correct. I've reread the code and the ordering that matters is
__mnt_want_write() on the read side and reconfigure_super() on the write
side. In particular for RW->RO transition we must make sure that: If
__mnt_want_write() does not see MNT_WRITE_HOLD set, it will see
s_readonly_remount set. There is another set of barriers in those functions
that makes sure sb_prepare_remount_readonly() sees incremented mnt_writers
if __mnt_want_write() did not see MNT_WRITE_HOLD set, but that's a
different story.

Hence the barrier in sb_start_ro_state_change() pairs with smp_rmb()
barrier in __mnt_want_write() before the mnt_is_readonly() check at the end
of the function. I'll fix my patch, thanks for correction.

> Hence if this is a ro->rw transistion, then I can see that racing on
> s_readonly_remount being isn't an issue, because the mount/sb
> flags will have SB_RDONLY/MNT_READONLY set and the correct thing
> will be done (i.e. consider code between sb_start_ro_state_change()
> and sb_end_ro_state_change() is RO).

Yes, for the RO->RW the barrier in sb_prepare_remount_readonly() indeed
pairs with the barrier in mnt_is_readonly(). It makes sure that if
mnt_is_readonly() observes s_readonly_remount == 0, it will observe
SB_RDONLY / MNT_READONLY still set.

								Honza
Dave Chinner June 19, 2023, 11:11 p.m. UTC | #4
On Sat, Jun 17, 2023 at 05:05:25PM +0200, Christian Brauner wrote:
> On Sat, Jun 17, 2023 at 09:33:42AM +1000, Dave Chinner wrote:
> > On Fri, Jun 16, 2023 at 06:38:27PM +0200, Jan Kara wrote:
> > > Provide helpers to set and clear sb->s_readonly_remount including
> > > appropriate memory barriers. Also use this opportunity to document what
> > > the barriers pair with and why they are needed.
> > > 
> > > Suggested-by: Dave Chinner <david@fromorbit.com>
> > > Signed-off-by: Jan Kara <jack@suse.cz>
> > 
> > The helper conversion looks fine so from that perspective the patch
> > looks good.
> > 
> > However, I'm not sure the use of memory barriers is correct, though.
> > 
> > IIUC, we want mnt_is_readonly() to return true when ever
> > s_readonly_remount is set. Is that the behaviour we are trying to
> > acheive for both ro->rw and rw->ro transactions?
> > 
> > > ---
> > >  fs/internal.h      | 26 ++++++++++++++++++++++++++
> > >  fs/namespace.c     | 10 ++++------
> > >  fs/super.c         | 17 ++++++-----------
> > >  include/linux/fs.h |  2 +-
> > >  4 files changed, 37 insertions(+), 18 deletions(-)
> > > 
> > > diff --git a/fs/internal.h b/fs/internal.h
> > > index bd3b2810a36b..01bff3f6db79 100644
> > > --- a/fs/internal.h
> > > +++ b/fs/internal.h
> > > @@ -120,6 +120,32 @@ void put_super(struct super_block *sb);
> > >  extern bool mount_capable(struct fs_context *);
> > >  int sb_init_dio_done_wq(struct super_block *sb);
> > >  
> > > +/*
> > > + * Prepare superblock for changing its read-only state (i.e., either remount
> > > + * read-write superblock read-only or vice versa). After this function returns
> > > + * mnt_is_readonly() will return true for any mount of the superblock if its
> > > + * caller is able to observe any changes done by the remount. This holds until
> > > + * sb_end_ro_state_change() is called.
> > > + */
> > > +static inline void sb_start_ro_state_change(struct super_block *sb)
> > > +{
> > > +	WRITE_ONCE(sb->s_readonly_remount, 1);
> > > +	/* The barrier pairs with the barrier in mnt_is_readonly() */
> > > +	smp_wmb();
> > > +}
> > 
> > I'm not sure how this wmb pairs with the memory barrier in
> > mnt_is_readonly() to provide the correct behavior. The barrier in
> > mnt_is_readonly() happens after it checks s_readonly_remount, so
> > the s_readonly_remount in mnt_is_readonly is not ordered in any way
> > against this barrier.
> > 
> > The barrier in mnt_is_readonly() ensures that the loads of SB_RDONLY
> > and MNT_READONLY are ordered after s_readonly_remount(), but we
> > don't change those flags until a long way after s_readonly_remount
> > is set.
> > 
> > Hence if this is a ro->rw transistion, then I can see that racing on
> > s_readonly_remount being isn't an issue, because the mount/sb
> > flags will have SB_RDONLY/MNT_READONLY set and the correct thing
> > will be done (i.e. consider code between sb_start_ro_state_change()
> > and sb_end_ro_state_change() is RO).
> > 
> > However, it's not obvious (to me, anyway) how this works at all for
> > a rw->ro transition - if we race on s_readonly_remount being set
> > then we'll consider the fs to still be read-write regardless of the
> > smp_rmb() in mnt_is_readonly() because neither SB_RDONLY or
> > MNT_READONLY are set at this point.
> 
> Let me try and remember it all. I've documented a good portion of this
> in the relevant functions but I should probably upstream some more
> longer documentation blurb as well.
> 
> A rw->ro transition happen in two ways.
> 
> (1) A mount or mount tree is made read-only via
>     mount_setattr(MNT_ATTR_READONLY) or
>     mount(MS_BIND|MS_RDONLY|MS_REMOUNT).
> (2) The filesystems/superblock is made read-only via fspick()+fsconfig()
>     or mount(MS_REMOUNT|MS_RDONLY).
> 
> For both (1) and (2) we grab lock_mount_hash() in relevant codepaths
> (because that's required for any vfsmount->mnt_flags changes) and then
> call mnt_hold_writers().
> 
> mnt_hold_writers() will first raise MNT_WRITE_HOLD in @mnt->mnt_flags
> before checking the write counter of that mount to see whether there are
> any active writers on that mount. If there are any active writers we'll
> fail mnt_hold_writers() and the whole rw->ro transition.
> 
> A memory barrier is used to order raising MNT_WRITE_HOLD against the
> increment of the write counter of that mount in __mnt_want_write().
> If __mnt_want_write() detects that MNT_WRITE_HOLD has been set after
> it incremented the write counter it will spin until MNT_WRITE_HOLD is
> cleared via mnt_unhold_writers(). This uses another memory barrier to
> ensure ordering with the mnt_is_readonly() check in __mnt_want_write().
> 
> __mnt_want_write() doesn't know about the ro/rw state of the mount at
> all until MNT_WRITE_HOLD has cleared. Then it calls mnt_is_readonly().
> 
> If the mount did indeed transition from rw->ro after MNT_WRITE_HOLD was
> cleared __mnt_want_write() will back off. If not write access to the
> mount is granted.
> 
> A superblock rw->ro transition is done the same way. It also requires
> mnt_hold_writers() to be done. This is done in
> sb_prepare_remount_readonly() which is called in reconfigure_super().
> 
> Only after mnt_hold_writers() has been raised successfully on every
> mount of that filesystem (i.e., all bind mounts) will
> sb->s_readonly_remount be set. After MNT_WRITE_HOLD is cleared and
> mnt_is_readonly() is called sb->s_readonly_remount is guaranteed to be
> visible or MNT_READONLY or SB_RDONLY are visible. The memory barrier in
> sb->s_readonly_remount orders it against reading sb->s_flags. It doesn't
> protect/order the rw->ro transition itself.
> 
> (The only exception is an emergency read-only remount where we don't
> know what state the fs is in and don't care for any active writers on
> that superblock so omit wading through all the mounts of that
> filesystem. But that's only doable from withing the kernel via
> SB_FORCE.)
> 
> Provided I understand your question/concern correctly.

Thank's for the info, Christian.  Yes, you did understand my
concern: that the memory barriers are poorly and/or incorrectly
documented. :/

Nothing I read in the code around the s_readonly_remount variable or
mnt_is_readonly() indicated that there was any serialisation around
__mnt_want_write() and MNT_WRITE_HOLD. I was completely unable to
make that jump from the code as it was written, or from the patch
that Jan proposed.

Now that I see mnt_hold_writers() and mnt_unhold_writers(), I see
more memory barriers, but I don't see any reference to
sb->s_readonly_remount in that code or the comments. Given that it
appears that these memory barriers are deeply intertwined, I think
that the mnt_[un]hold_writers() helpers also need to have their
documentation updated to indicate how they interact with
s_readonly_remount..

And for sb->s_readonly_remount helpers, some form of the above
explanation also needs to be added, or maybe just a pointer to the
documentation on the mnt_[un]hold_writers() helpers that also
explains how sb->s_readonly_remount factors into the memory barriers
in those helpers...

Cheers,

Dave.
Dave Chinner June 19, 2023, 11:16 p.m. UTC | #5
On Mon, Jun 19, 2023 at 01:05:26PM +0200, Jan Kara wrote:
> On Sat 17-06-23 09:33:42, Dave Chinner wrote:
> > On Fri, Jun 16, 2023 at 06:38:27PM +0200, Jan Kara wrote:
> > > Provide helpers to set and clear sb->s_readonly_remount including
> > > appropriate memory barriers. Also use this opportunity to document what
> > > the barriers pair with and why they are needed.
> > > 
> > > Suggested-by: Dave Chinner <david@fromorbit.com>
> > > Signed-off-by: Jan Kara <jack@suse.cz>
> > 
> > The helper conversion looks fine so from that perspective the patch
> > looks good.
> > 
> > However, I'm not sure the use of memory barriers is correct, though.
> 
> AFAICS, the barriers are correct but my documentation was not ;)
> Christian's reply has all the details but maybe let me attempt a bit more
> targetted reply here.

*nod*

> 
> > IIUC, we want mnt_is_readonly() to return true when ever
> > s_readonly_remount is set. Is that the behaviour we are trying to
> > acheive for both ro->rw and rw->ro transactions?
> 
> Yes. But what matters is the ordering of s_readonly_remount checking wrt
> other flags. See below.
> 
> > > ---
> > >  fs/internal.h      | 26 ++++++++++++++++++++++++++
> > >  fs/namespace.c     | 10 ++++------
> > >  fs/super.c         | 17 ++++++-----------
> > >  include/linux/fs.h |  2 +-
> > >  4 files changed, 37 insertions(+), 18 deletions(-)
> > > 
> > > diff --git a/fs/internal.h b/fs/internal.h
> > > index bd3b2810a36b..01bff3f6db79 100644
> > > --- a/fs/internal.h
> > > +++ b/fs/internal.h
> > > @@ -120,6 +120,32 @@ void put_super(struct super_block *sb);
> > >  extern bool mount_capable(struct fs_context *);
> > >  int sb_init_dio_done_wq(struct super_block *sb);
> > >  
> > > +/*
> > > + * Prepare superblock for changing its read-only state (i.e., either remount
> > > + * read-write superblock read-only or vice versa). After this function returns
> > > + * mnt_is_readonly() will return true for any mount of the superblock if its
> > > + * caller is able to observe any changes done by the remount. This holds until
> > > + * sb_end_ro_state_change() is called.
> > > + */
> > > +static inline void sb_start_ro_state_change(struct super_block *sb)
> > > +{
> > > +	WRITE_ONCE(sb->s_readonly_remount, 1);
> > > +	/* The barrier pairs with the barrier in mnt_is_readonly() */
> > > +	smp_wmb();
> > > +}
> > 
> > I'm not sure how this wmb pairs with the memory barrier in
> > mnt_is_readonly() to provide the correct behavior. The barrier in
> > mnt_is_readonly() happens after it checks s_readonly_remount, so
> > the s_readonly_remount in mnt_is_readonly is not ordered in any way
> > against this barrier.
> > 
> > The barrier in mnt_is_readonly() ensures that the loads of SB_RDONLY
> > and MNT_READONLY are ordered after s_readonly_remount(), but we
> > don't change those flags until a long way after s_readonly_remount
> > is set.
> 
> You are correct. I've reread the code and the ordering that matters is
> __mnt_want_write() on the read side and reconfigure_super() on the write
> side. In particular for RW->RO transition we must make sure that: If
> __mnt_want_write() does not see MNT_WRITE_HOLD set, it will see
> s_readonly_remount set. There is another set of barriers in those functions
> that makes sure sb_prepare_remount_readonly() sees incremented mnt_writers
> if __mnt_want_write() did not see MNT_WRITE_HOLD set, but that's a
> different story.

Yup, as I said to Christian, there is nothing in the old or new code
that even hints at an interaction with MNT_WRITE_HOLD or
__mnt_want_write() here. I couldn't make that jump from reading the
code, and so the memory barrier placement made no sense at all.

> Hence the barrier in sb_start_ro_state_change() pairs with
> smp_rmb() barrier in __mnt_want_write() before the
> mnt_is_readonly() check at the end of the function. I'll fix my
> patch, thanks for correction.

Please also update the mnt_[un]hold_writers() and __mnt_want_write()
documentation to also point at the new sb_start/end_ro_state_change
helpers, as all the memory barriers in this code are tightly
coupled.

Thanks!

-Dave.
diff mbox series

Patch

diff --git a/fs/internal.h b/fs/internal.h
index bd3b2810a36b..01bff3f6db79 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -120,6 +120,32 @@  void put_super(struct super_block *sb);
 extern bool mount_capable(struct fs_context *);
 int sb_init_dio_done_wq(struct super_block *sb);
 
+/*
+ * Prepare superblock for changing its read-only state (i.e., either remount
+ * read-write superblock read-only or vice versa). After this function returns
+ * mnt_is_readonly() will return true for any mount of the superblock if its
+ * caller is able to observe any changes done by the remount. This holds until
+ * sb_end_ro_state_change() is called.
+ */
+static inline void sb_start_ro_state_change(struct super_block *sb)
+{
+	WRITE_ONCE(sb->s_readonly_remount, 1);
+	/* The barrier pairs with the barrier in mnt_is_readonly() */
+	smp_wmb();
+}
+
+/*
+ * Ends section changing read-only state of the superblock. After this function
+ * returns if mnt_is_readonly() returns false, the caller will be able to
+ * observe all the changes remount did to the superblock.
+ */
+static inline void sb_end_ro_state_change(struct super_block *sb)
+{
+	/* The barrier pairs with the barrier in mnt_is_readonly() */
+	smp_wmb();
+	WRITE_ONCE(sb->s_readonly_remount, 0);
+}
+
 /*
  * open.c
  */
diff --git a/fs/namespace.c b/fs/namespace.c
index 54847db5b819..2eff091df549 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -309,9 +309,9 @@  static unsigned int mnt_get_writers(struct mount *mnt)
 
 static int mnt_is_readonly(struct vfsmount *mnt)
 {
-	if (mnt->mnt_sb->s_readonly_remount)
+	if (READ_ONCE(mnt->mnt_sb->s_readonly_remount))
 		return 1;
-	/* Order wrt setting s_flags/s_readonly_remount in do_remount() */
+	/* Order wrt barriers in sb_{start,end}_ro_state_change() */
 	smp_rmb();
 	return __mnt_is_readonly(mnt);
 }
@@ -588,10 +588,8 @@  int sb_prepare_remount_readonly(struct super_block *sb)
 	if (!err && atomic_long_read(&sb->s_remove_count))
 		err = -EBUSY;
 
-	if (!err) {
-		sb->s_readonly_remount = 1;
-		smp_wmb();
-	}
+	if (!err)
+		sb_start_ro_state_change(sb);
 	list_for_each_entry(mnt, &sb->s_mounts, mnt_instance) {
 		if (mnt->mnt.mnt_flags & MNT_WRITE_HOLD)
 			mnt->mnt.mnt_flags &= ~MNT_WRITE_HOLD;
diff --git a/fs/super.c b/fs/super.c
index 6cd64961aa07..8a39902b859f 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -944,8 +944,7 @@  int reconfigure_super(struct fs_context *fc)
 	 */
 	if (remount_ro) {
 		if (force) {
-			sb->s_readonly_remount = 1;
-			smp_wmb();
+			sb_start_ro_state_change(sb);
 		} else {
 			retval = sb_prepare_remount_readonly(sb);
 			if (retval)
@@ -953,12 +952,10 @@  int reconfigure_super(struct fs_context *fc)
 		}
 	} else if (remount_rw) {
 		/*
-		 * We set s_readonly_remount here to protect filesystem's
-		 * reconfigure code from writes from userspace until
-		 * reconfigure finishes.
+		 * Protect filesystem's reconfigure code from writes from
+		 * userspace until reconfigure finishes.
 		 */
-		sb->s_readonly_remount = 1;
-		smp_wmb();
+		sb_start_ro_state_change(sb);
 	}
 
 	if (fc->ops->reconfigure) {
@@ -974,9 +971,7 @@  int reconfigure_super(struct fs_context *fc)
 
 	WRITE_ONCE(sb->s_flags, ((sb->s_flags & ~fc->sb_flags_mask) |
 				 (fc->sb_flags & fc->sb_flags_mask)));
-	/* Needs to be ordered wrt mnt_is_readonly() */
-	smp_wmb();
-	sb->s_readonly_remount = 0;
+	sb_end_ro_state_change(sb);
 
 	/*
 	 * Some filesystems modify their metadata via some other path than the
@@ -991,7 +986,7 @@  int reconfigure_super(struct fs_context *fc)
 	return 0;
 
 cancel_readonly:
-	sb->s_readonly_remount = 0;
+	sb_end_ro_state_change(sb);
 	return retval;
 }
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 133f0640fb24..ede51d60d124 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1242,7 +1242,7 @@  struct super_block {
 	 */
 	atomic_long_t s_fsnotify_connectors;
 
-	/* Being remounted read-only */
+	/* Read-only state of the superblock is being changed */
 	int s_readonly_remount;
 
 	/* per-sb errseq_t for reporting writeback errors via syncfs */