Message ID | 20230616163827.19377-1-jack@suse.cz (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | fs: Provide helpers for manipulating sb->s_readonly_remount | expand |
On Fri, Jun 16, 2023 at 06:38:27PM +0200, Jan Kara wrote: > Provide helpers to set and clear sb->s_readonly_remount including > appropriate memory barriers. Also use this opportunity to document what > the barriers pair with and why they are needed. > > Suggested-by: Dave Chinner <david@fromorbit.com> > Signed-off-by: Jan Kara <jack@suse.cz> The helper conversion looks fine so from that perspective the patch looks good. However, I'm not sure the use of memory barriers is correct, though. IIUC, we want mnt_is_readonly() to return true when ever s_readonly_remount is set. Is that the behaviour we are trying to acheive for both ro->rw and rw->ro transactions? > --- > fs/internal.h | 26 ++++++++++++++++++++++++++ > fs/namespace.c | 10 ++++------ > fs/super.c | 17 ++++++----------- > include/linux/fs.h | 2 +- > 4 files changed, 37 insertions(+), 18 deletions(-) > > diff --git a/fs/internal.h b/fs/internal.h > index bd3b2810a36b..01bff3f6db79 100644 > --- a/fs/internal.h > +++ b/fs/internal.h > @@ -120,6 +120,32 @@ void put_super(struct super_block *sb); > extern bool mount_capable(struct fs_context *); > int sb_init_dio_done_wq(struct super_block *sb); > > +/* > + * Prepare superblock for changing its read-only state (i.e., either remount > + * read-write superblock read-only or vice versa). After this function returns > + * mnt_is_readonly() will return true for any mount of the superblock if its > + * caller is able to observe any changes done by the remount. This holds until > + * sb_end_ro_state_change() is called. > + */ > +static inline void sb_start_ro_state_change(struct super_block *sb) > +{ > + WRITE_ONCE(sb->s_readonly_remount, 1); > + /* The barrier pairs with the barrier in mnt_is_readonly() */ > + smp_wmb(); > +} I'm not sure how this wmb pairs with the memory barrier in mnt_is_readonly() to provide the correct behavior. The barrier in mnt_is_readonly() happens after it checks s_readonly_remount, so the s_readonly_remount in mnt_is_readonly is not ordered in any way against this barrier. The barrier in mnt_is_readonly() ensures that the loads of SB_RDONLY and MNT_READONLY are ordered after s_readonly_remount(), but we don't change those flags until a long way after s_readonly_remount is set. Hence if this is a ro->rw transistion, then I can see that racing on s_readonly_remount being isn't an issue, because the mount/sb flags will have SB_RDONLY/MNT_READONLY set and the correct thing will be done (i.e. consider code between sb_start_ro_state_change() and sb_end_ro_state_change() is RO). However, it's not obvious (to me, anyway) how this works at all for a rw->ro transition - if we race on s_readonly_remount being set then we'll consider the fs to still be read-write regardless of the smp_rmb() in mnt_is_readonly() because neither SB_RDONLY or MNT_READONLY are set at this point. So I can't see what the memory barrier is actually doing for us here... What am I missing? > +/* > + * Ends section changing read-only state of the superblock. After this function > + * returns if mnt_is_readonly() returns false, the caller will be able to > + * observe all the changes remount did to the superblock. > + */ > +static inline void sb_end_ro_state_change(struct super_block *sb) > +{ > + /* The barrier pairs with the barrier in mnt_is_readonly() */ > + smp_wmb(); > + WRITE_ONCE(sb->s_readonly_remount, 0); > +} This one looks fine - it is providing release semantics, ensuring that if s_readonly_remount is seen as zero, then the prior sb/mnt flag changes will be seen by __mnt_is_readonly(mnt). i.e the smp_rmb() in mnt_is_readonly() is providing acquire side semantics on the s_readonly_remount access if it returns 0.... Cheers, Dave.
On Sat, Jun 17, 2023 at 09:33:42AM +1000, Dave Chinner wrote: > On Fri, Jun 16, 2023 at 06:38:27PM +0200, Jan Kara wrote: > > Provide helpers to set and clear sb->s_readonly_remount including > > appropriate memory barriers. Also use this opportunity to document what > > the barriers pair with and why they are needed. > > > > Suggested-by: Dave Chinner <david@fromorbit.com> > > Signed-off-by: Jan Kara <jack@suse.cz> > > The helper conversion looks fine so from that perspective the patch > looks good. > > However, I'm not sure the use of memory barriers is correct, though. > > IIUC, we want mnt_is_readonly() to return true when ever > s_readonly_remount is set. Is that the behaviour we are trying to > acheive for both ro->rw and rw->ro transactions? > > > --- > > fs/internal.h | 26 ++++++++++++++++++++++++++ > > fs/namespace.c | 10 ++++------ > > fs/super.c | 17 ++++++----------- > > include/linux/fs.h | 2 +- > > 4 files changed, 37 insertions(+), 18 deletions(-) > > > > diff --git a/fs/internal.h b/fs/internal.h > > index bd3b2810a36b..01bff3f6db79 100644 > > --- a/fs/internal.h > > +++ b/fs/internal.h > > @@ -120,6 +120,32 @@ void put_super(struct super_block *sb); > > extern bool mount_capable(struct fs_context *); > > int sb_init_dio_done_wq(struct super_block *sb); > > > > +/* > > + * Prepare superblock for changing its read-only state (i.e., either remount > > + * read-write superblock read-only or vice versa). After this function returns > > + * mnt_is_readonly() will return true for any mount of the superblock if its > > + * caller is able to observe any changes done by the remount. This holds until > > + * sb_end_ro_state_change() is called. > > + */ > > +static inline void sb_start_ro_state_change(struct super_block *sb) > > +{ > > + WRITE_ONCE(sb->s_readonly_remount, 1); > > + /* The barrier pairs with the barrier in mnt_is_readonly() */ > > + smp_wmb(); > > +} > > I'm not sure how this wmb pairs with the memory barrier in > mnt_is_readonly() to provide the correct behavior. The barrier in > mnt_is_readonly() happens after it checks s_readonly_remount, so > the s_readonly_remount in mnt_is_readonly is not ordered in any way > against this barrier. > > The barrier in mnt_is_readonly() ensures that the loads of SB_RDONLY > and MNT_READONLY are ordered after s_readonly_remount(), but we > don't change those flags until a long way after s_readonly_remount > is set. > > Hence if this is a ro->rw transistion, then I can see that racing on > s_readonly_remount being isn't an issue, because the mount/sb > flags will have SB_RDONLY/MNT_READONLY set and the correct thing > will be done (i.e. consider code between sb_start_ro_state_change() > and sb_end_ro_state_change() is RO). > > However, it's not obvious (to me, anyway) how this works at all for > a rw->ro transition - if we race on s_readonly_remount being set > then we'll consider the fs to still be read-write regardless of the > smp_rmb() in mnt_is_readonly() because neither SB_RDONLY or > MNT_READONLY are set at this point. Let me try and remember it all. I've documented a good portion of this in the relevant functions but I should probably upstream some more longer documentation blurb as well. A rw->ro transition happen in two ways. (1) A mount or mount tree is made read-only via mount_setattr(MNT_ATTR_READONLY) or mount(MS_BIND|MS_RDONLY|MS_REMOUNT). (2) The filesystems/superblock is made read-only via fspick()+fsconfig() or mount(MS_REMOUNT|MS_RDONLY). For both (1) and (2) we grab lock_mount_hash() in relevant codepaths (because that's required for any vfsmount->mnt_flags changes) and then call mnt_hold_writers(). mnt_hold_writers() will first raise MNT_WRITE_HOLD in @mnt->mnt_flags before checking the write counter of that mount to see whether there are any active writers on that mount. If there are any active writers we'll fail mnt_hold_writers() and the whole rw->ro transition. A memory barrier is used to order raising MNT_WRITE_HOLD against the increment of the write counter of that mount in __mnt_want_write(). If __mnt_want_write() detects that MNT_WRITE_HOLD has been set after it incremented the write counter it will spin until MNT_WRITE_HOLD is cleared via mnt_unhold_writers(). This uses another memory barrier to ensure ordering with the mnt_is_readonly() check in __mnt_want_write(). __mnt_want_write() doesn't know about the ro/rw state of the mount at all until MNT_WRITE_HOLD has cleared. Then it calls mnt_is_readonly(). If the mount did indeed transition from rw->ro after MNT_WRITE_HOLD was cleared __mnt_want_write() will back off. If not write access to the mount is granted. A superblock rw->ro transition is done the same way. It also requires mnt_hold_writers() to be done. This is done in sb_prepare_remount_readonly() which is called in reconfigure_super(). Only after mnt_hold_writers() has been raised successfully on every mount of that filesystem (i.e., all bind mounts) will sb->s_readonly_remount be set. After MNT_WRITE_HOLD is cleared and mnt_is_readonly() is called sb->s_readonly_remount is guaranteed to be visible or MNT_READONLY or SB_RDONLY are visible. The memory barrier in sb->s_readonly_remount orders it against reading sb->s_flags. It doesn't protect/order the rw->ro transition itself. (The only exception is an emergency read-only remount where we don't know what state the fs is in and don't care for any active writers on that superblock so omit wading through all the mounts of that filesystem. But that's only doable from withing the kernel via SB_FORCE.) Provided I understand your question/concern correctly.
On Sat 17-06-23 09:33:42, Dave Chinner wrote: > On Fri, Jun 16, 2023 at 06:38:27PM +0200, Jan Kara wrote: > > Provide helpers to set and clear sb->s_readonly_remount including > > appropriate memory barriers. Also use this opportunity to document what > > the barriers pair with and why they are needed. > > > > Suggested-by: Dave Chinner <david@fromorbit.com> > > Signed-off-by: Jan Kara <jack@suse.cz> > > The helper conversion looks fine so from that perspective the patch > looks good. > > However, I'm not sure the use of memory barriers is correct, though. AFAICS, the barriers are correct but my documentation was not ;) Christian's reply has all the details but maybe let me attempt a bit more targetted reply here. > IIUC, we want mnt_is_readonly() to return true when ever > s_readonly_remount is set. Is that the behaviour we are trying to > acheive for both ro->rw and rw->ro transactions? Yes. But what matters is the ordering of s_readonly_remount checking wrt other flags. See below. > > --- > > fs/internal.h | 26 ++++++++++++++++++++++++++ > > fs/namespace.c | 10 ++++------ > > fs/super.c | 17 ++++++----------- > > include/linux/fs.h | 2 +- > > 4 files changed, 37 insertions(+), 18 deletions(-) > > > > diff --git a/fs/internal.h b/fs/internal.h > > index bd3b2810a36b..01bff3f6db79 100644 > > --- a/fs/internal.h > > +++ b/fs/internal.h > > @@ -120,6 +120,32 @@ void put_super(struct super_block *sb); > > extern bool mount_capable(struct fs_context *); > > int sb_init_dio_done_wq(struct super_block *sb); > > > > +/* > > + * Prepare superblock for changing its read-only state (i.e., either remount > > + * read-write superblock read-only or vice versa). After this function returns > > + * mnt_is_readonly() will return true for any mount of the superblock if its > > + * caller is able to observe any changes done by the remount. This holds until > > + * sb_end_ro_state_change() is called. > > + */ > > +static inline void sb_start_ro_state_change(struct super_block *sb) > > +{ > > + WRITE_ONCE(sb->s_readonly_remount, 1); > > + /* The barrier pairs with the barrier in mnt_is_readonly() */ > > + smp_wmb(); > > +} > > I'm not sure how this wmb pairs with the memory barrier in > mnt_is_readonly() to provide the correct behavior. The barrier in > mnt_is_readonly() happens after it checks s_readonly_remount, so > the s_readonly_remount in mnt_is_readonly is not ordered in any way > against this barrier. > > The barrier in mnt_is_readonly() ensures that the loads of SB_RDONLY > and MNT_READONLY are ordered after s_readonly_remount(), but we > don't change those flags until a long way after s_readonly_remount > is set. You are correct. I've reread the code and the ordering that matters is __mnt_want_write() on the read side and reconfigure_super() on the write side. In particular for RW->RO transition we must make sure that: If __mnt_want_write() does not see MNT_WRITE_HOLD set, it will see s_readonly_remount set. There is another set of barriers in those functions that makes sure sb_prepare_remount_readonly() sees incremented mnt_writers if __mnt_want_write() did not see MNT_WRITE_HOLD set, but that's a different story. Hence the barrier in sb_start_ro_state_change() pairs with smp_rmb() barrier in __mnt_want_write() before the mnt_is_readonly() check at the end of the function. I'll fix my patch, thanks for correction. > Hence if this is a ro->rw transistion, then I can see that racing on > s_readonly_remount being isn't an issue, because the mount/sb > flags will have SB_RDONLY/MNT_READONLY set and the correct thing > will be done (i.e. consider code between sb_start_ro_state_change() > and sb_end_ro_state_change() is RO). Yes, for the RO->RW the barrier in sb_prepare_remount_readonly() indeed pairs with the barrier in mnt_is_readonly(). It makes sure that if mnt_is_readonly() observes s_readonly_remount == 0, it will observe SB_RDONLY / MNT_READONLY still set. Honza
On Sat, Jun 17, 2023 at 05:05:25PM +0200, Christian Brauner wrote: > On Sat, Jun 17, 2023 at 09:33:42AM +1000, Dave Chinner wrote: > > On Fri, Jun 16, 2023 at 06:38:27PM +0200, Jan Kara wrote: > > > Provide helpers to set and clear sb->s_readonly_remount including > > > appropriate memory barriers. Also use this opportunity to document what > > > the barriers pair with and why they are needed. > > > > > > Suggested-by: Dave Chinner <david@fromorbit.com> > > > Signed-off-by: Jan Kara <jack@suse.cz> > > > > The helper conversion looks fine so from that perspective the patch > > looks good. > > > > However, I'm not sure the use of memory barriers is correct, though. > > > > IIUC, we want mnt_is_readonly() to return true when ever > > s_readonly_remount is set. Is that the behaviour we are trying to > > acheive for both ro->rw and rw->ro transactions? > > > > > --- > > > fs/internal.h | 26 ++++++++++++++++++++++++++ > > > fs/namespace.c | 10 ++++------ > > > fs/super.c | 17 ++++++----------- > > > include/linux/fs.h | 2 +- > > > 4 files changed, 37 insertions(+), 18 deletions(-) > > > > > > diff --git a/fs/internal.h b/fs/internal.h > > > index bd3b2810a36b..01bff3f6db79 100644 > > > --- a/fs/internal.h > > > +++ b/fs/internal.h > > > @@ -120,6 +120,32 @@ void put_super(struct super_block *sb); > > > extern bool mount_capable(struct fs_context *); > > > int sb_init_dio_done_wq(struct super_block *sb); > > > > > > +/* > > > + * Prepare superblock for changing its read-only state (i.e., either remount > > > + * read-write superblock read-only or vice versa). After this function returns > > > + * mnt_is_readonly() will return true for any mount of the superblock if its > > > + * caller is able to observe any changes done by the remount. This holds until > > > + * sb_end_ro_state_change() is called. > > > + */ > > > +static inline void sb_start_ro_state_change(struct super_block *sb) > > > +{ > > > + WRITE_ONCE(sb->s_readonly_remount, 1); > > > + /* The barrier pairs with the barrier in mnt_is_readonly() */ > > > + smp_wmb(); > > > +} > > > > I'm not sure how this wmb pairs with the memory barrier in > > mnt_is_readonly() to provide the correct behavior. The barrier in > > mnt_is_readonly() happens after it checks s_readonly_remount, so > > the s_readonly_remount in mnt_is_readonly is not ordered in any way > > against this barrier. > > > > The barrier in mnt_is_readonly() ensures that the loads of SB_RDONLY > > and MNT_READONLY are ordered after s_readonly_remount(), but we > > don't change those flags until a long way after s_readonly_remount > > is set. > > > > Hence if this is a ro->rw transistion, then I can see that racing on > > s_readonly_remount being isn't an issue, because the mount/sb > > flags will have SB_RDONLY/MNT_READONLY set and the correct thing > > will be done (i.e. consider code between sb_start_ro_state_change() > > and sb_end_ro_state_change() is RO). > > > > However, it's not obvious (to me, anyway) how this works at all for > > a rw->ro transition - if we race on s_readonly_remount being set > > then we'll consider the fs to still be read-write regardless of the > > smp_rmb() in mnt_is_readonly() because neither SB_RDONLY or > > MNT_READONLY are set at this point. > > Let me try and remember it all. I've documented a good portion of this > in the relevant functions but I should probably upstream some more > longer documentation blurb as well. > > A rw->ro transition happen in two ways. > > (1) A mount or mount tree is made read-only via > mount_setattr(MNT_ATTR_READONLY) or > mount(MS_BIND|MS_RDONLY|MS_REMOUNT). > (2) The filesystems/superblock is made read-only via fspick()+fsconfig() > or mount(MS_REMOUNT|MS_RDONLY). > > For both (1) and (2) we grab lock_mount_hash() in relevant codepaths > (because that's required for any vfsmount->mnt_flags changes) and then > call mnt_hold_writers(). > > mnt_hold_writers() will first raise MNT_WRITE_HOLD in @mnt->mnt_flags > before checking the write counter of that mount to see whether there are > any active writers on that mount. If there are any active writers we'll > fail mnt_hold_writers() and the whole rw->ro transition. > > A memory barrier is used to order raising MNT_WRITE_HOLD against the > increment of the write counter of that mount in __mnt_want_write(). > If __mnt_want_write() detects that MNT_WRITE_HOLD has been set after > it incremented the write counter it will spin until MNT_WRITE_HOLD is > cleared via mnt_unhold_writers(). This uses another memory barrier to > ensure ordering with the mnt_is_readonly() check in __mnt_want_write(). > > __mnt_want_write() doesn't know about the ro/rw state of the mount at > all until MNT_WRITE_HOLD has cleared. Then it calls mnt_is_readonly(). > > If the mount did indeed transition from rw->ro after MNT_WRITE_HOLD was > cleared __mnt_want_write() will back off. If not write access to the > mount is granted. > > A superblock rw->ro transition is done the same way. It also requires > mnt_hold_writers() to be done. This is done in > sb_prepare_remount_readonly() which is called in reconfigure_super(). > > Only after mnt_hold_writers() has been raised successfully on every > mount of that filesystem (i.e., all bind mounts) will > sb->s_readonly_remount be set. After MNT_WRITE_HOLD is cleared and > mnt_is_readonly() is called sb->s_readonly_remount is guaranteed to be > visible or MNT_READONLY or SB_RDONLY are visible. The memory barrier in > sb->s_readonly_remount orders it against reading sb->s_flags. It doesn't > protect/order the rw->ro transition itself. > > (The only exception is an emergency read-only remount where we don't > know what state the fs is in and don't care for any active writers on > that superblock so omit wading through all the mounts of that > filesystem. But that's only doable from withing the kernel via > SB_FORCE.) > > Provided I understand your question/concern correctly. Thank's for the info, Christian. Yes, you did understand my concern: that the memory barriers are poorly and/or incorrectly documented. :/ Nothing I read in the code around the s_readonly_remount variable or mnt_is_readonly() indicated that there was any serialisation around __mnt_want_write() and MNT_WRITE_HOLD. I was completely unable to make that jump from the code as it was written, or from the patch that Jan proposed. Now that I see mnt_hold_writers() and mnt_unhold_writers(), I see more memory barriers, but I don't see any reference to sb->s_readonly_remount in that code or the comments. Given that it appears that these memory barriers are deeply intertwined, I think that the mnt_[un]hold_writers() helpers also need to have their documentation updated to indicate how they interact with s_readonly_remount.. And for sb->s_readonly_remount helpers, some form of the above explanation also needs to be added, or maybe just a pointer to the documentation on the mnt_[un]hold_writers() helpers that also explains how sb->s_readonly_remount factors into the memory barriers in those helpers... Cheers, Dave.
On Mon, Jun 19, 2023 at 01:05:26PM +0200, Jan Kara wrote: > On Sat 17-06-23 09:33:42, Dave Chinner wrote: > > On Fri, Jun 16, 2023 at 06:38:27PM +0200, Jan Kara wrote: > > > Provide helpers to set and clear sb->s_readonly_remount including > > > appropriate memory barriers. Also use this opportunity to document what > > > the barriers pair with and why they are needed. > > > > > > Suggested-by: Dave Chinner <david@fromorbit.com> > > > Signed-off-by: Jan Kara <jack@suse.cz> > > > > The helper conversion looks fine so from that perspective the patch > > looks good. > > > > However, I'm not sure the use of memory barriers is correct, though. > > AFAICS, the barriers are correct but my documentation was not ;) > Christian's reply has all the details but maybe let me attempt a bit more > targetted reply here. *nod* > > > IIUC, we want mnt_is_readonly() to return true when ever > > s_readonly_remount is set. Is that the behaviour we are trying to > > acheive for both ro->rw and rw->ro transactions? > > Yes. But what matters is the ordering of s_readonly_remount checking wrt > other flags. See below. > > > > --- > > > fs/internal.h | 26 ++++++++++++++++++++++++++ > > > fs/namespace.c | 10 ++++------ > > > fs/super.c | 17 ++++++----------- > > > include/linux/fs.h | 2 +- > > > 4 files changed, 37 insertions(+), 18 deletions(-) > > > > > > diff --git a/fs/internal.h b/fs/internal.h > > > index bd3b2810a36b..01bff3f6db79 100644 > > > --- a/fs/internal.h > > > +++ b/fs/internal.h > > > @@ -120,6 +120,32 @@ void put_super(struct super_block *sb); > > > extern bool mount_capable(struct fs_context *); > > > int sb_init_dio_done_wq(struct super_block *sb); > > > > > > +/* > > > + * Prepare superblock for changing its read-only state (i.e., either remount > > > + * read-write superblock read-only or vice versa). After this function returns > > > + * mnt_is_readonly() will return true for any mount of the superblock if its > > > + * caller is able to observe any changes done by the remount. This holds until > > > + * sb_end_ro_state_change() is called. > > > + */ > > > +static inline void sb_start_ro_state_change(struct super_block *sb) > > > +{ > > > + WRITE_ONCE(sb->s_readonly_remount, 1); > > > + /* The barrier pairs with the barrier in mnt_is_readonly() */ > > > + smp_wmb(); > > > +} > > > > I'm not sure how this wmb pairs with the memory barrier in > > mnt_is_readonly() to provide the correct behavior. The barrier in > > mnt_is_readonly() happens after it checks s_readonly_remount, so > > the s_readonly_remount in mnt_is_readonly is not ordered in any way > > against this barrier. > > > > The barrier in mnt_is_readonly() ensures that the loads of SB_RDONLY > > and MNT_READONLY are ordered after s_readonly_remount(), but we > > don't change those flags until a long way after s_readonly_remount > > is set. > > You are correct. I've reread the code and the ordering that matters is > __mnt_want_write() on the read side and reconfigure_super() on the write > side. In particular for RW->RO transition we must make sure that: If > __mnt_want_write() does not see MNT_WRITE_HOLD set, it will see > s_readonly_remount set. There is another set of barriers in those functions > that makes sure sb_prepare_remount_readonly() sees incremented mnt_writers > if __mnt_want_write() did not see MNT_WRITE_HOLD set, but that's a > different story. Yup, as I said to Christian, there is nothing in the old or new code that even hints at an interaction with MNT_WRITE_HOLD or __mnt_want_write() here. I couldn't make that jump from reading the code, and so the memory barrier placement made no sense at all. > Hence the barrier in sb_start_ro_state_change() pairs with > smp_rmb() barrier in __mnt_want_write() before the > mnt_is_readonly() check at the end of the function. I'll fix my > patch, thanks for correction. Please also update the mnt_[un]hold_writers() and __mnt_want_write() documentation to also point at the new sb_start/end_ro_state_change helpers, as all the memory barriers in this code are tightly coupled. Thanks! -Dave.
diff --git a/fs/internal.h b/fs/internal.h index bd3b2810a36b..01bff3f6db79 100644 --- a/fs/internal.h +++ b/fs/internal.h @@ -120,6 +120,32 @@ void put_super(struct super_block *sb); extern bool mount_capable(struct fs_context *); int sb_init_dio_done_wq(struct super_block *sb); +/* + * Prepare superblock for changing its read-only state (i.e., either remount + * read-write superblock read-only or vice versa). After this function returns + * mnt_is_readonly() will return true for any mount of the superblock if its + * caller is able to observe any changes done by the remount. This holds until + * sb_end_ro_state_change() is called. + */ +static inline void sb_start_ro_state_change(struct super_block *sb) +{ + WRITE_ONCE(sb->s_readonly_remount, 1); + /* The barrier pairs with the barrier in mnt_is_readonly() */ + smp_wmb(); +} + +/* + * Ends section changing read-only state of the superblock. After this function + * returns if mnt_is_readonly() returns false, the caller will be able to + * observe all the changes remount did to the superblock. + */ +static inline void sb_end_ro_state_change(struct super_block *sb) +{ + /* The barrier pairs with the barrier in mnt_is_readonly() */ + smp_wmb(); + WRITE_ONCE(sb->s_readonly_remount, 0); +} + /* * open.c */ diff --git a/fs/namespace.c b/fs/namespace.c index 54847db5b819..2eff091df549 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -309,9 +309,9 @@ static unsigned int mnt_get_writers(struct mount *mnt) static int mnt_is_readonly(struct vfsmount *mnt) { - if (mnt->mnt_sb->s_readonly_remount) + if (READ_ONCE(mnt->mnt_sb->s_readonly_remount)) return 1; - /* Order wrt setting s_flags/s_readonly_remount in do_remount() */ + /* Order wrt barriers in sb_{start,end}_ro_state_change() */ smp_rmb(); return __mnt_is_readonly(mnt); } @@ -588,10 +588,8 @@ int sb_prepare_remount_readonly(struct super_block *sb) if (!err && atomic_long_read(&sb->s_remove_count)) err = -EBUSY; - if (!err) { - sb->s_readonly_remount = 1; - smp_wmb(); - } + if (!err) + sb_start_ro_state_change(sb); list_for_each_entry(mnt, &sb->s_mounts, mnt_instance) { if (mnt->mnt.mnt_flags & MNT_WRITE_HOLD) mnt->mnt.mnt_flags &= ~MNT_WRITE_HOLD; diff --git a/fs/super.c b/fs/super.c index 6cd64961aa07..8a39902b859f 100644 --- a/fs/super.c +++ b/fs/super.c @@ -944,8 +944,7 @@ int reconfigure_super(struct fs_context *fc) */ if (remount_ro) { if (force) { - sb->s_readonly_remount = 1; - smp_wmb(); + sb_start_ro_state_change(sb); } else { retval = sb_prepare_remount_readonly(sb); if (retval) @@ -953,12 +952,10 @@ int reconfigure_super(struct fs_context *fc) } } else if (remount_rw) { /* - * We set s_readonly_remount here to protect filesystem's - * reconfigure code from writes from userspace until - * reconfigure finishes. + * Protect filesystem's reconfigure code from writes from + * userspace until reconfigure finishes. */ - sb->s_readonly_remount = 1; - smp_wmb(); + sb_start_ro_state_change(sb); } if (fc->ops->reconfigure) { @@ -974,9 +971,7 @@ int reconfigure_super(struct fs_context *fc) WRITE_ONCE(sb->s_flags, ((sb->s_flags & ~fc->sb_flags_mask) | (fc->sb_flags & fc->sb_flags_mask))); - /* Needs to be ordered wrt mnt_is_readonly() */ - smp_wmb(); - sb->s_readonly_remount = 0; + sb_end_ro_state_change(sb); /* * Some filesystems modify their metadata via some other path than the @@ -991,7 +986,7 @@ int reconfigure_super(struct fs_context *fc) return 0; cancel_readonly: - sb->s_readonly_remount = 0; + sb_end_ro_state_change(sb); return retval; } diff --git a/include/linux/fs.h b/include/linux/fs.h index 133f0640fb24..ede51d60d124 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1242,7 +1242,7 @@ struct super_block { */ atomic_long_t s_fsnotify_connectors; - /* Being remounted read-only */ + /* Read-only state of the superblock is being changed */ int s_readonly_remount; /* per-sb errseq_t for reporting writeback errors via syncfs */
Provide helpers to set and clear sb->s_readonly_remount including appropriate memory barriers. Also use this opportunity to document what the barriers pair with and why they are needed. Suggested-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Jan Kara <jack@suse.cz> --- fs/internal.h | 26 ++++++++++++++++++++++++++ fs/namespace.c | 10 ++++------ fs/super.c | 17 ++++++----------- include/linux/fs.h | 2 +- 4 files changed, 37 insertions(+), 18 deletions(-)