diff mbox series

fs: unset MNT_WRITE_HOLD on failure

Message ID 20220420131925.2464685-1-brauner@kernel.org (mailing list archive)
State New, archived
Headers show
Series fs: unset MNT_WRITE_HOLD on failure | expand

Commit Message

Christian Brauner April 20, 2022, 1:19 p.m. UTC
After mnt_hold_writers() has been called we will always have set MNT_WRITE_HOLD
and consequently we always need to pair mnt_hold_writers() with
mnt_unhold_writers(). After the recent cleanup in [1] where Al switched from a
do-while to a for loop the cleanup currently fails to unset MNT_WRITE_HOLD for
the first mount that was changed. Fix this and make sure that the first mount
will be cleaned up and add some comments to make it more obvious.

Reported-by: syzbot+10a16d1c43580983f6a2@syzkaller.appspotmail.com
Reported-by: syzbot+306090cfa3294f0bbfb3@syzkaller.appspotmail.com
Fixes: e257039f0fc7 ("mount_setattr(): clean the control flow and calling conventions") [1]
Link: https://lore.kernel.org/lkml/0000000000007cc21d05dd0432b8@google.com
Link: https://lore.kernel.org/lkml/00000000000080e10e05dd043247@google.com
Cc: Hillf Danton <hdanton@sina.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
---
This should fix the syzbot issue. This is only relevant for making a
mount or mount tree read-only:
1. successul recursive read-only mount tree change:
   Cleanup loop isn't executed.
2. failed recursive read-only mount tree change:
   m will point to the mount we failed to handle. The cleanup loop will
   run until p == m and then terminate.
3. successful single read-only mount change:
   Cleanup loop won't be executed.
4. failed single read-only mount change:
   m will point to mnt and the cleanup loop will terminate if p == m.
I don't think there's any other weird corner cases since we now that
MNT_WRITE_HOLD can only have been set by us as it requires
lock_mount_hash() which we hold. So unconditionally unsetting it is
fine. But please make sure to take a close look at the changed loop.
---
 fs/namespace.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)


base-commit: b2d229d4ddb17db541098b83524d901257e93845

Comments

Christian Brauner April 21, 2022, 1:47 p.m. UTC | #1
On Wed, Apr 20, 2022 at 03:19:25PM +0200, Christian Brauner wrote:
> After mnt_hold_writers() has been called we will always have set MNT_WRITE_HOLD
> and consequently we always need to pair mnt_hold_writers() with
> mnt_unhold_writers(). After the recent cleanup in [1] where Al switched from a
> do-while to a for loop the cleanup currently fails to unset MNT_WRITE_HOLD for
> the first mount that was changed. Fix this and make sure that the first mount
> will be cleaned up and add some comments to make it more obvious.
> 
> Reported-by: syzbot+10a16d1c43580983f6a2@syzkaller.appspotmail.com
> Reported-by: syzbot+306090cfa3294f0bbfb3@syzkaller.appspotmail.com
> Fixes: e257039f0fc7 ("mount_setattr(): clean the control flow and calling conventions") [1]
> Link: https://lore.kernel.org/lkml/0000000000007cc21d05dd0432b8@google.com
> Link: https://lore.kernel.org/lkml/00000000000080e10e05dd043247@google.com
> Cc: Hillf Danton <hdanton@sina.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Al Viro <viro@zeniv.linux.org.uk>
> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
> ---
> This should fix the syzbot issue. This is only relevant for making a
> mount or mount tree read-only:
> 1. successul recursive read-only mount tree change:
>    Cleanup loop isn't executed.
> 2. failed recursive read-only mount tree change:
>    m will point to the mount we failed to handle. The cleanup loop will
>    run until p == m and then terminate.
> 3. successful single read-only mount change:
>    Cleanup loop won't be executed.
> 4. failed single read-only mount change:
>    m will point to mnt and the cleanup loop will terminate if p == m.
> I don't think there's any other weird corner cases since we now that
> MNT_WRITE_HOLD can only have been set by us as it requires
> lock_mount_hash() which we hold. So unconditionally unsetting it is
> fine. But please make sure to take a close look at the changed loop.
> ---

Unless I hear objections I'll route this upstream before -rc4 to get
this fixed because it's pretty trivial to trigger this.
Christoph Hellwig April 21, 2022, 2:43 p.m. UTC | #2
Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>
diff mbox series

Patch

diff --git a/fs/namespace.c b/fs/namespace.c
index a0a36bfa3aa0..afe2b64b14f1 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -4058,10 +4058,22 @@  static int mount_setattr_prepare(struct mount_kattr *kattr, struct mount *mnt)
 	if (err) {
 		struct mount *p;
 
-		for (p = mnt; p != m; p = next_mnt(p, mnt)) {
+		/*
+		 * If we had to call mnt_hold_writers() MNT_WRITE_HOLD will
+		 * be set in @mnt_flags. The loop unsets MNT_WRITE_HOLD for all
+		 * mounts and needs to take care to include the first mount.
+		 */
+		for (p = mnt; p; p = next_mnt(p, mnt)) {
 			/* If we had to hold writers unblock them. */
 			if (p->mnt.mnt_flags & MNT_WRITE_HOLD)
 				mnt_unhold_writers(p);
+
+			/*
+			 * We're done once the first mount we changed got
+			 * MNT_WRITE_HOLD unset.
+			 */
+			if (p == m)
+				break;
 		}
 	}
 	return err;