[RFC,v6,6/9] vfs: Add sb_want_write() function to get vfsmount from a given sb.

From: Qu Wenruo <quwenruo@cn.fujitsu.com>

*** Please DON'T merge this patch, it's only for discussion purposes ***

Some filesystems (only btrfs so far) expose sysfs interfaces that modify
on-disk data.
In the normal file operation path, mnt_want_write_file() can protect
such an operation. A change made through sysfs, however, is not bound to
any file in the filesystem.

So introduce a new sb_want_write() to provide that protection against a
super block. It acts much like mnt_want_write(), but returns success if
the super block is read-write.

Since the sysfs handler doesn't go through the normal vfsmount, it won't
increase the refcount of the vfsmount. So even while an sb_want_write()
is waiting for the sb to be unfrozen, the fs can still be unmounted
without complaint. This leaves the module unable to be removed, and the
user has no way to find out what's wrong until

There are several possible strategies to solve this problem:
1) Extra check on the last-instance umount of a sb
This is the method this patch uses.
It seems valid enough: since we want write protection on a sb, it is
fine for the sb as long as there is *ANY* mount instance.
Problem 1.1)
lsof and other tools won't help if an sb_want_write() on a frozen fs
makes it impossible to unmount.

Problem 1.2)
When namespaces are involved, things get more complicated,
as in the following case:
	Alice				|		Bob
Mount devA on /mnt1 in her ns		| Mount devA on /mnt2/ in his ns
freeze /mnt1				|
sb_want_write() (waiting)		|
umount /mnt1 (success since there is 	|
another mount instance)			|
					| umount /mnt2 (fail since there
					| is sb_want_write() waiting)

Now Alice can't thaw the fs, since she no longer has a mount point for it.

2) Don't allow any umount of the sb while an sb_want_write() is pending.
A more aggressive approach, proposed by Miao Xie.
It can't resolve problem 1.1), but it does solve problem 1.2).
However, it introduces a new problem, as in the following case:
	Alice
Mount devA on /mnt1
freeze /mnt1
sb_want_write() (waiting)
Mount devA on /mnt2 and /mnt3

None of /mnt[123] can be unmounted, yet new mounts can still be created.

3) sb_want_write() doesn't make any sense and breaks VFS rules!
Actions that change on-disk data should not be tunable through sysfs,
and an sb_want_write() that bypasses all the VFS checks is just evil.
And for btrfs, we already have an ioctl to set the label, so why bother
adding a new sysfs interface to do it again?

Although I use method 1), I am still not certain which method is the
correct one.

So any advice is welcome.

Thanks,
Qu

Cc: linux-fsdevel <linux-fsdevel@vger.kernel.org>
Signed-off-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
Changelog:
v4:
  Newly introduced.
v5:
  Rename to sb_want_write() and take the sb as the parameter.
v6:
  Add a better check when umounting the last instance of a super block, so
  an sb_want_write() waiting for the fs to be unfrozen or for a transaction
  will prevent umount.
---
 fs/namespace.c        | 83 +++++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/fs.h    |  9 ++++++
 include/linux/mount.h |  2 ++
 3 files changed, 94 insertions(+)
