Message ID | 87oafkc5kp.fsf@notabene.neil.brown.name (mailing list archive) |
---|---|
State | New, archived |
2015-10-27 20:25 GMT-02:00 Neil Brown <neil@brown.name>:
> snif
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 611b66d73e80..e96c53590f72 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -5621,6 +5621,23 @@ static void btrfs_dentry_release(struct dentry *dentry)

Signed-off-by: ?

Albino
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

On Wed, Oct 28, 2015 at 07:25:10AM +0900, Neil Brown wrote:
>
> If you create a subvolume in btrfs and access it (by name) without
> mounting it, then the subvolume looks like a separate mount to some
> extent, returning a different st_dev to stat(), but it doesn't look like
> a separate mount in that it isn't listed in /proc/mounts. This
> inconsistency can confuse tools.
>
> This patch causes these subvolumes to become separate mounts by using
> the VFS' automount functionality, much like NFS uses automount when it
> discovered mountpoints on the server.
>
> The VFS currently makes it impossible to auto-mount a directory on to itself
> (i.e. a bind mount). For NFS this isn't a problem as a new superblock
> is created for the child filesystem so there are two separate dentries
> (and inodes) for the one directory: one in the parent filesystem, one in
> the child (note that the two superblocks share a common connection to
> the server so there is still a lot of commonality).
>
> BTRFS has chosen instead to use a single superblock for all subvolumes.

Naive question: was there a reason for that choice?

--b.

> This results in a single dentry for the subvol-root. A dentry which
> must be auto-mounted on itself.

On Mon, Nov 02, 2015 at 03:50:12PM -0500, J. Bruce Fields wrote:
> On Wed, Oct 28, 2015 at 07:25:10AM +0900, Neil Brown wrote:
> >
> > If you create a subvolume in btrfs and access it (by name) without
> > mounting it, then the subvolume looks like a separate mount to some
> > extent, returning a different st_dev to stat(), but it doesn't look like
> > a separate mount in that it isn't listed in /proc/mounts. This
> > inconsistency can confuse tools.
> >
> > This patch causes these subvolumes to become separate mounts by using
> > the VFS' automount functionality, much like NFS uses automount when it
> > discovered mountpoints on the server.
> >
> > The VFS currently makes it impossible to auto-mount a directory on to itself
> > (i.e. a bind mount). For NFS this isn't a problem as a new superblock
> > is created for the child filesystem so there are two separate dentries
> > (and inodes) for the one directory: one in the parent filesystem, one in
> > the child (note that the two superblocks share a common connection to
> > the server so there is still a lot of commonality).
> >
> > BTRFS has chosen instead to use a single superblock for all subvolumes.
>
> Naive question: was there a reason for that choice?

They are really all part of the same FS, the single super better fits.
Or said another way, it felt like there would be dramatically more duct
tape around supers-per-subvolume than there was abusing st_dev.

Neil's patch came up after I told him a few of us had tried to do the
same thing and failed to find clean vfs changes to make it possible...he
took it as a challenge. Now I have to remember what it was about our
past attempts that I didn't like.

I'll test this and queue for 4.5 if it all works out, thanks Neil!

-chris

On Tue, Nov 03 2015, Chris Mason wrote:

> On Mon, Nov 02, 2015 at 03:50:12PM -0500, J. Bruce Fields wrote:
>> On Wed, Oct 28, 2015 at 07:25:10AM +0900, Neil Brown wrote:
>> >
>> > If you create a subvolume in btrfs and access it (by name) without
>> > mounting it, then the subvolume looks like a separate mount to some
>> > extent, returning a different st_dev to stat(), but it doesn't look like
>> > a separate mount in that it isn't listed in /proc/mounts. This
>> > inconsistency can confuse tools.
>> >
>> > This patch causes these subvolumes to become separate mounts by using
>> > the VFS' automount functionality, much like NFS uses automount when it
>> > discovered mountpoints on the server.
>> >
>> > The VFS currently makes it impossible to auto-mount a directory on to itself
>> > (i.e. a bind mount). For NFS this isn't a problem as a new superblock
>> > is created for the child filesystem so there are two separate dentries
>> > (and inodes) for the one directory: one in the parent filesystem, one in
>> > the child (note that the two superblocks share a common connection to
>> > the server so there is still a lot of commonality).
>> >
>> > BTRFS has chosen instead to use a single superblock for all subvolumes.
>>
>> Naive question: was there a reason for that choice?
>
> They are really all part of the same FS, the single super better fits.
> Or said another way, it felt like there would be dramatically more duct
> tape around supers-per-subvolume than there was abusing st_dev.
>
> Neil's patch came up after I told him a few of us had tried to do the
> same thing and failed to find clean vfs changes to make it possible...he
> took it as a challenge. Now I have to remember what it was about our
> past attempts that I didn't like.
>
> I'll test this and queue for 4.5 if it all works out, thanks Neil!

I'd rather resend with proper documentation updates and s-o-b before it
gets queued if that is OK.
So once you are happy, please let me know and I'll do it "properly".

Thanks,
NeilBrown

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 611b66d73e80..e96c53590f72 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -5621,6 +5621,23 @@ static void btrfs_dentry_release(struct dentry *dentry)
 	kfree(dentry->d_fsdata);
 }
 
+static int btrfs_dentry_manage(struct dentry *dentry, bool in_rcu)
+{
+	/* This is a 'rebind automount'. So only trigger automount
+	 * when the dentry isn't the root of a mountpoint.
+	 */
+	return 1;
+}
+
+static struct vfsmount *btrfs_dentry_automount(struct path *path)
+{
+	struct vfsmount *mnt;
+	mnt = clone_private_mount(path);
+	if (!IS_ERR(mnt))
+		mntget(mnt);
+	return mnt;
+}
+
 static struct dentry *btrfs_lookup(struct inode *dir, struct dentry *dentry,
 				   unsigned int flags)
 {
@@ -5633,6 +5650,8 @@ static struct dentry *btrfs_lookup(struct inode *dir, struct dentry *dentry,
 		else
 			return ERR_CAST(inode);
 	}
+	if (inode && inode->i_ino == BTRFS_FIRST_FREE_OBJECTID)
+		dentry->d_flags |= DCACHE_MANAGE_TRANSIT | DCACHE_NEED_AUTOMOUNT;
 
 	return d_splice_alias(inode, dentry);
 }
@@ -9990,4 +10009,6 @@ static const struct inode_operations btrfs_symlink_inode_operations = {
 const struct dentry_operations btrfs_dentry_operations = {
 	.d_delete	= btrfs_dentry_delete,
 	.d_release	= btrfs_dentry_release,
+	.d_manage	= btrfs_dentry_manage,
+	.d_automount	= btrfs_dentry_automount,
 };
diff --git a/fs/namei.c b/fs/namei.c
index 33e9495a3129..07e4bbbadae1 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1178,6 +1178,7 @@ static int follow_managed(struct path *path, struct nameidata *nd)
 	       unlikely(managed != 0)) {
 		/* Allow the filesystem to manage the transit without i_mutex
 		 * being held. */
+		ret = 0;
 		if (managed & DCACHE_MANAGE_TRANSIT) {
 			BUG_ON(!path->dentry->d_op);
 			BUG_ON(!path->dentry->d_op->d_manage);
@@ -1207,7 +1208,12 @@ static int follow_managed(struct path *path, struct nameidata *nd)
 
 		/* Handle an automount point */
 		if (managed & DCACHE_NEED_AUTOMOUNT) {
-			ret = follow_automount(path, nd, &need_mntput);
+			if (ret == 1 && path->mnt->mnt_root == path->dentry) {
+				/* only automount when not the root */
+				ret = 0;
+				break;
+			} else
+				ret = follow_automount(path, nd, &need_mntput);
 			if (ret < 0)
 				break;
 			continue;
@@ -1219,7 +1225,7 @@ static int follow_managed(struct path *path, struct nameidata *nd)
 
 	if (need_mntput && path->mnt == mnt)
 		mntput(path->mnt);
-	if (ret == -EISDIR)
+	if (ret == -EISDIR || ret > 0)
 		ret = 0;
 	if (need_mntput)
 		nd->flags |= LOOKUP_JUMPED;
diff --git a/fs/namespace.c b/fs/namespace.c
index 0570729c87fd..adfcb125bef0 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2431,7 +2431,8 @@ int finish_automount(struct vfsmount *m, struct path *path)
 	BUG_ON(mnt_get_count(mnt) < 2);
 
 	if (m->mnt_sb == path->mnt->mnt_sb &&
-	    m->mnt_root == path->dentry) {
+	    m->mnt_root == path->dentry &&
+	    path->mnt->mnt_root == path->dentry) {
 		err = -ELOOP;
 		goto fail;
 	}
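
To make the control flow of the fs/namei.c and fs/namespace.c hunks easier to follow, here is a toy userspace model of the two decisions the patch changes. It is a sketch under stated assumptions, not kernel code: `struct toy_path`, `automount_decision` and `would_loop` are hypothetical names, dentries and mount roots are reduced to integer ids, and everything else that follow_managed() and finish_automount() do is left out.

```c
#include <stdbool.h>

/* Toy stand-in for struct path: integer ids replace the pointers. */
struct toy_path {
	int mnt_root;	/* id of the dentry at the root of the current mount */
	int dentry;	/* id of the dentry being walked */
};

enum toy_action { TOY_STOP, TOY_AUTOMOUNT };

/* Models the patched DCACHE_NEED_AUTOMOUNT branch of follow_managed():
 * when ->d_manage returned 1 (the 'rebind automount' case) and the
 * dentry is already the root of its mount, the walk stops; otherwise
 * an automount is attempted. */
static enum toy_action automount_decision(bool need_automount,
					  int d_manage_ret,
					  const struct toy_path *p)
{
	if (!need_automount)
		return TOY_STOP;
	if (d_manage_ret == 1 && p->mnt_root == p->dentry)
		return TOY_STOP;
	return TOY_AUTOMOUNT;
}

/* Models the patched -ELOOP test in finish_automount(): a new mount on
 * the same superblock whose root is the dentry itself is rejected only
 * when the path is already at the root of its current mount. */
static bool would_loop(bool same_sb, int new_mnt_root,
		       const struct toy_path *p)
{
	return same_sb &&
	       new_mnt_root == p->dentry &&
	       p->mnt_root == p->dentry;
}
```

In this model, a subvolume root with id 7 reached through a parent mount rooted at id 1 gives `TOY_AUTOMOUNT` with no loop; once the walk stands on the new mount (mount root and dentry both 7), the decision becomes `TOY_STOP`, which is what keeps the rebind from recursing.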