mbox series

[v2,0/2] Add support to boot virtiofs and 9pfs as rootfs

Message ID 20210614174454.903555-1-vgoyal@redhat.com (mailing list archive)
Headers show
Series Add support to boot virtiofs and 9pfs as rootfs | expand

Message

Vivek Goyal June 14, 2021, 5:44 p.m. UTC
Hi,

We want to be able to compile in virtiofs/9pfs in kernel and then
boot kernel and mount virtiofs/9pfs as root filesystem.

Currently there does not seem to be any good way to be able to do
that. There seem to be some hacky ways like prefixing filesystem
tag with "mtd" or naming the filesystem tag as "/dev/root" to
mount viritofs.

Both viritofs and 9pfs have the notion of a "tag" to mount a filesystem
and they take this "tag" as a source argument of the mount. Filesystem
understands how to handle the tag.

Current code already has hooks to mount mtd/ubi/cifs/nfs root
filesystems (apart of regular block based filesystems). So intead
of creating two separate hooks for two filesystems, I have tried
creating a hook for tag based filesystems. And now both the filesystems
benefit from it.

This is generic enough that I think many more use cases might be
able to take advantage of it down the line.

Vivek Goyal (2):
  init/do_mounts.c: Add a path to boot from tag based filesystems
  init/do_mounts.c: Add 9pfs to the list of tag based filesystems

 init/do_mounts.c | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

Comments

Christoph Hellwig June 17, 2021, 10:14 a.m. UTC | #1
Why not something like the version below that should work for all nodev
file systems?

diff --git a/init/do_mounts.c b/init/do_mounts.c
index 74aede860de7..3c5676603fef 100644
--- a/init/do_mounts.c
+++ b/init/do_mounts.c
@@ -530,6 +530,39 @@ static int __init mount_cifs_root(void)
 }
 #endif
 
+static int __init mount_nodev_root(void)
+{
+	struct file_system_type *fs = get_fs_type(root_fs_names);
+	char *fs_names, *p;
+	int err = -ENODEV;
+
+	if (!fs)
+		goto out;
+	if (fs->fs_flags & FS_REQUIRES_DEV)
+		goto out_put_filesystem;
+
+	fs_names = (void *)__get_free_page(GFP_KERNEL);
+	if (!fs_names)
+		goto out_put_filesystem;
+	get_fs_names(fs_names);
+
+	for (p = fs_names; *p; p += strlen(p) + 1) {
+		err = do_mount_root(root_device_name, p, root_mountflags,
+					root_mount_data);
+		if (!err)
+			break;
+		if (err != -EACCES && err != -EINVAL)
+			panic("VFS: Unable to mount root \"%s\" (%s), err=%d\n",
+				      root_device_name, p, err);
+	}
+
+	free_page((unsigned long)fs_names);
+out_put_filesystem:
+	put_filesystem(fs);
+out:
+	return err;
+}
+
 void __init mount_root(void)
 {
 #ifdef CONFIG_ROOT_NFS
@@ -546,6 +579,8 @@ void __init mount_root(void)
 		return;
 	}
 #endif
+	if (ROOT_DEV == 0 && mount_nodev_root() == 0)
+		return;
 #ifdef CONFIG_BLOCK
 	{
 		int err = create_dev("/dev/root", ROOT_DEV);
Vivek Goyal June 17, 2021, 1:30 p.m. UTC | #2
On Thu, Jun 17, 2021 at 11:14:00AM +0100, Christoph Hellwig wrote:
> Why not something like the version below that should work for all nodev
> file systems?

Hi Christoph,

Thanks for this patch. It definitely looks much better. I had a fear
of breaking something if I were to go through this path of using
FS_REQUIRES_DEV.

This patch works for me with "root=myfs rootfstype=virtiofs rw". Have
few thoughts inline.
> 
> diff --git a/init/do_mounts.c b/init/do_mounts.c
> index 74aede860de7..3c5676603fef 100644
> --- a/init/do_mounts.c
> +++ b/init/do_mounts.c
> @@ -530,6 +530,39 @@ static int __init mount_cifs_root(void)
>  }
>  #endif
>  
> +static int __init mount_nodev_root(void)
> +{
> +	struct file_system_type *fs = get_fs_type(root_fs_names);

get_fs_type() assumes root_fs_names is not null. So if I pass
"root=myfs rw", it crashes with null pointer dereference.


> +	char *fs_names, *p;
> +	int err = -ENODEV;
> +
> +	if (!fs)
> +		goto out;
> +	if (fs->fs_flags & FS_REQUIRES_DEV)
> +		goto out_put_filesystem;
> +
> +	fs_names = (void *)__get_free_page(GFP_KERNEL);
> +	if (!fs_names)
> +		goto out_put_filesystem;
> +	get_fs_names(fs_names);

I am wondering what use case we are trying to address by calling
get_fs_names() and trying do_mount_root() on all filesystems
returned by get_fs_names(). I am assuming following use cases
you have in mind.

A. User passes a single filesystem in rootfstype.
   
   root=myfs rootfstype=virtiofs rw

B. User passes multiple filesystems in rootfstype and kernel tries all
   of them one after the other

   root=myfs, rootfstype=9p,virtiofs rw

C. User does not pass a filesystem type at all. And kernel will get a
   list of in-built filesystems and will try these one after the other.

   root=myfs rw

If that's the thought, will it make sense to call get_fs_names() first
and then inside the for loop call get_fs_type() and try mounting
only if FS_REQUIRES_DEV is not set, otherwise skip and move onto th
next filesystem in the list (fs_names).

Thanks
Vivek

> +
> +	for (p = fs_names; *p; p += strlen(p) + 1) {
> +		err = do_mount_root(root_device_name, p, root_mountflags,
> +					root_mount_data);
> +		if (!err)
> +			break;
> +		if (err != -EACCES && err != -EINVAL)
> +			panic("VFS: Unable to mount root \"%s\" (%s), err=%d\n",
> +				      root_device_name, p, err);
> +	}
> +
> +	free_page((unsigned long)fs_names);
> +out_put_filesystem:
> +	put_filesystem(fs);
> +out:
> +	return err;
> +}
> +
>  void __init mount_root(void)
>  {
>  #ifdef CONFIG_ROOT_NFS
> @@ -546,6 +579,8 @@ void __init mount_root(void)
>  		return;
>  	}
>  #endif
> +	if (ROOT_DEV == 0 && mount_nodev_root() == 0)
> +		return;
>  #ifdef CONFIG_BLOCK
>  	{
>  		int err = create_dev("/dev/root", ROOT_DEV);
>
Christoph Hellwig June 17, 2021, 2:25 p.m. UTC | #3
On Thu, Jun 17, 2021 at 09:30:52AM -0400, Vivek Goyal wrote:
> > +static int __init mount_nodev_root(void)
> > +{
> > +	struct file_system_type *fs = get_fs_type(root_fs_names);
> 
> get_fs_type() assumes root_fs_names is not null. So if I pass
> "root=myfs rw", it crashes with null pointer dereference.

Ok, I'll need to fix that.

> > +	int err = -ENODEV;
> > +
> > +	if (!fs)
> > +		goto out;
> > +	if (fs->fs_flags & FS_REQUIRES_DEV)
> > +		goto out_put_filesystem;
> > +
> > +	fs_names = (void *)__get_free_page(GFP_KERNEL);
> > +	if (!fs_names)
> > +		goto out_put_filesystem;
> > +	get_fs_names(fs_names);
> 
> I am wondering what use case we are trying to address by calling
> get_fs_names() and trying do_mount_root() on all filesystems
> returned by get_fs_names(). I am assuming following use cases
> you have in mind.
> 
> A. User passes a single filesystem in rootfstype.
>    
>    root=myfs rootfstype=virtiofs rw
> 
> B. User passes multiple filesystems in rootfstype and kernel tries all
>    of them one after the other
> 
>    root=myfs, rootfstype=9p,virtiofs rw
> 
> C. User does not pass a filesystem type at all. And kernel will get a
>    list of in-built filesystems and will try these one after the other.
> 
>    root=myfs rw
> 
> If that's the thought, will it make sense to call get_fs_names() first
> and then inside the for loop call get_fs_type() and try mounting
> only if FS_REQUIRES_DEV is not set, otherwise skip and move onto th
> next filesystem in the list (fs_names).

I thought of A and B.  I did not think at all of C and think it is
a rather bad idea.  I'll revisit the patch to avoid C and will resend it
as a formal patch.