Message ID | 20240206142453.1906268-4-amir73il@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | FUSE passthrough for file io | expand |
Hi Amir, On 2/6/24 10:24 PM, Amir Goldstein wrote: > FUSE server calls the FUSE_DEV_IOC_BACKING_OPEN ioctl with a backing file > descriptor. If the call succeeds, a backing file identifier is returned. > > A later change will be using this backing file id in a reply to OPEN > request with the flag FOPEN_PASSTHROUGH to setup passthrough of file > operations on the open FUSE file to the backing file. > > The FUSE server should call FUSE_DEV_IOC_BACKING_CLOSE ioctl to close the > backing file by its id. > > This can be done at any time, but if an open reply with FOPEN_PASSTHROUGH > flag is still in progress, the open may fail if the backing file is > closed before the fuse file was opened. > > Setting up backing files requires a server with CAP_SYS_ADMIN privileges. > For the backing file to be successfully setup, the backing file must > implement both read_iter and write_iter file operations. > > The limitation on the level of filesystem stacking allowed for the > backing file is enforced before setting up the backing file. > > Signed-off-by: Alessio Balsini <balsini@android.com> > Signed-off-by: Amir Goldstein <amir73il@gmail.com> > --- [...] > +int fuse_backing_close(struct fuse_conn *fc, int backing_id) > +{ > + struct fuse_backing *fb = NULL; > + int err; > + > + pr_debug("%s: backing_id=%d\n", __func__, backing_id); > + > + /* TODO: relax CAP_SYS_ADMIN once backing files are visible to lsof */ > + err = -EPERM; > + if (!fc->passthrough || !capable(CAP_SYS_ADMIN)) > + goto out; Sorry for the late comment as I started reading this series these days. I don't understand why CAP_SYS_ADMIN is required for the fuse server, though I can understand it's a security constraint. I can only find that this constraint is newly added since v14, but failed to find any related discussion or hint. Besides, is there any chance relaxing the constraint to ns_capable(CAP_SYS_ADMIN), as FUSE supports FS_USERNS_MOUNT, i.e. support passthrough mode in user namespace?
On Wed, Feb 28, 2024 at 12:50 PM Jingbo Xu <jefflexu@linux.alibaba.com> wrote: > > Hi Amir, > > On 2/6/24 10:24 PM, Amir Goldstein wrote: > > FUSE server calls the FUSE_DEV_IOC_BACKING_OPEN ioctl with a backing file > > descriptor. If the call succeeds, a backing file identifier is returned. > > > > A later change will be using this backing file id in a reply to OPEN > > request with the flag FOPEN_PASSTHROUGH to setup passthrough of file > > operations on the open FUSE file to the backing file. > > > > The FUSE server should call FUSE_DEV_IOC_BACKING_CLOSE ioctl to close the > > backing file by its id. > > > > This can be done at any time, but if an open reply with FOPEN_PASSTHROUGH > > flag is still in progress, the open may fail if the backing file is > > closed before the fuse file was opened. > > > > Setting up backing files requires a server with CAP_SYS_ADMIN privileges. > > For the backing file to be successfully setup, the backing file must > > implement both read_iter and write_iter file operations. > > > > The limitation on the level of filesystem stacking allowed for the > > backing file is enforced before setting up the backing file. > > > > Signed-off-by: Alessio Balsini <balsini@android.com> > > Signed-off-by: Amir Goldstein <amir73il@gmail.com> > > --- > > [...] > > > +int fuse_backing_close(struct fuse_conn *fc, int backing_id) > > +{ > > + struct fuse_backing *fb = NULL; > > + int err; > > + > > + pr_debug("%s: backing_id=%d\n", __func__, backing_id); > > + > > + /* TODO: relax CAP_SYS_ADMIN once backing files are visible to lsof */ > > + err = -EPERM; > > + if (!fc->passthrough || !capable(CAP_SYS_ADMIN)) > > + goto out; > > Sorry for the late comment as I started reading this series these days. > > I don't understand why CAP_SYS_ADMIN is required for the fuse server, > though I can understand it's a security constraint. I can only find > that this constraint is newly added since v14, but failed to find any > related discussion or hint. > This requirement is from Miklos. The concern is that FUSE_DEV_IOC_BACKING_OPEN opens a file, which then prevent clean unmount of fs, is not accounted in the user's rlimit and is not visible in lsof, because it is not in any process file descriptors table. Miklos suggested that every FUSE connection will have a kernel thread that those open fds could be associated with. Hence the comment: /* TODO: relax CAP_SYS_ADMIN once backing files are visible to lsof */ But since then, Christian has made some changes to the lifetime of file objects, which require that backing_file must never be installed in files table, so this solution is not as straightforward to implement. In any case, we decided to defer this problem to the future. > Besides, is there any chance relaxing the constraint to > ns_capable(CAP_SYS_ADMIN), as FUSE supports FS_USERNS_MOUNT, i.e. > support passthrough mode in user namespace? > I don't think so, because it will allow unprivileged user to exceed its nested rlimits and hide open files that are invisble to lsof. Thanks, Amir.
On Wed, 28 Feb 2024 at 12:08, Amir Goldstein <amir73il@gmail.com> wrote: > I don't think so, because it will allow unprivileged user to exceed its > nested rlimits and hide open files that are invisble to lsof. How does io_uring deal with the similar problem of "fixed files"? Thanks, Miklos
On Wed, Feb 28, 2024 at 1:14 PM Miklos Szeredi <miklos@szeredi.hu> wrote: > > On Wed, 28 Feb 2024 at 12:08, Amir Goldstein <amir73il@gmail.com> wrote: > > > I don't think so, because it will allow unprivileged user to exceed its > > nested rlimits and hide open files that are invisble to lsof. > > How does io_uring deal with the similar problem of "fixed files"? > Good question. Jens, Chritian, Are fixed files visible to lsof? Do they have to remain open in the files table of process that set them in addition to being registered as fixed files? Do they get accounted in rlimit? of which user? Thanks, Amir.
On 2/28/24 12:14, Miklos Szeredi wrote: > On Wed, 28 Feb 2024 at 12:08, Amir Goldstein <amir73il@gmail.com> wrote: > >> I don't think so, because it will allow unprivileged user to exceed its >> nested rlimits and hide open files that are invisble to lsof. > > How does io_uring deal with the similar problem of "fixed files"? Maybe I miss something, but with io_uring the process opens a file descriptor and then registers that as fixed file? I.e. the open files are not invisible? Thanks, Bernd
On 2/28/24 4:28 AM, Amir Goldstein wrote: > On Wed, Feb 28, 2024 at 1:14?PM Miklos Szeredi <miklos@szeredi.hu> wrote: >> >> On Wed, 28 Feb 2024 at 12:08, Amir Goldstein <amir73il@gmail.com> wrote: >> >>> I don't think so, because it will allow unprivileged user to exceed its >>> nested rlimits and hide open files that are invisble to lsof. >> >> How does io_uring deal with the similar problem of "fixed files"? >> > > Good question. > > Jens, Chritian, > Are fixed files visible to lsof? lsof won't show them, but you can read the fdinfo of the io_uring fd to see them. Would probably be possible to make lsof find and show them too, but haven't looked into that. > Do they have to remain open in the files table of process that set them > in addition to being registered as fixed files? No, in fact they never have to be there in the first place. You can open a normal file and then register it, now it's in both. Then you can close the normal fd, and now it's not in the normal process file table anymore, just in the direct list. Or you can instantiate it as a direct descriptor to begin with, and then it'll never have been in the normal file table. > Do they get accounted in rlimit? of which user? The fixed file table is limited in size by RLIMIT_NOFILE by the user that registers it.
On Wed, 28 Feb 2024 at 15:32, Jens Axboe <axboe@kernel.dk> wrote: > > On 2/28/24 4:28 AM, Amir Goldstein wrote: > > Are fixed files visible to lsof? > > lsof won't show them, but you can read the fdinfo of the io_uring fd to > see them. Would probably be possible to make lsof find and show them > too, but haven't looked into that. Okay, that approach could be used with fuse as well. This isn't perfect, but we can think improving the interface for both. > > Do they get accounted in rlimit? of which user? > > The fixed file table is limited in size by RLIMIT_NOFILE by the user > that registers it. That's different for fuse as the server wouldn't register the whole file table in one go. The number of used slots could be limited by RLIMIT_NOFILE, I think. Thanks, Miklos
On 2/28/24 8:01 AM, Miklos Szeredi wrote: > On Wed, 28 Feb 2024 at 15:32, Jens Axboe <axboe@kernel.dk> wrote: >> >> On 2/28/24 4:28 AM, Amir Goldstein wrote: > >>> Are fixed files visible to lsof? >> >> lsof won't show them, but you can read the fdinfo of the io_uring fd to >> see them. Would probably be possible to make lsof find and show them >> too, but haven't looked into that. > > Okay, that approach could be used with fuse as well. This isn't > perfect, but we can think improving the interface for both. Yeah agree, would be nice to either patch lsof to show them, or come up with an alternative way to expose it so it just works. >>> Do they get accounted in rlimit? of which user? >> >> The fixed file table is limited in size by RLIMIT_NOFILE by the user >> that registers it. > > That's different for fuse as the server wouldn't register the whole > file table in one go. The number of used slots could be limited by > RLIMIT_NOFILE, I think. A normal use cases is to register an empty table of that size, which is just the table itself. And then the table gets filled out as direct descriptors are opened (and closed, etc).
On Wed, Feb 28, 2024 at 5:01 PM Miklos Szeredi <miklos@szeredi.hu> wrote: > > On Wed, 28 Feb 2024 at 15:32, Jens Axboe <axboe@kernel.dk> wrote: > > > > On 2/28/24 4:28 AM, Amir Goldstein wrote: > > > > Are fixed files visible to lsof? > > > > lsof won't show them, but you can read the fdinfo of the io_uring fd to > > see them. Would probably be possible to make lsof find and show them > > too, but haven't looked into that. > > Okay, that approach could be used with fuse as well. This isn't > perfect, but we can think improving the interface for both. > > > > Do they get accounted in rlimit? of which user? > > > > The fixed file table is limited in size by RLIMIT_NOFILE by the user > > that registers it. > > That's different for fuse as the server wouldn't register the whole > file table in one go. The number of used slots could be limited by > RLIMIT_NOFILE, I think. > Yes. We can limit the number of backing id slots, which is the number of FUSE_DEV_IOC_BACKING_OPEN for whom the server did not call FUSE_DEV_IOC_BACKING_CLOSE, and we can easily display the backing files in backing_files_map, but we also have inode attached backing file, whose lifetime is that of the fuse files. Those all harder to iterate and harder to limit, because the server cannot close/revoke them. OTOH, we could treat the inode attached backing files same as the overlayfs backing files - worst case they only double the number of files in the system. We can probably keep them also in backing_files_map so we can iterate them, but maybe let's see how the basic feature works first... Thanks, Amir.
On Wed, Feb 28, 2024 at 04:01:17PM +0100, Miklos Szeredi wrote: > On Wed, 28 Feb 2024 at 15:32, Jens Axboe <axboe@kernel.dk> wrote: > > > > On 2/28/24 4:28 AM, Amir Goldstein wrote: > > > > Are fixed files visible to lsof? > > > > lsof won't show them, but you can read the fdinfo of the io_uring fd to > > see them. Would probably be possible to make lsof find and show them > > too, but haven't looked into that. I actually wrote about this before when I suggested IORING_OP_FIXED_FD_INSTALL: https://patchwork.kernel.org/project/io-uring/patch/df0e24ff-f3a0-4818-8282-2a4e03b7b5a6@kernel.dk/#25629935
On Thu, Feb 29, 2024 at 11:15:35AM +0100, Christian Brauner wrote: > On Wed, Feb 28, 2024 at 04:01:17PM +0100, Miklos Szeredi wrote: > > On Wed, 28 Feb 2024 at 15:32, Jens Axboe <axboe@kernel.dk> wrote: > > > > > > On 2/28/24 4:28 AM, Amir Goldstein wrote: > > > > > > Are fixed files visible to lsof? > > > > > > lsof won't show them, but you can read the fdinfo of the io_uring fd to > > > see them. Would probably be possible to make lsof find and show them > > > too, but haven't looked into that. > > I actually wrote about this before when I suggested IORING_OP_FIXED_FD_INSTALL: > https://patchwork.kernel.org/project/io-uring/patch/df0e24ff-f3a0-4818-8282-2a4e03b7b5a6@kernel.dk/#25629935 I think that it shouldn't be a problem as long as userspace has some way of figuring this out. So extending lsof might just be enough for this.
On Thu, 29 Feb 2024 at 11:17, Christian Brauner <brauner@kernel.org> wrote: > > On Thu, Feb 29, 2024 at 11:15:35AM +0100, Christian Brauner wrote: > > On Wed, Feb 28, 2024 at 04:01:17PM +0100, Miklos Szeredi wrote: > > > On Wed, 28 Feb 2024 at 15:32, Jens Axboe <axboe@kernel.dk> wrote: > > > > > > > > On 2/28/24 4:28 AM, Amir Goldstein wrote: > > > > > > > > Are fixed files visible to lsof? > > > > > > > > lsof won't show them, but you can read the fdinfo of the io_uring fd to > > > > see them. Would probably be possible to make lsof find and show them > > > > too, but haven't looked into that. > > > > I actually wrote about this before when I suggested IORING_OP_FIXED_FD_INSTALL: > > https://patchwork.kernel.org/project/io-uring/patch/df0e24ff-f3a0-4818-8282-2a4e03b7b5a6@kernel.dk/#25629935 > > I think that it shouldn't be a problem as long as userspace has some way > of figuring this out. So extending lsof might just be enough for this. Problem is fdinfo on io_uring fd just contains the last component names. Do we want full "magic symlink" semantics for these? I'm not sure. But just the last component does seem too little. I've advocated using xattr for querying virtual attributes like these. So I'll advocate again. Does anyone see a problem with adding getxattr("/proc/$PID/fdinfo/$IO_URING_FD", "io_uring:fixed_files:$SLOT:path", buf, buflen); ? Thanks, Miklos
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index eba68b57bd7c..b680787bd66d 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -2283,6 +2283,41 @@ static long fuse_dev_ioctl_clone(struct file *file, __u32 __user *argp) return res; } +static long fuse_dev_ioctl_backing_open(struct file *file, + struct fuse_backing_map __user *argp) +{ + struct fuse_dev *fud = fuse_get_dev(file); + struct fuse_backing_map map; + + if (!fud) + return -EINVAL; + + if (!IS_ENABLED(CONFIG_FUSE_PASSTHROUGH)) + return -EOPNOTSUPP; + + if (copy_from_user(&map, argp, sizeof(map))) + return -EFAULT; + + return fuse_backing_open(fud->fc, &map); +} + +static long fuse_dev_ioctl_backing_close(struct file *file, __u32 __user *argp) +{ + struct fuse_dev *fud = fuse_get_dev(file); + int backing_id; + + if (!fud) + return -EINVAL; + + if (!IS_ENABLED(CONFIG_FUSE_PASSTHROUGH)) + return -EOPNOTSUPP; + + if (get_user(backing_id, argp)) + return -EFAULT; + + return fuse_backing_close(fud->fc, backing_id); +} + static long fuse_dev_ioctl(struct file *file, unsigned int cmd, unsigned long arg) { @@ -2292,6 +2327,12 @@ static long fuse_dev_ioctl(struct file *file, unsigned int cmd, case FUSE_DEV_IOC_CLONE: return fuse_dev_ioctl_clone(file, argp); + case FUSE_DEV_IOC_BACKING_OPEN: + return fuse_dev_ioctl_backing_open(file, argp); + + case FUSE_DEV_IOC_BACKING_CLOSE: + return fuse_dev_ioctl_backing_close(file, argp); + default: return -ENOTTY; } diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index cae525147856..fb9ef02cbf45 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -79,6 +79,7 @@ struct fuse_submount_lookup { /** Container for data related to mapping to backing file */ struct fuse_backing { struct file *file; + struct cred *cred; /** refcount */ refcount_t count; @@ -897,6 +898,11 @@ struct fuse_conn { /* New writepages go into this bucket */ struct fuse_sync_bucket __rcu *curr_bucket; + +#ifdef CONFIG_FUSE_PASSTHROUGH + /** IDR for backing files ids */ + struct idr backing_files_map; +#endif }; /* @@ -1414,5 +1420,9 @@ static inline struct fuse_backing *fuse_inode_backing_set(struct fuse_inode *fi, struct fuse_backing *fuse_backing_get(struct fuse_backing *fb); void fuse_backing_put(struct fuse_backing *fb); +void fuse_backing_files_init(struct fuse_conn *fc); +void fuse_backing_files_free(struct fuse_conn *fc); +int fuse_backing_open(struct fuse_conn *fc, struct fuse_backing_map *map); +int fuse_backing_close(struct fuse_conn *fc, int backing_id); #endif /* _FS_FUSE_I_H */ diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index c771bd3c1336..c26a84439934 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -930,6 +930,9 @@ void fuse_conn_init(struct fuse_conn *fc, struct fuse_mount *fm, fc->max_pages = FUSE_DEFAULT_MAX_PAGES_PER_REQ; fc->max_pages_limit = FUSE_MAX_MAX_PAGES; + if (IS_ENABLED(CONFIG_FUSE_PASSTHROUGH)) + fuse_backing_files_init(fc); + INIT_LIST_HEAD(&fc->mounts); list_add(&fm->fc_entry, &fc->mounts); fm->fc = fc; @@ -1392,6 +1395,8 @@ EXPORT_SYMBOL_GPL(fuse_send_init); void fuse_free_conn(struct fuse_conn *fc) { WARN_ON(!list_empty(&fc->devices)); + if (IS_ENABLED(CONFIG_FUSE_PASSTHROUGH)) + fuse_backing_files_free(fc); kfree_rcu(fc, rcu); } EXPORT_SYMBOL_GPL(fuse_free_conn); diff --git a/fs/fuse/passthrough.c b/fs/fuse/passthrough.c index e8639c0a9ac6..6604d414adb5 100644 --- a/fs/fuse/passthrough.c +++ b/fs/fuse/passthrough.c @@ -18,8 +18,11 @@ struct fuse_backing *fuse_backing_get(struct fuse_backing *fb) static void fuse_backing_free(struct fuse_backing *fb) { + pr_debug("%s: fb=0x%p\n", __func__, fb); + if (fb->file) fput(fb->file); + put_cred(fb->cred); kfree_rcu(fb, rcu); } @@ -28,3 +31,135 @@ void fuse_backing_put(struct fuse_backing *fb) if (fb && refcount_dec_and_test(&fb->count)) fuse_backing_free(fb); } + +void fuse_backing_files_init(struct fuse_conn *fc) +{ + idr_init(&fc->backing_files_map); +} + +static int fuse_backing_id_alloc(struct fuse_conn *fc, struct fuse_backing *fb) +{ + int id; + + idr_preload(GFP_KERNEL); + spin_lock(&fc->lock); + id = idr_alloc_cyclic(&fc->backing_files_map, fb, 1, 0, GFP_ATOMIC); + spin_unlock(&fc->lock); + idr_preload_end(); + + WARN_ON_ONCE(id == 0); + return id; +} + +static struct fuse_backing *fuse_backing_id_remove(struct fuse_conn *fc, + int id) +{ + struct fuse_backing *fb; + + spin_lock(&fc->lock); + fb = idr_remove(&fc->backing_files_map, id); + spin_unlock(&fc->lock); + + return fb; +} + +static int fuse_backing_id_free(int id, void *p, void *data) +{ + struct fuse_backing *fb = p; + + WARN_ON_ONCE(refcount_read(&fb->count) != 1); + fuse_backing_free(fb); + return 0; +} + +void fuse_backing_files_free(struct fuse_conn *fc) +{ + idr_for_each(&fc->backing_files_map, fuse_backing_id_free, NULL); + idr_destroy(&fc->backing_files_map); +} + +int fuse_backing_open(struct fuse_conn *fc, struct fuse_backing_map *map) +{ + struct file *file; + struct super_block *backing_sb; + struct fuse_backing *fb = NULL; + int res; + + pr_debug("%s: fd=%d flags=0x%x\n", __func__, map->fd, map->flags); + + /* TODO: relax CAP_SYS_ADMIN once backing files are visible to lsof */ + res = -EPERM; + if (!fc->passthrough || !capable(CAP_SYS_ADMIN)) + goto out; + + res = -EINVAL; + if (map->flags) + goto out; + + file = fget(map->fd); + res = -EBADF; + if (!file) + goto out; + + res = -EOPNOTSUPP; + if (!file->f_op->read_iter || !file->f_op->write_iter) + goto out_fput; + + backing_sb = file_inode(file)->i_sb; + res = -ELOOP; + if (backing_sb->s_stack_depth >= fc->max_stack_depth) + goto out_fput; + + fb = kmalloc(sizeof(struct fuse_backing), GFP_KERNEL); + res = -ENOMEM; + if (!fb) + goto out_fput; + + fb->file = file; + fb->cred = prepare_creds(); + refcount_set(&fb->count, 1); + + res = fuse_backing_id_alloc(fc, fb); + if (res < 0) { + fuse_backing_free(fb); + fb = NULL; + } + +out: + pr_debug("%s: fb=0x%p, ret=%i\n", __func__, fb, res); + + return res; + +out_fput: + fput(file); + goto out; +} + +int fuse_backing_close(struct fuse_conn *fc, int backing_id) +{ + struct fuse_backing *fb = NULL; + int err; + + pr_debug("%s: backing_id=%d\n", __func__, backing_id); + + /* TODO: relax CAP_SYS_ADMIN once backing files are visible to lsof */ + err = -EPERM; + if (!fc->passthrough || !capable(CAP_SYS_ADMIN)) + goto out; + + err = -EINVAL; + if (backing_id <= 0) + goto out; + + err = -ENOENT; + fb = fuse_backing_id_remove(fc, backing_id); + if (!fb) + goto out; + + fuse_backing_put(fb); + err = 0; +out: + pr_debug("%s: fb=0x%p, err=%i\n", __func__, fb, err); + + return err; +} diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h index edfc458e5e8f..af0fe3aec329 100644 --- a/include/uapi/linux/fuse.h +++ b/include/uapi/linux/fuse.h @@ -1059,9 +1059,18 @@ struct fuse_notify_retrieve_in { uint64_t dummy4; }; +struct fuse_backing_map { + int32_t fd; + uint32_t flags; + uint64_t padding; +}; + /* Device ioctls: */ #define FUSE_DEV_IOC_MAGIC 229 #define FUSE_DEV_IOC_CLONE _IOR(FUSE_DEV_IOC_MAGIC, 0, uint32_t) +#define FUSE_DEV_IOC_BACKING_OPEN _IOW(FUSE_DEV_IOC_MAGIC, 1, \ + struct fuse_backing_map) +#define FUSE_DEV_IOC_BACKING_CLOSE _IOW(FUSE_DEV_IOC_MAGIC, 2, uint32_t) struct fuse_lseek_in { uint64_t fh;