Message ID | 20180221202908.17258-1-ebiederm@xmission.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Wed, Feb 21, 2018 at 9:29 PM, Eric W. Biederman <ebiederm@xmission.com> wrote: > At the point of fuse_dev_do_read the user space process that initiated the > action on the fuse filesystem may no longer exist. The process have been > killed or may have fired an asynchronous request and exited. > > If the initial process has exited the code "pid_vnr(find_pid_ns(in->h.pid, > fc->pid_ns)" will either return a pid of 0, or in the unlikely event that > the pid has been reallocated it can return practically any pid. Any pid is > possible as the pid allocator allocates pid numbers in different pid > namespaces independently. > > The only way to make translation in fuse_dev_do_read reliable is to call > get_pid in fuse_req_init_context, and pid_vnr followed by put_pid in > fuse_dev_do_read. That reference counting in other contexts has been shown > to bounce cache lines between processors and in general be slow. So that is > not desirable. > > The only known user of running the fuse server in a different pid namespace > from the filesystem does not care what the pids are in the fuse messages > so removing this code should not matter. Shouldn't we at least zero out the pid in that case? Thanks, Miklos > > Getting the translation to a server running outside of the pid namespace > of a container can still be achieved by playing setns games at mount time. > It is also possible to add an option to pass a pid namespace into the fuse > filesystem at mount time. > > Fixes: 5d6d3a301c4e ("fuse: allow server to run in different pid_ns") > Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> > --- > fs/fuse/dev.c | 6 ------ > 1 file changed, 6 deletions(-) > > diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c > index 5d06384c2cae..0fb58f364fa6 100644 > --- a/fs/fuse/dev.c > +++ b/fs/fuse/dev.c > @@ -1260,12 +1260,6 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file, > in = &req->in; > reqsize = in->h.len; > > - if (task_active_pid_ns(current) != fc->pid_ns) { > - rcu_read_lock(); > - in->h.pid = pid_vnr(find_pid_ns(in->h.pid, fc->pid_ns)); > - rcu_read_unlock(); > - } > - > /* If request is too large, reply with an error and restart the read */ > if (nbytes < reqsize) { > req->out.h.error = -EIO; > -- > 2.14.1 >
Miklos Szeredi <mszeredi@redhat.com> writes: > On Wed, Feb 21, 2018 at 9:29 PM, Eric W. Biederman > <ebiederm@xmission.com> wrote: >> At the point of fuse_dev_do_read the user space process that initiated the >> action on the fuse filesystem may no longer exist. The process have been >> killed or may have fired an asynchronous request and exited. >> >> If the initial process has exited the code "pid_vnr(find_pid_ns(in->h.pid, >> fc->pid_ns)" will either return a pid of 0, or in the unlikely event that >> the pid has been reallocated it can return practically any pid. Any pid is >> possible as the pid allocator allocates pid numbers in different pid >> namespaces independently. >> >> The only way to make translation in fuse_dev_do_read reliable is to call >> get_pid in fuse_req_init_context, and pid_vnr followed by put_pid in >> fuse_dev_do_read. That reference counting in other contexts has been shown >> to bounce cache lines between processors and in general be slow. So that is >> not desirable. >> >> The only known user of running the fuse server in a different pid namespace >> from the filesystem does not care what the pids are in the fuse messages >> so removing this code should not matter. > > Shouldn't we at least zero out the pid in that case? This is an explicit case of passing a file descriptor between pid namespaces. So I think there are plenty of buyer be ware signs out. So I don't know if there are any real world advantages of zeroing the pid. I can see a case for using the pid namespace of the opener of /dev/fuse instead of the pid namespace of the mounter of the fuse filesystem. Although in practice I would be surprised if they were different. I am very leary about caring during a read operation. Caring about the current processes during read/write tends to break caching, is error prone as the need for this patch demonstrates, and is generally likely to be slower than not caring. So yes we can zero the pid. I don't think it is wise to zero the pid unless we zero the pid in fuse_req_init_context. Eric
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index 5d06384c2cae..0fb58f364fa6 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -1260,12 +1260,6 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file, in = &req->in; reqsize = in->h.len; - if (task_active_pid_ns(current) != fc->pid_ns) { - rcu_read_lock(); - in->h.pid = pid_vnr(find_pid_ns(in->h.pid, fc->pid_ns)); - rcu_read_unlock(); - } - /* If request is too large, reply with an error and restart the read */ if (nbytes < reqsize) { req->out.h.error = -EIO;
At the point of fuse_dev_do_read the user space process that initiated the action on the fuse filesystem may no longer exist. The process have been killed or may have fired an asynchronous request and exited. If the initial process has exited the code "pid_vnr(find_pid_ns(in->h.pid, fc->pid_ns)" will either return a pid of 0, or in the unlikely event that the pid has been reallocated it can return practically any pid. Any pid is possible as the pid allocator allocates pid numbers in different pid namespaces independently. The only way to make translation in fuse_dev_do_read reliable is to call get_pid in fuse_req_init_context, and pid_vnr followed by put_pid in fuse_dev_do_read. That reference counting in other contexts has been shown to bounce cache lines between processors and in general be slow. So that is not desirable. The only known user of running the fuse server in a different pid namespace from the filesystem does not care what the pids are in the fuse messages so removing this code should not matter. Getting the translation to a server running outside of the pid namespace of a container can still be achieved by playing setns games at mount time. It is also possible to add an option to pass a pid namespace into the fuse filesystem at mount time. Fixes: 5d6d3a301c4e ("fuse: allow server to run in different pid_ns") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> --- fs/fuse/dev.c | 6 ------ 1 file changed, 6 deletions(-)