Message ID | 20170217083324.627615532@openvz.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Fri, Feb 17, 2017 at 12:30 AM, Cyrill Gorcunov <gorcunov@openvz.org> wrote: > When we checkpoint a process we look into /proc/<pid>/fdinfo/<fd> of eventpoll > file and parse target files list from there. In most situations this is fine > because target file is present in the /proc/<pid>/fd/ list. But in case if file > descriptor was dup'ed or transferred via unix socket and closed after, > it might not be in the list and we can't figure out which file descriptor > to pass into epoll_ctl call. > > To resolve this tie lets add EPOLL_CTL_DUP operation which simply takes > target file descriptor number and installs it into a caller's file table, > thus we can use kcmp() syscall and figure out which exactly file to be > added into eventpoll on restore procedure. This is a scary thing to let an unprivileged process do. I'm wondering if there might be a nicer way to address this using a better interface in /proc.
On Fri, Feb 17, 2017 at 08:52:59AM -0800, Andy Lutomirski wrote: > > > > To resolve this tie lets add EPOLL_CTL_DUP operation which simply takes > > target file descriptor number and installs it into a caller's file table, > > thus we can use kcmp() syscall and figure out which exactly file to be > > added into eventpoll on restore procedure. > > This is a scary thing to let an unprivileged process do. > > I'm wondering if there might be a nicer way to address this using a > better interface in /proc. Well, I tend to agree. Need to add security checking if the target file is accessable by a caller. As to better interface to procfs nothing comes to mind immediately. Another potential problem is that since it is never guaranteed that target file number listed in fdinfo matching existing /proc/pid/fd/N, so that I think we will have to use this dup functionality for every target file, which of course not that fast. Probably need to think more if I manage to invent some better and faster interface to find where exactly target file belong in the whole process tree of a container. Thanks for pointing about this security problem, Andy!
On Fri, Feb 17, 2017 at 9:11 AM, Cyrill Gorcunov <gorcunov@gmail.com> wrote: > On Fri, Feb 17, 2017 at 08:52:59AM -0800, Andy Lutomirski wrote: >> > >> > To resolve this tie lets add EPOLL_CTL_DUP operation which simply takes >> > target file descriptor number and installs it into a caller's file table, >> > thus we can use kcmp() syscall and figure out which exactly file to be >> > added into eventpoll on restore procedure. >> >> This is a scary thing to let an unprivileged process do. >> >> I'm wondering if there might be a nicer way to address this using a >> better interface in /proc. > > Well, I tend to agree. Need to add security checking if the target > file is accessable by a caller. As to better interface to procfs > nothing comes to mind immediately. Another potential problem is that > since it is never guaranteed that target file number listed in fdinfo > matching existing /proc/pid/fd/N, so that I think we will have to > use this dup functionality for every target file, which of course > not that fast. Probably need to think more if I manage to invent > some better and faster interface to find where exactly target file > belong in the whole process tree of a container. i was imagining some proc or proc-like interface that lets you inspect an open file without needing to know what process has the fd. What if you introduced a new type of fd that's an "fd reference". You could add a kcmp mode that tells you whether an fd reference refers to the same thing as a real fd, but you'd arrange for fd references to be otherwise useless. Alternatively, you could simply have an interface like kcmp (maybe a new kcmp mode) that lets you compare an epoll set entry to an actual fd. Then you could figure out what it is but only if you already have the fd by some other means. Of course, if there are no references left, you still have a problem. Hmm. > > Thanks for pointing about this security problem, Andy!
On Fri, Feb 17, 2017 at 09:20:34AM -0800, Andy Lutomirski wrote: > >> > >> This is a scary thing to let an unprivileged process do. > >> > >> I'm wondering if there might be a nicer way to address this using a > >> better interface in /proc. > > > > Well, I tend to agree. Need to add security checking if the target > > file is accessable by a caller. As to better interface to procfs > > nothing comes to mind immediately. Another potential problem is that > > since it is never guaranteed that target file number listed in fdinfo > > matching existing /proc/pid/fd/N, so that I think we will have to > > use this dup functionality for every target file, which of course > > not that fast. Probably need to think more if I manage to invent > > some better and faster interface to find where exactly target file > > belong in the whole process tree of a container. > > i was imagining some proc or proc-like interface that lets you inspect > an open file without needing to know what process has the fd. We will have to find out which process opened the target fd (currently in criu we simply assume that target file is always in process which created epoll and other scenarios are not supported. in most situations that's enough but unfortunately we find a testcase where it's not true and have to find a way to support migrated targets too). > What if you introduced a new type of fd that's an "fd reference". You > could add a kcmp mode that tells you whether an fd reference refers to > the same thing as a real fd, but you'd arrange for fd references to be > otherwise useless. > > Alternatively, you could simply have an interface like kcmp (maybe a > new kcmp mode) that lets you compare an epoll set entry to an actual > fd. Then you could figure out what it is but only if you already have > the fd by some other means. Of course, if there are no references > left, you still have a problem. Hmm. kcmp over target set seems to be preferred since it gonna be a way faster. But I think about fd-ref idea too. Thanks a huge!
On 02/17/2017 01:01 PM, Cyrill Gorcunov wrote: > On Fri, Feb 17, 2017 at 09:20:34AM -0800, Andy Lutomirski wrote: >>>> >>>> This is a scary thing to let an unprivileged process do. >>>> >>>> I'm wondering if there might be a nicer way to address this using a >>>> better interface in /proc. >>> >>> Well, I tend to agree. Need to add security checking if the target >>> file is accessable by a caller. As to better interface to procfs >>> nothing comes to mind immediately. Another potential problem is that >>> since it is never guaranteed that target file number listed in fdinfo >>> matching existing /proc/pid/fd/N, so that I think we will have to >>> use this dup functionality for every target file, which of course >>> not that fast. Probably need to think more if I manage to invent >>> some better and faster interface to find where exactly target file >>> belong in the whole process tree of a container. >> >> i was imagining some proc or proc-like interface that lets you inspect >> an open file without needing to know what process has the fd. > > We will have to find out which process opened the target fd (currently > in criu we simply assume that target file is always in process which > created epoll and other scenarios are not supported. in most situations > that's enough but unfortunately we find a testcase where it's not > true and have to find a way to support migrated targets too). > >> What if you introduced a new type of fd that's an "fd reference". You >> could add a kcmp mode that tells you whether an fd reference refers to >> the same thing as a real fd, but you'd arrange for fd references to be >> otherwise useless. >> >> Alternatively, you could simply have an interface like kcmp (maybe a >> new kcmp mode) that lets you compare an epoll set entry to an actual >> fd. Then you could figure out what it is but only if you already have >> the fd by some other means. Of course, if there are no references >> left, you still have a problem. Hmm. > > kcmp over target set seems to be preferred since it gonna be a way > faster. But I think about fd-ref idea too. Thanks a huge! > I sort of like kcmp() for this as well maybe the epoll fd argument can be a pointer to something like: struct epoll_slot { int epfd; int slot; }; Where 'epfd' is the epoll fd, and slot is just an index into the target fds associated with the epoll fd. That is, you would start from 0 and go up to n. When you reach the last one you get in -EBADSLT or something. Regarding Andy's point of of not having a reference by some other means, I think you have to have it basically otherwise it would be removed from the epfd. I guess it could be being passed via a unix socket? But there must be a way to get at those anyways, in order to restore them. Also, in the initial patch this function can not restore all target epoll files: 'ep_find_tfd(struct eventpoll *ep, int tfd)' Because the 'target' file for an epfd is the pair: {fd, file}. Thus, you could have say two different 'target' files with the same fd number. Thanks, -Jason
On Fri, Feb 17, 2017 at 01:34:04PM -0500, Jason Baron wrote: > > > > kcmp over target set seems to be preferred since it gonna be a way > > faster. But I think about fd-ref idea too. Thanks a huge! > > > > I sort of like kcmp() for this as well maybe the epoll fd argument can be a > pointer to something like: > > struct epoll_slot { > int epfd; > int slot; > }; > > Where 'epfd' is the epoll fd, and slot is just an index into the target fds > associated with the epoll fd. That is, you would start from 0 and go up to > n. When you reach the last one you get in -EBADSLT or something. Yes, I think we can even extend the slot and pass it as a pointer to target fd set which are to be tested. > Regarding Andy's point of of not having a reference by some other means, I > think you have to have it basically otherwise it would be removed from the > epfd. I guess it could be being passed via a unix socket? But there must be > a way to get at those anyways, in order to restore them. Yes. But the file may be obtained from unix sock, then dup'ed and installed and then this dup'ed file get closed but unix instance is still alive. And then we need to figure out that it's unix socket from where original target file came from. > Also, in the initial patch this function can not restore all target epoll > files: > > 'ep_find_tfd(struct eventpoll *ep, int tfd)' > > Because the 'target' file for an epfd is the pair: {fd, file}. Thus, you > could have say two different 'target' files with the same fd number. Yes, but if we would have _DUP'ed them (ie installed into caller context) we can find out their order and restore properly I think. Thanks for comments, I'll think about all them at the weekend or early next week. Cyrill
Index: linux-ml.git/fs/eventpoll.c =================================================================== --- linux-ml.git.orig/fs/eventpoll.c +++ linux-ml.git/fs/eventpoll.c @@ -361,7 +361,7 @@ static inline struct epitem *ep_item_fro /* Tells if the epoll_ctl(2) operation needs an event copy from userspace */ static inline int ep_op_has_event(int op) { - return op != EPOLL_CTL_DEL; + return op != EPOLL_CTL_DEL && op != EPOLL_CTL_DUP; } /* Initialize the poll safe wake up structure */ @@ -967,6 +967,20 @@ free_uid: return error; } +static struct epitem *ep_find_tfd(struct eventpoll *ep, int tfd) +{ + struct rb_node *rbp; + struct epitem *epi; + + for (rbp = rb_first(&ep->rbr); rbp; rbp = rb_next(rbp)) { + epi = rb_entry(rbp, struct epitem, rbn); + if (epi->ffd.fd == tfd) + return epi; + } + + return NULL; +} + /* * Search the file inside the eventpoll tree. The RB tree operations * are protected by the "mtx" mutex, and ep_find() must be called with @@ -979,6 +993,9 @@ static struct epitem *ep_find(struct eve struct epitem *epi, *epir = NULL; struct epoll_filefd ffd; + if (unlikely(!file)) + return ep_find_tfd(ep, fd); + ep_set_ffd(&ffd, file, fd); for (rbp = ep->rbr.rb_node; rbp; ) { epi = rb_entry(rbp, struct epitem, rbn); @@ -1787,6 +1804,28 @@ static void clear_tfile_check_list(void) INIT_LIST_HEAD(&tfile_check_list); } +static int ep_install_tfd(struct eventpoll *ep, struct epitem *epi) +{ + struct file *file; + int ret = -ENOENT; + + rcu_read_lock(); + if (get_file_rcu(epi->ffd.file)) + file = epi->ffd.file; + else + file = NULL; + rcu_read_unlock(); + + if (file) { + ret = get_unused_fd_flags(0); + if (ret >= 0) + fd_install(ret, file); + else + fput(file); + } + return ret; +} + /* * Open an eventpoll file descriptor. */ @@ -1867,15 +1906,24 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, in if (!f.file) goto error_return; - /* Get the "struct file *" for the target file */ - tf = fdget(fd); - if (!tf.file) - goto error_fput; - - /* The target file descriptor must support poll */ - error = -EPERM; - if (!tf.file->f_op->poll) - goto error_tgt_fput; + if (likely(op != EPOLL_CTL_DUP)) { + /* Get the "struct file *" for the target file */ + tf = fdget(fd); + if (!tf.file) + goto error_fput; + + /* The target file descriptor must support poll */ + error = -EPERM; + if (!tf.file->f_op->poll) + goto error_tgt_fput; + } else { + /* + * A special case where target file + * is to be looked up and installed + * into a caller. + */ + memset(&tf, 0, sizeof(tf)); + } /* Check if EPOLLWAKEUP is allowed */ if (ep_op_has_event(op)) @@ -1972,6 +2020,12 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, in else error = -ENOENT; break; + case EPOLL_CTL_DUP: + if (epi) + error = ep_install_tfd(ep, epi); + else + error = -ENOENT; + break; case EPOLL_CTL_MOD: if (epi) { if (!(epi->event.events & EPOLLEXCLUSIVE)) { Index: linux-ml.git/include/uapi/linux/eventpoll.h =================================================================== --- linux-ml.git.orig/include/uapi/linux/eventpoll.h +++ linux-ml.git/include/uapi/linux/eventpoll.h @@ -25,6 +25,7 @@ #define EPOLL_CTL_ADD 1 #define EPOLL_CTL_DEL 2 #define EPOLL_CTL_MOD 3 +#define EPOLL_CTL_DUP 4 /* Set exclusive wakeup mode for the target file descriptor */ #define EPOLLEXCLUSIVE (1 << 28)
When we checkpoint a process we look into /proc/<pid>/fdinfo/<fd> of eventpoll file and parse target files list from there. In most situations this is fine because target file is present in the /proc/<pid>/fd/ list. But in case if file descriptor was dup'ed or transferred via unix socket and closed after, it might not be in the list and we can't figure out which file descriptor to pass into epoll_ctl call. To resolve this tie lets add EPOLL_CTL_DUP operation which simply takes target file descriptor number and installs it into a caller's file table, thus we can use kcmp() syscall and figure out which exactly file to be added into eventpoll on restore procedure. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> CC: Andrey Vagin <avagin@openvz.org> CC: Pavel Emelyanov <xemul@virtuozzo.com> CC: Al Viro <viro@zeniv.linux.org.uk> CC: Andrew Morton <akpm@linuxfoundation.org> CC: Michael Kerrisk <mtk.manpages@gmail.com> CC: Kir Kolyshkin <kir@openvz.org> --- fs/eventpoll.c | 74 +++++++++++++++++++++++++++++++++++------ include/uapi/linux/eventpoll.h | 1 2 files changed, 65 insertions(+), 10 deletions(-)