From patchwork Fri Feb 17 08:30:11 2017
X-Patchwork-Submitter: Cyrill Gorcunov
X-Patchwork-Id: 9579197
Message-Id: <20170217083324.627615532@openvz.org>
User-Agent: quilt/0.64
Date: Fri, 17 Feb 2017 11:30:11 +0300
From: Cyrill Gorcunov
To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-api@vger.kernel.org
Cc: viro@zeniv.linux.org.uk, akpm@linuxfoundation.org,
 avagin@virtuozzo.com, xemul@virtuozzo.com, mtk.manpages@gmail.com,
 kir@openvz.org, gorcunov@openvz.org, Andrey Vagin
Subject: [RFC 1/2] fs, eventpoll: Add ability to install target file by its number

When we checkpoint a process we look into /proc/<pid>/fdinfo/<fd> of
the eventpoll file and parse the list of target files from there. In
most situations this is fine because the target file is present in the
/proc/<pid>/fd/ list. But if the file descriptor was dup'ed or
transferred via a unix socket and then closed, it might not be in the
list, and we can't figure out which file descriptor to pass into the
epoll_ctl call.

To resolve this, let's add an EPOLL_CTL_DUP operation which simply
takes the target file descriptor number and installs the file into the
caller's file table, so that we can use the kcmp() syscall to figure
out exactly which file is to be added into the eventpoll on restore.

Signed-off-by: Cyrill Gorcunov
CC: Andrey Vagin
CC: Pavel Emelyanov
CC: Al Viro
CC: Andrew Morton
CC: Michael Kerrisk
CC: Kir Kolyshkin
---
 fs/eventpoll.c                 |   74 +++++++++++++++++++++++++++++++++++------
 include/uapi/linux/eventpoll.h |    1 
 2 files changed, 65 insertions(+), 10 deletions(-)

Index: linux-ml.git/fs/eventpoll.c
===================================================================
--- linux-ml.git.orig/fs/eventpoll.c
+++ linux-ml.git/fs/eventpoll.c
@@ -361,7 +361,7 @@ static inline struct epitem *ep_item_fro
 /* Tells if the epoll_ctl(2) operation needs an event copy from userspace */
 static inline int ep_op_has_event(int op)
 {
-	return op != EPOLL_CTL_DEL;
+	return op != EPOLL_CTL_DEL && op != EPOLL_CTL_DUP;
 }
 
 /* Initialize the poll safe wake up structure */
@@ -967,6 +967,20 @@ free_uid:
 	return error;
 }
 
+static struct epitem *ep_find_tfd(struct eventpoll *ep, int tfd)
+{
+	struct rb_node *rbp;
+	struct epitem *epi;
+
+	for (rbp = rb_first(&ep->rbr); rbp; rbp = rb_next(rbp)) {
+		epi = rb_entry(rbp, struct epitem, rbn);
+		if (epi->ffd.fd == tfd)
+			return epi;
+	}
+
+	return NULL;
+}
+
 /*
  * Search the file inside the eventpoll tree. The RB tree operations
  * are protected by the "mtx" mutex, and ep_find() must be called with
@@ -979,6 +993,9 @@ static struct epitem *ep_find(struct eve
 	struct epitem *epi, *epir = NULL;
 	struct epoll_filefd ffd;
 
+	if (unlikely(!file))
+		return ep_find_tfd(ep, fd);
+
 	ep_set_ffd(&ffd, file, fd);
 	for (rbp = ep->rbr.rb_node; rbp; ) {
 		epi = rb_entry(rbp, struct epitem, rbn);
@@ -1787,6 +1804,28 @@ static void clear_tfile_check_list(void)
 	INIT_LIST_HEAD(&tfile_check_list);
 }
 
+static int ep_install_tfd(struct eventpoll *ep, struct epitem *epi)
+{
+	struct file *file;
+	int ret = -ENOENT;
+
+	rcu_read_lock();
+	if (get_file_rcu(epi->ffd.file))
+		file = epi->ffd.file;
+	else
+		file = NULL;
+	rcu_read_unlock();
+
+	if (file) {
+		ret = get_unused_fd_flags(0);
+		if (ret >= 0)
+			fd_install(ret, file);
+		else
+			fput(file);
+	}
+	return ret;
+}
+
 /*
  * Open an eventpoll file descriptor.
  */
@@ -1867,15 +1906,24 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, in
 	if (!f.file)
 		goto error_return;
 
-	/* Get the "struct file *" for the target file */
-	tf = fdget(fd);
-	if (!tf.file)
-		goto error_fput;
-
-	/* The target file descriptor must support poll */
-	error = -EPERM;
-	if (!tf.file->f_op->poll)
-		goto error_tgt_fput;
+	if (likely(op != EPOLL_CTL_DUP)) {
+		/* Get the "struct file *" for the target file */
+		tf = fdget(fd);
+		if (!tf.file)
+			goto error_fput;
+
+		/* The target file descriptor must support poll */
+		error = -EPERM;
+		if (!tf.file->f_op->poll)
+			goto error_tgt_fput;
+	} else {
+		/*
+		 * A special case where target file
+		 * is to be looked up and installed
+		 * into a caller.
+		 */
+		memset(&tf, 0, sizeof(tf));
+	}
 
 	/* Check if EPOLLWAKEUP is allowed */
 	if (ep_op_has_event(op))
@@ -1972,6 +2020,12 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, in
 		else
 			error = -ENOENT;
 		break;
+	case EPOLL_CTL_DUP:
+		if (epi)
+			error = ep_install_tfd(ep, epi);
+		else
+			error = -ENOENT;
+		break;
 	case EPOLL_CTL_MOD:
 		if (epi) {
 			if (!(epi->event.events & EPOLLEXCLUSIVE)) {
Index: linux-ml.git/include/uapi/linux/eventpoll.h
===================================================================
--- linux-ml.git.orig/include/uapi/linux/eventpoll.h
+++ linux-ml.git/include/uapi/linux/eventpoll.h
@@ -25,6 +25,7 @@
 #define EPOLL_CTL_ADD 1
 #define EPOLL_CTL_DEL 2
 #define EPOLL_CTL_MOD 3
+#define EPOLL_CTL_DUP 4
 
 /* Set exclusive wakeup mode for the target file descriptor */
 #define EPOLLEXCLUSIVE (1 << 28)
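
For illustration only, here is a minimal userspace sketch of how a
checkpoint tool might use the proposed operation together with kcmp().
It is not part of the patch. The helper names epoll_dup_target() and
same_file() are hypothetical, and EPOLL_CTL_DUP is defined locally with
the value 4 taken from this RFC, on the assumption that the installed
headers do not provide it. On a kernel without this patch the
epoll_ctl() call is expected to fail, so a real tool would need a
fallback.

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <sys/epoll.h>
#include <sys/syscall.h>
#include <linux/kcmp.h>

/* Assumption: opcode value proposed by this RFC, absent from the headers. */
#ifndef EPOLL_CTL_DUP
#define EPOLL_CTL_DUP 4
#endif

/*
 * Hypothetical helper: ask the kernel to install the file that @epfd
 * tracks under target number @tfd into our own descriptor table.
 * Returns the new descriptor, or -1 with errno set (always the case
 * on a kernel that lacks the proposed operation).
 */
int epoll_dup_target(int epfd, int tfd)
{
	struct epoll_event unused;

	/*
	 * The patched kernel does not read the event for EPOLL_CTL_DUP.
	 * A valid pointer is passed anyway so that an unpatched kernel
	 * rejects the opcode instead of faulting on a NULL event.
	 */
	memset(&unused, 0, sizeof(unused));
	return epoll_ctl(epfd, EPOLL_CTL_DUP, tfd, &unused);
}

/*
 * Hypothetical helper: compare two descriptors of the calling process
 * with kcmp(KCMP_FILE). Returns 1 if both refer to the same open file,
 * 0 if they differ, -1 if kcmp() itself failed (errno is set).
 */
int same_file(int fd_a, int fd_b)
{
	pid_t pid = getpid();
	long ret = syscall(SYS_kcmp, pid, pid, KCMP_FILE,
			   (unsigned long)fd_a, (unsigned long)fd_b);

	if (ret < 0)
		return -1;
	return ret == 0;
}
```

A tool along these lines would call epoll_dup_target() for each target
number listed in the eventpoll fdinfo, run same_file() on the returned
descriptor against every candidate descriptor it already knows about,
and close the temporary descriptor once a match is found.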