From patchwork Tue Feb 21 16:59:46 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Cyrill Gorcunov
X-Patchwork-Id: 9585209
Message-Id: <20170221171255.023016858@openvz.org>
User-Agent: quilt/0.64
Date: Tue, 21 Feb 2017 19:59:46 +0300
From: Cyrill Gorcunov
To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-api@vger.kernel.org
Cc: viro@zeniv.linux.org.uk, akpm@linuxfoundation.org, avagin@virtuozzo.com,
 xemul@virtuozzo.com, mtk.manpages@gmail.com, kir@openvz.org,
 gorcunov@openvz.org, luto@amacapital.net, jbaron@akamai.com, Andrey Vagin
Subject: [RFC 2/3] kcmp: Add KCMP_EPOLL_TFD mode to compare epoll target files
Content-Disposition: inline; filename=kcmp-epoll
Sender: linux-fsdevel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-fsdevel@vger.kernel.org

With the current epoll architecture, target files are addressed by a
struct file pointer and a file descriptor number, where the latter is
not unique. Moreover, files can be transferred from another process via
a unix socket, added to the queue and then closed, so we won't find the
descriptor in the task's fdinfo list.

Thus, to checkpoint and restore such processes, CRIU needs to find out
where exactly the target file is present in order to add it to the
epoll queue. For this, one can use the kcmp call, where a particular
target file from the queue is compared with an arbitrary file passed as
an argument.

Because epoll target files can have the same file descriptor number but
a different struct file, the caller should explicitly specify the
offset within such entries.

To test whether a particular file matches an entry inside epoll, one
has to

 - fill the kcmp_epoll_slot structure with the epoll file descriptor,
   the target file number and the target file offset (if only one
   target is present, the offset should be 0)
 - call kcmp as kcmp(pid1, pid2, KCMP_EPOLL_TFD, fd, &kcmp_epoll_slot)
    - the kernel fetches the file pointer matching file descriptor @fd
      of pid1
    - looks up the file struct in the epoll queue of pid2 and returns
      the traditional 0,1,2 result for sorting purposes

Signed-off-by: Cyrill Gorcunov
CC: Al Viro
CC: Andrew Morton
CC: Andrey Vagin
CC: Pavel Emelyanov
CC: Michael Kerrisk
CC: Kir Kolyshkin
CC: Jason Baron
CC: Andy Lutomirski
---
 fs/eventpoll.c            |   42 ++++++++++++++++++++++++++++++++++++++++++
 include/linux/eventpoll.h |    3 +++
 include/uapi/linux/kcmp.h |   10 ++++++++++
 kernel/kcmp.c             |   44 ++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 99 insertions(+)

Index: linux-ml.git/fs/eventpoll.c
===================================================================
--- linux-ml.git.orig/fs/eventpoll.c
+++ linux-ml.git/fs/eventpoll.c
@@ -1000,6 +1000,48 @@ static struct epitem *ep_find(struct eve
 	return epir;
 }
 
+static struct epitem *ep_find_tfd(struct eventpoll *ep, int tfd, unsigned long toff)
+{
+	struct rb_node *rbp;
+	struct epitem *epi;
+
+	for (rbp = rb_first(&ep->rbr); rbp; rbp = rb_next(rbp)) {
+		epi = rb_entry(rbp, struct epitem, rbn);
+		if (epi->ffd.fd == tfd) {
+			if (toff == 0)
+				return epi;
+			else
+				toff--;
+		}
+		cond_resched();
+	}
+
+	return NULL;
+}
+
+struct file *get_epoll_tfile_raw_ptr(struct file *file, int tfd,
+				     unsigned long toff)
+{
+	struct file *file_raw;
+	struct eventpoll *ep;
+	struct epitem *epi;
+
+	if (!is_file_epoll(file))
+		return ERR_PTR(-EINVAL);
+
+	ep = file->private_data;
+
+	mutex_lock(&ep->mtx);
+	epi = ep_find_tfd(ep, tfd, toff);
+	if (epi)
+		file_raw = epi->ffd.file;
+	else
+		file_raw = ERR_PTR(-ENOENT);
+	mutex_unlock(&ep->mtx);
+
+	return file_raw;
+}
+
 /*
  * This is the callback that is passed to the wait queue wakeup
  * mechanism. It is called by the stored file descriptors when they
Index: linux-ml.git/include/linux/eventpoll.h
===================================================================
--- linux-ml.git.orig/include/linux/eventpoll.h
+++ linux-ml.git/include/linux/eventpoll.h
@@ -14,6 +14,7 @@
 #define _LINUX_EVENTPOLL_H
 
 #include
+#include
 
 
 /* Forward declarations to avoid compiler errors */
@@ -22,6 +23,8 @@ struct file;
 
 #ifdef CONFIG_EPOLL
 
+struct file *get_epoll_tfile_raw_ptr(struct file *file, int tfd, unsigned long toff);
+
 /* Used to initialize the epoll bits inside the "struct file" */
 static inline void eventpoll_init_file(struct file *file)
 {
Index: linux-ml.git/include/uapi/linux/kcmp.h
===================================================================
--- linux-ml.git.orig/include/uapi/linux/kcmp.h
+++ linux-ml.git/include/uapi/linux/kcmp.h
@@ -1,6 +1,8 @@
 #ifndef _UAPI_LINUX_KCMP_H
 #define _UAPI_LINUX_KCMP_H
 
+#include
+
 /* Comparison type */
 enum kcmp_type {
 	KCMP_FILE,
@@ -10,8 +12,16 @@ enum kcmp_type {
 	KCMP_SIGHAND,
 	KCMP_IO,
 	KCMP_SYSVSEM,
+	KCMP_EPOLL_TFD,
 
 	KCMP_TYPES,
 };
 
+/* Slot for KCMP_EPOLL_TFD */
+struct kcmp_epoll_slot {
+	__u32 efd;		/* epoll file descriptor */
+	__u32 tfd;		/* target file number */
+	__u64 toff;		/* target offset within same numbered sequence */
+};
+
 #endif /* _UAPI_LINUX_KCMP_H */
Index: linux-ml.git/kernel/kcmp.c
===================================================================
--- linux-ml.git.orig/kernel/kcmp.c
+++ linux-ml.git/kernel/kcmp.c
@@ -11,6 +11,10 @@
 #include
 #include
 #include
+#include
+#include
+#include
+#include
 
 #include
 
@@ -165,6 +169,46 @@ SYSCALL_DEFINE5(kcmp, pid_t, pid1, pid_t
 		ret = -EOPNOTSUPP;
 #endif
 		break;
+	case KCMP_EPOLL_TFD: {
+#ifdef CONFIG_EPOLL
+		struct file *filp1, *filp_epoll, *filp_tgt;
+		struct kcmp_epoll_slot slot;
+		struct files_struct *files;
+
+		if (copy_from_user(&slot, (void *)idx2, sizeof(slot))) {
+			ret = -EFAULT;
+			goto err_unlock;
+		}
+
+		filp1 = get_file_raw_ptr(task1, idx1);
+
+		files = get_files_struct(task2);
+		if (files) {
+			spin_lock(&files->file_lock);
+			filp_epoll = fcheck_files(files, slot.efd);
+			if (filp_epoll)
+				get_file(filp_epoll);
+			spin_unlock(&files->file_lock);
+			put_files_struct(files);
+		} else
+			filp_epoll = NULL;
+
+		if (filp1 && filp_epoll) {
+			filp_tgt = get_epoll_tfile_raw_ptr(filp_epoll, slot.tfd, slot.toff);
+			if (IS_ERR(filp_tgt))
+				ret = PTR_ERR(filp_tgt);
+			else
+				ret = kcmp_ptr(filp1, filp_tgt, KCMP_EPOLL_TFD);
+		} else
+			ret = -EBADF;
+
+		if (filp_epoll)
+			fput(filp_epoll);
+#else
+		ret = -EOPNOTSUPP;
+#endif
+		break;
+	}
 	default:
 		ret = -EINVAL;
 		break;
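
Note on the tfd/toff addressing: the way the (tfd, toff) pair from
kcmp_epoll_slot selects one entry among several that share a descriptor
number can be modeled in plain userspace C. This is only an illustrative
sketch, not kernel code. The names model_epitem and model_find_tfd are
hypothetical, and the array order merely stands in for whatever order
the kernel walks its own tree.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for one epoll registration. */
struct model_epitem {
	int fd;			/* descriptor number the target was added under */
	const void *file;	/* stands in for the kernel's file pointer */
};

/*
 * Return the toff-th entry (counting from 0) whose descriptor number
 * equals tfd, or NULL if fewer than toff + 1 such entries exist.
 * The array order is an assumption of this model only.
 */
static const struct model_epitem *
model_find_tfd(const struct model_epitem *items, size_t n,
	       int tfd, unsigned long toff)
{
	size_t i;

	assert(items != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (items[i].fd != tfd)
			continue;	/* different descriptor number, skip */
		if (toff == 0)
			return &items[i];
		toff--;			/* same number, but not the one asked for yet */
	}
	return NULL;
}
```

With the entries {5, A}, {7, B}, {5, C}, the slot (tfd=5, toff=0)
selects A and (tfd=5, toff=1) selects C, while (tfd=5, toff=2) matches
nothing, which corresponds to the -ENOENT path in the patch.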