From patchwork Tue Jan 20 09:57:57 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Fam Zheng <famz@redhat.com>
X-Patchwork-Id: 5666891
Return-Path: <linux-fsdevel-owner@kernel.org>
X-Original-To: patchwork-linux-fsdevel@patchwork.kernel.org
Delivered-To: patchwork-parsemail@patchwork2.web.kernel.org
Received: from mail.kernel.org (mail.kernel.org [198.145.29.136])
	by patchwork2.web.kernel.org (Postfix) with ESMTP id 09612C058D
	for <patchwork-linux-fsdevel@patchwork.kernel.org>;
	Tue, 20 Jan 2015 10:01:39 +0000 (UTC)
Received: from mail.kernel.org (localhost [127.0.0.1])
	by mail.kernel.org (Postfix) with ESMTP id 1109520268
	for <patchwork-linux-fsdevel@patchwork.kernel.org>;
	Tue, 20 Jan 2015 10:01:38 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by mail.kernel.org (Postfix) with ESMTP id 9E9B6203B6
	for <patchwork-linux-fsdevel@patchwork.kernel.org>;
	Tue, 20 Jan 2015 10:01:36 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753596AbbATKAZ (ORCPT
	<rfc822;patchwork-linux-fsdevel@patchwork.kernel.org>);
	Tue, 20 Jan 2015 05:00:25 -0500
Received: from mx1.redhat.com ([209.132.183.28]:44218 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1753572AbbATKAW (ORCPT <rfc822;linux-fsdevel@vger.kernel.org>);
	Tue, 20 Jan 2015 05:00:22 -0500
Received: from int-mx11.intmail.prod.int.phx2.redhat.com
	(int-mx11.intmail.prod.int.phx2.redhat.com [10.5.11.24])
	by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id t0K9xRaT003037
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256
	verify=FAIL); Tue, 20 Jan 2015 04:59:28 -0500
Received: from ad.nay.redhat.com (dhcp-14-137.nay.redhat.com [10.66.14.137])
	by int-mx11.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with
	ESMTP id t0K9w23o028843; Tue, 20 Jan 2015 04:59:14 -0500
From: Fam Zheng <famz@redhat.com>
To: linux-kernel@vger.kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@redhat.com>,
	"H. Peter Anvin" <hpa@zytor.com>, x86@kernel.org,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Andrew Morton <akpm@linux-foundation.org>,
	Kees Cook <keescook@chromium.org>, Andy Lutomirski <luto@amacapital.net>,
	David Herrmann <dh.herrmann@gmail.com>,
	Alexei Starovoitov <ast@plumgrid.com>, Miklos Szeredi <mszeredi@suse.cz>,
	David Drysdale <drysdale@google.com>, Oleg Nesterov <oleg@redhat.com>,
	"David S. Miller" <davem@davemloft.net>, Vivek Goyal <vgoyal@redhat.com>,
	Mike Frysinger <vapier@gentoo.org>, "Theodore Ts'o" <tytso@mit.edu>,
	Heiko Carstens <heiko.carstens@de.ibm.com>,
	Rasmus Villemoes <linux@rasmusvillemoes.dk>,
	Rashika Kheria <rashika.kheria@gmail.com>,
	Hugh Dickins <hughd@google.com>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Fam Zheng <famz@redhat.com>, Peter Zijlstra <peterz@infradead.org>,
	linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org,
	Josh Triplett <josh@joshtriplett.org>,
	"Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>,
	Paolo Bonzini <pbonzini@redhat.com>
Subject: [PATCH RFC 5/6] epoll: Add implementation for epoll_mod_wait
Date: Tue, 20 Jan 2015 17:57:57 +0800
Message-Id: <1421747878-30744-6-git-send-email-famz@redhat.com>
In-Reply-To: <1421747878-30744-1-git-send-email-famz@redhat.com>
References: <1421747878-30744-1-git-send-email-famz@redhat.com>
X-Scanned-By: MIMEDefang 2.68 on 10.5.11.24
Sender: linux-fsdevel-owner@vger.kernel.org
Precedence: bulk
List-ID: <linux-fsdevel.vger.kernel.org>
X-Mailing-List: linux-fsdevel@vger.kernel.org
X-Spam-Status: No, score=-6.9 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_HI,
	T_RP_MATCHES_RCVD,
	UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

This syscall is a sequence of

1) a number of epoll_ctl calls
2) a epoll_pwait, with timeout enhancement.

The epoll_ctl operations are embeded so that application doesn't have to use
separate syscalls to insert/delete/update the fds before poll. It is more
efficient if the set of fds varies from one poll to another, which is the
common pattern for certain applications. For example, depending on the input
buffer status, a data reading program may decide to temporarily not polling an
fd.

Because the enablement of batching in this interface, even that regular
epoll_ctl call sequence, which manipulates several fds, can be optimized to one
single epoll_ctl_wait (while specifying spec=NULL to skip the poll part).

The only complexity is returning the result of each operation.  For each
epoll_mod_cmd in cmds, the field "error" is an output field that will be stored
the return code *iff* the command is executed (0 for success and -errno of the
equivalent epoll_ctl call), and will be left unchanged if the command is not
executed because some earlier error, for example due to failure of
copy_from_user to copy the array.

Applications can utilize this fact to do error handling: they could initialize
all the epoll_mod_wait.error to a positive value, which is by definition not a
possible output value from epoll_mod_wait. Then when the syscall returned, they
know whether or not the command is executed by comparing each error with the
init value, if they're different, they have the result of the command.
More roughly, they can put any non-zero and not distinguish "not run" from
failure.

Also, timeout parameter is enhanced: timespec is used, compared to the old ms
scalar. This provides higher precision. The parameter field in struct
epoll_wait_spec, "clockid", also makes it possible for users to use a different
clock than the default when it makes more sense.

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 fs/eventpoll.c           | 60 ++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/syscalls.h |  5 ++++
 2 files changed, 65 insertions(+)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index e7a116d..2cc22c9 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -2067,6 +2067,66 @@ SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct epoll_event __user *, events,
 			      sigmask ? &ksigmask : NULL);
 }
 
+SYSCALL_DEFINE5(epoll_mod_wait, int, epfd, int, flags,
+		int, ncmds, struct epoll_mod_cmd __user *, cmds,
+		struct epoll_wait_spec __user *, spec)
+{
+	struct epoll_mod_cmd *kcmds = NULL;
+	int i, ret = 0;
+	int cmd_size = sizeof(struct epoll_mod_cmd) * ncmds;
+
+	if (flags)
+		return -EINVAL;
+	if (ncmds) {
+		if (!cmds)
+			return -EINVAL;
+		kcmds = kmalloc(cmd_size, GFP_KERNEL);
+		if (!kcmds)
+			return -ENOMEM;
+		if (copy_from_user(kcmds, cmds, cmd_size)) {
+			ret = -EFAULT;
+			goto out;
+		}
+	}
+	for (i = 0; i < ncmds; i++) {
+		struct epoll_event ev = (struct epoll_event) {
+			.events = kcmds[i].events,
+			.data = kcmds[i].data,
+		};
+		if (kcmds[i].flags) {
+			kcmds[i].error = ret = -EINVAL;
+			goto out;
+		}
+		kcmds[i].error = ret = ep_ctl_do(epfd, kcmds[i].op, kcmds[i].fd, ev);
+		if (ret)
+			goto out;
+	}
+	if (spec) {
+		sigset_t ksigmask;
+		struct epoll_wait_spec kspec;
+		ktime_t timeout;
+
+		if(copy_from_user(&kspec, spec, sizeof(struct epoll_wait_spec)))
+			return -EFAULT;
+		if (kspec.sigmask) {
+			if (kspec.sigsetsize != sizeof(sigset_t))
+				return -EINVAL;
+			if (copy_from_user(&ksigmask, kspec.sigmask, sizeof(ksigmask)))
+				return -EFAULT;
+		}
+		timeout = timespec_to_ktime(kspec.timeout);
+		ret = epoll_pwait_do(epfd, kspec.events, kspec.maxevents,
+				     kspec.clockid, timeout,
+				     kspec.sigmask ? &ksigmask : NULL);
+	}
+
+out:
+	if (ncmds && copy_to_user(cmds, kcmds, cmd_size))
+		return -EFAULT;
+	kfree(kcmds);
+	return ret;
+}
+
 #ifdef CONFIG_COMPAT
 COMPAT_SYSCALL_DEFINE6(epoll_pwait, int, epfd,
 		       struct epoll_event __user *, events,
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 85893d7..7156c80 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -12,6 +12,8 @@
 #define _LINUX_SYSCALLS_H
 
 struct epoll_event;
+struct epoll_mod_cmd;
+struct epoll_wait_spec;
 struct iattr;
 struct inode;
 struct iocb;
@@ -630,6 +632,9 @@ asmlinkage long sys_epoll_pwait(int epfd, struct epoll_event __user *events,
 				int maxevents, int timeout,
 				const sigset_t __user *sigmask,
 				size_t sigsetsize);
+asmlinkage long sys_epoll_mod_wait(int epfd, int flags,
+				   int ncmds, struct epoll_mod_cmd __user * cmds,
+				   struct epoll_wait_spec __user * spec);
 asmlinkage long sys_gethostname(char __user *name, int len);
 asmlinkage long sys_sethostname(char __user *name, int len);
 asmlinkage long sys_setdomainname(char __user *name, int len);