Message ID | 20190516085810.31077-14-rpenyaev@suse.de (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | epoll: support pollable epoll from userspace | expand |
On Thu, May 16, 2019 at 10:59 AM Roman Penyaev <rpenyaev@suse.de> wrote: > > epoll_create2() is needed to accept EPOLL_USERPOLL flags > and size, i.e. this patch wires up polling from userspace. Could you add the system call to all syscall*.tbl files at the same time here? Arnd
On 2019-05-16 12:03, Arnd Bergmann wrote: > On Thu, May 16, 2019 at 10:59 AM Roman Penyaev <rpenyaev@suse.de> > wrote: >> >> epoll_create2() is needed to accept EPOLL_USERPOLL flags >> and size, i.e. this patch wires up polling from userspace. > > Could you add the system call to all syscall*.tbl files at the same > time here? For all other archs, you mean? Sure. But what is the rule of thumb? Sometimes people tend to add to the most common x86 and other tables are left untouched, but then you commit the rest, e.g. commit 39036cd2727395c3369b1051005da74059a85317 Author: Arnd Bergmann <arnd@arndb.de> Date: Thu Feb 28 13:59:19 2019 +0100 arch: add pidfd and io_uring syscalls everywhere -- Roman
On Thu, May 16, 2019 at 12:20 PM Roman Penyaev <rpenyaev@suse.de> wrote: > > On 2019-05-16 12:03, Arnd Bergmann wrote: > > On Thu, May 16, 2019 at 10:59 AM Roman Penyaev <rpenyaev@suse.de> > > wrote: > >> > >> epoll_create2() is needed to accept EPOLL_USERPOLL flags > >> and size, i.e. this patch wires up polling from userspace. > > > > Could you add the system call to all syscall*.tbl files at the same > > time here? > > For all other archs, you mean? Sure. But what is the rule of thumb? > Sometimes people tend to add to the most common x86 and other tables > are left untouched, but then you commit the rest, e.g. > > commit 39036cd2727395c3369b1051005da74059a85317 > Author: Arnd Bergmann <arnd@arndb.de> > Date: Thu Feb 28 13:59:19 2019 +0100 > > arch: add pidfd and io_uring syscalls everywhere We only recently introduced syscall.tbl files in a common format, which makes it much easier to add new ones. I hope we can do it for all architectures right away from now on. I just noticed that the new mount API assigns six new system calls as well, but did not use the same numbers across architectures. I hope we can still rectify that before -rc1 and use the next available ones (428..433), then yours should be 434 on all architectures, with the exception of arch/alpha. Arnd
On Thu, 16 May 2019 12:20:50 +0200 Roman Penyaev <rpenyaev@suse.de> wrote: > On 2019-05-16 12:03, Arnd Bergmann wrote: > > On Thu, May 16, 2019 at 10:59 AM Roman Penyaev <rpenyaev@suse.de> > > wrote: > >> > >> epoll_create2() is needed to accept EPOLL_USERPOLL flags > >> and size, i.e. this patch wires up polling from userspace. > > > > Could you add the system call to all syscall*.tbl files at the same > > time here? > > For all other archs, you mean? Sure. But what is the rule of thumb? > Sometimes people tend to add to the most common x86 and other tables > are left untouched, but then you commit the rest, e.g. > > commit 39036cd2727395c3369b1051005da74059a85317 > Author: Arnd Bergmann <arnd@arndb.de> > Date: Thu Feb 28 13:59:19 2019 +0100 > > arch: add pidfd and io_uring syscalls everywhere > I thought the preferred approach was to wire up the architectures on which the submitter has tested the syscall, then allow the arch maintainers to enable the syscall independently? And to help them in this, provide a test suite for the new syscall under tools/testing/selftests/. https://github.com/rouming/test-tools/blob/master/userpolled-epoll.c will certainly help but I do think it would be better to move the test into the kernel tree to keep it maintained and so that many people run it in their various setups on an ongoing basis.
On 2019-05-22 04:33, Andrew Morton wrote: > On Thu, 16 May 2019 12:20:50 +0200 Roman Penyaev <rpenyaev@suse.de> > wrote: > >> On 2019-05-16 12:03, Arnd Bergmann wrote: >> > On Thu, May 16, 2019 at 10:59 AM Roman Penyaev <rpenyaev@suse.de> >> > wrote: >> >> >> >> epoll_create2() is needed to accept EPOLL_USERPOLL flags >> >> and size, i.e. this patch wires up polling from userspace. >> > >> > Could you add the system call to all syscall*.tbl files at the same >> > time here? >> >> For all other archs, you mean? Sure. But what is the rule of thumb? >> Sometimes people tend to add to the most common x86 and other tables >> are left untouched, but then you commit the rest, e.g. >> >> commit 39036cd2727395c3369b1051005da74059a85317 >> Author: Arnd Bergmann <arnd@arndb.de> >> Date: Thu Feb 28 13:59:19 2019 +0100 >> >> arch: add pidfd and io_uring syscalls everywhere >> > > I thought the preferred approach was to wire up the architectures on > which the submitter has tested the syscall, then allow the arch > maintainers to enable the syscall independently? > > And to help them in this, provide a test suite for the new syscall > under tools/testing/selftests/. > > https://github.com/rouming/test-tools/blob/master/userpolled-epoll.c > will certainly help but I do think it would be better to move the test > into the kernel tree to keep it maintained and so that many people run > it in their various setups on an ongoing basis. Yes, on a next iteration I will add the tool to selftests. Thanks. -- Roman
On Wed, May 22, 2019 at 4:33 AM Andrew Morton <akpm@linux-foundation.org> wrote: > On Thu, 16 May 2019 12:20:50 +0200 Roman Penyaev <rpenyaev@suse.de> wrote: > > On 2019-05-16 12:03, Arnd Bergmann wrote: > > > On Thu, May 16, 2019 at 10:59 AM Roman Penyaev <rpenyaev@suse.de> > > > wrote: > > >> > > >> epoll_create2() is needed to accept EPOLL_USERPOLL flags > > >> and size, i.e. this patch wires up polling from userspace. > > > > > > Could you add the system call to all syscall*.tbl files at the same > > > time here? > > > > For all other archs, you mean? Sure. But what is the rule of thumb? > > Sometimes people tend to add to the most common x86 and other tables > > are left untouched, but then you commit the rest, e.g. > > > > commit 39036cd2727395c3369b1051005da74059a85317 > > Author: Arnd Bergmann <arnd@arndb.de> > > Date: Thu Feb 28 13:59:19 2019 +0100 > > > > arch: add pidfd and io_uring syscalls everywhere > > > > I thought the preferred approach was to wire up the architectures on > which the submitter has tested the syscall, then allow the arch > maintainers to enable the syscall independently? I'm hoping to change that practice now, as it has not worked well in the past: - half the architectures now use asm-generic/unistd.h, so they are already wired up at the same time, regardless of testing - in the other half, not adding them at the same time actually made it harder to test, as it was significantly harder to figure out how to build a modified kernel for a given architecture than to run the test case - Not having all architectures add a new call at the same time caused the architectures to get out of sync when some got added and others did not. Now that we use the same numbers across all architectures, that would be even more confusing. My plan for the long run is to only have one file to which new system calls get added in the future. > And to help them in this, provide a test suite for the new syscall > under tools/testing/selftests/. > > https://github.com/rouming/test-tools/blob/master/userpolled-epoll.c > will certainly help but I do think it would be better to move the test > into the kernel tree to keep it maintained and so that many people run > it in their various setups on an ongoing basis. No disagreement on that. Arnd
On Wed, 22 May 2019 13:14:41 +0200 Arnd Bergmann <arnd@arndb.de> wrote: > > I thought the preferred approach was to wire up the architectures on > > which the submitter has tested the syscall, then allow the arch > > maintainers to enable the syscall independently? > > I'm hoping to change that practice now, as it has not worked well > in the past: > > - half the architectures now use asm-generic/unistd.h, so they are > already wired up at the same time, regardless of testing > - in the other half, not adding them at the same time actually > made it harder to test, as it was significantly harder to figure > out how to build a modified kernel for a given architecture > than to run the test case > - Not having all architectures add a new call at the same time caused > the architectures to get out of sync when some got added and others > did not. Now that we use the same numbers across all architectures, > that would be even more confusing. > > My plan for the long run is to only have one file to which new > system calls get added in the future. Fair enough. We're adding code to architectures without having tested it on those architectures but we do that all the time anyway - I guess there's not a lot of point in special-casing new syscalls.
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl index 4cd5f982b1e5..f0d271875f4e 100644 --- a/arch/x86/entry/syscalls/syscall_32.tbl +++ b/arch/x86/entry/syscalls/syscall_32.tbl @@ -438,3 +438,4 @@ 425 i386 io_uring_setup sys_io_uring_setup __ia32_sys_io_uring_setup 426 i386 io_uring_enter sys_io_uring_enter __ia32_sys_io_uring_enter 427 i386 io_uring_register sys_io_uring_register __ia32_sys_io_uring_register +428 i386 epoll_create2 sys_epoll_create2 __ia32_sys_epoll_create2 diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl index 64ca0d06259a..5ee9bb31a552 100644 --- a/arch/x86/entry/syscalls/syscall_64.tbl +++ b/arch/x86/entry/syscalls/syscall_64.tbl @@ -355,6 +355,7 @@ 425 common io_uring_setup __x64_sys_io_uring_setup 426 common io_uring_enter __x64_sys_io_uring_enter 427 common io_uring_register __x64_sys_io_uring_register +428 common epoll_create2 __x64_sys_epoll_create2 # # x32-specific system call numbers start at 512 to avoid cache impact diff --git a/fs/eventpoll.c b/fs/eventpoll.c index 9ff666ce7cb5..b44c3a0c4ad0 100644 --- a/fs/eventpoll.c +++ b/fs/eventpoll.c @@ -2614,6 +2614,14 @@ static int do_epoll_create(int flags, size_t size) return error; } +SYSCALL_DEFINE2(epoll_create2, int, flags, size_t, size) +{ + if (size == 0) + return -EINVAL; + + return do_epoll_create(flags, size); +} + SYSCALL_DEFINE1(epoll_create1, int, flags) { return do_epoll_create(flags, 0); diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index e2870fe1be5b..5049b0d16949 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -358,6 +358,7 @@ asmlinkage long sys_eventfd2(unsigned int count, int flags); /* fs/eventpoll.c */ asmlinkage long sys_epoll_create1(int flags); +asmlinkage long sys_epoll_create2(int flags, size_t size); asmlinkage long sys_epoll_ctl(int epfd, int op, int fd, struct epoll_event __user *event); asmlinkage long sys_epoll_pwait(int epfd, struct epoll_event __user *events, diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h index dee7292e1df6..fccfaab366ee 100644 --- a/include/uapi/asm-generic/unistd.h +++ b/include/uapi/asm-generic/unistd.h @@ -832,9 +832,11 @@ __SYSCALL(__NR_io_uring_setup, sys_io_uring_setup) __SYSCALL(__NR_io_uring_enter, sys_io_uring_enter) #define __NR_io_uring_register 427 __SYSCALL(__NR_io_uring_register, sys_io_uring_register) +#define __NR_epoll_create2 428 +__SYSCALL(__NR_epoll_create2, sys_epoll_create2) #undef __NR_syscalls -#define __NR_syscalls 428 +#define __NR_syscalls 429 /* * 32 bit systems traditionally used different diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c index 4d9ae5ea6caf..665908b8a326 100644 --- a/kernel/sys_ni.c +++ b/kernel/sys_ni.c @@ -65,6 +65,7 @@ COND_SYSCALL(eventfd2); /* fs/eventfd.c */ COND_SYSCALL(epoll_create1); +COND_SYSCALL(epoll_create2); COND_SYSCALL(epoll_ctl); COND_SYSCALL(epoll_pwait); COND_SYSCALL_COMPAT(epoll_pwait);
epoll_create2() is needed to accept EPOLL_USERPOLL flags and size, i.e. this patch wires up polling from userspace. Signed-off-by: Roman Penyaev <rpenyaev@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-fsdevel@vger.kernel.org Cc: linux-kernel@vger.kernel.org