Message ID | 20190121201456.28338-3-rpenyaev@suse.de (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | epoll: support pollable epoll from userspace | expand |
So I'm not entirely convinced, but I guess actual numbers and users might convince me otherwise. However, a quick comment: On Tue, Jan 22, 2019 at 9:15 AM Roman Penyaev <rpenyaev@suse.de> wrote: > > +struct epoll_uitem { > + __poll_t ready_events; > + struct epoll_event event; > +}; This really ends up being a horrible data structure. struct epoll_event is declared as struct epoll_event { __poll_t events; __u64 data; } EPOLL_PACKED; and __poll_t is "unsigned". So on pretty much all 64-bit architectures except for x86-64 (which sets that packed attribute), you have a packing hole there in between the events and the data, and "struct epoll_event" has 8-byte alignment. Now, in "struct epoll_uitem", you end up having *another* packing hold in between "ready_events" and "struct epoll_event". So this data structure that has 16 bytes of actual data, ends up being 24 bytes in size. Again, x86-64 happens to be the exception to this, but that's a random small implementation detail, not a design thing. I think "struct epoll_event" was badly designed to begin with to have this issue, but it shouldn't then be an excuse to make things even worse with this array of "struct epoll_uitem" things. Hmm? Linus
On 2019-01-21 22:34, Linus Torvalds wrote: > So I'm not entirely convinced, but I guess actual numbers and users > might convince me otherwise. > > However, a quick comment: > > On Tue, Jan 22, 2019 at 9:15 AM Roman Penyaev <rpenyaev@suse.de> wrote: >> >> +struct epoll_uitem { >> + __poll_t ready_events; >> + struct epoll_event event; >> +}; > > This really ends up being a horrible data structure. > > struct epoll_event is declared as > > struct epoll_event { > __poll_t events; > __u64 data; > } EPOLL_PACKED; > > and __poll_t is "unsigned". So on pretty much all 64-bit architectures > except for x86-64 (which sets that packed attribute), you have a > packing hole there in between the events and the data, and "struct > epoll_event" has 8-byte alignment. > > Now, in "struct epoll_uitem", you end up having *another* packing hold > in between "ready_events" and "struct epoll_event". > > So this data structure that has 16 bytes of actual data, ends up being > 24 bytes in size. > > Again, x86-64 happens to be the exception to this, but that's a random > small implementation detail, not a design thing. > > I think "struct epoll_event" was badly designed to begin with to have > this issue, but it shouldn't then be an excuse to make things even > worse with this array of "struct epoll_uitem" things. > > Hmm? Ha! Yes, you are right. Eyes see "packed" and brain responds "ok, this is 12 bytes, + 4 for ready_events = 16, perfect". I have not paid any attention to how actually this EPOLL_PACKED is defined. Not nice at all. I will unfold the structure like this: /* * Item, shared with userspace. Unfortunately we can't embed epoll_event * structure, because it is badly aligned on all 64-bit archs, except * x86-64 (see EPOLL_PACKED). sizeof(epoll_uitem) == 16 */ struct epoll_uitem { __poll_t ready_events; __poll_t events; __u64 data; }; Also BUILD_BUG_ON(sizeof(epoll_uitem) != 16) somewhere in alloc won't hurt. -- Roman
diff --git a/fs/eventpoll.c b/fs/eventpoll.c index 2cc183e86a29..f598442512f3 100644 --- a/fs/eventpoll.c +++ b/fs/eventpoll.c @@ -9,6 +9,8 @@ * * Davide Libenzi <davidel@xmailserver.org> * + * Polling from userspace support by Roman Penyaev <rpenyaev@suse.de> + * (C) Copyright 2019 SUSE, All Rights Reserved */ #include <linux/init.h> @@ -109,6 +111,13 @@ #define EP_ITEM_COST (sizeof(struct epitem) + sizeof(struct eppoll_entry)) +/* + * That is around 1.3mb of allocated memory for one epfd. What is more + * important is ->index_length, which should be ^2, so do not increase + * max items number to avoid size doubling of user index. + */ +#define EP_USERPOLL_MAX_ITEMS_NR 65536 + struct epoll_filefd { struct file *file; int fd; diff --git a/include/uapi/linux/eventpoll.h b/include/uapi/linux/eventpoll.h index 39dfc29f0f52..690a625ddeb2 100644 --- a/include/uapi/linux/eventpoll.h +++ b/include/uapi/linux/eventpoll.h @@ -79,4 +79,23 @@ struct epoll_event { __u64 data; } EPOLL_PACKED; +struct epoll_uitem { + __poll_t ready_events; + struct epoll_event event; +}; + +#define EPOLL_USERPOLL_HEADER_SIZE 128 +#define EPOLL_USERPOLL_HEADER_MAGIC 0xeb01eb01 + +struct epoll_uheader { + u32 magic; /* epoll user header magic */ + u32 header_length; /* length of the header + items */ + u32 index_length; /* length of the index ring, always pow2 */ + u32 max_items_nr; /* max number of items */ + u32 head; /* updated by userland */ + u32 tail; /* updated by kernel */ + + struct epoll_uitem items[] __aligned(EPOLL_USERPOLL_HEADER_SIZE); +}; + #endif /* _UAPI_LINUX_EVENTPOLL_H */
This one introduces structures of user items array: struct user_epheader - describes inserted epoll items. struct user_epitem - single epoll item, visible to userspace. Signed-off-by: Roman Penyaev <rpenyaev@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: Jason Baron <jbaron@akamai.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrea Parri <andrea.parri@amarulasolutions.com> Cc: linux-fsdevel@vger.kernel.org Cc: linux-kernel@vger.kernel.org --- fs/eventpoll.c | 9 +++++++++ include/uapi/linux/eventpoll.h | 19 +++++++++++++++++++ 2 files changed, 28 insertions(+)