From patchwork Thu Nov 8 05:10:01 2018
X-Patchwork-Id: 10673485
From: Davidlohr Bueso
To: akpm@linux-foundation.org
Cc: jbaron@akamai.com, viro@zeniv.linux.org.uk, dave@stgolabs.net,
    linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
    Davidlohr Bueso
Subject: [PATCH 1/6] fs/epoll: remove max_nests argument from ep_call_nested()
Date: Wed, 7 Nov 2018 21:10:01 -0800
Message-Id: <20181108051006.18751-2-dave@stgolabs.net>
In-Reply-To: <20181108051006.18751-1-dave@stgolabs.net>
References: <20181108051006.18751-1-dave@stgolabs.net>

All callers already pass the EP_MAX_NESTS constant, so let's simplify this
a tad and get rid of the redundant parameter for nested eventpolls.

Signed-off-by: Davidlohr Bueso
---
 fs/eventpoll.c | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 42bbe6824b4b..7ad020f1353b 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -471,7 +471,6 @@ static inline void ep_set_busy_poll_napi_id(struct epitem *epi)
  * no re-entered.
  *
  * @ncalls: Pointer to the nested_calls structure to be used for this call.
- * @max_nests: Maximum number of allowed nesting calls.
  * @nproc: Nested call core function pointer.
  * @priv: Opaque data to be passed to the @nproc callback.
  * @cookie: Cookie to be used to identify this nested call.
@@ -480,7 +479,7 @@ static inline void ep_set_busy_poll_napi_id(struct epitem *epi)
  * Returns: Returns the code returned by the @nproc callback, or -1 if
  *          the maximum recursion limit has been exceeded.
  */
-static int ep_call_nested(struct nested_calls *ncalls, int max_nests,
+static int ep_call_nested(struct nested_calls *ncalls,
 			  int (*nproc)(void *, void *, int), void *priv,
 			  void *cookie, void *ctx)
 {
@@ -499,7 +498,7 @@ static int ep_call_nested(struct nested_calls *ncalls, int max_nests,
 	 */
 	list_for_each_entry(tncur, lsthead, llink) {
 		if (tncur->ctx == ctx &&
-		    (tncur->cookie == cookie || ++call_nests > max_nests)) {
+		    (tncur->cookie == cookie || ++call_nests > EP_MAX_NESTS)) {
 			/*
 			 * Ops ... loop detected or maximum nest level reached.
 			 * We abort this wake by breaking the cycle itself.
@@ -573,7 +572,7 @@ static void ep_poll_safewake(wait_queue_head_t *wq)
 {
 	int this_cpu = get_cpu();

-	ep_call_nested(&poll_safewake_ncalls, EP_MAX_NESTS,
+	ep_call_nested(&poll_safewake_ncalls,
 		       ep_poll_wakeup_proc, NULL, wq, (void *) (long) this_cpu);

 	put_cpu();
@@ -1333,7 +1332,6 @@ static int reverse_path_check_proc(void *priv, void *cookie, int call_nests)
 			}
 		} else {
 			error = ep_call_nested(&poll_loop_ncalls,
-					       EP_MAX_NESTS,
 					       reverse_path_check_proc,
 					       child_file, child_file,
 					       current);
@@ -1367,7 +1365,7 @@ static int reverse_path_check(void)
 	/* let's call this for all tfiles */
 	list_for_each_entry(current_file, &tfile_check_list, f_tfile_llink) {
 		path_count_init();
-		error = ep_call_nested(&poll_loop_ncalls, EP_MAX_NESTS,
+		error = ep_call_nested(&poll_loop_ncalls,
 					reverse_path_check_proc,
 					current_file, current_file, current);
 		if (error)
@@ -1876,7 +1874,7 @@ static int ep_loop_check_proc(void *priv, void *cookie, int call_nests)
 			ep_tovisit = epi->ffd.file->private_data;
 			if (ep_tovisit->visited)
 				continue;
-			error = ep_call_nested(&poll_loop_ncalls, EP_MAX_NESTS,
+			error = ep_call_nested(&poll_loop_ncalls,
 					ep_loop_check_proc, epi->ffd.file,
 					ep_tovisit, current);
 			if (error != 0)
@@ -1916,7 +1914,7 @@ static int ep_loop_check(struct eventpoll *ep, struct file *file)
 	int ret;
 	struct eventpoll *ep_cur, *ep_next;

-	ret = ep_call_nested(&poll_loop_ncalls, EP_MAX_NESTS,
+	ret = ep_call_nested(&poll_loop_ncalls,
 			      ep_loop_check_proc, file, ep, current);
 	/* clear visited list */
 	list_for_each_entry_safe(ep_cur, ep_next, &visited_list,
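Background note: an epoll file descriptor can itself be watched by another
epoll instance, which is why ep_call_nested() exists at all. EP_MAX_NESTS
bounds how deep such chains may go, and the same machinery rejects cycles
with -ELOOP at EPOLL_CTL_ADD time. A stand-alone user-space sketch of the
kind of setup this machinery polices (illustrative only, not part of the
patch; error handling trimmed):

    /* Nested epoll instances: outer watches inner, inner watches an eventfd.
     * Build with: gcc -Wall -o nested nested.c
     */
    #include <stdio.h>
    #include <sys/epoll.h>
    #include <sys/eventfd.h>
    #include <unistd.h>

    int main(void)
    {
        int efd = eventfd(0, EFD_NONBLOCK);
        int inner = epoll_create1(0);
        int outer = epoll_create1(0);
        struct epoll_event ev = { .events = EPOLLIN };

        /* inner watches the eventfd ... */
        ev.data.fd = efd;
        epoll_ctl(inner, EPOLL_CTL_ADD, efd, &ev);

        /* ... and outer watches the inner epoll instance (nesting). */
        ev.data.fd = inner;
        epoll_ctl(outer, EPOLL_CTL_ADD, inner, &ev);

        /* Adding outer back into inner would create a cycle; the kernel's
         * loop check makes this fail with ELOOP. */
        ev.data.fd = outer;
        if (epoll_ctl(inner, EPOLL_CTL_ADD, outer, &ev) < 0)
            perror("expected failure");

        eventfd_write(efd, 1);          /* make the eventfd readable */

        struct epoll_event out;
        int n = epoll_wait(outer, &out, 1, 1000);
        printf("outer epoll saw %d ready fd(s)\n", n);

        close(outer); close(inner); close(efd);
        return 0;
    }

Running it, the cyclic add fails while readiness on the inner instance still
propagates to the outer one.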
From patchwork Thu Nov 8 05:10:02 2018
X-Patchwork-Id: 10673493
From: Davidlohr Bueso
To: akpm@linux-foundation.org
Cc: jbaron@akamai.com, viro@zeniv.linux.org.uk, dave@stgolabs.net,
    linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
    Davidlohr Bueso
Subject: [PATCH 2/6] fs/epoll: simplify ep_send_events_proc() ready-list loop
Date: Wed, 7 Nov 2018 21:10:02 -0800
Message-Id: <20181108051006.18751-3-dave@stgolabs.net>
In-Reply-To: <20181108051006.18751-1-dave@stgolabs.net>
References: <20181108051006.18751-1-dave@stgolabs.net>

The current logic is a bit convoluted. Let's simplify it with a standard
list_for_each_entry_safe() loop and just break out once maxevents is
reached. While at it, remove an unnecessary indentation level in the loop
for the case where there are in fact ready events.

Signed-off-by: Davidlohr Bueso
---
 fs/eventpoll.c | 73 +++++++++++++++++++++++++++++-----------------------------
 1 file changed, 37 insertions(+), 36 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 7ad020f1353b..101d46b81f64 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1624,21 +1624,22 @@ static __poll_t ep_send_events_proc(struct eventpoll *ep, struct list_head *head
 {
 	struct ep_send_events_data *esed = priv;
 	__poll_t revents;
-	struct epitem *epi;
-	struct epoll_event __user *uevent;
+	struct epitem *epi, *tmp;
+	struct epoll_event __user *uevent = esed->events;
 	struct wakeup_source *ws;
 	poll_table pt;

 	init_poll_funcptr(&pt, NULL);
+	esed->res = 0;

 	/*
 	 * We can loop without lock because we are passed a task private list.
 	 * Items cannot vanish during the loop because ep_scan_ready_list() is
 	 * holding "mtx" during this call.
 	 */
-	for (esed->res = 0, uevent = esed->events;
-	     !list_empty(head) && esed->res < esed->maxevents;) {
-		epi = list_first_entry(head, struct epitem, rdllink);
+	list_for_each_entry_safe(epi, tmp, head, rdllink) {
+		if (esed->res >= esed->maxevents)
+			break;

 		/*
 		 * Activate ep->ws before deactivating epi->ws to prevent
@@ -1658,42 +1659,42 @@ static __poll_t ep_send_events_proc(struct eventpoll *ep, struct list_head *head
 		list_del_init(&epi->rdllink);

-		revents = ep_item_poll(epi, &pt, 1);
-
 		/*
 		 * If the event mask intersect the caller-requested one,
 		 * deliver the event to userspace. Again, ep_scan_ready_list()
-		 * is holding "mtx", so no operations coming from userspace
+		 * is holding ep->mtx, so no operations coming from userspace
 		 * can change the item.
 		 */
-		if (revents) {
-			if (__put_user(revents, &uevent->events) ||
-			    __put_user(epi->event.data, &uevent->data)) {
-				list_add(&epi->rdllink, head);
-				ep_pm_stay_awake(epi);
-				if (!esed->res)
-					esed->res = -EFAULT;
-				return 0;
-			}
-			esed->res++;
-			uevent++;
-			if (epi->event.events & EPOLLONESHOT)
-				epi->event.events &= EP_PRIVATE_BITS;
-			else if (!(epi->event.events & EPOLLET)) {
-				/*
-				 * If this file has been added with Level
-				 * Trigger mode, we need to insert back inside
-				 * the ready list, so that the next call to
-				 * epoll_wait() will check again the events
-				 * availability. At this point, no one can insert
-				 * into ep->rdllist besides us. The epoll_ctl()
-				 * callers are locked out by
-				 * ep_scan_ready_list() holding "mtx" and the
-				 * poll callback will queue them in ep->ovflist.
-				 */
-				list_add_tail(&epi->rdllink, &ep->rdllist);
-				ep_pm_stay_awake(epi);
-			}
+		revents = ep_item_poll(epi, &pt, 1);
+		if (!revents)
+			continue;
+
+		if (__put_user(revents, &uevent->events) ||
+		    __put_user(epi->event.data, &uevent->data)) {
+			list_add(&epi->rdllink, head);
+			ep_pm_stay_awake(epi);
+			if (!esed->res)
+				esed->res = -EFAULT;
+			return 0;
+		}
+		esed->res++;
+		uevent++;
+		if (epi->event.events & EPOLLONESHOT)
+			epi->event.events &= EP_PRIVATE_BITS;
+		else if (!(epi->event.events & EPOLLET)) {
+			/*
+			 * If this file has been added with Level
+			 * Trigger mode, we need to insert back inside
+			 * the ready list, so that the next call to
+			 * epoll_wait() will check again the events
+			 * availability. At this point, no one can insert
+			 * into ep->rdllist besides us. The epoll_ctl()
+			 * callers are locked out by
+			 * ep_scan_ready_list() holding "mtx" and the
+			 * poll callback will queue them in ep->ovflist.
+			 */
+			list_add_tail(&epi->rdllink, &ep->rdllist);
+			ep_pm_stay_awake(epi);
 		}
 	}
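Background note: the level-triggered re-insertion this loop preserves is
exactly what gives epoll its user-visible delivery semantics: a
level-triggered fd keeps being reported until it is drained, while
EPOLLONESHOT disarms it after one report until rearmed with EPOLL_CTL_MOD.
A small illustrative demo (not part of the patch; error handling trimmed):

    #include <stdio.h>
    #include <sys/epoll.h>
    #include <unistd.h>

    int main(void)
    {
        int pfd[2];
        pipe(pfd);
        write(pfd[1], "x", 1);          /* make the read end ready */

        int ep = epoll_create1(0);
        struct epoll_event ev = { .events = EPOLLIN, .data.fd = pfd[0] };
        epoll_ctl(ep, EPOLL_CTL_ADD, pfd[0], &ev);

        struct epoll_event out;
        /* Level-triggered: reported on every call until the data is read. */
        printf("LT, 1st wait:      %d\n", epoll_wait(ep, &out, 1, 0));
        printf("LT, 2nd wait:      %d\n", epoll_wait(ep, &out, 1, 0));

        /* One-shot: reported once, then disarmed until rearmed. */
        ev.events = EPOLLIN | EPOLLONESHOT;
        epoll_ctl(ep, EPOLL_CTL_MOD, pfd[0], &ev);
        printf("ONESHOT, 1st wait: %d\n", epoll_wait(ep, &out, 1, 0));
        printf("ONESHOT, 2nd wait: %d\n", epoll_wait(ep, &out, 1, 0));

        /* Rearm with EPOLL_CTL_MOD and it is reported again. */
        epoll_ctl(ep, EPOLL_CTL_MOD, pfd[0], &ev);
        printf("rearmed wait:      %d\n", epoll_wait(ep, &out, 1, 0));

        close(ep); close(pfd[0]); close(pfd[1]);
        return 0;
    }

This should print 1, 1, 1, 0 and then 1.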
From patchwork Thu Nov 8 05:10:03 2018
X-Patchwork-Id: 10673491
From: Davidlohr Bueso
To: akpm@linux-foundation.org
Cc: jbaron@akamai.com, viro@zeniv.linux.org.uk, dave@stgolabs.net,
    linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
    Davidlohr Bueso
Subject: [PATCH 3/6] fs/epoll: drop ovflist branch prediction
Date: Wed, 7 Nov 2018 21:10:03 -0800
Message-Id: <20181108051006.18751-4-dave@stgolabs.net>
In-Reply-To: <20181108051006.18751-1-dave@stgolabs.net>
References: <20181108051006.18751-1-dave@stgolabs.net>

The ep->ovflist is a secondary ready-list used to temporarily store events
that might occur while sproc is running without holding ep->wq.lock. This
covers every time we check for ready events and also when we send events
back to userspace; both callbacks, particularly the latter because of
copy_to_user, can account for a non-trivial amount of time. As such, the
unlikely() hint on the check that the pointer is in use seems both
misleading and sub-optimal: we go to an awful lot of trouble to keep both
lists in sync, and populating the ovflist is far from an uncommon scenario.

For example, profiling a concurrent epoll_wait(2) benchmark with
CONFIG_PROFILE_ANNOTATED_BRANCHES shows that with two threads the branch
was predicted incorrectly about 33% of the time, and up to a 90% incorrect
rate was seen when incrementally increasing the number of epoll instances
(as used, for example, in multi-queue load-balancing models). Similarly,
deleting the prediction gave a 3% throughput boost as threads were
incrementally added.

Signed-off-by: Davidlohr Bueso
---
 fs/eventpoll.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 101d46b81f64..347da3f4f5d3 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1153,7 +1153,7 @@ static int ep_poll_callback(wait_queue_entry_t *wait, unsigned mode, int sync, v
 	 * semantics). All the events that happen during that period of time are
 	 * chained in ep->ovflist and requeued later on.
 	 */
-	if (unlikely(ep->ovflist != EP_UNACTIVE_PTR)) {
+	if (ep->ovflist != EP_UNACTIVE_PTR) {
 		if (epi->next == EP_UNACTIVE_PTR) {
 			epi->next = ep->ovflist;
 			ep->ovflist = epi;
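Background note: the kind of workload described above - several threads
blocked in epoll_wait(2) on one epoll instance while events keep being
queued - can be sketched in user space as below. This is only a rough
skeleton for reproducing contention on the ready/overflow lists, not the
benchmark that produced the quoted numbers (error handling trimmed; build
with -pthread):

    #include <pthread.h>
    #include <stdio.h>
    #include <sys/epoll.h>
    #include <sys/eventfd.h>
    #include <unistd.h>

    #define NWAITERS 4
    #define NROUNDS  100000

    static int ep, efd;

    /* Waiters share one epoll instance; consuming the eventfd makes the
     * kernel repeatedly queue and drain items, which can exercise the
     * ovflist path while events are being delivered. */
    static void *waiter(void *arg)
    {
        struct epoll_event out;
        eventfd_t cnt;

        for (int i = 0; i < NROUNDS; i++) {
            if (epoll_wait(ep, &out, 1, 10) == 1)
                eventfd_read(efd, &cnt);    /* may fail if another thread won */
        }
        return NULL;
    }

    static void *firer(void *arg)
    {
        for (int i = 0; i < NWAITERS * NROUNDS; i++)
            eventfd_write(efd, 1);          /* keep the ready list busy */
        return NULL;
    }

    int main(void)
    {
        pthread_t thr[NWAITERS], w;

        ep = epoll_create1(0);
        efd = eventfd(0, EFD_NONBLOCK);

        struct epoll_event ev = { .events = EPOLLIN, .data.fd = efd };
        epoll_ctl(ep, EPOLL_CTL_ADD, efd, &ev);

        pthread_create(&w, NULL, firer, NULL);
        for (int i = 0; i < NWAITERS; i++)
            pthread_create(&thr[i], NULL, waiter, NULL);

        for (int i = 0; i < NWAITERS; i++)
            pthread_join(thr[i], NULL);
        pthread_join(w, NULL);

        puts("done");
        return 0;
    }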
From patchwork Thu Nov 8 05:10:04 2018
X-Patchwork-Id: 10673487
From: Davidlohr Bueso
To: akpm@linux-foundation.org
Cc: jbaron@akamai.com, viro@zeniv.linux.org.uk, dave@stgolabs.net,
    linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
    Davidlohr Bueso
Subject: [PATCH 4/6] fs/epoll: robustify ep->mtx held checks
Date: Wed, 7 Nov 2018 21:10:04 -0800
Message-Id: <20181108051006.18751-5-dave@stgolabs.net>
In-Reply-To: <20181108051006.18751-1-dave@stgolabs.net>
References: <20181108051006.18751-1-dave@stgolabs.net>

Instead of just commenting on how important it is, let's make the rule
more robust and add a lockdep_assert_held() call.

Signed-off-by: Davidlohr Bueso
---
 fs/eventpoll.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 347da3f4f5d3..6a0c2591e57e 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1637,6 +1637,8 @@ static __poll_t ep_send_events_proc(struct eventpoll *ep, struct list_head *head
 	 * Items cannot vanish during the loop because ep_scan_ready_list() is
 	 * holding "mtx" during this call.
 	 */
+	lockdep_assert_held(&ep->mtx);
+
 	list_for_each_entry_safe(epi, tmp, head, rdllink) {
 		if (esed->res >= esed->maxevents)
 			break;
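Background note: lockdep_assert_held() turns a documented locking rule into
a check that fires as soon as a caller breaks it. The same idea can be
illustrated in user space, although much more weakly, by asserting on
pthread_mutex_trylock(). This is only a sketch: EBUSY merely proves that
*some* thread holds the mutex, whereas lockdep knows it is the current one.

    #include <assert.h>
    #include <errno.h>
    #include <pthread.h>

    static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;

    /* Poor man's lockdep_assert_held(): trylock on a held (default,
     * non-recursive) mutex returns EBUSY, so anything else means the
     * caller did not take the lock as required. */
    static void assert_mtx_held(void)
    {
        assert(pthread_mutex_trylock(&mtx) == EBUSY);
    }

    /* Must be called with mtx held, as the comment says -- and now the
     * assertion says it too. */
    static void ready_list_op(void)
    {
        assert_mtx_held();
        /* ... touch data protected by mtx ... */
    }

    int main(void)
    {
        pthread_mutex_lock(&mtx);
        ready_list_op();                /* fine */
        pthread_mutex_unlock(&mtx);
        return 0;
    }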
From patchwork Thu Nov 8 05:10:05 2018
X-Patchwork-Id: 10673489
From: Davidlohr Bueso
To: akpm@linux-foundation.org
Cc: jbaron@akamai.com, viro@zeniv.linux.org.uk, dave@stgolabs.net,
    linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
    Davidlohr Bueso
Subject: [PATCH 5/6] fs/epoll: reduce the scope of wq lock in epoll_wait()
Date: Wed, 7 Nov 2018 21:10:05 -0800
Message-Id: <20181108051006.18751-6-dave@stgolabs.net>
In-Reply-To: <20181108051006.18751-1-dave@stgolabs.net>
References: <20181108051006.18751-1-dave@stgolabs.net>

This patch aims at reducing ep->wq.lock hold times in epoll_wait(2). For
the blocking case, there is no need to constantly take and drop the
spinlock, which is only needed to manipulate the waitqueue.

The call to ep_events_available() is now lockless and only exposed to
benign races: on a false positive (we see available events but do not see
another thread deleting an epi from the list), we call into send_events
and the list's state is then seen correctly. On the other hand, on a false
negative where we miss a list_add_tail(), for example from the irq
callback, the list is rechecked before blocking, which will see the
correct state.

In order to better detect a concurrent list_del_init(), use the
list_empty_careful() variant -- of course, this still won't be safe
against insertions from wakeup.

For the overflow list we obviously need to prevent load/store tearing, as
we don't want to see partial values while the ready list is disabled.

Signed-off-by: Davidlohr Bueso
---
 fs/eventpoll.c | 114 ++++++++++++++++++++++++++++++---------------------------
 1 file changed, 60 insertions(+), 54 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 6a0c2591e57e..f6c023f085f6 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -381,7 +381,8 @@ static void ep_nested_calls_init(struct nested_calls *ncalls)
  */
 static inline int ep_events_available(struct eventpoll *ep)
 {
-	return !list_empty(&ep->rdllist) || ep->ovflist != EP_UNACTIVE_PTR;
+	return !list_empty(&ep->rdllist) ||
+		READ_ONCE(ep->ovflist) != EP_UNACTIVE_PTR;
 }

 #ifdef CONFIG_NET_RX_BUSY_POLL
@@ -698,7 +699,7 @@ static __poll_t ep_scan_ready_list(struct eventpoll *ep,
 	 */
 	spin_lock_irq(&ep->wq.lock);
 	list_splice_init(&ep->rdllist, &txlist);
-	ep->ovflist = NULL;
+	WRITE_ONCE(ep->ovflist, NULL);
 	spin_unlock_irq(&ep->wq.lock);

 	/*
@@ -712,7 +713,7 @@ static __poll_t ep_scan_ready_list(struct eventpoll *ep,
 	 * other events might have been queued by the poll callback.
 	 * We re-insert them inside the main ready-list here.
 	 */
-	for (nepi = ep->ovflist; (epi = nepi) != NULL;
+	for (nepi = READ_ONCE(ep->ovflist); (epi = nepi) != NULL;
 	     nepi = epi->next, epi->next = EP_UNACTIVE_PTR) {
 		/*
 		 * We need to check if the item is already in the list.
@@ -730,7 +731,7 @@ static __poll_t ep_scan_ready_list(struct eventpoll *ep,
 	 * releasing the lock, events will be queued in the normal way inside
 	 * ep->rdllist.
 	 */
-	ep->ovflist = EP_UNACTIVE_PTR;
+	WRITE_ONCE(ep->ovflist, EP_UNACTIVE_PTR);

 	/*
 	 * Quickly re-inject items left on "txlist".
@@ -1153,10 +1154,10 @@ static int ep_poll_callback(wait_queue_entry_t *wait, unsigned mode, int sync, v
 	 * semantics). All the events that happen during that period of time are
 	 * chained in ep->ovflist and requeued later on.
 	 */
-	if (ep->ovflist != EP_UNACTIVE_PTR) {
+	if (READ_ONCE(ep->ovflist) != EP_UNACTIVE_PTR) {
 		if (epi->next == EP_UNACTIVE_PTR) {
-			epi->next = ep->ovflist;
-			ep->ovflist = epi;
+			epi->next = READ_ONCE(ep->ovflist);
+			WRITE_ONCE(ep->ovflist, epi);
 			if (epi->ws) {
 				/*
 				 * Activate ep->ws since epi->ws may get
@@ -1762,10 +1763,17 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 	} else if (timeout == 0) {
 		/*
 		 * Avoid the unnecessary trip to the wait queue loop, if the
-		 * caller specified a non blocking operation.
+		 * caller specified a non blocking operation. We still need
+		 * lock because we could race and not see an epi being added
+		 * to the ready list while in irq callback. Thus incorrectly
+		 * returning 0 back to userspace.
 		 */
 		timed_out = 1;
+
 		spin_lock_irq(&ep->wq.lock);
+		eavail = ep_events_available(ep);
+		spin_unlock_irq(&ep->wq.lock);
+
 		goto check_events;
 	}

@@ -1774,64 +1782,62 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 	if (!ep_events_available(ep))
 		ep_busy_loop(ep, timed_out);

+	eavail = ep_events_available(ep);
+	if (eavail)
+		goto check_events;
+
+	/*
+	 * Busy poll timed out. Drop NAPI ID for now, we can add
+	 * it back in when we have moved a socket with a valid NAPI
+	 * ID onto the ready list.
+	 */
+	ep_reset_busy_poll_napi_id(ep);
+
+	/*
+	 * We don't have any available event to return to the caller.
+	 * We need to sleep here, and we will be wake up by
+	 * ep_poll_callback() when events will become available.
+	 */
+	init_waitqueue_entry(&wait, current);

 	spin_lock_irq(&ep->wq.lock);
+	__add_wait_queue_exclusive(&ep->wq, &wait);
+	spin_unlock_irq(&ep->wq.lock);

-	if (!ep_events_available(ep)) {
+	for (;;) {
 		/*
-		 * Busy poll timed out. Drop NAPI ID for now, we can add
-		 * it back in when we have moved a socket with a valid NAPI
-		 * ID onto the ready list.
+		 * We don't want to sleep if the ep_poll_callback() sends us
+		 * a wakeup in between. That's why we set the task state
+		 * to TASK_INTERRUPTIBLE before doing the checks.
 		 */
-		ep_reset_busy_poll_napi_id(ep);
-
+		set_current_state(TASK_INTERRUPTIBLE);
 		/*
-		 * We don't have any available event to return to the caller.
-		 * We need to sleep here, and we will be wake up by
-		 * ep_poll_callback() when events will become available.
+		 * Always short-circuit for fatal signals to allow
+		 * threads to make a timely exit without the chance of
+		 * finding more events available and fetching
+		 * repeatedly.
 		 */
-		init_waitqueue_entry(&wait, current);
-		__add_wait_queue_exclusive(&ep->wq, &wait);
-
-		for (;;) {
-			/*
-			 * We don't want to sleep if the ep_poll_callback() sends us
-			 * a wakeup in between. That's why we set the task state
-			 * to TASK_INTERRUPTIBLE before doing the checks.
-			 */
-			set_current_state(TASK_INTERRUPTIBLE);
-			/*
-			 * Always short-circuit for fatal signals to allow
-			 * threads to make a timely exit without the chance of
-			 * finding more events available and fetching
-			 * repeatedly.
-			 */
-			if (fatal_signal_pending(current)) {
-				res = -EINTR;
-				break;
-			}
-			if (ep_events_available(ep) || timed_out)
-				break;
-			if (signal_pending(current)) {
-				res = -EINTR;
-				break;
-			}
-
-			spin_unlock_irq(&ep->wq.lock);
-			if (!schedule_hrtimeout_range(to, slack, HRTIMER_MODE_ABS))
-				timed_out = 1;
-
-			spin_lock_irq(&ep->wq.lock);
+		if (fatal_signal_pending(current)) {
+			res = -EINTR;
+			break;
+		}
+		if (ep_events_available(ep) || timed_out)
+			break;
+		if (signal_pending(current)) {
+			res = -EINTR;
+			break;
 		}
-		__remove_wait_queue(&ep->wq, &wait);
-		__set_current_state(TASK_RUNNING);
+		if (!schedule_hrtimeout_range(to, slack, HRTIMER_MODE_ABS))
+			timed_out = 1;
 	}
-check_events:
-	/* Is it worth to try to dig for events ? */
-	eavail = ep_events_available(ep);
+	__set_current_state(TASK_RUNNING);
+
+	spin_lock_irq(&ep->wq.lock);
+	__remove_wait_queue(&ep->wq, &wait);
 	spin_unlock_irq(&ep->wq.lock);

+check_events:
 	/*
 	 * Try to transfer events to user space. In case we get 0 events and
 	 * there's still timeout left over, we go trying again in search of
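Background note: READ_ONCE()/WRITE_ONCE() force the compiler to emit
exactly one, untorn load or store, which is what the now-lockless
ep_events_available() check and the ovflist accesses above rely on. In
portable user-space C the closest analogue is a C11 atomic access; a
minimal sketch of the same publish/poll pattern (illustrative only, not
part of the patch; build with -std=c11 -pthread):

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    struct item { int payload; };

    /* Analogue of ep->ovflist: a pointer another thread polls locklessly. */
    static _Atomic(struct item *) ovflist;

    static void *producer(void *arg)
    {
        static struct item it = { .payload = 42 };

        /* WRITE_ONCE()-like single store; release also publishes the payload
         * (the kernel macros themselves are closer to relaxed accesses). */
        atomic_store_explicit(&ovflist, &it, memory_order_release);
        return NULL;
    }

    int main(void)
    {
        pthread_t thr;
        struct item *p;

        pthread_create(&thr, NULL, producer, NULL);

        /* READ_ONCE()-like single load: no lock held, no tearing possible. */
        do {
            p = atomic_load_explicit(&ovflist, memory_order_acquire);
        } while (!p);

        printf("saw item, payload=%d\n", p->payload);
        pthread_join(thr, NULL);
        return 0;
    }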
From patchwork Thu Nov 8 05:10:06 2018
X-Patchwork-Id: 10673483
From: Davidlohr Bueso
To: akpm@linux-foundation.org
Cc: jbaron@akamai.com, viro@zeniv.linux.org.uk, dave@stgolabs.net,
    linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH 6/6] fs/epoll: avoid barrier after an epoll_wait(2) timeout
Date: Wed, 7 Nov 2018 21:10:06 -0800
Message-Id: <20181108051006.18751-7-dave@stgolabs.net>
In-Reply-To: <20181108051006.18751-1-dave@stgolabs.net>
References: <20181108051006.18751-1-dave@stgolabs.net>

Upon timeout, we can just exit out of the loop without the cost of the
smp_store_mb() call involved in changing the task's state. Just break out
of the loop and be done - setting the task state afterwards will be, of
course, redundant.

Signed-off-by: Davidlohr Bueso
---
 fs/eventpoll.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index f6c023f085f6..ec14e5bcdaa9 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1820,15 +1820,18 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 			res = -EINTR;
 			break;
 		}
-		if (ep_events_available(ep) || timed_out)
+
+		if (ep_events_available(ep))
 			break;
 		if (signal_pending(current)) {
 			res = -EINTR;
 			break;
 		}

-		if (!schedule_hrtimeout_range(to, slack, HRTIMER_MODE_ABS))
+		if (!schedule_hrtimeout_range(to, slack, HRTIMER_MODE_ABS)) {
 			timed_out = 1;
+			break;
+		}
 	}

 	__set_current_state(TASK_RUNNING);
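Background note: seen from user space, what the last two patches must
preserve is simply the epoll_wait(2) timeout contract - 0 on timeout
(including a timeout of 0 for a non-blocking poll) and the number of ready
events otherwise. A quick illustrative check (not part of the series; error
handling trimmed):

    #include <stdio.h>
    #include <sys/epoll.h>
    #include <sys/eventfd.h>
    #include <unistd.h>

    int main(void)
    {
        int ep = epoll_create1(0);
        int efd = eventfd(0, 0);
        struct epoll_event ev = { .events = EPOLLIN, .data.fd = efd };
        struct epoll_event out;

        epoll_ctl(ep, EPOLL_CTL_ADD, efd, &ev);

        /* Nothing ready: both a 10ms wait and a non-blocking poll time out. */
        printf("timeout wait:      %d\n", epoll_wait(ep, &out, 1, 10));
        printf("non-blocking poll: %d\n", epoll_wait(ep, &out, 1, 0));

        /* Make the eventfd readable and the same call reports one event. */
        eventfd_write(efd, 1);
        printf("after write, poll: %d\n", epoll_wait(ep, &out, 1, 0));

        close(ep); close(efd);
        return 0;
    }

This should print 0, 0 and then 1.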