diff mbox series

[1/3] epoll: fix possible lost wakeup on epoll_ctl() path

Message ID 20200203205907.291929-1-rpenyaev@suse.de (mailing list archive)
State New, archived
Headers show
Series [1/3] epoll: fix possible lost wakeup on epoll_ctl() path | expand

Commit Message

Roman Penyaev Feb. 3, 2020, 8:59 p.m. UTC
This fixes possible lost wakeup introduced by the a218cc491420.
Originally modifications to ep->wq were serialized by ep->wq.lock,
but in the a218cc491420 new rw lock was introduced in order to
relax fd event path, i.e. callers of ep_poll_callback() function.

After the change ep_modify and ep_insert (both are called on
epoll_ctl() path) were switched to ep->lock, but ep_poll
(epoll_wait) was using ep->wq.lock on wqueue list modification.

The bug doesn't lead to any wqueue list corruptions, because wake up
path and list modifications were serialized by ep->wq.lock
internally, but actual waitqueue_active() check prior wake_up()
call can be reordered with modification of ep ready list, thus
wake up can be lost.

Current patch replaces ep->wq.lock with the ep->lock for wqueue
modifications, thus wake up path always observes activeness of
the wqueue correcty.

Fixes: a218cc491420 ("epoll: use rwlock in order to reduce ep_poll_callback() contention")
References: https://bugzilla.kernel.org/show_bug.cgi?id=205933
Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
Cc: Max Neunhoeffer <max@arangodb.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Christopher Kohlhoff <chris.kohlhoff@clearpool.io>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 fs/eventpoll.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

Comments

Roman Penyaev Feb. 4, 2020, 10:41 a.m. UTC | #1
Hi Andrew,

Could you please suggest me, do I need to include Reported-by tag,
or reference to the bug is enough?

--
Roman

On 2020-02-03 21:59, Roman Penyaev wrote:
> This fixes possible lost wakeup introduced by the a218cc491420.
> Originally modifications to ep->wq were serialized by ep->wq.lock,
> but in the a218cc491420 new rw lock was introduced in order to
> relax fd event path, i.e. callers of ep_poll_callback() function.
> 
> After the change ep_modify and ep_insert (both are called on
> epoll_ctl() path) were switched to ep->lock, but ep_poll
> (epoll_wait) was using ep->wq.lock on wqueue list modification.
> 
> The bug doesn't lead to any wqueue list corruptions, because wake up
> path and list modifications were serialized by ep->wq.lock
> internally, but actual waitqueue_active() check prior wake_up()
> call can be reordered with modification of ep ready list, thus
> wake up can be lost.
> 
> Current patch replaces ep->wq.lock with the ep->lock for wqueue
> modifications, thus wake up path always observes activeness of
> the wqueue correcty.
> 
> Fixes: a218cc491420 ("epoll: use rwlock in order to reduce
> ep_poll_callback() contention")
> References: https://bugzilla.kernel.org/show_bug.cgi?id=205933
> Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
> Cc: Max Neunhoeffer <max@arangodb.com>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Christopher Kohlhoff <chris.kohlhoff@clearpool.io>
> Cc: Davidlohr Bueso <dbueso@suse.de>
> Cc: Jason Baron <jbaron@akamai.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: linux-fsdevel@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> ---
>  fs/eventpoll.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/eventpoll.c b/fs/eventpoll.c
> index b041b66002db..eee3c92a9ebf 100644
> --- a/fs/eventpoll.c
> +++ b/fs/eventpoll.c
> @@ -1854,9 +1854,9 @@ static int ep_poll(struct eventpoll *ep, struct
> epoll_event __user *events,
>  		waiter = true;
>  		init_waitqueue_entry(&wait, current);
> 
> -		spin_lock_irq(&ep->wq.lock);
> +		write_lock_irq(&ep->lock);
>  		__add_wait_queue_exclusive(&ep->wq, &wait);
> -		spin_unlock_irq(&ep->wq.lock);
> +		write_unlock_irq(&ep->lock);
>  	}
> 
>  	for (;;) {
> @@ -1904,9 +1904,9 @@ static int ep_poll(struct eventpoll *ep, struct
> epoll_event __user *events,
>  		goto fetch_events;
> 
>  	if (waiter) {
> -		spin_lock_irq(&ep->wq.lock);
> +		write_lock_irq(&ep->lock);
>  		__remove_wait_queue(&ep->wq, &wait);
> -		spin_unlock_irq(&ep->wq.lock);
> +		write_unlock_irq(&ep->lock);
>  	}
> 
>  	return res;
Jakub Kicinski Feb. 4, 2020, 4:32 p.m. UTC | #2
On Tue, 04 Feb 2020 11:41:42 +0100, Roman Penyaev wrote:
> Hi Andrew,
> 
> Could you please suggest me, do I need to include Reported-by tag,
> or reference to the bug is enough?

Sorry to jump in but FWIW I like the Reported-and-bisected-by tag to
fully credit the extra work done by the reporter.
Roman Penyaev Feb. 4, 2020, 5:20 p.m. UTC | #3
On 2020-02-04 17:32, Jakub Kicinski wrote:
> On Tue, 04 Feb 2020 11:41:42 +0100, Roman Penyaev wrote:
>> Hi Andrew,
>> 
>> Could you please suggest me, do I need to include Reported-by tag,
>> or reference to the bug is enough?
> 
> Sorry to jump in but FWIW I like the Reported-and-bisected-by tag to
> fully credit the extra work done by the reporter.

Reported-by: Max Neunhoeffer <max@arangodb.com>
Bisected-by: Max Neunhoeffer <max@arangodb.com>

Correct?

--
Roman
Jakub Kicinski Feb. 4, 2020, 10:29 p.m. UTC | #4
On Tue, 04 Feb 2020 18:20:03 +0100, Roman Penyaev wrote:
> On 2020-02-04 17:32, Jakub Kicinski wrote:
> > On Tue, 04 Feb 2020 11:41:42 +0100, Roman Penyaev wrote:  
> >> Hi Andrew,
> >> 
> >> Could you please suggest me, do I need to include Reported-by tag,
> >> or reference to the bug is enough?  
> > 
> > Sorry to jump in but FWIW I like the Reported-and-bisected-by tag to
> > fully credit the extra work done by the reporter.  
> 
> Reported-by: Max Neunhoeffer <max@arangodb.com>
> Bisected-by: Max Neunhoeffer <max@arangodb.com>
> 
> Correct?

That should work, I like the brevity of the single combined

Reported-and-bisected-by: Max Neunhoeffer <max@arangodb.com>

line but looks like some separate the two even when both point 
to the same person. 

Unfortunately Documentation/process is silent on any "bisected-by" 
use, so you'll have to exercise your own judgement :)
Andrew Morton Feb. 10, 2020, 5:59 a.m. UTC | #5
On Tue, 04 Feb 2020 18:20:03 +0100 Roman Penyaev <rpenyaev@suse.de> wrote:

> On 2020-02-04 17:32, Jakub Kicinski wrote:
> > On Tue, 04 Feb 2020 11:41:42 +0100, Roman Penyaev wrote:
> >> Hi Andrew,
> >> 
> >> Could you please suggest me, do I need to include Reported-by tag,
> >> or reference to the bug is enough?
> > 
> > Sorry to jump in but FWIW I like the Reported-and-bisected-by tag to
> > fully credit the extra work done by the reporter.
> 
> Reported-by: Max Neunhoeffer <max@arangodb.com>
> Bisected-by: Max Neunhoeffer <max@arangodb.com>
> 
> Correct?

We could do that, but preferably with Max's approval (please?).

Because people sometimes have issues with having their personal info
added to the kernel commit record without having being asked.
Max Neunhöffer Feb. 10, 2020, 6:44 a.m. UTC | #6
Dear Andrew and all,
Sorry, I did not understand that I should explicitly give consent.
This is fine with me.
Cheers
  Max

Am 10. Februar 2020 06:59:16 MEZ schrieb Andrew Morton <akpm@linux-foundation.org>:
>On Tue, 04 Feb 2020 18:20:03 +0100 Roman Penyaev <rpenyaev@suse.de>
>wrote:
>
>> On 2020-02-04 17:32, Jakub Kicinski wrote:
>> > On Tue, 04 Feb 2020 11:41:42 +0100, Roman Penyaev wrote:
>> >> Hi Andrew,
>> >> 
>> >> Could you please suggest me, do I need to include Reported-by tag,
>> >> or reference to the bug is enough?
>> > 
>> > Sorry to jump in but FWIW I like the Reported-and-bisected-by tag
>to
>> > fully credit the extra work done by the reporter.
>> 
>> Reported-by: Max Neunhoeffer <max@arangodb.com>
>> Bisected-by: Max Neunhoeffer <max@arangodb.com>
>> 
>> Correct?
>
>We could do that, but preferably with Max's approval (please?).
>
>Because people sometimes have issues with having their personal info
>added to the kernel commit record without having being asked.
Roman Penyaev Feb. 10, 2020, 9:43 a.m. UTC | #7
On 2020-02-10 06:59, Andrew Morton wrote:
> On Tue, 04 Feb 2020 18:20:03 +0100 Roman Penyaev <rpenyaev@suse.de> 
> wrote:
> 
>> On 2020-02-04 17:32, Jakub Kicinski wrote:
>> > On Tue, 04 Feb 2020 11:41:42 +0100, Roman Penyaev wrote:
>> >> Hi Andrew,
>> >>
>> >> Could you please suggest me, do I need to include Reported-by tag,
>> >> or reference to the bug is enough?
>> >
>> > Sorry to jump in but FWIW I like the Reported-and-bisected-by tag to
>> > fully credit the extra work done by the reporter.
>> 
>> Reported-by: Max Neunhoeffer <max@arangodb.com>
>> Bisected-by: Max Neunhoeffer <max@arangodb.com>
>> 
>> Correct?
> 
> We could do that, but preferably with Max's approval (please?).
> 
> Because people sometimes have issues with having their personal info
> added to the kernel commit record without having being asked.

Max approved.  I've just resent v2, no code changes, comment
tweaks and 2 explicit tags with Max's name.

--
Roman
diff mbox series

Patch

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index b041b66002db..eee3c92a9ebf 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1854,9 +1854,9 @@  static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 		waiter = true;
 		init_waitqueue_entry(&wait, current);
 
-		spin_lock_irq(&ep->wq.lock);
+		write_lock_irq(&ep->lock);
 		__add_wait_queue_exclusive(&ep->wq, &wait);
-		spin_unlock_irq(&ep->wq.lock);
+		write_unlock_irq(&ep->lock);
 	}
 
 	for (;;) {
@@ -1904,9 +1904,9 @@  static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 		goto fetch_events;
 
 	if (waiter) {
-		spin_lock_irq(&ep->wq.lock);
+		write_lock_irq(&ep->lock);
 		__remove_wait_queue(&ep->wq, &wait);
-		spin_unlock_irq(&ep->wq.lock);
+		write_unlock_irq(&ep->lock);
 	}
 
 	return res;