Message ID | 20200203205907.291929-1-rpenyaev@suse.de (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [1/3] epoll: fix possible lost wakeup on epoll_ctl() path | expand |
Hi Andrew, Could you please suggest me, do I need to include Reported-by tag, or reference to the bug is enough? -- Roman On 2020-02-03 21:59, Roman Penyaev wrote: > This fixes possible lost wakeup introduced by the a218cc491420. > Originally modifications to ep->wq were serialized by ep->wq.lock, > but in the a218cc491420 new rw lock was introduced in order to > relax fd event path, i.e. callers of ep_poll_callback() function. > > After the change ep_modify and ep_insert (both are called on > epoll_ctl() path) were switched to ep->lock, but ep_poll > (epoll_wait) was using ep->wq.lock on wqueue list modification. > > The bug doesn't lead to any wqueue list corruptions, because wake up > path and list modifications were serialized by ep->wq.lock > internally, but actual waitqueue_active() check prior wake_up() > call can be reordered with modification of ep ready list, thus > wake up can be lost. > > Current patch replaces ep->wq.lock with the ep->lock for wqueue > modifications, thus wake up path always observes activeness of > the wqueue correcty. > > Fixes: a218cc491420 ("epoll: use rwlock in order to reduce > ep_poll_callback() contention") > References: https://bugzilla.kernel.org/show_bug.cgi?id=205933 > Signed-off-by: Roman Penyaev <rpenyaev@suse.de> > Cc: Max Neunhoeffer <max@arangodb.com> > Cc: Jakub Kicinski <kuba@kernel.org> > Cc: Christopher Kohlhoff <chris.kohlhoff@clearpool.io> > Cc: Davidlohr Bueso <dbueso@suse.de> > Cc: Jason Baron <jbaron@akamai.com> > Cc: Andrew Morton <akpm@linux-foundation.org> > Cc: linux-fsdevel@vger.kernel.org > Cc: linux-kernel@vger.kernel.org > --- > fs/eventpoll.c | 8 ++++---- > 1 file changed, 4 insertions(+), 4 deletions(-) > > diff --git a/fs/eventpoll.c b/fs/eventpoll.c > index b041b66002db..eee3c92a9ebf 100644 > --- a/fs/eventpoll.c > +++ b/fs/eventpoll.c > @@ -1854,9 +1854,9 @@ static int ep_poll(struct eventpoll *ep, struct > epoll_event __user *events, > waiter = true; > init_waitqueue_entry(&wait, current); > > - spin_lock_irq(&ep->wq.lock); > + write_lock_irq(&ep->lock); > __add_wait_queue_exclusive(&ep->wq, &wait); > - spin_unlock_irq(&ep->wq.lock); > + write_unlock_irq(&ep->lock); > } > > for (;;) { > @@ -1904,9 +1904,9 @@ static int ep_poll(struct eventpoll *ep, struct > epoll_event __user *events, > goto fetch_events; > > if (waiter) { > - spin_lock_irq(&ep->wq.lock); > + write_lock_irq(&ep->lock); > __remove_wait_queue(&ep->wq, &wait); > - spin_unlock_irq(&ep->wq.lock); > + write_unlock_irq(&ep->lock); > } > > return res;
On Tue, 04 Feb 2020 11:41:42 +0100, Roman Penyaev wrote: > Hi Andrew, > > Could you please suggest me, do I need to include Reported-by tag, > or reference to the bug is enough? Sorry to jump in but FWIW I like the Reported-and-bisected-by tag to fully credit the extra work done by the reporter.
On 2020-02-04 17:32, Jakub Kicinski wrote: > On Tue, 04 Feb 2020 11:41:42 +0100, Roman Penyaev wrote: >> Hi Andrew, >> >> Could you please suggest me, do I need to include Reported-by tag, >> or reference to the bug is enough? > > Sorry to jump in but FWIW I like the Reported-and-bisected-by tag to > fully credit the extra work done by the reporter. Reported-by: Max Neunhoeffer <max@arangodb.com> Bisected-by: Max Neunhoeffer <max@arangodb.com> Correct? -- Roman
On Tue, 04 Feb 2020 18:20:03 +0100, Roman Penyaev wrote: > On 2020-02-04 17:32, Jakub Kicinski wrote: > > On Tue, 04 Feb 2020 11:41:42 +0100, Roman Penyaev wrote: > >> Hi Andrew, > >> > >> Could you please suggest me, do I need to include Reported-by tag, > >> or reference to the bug is enough? > > > > Sorry to jump in but FWIW I like the Reported-and-bisected-by tag to > > fully credit the extra work done by the reporter. > > Reported-by: Max Neunhoeffer <max@arangodb.com> > Bisected-by: Max Neunhoeffer <max@arangodb.com> > > Correct? That should work, I like the brevity of the single combined Reported-and-bisected-by: Max Neunhoeffer <max@arangodb.com> line but looks like some separate the two even when both point to the same person. Unfortunately Documentation/process is silent on any "bisected-by" use, so you'll have to exercise your own judgement :)
On Tue, 04 Feb 2020 18:20:03 +0100 Roman Penyaev <rpenyaev@suse.de> wrote: > On 2020-02-04 17:32, Jakub Kicinski wrote: > > On Tue, 04 Feb 2020 11:41:42 +0100, Roman Penyaev wrote: > >> Hi Andrew, > >> > >> Could you please suggest me, do I need to include Reported-by tag, > >> or reference to the bug is enough? > > > > Sorry to jump in but FWIW I like the Reported-and-bisected-by tag to > > fully credit the extra work done by the reporter. > > Reported-by: Max Neunhoeffer <max@arangodb.com> > Bisected-by: Max Neunhoeffer <max@arangodb.com> > > Correct? We could do that, but preferably with Max's approval (please?). Because people sometimes have issues with having their personal info added to the kernel commit record without having being asked.
Dear Andrew and all, Sorry, I did not understand that I should explicitly give consent. This is fine with me. Cheers Max Am 10. Februar 2020 06:59:16 MEZ schrieb Andrew Morton <akpm@linux-foundation.org>: >On Tue, 04 Feb 2020 18:20:03 +0100 Roman Penyaev <rpenyaev@suse.de> >wrote: > >> On 2020-02-04 17:32, Jakub Kicinski wrote: >> > On Tue, 04 Feb 2020 11:41:42 +0100, Roman Penyaev wrote: >> >> Hi Andrew, >> >> >> >> Could you please suggest me, do I need to include Reported-by tag, >> >> or reference to the bug is enough? >> > >> > Sorry to jump in but FWIW I like the Reported-and-bisected-by tag >to >> > fully credit the extra work done by the reporter. >> >> Reported-by: Max Neunhoeffer <max@arangodb.com> >> Bisected-by: Max Neunhoeffer <max@arangodb.com> >> >> Correct? > >We could do that, but preferably with Max's approval (please?). > >Because people sometimes have issues with having their personal info >added to the kernel commit record without having being asked.
On 2020-02-10 06:59, Andrew Morton wrote: > On Tue, 04 Feb 2020 18:20:03 +0100 Roman Penyaev <rpenyaev@suse.de> > wrote: > >> On 2020-02-04 17:32, Jakub Kicinski wrote: >> > On Tue, 04 Feb 2020 11:41:42 +0100, Roman Penyaev wrote: >> >> Hi Andrew, >> >> >> >> Could you please suggest me, do I need to include Reported-by tag, >> >> or reference to the bug is enough? >> > >> > Sorry to jump in but FWIW I like the Reported-and-bisected-by tag to >> > fully credit the extra work done by the reporter. >> >> Reported-by: Max Neunhoeffer <max@arangodb.com> >> Bisected-by: Max Neunhoeffer <max@arangodb.com> >> >> Correct? > > We could do that, but preferably with Max's approval (please?). > > Because people sometimes have issues with having their personal info > added to the kernel commit record without having being asked. Max approved. I've just resent v2, no code changes, comment tweaks and 2 explicit tags with Max's name. -- Roman
diff --git a/fs/eventpoll.c b/fs/eventpoll.c index b041b66002db..eee3c92a9ebf 100644 --- a/fs/eventpoll.c +++ b/fs/eventpoll.c @@ -1854,9 +1854,9 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events, waiter = true; init_waitqueue_entry(&wait, current); - spin_lock_irq(&ep->wq.lock); + write_lock_irq(&ep->lock); __add_wait_queue_exclusive(&ep->wq, &wait); - spin_unlock_irq(&ep->wq.lock); + write_unlock_irq(&ep->lock); } for (;;) { @@ -1904,9 +1904,9 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events, goto fetch_events; if (waiter) { - spin_lock_irq(&ep->wq.lock); + write_lock_irq(&ep->lock); __remove_wait_queue(&ep->wq, &wait); - spin_unlock_irq(&ep->wq.lock); + write_unlock_irq(&ep->lock); } return res;
This fixes possible lost wakeup introduced by the a218cc491420. Originally modifications to ep->wq were serialized by ep->wq.lock, but in the a218cc491420 new rw lock was introduced in order to relax fd event path, i.e. callers of ep_poll_callback() function. After the change ep_modify and ep_insert (both are called on epoll_ctl() path) were switched to ep->lock, but ep_poll (epoll_wait) was using ep->wq.lock on wqueue list modification. The bug doesn't lead to any wqueue list corruptions, because wake up path and list modifications were serialized by ep->wq.lock internally, but actual waitqueue_active() check prior wake_up() call can be reordered with modification of ep ready list, thus wake up can be lost. Current patch replaces ep->wq.lock with the ep->lock for wqueue modifications, thus wake up path always observes activeness of the wqueue correcty. Fixes: a218cc491420 ("epoll: use rwlock in order to reduce ep_poll_callback() contention") References: https://bugzilla.kernel.org/show_bug.cgi?id=205933 Signed-off-by: Roman Penyaev <rpenyaev@suse.de> Cc: Max Neunhoeffer <max@arangodb.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Christopher Kohlhoff <chris.kohlhoff@clearpool.io> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: Jason Baron <jbaron@akamai.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: linux-fsdevel@vger.kernel.org Cc: linux-kernel@vger.kernel.org --- fs/eventpoll.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)