Message ID | 20190108100121.20247-1-rpenyaev@suse.de (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [1/1] epoll: remove wrong assert that ep_poll_callback is always called with irqs off | expand |
On 2019-01-08 11:01, Roman Penyaev wrote: > That was wrong assumption that all drivers disable irqs before waking > up > a wait queue. Even assert line is removed the whole logic stays > correct: > epoll always locks rwlock with irqs disabled and by itself does not > call > from interrupts, thus it is up to driver how to call wake_up_locked(), > because if driver does not handle any interrupts (like fuse in the the > report) of course it is safe on its side to take a simple spin_lock. This is wrong and can lead to dead lock: we always call read_lock(), caller can call us with irqs enabled. Another driver, which also calls ep_poll_callback(), can be called from interrupt context (irqs disabled) thus it can interrupt the one who does not disable irqs. Even we take a read_lock() (which should be fine to interrupt), write_lock(), which comes in the middle, can cause a dead lock: #CPU0 #CPU1 task: task: irq: spin_lock(&wq1->lock); ep_poll_callback(): read_lock(&ep->lock) .... write_lock_irq(&ep->lock) .... #waits reads .... >>>>>>>>>>>>>> IRQ CPU1 spin_lock_irqsave(&wq2->lock) ep_poll_callback(): read_lock(&ep->lock); # to avoid write starve should # wait writer to finish, thus # dead lock What we can do: a) disable irqs if we are not in interrupt. b) revert the patch completely. David, is it really crucial in terms of performance to avoid double local_irq_save() on Xen on this ep_poll_callback() hot path? For example why not to do the following: if (!in_interrupt()) local_irq_save(flags); read_lock(ep->lock); with huge comment explaining performance number. Or just give up and simply revert the original patch completely and always call read_lock_irqsave(). -- Roman
On 2019-01-08 04:42, Roman Penyaev wrote: > What we can do: > > a) disable irqs if we are not in interrupt. > b) revert the patch completely. > > David, is it really crucial in terms of performance to avoid double > local_irq_save() on Xen on this ep_poll_callback() hot path? Note that such optimizations are also relevant for baremetal, ie: x86 PUSHF + POPF can be pretty expensive because of insn dependencies. > > For example why not to do the following: > > if (!in_interrupt()) > local_irq_save(flags); > read_lock(ep->lock); > > with huge comment explaining performance number. > > Or just give up and simply revert the original patch completely > and always call read_lock_irqsave(). Yeah so the reason why I had done the other epoll lock irq optimizations was because they were painfully obvious. ep_poll_callback(), however is a different beast, as you've encountered. I vote for not shooting ourselves in the foot and just dropping this patch -- most large performance benefits will come from microbenches anyway. Thanks, Davidlohr
On 2019-01-08 16:16, Davidlohr Bueso wrote: > On 2019-01-08 04:42, Roman Penyaev wrote: >> What we can do: >> >> a) disable irqs if we are not in interrupt. >> b) revert the patch completely. >> >> David, is it really crucial in terms of performance to avoid double >> local_irq_save() on Xen on this ep_poll_callback() hot path? > > Note that such optimizations are also relevant for baremetal, ie: x86 > PUSHF + POPF can be pretty expensive because of insn dependencies. > >> >> For example why not to do the following: >> >> if (!in_interrupt()) >> local_irq_save(flags); >> read_lock(ep->lock); >> >> with huge comment explaining performance number. >> >> Or just give up and simply revert the original patch completely >> and always call read_lock_irqsave(). > > Yeah so the reason why I had done the other epoll lock irq > optimizations was because they were painfully obvious. > ep_poll_callback(), however is a different beast, as you've > encountered. I vote for not shooting ourselves in the foot and just > dropping this patch -- most large performance benefits will come from > microbenches anyway. Then I will send another patch which changes read_lock() on read_lock_irqsave(), since simple revert of the original patch won't work. Thanks. -- Roman
diff --git a/fs/eventpoll.c b/fs/eventpoll.c index f307c8679027..f5f88250cdf2 100644 --- a/fs/eventpoll.c +++ b/fs/eventpoll.c @@ -1217,12 +1217,6 @@ static int ep_poll_callback(wait_queue_entry_t *wait, unsigned mode, int sync, v __poll_t pollflags = key_to_poll(key); int ewake = 0; - /* - * Called by irq context or interrupts are disabled by the wake_up_*poll - * callers. - */ - lockdep_assert_irqs_disabled(); - read_lock(&ep->lock); ep_set_busy_poll_napi_id(epi);
That was wrong assumption that all drivers disable irqs before waking up a wait queue. Even assert line is removed the whole logic stays correct: epoll always locks rwlock with irqs disabled and by itself does not call from interrupts, thus it is up to driver how to call wake_up_locked(), because if driver does not handle any interrupts (like fuse in the the report) of course it is safe on its side to take a simple spin_lock. Signed-off-by: Roman Penyaev <rpenyaev@suse.de> Reported-by: syzbot+aea82bf9ee6ffd9a79d9@syzkaller.appspotmail.com Cc: Davidlohr Bueso <dbueso@suse.de> Cc: Jason Baron <jbaron@akamai.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-fsdevel@vger.kernel.org Cc: linux-kernel@vger.kernel.org --- fs/eventpoll.c | 6 ------ 1 file changed, 6 deletions(-)