Message ID | 164984498582.2000115.4023190177137486137.stgit@warthog.procyon.org.uk (mailing list archive) |
---|---|
State | Accepted |
Commit | ee3b0826b4764f6c13ad6db67495c5a1c38e9025 |
Delegated to: | Netdev Maintainers |
Series | [net] rxrpc: Restore removed timer deletion |
On Wed, Apr 13, 2022 at 3:16 AM David Howells <dhowells@redhat.com> wrote:
>
> A recent patch[1] from Eric Dumazet flipped the order in which the
> keepalive timer and the keepalive worker were cancelled in order to fix a
> syzbot reported issue[2].  Unfortunately, this enables the mirror image bug
> whereby the timer races with rxrpc_exit_net(), restarting the worker after
> it has been cancelled:
>
>         CPU 1           CPU 2
>         =============== =====================
>                         if (rxnet->live)
>                         <INTERRUPT>
>         rxnet->live = false;
>         cancel_work_sync(&rxnet->peer_keepalive_work);
>                         rxrpc_queue_work(&rxnet->peer_keepalive_work);
>         del_timer_sync(&rxnet->peer_keepalive_timer);
>
> Fix this by restoring the removed del_timer_sync() so that we try to remove
> the timer twice.  If the timer runs again, it should see ->live == false
> and not restart the worker.
>
> Fixes: 1946014ca3b1 ("rxrpc: fix a race in rxrpc_exit_net()")
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: Eric Dumazet <edumazet@google.com>
> cc: Marc Dionne <marc.dionne@auristor.com>
> cc: linux-afs@lists.infradead.org
> Link: https://lore.kernel.org/r/20220404183439.3537837-1-eric.dumazet@gmail.com/ [1]
> Link: https://syzkaller.appspot.com/bug?extid=724378c4bb58f703b09a [2]
> ---
>
>  net/rxrpc/net_ns.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/net/rxrpc/net_ns.c b/net/rxrpc/net_ns.c
> index f15d6942da45..cc7e30733feb 100644
> --- a/net/rxrpc/net_ns.c
> +++ b/net/rxrpc/net_ns.c
> @@ -113,7 +113,9 @@ static __net_exit void rxrpc_exit_net(struct net *net)
>         struct rxrpc_net *rxnet = rxrpc_net(net);
>
>         rxnet->live = false;
> +       del_timer_sync(&rxnet->peer_keepalive_timer);
>         cancel_work_sync(&rxnet->peer_keepalive_work);
> +       /* Remove the timer again as the worker may have restarted it. */
>         del_timer_sync(&rxnet->peer_keepalive_timer);
>         rxrpc_destroy_all_calls(rxnet);
>         rxrpc_destroy_all_connections(rxnet);
>
>

ok... so we have a timer and a work queue, both activating each other
in kind of a ping pong ?

Any particular reason not using delayed works ?

Thanks.
Eric Dumazet <edumazet@google.com> wrote:

> ok... so we have a timer and a work queue, both activating each other
> in kind of a ping pong ?

Yes.  I want to emit regular keepalive pokes.

> Any particular reason not using delayed works ?

Because there's a race between starting the keepalive timer when a new peer is
added and when the keepalive worker is resetting the timer for the next peer
in the list.  This is why I'm using timer_reduce().  delayed_work doesn't
currently have such a facility.  It's not simple to add because
try_to_grab_pending() as called from mod_delayed_work_on() cancels the timer -
which is not what I want it to do.

David
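The difference between "replace" and "reduce" semantics that David describes can be illustrated with a small userspace model. This is only a sketch: `struct model_timer`, `model_mod_timer()`, `model_timer_reduce()` and `replay()` are hypothetical names invented for illustration, not kernel APIs, and the model ignores jiffies wraparound and locking. It assumes that `timer_reduce()` arms an idle timer and otherwise only moves the expiry earlier.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a kernel timer; not struct timer_list. */
struct model_timer {
	bool pending;          /* is the timer armed? */
	unsigned long expires; /* absolute expiry, in ticks */
};

/* "Replace" semantics, in the style of mod_timer(): the last caller wins. */
static void model_mod_timer(struct model_timer *t, unsigned long expires)
{
	t->pending = true;
	t->expires = expires;
}

/*
 * "Reduce" semantics, modelled on timer_reduce(): arm the timer if it is
 * idle, otherwise only ever pull the expiry earlier.
 */
static void model_timer_reduce(struct model_timer *t, unsigned long expires)
{
	if (!t->pending || expires < t->expires) {
		t->pending = true;
		t->expires = expires;
	}
}

/*
 * Replay two expiry requests in the given order, e.g. one from adding a new
 * peer and one from the worker rescheduling for the next peer in the list,
 * and return the expiry the timer ends up with.
 */
static unsigned long replay(bool use_reduce, unsigned long first,
			    unsigned long second)
{
	struct model_timer t = { .pending = false, .expires = 0 };

	if (use_reduce) {
		model_timer_reduce(&t, first);
		model_timer_reduce(&t, second);
	} else {
		model_mod_timer(&t, first);
		model_mod_timer(&t, second);
	}
	assert(t.pending);
	return t.expires;
}
```

In this model, `replay(true, 5, 20)` and `replay(true, 20, 5)` both leave the timer at the earlier deadline, whereas with replace semantics the result depends on which caller ran last, so the earlier deadline can be lost.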
On Wed, Apr 13, 2022 at 10:41 AM David Howells <dhowells@redhat.com> wrote:
>
> Eric Dumazet <edumazet@google.com> wrote:
>
> > ok... so we have a timer and a work queue, both activating each other
> > in kind of a ping pong ?
>
> Yes.  I want to emit regular keepalive pokes.
>
> > Any particular reason not using delayed works ?
>
> Because there's a race between starting the keepalive timer when a new peer is
> added and when the keepalive worker is resetting the timer for the next peer
> in the list.  This is why I'm using timer_reduce().  delayed_work doesn't
> currently have such a facility.  It's not simple to add because
> try_to_grab_pending() as called from mod_delayed_work_on() cancels the timer -
> which is not what I want it to do.
>

SGTM, thanks !

Reviewed-by: Eric Dumazet <edumazet@google.com>
Hello:

This patch was applied to netdev/net.git (master)
by David S. Miller <davem@davemloft.net>:

On Wed, 13 Apr 2022 11:16:25 +0100 you wrote:
> A recent patch[1] from Eric Dumazet flipped the order in which the
> keepalive timer and the keepalive worker were cancelled in order to fix a
> syzbot reported issue[2].  Unfortunately, this enables the mirror image bug
> whereby the timer races with rxrpc_exit_net(), restarting the worker after
> it has been cancelled:
>
>         CPU 1           CPU 2
>         =============== =====================
>                         if (rxnet->live)
>                         <INTERRUPT>
>         rxnet->live = false;
>         cancel_work_sync(&rxnet->peer_keepalive_work);
>                         rxrpc_queue_work(&rxnet->peer_keepalive_work);
>         del_timer_sync(&rxnet->peer_keepalive_timer);
>
> [...]

Here is the summary with links:
  - [net] rxrpc: Restore removed timer deletion
    https://git.kernel.org/netdev/net/c/ee3b0826b476

You are awesome, thank you!
diff --git a/net/rxrpc/net_ns.c b/net/rxrpc/net_ns.c
index f15d6942da45..cc7e30733feb 100644
--- a/net/rxrpc/net_ns.c
+++ b/net/rxrpc/net_ns.c
@@ -113,7 +113,9 @@ static __net_exit void rxrpc_exit_net(struct net *net)
 	struct rxrpc_net *rxnet = rxrpc_net(net);
 
 	rxnet->live = false;
+	del_timer_sync(&rxnet->peer_keepalive_timer);
 	cancel_work_sync(&rxnet->peer_keepalive_work);
+	/* Remove the timer again as the worker may have restarted it. */
 	del_timer_sync(&rxnet->peer_keepalive_timer);
 	rxrpc_destroy_all_calls(rxnet);
 	rxrpc_destroy_all_connections(rxnet);
A recent patch[1] from Eric Dumazet flipped the order in which the
keepalive timer and the keepalive worker were cancelled in order to fix a
syzbot reported issue[2].  Unfortunately, this enables the mirror image bug
whereby the timer races with rxrpc_exit_net(), restarting the worker after
it has been cancelled:

	CPU 1		CPU 2
	===============	=====================
			if (rxnet->live)
			<INTERRUPT>
	rxnet->live = false;
	cancel_work_sync(&rxnet->peer_keepalive_work);
			rxrpc_queue_work(&rxnet->peer_keepalive_work);
	del_timer_sync(&rxnet->peer_keepalive_timer);

Fix this by restoring the removed del_timer_sync() so that we try to remove
the timer twice.  If the timer runs again, it should see ->live == false
and not restart the worker.

Fixes: 1946014ca3b1 ("rxrpc: fix a race in rxrpc_exit_net()")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Eric Dumazet <edumazet@google.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/20220404183439.3537837-1-eric.dumazet@gmail.com/ [1]
Link: https://syzkaller.appspot.com/bug?extid=724378c4bb58f703b09a [2]
---
 net/rxrpc/net_ns.c | 2 ++
 1 file changed, 2 insertions(+)
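
The interleavings described in the commit message can be replayed with a deterministic, single-threaded userspace model. This is a sketch under stated assumptions rather than kernel code: `struct model_net`, the `model_*()` helpers, `enum teardown_order` and `teardown_leaks()` are hypothetical names. The model assumes that a "sync" cancel waits for a handler that has already passed its `->live` check, and that such a handler then goes on to arm its counterpart (the timer queues the worker, the worker re-arms the timer).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the keepalive state in rxrpc_net; not kernel code. */
struct model_net {
	bool live;
	bool timer_pending;   /* keepalive timer armed */
	bool timer_in_flight; /* timer handler already saw ->live == true */
	bool work_queued;     /* keepalive worker queued */
	bool work_in_flight;  /* worker already saw ->live == true */
};

/* del_timer_sync() model: let a running handler finish, then disarm. */
static void model_del_timer_sync(struct model_net *n)
{
	if (n->timer_in_flight) {
		n->work_queued = true; /* handler passed the check: queues work */
		n->timer_in_flight = false;
	}
	n->timer_pending = false;
}

/* cancel_work_sync() model: let a running worker finish, then dequeue. */
static void model_cancel_work_sync(struct model_net *n)
{
	if (n->work_in_flight) {
		n->timer_pending = true; /* worker passed the check: re-arms timer */
		n->work_in_flight = false;
	}
	n->work_queued = false;
}

enum teardown_order {
	TIMER_THEN_WORK,  /* order before commit 1946014ca3b1 */
	WORK_THEN_TIMER,  /* order after commit 1946014ca3b1 */
	TIMER_WORK_TIMER, /* order after this patch */
};

/* Run a teardown and report whether a timer or work item is left behind. */
static bool teardown_leaks(enum teardown_order order, bool timer_in_flight,
			   bool work_in_flight)
{
	struct model_net n = {
		.live = true,
		.timer_pending = true,
		.timer_in_flight = timer_in_flight,
		.work_in_flight = work_in_flight,
	};

	n.live = false; /* too late for handlers that already checked ->live */
	switch (order) {
	case TIMER_THEN_WORK:
		model_del_timer_sync(&n);
		model_cancel_work_sync(&n);
		break;
	case WORK_THEN_TIMER:
		model_cancel_work_sync(&n);
		model_del_timer_sync(&n);
		break;
	case TIMER_WORK_TIMER:
		model_del_timer_sync(&n);
		model_cancel_work_sync(&n);
		model_del_timer_sync(&n);
		break;
	}
	assert(!n.timer_in_flight && !n.work_in_flight);
	return n.timer_pending || n.work_queued;
}
```

In this model, `WORK_THEN_TIMER` leaves the worker queued when the timer handler is in flight, `TIMER_THEN_WORK` leaves the timer armed when the worker is in flight, and `TIMER_WORK_TIMER` leaves nothing behind in either case, which matches the reasoning given for deleting the timer twice.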