diff mbox series

[net] netlink: disable IRQs for netlink_lock_table()

Message ID 20210517163807.4d305e53c177.Ic19a47c0690e366ee84e3957b73ec6baddffad8a@changeid (mailing list archive)
State Accepted
Commit 1d482e666b8e74c7555dbdfbfb77205eeed3ff2d
Delegated to: Netdev Maintainers
Headers show
Series [net] netlink: disable IRQs for netlink_lock_table() | expand

Checks

Context Check Description
netdev/cover_letter success Link
netdev/fixes_present fail Series targets non-next tree, but doesn't contain any Fixes tags
netdev/patch_count success Link
netdev/tree_selection success Clearly marked for net
netdev/subject_prefix success Link
netdev/cc_maintainers warning 9 maintainers not CCed: yhs@fb.com marcelo.leitner@gmail.com ast@kernel.org 0x7f454c46@gmail.com rdunlap@infradead.org fw@strlen.de davem@davemloft.net yangyingliang@huawei.com kuba@kernel.org
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success Link
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 4 this patch: 4
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/verify_fixes success Link
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 15 lines checked
netdev/build_allmodconfig_warn success Errors and warnings before: 4 this patch: 4
netdev/header_inline success Link

Commit Message

Johannes Berg May 17, 2021, 2:38 p.m. UTC
From: Johannes Berg <johannes.berg@intel.com>

Syzbot reports that in mac80211 we have a potential deadlock
between our "local->stop_queue_reasons_lock" (spinlock) and
netlink's nl_table_lock (rwlock). This is because there's at
least one situation in which we might try to send a netlink
message with this spinlock held while it is also possible to
take the spinlock from a hardirq context, resulting in the
following deadlock scenario reported by lockdep:

       CPU0                    CPU1
       ----                    ----
  lock(nl_table_lock);
                               local_irq_disable();
                               lock(&local->queue_stop_reason_lock);
                               lock(nl_table_lock);
  <Interrupt>
    lock(&local->queue_stop_reason_lock);

This seems valid, we can take the queue_stop_reason_lock in
any kind of context ("CPU0"), and call ieee80211_report_ack_skb()
with the spinlock held and IRQs disabled ("CPU1") in some
code path (ieee80211_do_stop() via ieee80211_free_txskb()).

Short of disallowing netlink use in scenarios like these
(which would be rather complex in mac80211's case due to
the deep callchain), it seems the only fix for this is to
disable IRQs while nl_table_lock is held to avoid hitting
this scenario, this disallows the "CPU0" portion of the
reported deadlock.

Note that the writer side (netlink_table_grab()) already
disables IRQs for this lock.

Unfortunately though, this seems like a huge hammer, and
maybe the whole netlink table locking should be reworked.

Reported-by: syzbot+69ff9dff50dcfe14ddd4@syzkaller.appspotmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
---
 net/netlink/af_netlink.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Comments

patchwork-bot+netdevbpf@kernel.org May 17, 2021, 10:40 p.m. UTC | #1
Hello:

This patch was applied to netdev/net.git (refs/heads/master):

On Mon, 17 May 2021 16:38:09 +0200 you wrote:
> From: Johannes Berg <johannes.berg@intel.com>
> 
> Syzbot reports that in mac80211 we have a potential deadlock
> between our "local->stop_queue_reasons_lock" (spinlock) and
> netlink's nl_table_lock (rwlock). This is because there's at
> least one situation in which we might try to send a netlink
> message with this spinlock held while it is also possible to
> take the spinlock from a hardirq context, resulting in the
> following deadlock scenario reported by lockdep:
> 
> [...]

Here is the summary with links:
  - [net] netlink: disable IRQs for netlink_lock_table()
    https://git.kernel.org/netdev/net/c/1d482e666b8e

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
diff mbox series

Patch

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 3a62f97acf39..6133e412b948 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -461,11 +461,13 @@  void netlink_table_ungrab(void)
 static inline void
 netlink_lock_table(void)
 {
+	unsigned long flags;
+
 	/* read_lock() synchronizes us to netlink_table_grab */
 
-	read_lock(&nl_table_lock);
+	read_lock_irqsave(&nl_table_lock, flags);
 	atomic_inc(&nl_table_users);
-	read_unlock(&nl_table_lock);
+	read_unlock_irqrestore(&nl_table_lock, flags);
 }
 
 static inline void