
[nf] netfilter: nf_tables: unconditionally flush pending work before notifier

Message ID 20240702140841.3337-1-fw@strlen.de (mailing list archive)
State Awaiting Upstream
Delegated to: Netdev Maintainers

Checks

Context Check Description
netdev/series_format warning Single patches do not need cover letters; Target tree name not specified in the subject
netdev/tree_selection success Guessed tree name to be net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 839 this patch: 839
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers fail 1 blamed authors not CCed: pablo@netfilter.org; 6 maintainers not CCed: kuba@kernel.org coreteam@netfilter.org pabeni@redhat.com kadlec@netfilter.org edumazet@google.com pablo@netfilter.org
netdev/build_clang success Errors and warnings before: 846 this patch: 846
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 862 this patch: 862
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 9 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 5 this patch: 5
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-07-03--12-00 (tests: 666)

Commit Message

Florian Westphal July 2, 2024, 2:08 p.m. UTC
syzbot reports:

KASAN: slab-uaf in nft_ctx_update include/net/netfilter/nf_tables.h:1831
KASAN: slab-uaf in nft_commit_release net/netfilter/nf_tables_api.c:9530
KASAN: slab-uaf in nf_tables_trans_destroy_work+0x152b/0x1750 net/netfilter/nf_tables_api.c:9597
Read of size 2 at addr ffff88802b0051c4 by task kworker/1:1/45
[..]
Workqueue: events nf_tables_trans_destroy_work
Call Trace:
 nft_ctx_update include/net/netfilter/nf_tables.h:1831 [inline]
 nft_commit_release net/netfilter/nf_tables_api.c:9530 [inline]
 nf_tables_trans_destroy_work+0x152b/0x1750 net/netfilter/nf_tables_api.c:9597

The problem is that the notifier does a conditional flush, but it's possible
that the table-to-be-removed is still referenced by transactions being
processed by the worker, so we need to flush unconditionally.

In nf-next we could make the flush_work depend on whether we found a table
to delete, to avoid the flush in most cases.

AFAICS this problem is only exposed in nf-next, by
commit e169285f8c56 ("netfilter: nf_tables: do not store nft_ctx in transaction objects"):
with this commit applied there is an unconditional fetch of
table->family, which is what's triggering the above splat.

Fixes: 2c9f0293280e ("netfilter: nf_tables: flush pending destroy work before netlink notifier")
Reported-and-tested-by: syzbot+4fd66a69358fc15ae2ad@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=4fd66a69358fc15ae2ad
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/nf_tables_api.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
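
[A minimal userspace model of the race described above, assuming nothing
beyond POSIX threads; names and timings are illustrative, not the kernel
code. The worker splices the pending list before processing it, so an
emptiness check on the list can pass while entries referencing the table
are still in flight:]

/* model of the conditional-flush race (pthread analogy, all names made up) */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

struct table { int family; };

static struct table *destroy_list;	/* models nf_tables_destroy_list */
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

/* models nf_tables_trans_destroy_work(): splice under the lock,
 * then keep processing the spliced entries outside of it */
static void *worker(void *unused)
{
	(void)unused;
	pthread_mutex_lock(&list_lock);
	struct table *t = destroy_list;
	destroy_list = NULL;		/* list looks empty from here on... */
	pthread_mutex_unlock(&list_lock);

	usleep(10000);			/* ...but processing still runs */
	printf("family=%d\n", t->family);	/* reads freed memory below */
	return NULL;
}

int main(void)
{
	struct table *tbl = malloc(sizeof(*tbl));
	pthread_t w;

	tbl->family = 2;
	destroy_list = tbl;		/* queue the table for destruction */
	pthread_create(&w, NULL, worker, NULL);
	usleep(1000);			/* let the worker splice the list */

	/* the conditional flush: the list is already empty, so the flush
	 * (modeled as a join) is skipped and the table is freed while the
	 * worker still holds a reference */
	if (destroy_list != NULL)
		pthread_join(w, NULL);	/* skipped in this interleaving */
	free(tbl);			/* use-after-free in worker() */

	pthread_join(w, NULL);
	return 0;
}

[An unconditional join before the free, the analogue of the unconditional
flush_work() in the patch below, closes the window.]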

Comments

Hillf Danton July 3, 2024, 10:35 a.m. UTC | #1
On Tue,  2 Jul 2024 16:08:14 +0200 Florian Westphal <fw@strlen.de>
> syzbot reports:
> 
> KASAN: slab-uaf in nft_ctx_update include/net/netfilter/nf_tables.h:1831
> KASAN: slab-uaf in nft_commit_release net/netfilter/nf_tables_api.c:9530
> KASAN: slab-uaf int nf_tables_trans_destroy_work+0x152b/0x1750 net/netfilter/nf_tables_api.c:9597
> Read of size 2 at addr ffff88802b0051c4 by task kworker/1:1/45
> [..]
> Workqueue: events nf_tables_trans_destroy_work
> Call Trace:
>  nft_ctx_update include/net/netfilter/nf_tables.h:1831 [inline]
>  nft_commit_release net/netfilter/nf_tables_api.c:9530 [inline]
>  nf_tables_trans_destroy_work+0x152b/0x1750 net/netfilter/nf_tables_api.c:9597
> 
> Problem is that the notifier does a conditional flush, but its possible
> that the table-to-be-removed is still referenced by transactions being
> processed by the worker, so we need to flush unconditionally.
> 
Given trans->table goes thru the lifespan of trans, your proposal is a bandaid
if trans outlives table.
Florian Westphal July 3, 2024, 10:52 a.m. UTC | #2
Hillf Danton <hdanton@sina.com> wrote:
> Given trans->table goes thru the lifespan of trans, your proposal is a bandaid
> if trans outlives table.

trans must never outlive table.
Hillf Danton July 3, 2024, 12:09 p.m. UTC | #3
On Wed, 3 Jul 2024 12:52:15 +0200 Florian Westphal <fw@strlen.de>
> Hillf Danton <hdanton@sina.com> wrote:
> > Given trans->table goes thru the lifespan of trans, your proposal is a bandaid
> > if trans outlives table.
> 
> trans must never outlive table.
> 
What is preventing trans from being freed after closing sock, given
trans is freed in workqueue?

	close sock
	queue work
Florian Westphal July 3, 2024, 1:01 p.m. UTC | #4
Hillf Danton <hdanton@sina.com> wrote:
> On Wed, 3 Jul 2024 12:52:15 +0200 Florian Westphal <fw@strlen.de>
> > Hillf Danton <hdanton@sina.com> wrote:
> > > Given trans->table goes thru the lifespan of trans, your proposal is a bandaid
> > > if trans outlives table.
> > 
> > trans must never outlive table.
> > 
> What is preventing trans from being freed after closing sock, given
> trans is freed in workqueue?
> 
> 	close sock
> 	queue work

The notifier acquires the transaction mutex, locking out all other
transactions, so no further transaction requests referencing
the table can be queued.

The work queue is flushed before potentially ripping the table
out.  After this, no transactions referencing the table can exist
anymore; the only transactions that can still be queued are those
coming from a different netns, and tables are scoped per netns.

Table is torn down.  Transaction mutex is released.

The next transaction from userspace can't find the table anymore (it's gone),
so no more transactions can be queued for this table.

As I wrote in the commit message, the flush is dumb; this should first
walk the table list to see if there is a matching table to be torn down,
and then flush the work queue once before tearing the table down.

But it's better to clearly split the bug fix from such a change.
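
[A rough sketch of that follow-up, using only identifiers visible in the
patch at the bottom of this page; illustrative, not a tested change:]

	/* first pass: does this netlink socket own any table at all? */
	bool owned = false;

	list_for_each_entry(table, &nft_net->tables, list) {
		if (nft_table_has_owner(table) && n->portid == table->nlpid) {
			owned = true;
			break;
		}
	}

	/* only pay for the flush when a table will actually be torn down */
	if (owned)
		nf_tables_trans_destroy_flush_work();
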
Hillf Danton July 4, 2024, 10:35 a.m. UTC | #5
On Wed, 3 Jul 2024 15:01:07 +0200 Florian Westphal <fw@strlen.de>
> Hillf Danton <hdanton@sina.com> wrote:
> > On Wed, 3 Jul 2024 12:52:15 +0200 Florian Westphal <fw@strlen.de>
> > > Hillf Danton <hdanton@sina.com> wrote:
> > > > Given trans->table goes thru the lifespan of trans, your proposal is a bandaid
> > > > if trans outlives table.
> > > 
> > > trans must never outlive table.
> > > 
> > What is preventing trans from being freed after closing sock, given
> > trans is freed in workqueue?
> > 
> > 	close sock
> > 	queue work
> 
> The notifier acquires the transaction mutex, locking out all other
> transactions, so no further transactions requests referencing
> the table can be queued.
> 
As per the syzbot report, trans->table could be instantiated before the
notifier acquires the transaction mutex. And in fact the lock helps
trans outlive the table even with your patch.

	cpu1			cpu2
	---			---
	transB->table = A
				lock trans mutex
				flush work
				free A
				unlock trans mutex

	queue work to free transB

> The work queue is flushed before potentially ripping the table
> out.  After this, no transactions referencing the table can exist
> anymore; the only transactions than can still be queued are those
> coming from a different netns, and tables are scoped per netns.
> 
> Table is torn down.  Transaction mutex is released.
> 
> Next transaction from userspace can't find the table anymore (its gone),
> so no more transactions can be queued for this table.
> 
> As I wrote in the commit message, the flush is dumb, this should first
> walk to see if there is a matching table to be torn down, and then flush
> work queue once before tearing the table down.
> 
> But its better to clearly split bug fix and such a change.
Florian Westphal July 4, 2024, 10:54 a.m. UTC | #6
Hillf Danton <hdanton@sina.com> wrote:
> On Wed, 3 Jul 2024 15:01:07 +0200 Florian Westphal <fw@strlen.de>
> > Hillf Danton <hdanton@sina.com> wrote:
> > > On Wed, 3 Jul 2024 12:52:15 +0200 Florian Westphal <fw@strlen.de>
> > > > Hillf Danton <hdanton@sina.com> wrote:
> > > > > Given trans->table goes thru the lifespan of trans, your proposal is a bandaid
> > > > > if trans outlives table.
> > > > 
> > > > trans must never outlive table.
> > > > 
> > > What is preventing trans from being freed after closing sock, given
> > > trans is freed in workqueue?
> > > 
> > > 	close sock
> > > 	queue work
> > 
> > The notifier acquires the transaction mutex, locking out all other
> > transactions, so no further transactions requests referencing
> > the table can be queued.
> > 
> As per the syzbot report, trans->table could be instantiated before
> notifier acquires the transaction mutex. And in fact the lock helps
> trans outlive table even with your patch.
> 
> 	cpu1			cpu2
> 	---			---
> 	transB->table = A
> 				lock trans mutex
> 				flush work
> 				free A
> 				unlock trans mutex
> 
> 	queue work to free transB

Can you show a crash reproducer or explain how this assignment
and queueing happen unordered wrt. cpu2?

This should look like this:

 	cpu1			cpu2
 	---			---
	lock trans mutex
  				lock trans mutex -> blocks
 	transB->table = A
  	queue work to free transB
	unlock trans mutex
				lock trans mutex returns
  				flush work
  				free A
  				unlock trans mutex
Hillf Danton July 5, 2024, 10:48 a.m. UTC | #7
On Thu, 4 Jul 2024 12:54:18 +0200 Florian Westphal <fw@strlen.de>
> Hillf Danton <hdanton@sina.com> wrote:
> > On Wed, 3 Jul 2024 15:01:07 +0200 Florian Westphal <fw@strlen.de>
> > > Hillf Danton <hdanton@sina.com> wrote:
> > > > On Wed, 3 Jul 2024 12:52:15 +0200 Florian Westphal <fw@strlen.de>
> > > > > Hillf Danton <hdanton@sina.com> wrote:
> > > > > > Given trans->table goes thru the lifespan of trans, your proposal is a bandaid
> > > > > > if trans outlives table.
> > > > > 
> > > > > trans must never outlive table.
> > > > > 
> > > > What is preventing trans from being freed after closing sock, given
> > > > trans is freed in workqueue?
> > > > 
> > > > 	close sock
> > > > 	queue work
> > > 
> > > The notifier acquires the transaction mutex, locking out all other
> > > transactions, so no further transactions requests referencing
> > > the table can be queued.
> > > 
> > As per the syzbot report, trans->table could be instantiated before
> > notifier acquires the transaction mutex. And in fact the lock helps
> > trans outlive table even with your patch.
> > 
> > 	cpu1			cpu2
> > 	---			---
> > 	transB->table = A
> > 				lock trans mutex
> > 				flush work
> > 				free A
> > 				unlock trans mutex
> > 
> > 	queue work to free transB
> 
> Can you show a crash reproducer or explain how this assign
> and queueing happens unordered wrt. cpu2?
> 
Not so difficult.

> This should look like this:
> 
>  	cpu1			cpu2
>  	---			---
> 	lock trans mutex
>   				lock trans mutex -> blocks
>  	transB->table = A
>   	queue work to free transB
> 	unlock trans mutex
> 				lock trans mutex returns
>   				flush work
>   				free A
>   				unlock trans mutex
> 
If your patch is correct, it should survive a warning.

#syz test https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git  1c5fc27bc48a

--- x/net/netfilter/nf_tables_api.c
+++ y/net/netfilter/nf_tables_api.c
@@ -11552,9 +11552,10 @@ static int nft_rcv_nl_event(struct notif
 
 	gc_seq = nft_gc_seq_begin(nft_net);
 
-	if (!list_empty(&nf_tables_destroy_list))
-		nf_tables_trans_destroy_flush_work();
+	nf_tables_trans_destroy_flush_work();
 again:
+	WARN_ON(!list_empty(&nft_net->commit_list));
+
 	list_for_each_entry(table, &nft_net->tables, list) {
 		if (nft_table_has_owner(table) &&
 		    n->portid == table->nlpid) {
--
Florian Westphal July 5, 2024, 11:02 a.m. UTC | #8
Hillf Danton <hdanton@sina.com> wrote:
> > 				lock trans mutex returns
> >   				flush work
> >   				free A
> >   				unlock trans mutex
> > 
> If your patch is correct, it should survive a warning.
> 
> #syz test https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git  1c5fc27bc48a
> 
> --- x/net/netfilter/nf_tables_api.c
> +++ y/net/netfilter/nf_tables_api.c
> @@ -11552,9 +11552,10 @@ static int nft_rcv_nl_event(struct notif
>  
>  	gc_seq = nft_gc_seq_begin(nft_net);
>  
> -	if (!list_empty(&nf_tables_destroy_list))
> -		nf_tables_trans_destroy_flush_work();
> +	nf_tables_trans_destroy_flush_work();
>  again:
> +	WARN_ON(!list_empty(&nft_net->commit_list));
> +

You could officially submit this patch to nf-next; this
is a slow path and the transaction list must be empty here.

I think this change might be useful as it also documents
this requirement.
syzbot July 5, 2024, 11:18 a.m. UTC | #9
Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: syzbot+4fd66a69358fc15ae2ad@syzkaller.appspotmail.com

Tested on:

commit:         1c5fc27b Merge tag 'nf-next-24-06-28' of git://git.ker..
git tree:       https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git
console output: https://syzkaller.appspot.com/x/log.txt?x=152db3d1980000
kernel config:  https://syzkaller.appspot.com/x/.config?x=5264b58fdff6e881
dashboard link: https://syzkaller.appspot.com/bug?extid=4fd66a69358fc15ae2ad
compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
patch:          https://syzkaller.appspot.com/x/patch.diff?x=1395cd71980000

Note: testing is done by a robot and is best-effort only.
Hillf Danton July 7, 2024, 7:56 a.m. UTC | #10
On Fri, 5 Jul 2024 13:02:18 +0200 Florian Westphal <fw@strlen.de>
> Hillf Danton <hdanton@sina.com> wrote:
> > > 				lock trans mutex returns
> > >   				flush work
> > >   				free A
> > >   				unlock trans mutex
> > > 
> > If your patch is correct, it should survive a warning.
> > 
> > #syz test https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git  1c5fc27bc48a
> > 
> > --- x/net/netfilter/nf_tables_api.c
> > +++ y/net/netfilter/nf_tables_api.c
> > @@ -11552,9 +11552,10 @@ static int nft_rcv_nl_event(struct notif
> >  
> >  	gc_seq = nft_gc_seq_begin(nft_net);
> >  
> > -	if (!list_empty(&nf_tables_destroy_list))
> > -		nf_tables_trans_destroy_flush_work();
> > +	nf_tables_trans_destroy_flush_work();
> >  again:
> > +	WARN_ON(!list_empty(&nft_net->commit_list));
> > +
> 
> You could officially submit this patch to nf-next, this
> is a slow path and the transaction list must be empty here.
> 
> I think this change might be useful as it also documents
> this requirement.

Yes it is boy and the current reproducer triggered another warning [1,2].

[1] https://lore.kernel.org/lkml/20240706231332.3261-1-hdanton@sina.com/
[2] https://lore.kernel.org/lkml/000000000000a42289061c9d452e@google.com/

And feel free to take a look at fput() and sock_put() for instances of
freeing slab objects asynchronously.
Florian Westphal July 7, 2024, 8:08 a.m. UTC | #11
Hillf Danton <hdanton@sina.com> wrote:
> > I think this change might be useful as it also documents
> > this requirement.
> 
> Yes it is boy and the current reproducer triggered another warning [1,2].
> 
> [1] https://lore.kernel.org/lkml/20240706231332.3261-1-hdanton@sina.com/

The WARN is incorrect.  The destroy list can be non-empty; I already
tried to explain why.
Hillf Danton July 8, 2024, 10:56 a.m. UTC | #12
On Sun, 7 Jul 2024 10:08:24 +0200 Florian Westphal <fw@strlen.de>
> Hillf Danton <hdanton@sina.com> wrote:
> > > I think this change might be useful as it also documents
> > > this requirement.
> > 
> > Yes it is boy and the current reproducer triggered another warning [1,2].
> > 
> > [1] https://lore.kernel.org/lkml/20240706231332.3261-1-hdanton@sina.com/
> 
> The WARN is incorrect.  The destroy list can be non-empty; i already
> tried to explain why.
>
That warning as-is could be a false positive, but it can be triggered with a
single netns.

	cpu1		cpu2		cpu3
	---		---		---
					nf_tables_trans_destroy_work()
					spin_lock(&nf_tables_destroy_list_lock);

					// 1) clear the destroy list
					list_splice_init(&nf_tables_destroy_list, &head);
					spin_unlock(&nf_tables_destroy_list_lock);

			nf_tables_commit_release()
			spin_lock(&nf_tables_destroy_list_lock);

			// 2) refill the destroy list
			list_splice_tail_init(&nft_net->commit_list, &nf_tables_destroy_list);
			spin_unlock(&nf_tables_destroy_list_lock);
			schedule_work(&trans_destroy_work);
			mutex_unlock(&nft_net->commit_mutex);

	nft_rcv_nl_event()
	mutex_lock(&nft_net->commit_mutex);
	flush_work(&trans_destroy_work);
	  start_flush_work()
	    insert_wq_barrier()
	    /*
	     * If @target is currently being executed, schedule the
	     * barrier to the worker; otherwise, put it after @target.
	     */

	// 3) flush work ends with the refilled destroy list left intact
	tear tables down
Florian Westphal July 8, 2024, 11:58 a.m. UTC | #13
Hillf Danton <hdanton@sina.com> wrote:
> On Sun, 7 Jul 2024 10:08:24 +0200 Florian Westphal <fw@strlen.de>
> > Hillf Danton <hdanton@sina.com> wrote:
> > > > I think this change might be useful as it also documents
> > > > this requirement.
> > > 
> > > Yes it is boy and the current reproducer triggered another warning [1,2].
> > > 
> > > [1] https://lore.kernel.org/lkml/20240706231332.3261-1-hdanton@sina.com/
> > 
> > The WARN is incorrect.  The destroy list can be non-empty; i already
> > tried to explain why.
> >
> That warning as-is could be false positive but it could be triggered with a
> single netns.

How?

> 	cpu1		cpu2		cpu3
> 	---		---		---
> 					nf_tables_trans_destroy_work()
> 					spin_lock(&nf_tables_destroy_list_lock);
> 
> 					// 1) clear the destroy list
> 					list_splice_init(&nf_tables_destroy_list, &head);
> 					spin_unlock(&nf_tables_destroy_list_lock);
> 
> 			nf_tables_commit_release()
> 			spin_lock(&nf_tables_destroy_list_lock);
> 
> 			// 2) refill the destroy list
> 			list_splice_tail_init(&nft_net->commit_list, &nf_tables_destroy_list);
> 			spin_unlock(&nf_tables_destroy_list_lock);
> 			schedule_work(&trans_destroy_work);
> 			mutex_unlock(&nft_net->commit_mutex);

So you're saying work can be IDLE after schedule_work()?

I'm not following at all.
Hillf Danton July 8, 2024, 12:17 p.m. UTC | #14
On Mon, 8 Jul 2024 13:58:31 +0200 Florian Westphal <fw@strlen.de>
> Hillf Danton <hdanton@sina.com> wrote:
> > On Sun, 7 Jul 2024 10:08:24 +0200 Florian Westphal <fw@strlen.de>
> > > Hillf Danton <hdanton@sina.com> wrote:
> > > > > I think this change might be useful as it also documents
> > > > > this requirement.
> > > > 
> > > > Yes it is boy and the current reproducer triggered another warning [1,2].
> > > > 
> > > > [1] https://lore.kernel.org/lkml/20240706231332.3261-1-hdanton@sina.com/
> > > 
> > > The WARN is incorrect.  The destroy list can be non-empty; i already
> > > tried to explain why.
> > >
> > That warning as-is could be false positive but it could be triggered with a
> > single netns.
> 
> How?
> 
You saw the below cpu diagram, no?

> > 	cpu1		cpu2		cpu3
> > 	---		---		---
> > 					nf_tables_trans_destroy_work()
> > 					spin_lock(&nf_tables_destroy_list_lock);
> > 
> > 					// 1) clear the destroy list
> > 					list_splice_init(&nf_tables_destroy_list, &head);
> > 					spin_unlock(&nf_tables_destroy_list_lock);
> > 
> > 			nf_tables_commit_release()
> > 			spin_lock(&nf_tables_destroy_list_lock);
> > 
> > 			// 2) refill the destroy list
> > 			list_splice_tail_init(&nft_net->commit_list, &nf_tables_destroy_list);
> > 			spin_unlock(&nf_tables_destroy_list_lock);
> > 			schedule_work(&trans_destroy_work);
> > 			mutex_unlock(&nft_net->commit_mutex);
> 
> So you're saying work can be IDLE after schedule_work()?
> 
I got your point but it is difficult to explain to you. In simple words:
like the runqueue, the workqueue has latency.

> I'm not following at all.

This does not matter but is why I added tj to the cc list.
Florian Westphal July 8, 2024, 12:43 p.m. UTC | #15
Hillf Danton <hdanton@sina.com> wrote:
> On Mon, 8 Jul 2024 13:58:31 +0200 Florian Westphal <fw@strlen.de>
> > Hillf Danton <hdanton@sina.com> wrote:
> > > On Sun, 7 Jul 2024 10:08:24 +0200 Florian Westphal <fw@strlen.de>
> > > > Hillf Danton <hdanton@sina.com> wrote:
> > > > > > I think this change might be useful as it also documents
> > > > > > this requirement.
> > > > > 
> > > > > Yes it is boy and the current reproducer triggered another warning [1,2].
> > > > > 
> > > > > [1] https://lore.kernel.org/lkml/20240706231332.3261-1-hdanton@sina.com/
> > > > 
> > > > The WARN is incorrect.  The destroy list can be non-empty; i already
> > > > tried to explain why.
> > > >
> > > That warning as-is could be false positive but it could be triggered with a
> > > single netns.
> > 
> > How?
> > 
> You saw the below cpu diagram, no?

It did not explain the problem in a way I understand.

>	cpu1		cpu2		cpu3
>	---		---		---
>					nf_tables_trans_destroy_work()
>					spin_lock(&nf_tables_destroy_list_lock);
>
>					// 1) clear the destroy list
>					list_splice_init(&nf_tables_destroy_list, &head);
>					spin_unlock(&nf_tables_destroy_list_lock);

This means @work is running on cpu3 and made a snapshot of the list.
I don't even understand how that's relevant, but OK.

>			nf_tables_commit_release()
>			spin_lock(&nf_tables_destroy_list_lock);
>			// 2) refill the destroy list
>			list_splice_tail_init(&nft_net->commit_list, &nf_tables_destroy_list);
>			spin_unlock(&nf_tables_destroy_list_lock);
>			schedule_work(&trans_destroy_work);
>			mutex_unlock(&nft_net->commit_mutex);

This means CPU2 has added transaction structures that could
reference @table to the list.

It also called schedule_work BEFORE releasing the mutex and
after placing entries on the destroy list.

> nft_rcv_nl_event()
> mutex_lock(&nft_net->commit_mutex);
> flush_work(&trans_destroy_work);

This means cpu1 serializes vs. cpu2; @work
was scheduled.

flush_work() must only return if @work is idle, without
any other pending execution.

If it gets scheduled again right after flush_work
returns, that is NOT a problem, as I tried to explain several times.

We hold the transaction mutex; only a different netns can queue more
work, and such a foreign netns can only see struct nft_table structures
that are private to its own namespace.

> // 3) flush work ends with the refilled destroy list left intact
> tear tables down

Again, I do not understand how it's possible.

The postcondition after flush_work returns is:

1. nf_tables_destroy_list must be empty, UNLESS its entries are from
   unrelated net namespaces; they cannot see the tables we're tearing down
   in 3), so they cannot reference them.

2. nf_tables_trans_destroy_work() is NOT running, unless it's
   processing entries queued by other netns after flush work
   returned.


cpu2 does:
   -> add trans->table to @nf_tables_destroy_list
   -> unlock list spinlock
   -> schedule_work
   -> unlock mutex

cpu1 does:
 -> lock mutex
 -> flush work

You say it's not enough and that trans->table queued by cpu2 can still
be on @nf_tables_destroy_list.

I say flush_work after taking the mutex guarantees trans->table has been
processed by @work in all cases.
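
[Restating that guarantee as a happens-before chain, labels added for
clarity:]

	A: cpu2 places trans->table on @nf_tables_destroy_list (under list lock)
	B: cpu2 schedule_work()
	C: cpu2 unlock mutex
	D: cpu1 lock mutex       -> ordered after C by the mutex
	E: cpu1 flush_work()     -> ordered after D; must wait for the work
	                            queued in B, which processes the entry from A
	F: cpu1 frees the table  -> by E, no reference from A remains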

Patch

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 02d75aefaa8e..683f6a4518ee 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -11552,8 +11552,7 @@  static int nft_rcv_nl_event(struct notifier_block *this, unsigned long event,
 
 	gc_seq = nft_gc_seq_begin(nft_net);
 
-	if (!list_empty(&nf_tables_destroy_list))
-		nf_tables_trans_destroy_flush_work();
+	nf_tables_trans_destroy_flush_work();
 again:
 	list_for_each_entry(table, &nft_net->tables, list) {
 		if (nft_table_has_owner(table) &&