diff mbox series

[net,1/1] netfilter: flowtable: Fix use after free after freeing flow table

Message ID 1660807674-28447-1-git-send-email-paulb@nvidia.com (mailing list archive)
State Awaiting Upstream
Delegated to: Netdev Maintainers
Headers show
Series [net,1/1] netfilter: flowtable: Fix use after free after freeing flow table | expand

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net
netdev/fixes_present success Fixes tag present in non-next series
netdev/subject_prefix success Link
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 85 this patch: 85
netdev/cc_maintainers fail 1 blamed authors not CCed: davem@davemloft.net; 7 maintainers not CCed: davem@davemloft.net kadlec@netfilter.org fw@strlen.de edumazet@google.com coreteam@netfilter.org kuba@kernel.org pabeni@redhat.com
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 85 this patch: 85
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 7 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Paul Blakey Aug. 18, 2022, 7:27 a.m. UTC
To clear the flow table on flow table free, the following sequence
normally happens in order:
  1) gc_step work is stopped to disable any further stats/del requests.
  2) All flow table entries are set to teardown state.
  3) Run gc_step which will queue HW del work for each flow table entry.
  4) Waiting for the above del work to finish (flush).
  5) Run gc_step again, deleting all entries from the flow table.
  6) Flow table is freed.

But if a flow table entry already has pending HW stats or HW add work
step 3 will not queue HW del work (it will be skipped), step 4 will wait
for the pending add/stats to finish, and step 5 will queue HW del work
which might execute after freeing of the flow table.

To fix the above, add another flush (before step 2 above) to wait for
any pending add/stats work to finish, so next steps will work as expected
(schedule HW del, wait for it, then delete the flow from the flow table).

Stack trace:
[47773.882335] BUG: KASAN: use-after-free in down_read+0x99/0x460
[47773.883634] Write of size 8 at addr ffff888103b45aa8 by task kworker/u20:6/543704
[47773.885634] CPU: 3 PID: 543704 Comm: kworker/u20:6 Not tainted 5.12.0-rc7+ #2
[47773.886745] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009)
[47773.888438] Workqueue: nf_ft_offload_del flow_offload_work_handler [nf_flow_table]
[47773.889727] Call Trace:
[47773.890214]  dump_stack+0xbb/0x107
[47773.890818]  print_address_description.constprop.0+0x18/0x140
[47773.892990]  kasan_report.cold+0x7c/0xd8
[47773.894459]  kasan_check_range+0x145/0x1a0
[47773.895174]  down_read+0x99/0x460
[47773.899706]  nf_flow_offload_tuple+0x24f/0x3c0 [nf_flow_table]
[47773.907137]  flow_offload_work_handler+0x72d/0xbe0 [nf_flow_table]
[47773.913372]  process_one_work+0x8ac/0x14e0
[47773.921325]
[47773.921325] Allocated by task 592159:
[47773.922031]  kasan_save_stack+0x1b/0x40
[47773.922730]  __kasan_kmalloc+0x7a/0x90
[47773.923411]  tcf_ct_flow_table_get+0x3cb/0x1230 [act_ct]
[47773.924363]  tcf_ct_init+0x71c/0x1156 [act_ct]
[47773.925207]  tcf_action_init_1+0x45b/0x700
[47773.925987]  tcf_action_init+0x453/0x6b0
[47773.926692]  tcf_exts_validate+0x3d0/0x600
[47773.927419]  fl_change+0x757/0x4a51 [cls_flower]
[47773.928227]  tc_new_tfilter+0x89a/0x2070
[47773.936652]
[47773.936652] Freed by task 543704:
[47773.937303]  kasan_save_stack+0x1b/0x40
[47773.938039]  kasan_set_track+0x1c/0x30
[47773.938731]  kasan_set_free_info+0x20/0x30
[47773.939467]  __kasan_slab_free+0xe7/0x120
[47773.940194]  slab_free_freelist_hook+0x86/0x190
[47773.941038]  kfree+0xce/0x3a0
[47773.941644]  tcf_ct_flow_table_cleanup_work+0x1b/0x30 [act_ct]
[47773.942656]  process_one_work+0x8ac/0x14e0

Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support")
Signed-off-by: Paul Blakey <paulb@nvidia.com>
---
 net/netfilter/nf_flow_table_core.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Pablo Neira Ayuso Aug. 18, 2022, 11:04 p.m. UTC | #1
Hi Paul,

On Thu, Aug 18, 2022 at 10:27:54AM +0300, Paul Blakey wrote:
[...]
> diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
> index f2def06d1070..19fd3b5f8a1b 100644
> --- a/net/netfilter/nf_flow_table_core.c
> +++ b/net/netfilter/nf_flow_table_core.c
> @@ -605,6 +605,7 @@ void nf_flow_table_free(struct nf_flowtable *flow_table)
>  	mutex_unlock(&flowtable_lock);
>  
>  	cancel_delayed_work_sync(&flow_table->gc_work);
> +	nf_flow_table_offload_flush(flow_table);
>  	nf_flow_table_iterate(flow_table, nf_flow_table_do_cleanup, NULL);
>  	nf_flow_table_iterate(flow_table, nf_flow_offload_gc_step, NULL);
>  	nf_flow_table_offload_flush(flow_table);

patch looks very similar to:

https://patchwork.ozlabs.org/project/netfilter-devel/patch/1633854320-12326-1-git-send-email-volodymyr.mytnyk@plvision.eu/

I proposed these two instead to avoid reiterative calls to flush from
the cleanup path (see attached).

It should be possible to either take your patch to nf.git (easier for
-stable backport), then look into my patches for nf-next.git, would
you pick up on these follow up?
Paul Blakey Aug. 21, 2022, 9:23 a.m. UTC | #2
On 19/08/2022 02:04, Pablo Neira Ayuso wrote:
> Hi Paul,
> 
> On Thu, Aug 18, 2022 at 10:27:54AM +0300, Paul Blakey wrote:
> [...]
>> diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
>> index f2def06d1070..19fd3b5f8a1b 100644
>> --- a/net/netfilter/nf_flow_table_core.c
>> +++ b/net/netfilter/nf_flow_table_core.c
>> @@ -605,6 +605,7 @@ void nf_flow_table_free(struct nf_flowtable *flow_table)
>>   	mutex_unlock(&flowtable_lock);
>>   
>>   	cancel_delayed_work_sync(&flow_table->gc_work);
>> +	nf_flow_table_offload_flush(flow_table);
>>   	nf_flow_table_iterate(flow_table, nf_flow_table_do_cleanup, NULL);
>>   	nf_flow_table_iterate(flow_table, nf_flow_offload_gc_step, NULL);
>>   	nf_flow_table_offload_flush(flow_table);
> 
> patch looks very similar to:
> 
> https://patchwork.ozlabs.org/project/netfilter-devel/patch/1633854320-12326-1-git-send-email-volodymyr.mytnyk@plvision.eu/
> 
> I proposed these two instead to avoid reiterative calls to flush from
> the cleanup path (see attached).
> 
> It should be possible to either take your patch to nf.git (easier for
> -stable backport), then look into my patches for nf-next.git, would
> you pick up on these follow up?


Hi!

The only functional difference here (for HW table) is your patches call 
flush just for the del workqueue instead of del/stats/add, right?

Because in the end you do:
cancel_delayed_work_sync(&flow_table->gc_work);
nf_flow_table_offload_flush(flow_table);
nf_flow_table_iterate(flow_table, nf_flow_table_do_cleanup, NULL);
nf_flow_table_gc_run(flow_table);
nf_flow_table_offload_flush_cleanup(flow_table);


resulting in the following sequence (after expending flush_cleanup()):

cancel_delayed_work_sync(&flow_table->gc_work);
nf_flow_table_offload_flush(flow_table);
nf_flow_table_iterate(flow_table, nf_flow_table_do_cleanup, NULL);
nf_flow_table_gc_run(flow_table);
flush_workqueue(nf_flow_offload_del_wq);
nf_flow_table_gc_run(flowtable);


Where as my (and Volodymyr's) patch does:

cancel_delayed_work_sync(&flow_table->gc_work);
nf_flow_table_offload_flush(flow_table);
nf_flow_table_iterate(flow_table, nf_flow_table_do_cleanup, NULL);
nf_flow_table_iterate(flow_table, nf_flow_offload_gc_step, NULL);
nf_flow_table_offload_flush(flow_table);
nf_flow_table_iterate(flow_table, nf_flow_offload_gc_step, NULL);


so almost identical, I don't see "extra reiterative calls to flush" 
here,  but I'm fine with just your patch as it's more efficient, can we 
take yours to both gits?

Thanks,
Paul.
Pablo Neira Ayuso Aug. 22, 2022, 9:10 p.m. UTC | #3
Hi Paul,

On Sun, Aug 21, 2022 at 12:23:39PM +0300, Paul Blakey wrote:
> Hi!
> 
> The only functional difference here (for HW table) is your patches call
> flush just for the del workqueue instead of del/stats/add, right?
> 
> Because in the end you do:
> cancel_delayed_work_sync(&flow_table->gc_work);
> nf_flow_table_offload_flush(flow_table);
> nf_flow_table_iterate(flow_table, nf_flow_table_do_cleanup, NULL);
> nf_flow_table_gc_run(flow_table);
> nf_flow_table_offload_flush_cleanup(flow_table);
> 
> 
> resulting in the following sequence (after expending flush_cleanup()):
> 
> cancel_delayed_work_sync(&flow_table->gc_work);
> nf_flow_table_offload_flush(flow_table);
> nf_flow_table_iterate(flow_table, nf_flow_table_do_cleanup, NULL);
> nf_flow_table_gc_run(flow_table);
> flush_workqueue(nf_flow_offload_del_wq);
> nf_flow_table_gc_run(flowtable);
> 
> 
> Where as my (and Volodymyr's) patch does:
> 
> cancel_delayed_work_sync(&flow_table->gc_work);
> nf_flow_table_offload_flush(flow_table);
> nf_flow_table_iterate(flow_table, nf_flow_table_do_cleanup, NULL);
> nf_flow_table_iterate(flow_table, nf_flow_offload_gc_step, NULL);
> nf_flow_table_offload_flush(flow_table);
> nf_flow_table_iterate(flow_table, nf_flow_offload_gc_step, NULL);
> 
> 
> so almost identical, I don't see "extra reiterative calls to flush" here,
> but I'm fine with just your patch as it's more efficient, can we take yours
> to both gits?

Yes, I'll submit them. I'll re-use your patch description.

Maybe I get a Tested-by: tag from you?

Thanks!
Paul Blakey Aug. 23, 2022, 7:56 a.m. UTC | #4
On 23/08/2022 00:10, Pablo Neira Ayuso wrote:
> Hi Paul,
> 
> On Sun, Aug 21, 2022 at 12:23:39PM +0300, Paul Blakey wrote:
>> Hi!
>>
>> The only functional difference here (for HW table) is your patches call
>> flush just for the del workqueue instead of del/stats/add, right?
>>
>> Because in the end you do:
>> cancel_delayed_work_sync(&flow_table->gc_work);
>> nf_flow_table_offload_flush(flow_table);
>> nf_flow_table_iterate(flow_table, nf_flow_table_do_cleanup, NULL);
>> nf_flow_table_gc_run(flow_table);
>> nf_flow_table_offload_flush_cleanup(flow_table);
>>
>>
>> resulting in the following sequence (after expending flush_cleanup()):
>>
>> cancel_delayed_work_sync(&flow_table->gc_work);
>> nf_flow_table_offload_flush(flow_table);
>> nf_flow_table_iterate(flow_table, nf_flow_table_do_cleanup, NULL);
>> nf_flow_table_gc_run(flow_table);
>> flush_workqueue(nf_flow_offload_del_wq);
>> nf_flow_table_gc_run(flowtable);
>>
>>
>> Where as my (and Volodymyr's) patch does:
>>
>> cancel_delayed_work_sync(&flow_table->gc_work);
>> nf_flow_table_offload_flush(flow_table);
>> nf_flow_table_iterate(flow_table, nf_flow_table_do_cleanup, NULL);
>> nf_flow_table_iterate(flow_table, nf_flow_offload_gc_step, NULL);
>> nf_flow_table_offload_flush(flow_table);
>> nf_flow_table_iterate(flow_table, nf_flow_offload_gc_step, NULL);
>>
>>
>> so almost identical, I don't see "extra reiterative calls to flush" here,
>> but I'm fine with just your patch as it's more efficient, can we take yours
>> to both gits?
> 
> Yes, I'll submit them. I'll re-use your patch description.
> 
> Maybe I get a Tested-by: tag from you?
> 
> Thanks!

Sure I'll test and post.
Thanks.
Paul Blakey Aug. 23, 2022, 9:49 a.m. UTC | #5
On 23/08/2022 10:56, Paul Blakey wrote:
> 
> 
> On 23/08/2022 00:10, Pablo Neira Ayuso wrote:
>> Hi Paul,
>>
>> On Sun, Aug 21, 2022 at 12:23:39PM +0300, Paul Blakey wrote:
>>> Hi!
>>>
>>> The only functional difference here (for HW table) is your patches call
>>> flush just for the del workqueue instead of del/stats/add, right?
>>>
>>> Because in the end you do:
>>> cancel_delayed_work_sync(&flow_table->gc_work);
>>> nf_flow_table_offload_flush(flow_table);
>>> nf_flow_table_iterate(flow_table, nf_flow_table_do_cleanup, NULL);
>>> nf_flow_table_gc_run(flow_table);
>>> nf_flow_table_offload_flush_cleanup(flow_table);
>>>
>>>
>>> resulting in the following sequence (after expending flush_cleanup()):
>>>
>>> cancel_delayed_work_sync(&flow_table->gc_work);
>>> nf_flow_table_offload_flush(flow_table);
>>> nf_flow_table_iterate(flow_table, nf_flow_table_do_cleanup, NULL);
>>> nf_flow_table_gc_run(flow_table);
>>> flush_workqueue(nf_flow_offload_del_wq);
>>> nf_flow_table_gc_run(flowtable);
>>>
>>>
>>> Where as my (and Volodymyr's) patch does:
>>>
>>> cancel_delayed_work_sync(&flow_table->gc_work);
>>> nf_flow_table_offload_flush(flow_table);
>>> nf_flow_table_iterate(flow_table, nf_flow_table_do_cleanup, NULL);
>>> nf_flow_table_iterate(flow_table, nf_flow_offload_gc_step, NULL);
>>> nf_flow_table_offload_flush(flow_table);
>>> nf_flow_table_iterate(flow_table, nf_flow_offload_gc_step, NULL);
>>>
>>>
>>> so almost identical, I don't see "extra reiterative calls to flush" 
>>> here,
>>> but I'm fine with just your patch as it's more efficient, can we take 
>>> yours
>>> to both gits?
>>
>> Yes, I'll submit them. I'll re-use your patch description.
>>
>> Maybe I get a Tested-by: tag from you?
>>
>> Thanks!
> 
> Sure I'll test and post.
> Thanks.

Tested-By: Paul Blakey <paulb@nvidia.com>

Works, thanks.
diff mbox series

Patch

diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
index f2def06d1070..19fd3b5f8a1b 100644
--- a/net/netfilter/nf_flow_table_core.c
+++ b/net/netfilter/nf_flow_table_core.c
@@ -605,6 +605,7 @@  void nf_flow_table_free(struct nf_flowtable *flow_table)
 	mutex_unlock(&flowtable_lock);
 
 	cancel_delayed_work_sync(&flow_table->gc_work);
+	nf_flow_table_offload_flush(flow_table);
 	nf_flow_table_iterate(flow_table, nf_flow_table_do_cleanup, NULL);
 	nf_flow_table_iterate(flow_table, nf_flow_offload_gc_step, NULL);
 	nf_flow_table_offload_flush(flow_table);