Message ID | 1660807674-28447-1-git-send-email-paulb@nvidia.com (mailing list archive) |
---|---|
State | Awaiting Upstream |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net,1/1] netfilter: flowtable: Fix use after free after freeing flow table | expand |
Hi Paul, On Thu, Aug 18, 2022 at 10:27:54AM +0300, Paul Blakey wrote: [...] > diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c > index f2def06d1070..19fd3b5f8a1b 100644 > --- a/net/netfilter/nf_flow_table_core.c > +++ b/net/netfilter/nf_flow_table_core.c > @@ -605,6 +605,7 @@ void nf_flow_table_free(struct nf_flowtable *flow_table) > mutex_unlock(&flowtable_lock); > > cancel_delayed_work_sync(&flow_table->gc_work); > + nf_flow_table_offload_flush(flow_table); > nf_flow_table_iterate(flow_table, nf_flow_table_do_cleanup, NULL); > nf_flow_table_iterate(flow_table, nf_flow_offload_gc_step, NULL); > nf_flow_table_offload_flush(flow_table); patch looks very similar to: https://patchwork.ozlabs.org/project/netfilter-devel/patch/1633854320-12326-1-git-send-email-volodymyr.mytnyk@plvision.eu/ I proposed these two instead to avoid reiterative calls to flush from the cleanup path (see attached). It should be possible to either take your patch to nf.git (easier for -stable backport), then look into my patches for nf-next.git, would you pick up on these follow up?
On 19/08/2022 02:04, Pablo Neira Ayuso wrote: > Hi Paul, > > On Thu, Aug 18, 2022 at 10:27:54AM +0300, Paul Blakey wrote: > [...] >> diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c >> index f2def06d1070..19fd3b5f8a1b 100644 >> --- a/net/netfilter/nf_flow_table_core.c >> +++ b/net/netfilter/nf_flow_table_core.c >> @@ -605,6 +605,7 @@ void nf_flow_table_free(struct nf_flowtable *flow_table) >> mutex_unlock(&flowtable_lock); >> >> cancel_delayed_work_sync(&flow_table->gc_work); >> + nf_flow_table_offload_flush(flow_table); >> nf_flow_table_iterate(flow_table, nf_flow_table_do_cleanup, NULL); >> nf_flow_table_iterate(flow_table, nf_flow_offload_gc_step, NULL); >> nf_flow_table_offload_flush(flow_table); > > patch looks very similar to: > > https://patchwork.ozlabs.org/project/netfilter-devel/patch/1633854320-12326-1-git-send-email-volodymyr.mytnyk@plvision.eu/ > > I proposed these two instead to avoid reiterative calls to flush from > the cleanup path (see attached). > > It should be possible to either take your patch to nf.git (easier for > -stable backport), then look into my patches for nf-next.git, would > you pick up on these follow up? Hi! The only functional difference here (for HW table) is your patches call flush just for the del workqueue instead of del/stats/add, right? Because in the end you do: cancel_delayed_work_sync(&flow_table->gc_work); nf_flow_table_offload_flush(flow_table); nf_flow_table_iterate(flow_table, nf_flow_table_do_cleanup, NULL); nf_flow_table_gc_run(flow_table); nf_flow_table_offload_flush_cleanup(flow_table); resulting in the following sequence (after expending flush_cleanup()): cancel_delayed_work_sync(&flow_table->gc_work); nf_flow_table_offload_flush(flow_table); nf_flow_table_iterate(flow_table, nf_flow_table_do_cleanup, NULL); nf_flow_table_gc_run(flow_table); flush_workqueue(nf_flow_offload_del_wq); nf_flow_table_gc_run(flowtable); Where as my (and Volodymyr's) patch does: cancel_delayed_work_sync(&flow_table->gc_work); nf_flow_table_offload_flush(flow_table); nf_flow_table_iterate(flow_table, nf_flow_table_do_cleanup, NULL); nf_flow_table_iterate(flow_table, nf_flow_offload_gc_step, NULL); nf_flow_table_offload_flush(flow_table); nf_flow_table_iterate(flow_table, nf_flow_offload_gc_step, NULL); so almost identical, I don't see "extra reiterative calls to flush" here, but I'm fine with just your patch as it's more efficient, can we take yours to both gits? Thanks, Paul.
Hi Paul, On Sun, Aug 21, 2022 at 12:23:39PM +0300, Paul Blakey wrote: > Hi! > > The only functional difference here (for HW table) is your patches call > flush just for the del workqueue instead of del/stats/add, right? > > Because in the end you do: > cancel_delayed_work_sync(&flow_table->gc_work); > nf_flow_table_offload_flush(flow_table); > nf_flow_table_iterate(flow_table, nf_flow_table_do_cleanup, NULL); > nf_flow_table_gc_run(flow_table); > nf_flow_table_offload_flush_cleanup(flow_table); > > > resulting in the following sequence (after expending flush_cleanup()): > > cancel_delayed_work_sync(&flow_table->gc_work); > nf_flow_table_offload_flush(flow_table); > nf_flow_table_iterate(flow_table, nf_flow_table_do_cleanup, NULL); > nf_flow_table_gc_run(flow_table); > flush_workqueue(nf_flow_offload_del_wq); > nf_flow_table_gc_run(flowtable); > > > Where as my (and Volodymyr's) patch does: > > cancel_delayed_work_sync(&flow_table->gc_work); > nf_flow_table_offload_flush(flow_table); > nf_flow_table_iterate(flow_table, nf_flow_table_do_cleanup, NULL); > nf_flow_table_iterate(flow_table, nf_flow_offload_gc_step, NULL); > nf_flow_table_offload_flush(flow_table); > nf_flow_table_iterate(flow_table, nf_flow_offload_gc_step, NULL); > > > so almost identical, I don't see "extra reiterative calls to flush" here, > but I'm fine with just your patch as it's more efficient, can we take yours > to both gits? Yes, I'll submit them. I'll re-use your patch description. Maybe I get a Tested-by: tag from you? Thanks!
On 23/08/2022 00:10, Pablo Neira Ayuso wrote: > Hi Paul, > > On Sun, Aug 21, 2022 at 12:23:39PM +0300, Paul Blakey wrote: >> Hi! >> >> The only functional difference here (for HW table) is your patches call >> flush just for the del workqueue instead of del/stats/add, right? >> >> Because in the end you do: >> cancel_delayed_work_sync(&flow_table->gc_work); >> nf_flow_table_offload_flush(flow_table); >> nf_flow_table_iterate(flow_table, nf_flow_table_do_cleanup, NULL); >> nf_flow_table_gc_run(flow_table); >> nf_flow_table_offload_flush_cleanup(flow_table); >> >> >> resulting in the following sequence (after expending flush_cleanup()): >> >> cancel_delayed_work_sync(&flow_table->gc_work); >> nf_flow_table_offload_flush(flow_table); >> nf_flow_table_iterate(flow_table, nf_flow_table_do_cleanup, NULL); >> nf_flow_table_gc_run(flow_table); >> flush_workqueue(nf_flow_offload_del_wq); >> nf_flow_table_gc_run(flowtable); >> >> >> Where as my (and Volodymyr's) patch does: >> >> cancel_delayed_work_sync(&flow_table->gc_work); >> nf_flow_table_offload_flush(flow_table); >> nf_flow_table_iterate(flow_table, nf_flow_table_do_cleanup, NULL); >> nf_flow_table_iterate(flow_table, nf_flow_offload_gc_step, NULL); >> nf_flow_table_offload_flush(flow_table); >> nf_flow_table_iterate(flow_table, nf_flow_offload_gc_step, NULL); >> >> >> so almost identical, I don't see "extra reiterative calls to flush" here, >> but I'm fine with just your patch as it's more efficient, can we take yours >> to both gits? > > Yes, I'll submit them. I'll re-use your patch description. > > Maybe I get a Tested-by: tag from you? > > Thanks! Sure I'll test and post. Thanks.
On 23/08/2022 10:56, Paul Blakey wrote: > > > On 23/08/2022 00:10, Pablo Neira Ayuso wrote: >> Hi Paul, >> >> On Sun, Aug 21, 2022 at 12:23:39PM +0300, Paul Blakey wrote: >>> Hi! >>> >>> The only functional difference here (for HW table) is your patches call >>> flush just for the del workqueue instead of del/stats/add, right? >>> >>> Because in the end you do: >>> cancel_delayed_work_sync(&flow_table->gc_work); >>> nf_flow_table_offload_flush(flow_table); >>> nf_flow_table_iterate(flow_table, nf_flow_table_do_cleanup, NULL); >>> nf_flow_table_gc_run(flow_table); >>> nf_flow_table_offload_flush_cleanup(flow_table); >>> >>> >>> resulting in the following sequence (after expending flush_cleanup()): >>> >>> cancel_delayed_work_sync(&flow_table->gc_work); >>> nf_flow_table_offload_flush(flow_table); >>> nf_flow_table_iterate(flow_table, nf_flow_table_do_cleanup, NULL); >>> nf_flow_table_gc_run(flow_table); >>> flush_workqueue(nf_flow_offload_del_wq); >>> nf_flow_table_gc_run(flowtable); >>> >>> >>> Where as my (and Volodymyr's) patch does: >>> >>> cancel_delayed_work_sync(&flow_table->gc_work); >>> nf_flow_table_offload_flush(flow_table); >>> nf_flow_table_iterate(flow_table, nf_flow_table_do_cleanup, NULL); >>> nf_flow_table_iterate(flow_table, nf_flow_offload_gc_step, NULL); >>> nf_flow_table_offload_flush(flow_table); >>> nf_flow_table_iterate(flow_table, nf_flow_offload_gc_step, NULL); >>> >>> >>> so almost identical, I don't see "extra reiterative calls to flush" >>> here, >>> but I'm fine with just your patch as it's more efficient, can we take >>> yours >>> to both gits? >> >> Yes, I'll submit them. I'll re-use your patch description. >> >> Maybe I get a Tested-by: tag from you? >> >> Thanks! > > Sure I'll test and post. > Thanks. Tested-By: Paul Blakey <paulb@nvidia.com> Works, thanks.
diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c index f2def06d1070..19fd3b5f8a1b 100644 --- a/net/netfilter/nf_flow_table_core.c +++ b/net/netfilter/nf_flow_table_core.c @@ -605,6 +605,7 @@ void nf_flow_table_free(struct nf_flowtable *flow_table) mutex_unlock(&flowtable_lock); cancel_delayed_work_sync(&flow_table->gc_work); + nf_flow_table_offload_flush(flow_table); nf_flow_table_iterate(flow_table, nf_flow_table_do_cleanup, NULL); nf_flow_table_iterate(flow_table, nf_flow_offload_gc_step, NULL); nf_flow_table_offload_flush(flow_table);
To clear the flow table on flow table free, the following sequence normally happens in order: 1) gc_step work is stopped to disable any further stats/del requests. 2) All flow table entries are set to teardown state. 3) Run gc_step which will queue HW del work for each flow table entry. 4) Waiting for the above del work to finish (flush). 5) Run gc_step again, deleting all entries from the flow table. 6) Flow table is freed. But if a flow table entry already has pending HW stats or HW add work step 3 will not queue HW del work (it will be skipped), step 4 will wait for the pending add/stats to finish, and step 5 will queue HW del work which might execute after freeing of the flow table. To fix the above, add another flush (before step 2 above) to wait for any pending add/stats work to finish, so next steps will work as expected (schedule HW del, wait for it, then delete the flow from the flow table). Stack trace: [47773.882335] BUG: KASAN: use-after-free in down_read+0x99/0x460 [47773.883634] Write of size 8 at addr ffff888103b45aa8 by task kworker/u20:6/543704 [47773.885634] CPU: 3 PID: 543704 Comm: kworker/u20:6 Not tainted 5.12.0-rc7+ #2 [47773.886745] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009) [47773.888438] Workqueue: nf_ft_offload_del flow_offload_work_handler [nf_flow_table] [47773.889727] Call Trace: [47773.890214] dump_stack+0xbb/0x107 [47773.890818] print_address_description.constprop.0+0x18/0x140 [47773.892990] kasan_report.cold+0x7c/0xd8 [47773.894459] kasan_check_range+0x145/0x1a0 [47773.895174] down_read+0x99/0x460 [47773.899706] nf_flow_offload_tuple+0x24f/0x3c0 [nf_flow_table] [47773.907137] flow_offload_work_handler+0x72d/0xbe0 [nf_flow_table] [47773.913372] process_one_work+0x8ac/0x14e0 [47773.921325] [47773.921325] Allocated by task 592159: [47773.922031] kasan_save_stack+0x1b/0x40 [47773.922730] __kasan_kmalloc+0x7a/0x90 [47773.923411] tcf_ct_flow_table_get+0x3cb/0x1230 [act_ct] [47773.924363] tcf_ct_init+0x71c/0x1156 [act_ct] [47773.925207] tcf_action_init_1+0x45b/0x700 [47773.925987] tcf_action_init+0x453/0x6b0 [47773.926692] tcf_exts_validate+0x3d0/0x600 [47773.927419] fl_change+0x757/0x4a51 [cls_flower] [47773.928227] tc_new_tfilter+0x89a/0x2070 [47773.936652] [47773.936652] Freed by task 543704: [47773.937303] kasan_save_stack+0x1b/0x40 [47773.938039] kasan_set_track+0x1c/0x30 [47773.938731] kasan_set_free_info+0x20/0x30 [47773.939467] __kasan_slab_free+0xe7/0x120 [47773.940194] slab_free_freelist_hook+0x86/0x190 [47773.941038] kfree+0xce/0x3a0 [47773.941644] tcf_ct_flow_table_cleanup_work+0x1b/0x30 [act_ct] [47773.942656] process_one_work+0x8ac/0x14e0 Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support") Signed-off-by: Paul Blakey <paulb@nvidia.com> --- net/netfilter/nf_flow_table_core.c | 1 + 1 file changed, 1 insertion(+)