Message ID | 1633848948-11315-1-git-send-email-volodymyr.mytnyk@plvision.eu (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Series | [net] sched: fix infinite loop when creating tc filter |
Context | Check | Description |
---|---|---|
netdev/cover_letter | success | Single patches do not need cover letters |
netdev/fixes_present | success | Fixes tag present in non-next series |
netdev/patch_count | success | Link |
netdev/tree_selection | success | Clearly marked for net |
netdev/subject_prefix | success | Link |
netdev/cc_maintainers | success | CCed 7 of 7 maintainers |
netdev/source_inline | success | Was 0 now: 0 |
netdev/verify_signedoff | success | Signed-off-by tag matches author and committer |
netdev/module_param | success | Was 0 now: 0 |
netdev/build_32bit | success | Errors and warnings before: 8 this patch: 8 |
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0 |
netdev/verify_fixes | fail | Problems with Fixes tag: 1 |
netdev/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 17 lines checked |
netdev/build_allmodconfig_warn | success | Errors and warnings before: 8 this patch: 8 |
netdev/header_inline | success | No static functions without inline keyword in header files |
Hi Volodymyr,

On Sun 10 Oct 2021 at 09:55, Volodymyr Mytnyk <volodymyr.mytnyk@plvision.eu> wrote:
> From: Volodymyr Mytnyk <vmytnyk@marvell.com>
>
> After running a specific set of commands tc will become unresponsive:
>
> $ ip link add dev DEV type veth
> $ tc qdisc add dev DEV clsact
> $ tc chain add dev DEV chain 0 ingress
> $ tc filter del dev DEV ingress
> $ tc filter add dev DEV ingress flower action pass
>
> When executing chain flush, the "chain->flushing" field is set
> to true, which prevents insertion of new classifier instances.
> It is unset in one place under two conditions:
>
> `refcnt - chain->action_refcnt == 0` and `!by_act`.
>
> Ignoring the by_act and action_refcnt arguments the `flushing procedure`
> will be over when refcnt is 0.
>
> But if the chain is explicitly created (e.g. `tc chain add .. chain 0 ..`)
> refcnt is set to 1 without any classifier instances. Thus the condition
> is never met and the chain->flushing field is never cleared.
> And because the default chain is `flushing` new classifiers cannot
> be added. tc_new_tfilter is stuck in a loop trying to find a chain
> where chain->flushing is false.
>
> By moving `chain->flushing = false` from __tcf_chain_put to the end
> of tcf_chain_flush will avoid the condition and the field will always
> be reset after the flush procedure.
>
> Fixes: 91052fa1c657 ("net: sched: protect chain->explicitly_created with block->lock")

Thanks for working on this!

>
> Co-developed-by: Serhiy Boiko <serhiy.boiko@plvision.eu>
> Signed-off-by: Serhiy Boiko <serhiy.boiko@plvision.eu>
> Signed-off-by: Volodymyr Mytnyk <vmytnyk@marvell.com>
> ---
>  net/sched/cls_api.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
> index d73b5c5514a9..327594cce554 100644
> --- a/net/sched/cls_api.c
> +++ b/net/sched/cls_api.c
> @@ -563,8 +563,6 @@ static void __tcf_chain_put(struct tcf_chain *chain, bool by_act,
>  	if (refcnt - chain->action_refcnt == 0 && !by_act) {
>  		tc_chain_notify_delete(tmplt_ops, tmplt_priv, chain->index,
>  				       block, NULL, 0, 0, false);
> -		/* Last reference to chain, no need to lock. */
> -		chain->flushing = false;
>  	}
>
>  	if (refcnt == 0)
> @@ -615,6 +613,9 @@ static void tcf_chain_flush(struct tcf_chain *chain, bool rtnl_held)
>  		tcf_proto_put(tp, rtnl_held, NULL);
>  		tp = tp_next;
>  	}
> +
> +	/* Last reference to chain, no need to lock. */

But after moving the code block here you can no longer guarantee that
this is the last reference, right?

> +	chain->flushing = false;

Resetting the flag here is probably correct for the actual flush use-case
(e.g. RTM_DELTFILTER message with prio==0), but can cause undesired
side-effects for other users of tcf_chain_flush(). Consider the following
interaction between new filter creation and explicit chain deletion that
also uses tcf_chain_flush():

          RTM_DELCHAIN                         RTM_NEWTFILTER
                +                                     +
                |                                     |
                |                          +----------v-----------+
                |                          |                      |
                |                          |   __tcf_block_find   |
                |                          |                      |
                |                          +----------+-----------+
                |                                     |
                |                                     |
                |                          +----------v------------+
                |                          |                       |
                |                          |     tcf_chain_get     |
                |                          |                       |
                |                          +----------+------------+
                |                                     |
       +--------v--------+                            |
       |                 |                            |
       | tcf_chain_flush |                            |
       |                 |                            |
       +--------+--------+                            |
                |                                     |
                |                          +----------v------------+
                |                          |                       |
                |                          |   tcf_chain_tp_find   |
                |                          |                       |
                |                          +----------+------------+
                |                                     |
                |                                     |tp==NULL
                |                                     |chain->flushing==false
                |                                     |
                |                     +---------------v----------------+
                |                     |                                |
                |                     |         tp_created = 1         |
                |                     |   tcf_chain_tp_insert_unique   |
                |                     |                                |
                |                     +---------------+----------------+
                |                                     |
                |                                     |
+---------------v-----------------+                   |
|                                 |                   |
|tcf_chain_put_explicitly_created |                   |
|                                 |                   |
+---------------+-----------------+                   |
                |                                     |
                v                                     v

In this example tc_new_tfilter() holds a chain reference during flush. If
flush finishes concurrently before the check for chain->flushing, the
chain reference counter will not reach 0 (because new filter creation
code will not back off and release the reference). In the described
example tc_chain_notify_delete() will not be called, which will confuse
any userland code that expects to receive a delete chain notification
after sending an RTM_DELCHAIN message.

With these considerations I can propose the following approach to fix the
issue:

1. Extend tcf_chain_flush() with an additional boolean argument and only
call it with 'true' value from tc_del_tfilter(). (or create a helper
function that calls tcf_chain_flush() and then resets the chain->flushing
flag)

2. Reset the 'flushing' flag when the new argument is true.

3. Wrap the 'flushing' flag reset code in a filter_chain_lock critical
section.

>  }
>
>  static int tcf_block_setup(struct tcf_block *block,
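
The three steps proposed above can be sketched in userspace C. This is only a single-threaded model under stated assumptions, not kernel code and not the eventual fix: `struct chain_model`, `model_lock()`, `model_unlock()`, `chain_flush_model()` and `chain_insert_model()` are hypothetical names, and the `lock_held` flag merely stands in for filter_chain_lock so that the critical sections are visible.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct tcf_chain; not the kernel type. */
struct chain_model {
	bool lock_held;		/* stand-in for filter_chain_lock */
	bool flushing;		/* models chain->flushing */
	unsigned int nfilters;	/* number of classifier instances */
};

/* Fake lock: only checks that critical sections are properly paired. */
static void model_lock(struct chain_model *c)
{
	assert(!c->lock_held);
	c->lock_held = true;
}

static void model_unlock(struct chain_model *c)
{
	assert(c->lock_held);
	c->lock_held = false;
}

/*
 * Steps 1 and 2: the flush takes an extra boolean argument and resets
 * the flag only when the caller asks for it (in the proposal, only the
 * filter delete path would pass true).
 * Step 3: the reset happens inside the lock's critical section.
 */
static void chain_flush_model(struct chain_model *c, bool reset_flushing)
{
	model_lock(c);
	c->flushing = true;
	c->nfilters = 0;	/* all classifier instances are dropped */
	model_unlock(c);

	if (reset_flushing) {
		model_lock(c);
		c->flushing = false;
		model_unlock(c);
	}
}

/* Insertion is refused while the chain is marked as flushing. */
static bool chain_insert_model(struct chain_model *c)
{
	bool ok;

	model_lock(c);
	ok = !c->flushing;
	if (ok)
		c->nfilters++;
	model_unlock(c);
	return ok;
}
```

In this model a flush with `reset_flushing == false` leaves the chain closed to new filters, which is what the chain deletion path relies on, while a flush with `reset_flushing == true` lets a later insert succeed.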
Hi Vlad,

Thanks for your review comments and good explanation of the problem you observe.
I will take a look at this and will get back to you.

Regards,
Volodymyr
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index d73b5c5514a9..327594cce554 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -563,8 +563,6 @@ static void __tcf_chain_put(struct tcf_chain *chain, bool by_act,
 	if (refcnt - chain->action_refcnt == 0 && !by_act) {
 		tc_chain_notify_delete(tmplt_ops, tmplt_priv, chain->index,
 				       block, NULL, 0, 0, false);
-		/* Last reference to chain, no need to lock. */
-		chain->flushing = false;
 	}
 
 	if (refcnt == 0)
@@ -615,6 +613,9 @@ static void tcf_chain_flush(struct tcf_chain *chain, bool rtnl_held)
 		tcf_proto_put(tp, rtnl_held, NULL);
 		tp = tp_next;
 	}
+
+	/* Last reference to chain, no need to lock. */
+	chain->flushing = false;
 }
 
 static int tcf_block_setup(struct tcf_block *block,
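
The reference counting described in the commit message can be modeled in a few lines of userspace C to show why the flag stays set before this change. This is a rough sketch, not the kernel code: `struct chain_model`, `chain_put_model()` and `filter_del_all_model()` are hypothetical names, and the model assumes that a filter delete request holds exactly one temporary chain reference around the flush. It is single-threaded, so it does not cover the concurrent RTM_DELCHAIN case raised in review.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the struct tcf_chain fields discussed here. */
struct chain_model {
	unsigned int refcnt;
	unsigned int action_refcnt;
	bool flushing;
};

/*
 * Models dropping one chain reference. With reset_in_put set, the flag
 * is cleared under the pre-patch condition quoted in the commit message:
 * refcnt - action_refcnt == 0 and !by_act.
 */
static void chain_put_model(struct chain_model *c, bool by_act,
			    bool reset_in_put)
{
	unsigned int refcnt;

	assert(c->refcnt > 0);
	assert(!by_act || c->action_refcnt > 0);

	if (by_act)
		c->action_refcnt--;
	refcnt = --c->refcnt;

	if (reset_in_put && refcnt - c->action_refcnt == 0 && !by_act)
		c->flushing = false;
}

/*
 * Models deleting all filters on a chain. Returns true if the chain is
 * still marked as flushing afterwards, i.e. if later filter insertions
 * would keep being refused.
 */
static bool filter_del_all_model(struct chain_model *c, bool patched)
{
	c->refcnt++;			/* assumption: request holds one reference */
	c->flushing = true;		/* flush starts */
	if (patched)
		c->flushing = false;	/* post-patch: reset at end of flush */
	chain_put_model(c, false, !patched);
	return c->flushing;
}
```

Starting from `refcnt == 1` (an explicitly created chain), the unpatched path ends with `refcnt == 1` and the flag still set, whereas a chain starting from `refcnt == 0` has the flag cleared when its last reference is dropped.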