Message ID | 20231205153012.484687-2-pctammela@mojatatu.com (mailing list archive)
---|---
State | Superseded
Delegated to: | Netdev Maintainers
Series | net/sched: optimizations around action binding and init

On Tue 05 Dec 2023 at 12:30, Pedro Tammela <pctammela@mojatatu.com> wrote:
> Instead of relying only on the idrinfo->lock mutex for
> bind/alloc logic, rely on a combination of rcu + mutex + atomics
> to better scale the case where multiple rtnl-less filters are
> binding to the same action object.
>
> Action binding happens when an action index is specified explicitly and
> an action exists which such index exists. Example:

Nit: the first sentence looks mangled, extra 'exists' word and probably
'which' should be 'with'.

> tc actions add action drop index 1
> tc filter add ... matchall action drop index 1
> tc filter add ... matchall action drop index 1
> tc filter add ... matchall action drop index 1
> tc filter ls ...
> filter protocol all pref 49150 matchall chain 0
> filter protocol all pref 49150 matchall chain 0 handle 0x1
>   not_in_hw
>         action order 1: gact action drop
>          random type none pass val 0
>          index 1 ref 4 bind 3
>
> filter protocol all pref 49151 matchall chain 0
> filter protocol all pref 49151 matchall chain 0 handle 0x1
>   not_in_hw
>         action order 1: gact action drop
>          random type none pass val 0
>          index 1 ref 4 bind 3
>
> filter protocol all pref 49152 matchall chain 0
> filter protocol all pref 49152 matchall chain 0 handle 0x1
>   not_in_hw
>         action order 1: gact action drop
>          random type none pass val 0
>          index 1 ref 4 bind 3
>
> When no index is specified, as before, grab the mutex and allocate
> in the idr the next available id. In this version, as opposed to before,
> it's simplified to store the -EBUSY pointer instead of the previous
> alloc + replace combination.
>
> When an index is specified, rely on rcu to find if there's an object in
> such index. If there's none, fallback to the above, serializing on the
> mutex and reserving the specified id. If there's one, it can be an -EBUSY
> pointer, in which case we just try again until it's an action, or an action.
> Given the rcu guarantees, the action found could be dead and therefore
> we need to bump the refcount if it's not 0, handling the case it's
> in fact 0.
>
> As bind and the action refcount are already atomics, these increments can
> happen without the mutex protection while many tcf_idr_check_alloc race
> to bind to the same action instance.
>
> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
> ---
>  net/sched/act_api.c | 56 +++++++++++++++++++++++++++------------------
>  1 file changed, 34 insertions(+), 22 deletions(-)
>
> diff --git a/net/sched/act_api.c b/net/sched/act_api.c
> index abec5c45b5a4..79a044d2ae02 100644
> --- a/net/sched/act_api.c
> +++ b/net/sched/act_api.c
> @@ -824,43 +824,55 @@ int tcf_idr_check_alloc(struct tc_action_net *tn, u32 *index,
>  	struct tcf_idrinfo *idrinfo = tn->idrinfo;
>  	struct tc_action *p;
>  	int ret;
> +	u32 max;
>
> -again:
> -	mutex_lock(&idrinfo->lock);
>  	if (*index) {
> +again:
> +		rcu_read_lock();
>  		p = idr_find(&idrinfo->action_idr, *index);
> +
>  		if (IS_ERR(p)) {
>  			/* This means that another process allocated
>  			 * index but did not assign the pointer yet.
>  			 */
> -			mutex_unlock(&idrinfo->lock);
> +			rcu_read_unlock();
>  			goto again;
>  		}
>
> -		if (p) {
> -			refcount_inc(&p->tcfa_refcnt);
> -			if (bind)
> -				atomic_inc(&p->tcfa_bindcnt);
> -			*a = p;
> -			ret = 1;
> -		} else {
> -			*a = NULL;
> -			ret = idr_alloc_u32(&idrinfo->action_idr, NULL, index,
> -					    *index, GFP_KERNEL);
> -			if (!ret)
> -				idr_replace(&idrinfo->action_idr,
> -					    ERR_PTR(-EBUSY), *index);
> +		if (!p) {
> +			/* Empty slot, try to allocate it */
> +			max = *index;
> +			rcu_read_unlock();
> +			goto new;
> +		}
> +
> +		if (!refcount_inc_not_zero(&p->tcfa_refcnt)) {
> +			/* Action was deleted in parallel */
> +			rcu_read_unlock();
> +			return -ENOENT;

Current version doesn't return ENOENT since it is synchronous. You are
now introducing basically a change to UAPI since users of this function
(individual actions) are not prepared to retry on ENOENT and will
propagate the error up the call chain. I guess you need to try to create
a new action with specified index instead.

>  		}
> +
> +		if (bind)
> +			atomic_inc(&p->tcfa_bindcnt);
> +		*a = p;
> +
> +		rcu_read_unlock();
> +
> +		return 1;
>  	} else {
> +		/* Find a slot */
>  		*index = 1;
> -		*a = NULL;
> -		ret = idr_alloc_u32(&idrinfo->action_idr, NULL, index,
> -				    UINT_MAX, GFP_KERNEL);
> -		if (!ret)
> -			idr_replace(&idrinfo->action_idr, ERR_PTR(-EBUSY),
> -				    *index);
> +		max = UINT_MAX;
>  	}
> +
> +new:
> +	*a = NULL;
> +
> +	mutex_lock(&idrinfo->lock);
> +	ret = idr_alloc_u32(&idrinfo->action_idr, ERR_PTR(-EBUSY), index, max,
> +			    GFP_KERNEL);

What if multiple concurrent tasks didn't find the action by index with
rcu and get here, synchronizing on the idrinfo->lock? It looks like
after the one who got the lock first successfully allocates the index
everyone else will fail (also propagating ENOSPACE to the user).
I guess you need some mechanism to account for such case and retry.

>  	mutex_unlock(&idrinfo->lock);
> +
>  	return ret;
>  }
>  EXPORT_SYMBOL(tcf_idr_check_alloc);

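For context, the "users of this function" mentioned above are the individual
actions, which call tcf_idr_check_alloc() from their .init() callbacks and
propagate any negative return value up to the netlink caller. A rough sketch
of that caller pattern, loosely following the shape of existing actions such
as act_gact (simplified; 'ops', 'est' and the exact flags handling stand in
for the per-action details and vary by action and kernel version):

	err = tcf_idr_check_alloc(tn, &index, a, bind);
	if (err < 0)
		return err;	/* a transient -ENOENT would reach userspace here */

	if (!err) {
		/* The index was reserved for us: create a fresh action. */
		err = tcf_idr_create(tn, index, est, a, ops, bind, false, flags);
		if (err) {
			tcf_idr_cleanup(tn, index);
			return err;
		}
	} else {
		/* err == 1: bound to an existing action at that index. */
		if (bind)
			return 0;
		if (!(flags & TCA_ACT_FLAGS_REPLACE)) {
			tcf_idr_release(*a, bind);
			return -EEXIST;
		}
	}

Nothing in this pattern retries on -ENOENT, which is why an error produced by
a parallel delete would surface directly to userspace.
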
On 05/12/2023 15:34, Vlad Buslov wrote:
> On Tue 05 Dec 2023 at 12:30, Pedro Tammela <pctammela@mojatatu.com> wrote:
>> Instead of relying only on the idrinfo->lock mutex for
>> bind/alloc logic, rely on a combination of rcu + mutex + atomics
>> to better scale the case where multiple rtnl-less filters are
>> binding to the same action object.
>>
>> Action binding happens when an action index is specified explicitly and
>> an action exists which such index exists. Example:
>
> Nit: the first sentence looks mangled, extra 'exists' word and probably
> 'which' should be 'with'.
>
> [...]
>
>> +		if (!refcount_inc_not_zero(&p->tcfa_refcnt)) {
>> +			/* Action was deleted in parallel */
>> +			rcu_read_unlock();
>> +			return -ENOENT;
>
> Current version doesn't return ENOENT since it is synchronous. You are
> now introducing basically a change to UAPI since users of this function
> (individual actions) are not prepared to retry on ENOENT and will
> propagate the error up the call chain. I guess you need to try to create
> a new action with specified index instead.

I see.
So you are saying that in the case where action foo is deleted and a
binding in parallel observes the deleted action, it should fall back
into trying to allocate the index.

We could goto again and hope that idr_find will observe the idr index
being freed, in which case it would fall back into action allocation if
it does, or simply go via the same path as before (jumping to 'again').

I don't see much of a problem here, it seems to converge in this
scenario, as it eventually transforms into a race for action allocation
(more below) if you have an unfortunate delete with many bindings in
flight.

> [...]
>
>> +new:
>> +	*a = NULL;
>> +
>> +	mutex_lock(&idrinfo->lock);
>> +	ret = idr_alloc_u32(&idrinfo->action_idr, ERR_PTR(-EBUSY), index, max,
>> +			    GFP_KERNEL);
>
> What if multiple concurrent tasks didn't find the action by index with
> rcu and get here, synchronizing on the idrinfo->lock? It looks like
> after the one who got the lock first successfully allocates the index
> everyone else will fail (also propagating ENOSPACE to the user).

Correct

> I guess you need some mechanism to account for such case and retry.

Ok, so if I'm binding and a free index is observed, which means "try to
allocate", and I get an ENOSPC after jumping to 'new', try again but
this time binding to the allocated action.

In this scenario, when we come back to 'again' we will wait until -EBUSY
is replaced with the real pointer. Seems like a big enough window that
any race for allocating from binding would most probably end up in this
contention loop.

However, I think that with these two retry mechanisms there's an
extremely small window for an infinite loop if an action delete is timed
just right, in between the action pointer being found and us grabbing
the tcfa_refcnt:

   idr_find (pointer)
   tcfa_refcnt (0)          <------|
   again:                          |
   idr_find (free index!)          |
   new:                            |
   idr_alloc_u32 (ENOSPC)          |
   again:                          |
   idr_find (EBUSY)                |
   again:                          |
   idr_find (pointer)              |
   <evil delete happens>           |
   ------->>>>---------------------|

Another potential problem is that this will race with non-binding
actions, so the ENOSPC could actually come from another, unrelated
action. A practical example would be a race between a binding to an
'action drop index 1' and an 'action ok' allocation. Actually it's a
general problem and not particular to this case here, but it seems like
we could be amplifying it.

I'm conflicted here. If I were to choose one of the two, I would pick
the action respawning, as to me it seems to converge much quicker and
removes the uapi change (my bad! :).

As for usability, if all of a sudden there's a huge influx of ENOSPC
errors because users are abusing 'tc filter add ... action index 1 ...'
in parallel _before_ actually creating the action object, the fix is to
just:

   tc actions add action index 1 ...
   tc filter add ...

as tc has always supported.

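Putting the two retry ideas discussed above into code, one possible shape of
the lookup/allocation path is sketched below. It reuses the locals from the
posted hunk (idrinfo, p, ret, max) and is only meant to illustrate the
convergence argument, not a tested revision of the patch:

	if (*index) {
again:
		rcu_read_lock();
		p = idr_find(&idrinfo->action_idr, *index);
		if (IS_ERR(p)) {
			/* -EBUSY placeholder: the winner of a previous race
			 * has not installed its action yet, look again.
			 */
			rcu_read_unlock();
			goto again;
		}
		if (!p) {
			/* Free slot: try to reserve this exact index. */
			max = *index;
			rcu_read_unlock();
			goto new;
		}
		if (!refcount_inc_not_zero(&p->tcfa_refcnt)) {
			/* Deleted in parallel.  Instead of returning -ENOENT,
			 * retry: once the concurrent delete drops the idr
			 * entry, idr_find() sees a free slot and we take the
			 * allocation path instead.
			 */
			rcu_read_unlock();
			goto again;
		}
		if (bind)
			atomic_inc(&p->tcfa_bindcnt);
		*a = p;
		rcu_read_unlock();
		return 1;
	} else {
		*index = 1;
		max = UINT_MAX;
	}

new:
	*a = NULL;
	mutex_lock(&idrinfo->lock);
	ret = idr_alloc_u32(&idrinfo->action_idr, ERR_PTR(-EBUSY), index, max,
			    GFP_KERNEL);
	mutex_unlock(&idrinfo->lock);

	/* Lost the race to reserve an explicitly requested index: another
	 * task just took it, so go back and bind to that task's action
	 * rather than leaking -ENOSPC to the caller.
	 */
	if (ret == -ENOSPC && max == *index)
		goto again;

	return ret;

In this sketch the dead-action case converges because the concurrent delete
eventually removes the idr entry, and the -ENOSPC case converges because the
task that won the reservation will install a real pointer that the retried
lookup can bind to.
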
On Tue 05 Dec 2023 at 17:19, Pedro Tammela <pctammela@mojatatu.com> wrote:
> On 05/12/2023 15:34, Vlad Buslov wrote:
>> On Tue 05 Dec 2023 at 12:30, Pedro Tammela <pctammela@mojatatu.com> wrote:
>>
>> [...]
>>
>> Current version doesn't return ENOENT since it is synchronous. You are
>> now introducing basically a change to UAPI since users of this function
>> (individual actions) are not prepared to retry on ENOENT and will
>> propagate the error up the call chain. I guess you need to try to create
>> a new action with specified index instead.
>
> I see.
> So you are saying that in the case where action foo is deleted and a
> binding in parallel observes the deleted action, it should fall back
> into trying to allocate the index.

Correct.

>
> We could goto again and hope that idr_find will observe the idr index
> being freed, in which case it would fall back into action allocation if
> it does, or simply go via the same path as before (jumping to 'again').
>
> I don't see much of a problem here, it seems to converge in this
> scenario, as it eventually transforms into a race for action allocation
> (more below) if you have an unfortunate delete with many bindings in
> flight.
>
>> [...]
>>
>> What if multiple concurrent tasks didn't find the action by index with
>> rcu and get here, synchronizing on the idrinfo->lock? It looks like
>> after the one who got the lock first successfully allocates the index
>> everyone else will fail (also propagating ENOSPACE to the user).
>
> Correct
>
>> I guess you need some mechanism to account for such case and retry.
>
> Ok, so if I'm binding and a free index is observed, which means "try to
> allocate", and I get an ENOSPC after jumping to 'new', try again but
> this time binding to the allocated action.
>
> In this scenario, when we come back to 'again' we will wait until -EBUSY
> is replaced with the real pointer. Seems like a big enough window that
> any race for allocating from binding would most probably end up in this
> contention loop.
>
> However, I think that with these two retry mechanisms there's an
> extremely small window for an infinite loop if an action delete is timed
> just right, in between the action pointer being found and us grabbing
> the tcfa_refcnt:
>
>    idr_find (pointer)
>    tcfa_refcnt (0)          <------|
>    again:                          |
>    idr_find (free index!)          |
>    new:                            |
>    idr_alloc_u32 (ENOSPC)          |
>    again:                          |
>    idr_find (EBUSY)                |
>    again:                          |
>    idr_find (pointer)              |
>    <evil delete happens>           |
>    ------->>>>---------------------|

I'm not sure I'm following. Why would this sequence cause an infinite
loop?

> [...]

On 06/12/2023 06:52, Vlad Buslov wrote:
>> Ok, so if I'm binding and a free index is observed, which means "try to
>> allocate", and I get an ENOSPC after jumping to 'new', try again but
>> this time binding to the allocated action.
>>
>> In this scenario, when we come back to 'again' we will wait until -EBUSY
>> is replaced with the real pointer. Seems like a big enough window that
>> any race for allocating from binding would most probably end up in this
>> contention loop.
>>
>> However, I think that with these two retry mechanisms there's an
>> extremely small window for an infinite loop if an action delete is timed
>> just right, in between the action pointer being found and us grabbing
>> the tcfa_refcnt:
>>
>>    idr_find (pointer)
>>    tcfa_refcnt (0)          <------|
>>    again:                          |
>>    idr_find (free index!)          |
>>    new:                            |
>>    idr_alloc_u32 (ENOSPC)          |
>>    again:                          |
>>    idr_find (EBUSY)                |
>>    again:                          |
>>    idr_find (pointer)              |
>>    <evil delete happens>           |
>>    ------->>>>---------------------|
>
> I'm not sure I'm following. Why would this sequence cause an infinite
> loop?
>

Perhaps I was being overly paranoid. Taking a look again, it seems that
not only would an evil delete have to happen, but the EBUSY would also
have to sit in the action idr for a long time. I see it now, it looks
like it converges.

I was wondering: instead of looping on 'again:' in either of the
scenarios you presented, what if we return -EAGAIN and let the filter
infrastructure retry it under rtnl_lock()? At least that would give
enough breathing room for a call to schedule() to kick in if needed.

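For illustration, the -EAGAIN variant being proposed could look roughly like
this inside tcf_idr_check_alloc(). This is a hypothetical sketch, not code
from any posted revision; whether the act and cls callers actually replay on
-EAGAIN is exactly the question raised in the reply below:

		/* Explicit-index lookup path */
		if (IS_ERR(p)) {
			/* Another task reserved this index but has not
			 * installed its action yet.  Punt to the caller
			 * instead of spinning; the act/cls API can replay
			 * the whole request under rtnl_lock(), where
			 * schedule() gets a chance to run.
			 */
			rcu_read_unlock();
			return -EAGAIN;
		}

		/* ... rest of the lookup as in the posted patch ... */

	/* Allocation path */
	ret = idr_alloc_u32(&idrinfo->action_idr, ERR_PTR(-EBUSY), index, max,
			    GFP_KERNEL);
	mutex_unlock(&idrinfo->lock);

	if (ret == -ENOSPC && max == *index)
		return -EAGAIN;	/* lost the race for an explicit index */

	return ret;
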
On Fri 08 Dec 2023 at 18:07, Pedro Tammela <pctammela@mojatatu.com> wrote:
> On 06/12/2023 06:52, Vlad Buslov wrote:
>
> [...]
>
> Perhaps I was being overly paranoid. Taking a look again, it seems that
> not only would an evil delete have to happen, but the EBUSY would also
> have to sit in the action idr for a long time. I see it now, it looks
> like it converges.
>
> I was wondering: instead of looping on 'again:' in either of the
> scenarios you presented, what if we return -EAGAIN and let the filter
> infrastructure retry it under rtnl_lock()? At least that would give
> enough breathing room for a call to schedule() to kick in if needed.

Sounds good, but you will need to ensure that both act and cls api
implementations properly retry on EAGAIN (looks like they do, but I only
gave it a cursory glance).

diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index abec5c45b5a4..79a044d2ae02 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -824,43 +824,55 @@ int tcf_idr_check_alloc(struct tc_action_net *tn, u32 *index,
 	struct tcf_idrinfo *idrinfo = tn->idrinfo;
 	struct tc_action *p;
 	int ret;
+	u32 max;
 
-again:
-	mutex_lock(&idrinfo->lock);
 	if (*index) {
+again:
+		rcu_read_lock();
 		p = idr_find(&idrinfo->action_idr, *index);
+
 		if (IS_ERR(p)) {
 			/* This means that another process allocated
 			 * index but did not assign the pointer yet.
 			 */
-			mutex_unlock(&idrinfo->lock);
+			rcu_read_unlock();
 			goto again;
 		}
 
-		if (p) {
-			refcount_inc(&p->tcfa_refcnt);
-			if (bind)
-				atomic_inc(&p->tcfa_bindcnt);
-			*a = p;
-			ret = 1;
-		} else {
-			*a = NULL;
-			ret = idr_alloc_u32(&idrinfo->action_idr, NULL, index,
-					    *index, GFP_KERNEL);
-			if (!ret)
-				idr_replace(&idrinfo->action_idr,
-					    ERR_PTR(-EBUSY), *index);
+		if (!p) {
+			/* Empty slot, try to allocate it */
+			max = *index;
+			rcu_read_unlock();
+			goto new;
+		}
+
+		if (!refcount_inc_not_zero(&p->tcfa_refcnt)) {
+			/* Action was deleted in parallel */
+			rcu_read_unlock();
+			return -ENOENT;
 		}
+
+		if (bind)
+			atomic_inc(&p->tcfa_bindcnt);
+		*a = p;
+
+		rcu_read_unlock();
+
+		return 1;
 	} else {
+		/* Find a slot */
 		*index = 1;
-		*a = NULL;
-		ret = idr_alloc_u32(&idrinfo->action_idr, NULL, index,
-				    UINT_MAX, GFP_KERNEL);
-		if (!ret)
-			idr_replace(&idrinfo->action_idr, ERR_PTR(-EBUSY),
-				    *index);
+		max = UINT_MAX;
 	}
+
+new:
+	*a = NULL;
+
+	mutex_lock(&idrinfo->lock);
+	ret = idr_alloc_u32(&idrinfo->action_idr, ERR_PTR(-EBUSY), index, max,
+			    GFP_KERNEL);
 	mutex_unlock(&idrinfo->lock);
+
 	return ret;
 }
 EXPORT_SYMBOL(tcf_idr_check_alloc);

Instead of relying only on the idrinfo->lock mutex for
bind/alloc logic, rely on a combination of rcu + mutex + atomics
to better scale the case where multiple rtnl-less filters are
binding to the same action object.

Action binding happens when an action index is specified explicitly and
an action exists which such index exists. Example:

tc actions add action drop index 1
tc filter add ... matchall action drop index 1
tc filter add ... matchall action drop index 1
tc filter add ... matchall action drop index 1
tc filter ls ...
filter protocol all pref 49150 matchall chain 0
filter protocol all pref 49150 matchall chain 0 handle 0x1
  not_in_hw
        action order 1: gact action drop
         random type none pass val 0
         index 1 ref 4 bind 3

filter protocol all pref 49151 matchall chain 0
filter protocol all pref 49151 matchall chain 0 handle 0x1
  not_in_hw
        action order 1: gact action drop
         random type none pass val 0
         index 1 ref 4 bind 3

filter protocol all pref 49152 matchall chain 0
filter protocol all pref 49152 matchall chain 0 handle 0x1
  not_in_hw
        action order 1: gact action drop
         random type none pass val 0
         index 1 ref 4 bind 3

When no index is specified, as before, grab the mutex and allocate
in the idr the next available id. In this version, as opposed to before,
it's simplified to store the -EBUSY pointer instead of the previous
alloc + replace combination.

When an index is specified, rely on rcu to find if there's an object in
such index. If there's none, fallback to the above, serializing on the
mutex and reserving the specified id. If there's one, it can be an -EBUSY
pointer, in which case we just try again until it's an action, or an action.
Given the rcu guarantees, the action found could be dead and therefore
we need to bump the refcount if it's not 0, handling the case it's
in fact 0.

As bind and the action refcount are already atomics, these increments can
happen without the mutex protection while many tcf_idr_check_alloc race
to bind to the same action instance.

Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
---
 net/sched/act_api.c | 56 +++++++++++++++++++++++++++------------------
 1 file changed, 34 insertions(+), 22 deletions(-)
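
The changelog above describes the bind fast path in prose; condensed from the
posted diff, it boils down to roughly the following (comments added for
emphasis, fallback paths abbreviated; this is a reading aid, not new code):

	rcu_read_lock();
	p = idr_find(&idrinfo->action_idr, *index);	/* no idrinfo->lock taken */
	if (IS_ERR(p)) {
		/* ERR_PTR(-EBUSY) placeholder: another task is still
		 * installing its action, back off and look again.
		 */
		rcu_read_unlock();
		goto again;
	}
	if (!p) {
		/* Free slot: fall back to the mutex + idr_alloc_u32()
		 * slow path that reserves *index with ERR_PTR(-EBUSY).
		 */
	}
	if (!refcount_inc_not_zero(&p->tcfa_refcnt)) {
		/* Dead action (refcount already 0); see the review
		 * discussion above about how to handle this case.
		 */
	}
	/* Alive: both counters are atomics, so concurrent binders never
	 * serialize on the idrinfo->lock mutex in this path.
	 */
	if (bind)
		atomic_inc(&p->tcfa_bindcnt);
	*a = p;
	rcu_read_unlock();
	return 1;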