Message ID | 20250328201634.3876474-2-tavip@google.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | net_sched: sch_sfq: reject a derived limit of 1 | expand |
On Fri, Mar 28, 2025 at 01:16:32PM -0700, Octavian Purdila wrote: > Many configuration parameters have influence on others (e.g. divisor > -> flows -> limit, depth -> limit) and so it is difficult to correctly > do all of the validation before applying the configuration. And if a > validation error is detected late it is difficult to roll back a > partially applied configuration. > > To avoid these issues use a temporary work area to update and validate > the configuration and only then apply the configuration to the > internal state. > > Signed-off-by: Octavian Purdila <tavip@google.com> > --- > net/sched/sch_sfq.c | 60 ++++++++++++++++++++++++++++++++++++--------- > 1 file changed, 48 insertions(+), 12 deletions(-) > > diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c > index 65d5b59da583..027a3fde2139 100644 > --- a/net/sched/sch_sfq.c > +++ b/net/sched/sch_sfq.c > @@ -631,6 +631,18 @@ static int sfq_change(struct Qdisc *sch, struct nlattr *opt, > struct red_parms *p = NULL; > struct sk_buff *to_free = NULL; > struct sk_buff *tail = NULL; > + /* work area for validating changes before committing them */ > + struct { > + int limit; > + unsigned int divisor; > + unsigned int maxflows; > + int perturb_period; > + unsigned int quantum; > + u8 headdrop; > + u8 maxdepth; > + u8 flags; > + } tmp; Thanks for your patch. It reminds me again about the lacking of complete RCU support in TC. ;-) Instead of using a temporary struct, how about introducing a new one called struct sfq_sched_opt and putting it inside struct sfq_sched_data? It looks more elegant to me. Regards, Cong
On 3/31/25 1:49 AM, Cong Wang wrote: > On Fri, Mar 28, 2025 at 01:16:32PM -0700, Octavian Purdila wrote: >> diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c >> index 65d5b59da583..027a3fde2139 100644 >> --- a/net/sched/sch_sfq.c >> +++ b/net/sched/sch_sfq.c >> @@ -631,6 +631,18 @@ static int sfq_change(struct Qdisc *sch, struct nlattr *opt, >> struct red_parms *p = NULL; >> struct sk_buff *to_free = NULL; >> struct sk_buff *tail = NULL; >> + /* work area for validating changes before committing them */ >> + struct { >> + int limit; >> + unsigned int divisor; >> + unsigned int maxflows; >> + int perturb_period; >> + unsigned int quantum; >> + u8 headdrop; >> + u8 maxdepth; >> + u8 flags; >> + } tmp; > > Thanks for your patch. It reminds me again about the lacking of complete > RCU support in TC. ;-) > > Instead of using a temporary struct, how about introducing a new one > called struct sfq_sched_opt and putting it inside struct sfq_sched_data? > It looks more elegant to me. I agree with that. It should also make the code more compact. @Octavian, please update the patch as per Cong's suggestion. Thanks, Paolo
On Tue, Apr 1, 2025 at 11:27 AM Paolo Abeni <pabeni@redhat.com> wrote: > > On 3/31/25 1:49 AM, Cong Wang wrote: > > On Fri, Mar 28, 2025 at 01:16:32PM -0700, Octavian Purdila wrote: > >> diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c > >> index 65d5b59da583..027a3fde2139 100644 > >> --- a/net/sched/sch_sfq.c > >> +++ b/net/sched/sch_sfq.c > >> @@ -631,6 +631,18 @@ static int sfq_change(struct Qdisc *sch, struct nlattr *opt, > >> struct red_parms *p = NULL; > >> struct sk_buff *to_free = NULL; > >> struct sk_buff *tail = NULL; > >> + /* work area for validating changes before committing them */ > >> + struct { > >> + int limit; > >> + unsigned int divisor; > >> + unsigned int maxflows; > >> + int perturb_period; > >> + unsigned int quantum; > >> + u8 headdrop; > >> + u8 maxdepth; > >> + u8 flags; > >> + } tmp; > > > > Thanks for your patch. It reminds me again about the lacking of complete > > RCU support in TC. ;-) > > > > Instead of using a temporary struct, how about introducing a new one > > called struct sfq_sched_opt and putting it inside struct sfq_sched_data? > > It looks more elegant to me. > > I agree with that. It should also make the code more compact. @Octavian, > please update the patch as per Cong's suggestion. The concern with this approach was data locality. I had in my TODO list a patch to remove (accumulated over time) holes and put together hot fields. Something like : struct sfq_sched_data { int limit; /* 0 0x4 */ unsigned int divisor; /* 0x4 0x4 */ u8 headdrop; /* 0x8 0x1 */ u8 maxdepth; /* 0x9 0x1 */ u8 cur_depth; /* 0xa 0x1 */ u8 flags; /* 0xb 0x1 */ unsigned int quantum; /* 0xc 0x4 */ siphash_key_t perturbation; /* 0x10 0x10 */ struct tcf_proto * filter_list; /* 0x20 0x8 */ struct tcf_block * block; /* 0x28 0x8 */ sfq_index * ht; /* 0x30 0x8 */ struct sfq_slot * slots; /* 0x38 0x8 */ /* --- cacheline 1 boundary (64 bytes) --- */ struct red_parms * red_parms; /* 0x40 0x8 */ struct tc_sfqred_stats stats; /* 0x48 0x18 */ struct sfq_slot * tail; /* 0x60 0x8 */ struct sfq_head dep[128]; /* 0x68 0x200 */ /* --- cacheline 9 boundary (576 bytes) was 40 bytes ago --- */ unsigned int maxflows; /* 0x268 0x4 */ int perturb_period; /* 0x26c 0x4 */ struct timer_list perturb_timer; /* 0x270 0x28 */ /* XXX last struct has 4 bytes of padding */ /* --- cacheline 10 boundary (640 bytes) was 24 bytes ago --- */ struct Qdisc * sch; /* 0x298 0x8 */ /* size: 672, cachelines: 11, members: 20 */ /* paddings: 1, sum paddings: 4 */ /* last cacheline: 32 bytes */ }; With this patch : diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c index 65d5b59da583..f8fec2bc0d25 100644 --- a/net/sched/sch_sfq.c +++ b/net/sched/sch_sfq.c @@ -110,10 +110,11 @@ struct sfq_sched_data { unsigned int divisor; /* number of slots in hash table */ u8 headdrop; u8 maxdepth; /* limit of packets per flow */ - - siphash_key_t perturbation; u8 cur_depth; /* depth of longest slot */ u8 flags; + unsigned int quantum; /* Allotment per round: MUST BE >= MTU */ + + siphash_key_t perturbation; struct tcf_proto __rcu *filter_list; struct tcf_block *block; sfq_index *ht; /* Hash table ('divisor' slots) */ @@ -132,7 +133,6 @@ struct sfq_sched_data { unsigned int maxflows; /* number of flows in flows array */ int perturb_period; - unsigned int quantum; /* Allotment per round: MUST BE >= MTU */ struct timer_list perturb_timer; struct Qdisc *sch; };
On 4/1/25 12:47 PM, Eric Dumazet wrote: > On Tue, Apr 1, 2025 at 11:27 AM Paolo Abeni <pabeni@redhat.com> wrote: >> On 3/31/25 1:49 AM, Cong Wang wrote: >>> On Fri, Mar 28, 2025 at 01:16:32PM -0700, Octavian Purdila wrote: >>>> diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c >>>> index 65d5b59da583..027a3fde2139 100644 >>>> --- a/net/sched/sch_sfq.c >>>> +++ b/net/sched/sch_sfq.c >>>> @@ -631,6 +631,18 @@ static int sfq_change(struct Qdisc *sch, struct nlattr *opt, >>>> struct red_parms *p = NULL; >>>> struct sk_buff *to_free = NULL; >>>> struct sk_buff *tail = NULL; >>>> + /* work area for validating changes before committing them */ >>>> + struct { >>>> + int limit; >>>> + unsigned int divisor; >>>> + unsigned int maxflows; >>>> + int perturb_period; >>>> + unsigned int quantum; >>>> + u8 headdrop; >>>> + u8 maxdepth; >>>> + u8 flags; >>>> + } tmp; >>> >>> Thanks for your patch. It reminds me again about the lacking of complete >>> RCU support in TC. ;-) >>> >>> Instead of using a temporary struct, how about introducing a new one >>> called struct sfq_sched_opt and putting it inside struct sfq_sched_data? >>> It looks more elegant to me. >> >> I agree with that. It should also make the code more compact. @Octavian, >> please update the patch as per Cong's suggestion. > > The concern with this approach was data locality. I did not consider that aspect. How about not using the struct at all, then? int cur_limit; // ... u8 cur_flags; the 'tmp' struct is IMHO not so nice. > I had in my TODO list a patch to remove (accumulated over time) holes > and put together hot fields. > > Something like : > > struct sfq_sched_data { > int limit; /* 0 0x4 */ > unsigned int divisor; /* 0x4 0x4 */ > u8 headdrop; /* 0x8 0x1 */ > u8 maxdepth; /* 0x9 0x1 */ > u8 cur_depth; /* 0xa 0x1 */ > u8 flags; /* 0xb 0x1 */ > unsigned int quantum; /* 0xc 0x4 */ > siphash_key_t perturbation; /* 0x10 0x10 */ > struct tcf_proto * filter_list; /* 0x20 0x8 */ > struct tcf_block * block; /* 0x28 0x8 */ > sfq_index * ht; /* 0x30 0x8 */ > struct sfq_slot * slots; /* 0x38 0x8 */ > /* --- cacheline 1 boundary (64 bytes) --- */ > struct red_parms * red_parms; /* 0x40 0x8 */ > struct tc_sfqred_stats stats; /* 0x48 0x18 */ > struct sfq_slot * tail; /* 0x60 0x8 */ > struct sfq_head dep[128]; /* 0x68 0x200 */ > /* --- cacheline 9 boundary (576 bytes) was 40 bytes ago --- */ > unsigned int maxflows; /* 0x268 0x4 */ > int perturb_period; /* 0x26c 0x4 */ > struct timer_list perturb_timer; /* 0x270 0x28 */ > > /* XXX last struct has 4 bytes of padding */ > > /* --- cacheline 10 boundary (640 bytes) was 24 bytes ago --- */ > struct Qdisc * sch; /* 0x298 0x8 */ > > /* size: 672, cachelines: 11, members: 20 */ > /* paddings: 1, sum paddings: 4 */ > /* last cacheline: 32 bytes */ > }; > > > With this patch : > > diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c > index 65d5b59da583..f8fec2bc0d25 100644 > --- a/net/sched/sch_sfq.c > +++ b/net/sched/sch_sfq.c > @@ -110,10 +110,11 @@ struct sfq_sched_data { > unsigned int divisor; /* number of slots in hash table */ > u8 headdrop; > u8 maxdepth; /* limit of packets per flow */ > - > - siphash_key_t perturbation; > u8 cur_depth; /* depth of longest slot */ > u8 flags; > + unsigned int quantum; /* Allotment per round: MUST > BE >= MTU */ > + > + siphash_key_t perturbation; > struct tcf_proto __rcu *filter_list; > struct tcf_block *block; > sfq_index *ht; /* Hash table ('divisor' slots) */ > @@ -132,7 +133,6 @@ struct sfq_sched_data { > > unsigned int maxflows; /* number of flows in flows array */ > int perturb_period; Would it make any sense to additionally move 'maxflows' and 'perturb_period' at the top, just after 'perturbation'? Thanks, Paolo
On Tue, Apr 1, 2025 at 4:13 AM Paolo Abeni <pabeni@redhat.com> wrote: > > On 4/1/25 12:47 PM, Eric Dumazet wrote: > > On Tue, Apr 1, 2025 at 11:27 AM Paolo Abeni <pabeni@redhat.com> wrote: > >> On 3/31/25 1:49 AM, Cong Wang wrote: > >>> On Fri, Mar 28, 2025 at 01:16:32PM -0700, Octavian Purdila wrote: > >>>> diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c > >>>> index 65d5b59da583..027a3fde2139 100644 > >>>> --- a/net/sched/sch_sfq.c > >>>> +++ b/net/sched/sch_sfq.c > >>>> @@ -631,6 +631,18 @@ static int sfq_change(struct Qdisc *sch, struct nlattr *opt, > >>>> struct red_parms *p = NULL; > >>>> struct sk_buff *to_free = NULL; > >>>> struct sk_buff *tail = NULL; > >>>> + /* work area for validating changes before committing them */ > >>>> + struct { > >>>> + int limit; > >>>> + unsigned int divisor; > >>>> + unsigned int maxflows; > >>>> + int perturb_period; > >>>> + unsigned int quantum; > >>>> + u8 headdrop; > >>>> + u8 maxdepth; > >>>> + u8 flags; > >>>> + } tmp; > >>> > >>> Thanks for your patch. It reminds me again about the lacking of complete > >>> RCU support in TC. ;-) > >>> > >>> Instead of using a temporary struct, how about introducing a new one > >>> called struct sfq_sched_opt and putting it inside struct sfq_sched_data? > >>> It looks more elegant to me. > >> > >> I agree with that. It should also make the code more compact. @Octavian, > >> please update the patch as per Cong's suggestion. > > > > The concern with this approach was data locality. > > I did not consider that aspect. > > How about not using the struct at all, then? > > int cur_limit; > // ... > u8 cur_flags; > > the 'tmp' struct is IMHO not so nice. > Thanks Paolo and Cong for the review. I'll drop the structure, my initial thoughts were that by grouping them into a structure it would make it easier to understand the purpose of those variables, especially since we have a lot of local variables now. > > I had in my TODO list a patch to remove (accumulated over time) holes > > and put together hot fields. > > > > Something like : > > > > struct sfq_sched_data { > > int limit; /* 0 0x4 */ > > unsigned int divisor; /* 0x4 0x4 */ > > u8 headdrop; /* 0x8 0x1 */ > > u8 maxdepth; /* 0x9 0x1 */ > > u8 cur_depth; /* 0xa 0x1 */ > > u8 flags; /* 0xb 0x1 */ > > unsigned int quantum; /* 0xc 0x4 */ > > siphash_key_t perturbation; /* 0x10 0x10 */ > > struct tcf_proto * filter_list; /* 0x20 0x8 */ > > struct tcf_block * block; /* 0x28 0x8 */ > > sfq_index * ht; /* 0x30 0x8 */ > > struct sfq_slot * slots; /* 0x38 0x8 */ > > /* --- cacheline 1 boundary (64 bytes) --- */ > > struct red_parms * red_parms; /* 0x40 0x8 */ > > struct tc_sfqred_stats stats; /* 0x48 0x18 */ > > struct sfq_slot * tail; /* 0x60 0x8 */ > > struct sfq_head dep[128]; /* 0x68 0x200 */ > > /* --- cacheline 9 boundary (576 bytes) was 40 bytes ago --- */ > > unsigned int maxflows; /* 0x268 0x4 */ > > int perturb_period; /* 0x26c 0x4 */ > > struct timer_list perturb_timer; /* 0x270 0x28 */ > > > > /* XXX last struct has 4 bytes of padding */ > > > > /* --- cacheline 10 boundary (640 bytes) was 24 bytes ago --- */ > > struct Qdisc * sch; /* 0x298 0x8 */ > > > > /* size: 672, cachelines: 11, members: 20 */ > > /* paddings: 1, sum paddings: 4 */ > > /* last cacheline: 32 bytes */ > > }; > > > > > > With this patch : > > > > diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c > > index 65d5b59da583..f8fec2bc0d25 100644 > > --- a/net/sched/sch_sfq.c > > +++ b/net/sched/sch_sfq.c > > @@ -110,10 +110,11 @@ struct sfq_sched_data { > > unsigned int divisor; /* number of slots in hash table */ > > u8 headdrop; > > u8 maxdepth; /* limit of packets per flow */ > > - > > - siphash_key_t perturbation; > > u8 cur_depth; /* depth of longest slot */ > > u8 flags; > > + unsigned int quantum; /* Allotment per round: MUST > > BE >= MTU */ > > + > > + siphash_key_t perturbation; > > struct tcf_proto __rcu *filter_list; > > struct tcf_block *block; > > sfq_index *ht; /* Hash table ('divisor' slots) */ > > @@ -132,7 +133,6 @@ struct sfq_sched_data { > > > > unsigned int maxflows; /* number of flows in flows array */ > > int perturb_period; > > Would it make any sense to additionally move 'maxflows' and > 'perturb_period' at the top, just after 'perturbation'? > > Thanks, > > Paolo >
diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c index 65d5b59da583..027a3fde2139 100644 --- a/net/sched/sch_sfq.c +++ b/net/sched/sch_sfq.c @@ -631,6 +631,18 @@ static int sfq_change(struct Qdisc *sch, struct nlattr *opt, struct red_parms *p = NULL; struct sk_buff *to_free = NULL; struct sk_buff *tail = NULL; + /* work area for validating changes before committing them */ + struct { + int limit; + unsigned int divisor; + unsigned int maxflows; + int perturb_period; + unsigned int quantum; + u8 headdrop; + u8 maxdepth; + u8 flags; + } tmp; + if (opt->nla_len < nla_attr_size(sizeof(*ctl))) return -EINVAL; @@ -656,36 +668,60 @@ static int sfq_change(struct Qdisc *sch, struct nlattr *opt, NL_SET_ERR_MSG_MOD(extack, "invalid limit"); return -EINVAL; } + sch_tree_lock(sch); + + /* copy configuration to work area */ + tmp.limit = q->limit; + tmp.divisor = q->divisor; + tmp.headdrop = q->headdrop; + tmp.maxdepth = q->maxdepth; + tmp.maxflows = q->maxflows; + tmp.perturb_period = q->perturb_period; + tmp.quantum = q->quantum; + tmp.flags = q->flags; + + /* update and validate configuration */ if (ctl->quantum) - q->quantum = ctl->quantum; - WRITE_ONCE(q->perturb_period, ctl->perturb_period * HZ); + tmp.quantum = ctl->quantum; + tmp.perturb_period = ctl->perturb_period * HZ; if (ctl->flows) - q->maxflows = min_t(u32, ctl->flows, SFQ_MAX_FLOWS); + tmp.maxflows = min_t(u32, ctl->flows, SFQ_MAX_FLOWS); if (ctl->divisor) { - q->divisor = ctl->divisor; - q->maxflows = min_t(u32, q->maxflows, q->divisor); + tmp.divisor = ctl->divisor; + tmp.maxflows = min_t(u32, tmp.maxflows, tmp.divisor); } if (ctl_v1) { if (ctl_v1->depth) - q->maxdepth = min_t(u32, ctl_v1->depth, SFQ_MAX_DEPTH); + tmp.maxdepth = min_t(u32, ctl_v1->depth, SFQ_MAX_DEPTH); if (p) { - swap(q->red_parms, p); - red_set_parms(q->red_parms, + red_set_parms(p, ctl_v1->qth_min, ctl_v1->qth_max, ctl_v1->Wlog, ctl_v1->Plog, ctl_v1->Scell_log, NULL, ctl_v1->max_P); } - q->flags = ctl_v1->flags; - q->headdrop = ctl_v1->headdrop; + tmp.flags = ctl_v1->flags; + tmp.headdrop = ctl_v1->headdrop; } if (ctl->limit) { - q->limit = min_t(u32, ctl->limit, q->maxdepth * q->maxflows); - q->maxflows = min_t(u32, q->maxflows, q->limit); + tmp.limit = min_t(u32, ctl->limit, tmp.maxdepth * tmp.maxflows); + tmp.maxflows = min_t(u32, tmp.maxflows, tmp.limit); } + /* commit configuration, no return from this point further */ + q->limit = tmp.limit; + q->divisor = tmp.divisor; + q->headdrop = tmp.headdrop; + q->maxdepth = tmp.maxdepth; + q->maxflows = tmp.maxflows; + WRITE_ONCE(q->perturb_period, tmp.perturb_period); + q->quantum = tmp.quantum; + q->flags = tmp.flags; + if (p) + swap(q->red_parms, p); + qlen = sch->q.qlen; while (sch->q.qlen > q->limit) { dropped += sfq_drop(sch, &to_free);
Many configuration parameters have influence on others (e.g. divisor -> flows -> limit, depth -> limit) and so it is difficult to correctly do all of the validation before applying the configuration. And if a validation error is detected late it is difficult to roll back a partially applied configuration. To avoid these issues use a temporary work area to update and validate the configuration and only then apply the configuration to the internal state. Signed-off-by: Octavian Purdila <tavip@google.com> --- net/sched/sch_sfq.c | 60 ++++++++++++++++++++++++++++++++++++--------- 1 file changed, 48 insertions(+), 12 deletions(-)