diff mbox series

[RESEND,net-next,5/5] net/sched: taprio: dump class stats for the actual q->qdiscs[]

Message ID 20230602103750.2290132-6-vladimir.oltean@nxp.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series Improve the taprio qdisc's relationship with its children | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 8 this patch: 8
netdev/cc_maintainers success CCed 9 of 9 maintainers
netdev/build_clang success Errors and warnings before: 8 this patch: 8
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 8 this patch: 8
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 22 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Vladimir Oltean June 2, 2023, 10:37 a.m. UTC
This makes a difference for the software scheduling mode, where
dev_queue->qdisc_sleeping is the same as the taprio root Qdisc itself,
but when we're talking about what Qdisc and stats get reported for a
traffic class, the root taprio isn't what comes to mind, but q->qdiscs[]
is.

To understand the difference, I've attempted to send 100 packets in
software mode through traffic class 0 (they are in the Qdisc's backlog),
and recorded the stats before and after the change.

Here is before:

$ tc -s class show dev eth0
class taprio 8001:1 root leaf 8001:
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 9400b 100p requeues 0
 Window drops: 0
class taprio 8001:2 root leaf 8001:
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 9400b 100p requeues 0
 Window drops: 0
class taprio 8001:3 root leaf 8001:
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 9400b 100p requeues 0
 Window drops: 0
class taprio 8001:4 root leaf 8001:
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 9400b 100p requeues 0
 Window drops: 0
class taprio 8001:5 root leaf 8001:
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 9400b 100p requeues 0
 Window drops: 0
class taprio 8001:6 root leaf 8001:
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 9400b 100p requeues 0
 Window drops: 0
class taprio 8001:7 root leaf 8001:
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 9400b 100p requeues 0
 Window drops: 0
class taprio 8001:8 root leaf 8001:
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 9400b 100p requeues 0
 Window drops: 0

and here is after:

class taprio 8001:1 root
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 9400b 100p requeues 0
 Window drops: 0
class taprio 8001:2 root
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
 Window drops: 0
class taprio 8001:3 root
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
 Window drops: 0
class taprio 8001:4 root
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
 Window drops: 0
class taprio 8001:5 root
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
 Window drops: 0
class taprio 8001:6 root
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
 Window drops: 0
class taprio 8001:7 root leaf 8010:
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
 Window drops: 0
class taprio 8001:8 root
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
 Window drops: 0

The most glaring (and expected) difference is that before, all class
stats reported the global stats, whereas now, they really report just
the counters for that traffic class.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 net/sched/sch_taprio.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

Comments

Jamal Hadi Salim June 8, 2023, 6:44 p.m. UTC | #1
On Fri, Jun 2, 2023 at 6:38 AM Vladimir Oltean <vladimir.oltean@nxp.com> wrote:
>
> This makes a difference for the software scheduling mode, where
> dev_queue->qdisc_sleeping is the same as the taprio root Qdisc itself,
> but when we're talking about what Qdisc and stats get reported for a
> traffic class, the root taprio isn't what comes to mind, but q->qdiscs[]
> is.
>
> To understand the difference, I've attempted to send 100 packets in
> software mode through traffic class 0 (they are in the Qdisc's backlog),
> and recorded the stats before and after the change.
>

Other than the refcount issue i think the approach looks reasonable to
me. The stats before/after you are showing below though are
interesting; are you showing a transient phase where packets are
temporarily in the backlog. Typically the backlog is a transient phase
which lasts a very short period. Maybe it works differently for
taprio? I took a quick look at the code and do see to decrement the
backlog in the dequeue, so if it is not transient then some code path
is not being hit.

Aside: I realize you are busy - but if you get time and provide some
sample tc command lines for testing we could help create the tests for
you, at least the first time. The advantage of putting these tests in
tools/testing/selftests/tc-testing/ is that there are test tools out
there that run these tests and so regressions are easier to catch
sooner.

cheers,
jamal

> Here is before:
>
> $ tc -s class show dev eth0
> class taprio 8001:1 root leaf 8001:
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 9400b 100p requeues 0
>  Window drops: 0
> class taprio 8001:2 root leaf 8001:
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 9400b 100p requeues 0
>  Window drops: 0
> class taprio 8001:3 root leaf 8001:
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 9400b 100p requeues 0
>  Window drops: 0
> class taprio 8001:4 root leaf 8001:
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 9400b 100p requeues 0
>  Window drops: 0
> class taprio 8001:5 root leaf 8001:
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 9400b 100p requeues 0
>  Window drops: 0
> class taprio 8001:6 root leaf 8001:
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 9400b 100p requeues 0
>  Window drops: 0
> class taprio 8001:7 root leaf 8001:
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 9400b 100p requeues 0
>  Window drops: 0
> class taprio 8001:8 root leaf 8001:
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 9400b 100p requeues 0
>  Window drops: 0
>
> and here is after:
>
> class taprio 8001:1 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 9400b 100p requeues 0
>  Window drops: 0
> class taprio 8001:2 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
>  Window drops: 0
> class taprio 8001:3 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
>  Window drops: 0
> class taprio 8001:4 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
>  Window drops: 0
> class taprio 8001:5 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
>  Window drops: 0
> class taprio 8001:6 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
>  Window drops: 0
> class taprio 8001:7 root leaf 8010:
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
>  Window drops: 0
> class taprio 8001:8 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
>  Window drops: 0
>
> The most glaring (and expected) difference is that before, all class
> stats reported the global stats, whereas now, they really report just
> the counters for that traffic class.
>
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> ---
>  net/sched/sch_taprio.c | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
> index cc7ff98e5e86..23b98c3af8b2 100644
> --- a/net/sched/sch_taprio.c
> +++ b/net/sched/sch_taprio.c
> @@ -2452,11 +2452,11 @@ static unsigned long taprio_find(struct Qdisc *sch, u32 classid)
>  static int taprio_dump_class(struct Qdisc *sch, unsigned long cl,
>                              struct sk_buff *skb, struct tcmsg *tcm)
>  {
> -       struct netdev_queue *dev_queue = taprio_queue_get(sch, cl);
> +       struct Qdisc *child = taprio_leaf(sch, cl);
>
>         tcm->tcm_parent = TC_H_ROOT;
>         tcm->tcm_handle |= TC_H_MIN(cl);
> -       tcm->tcm_info = dev_queue->qdisc_sleeping->handle;
> +       tcm->tcm_info = child->handle;
>
>         return 0;
>  }
> @@ -2466,8 +2466,7 @@ static int taprio_dump_class_stats(struct Qdisc *sch, unsigned long cl,
>         __releases(d->lock)
>         __acquires(d->lock)
>  {
> -       struct netdev_queue *dev_queue = taprio_queue_get(sch, cl);
> -       struct Qdisc *child = dev_queue->qdisc_sleeping;
> +       struct Qdisc *child = taprio_leaf(sch, cl);
>         struct tc_taprio_qopt_offload offload = {
>                 .cmd = TAPRIO_CMD_TC_STATS,
>                 .tc_stats = {
> --
> 2.34.1
>
Vladimir Oltean June 9, 2023, 12:10 p.m. UTC | #2
On Thu, Jun 08, 2023 at 02:44:46PM -0400, Jamal Hadi Salim wrote:
> Other than the refcount issue i think the approach looks reasonable to
> me. The stats before/after you are showing below though are
> interesting; are you showing a transient phase where packets are
> temporarily in the backlog. Typically the backlog is a transient phase
> which lasts a very short period. Maybe it works differently for
> taprio? I took a quick look at the code and do see to decrement the
> backlog in the dequeue, so if it is not transient then some code path
> is not being hit.

It's a fair concern. The thing is that I put very aggressive time slots
in the schedule that I'm testing with, and my kernel has a lot of
debugging stuff which bogs it down (kasan, kmemleak, lockdep, DMA API
debug etc). Not to mention that the CPU isn't the fastest to begin with.

The way taprio works is that there's a hrtimer which fires at the
expiration time of the current schedule entry and sets up the gates for
the next one. Each schedule entry has a gate for each traffic class
which determines what traffic classes are eligible for dequeue() and
which ones aren't.

The dequeue() procedure, though also invoked by the advance_schedule()
hrtimer -> __netif_schedule(), is also time-sensitive. By the time
taprio_dequeue() runs, taprio_entry_allows_tx() function might return
false when the system is so bogged down that it wasn't able to make
enough progress to dequeue() an skb in time. When that happens, there is
no mechanism, currently, to age out packets that stood too much in the
TX queues (what does "too much" mean?).

Whereas enqueue() is technically not time-sensitive, i.e. you can
enqueue whenever you want and the Qdisc will dequeue whenever it can.
Though in practice, to make this scheduling technique useful, the user
space enqueue should also be time-aware (though you can't capture this
with ping).

If I increase all my sched-entry intervals by a factor of 100, the
backlog issue goes away and the system can make forward progress.

So yeah, sorry, I didn't pay too much attention to the data I was
presenting for illustrative purposes.

> Aside: I realize you are busy - but if you get time and provide some
> sample tc command lines for testing we could help create the tests for
> you, at least the first time. The advantage of putting these tests in
> tools/testing/selftests/tc-testing/ is that there are test tools out
> there that run these tests and so regressions are easier to catch
> sooner.

Yeah, ok. The script posted in a reply on the cover letter is still what
I'm working with. The things it intends to capture are:
- attaching a custom Qdisc to one of taprio's classes doesn't fail
- attaching taprio to one of taprio's classes fails
- sending packets through one queue increases the counters (any counters)
  of just that queue

All the above, replicated once for the software scheduling case and once
for the offload case. Currently netdevsim doesn't attempt to emulate
taprio offload.

Is there a way to skip tests? I may look into tdc, but I honestly don't
have time for unrelated stuff such as figuring out why my kernel isn't
configured for the other tests to pass - and it seems that once one test
fails, the others are completely skipped, see below.

Also, by which rule are the test IDs created?

root@debian:~# cd selftests/tc-testing/
root@debian:~/selftests/tc-testing# ./tdc.sh
considering category qdisc
 -- ns/SubPlugin.__init__
Test 0582: Create QFQ with default setting
Test c9a3: Create QFQ with class weight setting
Test d364: Test QFQ with max class weight setting
Test 8452: Create QFQ with class maxpkt setting
Test 22df: Test QFQ class maxpkt setting lower bound
Test 92ee: Test QFQ class maxpkt setting upper bound
Test d920: Create QFQ with multiple class setting
Test 0548: Delete QFQ with handle
Test 5901: Show QFQ class
Test 0385: Create DRR with default setting
Test 2375: Delete DRR with handle
Test 3092: Show DRR class
Test 3460: Create CBQ with default setting
exit: 2
exit: 0
Error: Specified qdisc kind is unknown.


-----> teardown stage *** Could not execute: "$TC qdisc del dev $DUMMY handle 1: root"

-----> teardown stage *** Error message: "Error: Invalid handle.
"
returncode 2; expected [0]

-----> teardown stage *** Aborting test run.


<_io.BufferedReader name=3> *** stdout ***


<_io.BufferedReader name=5> *** stderr ***
"-----> teardown stage" did not complete successfully
Exception <class '__main__.PluginMgrTestFail'> ('teardown', 'Error: Specified qdisc kind is unknown.\n', '"-----> teardown stage" did not complete successfully') (caught in test_runner, running test 14 3460 Create CBQ with default setting stage teardown)
---------------
traceback
  File "/root/selftests/tc-testing/./tdc.py", line 495, in test_runner
    res = run_one_test(pm, args, index, tidx)
  File "/root/selftests/tc-testing/./tdc.py", line 434, in run_one_test
    prepare_env(args, pm, 'teardown', '-----> teardown stage', tidx['teardown'], procout)
  File "/root/selftests/tc-testing/./tdc.py", line 245, in prepare_env
    raise PluginMgrTestFail(
---------------
accumulated output for this test: 
Error: Specified qdisc kind is unknown.
---------------

All test results:

1..336
ok 1 0582 - Create QFQ with default setting
ok 2 c9a3 - Create QFQ with class weight setting
ok 3 d364 - Test QFQ with max class weight setting
ok 4 8452 - Create QFQ with class maxpkt setting
ok 5 22df - Test QFQ class maxpkt setting lower bound
ok 6 92ee - Test QFQ class maxpkt setting upper bound
ok 7 d920 - Create QFQ with multiple class setting
ok 8 0548 - Delete QFQ with handle
ok 9 5901 - Show QFQ class
ok 10 0385 - Create DRR with default setting
ok 11 2375 - Delete DRR with handle
ok 12 3092 - Show DRR class
ok 13 3460 - Create CBQ with default setting # skipped - "-----> teardown stage" did not complete successfully

ok 14 0592 - Create CBQ with mpu # skipped - skipped - previous teardown failed 14 3460

ok 15 4684 - Create CBQ with valid cell num # skipped - skipped - previous teardown failed 14 3460

ok 16 4345 - Create CBQ with invalid cell num # skipped - skipped - previous teardown failed 14 3460

ok 17 4525 - Create CBQ with valid ewma # skipped - skipped - previous teardown failed 14 3460

ok 18 6784 - Create CBQ with invalid ewma # skipped - skipped - previous teardown failed 14 3460

ok 19 5468 - Delete CBQ with handle # skipped - skipped - previous teardown failed 14 3460

ok 20 492a - Show CBQ class # skipped - skipped - previous teardown failed 14 3460

ok 21 9903 - Add mqprio Qdisc to multi-queue device (8 queues) # skipped - skipped - previous teardown failed 14 3460

ok 22 453a - Delete nonexistent mqprio Qdisc # skipped - skipped - previous teardown failed 14 3460

ok 23 5292 - Delete mqprio Qdisc twice # skipped - skipped - previous teardown failed 14 3460

ok 24 45a9 - Add mqprio Qdisc to single-queue device # skipped - skipped - previous teardown failed 14 3460

ok 25 2ba9 - Show mqprio class # skipped - skipped - previous teardown failed 14 3460

ok 26 4812 - Create HHF with default setting # skipped - skipped - previous teardown failed 14 3460

ok 27 8a92 - Create HHF with limit setting # skipped - skipped - previous teardown failed 14 3460

ok 28 3491 - Create HHF with quantum setting # skipped - skipped - previous teardown failed 14 3460
(...)
Victor Nogueira June 9, 2023, 2:56 p.m. UTC | #3
On 09/06/2023 09:10, Vladimir Oltean wrote:
> On Thu, Jun 08, 2023 at 02:44:46PM -0400, Jamal Hadi Salim wrote:
>> Other than the refcount issue i think the approach looks reasonable to
>> me. The stats before/after you are showing below though are
>> interesting; are you showing a transient phase where packets are
>> temporarily in the backlog. Typically the backlog is a transient phase
>> which lasts a very short period. Maybe it works differently for
>> taprio? I took a quick look at the code and do see to decrement the
>> backlog in the dequeue, so if it is not transient then some code path
>> is not being hit.
> 
> It's a fair concern. The thing is that I put very aggressive time slots
> in the schedule that I'm testing with, and my kernel has a lot of
> debugging stuff which bogs it down (kasan, kmemleak, lockdep, DMA API
> debug etc). Not to mention that the CPU isn't the fastest to begin with.
> 
> The way taprio works is that there's a hrtimer which fires at the
> expiration time of the current schedule entry and sets up the gates for
> the next one. Each schedule entry has a gate for each traffic class
> which determines what traffic classes are eligible for dequeue() and
> which ones aren't.
> 
> The dequeue() procedure, though also invoked by the advance_schedule()
> hrtimer -> __netif_schedule(), is also time-sensitive. By the time
> taprio_dequeue() runs, taprio_entry_allows_tx() function might return
> false when the system is so bogged down that it wasn't able to make
> enough progress to dequeue() an skb in time. When that happens, there is
> no mechanism, currently, to age out packets that stood too much in the
> TX queues (what does "too much" mean?).
> 
> Whereas enqueue() is technically not time-sensitive, i.e. you can
> enqueue whenever you want and the Qdisc will dequeue whenever it can.
> Though in practice, to make this scheduling technique useful, the user
> space enqueue should also be time-aware (though you can't capture this
> with ping).
> 
> If I increase all my sched-entry intervals by a factor of 100, the
> backlog issue goes away and the system can make forward progress.
> 
> So yeah, sorry, I didn't pay too much attention to the data I was
> presenting for illustrative purposes.
> 
>> Aside: I realize you are busy - but if you get time and provide some
>> sample tc command lines for testing we could help create the tests for
>> you, at least the first time. The advantage of putting these tests in
>> tools/testing/selftests/tc-testing/ is that there are test tools out
>> there that run these tests and so regressions are easier to catch
>> sooner.
> 
> Yeah, ok. The script posted in a reply on the cover letter is still what
> I'm working with. The things it intends to capture are:
> - attaching a custom Qdisc to one of taprio's classes doesn't fail
> - attaching taprio to one of taprio's classes fails
> - sending packets through one queue increases the counters (any counters)
>    of just that queue
> 
> All the above, replicated once for the software scheduling case and once
> for the offload case. Currently netdevsim doesn't attempt to emulate
> taprio offload.
> 
> Is there a way to skip tests? I may look into tdc, but I honestly don't
> have time for unrelated stuff such as figuring out why my kernel isn't
> configured for the other tests to pass - and it seems that once one test
> fails, the others are completely skipped, see below.

You can tell tdc to run a specific test file by providing the "-f" option.
For example, if you want to run only taprio tests, you can issue the
following command:

./tdc.py -f tc-tests/qdiscs/taprio.json

This is also described in tdc's README.

> Also, by which rule are the test IDs created?

When creating a test case in tdc, you must have an ID field.
What we do to generate the IDs is leave the "id" field as an
empty string on the test case description, for example:


{
     "id": "",
     "name": "My dummy test case",
     ...
}

and run the following:

./tdc.py -i

This will automatically fill up the "id" field in the JSON
with an appropriate ID.

> root@debian:~# cd selftests/tc-testing/
> root@debian:~/selftests/tc-testing# ./tdc.sh
> considering category qdisc
>   -- ns/SubPlugin.__init__
> Test 0582: Create QFQ with default setting
> Test c9a3: Create QFQ with class weight setting
> Test d364: Test QFQ with max class weight setting
> Test 8452: Create QFQ with class maxpkt setting
> Test 22df: Test QFQ class maxpkt setting lower bound
> Test 92ee: Test QFQ class maxpkt setting upper bound
> Test d920: Create QFQ with multiple class setting
> Test 0548: Delete QFQ with handle
> Test 5901: Show QFQ class
> Test 0385: Create DRR with default setting
> Test 2375: Delete DRR with handle
> Test 3092: Show DRR class
> Test 3460: Create CBQ with default setting
> exit: 2
> exit: 0
> Error: Specified qdisc kind is unknown.

As you stated, this likely means you are missing a config option.
However you can just ask tdc to run a specific test file to avoid this.

> (...)

cheers,
Victor
Jamal Hadi Salim June 9, 2023, 4:19 p.m. UTC | #4
On Fri, Jun 9, 2023 at 8:10 AM Vladimir Oltean <vladimir.oltean@nxp.com> wrote:
>
> On Thu, Jun 08, 2023 at 02:44:46PM -0400, Jamal Hadi Salim wrote:
> > Other than the refcount issue i think the approach looks reasonable to
> > me. The stats before/after you are showing below though are
> > interesting; are you showing a transient phase where packets are
> > temporarily in the backlog. Typically the backlog is a transient phase
> > which lasts a very short period. Maybe it works differently for
> > taprio? I took a quick look at the code and do see to decrement the
> > backlog in the dequeue, so if it is not transient then some code path
> > is not being hit.
>
> It's a fair concern. The thing is that I put very aggressive time slots
> in the schedule that I'm testing with, and my kernel has a lot of
> debugging stuff which bogs it down (kasan, kmemleak, lockdep, DMA API
> debug etc). Not to mention that the CPU isn't the fastest to begin with.
>
> The way taprio works is that there's a hrtimer which fires at the
> expiration time of the current schedule entry and sets up the gates for
> the next one. Each schedule entry has a gate for each traffic class
> which determines what traffic classes are eligible for dequeue() and
> which ones aren't.
>
> The dequeue() procedure, though also invoked by the advance_schedule()
> hrtimer -> __netif_schedule(), is also time-sensitive. By the time
> taprio_dequeue() runs, taprio_entry_allows_tx() function might return
> false when the system is so bogged down that it wasn't able to make
> enough progress to dequeue() an skb in time. When that happens, there is
> no mechanism, currently, to age out packets that stood too much in the
> TX queues (what does "too much" mean?).
>
> Whereas enqueue() is technically not time-sensitive, i.e. you can
> enqueue whenever you want and the Qdisc will dequeue whenever it can.
> Though in practice, to make this scheduling technique useful, the user
> space enqueue should also be time-aware (though you can't capture this
> with ping).
>
> If I increase all my sched-entry intervals by a factor of 100, the
> backlog issue goes away and the system can make forward progress.
>
> So yeah, sorry, I didn't pay too much attention to the data I was
> presenting for illustrative purposes.
>

So it seems to me it is a transient phase and that at some point the
backlog will clear up and the sent stats will go up. Maybe just say so
in your commit or show the final result after the packet is gone.

I have to admit, I dont know much about taprio - that's why i am
asking all these leading questions. You spoke of gates etc and thats
klingon to me; but iiuc there's some time sensitive stuff that needs
to be sent out within a deadline. Q: What should happen to skbs that
are no longer valid?
On the aging thing which you say is missing, shouldnt the hrtimer or
schedule kick not be able to dequeue timestamped packets and just drop
them?

cheers,
jamal



cheers,
jamal

> > Aside: I realize you are busy - but if you get time and provide some
> > sample tc command lines for testing we could help create the tests for
> > you, at least the first time. The advantage of putting these tests in
> > tools/testing/selftests/tc-testing/ is that there are test tools out
> > there that run these tests and so regressions are easier to catch
> > sooner.
>
> Yeah, ok. The script posted in a reply on the cover letter is still what
> I'm working with. The things it intends to capture are:
> - attaching a custom Qdisc to one of taprio's classes doesn't fail
> - attaching taprio to one of taprio's classes fails
> - sending packets through one queue increases the counters (any counters)
>   of just that queue
>
> All the above, replicated once for the software scheduling case and once
> for the offload case. Currently netdevsim doesn't attempt to emulate
> taprio offload.
>
> Is there a way to skip tests? I may look into tdc, but I honestly don't
> have time for unrelated stuff such as figuring out why my kernel isn't
> configured for the other tests to pass - and it seems that once one test
> fails, the others are completely skipped, see below.
>
> Also, by which rule are the test IDs created?
>
> root@debian:~# cd selftests/tc-testing/
> root@debian:~/selftests/tc-testing# ./tdc.sh
> considering category qdisc
>  -- ns/SubPlugin.__init__
> Test 0582: Create QFQ with default setting
> Test c9a3: Create QFQ with class weight setting
> Test d364: Test QFQ with max class weight setting
> Test 8452: Create QFQ with class maxpkt setting
> Test 22df: Test QFQ class maxpkt setting lower bound
> Test 92ee: Test QFQ class maxpkt setting upper bound
> Test d920: Create QFQ with multiple class setting
> Test 0548: Delete QFQ with handle
> Test 5901: Show QFQ class
> Test 0385: Create DRR with default setting
> Test 2375: Delete DRR with handle
> Test 3092: Show DRR class
> Test 3460: Create CBQ with default setting
> exit: 2
> exit: 0
> Error: Specified qdisc kind is unknown.
>
>
> -----> teardown stage *** Could not execute: "$TC qdisc del dev $DUMMY handle 1: root"
>
> -----> teardown stage *** Error message: "Error: Invalid handle.
> "
> returncode 2; expected [0]
>
> -----> teardown stage *** Aborting test run.
>
>
> <_io.BufferedReader name=3> *** stdout ***
>
>
> <_io.BufferedReader name=5> *** stderr ***
> "-----> teardown stage" did not complete successfully
> Exception <class '__main__.PluginMgrTestFail'> ('teardown', 'Error: Specified qdisc kind is unknown.\n', '"-----> teardown stage" did not complete successfully') (caught in test_runner, running test 14 3460 Create CBQ with default setting stage teardown)
> ---------------
> traceback
>   File "/root/selftests/tc-testing/./tdc.py", line 495, in test_runner
>     res = run_one_test(pm, args, index, tidx)
>   File "/root/selftests/tc-testing/./tdc.py", line 434, in run_one_test
>     prepare_env(args, pm, 'teardown', '-----> teardown stage', tidx['teardown'], procout)
>   File "/root/selftests/tc-testing/./tdc.py", line 245, in prepare_env
>     raise PluginMgrTestFail(
> ---------------
> accumulated output for this test:
> Error: Specified qdisc kind is unknown.
> ---------------
>
> All test results:
>
> 1..336
> ok 1 0582 - Create QFQ with default setting
> ok 2 c9a3 - Create QFQ with class weight setting
> ok 3 d364 - Test QFQ with max class weight setting
> ok 4 8452 - Create QFQ with class maxpkt setting
> ok 5 22df - Test QFQ class maxpkt setting lower bound
> ok 6 92ee - Test QFQ class maxpkt setting upper bound
> ok 7 d920 - Create QFQ with multiple class setting
> ok 8 0548 - Delete QFQ with handle
> ok 9 5901 - Show QFQ class
> ok 10 0385 - Create DRR with default setting
> ok 11 2375 - Delete DRR with handle
> ok 12 3092 - Show DRR class
> ok 13 3460 - Create CBQ with default setting # skipped - "-----> teardown stage" did not complete successfully
>
> ok 14 0592 - Create CBQ with mpu # skipped - skipped - previous teardown failed 14 3460
>
> ok 15 4684 - Create CBQ with valid cell num # skipped - skipped - previous teardown failed 14 3460
>
> ok 16 4345 - Create CBQ with invalid cell num # skipped - skipped - previous teardown failed 14 3460
>
> ok 17 4525 - Create CBQ with valid ewma # skipped - skipped - previous teardown failed 14 3460
>
> ok 18 6784 - Create CBQ with invalid ewma # skipped - skipped - previous teardown failed 14 3460
>
> ok 19 5468 - Delete CBQ with handle # skipped - skipped - previous teardown failed 14 3460
>
> ok 20 492a - Show CBQ class # skipped - skipped - previous teardown failed 14 3460
>
> ok 21 9903 - Add mqprio Qdisc to multi-queue device (8 queues) # skipped - skipped - previous teardown failed 14 3460
>
> ok 22 453a - Delete nonexistent mqprio Qdisc # skipped - skipped - previous teardown failed 14 3460
>
> ok 23 5292 - Delete mqprio Qdisc twice # skipped - skipped - previous teardown failed 14 3460
>
> ok 24 45a9 - Add mqprio Qdisc to single-queue device # skipped - skipped - previous teardown failed 14 3460
>
> ok 25 2ba9 - Show mqprio class # skipped - skipped - previous teardown failed 14 3460
>
> ok 26 4812 - Create HHF with default setting # skipped - skipped - previous teardown failed 14 3460
>
> ok 27 8a92 - Create HHF with limit setting # skipped - skipped - previous teardown failed 14 3460
>
> ok 28 3491 - Create HHF with quantum setting # skipped - skipped - previous teardown failed 14 3460
> (...)
Vladimir Oltean June 9, 2023, 5:56 p.m. UTC | #5
On Fri, Jun 09, 2023 at 12:19:12PM -0400, Jamal Hadi Salim wrote:
> So it seems to me it is a transient phase and that at some point the
> backlog will clear up and the sent stats will go up. Maybe just say so
> in your commit or show the final result after the packet is gone.

I will re-collect some stats where there is nothing backlogged.

> I have to admit, I dont know much about taprio - that's why i am
> asking all these leading questions. You spoke of gates etc and thats
> klingon to me; but iiuc there's some time sensitive stuff that needs
> to be sent out within a deadline.

If sch_taprio.c is klingon to you, you can just imagine how sch_api.c
reads to me :)

> Q: What should happen to skbs that are no longer valid?
> On the aging thing which you say is missing, shouldnt the hrtimer or
> schedule kick not be able to dequeue timestamped packets and just drop
> them?

I think the skbs being "valid" is an application-defined metric (except
in the txtime assist mode, where skbs do truly have a transmit deadline).
The user space could reasonably enqueue 100 packets at a time, fully
aware of the fact that the cycle length is 1 second and that there's
only room in one cycle to send one packet, thus it would take 100
seconds for all those packets to be dequeued and sent.

It could also be that this isn't the case.

I guess what could be auto-detected and warned about is when a cycle
passes (sort of like an RCU grace period), the backlog was non-zero,
the gates were open, but still, no skb was dequeued. After one cycle,
the schedule repeats itself identically. But then do what? why drop?
it's a system issue..
Vladimir Oltean June 10, 2023, 5:49 p.m. UTC | #6
On Fri, Jun 09, 2023 at 11:56:35AM -0300, Victor Nogueira wrote:
> You can tell tdc to run a specific test file by providing the "-f" option.
> For example, if you want to run only taprio tests, you can issue the
> following command:
> 
> ./tdc.py -f tc-tests/qdiscs/taprio.json

Thanks. I've been able to make some progress with this.

Unfortunately the updated series conflicts with this in-flight patch
set, so I wouldn't want to post it just yet.
https://patchwork.kernel.org/project/netdevbpf/cover/20230609135917.1084327-1-vladimir.oltean@nxp.com/

However, I did add taprio offload to the netdevsim driver so that this
code path could be covered with tdc. I'd like Jamal and Paolo, who asked
for it, to comment on whether this is what they were looking to see.
https://github.com/vladimiroltean/linux/commits/sch-taprio-relationship-children-v2
diff mbox series

Patch

diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index cc7ff98e5e86..23b98c3af8b2 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -2452,11 +2452,11 @@  static unsigned long taprio_find(struct Qdisc *sch, u32 classid)
 static int taprio_dump_class(struct Qdisc *sch, unsigned long cl,
 			     struct sk_buff *skb, struct tcmsg *tcm)
 {
-	struct netdev_queue *dev_queue = taprio_queue_get(sch, cl);
+	struct Qdisc *child = taprio_leaf(sch, cl);
 
 	tcm->tcm_parent = TC_H_ROOT;
 	tcm->tcm_handle |= TC_H_MIN(cl);
-	tcm->tcm_info = dev_queue->qdisc_sleeping->handle;
+	tcm->tcm_info = child->handle;
 
 	return 0;
 }
@@ -2466,8 +2466,7 @@  static int taprio_dump_class_stats(struct Qdisc *sch, unsigned long cl,
 	__releases(d->lock)
 	__acquires(d->lock)
 {
-	struct netdev_queue *dev_queue = taprio_queue_get(sch, cl);
-	struct Qdisc *child = dev_queue->qdisc_sleeping;
+	struct Qdisc *child = taprio_leaf(sch, cl);
 	struct tc_taprio_qopt_offload offload = {
 		.cmd = TAPRIO_CMD_TC_STATS,
 		.tc_stats = {