Message ID | 20210930075948.36981-1-xiaoliang.yang_1@nxp.com (mailing list archive) |
---|---|
Headers | show |
Series | net: dsa: felix: psfp support on vsc9959 | expand |
On Thu, 30 Sep 2021 15:59:40 +0800 Xiaoliang Yang wrote: > VSC9959 hardware supports Per-Stream Filtering and Policing(PSFP). > This patch series add PSFP support on tc flower offload of ocelot > driver. Use chain 30000 to distinguish PSFP from VCAP blocks. Add gate > and police set to support PSFP in VSC9959 driver. Vladimir, any comments?
On Fri, Oct 01, 2021 at 03:11:15PM -0700, Jakub Kicinski wrote: > On Thu, 30 Sep 2021 15:59:40 +0800 Xiaoliang Yang wrote: > > VSC9959 hardware supports Per-Stream Filtering and Policing(PSFP). > > This patch series add PSFP support on tc flower offload of ocelot > > driver. Use chain 30000 to distinguish PSFP from VCAP blocks. Add gate > > and police set to support PSFP in VSC9959 driver. > > Vladimir, any comments? Sorry, I was intending to try out the patches and get an overall feel from there, but I had an incredibly busy week and simply didn't have time. If it's okay to wait a bit more I will do that tomorrow. In general I feel that the most glaring issue Xiaoliang has still avoided to address is the one discussed here: https://patchwork.kernel.org/project/netdevbpf/patch/20210831034536.17497-6-xiaoliang.yang_1@nxp.com/#24416737 where basically some tc filters depend on some bridge fdb entries, and there's no way to prevent the bridge from deleting the fdb entries which would in turn break the tc filters, but also no way of removing the tc filters when the bridge fdb entries disappear. The hardware design is poor, no two ways around that, but arguably it's a tricky issue to handle in software too, the bridge simply doesn't give switchdev drivers a chance to veto an fdb removal, and I've no idea what changing that would even mean. So I can understand why Xiaoliang is avoiding it. That's why I wanted to run the patches too, first I feel that we should provide a selftest for the feature, and that is absent from this patch series, and second I would like to see how broken can the driver state end up being if we just leave tc filters around which are just inactive in the absence of a bridge, or a bridge fdb entry. I simply don't know that right now. It's almost as if we would be better off stealing some hardware FDB entries from the bridge and reserving them for the tc filter, and not depending on the bridge driver at all.
On Fri, 1 Oct 2021 22:46:34 +0000 Vladimir Oltean wrote: > On Fri, Oct 01, 2021 at 03:11:15PM -0700, Jakub Kicinski wrote: > > On Thu, 30 Sep 2021 15:59:40 +0800 Xiaoliang Yang wrote: > > > VSC9959 hardware supports Per-Stream Filtering and Policing(PSFP). > > > This patch series add PSFP support on tc flower offload of ocelot > > > driver. Use chain 30000 to distinguish PSFP from VCAP blocks. Add gate > > > and police set to support PSFP in VSC9959 driver. > > > > Vladimir, any comments? > > Sorry, I was intending to try out the patches and get an overall feel > from there, but I had an incredibly busy week and simply didn't have time. > If it's okay to wait a bit more I will do that tomorrow. Take your time, I'll mark it as Deferred for now. > In general I feel that the most glaring issue Xiaoliang has still > avoided to address is the one discussed here: > https://patchwork.kernel.org/project/netdevbpf/patch/20210831034536.17497-6-xiaoliang.yang_1@nxp.com/#24416737 > where basically some tc filters depend on some bridge fdb entries, and > there's no way to prevent the bridge from deleting the fdb entries which > would in turn break the tc filters, but also no way of removing the tc > filters when the bridge fdb entries disappear. > The hardware design is poor, no two ways around that, but arguably it's > a tricky issue to handle in software too, the bridge simply doesn't give > switchdev drivers a chance to veto an fdb removal, and I've no idea what > changing that would even mean. So I can understand why Xiaoliang is > avoiding it. > That's why I wanted to run the patches too, first I feel that we should > provide a selftest for the feature, and that is absent from this patch > series, and second I would like to see how broken can the driver state > end up being if we just leave tc filters around which are just inactive > in the absence of a bridge, or a bridge fdb entry. I simply don't know > that right now. > It's almost as if we would be better off stealing some hardware FDB > entries from the bridge and reserving them for the tc filter, and not > depending on the bridge driver at all. Maybe I shouldn't comment based on the snippets of understanding but "steal some FDB entries" would be my first reaction. Xiaoliang said: The PSFP gate and police action are set on ingress port, and "tc-filter" has no parameter to set the forward port for the filtered stream. which seems to undersell TC.
On Fri, Oct 01, 2021 at 03:59:36PM -0700, Jakub Kicinski wrote: > On Fri, 1 Oct 2021 22:46:34 +0000 Vladimir Oltean wrote: > > On Fri, Oct 01, 2021 at 03:11:15PM -0700, Jakub Kicinski wrote: > > > On Thu, 30 Sep 2021 15:59:40 +0800 Xiaoliang Yang wrote: > > > > VSC9959 hardware supports Per-Stream Filtering and Policing(PSFP). > > > > This patch series add PSFP support on tc flower offload of ocelot > > > > driver. Use chain 30000 to distinguish PSFP from VCAP blocks. Add gate > > > > and police set to support PSFP in VSC9959 driver. > > > > > > Vladimir, any comments? > > > > Sorry, I was intending to try out the patches and get an overall feel > > from there, but I had an incredibly busy week and simply didn't have time. > > If it's okay to wait a bit more I will do that tomorrow. > > Take your time, I'll mark it as Deferred for now. Thank you. > Maybe I shouldn't comment based on the snippets of understanding but > "steal some FDB entries" would be my first reaction. No, it's absolutely reasonable. I feel like it's also going to be your second reaction, and third, and... The thing is, if we depend on the bridge driver's state only for this snowflake piece of hardware, user experience is going to absolutely suck. Having a selftest inside the kernel would be the litmus test we need for deciding whether the way we expose the feature is sane or not. The kernel should abstract the hardware and its quirks and provide a standardized and abstract interface, that's literally its job. If you tell me your hardware needs special massaging in this or that way, I'm better off just using a random SDK. > Xiaoliang said: > > The PSFP gate and police action are set on ingress port, and > "tc-filter" has no parameter to set the forward port for the > filtered stream. > > which seems to undersell TC. I know, right? That's the loop we can't get out of, TSN is still pretty much in its infancy, and with pre-standard hardware it's really difficult to create software models that stand the test of time. I've seen other hardware implementations for PSFP and they do use TCAM and are strictly decoupled from the bridging service. Microchip say that their newer hardware generations are also thought out this way. So while yes, Ocelot is driving me crazy for special-casing the NULL stream identification function and implementing it using bridge FDB entries (because those types of flows only match on MAC DA and VLAN ID, somebody thought "oh, I know something that can already do that!"). That is definitely not what TSN streams in the general sense are about, and I do feel that classifier-action pairs in TC are really the right software model overall, from an application/user point of view.
On Sat, Oct 02, 2021 at 02:17:06AM +0300, Vladimir Oltean wrote: > On Fri, Oct 01, 2021 at 03:59:36PM -0700, Jakub Kicinski wrote: > > On Fri, 1 Oct 2021 22:46:34 +0000 Vladimir Oltean wrote: > > > On Fri, Oct 01, 2021 at 03:11:15PM -0700, Jakub Kicinski wrote: > > > > On Thu, 30 Sep 2021 15:59:40 +0800 Xiaoliang Yang wrote: > > > > > VSC9959 hardware supports Per-Stream Filtering and Policing(PSFP). > > > > > This patch series add PSFP support on tc flower offload of ocelot > > > > > driver. Use chain 30000 to distinguish PSFP from VCAP blocks. Add gate > > > > > and police set to support PSFP in VSC9959 driver. > > > > > > > > Vladimir, any comments? > > > > > > Sorry, I was intending to try out the patches and get an overall feel > > > from there, but I had an incredibly busy week and simply didn't have time. > > > If it's okay to wait a bit more I will do that tomorrow. > > > > Take your time, I'll mark it as Deferred for now. > > Thank you. I'm very sorry for being late. I wrote this selftest for the ingress time gating portion of Xiaoliang's work: cat tools/testing/selftests/drivers/net/ocelot/psfp.sh -----------------------------[ cut here ]----------------------------- #!/bin/bash # SPDX-License-Identifier: GPL-2.0 # Copyright 2021 NXP WAIT_TIME=1 NUM_NETIFS=4 lib_dir=$(dirname $0)/../../../net/forwarding source $lib_dir/tc_common.sh source $lib_dir/lib.sh utc_tai_offset=37 num_pkts=1000 fudge_factor=5000000 # https://github.com/vladimiroltean/tsn-scripts require_command isochron require_command phc2sys # This setup requires patching the LS1028A device tree to move the DSA master # from eno2 to eno3, and to use eno2 as a data port. # # swp0 swp4 eno2 eno0 # +---------------------------------------------+ # | DUT ports Generator ports | # | +--------+ +--------+ +--------+ +--------+ | # | | | | | | | | | | # | |dut_recv| |dut_send| |gen_recv| |gen_send| | # | | | | | | | | | | # +-+--------+-+--------+-+--------+-+--------+-+ # ^ v ^ v # | | | | # | +-----------+ | # | | # +--------------------------------+ dut_recv=${NETIFS[p1]} dut_send=${NETIFS[p2]} gen_recv=${NETIFS[p3]} gen_send=${NETIFS[p4]} dut_recv_mac="de:ad:be:ef:00:00" dut_send_mac="de:ad:be:ef:00:01" gen_recv_mac="de:ad:be:ef:00:02" gen_send_mac="de:ad:be:ef:00:03" stream_vid="100" PSFP() { echo 30000 } psfp_chain_create() { local eth=$1 tc qdisc add dev $eth clsact tc filter add dev $eth ingress chain 0 pref 49152 flower \ skip_sw action goto chain $(PSFP) } psfp_chain_destroy() { local eth=$1 tc qdisc del dev $eth clsact } psfp_filter_check() { local expected_matches=$1 local packets="" local drops="" local stats="" stats=$(tc -j -s filter show dev ${dut_recv} ingress chain $(PSFP) pref 1) packets=$(echo ${stats} | jq ".[1].options.actions[].stats.packets") drops=$(echo ${stats} | jq ".[1].options.actions[].stats.drops") if ! [ "${packets}" = "${expected_matches}" ]; then echo "Expected filter to match on ${expected_matches} packets but matched on ${packets} instead" fi echo "Hardware filter reports ${drops} drops" } scrape_logs_for_phc2sys_offset() { local awk_program='/sys offset/ { print $5; exit; }' tac ${phc2sys_log} | awk "${awk_program}" } scrape_logs_for_ptp4l_offset() { local log=$1 local awk_program='/master offset/ { print $4; exit; }' tac ${log} | awk "${awk_program}" } check_sync_phc2sys() { local threshold_ns=50 local system_clock_offset= while :; do sleep 1 system_clock_offset=$(scrape_logs_for_phc2sys_offset) # Got something, is it a number? case "${system_clock_offset}" in ''|[!\-][!0-9]*) if ! pidof phc2sys > /dev/null; then echo "Please start the phc2sys service." return 1 else echo "Trying again..." continue fi ;; esac if [ "${system_clock_offset}" -lt 0 ]; then system_clock_offset=$((-${system_clock_offset})) fi echo "System clock offset ${system_clock_offset} ns" if [ "${system_clock_offset}" -gt "${threshold_ns}" ]; then echo "System clock is not yet synchronized..." continue fi # Success break done } check_sync_ptp4l() { local eth=$1 local log="ptp4l_log_${eth}" local indirect="${!log}" local threshold_ns=100 local phc_offset= while :; do sleep 1 phc_offset=$(scrape_logs_for_ptp4l_offset "${indirect}") # Got something, is it a number? case "${phc_offset}" in ''|[!\-][!0-9]*) if ! pidof ptp4l > /dev/null; then echo "Please start the ptp4l service." return 1 else echo "Trying again..." continue fi ;; esac echo "Master offset ${phc_offset} ns" if [ "${phc_offset}" -lt 0 ]; then phc_offset=$((-${phc_offset})) fi if [ "${phc_offset}" -gt "${threshold_ns}" ]; then echo "PTP clock is not yet synchronized..." continue fi # Success break done } check_sync() { check_sync_phc2sys # gen_send is master, no need to check sync check_sync_ptp4l ${dut_recv} } phc2sys_start() { local eth=$1 phc2sys_log="$(mktemp)" phc2sys -m \ -c ${eth} \ -s CLOCK_REALTIME \ -O ${utc_tai_offset} \ > "${phc2sys_log}" 2>&1 & phc2sys_pid=$! sleep 1 } phc2sys_stop() { { kill ${phc2sys_pid} && wait ${phc2sys_pid}; } 2> /dev/null rm "${phc2sys_log}" 2> /dev/null } ptp4l_start() { local eth=$1 local slave_only=$2 local log="ptp4l_log_${eth}" local pid="ptp4l_pid_${eth}" local extra_args="" if [ "${slave_only}" = true ]; then extra_args=" -s" fi # declare dynamic variables as global, will reference them later in # ptp4l_stop and scrape_logs_for_ptp4l_offset declare -g "${log}=$(mktemp)" ptp4l -m -2 -P \ -i ${eth} ${extra_args} \ --step_threshold 0.00002 --first_step_threshold 0.00002 \ > "${!log}" 2>&1 & declare -g "${pid}=$!" echo "ptp4l for interface ${eth} logs to ${!log} and has pid ${!pid}" sleep 1 } ptp4l_stop() { local eth=$1 local log="ptp4l_log_${eth}" local pid="ptp4l_pid_${eth}" { kill ${!pid} && wait ${!pid}; } 2> /dev/null rm "${!log}" 2> /dev/null } txtime_setup() { local eth=$1 tc qdisc add dev ${eth} clsact tc filter add dev ${eth} egress protocol 0x88f7 \ flower action skbedit priority 7 tc filter add dev ${eth} egress protocol 802.1Q \ flower vlan_ethtype 0xdeaf action skbedit priority 6 tc qdisc add dev ${eth} handle 100: parent root mqprio num_tc 8 \ queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \ map 0 1 2 3 4 5 6 7 \ hw 1 tc qdisc replace dev ${eth} parent 100:6 etf \ clockid CLOCK_TAI offload delta ${fudge_factor} tc qdisc replace dev ${eth} parent 100:7 etf \ clockid CLOCK_TAI offload delta ${fudge_factor} } txtime_cleanup() { local eth=$1 tc qdisc del dev ${eth} root tc qdisc del dev ${eth} clsact } setup_prepare() { ip link set ${dut_recv} up ip link set ${dut_send} up ip link set ${gen_recv} up ip link set ${gen_send} up ip link add br0 type bridge vlan_filtering 1 ip link set ${dut_recv} master br0 ip link set ${dut_send} master br0 ip link set br0 up bridge vlan add dev ${dut_send} vid ${stream_vid} bridge vlan add dev ${dut_recv} vid ${stream_vid} bridge fdb add dev ${dut_send} \ ${gen_recv_mac} vlan ${stream_vid} static master psfp_chain_create ${dut_recv} tc filter add dev ${dut_recv} ingress chain $(PSFP) pref 1 \ protocol 802.1Q flower skip_sw \ dst_mac ${gen_recv_mac} vlan_id ${stream_vid} \ action gate base-time 0.000000000 \ sched-entry OPEN 5000000 -1 -1 \ sched-entry CLOSE 5000000 -1 -1 # Deleting the MAC table entry (uncommenting the lines below) will # render the filter useless, because the SFID processing will be # bypassed, and the kernel does not protect against this structure # getting altered. # Interestingly though, the spurious matches described below are not # eliminated even with the MAC table being deleted. #bridge fdb del dev ${dut_send} \ # ${gen_recv_mac} vlan ${stream_vid} static master ip link set ${gen_recv} promisc on txtime_setup ${gen_send} # Assumption true for LS1028A: gen_send and gen_recv use the same PHC phc2sys_start ${gen_send} ptp4l_start ${gen_send} false ptp4l_start ${dut_recv} true # ?! Hardware seems to spuriously match on the PTP packets send through # the switch port here. The filter reports anywhere between 2 to 4 # packets matching on it, some even dropped, even though there isn't # any packet with ${gen_recv_mac} being sent. psfp_filter_check 0 check_sync } cleanup() { ptp4l_stop ${dut_recv} ptp4l_stop ${gen_send} phc2sys_stop isochron_recv_stop txtime_cleanup ${gen_send} ip link set ${gen_recv} promisc off psfp_chain_destroy ${dut_recv} ip link del br0 } isochron_recv_start() { local eth=$1 taskset $((1 << 1)) isochron rcv \ --interface ${eth} \ --num-frames ${num_pkts} \ --sched-priority 98 \ --sched-rr \ --etype 0xdead \ --utc-tai-offset ${utc_tai_offset} \ --quiet & isochron_pid=$! sleep 1 } isochron_recv_stop() { { kill ${isochron_pid} && wait ${isochron_pid}; } 2> /dev/null } isochron_drops_check() { local expected_lost=$1 local drops="" drops=$(cat ${isochron_log} | grep -E 'seqid .* lost' | wc -l) if ! [ "${drops}" = "${expected_lost}" ]; then echo "Expected isochron to drop ${expected_lost} packets but dropped ${drops}" return 1 fi return 0 } test_isochron_common() { local base_time=$1 local expected_lost=$2 local expected_matches=$3 local isochron_log="$(mktemp)" isochron_recv_start ${gen_recv} taskset $((1 << 0)) isochron send \ --interface ${gen_send} \ --dmac ${gen_recv_mac} \ --priority 6 \ --base-time ${base_time} \ --cycle-time 0.010000000 \ --num-frames ${num_pkts} \ --frame-size 64 \ --vid ${stream_vid} \ --etype 0xdead \ --txtime \ --utc-tai-offset ${utc_tai_offset} \ --sched-rr \ --sched-priority 98 \ --client 127.0.0.1 \ --quiet \ > "${isochron_log}" 2>&1 isochron_recv_stop isochron_drops_check ${expected_lost} && echo "OK" || echo "FAIL" psfp_filter_check ${expected_matches} rm "${isochron_log}" 2> /dev/null } test_gate_in_band() { # Send packets in-band with the OPEN gate entry test_isochron_common 0.002500000 0 ${num_pkts} } test_gate_out_of_band() { # Send packets in-band with the CLOSE gate entry test_isochron_common 0.007500000 ${num_pkts} $((2 * ${num_pkts})) } trap cleanup EXIT ALL_TESTS=" test_gate_in_band test_gate_out_of_band " setup_prepare setup_wait tests_run exit $EXIT_STATUS -----------------------------[ cut here ]----------------------------- and both tests pass with OK, but here are some parts of my log: Expected filter to match on 0 packets but matched on 2 instead ~~~~~~~~~~~~~~~~~~~~ I put "psfp_filter_check 0" at the end of "setup_prepare", during a time where it is guaranteed that no test packet belonging to the TSN stream has been sent, yet the hardware seems to spuriously increment this counter. This makes it very difficult to actually assess what's going on by looking at counters. If you look at the comments, the SFID counters increment spuriously even if I delete the MAC table entry. Hardware filter reports 0 drops OK [ 275.429138] mscc_felix 0000:00:00.5: vsc9959_psfp_stats_get: pkts 1000 drops 0 sfid 0 match 1000 not_pass_gate 0 not_pass_sdu 0 red 0 Expected filter to match on 1000 packets but matched on 1002 instead Hardware filter reports 0 drops Accepted connection from 127.0.0.1 Accepted connection from 127.0.0.1 OK [ 288.091715] mscc_felix 0000:00:00.5: vsc9959_psfp_stats_get: pkts 1000 drops 1000 sfid 0 match 1000 not_pass_gate 1000 not_pass_sdu 0 red 0 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ the driver sums these up and puts them in stats->drops Expected filter to match on 2000 packets but matched on 2002 instead Hardware filter reports 0 drops ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ however "tc -s filter show ..." shows 0 drops, so the information is lost somewhere along the way (the "packets" counter is correct though). It's very hard to have an opinion considering the fact that the hardware doesn't behave according to my understanding. One of us must be wrong :)
Hi Vladimir, On Oct 06, 2021 at 20:12:27 +0300, Vladimir Oltean wrote: > I'm very sorry for being late. > I wrote this selftest for the ingress time gating portion of Xiaoliang's work: > > cat tools/testing/selftests/drivers/net/ocelot/psfp.sh > -----------------------------[ cut here ]----------------------------- #!/bin/bash # > SPDX-License-Identifier: GPL-2.0 # Copyright 2021 NXP > > WAIT_TIME=1 > NUM_NETIFS=4 > lib_dir=$(dirname $0)/../../../net/forwarding source $lib_dir/tc_common.sh > source $lib_dir/lib.sh ... Skip ... > -----------------------------[ cut here ]----------------------------- > > and both tests pass with OK, but here are some parts of my log: > > > Expected filter to match on 0 packets but matched on 2 instead > ~~~~~~~~~~~~~~~~~~~~ > I put "psfp_filter_check 0" at > the end of "setup_prepare", > during a time where it is > guaranteed that no test packet belonging > to the TSN stream has been > sent, yet the hardware seems to > spuriously increment this > counter. This makes it very difficult > to actually assess what's > going on by looking at counters. > If you look at the comments, > the SFID counters increment > spuriously even if I delete the > MAC table entry. > > Hardware filter reports 0 drops > OK > [ 275.429138] mscc_felix 0000:00:00.5: vsc9959_psfp_stats_get: pkts 1000 > drops 0 sfid 0 match 1000 not_pass_gate 0 not_pass_sdu 0 red 0 Expected > filter to match on 1000 packets but matched on 1002 instead Hardware filter > reports 0 drops Accepted connection from 127.0.0.1 Accepted connection > from 127.0.0.1 OK [ 288.091715] mscc_felix 0000:00:00.5: > vsc9959_psfp_stats_get: pkts 1000 drops 1000 sfid 0 match 1000 > not_pass_gate 1000 not_pass_sdu 0 red 0 The hardware count also increased in my test. This happens occasionally when first plug in the internet cable. > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > the driver sums these up and puts them > > in stats->drops Expected filter to match on 2000 packets but matched on 2002 > instead Hardware filter reports 0 drops ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > however "tc -s filter show ..." shows 0 drops, so the information is lost > somewhere along the way (the "packets" counter is correct though). This is because stats.drops has not transmit to flow_stats_update() in ocelot_flower.c: flow_stats_update(&f->stats, 0x0, stats.pkts, 0, 0x0, FLOW_ACTION_HW_STATS_IMMEDIATE); I can update this though the stats.drops is not be used for other VCAPs rules. > > > It's very hard to have an opinion considering the fact that the hardware > doesn't behave according to my understanding. One of us must be wrong :) Thanks, Xiaoliang