Message ID | 20220330113116.3166219-1-mattias.forsblad@gmail.com (mailing list archive) |
---|---|
Headers | show |
Series | net: tc: dsa: Implement offload of matchall for bridged DSA ports | expand |
On Wed, Mar 30, 2022 at 01:31:14PM +0200, Mattias Forsblad wrote: > Greetings, > > This series implements offloading of tc matchall filter to HW > for bridged DSA ports. > > Background > When using a non-VLAN filtering bridge we want to be able to drop > traffic directed to the CPU port so that the CPU doesn't get unnecessary loaded. > This is specially important when we have disabled learning on user ports. > > A sample configuration could be something like this: > > br0 > / \ > swp0 swp1 > > ip link add dev br0 type bridge stp_state 0 vlan_filtering 0 > ip link set swp0 master br0 > ip link set swp1 master br0 > ip link set swp0 type bridge_slave learning off > ip link set swp1 type bridge_slave learning off > ip link set swp0 up > ip link set swp1 up > ip link set br0 up > > After discussions here: https://lore.kernel.org/netdev/YjMo9xyoycXgSWXS@shredder/ > it was advised to use tc to set an ingress filter that could then > be offloaded to HW, like so: > > tc qdisc add dev br0 clsact > tc filter add dev br0 ingress pref 1 proto all matchall action drop > > Limitations > If there is tc rules on a bridge and all the ports leave the bridge > and then joins the bridge again, the indirect framwork doesn't seem > to reoffload them at join. The tc rules need to be torn down and > re-added. > > The first part of this serie uses the flow indirect framework to > setup monitoring of tc qdisc and filters added to a bridge. > The second part offloads the matchall filter to HW for Marvell > switches. > > Mattias Forsblad (2): > net: tc: dsa: Add the matchall filter with drop action for bridged DSA ports. > net: dsa: Implement tc offloading for drop target. > > drivers/net/dsa/mv88e6xxx/chip.c | 23 +++- > include/net/dsa.h | 13 ++ > net/dsa/dsa2.c | 5 + > net/dsa/dsa_priv.h | 3 + > net/dsa/port.c | 1 + > net/dsa/slave.c | 217 ++++++++++++++++++++++++++++++- > 6 files changed, 258 insertions(+), 4 deletions(-) > > -- > 2.25.1 > Have you considered point b of my argument here? https://patchwork.kernel.org/project/netdevbpf/patch/20220317065031.3830481-5-mattias.forsblad@gmail.com/#24782383 To make that argument even clearer, the following script produces: #!/bin/bash ip netns add ns0 ip netns add ns1 ip link add veth0 type veth peer name veth1 ip link add veth2 type veth peer name veth3 ip link add br0 type bridge && ip link set br0 up ip link set veth0 netns ns0 ip link set veth3 netns ns1 ip -n ns0 addr add 192.168.100.1/24 dev veth0 && ip -n ns0 link set veth0 up ip -n ns1 addr add 192.168.100.2/24 dev veth3 && ip -n ns1 link set veth3 up ip addr add 192.168.100.3/24 dev br0 ip link set veth1 master br0 && ip link set veth1 up ip link set veth2 master br0 && ip link set veth2 up tc qdisc add dev br0 clsact tc filter add dev br0 ingress matchall action drop echo "Pinging another bridge port" && ip netns exec ns0 ping -c 3 192.168.100.2 echo "Pinging the bridge" && ip netns exec ns0 ping -c 3 192.168.100.3 ip netns del ns0 ip netns del ns1 ip link del br0 [ 1857.000393] br0: port 1(veth1) entered blocking state [ 1857.005537] br0: port 1(veth1) entered disabled state [ 1857.011433] device veth1 entered promiscuous mode [ 1857.047291] IPv6: ADDRCONF(NETDEV_CHANGE): veth1: link becomes ready [ 1857.054019] br0: port 1(veth1) entered blocking state [ 1857.059205] br0: port 1(veth1) entered forwarding state [ 1857.064791] IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready [ 1857.124507] br0: port 2(veth2) entered blocking state [ 1857.129658] br0: port 2(veth2) entered disabled state [ 1857.135585] device veth2 entered promiscuous mode [ 1857.209748] br0: port 2(veth2) entered blocking state [ 1857.214900] br0: port 2(veth2) entered forwarding state [ 1857.220680] IPv6: ADDRCONF(NETDEV_CHANGE): veth3: link becomes ready Pinging another bridge port PING 192.168.100.2 (192.168.100.2) 56(84) bytes of data. 64 bytes from 192.168.100.2: icmp_seq=1 ttl=64 time=0.508 ms 64 bytes from 192.168.100.2: icmp_seq=2 ttl=64 time=0.222 ms 64 bytes from 192.168.100.2: icmp_seq=3 ttl=64 time=0.405 ms --- 192.168.100.2 ping statistics --- 3 packets transmitted, 3 received, 0% packet loss, time 2051ms rtt min/avg/max/mdev = 0.222/0.378/0.508/0.118 ms Pinging the bridge PING 192.168.100.3 (192.168.100.3) 56(84) bytes of data. ^C --- 192.168.100.3 ping statistics --- 3 packets transmitted, 0 received, 100% packet loss, time 2040ms filter protocol all pref 49152 matchall chain 0 filter protocol all pref 49152 matchall chain 0 handle 0x1 not_in_hw action order 1: gact action drop random type none pass val 0 index 1 ref 1 bind 1 installed 12 sec used 6 sec Action statistics: Sent 936 bytes 16 pkt (dropped 16, overlimits 0 requeues 0) backlog 0b 0p requeues 0 [ 1870.189158] br0: port 1(veth1) entered disabled state [ 1870.204061] device veth1 left promiscuous mode [ 1870.208751] br0: port 1(veth1) entered disabled state [ 1870.232677] device veth2 left promiscuous mode [ 1870.237814] br0: port 2(veth2) entered disabled state Now imagine that veth0 is a DSA switch interface which monitors and offloads the drop rule. How could it distinguish between pinging veth3 and pinging br0, so as to comply with software semantics?
On 2022-03-30 14:09, Vladimir Oltean wrote: > On Wed, Mar 30, 2022 at 01:31:14PM +0200, Mattias Forsblad wrote: >> Greetings, >> >> This series implements offloading of tc matchall filter to HW >> for bridged DSA ports. >> >> Background >> When using a non-VLAN filtering bridge we want to be able to drop >> traffic directed to the CPU port so that the CPU doesn't get unnecessary loaded. >> This is specially important when we have disabled learning on user ports. >> >> A sample configuration could be something like this: >> >> br0 >> / \ >> swp0 swp1 >> >> ip link add dev br0 type bridge stp_state 0 vlan_filtering 0 >> ip link set swp0 master br0 >> ip link set swp1 master br0 >> ip link set swp0 type bridge_slave learning off >> ip link set swp1 type bridge_slave learning off >> ip link set swp0 up >> ip link set swp1 up >> ip link set br0 up >> >> After discussions here: https://lore.kernel.org/netdev/YjMo9xyoycXgSWXS@shredder/ >> it was advised to use tc to set an ingress filter that could then >> be offloaded to HW, like so: >> >> tc qdisc add dev br0 clsact >> tc filter add dev br0 ingress pref 1 proto all matchall action drop >> >> Limitations >> If there is tc rules on a bridge and all the ports leave the bridge >> and then joins the bridge again, the indirect framwork doesn't seem >> to reoffload them at join. The tc rules need to be torn down and >> re-added. >> >> The first part of this serie uses the flow indirect framework to >> setup monitoring of tc qdisc and filters added to a bridge. >> The second part offloads the matchall filter to HW for Marvell >> switches. >> >> Mattias Forsblad (2): >> net: tc: dsa: Add the matchall filter with drop action for bridged DSA ports. >> net: dsa: Implement tc offloading for drop target. >> >> drivers/net/dsa/mv88e6xxx/chip.c | 23 +++- >> include/net/dsa.h | 13 ++ >> net/dsa/dsa2.c | 5 + >> net/dsa/dsa_priv.h | 3 + >> net/dsa/port.c | 1 + >> net/dsa/slave.c | 217 ++++++++++++++++++++++++++++++- >> 6 files changed, 258 insertions(+), 4 deletions(-) >> >> -- >> 2.25.1 >> > > Have you considered point b of my argument here? > https://patchwork.kernel.org/project/netdevbpf/patch/20220317065031.3830481-5-mattias.forsblad@gmail.com/#24782383 > > To make that argument even clearer, the following script produces: > > #!/bin/bash > > ip netns add ns0 > ip netns add ns1 > ip link add veth0 type veth peer name veth1 > ip link add veth2 type veth peer name veth3 > ip link add br0 type bridge && ip link set br0 up > ip link set veth0 netns ns0 > ip link set veth3 netns ns1 > ip -n ns0 addr add 192.168.100.1/24 dev veth0 && ip -n ns0 link set veth0 up > ip -n ns1 addr add 192.168.100.2/24 dev veth3 && ip -n ns1 link set veth3 up > ip addr add 192.168.100.3/24 dev br0 > ip link set veth1 master br0 && ip link set veth1 up > ip link set veth2 master br0 && ip link set veth2 up > tc qdisc add dev br0 clsact > tc filter add dev br0 ingress matchall action drop > echo "Pinging another bridge port" && ip netns exec ns0 ping -c 3 192.168.100.2 > echo "Pinging the bridge" && ip netns exec ns0 ping -c 3 192.168.100.3 > ip netns del ns0 > ip netns del ns1 > ip link del br0 > > [ 1857.000393] br0: port 1(veth1) entered blocking state > [ 1857.005537] br0: port 1(veth1) entered disabled state > [ 1857.011433] device veth1 entered promiscuous mode > [ 1857.047291] IPv6: ADDRCONF(NETDEV_CHANGE): veth1: link becomes ready > [ 1857.054019] br0: port 1(veth1) entered blocking state > [ 1857.059205] br0: port 1(veth1) entered forwarding state > [ 1857.064791] IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready > [ 1857.124507] br0: port 2(veth2) entered blocking state > [ 1857.129658] br0: port 2(veth2) entered disabled state > [ 1857.135585] device veth2 entered promiscuous mode > [ 1857.209748] br0: port 2(veth2) entered blocking state > [ 1857.214900] br0: port 2(veth2) entered forwarding state > [ 1857.220680] IPv6: ADDRCONF(NETDEV_CHANGE): veth3: link becomes ready > Pinging another bridge port > PING 192.168.100.2 (192.168.100.2) 56(84) bytes of data. > 64 bytes from 192.168.100.2: icmp_seq=1 ttl=64 time=0.508 ms > 64 bytes from 192.168.100.2: icmp_seq=2 ttl=64 time=0.222 ms > 64 bytes from 192.168.100.2: icmp_seq=3 ttl=64 time=0.405 ms > > --- 192.168.100.2 ping statistics --- > 3 packets transmitted, 3 received, 0% packet loss, time 2051ms > rtt min/avg/max/mdev = 0.222/0.378/0.508/0.118 ms > Pinging the bridge > PING 192.168.100.3 (192.168.100.3) 56(84) bytes of data. > ^C > --- 192.168.100.3 ping statistics --- > 3 packets transmitted, 0 received, 100% packet loss, time 2040ms > > filter protocol all pref 49152 matchall chain 0 > filter protocol all pref 49152 matchall chain 0 handle 0x1 > not_in_hw > action order 1: gact action drop > random type none pass val 0 > index 1 ref 1 bind 1 installed 12 sec used 6 sec > Action statistics: > Sent 936 bytes 16 pkt (dropped 16, overlimits 0 requeues 0) > backlog 0b 0p requeues 0 > > [ 1870.189158] br0: port 1(veth1) entered disabled state > [ 1870.204061] device veth1 left promiscuous mode > [ 1870.208751] br0: port 1(veth1) entered disabled state > [ 1870.232677] device veth2 left promiscuous mode > [ 1870.237814] br0: port 2(veth2) entered disabled state > > Now imagine that veth0 is a DSA switch interface which monitors and > offloads the drop rule. How could it distinguish between pinging veth3 > and pinging br0, so as to comply with software semantics? Hi Vladimir, thanks for your comments. The patch series takes in account that a foreign interface is bridged and doesn't offload the rule in this case (dsa_slave_check_offload). Regarding your previous comment point b. Tobias could see some problems with that approach. I'd think he will comment on that.
On Thu, Mar 31, 2022 at 10:06:20AM +0200, Mattias Forsblad wrote: > Hi Vladimir, > thanks for your comments. The patch series takes in account that a foreign > interface is bridged and doesn't offload the rule in this case (dsa_slave_check_offload). I certainly appreciate the intention, but it could be that a foreign interface will join the bridge after the matchall action drop is installed on the bridge. So actively monitoring bridge joins/leaves would be required to offload/unoffload the rule. > Regarding your previous comment point b. Tobias could see some problems > with that approach. I'd think he will comment on that. I'll respond there.
On 2022-03-31 15:42, Vladimir Oltean wrote: > On Thu, Mar 31, 2022 at 10:06:20AM +0200, Mattias Forsblad wrote: >> Hi Vladimir, >> thanks for your comments. The patch series takes in account that a foreign >> interface is bridged and doesn't offload the rule in this case (dsa_slave_check_offload). > > I certainly appreciate the intention, but it could be that a foreign > interface will join the bridge after the matchall action drop is > installed on the bridge. So actively monitoring bridge joins/leaves > would be required to offload/unoffload the rule. > Ah, you're right. I'll fix that. Thanks.