mbox series

[RFC,net-next,0/2] net: tc: dsa: Implement offload of matchall for bridged DSA ports

Message ID 20220330113116.3166219-1-mattias.forsblad@gmail.com (mailing list archive)
Headers show
Series net: tc: dsa: Implement offload of matchall for bridged DSA ports | expand

Message

Mattias Forsblad March 30, 2022, 11:31 a.m. UTC
Greetings,

This series implements offloading of tc matchall filter to HW
for bridged DSA ports.

Background
When using a non-VLAN filtering bridge we want to be able to drop
traffic directed to the CPU port so that the CPU doesn't get unnecessary loaded.
This is specially important when we have disabled learning on user ports.

A sample configuration could be something like this:

       br0
      /   \
   swp0   swp1

ip link add dev br0 type bridge stp_state 0 vlan_filtering 0
ip link set swp0 master br0
ip link set swp1 master br0
ip link set swp0 type bridge_slave learning off
ip link set swp1 type bridge_slave learning off
ip link set swp0 up
ip link set swp1 up
ip link set br0 up

After discussions here: https://lore.kernel.org/netdev/YjMo9xyoycXgSWXS@shredder/
it was advised to use tc to set an ingress filter that could then
be offloaded to HW, like so:

tc qdisc add dev br0 clsact
tc filter add dev br0 ingress pref 1 proto all matchall action drop

Limitations
If there is tc rules on a bridge and all the ports leave the bridge
and then joins the bridge again, the indirect framwork doesn't seem
to reoffload them at join. The tc rules need to be torn down and
re-added.

The first part of this serie uses the flow indirect framework to
setup monitoring of tc qdisc and filters added to a bridge.
The second part offloads the matchall filter to HW for Marvell
switches.

Mattias Forsblad (2):
  net: tc: dsa: Add the matchall filter with drop action for bridged DSA ports.
  net: dsa: Implement tc offloading for drop target.

 drivers/net/dsa/mv88e6xxx/chip.c |  23 +++-
 include/net/dsa.h                |  13 ++
 net/dsa/dsa2.c                   |   5 +
 net/dsa/dsa_priv.h               |   3 +
 net/dsa/port.c                   |   1 +
 net/dsa/slave.c                  | 217 ++++++++++++++++++++++++++++++-
 6 files changed, 258 insertions(+), 4 deletions(-)

Comments

Vladimir Oltean March 30, 2022, 12:09 p.m. UTC | #1
On Wed, Mar 30, 2022 at 01:31:14PM +0200, Mattias Forsblad wrote:
> Greetings,
> 
> This series implements offloading of tc matchall filter to HW
> for bridged DSA ports.
> 
> Background
> When using a non-VLAN filtering bridge we want to be able to drop
> traffic directed to the CPU port so that the CPU doesn't get unnecessary loaded.
> This is specially important when we have disabled learning on user ports.
> 
> A sample configuration could be something like this:
> 
>        br0
>       /   \
>    swp0   swp1
> 
> ip link add dev br0 type bridge stp_state 0 vlan_filtering 0
> ip link set swp0 master br0
> ip link set swp1 master br0
> ip link set swp0 type bridge_slave learning off
> ip link set swp1 type bridge_slave learning off
> ip link set swp0 up
> ip link set swp1 up
> ip link set br0 up
> 
> After discussions here: https://lore.kernel.org/netdev/YjMo9xyoycXgSWXS@shredder/
> it was advised to use tc to set an ingress filter that could then
> be offloaded to HW, like so:
> 
> tc qdisc add dev br0 clsact
> tc filter add dev br0 ingress pref 1 proto all matchall action drop
> 
> Limitations
> If there is tc rules on a bridge and all the ports leave the bridge
> and then joins the bridge again, the indirect framwork doesn't seem
> to reoffload them at join. The tc rules need to be torn down and
> re-added.
> 
> The first part of this serie uses the flow indirect framework to
> setup monitoring of tc qdisc and filters added to a bridge.
> The second part offloads the matchall filter to HW for Marvell
> switches.
> 
> Mattias Forsblad (2):
>   net: tc: dsa: Add the matchall filter with drop action for bridged DSA ports.
>   net: dsa: Implement tc offloading for drop target.
> 
>  drivers/net/dsa/mv88e6xxx/chip.c |  23 +++-
>  include/net/dsa.h                |  13 ++
>  net/dsa/dsa2.c                   |   5 +
>  net/dsa/dsa_priv.h               |   3 +
>  net/dsa/port.c                   |   1 +
>  net/dsa/slave.c                  | 217 ++++++++++++++++++++++++++++++-
>  6 files changed, 258 insertions(+), 4 deletions(-)
> 
> -- 
> 2.25.1
> 

Have you considered point b of my argument here?
https://patchwork.kernel.org/project/netdevbpf/patch/20220317065031.3830481-5-mattias.forsblad@gmail.com/#24782383

To make that argument even clearer, the following script produces:

#!/bin/bash

ip netns add ns0
ip netns add ns1
ip link add veth0 type veth peer name veth1
ip link add veth2 type veth peer name veth3
ip link add br0 type bridge && ip link set br0 up
ip link set veth0 netns ns0
ip link set veth3 netns ns1
ip -n ns0 addr add 192.168.100.1/24 dev veth0 && ip -n ns0 link set veth0 up
ip -n ns1 addr add 192.168.100.2/24 dev veth3 && ip -n ns1 link set veth3 up
ip addr add 192.168.100.3/24 dev br0
ip link set veth1 master br0 && ip link set veth1 up
ip link set veth2 master br0 && ip link set veth2 up
tc qdisc add dev br0 clsact
tc filter add dev br0 ingress matchall action drop
echo "Pinging another bridge port" && ip netns exec ns0 ping -c 3 192.168.100.2
echo "Pinging the bridge" && ip netns exec ns0 ping -c 3 192.168.100.3
ip netns del ns0
ip netns del ns1
ip link del br0

[ 1857.000393] br0: port 1(veth1) entered blocking state
[ 1857.005537] br0: port 1(veth1) entered disabled state
[ 1857.011433] device veth1 entered promiscuous mode
[ 1857.047291] IPv6: ADDRCONF(NETDEV_CHANGE): veth1: link becomes ready
[ 1857.054019] br0: port 1(veth1) entered blocking state
[ 1857.059205] br0: port 1(veth1) entered forwarding state
[ 1857.064791] IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
[ 1857.124507] br0: port 2(veth2) entered blocking state
[ 1857.129658] br0: port 2(veth2) entered disabled state
[ 1857.135585] device veth2 entered promiscuous mode
[ 1857.209748] br0: port 2(veth2) entered blocking state
[ 1857.214900] br0: port 2(veth2) entered forwarding state
[ 1857.220680] IPv6: ADDRCONF(NETDEV_CHANGE): veth3: link becomes ready
Pinging another bridge port
PING 192.168.100.2 (192.168.100.2) 56(84) bytes of data.
64 bytes from 192.168.100.2: icmp_seq=1 ttl=64 time=0.508 ms
64 bytes from 192.168.100.2: icmp_seq=2 ttl=64 time=0.222 ms
64 bytes from 192.168.100.2: icmp_seq=3 ttl=64 time=0.405 ms

--- 192.168.100.2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2051ms
rtt min/avg/max/mdev = 0.222/0.378/0.508/0.118 ms
Pinging the bridge
PING 192.168.100.3 (192.168.100.3) 56(84) bytes of data.
^C
--- 192.168.100.3 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2040ms

filter protocol all pref 49152 matchall chain 0
filter protocol all pref 49152 matchall chain 0 handle 0x1
  not_in_hw
        action order 1: gact action drop
         random type none pass val 0
         index 1 ref 1 bind 1 installed 12 sec used 6 sec
        Action statistics:
        Sent 936 bytes 16 pkt (dropped 16, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0

[ 1870.189158] br0: port 1(veth1) entered disabled state
[ 1870.204061] device veth1 left promiscuous mode
[ 1870.208751] br0: port 1(veth1) entered disabled state
[ 1870.232677] device veth2 left promiscuous mode
[ 1870.237814] br0: port 2(veth2) entered disabled state

Now imagine that veth0 is a DSA switch interface which monitors and
offloads the drop rule. How could it distinguish between pinging veth3
and pinging br0, so as to comply with software semantics?
Mattias Forsblad March 31, 2022, 8:06 a.m. UTC | #2
On 2022-03-30 14:09, Vladimir Oltean wrote:
> On Wed, Mar 30, 2022 at 01:31:14PM +0200, Mattias Forsblad wrote:
>> Greetings,
>>
>> This series implements offloading of tc matchall filter to HW
>> for bridged DSA ports.
>>
>> Background
>> When using a non-VLAN filtering bridge we want to be able to drop
>> traffic directed to the CPU port so that the CPU doesn't get unnecessary loaded.
>> This is specially important when we have disabled learning on user ports.
>>
>> A sample configuration could be something like this:
>>
>>        br0
>>       /   \
>>    swp0   swp1
>>
>> ip link add dev br0 type bridge stp_state 0 vlan_filtering 0
>> ip link set swp0 master br0
>> ip link set swp1 master br0
>> ip link set swp0 type bridge_slave learning off
>> ip link set swp1 type bridge_slave learning off
>> ip link set swp0 up
>> ip link set swp1 up
>> ip link set br0 up
>>
>> After discussions here: https://lore.kernel.org/netdev/YjMo9xyoycXgSWXS@shredder/
>> it was advised to use tc to set an ingress filter that could then
>> be offloaded to HW, like so:
>>
>> tc qdisc add dev br0 clsact
>> tc filter add dev br0 ingress pref 1 proto all matchall action drop
>>
>> Limitations
>> If there is tc rules on a bridge and all the ports leave the bridge
>> and then joins the bridge again, the indirect framwork doesn't seem
>> to reoffload them at join. The tc rules need to be torn down and
>> re-added.
>>
>> The first part of this serie uses the flow indirect framework to
>> setup monitoring of tc qdisc and filters added to a bridge.
>> The second part offloads the matchall filter to HW for Marvell
>> switches.
>>
>> Mattias Forsblad (2):
>>   net: tc: dsa: Add the matchall filter with drop action for bridged DSA ports.
>>   net: dsa: Implement tc offloading for drop target.
>>
>>  drivers/net/dsa/mv88e6xxx/chip.c |  23 +++-
>>  include/net/dsa.h                |  13 ++
>>  net/dsa/dsa2.c                   |   5 +
>>  net/dsa/dsa_priv.h               |   3 +
>>  net/dsa/port.c                   |   1 +
>>  net/dsa/slave.c                  | 217 ++++++++++++++++++++++++++++++-
>>  6 files changed, 258 insertions(+), 4 deletions(-)
>>
>> -- 
>> 2.25.1
>>
> 
> Have you considered point b of my argument here?
> https://patchwork.kernel.org/project/netdevbpf/patch/20220317065031.3830481-5-mattias.forsblad@gmail.com/#24782383
> 
> To make that argument even clearer, the following script produces:
> 
> #!/bin/bash
> 
> ip netns add ns0
> ip netns add ns1
> ip link add veth0 type veth peer name veth1
> ip link add veth2 type veth peer name veth3
> ip link add br0 type bridge && ip link set br0 up
> ip link set veth0 netns ns0
> ip link set veth3 netns ns1
> ip -n ns0 addr add 192.168.100.1/24 dev veth0 && ip -n ns0 link set veth0 up
> ip -n ns1 addr add 192.168.100.2/24 dev veth3 && ip -n ns1 link set veth3 up
> ip addr add 192.168.100.3/24 dev br0
> ip link set veth1 master br0 && ip link set veth1 up
> ip link set veth2 master br0 && ip link set veth2 up
> tc qdisc add dev br0 clsact
> tc filter add dev br0 ingress matchall action drop
> echo "Pinging another bridge port" && ip netns exec ns0 ping -c 3 192.168.100.2
> echo "Pinging the bridge" && ip netns exec ns0 ping -c 3 192.168.100.3
> ip netns del ns0
> ip netns del ns1
> ip link del br0
> 
> [ 1857.000393] br0: port 1(veth1) entered blocking state
> [ 1857.005537] br0: port 1(veth1) entered disabled state
> [ 1857.011433] device veth1 entered promiscuous mode
> [ 1857.047291] IPv6: ADDRCONF(NETDEV_CHANGE): veth1: link becomes ready
> [ 1857.054019] br0: port 1(veth1) entered blocking state
> [ 1857.059205] br0: port 1(veth1) entered forwarding state
> [ 1857.064791] IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
> [ 1857.124507] br0: port 2(veth2) entered blocking state
> [ 1857.129658] br0: port 2(veth2) entered disabled state
> [ 1857.135585] device veth2 entered promiscuous mode
> [ 1857.209748] br0: port 2(veth2) entered blocking state
> [ 1857.214900] br0: port 2(veth2) entered forwarding state
> [ 1857.220680] IPv6: ADDRCONF(NETDEV_CHANGE): veth3: link becomes ready
> Pinging another bridge port
> PING 192.168.100.2 (192.168.100.2) 56(84) bytes of data.
> 64 bytes from 192.168.100.2: icmp_seq=1 ttl=64 time=0.508 ms
> 64 bytes from 192.168.100.2: icmp_seq=2 ttl=64 time=0.222 ms
> 64 bytes from 192.168.100.2: icmp_seq=3 ttl=64 time=0.405 ms
> 
> --- 192.168.100.2 ping statistics ---
> 3 packets transmitted, 3 received, 0% packet loss, time 2051ms
> rtt min/avg/max/mdev = 0.222/0.378/0.508/0.118 ms
> Pinging the bridge
> PING 192.168.100.3 (192.168.100.3) 56(84) bytes of data.
> ^C
> --- 192.168.100.3 ping statistics ---
> 3 packets transmitted, 0 received, 100% packet loss, time 2040ms
> 
> filter protocol all pref 49152 matchall chain 0
> filter protocol all pref 49152 matchall chain 0 handle 0x1
>   not_in_hw
>         action order 1: gact action drop
>          random type none pass val 0
>          index 1 ref 1 bind 1 installed 12 sec used 6 sec
>         Action statistics:
>         Sent 936 bytes 16 pkt (dropped 16, overlimits 0 requeues 0)
>         backlog 0b 0p requeues 0
> 
> [ 1870.189158] br0: port 1(veth1) entered disabled state
> [ 1870.204061] device veth1 left promiscuous mode
> [ 1870.208751] br0: port 1(veth1) entered disabled state
> [ 1870.232677] device veth2 left promiscuous mode
> [ 1870.237814] br0: port 2(veth2) entered disabled state
> 
> Now imagine that veth0 is a DSA switch interface which monitors and
> offloads the drop rule. How could it distinguish between pinging veth3
> and pinging br0, so as to comply with software semantics?

Hi Vladimir,
thanks for your comments. The patch series takes in account that a foreign
interface is bridged and doesn't offload the rule in this case (dsa_slave_check_offload).

Regarding your previous comment point b. Tobias could see some problems
with that approach. I'd think he will comment on that.
Vladimir Oltean March 31, 2022, 1:42 p.m. UTC | #3
On Thu, Mar 31, 2022 at 10:06:20AM +0200, Mattias Forsblad wrote:
> Hi Vladimir,
> thanks for your comments. The patch series takes in account that a foreign
> interface is bridged and doesn't offload the rule in this case (dsa_slave_check_offload).

I certainly appreciate the intention, but it could be that a foreign
interface will join the bridge after the matchall action drop is
installed on the bridge. So actively monitoring bridge joins/leaves
would be required to offload/unoffload the rule.

> Regarding your previous comment point b. Tobias could see some problems
> with that approach. I'd think he will comment on that.

I'll respond there.
Mattias Forsblad April 1, 2022, 5:21 a.m. UTC | #4
On 2022-03-31 15:42, Vladimir Oltean wrote:
> On Thu, Mar 31, 2022 at 10:06:20AM +0200, Mattias Forsblad wrote:
>> Hi Vladimir,
>> thanks for your comments. The patch series takes in account that a foreign
>> interface is bridged and doesn't offload the rule in this case (dsa_slave_check_offload).
> 
> I certainly appreciate the intention, but it could be that a foreign
> interface will join the bridge after the matchall action drop is
> installed on the bridge. So actively monitoring bridge joins/leaves
> would be required to offload/unoffload the rule.
> 

Ah, you're right. I'll fix that. Thanks.