mbox series

[v4,net-next,0/3] net: dsa: mv88e6xxx: Implement offload of matchall for bridged DSA ports

Message ID 20220411120633.40054-1-mattias.forsblad@gmail.com (mailing list archive)
Headers show
Series net: dsa: mv88e6xxx: Implement offload of matchall for bridged DSA ports | expand

Message

Mattias Forsblad April 11, 2022, 12:06 p.m. UTC
Greetings,

This series implements offloading of tc matchall filter to HW
for bridged DSA ports.

Background
When using a non-VLAN filtering bridge we want to be able to drop
traffic directed to the CPU port so that the CPU doesn't get unnecessary loaded.
This is specially important when we have disabled learning on user ports.

A basic sample configuration could be something like this:

       br0
      /   \
   swp0   swp1

ip link add dev br0 type bridge stp_state 0 vlan_filtering 0
ip link set swp0 master br0
ip link set swp1 master br0
ip link set swp0 type bridge_slave learning off
ip link set swp1 type bridge_slave learning off
ip link set swp0 up
ip link set swp1 up
ip link set br0 up

After discussions here: https://lore.kernel.org/netdev/YjMo9xyoycXgSWXS@shredder/
it was advised to use tc to set an ingress filter that could then
be offloaded to HW, like so:

tc qdisc add dev br0 clsact
tc filter add dev br0 ingress pref 1 proto all matchall action drop

Another situation that needs to be handled is when there is a
foreign interface in the bridge. In this case traffic must reach the
bridge, like the setup below.

               br0
             /  |  \
          swp0 swp1 veth0

Yet another case is when we have ports that are bonded with an
foreign interface added to the bridge.

               br0
             /     \
          bond0   veth0
         /     \
       swp0   swp1

These examples highlight the need to evaluate the bridge stack
to be able to make the right decision about whetever we can
offload this to hw or not.

Limitations
If there is tc rules on a bridge and all the ports leave the bridge
and then joins the bridge again, the indirect framwork doesn't seem
to reoffload them at join. The tc rules need to be torn down and
re-added. This seems to be because of limitations in the tc
framework. A fix for this would need another patch-series in itself.
However we prepare for when this issue is fixed by registring and
deregistring when a dsa_bridge is created/destroyed so it should
work when it's solved.

The first patch in this series now include changes done by Vladimir
Oltean to cleanup netdev notifier code and check for foreign
interfaces. The second part uses the flow indirect framework to
setup monitoring of tc qdisc and filters added to a bridge.
The last part offloads the matchall filter to HW for Marvell
switches.

RFC -> v1: Monitor bridge join/leave and re-evaluate offloading (Vladimir Oltean)
v2: Fix code standard compliance (Jakub Kicinski)
v3: Fix warning from kernel test robot (<lkp@intel.com>)
v4: Check matchall priority (Jakub)
    Use boolean type (Vladimir)
    Use Vladimirs code for checking foreign interfaces
    Drop unused argument (Vladimir)
    Add switchdev notifier (Vladimir)
    Only call ops when value have changed (Vladimir)
    Add error check (Vladimir)

Mattias Forsblad (3):
  net: dsa: track whetever bridges have foreign interfaces in them
  net: dsa: Add support for offloading tc matchall with drop target
  net: dsa: mv88e6xxx: Add HW offload support for tc matchall in Marvell
    switches

 drivers/net/dsa/mv88e6xxx/chip.c |  17 +-
 include/net/dsa.h                |  15 ++
 include/net/switchdev.h          |   2 +
 net/dsa/dsa2.c                   |   2 +
 net/dsa/dsa_priv.h               |   3 +
 net/dsa/port.c                   |  14 ++
 net/dsa/slave.c                  | 321 +++++++++++++++++++++++++++++--
 7 files changed, 361 insertions(+), 13 deletions(-)

Comments

Vladimir Oltean April 11, 2022, 12:39 p.m. UTC | #1
On Mon, Apr 11, 2022 at 02:06:30PM +0200, Mattias Forsblad wrote:
> RFC -> v1: Monitor bridge join/leave and re-evaluate offloading (Vladimir Oltean)
> v2: Fix code standard compliance (Jakub Kicinski)
> v3: Fix warning from kernel test robot (<lkp@intel.com>)
> v4: Check matchall priority (Jakub)
>     Use boolean type (Vladimir)
>     Use Vladimirs code for checking foreign interfaces
>     Drop unused argument (Vladimir)
>     Add switchdev notifier (Vladimir)

By switchdev notifier you mean SWITCHDEV_ATTR_ID_BRIDGE_LOCAL_RCV?
I'm sorry, you must have misunderstood. I said, in reference to
dp->ds->ops->bridge_local_rcv():

| Not to mention this should be a cross-chip notifier, maybe a
| cross-tree notifier.

https://patchwork.kernel.org/project/netdevbpf/patch/20220404104826.1902292-2-mattias.forsblad@gmail.com/#24805497

A cross-chip notifier is an event signaled using dsa_tree_notify() and
handled in switch.c. Its purpose is to replicate an event exactly once
towards all switches in a multi-switch topology.

You could have explained that this isn't necessary, because
dsa_slave_setup_bridge_tc_indr_block(netdev=bridge_dev) indirectly binds
dsa_slave_setup_bridge_block_cb() which calls dsa_slave_setup_tc_block_cb()
for each user port under said bridge. So replicating the ds->ops->bridge_local_rcv()
towards each switch is already taken care of in another way, although
suboptimally, because if there are 4 user ports under br0 in switch A
and 4 user ports in switch B, ds->ops->bridge_local_rcv() will be called
4 times for switch A and 4 times for switch B. 6 out of those 8 calls
are for nothing.

Or you could have said that you don't understand the request and ask me
to clarify.

But I don't understand why you've added SWITCHDEV_ATTR_ID_BRIDGE_LOCAL_RCV
which has no consumer. Initially I thought you'd go back to having the
bridge monitor flow blocks binding to its ingress chain instead of this
broken indirect stuff, then emit SWITCHDEV_ATTR_ID_BRIDGE_LOCAL_RCV on
the switchdev notifier chain which DSA catches and offloads. And initial
state would be synced/unsynced via attribute replays in
dsa_port_switchdev_sync_attrs(). At least that would have worked.
But nope. It really looks like SWITCHDEV_ATTR_ID_BRIDGE_LOCAL_RCV was
added to appease an unclear comment even if it made no sense.

>     Only call ops when value have changed (Vladimir)
>     Add error check (Vladimir)
> 
> Mattias Forsblad (3):
>   net: dsa: track whetever bridges have foreign interfaces in them
>   net: dsa: Add support for offloading tc matchall with drop target
>   net: dsa: mv88e6xxx: Add HW offload support for tc matchall in Marvell
>     switches
> 
>  drivers/net/dsa/mv88e6xxx/chip.c |  17 +-
>  include/net/dsa.h                |  15 ++
>  include/net/switchdev.h          |   2 +
>  net/dsa/dsa2.c                   |   2 +
>  net/dsa/dsa_priv.h               |   3 +
>  net/dsa/port.c                   |  14 ++
>  net/dsa/slave.c                  | 321 +++++++++++++++++++++++++++++--
>  7 files changed, 361 insertions(+), 13 deletions(-)
> 
> -- 
> 2.25.1
>
Mattias Forsblad April 11, 2022, 12:48 p.m. UTC | #2
On 2022-04-11 14:39, Vladimir Oltean wrote:
> On Mon, Apr 11, 2022 at 02:06:30PM +0200, Mattias Forsblad wrote:
>> RFC -> v1: Monitor bridge join/leave and re-evaluate offloading (Vladimir Oltean)
>> v2: Fix code standard compliance (Jakub Kicinski)
>> v3: Fix warning from kernel test robot (<lkp@intel.com>)
>> v4: Check matchall priority (Jakub)
>>     Use boolean type (Vladimir)
>>     Use Vladimirs code for checking foreign interfaces
>>     Drop unused argument (Vladimir)
>>     Add switchdev notifier (Vladimir)
> 
> By switchdev notifier you mean SWITCHDEV_ATTR_ID_BRIDGE_LOCAL_RCV?
> I'm sorry, you must have misunderstood. I said, in reference to
> dp->ds->ops->bridge_local_rcv():
> 
> | Not to mention this should be a cross-chip notifier, maybe a
> | cross-tree notifier.
> 
> https://patchwork.kernel.org/project/netdevbpf/patch/20220404104826.1902292-2-mattias.forsblad@gmail.com/#24805497
> 
> A cross-chip notifier is an event signaled using dsa_tree_notify() and
> handled in switch.c. Its purpose is to replicate an event exactly once
> towards all switches in a multi-switch topology.
> 
> You could have explained that this isn't necessary, because
> dsa_slave_setup_bridge_tc_indr_block(netdev=bridge_dev) indirectly binds
> dsa_slave_setup_bridge_block_cb() which calls dsa_slave_setup_tc_block_cb()
> for each user port under said bridge. So replicating the ds->ops->bridge_local_rcv()
> towards each switch is already taken care of in another way, although
> suboptimally, because if there are 4 user ports under br0 in switch A
> and 4 user ports in switch B, ds->ops->bridge_local_rcv() will be called
> 4 times for switch A and 4 times for switch B. 6 out of those 8 calls
> are for nothing.
> 
> Or you could have said that you don't understand the request and ask me
> to clarify.
> 
> But I don't understand why you've added SWITCHDEV_ATTR_ID_BRIDGE_LOCAL_RCV
> which has no consumer. Initially I thought you'd go back to having the
> bridge monitor flow blocks binding to its ingress chain instead of this
> broken indirect stuff, then emit SWITCHDEV_ATTR_ID_BRIDGE_LOCAL_RCV on
> the switchdev notifier chain which DSA catches and offloads. And initial
> state would be synced/unsynced via attribute replays in
> dsa_port_switchdev_sync_attrs(). At least that would have worked.
> But nope. It really looks like SWITCHDEV_ATTR_ID_BRIDGE_LOCAL_RCV was
> added to appease an unclear comment even if it made no sense.
> 

My thinking was that the notifier I was aware of was the one I implemented
and someway was a preparation for consumers (that didn't exist yes). I didn't even
know about dsa_tree_notify(). So I'll remove that part, yes? Is that ok even
if it's not optimal, like you say?


>>     Only call ops when value have changed (Vladimir)
>>     Add error check (Vladimir)
>>
>> Mattias Forsblad (3):
>>   net: dsa: track whetever bridges have foreign interfaces in them
>>   net: dsa: Add support for offloading tc matchall with drop target
>>   net: dsa: mv88e6xxx: Add HW offload support for tc matchall in Marvell
>>     switches
>>
>>  drivers/net/dsa/mv88e6xxx/chip.c |  17 +-
>>  include/net/dsa.h                |  15 ++
>>  include/net/switchdev.h          |   2 +
>>  net/dsa/dsa2.c                   |   2 +
>>  net/dsa/dsa_priv.h               |   3 +
>>  net/dsa/port.c                   |  14 ++
>>  net/dsa/slave.c                  | 321 +++++++++++++++++++++++++++++--
>>  7 files changed, 361 insertions(+), 13 deletions(-)
>>
>> -- 
>> 2.25.1
>>
Vladimir Oltean April 11, 2022, 1:16 p.m. UTC | #3
On Mon, Apr 11, 2022 at 02:48:29PM +0200, Mattias Forsblad wrote:
> My thinking was that the notifier I was aware of was the one I implemented
> and someway was a preparation for consumers (that didn't exist yes). I didn't even
> know about dsa_tree_notify(). So I'll remove that part, yes? Is that ok even
> if it's not optimal, like you say?

Optimal or sub-optimal, I think there are bigger problems which we're
taking too lightly. Like this one:

| If there is tc rules on a bridge and all the ports leave the bridge
| and then joins the bridge again, the indirect framwork doesn't seem
| to reoffload them at join. The tc rules need to be torn down and
| re-added. This seems to be because of limitations in the tc
| framework. A fix for this would need another patch-series in itself.
| However we prepare for when this issue is fixed by registring and
| deregistring when a dsa_bridge is created/destroyed so it should
| work when it's solved.

You've presented just the more convenient angle of the limitation
(DSA interfaces present as bridge ports, then leave, then re-join).
But the same problem is there even when the tc rule is added to the
bridge before adding any port to it. Which is more likely to result in
buggy user experience, because it doesn't say anywhere that there are
restrictions to the order in which things should be set up.

Personally, I would first try to ask for help, as some work was clearly
put into the indirect flow block offload, and the reasonable expectation
is that the filter replay (and not just the bind/unbind) works on
register/unregister, yet for some unknown reason it doesn't.

Then, if I get no answer/help, I would consider as an alternative
implementing the flow block binding logic in the bridge, and
re-notifying the relevant stuff through switchdev. Via this
SWITCHDEV_ATTR_ID_BRIDGE_LOCAL_RCV we're talking about - hence the
reason I'm mentioning it. After all, the flow block binding code is
mostly bloatware, so we can reduce the duplicated code switchdev drivers
have to write.

But I wouldn't consider leaving this up in the air if there is a
non-complicated way to address it, which it seems like there is.

So in short, let's discuss what's optimal overall only when we see an
implementation that fully works, ok? It was my mistake to point out
during review minor optimizations that could be made even if they don't
follow the overall direction that the patch set should go in.