[net-next,0/3] bridge: dsa: switchdev: mv88e6xxx: Implement local_receive bridge flag

Message ID	20220301123104.226731-1-mattias.forsblad+netdev@gmail.com (mailing list archive)
Return-Path: <netdev-owner@kernel.org>
From: Mattias Forsblad <mattias.forsblad@gmail.com>
To: netdev@vger.kernel.org
Cc: "David S . Miller" <davem@davemloft.net>,
    Jakub Kicinski <kuba@kernel.org>,
    Andrew Lunn <andrew@lunn.ch>,
    Florian Fainelli <f.fainelli@gmail.com>,
    Vivien Didelot <vivien.didelot@gmail.com>,
    Roopa Prabhu <roopa@nvidia.com>,
    Nikolay Aleksandrov <razor@blackwall.org>,
    Mattias Forsblad <mattias.forsblad+netdev@gmail.com>,
    Tobias Waldekranz <tobias@waldekranz.com>
Subject: [PATCH net-next 0/3] bridge: dsa: switchdev: mv88e6xxx: Implement local_receive bridge flag
Date: Tue, 1 Mar 2022 13:31:01 +0100
Message-Id: <20220301123104.226731-1-mattias.forsblad+netdev@gmail.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: bulk

Mattias Forsblad March 1, 2022, 12:31 p.m. UTC

Greetings,

This series implements a new bridge flag 'local_receive' and HW
offloading for Marvell mv88e6xxx.

When using a non-VLAN filtering bridge we want to be able to limit
traffic to the CPU port to lessen the CPU load. This is especially
important when we have disabled learning on user ports.

A sample configuration could be something like this:

       br0
      /   \
   swp0   swp1

ip link add dev br0 type bridge stp_state 0 vlan_filtering 0
ip link set swp0 master br0
ip link set swp1 master br0
ip link set swp0 type bridge_slave learning off
ip link set swp1 type bridge_slave learning off
ip link set swp0 up
ip link set swp1 up
ip link set br0 type bridge local_receive 0
ip link set br0 up

The first part of the series implements the flag for the SW bridge
and the second part the DSA infrastructure. The last part implements
offloading of this flag to HW for mv88e6xxx, which uses the
port vlan table to restrict the ingress from user ports
to the CPU port when this flag is cleared.
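
As a rough illustration of what restricting ingress towards the CPU
port means, here is a toy shell model of a per-port "allowed egress
ports" bitmask. It is only a sketch, not the driver code from patch
3/3: the function name pvt_mask, the port numbering (user ports 0 and
1, CPU port 5) and the bit layout are assumptions made for this example.

```shell
#!/bin/sh
# Toy model of a per-port "allowed egress ports" bitmask.
# Port numbers and bit layout are hypothetical, not taken from mv88e6xxx.

# pvt_mask PORT CPU_PORT LOCAL_RECEIVE MEMBER...
# Prints (in hex) the mask of ports that frames ingressing on PORT may reach.
pvt_mask() {
    port=$1; cpu=$2; local_receive=$3
    shift 3
    mask=0
    for m in "$@"; do
        # A port never forwards back to itself.
        [ "$m" -eq "$port" ] || mask=$(( mask | (1 << m) ))
    done
    # The CPU port is only reachable while local_receive is set.
    [ "$local_receive" -eq 0 ] || mask=$(( mask | (1 << cpu) ))
    printf '0x%02x\n' "$mask"
}

pvt_mask 0 5 1 0 1   # local_receive on:  prints 0x22 (swp1 + CPU)
pvt_mask 0 5 0 0 1   # local_receive off: prints 0x02 (swp1 only)
```

Clearing the flag simply drops the CPU bit from every user port's mask,
so flooded traffic stays between the user ports.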

Reviewed-by: Tobias Waldekranz <tobias@waldekranz.com>

Regards,
Mattias Forsblad (3):
  net: bridge: Implement bridge flag local_receive
  dsa: Handle the local_receive flag in the DSA layer.
  mv88e6xxx: Offload the local_receive flag

 drivers/net/dsa/mv88e6xxx/chip.c | 45 ++++++++++++++++++++++++++++++--
 include/linux/if_bridge.h        |  6 +++++
 include/net/dsa.h                |  6 +++++
 include/net/switchdev.h          |  2 ++
 include/uapi/linux/if_bridge.h   |  1 +
 include/uapi/linux/if_link.h     |  1 +
 net/bridge/br.c                  | 18 +++++++++++++
 net/bridge/br_device.c           |  1 +
 net/bridge/br_input.c            |  3 +++
 net/bridge/br_ioctl.c            |  1 +
 net/bridge/br_netlink.c          | 14 +++++++++-
 net/bridge/br_private.h          |  2 ++
 net/bridge/br_sysfs_br.c         | 23 ++++++++++++++++
 net/bridge/br_vlan.c             |  8 ++++++
 net/dsa/dsa_priv.h               |  1 +
 net/dsa/slave.c                  | 16 ++++++++++++
 16 files changed, 145 insertions(+), 3 deletions(-)

Florian Fainelli March 1, 2022, 5:14 p.m. UTC | #1

On 3/1/2022 4:31 AM, Mattias Forsblad wrote:
> Greetings,
> 
> This series implements a new bridge flag 'local_receive' and HW
> offloading for Marvell mv88e6xxx.
> 
> When using a non-VLAN filtering bridge we want to be able to limit
> traffic to the CPU port to lessen the CPU load. This is especially
> important when we have disabled learning on user ports.
> 
> A sample configuration could be something like this:
> 
>         br0
>        /   \
>     swp0   swp1
> 
> ip link add dev br0 type bridge stp_state 0 vlan_filtering 0
> ip link set swp0 master br0
> ip link set swp1 master br0
> ip link set swp0 type bridge_slave learning off
> ip link set swp1 type bridge_slave learning off
> ip link set swp0 up
> ip link set swp1 up
> ip link set br0 type bridge local_receive 0
> ip link set br0 up
> 
> The first part of the series implements the flag for the SW bridge
> and the second part the DSA infrastructure. The last part implements
> offloading of this flag to HW for mv88e6xxx, which uses the
> port vlan table to restrict the ingress from user ports
> to the CPU port when this flag is cleared.

Why not use a bridge with VLAN filtering enabled? I cannot quite find it 
right now, but Vladimir recently picked up what I had attempted before 
which was to allow removing the CPU port (via the bridge master device) 
from a specific group of VLANs to achieve that isolation.

> 
> Reviewed-by: Tobias Waldekranz <tobias@waldekranz.com>

I don't believe this tag has much value since it was presumably carried 
over from an internal review. Might be worth adding it publicly now, though.


Tobias Waldekranz March 1, 2022, 9:04 p.m. UTC | #2

On Tue, Mar 01, 2022 at 09:14, Florian Fainelli <f.fainelli@gmail.com> wrote:
> Why not use a bridge with VLAN filtering enabled? I cannot quite find it 
> right now, but Vladimir recently picked up what I had attempted before 
> which was to allow removing the CPU port (via the bridge master device) 
> from a specific group of VLANs to achieve that isolation.
>

Hi Florian,

Yes we are aware of this work, which is awesome by the way! For anyone
else who is interested, I believe you are referring to this series:

https://lore.kernel.org/netdev/20220215170218.2032432-1-vladimir.oltean@nxp.com/

There are cases though, where you want a TPMR-like setup (or "dumb hub"
mode, if you will) and ignore all tag information.

One application could be to use a pair of ports on a switch as an
ethernet extender/repeater for topologies that span large physical
distances. If this repeater is part of a redundant topology, you'd do
well to disable learning, in order to avoid dropping packets when the
surrounding active topology changes. This, in turn, will mean that all
flows will be classified as unknown unicast. For that reason it is very
important that the CPU be shielded.

You might be tempted to solve this using flooding filters of the
switch's CPU port, but these go out the window if you have another
bridge configured, that requires that flooding of unknown traffic is
enabled.

Another application is to create a similar setup, but with three ports,
and have the third one be used as a TAP.

>> Reviewed-by: Tobias Waldekranz <tobias@waldekranz.com>
>
> I don't believe this tag has much value since it was presumably carried 
> over from an internal review. Might be worth adding it publicly now, though.

I think Mattias meant to replicate this tag on each individual
patch. Aside from that though, are you saying that a tag is never valid
unless there is a public message on the list from the signee? Makes
sense I suppose. Anyway, I will send separate tags for this series.

Vladimir Oltean March 17, 2022, 2:05 p.m. UTC | #3

Hello Tobias,

On Tue, Mar 01, 2022 at 10:04:09PM +0100, Tobias Waldekranz wrote:
> Hi Florian,
> 
> Yes we are aware of this work, which is awesome by the way! For anyone
> else who is interested, I believe you are referring to this series:
> 
> https://lore.kernel.org/netdev/20220215170218.2032432-1-vladimir.oltean@nxp.com/
> 
> There are cases though, where you want a TPMR-like setup (or "dumb hub"
> mode, if you will) and ignore all tag information.
> 
> One application could be to use a pair of ports on a switch as an
> ethernet extender/repeater for topologies that span large physical
> distances. If this repeater is part of a redundant topology, you'd do
> well to disable learning, in order to avoid dropping packets when the
> surrounding active topology changes. This, in turn, will mean that all
> flows will be classified as unknown unicast. For that reason it is very
> important that the CPU be shielded.

So have you seriously considered making the bridge ports that operate in
'dumb hub' mode have a pvid which isn't installed as a 'self' entry on
the bridge device?

> You might be tempted to solve this using flooding filters of the
> switch's CPU port, but these go out the window if you have another
> bridge configured, that requires that flooding of unknown traffic is
> enabled.

Not if CPU flooding can be managed on a per-user-port basis.

> Another application is to create a similar setup, but with three ports,
> and have the third one be used as a TAP.

Could you expand more on this use case?


Tobias Waldekranz March 18, 2022, 7:58 a.m. UTC | #4

On Thu, Mar 17, 2022 at 16:05, Vladimir Oltean <olteanv@gmail.com> wrote:
> Hello Tobias,
>
> On Tue, Mar 01, 2022 at 10:04:09PM +0100, Tobias Waldekranz wrote:
>> Hi Florian,
>> 
>> Yes we are aware of this work, which is awesome by the way! For anyone
>> else who is interested, I believe you are referring to this series:
>> 
>> https://lore.kernel.org/netdev/20220215170218.2032432-1-vladimir.oltean@nxp.com/
>> 
>> There are cases though, where you want a TPMR-like setup (or "dumb hub"
>> mode, if you will) and ignore all tag information.
>> 
>> One application could be to use a pair of ports on a switch as an
>> ethernet extender/repeater for topologies that span large physical
>> distances. If this repeater is part of a redundant topology, you'd do
>> well to disable learning, in order to avoid dropping packets when the
>> surrounding active topology changes. This, in turn, will mean that all
>> flows will be classified as unknown unicast. For that reason it is very
>> important that the CPU be shielded.
>
> So have you seriously considered making the bridge ports that operate in
> 'dumb hub' mode have a pvid which isn't installed as a 'self' entry on
> the bridge device?

Just so there's no confusion, you mean something like...

    ip link add dev br0 type bridge vlan_filtering 1 vlan_default_pvid 0

    for p in swp0 swp1; do
        ip link set dev $p master br0
        bridge vlan add dev $p vid 1 pvid untagged
    done

... right?

In that case, the repeater is no longer transparent with respect to
tagged packets, which the application requires.
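
To spell out why, here is a small sketch of the ingress decision a
VLAN-filtering bridge port makes. It is a simplified model written for
this example (the function name classify and its arguments are
hypothetical), not bridge code: untagged frames take the port's pvid,
tagged frames keep their VID, and anything outside the port's
membership list is dropped.

```shell
#!/bin/sh
# Simplified model of VLAN ingress filtering on one bridge port.

# classify TAG PVID MEMBER...
# TAG is the frame's VID, or "none" for an untagged frame.
# Prints "forward VID" if the frame is admitted, "drop" otherwise.
classify() {
    tag=$1; pvid=$2
    shift 2
    # Untagged frames are classified to the port's pvid.
    [ "$tag" = none ] && vid=$pvid || vid=$tag
    for m in "$@"; do
        [ "$m" -eq "$vid" ] && { echo "forward $vid"; return 0; }
    done
    echo drop
}

# Port configured as above: pvid 1, member of VID 1 only.
classify none 1 1   # prints: forward 1
classify 100 1 1    # prints: drop
```

With only VID 1 configured, a frame that arrives tagged with any other
VID never makes it through the repeater.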

>> You might be tempted to solve this using flooding filters of the
>> switch's CPU port, but these go out the window if you have another
>> bridge configured, that requires that flooding of unknown traffic is
>> enabled.
>
> Not if CPU flooding can be managed on a per-user-port basis.

True, but we aren't lucky enough to have hardware that can do that :)

>> Another application is to create a similar setup, but with three ports,
>> and have the third one be used as a TAP.
>
> Could you expand more on this use case?

It's just the standard use-case for a TAP really. You have some link of
interest that you want to snoop, but for some reason there is no way of
getting a PCAP from the station on either side:

   Link of interest
          |
.-------. v .-------.
| Alice +---+  Bob  |
'-------'   '-------'

So you insert a hub in the middle, and listen on a third port:

.-------.   .-----.   .-------.
| Alice +---+ TAP +---+  Bob  |
'-------'   '--+--'   '-------'
               |
 PC running tcpdump/wireshark

The nice thing about being able to set this up in Linux is that if your
hardware comes with a mix of media types, you can dynamically create the
TAP for the job at hand. E.g. if Alice and Bob are communicating over a
fiber, but your PC only has a copper interface, you can bridge two fiber
ports with one copper; if you need to monitor a copper link 5 min later,
you just swap out the fiber ports for two copper ports.
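
A minimal sketch of how such a TAP could be put together, reusing the
commands from the cover letter. The helper name tap_cmds is made up for
this example, and it only prints the commands (a dry run) rather than
executing them. Note that local_receive is the flag proposed by this
series, so it is assumed here and is not an option in mainline iproute2.

```shell
#!/bin/sh
# Dry-run generator: prints the commands instead of running them.
# 'local_receive' is the flag proposed by this series (an assumption here).

# tap_cmds BRIDGE PORT...
tap_cmds() {
    br=$1
    shift
    echo "ip link add dev $br type bridge stp_state 0 vlan_filtering 0"
    for p in "$@"; do
        echo "ip link set $p master $br"
        # No learning: every frame is flooded, so the capture port sees it.
        echo "ip link set $p type bridge_slave learning off"
        echo "ip link set $p up"
    done
    echo "ip link set $br type bridge local_receive 0"
    echo "ip link set $br up"
}

# swp0/swp1 carry the link of interest, swp2 goes to the capture PC.
tap_cmds br0 swp0 swp1 swp2
```

Swapping media is then a matter of calling the helper with a different
set of port names.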


Vladimir Oltean March 18, 2022, 11:11 a.m. UTC | #5

On Fri, Mar 18, 2022 at 08:58:11AM +0100, Tobias Waldekranz wrote:
> On Thu, Mar 17, 2022 at 16:05, Vladimir Oltean <olteanv@gmail.com> wrote:
> > Hello Tobias,
> >
> > On Tue, Mar 01, 2022 at 10:04:09PM +0100, Tobias Waldekranz wrote:
> >> Hi Florian,
> >> 
> >> Yes we are aware of this work, which is awesome by the way! For anyone
> >> else who is interested, I believe you are referring to this series:
> >> 
> >> https://lore.kernel.org/netdev/20220215170218.2032432-1-vladimir.oltean@nxp.com/
> >> 
> >> There are cases though, where you want a TPMR-like setup (or "dumb hub"
> >> mode, if you will) and ignore all tag information.
> >> 
> >> One application could be to use a pair of ports on a switch as an
> >> ethernet extender/repeater for topologies that span large physical
> >> distances. If this repeater is part of a redundant topology, you'd do
> >> well to disable learning, in order to avoid dropping packets when the
> >> surrounding active topology changes. This, in turn, will mean that all
> >> flows will be classified as unknown unicast. For that reason it is very
> >> important that the CPU be shielded.
> >
> > So have you seriously considered making the bridge ports that operate in
> > 'dumb hub' mode have a pvid which isn't installed as a 'self' entry on
> > the bridge device?
> 
> Just so there's no confusion, you mean something like...
> 
>     ip link add dev br0 type bridge vlan_filtering 1 vlan_default_pvid 0
> 
>     for p in swp0 swp1; do
>         ip link set dev $p master br0
>         bridge vlan add dev $p vid 1 pvid untagged
>     done
> 
> ... right?
> 
> In that case, the repeater is no longer transparent with respect to
> tagged packets, which the application requires.

If you are sure that there exists one VLAN ID which is never used (like
4094), what you could do is you could set the port pvids to that VID
instead of 1, and add the entire VLAN_N_VID range sans that VID in the
membership list of the two ports, as egress-tagged.

This is 'practical transparency' - if true transparency is required then
yes, this doesn't work.
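
For reference, the setup described above could be sketched roughly as
follows. The helper names vid_range and transparent_cmds are invented
for this example, the commands are only printed (dry run), and the
bridge is created the same way as in the quoted snippet; treat it as an
illustration under those assumptions, not a tested configuration.

```shell
#!/bin/sh
# Dry-run generator for the "practical transparency" configuration.

# vid_range LO HI: prints "LO", "LO-HI", or nothing if the range is empty.
vid_range() {
    if [ "$1" -lt "$2" ]; then
        echo "$1-$2"
    elif [ "$1" -eq "$2" ]; then
        echo "$1"
    fi
}

# transparent_cmds BRIDGE RESERVED_VID PORT...
transparent_cmds() {
    br=$1; pvid=$2
    shift 2
    echo "ip link add dev $br type bridge vlan_filtering 1 vlan_default_pvid 0"
    for p in "$@"; do
        echo "ip link set dev $p master $br"
        # Untagged traffic is classified to the reserved, never-used VID.
        echo "bridge vlan add dev $p vid $pvid pvid untagged"
        # Every other valid VID (1..4094) becomes an egress-tagged member.
        for r in "$(vid_range 1 $((pvid - 1)))" "$(vid_range $((pvid + 1)) 4094)"; do
            [ -z "$r" ] || echo "bridge vlan add dev $p vid $r"
        done
    done
}

transparent_cmds br0 4094 swp0 swp1
```

None of the VIDs are added on the bridge device itself, so the CPU
stays out of all of them.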


Tobias Waldekranz March 18, 2022, 12:09 p.m. UTC | #6

On Fri, Mar 18, 2022 at 13:11, Vladimir Oltean <olteanv@gmail.com> wrote:
> On Fri, Mar 18, 2022 at 08:58:11AM +0100, Tobias Waldekranz wrote:
>> On Thu, Mar 17, 2022 at 16:05, Vladimir Oltean <olteanv@gmail.com> wrote:
>> > Hello Tobias,
>> >
>> > On Tue, Mar 01, 2022 at 10:04:09PM +0100, Tobias Waldekranz wrote:
>> >> Hi Florian,
>> >> 
>> >> Yes we are aware of this work, which is awesome by the way! For anyone
>> >> else who is interested, I believe you are referring to this series:
>> >> 
>> >> https://lore.kernel.org/netdev/20220215170218.2032432-1-vladimir.oltean@nxp.com/
>> >> 
>> >> There are cases though, where you want a TPMR-like setup (or "dumb hub"
>> >> mode, if you will) and ignore all tag information.
>> >> 
>> >> One application could be to use a pair of ports on a switch as an
>> >> ethernet extender/repeater for topologies that span large physical
>> >> distances. If this repeater is part of a redundant topology, you'd do
>> >> well to disable learning, in order to avoid dropping packets when the
>> >> surrounding active topology changes. This, in turn, will mean that all
>> >> flows will be classified as unknown unicast. For that reason it is very
>> >> important that the CPU be shielded.
>> >
>> > So have you seriously considered making the bridge ports that operate in
>> > 'dumb hub' mode have a pvid which isn't installed as a 'self' entry on
>> > the bridge device?
>> 
>> Just so there's no confusion, you mean something like...
>> 
>>     ip link add dev br0 type bridge vlan_filtering 1 vlan_default_pvid 0
>> 
>>     for p in swp0 swp1; do
>>         ip link set dev $p master br0
>>         bridge vlan add dev $p vid 1 pvid untagged
>>     done
>> 
>> ... right?
>> 
>> In that case, the repeater is no longer transparent with respect to
>> tagged packets, which the application requires.
>
> If you are sure that there exists one VLAN ID which is never used (like
> 4094), what you could do is you could set the port pvids to that VID
> instead of 1, and add the entire VLAN_N_VID range sans that VID in the
> membership list of the two ports, as egress-tagged.

Yeah, I've thought about this too. If the device's only role is to act
as a repeater, then you can get away with it. But you will have consumed
all rows in the VTU and half of the rows in the ATU (we add an entry for
the broadcast address in every FID). So if you want to use your other
ports for regular bridging you're left with a very limited feature set.
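
A back-of-the-envelope version of that estimate, as a sketch. It
assumes one VTU row per VLAN and one broadcast ATU entry per VLAN (each
VLAN in its own FID). The helper name table_usage is made up, and the
table sizes passed in are placeholders picked to match the "all rows" /
"half of the rows" figures above, not datasheet values.

```shell
#!/bin/sh
# table_usage VLANS VTU_ROWS ATU_ROWS
# Prints the share of each table consumed (integer percent, rounded down),
# assuming one VTU row and one broadcast ATU entry per VLAN.
table_usage() {
    echo "VTU: $(( $1 * 100 / $2 ))%"
    echo "ATU: $(( $1 * 100 / $3 ))%"
}

# Hypothetical table sizes; check the datasheet for the real ones.
table_usage 4094 4096 8192   # prints "VTU: 99%" then "ATU: 49%"
```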


Vladimir Oltean March 18, 2022, 12:44 p.m. UTC | #7

On Fri, Mar 18, 2022 at 01:09:08PM +0100, Tobias Waldekranz wrote:
> >> > So have you seriously considered making the bridge ports that operate in
> >> > 'dumb hub' mode have a pvid which isn't installed as a 'self' entry on
> >> > the bridge device?
> >> 
> >> Just so there's no confusion, you mean something like...
> >> 
> >>     ip link add dev br0 type bridge vlan_filtering 1 vlan_default_pvid 0
> >> 
> >>     for p in swp0 swp1; do
> >>         ip link set dev $p master br0
> >>         bridge vlan add dev $p vid 1 pvid untagged
> >>     done
> >> 
> >> ... right?
> >> 
> >> In that case, the repeater is no longer transparent with respect to
> >> tagged packets, which the application requires.
> >
> > If you are sure that there exists one VLAN ID which is never used (like
> > 4094), what you could do is you could set the port pvids to that VID
> > instead of 1, and add the entire VLAN_N_VID range sans that VID in the
> > membership list of the two ports, as egress-tagged.
> 
> Yeah, I've thought about this too. If the device's only role is to act
> as a repeater, then you can get away with it. But you will have consumed
> all rows in the VTU and half of the rows in the ATU (we add an entry for
> the broadcast address in every FID). So if you want to use your other
> ports for regular bridging you're left with a very limited feature set.

But VLANs in other bridges would reuse the same FIDs, at least in the
current mv88e6xxx implementation with no FDB isolation, no? So even
though the VTU is maxed out, it wouldn't get 'more' maxed out.

As for the broadcast address needing to be present in the ATU, honestly
I don't know too much about that. I see that some switches have a
FloodBC bit, wouldn't that be useful?

> > This is 'practical transparency' - if true transparency is required then
> > yes, this doesn't work.
> >
> >> >> You might be tempted to solve this using flooding filters of the
> >> >> switch's CPU port, but these go out the window if you have another
> >> >> bridge configured, that requires that flooding of unknown traffic is
> >> >> enabled.
> >> >
> >> > Not if CPU flooding can be managed on a per-user-port basis.
> >> 
> >> True, but we aren't lucky enough to have hardware that can do that :)
> >> 
> >> >> Another application is to create a similar setup, but with three ports,
> >> >> and have the third one be used as a TAP.
> >> >
> >> > Could you expand more on this use case?
> >> 
> >> It's just the standard use-case for a TAP really. You have some link of
> >> interest that you want to snoop, but for some reason there is no way of
> >> getting a PCAP from the station on either side:
> >> 
> >>    Link of interest
> >>           |
> >> .-------. v .-------.
> >> | Alice +---+  Bob  |
> >> '-------'   '-------'
> >> 
> >> So you insert a hub in the middle, and listen on a third port:
> >> 
> >> .-------.   .-----.   .-------.
> >> | Alice +---+ TAP +---+  Bob  |
> >> '-------'   '--+--'   '-------'
> >>                |
> >>  PC running tcpdump/wireshark
> >> 
> >> The nice thing about being able to set this up in Linux is that if your
> >> hardware comes with a mix of media types, you can dynamically create the
> >> TAP for the job at hand. E.g. if Alice and Bob are communicating over a
> >> fiber, but your PC only has a copper interface, you can bridge two fiber
> >> ports with one copper; if you need to monitor a copper link 5min later,
> >> you just swap out the fiber ports for two copper ports.
> >> 
> >> >> >> Reviewed-by: Tobias Waldekranz <tobias@waldekranz.com>
> >> >> >
> >> >> > I don't believe this tag has much value since it was presumably carried 
> >> >> > over from an internal review. Might be worth adding it publicly now, though.
> >> >> 
> >> >> I think Mattias meant to replicate this tag on each individual
> >> >> patch. Aside from that though, are you saying that a tag is never valid
> >> >> unless there is a public message on the list from the signee? Makes
> >> >> sense I suppose. Anyway, I will send separate tags for this series.

Tobias Waldekranz March 18, 2022, 4:03 p.m. UTC | #8

On Fri, Mar 18, 2022 at 14:44, Vladimir Oltean <olteanv@gmail.com> wrote:
> On Fri, Mar 18, 2022 at 01:09:08PM +0100, Tobias Waldekranz wrote:
>> >> > So have you seriously considered making the bridge ports that operate in
>> >> > 'dumb hub' mode have a pvid which isn't installed as a 'self' entry on
>> >> > the bridge device?
>> >> 
>> >> Just so there's no confusion, you mean something like...
>> >> 
>> >>     ip link add dev br0 type bridge vlan_filtering 1 vlan_default_pvid 0
>> >> 
>> >>     for p in swp0 swp1; do
>> >>         ip link set dev $p master br0
>> >>         bridge vlan add dev $p vid 1 pvid untagged
>> >>     done
>> >> 
>> >> ... right?
>> >> 
>> >> In that case, the repeater is no longer transparent with respect to
>> >> tagged packets, which the application requires.
>> >
>> > If you are sure that there exists one VLAN ID which is never used (like
>> > 4094), what you could do is you could set the port pvids to that VID
>> > instead of 1, and add the entire VLAN_N_VID range sans that VID in the
>> > membership list of the two ports, as egress-tagged.
>> 
>> Yeah, I've thought about this too. If the device's only role is to act
>> as a repeater, then you can get away with it. But you will have consumed
>> all rows in the VTU and half of the rows in the ATU (we add an entry for
>> the broadcast address in every FID). So if you want to use your other
>> ports for regular bridging you're left with a very limited feature set.
>
> But VLANs in other bridges would reuse the same FIDs, at least in the
> current mv88e6xxx implementation with no FDB isolation, no? So even
> though the VTU is maxed out, it wouldn't get 'more' maxed out.

I'm pretty sure that mv88e6xxx won't allow the same VID to be configured
on multiple bridges. A quick test seems to support that:

   root@coronet:~# ip link add dev br0 type bridge vlan_filtering 1
   root@coronet:~# ip link add dev br1 type bridge vlan_filtering 1
   root@coronet:~# ip link set dev br0 up
   root@coronet:~# ip link set dev br1 up
   root@coronet:~# ip link set dev swp1 master br0
   root@coronet:~# ip link set dev swp2 master br1
   RTNETLINK answers: Operation not supported

> As for the broadcast address needing to be present in the ATU, honestly
> I don't know too much about that. I see that some switches have a
> FloodBC bit, wouldn't that be useful?

mv88e6xxx can handle broadcast in two ways:

1. Always flood broadcast, independent of all other settings.

2. Treat broadcast as multicast, only allow flooding if unknown
   multicast is allowed on the port, or if there's an entry in the ATU
   (making it known) that allows it.

The kernel driver uses (2), because that is the only way (I know of)
that we can support the BCAST_FLOOD flag. In order to make BCAST_FLOOD
independent of MCAST_FLOOD, we have to load an entry allowing bc to
egress on all ports by default. De Morgan comes back to guide us once
more :)
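
For readers following along, mode (2) can be modelled roughly as below.
This is a sketch under assumptions, not driver code: struct port_model,
bc_may_egress() and the 16-bit port vector are names and sizes invented
for illustration, and expressing a cleared BCAST_FLOOD as a cleared bit
in the broadcast entry's port vector is one reading of the description
above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-port state; not a structure from the mv88e6xxx driver. */
struct port_model {
	bool mcast_flood;	/* unknown multicast may egress this port */
};

/*
 * Model of mode (2): broadcast is treated like multicast.
 *
 * atu_bc_hit      - an ATU entry for ff:ff:ff:ff:ff:ff exists in the FID,
 *                   i.e. broadcast is "known"
 * atu_bc_portvec  - destination port vector of that entry (assumed 16 bits)
 *
 * Returns true if a broadcast frame may egress 'port'.
 */
static bool bc_may_egress(const struct port_model *p, unsigned int port,
			  bool atu_bc_hit, uint16_t atu_bc_portvec)
{
	if (port >= 16)
		return false;

	/* Known address: only the entry's port vector decides. */
	if (atu_bc_hit)
		return (atu_bc_portvec & (1u << port)) != 0;

	/* Unknown address: falls back to the unknown-multicast setting. */
	return p->mcast_flood;
}
```

With the entry loaded, the result no longer depends on mcast_flood,
which is the independence between BCAST_FLOOD and MCAST_FLOOD described
above. The cost is one ATU row per FID.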

>> > This is 'practical transparency' - if true transparency is required then
>> > yes, this doesn't work.
>> >
>> >> >> You might be tempted to solve this using flooding filters of the
>> >> >> switch's CPU port, but these go out the window if you have another
>> >> >> bridge configured, that requires that flooding of unknown traffic is
>> >> >> enabled.
>> >> >
>> >> > Not if CPU flooding can be managed on a per-user-port basis.
>> >> 
>> >> True, but we aren't lucky enough to have hardware that can do that :)
>> >> 
>> >> >> Another application is to create a similar setup, but with three ports,
>> >> >> and have the third one be used as a TAP.
>> >> >
>> >> > Could you expand more on this use case?
>> >> 
>> >> It's just the standard use-case for a TAP really. You have some link of
>> >> interest that you want to snoop, but for some reason there is no way of
>> >> getting a PCAP from the station on either side:
>> >> 
>> >>    Link of interest
>> >>           |
>> >> .-------. v .-------.
>> >> | Alice +---+  Bob  |
>> >> '-------'   '-------'
>> >> 
>> >> So you insert a hub in the middle, and listen on a third port:
>> >> 
>> >> .-------.   .-----.   .-------.
>> >> | Alice +---+ TAP +---+  Bob  |
>> >> '-------'   '--+--'   '-------'
>> >>                |
>> >>  PC running tcpdump/wireshark
>> >> 
>> >> The nice thing about being able to set this up in Linux is that if your
>> >> hardware comes with a mix of media types, you can dynamically create the
>> >> TAP for the job at hand. E.g. if Alice and Bob are communicating over a
>> >> fiber, but your PC only has a copper interface, you can bridge two fiber
>> >> ports with one copper; if you need to monitor a copper link 5min later,
>> >> you just swap out the fiber ports for two copper ports.
>> >> 
>> >> >> >> Reviewed-by: Tobias Waldekranz <tobias@waldekranz.com>
>> >> >> >
>> >> >> > I don't believe this tag has much value since it was presumably carried 
>> >> >> > over from an internal review. Might be worth adding it publicly now, though.
>> >> >> 
>> >> >> I think Mattias meant to replicate this tag on each individual
>> >> >> patch. Aside from that though, are you saying that a tag is never valid
>> >> >> unless there is a public message on the list from the signee? Makes
>> >> >> sense I suppose. Anyway, I will send separate tags for this series.

Vladimir Oltean March 18, 2022, 4:26 p.m. UTC | #9

On Fri, Mar 18, 2022 at 05:03:31PM +0100, Tobias Waldekranz wrote:
> On Fri, Mar 18, 2022 at 14:44, Vladimir Oltean <olteanv@gmail.com> wrote:
> > On Fri, Mar 18, 2022 at 01:09:08PM +0100, Tobias Waldekranz wrote:
> >> >> > So have you seriously considered making the bridge ports that operate in
> >> >> > 'dumb hub' mode have a pvid which isn't installed as a 'self' entry on
> >> >> > the bridge device?
> >> >> 
> >> >> Just so there's no confusion, you mean something like...
> >> >> 
> >> >>     ip link add dev br0 type bridge vlan_filtering 1 vlan_default_pvid 0
> >> >> 
> >> >>     for p in swp0 swp1; do
> >> >>         ip link set dev $p master br0
> >> >>         bridge vlan add dev $p vid 1 pvid untagged
> >> >>     done
> >> >> 
> >> >> ... right?
> >> >> 
> >> >> In that case, the repeater is no longer transparent with respect to
> >> >> tagged packets, which the application requires.
> >> >
> >> > If you are sure that there exists one VLAN ID which is never used (like
> >> > 4094), what you could do is you could set the port pvids to that VID
> >> > instead of 1, and add the entire VLAN_N_VID range sans that VID in the
> >> > membership list of the two ports, as egress-tagged.
> >> 
> >> Yeah, I've thought about this too. If the device's only role is to act
> >> as a repeater, then you can get away with it. But you will have consumed
> >> all rows in the VTU and half of the rows in the ATU (we add an entry for
> >> the broadcast address in every FID). So if you want to use your other
> >> ports for regular bridging you're left with a very limited feature set.
> >
> > But VLANs in other bridges would reuse the same FIDs, at least in the
> > current mv88e6xxx implementation with no FDB isolation, no? So even
> > though the VTU is maxed out, it wouldn't get 'more' maxed out.
> 
> I'm pretty sure that mv88e6xxx won't allow the same VID to be configured
> on multiple bridges. A quick test seems to support that:
> 
>    root@coronet:~# ip link add dev br0 type bridge vlan_filtering 1
>    root@coronet:~# ip link add dev br1 type bridge vlan_filtering 1
>    root@coronet:~# ip link set dev br0 up
>    root@coronet:~# ip link set dev br1 up
>    root@coronet:~# ip link set dev swp1 master br0
>    root@coronet:~# ip link set dev swp2 master br1
>    RTNETLINK answers: Operation not supported

Ok, I forgot about mv88e6xxx_port_check_hw_vlan() even though I was
there on multiple occasions. Thanks for reminding me.
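
A rough model of the kind of check that yields the error quoted above,
for illustration only. It is not the driver's code: the function name
model_check_hw_vlan(), NUM_PORTS, the bridge_of[] array and the rule
that unbridged ports are skipped are all assumptions. Presumably the
join of swp2 fails because the default PVID 1 would be added to it
while swp1, in another bridge, is already a member of VID 1.

```c
#include <errno.h>
#include <stddef.h>

#define NUM_PORTS 4	/* hypothetical switch size */

/* Hypothetical view of one VLAN table row: a VID and its member ports. */
struct vlan_model {
	unsigned int vid;
	unsigned int member_mask;	/* bit N set: port N is a member */
};

/*
 * bridge_of[N] is the bridge index port N belongs to, or -1 if unbridged.
 * Returns 0 if 'port' may join 'vid', or -EOPNOTSUPP if another port that
 * sits in a different bridge is already a member of the same VID.
 */
static int model_check_hw_vlan(const int bridge_of[NUM_PORTS],
			       const struct vlan_model *vlans, size_t n,
			       int port, unsigned int vid)
{
	for (size_t i = 0; i < n; i++) {
		if (vlans[i].vid != vid)
			continue;

		for (int other = 0; other < NUM_PORTS; other++) {
			if (other == port)
				continue;
			if (!(vlans[i].member_mask & (1u << other)))
				continue;
			if (bridge_of[other] < 0)
				continue;	/* unbridged: assumed harmless */
			if (bridge_of[other] != bridge_of[port])
				return -EOPNOTSUPP;
		}
	}

	return 0;
}
```

In this model, with port 1 in bridge 0 as a member of VID 1, asking for
VID 1 on port 2 in bridge 1 returns -EOPNOTSUPP, matching the
"Operation not supported" in the test above.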

> > As for the broadcast address needing to be present in the ATU, honestly
> > I don't know too much about that. I see that some switches have a
> > FloodBC bit, wouldn't that be useful?
> 
> mv88e6xxx can handle broadcast in two ways:
> 
> 1. Always flood broadcast, independent of all other settings.
> 
> 2. Treat broadcast as multicast, only allow flooding if unknown
>    multicast is allowed on the port, or if there's an entry in the ATU
>    (making it known) that allows it.
> 
> The kernel driver uses (2), because that is the only way (I know of)
> that we can support the BCAST_FLOOD flag. In order to make BCAST_FLOOD
> independent of MCAST_FLOOD, we have to load an entry allowing bc to
> egress on all ports by default. De Morgan comes back to guide us once
> more :)

Ok, so this alternative falls flat on its face due to excessive resource
usage. Next...

Does your application require bridged foreign interfaces with the other
switch ports? In other words, is there a reason to keep the CPU port in
the flood domain of the switch, other than current software limitations?
