Message ID | 20241013185509.4430-1-ericwouds@gmail.com (mailing list archive) |
---|---|
Headers | show |
Series | bridge-fastpath and related improvements | expand |
On 13/10/2024 21:54, Eric Woudstra wrote: > This patchset makes it possible to set up a (hardware offloaded) fastpath > for bridged interfaces. > The subject and this sentence are misleading, you're talking about netfilter bridge fastpath offload, please mention it in both places. When you just say bridge fast path, I think of the software fast path. > To set up the fastpath with offloading, add this extra flowtable: > > table bridge filter { > flowtable fb { > hook ingress priority filter > devices = { lan0, lan1, lan2, lan3, lan4, wlan0, wlan1 } > flags offload > } > chain forward { > type filter hook forward priority filter; policy accept; > ct state established flow add @fb > } > } > > Creating a separate fastpath for bridges. > > forward fastpath bypass > .----------------------------------------. > / \ > | IP - forwarding | > | / \ v > | / wan ... > | / > | | > | | > | brlan.1 > | | > | +-------------------------------+ > | | vlan 1 | > | | | > | | brlan (vlan-filtering) | > | +---------------+ | > | | DSA-SWITCH | | > | | | vlan 1 | > | | | to | > | | vlan 1 | untagged | > | +---------------+---------------+ > . / \ > ------>lan0 wlan1 > . ^ ^ > . | | > . \_________________/ > . bridge fastpath bypass > . > ^ > vlan 1 tagged packets > > To have the ability to handle xmit direct with outgoing encaps in the > bridge fastpass bypass, we need to be able to handle them without going > through vlan/pppoe devices. So I've applied, amended and squashed wenxu's > patchset. This patch also makes it possible to egress from vlan-filtering > brlan to lan0 with vlan tagged packets, if the bridge master port is doing > the vlan tagging, instead of the vlan-device. Without this patch, this is > not possible in the bridge-fastpath and also not in the forward-fastpath, > as seen in the figure above. > > There are also some more fixes for filling in the forward path. These > fixes also apply to for the forward-fastpath. They include handling > DEV_PATH_MTK_WDMA in nft_dev_path_info() and avoiding > DEV_PATH_BR_VLAN_UNTAG_HW for bridges with ports that use dsa. > > Conntrack bridge only tracks untagged and 802.1q. To make the bridge > fastpath experience more similar to the forward fastpath experience, > I've added double vlan, pppoe and pppoe-in-q tagged packets to bridge > conntrack and to bridge filter chain. > > Eric Woudstra (12): > netfilter: nf_flow_table_offload: Add nf_flow_encap_push() for xmit > direct > netfilter: bridge: Add conntrack double vlan and pppoe > netfilter: nft_chain_filter: Add bridge double vlan and pppoe > bridge: br_vlan_fill_forward_path_pvid: Add port to port > bridge: br_fill_forward_path add port to port > net: core: dev: Add dev_fill_bridge_path() > netfilter :nf_flow_table_offload: Add nf_flow_rule_bridge() > netfilter: nf_flow_table_inet: Add nf_flowtable_type flowtable_bridge > netfilter: nft_flow_offload: Add NFPROTO_BRIDGE to validate > netfilter: nft_flow_offload: Add DEV_PATH_MTK_WDMA to > nft_dev_path_info() > bridge: br_vlan_fill_forward_path_mode no _UNTAG_HW for dsa > netfilter: nft_flow_offload: Add bridgeflow to nft_flow_offload_eval() > > include/linux/netdevice.h | 2 + > include/net/netfilter/nf_flow_table.h | 3 + > net/bridge/br_device.c | 20 ++- > net/bridge/br_private.h | 2 + > net/bridge/br_vlan.c | 24 +++- > net/bridge/netfilter/nf_conntrack_bridge.c | 86 ++++++++++-- > net/core/dev.c | 77 +++++++++-- > net/netfilter/nf_flow_table_inet.c | 13 ++ > net/netfilter/nf_flow_table_ip.c | 96 ++++++++++++- > net/netfilter/nf_flow_table_offload.c | 13 ++ > net/netfilter/nft_chain_filter.c | 20 ++- > net/netfilter/nft_flow_offload.c | 154 +++++++++++++++++++-- > 12 files changed, 463 insertions(+), 47 deletions(-) >
On 10/14/24 8:35 AM, Nikolay Aleksandrov wrote: > On 13/10/2024 21:54, Eric Woudstra wrote: >> This patchset makes it possible to set up a (hardware offloaded) fastpath >> for bridged interfaces. >> > > The subject and this sentence are misleading, you're talking about netfilter bridge > fastpath offload, please mention it in both places. When you just say bridge fast > path, I think of the software fast path. > Hello Nikolay, It would be no problem for me to change the subject and body, if you think that is better. The thing is, these patches actually make it possible to set up a fully functional software fastpath between bridged interfaces. Only after the software fastpath is set up and functional, it can be offloaded, which happens to by my personal motivation to write this patch-set. If the offload flag is set in the flowtable, the software fastpath will be offloaded. But in this patch-set, there is nothing that changes anything there, the existing code is used unchanged. >> To set up the fastpath with offloading, add this extra flowtable: >> >> table bridge filter { >> flowtable fb { >> hook ingress priority filter >> devices = { lan0, lan1, lan2, lan3, lan4, wlan0, wlan1 } >> flags offload >> } >> chain forward { >> type filter hook forward priority filter; policy accept; >> ct state established flow add @fb >> } >> } >> >> Creating a separate fastpath for bridges. >> >> forward fastpath bypass >> .----------------------------------------. >> / \ >> | IP - forwarding | >> | / \ v >> | / wan ... >> | / >> | | >> | | >> | brlan.1 >> | | >> | +-------------------------------+ >> | | vlan 1 | >> | | | >> | | brlan (vlan-filtering) | >> | +---------------+ | >> | | DSA-SWITCH | | >> | | | vlan 1 | >> | | | to | >> | | vlan 1 | untagged | >> | +---------------+---------------+ >> . / \ >> ------>lan0 wlan1 >> . ^ ^ >> . | | >> . \_________________/ >> . bridge fastpath bypass >> . >> ^ >> vlan 1 tagged packets >> >> To have the ability to handle xmit direct with outgoing encaps in the >> bridge fastpass bypass, we need to be able to handle them without going >> through vlan/pppoe devices. So I've applied, amended and squashed wenxu's >> patchset. This patch also makes it possible to egress from vlan-filtering >> brlan to lan0 with vlan tagged packets, if the bridge master port is doing >> the vlan tagging, instead of the vlan-device. Without this patch, this is >> not possible in the bridge-fastpath and also not in the forward-fastpath, >> as seen in the figure above. >> >> There are also some more fixes for filling in the forward path. These >> fixes also apply to for the forward-fastpath. They include handling >> DEV_PATH_MTK_WDMA in nft_dev_path_info() and avoiding >> DEV_PATH_BR_VLAN_UNTAG_HW for bridges with ports that use dsa. >> >> Conntrack bridge only tracks untagged and 802.1q. To make the bridge >> fastpath experience more similar to the forward fastpath experience, >> I've added double vlan, pppoe and pppoe-in-q tagged packets to bridge >> conntrack and to bridge filter chain. >> >> Eric Woudstra (12): >> netfilter: nf_flow_table_offload: Add nf_flow_encap_push() for xmit >> direct >> netfilter: bridge: Add conntrack double vlan and pppoe >> netfilter: nft_chain_filter: Add bridge double vlan and pppoe >> bridge: br_vlan_fill_forward_path_pvid: Add port to port >> bridge: br_fill_forward_path add port to port >> net: core: dev: Add dev_fill_bridge_path() >> netfilter :nf_flow_table_offload: Add nf_flow_rule_bridge() >> netfilter: nf_flow_table_inet: Add nf_flowtable_type flowtable_bridge >> netfilter: nft_flow_offload: Add NFPROTO_BRIDGE to validate >> netfilter: nft_flow_offload: Add DEV_PATH_MTK_WDMA to >> nft_dev_path_info() >> bridge: br_vlan_fill_forward_path_mode no _UNTAG_HW for dsa >> netfilter: nft_flow_offload: Add bridgeflow to nft_flow_offload_eval() >> >> include/linux/netdevice.h | 2 + >> include/net/netfilter/nf_flow_table.h | 3 + >> net/bridge/br_device.c | 20 ++- >> net/bridge/br_private.h | 2 + >> net/bridge/br_vlan.c | 24 +++- >> net/bridge/netfilter/nf_conntrack_bridge.c | 86 ++++++++++-- >> net/core/dev.c | 77 +++++++++-- >> net/netfilter/nf_flow_table_inet.c | 13 ++ >> net/netfilter/nf_flow_table_ip.c | 96 ++++++++++++- >> net/netfilter/nf_flow_table_offload.c | 13 ++ >> net/netfilter/nft_chain_filter.c | 20 ++- >> net/netfilter/nft_flow_offload.c | 154 +++++++++++++++++++-- >> 12 files changed, 463 insertions(+), 47 deletions(-) >> >
Hi Eric, On 14.10.24 20:29, Eric Woudstra wrote: > It would be no problem for me to change the subject and body, if you > think that is better. > > The thing is, these patches actually make it possible to set up a fully > functional software fastpath between bridged interfaces. Only after the > software fastpath is set up and functional, it can be offloaded, which > happens to by my personal motivation to write this patch-set. > > If the offload flag is set in the flowtable, the software fastpath will > be offloaded. But in this patch-set, there is nothing that changes > anything there, the existing code is used unchanged. FWIW, a while back, I also wanted to add a software fast path for the bridge layer to the kernel, also with the intention of using it for hardware offload. It wasn't accepted back then, because (if I remember correctly) people didn't want any extra complexity in the network stack to make the bridge layer faster. Because of that, I created this piece of software: https://github.com/nbd168/bridger It uses an eBPF TC classifier for discovering flows and handling the software fast path, and also creates hardware offload rules for flows. With that, hardware offloading for bridged LAN->WLAN flows is fully supported on MediaTek hardware with upstream kernels. - Felix
On 10/15/24 2:16 PM, Felix Fietkau wrote: > Hi Eric, > > On 14.10.24 20:29, Eric Woudstra wrote: >> It would be no problem for me to change the subject and body, if you >> think that is better. >> >> The thing is, these patches actually make it possible to set up a fully >> functional software fastpath between bridged interfaces. Only after the >> software fastpath is set up and functional, it can be offloaded, which >> happens to by my personal motivation to write this patch-set. >> >> If the offload flag is set in the flowtable, the software fastpath will >> be offloaded. But in this patch-set, there is nothing that changes >> anything there, the existing code is used unchanged. > > FWIW, a while back, I also wanted to add a software fast path for the > bridge layer to the kernel, also with the intention of using it for > hardware offload. It wasn't accepted back then, because (if I remember > correctly) people didn't want any extra complexity in the network stack > to make the bridge layer faster. Hello Felix, I think this patch-set is a clear showcase it is not very complex at all. The core of making it possible only consists a few patches. Half of this patch-set involves improvements that also apply to the forward-fastpath. > Because of that, I created this piece of software: > https://github.com/nbd168/bridger > > It uses an eBPF TC classifier for discovering flows and handling the > software fast path, and also creates hardware offload rules for flows. > With that, hardware offloading for bridged LAN->WLAN flows is fully > supported on MediaTek hardware with upstream kernels. > > - Felix Thanks, I've seen that already. Nice piece of software, but I'm not running openwrt. I would like to see a solution implemented in the kernel, so any operating system can use it.
On 15.10.24 15:32, Eric Woudstra wrote: > > > On 10/15/24 2:16 PM, Felix Fietkau wrote: >> Hi Eric, >> >> On 14.10.24 20:29, Eric Woudstra wrote: >>> It would be no problem for me to change the subject and body, if you >>> think that is better. >>> >>> The thing is, these patches actually make it possible to set up a fully >>> functional software fastpath between bridged interfaces. Only after the >>> software fastpath is set up and functional, it can be offloaded, which >>> happens to by my personal motivation to write this patch-set. >>> >>> If the offload flag is set in the flowtable, the software fastpath will >>> be offloaded. But in this patch-set, there is nothing that changes >>> anything there, the existing code is used unchanged. >> >> FWIW, a while back, I also wanted to add a software fast path for the >> bridge layer to the kernel, also with the intention of using it for >> hardware offload. It wasn't accepted back then, because (if I remember >> correctly) people didn't want any extra complexity in the network stack >> to make the bridge layer faster. > > Hello Felix, > > I think this patch-set is a clear showcase it is not very complex at > all. The core of making it possible only consists a few patches. Half of > this patch-set involves improvements that also apply to the > forward-fastpath. It's definitely an interesting approach. How does it deal with devices roaming from one bridge port to another? I couldn't find that in the code. >> Because of that, I created this piece of software: >> https://github.com/nbd168/bridger >> >> It uses an eBPF TC classifier for discovering flows and handling the >> software fast path, and also creates hardware offload rules for flows. >> With that, hardware offloading for bridged LAN->WLAN flows is fully >> supported on MediaTek hardware with upstream kernels. >> >> - Felix > > Thanks, I've seen that already. Nice piece of software, but I'm not > running openwrt. I would like to see a solution implemented in the > kernel, so any operating system can use it. Makes sense. By the way, bridger can easily be built for non-OpenWrt systems too. The only library that's actually needed is libubox - that one is small and can be linked in statically. ubus support is fully optional and not necessary for standard cases. - Felix
On 10/15/24 9:44 PM, Felix Fietkau wrote: > On 15.10.24 15:32, Eric Woudstra wrote: >> >> >> On 10/15/24 2:16 PM, Felix Fietkau wrote: >>> Hi Eric, >>> >>> On 14.10.24 20:29, Eric Woudstra wrote: >>>> It would be no problem for me to change the subject and body, if you >>>> think that is better. >>>> >>>> The thing is, these patches actually make it possible to set up a fully >>>> functional software fastpath between bridged interfaces. Only after the >>>> software fastpath is set up and functional, it can be offloaded, which >>>> happens to by my personal motivation to write this patch-set. >>>> >>>> If the offload flag is set in the flowtable, the software fastpath will >>>> be offloaded. But in this patch-set, there is nothing that changes >>>> anything there, the existing code is used unchanged. >>> >>> FWIW, a while back, I also wanted to add a software fast path for the >>> bridge layer to the kernel, also with the intention of using it for >>> hardware offload. It wasn't accepted back then, because (if I remember >>> correctly) people didn't want any extra complexity in the network stack >>> to make the bridge layer faster. >> >> Hello Felix, >> >> I think this patch-set is a clear showcase it is not very complex at >> all. The core of making it possible only consists a few patches. Half of >> this patch-set involves improvements that also apply to the >> forward-fastpath. > > It's definitely an interesting approach. How does it deal with devices > roaming from one bridge port to another? I couldn't find that in the code. It is handled in the same manner when dealing with the forward-fastpath, with the aid of conntrack. If roaming is problematic, then it would be for both the forward-fastpath and the bridge-fastpath. I have a topic on the banana-pi forum about this patch-set, so I think long discussions about additional details we could have there, keeping the mailing list more clean. >>> Because of that, I created this piece of software: >>> https://github.com/nbd168/bridger >>> >>> It uses an eBPF TC classifier for discovering flows and handling the >>> software fast path, and also creates hardware offload rules for flows. >>> With that, hardware offloading for bridged LAN->WLAN flows is fully >>> supported on MediaTek hardware with upstream kernels. >>> >>> - Felix >> >> Thanks, I've seen that already. Nice piece of software, but I'm not >> running openwrt. I would like to see a solution implemented in the >> kernel, so any operating system can use it. > > Makes sense. By the way, bridger can easily be built for non-OpenWrt > systems too. The only library that's actually needed is libubox - that > one is small and can be linked in statically. ubus support is fully > optional and not necessary for standard cases. > > - Felix
On 16.10.24 17:59, Eric Woudstra wrote: > > > On 10/15/24 9:44 PM, Felix Fietkau wrote: >> On 15.10.24 15:32, Eric Woudstra wrote: >>> >>> >>> On 10/15/24 2:16 PM, Felix Fietkau wrote: >>>> Hi Eric, >>>> >>>> On 14.10.24 20:29, Eric Woudstra wrote: >>>>> It would be no problem for me to change the subject and body, if you >>>>> think that is better. >>>>> >>>>> The thing is, these patches actually make it possible to set up a fully >>>>> functional software fastpath between bridged interfaces. Only after the >>>>> software fastpath is set up and functional, it can be offloaded, which >>>>> happens to by my personal motivation to write this patch-set. >>>>> >>>>> If the offload flag is set in the flowtable, the software fastpath will >>>>> be offloaded. But in this patch-set, there is nothing that changes >>>>> anything there, the existing code is used unchanged. >>>> >>>> FWIW, a while back, I also wanted to add a software fast path for the >>>> bridge layer to the kernel, also with the intention of using it for >>>> hardware offload. It wasn't accepted back then, because (if I remember >>>> correctly) people didn't want any extra complexity in the network stack >>>> to make the bridge layer faster. >>> >>> Hello Felix, >>> >>> I think this patch-set is a clear showcase it is not very complex at >>> all. The core of making it possible only consists a few patches. Half of >>> this patch-set involves improvements that also apply to the >>> forward-fastpath. >> >> It's definitely an interesting approach. How does it deal with devices >> roaming from one bridge port to another? I couldn't find that in the code. > > It is handled in the same manner when dealing with the forward-fastpath, > with the aid of conntrack. If roaming is problematic, then it would be > for both the forward-fastpath and the bridge-fastpath. I have a topic on > the banana-pi forum about this patch-set, so I think long discussions > about additional details we could have there, keeping the mailing list > more clean. You forgot to include a link to the forum topic :) By the way, based on some reports that I received, I do believe that the existing forwarding fastpath also doesn't handle roaming properly. I just didn't have the time to properly look into that yet. - Felix
On Thu, Oct 17, 2024 at 11:17:09AM +0200, Felix Fietkau wrote: [...] > By the way, based on some reports that I received, I do believe that the > existing forwarding fastpath also doesn't handle roaming properly. > I just didn't have the time to properly look into that yet. I think it should work for the existing forwarding fastpath. - If computer roams from different port, packets follow classic path, then new flow entry is created. The flow old entry expires after 30 seconds. - If route is stale, flow entry is also removed. Maybe I am missing another possible scenario? Thanks.
On 17.10.24 14:39, Pablo Neira Ayuso wrote: > On Thu, Oct 17, 2024 at 11:17:09AM +0200, Felix Fietkau wrote: > [...] >> By the way, based on some reports that I received, I do believe that the >> existing forwarding fastpath also doesn't handle roaming properly. >> I just didn't have the time to properly look into that yet. > > I think it should work for the existing forwarding fastpath. > > - If computer roams from different port, packets follow classic path, > then new flow entry is created. The flow old entry expires after 30 > seconds. > - If route is stale, flow entry is also removed. > > Maybe I am missing another possible scenario? I'm mainly talking about the scenario where a computer moves to a different switch port on L2 only, so all routes remain the same. I haven't fully analyzed the issue, but I did find a few potential issues with what you're describing. 1. Since one direction remains the same when a computer roams, a new flow entry would probably fail to be added because of an existing entry in the flow hash table. 2. Even with that out of the way, the MTK hardware offload currently does not support matching the incoming switch/ethernet port. So even if we manage to add an updated entry, the old entry could still be kept alive by the hardware. The issues I found probably wouldn't cause connection hangs in pure L3 software flow offload, since it will use the bridge device for xmit instead of its members. But since hardware offload needs to redirect traffic to individual bridge ports, it could cause connection hangs with stale flow entries. There might be other issues as well, but this is what I could come up with on short notice. I think in order to properly address this, we should probably monitor for FDB / neigh entry changes somehow and clear affected flows. Routes do not become stale in my scenario, so something else is needed to trigger flow entry removal. - Felix
On Thu, Oct 17, 2024 at 07:06:51PM +0200, Felix Fietkau wrote: > On 17.10.24 14:39, Pablo Neira Ayuso wrote: > > On Thu, Oct 17, 2024 at 11:17:09AM +0200, Felix Fietkau wrote: > > [...] > > > By the way, based on some reports that I received, I do believe that the > > > existing forwarding fastpath also doesn't handle roaming properly. > > > I just didn't have the time to properly look into that yet. > > > > I think it should work for the existing forwarding fastpath. > > > > - If computer roams from different port, packets follow classic path, > > then new flow entry is created. The flow old entry expires after 30 > > seconds. > > - If route is stale, flow entry is also removed. > > > > Maybe I am missing another possible scenario? > > I'm mainly talking about the scenario where a computer moves to a different > switch port on L2 only, so all routes remain the same. > > I haven't fully analyzed the issue, but I did find a few potential issues > with what you're describing. > > 1. Since one direction remains the same when a computer roams, a new flow > entry would probably fail to be added because of an existing entry in the > flow hash table. I don't think so, hash includes iifidx. > 2. Even with that out of the way, the MTK hardware offload currently does > not support matching the incoming switch/ethernet port. > So even if we manage to add an updated entry, the old entry could still be > kept alive by the hardware. OK, that means probably driver needs to address the lack of iifidx in the matching by dealling with more than one single flow entry to point to one single hardware entry (refcounting?). > The issues I found probably wouldn't cause connection hangs in pure L3 > software flow offload, since it will use the bridge device for xmit instead > of its members. But since hardware offload needs to redirect traffic to > individual bridge ports, it could cause connection hangs with stale flow > entries. I would not expect a hang, packets will just flow over classic path for a little while for the computer that is roaming until the new flow entry is added. > There might be other issues as well, but this is what I could come up with > on short notice. I think in order to properly address this, we should > probably monitor for FDB / neigh entry changes somehow and clear affected > flows. > > Routes do not become stale in my scenario, so something else is needed to > trigger flow entry removal. Yes. In case letting expire stale flow entries with old iifidx is not enough some other mechanism could be required.
On 17.10.24 20:09, Pablo Neira Ayuso wrote: > On Thu, Oct 17, 2024 at 07:06:51PM +0200, Felix Fietkau wrote: >> On 17.10.24 14:39, Pablo Neira Ayuso wrote: >> > On Thu, Oct 17, 2024 at 11:17:09AM +0200, Felix Fietkau wrote: >> > [...] >> > > By the way, based on some reports that I received, I do believe that the >> > > existing forwarding fastpath also doesn't handle roaming properly. >> > > I just didn't have the time to properly look into that yet. >> > >> > I think it should work for the existing forwarding fastpath. >> > >> > - If computer roams from different port, packets follow classic path, >> > then new flow entry is created. The flow old entry expires after 30 >> > seconds. >> > - If route is stale, flow entry is also removed. >> > >> > Maybe I am missing another possible scenario? >> >> I'm mainly talking about the scenario where a computer moves to a different >> switch port on L2 only, so all routes remain the same. >> >> I haven't fully analyzed the issue, but I did find a few potential issues >> with what you're describing. >> >> 1. Since one direction remains the same when a computer roams, a new flow >> entry would probably fail to be added because of an existing entry in the >> flow hash table. > > I don't think so, hash includes iifidx. I'm talking about the side where the input ifindex remains the same, but the output interface doesn't. >> 2. Even with that out of the way, the MTK hardware offload currently does >> not support matching the incoming switch/ethernet port. >> So even if we manage to add an updated entry, the old entry could still be >> kept alive by the hardware. > > OK, that means probably driver needs to address the lack of iifidx in > the matching by dealling with more than one single flow entry to point > to one single hardware entry (refcounting?). If we have multiple colliding entries, I think a more reasonable behavior would be allowing the newer flow to override the older one. >> The issues I found probably wouldn't cause connection hangs in pure L3 >> software flow offload, since it will use the bridge device for xmit instead >> of its members. But since hardware offload needs to redirect traffic to >> individual bridge ports, it could cause connection hangs with stale flow >> entries. > > I would not expect a hang, packets will just flow over classic path > for a little while for the computer that is roaming until the new flow > entry is added. If the hardware still handles traffic, but redirects it to the wrong destination port, the connection will hang. - Felix