Message ID | 20230312202424.1495439-3-horatiu.vultur@microchip.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Series | net: lan966x: Improve TX/RX of frames from/to CPU |
Context | Check | Description |
---|---|---|
netdev/series_format | success | Posting correctly formatted |
netdev/tree_selection | success | Clearly marked for net-next |
netdev/fixes_present | success | Fixes tag not required for -next series |
netdev/header_inline | success | No static functions without inline keyword in header files |
netdev/build_32bit | success | Errors and warnings before: 18 this patch: 18 |
netdev/cc_maintainers | success | CCed 7 of 7 maintainers |
netdev/build_clang | success | Errors and warnings before: 18 this patch: 18 |
netdev/verify_signedoff | success | Signed-off-by tag matches author and committer |
netdev/deprecated_api | success | None detected |
netdev/check_selftest | success | No net selftest shell script |
netdev/verify_fixes | success | No Fixes tag |
netdev/build_allmodconfig_warn | success | Errors and warnings before: 18 this patch: 18 |
netdev/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 133 lines checked |
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0 |
netdev/source_inline | success | Was 0 now: 0 |
From: Horatiu Vultur
> Sent: 12 March 2023 20:24
>
> When a frame is injected from the CPU, an IFH (Inter frame header) must
> be created and placed in front of the transmitted frame. The IFH
> contains fields such as the destination port, whether to bypass the
> analyzer, the priority, etc. lan966x uses the packing library to set and
> get the fields of this IFH, but this seems to be an expensive
> operation.
> Replacing it with a simpler implementation improves RX by ~5Mbit/s,
> while TX improves much more because more fields have to be set.
> Below are the numbers for TX.
...
> +static void lan966x_ifh_set(u8 *ifh, size_t val, size_t pos, size_t length)
> +{
> +	u32 v = 0;
> +
> +	for (int i = 0; i < length ; i++) {
> +		int j = pos + i;
> +		int k = j % 8;
> +
> +		if (i == 0 || k == 0)
> +			v = ifh[IFH_LEN_BYTES - (j / 8) - 1];
> +
> +		if (val & (1 << i))
> +			v |= (1 << k);
> +
> +		if (i == (length - 1) || k == 7)
> +			ifh[IFH_LEN_BYTES - (j / 8) - 1] = v;
> +	}
> +}
> +

It has to be possible to do much better than that.
Given that 'pos' and 'length' are always constants it looks like
each call should reduce to (something like):
	ifh[k] |= val << n;
	ifh[k + 1] |= val >> (8 - n);
	...
It might be that the compiler manages to do this, but I doubt it.

	David
On Mon, 13 Mar 2023 17:04:11 +0000 David Laight wrote:
> It has to be possible to do much better than that.
> Given that 'pos' and 'length' are always constants it looks like
> each call should reduce to (something like):
> 	ifh[k] |= val << n;
> 	ifh[k + 1] |= val >> (8 - n);
> 	...
> It might be that the compiler manages to do this, but I doubt it.

Agreed, going bit-by-bit seems overly cautious.
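
For illustration only, the byte-wise approach suggested above could look roughly like the sketch below. It is a user-space sketch, not the code that was eventually merged: the name ifh_set_bytewise is hypothetical, IFH_LEN_BYTES is assumed to be 28, and the byte order (bit 0 lives in the last byte of the header) is taken from the indexing used in the posted lan966x_ifh_set. Like the posted helper, it only ORs bits in and never clears them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed header size for this sketch; the driver defines its own value. */
#define IFH_LEN_BYTES 28

/*
 * Hypothetical byte-wise variant of the posted lan966x_ifh_set().
 * Bit 'pos' of the IFH is bit (pos % 8) of byte
 * ifh[IFH_LEN_BYTES - pos / 8 - 1], as in the posted helper.
 * Each loop iteration handles all bits that fall into one byte.
 */
static void ifh_set_bytewise(uint8_t *ifh, uint64_t val, size_t pos,
			     size_t length)
{
	size_t done = 0;	/* number of bits of 'val' already written */

	assert(length >= 1 && length <= 64);
	assert(pos + length <= IFH_LEN_BYTES * 8);

	while (done < length) {
		size_t bit = pos + done;
		size_t shift = bit % 8;
		size_t idx = IFH_LEN_BYTES - bit / 8 - 1;
		size_t n = 8 - shift;	/* bits left in this byte */
		uint8_t mask;

		if (n > length - done)
			n = length - done;

		/* n bits wide, aligned to 'shift' inside the byte */
		mask = (uint8_t)(((1u << n) - 1) << shift);
		ifh[idx] |= (uint8_t)((val >> done) << shift) & mask;

		done += n;
	}
}
```

With constant pos and length at every call site, a compiler has a reasonable chance of unrolling this loop into a few OR operations per field, which is the shape of code the review asks for.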
The 03/13/2023 17:04, David Laight wrote:
>
> From: Horatiu Vultur
> > Sent: 12 March 2023 20:24
> >
> > When a frame is injected from the CPU, an IFH (Inter frame header) must
> > be created and placed in front of the transmitted frame. The IFH
> > contains fields such as the destination port, whether to bypass the
> > analyzer, the priority, etc. lan966x uses the packing library to set and
> > get the fields of this IFH, but this seems to be an expensive
> > operation.
> > Replacing it with a simpler implementation improves RX by ~5Mbit/s,
> > while TX improves much more because more fields have to be set.
> > Below are the numbers for TX.
> ...
> > +static void lan966x_ifh_set(u8 *ifh, size_t val, size_t pos, size_t length)
> > +{
> > +	u32 v = 0;
> > +
> > +	for (int i = 0; i < length ; i++) {
> > +		int j = pos + i;
> > +		int k = j % 8;
> > +
> > +		if (i == 0 || k == 0)
> > +			v = ifh[IFH_LEN_BYTES - (j / 8) - 1];
> > +
> > +		if (val & (1 << i))
> > +			v |= (1 << k);
> > +
> > +		if (i == (length - 1) || k == 7)
> > +			ifh[IFH_LEN_BYTES - (j / 8) - 1] = v;
> > +	}
> > +}
> > +
>
> It has to be possible to do much better than that.
> Given that 'pos' and 'length' are always constants it looks like
> each call should reduce to (something like):
> 	ifh[k] |= val << n;
> 	ifh[k + 1] |= val >> (8 - n);
> 	...
> It might be that the compiler manages to do this, but I doubt it.

Thanks for the review. I will update this in the next version.
Do you think it is also worth updating lan966x_ifh_get to use byte
access instead of reading each bit individually? There is not much
improvement on the RX side, which is the path that uses lan966x_ifh_get.

>
> 	David
>
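
As a rough illustration of the question above, a byte-wise read could be sketched as follows. This is a user-space sketch under assumptions, not the driver's code: ifh_get_bytewise is a hypothetical name, IFH_LEN_BYTES is assumed to be 28, and the byte order follows the indexing in the posted lan966x_ifh_get. One side effect worth noting: the posted helper builds the result with `1 << i` on a plain int, which is not well defined in C for fields wider than 31 bits, whereas the sketch shifts a 64-bit chunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed header size for this sketch; the driver defines its own value. */
#define IFH_LEN_BYTES 28

/*
 * Hypothetical byte-wise variant of the posted lan966x_ifh_get().
 * Reads 'length' bits starting at bit 'pos', where bit 'pos' is
 * bit (pos % 8) of byte ifh[IFH_LEN_BYTES - pos / 8 - 1].
 */
static uint64_t ifh_get_bytewise(const uint8_t *ifh, size_t pos,
				 size_t length)
{
	uint64_t val = 0;
	size_t done = 0;	/* number of bits already collected */

	assert(length >= 1 && length <= 64);
	assert(pos + length <= IFH_LEN_BYTES * 8);

	while (done < length) {
		size_t bit = pos + done;
		size_t shift = bit % 8;
		size_t idx = IFH_LEN_BYTES - bit / 8 - 1;
		size_t n = 8 - shift;	/* bits available in this byte */
		uint64_t chunk;

		if (n > length - done)
			n = length - done;

		/* take n bits from this byte and place them at 'done' */
		chunk = (uint64_t)(ifh[idx] >> shift) & ((1u << n) - 1);
		val |= chunk << done;

		done += n;
	}

	return val;
}
```

Whether this is measurably faster on the RX path would need the same iperf comparison as the TX numbers in the patch description; the sketch only shows the shape of the change.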
diff --git a/drivers/net/ethernet/microchip/lan966x/Kconfig b/drivers/net/ethernet/microchip/lan966x/Kconfig
index 8bcd60f17d6d3..571e6d4da1e9d 100644
--- a/drivers/net/ethernet/microchip/lan966x/Kconfig
+++ b/drivers/net/ethernet/microchip/lan966x/Kconfig
@@ -6,7 +6,6 @@ config LAN966X_SWITCH
 	depends on NET_SWITCHDEV
 	depends on BRIDGE || BRIDGE=n
 	select PHYLINK
-	select PACKING
 	select PAGE_POOL
 	select VCAP
 	help
diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_main.c b/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
index 4584a78c6ecbd..9134716b62a55 100644
--- a/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
+++ b/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
@@ -7,7 +7,6 @@
 #include <linux/ip.h>
 #include <linux/of_platform.h>
 #include <linux/of_net.h>
-#include <linux/packing.h>
 #include <linux/phy/phy.h>
 #include <linux/reset.h>
 #include <net/addrconf.h>
@@ -305,46 +304,58 @@ static int lan966x_port_ifh_xmit(struct sk_buff *skb,
 	return NETDEV_TX_BUSY;
 }
 
+static void lan966x_ifh_set(u8 *ifh, size_t val, size_t pos, size_t length)
+{
+	u32 v = 0;
+
+	for (int i = 0; i < length ; i++) {
+		int j = pos + i;
+		int k = j % 8;
+
+		if (i == 0 || k == 0)
+			v = ifh[IFH_LEN_BYTES - (j / 8) - 1];
+
+		if (val & (1 << i))
+			v |= (1 << k);
+
+		if (i == (length - 1) || k == 7)
+			ifh[IFH_LEN_BYTES - (j / 8) - 1] = v;
+	}
+}
+
 void lan966x_ifh_set_bypass(void *ifh, u64 bypass)
 {
-	packing(ifh, &bypass, IFH_POS_BYPASS + IFH_WID_BYPASS - 1,
-		IFH_POS_BYPASS, IFH_LEN * 4, PACK, 0);
+	lan966x_ifh_set(ifh, bypass, IFH_POS_BYPASS, IFH_WID_BYPASS);
 }
 
-void lan966x_ifh_set_port(void *ifh, u64 bypass)
+void lan966x_ifh_set_port(void *ifh, u64 port)
 {
-	packing(ifh, &bypass, IFH_POS_DSTS + IFH_WID_DSTS - 1,
-		IFH_POS_DSTS, IFH_LEN * 4, PACK, 0);
+	lan966x_ifh_set(ifh, port, IFH_POS_DSTS, IFH_WID_DSTS);
 }
 
-static void lan966x_ifh_set_qos_class(void *ifh, u64 bypass)
+static void lan966x_ifh_set_qos_class(void *ifh, u64 qos)
 {
-	packing(ifh, &bypass, IFH_POS_QOS_CLASS + IFH_WID_QOS_CLASS - 1,
-		IFH_POS_QOS_CLASS, IFH_LEN * 4, PACK, 0);
+	lan966x_ifh_set(ifh, qos, IFH_POS_QOS_CLASS, IFH_WID_QOS_CLASS);
 }
 
-static void lan966x_ifh_set_ipv(void *ifh, u64 bypass)
+static void lan966x_ifh_set_ipv(void *ifh, u64 ipv)
 {
-	packing(ifh, &bypass, IFH_POS_IPV + IFH_WID_IPV - 1,
-		IFH_POS_IPV, IFH_LEN * 4, PACK, 0);
+	lan966x_ifh_set(ifh, ipv, IFH_POS_IPV, IFH_WID_IPV);
 }
 
 static void lan966x_ifh_set_vid(void *ifh, u64 vid)
 {
-	packing(ifh, &vid, IFH_POS_TCI + IFH_WID_TCI - 1,
-		IFH_POS_TCI, IFH_LEN * 4, PACK, 0);
+	lan966x_ifh_set(ifh, vid, IFH_POS_TCI, IFH_WID_TCI);
 }
 
 static void lan966x_ifh_set_rew_op(void *ifh, u64 rew_op)
 {
-	packing(ifh, &rew_op, IFH_POS_REW_CMD + IFH_WID_REW_CMD - 1,
-		IFH_POS_REW_CMD, IFH_LEN * 4, PACK, 0);
+	lan966x_ifh_set(ifh, rew_op, IFH_POS_REW_CMD, IFH_WID_REW_CMD);
 }
 
 static void lan966x_ifh_set_timestamp(void *ifh, u64 timestamp)
 {
-	packing(ifh, &timestamp, IFH_POS_TIMESTAMP + IFH_WID_TIMESTAMP - 1,
-		IFH_POS_TIMESTAMP, IFH_LEN * 4, PACK, 0);
+	lan966x_ifh_set(ifh, timestamp, IFH_POS_TIMESTAMP, IFH_WID_TIMESTAMP);
 }
 
 static netdev_tx_t lan966x_port_xmit(struct sk_buff *skb,
@@ -582,22 +593,38 @@ static int lan966x_rx_frame_word(struct lan966x *lan966x, u8 grp, u32 *rval)
 	}
 }
 
+static u64 lan966x_ifh_get(u8 *ifh, size_t pos, size_t length)
+{
+	u64 val = 0;
+	u8 v;
+
+	for (int i = 0; i < length ; i++) {
+		int j = pos + i;
+		int k = j % 8;
+
+		if (i == 0 || k == 0)
+			v = ifh[IFH_LEN_BYTES - (j / 8) - 1];
+
+		if (v & (1 << k))
+			val |= (1 << i);
+	}
+
+	return val;
+}
+
 void lan966x_ifh_get_src_port(void *ifh, u64 *src_port)
 {
-	packing(ifh, src_port, IFH_POS_SRCPORT + IFH_WID_SRCPORT - 1,
-		IFH_POS_SRCPORT, IFH_LEN * 4, UNPACK, 0);
+	*src_port = lan966x_ifh_get(ifh, IFH_POS_SRCPORT, IFH_WID_SRCPORT);
 }
 
 static void lan966x_ifh_get_len(void *ifh, u64 *len)
 {
-	packing(ifh, len, IFH_POS_LEN + IFH_WID_LEN - 1,
-		IFH_POS_LEN, IFH_LEN * 4, UNPACK, 0);
+	*len = lan966x_ifh_get(ifh, IFH_POS_LEN, IFH_WID_LEN);
 }
 
 void lan966x_ifh_get_timestamp(void *ifh, u64 *timestamp)
 {
-	packing(ifh, timestamp, IFH_POS_TIMESTAMP + IFH_WID_TIMESTAMP - 1,
-		IFH_POS_TIMESTAMP, IFH_LEN * 4, UNPACK, 0);
+	*timestamp = lan966x_ifh_get(ifh, IFH_POS_TIMESTAMP, IFH_WID_TIMESTAMP);
 }
 
 static irqreturn_t lan966x_xtr_irq_handler(int irq, void *args)
When a frame is injected from the CPU, an IFH (Inter frame header) must
be created and placed in front of the transmitted frame. The IFH
contains fields such as the destination port, whether to bypass the
analyzer, the priority, etc. lan966x uses the packing library to set and
get the fields of this IFH, but this seems to be an expensive
operation.
Replacing it with a simpler implementation improves RX by ~5Mbit/s,
while TX improves much more because more fields have to be set.
Below are the numbers for TX.

Before:
[  5]   0.00-10.02  sec   439 MBytes   367 Mbits/sec    0   sender

After:
[  5]   0.00-10.00  sec   563 MBytes   472 Mbits/sec    0   sender

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
---
 .../net/ethernet/microchip/lan966x/Kconfig    |  1 -
 .../ethernet/microchip/lan966x/lan966x_main.c | 75 +++++++++++++------
 2 files changed, 51 insertions(+), 25 deletions(-)